Clements A. Practical Computer Architecture With Python and ARM 2023
Every effort has been made in the preparation of this book to ensure the
accuracy of the information presented. However, the information
contained in this book is sold without warranty, either express or
implied. Neither the author, nor Packt Publishing or its dealers and
distributors, will be held liable for any damages caused or alleged to
have been caused directly or indirectly by this book.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83763-667-9
www.packtpub.com
Writing the dedication is the hardest part of writing a book. Of all
the people you have known, who do you honor? I would like to
dedicate this book to four people who have smoothed my path
through life, have been my friends, have set an example, and have
made my life worth living.
To my wife, Sue, who has been my companion and soulmate for over
half a century and has provided so much help with my writing.
To Samuel Gatley, my oldest school friend, who provides me with
inspiration and, like all good friends, is always there to speak to.
To Derek Simpson, who was my Dean at Teesside University for
many years and continues to be a friend. Derek is the most selfless
person I’ve ever met and is always willing to help anyone and
everyone.
To Patricia Egerton, a former colleague and friend who had faith in
me when I lacked it and propelled me to make a career decision that
changed the course of my life.
Contributors
Since retiring, Alan has thrown himself into photography and has
exhibited his work several times in the UK, Spain, and Venice.
Preface
Part 1: Using Python to Simulate a
Computer
Index
Other Books You May Enjoy
Preface
A fundamental thread of computer science is computer architecture. This
topic was once called computer hardware and is concerned with the
physical computer itself; that is, the central processing unit (CPU),
memory, buses, and peripherals. Computer hardware contrasts with
computer software, which applies to the programs, applications, and
operating systems that computers execute.
The part of computer science that deals with how a computer implements
the actions of its architecture is called computer organization and is
largely beyond the scope of this text. Computer organization is
concerned with the gates and circuits of the computer.
So, how does Python fit into this scheme? Python is a popular high-level
programming language that is freely available for use on the PC, Apple
Mac, and Raspberry Pi. Moreover, Python is probably the easiest
computer language to learn, and it is remarkably powerful.
People learn by doing. I have decided to include sufficient Python for the
reader to construct a simple computer simulator that can read a machine-
level computer instruction and execute it. Because I will show how this
Python simulator works, students can build computers to their own
specifications. They can experiment with instruction sets, addressing
modes, instruction formats, and so on. They can even build different
types of computers to their own specifications, for example, by using
complex instruction set computer (CISC) or reduced instruction set
computer (RISC) architectures. CISC and RISC offer two different
philosophies of computer design. Essentially, RISC computers have
fixed-length instructions that permit only register load and store memory
operations, whereas CISC computers can have variable-length
instructions and permit direct data operations on memory. In reality, the
distinction between RISC and CISC is more complex. The first
generation of microprocessors all conformed to the CISC philosophy.
Readers can build computers because they can write a program in Python
that will execute the target language of a specific computer architecture
and they can design that target language themselves.
If you are using the digital version of this book, we advise you to
type the code yourself or access the code from the book’s GitHub
repository (a link is available in the next section). Doing so will help
you avoid any potential errors related to the copying and pasting of
code.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates that words in text are not plain English words,
but are words belonging to a program.
The break instruction breaks out of the while loop (that is, execution
continues beyond the end of the loop - it’s a sort of short-circuit
mechanism).
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book,
email us at [email protected] and mention the book title in
the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our
content, mistakes do happen. If you have found a mistake in this book,
we would be grateful if you would report this to us. Please visit
www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form
on the internet, we would be grateful if you would provide us with the
location address or website name. Please contact us at
[email protected] with a link to the material.
Your review is important to us and the tech community and will help us
make sure we’re delivering excellent quality content.
Do you like to read on the go but are unable to carry your print books
everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF
version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code
from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts,
newsletters, and great free content in your inbox daily.
https://packt.link/free-ebook/9781837636679
3. That’s it! We’ll send your free PDF and other benefits to your email
directly
Part 1: Using Python to Simulate a
Computer
In this part, you will be introduced to two threads: the digital computer
and the programming language Python. The purpose of this book is to
explain how a computer works by constructing a computer in software.
Because we assume that the reader will not have a knowledge of Python,
we will provide an introduction to Python. However, we will cover only
those topics relevant to building a computer simulator. The
topics of the structure and organization of a computer and Python
programming are interleaved. Once we have introduced the TC1
(Teaching Computer 1) simulator, the two final chapters will first explore
ways of enhancing the simulator’s functionality, and then look at
simulators for alternative architectures.
Once we’ve introduced the concept of digital systems, the next chapter
will demonstrate how a computer operates by fetching instructions from
memory and executing them. After that, we will introduce Python and
demonstrate how you can write a program to simulate a computer and
observe its operations. This book is all about learning by doing; by
building a computer with software, you will learn how it operates and
how to extend and modify it.
The remainder of this book will look at a real computer, a Raspberry Pi,
and show you how to write programs for it and observe their execution.
In doing so, we will move on from simulating a hypothetical computer to
learning about a real computer.
We are going to pose a simple problem and then solve it. Our solution
will lead us to the concepts of algorithms and computers, and also
introduce key concepts such as discrete digital operations, memory and
storage, variables, and conditional operations. By doing this, we can
determine the types of operations a computer needs to perform to solve a
problem. After this, we can ask, “How can we automate this? That is,
how can we build a computer?”
It’s a trite statement, but once you understand a problem, you’re well on
the way to finding a solution. When you first encounter a problem that
requires an algorithmic solution, you have to think about what you want
to do, rather than how you are going to do it. The worst approach to
problem solving is to start writing an algorithm (or even actual computer
code) before you have fully explored the problem. Suppose you were
asked to design a cruise control system for an automobile. In principle,
this is a very simple problem with an equally simple solution:
IF cruise control on THEN keep speed constant
ELSE read the position of the gas pedal
Even if you design a correct algorithm, you have to consider the effect
erroneous or spurious data will have on your system. One of the most
popular criticisms of computers is that they produce meaningless results
if you feed them with incorrect data. This idea is summed up by the
expression garbage in, garbage out (GIGO). A well-constructed
algorithm should detect and filter out any garbage in the input data
stream.
Now, imagine state space, which is a grandiose term for all the states a
system can be in (for example, a plane can be in the climbing,
descending, or level flight state). States are a bit like time, except that
you can go forward or backward between discrete points in state space.
If there are a limited number of possible states, a device that models the
transitions between states is called a finite state machine (FSM). An
elevator is a finite state machine: it has states (position at the floors,
doors open or closed, and so on) and inputs (the elevator call buttons,
floor selection buttons, and door open and close buttons).
Before we take a serious look at FSMs, let’s begin with a simple example
of how to use FSMs to describe a real system. Consider the TV of
yesterday, which is a device with two states: on and off. It is always in
one of these two states, and it can move between these states. It is never
in a state that is neither on nor off. Modern TVs often have three states –
on, standby, and off – where the standby state provides a fast-on
mechanism (that is, part of the electronics is in an active on state, but the
display and sound system are powered down). The standby state is often
called the sleep state or idle state. We can model discrete states using a
diagram. Each state is represented by a labeled circle, as demonstrated in
Figure 1.1:
Figure 1.1 shows the three states, but it doesn’t tell us the most important
information we need to know: how we move between states. We can do
this by drawing lines between states and labeling them with the event
that triggers a change of state. Figure 1.2 does this. Please note that we
are going to construct an incorrect system first to illustrate some of the
concepts concerning FSMs:
In Figure 1.2, we labeled each transition by the event that triggers it; in
each case, it’s pressing a button on the remote controller. To go from off
to on, you have to first press the standby button and then the on button.
To go between on and standby, you must press the on button or the
standby button.
We’ve forgotten something – what if you are already in a state and you
press the same button? For example, let’s say the TV is on, and you press
the on button. Also, what’s the initial state? Figure 1.3 rectifies these
omissions.
Figure 1.3 has two innovations. There is an arrow to the off state marked
Power on. This line indicates the state the system enters when you first
plug it into the electricity supply. The second innovation in Figure 1.3 is
that each state has a loop back to itself; for example, if you are in the on
state and you press the on button, you remain in that state:
The state diagram shown in Figure 1.3 has both a logical error and an
ergonomic error. What happens if you are in the off state and press the on
button? If you are in the off state, pressing the on button (in this system)
is incorrect because you have to go to standby first. Figure 1.4 corrects
this error by dealing with incorrect inputs:
Figure 1.4 – The TV control with wrong button correction
Figure 1.4 now provides correct operations from any state and includes
the effect of pressing buttons that cause no change of state. But we still
have the ergonomic error – that is, it’s a correct design that behaves in a
way that many would consider poor. The standby state is a convenience
that speeds up operations. However, the user does not need to know
about this state – it should be invisible to the user.
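The TV controller described above can be sketched as a transition table in Python. This is only an illustration, not code from the book: the figures are not reproduced here, so the button names, the existence of an off button that returns the TV to the off state, and the function name nextState are all assumptions. Any button press that is not listed leaves the state unchanged, which models the loop-back arrows.

```python
# A sketch of the three-state TV controller as a finite state machine.
# Assumptions (not from the book's figures): the button names, and an
# 'off' button that returns the TV to the off state from standby or on.
transitions = {
    ('off',     'standby'): 'standby',   # Off to standby
    ('standby', 'on'):      'on',        # Standby to on
    ('on',      'standby'): 'standby',   # On back to standby
    ('standby', 'off'):     'off',       # Assumed off button
    ('on',      'off'):     'off',       # Assumed off button
}

def nextState(state, button):
    """Return the new state; unlisted presses loop back to the same state."""
    return transitions.get((state, button), state)

state = 'off'                            # The power-on state
for button in ['on', 'standby', 'on', 'on', 'off']:
    state = nextState(state, button)
    print(button, '->', state)
```

Pressing on while the TV is off changes nothing in this sketch, because the (off, on) pair is absent from the table. Hiding the standby state from the user would amount to adding that pair with on as its target.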
We’ve labored with this example because the notion of FSMs is at the
heart of all digital systems. All digital systems, apart from the most
trivial, move from state to state, depending on the current input and past
states. In a digital computer, the change-of-state trigger is the system
clock. A modern computer operating at a clock speed of 4 GHz changes
state every 0.25 × 10⁻⁹ s, or every 0.25 ns. Light traveling at 300,000 km/s
(186,000 miles per second) moves about 7.5 cm, or 3 inches, during a clock cycle.
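The arithmetic behind these figures is easy to check. The following lines are a small worked example; the variable names are chosen here for illustration.

```python
# Clock period and the distance light travels in one clock cycle.
clockHz = 4e9                     # Clock speed: 4 GHz
period = 1 / clockHz              # Seconds per clock cycle
lightSpeed = 3e8                  # Speed of light in m/s (300,000 km/s)
distance = lightSpeed * period    # Meters traveled in one cycle

print('Period in ns   =', period * 1e9)
print('Distance in cm =', distance * 100)
```

The period comes out at about 0.25 ns and the distance at about 7.5 cm (floating-point arithmetic may show tiny rounding differences).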
We can use Table 1.1 to describe this system. We have provided the
current state of the lights (direction of traffic flow), indicated whether
any traffic had been detected in either the north-south or east-west
direction, the action to be taken at the next clock, and the next state. The
traffic rule is simple: the lights remain in their current state unless there
is pending traffic in the other direction.
We can now convert this table into the FSM diagram shown in Figure
1.6. Note that we have made the east-west state the power on state; this is
an arbitrary choice:
Figure 1.6 – A finite state machine for Table 1.1
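The traffic rule can be sketched in a few lines of Python. Table 1.1 and Figure 1.6 are not reproduced here, so the state names 'NS' and 'EW', the function name nextLights, and the handling of traffic waiting in both directions at once are assumptions made for illustration.

```python
# A sketch of the traffic-light rule: stay in the current state unless
# traffic is pending in the other direction.
# Assumed names: 'NS' and 'EW' for the direction that currently has green.
def nextLights(state, nsTraffic, ewTraffic):
    """Return the state of the lights after the next clock."""
    if state == 'EW' and nsTraffic:
        return 'NS'               # North-south traffic is waiting
    if state == 'NS' and ewTraffic:
        return 'EW'               # East-west traffic is waiting
    return state                  # Otherwise, no change

state = 'EW'                      # The power-on state
detected = [(False, True), (True, False), (True, True), (False, False)]
for nsTraffic, ewTraffic in detected:
    state = nextLights(state, nsTraffic, ewTraffic)
    print('NS traffic', nsTraffic, 'EW traffic', ewTraffic, 'lights', state)
```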
What have we learned? The most important point is that a system is, at
any instant, in a particular state and that a transition to another state (or a
transition back to the current state) is made according to a defined set of
conditions. The FSM has several advantages, both as a teaching tool and
a design tool:
It uses a simple intuitive diagram to describe a system with discrete
states
The FSM is also an abstract machine in the sense that it models a real
system, but we don’t have to worry about how the FSM is
implemented in real hardware or software
As you can see, there are four states. We begin in state S0. Each time a
token is received, we move to the next state if it’s red, and back to the
initial state if it’s white. Once we’ve reached state S3, the process ends.
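As a rough illustration, the four-state machine can be written so that the state number is simply the count of consecutive red tokens seen so far. The token sequence below is made up for this sketch, and the variable names are assumptions.

```python
# States S0 to S3 are represented by the integers 0 to 3.
tokens = 'RWRRWRRR'          # Hypothetical sequence: R = red, W = white
state = 0                    # Begin in state S0
position = 0
for position, token in enumerate(tokens):
    if token == 'R':
        state = state + 1    # Red: move to the next state
    else:
        state = 0            # White: back to the initial state
    if state == 3:           # State S3 reached: the process ends
        break
print('Final state S' + str(state), 'reached at position', position)
```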
Now, we’ll perform the same operation algorithmically.
Line 7: Success – three consecutive red tokens have been taken out
of the bag
There’s no single solution to this problem – more often than not, lots of
algorithms can be constructed to solve a given problem. Let’s derive
another algorithm to detect a sequence of three consecutive red tokens:
Line 1: Set the total number of consecutive red tokens found so far to 0
Constructing an algorithm
The next step is to provide an algorithm that tells us how to solve this
problem clearly and unambiguously. As we step through the sequence of
digits, we will need to keep track of what’s happening, as Table 1.2
demonstrates:
Position in sequence  0  1  2  3  4  5  6  7  8  9  10  11  12  13
New token             R  R  W  R  W  W  W  W  R  W  W   R   R   R
Is it red?            Y  Y  N  Y  N  N  N  N  Y  N  N   Y   Y   Y
Number of reds        1  2  0  1  0  0  0  0  1  0  0   1   2   3
The while numRed != maxRed: line means carry out the block of indented
operations, so long as (while) the value of numRed is not equal to maxRed.
The != Python operator means not equal.
This is the output when the program is executed. It correctly identifies
three consecutive reds and indicates the location of the first token in the
run of three:
count 0 token W numRed 0
count 1 token R numRed 1
count 2 token R numRed 2
count 3 token W numRed 0
count 4 token R numRed 1
count 5 token W numRed 0
count 6 token W numRed 0
count 7 token W numRed 0
count 8 token W numRed 0
count 9 token R numRed 1
count 10 token W numRed 0
count 11 token W numRed 0
count 12 token R numRed 1
count 13 token R numRed 2
count 14 token R numRed 3
Three reds found starting at location 12
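The program that produced this output is not reproduced here. The following is a minimal sketch that generates the same lines, assuming the token sequence can be read off the output above. The names numRed and maxRed come from the text; sequence, count, and token are assumptions.

```python
# A sketch that reproduces the output above (not the book's listing).
sequence = ['W', 'R', 'R', 'W', 'R', 'W', 'W', 'W',
            'W', 'R', 'W', 'W', 'R', 'R', 'R']
maxRed = 3                        # Number of consecutive reds required
numRed = 0                        # Consecutive reds found so far
count = 0                         # Position in the sequence

while numRed != maxRed:           # Repeat until three reds in a row
    token = sequence[count]
    if token == 'R':
        numRed = numRed + 1       # One more consecutive red
    else:
        numRed = 0                # A white token resets the count
    print('count', count, 'token', token, 'numRed', numRed)
    count = count + 1

print('Three reds found starting at location', count - maxRed)
```

This sketch assumes that the sequence does contain a run of three reds. If it did not, the loop would run off the end of the list and raise an IndexError.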
Summary
In this first chapter, we introduced the concept of a computer via an
FSM. A state machine is an abstraction of any system that can exist in
one of several states at any instant. State machines are defined in terms
of the states and the transitions between states. We introduced state
machines as a precursor to digital systems. State machines introduce the
notion of discrete states and discrete times. A state machine moves from
one state to another at discrete instants in time. This mirrors the action of
a program where actions (change of state) take place only when an
instruction is executed.
Reading programs
Mathematical operators
Comments
Functions in Python
Computer memory
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter02.
There are alternatives to IDLE that let you create Python source files
supported by Python platforms. These alternatives are generally more
sophisticated and targeted at the professional developer. For the purposes
of this chapter, IDLE is more than sufficient, and nearly all the Python
programs in this text were developed with IDLE.
Geany: https://www.geany.org/
Thonny: https://thonny.org
Reading programs is not easy for the beginner because you don’t know
how to interpret what you see. The following section describes some of
the typography and layout conventions we will use in this chapter to
make the meaning of programs clearer and to highlight the features
of a program.
Reading programs
In order to help you follow the programs, we have adopted two different
type fonts – a variable-width font (where letters have different widths,
such as the bulk of the text here) and a mono-spaced font, such as the
Courier font found on old mechanical typewriters that looks like this.
Consider the following example. The gray text indicates reserved words
and symbols in Python that are necessary to specify this construct. The
numbers in bold black are values supplied by the programmer. The text
following # is in a non-monospaced font and is a comment ignored by
the computer:
Consider the example of an IDE (in this case, IDLE) in Figure 2.1.
Suppose we want to create a four-function calculator that performs
simple operations such as 23 x 46 or 58 - 32. I’ve chosen this example
because it is really a very simple computer simulator. Figure 2.1 is a
screenshot taken after a Python program has been loaded using the file
function. You can also directly enter a Python program from the
keyboard.
Like most high-performance IDE systems, IDLE uses color to help you
to read a program. Python lets you add a commentary to the program
because code is not always understandable. Any text following a #
symbol is ignored. In this case, the code is understandable, and these
comments are not necessary.
You have to save a program before you can run it. Saving a program
from IDLE automatically appends .py, so that a file named calc1 is saved
as calc1.py.
Figure 2.1 – A screenshot of a Python program in IDLE
Figure 2.1 shows the layout of Python programming, including the all-
important indentation, which is a key feature of Python and indicates
which operations belong to a particular command.
The break instruction breaks out of the while loop (that is, execution
continues beyond the end of the loop – it’s a sort of short-circuit
mechanism).
To run the program, you select the Run tab and then click on Run
module, or you just press F5. The following demonstrates the effect of
running this program. The text in bold is the text I entered using the
keyboard:
Hello. Input operations in the form 23 * 4. Type E to end.
Type first number 12
Type operator + or - or / or * +
Type second number 7
Result = 19
Type first number 15
Type operator + or - or / or * -
Type second number 8
Result = 7
Type first number 55
Type operator + or - or / or * /
Type second number 5
Result = 11
Type first number 2
Type operator + or - or / or * E
Program ended
Float: These are numbers with a decimal point (e.g., 23.5 and -0.87).
These are also called real numbers. Surprisingly, we will not be using
real numbers in this text.
bool: Python has a Boolean type, bool, which has only two values,
True and False. The bool type is used in Boolean logic expressions.
It is also used in comparisons – for example, the English expression,
“Is x greater than y?” has two possible outcomes – True or False. If
you type print(5 == 5) in Python, it will print True because the ==
means “is the same as?” and the result is True.
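A few lines entered in Python show how comparisons produce bool values. The variable names are illustrative only.

```python
x = 5
y = 3
print(5 == 5)            # Prints True
print(x == y)            # Prints False because 5 is not the same as 3
isBigger = x > y         # The result of a comparison can be stored
print(isBigger)          # Prints True
print(type(isBigger))    # Prints <class 'bool'>
```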
Mathematical operators
Computers can perform the four standard arithmetic operations –
addition, subtraction, multiplication, and division. For example, we can
write the following:
X1 = a + b    # Addition
X2 = a - b    # Subtraction
X3 = a * b    # Multiplication
X4 = a / b    # Division
The symbol for multiplication is * (the asterisk) and not the conventional
x. Using the letter x to indicate multiplication would lead to confusion
between the letter x and x as a multiplication operator.
17//5 gives the result 3 and the fractional part (the remainder) is
discarded. Note that x = 19//5 gives the result 3 because it rounds down,
even though 4 is the closest integer. If you were to execute x = -19//5,
that would give the result -4 because integer division always rounds down
(toward negative infinity), and -4 is the next integer below -3.8.
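These results can be checked directly. The variable names below are chosen for illustration.

```python
a = 17 // 5       # Integer division discards the remainder
b = 19 // 5       # 3.8 rounds down to 3
c = -19 // 5      # -3.8 rounds down to -4, not toward zero
d = 19 / 5        # Ordinary division keeps the fraction
print(a, b, c, d) # Prints 3 3 -4 3.8
```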
The % symbol is the modulus operator and provides the remainder after
division. For example, x = 17%5 gives the result 2. Suppose, on Monday,
someone said they were visiting you in 425 days. What day of the week
is that? 425%7 is 5, which indicates Saturday. We can use % to test
whether a number is odd or even. If you write y = x%2, then y is 0 if the
number is even and 1 if it is odd.
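The day-of-the-week calculation and the odd/even test can be sketched as follows. The list of day names and the variable names are assumptions made for this example.

```python
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
today = 0                             # Monday is at position 0
offset = 425 % 7                      # Remainder after whole weeks: 5
visitDay = days[(today + offset) % 7]
print(offset, visitDay)               # Prints 5 Saturday

x = 11
y = x % 2                             # 1 because 11 is odd
print(y)                              # Prints 1
```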
Data elements are called variables because their values can be changed.
A variable has an address (location) in memory. A programmer doesn’t
have to worry about the actual location of data; the operating system
takes care of that for them automatically. All they have to do is to come
up with a name for the variable. For example, let’s take the following:
totalPopulation = 8024
When you write this, you define a new variable that you’ve called
totalPopulation and have told the computer to store the number 8024 at
that location. You don’t have to worry about all the actions involved in
doing this; that’s the job of the operating system and compiler.
Here, the computer reads the current value in the memory location
assigned to the name totalPopulation. It then adds 24 to this value, and
finally, stores the result in totalPopulation.
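The statement being described is not reproduced here. Presumably it has the following form, in which the same name appears on both sides of the assignment:

```python
totalPopulation = 8024                    # Define the variable
totalPopulation = totalPopulation + 24    # Read the value, add 24, store it back
print(totalPopulation)                    # Prints 8048
```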
Comments
Because computer languages can be terse and confusing to the human
reader, they let you add comments that are ignored by the computer (i.e.,
they are not part of the program). In Python, any text on the same line
following the # symbol is ignored. In the following examples, we’ve put
that text in a different font to emphasize that it’s not part of the program.
First, consider the following Python expression:
hours = 12 # Set the value of the variable hours to 12
This code creates a new data element called hours and gives it the integer
value 12. Figure 2.2 illustrates the structure of this line.
Figure 2.2 – The structure of a statement with a comment
Then, the computer would read the value of the name on the right-hand
side of the expression (i.e., hours) and substitute its value, 12. Then,
it would add 3 to get 12 + 3 = 15, storing this new value in a memory
location called allTime. The text following # is ignored by the computer
and serves only to help humans understand the program.
As we’ve seen, Python has a special symbol for “is the same as,” and
that symbol is ‘==’. Never confuse = and ==. It is very easy to write if x
We have not covered all aspects of this program in detail yet. It is given
here to demonstrate the essential simplicity of Python. However, note
that Python allows you to request input from the keyboard and provide a
prompt at the same time.
Figure 2.3 – A Python program to add the first n integers
Our next topic introduces one of Python’s key elements – a feature that is
immensely powerful and flexible and one that makes Python such a
popular language, especially for beginners. We will describe the list.
Like any other object, the programmer assigns a name to a list. The
following Python list gives the result of eight consecutive exams taken
by a student, expressed as a percentage:
myTest = [63, 67, 70, 90, 71, 83, 70, 76]
Python indicates the position of an item in a list (or other data structures)
by means of square brackets – for example, element 3 in myTest is expressed
as myTest[3]. The first element of a list is position 0 because computers
count from zero. The first value in myTest is myTest[0], which is 63.
Similarly, the last result is myTest[7], which is 76. We have 8 results,
numbered 0 to 7.
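Indexing the myTest list shows the numbering in action:

```python
myTest = [63, 67, 70, 90, 71, 83, 70, 76]
print(myTest[0])      # The first result: prints 63
print(myTest[3])      # Element 3 is the fourth result: prints 90
print(myTest[7])      # The last result: prints 76
print(len(myTest))    # The number of results: prints 8
```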
Note that we have aligned the equals symbols in a column on the left.
This is not a feature of Python. Nor is it a requirement. We do it because
it makes the code easier to read and makes debugging easier.
Let’s return to the shopping list example and call the list veg1. We can set
up this list with the following:
veg1 = ['potatoes', 'onions', 'tomatoes'] # A list of three strings
The individual items in this list are in quotation marks. Why? Because
they are strings – that is, text. Without quotations, the items would refer
to variables that were defined earlier, such as the following:
opClass = 4 # Define opClass
addOp = ['add', opClass] # A list with an instruction name and its class
Python lets you select the last element in a string or list by using the -1
index – for example, if you write x = y[-1], then x = 'e'. Similarly,
y[-2] is 'd' (i.e., the last but one element).
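The definition of y is not shown here, so the following sketch assumes a string that ends in the letters d and e. The same indexing works on lists, as the veg1 example shows.

```python
y = 'abcde'                  # Assumed value; any string ending in 'de' would do
x = y[-1]                    # The last character
print(x)                     # Prints e
print(y[-2])                 # The last but one character: prints d

veg1 = ['potatoes', 'onions', 'tomatoes']
print(veg1[-1])              # The last item in the list: prints tomatoes
```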
We have used functions such as print() and len(). We will now discuss
functions in a little more detail. Later, we show you how you can define
your own functions as well as use Python’s built-in functions.
Functions in Python
Before continuing, we need to introduce the concept of a function in a
high-level language. In high school math, we encounter functions such as
sqrt(x), which is an abbreviation of square root and returns the square
root of x – for example, sqrt(9) = 3. Computer languages have borrowed
the same concept.
Python provides functions that are built into the language. You call a
function to perform a specific operation (it’s a bit like subcontracting in
the real world) – for example, len() operates on strings and lists. If you
call len(x) with the list x, it will return the number of items in that list.
Consider the following:
toBuy = len(veg1) # Determine the length of list veg1 (number of items in it)
This takes the list we called veg1 and counts the number of items in it,
copying that value to the toBuy variable. After this operation has been
executed, the value of toBuy will be the integer 3, since there are 3 items
in veg1.
Let’s say you write the following:
q = 'abcdefgh'
print(len(q))
Then, the number printed is 8 because there are eight characters in the
string q.
Let’s add together the number of items in the two lists. We can do that
with the following:
totalShopping = len(veg1) + len(fruit1)
This expression calculates the length of both lists (getting the integers 3
and 4, respectively), adds them together, and assigns the value 7 to the
totalShopping variable. You can print this on the screen by using the
print() function:
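A short sketch of these steps follows. The contents of fruit1 are taken from the example that appears later in this section. Printing a list directly displays it with its brackets and quotation marks.

```python
veg1 = ['potatoes', 'onions', 'tomatoes']
fruit1 = ['apples', 'oranges', 'grapes', 'bananas']
totalShopping = len(veg1) + len(fruit1)   # 3 + 4
print(totalShopping)                      # Prints 7
print(fruit1)     # Prints ['apples', 'oranges', 'grapes', 'bananas']
```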
Is this what you would have expected? You might have been looking for
a list in the order of apples, oranges, grapes. But what you asked for was
the fruit1 Python list, and that is exactly what you got (hence the
brackets, and quotation marks around each item in the list). If we wanted
to print the list as apples, oranges, grapes, bananas, we would have
had to first convert the list of strings into a single string item (which we
will discuss later). However, for the impatient among you, here’s how
it’s done using the join() function:
fruit1 = ['apples', 'oranges', 'grapes', 'bananas']
print('Fruit1 as list = ', fruit1)
fruit1 = (' ').join(fruit1) # This function joins the individual strings
print('Fruit1 as string = ', fruit1)
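The string placed before join() is used as the separator between items. As a small variation on the preceding code (the name fruitString is an assumption), a comma followed by a space gives the comma-separated form mentioned earlier:

```python
fruit1 = ['apples', 'oranges', 'grapes', 'bananas']
fruitString = ', '.join(fruit1)    # Use ', ' as the separator
print(fruitString)                 # Prints apples, oranges, grapes, bananas
```

Here the result is stored under a new name, so the original list fruit1 is left unchanged.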
Next, we will discuss the very item that makes a computer a computer –
the conditional operation that selects one of two or more possible courses
of action, depending on the result of a test.
These actions are carried out sequentially. The first three actions are
simple operations. The fourth action is very different from the previous
three, and it is this action that gives the computer its power. It’s an
operation that performs one of two actions, depending on the outcome of
a test. In this case, the eggs are beaten, then tested. If they are not
sufficiently stiff, the beating is continued, they are retested, and so on.
It’s the same with a computer. At any point in a program, the computer
can be given two alternative courses of action. Depending on the
outcome of a test, the computer chooses one of these courses of action to
take and then carries out the appropriate operations for that decision.
We’ve put the condition in bold, and the action is shaded. The action is
carried out if, and only if, the condition is true. If the condition is not
true, the action is ignored. For example, if x is 6, the value of z will be
20. If x is 4, the value of z will remain at 9.
The only two reserved Python elements are if and the colon. The term
condition is any Python expression that returns the True or False value,
and action is any block of instructions the computer will execute if the
condition is True. Some programmers put the action on a new line. In
this case, the action MUST be indented, as shown in Figure
2.4:
Although Python lets you use any indentation, good practice suggests the
indentation be four spaces.
The condition is x > 5, and the action is z = 20. The condition is a
Boolean logic expression that yields one of two outcomes – True if x is
greater than 5 and False if x is not greater than 5.
These four conditions are equal to, not equal to, greater than, and less
than, respectively. Remember that the expression x == y reads, “Is x
equal to y?” Consider the following example (using the Python IDLE
interpreter):
>>> x = 3
>>> y = 4
>>> x == y
False
>>> x + 1 == y
True
Examples of if statements
A variable, x, varies between 0 and 9, inclusive. Let’s say we want y to
be 0 if x is less than 5, 1 if x > 4 and x < 8, and 2 if x > 7. We can express
this as follows:
if x < 5: y = 0
if x > 4 and x < 8: y = 1 # A compound test. Is x greater than 4 AND x is less than 8
if x > 7: y = 2
Python’s if … else
The preceding code is correct, but it’s not efficient. If the result of the
first if is true, the other if statements can’t be true, and yet they are
tested. What we need is a test in which if it is true, we perform the
appropriate action. If it is not true, we perform a different action. We
have a word for this in English – it’s called “else”:
Consider the following:
if x < 5: # Is x less than 5?
    y = 0 # If x is less than 5, then y = 0
else:
    y = 3 # otherwise, y is 3
Python has an elif (else if) statement that allows multiple tests. We can
use elif to perform another if if the result of the first if is false.
Consider the preceding code using elif:
if x < 5: # Is x less than 5?
    y = 0 # If x is less than 5, then y = 0
elif x > 4 and x < 8: # If x is not less than 5, test whether it's between 5 and 7
    y = 1 # If x is between 5 and 7, then set y to 1
elif x > 7: # If both previous tests fail, then test whether x is 8 or more
    y = 2
print('x and y ', x, y)
In this case, the computer drops out of this construct as soon as one of
the conditional tests is true. It does exactly the same as the preceding
example using if statements, but it is more efficient because it
terminates as soon as a condition is satisfied. That is, once any of the
tests yields true, its associated action is carried out, and then control
passes to the next statement after the if … elif sequence (in this case,
print).
The great thing about programming is that there are many ways of doing
the same thing. Suppose we know that a variable, x, is always in the
range from 0 to 10, and we want to know the value of y for each x (using
the preceding algorithm). We can calculate the value of y as we did
previously by using conditional statements and programming.
Input x Output y
0 0
1 0
2 0
3 0
4 0
5 1
6 1
7 1
8 2
9 2
10 2
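Because x takes only 11 values, the table above can also be stored in a list and indexed by x. The following is a minimal sketch of that idea; the names lookup and y_from_tests are illustrative and do not come from the original text.

```python
# Table of outputs: element x of the list holds the value of y for that x
lookup = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

def y_from_tests(x):
    """Compute y with the same kind of if ... elif tests used earlier."""
    if x < 5:
        return 0
    elif x < 8:
        return 1
    else:
        return 2

for x in range(11):
    y = lookup[x]                       # One list access replaces the chain of tests
    print(x, y, y == y_from_tests(x))   # Third column compares the two methods
```

The list costs a little memory but removes the comparisons entirely, which is the usual trade-off when a table replaces a calculation.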
Get the text from the keyboard – for example, myProg = input(‘Type
program ‘)
The first two techniques are great for testing a very short program but
less so for a long source program. The third technique means that you
write your source code using your favorite text editor, save it as a .txt
file, and then read that file in your assembler.
In the preceding code, the myFile variable provides the name of the
source file as a string. Then, the with open operation opens and reads
myFile. This operation also closes the file after use. The preceding code
opens c.txt for reading and creates a new file, sFile.
It is good practice to close a file after you have used it, with
filename.close(). Because the with open operation automatically closes
a file at the end of an operation, it is not necessary to call the close()
function.
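The behavior described here can be checked with a short, self-contained sketch. It first writes a small file so that there is something to read; the folder, the file contents, and the variable name source are assumptions made for this illustration only.

```python
import os
import tempfile

# Create a small source file so that the example is self-contained.
# The folder and the three lines of text are invented for this sketch.
folder = tempfile.mkdtemp()
myFile = os.path.join(folder, 'c.txt')
with open(myFile, 'w') as f:
    f.write('LDRL r0 5\nADDL r0 r0 1\nSTOP\n')

# Read the file back; the with block closes it automatically on exit
with open(myFile, 'r') as sFile:
    source = [line.strip() for line in sFile]   # One list element per line of text

print(source)
print('File closed?', sFile.closed)
```

Printing sFile.closed after the block shows that no explicit call to close() was needed.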
The output of this program for a sequence I entered was the following:
How many red tokens are you looking for? 3
Which token is it? Red or white? r
Which token is it? Red or white? r
Which token is it? Red or white? w
Which token is it? Red or white? r
Which token is it? Red or white? w
Which token is it? Red or white? w
Which token is it? Red or white? w
Which token is it? Red or white? r
Which token is it? Red or white? r
Which token is it? Red or white? r
3 Reds found
Computer memory
Now, we are going to introduce the concept of memory, the mechanism
that holds programs and data. Real or physical memory is implemented
as DRAM, flash memory, and disk drives. This memory is part of a
computer’s hardware. We do not cover physical memory in this book.
Instead, we will discuss abstract memory and how it is modeled by
Python. This is the programmer’s view of memory.
All data is stored in physical memory, and all the data structures
designed by a programmer must be mapped onto physical memory. The
mapping process is the job of the operating system, and this book does
not deal with the translation of abstract memory addresses into real
memory addresses.
These three strings have the [0], [1], and [2] addresses in the friends
list. The operating system maps these elements onto the physical
memory storage locations. These strings each require a different number
of physical memory locations because each one has a different length.
Mercifully, computer users do not have to worry about any of that. That
is the job of the operating system.
We will now take a brief look at the concept of memory because you
have to understand the nature of memory in order to understand how
computers work, and how to write programs in assembly language.
Figure 2.6 shows how a program to find the number of red tokens in a
string of tokens is stored in a hypothetical memory. I must stress that the
program is conceptual rather than actual because real computer
instructions use rather more primitive machine-level instructions than
these. This figure, called a memory map, shows the location of
information within the memory. It’s a snapshot of the memory because it
represents the state of the memory at a particular instant. The memory
map also includes the variables used by the program and a string of
digits. The stored program computer stores instructions, variables, and
constants in the same memory.
Figure 2.6 demonstrates that each location in the memory contains either
an instruction or a data element. Numbers 0 to 23 in the first column are
addresses that express the position of data elements and instructions
within the memory (addresses start from 0 rather than 1 because 0 is a
valid identifier).
Expression (a) states that the contents of memory location 20 are equal to
the number 5. Expression (b) states that the number 6 is put into (copied
or loaded into) memory location 20. Expression (c) indicates that the
contents of memory location 6 are copied into memory location 20.
Expression (d) indicates that 4 is added to the contents of location 3, and
the sum is put in location 12. Expression (e) indicates that the contents
of locations 7 and 8 are added and the sum is put in location 19.
Expression (f) indicates that the contents of location 2 are used to access
memory to read a value, which is an address. The contents of that second
address are put in location 4. This expression is the most interesting
because it introduces the notion of a pointer. The value of memory
location 2, [2], is a pointer that indicates (points to) another memory
location. If we perform [2] ← [2] + 1, the pointer now points to the next
location in memory. We will return to this when we discuss indirect
addressing (also called pointer-based addressing or indexed addressing).
Address Data
0 6
1 2
2 3
3 4
4 5
5 2
6 8
7 1
8 5
9 2
10 1
11 5
Figure 2.9 – An example of the memory map of an abstract memory
We can translate this into Python and print the contents of location 6 as
follows:
mem = [0]*8                  # Create memory with 8 locations, all set to 0. This is Python
mem[3] = 4                   # Load location 3 with 4
mem[5] = 9                   # Load location 5 with 9
sum = mem[3] + mem[5]        # Add locations 3 and 5 and assign result to sum
mem[6] = sum                 # Store sum in location 6
print('mem[6] =', mem[6])    # Print contents of location 6
print('Memory =', mem)       # Print all memory locations
As you can see, Python is remarkably close to RTL notation. Now, let’s
use data in memory as a pointer. Recall that a pointer is a value that
points to another location in memory. Figure 2.10 shows an eight-
location memory map with five integers stored in the memory.
Figure 2.10 – A memory map of an addition operation
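The use of a memory location as a pointer can be sketched in Python as follows. The addresses and values are hypothetical and are not taken from Figure 2.10; the point is only that the contents of one location are used as the index of another.

```python
mem = [0] * 8            # Eight-location memory, all locations set to 0
mem[2] = 5               # Location 2 holds a pointer: the address 5
mem[5] = 7               # Data item stored at location 5
mem[6] = 9               # Data item stored at location 6

pointer = mem[2]         # Read the pointer itself
first = mem[pointer]     # mem[mem[2]]: read the location the pointer points to
mem[2] = mem[2] + 1      # [2] <- [2] + 1: the pointer now points to location 6
second = mem[mem[2]]     # Read the next item through the updated pointer
print(first, second)
```

Incrementing the pointer rather than changing the instruction is what later makes it possible to step through a table of data in a loop.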
Summary
One of the main themes of this book is writing a program to simulate a
computer so that you can run programs on a computer you designed
yourself. To do this, it is necessary to write a simulator in a suitable high-
level language. We chose Python because of its availability, simplicity,
and power.
Two important and very fundamental features of Python are the string
(which is important, as simulation involves text processing) and the list.
The list is simply a sequence of elements separated by commas and
enclosed by square brackets. What is special about Python’s lists is that
the elements can be any data elements, and they are easy to access – for
example, element 10 of list x is simply x[10]. Equally, character 5 of the
string x = ‘a test’ is expressed as x[5] and is ‘t’. Like most computer
languages, Python numbers elements from 0.
We also looked at the function, a piece of code that can be called from
anywhere in a program to carry out some operation. You don’t need
functions. However, if you do the same thing often, calling a chunk of
code to do the job makes the program easier to read and debug.
An assembly-level program
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter03.
Below the application level, you have the high-level language used to
build the application. This language may be Python, Java, C++, and so
on. High-level languages were designed to enable programmers to build
applications that run on different types of computers. For example, a
program written in Python will run on any machine for which a Python
interpreter or compiler is available. Before the introduction of high-level
languages, you had to design the application for each specific computer.
The layer below the high-level language is the assembly language level,
which is a human representation of the computer’s binary machine code.
People can’t remember or easily manipulate strings of 1s and 0s.
Assembly language is a textual version of machine code. For example,
the assembly language operation ADD A,B,C means add B to C and put the
result in A (i.e., A = B + C) and might be represented in machine code
as 00110101011100111100001010101010.
The machine code layer is the binary code that the computer actually
executes. In the PC world, a machine-code program has the .exe file
extension because it can be executed by the computer. All computers
execute binary code, although this layer is different for each type of
computer – for example, Intel Core, ARM, and MIPS are three computer
families, and each has its own machine code.
Below the machine-code layer are the electronic circuits, which are
generically called microprocessors, or just chips. This is the hardware
that companies such as Intel make, and it’s this hardware that reads
programs from memory and executes them. In general, this layer cannot
be programmed or modified any more than you can change the number
of cylinders in your car’s engine.
Today, some digital systems do have circuits that can be modified
electronically – that is, it is possible for the circuits of a computer to be
restructured by changing the routing of signals through a circuit called a
field programmable gate array (FPGA). The FPGA contains a very
large number of gates and special-purpose circuit blocks that can be
interconnected by programming. An FPGA can be programmed to
perform dedicated applications such as signal processing in medical or
aerospace systems.
This book is about assembly language and machine code layers, and the
layers in Table 3.1 allow us to write programs that are executed by a
computer. By the end of this book, you will be able to design your own
machine code, your own assembly language, and your own computer.
In the 1940s and 1950s, all programming was done in assembly language
(or even machine code). Not today. Writing assembly language programs
is tedious and very challenging. Computer scientists have created high-
level languages such as C++, Java, Python, and Fortran. These languages
were developed to allow programmers to write programs in a near-
English language that expresses more powerful ideas than assembly
language. For example, in Python, you can print the text “Hello World”
When you run the program, you can execute instructions one by one and
observe their outcomes – that is, you can read the values of data in
registers and memory as the program runs. The purpose of this computer
is not to perform useful computing functions but to show what
instructions look like and how they are executed.
TC1 has several useful facilities that are not present in conventional
computer instruction sets. For example, you can directly input data into
the computer from the keyboard, and you can load random numbers into
memory. This allows you to create data for testing purposes.
The next step is to introduce the notion of the von Neumann computer,
which can be regarded as the grandfather of most modern computers.
The mathematician von Neumann was one of the authors of The First
Draft Report on the EDVAC in 1945, which characterized the structure of
the digital computer.
A memory that holds the program and any data used by the program
A set of registers that each holds one word of data (in Figure 3.1,
there is one register, r0)
Figure 3.1 looks complicated. It’s not. We’ll explain its operation step by
step. Once we see how a computer operates in principle, we can look at
how it may be implemented in software. We describe the operation of a
very simple, so-called one-and-a-half address machine, whose
instructions have two operands – one in memory and one in a register.
Instructions are written in the form ADD B,A, which adds A to B and puts
the result in B. Either A or B must be in a register. Both operands may be
in registers. The term one-and-a-half address machine is a comment
about the fact that the memory address is 16 to 32 bits and selects one of
millions of memory locations, whereas the register address is typically 2
to 6 bits and selects only one of a small number of registers.
The program counter points to the next instruction to be executed. If, for
example, [PC] = 1234 (i.e., the PC contains the number 1234), the next
instruction to be executed will be found in memory location 1234.
Fetching an instruction begins with the contents of the program counter
being moved to the memory address register (i.e., [MAR] ← [PC]). Once
the contents of the program counter have been transferred to the memory
address register, the contents of the program counter are incremented and
moved back to the program counter, as follows:
[PC] ← [PC] + 1.
After this operation, the program counter points to the next instruction
while the current instruction is executed.
The memory address register (MAR) holds the address of the location
in the memory into which data is written in a write cycle, or from which
data is read in a read cycle.
The instruction is next moved from the MBR to the instruction register
(IR), where it is divided into two fields. A field is part of a word in
which the bits are grouped together into a logical entity – for example, a
person’s name can be divided into two fields, the given name and the
family name. One field in the IR contains the operation code (opcode)
that tells the CPU what operation is to be carried out. The other field,
called the operand field, contains the address of the data to be used by
the instruction. The operand field can also provide a constant to be
employed by the operation code when immediate or literal addressing is
used – that is, when the operand is an actual (i.e., literal) value and not
an address. For our current purposes, the register address is considered to
be part of the instruction. Later, we will introduce computers with
multiple registers. Real computers divide the instruction into more than
two fields – for example, there may be two or three register-select fields.
The control unit (CU) takes the opcode from the instruction register,
together with a stream of clock pulses, and generates signals that control
all parts of the CPU. The time between individual clock pulses is
typically in the range 0.3 ns to 100 ns (i.e., 3 × 10⁻¹⁰ to 10⁻⁷ s),
corresponding to frequencies of 3.3 GHz to 10 MHz. The CU is
responsible for moving the contents of the program counter into the
MAR, executing a read cycle, and moving the contents of the MBR to
the IR.
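The register transfers that make up the fetch phase can be written out one per line in Python. This is a minimal sketch, assuming the 12-bit instruction format used by the fetch and execute example in this chapter (a 4-bit opcode followed by an 8-bit operand); the memory size and the instruction stored at location 3 are invented for the illustration.

```python
mem = [0] * 16              # Hypothetical 16-location memory
mem[3] = 0b001000000111     # Invented instruction: opcode 0010, operand 7
pc = 3                      # [PC] = 3: the next instruction is in location 3

mar = pc                    # [MAR] <- [PC]
pc = pc + 1                 # [PC]  <- [PC] + 1
mbr = mem[mar]              # [MBR] <- [[MAR]]  (a read cycle)
ir = mbr                    # [IR]  <- [MBR]
opcode = ir >> 8            # Upper 4 bits: the opcode field, passed to the CU
operand = ir & 0xFF         # Lower 8 bits: the operand field
print(pc, bin(opcode), operand)
```

Each Python assignment plays the role of one transfer that the control unit would sequence in hardware.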
The data register, r0, holds temporary results during a calculation. You
need a data register (i.e., an accumulator) because dyadic operations with
two operands such as ADD take one operand from the location specified by
the instruction and the other from a data register. ADD r0,P adds the
contents of the memory location, P, to the contents of the
general-purpose register, r0, and deposits the sum in the data register,
destroying one of the original operands. The arrangement of Figure 3.3 has one
general-purpose data register that we’ve called r0. A real processor, the
ARM, has 16 registers, r0 to r15 (although not all of them are general-
purpose data registers).
OR Logical OR (a = b | c)
A logical shift treats an operand as a string of bits that are moved left or
right. An arithmetic shift treats a number as a signed 2s complement
value and propagates the sign bit during a right shift (i.e., the sign bit is
replicated). Most of these operations are implemented by
computers such as the 68K, Intel Core, and ARM.
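The difference between the two right shifts can be seen in a short Python sketch. It assumes a 16-bit word, and the function names lsr and asr are chosen for this illustration rather than taken from the text. Python integers are not limited to a fixed width, so the word size has to be imposed with a mask.

```python
MASK = 0xFFFF                        # Keep results to 16 bits (assumed word size)

def lsr(value, places):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> places

def asr(value, places):
    """Arithmetic shift right: the sign bit (bit 15) is replicated."""
    value &= MASK
    if value & 0x8000:               # Negative in 16-bit 2s complement
        value -= 0x10000             # Convert to a negative Python integer
    return (value >> places) & MASK  # Python's >> preserves the sign of negative integers

x = 0b1111000000001000               # Bit 15 is 1, so x is negative as a signed value
print(format(lsr(x, 2), '016b'))     # Two zeros shifted in at the left
print(format(asr(x, 2), '016b'))     # Two copies of the sign bit shifted in
```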
The instruction ADD r1,r2,r3 reads the contents of registers r2 and r3,
adds them together, and deposits the result in register, r1. Since it’s not
clear which register is the destination register (i.e., the result), we use a
bold font to highlight the destination operand, which is normally the
leftmost operand.
Now that we’ve covered the basic structure of a computer and introduced
some instructions, the next step is to look at a complete program that
carries out a specific function.
An assembly-level program
Having developed our computer a little further, in this section, we will
show how a simple program is executed. Assume that this computer
doesn’t provide three-address instructions (i.e., you can’t specify an
operation with three registers and/or memory addresses) and we want to
implement the high-level language operation Z = X + Y. Here, the plus
symbol means arithmetic addition. An assembly language program that
carries out this operation is given in the following code block.
Remember that X, Y, and Z are symbolic names referring to the locations
of the variables in memory. Logically, the store operation should be
written STR Z,r2, with the destination operand on the left just like other
instructions. By convention, it is written as STR r2,Z, with the source
register on the left. This is a quirk of programming history:
LDR r2,X Load data register r2 with the contents of memory location X
ADD Z,X,Y Add the contents of X to the contents of Y and put the
result in Z
The way in which the CPU operates can best be seen by examining the
execution of, say, ADD r2,Y in terms of register-transfer language. In the
following code block, we describe the operations carried out during the
fetch and execute phases of an ADD r2,Y instruction:
FETCH  [MAR] ← [PC]               Move the contents of the PC to the MAR
       [PC] ← [PC] + 1            Increment the contents of the PC
       [MBR] ← [[MAR]]            Read the current instruction from the memory
       [IR] ← [MBR]               Move the contents of the MBR to the IR
       CU ← [IRopcode]            Move the opcode from the IR to the CU
ADD    [MAR] ← [IRaddress]        Move the operand address to the MAR
       [MBR] ← [[MAR]]            Read the data from memory
       ALU ← [MBR], ALU ← [r2]    Perform the addition
       [r2] ← ALU                 Move the output of ALU to the data register
During the fetch phase, the opcode is fed to the control unit by CU ←
[IRopcode] and used to generate all the internal signals required to place
the ALU in its addition mode. When the ALU is programmed for
addition, it adds together the data at its two input terminals to produce a
sum at its output terminals.
Operations of the form [MAR] ← [PC] or [r2] ← [r2] + [MBR] are often
referred to as microinstructions. Each assembly-level instruction (e.g.,
MOV, ADD) is executed as a series of microinstructions. Microinstructions
are the province of the computer designer. In the 1970s, some machines
were user-microprogrammable – that is, you could define your own
instruction set.
We can test the execute phase by extending the fetch phase code. The
following Python code provides three instructions – load a register with a
literal, add memory contents to the register, and stop. We have also made
the Python code more compact – for example, you can put expressions in
a function’s return statement. In this example, we return two values: ir
>> 8 and ir & 0xFF. The operation x >> y takes the binary value of x and
shifts the bits y places right; for example, 0b0011010110 >> 2 gives
0b0000110101. The shaded part of the code is the machine-level program
we execute:
# Implement fetch cycle and execute cycle: include three test instructions
mem = [0] * 12                      # Set up 12-location memory
pc = 0                              # Initialize pc to 0
mem[0] = 0b000100001100             # First instruction: load r0 with 12
mem[1] = 0b001000000111             # Second instruction: add mem[7] to r0
mem[2] = 0b111100000000             # Third instruction is stop
mem[7] = 8                          # Initial data in location 7 is 8
def fetch(memory):                  # Function for fetch phase
    global pc                       # Make pc a global variable
    ir = memory[pc]                 # Read instruction and move to IR
    pc = pc + 1                     # Increment program counter for next cycle
    return(ir >> 8, ir & 0xFF)      # Returns opCode and operand
run = 1                             # run = 1 to continue
while run == 1:                     # REPEAT: The program execution loop
    opCode, address = fetch(mem)    # Call fetch to perform fetch phase
    if opCode == 0b1111: run = 0    # Execute phase for stop (set run to 0 on stop)
    elif opCode == 0b0001:          # Execute phase for load number
        r0 = address                # Load r0 with contents of address field
    elif opCode == 0b0010:          # Execute phase for add
        mar = address               # Copy address field of the instruction to MAR
        mbr = mem[mar]              # Read the number to be added
        r0 = mbr + r0               # Do the addition
    print('pc = ',pc - 1, 'opCode =', opCode, 'Register r0 =',r0)
                                    # We print pc - 1 because the pc is incremented
Three items have been added to our computer in Figure 3.5. These are
highlighted:
A path between the address field of the instruction register and the
program counter.
The condition code register or processor status register records the ALU
state after each instruction has been executed, and updates the carry,
negative, zero, and overflow flag bits. A conditional branch instruction
interrogates the CCR’s flags. The CU then either executes the next
instruction in sequence or branches to another instruction. Let’s look at
the details of the conditional branch. The following is a reminder of the
CCR bit functions:
2. Branch on equal (jump to the target address if the Z bit in the CCR is
1)
Both these actions have an ELSE condition, which is the default [PC] ←
[PC] + 1.
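The choice between branching and continuing in sequence amounts to a single if ... else on a flag bit. The sketch below shows this for branch on equal; the function name next_pc and the sample addresses are assumptions for illustration only.

```python
def next_pc(pc, z, target):
    """Return the new program counter after a branch on equal instruction."""
    if z == 1:             # Z bit set: the result of the last operation was zero
        return target      # [PC] <- target address
    else:
        return pc + 1      # ELSE: the default [PC] <- [PC] + 1

print(next_pc(4, 1, 9))    # Z = 1, so the branch is taken
print(next_pc(4, 0, 9))    # Z = 0, so execution continues in sequence
```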
Note that we use two conventions for literals. One is ADD r0,#12 and the
other is ADDL r0,12. This matches typical instruction sets.
Figure 3.6 shows that an additional data path is required between the
operand field of the IR and the data register and ALU to deal with literal
operands. Figure 3.6 includes three general-purpose registers, r0, r1, and
r2. In principle, there is nothing stopping us from adding any number of
registers. However, the number of internal registers is limited by the
number of bits available to specify a register in the instruction. As you
can see, three data buses, A, B, and C, are used to transfer data between
the registers and ALU.
This sequence has been simplified because, as you can see from Figure
3.6, there is no direct path between register r0 and the MBR. You would
have to put the contents of r0 onto bus A, pass the contents of bus A
through the ALU to bus C, and then copy bus C to the MAR.
Figure 3.6 – Modifying the CPU to deal with literal operands
Let’s extend our Python code to include both literal operations and
conditional operations. The following Python code implements a load
register with literal instruction, an add/subtract, a conditional branch on
zero, and a stop. Here, we use LDRL to indicate a literal, rather than
prefixing the literal with #. The program to be executed is as follows:
mem[7]
6 [PC] ← 6
We load the literal 9 into r0, subtract the contents of memory location 7
(which contains 9), and then branch to location 6 if the result was 0. And
that’s what happens.
Later in this book, we will take a brief look at the concept of multi-
length instruction sets.
From the first computer to today’s chips with over 10 billion transistors,
computers have had instruction sets that include the following three
classes of operation. Table 3.3 gives the name of the instruction group,
an example of an operation in Python, and a typical assembly language
instruction.
TC1 has a 32-bit instruction but only a 16-bit data word. This
arrangement makes it easier to design and understand the computer, and
you can load a 16-bit data word with a single 32-bit instruction.
Computers with 32-bit instructions and data have to use convoluted
methods to load 32-bit data words, as we shall see when we introduce the
ARM.
In this text, we’ve used the ADDL convention in the design of some
simulators, but we will use the # convention when we introduce the
ARM processor because that’s used by ARM assemblers. In retrospect, if
I were writing this book again, I think I might have been tempted to use
only one representation, the # symbol. However, by using ADD and ADDL, I
was able to simplify the Python code because the decision point between
register and literal operands was made when examining the mnemonic,
not when examining the literal.
Summary
In this key chapter, we introduced the von Neumann computer with its
fetch-execute cycle, where an instruction is read from memory,
decoded, and executed in a two-phase operation. It is precisely these
actions that we will learn to simulate in later chapters in order to build a
computer in software. We have looked at the flow of information as an
instruction is executed. The model of the computer we introduced here is
the traditional model and does not take into account current technology
that executes multiple instructions in a pipeline.
We also looked at the instruction format and described how it has several
fields – for example, the opcode that defines the operation and the data
required by the operation (e.g., addresses, literals, and register numbers).
You will eventually be able to design your own instructions (thereby
defining the computer’s instruction set architecture) and create a
computer that will execute these instructions.
In the next chapter, we will begin to look more closely at the concept of
an interpreter that reads a machine-level instruction and carries out its
intended actions.
4
The key topics that we’ll cover in this chapter are as follows:
Collectively, these topics cover three areas. Some topics expand our
knowledge of Python to help us construct a simulator. Some topics
introduce the instruction set of a typical digital computer that we call
TC1 (TC1 simply means Teaching Computer 1). Some topics cover the
actual design of TC1 in Python.
This chapter will introduce the computer simulator and look at some
basic building blocks. The actual simulator will be presented in Chapter
6.
Technical Requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter04.
The computer has an array of eight registers, r[0] to r[7]. These are
specified in Python via the following:
r = [0,0,0,0,0,0,0,0]        # Define a list of 8 registers and set them all to 0
We are going to write the Python code necessary to read this instruction
in text form and carry out the action it defines. The two shaded lines in
the following code take this instruction and split it into a list of tokens
that can be processed. A token is an element in an instruction (just as a
sentence in English can be split into tokens that we call words). The
tokens here are ‘add’, ‘r[4]’, ‘mem[3]’, and ‘mem[7]’.
The next step is to create a list of tokens so that we can access the
individual components of the instruction. The effect of inst2 =
The split() method takes a string and creates a list of strings using the
delimiter specified as a parameter. If y = x.split(‘!’), the value of y is
a list of strings and the separator is !. An example of the use of split()
is shown here:
>>> x = 'asssfg! ! !,!!rr'
>>> x
'asssfg! ! !,!!rr'
>>> y = x.split('!')
>>> y
['asssfg', ' ', ' ', ',', '', 'rr']
The token2 = inst2[2] line gives token2 = ‘mem[3]’; that is, the third
token (the one at index 2).
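The tokenizing step can be tried directly. This sketch assumes that the instruction is held as a single string with its fields separated by spaces; the name inst1 and the exact text of the string are assumptions, while inst2 and token2 follow the names used above.

```python
inst1 = 'add r[4] mem[3] mem[7]'   # Assumed text form of the instruction
inst2 = inst1.split(' ')           # Split at each space to get a list of tokens
print(inst2)

token0 = inst2[0]                  # The operation
token2 = inst2[2]                  # The token at index 2
print(token0, token2)
```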
The computer on which the preceding assembly-level code runs has only
a handful of instructions, although it is very easy to add extra
instructions. Throughout this text, the term opcode or operation code
indicates the binary code (or text version) of an assembly language
instruction such as ADD or BNE. The structure of the simulator program in
pseudocode is as follows:
prog=['LDRL r0 0','LDRL r1 0','ADDL r1 r1 1','ADD r0 r0 r1', \
      'CMPL r1 10','BNE 2','STOP']
Define and initialize variables (PC, registers, memory)
while run == True:
    read instruction from prog
    point to next instruction (increment program counter)
    split instruction into fields (opcode plus operands)
    if first field = op-code1 get operands and execute
    elif first field = op-code2 get operands and execute
    elif first field = op-code3 . . .
    . . .
    else declare an error if no instruction matches.
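The pseudocode can be turned into a short runnable sketch. The meaning given to each mnemonic here (for example, that CMPL sets a zero flag that BNE then tests) is an assumption based on the instruction names, and the helper reg and the flag z are introduced only for this illustration. It is not the TC1 simulator itself.

```python
prog = ['LDRL r0 0', 'LDRL r1 0', 'ADDL r1 r1 1', 'ADD r0 r0 r1',
        'CMPL r1 10', 'BNE 2', 'STOP']

r = [0] * 8                      # Eight registers (assumed), all cleared
pc = 0                           # Program counter
z = 0                            # Zero flag, set by CMPL (assumed behavior)
run = True

def reg(name):
    """Convert a register name such as 'r1' into the index 1."""
    return int(name[1:])

while run:
    inst = prog[pc].split(' ')   # Read the instruction and split it into fields
    pc = pc + 1                  # Point to the next instruction
    op = inst[0]
    if op == 'LDRL':             # Load register with a literal
        r[reg(inst[1])] = int(inst[2])
    elif op == 'ADDL':           # Add a literal to a register
        r[reg(inst[1])] = r[reg(inst[2])] + int(inst[3])
    elif op == 'ADD':            # Add two registers
        r[reg(inst[1])] = r[reg(inst[2])] + r[reg(inst[3])]
    elif op == 'CMPL':           # Compare a register with a literal
        z = 1 if r[reg(inst[1])] == int(inst[2]) else 0
    elif op == 'BNE':            # Branch if the last comparison was not equal
        if z == 0:
            pc = int(inst[1])
    elif op == 'STOP':           # Halt the simulation
        run = False
    else:                        # No instruction matched
        print('Error: unknown instruction', op)
        run = False

print('r0 =', r[0], 'r1 =', r[1])
```

Under these assumptions, the program counts r1 from 1 to 10 and accumulates the running total in r0.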
We can now examine the four fields of this list; for example, inst[0] =
Suppose we want to get the contents of the source register, r2. The
source register is in the third position in the list at [‘ADDL’, ‘r1’, ‘r2’,
‘3’]; that is, inst[2]. Let’s write rS1 = inst[2]. The value of rS1 is
‘r2’.
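A register name such as ‘r2’ is still a string, so it has to be converted into a number before it can index the register list. The sketch below shows one way of doing that; the register contents and the variable names index and literal are hypothetical.

```python
inst = ['ADDL', 'r1', 'r2', '3']    # The instruction after splitting into fields
r = [0, 0, 7, 0, 0, 0, 0, 0]        # Hypothetical register contents: r2 holds 7

rS1 = inst[2]                       # 'r2', the name of the source register
index = int(rS1[1:])                # Drop the leading 'r' and convert '2' to 2
literal = int(inst[3])              # The literal field as an integer
rD = int(inst[1][1:])               # Destination register number
r[rD] = r[index] + literal          # ADDL r1,r2,3 performs r1 = r2 + 3
print(rS1, index, r[rD])
```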
In the next section, we will develop an instruction set for the TC1
computer. As well as providing a practical example of instruction set
design, we will demonstrate how instructions are divided into multiple
fields and each field supplies some information about the current
instruction.
To simplify this, we can use separate program and data memories. This
departure from the traditional von Neumann model allows us to have a
32-bit program memory and a 16-bit data memory. Moreover, we don’t have
to worry about accidentally putting data in the middle of the program
area.
For the sake of simplicity, the TC1 computer has a single, fixed format.
All instructions have the same number of fields, and fields are the same
size for each instruction. An instruction, as shown in Figure 4.1, is made
up of an operation class plus an opcode, three register fields, and a literal
field. Figure 4.1 shows the opcode field as 7 bits, with a 2-bit opcode
class and a 5-bit actual opcode.
We devote 16 bits to the literal field so that we can load a constant into
memory with a single instruction. That leaves 32 - 16 = 16 bits to
allocate to all the other fields.
TC1 has a three-register format, which is typical of load and store
computers such as ARM and MIPS. If we have eight registers, it takes 3
× 3 = 9 bits to specify all three registers. After allocating 16 bits to the
literal and 9 bits to the register selection, we are left with 32 - (16 + 9) =
7 bits to specify up to 128 different possible instructions (2⁷ = 128).
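The bit budget above can be checked by packing and unpacking a 32-bit word with shifts and masks. This sketch assumes that the fields are laid out from the most significant end in the order opcode, destination register, source register 1, source register 2, and literal; the function name decode and the sample field values are illustrative.

```python
def decode(instruction):
    """Split a 32-bit instruction into its fields (the field order is an assumption)."""
    opcode = (instruction >> 25) & 0x7F     # 7-bit opcode field
    op_class = opcode >> 5                  # Upper 2 bits of the opcode: the class
    rD = (instruction >> 22) & 0x7          # 3-bit destination register
    rS1 = (instruction >> 19) & 0x7         # 3-bit source register 1
    rS2 = (instruction >> 16) & 0x7         # 3-bit source register 2
    literal = instruction & 0xFFFF          # 16-bit literal
    return opcode, op_class, rD, rS1, rS2, literal

# Build a test word from invented field values: 7 + 3 + 3 + 3 + 16 = 32 bits
word = (0b0100001 << 25) | (3 << 22) | (1 << 19) | (2 << 16) | 20
print(decode(word))
```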
The opcode field itself is divided into four categories or classes, which
take two bits, leaving 7 - 2 = 5 bits for the instructions in each category.
Table 4.1 defines the categories (class) of instructions:
Table 4.2 illustrates the TC1 instruction set. The first column (Binary
Code) provides the 7-bit instruction code; for example, 01 00001 loads a
register with the contents of a memory location. The leftmost two bits
are separated to indicate the instruction group:
Binary Code    Operation    Mnemonic    Instruction Format    Code Format
Table 4.2 – TC1 Instruction encoding (the 4 code format bits are not part of the
opcode)
The rightmost column is called Code Format and is not part of the
instruction. This code can be derived from the opcode (as we shall see)
and tells the simulator what information is required from the instruction.
Each bit of the code format corresponds to an operand. If a bit in the
code format is 1, the corresponding operand in the instruction must be
extracted. The order of the code format bits (left to right) is destination
register, source register 1, source register 2, literal; for example, code
1001 tells the assembler that the instruction requires a destination register
and a 16-bit literal. Code 0000 tells us that the instruction contains no
operands at all.
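The way the code format selects operands can be sketched as a test of each of the four bits in turn. The function name operands_needed and the labels in the list are assumptions made for this illustration.

```python
def operands_needed(code_format):
    """List the operands selected by a 4-bit code format."""
    names = ['rD', 'rS1', 'rS2', 'literal']     # Left-to-right order of the bits
    bits = format(code_format, '04b')           # For example, 9 becomes '1001'
    return [name for bit, name in zip(bits, names) if bit == '1']

print(operands_needed(0b1001))   # Destination register and literal
print(operands_needed(0b0000))   # No operands at all
print(operands_needed(0b1110))   # Three registers, no literal
```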
NOP does nothing other than advance the program counter. It’s a dummy
operation that is useful as a marker in code, a placeholder for future
code, or as an aid to testing.
GET reads data from the keyboard and offers a simple way of getting
input from the keyboard into a register. This is useful when testing
programs and is not a normal computer instruction.
RND generates a random number. It’s not in computer instruction sets but
provides an excellent means of generating data internally when testing
your code.
The load and store instructions move data between memory and
registers. The difference between members of this class is the direction
(store is register-to-memory, while load is memory-to-register), size
(some computers allow byte, 16-bit, or 32-bit transfers), and addressing
mode (using a literal value, an absolute address, or an address from a
register).
MOV copies one register to another – for example, MOV r3,r1 copies the
contents of r1 to r3. MOV is, essentially, a load register with register
(LDRR) instruction.
LDRL loads a register with a literal value – for example, LDRL r3,20 loads
register r3 with 20.
LDRI, the load register indexed (or load register indirect) instruction,
loads a register with the contents of a memory location specified by the
contents of a register plus a literal offset. For example, LDRI r2,[r3,10]
loads r2 with the contents of the memory location whose address is given
by the contents of r3 plus 10. This instruction is the standard RISC load
operation.
DBNE is the decrement and branch on not zero instruction. We added it for
fun because it reminds me of my old Motorola days. The 68000 was a
powerful microprocessor (at the time) with a 32-bit architecture. It had a
decrement and branch instruction that was used at the end of a loop. On
each pass around the loop, the specified register is decremented, and a
branch back to a label is made if the counter is not -1; for example,
DBNE r0,L.
This addressing mode lets you modify addresses while the program is
running (because you can change the contents of the pointer, r0).
Pointer-based addressing makes it possible to step through a list of items,
element by element, simply by incrementing the pointer. Here’s an
example:
LDRI r1,[r2,0] @ Get the element pointed at by r2. Here the offset is 0
INC r2 @ Increment r2 to point to the next element
We access the element pointed at by r2 (in this case, the offset is 0). The
next line increments r2 to point to the next element. If this sequence is
executed in a loop, the data will be accessed element by element.
Computers implement both arithmetic and Boolean (logical) operations.
In the next section, we’ll briefly look at how Python can be used to
simulate logical operations at the assembly language level.
Bit-handling in Python
In this section, we’ll look at how Python deals with the fundamental
component of all computer data: the bit. Because simulated computers
operate at the level of bits, we have to look at how bits are manipulated
in Python before we can construct a simulator that can perform logical
operations such as AND and OR.
Python lets you input data and display it as a binary string of bits. You
can operate on the individual bits of an integer using Boolean operators,
and you can shift the bits of a word left and right. You have all the
tools you need in Python.
This shifts x two places left to give the new value 0b111011000. All the
bits have moved two places left and 0s have been entered at the right-
hand end to fill the newly vacated positions. The shifted version of x
now has nine bits, rather than seven. However, when we simulate a
computer, we have to ensure that, whatever we do, the number of bits in
a word remains constant. If a register has 16 bits, any operation you
perform on it must yield 16 bits.
Truth table for AND
x y z=x & y
0 0 0
0 1 0
1 0 0
1 1 1
Shifting the 16-bit word left two places made it an 18-bit word. ANDing
it with the 0b1111111111111111 binary value forces it to 16 significant
bits.
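The shift-and-mask idea can be sketched in a few lines of Python. This is a minimal illustration, not the TC1 source: the 7-bit starting value 0b1110110 is inferred from the shifted result quoted in the text, and the names WORD_MASK, reg, shifted, and result are our own.

```python
x = 0b1110110                    # A 7-bit value (inferred from the text)
y = x << 2                       # Shift left two places: two 0s enter on the right
print(bin(y), y.bit_length())    # The shifted value now needs nine bits

WORD_MASK = 0xFFFF               # Sixteen 1s: the width of a simulated register
reg = 0b1100000000000001         # A 16-bit word with its two top bits set
shifted = reg << 2               # Python integers grow, so this needs 18 bits
result = shifted & WORD_MASK     # ANDing with the mask discards bits 16 and 17
print(bin(shifted), shifted.bit_length())
print(bin(result), result.bit_length())
```

Because Python integers have no fixed width, the mask has to be applied explicitly after any operation that might carry bits out of the simulated word.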
() Parentheses (highest precedence)
~ Negation
*, /, % Multiplication, division, modulus
+, - Addition, subtraction
<<, >> Bitwise shift left, right
& Logical (bitwise) AND
^ Logical (bitwise) XOR
| Logical (bitwise) OR
<, <=, >, >=, !=, == Boolean comparisons (lowest precedence)
Table 4.2 shows that the encoding of ADD r1,r2,r3 is
10 00000 001 010 011 0000000000000000.
The shift-right operator in Python is >> and the bit-wise logical AND
operator is &. The mask is expressed as a string of bits (rather than a
decimal number) because ANDing with binary 111 is clearer than
ANDing with decimal 7. In Python, a binary value is preceded by 0b, so
7 is represented by 0b111. The literal is ANDed with 16 ones, expressed
as 0xFFFF. We use binary for short fields, and hex value for long fields.
It’s just personal preference.
The binCode >> 22 & 0b111 expression shifts the bits of binCode 22
places right, and then bitwise ANDs the result with 000…000111. Because
of operator precedence, the shifting is performed first. Otherwise, we
would have written (binCode >> 22) & 0b111. We often use parentheses,
even when not strictly necessary, to stress operator precedence.
Note that we extract all fields, even though they may not be required by
each instruction. Similarly, we read the contents of all three registers.
This approach simplifies the Python code, at the cost of efficiency.
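The field extraction described above can be sketched as follows. This is an illustration under stated assumptions rather than the TC1 listing itself: the field positions follow the 32-bit layout used in this chapter (a 7-bit opcode, three 3-bit register fields, and a 16-bit literal), and the variable names simply mirror those used in the text.

```python
binCode = 0b10000000010100110000000000000000   # ADD r1,r2,r3

opCode = (binCode >> 25) & 0b1111111   # Bits 31..25: the 7-bit opcode
rD     = (binCode >> 22) & 0b111       # Bits 24..22: destination register
rS1    = (binCode >> 19) & 0b111       # Bits 21..19: source register 1
rS2    = (binCode >> 16) & 0b111       # Bits 18..16: source register 2
lit    = binCode & 0xFFFF              # Bits 15..0:  the 16-bit literal

print(bin(opCode), rD, rS1, rS2, lit)
```

Every field is extracted for every instruction, whether or not the instruction uses it, which keeps the decoder to five lines.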
Consider extracting the destination register field, rrr. Suppose that the
instruction is ADD r1,r2,r3, and the instruction code is
10000000010100110000000000000000. Performing a 22-bit right shift moves
the destination register field into the least-significant bits and leaves
us with 00000000000000000000001000000001. ANDing this value with 0b111
clears all but the three least-significant bits and isolates the first
register field, 001, which corresponds to register r1. The final three
lines of the program to decode an instruction are as follows:
op0 = r[rD] # Operand 0 is the contents of the destination register
op1 = r[rS1] # Operand 1 is the contents of source register 1
op2 = r[rS2] # Operand 2 is the contents of source register 2
These instructions use the register addresses (rD, rS1, and rS2) to access
registers; that is, the instruction specifies which registers are to be used.
For example, if op0 = r[5], register r5 is operand zero, the destination
register. If an instruction does not specify a register, the unused field is
set to zero.
The if … elif construct tests each opcode in turn. The first line compares
the 7-bit opcode with the binary value 0100010, which corresponds to the
LDRL (load a register with a literal value) instruction. Incidentally, in
the final version of TC1, we made the code easier to read by comparing
operations with the actual mnemonic; for example, ‘LDRL’ is easier to
read than its code, 0b0100010.
lit). If not, the next line uses the elif command to compare the opcode
with 010001. If there is a match, the code after the colon is executed. In
this way, the opcode is compared with all its possible values until a
match is found.
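A chain of opcode tests of this kind might look like the following sketch. It is not the TC1 interpreter: the function name execute and the 8-element register list are our own, and only two opcodes (LDRL and ADD, using the binary codes quoted in this chapter) are handled.

```python
r = [0] * 8                              # Eight 16-bit registers, r0 to r7

def execute(opCode, rD, rS1, rS2, lit):
    """Carry out one already-decoded instruction (illustrative subset)."""
    if opCode == 0b0100010:              # LDRL: load register with a literal
        r[rD] = lit
    elif opCode == 0b1000000:            # ADD: add the two source registers
        r[rD] = (r[rS1] + r[rS2]) & 0xFFFF
    else:                                # No test matched
        raise ValueError('Unknown opcode ' + bin(opCode))

execute(0b0100010, 6, 0, 0, 193)         # LDRL r6,193
execute(0b0100010, 1, 0, 0, 7)           # LDRL r1,7
execute(0b1000000, 2, 6, 1, 0)           # ADD  r2,r6,r1
print(r)
```

The tests are tried from top to bottom, so the final else is reached only when no opcode matches.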
Suppose the instruction’s opcode is 0100010, and the r[rD] = lit line is
executed. This Python operation transfers the value of the 16-bit literal
provided in the instruction to the destination register specified in the
instruction. In RTL terms, it carries out r[rD] ← lit and is used by the
programmer to load an integer into a register. Let’s say the binary pattern
of the instruction code is as follows:
01 00010 110 000 000 0000000011000001,
If you were to write the full Python code required to execute each
operation, it would require several lines of code per instruction. An
addition operation such as reg[dest] = reg[src1] + reg[src2] appears
simple enough, but there is more to an arithmetic operation.
8 4 2 1 Unsigned value Signed value
0 0 0 0 0 0
0 0 0 1 1 1
0 0 1 0 2 2
0 0 1 1 3 3
0 1 0 0 4 4
0 1 0 1 5 5
0 1 1 0 6 6
0 1 1 1 7 7
1 0 0 0 8 -8
1 0 0 1 9 -7
1 0 1 0 10 -6
1 0 1 1 11 -5
1 0 1 0 12 -4
1 1 0 1 13 -3
1 1 1 0 14 -2
1 1 1 1 15 -1
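The two interpretations in the table can be reproduced in Python. The helper below is a small sketch of our own (the name to_signed and its bits parameter are not part of TC1): a pattern whose most-significant bit is 1 is read as negative by subtracting 2 to the power of the word length.

```python
def to_signed(value, bits=4):
    """Interpret an unsigned bit pattern as a two's complement number."""
    value = value & ((1 << bits) - 1)    # Keep only the low-order bits
    if value & (1 << (bits - 1)):        # Is the most-significant bit set?
        return value - (1 << bits)       # Yes: the value is negative
    return value                         # No: signed and unsigned agree

for pattern in range(16):                # Print the table for 4-bit words
    print(format(pattern, '04b'), pattern, to_signed(pattern))
```

The same function handles the 16-bit words of the simulator by calling it with bits=16.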
The next topic deals with how we handle groups of repetitive operations.
If the same sequence of operations is going to be used more than once in
a program, it makes sense to combine them into a group and invoke that
group whenever you need it. In Python, this group of actions is called a
function.
Functions in Python
We will now describe Python’s functions. We’ve already used functions
that are part of the language, such as len(). In this section, we’ll do the
following:
Writing the Python code to deal with each arithmetic or logical operation
implemented by a simulator would be tedious because so much code
would be replicated by individual instructions. Instead, we can create a
Python function (that is, a subroutine or procedure) that carries out both
the arithmetic/logic operation and the appropriate flag-bit setting.
if f == 2: r = p - q # If f is 2, do a subtraction
The function is introduced by def, its name, and any parameters followed
by a colon. The body of the function is indented. The first parameter, f,
selects the operation we wish to perform (f = 1 for addition and 2 for
subtraction). The next two input parameters, p and q, are the data values
used by the function. The last line of the function returns the result to the
function’s calling point. This function can be called by, for example,
opVal = alu(2,s1,s2). In this case, the result, opVal, would be the value
of s1 – s2.
We also update two flag bits, z and n. Initially, both z and n are set to
zero by z, n = 0, 0. (Python allows multiple assignments on the same
line; for example, a,b,c = 1,5,0 sets a, b, and c to 1, 5, and 0,
respectively.)
You can pass data to a function via parameters and receive a result via
return(). However, you can declare variables in a function as global,
which means they can be accessed and modified as if they were part of
the calling program.
return() is not mandatory because some functions don’t return a value.
You can have multiple return() statements in a function because you
can return from more than one point in a function. A return can pass
multiple values because Python permits multiple assignments on a line;
for example, see the following:
this, that = myFunction(f,g) # Assign the two returned values to this and that
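As a brief illustration of multiple return points and multiple return values, consider the following sketch. The function min_max is a made-up example, not part of the simulator.

```python
def min_max(values):
    """Return the smallest and largest items of a list as a pair."""
    if not values:                       # First return point: an empty list
        return None, None
    return min(values), max(values)      # Second return point: two values

lo, hi = min_max([4, 9, 1, 6])           # Unpack the pair into two variables
print(lo, hi)
```

The two values travel back as a tuple, and the multiple assignment on the calling line unpacks them.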
Testing for a zero result can easily be done by comparing the result, r,
with 0. Testing for a negative result is harder. In two’s complement
arithmetic, a signed value is negative if the most significant bit is 1. We
are using 16-bit arithmetic, so that corresponds to bit 15. We can extract
bit 15 by ANDing the result, r, with the binary value of
1000000000000000 by writing r&0x8000 (the literal is expressed in hex
form as 0x8000, which is mercifully shorter than the binary version).
We test for opcode 1000000 and call the alu function if it corresponds to
ADD. The function is called with the f = 1 parameter for addition; the
numbers to be added are the contents of the two source registers. The
result is loaded into the r[rD] register. In the current version of TC1, we
use the opcode to look up the mnemonic and then apply the test if
mnemonic == ‘ADD’:. This approach is easier to read and can use the
mnemonic when displaying the output during tracing.
We have made the z and n variables global (that is, they can be changed
by the function and accessed externally). If we didn’t make them global,
we would have to have made them return parameters. In general, it is
regarded as a better practice to pass variables as parameters rather than
making them global.
Functions and scope
Variables are associated with scope or visibility. If you write a program
without functions, variables can be accessed everywhere in the program.
If you use functions, life becomes rather more complex.
If you declare a variable in the main body, that variable will be visible in
functions (that is, you can use it). However, you cannot change it in the
function and then access the new value outside the function. What goes
on in the function stays in the function. If you wish to access it from
outside the function, you must declare it as global. If you write global
temp3, then the temp3 variable is the same variable both in and outside
the function.
After running this code, we get the following. As you can see, the
function changes q because it is global, whereas p does not change since
it is a local variable in the function:
In body: p = 5 q = 10
In fun_1 p = 3
In fun_1 q = 10
In body after fun_1 q = 11 after fun_1 p = 5
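A short program that produces output of this form is sketched below. It is our own reconstruction for illustration and may differ in detail from the original listing.

```python
p, q = 5, 10                             # Variables in the main body
print('In body: p =', p, 'q =', q)

def fun_1():
    global q                             # q is the same q as in the main body
    p = 3                                # This p is local to fun_1
    print('In fun_1 p =', p)
    print('In fun_1 q =', q)
    q = q + 1                            # Changes the global q

fun_1()
print('In body after fun_1 q =', q, 'after fun_1 p =', p)
```

Assigning to p inside the function creates a new local variable, which is why the p in the main body is unchanged.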
This means that variables such as the z and n condition code flags can be
accessed in a function. If you wish to change them in a function, they
must be declared as global by using the following command:
global z,n
In the next section, we’ll describe the very thing that makes a computer a
computer – its ability to take two different courses of action, depending
on the outcome of an operation.
Load register with a literal (LDRL) is used three times to load r2 with 0, r0
with 5, and r1 with 1. In the line labeled Loop, we add r2 to r1 and put
the result in r2. On its first execution, r2 becomes 0 + 1 = 1.
In the next chapter, we will return to Python and extend our ability to
handle data structures and use Python’s functions.
Summary
We started this chapter by designing a computer simulator. However, we
haven’t created a final product yet. Instead, we looked at some of the
issues involved, such as the nature of an instruction set and the structure
of an opcode.
List comprehensions
String processing
The dictionary
Functions
Lists of lists
Imports
Indenting in Python
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter05.
A statement is a Python operation that must be evaluated by an interpreter; that is, it’s
an action. Typical Python statements involve if… for… actions. These two terms are
often used in formal definitions; for example, the definition of an if statement is as
follows:
if <expr>:
    <statement>
Angle brackets are used in descriptions of the language to indicate something that will
be replaced by its actual value; for example, a valid if statement is as follows:
if x > y:
    p = p + q
During the execution of a Python program, you can read a string from
the keyboard and also provide a prompt in the following way.
String processing
You can change (substitute) characters in a string by using the method
replace. For example, suppose we wish to replace all occurrences of ‘$’
in the string price with ‘£’:
price = 'eggs $2, cheese $4'
price = price.replace('$', '£') # Replace $ by £ in the string price
Other string methods are upper() (convert text to upper case), lower()
(convert text to lower case), lstrip() (remove leading characters), and
rstrip() (remove trailing characters). Let x = '###this Is A test???'.
Consider the following sequence:
x = x.lstrip('#') # Remove leading '#' characters to get x = 'this Is A test???'
x = x.rstrip('?') # Remove trailing '?' characters to get x = 'this Is A test'
x = x.lower() # Convert to lower-case to get x = 'this is a test'
Strings are immutable. You cannot change them once they are defined. In
the preceding code, it looks as if we have modified x by removing
leading and trailing characters and converting upper case to lower case.
No! In each case we have created a new string with the same name as the
old string (i.e., x).
The TC1 assembler uses the following string methods to remove spaces
before an instruction, enable users to employ upper- or lower-case, and
allow the use of a space or comma as a separator. For example, this code
fragment lets you write either r0,[r1] or R0,R1 with the same meaning.
The code below shows how TC1 takes a line of the input (i.e., an
assembly instruction) and simplifies it for later conversion to binary:
line = line.replace(',',' ') # Allow use of a space or comma to separate operands
line = line.replace('[','') # Allow LDRI r0,[r2] or LDRI r0,r2 First remove '['
line = line.replace(']','') # Replace ']' by null string.
line = line.upper() # Let's force lower- to upper-case
line = line.lstrip(' ') # Remove leading spaces
line = line.rstrip('\n') # Remove end of line chars. End-of-line is \n
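To see the effect of these steps, they can be wrapped in a function and applied to two differently written versions of the same instruction. The function name normalize is our own and is used only for this illustration.

```python
def normalize(line):
    """Reduce a line of assembly text to upper-case, space-separated form."""
    line = line.replace(',', ' ')        # Commas become spaces
    line = line.replace('[', '')         # Remove '['
    line = line.replace(']', '')         # Remove ']'
    line = line.upper()                  # Force upper-case
    line = line.lstrip(' ')              # Remove leading spaces
    line = line.rstrip('\n')             # Remove the end-of-line character
    return line

a = normalize('  ldri r0,[r2]\n')
b = normalize('LDRI R0,R2')
print(a, '|', b)
```

Both inputs reduce to the same string, so the later stages of the assembler need to deal with only one form.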
The second statement uses T2[1:] to convert string ‘R2’ into a new string
‘2’ by removing the first character. The slice notation [1:] is interpreted
as “All characters following the first.” This lets us deal with one- or two-
digit values like R2 or R23. Since there are only 8 registers, we could
have written [1:2]. Using [1:] allows the extension to 16 registers in a
future version of TC1 without changing the code.
We have to use the integer function int to convert the register number
from a string into its value as an integer. When learning Python, a
common mistake is to forget to convert a string to an integer:
regNum = input('Please enter register number >>>')
contents = reg[regNum]
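The following sketch shows the mistake and its correction. To keep it self-contained, the register number is held in a string variable rather than read with input(), and the register contents are made-up values.

```python
reg = [0, 10, 20, 30, 40, 50, 60, 70]    # Example register contents

regNum = '3'                             # What input() would return: a string
try:
    contents = reg[regNum]               # Wrong: a list index must be an integer
except TypeError as error:
    print('Error:', error)

contents = reg[int(regNum)]              # Right: convert the string first
print(contents)

T2 = 'R23'
print(int(T2[1:]))                       # Slice off the 'R', then convert
```

The same int() conversion is needed for a register name once the leading letter has been sliced away.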
This file was processed by Python as follows. I use the address it had on
my computer. This is read into a variable sFile:
myFile = 'E:\\ArchitectureWithPython\\testText.txt'
with open(myFile,'r') as sFile:
    sFile = sFile.readlines() # Read the source program into a list of lines
print (sFile)
List comprehensions
We now introduce a very powerful feature of Python, the list
comprehension. It’s not powerful because of what it can do, but because
of how succinct it is. A list comprehension lets you take a list and
process it in a single line. We take a look at list comprehensions, because
they are so useful in processing text; for example, you can use a list
comprehension to take a line of text and replace all double-spaces with
single spaces, or convert all lower-case characters to upper-case
characters.
x = [i for i in y]
Here, y is a string (or list) and x is the new list built from it. The
words for and in are Python reserved words. The variable i is a
user-chosen variable used to step through the list. We could have used
any name instead of i; it simply doesn’t matter. Consider the following
example:
lettersList = [i for i in 'Tuesday']
The expression i.split() divides the source string into individual tokens
(strings) at each space. This means we can then process the line as a
sequence of tokens. The condition if i != '' is used to remove empty
strings by not copying them.
We’ve created a list of three instructions that has empty lines in it,
denoted by ''. When we execute this list comprehension, we convert
each string into a sublist and we remove the empty lines:
sFile = ['ADD R1 R2 R3', 'BEQ LOOP', '', 'LDRL R2 4','']
sFile = [i.split() for i in sFile if i != '']
print(sFile)
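To make the behavior concrete, the sketch below runs the same comprehension and also writes it out as an ordinary loop. The names tokens and longHand are our own.

```python
sFile = ['ADD R1 R2 R3', 'BEQ LOOP', '', 'LDRL R2 4', '']

tokens = [i.split() for i in sFile if i != '']   # The list comprehension

longHand = []                            # The same operation as a for loop
for i in sFile:
    if i != '':                          # Skip empty lines
        longHand.append(i.split())       # Split the line into tokens

print(tokens)
letters = [i for i in 'Tuesday']         # A string is stepped through by character
print(letters)
```

The comprehension and the loop build identical lists; the comprehension simply says it in one line.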
The tuple
We now introduce the tuple for the sake of completeness, although we
make little use of it in this text. A list is a sequence of elements enclosed
by square brackets; for example, P = [1,4,8,9]. A tuple is a sequence of
elements enclosed by round brackets; for example, Q = (1,4,8,9).
There is little difference between a tuple and a list; they are both data
structures that hold a sequence of elements. However, a tuple is
immutable and cannot be modified, unlike a list. A tuple is a read-only
list and is used when you wish to store data that does not change.
Although not relevant here, tuples do have implementation and
performance advantages over lists; that is, if you have a list that is fixed,
it is better to use a tuple.
In this case, the values in bold are each two-component tuples. We could
have used a list, but the tuple indicates a fixed structure that cannot
change. If you were to use a list instead of a tuple, you would write this:
opCodes = {'add':[2,34], 'inc':[4,37]}
Use different data and parameters each time you carry out the action
This code first sets inList to False to indicate that the element ‘grapes’
has not been found. The for loop steps through all elements in the list,
testing each one for the item we’re looking for. If it is found, inList is
set to True. This code works, but it is not good. If there are a million
elements in the list and grapes is the first one, the code still steps
through the remaining 999,999 elements. This is horribly inefficient.
If we find ‘grapes’, the else part of the if statement sets inList to True
and then uses a break statement to exit the loop and avoid further
pointless cycles round the loop. A break in a for or while loop tells
Python to exit the loop now and continue with the next instruction after
the loop:
listSize = len(fruit1)
for i in range (0,listSize):
    if fruit1[i] != 'grapes': inList = False # Is the item here?
    else:                                    # If it is, drop out of the loop
        inList = True                        # Set flag on finding it
        break                                # Jump out of the loop
The variable inList is just a flag that we can use later in the program; for
example, we could write this:
if inList == False: print('Yes, we have no grapes')
if inList == True: print('Grapes --- we got lots')
Another approach is to use the list operator in. If we have a list, we can
check whether an item is a member of that list by using the following
construct:
if 'grapes' in fruit1:
    inList = True
else: inList = False
The first line returns True if ‘grapes’ is in the list fruit1, and False
otherwise. The in construct is very useful in testing whether an item
belongs to a group of other items arranged as a list; for example, if all
employees are in the list staff, then
The words in bold are the reserved Python words; the other words are
user-defined variables. Here, the i is not a sequence-counting integer as
it was in the previous example using range(). It is the value of each
element (or iterable) in the list taken in turn. Consider the following
example using a list of colors:
car = ['red', 'white', 'green' ,'silver', 'teal']
for color in car: print(color) # color is a loop variable; we could have used i.
This code steps through each element of the list car and prints out its
value, as follows.
red
white
green
silver
teal
We have now demonstrated that you can iterate through a list of any type
in Python.
The iterator, color, has become a sequence of tuples with the element
index and the corresponding value from the list. Remember that a tuple
is like a list except that its elements are immutable and can’t be changed.
Here’s a case where I would use an iterator name like color, rather than
i, because it is more explicit/descriptive, and it is less easy to confuse
with an integer.
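The behavior described here matches Python's built-in enumerate() function, so the following sketch assumes that is what the loop used. The list pairs is our own addition, used only to collect the results.

```python
car = ['red', 'white', 'green', 'silver', 'teal']

pairs = []
for color in enumerate(car):             # Each color is now an (index, value) tuple
    print(color)
    pairs.append(color)

for index, name in enumerate(car):       # The tuple can also be unpacked directly
    print(index, name)
```

Unpacking into two loop variables is often clearer when both the position and the value are needed.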
Lists of lists
Here we extend the use of Python’s most important data structure, the
list. First, we demonstrate that a list can, itself, contain lists. Python lets
you construct lists with any type of item; for example, x =
We use a for loop to step through the list of fruits. Then, when we’ve
located the item we want (which is a list), we read the second item of
that list. As you can see, we use two subscripts, first [i] and then [1].
This is not easy on the eye! Let’s use bold font and shading to emphasize
the components of the string:
testList = [[4,9,[1,6]], [8,7,[0,9]]] # Each element in the list is itself a list
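Each additional subscript selects one level deeper in the structure. The following short sketch, with variable names of our own choosing, peels the nested list apart one level at a time.

```python
testList = [[4,9,[1,6]], [8,7,[0,9]]]

first = testList[0]                      # The first sublist
inner = testList[0][2]                   # The list nested inside that sublist
item  = testList[0][2][1]                # The second element of the inner list
other = testList[1][2][0]                # First element of the other inner list

print(first, inner, item, other)
```

Reading the subscripts from left to right gives the path taken through the nested lists.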
Indenting in Python
We have been indenting code since we introduced Python. Now we re-
emphasize the use of the indent in Python because it is so vital to correct
programming. Most computer languages allow you to group statements
together, as a block, for a particular purpose. Typically, the instructions
in a group are executed as a batch, one by one. Such groups are often
associated with conditional statements and loops.
Here, you have several blocks of operations, which include nested blocks
(i.e., a block within another block). Blocks are executed as if they were a
single operation; that is, they are the computer equivalent of
subcontracting. Although it is not a programming requirement, it is
normal to use indentation as a reading aid to make the code more
understandable to people, as here:
{some operations}
{main loop
    {some operations}
    if x == 1
        {do this batch of operations}
    repeat
        {do these operations}
}
The dictionary
In this section we introduce Python’s dictionary mechanism, which
makes writing simulators so easy. Here, you will learn how to create a
dictionary that translates one thing into another, for example, translating
the name of an instruction into its binary code. Here we learn about the
following:
The key is often a string, but that is not a requirement. In our computer
simulator, the keys are usually the mnemonic codes of a computer
language. The value associated with a key can be any legal Python data
structure. In some of the simulators we create, we often specify the value
as a tuple, which is an ordered list. For example, the dictionary entry
‘INC’:(8,16) has the key ‘INC’ and the value (8,16). Searching the
dictionary using the key ‘INC’, returns the tuple (8,16). In this case, the
value is the format of the instruction (i.e., 8), and its op-code (i.e., 16).
You could use a list as a value instead of a tuple, that is, ‘INC’:[8,16].
The only significant difference is that you can’t change a tuple once it is
defined.
You can check whether an item is in the dictionary by writing if key in
dictionary, as follows:
if 'INC' in opCodes: # This returns True if 'INC' is in opCodes
We can then access the two fields of the tuple associated with ‘INC’ as
follows:
binaryCode = opData[0]
formatStyle = opData[1]
If the requested key is not in the dictionary, the get method returns None.
None is a Python reserved word and indicates a null value. Note that None
is not zero or an empty string; it has its own type, NoneType. Consider
the following:
if opCodes.get(thisInstruction) == None: # Ensure that the instruction is valid
If we wish to get the value associated with Hastings, we can write this:
x = namSub.get('Hastings')
The first dictionary in the example converts a register name into its
register number; for example, a register name x can be converted to its
register number y by y = regs.get(x). Of course, you don’t need to use
a dictionary. We could simply write y = int(x[1:]) to convert the string
‘r6’ into the integer 6 by using string processing. However, the
dictionary method is more elegant and easier to follow. Moreover, it’s
more flexible:
regs = {'r0':0, 'r1':1, 'r2':2, 'r3':3, 'r4':4}   # Register name-to-number translation
symTab = {'start':0,'time':24,'stackP':'sp','next':0xF2}
                                   # Symbol table converts symbolic name to value
x0 = 'add r1,r2,r4'                # An example of an instruction in text form
x1 = x0.split(' ')                 # Split instruction into op-code and predicate
x2 = x1[1].split(',')              # Split the predicate into tokens
x3 = x2[0]                         # Get the first token of x2
if x3 in regs:                     # Is this a valid register?
    x4 = regs.get(x3)              # Use get() to read its value
print ('x0 = ',x0, '\nx1 = ',x1, '\nx2 = ',x2, '\nx3 = ',x3, '\nx4 = ',x4)
y0 = 'beq next'                    # Another example: instruction with a label
y1 = y0.split(' ')                 # Split into op-code and predicate on the space
y2 = y1[1]                         # Read the predicate (i.e.,'next')
y3 = symTab.get(y2)                # Get its value from the symbol table (i.e., 0xF2)
print('beq ',y3)                   # Print the instruction with the actual address
z = symTab.get('beq next'.split(' ')[1])  # We've done it all in one line. Not so easy to follow.
print('beq ',z)
print('Symbol table ', symTab)     # Print the symbol table using a print
symTab['nextOne'] = 1234           # This is how we add a new key and value
print('Symbol table ', symTab)     # Here's the augmented symbol table
opCode = {'add':('Arith',0b0001,3),'ldr':('Move',0b1100,2), \
          'nop':('Miscellaneous',1111,0)} # New dictionary. Each key has three values in a tuple
thisInst = 'ldr'                   # Let's look up an instruction
if thisInst in opCode:             # First test if it's valid and in the dictionary
    if thisInst == 'ldr':          # If it is:
        instClass = opCode.get('ldr')[0]  # Get first element of the instruction
        binaryVal = opCode.get('ldr')[1]  # Get the second element
        operands = opCode.get('ldr')[2]   # Get the third element
print('\nFor opCode: ',thisInst, '\nClass = ', instClass, \
      '\nBinary code = ', bin(binaryVal), '\nNumber of operands = ',operands)
print('\nThis is how to print a directory')
                                   # Now print a formatted dictionary (key and value on each line)
for key,value in opCode.items():
    print(key, ':', value)
print()
for i,j in opCode.items():         # Note that key and value can be any two variables
    print(i, ':', j)
theKeys = opCode.keys()            # The function .keys() returns the keys in a dictionary
print('The keys are: ',theKeys)
test = {'a':0,'b':0,'c':0,'d':0}   # A new dictionary. The values are just integers
test['a'] = test['a'] + 1          # You can change a value! Use the key to locate it
test['d'] = test['d'] + 7
test1 = {'e':0, 'f':0}             # Here's a second dictionary.
test.update(test1)                 # Append it to test using .update()
print('Updated dictionary test is: ',test) # Not convinced? Here it is then.
The following is the output after executing the above fragment of code:
x0 = add r1,r2,r4
x1 = ['add', 'r1,r2,r4']
x2 = ['r1', 'r2', 'r4']
x3 = r1
x4 = 1
beq 242
beq 242
Symbol table {'start': 0, 'time': 24, 'stackP': 'sp', 'next': 242}
Symbol table {'start': 0, 'time': 24, 'stackP': 'sp', 'next': 242,
'nextOne': 1234}
For opCode: ldr
Class = Move
Binary code = 0b1100
Number of operands = 2
This is how to print a directory
add : ('Arith', 1, 3)
ldr : ('Move', 12, 2)
nop : ('Miscellaneous', 1111, 0)
add : ('Arith', 1, 3)
ldr : ('Move', 12, 2)
nop : ('Miscellaneous', 1111, 0)
The keys are: dict_keys(['add', 'ldr', 'nop'])
Updated dictionary test is: {'a': 1, 'b': 0, 'c': 0, 'd': 7,
'e': 0, 'f': 0}
Let’s look at dictionaries in more detail with another example. The use of
Python’s dictionaries makes it easy to implement symbolic names for
labels and variables. All we have to do is to create a dictionary with
name: value pairs and use a name to get its associated value. Suppose
we’ve read an instruction, say, ‘ADD r4,r2,r3’, and tokenized it into this:
predicate = ['r4','r2','r3']
# The list of parameters for the op-code
We can get the integer value of a register the hard way by using slicing:
rD = int(predicate[0][1:])
We can write rD = rD[1:] to return all characters in the string except the
initial ‘r’. The final step is to convert to an integer, which we can do
with rD = int(rD).
The [1:] means all the characters after the first character, r, which
returns ‘4’ if the register was ‘r4’. We could have written [1:2] rather
than [1:]. However, by using [1:], we can later increase the number of
registers beyond 9 without changing the program. Putting all three steps
together, we get this:
rD = int(predicate[0][1:])
Let’s use a dictionary to carry out the same action. Assume also that
we’ve set up a dictionary for the registers:
regs = {'r0':0, 'r1':1, 'r2':2, 'r3':3, 'r4':4} # Register names
and values
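The two approaches can be compared side by side in the following sketch. The names rD_slice, rD_dict, and allRegs are our own, chosen to keep the two results apart.

```python
predicate = ['r4', 'r2', 'r3']                    # Tokenized operands of ADD r4,r2,r3
regs = {'r0':0, 'r1':1, 'r2':2, 'r3':3, 'r4':4}   # Register names and values

rD_slice = int(predicate[0][1:])         # The hard way: slice off 'r', then convert
rD_dict  = regs.get(predicate[0])        # The dictionary way: look the name up

allRegs = [regs.get(name) for name in predicate]  # Translate every operand at once
print(rD_slice, rD_dict, allRegs)
```

A side benefit of the dictionary is that an invalid name such as r9 is not silently accepted: get() returns None, which the caller can test for.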
Finally, note that you can access a dictionary in two ways. Consider the
following:
regs = {'r0':0, 'r1':1, 'r2':2, 'r3':3, 'r4':4}
aaa = regs.get('r3')
bbb = regs['r3']
print('Test aaa = ',aaa, 'bbb =',bbb)
The advantage of get is that it returns None if the key is not found,
whereas the other method raises a runtime error, called KeyError.
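The difference shows up only when the key is missing, as this short sketch demonstrates. The key 'r9' is deliberately absent from the dictionary; the variable names are our own.

```python
regs = {'r0':0, 'r1':1, 'r2':2, 'r3':3, 'r4':4}

missing = regs.get('r9')                 # No such key: get() returns None
fallback = regs.get('r9', -1)            # get() can also supply a default value
print(missing, fallback)

try:
    value = regs['r9']                   # Square brackets raise KeyError instead
except KeyError as error:
    value = None
    print('KeyError for key', error)
```

Use get() when a missing key is an expected case, and square brackets when a missing key indicates a bug that should stop the program.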
Functions revisited
This section looks at functions in a little more detail and demonstrates
the use of the global statement to make parameters accessible outside a
function.
When simulating a computer, you often need data for testing. Typing it in
is time-consuming. Fortunately, Python has a library of functions that
generate random numbers. In order to use a library, you first have to
import it. Consider the following:
import random # Get the library (usually at the start of the program)
.
.
X = random.randint(0,255) # Generate a random integer in the range 0 to 255
Function calls are usually of the form library.action. In this case, the
library is random and the action is randint(a,b). The parameters a and
b give the range of random integer values; both end values are included.
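As a small sketch of how this can provide test data for a simulator, the code below fills a list with random byte values. The call to random.seed() and the names testData and smallest are our own additions; seeding makes the same sequence appear on every run, which helps when debugging.

```python
import random                            # Get the library

random.seed(1)                           # Optional: make the sequence repeatable
testData = [random.randint(0, 255) for i in range(8)]   # Eight values, 0 to 255
print(testData)

smallest, largest = min(testData), max(testData)
print('Range of generated values:', smallest, 'to', largest)
```

Note that randint(a,b) includes both a and b, so randint(0,255) covers exactly the 256 values of a byte.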
Summary
In this chapter, we’ve extended our knowledge of Python and introduced
or expanded some of the features that demonstrate its power and
versatility. For example, we’ve looked at the list and the string, the two
data structures that are of most importance to us. We’ve also expanded
on the use of loops and other repetitive structures.
Analyzing instructions
TC1 postscript
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter06.
In order to construct a Python-based simulator, you need the same tools
used in earlier chapters; that is, you require an editor to create the Python
program and a Python interpreter. These are included in the freely
available Python package we introduced in Chapter 1.
Analyzing instructions
In this section, we will look at the way in which we take a text string
representing an assembly language instruction and process it to create
binary code that can be executed by the simulator.
00010101110000000000000000101010
It’s easy to translate code by hand, but it’s no fun. We are going to create
an assembler that automates the process of translation and lets you use
symbolic names rather than actual literals (constants). Consider the
following example of assembly language code. This is written using
numeric values (shaded), rather than symbolic names. This is not
intended to be a specific assembly language; it is designed to illustrate
the basic concepts:
LDRL R0,60     @ Load R0 with the time factor, 60
.
CMP  R3,R5     @ Compare R3 and R5
BEQ  1         @ If equal, jump to next-but-one instruction
ADD  R2,R2,4
SUB  R7,R1,R2
Some languages are case-sensitive and some are not. The assembly
language we have designed is case-insensitive; that is, you can write
either ADD r0,r1,r2 or ADD R0,R1,R2. Consequently, we can write the
load register indirect instruction, LDRI, in any of the following forms
to execute the same load register-indexed operation:
LDRI R2,[R1],10 or
LDRI R2,r1,10 or
LDRI r2,r1,10
We’ve used different variable names for clarity. Normally, you would
write:
sFile = [i.upper() for i in sFile] #
sFile = [i.split('@')[0] for i in sFile] #
The following are two examples of using this code. In the first case, the
user input is d, indicating a disk program, and in the second case, the
input is x, indicating the use of the embedded source program. In each
case, the source and output values are printed to demonstrate the
string-processing operations.
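The string-processing steps can be tried on a few lines held in a list.
This is only a sketch: the three source lines are invented for
illustration, and the stripping and tokenizing steps that follow the two
list comprehensions shown earlier are one plausible way to finish the
job, not necessarily the exact code used by TC1.

```python
sFile = ['  ldrl r0,60   @ Load the time factor',
         '@ A line that is only a comment',
         'next add r1,r2,r3']

sFile = [i.upper() for i in sFile]                     # Convert to uppercase
sFile = [i.split('@')[0] for i in sFile]               # Remove comments
sFile = [i.strip() for i in sFile]                     # Remove leading/trailing spaces
sFile = [i for i in sFile if i != '']                  # Remove empty lines
sFile = [i.replace(',', ' ').split() for i in sFile]   # Split each line into tokens
print(sFile)
```

Each surviving line is now a list of uppercase tokens, which is the form
assumed by the label- and opcode-processing code later in this chapter.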
When the assembler reads a line, it needs to know how to deal with the
opcode and its operands. So, how does it know how to proceed? We can
use Python’s dictionary facility to solve this problem in a very simple
way, by just looking in a table to see what information an opcode
requires.
This returns True if the key is in the dictionary, and False otherwise. You
can use not in to test whether something is not in a dictionary.
Python allows a key to be associated with any valid object, for example,
a list. We could write, for example, the following key:value pair:
'ADD': [3, 0b1101001, 'Addition', '07/05/2021', timesUsed]
Here, the value associated with a key is a five-element list that associates
the ADD mnemonic with the number of its operands, its binary encoding,
its name, the date it was designed, and the number of times it was used in
the current program (as well as being able to read a value from a
dictionary, you can write to it and update it).
In this example, we used the get() method to read the value associated
with a key. If the key is x, its value is given by validCd.get(x); that is,
the syntax is dictionaryName.get(key).
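The dictionary operations described above can be sketched as follows.
The contents of validCd and ops are assumptions chosen for illustration
(a mnemonic mapped to its number of operands, and the five-element list
from the ADD example with timesUsed starting at 0).

```python
validCd = {'NOP': 0, 'INC': 1, 'ADD': 3}   # Hypothetical table: mnemonic -> operand count

inTable  = 'ADD' in validCd                # True because the key is present
notThere = 'XYZ' not in validCd            # True because the key is absent
incOps   = validCd.get('INC')              # Read the value associated with 'INC'
missing  = validCd.get('XYZ')              # get() gives None rather than a KeyError

ops = {'ADD': [3, 0b1101001, 'Addition', '07/05/2021', 0]}  # Value is a five-element list
ops['ADD'][4] += 1                         # Update timesUsed, the last list element
print(inTable, notThere, incOps, missing, ops['ADD'][4])
```

Writing validCd['XYZ'] instead of validCd.get('XYZ') would raise a
KeyError because the key is not in the dictionary.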
Processing labels
The first version of TC1 required you to provide actual values for all
names and labels. If you wanted to jump to an instruction, you had to
provide the number of lines to jump. It’s much better to allow the
programmer to write the following:
JMP next
Here, next is the label of the target line. This is preferred over writing the
following:
JMP 21
MULL R0,R1,MINUTES
MULL R0,R1,60
The EQU assembler directive equates a value with a symbolic name. For
example, TC1 lets you write the following:
MINUTES EQU 60
The for loop (shaded) reads each line of the source code, sFile, and tests
for lines where ‘EQU’ is the second token in the line. The len(sFile[i])
> 2 comparison ensures that this line has at least three tokens to ensure
it’s a valid equate directive. The text is in bold font.
We check that the second token is ‘EQU’ with sFile[i][1] == ‘EQU’. The
sFile[i][1] notation has two list indexes. The first, in bold, indicates line
i of the source code, and the second index indicates token 1 of that line;
that is, it is the second element.
This line uses a list comprehension to scan the source file and delete any
line with EQU, because only instructions are loaded in program memory.
A line containing EQU is a directive and not an instruction. This operation
uses the count method, i.count(‘EQU’), to count the number of times EQU
appears in a line, and then deletes that line if the count isn’t 0. The
condition we test for before moving (i.e., keeping) a line is as follows:
if i.count('EQU') == 0:
Here, i is the current line being processed. The count method is applied
to the current line and counts the number of occurrences of the ‘EQU’
string in the line. Only if the count is 0 (i.e., it isn’t a line with an EQU
directive) does that line get copied into sFile.
The code in bold is the code we’ve discussed. The remaining code is
made up of print statements used to observe the code’s behavior. The
key line in this code is as follows:
symbolTab[sFile[i][0]] = sFile[i][2]
symbolTab[key] = value
The final two lines give the symbol table and the post-processed version
of sFile. The two equates have been loaded into the dictionary (symbol
table) and the processed output has had the two equates stripped.
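The equate processing described in this section can be run in isolation.
In this sketch, the four tokenized source lines and the HOURS name are
invented for the example; the loop and the list comprehension follow the
operations discussed above.

```python
sFile = [['MINUTES', 'EQU', '60'],           # Tokenized source: two equates ...
         ['HOURS', 'EQU', '24'],
         ['LDRL', 'R0', 'MINUTES'],          # ... and two instructions
         ['MULL', 'R0', 'R1', 'HOURS']]
symbolTab = {}                               # Empty symbol table

for i in range(0, len(sFile)):               # Step through the source
    if len(sFile[i]) > 2 and sFile[i][1] == 'EQU':   # Second token is EQU?
        symbolTab[sFile[i][0]] = sFile[i][2]          # symbolTab[key] = value
sFile = [i for i in sFile if i.count('EQU') == 0]     # Keep only lines without EQU

print(symbolTab)
print(sFile)
```

Note that the values are stored in the symbol table as strings, exactly
as they appear in the source text.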
Labels
The next step in processing the source file is to deal with labels. Take the
following example:
       DEC r1      @ Decrement r1
       BEQ NEXT1   @ If result in r1 is 0, then jump to line NEXT1
       INC r2      @ If result not 0, increment r2
       .
NEXT1  .
The binary program (machine code) generated by TC1 does not store or
use labels. It requires either the actual address of the next instruction or
its relative address (i.e., how far it needs to jump from the current
location). In other words, we need to translate the NEXT1 label into its
actual address in the program.
This is a job for the dictionary. All we have to do is put a label in the
dictionary as a key and then insert the corresponding address as the value
associated with the key. The following three lines of Python demonstrate
how we collect label addresses and put them in the symbol table:
1. for i in range(0,len(sFile)):                   # Add branch addresses to symbol tab
2.     if sFile[i][0] not in codes:                # If first token not an opcode, it's a label
3.         symbolTab.update({sFile[i][0]:str(i)})  # Add pc value, i to sym tab as string
4. print('\nEquate and branch table\n')            # Display symbol table
5. for x,y in symbolTab.items():                   # Step through symbol table
6.     print('{:<8}'.format(x),y)
The three lines, 1 to 3, define a for loop that steps through every line in
the source code in sFile. Because we’ve processed the code to convert
each instruction into a list of tokens, each line begins with either a valid
mnemonic or a label. All we have to do is check whether the first token
on a line is in the list (or dictionary) of mnemonics. If the first token is in
the list, it’s an instruction. If it’s not in the list, then it’s a label (we are
ignoring the case that it’s an error).
Here, sFile[i][0] is the first item (i.e., token) of line i, and codes is
the dictionary of mnemonics. The not in operator returns True if that
token is not a key in the dictionary called codes. If the test does return
True, then we have a label and must put it in the symbol table with the
following operation:
3. symbolTab.update({sFile[i][0]:str(i)})   # i is the pc value
This expression says, “Add the specified key:value pair to the dictionary
called symbolTab.” Why is the value associated with the label given as
i? The value associated with the label is the address of that line (i.e., the
value of the program counter, pc, when that line is executed). Since we
are stepping through the source code line by line, the counter, i, is the
corresponding value of the program counter.
The update method is applied to the symbol table with sFile[i][0] as the
key and str(i) as the value. The key is sFile[i][0], which is the label
(i.e., a string). However, the value of i is not a string. The value is an
integer, i, which is the current line address. We convert the integer
address into a string with str(i) because equates are stored in the table as
strings (i.e., this is a design decision made by me).
The value of the symbol table is printed using a for loop. We extract a
key:value pair by using the following:
5. for x,y in symbolTab.items():
The items() method steps through all the elements of the symbolTab
dictionary and allows us to print each key:value pair (i.e., all
names/labels and their values). The print statement displays x left
justified in a field of eight characters by using '{:<8}'.format(x).
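The label-collection loop can be exercised on a tiny tokenized program.
The contents of codes, sFile, and the pre-existing MINUTES equate are
assumptions made for this sketch; the values stored against each
mnemonic in codes are placeholders because only the keys matter here.

```python
codes = {'DEC': 0, 'BEQ': 0, 'INC': 0, 'STOP': 0}   # Hypothetical mnemonic table
sFile = [['DEC', 'R1'],                             # Address 0
         ['BEQ', 'NEXT1'],                          # Address 1
         ['INC', 'R2'],                             # Address 2
         ['NEXT1', 'STOP']]                         # Address 3 (line has a label)
symbolTab = {'MINUTES': '60'}                       # Equate collected earlier

for i in range(0, len(sFile)):                      # Step through every line
    if sFile[i][0] not in codes:                    # First token not a mnemonic?
        symbolTab.update({sFile[i][0]: str(i)})     # Then it is a label at address i

for x, y in symbolTab.items():                      # Display the symbol table
    print('{:<8}'.format(x), y)                     # Name left justified in 8 characters
```

The label NEXT1 is entered with the value '3', the address of the line
on which it appears.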
Note that the code in this section describes some of the instruction
processing involved in analyzing instructions. The actual simulator
differs in minor details, although the principles are the same.
We first have to extract the mnemonic, convert it into binary, then extract
the register numbers (where appropriate), and finally, insert the 16-bit
literal. Moreover, because the assembler is in text form, we have to be
able to deal with literals that are symbolic (i.e., they are names rather
than numbers), decimal, negative, binary, or hexadecimal; that is, we
have to handle instructions of the following form:
LDRL r0,24           @ Decimal numeric value
LDRL r0,0xF2C3       @ Hexadecimal numeric value
LDRL r0,$F2C3        @ Hexadecimal numeric value (alternative representation)
LDRL r0,%00110101    @ Binary numeric value
LDRL r0,0b00110101   @ Binary numeric value (alternative representation)
LDRL r0,-234         @ Negative decimal numeric value
LDRL r0,ALAN2        @ Symbolic value requiring symbol table look-up
The assembler looks at each line of the source code and extracts the
mnemonic. An instruction is a list of tokens (e.g., ‘NEXT’, ‘ADD’, ‘r1’,
‘r2’, ‘0x12FA’, which is five tokens, or ‘STOP’, which is one token). The
situation is made more complex because the mnemonic may be the first
token, or the second token if the instruction has a label. In the following
example, sFile contains the program as a list of instructions, and we are
processing line i, sFile[i]. Our solution is as follows:
1. Read the first token, sFile[i][0]. If this token is in the list of codes,
then it’s an instruction. If it is not in the list of codes, it’s a label, and
the second token, sFile[i][1], is the instruction.
3. Read the register numbers from the tokens in the instruction; for
example, ADD r3,r2,r7 would return 3,2,7, whereas NOP would return
0,0,0 (if a register field is not used, it is set to 0).
4. Read any literal and convert it into a 16-bit integer. This is the most
complex operation because the literal may have one of the seven
different formats described previously.
Lines 2 and 3 in the loop declare and initialize the variables and provide
default values.
The first if…else statement on line 4 looks at the first token on line i of
the source code, sFile[i][0]. If that token is in the codes dictionary,
then sFile[i][0] is the opcode. If it isn’t in the dictionary, then that
token must be a label and the second token is the opcode (lines 4 and 5):
4. if sFile[i][0] in codes: opcode = sFile[i][0]   # If first token is a valid opcode, get it
We have to deal with two cases: either the first token is the mnemonic
or, when the line begins with a label, the second token is the mnemonic.
We also check that the line is long enough to have a predicate. If there
is a predicate, it is extracted by lines 7 and 9:
7. predicate = sFile[i][1:]   # The predicate is the second and following tokens
9. predicate = sFile[i][2:]   # The predicate is the third and following tokens
The notation [2:] indicates everything from token 2 to the end of the line.
This is a very nice feature of Python because it doesn’t require you to
explicitly state the length of the line. Once we’ve extracted the predicate
containing the register and literal information, we can start to assemble
the instruction.
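The two cases can be wrapped in a small helper. This is a sketch under
stated assumptions: the function name splitLine is hypothetical, the two
entries in codes are illustrative, and a line is assumed to contain at
least an opcode (a label on its own would need an extra check).

```python
codes = {'ADD': (14, 64), 'STOP': (0, 0)}   # Illustrative (format, opcode) entries

def splitLine(tokens):
    """Return (label, opCode, predicate) for one tokenized source line."""
    if tokens[0] in codes:                  # First token is the mnemonic
        return '', tokens[0], tokens[1:]    # No label; predicate is the rest
    return tokens[0], tokens[1], tokens[2:] # Label first, then mnemonic, then predicate

print(splitLine(['NEXT', 'ADD', 'R1', 'R2', '0X12FA']))
print(splitLine(['STOP']))
```

Because a slice such as tokens[1:] simply returns an empty list when
there are no more tokens, an instruction with no operands needs no
special treatment.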
Next, we extract the current line’s code format to get the information
required from the predicate. Line 10, form = codes.get(opCode),
accesses the codes dictionary to look for the mnemonic, which is in the
opCode variable. The get method is applied to codes and the form
variable receives the value associated with that key, which is the
(format,code) tuple, for example, (8,10). The first element, form[0], is
the instruction format, and form[1] is the opcode:
10. form = codes.get(opCode)                       # Use opcode to read instruction format
11. if form[0] & 0b1000 == 0b1000:                 # Bit 3 of format selects destination reg rD
12.     if predicate[0] in symbolTab:              # Check whether first token is in symbol table
13.         rD = int(symbolTab[predicate[0]][1:])  # If it's a label, then get its value
The second element of the tuple, form[1], gives the 7-bit opcode; that is,
0100010 for LDRL. Lines 10 to 13 demonstrate how the destination register
is extracted. We first AND form[0] with 0b1000 to test bit 3, the most
significant bit of the format code, which indicates whether a destination
register, rD, is required by this instruction. If it is required, we test
whether the register is given as a name, for example, TIME, or is
expressed directly in the form R0. We have to do this because TC1 lets
you rename registers by using the EQU directive.
if 'INC' in opCodes:
opCodes.get('INC').
The preceding example returns format = (8,82). 8 refers to the format
code 0b1000 (specifying a destination register). 82 is the opcode for this
instruction. We access the two fields of the value associated with 'INC'
with, for example, the following:
formatStyle = format[0]
binaryCode = format[1]
rD = int(symbolTab[predicate[0]][1:])
We interrogate the symbol table with a key, which is the first element of
the predicate since the destination register always comes first in a TC1
assembly language instruction (e.g., in ADD r4,r7,r2, register r4 is the
first element). The register is given by predicate[0]. The
symbolTab[predicate[0]] expression looks up the symbolic name and
provides its value; for example, consider TIME EQU R3. The INC TIME
assembly language instruction will look up TIME and return R3. We now
have the destination operand, but it is a string, ‘R3’, and not a number.
We just want 3 and have to use the int function to convert a number in
string format into an integer value.
destReg = symbolTab[predicate[0]]
Remember that [1:] means all the characters after the first character,
‘R’. Consequently, this returns ‘3’ if the register was ‘R3’. We could
have written [1:2] rather than [1:] since the register number is a single
digit in the range 0 to 7. However, by using the [1:] notation, we can
later increase the number of registers beyond 9 without changing the
program.
int(symbolTab[predicate[0]][1:]).
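The register look-up can be sketched as a short function. The name
getReg is hypothetical, and the symbol table holds only the TIME EQU R3
binding used in the text.

```python
symbolTab = {'TIME': 'R3'}              # Result of the directive TIME EQU R3

def getReg(token):
    """Return the register number for a token such as 'R3' or a renamed register."""
    if token in symbolTab:              # Symbolic name: replace it by its value
        token = symbolTab[token]
    return int(token[1:])               # Drop the leading 'R' and convert to integer

print(getReg('TIME'), getReg('R7'), getReg('R12'))
```

The last call shows why [1:] is preferred to [1:2]: a two-digit register
number is handled without any change to the code.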
This takes the last element of the predicate (indicated by the [-1] index)
and looks to see whether it’s in the symbol table. If it isn’t, the code tests
for other types of literal. If it is in the symbol table, it is extracted and the
myData symbolic name is replaced with its actual value.
The if construct uses the type() function, which returns the type of an
object. In this case, it will be ‘int’ if the object is an integer. The str()
function converts an integer object into a string object.
Let’s look at the hex conversion. We have to make two selections: the
token and then the specific characters of the token. Consider ADDL
R1,R2,0XF2A4. The predicate is ‘R1 R2 0XF2A4’, which is tokenized as
predicate = [‘R1’, ‘R2’, ‘0XF2A4’].
We can save a line by combining the two list-index suffixes, [-1] and
[0:2], into predicate[-1][0:2].
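A minimal sketch of the double indexing, using the ADDL R1,R2,0XF2A4
example from the text (the variable names prefix and literal are
assumptions made for illustration):

```python
predicate = ['R1', 'R2', '0XF2A4']          # Tokenized predicate
prefix = predicate[-1][0:2]                 # Last token, first two characters
literal = 0                                 # Default if the token is not hexadecimal
if prefix == '0X':                          # Hexadecimal literal?
    literal = int(predicate[-1][2:], 16)    # Convert the digits after the prefix
print(prefix, literal, hex(literal))
```

The first index selects the token and the second selects characters
within that token, so no intermediate variable is needed.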
In the next chapter, we’ll return to the TC1 simulator and expand it.
We’ll also demonstrate how the TC1 simulator can be extended by
adding new operations to the instruction set and some ways of printing
the results of a simulator.
In this section, you will learn how to design a simulator without some of
the complications associated with a fully fledged design.
The Simulator
The simulator supports register-to-register operations, such as ADD
r1,r2,r3. Its only memory access is pointer-based, that is, LDRI r1,[r2]
The key:value pair uses a mnemonic as the key and a list with one item,
the class of the instruction, as the value. The classes range from 0 (a
mnemonic with no operands) to 7 (a mnemonic with a register and
register indirect operand). We’ve not implemented TC1’s 4-bit format
code, which is used to determine the parameters required by an
instruction, because that information is implicit in the class. Moreover,
we do not assemble the instruction into a binary code. We read the
mnemonic in text form and directly execute it.
Once the mnemonic and register numbers/values and literal are known, a
simple if .. elif structure is used to select the appropriate instruction
and then execute it. Most instructions are interpreted in a single line of
Python.
At the end of the instruction reading and execution loop, you are invited
to hit a key to execute the next instruction in sequence. The data
displayed after each instruction is the program counter, z-bit, instruction,
registers, and memory location. We use only four registers and eight
memory locations.
We have split this program into sections with brief descriptions between
them. The first part provides the source code as a built-in list. It defines
the instruction classes and provides a list of opcodes and their classes.
We don’t use a dictionary for this. However, we do provide dictionaries
for the registers and their indirect versions to simplify analyzing
instructions. For example, we can look up both r1 and r2 in the LDRI r1,
[r2] instruction:
sFile = ['LDRL r2,1','LDRL r0,4','NOP','STRI r0,[r2]','LDRI r3,[r2]', \
         'INC r3','ADDL r3,r3,2','NOP','DEC r3', 'BNE -2','DEC r3','STOP']
                                            # Source program for testing
# Simple CPU instruction interpreter. Direct instruction interpretation. 30 September 2022. V1.0
# Class 0: no operand                      NOP
# Class 1: literal                         BEQ  3
# Class 2: register                        INC  r1
# Class 3: register,literal                LDRL r1,5
# Class 4: register,register,              MOV  r1,r2
# Class 5: register,register,literal       ADDL r1,r2,5
# Class 6: register,register,register      ADD  r1,r2,r3
# Class 7: register,[register]             LDRI r1,[r2]
codes = {'NOP':[0],'STOP':[0],'BEQ':[1],'BNE':[1],'BRA':[1], \
         'INC':[2],'DEC':[2],'CMPL':[3],'LDRL':[3],'MOV':[4], \
         'CMP':[4],'SUBL':[5],'ADDL':[5],'ANDL':[5],'ADD':[6], \
         'SUB':[6], 'AND':[6],'LDRI':[7],'STRI':[7]}
reg1 = {'r0':0,'r1':1,'r2':2,'r3':3}           # Legal registers
reg2 = {'[r0]':0,'[r1]':1,'[r2]':2,'[r3]':3}   # Legal pointer registers
r = [0] * 4                                    # Four registers
r[0],r[1],r[2],r[3] = 1,2,3,4                  # Preset registers for testing
m = [0] * 8                                    # Eight memory locations
pc = 0                                         # Program counter initialize to 0
go = 1                                         # go is the run control (1 to run)
z = 0                                          # z is the zero flag. Set/cleared by SUB, DEC, CMP
while go == 1:                                 # Repeat execute fetch and execute loop
    thisLine = sFile[pc]                       # Get current instruction
    pc = pc + 1                                # Increment pc
    pcOld = pc                                 # Remember pc value for this cycle
    temp = thisLine.replace(',',' ')           # Remove commas: ADD r1,r2,r3 to ADD r1 r2 r3
Note that the execution loop ends with an input request from the
keyboard. In this way, the next cycle is not executed until the
Enter/Return key is pressed.
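To show the direct-interpretation idea end to end, here is a deliberately
reduced sketch that handles only six mnemonics. It is not the simulator
described above: the test program, the variable thisPc, and the
convention that a branch offset is measured from the branch instruction
itself are all assumptions made for this example, and the keyboard pause
is left out so that the loop runs unattended.

```python
sFile = ['LDRL r0,3', 'INC r1', 'DEC r0', 'BNE -2', 'STOP']   # Hypothetical test program
reg1 = {'r0': 0, 'r1': 1, 'r2': 2, 'r3': 3}                   # Legal registers
r = [0] * 4                                                   # Four registers
pc, z, go = 0, 0, 1                                           # Program counter, zero flag, run control

while go == 1:                                     # Fetch and execute loop
    tokens = sFile[pc].replace(',', ' ').split()   # 'LDRL r0,3' becomes ['LDRL','r0','3']
    thisPc = pc                                    # Address of the current instruction
    pc = pc + 1                                    # Point to the next instruction
    mnemonic = tokens[0]
    if   mnemonic == 'STOP': go = 0                # Halt the simulator
    elif mnemonic == 'NOP':  pass                  # Do nothing
    elif mnemonic == 'LDRL': r[reg1[tokens[1]]] = int(tokens[2])
    elif mnemonic == 'INC':  r[reg1[tokens[1]]] += 1
    elif mnemonic == 'DEC':
        r[reg1[tokens[1]]] -= 1
        z = 1 if r[reg1[tokens[1]]] == 0 else 0    # DEC updates the zero flag
    elif mnemonic == 'BNE':
        if z == 0: pc = thisPc + int(tokens[1])    # Branch relative to this instruction
    print(thisPc, sFile[thisPc], 'z =', z, 'r =', r)
```

The program counts r0 down from 3 to 0 and increments r1 once on each
pass, so it finishes with r1 equal to 3 and the z flag set.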
Single-stepping
A computer executes instructions sequentially unless a branch or
subroutine call is encountered. When testing a simulator, you frequently
want to execute a batch of instructions together (i.e., without printing
register values), or you may wish to execute instructions one at a time by
hitting Enter/Return after each instruction has been executed or to
execute instructions until you hit a specific instruction.
In this version of TC1, you can execute and display an instruction, skip
the display of the next n instructions, or not display instructions until a
change-of-flow instruction is encountered. After the program is loaded,
the input prompt is displayed. If you enter a return, the simulator
executes the next instruction and waits. If you enter an integer (and
return), the specified number of instructions is executed without
displaying the results. If you enter b followed by a return, the simulator
executes instructions without displaying them until the next branch
instruction is encountered.
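The three kinds of reply to the prompt can be separated by a small
function. This is a sketch only: the name parseCommand and the returned
(action, count) pairs are assumptions for illustration, and any
unrecognized input is simply treated as a single step.

```python
def parseCommand(text):
    """Classify a reply to the prompt as ('trace', 1), ('skip', n), or ('branch', 0)."""
    text = text.strip()                     # Ignore surrounding spaces
    if text == '':                          # Return alone: execute and display one instruction
        return ('trace', 1)
    if text.isdigit():                      # An integer: execute n instructions silently
        return ('skip', int(text))
    if text.lower() == 'b':                 # b: run silently to the next branch
        return ('branch', 0)
    return ('trace', 1)                     # Anything else: treated as a single step

for reply in ['', '3', 'b']:
    print(repr(reply), parseCommand(reply))
```

In a simulator, the string passed to the function would come from a call
such as input('>>>') at the end of each cycle.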
I’ve edited it to remove memory locations as they are not accessed. After
the prompt, >>>, you select what is to happen: trace one instruction,
execute n instructions without stopping or displaying registers, or
execute code to the next branch instruction without displaying it. In each
case, the following program counter value is highlighted in the following
output. The text in bold is a comment I left on an action on the current
line (trace indicates a Return/Enter was hit, which executes the next
instruction):
>>>   trace
0       NOP             PC= 0 z=0 n=0 c=0 R 0000 0000 0000 0000 0000 0000 0000 0000
>>>3  jump 3 instructions (silent trace)
4       DEC R2          PC= 4 z=0 n=1 c=1 R 0000 0001 ffff 0000 0000 0000 0000 0000
>>>b  jump to branch (silent mode up to next branch/rts/jsr)
6       BRA ABC         PC= 6 z=0 n=1 c=1 R 0000 0001 ffff 0000 0000 0000 00aa 0000
>>>   trace
10 ABC  LDRL R3 $ABCD   PC=10 z=0 n=1 c=1 R 0000 0001 ffff abcd 0000 0000 00aa 0000
>>>   trace
11      NOP             PC=11 z=0 n=1 c=1 R 0000 0001 ffff abcd 0000 0000 00aa 0000
>>>4  jump 4
16      INC R5          PC=16 z=0 n=0 c=1 R 0000 0001 ffff abce 0001 0001 00aa 0000
>>>   trace
17      END!            PC=17 z=0 n=0 c=1 R 0000 0001 ffff abce 0001 0001 00aa 0000
File input
When we first started writing a simulator, we inputted test programs the
easy way by typing the instructions in one by one. This worked for the
simplest of tests but soon became tedious. Later, programs were input as
a text file. That worked well when the filename was short, such as t.txt,
but it got more tedious with long filenames (e.g., when I stored the
source code in a specific directory).
As their names suggest, try requires Python to run the following block
of code, and except introduces a block of code that is executed if the
try block fails. Essentially, it means, “If you can’t do this, do that.”
The difference between if and try is that if tests a condition and
performs the specified action if it is True, whereas try attempts to run
a block and calls the exception handler if it fails, that is, if it would
otherwise crash.
try allows you to attempt to open a file and then gives you a way out if
the file doesn’t exist (i.e., it avoids a fatal error). Consider the following:
myProg = 'testException1.txt'                    # Name of default program
try:                                             # Check whether this file exists
    with open(myProg,'r') as prgN:               # If it's there, open it and read it
        myFile = prgN.readlines()
except:                                          # Call exception if file not there
    altProg = input('Enter source file name: ')  # Request a filename
    with open(altProg,'r') as prgN:              # Open the user file
        myFile = prgN.readlines()
print('File loaded: ', myFile)
This code looks for a file called testException1.txt. If it’s present (as it
is in this case), the simulator runs it and we get the following output:
>>> %Run testTry.py
File loaded: [' @ Test exception file\n', ' nop\n', ' nop\n', ' inc\n', ' end!']
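The same idea can be packaged as a function that can be tested without
typing at the keyboard. In this sketch, the function name loadSource,
the file names, and the use of a temporary directory are assumptions
made for the example; it also catches FileNotFoundError specifically,
rather than every exception, so that other errors are not hidden.

```python
import os
import tempfile

def loadSource(defaultName, altName):
    """Read defaultName if it exists; otherwise fall back to altName."""
    try:
        with open(defaultName, 'r') as prgN:      # Try the default program first
            return prgN.readlines()
    except FileNotFoundError:                     # Default file is not there
        with open(altName, 'r') as prgN:          # Open the alternative file
            return prgN.readlines()

tmpDir = tempfile.mkdtemp()                       # Scratch directory for the demonstration
altProg = os.path.join(tmpDir, 'alt.txt')
with open(altProg, 'w') as f:                     # Create a two-line test program
    f.write(' nop\n end!')
myFile = loadSource(os.path.join(tmpDir, 'missing.txt'), altProg)
print('File loaded: ', myFile)
```

Because missing.txt does not exist, the except block runs and the
alternative file is loaded instead.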
This expression automatically provides the path for the filename and
address of the file type.
Remember that Python lets you use the + operator to concatenate strings.
TC1 program
The first part of the program provides a list of instructions and their
encoding. This text is placed between two ''' markers that indicate it is
not part of the program. This avoids having to start each line with #.
Text enclosed in triple quote marks like this is often called a docstring
comment.
The first part of TC1 is a listing of the instructions. These are provided to
make the program easier to follow:
### TC1 computer simulator and assembler. Version of 11 September 2022
''' This is the table of instructions for reference and is not part of the program code
00 00000  stop operation                   STOP            00 00000 000 000 000 0  0000
00 00001  no operation                     NOP             00 00001 000 000 000 0  0000
00 00010  get character from keyboard      GET r0          00 00010 rrr 000 000 0  1000
00 00011  generate random number           RND r0          00 00011 rrr 000 000 L  1001
00 00100  swap bytes in register           SWAP r0         00 00100 rrr 000 000 0  1000
00 01000  print hex value in register      PRT r0          00 01000 rrr 000 000 0  1000
00 11111  terminate program                END!            00 11111 000 000 000 0  0000
01 00000  load register from register      MOVE r0,r1      01 00000 rrr aaa 000 0  1100
01 00001  load register from memory        LDRM r0,L       01 00001 rrr 000 000 L  1001
01 00010  load register with literal       LDRL r0,L       01 00010 rrr 000 000 L  1001
01 00011  load register indirect           LDRI r0,[r1,L]  01 00011 rrr aaa 000 L  1101
01 00100  store register in memory         STRM r0,L       01 00100 rrr 000 000 L  1001
01 00101  store register indirect          STRI r0,[r1,L]  01 00101 rrr aaa 000 L  1101
10 00000  add register to register         ADD r0,r1,r2    10 00000 rrr aaa bbb 0  1110
10 00001  add literal to register          ADDL r0,r1,L    10 00001 rrr aaa 000 L  1101
10 00010  subtract register from register  SUB r0,r1,r2    10 00010 rrr aaa bbb 0  1110
10 00011  subtract literal from register   SUBL r0,r1,L    10 00011 rrr aaa 000 L  1101
10 00100  multiply register by register    MUL r0,r1,r2    10 00100 rrr aaa bbb 0  1110
10 00101  multiply literal by register     MULL r0,r1,L    10 00101 rrr aaa 000 L  1101
10 00110  divide register by register      DIV r0,r1,r2    10 00110 rrr aaa bbb 0  1110
10 00111  divide register by literal       DIVL r0,r1,L    10 00111 rrr aaa 000 L  1101
10 01000  mod register by register         MOD r0,r1,r2    10 01000 rrr aaa bbb 0  1110
10 01001  mod register by literal          MODL r0,r1,L    10 01001 rrr aaa 000 L  1101
10 01010  AND register to register         AND r0,r1,r2    10 01010 rrr aaa bbb 0  1110
10 01011  AND register to literal          ANDL r0,r1,L    10 01011 rrr aaa 000 L  1101
10 01100  OR register to register          OR r0,r1,r2     10 01100 rrr aaa bbb 0  1110
10 01101  OR register to literal           ORL r0,r1,L     10 01101 rrr aaa 000 L  1101
10 01110  EOR register to register         EOR r0,r1,r2    10 01110 rrr aaa bbb 0  1110
10 01111  EOR register to literal          EORL r0,r1,L    10 01111 rrr aaa 000 L  1101
10 10000  NOT register                     NOT r0          10 10000 rrr 000 000 0  1000
10 10010  increment register               INC r0          10 10010 rrr 000 000 0  1000
10 10011  decrement register               DEC r0          10 10011 rrr 000 000 0  1000
10 10100  compare register with register   CMP r0,r1       10 10100 rrr aaa 000 0  1100
10 10101  compare register with literal    CMPL r0,L       10 10101 rrr 000 000 L  1001
10 10110  add with carry                   ADC             10 10110 rrr aaa bbb 0  1110
10 10111  subtract with borrow             SBC             10 10111 rrr aaa bbb 0  1110
10 11000  logical shift left               LSL r0,L        10 11000 rrr 000 000 0  1001
10 11001  logical shift left literal       LSLL r0,L       10 11001 rrr 000 000 L  1001
10 11010  logical shift right              LSR r0,L        10 11010 rrr 000 000 0  1001
10 11011  logical shift right literal      LSRL r0,L       10 11011 rrr 000 000 L  1001
10 11100  rotate left                      ROL r0,L        10 11100 rrr 000 000 0  1001
10 11101  rotate left literal              ROLL r0,L       10 11101 rrr 000 000 L  1001
10 11110  rotate right                     ROR r0,L        10 11110 rrr 000 000 0  1001
10 11111  rotate right literal             RORL r0,L       10 11111 rrr 000 000 L  1001
11 00000  branch unconditionally           BRA L           11 00000 000 000 000 L  0001
11 00001  branch on zero                   BEQ L           11 00001 000 000 000 L  0001
11 00010  branch on not zero               BNE L           11 00010 000 000 000 L  0001
11 00011  branch on minus                  BMI L           11 00011 000 000 000 L  0001
11 00100  branch to subroutine             BSR L           11 00100 000 000 000 L  0001
11 00101  return from subroutine           RTS             11 00101 000 000 000 0  0000
11 00110  decrement & branch on not zero   DBNE r0,L       11 00110 rrr 000 000 L  1001
11 00111  decrement & branch on zero       DBEQ r0,L       11 00111 rrr 000 000 L  1001
11 01000  push register on stack           PUSH r0         11 01000 rrr 000 000 0  1000
11 01001  pull register off stack          PULL r0         11 01001 rrr 000 000 0  1000
'''
import random                          # Get library for random number generator
def alu(fun,a,b):                      # Alu defines operation and a and b are inputs
    global c,n,z                       # Status flags are global and are set up here
    if   fun == 'ADD': s = a + b
    elif fun == 'SUB': s = a - b
    elif fun == 'MUL': s = a * b
    elif fun == 'DIV': s = a // b      # Floor division returns an integer result
    elif fun == 'MOD': s = a % b       # Modulus operation gives remainder: 12 % 5 = 2
    elif fun == 'AND': s = a & b       # Logic functions
    elif fun == 'OR':  s = a | b
    elif fun == 'EOR': s = a ^ b       # Exclusive OR uses the ^ operator
    elif fun == 'NOT': s = ~a
    elif fun == 'ADC': s = a + b + c   # Add with carry
    elif fun == 'SBC': s = a - b - c   # Subtract with borrow
    c,n,z = 0,0,0                      # Clear flags before recalculating them
    if s & 0xFFFF == 0: z = 1          # Calculate the c, n, and z flags
    if s & 0x8000 != 0: n = 1          # Negative if most sig bit 15 is 1
    if s & 0x10000 != 0: c = 1         # Carry set if bit 16 is 1
    return (s & 0xFFFF)                # Return the result constrained to 16 bits
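The flag calculation can be examined on its own. The following sketch
uses a hypothetical helper, flags16, that returns the flags instead of
storing them in global variables; it is an illustration of the masking
operations, not part of TC1.

```python
def flags16(s):
    """Return (result, c, n, z) for an unconstrained integer result s."""
    result = s & 0xFFFF                      # Constrain the result to 16 bits
    c = 1 if (s & 0x10000) != 0 else 0       # Carry (or borrow) is bit 16
    n = 1 if (s & 0x8000) != 0 else 0        # Negative is bit 15
    z = 1 if result == 0 else 0              # Zero if all 16 bits are 0
    return result, c, n, z

print(flags16(0xFFFF + 1))    # Addition that overflows 16 bits
print(flags16(3 - 5))         # Subtraction that goes negative
print(flags16(0x7FFF + 1))    # Result with bit 15 set but no carry
```

Python integers are not limited to 16 bits, which is why bit 16 of the
unmasked result can be used to detect a carry out or a borrow.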
Because the shift operation is rather complex with left and right shifts,
variable-length shifts, plus shifts, and rotates, we have provided a
function to implement shifts. This takes the type of shift, direction, and
number of places shifted as input parameters, together with the word to
be shifted:
def shift(dir,mode,p,q):               # Shifter: performs shifts and rotates. dir = left/right, mode = logical/rotate
    global z,n,c                       # Make flag bits global. Note v-bit not implemented
    sign, bitOut = 0, 0                # Default values so the carry tests work when q is 0
    if dir == 0:                       # dir = 0 for left shift, 1 for right shift
        for i in range (0,q):          # Perform q left shifts on p
            sign = (0x8000 & p) >> 15                 # Sign bit
            p = (p << 1) & 0xFFFF                     # Shift p left one place
            if mode == 1:p = (p & 0xFFFE) | sign      # For rotate left, add in bit shifted out
    else:                              # dir = 1 for right shift
        for i in range (0,q):          # Perform q right shifts
            bitOut = 0x0001 & p                       # Save lsb shifted out
            sign = (0x8000 & p) >> 15                 # Get sign-bit for ASR
            p = p >> 1                                # Shift p one place right
            if mode == 1:p = (p&0x7FFF)|(bitOut<<15)  # If mode = 1, insert bit rotated out
            if mode == 2:p = (p&0x7FFF)|(sign << 15)  # If mode = 2, propagate sign bit
    z,c,n = 0,0,0                                     # Clear all flags
    if p == 0: z = 1                                  # Set z if p is zero
    if p & 0x8000 != 0: n = 1                         # Set n-bit if bit 15 of p is 1
    if (dir == 0) and (sign == 1): c = 1              # Set carry if left shift and sign 1
    if (dir == 1) and (bitOut == 1): c = 1            # Set carry bit if right shift and bit moved out = 1
    return(0xFFFF & p)                                # Ensure output is 16 bits wide
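The core of a rotate is easier to see for a single place. The two
helpers below, rol16 and ror16, are hypothetical names used only for this
sketch; they reproduce the mode = 1 step of the shifter for one place,
without the flags.

```python
def rol16(p):
    """Rotate a 16-bit word left one place."""
    msb = (p & 0x8000) >> 15                 # Bit that falls off the left-hand end
    return ((p << 1) & 0xFFFF) | msb         # Shift left and put it back at the right

def ror16(p):
    """Rotate a 16-bit word right one place."""
    lsb = p & 0x0001                         # Bit that falls off the right-hand end
    return (p >> 1) | (lsb << 15)            # Shift right and put it back at the left

print(hex(rol16(0x8001)), hex(ror16(0x0001)), hex(ror16(rol16(0xABCD))))
```

Rotating left and then right by one place restores the original word,
which is a convenient check on any shifter implementation.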
def listingP():                        # Function to perform listing and formatting of source code
    global listing                     # Listing contains the formatted source code
    listing = [0]*128                  # Create formatted listing file for display
    if debugLevel > 1: print('Source assembly code listing ')
    for i in range (0,len(sFile)):     # Step through the program
        if sFile[i][0] in codes:       # Is first token in opcodes (no label)?
            i2 = (' ').join(sFile[i])          # Convert tokens into string for printing
            i1 = ''                            # Dummy string i1 represents missing label
        else:
            i2 = (' ').join(sFile[i][1:])      # If first token not opcode, it's a label
            i1 = sFile[i][0]                   # i1 is the label (first token)
        listing[i] = '{:<3}'.format(i) + '{:<7}'.format(i1) + \
                     '{:<10}'.format(i2)       # Create listing table entry
        if debugLevel > 1:                     # If debug = 1, don't print source program
            print('{:<3}'.format(i),'{:<7}'.format(i1),'{:<10}'.format(i2))   # print: pc, label, opcode
    return()
This is the function, getLit, that processes a literal. It can handle literals
in a range of possible formats, including decimal, binary, hexadecimal,
and symbolic form:
def getLit(litV):                                            # Extract a literal
    if litV[0] == '#': litV = litV[1:]                       # Some systems prefix literal with '#'
    if litV in symbolTab:                                    # Look in sym tab and get value if there
        literal = symbolTab[litV]                            # Read the symbol value as a string
        literal = int(literal)                               # Convert string into integer
    elif litV[0] == '%': literal = int(litV[1:],2)           # If first char is %, convert to integer
    elif litV[0:2] == '0B':literal = int(litV[2:],2)         # If prefix 0B, convert binary to integer
    elif litV[0:2] == '0X':literal = int(litV[2:],16)        # If 0x, convert hex string to integer
    elif litV[0:1] == '$': literal = int(litV[1:],16)        # If $, convert hex string to integer
    elif litV[0] == '-': literal = (-int(litV[1:]))&0xFFFF   # Convert 2's complement to int
    elif litV.isnumeric(): literal = int(litV)               # If decimal string, convert to integer
    else: literal = 0                                        # Default value 0
    return(literal)
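The conversions that getLit relies on can be checked individually. This
sketch uses the literal values from the LDRL examples earlier in the
chapter; the variable names are assumptions made for illustration.

```python
binVal = int('00110101', 2)        # Digits after % or 0B, converted as base 2
hexVal = int('F2C3', 16)           # Digits after $ or 0X, converted as base 16
negVal = (-int('234')) & 0xFFFF    # -234 as a 16-bit two's complement value
isDec  = '24'.isnumeric()          # True: string contains only digits
isNeg  = '-234'.isnumeric()        # False: the minus sign is not a digit

print(binVal, hexVal, negVal, isDec, isNeg)
```

The last result explains why the test for a leading minus sign must come
before the isnumeric() test in getLit.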
This short section deals with the equate assembler directive and binds
values to symbolic names using the EQU directive. These bindings are
placed in the symbol table dictionary and the equates are removed from
the source code:
                                                      # Remove assembler directives from source code
for i in range (0,len(sFile)):                        # Deal with equates of the form PQR EQU 25
    if len(sFile[i]) > 2 and sFile[i][1] == 'EQU':    # If line is > 2 tokens and second is EQU
        symbolTab[sFile[i][0]] = sFile[i][2]          # Put third token EQU in symbol table
sFile = [i for i in sFile if i.count('EQU') == 0]     # Remove all lines with 'EQU'
                                                      # Debug: 1 none, 2 source, 3 symbol tab, 4 Decode i, 5 stack
listingP()                                            # List the source code if debug level is 2 or more
Here, we perform the instruction decoding; that is, we analyze the text of
each instruction to extract the opcode and parameters:
                                                      # Look for labels and add to symbol table
for i in range(0,len(sFile)):                         # Add branch addresses to symbol table
    if sFile[i][0] not in codes:                      # If first token not opcode, then it is a label
        symbolTab.update({sFile[i][0]:str(i)})        # Add it to the symbol table
if debugLevel > 2:                                    # Display symbol table if debug level is 3 or more
    print('\nEquate and branch table\n')              # Display the symbol table
    for x,y in symbolTab.items(): print('{:<8}'.format(x),y)   # Step through the symbol table dictionary
    print('\n')
                                                      # Assemble source code in sFile
if debugLevel > 3: print('Decoded instructions')      # If debug level 4/5, print decoded ops
for pcA in range(0,len(sFile)):                       # ASSEMBLY: pcA = prog counter in assembly
    opCode, label, literal, predicate = [], [], 0, [] # Initialize variables
                                                      # Instruction = label + opcode + predicate
    rD, rS1, rS2 = 0, 0, 0                            # Clear all register-select fields
    thisOp = sFile[pcA]                               # Get current instruction, thisOp, in text form
                                                      # Instruction: label + opcode or opcode
    if thisOp[0] in codes: opCode = thisOp[0]         # If token opcode, then get token
    else:                                             # Otherwise, opcode is second token
        opCode = thisOp[1]                            # Read the second token to get opcode
        label = thisOp[0]                             # Read the first token to get the label
    if (thisOp[0] in codes) and (len(thisOp) > 1):    # If first token opcode, rest is predicate
        predicate = thisOp[1:]                        # Now get the predicate
    else:                                             # Get predicate if the line has a label
        if len(thisOp) > 2: predicate = thisOp[2:]
    form = codes.get(opCode)                          # Use opcode to read type (format)
                                                      # Now check the bits of the format code
    if form[0] & 0b1000 == 0b1000:                    # Bit 4 selects destination register rD
        if predicate[0] in symbolTab:                 # Check if first token in symbol table
            rD = int(symbolTab[predicate[0]][1:])     # If it is, then get its value
        else: rD = int(predicate[0][1:])              # If not label, get register from the predicate
    if form[0] & 0b0100 == 0b0100:                    # Bit 3 selects source register 1, rS1
        if predicate[1] in symbolTab:
            rS1 = int(symbolTab[predicate[1]][1:])
        else: rS1 = int(predicate[1][1:])
    if form[0] & 0b0010 == 0b0010:                    # Bit 2 of format selects source register 2, rS2
        if predicate[2] in symbolTab:
            rS2 = int(symbolTab[predicate[2]][1:])
        else: rS2 = int(predicate[2][1:])
    if form[0] & 0b0001 == 0b0001:                    # Bit 1 of format selects the literal field
        litV = predicate[-1]
        literal = getLit(litV)
This section was added after the development of TC1. We introduce the
concept of a debug level. That is, at the beginning of a simulation run,
you can set a parameter in the range of 1 to 5 that determines how much
information is displayed during assembly. This allows
you to get more information about the instruction encoding when testing
the program:
    if debugLevel > 3:                                # If debug level > 3, print decoded fields
        t0 = '%02d' % pcA                             # Format instruction counter
        t1 = '{:<23}'.format(' '.join(thisOp))        # Format operation to 23 spaces
        t3 = '%04x' % literal                         # Format literal to 4-character hex
        t4 = '{:04b}'.format(form[0])                 # Format the 4-bit opcode format field
        print('pc =',t0,'Op =',t1,'literal',t3,'Dest reg =',rD,'rS1 =', \
              rS1,'rS2 =',rS2,'format =',t4)
                                                      # Concatenate fields to create 32-bit opcode
    binCode = form[1]<<25|(rD)<<22|(rS1)<<19|(rS2)<<16|literal   # Binary pattern
    memP[pcA] = binCode                               # Store instruction in program memory
                                                      # End of the assembly portion of the program
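The binCode expression packs five fields into one 32-bit word: a 7-bit opcode in bits 31 to 25, three 3-bit register fields, and a 16-bit literal. The sketch below illustrates that layout and its inverse. The pack and unpack helpers and the opcode value 12 are assumptions made for the example, and the literal is masked to 16 bits here as a precaution that the listing above does not take.

```python
def pack(opcode, rD, rS1, rS2, literal):
    """Pack the fields into a 32-bit word using the same shifts as binCode."""
    return opcode << 25 | rD << 22 | rS1 << 19 | rS2 << 16 | (literal & 0xFFFF)

def unpack(word):
    """Recover (opcode, rD, rS1, rS2, literal) from a 32-bit instruction word."""
    opcode  = (word >> 25) & 0x7F          # 7-bit opcode
    rD      = (word >> 22) & 0x7           # 3-bit destination register
    rS1     = (word >> 19) & 0x7           # 3-bit source register 1
    rS2     = (word >> 16) & 0x7           # 3-bit source register 2
    literal = word & 0xFFFF                # 16-bit literal
    return opcode, rD, rS1, rS2, literal

word = pack(12, 3, 1, 0, 0x0123)           # Hypothetical opcode 12 with rD = 3, rS1 = 1
print('{:032b}'.format(word))
print(unpack(word))
```

Unpacking with shifts and masks is what an instruction decoder has to do when it reads the word back from program memory.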
We are about to execute the instructions. Before we do that, it is
necessary to initialize several variables concerning the current operation
(e.g., tracing):
                                          # The code is executed here
r = [0] * 8                               # Define registers r[0] to r[7]
pc = 0                                    # Set program counter to 0
run = 1                                   # run = 1 during execution
sp = 16                                   # Initialize the stack pointer (BSR/RTS)
goCount = 0                               # goCount executes n operations with no display
traceMode = 0                             # Set to 1 to execute n instructions without display
skipToBranch = 0                          # Used when turning off tracing until a branch
silent = 0                                # silent = 1 to turn off single stepping
This section performs a function called tracing and allows us to list the
contents of the registers or turn off the listing as we execute the code:
                                               # Instruction interpretation complete. Deal with display
if silent == 0:                                # Read keyboard ONLY if not in silent mode
    x = input('>>>')                           # Get keyboard input to continue
    if x == 'b': skipToBranch = 1              # Set flag to execute to branch with no display
    if x.isnumeric():                          # Is this a trace mode with a number of steps to skip?
        traceMode = 1                          # If so, set traceMode
        goCount = getLit(x) + 1                # Record the number of lines to skip printing
if skipToBranch == 1:                          # Are we in skip-to-branch mode?
    silent = 1                                 # If so, turn off printing status
    if mnemonic in branchGroup:                # Have we reached a branch?
        silent = 0                             # If branch, turn off silent mode and allow tracing
        skipToBranch = 0                       # Turn off skip-to-branch mode
if traceMode == 1:                             # If in silent mode (no display of data)
    silent = 1                                 # Set silent flag
    goCount = goCount - 1                      # Decrement silent mode count
    if goCount == 0:                           # If we've reached zero, turn display on
        traceMode = 0                          # Leave trace mode
        silent = 0                             # Set silent flag back to zero (off)
if silent == 0: printStatus()
Now that we’ve explained the TC1 simulator, we’ll demonstrate its use.
Original (source) 0 1 2 3 4
Swapped (destination) 4 3 2 1 0
As you can see, location 0 is swapped with location 4, then location 1
with location 3; then, at location 2, we have reached the middle point and
the reversal is complete. To perform this action, we need two pointers,
one for each end of the string. We select the two characters at the ends of
the string and swap them. Then, we move the pointers inward and do a
second swap. The task is complete when the pointers meet in the middle.
Note that this assumes an odd number of items to reverse:
Set upper pointer to top
Set lower pointer to bottom
Repeat
Get value at upper pointer
Get value at lower pointer
Swap values and store
Until upper pointer and lower pointer are equal
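The pseudocode above can be sketched in Python before writing it in TC1 assembly language. This is an illustration only: the reverse_in_place name is an assumption, and the loop test uses lower < upper rather than "until equal" so that the same code also terminates for an even number of items, where the two pointers cross without ever meeting.

```python
def reverse_in_place(m):
    """Reverse list m by swapping items from both ends and moving inward."""
    lower, upper = 0, len(m) - 1            # Set the pointers to bottom and top
    while lower < upper:                    # Stop when the pointers meet or cross
        m[lower], m[upper] = m[upper], m[lower]   # Swap the two values
        lower += 1                          # Move the lower pointer up
        upper -= 1                          # Move the upper pointer down
    return m

print(reverse_in_place([0, 1, 2, 3, 4]))    # Odd number of items
print(reverse_in_place([0, 1, 2, 3]))       # Even number of items
```

With five items, the loop performs two swaps and leaves the middle item untouched, which matches the description of locations 0 to 4.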
The first block is the source code printed by TC1 before the start of the
instruction execution:
TC1 CPU simulator 11 September 2022
Input debug level 1 - 5: 4
Source assembly code listing
0 LDRL R0 0
1 LDRL R1 5
2 LOOP1 RND R2
3 STRI R2 R0 0
4 INC R0
5 DEC R1
6 BNE LOOP1
7 NOP
8 LDRL R0 0
9 LDRL R1 4
10 LOOP2 LDRI R2 R0 0
11 LDRI R3 R1 0
12 MOVE R4 R2
13 STRI R3 R0 0
14 STRI R4 R1 0
15 INC R0
16 DEC R1
17 CMP R0 R1
18 BNE LOOP2
19 NOP
20 STOP
21 END!
Equate and branch table
START 0
LOOP1 2
LOOP2 10
The second code block shows the output of the assembler as instructions
are decoded. You can see the various registers, the literal, and the format
field:
Decoded instructions
pc=00 Op = LDRL R0 0 literal 0000 RD=0 rS1=0 rS2=0
format=1001
pc=01 Op = LDRL R1 5 literal 0005 RD=1 rS1=0 rS2=0
format=1001
pc=02 Op= LOOP1 RND R2 0XFFFF literal ffff RD=2 rS1=0 rS2=0
format=1001
pc=03 Op = STRI R2 R0 0 literal 0000 RD=2 rS1=0 rS2=0
format=1101
pc=04 Op = INC R0 literal 0000 RD=0 rS1=0 rS2=0
format=1000
pc=05 Op = DEC R1 literal 0000 RD=1 rS1=0 rS2=0
format=1000
pc=06 Op = BNE LOOP1 literal 0002 RD=0 rS1=0 rS2=0
format=0001
pc=07 Op = NOP literal 0000 RD=0 rS1=0 rS2=0
format=0000
pc=08 Op = LDRL R0 0 literal 0000 RD=0 rS1=0 rS2=0
format=1001
pc=09 Op = LDRL R1 4 literal 0004 RD=1 rS1=0 rS2=0
format=1001
pc=10 Op= LOOP2 LDRI R2 R0 0 literal 0000 RD=2 rS1=0 rS2=0
format=1101
pc=11 Op = LDRI R3 R1 0 literal 0000 RD=3 rS1=1 rS2=0
format=1101
pc=12 Op = MOVE R4 R2 literal 0000 RD=4 rS1=2 rS2=0
format=1100
pc=13 Op = STRI R3 R0 0 literal 0000 RD=3 rS1=0 rS2=0
format=1101
pc=14 Op = STRI R4 R1 0 literal 0000 RD=4 rS1=1 rS2=0
format=1101
pc=15 Op = INC R0 literal 0000 RD=0 rS1=0 rS2=0
format=1000
pc=16 Op = DEC R1 literal 0000 RD=1 rS1=0 rS2=0
format=1000
pc=17 Op = CMP R0 R1 literal 0000 RD=0 rS1=1 rS2=0
format=1100
pc=18 Op = BNE LOOP2 literal 000a RD=0 rS1=0 rS2=0
format=0001
pc=19 Op = NOP literal 0000 RD=0 rS1=0 rS2=0
format=0000
pc=20 Op = STOP literal 0000 RD=0 rS1=0 rS2=0
format=0000
pc=21 Op = END! literal 0000 RD=0 rS1=0 rS2=0
format=0000
The following provides the output of a run using this program. We’ve set
the trace level to 4 to show the source code (after text processing), the
symbol table, and the decoded instructions.
Then, we’ve executed the code line by line. In order to make the output
more readable and to fit it on the page, we’ve removed registers and
memory locations that don’t change, and we’ve highlighted values
(memory, registers, and z-flag) that change as the result of an instruction.
You can follow this through and see how memory/registers change with
each instruction.
In the next section, we demonstrate how you might go about testing the
operation of TC1. We cover the following:
Testing the assembler (e.g., the ability to use a free format of code)
This is not exactly stylish code; it’s just random test code. In the
following code, we provide the output of the assembler when operated in
debug mode. This includes the formatting of the code (removal of blank
lines and lowercase to uppercase conversion). The first listing provides
the instructions as an array of lists of tokens:
TC1 CPU simulator 11 September 2022
Input debug level 1 - 5: 4
Source assembly code listing
0 NOP
1 BRA EEE
2 INC R4
3 ALAN INC R5
4 EEE STOP
5 AA NOP
6 BB NOP 1
7 LDRL R0 12
8 LDRL R3 0X123
9 LDRL R7 0XFF
10 INC R2
11 BRA LAST
12 WWW STRI R1 R2 1
13 LAST LDRL R5 0XFAAF
14 BEQ AA
15 STOP 2
The second listing is the symbol table that ties symbol names and labels
to integer values:
Equate and branch table
START 0
TEST1 999
ABC 25
QWERTY 888
ALAN 3
EEE 4
AA 5
BB 6
WWW 12
LAST 13
LOOP1 18
LOOP2 26
The next listing was used largely for debugging when an instruction
didn’t behave as intended. It lets you determine whether an instruction
has been correctly decoded:
Decoded instructions
pc=00 Op=NOP literal 0000 Dest reg=0 rS1=0
rS2=0 format=0000
pc=01 Op=BRA EEE literal 0004 Dest reg=0 rS1=0
rS2=0 format=0001
pc=02 Op=INC R4 literal 0000 Dest reg=4 rS1=0
rS2=0 format=1000
pc=03 Op=ALAN INC R5 literal 0000 Dest reg=5 rS1=0
rS2=0 format=1000
pc=04 Op=EEE STOP literal 0000 Dest reg=0 rS1=0
rS2=0 format=0000
pc=05 Op=AA NOP literal 0000 Dest reg=0 rS1=0
rS2=0 format=0000
pc=06 Op=BB NOP 1 literal 0000 Dest reg=0 rS1=0
rS2=0 format=0000
pc=07 Op=LDRL R0 12 literal 000c Dest reg=0 rS1=0
rS2=0 format=1001
pc=08 Op=LDRL R3 0X123 literal 0123 Dest reg=3 rS1=0
rS2=0 format=1001
pc=09 Op=LDRL R7 0XFF literal 00ff Dest reg=7 rS1=0
rS2=0 format=1001
pc=10 Op=INC R2 literal 0000 Dest reg=2 rS1=0
rS2=0 format=1000
pc=11 Op=BRA LAST literal 000d Dest reg=0 rS1=0
rS2=0 format=0001
pc=12 Op=WWW STRI R1 R2 1 literal 0001 Dest reg=1 rS1=2
rS2=0 format=1101
pc=13 Op=LAST LDRL R5 0XFAAF literal faaf Dest reg=5 rS1=0
rS2=0 format=1001
pc=14 Op=BEQ AA literal 0005 Dest reg=0 rS1=0
rS2=0 format=0001
pc=15 Op=STOP 2 literal 0000 Dest reg=0 rS1=0
rS2=0 format=0000
>>>
Testing flow control operations
Here, we demonstrate how to test one of the computer’s most important classes
of operations: the flow-control instructions, that is, the branch and
subroutine call instructions that change the flow of control.
The following fragment of code is also meaningless (it serves only to test
instruction execution) and is designed only to test loops. One loop is
built using a branch on a not-zero operation, and the other uses an
automatic loop mechanism that operates by decrementing a register and
branching until the register decrements to zero. The decrement and
branch on not zero (DBNE) instruction has the format DBNE r0,loop,
where r0 is the counter being decremented and loop is the branch target
address.
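The behavior of DBNE can be sketched in a few lines of Python. This is a minimal illustration rather than TC1's own code: the dbne helper, the 16-bit wraparound, and the addresses 15 and 18 are assumptions chosen for the example.

```python
def dbne(r, rD, target, pc):
    """Decrement r[rD]; return target if the result is not zero, else the fall-through pc."""
    r[rD] = (r[rD] - 1) & 0xFFFF       # Assumption: registers wrap at 16 bits
    return target if r[rD] != 0 else pc

r = [0] * 8                            # Eight registers, r[0] to r[7]
r[3] = 4                               # Loop counter, as in LDRL R3 4
LOOP, NEXT = 15, 18                    # Hypothetical addresses of the loop start and exit
passes = 0
pc = LOOP
while pc == LOOP:
    passes += 1                        # The loop body would execute here
    pc = dbne(r, 3, LOOP, NEXT)        # DBNE R3 LOOP

print('passes =', passes, 'R3 =', r[3])
```

Loading the counter with 4 gives four passes through the loop body, because the decrement takes place before the test.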
The following provides the output after a debugging session. As you can
see, the sequence of branches is faithfully implemented. Note that we’ve
highlighted the branch actions and consequences (i.e., the next
instruction):
0 NOP PC= 0 z=0 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
1 BRA LAB1 PC= 1 z=0 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
3 LAB1 INC R2 PC= 3 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
4 NOP PC= 4 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
5 BRA LAB6 PC= 5 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
20 LAB6 BRA LAB2 PC=20 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
7 LAB2 LDRL R2 3 PC= 7 z=0 n=0 c=1
R 0000 0000 0003 0000 0000 0000
0000 0000
8 LAB4 DEC R2 PC= 8 z=0 n=0 c=1
R 0000 0000 0002 0000 0000 0000
0000 0000
9 NOP PC= 9 z=0 n=0 c=1
R 0000 0000 0002 0000 0000 0000
0000 0000
10 BNE LAB4 PC=10 z=0 n=0 c=1
R 0000 0000 0002 0000 0000 0000
0000 0000
8 LAB4 DEC R2 PC= 8 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
9 NOP PC= 9 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
10 BNE LAB4 PC=10 z=0 n=0 c=1
R 0000 0000 0001 0000 0000 0000
0000 0000
8 LAB4 DEC R2 PC= 8 z=1 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
9 NOP PC= 9 z=1 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
10 BNE LAB4 PC=10 z=1 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
11 NOP PC=11 z=1 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
12 BSR LAB7 PC=12 z=1 n=0 c=0
R 0000 0000 0000 0000 0000 0000
0000 0000
22 LAB7 DEC R7 PC=22 z=0 n=1 c=1
R 0000 0000 0000 0000 0000 0000
0000 ffff
23 DEC R7 PC=23 z=0 n=1 c=1
R 0000 0000 0000 0000 0000 0000
0000 fffe
24 RTS PC=24 z=0 n=1 c=1
R 0000 0000 0000 0000 0000 0000
0000 fffe
13 NOP PC=13 z=0 n=1 c=1
R 0000 0000 0000 0000 0000 0000
0000 fffe
14 LDRL R3 4 PC=14 z=0 n=1 c=1
R 0000 0000 0000 0004 0000 0000
0000 fffe
15 LAB5 NOP PC=15 z=0 n=1 c=1
R 0000 0000 0000 0004 0000 0000
0000 fffe
16 INC R7 PC=16 z=0 n=1 c=1
R 0000 0000 0000 0004 0000 0000
0000 ffff
17 DBNE R3 LAB5 PC=17 z=0 n=1 c=1
R 0000 0000 0000 0003 0000 0000
0000 ffff
15 LAB5 NOP PC=15 z=0 n=1 c=1
R 0000 0000 0000 0003 0000 0000
0000 ffff
16 INC R7 PC=16 z=1 n=0 c=0
R 0000 0000 0000 0003 0000 0000
0000 0000
17 DBNE R3 LAB5 PC=17 z=1 n=0 c=0
R 0000 0000 0000 0002 0000 0000
0000 0000
15 LAB5 NOP PC=15 z=1 n=0 c=0
R 0000 0000 0000 0002 0000 0000
0000 0000
16 INC R7 PC=16 z=0 n=0 c=1
R 0000 0000 0000 0002 0000 0000
0000 0001
17 DBNE R3 LAB5 PC=17 z=0 n=0 c=1
R 0000 0000 0000 0001 0000 0000
0000 0001
15 LAB5 NOP PC=15 z=0 n=0 c=1
R 0000 0000 0000 0001 0000 0000
0000 0001
16 INC R7 PC=16 z=0 n=0 c=1
R 0000 0000 0000 0001 0000 0000
0000 0002
17 DBNE R3 LAB5 PC=17 z=0 n=0 c=1
R 0000 0000 0000 0000 0000 0000
0000 0002
18 NOP PC=18 z=0 n=0 c=1
R 0000 0000 0000 0000 0000 0000
0000 0002
19 STOP PC=19 z=0 n=0 c=1
R 0000 0000 0000 0000 0000 0000
0000 0002
In the next chapter, we will look at some of the ways in which the TC1
program can be enhanced to add facilities such as error checking, the
inclusion of new instructions, and special features such as variable-
length operand fields.
Most real computers have two other shift variations: an arithmetic shift
that preserves the sign of two’s complement numbers when shifted right
(divide-by-2 operation) and a rotate-through-carry shift where the bit
shifted in at one end is the old carry bit and the bit shifted out becomes
the new carry bit. Essentially, if the register has m bits, the carry bit is
included to create an m+1 bit word. This feature is used for multi-
precision arithmetic. We haven’t included these modes in TC1.
As well as specifying the shift type, we have to specify the shift direction
(left or right). Most computers let you specify the number of shifts. We
provide both facilities and the number of shifts can be specified using
either a register or a literal. In a multi-bit shift, the carry bit holds
the last bit shifted out. The shift operations (with
examples) are as follows:
The following output from the simulator (edited to show only relevant
information) gives the registers and condition codes as the preceding
code is executed. The binary value of register r0 is displayed on the
right. This allows us to verify whether the operations have been executed
correctly by manual inspection:
1 LDRL R1 %1000000110000001 z = 0 n = 0 c = 0
Regs 0 - 3 0000 8181 0000 0000 R0 = 0000000000000000
2 LSLL R0 R1 1 z = 0 n = 0 c = 1
Regs 0 - 3 0302 8181 0000 0000 R0 = 0000001100000010
3 LSLL R0 R1 2 z = 0 n = 0 c = 0
Regs 0 - 3 0604 8181 0000 0000 R0 = 0000011000000100
4 LSRL R0 R1 1 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0000 0000 R0 = 0100000011000000
5 LSRL R0 R1 1 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0000 0000 R0 = 0100000011000000
6 LDRL R1 %1000000110000001 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0000 0000 R0 = 0100000011000000
7 LDRL R2 1 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0001 0000 R0 = 0100000011000000
8 LDRL R3 2 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0001 0002 R0 = 0100000011000000
9 LSL R0 R1 R2 z = 0 n = 0 c = 1
Regs 0 - 3 0302 8181 0001 0002 R0 = 0000001100000010
10 LSL R0 R1 R3 z = 0 n = 0 c = 0
Regs 0 - 3 0604 8181 0001 0002 R0 = 0000011000000100
11 LSR R0 R1 R2 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0001 0002 R0 = 0100000011000000
12 LSR R0 R1 R2 z = 0 n = 0 c = 1
Regs 0 - 3 40c0 8181 0001 0002 R0 = 0100000011000000
Note that a load operation does not affect the z-bit. Some computers
update the z-bit after almost every operation. Some update the z-bit on
demand (e.g., ARM, which we will introduce later), and some update it
only after certain operations.
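The logical shifts in the trace above can be checked with a short Python sketch. The lsl and lsr helpers are assumptions written for this illustration and are not taken from TC1. They shift one bit at a time so that the carry ends up holding the last bit shifted out, and they return a carry of 0 for a shift count of zero, which is a simplification.

```python
def lsl(value, n, width=16):
    """Logical shift left by n bits; return (result, carry)."""
    mask = (1 << width) - 1
    carry = 0
    for _ in range(n):
        carry = (value >> (width - 1)) & 1   # Bit about to leave the top of the register
        value = (value << 1) & mask          # Shift and keep the result within width bits
    return value, carry

def lsr(value, n, width=16):
    """Logical shift right by n bits; return (result, carry)."""
    carry = 0
    for _ in range(n):
        carry = value & 1                    # Bit about to leave the bottom of the register
        value = value >> 1                   # Zero enters at the top
    return value, carry

r1 = 0b1000000110000001                      # 0x8181, the test pattern in the trace
for op, n in ((lsl, 1), (lsl, 2), (lsr, 1)):
    result, c = op(r1, n)
    print(op.__name__, n, '%04x' % result, 'c =', c)
```

The three results, 0302 with c = 1, 0604 with c = 0, and 40c0 with c = 1, agree with the values of R0 and the carry bit in the simulator output.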
TC1 postscript
The version of TC1 presented here grew during the development of this
book. The current version has more features than the prototype; for
example, initially, it didn’t include symbolic branch addresses and
required users to enter actual line numbers.
The difference between the two simulators, TC1 and TC1mini, is that TC1’s
4-bit binary format code provides pre-decoding; that is, the simulator doesn’t
have to calculate what parameters the instruction requires because the
code directly tells you that. If you use a class number instead, as TC1mini
does, you have to decode the class number to determine the actual parameters required.
However, a class number can be used creatively. TC1mini uses seven
different instruction formats and requires a minimum of seven classes to
be defined. If you had, say, 14 classes, each addressing mode class could
be divided into two subclasses to give you greater control over the
instruction execution process.
There are two cases to consider: instructions with a label and those
without a label. In the former case, the mnemonic is the second token in
the instruction, and in the latter case, the mnemonic is the first token. We
can test whether a token is a mnemonic by using Python’s if … in construct:
if token[0] in codes
This returns True if the first token is a valid mnemonic. We can combine
the two tests with an or Boolean to get the preceding expression. In the
program, we call testLine with the tokens parameter and it returns an
error. We use the error to print a message and return it to the operating
system with the sys.exit() function.
General comments
The following line demonstrates how we extract the operation class from
the mnemonic. The expression looks strange because it combines () parentheses
and [] brackets. The codes.get(key) operation uses key to get the associated
value from the codes dictionary:
opClass = codes.get(mnemonic)[0]   # Use mnemonic to read opClass from the codes dictionary
In this case, the key is the mnemonic, and the value returned is a list that
holds the operation class; for example, if the mnemonic is ‘LDRL’, the
corresponding value is [3]. Note that the value returned is not 3! It is a
list with the single value 3. Consequently, we have to extract the value
from the list by specifying the first item, that is, codes.get(mnemonic)[0].
In this example, we use the list() function to combine items into a list,
and then we use append() to add this item to an existing list. Note the
syntax of list(). You might expect it to be list(a,b,c). No. It’s
list((a,b,c)). The list() function takes a single iterable argument, so the
items must first be grouped, here as the tuple (a,b,c), and that tuple is
passed as the one parameter to list().
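A short experiment makes the point about list() concrete. The values of a, b, and c and the prog list are hypothetical and chosen only for illustration.

```python
a, b, c = 'LDRL', 'r1,5', 3              # Hypothetical items to combine

prog = []                                # An existing list
prog.append(list((a, b, c)))             # list() takes ONE iterable: the tuple (a, b, c)
prog.append([a, b, c])                   # A list literal builds the same list

try:
    list(a, b, c)                        # Three separate arguments are not accepted
except TypeError as err:
    print('TypeError:', err)

print(prog)
```

Both append operations add the same three-item list, so the choice between list((a,b,c)) and [a,b,c] is one of style.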
This section deals with decoding instructions into the appropriate task in
order to correctly execute them with the appropriate parameters:
def classDecode(predicate):
    lit,rD,rS1,rS2 = '',0,0,0                          # Initialize variables
    if opClass in [1]:     lit = predicate
    if opClass in [2]:     rD  = reg1[predicate]
    if opClass in [3,4,5,6,7]:
        predicate = predicate.split(',')
        rD = reg1[predicate[0]]
    if opClass in [4,5,6]: rS1 = reg1[predicate[1]]    # Get source reg 1 for classes 4, 5, and 6
    if opClass in [3,5]:   lit = (predicate[-1])       # Get literal for classes 3 and 5
    if opClass in [6]:     rS2 = reg1[predicate[2]]    # Get source reg 2 for class 6
    if opClass in [7]:     rS1 = reg2[predicate[1]]    # Get source pointer reg for class 7
    return(lit,rD,rS1,rS2)
The following is the actual instruction execution loop. As you can see, it
is remarkably compact:
                                                  # Instruction execution
run = 1
z = 0
pc = 0
while run == 1:
    thisOp = prog[pc]
    if thisOp[2] in ['STOP', 'END']: run = 0      # Terminate on STOP or END
    pcOld = pc
    pc = pc + 1
    mnemonic  = thisOp[2]
    predicate = thisOp[3]
    opClass   = thisOp[4]
    lit,rD,rS1,rS2 = classDecode(predicate)
    lit = getLit(lit)
    if   mnemonic == 'NOP': pass
    elif mnemonic == 'BRA': pc = lit
    elif mnemonic == 'BEQ':
        if z == 1: pc = lit
    elif mnemonic == 'BNE':
        if z == 0: pc = lit
    elif mnemonic == 'INC': r[rD] = r[rD] + 1
    elif mnemonic == 'DEC':
        z = 0
        r[rD] = r[rD] - 1
        if r[rD] == 0: z = 1
    elif mnemonic == 'NOT': r[rD] = (~r[rD])&0xFFFF    # Logical NOT
    elif mnemonic == 'CMPL':
        z = 0
        diff = r[rD] - lit
        if diff == 0: z = 1
    elif mnemonic == 'LDRL': r[rD] = lit
    elif mnemonic == 'DBNE':
        r[rD] = r[rD] - 1
        if r[rD] != 0: pc = lit
    elif mnemonic == 'MOV': r[rD] = r[rS1]
    elif mnemonic == 'CMP':
        z = 0
        diff = r[rD] - r[rS1]
        if diff == 0: z = 1
    elif mnemonic == 'ADDL': r[rD] = r[rS1] + lit
    elif mnemonic == 'SUBL': r[rD] = r[rS1] - lit
    elif mnemonic == 'ADD':  r[rD] = r[rS1] + r[rS2]
    elif mnemonic == 'SUB':  r[rD] = r[rS1] - r[rS2]
    elif mnemonic == 'AND':  r[rD] = r[rS1] & r[rS2]
    elif mnemonic == 'OR':   r[rD] = r[rS1] | r[rS2]
    elif mnemonic == 'LDRI':
        testIndex()
        r[rD] = m[r[rS1]]
    elif mnemonic == 'STRI':
        testIndex()
        m[r[rS1]] = r[rD]
    regs = ' '.join('%04x' % b for b in r)        # Format the registers in hex
    mem  = ' '.join('%04x' % b for b in m)        # Format the memory locations in hex
    print('pc =','{:<3}'.format(pcOld),'{:<18}'.format(sFile[pcOld]),\
          'regs =',regs,'Mem =',mem,'z =',z)
Many of the instructions are executed in only one line of code; for
example, ADD is implemented by adding two registers together: r[rD] =
r[rS1] + r[rS2]. Some instructions, such as compare, require two
registers to be subtracted and then the status bits to be set accordingly.
The ability to avoid different mnemonics (e.g., ADD and ADDL) for the
same basic operation
Consider ADD. We can write ADD r1,r2,5 or ADD r1,r2,r3; that is, the
second number added to a register may be a literal or a register.
Consequently, ADD is in class 5 and class 6. To resolve the ambiguity, we
look at the final operand; if it’s a literal, then it’s class 5, and if it’s a
register, it’s class 6.
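One way to resolve the ambiguity is sketched below. The resolveClass helper is hypothetical, and it assumes that registers are always written as r0 to r7 (in either case) and that anything else in the final position is a literal. Symbolic register names are not handled.

```python
def resolveClass(predicate):
    """Return 6 if the final operand is a register r0 to r7, otherwise 5 (a literal)."""
    last = predicate.split(',')[-1].strip().lower()    # Final operand, normalized
    isRegister = len(last) == 2 and last[0] == 'r' and last[1] in '01234567'
    return 6 if isRegister else 5

for predicate in ('r1,r2,5', 'r1,r2,r3', 'r1,r2,$FF'):
    print('ADD', predicate, '-> class', resolveClass(predicate))
```

Once the class is known, the existing decoding for class 5 or class 6 can be applied without introducing a separate ADDL mnemonic.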
Summary
In this chapter, we presented the TC1 simulator, which can take a text
file in TC1 assembly language, convert it into machine code, and then
execute it. TC1’s instruction set architecture is close to the classic RISC
architecture with a register-to-register architecture (i.e., data operations
take place on the contents of registers). The only memory operations
permitted are loading a register from memory (or a literal) and storing a
register in memory.
The TC1 CPU simulator executes instructions one by one and prints the
contents of the registers, program counter, and status flags after each
instruction is executed. You can use this information to debug assembly-
level programs. Often, when you look at the data, you find that the
results are not what was expected; for example, you might want to
execute a loop 9 times but execute it 10 times because you made an error
in testing for the end of the loop.
We have three issues to deal with. The first is displaying the data. How
do we display the data and how do we format it? Should the contents of a
register be displayed as a decimal value, a binary string of 1s and 0s, or
as hexadecimal characters?
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter07.
Python lets you display a prompt before receiving the input; for example,
you can write the following:
x = input('Please enter your age')
Because the input is in character form, you must convert numeric values
into integer form before using them. It’s easy to perform conversions into
decimal, binary, or hexadecimal, as the following examples show. You
just add the number base as a second parameter to the int() function:
x = input('Enter the constant ')
y = int(x) # For a decimal constant
y = int(x,2) # For a binary constant
y = int(x,16) # For a hexadecimal constant
The following code performs this operation. The input uses a replace
function to convert all commas into spaces. We combine the replace
operation with the input operation to create compact code. The input is
followed by a split function to convert the string into tokens:
inst = input('Enter operation: >>').replace(',',' ')
p = inst.split(' ')
t1,t2,t3,t4 = p[0],int(p[1][1:]),int(p[2][1:]),int(p[3][1:],16)
Finally, we examine each of the four tokens in turn and extract the
parameter as an integer (t1, t2, t3, and t4). Consider t4. The p[3]
expression extracts the “$12FA” string. The second index, [1:], extracts
all characters after the first one to give “12FA”. This is still a character
string. The final operation, int(p[3][1:],16), converts the parameter
string in hexadecimal form into the integer 4858. The output produced by
a second example, ADD r3,r7,$1102, was ADD 3 7 4354.
As we’ve already seen, Python lets you put several equates on a line –
for example, a,b,c = p,q,r. This results in the following:
a = p
b = q
c = r
The output is next. Note that when we print t1 to t4, the numeric value
of the hexadecimal operand is given in its decimal form:
Enter operation: >>add r1,r2,$FACE
inst add r1 r2 $FACE
p ['add', 'r1', 'r2', '$FACE']
t1,t2,t3,t4 add 1 2 64206
The next section looks at how we format data such as numbers so that
they can be made much easier for the reader to understand; for example,
sometimes you might wish to represent the decimal 42 as 101010 and
sometimes as 00101010 or 002A.
Displaying data
We now look more deeply at the ways in which data can be displayed in
Python. When you are observing the execution of a program, you want to
see what has changed after each instruction has been executed. A
computer’s state is determined by the contents of its registers and
memory, its program counter, and its status bits.
How do we display data? Since data can represent anything you want it
to, the data in a register has no intrinsic meaning. By convention, CPU
simulators represent data in hexadecimal form. This is partially because
each 16-bit register holds 4 hexadecimal characters and that provides a
rather convenient way for humans to handle data (try remembering 16-
bit strings of 1s and 0s). Some simulators permit binary, hex, or decimal
displays, and others allow data to be displayed as characters (i.e., the
data is assumed to be ASCII-encoded).
This looks up the string item (instruction) at the pcOld address and prints
it. Since pc is modified during an instruction cycle, we print the old value
at the start of the current cycle.
Suppose we want to display the eight registers on the same line, each as
a six-character hexadecimal value. First, consider the following Python
code:
z = 0x4ace # The hex data to print
print("%06x" %z) # Printing it in 6 chars with leading zeros
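The same format specifier can be applied to every register and the results joined into one line. The following is a small sketch, with hypothetical register contents, that combines "%06x" with join():

```python
r = [0x4ace, 0, 0x12, 0xffff, 0, 0, 0x7, 0x100]    # Hypothetical contents of r[0] to r[7]

line = ' '.join('%06x' % value for value in r)     # Six hex characters per register
print('Regs', line)
```

Each register occupies a fixed six-character field, so the columns line up from one instruction to the next.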
Note how this has been printed on two lines due to ‘\n’ and the two
values tabbed by ‘\t’. You can control the size of the tab, as the
following example shows. The expandtabs() method sets the tab width
(number of spaces) to the parameter provided. In this case, we have
embedded the tab into a string and set the tab width to 6:
print('This is a\ttab\ttest'.expandtabs(6))
This is a   tab   test
\ Backslash
\r Enter (return)
\b Backspace
The only change to the format string is from “{0:b}” to “{0:16b}”. That
is, we have inserted the field width of 16 before b. The effect
of 16 is to define a width of 16 characters for the string. The string is padded with
spaces on the left. The output of this code is as follows:
p1 is 11010
p2 is 11111110001
This gives an output where the numbers are displayed in 16 bits and
padded with leading zeros, as follows:
p1 is 0000000000011010
p2 is 0000011111110001
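The three binary format strings discussed here can be compared side by side. In this sketch, the names p1 and p2 and their values are inferred from the outputs shown above and should be treated as assumptions.

```python
p1, p2 = 0b11010, 0b11111110001            # Values inferred from the printed outputs

for p, name in ((p1, 'p1'), (p2, 'p2')):
    print(name, 'is', '{0:b}'.format(p))       # Minimum width, no padding
    print(name, 'is', '{0:16b}'.format(p))     # Width 16, padded with spaces on the left
    print(name, 'is', '{0:016b}'.format(p))    # Width 16, padded with leading zeros
```

The only difference between the last two specifiers is the 0 placed before the field width, which changes the padding character from a space to a zero.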
This gives the following output. As you can see, it’s analogous to the
binary version:
1e
0000001e
fa7b
0000fa7b
We print the result twice. In the second case, prefixes are added to the
values to indicate the base. If the first number, xBin, is binary, we can
concatenate “0b” simply by using a “+” symbol to add 0b immediately
before the binary string. The output from this code is as follows:
x is 0000010000001101 y is 0145
x is 0b0000010000001101 y is 0x0145
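Output of this kind can be produced as follows. This is a sketch under stated assumptions: the values of x and y are inferred from the output above, xBin is the name used in the text, and yHex is a name introduced here for symmetry.

```python
x, y = 1037, 325                           # Values inferred from the output shown

xBin = '{0:016b}'.format(x)                # 16-bit binary string with leading zeros
yHex = '{0:04x}'.format(y)                 # 4-character hex string with leading zeros

print('x is', xBin, 'y is', yHex)                  # Without prefixes
print('x is', '0b' + xBin, 'y is', '0x' + yHex)    # Prefixes added by concatenation
```

Python can also add the prefix itself with the # option, for example '{:#06x}'.format(y), where the width of 6 includes the two prefix characters.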
Suppose you were printing a table of the powers of integers in the form
x, x², x³, x⁴. We could write the following:
for x in range (1,7):
    print("Table of powers {0:2d}{1:3d}{2:4d}{3:6d}".format(x,x**2,x**3,x**4))
Table of powers 2 4 8 16
Table of powers 3 9 27 81
The following code prints the decimal integer 123 first in decimal form,
using the three modifiers, and then in binary form using the same three
modifiers. In each case, we have specified a width of 10 characters:
x = 123
print('{:<10d}'.format(x))
print('{:>10d}'.format(x))
print('{:^10d}'.format(x))
print('{:<10b}'.format(x))
print('{:>10b}'.format(x))
print('{:^10b}'.format(x))
123 Right-justified
123 Centered
1111011 Left-justified
1111011 Right-justified
1111011 Centered
We are now going to provide three examples of how strings representing
numbers can be printed. The first demonstrates the formatting of
individual numbers in integer, hexadecimal, binary, and real forms. The
second example shows how we can take a list of registers, join them as a
single string, and print their values. This is very useful in displaying data
when stepping through instructions during a simulation. The third
example demonstrates the successive steps in processing a hexadecimal
value into the desired format.
Validating data
Since the TC1 assembler doesn’t perform error-checking on the input, if
you make an error, it’s likely that the program will crash, leaving you to
do your own debugging. Good software performs error-checking, which
ranges from the simple detection of invalid instructions to the exact
pinpointing of all errors.
Here, we demonstrate how you can read a line of code and check for
several types of common errors – for example, invalid opcodes, invalid
instruction formats (too many or too few operands), typos (typing T6
instead of R6), and registers out of range (entering R9).
The purpose of this section is to show how you can add your own
modifications to TC1. A formal way of dealing with the problem would
be to construct a grammar for the assembly language and then build a
parser to determine whether the input conforms to that grammar. We are
going to take a simpler and more ad hoc approach.
The first test to perform is on the validity of the instruction. Assume that
all mnemonics have been defined in a dictionary called codes. All
we have to do is to look the mnemonic up in the codes dictionary using the following:
if jj not in codes: error = 1
This expression sets the error variable to 1
if the mnemonic, jj, is not in the dictionary. Then, we can test error and
take whatever action is necessary.
The next step is to use the name of the instruction to look up its details,
and then check whether that instruction requires parameters. Remember
that our dictionary entries have a two-component tuple, with the first
component being the instruction’s format (i.e., the number of operands
required) and the second being the actual operation code:
form = codes.get(y[0]) # Read the 4-bit format code
It looks up the instruction (i.e., y[0]) in the dictionary and returns its
value, which is a tuple, such as (8,12). The first element of the tuple,
form[0], describes the instruction’s operands and the second is the
opcode (which is not of interest here). The parameters required by the
instruction are determined by form[0]. Consider the following code:
opType = form[0]                                        # Get operand info
if   opType == 0:                   totalOperands = 1   # Just a mnemonic
elif opType == 8 or opType == 1:    totalOperands = 2   # Mnemonic + 1 operand
elif opType == 12 or opType == 9:   totalOperands = 3   # Mnemonic + 2 operands
elif opType == 14 or opType == 13:  totalOperands = 4   # Mnemonic + 3 operands
elif opType == 15:                  totalOperands = 5   # Mnemonic + 4 (not used)
The four bits of the format code represent rD, rS1, rS2, and a literal. TC1
instructions have several valid formats; for example, if opType = 0b1001,
the instruction takes a destination register and a literal (as LDRL does).
The preceding code uses an if…elif chain to get the length (the number of
tokens including the mnemonic) of each instruction. All we then have to do
is to count the number of tokens in the current instruction and see
whether it’s the same as the expected value (i.e., the total length). The
following code performs this check:
totalTokens = len(y)               # Get the number of tokens in this instruction y
if totalTokens < totalOperands:    # Are there enough tokens?
    error = 2                      # Error 2: Too few operands
    continue
if totalTokens > totalOperands:    # Are there too many tokens?
    error = 3                      # Error 3: Too many operands
    continue
We use the format information to test for each operand in turn. Here, we
just deal with the first operand, rD (the destination register):
if opType & 0b1000 == 0b1000:       # If the destination register bit is set
    rDname = y[1]                   # Get the register name (second item in string)
    error,q = syntaxTest(rDname)    # Call syntax test to look for errors
The first line of this code tests whether the leftmost bit of format is 1 or
0 by ANDing the format code with 0b1000 and testing for 0b1000. If the
result is true, then we need to check for the first register operand, which
is the second token – that is, y[1].
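The same masking idea extends to all four bits of the format code. The following sketch (FIELD_BITS and requiredOperands are hypothetical names introduced here, not part of TC1) lists the operands that a given format code calls for, assuming the bit order rD, rS1, rS2, literal described earlier:

```python
# Bit positions follow the text: rD, rS1, rS2, literal (left to right)
FIELD_BITS = (('rD', 0b1000), ('rS1', 0b0100), ('rS2', 0b0010), ('literal', 0b0001))

def requiredOperands(opType):
    """Return the names of the operands that a 4-bit format code requires."""
    return [name for name, mask in FIELD_BITS if opType & mask]

for opType in (0b0000, 0b1000, 0b1001, 0b1110):
    needed = requiredOperands(opType)
    print(format(opType, '04b'), needed, 'tokens expected:', 1 + len(needed))
```

Counting the set bits in this way gives the expected number of tokens directly (one for the mnemonic plus one per operand), which is an alternative to listing every format code in an if…elif chain.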
The syntaxTest function performs three tests, one for each type of error
that we are looking for. The first test checks whether the first character
of the token is ‘R’. If it is not ‘R’, a return is made with the error
code 4, and the dummy or default register number is set to 0. The second
test looks for a numeric value for the register (the characters following
the ‘R’, which is token[1:]). The third test checks whether the number is
greater than 7 and returns an error code if it is. Finally, when the last
line is reached, a return is made with the error code 0 and the
appropriate register number. Note that we don’t need to use an elif
because if an if yields True, the function is exited via return.
The code for an error-testing routine is given next. This is not intended to
be a complete program, but a demonstration of the way in which you can
extend a program to include error-testing on the input data:
# Testing Python parsing # 22 Aug 2020 Version of 29 July 2021
import sys                                     # System library used to exit program
codes = {'NOP':(0,0), 'STOP':(0,1), 'BEQ':(1,4), 'INC':(8,2), \
         'MOVE':(12,23), 'LDRL':(9,13), 'ADD':(14,12), 'ADDL':(13,12)}
def syntaxTest(token):                         # Test the format of a register operand for validity (R0 to R7)
    if token[0] != 'R': return(4,0)            # Fail on missing initial R. Return error 4
    if not token[1:].isnumeric(): return(5,0)  # Fail on missing register number. Return error 5
    if int(token[1:]) > 7: return(6,0)         # Fail on register number not in range 0-7. Return error 6
    return(0,int(token[1:]))                   # Success return with error code 0 and register number
def printError(error):
    if error != 0:
        if error == 1: print("Error 1: Non-valid operation")
        if error == 2: print("Error 2: Too few operands")
        if error == 3: print("Error 3: Too many operands")
        if error == 4: print("Error 4: Register operand error - no 'R'")
        if error == 5: print("Error 5: Register operand error - no valid num")
        if error == 6: print("Error 6: Register operand error - not in range")
run = 1
error = 0
while run == 1:
    if error != 0: printError(error)           # if error not zero, print message
    error = 0                                  # Clear the error code before the next instruction
    x = input("\nEnter instruction >> ")       # Type an instruction (for testing)
    x = x.upper()                              # Convert lowercase into uppercase
    x = x.replace(',',' ')                     # Replace comma with space to allow add r1,r2 or add r1 r2
    y = x.split(' ')                           # Split into tokens. y is the tokenized instruction
    if len(y) > 0:                             # z is the predicate (or null if no operands)
        z = y[1:]
    else: z = ''
    print("Inst =",y, 'First token',y[0])
    if y[0] not in codes:                      # Check for valid opcode
        error = 1                              # Error 1: instruction not valid
        print("Illegal instruction", y[0])
        continue
    form = codes.get(y[0])                     # Get the code's format information
    print('Format', form)
    if form[1] == 1:                           # Detect STOP, opcode value 1, and terminate
        print("\nProgram terminated on STOP")  # Say "Goodbye"
        sys.exit()                             # Call OS function to leave
    opType = form[0]
    if   opType == 0:                               totalOperands = 1
    elif opType == 8 or opType == 4 or opType == 1: totalOperands = 2
    elif opType == 12 or opType == 9:               totalOperands = 3
    elif opType == 14 or opType == 13:              totalOperands = 4
    totalTokens = len(y)                       # Compare tokens we have with those we need
    if totalTokens < totalOperands:
        error = 2                              # Error 2: Too few operands
        continue
    if totalTokens > totalOperands:
        error = 3                              # Error 3: Too many operands
        continue
    if opType & 0b1000 == 0b1000:
        rDname = y[1]
        error,q = syntaxTest(rDname)
        if error != 0: continue
    if opType & 0b0100 == 0b0100:
        rS1name = y[2]
        error,q = syntaxTest(rS1name)
        if error != 0: continue
    if opType & 0b0010 == 0b0010:
        rS2name = y[3]
        error,q = syntaxTest(rS2name)
        if error != 0: continue
    if opType & 0b0001 == 0b0001:
        if not y[-1].isnumeric():
            error = 7
            print("Error 7: Literal error")
    if error == 0:
        print("Instruction", x, "Total operands", totalOperands,"Predicate", z)
The function returns two values: an error code and the number of the
register. If an error is detected, the register value of 0 is returned as a
default. The function is given here:
def regTest(tokNam,token):              # Test format of a register operand for validity (R0 to R7)
    if token in regSet:                 # Is it in the register set?
        return (0,regSet.get(token))    # If it's there, return 0 and token value
    else:                               # If not there, return error code 4 and the token's name
        print("Error in register ",tokNam)
        return (4,0)
The code that follows reads the tuple with the four data elements
associated with the mnemonic and extracts the individual parameters.
The three lines beginning with “if opCode == 1:” test the opcode to
determine whether the instruction was “STOP”. If it was STOP, the
sys.exit() operation terminates the program. Note that we have to use
import sys at the start of the program to import the library of system
functions:
mnemonic = y[0]                    # Get the mnemonic
if mnemonic not in codes:          # Check for a valid opcode
    error = 1                      # If none found, set error code
    continue                       # and jump to the end of the loop
opData = codes.get(mnemonic)       # Read codes to get the data for this instruction
opForm = opData[0]                 # Get each of this instruction's parameters
opStyle = opData[1]
opCode = opData[2]
opLen = opData[3]
if opCode == 1:                    # If the op_Code is 1, then it's "STOP", so exit the program
    print("\nProgram terminated on STOP")
    sys.exit()
totalTokens = len(y)               # How many tokens do we have?
if totalTokens < opLen:            # Compare with the expected number
    error = 2                      # Error 2: Too few operands
    continue
if totalTokens > opLen:
    error = 3                      # Error 3: Too many operands
    continue
The final two blocks in the preceding code fragment perform
error-detecting operations. Both compare the number of tokens in the
instruction with opLen, the expected length for this instruction. In the
first case, an error of 2 indicates too few tokens, and in the second
case, an error of 3 indicates too many tokens.
At this stage, we have determined that the instruction is valid and has the
correct number of operands. The next stage is to check the operands. The
check is performed according to the style of the instruction. There are
eight styles (0 to 7). Style 0 needs no further checking because there is
no operand (e.g., for NOP). We will just look at the checking for style 6,
which corresponds to instructions with a mnemonic, rD, rS1, and a literal
such as ADDL R1,R2,25.
We call the regTest function first with the ‘rD’ parameter to tell it we are
testing for the destination register and the predicate[0] token, which is
the first parameter. This returns an error flag and the value of the register.
Because we perform two tests (register rD and rS1), we must use two
error names: e1 for the first and e2 for the second test. If we used error
as the variable in both cases, a non-error second result would clear the
first error. The line if (e1 != 0) or (e2 != 0): error = 4 returns
error with the appropriate error status independent of which register was
in error. continue at the end of this block skips further error checking for
this instruction:
# Input error checking - using dictionaries Modified 30 July 2021
# Instruction dictionary 'mnemonic':(format, style, op_code, length)
# Style definition and example of the instruction format
# 0 NOP              mnemonic only
# 1 BEQ L            mnemonic + literal
# 2 INC R1           mnemonic + rD
# 3 MOVE R1,R2       mnemonic + rD1 + rS1
# 4 LDRL R1,L        mnemonic + rD1 + literal
# 5 ADD R1 R2 R3     mnemonic + rD + rS1 + rS2
# 6 ADDL R1 R2 L     mnemonic + rD + rS1 + literal
# 7 LDRI R1 (R2 L)   mnemonic + rD + rS1 + literal (same as 6)
import sys                            # System library used to exit program
                                      # Dictionary of instructions (format, style, op_code, length)
codes = {'NOP': (0b0000,0,0,1),'STOP':(0b0000,0,1,1),'BEQ': (0b0001,1,2,2), \
         'INC': (0b1000,2,3,2),'MOVE':(0b1100,3,4,3),'LDRL':(0b1001,4,6,3), \
         'LDRI':(0b1101,7,7,4),'ADD': (0b1110,5,8,4),'ADDL':(0b1101,6,9,4)}
regSet = {'R0':0,'R1':1,'R2':2,'R3':3,'R4':4,'R5':5,'R6':6,'R7':7}   # Registers
def regTest(token):                   # Test register operand for R0 to R7
    if token in regSet: return (0)    # Return with error 0 if legal name
    else: return (4)                  # Return with error 4
def litCheck(n):                      # Test a literal (this line and the next are assumed; they are missing from the listing)
    if n.isnumeric(): error = 0       # Decimal number OK
    elif n[0] == '-': error = 0       # Negative number OK
    elif n[0] == '%': error = 0       # Binary number OK
    elif n[0:2] == '0X': error = 0    # Hex number OK
    else: error = 6                   # Anything else is an error
    return(error)                     # Return with error number
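The literal test above inspects only the leading characters of the token, so a string such as %102 would be accepted. A stricter check, sketched here as one possible extension (the literalValue name is hypothetical, and the prefixes % and 0X are taken from the listing), converts the literal and reports error 6 if the conversion fails:

```python
def literalValue(n):
    """Return (error, value); error is 6 if n is not a legal literal."""
    try:
        if n.startswith('%'):                 # Binary, for example %1011
            return (0, int(n[1:], 2))
        if n.upper().startswith('0X'):        # Hexadecimal, for example 0X1F
            return (0, int(n[2:], 16))
        return (0, int(n, 10))                # Decimal, optionally negative
    except ValueError:                        # Conversion failed: not a number
        return (6, 0)

for text in ('25', '-4', '%1011', '0X1F', '12A', '%102'):
    print(text, literalValue(text))
```

Returning the converted value alongside the error code mirrors the (error, register number) pair that the register test returns.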
This is the main loop. An instruction is input and then checked for errors.
As in earlier examples, the instruction is processed for validity first and
the mnemonic is checked to see whether it is in codes:
error = 0
while True:                                    # Infinite loop
    if error != 0: printError(error)
    error = 0
    x = input(">> ").upper()                   # Read instruction and provide limited processing
    if len(x) == 0: continue                   # Ignore empty lines and continue
    x = x.replace(',',' ')                     # remove commas
    x = x.replace('(','')                      # remove (
    x = x.replace(')','')                      # remove )
    y = x.split(' ')                           # Create list of tokens (mnemonic + predicate)
    mnemonic = y[0]                            # Get the mnemonic (first token)
    if mnemonic not in codes:                  # Check for validity
        error = 1                              # If not valid, set error code and drop out
        continue
    opData = codes.get(mnemonic)               # Read the four parameters for this instruction
    opForm = opData[0]                         # opcode format (rD,rS1,rS2,L)
    opStyle = opData[1]                        # Instruction style (0 to 7)
    opCode = opData[2]                         # Numeric opcode
    opLen = opData[3]                          # Length (total mnemonic + operands in range 1 to 4)
    if opLen > 1: predicate = y[1:]            # Get predicate if there is one
    else: predicate = ''                       # If single token, return null
    print("Mnemonic =",mnemonic, "Predicate", predicate, \
          "Format =", bin(opForm),"Style =",opStyle,"Code =",opCode, \
          "Length =",opLen)
    if opCode == 1:                            # Used to terminate this program
        print("\nProgram ends on STOP")
        sys.exit()
    totalTokens = len(y)
    if totalTokens < opLen:
        error = 2                              # Error 2: Too few operands
        continue
    if totalTokens > opLen:
        error = 3                              # Error 3: Too many operands
        continue
    if opStyle == 0:                           # e.g., NOP or STOP so nothing else to do
        continue
    elif opStyle == 1:                         # e.g., BEQ 5 just check for literal
        literal = predicate[0]
        error = litCheck(literal)
        continue
    elif opStyle == 2:                         # e.g., INC r6 check for single register
        error = regTest(predicate[0])
        continue
    elif opStyle == 3:                         # e.g., MOVE r1,r2 check for two registers
        e1 = regTest(predicate[0])
        e2 = regTest(predicate[1])
        if e1 != 0 or e2 != 0:
            error = 4
        continue
    elif opStyle == 4:                         # e.g., LDRL r1,12 Check register then literal
        error = regTest(predicate[0])
        if error != 0: continue
        literal = predicate[1]
        error = litCheck(literal)
        continue
    elif opStyle == 5:                         # e.g., ADD r1,r2,r3 Check for three register names
        e1 = regTest(predicate[0])
        e2 = regTest(predicate[1])
        e3 = regTest(predicate[2])
        if e1 != 0 or e2 != 0 or e3 != 0:
            error = 4
        continue
    elif opStyle == 6:                         # e.g., ADDL R1,R2,4 Check for two registers and literal
        e1 = regTest(predicate[0])
        e2 = regTest(predicate[1])
        literal = predicate[2]
        e3 = litCheck(literal)
        if e1 != 0 or e2 != 0:
            error = 4
        if e1==0 and e2==0 and e3 != 0:        # If registers are OK but not literal
            error = 6                          # report literal error
        continue
    elif opStyle == 7:                         # e.g., LDRI r4,r0,23 or LDRI r4,(r0,23)
        e1 = regTest(predicate[0])
        e2 = regTest(predicate[1])
        literal = predicate[2]
        e3 = litCheck(literal)
        if e1 != 0 or e2 != 0:
            error = 4
        if e1==0 and e2==0 and e3 != 0:        # If registers are OK but not literal
            error = 6                          # report literal error
        continue
When you have completed this section, you will be able to construct
your own instruction tracing facilities.
Suppose we create a variable, trace, and then, at the end of the execute
loop, print the appropriate data if trace is 1 and jump to the next
instruction without printing data if trace is 0.
The CPU state is printed after each instruction only if trace = 1. How
do we turn trace on and off? Turning trace off is easy; all you need do
is read the keyboard input when single-stepping, and turn trace off if a
particular character or string is entered. However, once trace is 0, we’ve
lost control, and instructions are executed until the program is
terminated.
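One possible remedy, sketched here under stated assumptions, is a table of program-counter values that switch tracing back on when execution reaches them. The toy program, the breakPoints set, and the shown list are all hypothetical and stand in for the real simulator's state:

```python
# A toy execute loop: the "program" is just a list of mnemonics (illustrative only)
program = ['LDRL', 'INC', 'ADD', 'INC', 'STOP']
breakPoints = {3}              # Hypothetical table of PC values that re-enable tracing
trace = 0                      # Start with tracing off
shown = []                     # Records what a real simulator would print

pc = 0
while True:
    inst = program[pc]
    if pc in breakPoints:      # A breakpoint hands control back to the user
        trace = 1
    if trace == 1:             # Display the state only when tracing
        shown.append((pc, inst))
        print('pc =', pc, 'instruction =', inst)
    if inst == 'STOP':
        break
    pc = pc + 1
```

Here, the first three instructions execute silently, and the display resumes at address 3. In an interactive simulator, the point at which tracing resumes is also the natural place to read the keyboard again.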
Similarly, you can enter an instruction that will be loaded into the
traceCodes table. This behaves exactly like the PC breakpoint. When an
instruction that’s in the traceCodes table is encountered, the machine
status is displayed. Thus, the simulator provides four modes:
The first step is to choose a mnemonic and a unique opcode and insert
them into the table of codes. We’ve arranged the instruction set to leave
some opcodes unallocated (e.g., those beginning with 11). The second step
is to write the code to interpret the new instruction.
We will call the instruction ORD (order numbers) and write it as ORD r0.
The binary code 1110000 rrr 00…0 (where rrr is the 3-bit register
field) is assigned to this instruction, and 'ORD':(8,112) is entered in the
Python dictionary of instructions. The opcode is 112 and the parameter
allocation code in binary is 1000 (i.e., 8), because the only parameter
required is rD.
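To make the bit pattern concrete, the following sketch builds the instruction word for ORD. It assumes a 32-bit word with the 7-bit opcode in the most significant bits followed immediately by the 3-bit rD field, as the pattern 1110000 rrr 00…0 suggests. The encodeORD function is a hypothetical helper and is not part of TC1:

```python
def encodeORD(rD):
    """Build the 32-bit ORD instruction word for register rD (0 to 7)."""
    opCode = 0b1110000                    # 112, the opcode chosen in the text
    return (opCode << 25) | (rD << 22)    # Opcode in bits 31-25, rD in bits 24-22

word = encodeORD(0)
print(format(word, '032b'))               # Binary image of ORD r0
print(format(encodeORD(5), '032b'))       # Binary image of ORD r5
```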
How do we reverse bits? Consider the four bits 1101 and assume they are
in T1 (see Figure 7.1). Suppose we shift T1 one place left so that the bit
that leaves the left-hand end of T1 goes into the left-hand end of T2 as
T2 is shifted one place to the right. We repeat that operation four
times. Figure 7.1 shows what we get:
Figure 7.1 – Shifting one register’s output into a second register’s input to reverse a
string
We have reversed the order of the bits. If the register to be shifted is op1,
then we can write the Python code as follows. This code is in the form of
a function that can be called from the instruction interpreter:
def reverseBits(op1):                   # Reverse the bits of register op1
    reversed = 0                        # The reversed value is initialized
    toShift = r[op1]                    # Read the register contents
    for i in range(0,16):               # Repeat for all 16 bits
        bitOut = toShift & 0x8000       # Get msb of word to reverse
        toShift = toShift << 1          # Shift source word one place left
        reversed = reversed >> 1        # Shift result one place right
        reversed = reversed | bitOut    # Use OR to insert bit in msb of result
    return(reversed)
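The function above depends on the simulator's register array, r. To experiment with the algorithm on its own, a self-contained variant can take the value directly. This is a sketch with a hypothetical name, reverse16, that also masks the source word to 16 bits after each shift:

```python
def reverse16(value):
    """Return value with its 16 bits in reverse order."""
    result = 0
    for i in range(16):
        bitOut = value & 0x8000          # Isolate the msb of the source word
        value = (value << 1) & 0xFFFF    # Shift source left, keep it to 16 bits
        result = (result >> 1) | bitOut  # Shift result right, insert bit at msb
    return result

print(format(reverse16(0b1101), '016b'))
```

Applying the function twice returns the original value, which is a convenient self-check when testing a new instruction.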
We can now change the code of TC1 to incorporate this. There are three
steps:
3. Step 3: Insert the reverseBits function into the Python code. This
instruction replaces the data in the rD register with the bits reversed.
Note that we have changed the instruction format for minimal changes to
the code (in this case, it’s just the change of source register from op0 to
op1).
String 1 has an odd number of characters and 4 is the center. String 2 has
an even number of characters, and 4 and 5 are on either side of the
middle.
Suppose we are stepping through a string using two pointers, one at each
end. As we step in from both sides, one pointer goes up and the other
goes down. When we get to the middle, either the pointers are the same
(odd length) or the pointers differ by one (even length).
It would be nice to have a compare operation that compares two values
and returns equality if either they are the same or if the second one
differs from the first by +1. The new instruction, CMPT (compare
together), does this. For example, CMPT r4,r6 sets the z bit to 1 if the
contents of r4 and r6 are the same, or if the contents of r4 are one less
than the contents of r6. The code to do this is as follows:
if mnemonic == "CMPT":
    z = 0
    if (r[rD] == r[rS1]) or (r[rD] + 1 == r[rS1]): z = 1
As you can see, this performs two tests on the pointers, one for equality
and one for higher by 1, and combines the test results using a Boolean or
operator; that is, if the pointers are x and y, then the test is true if x = y
is true or if x + 1 = y is true.
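The following sketch shows how such a test can end a two-pointer scan of a string. The cmpt and isPalindrome functions are illustrative stand-ins written in plain Python; they model the comparison described above rather than the TC1 instruction itself:

```python
def cmpt(x, y):
    """Return z = 1 if x == y or x + 1 == y, otherwise 0 (the CMPT test)."""
    return 1 if (x == y) or (x + 1 == y) else 0

def isPalindrome(s):
    """Step two pointers inward until CMPT reports that they have met."""
    low, high = 0, len(s) - 1
    while low < len(s) and high >= 0:
        if s[low] != s[high]:      # Mismatch: not a palindrome
            return False
        if cmpt(low, high) == 1:   # Pointers equal (odd) or adjacent (even)
            return True
        low, high = low + 1, high - 1
    return True                    # Only reached for an empty string

for text in ('abcba', 'abba', 'abca'):
    print(text, isPalindrome(text))
```

A single comparison handles both the odd-length and even-length cases, so the loop needs only one termination test.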
Variable-length instructions
This short section provides ideas for experimentation with instructions
and their formats and extends your understanding of instructions, their
structure, and the trade-off involved in creating instruction sets. It is not
designed to illustrate a real computer.
Like many computers, TC1 has fixed-length fields in its opcode; that is,
the number of bits dedicated to each field is fixed and does not vary from
instruction to instruction. There are always 16 bits in the literal field,
even if the current instruction does not require a literal. Wasteful indeed.
Since the purpose of TC1 is experimentation, we demonstrate how you
might make the number of registers variable (i.e., user-definable).
Adding more registers speeds up computation by requiring fewer
memory accesses. However, there is a price; where do you get the extra
bits that would be needed to specify the registers? Do you take the extra
register bits from the opcode field (reducing the number of different
instructions), or do you take them from the literal field (reducing the
maximum size of a literal that can be loaded in a single instruction)? Or
do you implement multiple banks of registers and switch in a new set of
registers (called windowing) as a temporary measure?
Figure 7.2 provides the output of a single run of this program. The inputs
are in bold. You can see that the register fields have been selected as 3, 3,
and 5 bits wide. The instruction is ADD R7,R2,R31 (note that the only data
extracted is 7, 2, and 31, as we are not interested in the actual
instruction):
The final binary instruction is given with each of its fields in a different
style for clarity. You can see that the register fields have been placed in
the correct positions in the instruction and the remaining bits (the literal
field) are padded with zeros.
Running this code with some sample values gives the following output
(Figure 7.3). As you can see, the register fields have been inserted into the
opcode:
Enter three width for: rD,rS1,rS2 (e.g., 2,2,3) >> 3,4,5
Enter register operands for: rD,rS1,rS2 (e.g.,R1,R3,R2)>>
R4,R6,R30
Register widths: rD = 4 rS1 = 6 rS2 = 30
opCode 0b1111110 100 0110 11110 0000000000000
Figure 7.3 – Demonstration of variable-length operand fields
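The packing step can be sketched as follows. The packFields function and its parameters are hypothetical, and the 32-bit word and 7-bit opcode are assumptions read off the output in Figure 7.3, not the book's own code:

```python
def packFields(opCode, widths, regs, wordLen=32, opBits=7):
    """Return the instruction as a string of space-separated binary fields."""
    fields = [format(opCode, '0{}b'.format(opBits))]
    for width, reg in zip(widths, regs):
        if reg >= 2 ** width:                      # Register number must fit its field
            raise ValueError('register does not fit in field')
        fields.append(format(reg, '0{}b'.format(width)))
    used = opBits + sum(widths)
    fields.append('0' * (wordLen - used))          # Pad the literal field with zeros
    return ' '.join(fields)

print(packFields(0b1111110, (3, 4, 5), (4, 6, 30)))
```

The range check makes the trade-off visible: widening a register field admits more registers but leaves fewer bits for the literal.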
The number of registers used by this machine is … none! For the sake of
simplicity, and fun, we decided to make all instructions memory-based.
Consequently, we need two counters: one that counts the instructions and
one that counts the bytes. For example, the instruction sequence in Table
7.1 demonstrates the instruction address (sequential) and the memory
address of the first byte of an instruction. Here, instructions vary from 1
byte (stop) to 4 bytes (add):
Code            Instruction address   Memory address
ld 28,7         0                     0
ld 27,2         1                     3
ld 26,1         2                     6
add 28,28,26    3                     9
dec 26          4                     13
bne 3           5                     15
stop            6                     17
Here, we have used simple numeric addresses. Some addresses are literal
bytes; for example, ld 28,7 means load memory location 28 with the
number 7.
Since you have to give a byte branch address, you not only have to count
the number of instructions branched but also the number of bytes
branched. To do this, we create a mapping table that maps the instruction
address to the byte address. This table is called map[]:
print ('Demonstrating multiple length instructions. Version 3 December 8 2022 \n')
mem = [0] * 128
The lookUp{} dictionary describes each instruction with a binary key and
a value consisting of a mnemonic. The allOps{} dictionary consists of a
key (the mnemonic) and a tuple containing the instruction length and
opcode:
lookUp = {0b00001:'nop',0b00010:'stop',0b01000:'inc',0b01001:'dec', \
          0b01010:'bra',0b01011:'beq',0b01100:'bne',0b10000:'mov', \
          0b10001:'cmpl',0b10010:'cmp',0b10011:'ld',0b10100:'st', \
          0b11000:'add',0b11001:'sub'}
allOps = {'nop':(1,1),'stop':(1,2),'inc':(2,8),'dec':(2,9),'bra':(2,10), \
          'beq':(2,11),'bne':(2,12),'mov':(3,16),'ld':(3,19), \
          'cmpl':(3,17),'cmp':(3,18),'add':(4,24),'sub':(4,25),'test':(0,0)}
# NOTE that progS is the actual program to be executed. It is embedded into the program
progS = ['this: equ 26','ld this:,7','that: equ 28','ld 27,2', \
         'ld that:,1','loop: add 28,28,26', 'dec 26','bne loop:','stop']
symTab = {}                                          # Label symbol table
prog = []                                            # prog is progS without equates
for i in range (0,len(progS)):                       # Process source code for equates
    thisLine = progS[i].split()                      # Split source code on spaces
    if len(thisLine) > 1 and thisLine[1] == 'equ':   # Is this line an equate?
        symTab.update({thisLine[0][0:]:thisLine[2]}) # Store label in symbol table.
    else: prog.append(progS[i])                      # Append line to prog unless it's an equate
The next step after removing equates is to clean up the source code and
deal with labels:
for i in range (0,len(prog)):              # Process source code (now without equates)
    prog[i] = prog[i].replace(',',' ')     # Remove commas
    prog[i] = prog[i].split(' ')           # Tokenize
    token1 = prog[i][0]                    # Get first token of instruction
    if token1[-1] == ':':                  # If it ends in :, it's a label
        j = str(i)                         # Note: we have to store i as a string not an integer
        symTab.update({token1:j})          # Add label and instruction number to symbol table
        prog[i].pop(0)                     # Remove label from this line. NOTE "pop"
print('Symbol table: ', symTab)
map = [0] * 64                             # Map instruction number to byte address
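The mapping from instruction number to byte address is a running sum of instruction lengths. The following sketch reproduces the addresses of Table 7.1. The lengths dictionary and the byteMap name are simplifications introduced here, with lengths copied from the allOps entries:

```python
# Instruction lengths in bytes, as given by the first element of each allOps tuple
lengths = {'ld': 3, 'add': 4, 'dec': 2, 'bne': 2, 'stop': 1}
program = ['ld', 'ld', 'ld', 'add', 'dec', 'bne', 'stop']   # Mnemonics of Table 7.1

byteMap = []                   # byteMap[i] is the byte address of instruction i
byteAddress = 0
for mnemonic in program:
    byteMap.append(byteAddress)
    byteAddress += lengths[mnemonic]   # Next instruction starts after this one

print(byteMap)
```

With this table, a branch to instruction 3 (bne 3 in Table 7.1) becomes a branch to byte address byteMap[3].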
At the end of the execute loop, we get input from the keyboard. This
simply introduces a wait until the Enter/return key is hit before the next
instruction is executed. The remaining Python code formats the output:
x = input('… ')
xxxx = mnemonic + ' ' + str(operand1) + ' ' + str(operand2) \
       + ' ' + str(operand3)
instPrint = ' {0:<15}'.format(xxxx)        # re-format the instruction
print ('iC=',iC-1,'\tpc=',pcOld,'\tOp=',mnemonic,'z=',z, \
Summary
The previous chapter introduced TC1, a Python-based computer
simulator that could be used to develop and test instruction set
architectures. In this chapter, we explored aspects of simulator design in
more depth.
We looked at how you can create new instructions and add them to
TC1’s instruction set. Advanced instructions that perform a lot of
special-purpose computation were once the province of the classic CISC
processor, such as the Motorola 68K family. Then, with the rise of the
RISC architecture and its stress on simplicity and single-cycle
instructions, the CISC processor seemed about to go out of fashion.
However, many modern computers have incorporated complex
instructions for special applications such as data encoding, image
processing, and AI applications.
We looked a little more deeply at how you can check the input of a
simulator and ensure that errors in data and instructions can be detected.
The next chapter returns to the simulator and looks at several simulators
for different types of architecture.
8
Finally, we will present the code of TC4. This is a simulator for a non-
von Neumann machine with separate address and data memories and
where the address and data word lengths differ.
Technical requirements
You can find the programs used in this chapter on GitHub at
https://github.com/PacktPublishing/Practical-Computer-Architecture-with-Python-and-ARM/tree/main/Chapter08.
A register called a Stack Pointer (SP) points to the TOS. That is, the
stack pointer contains the address of the item at the top of the stack. By
convention, the stack grows upward as items are added and
shrinks downward as items are removed. Since we draw memory
diagrams with low addresses at the top of the page, the stack grows up
toward low addresses. In other words, if the top of a stack is at
location 1231, pushing an element on the stack stores it at address 1230.
Remember that the stack pointer is decremented because the stack grows
toward lower addresses. If an item is popped off the stack, the inverse
operation is as follows:
A = stack[sp]      # Retrieve the item at the top of the stack
sp = sp + 1        # Move the stack pointer down
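The push and pop operations can be sketched together as a pair. The Stack class below is a hypothetical wrapper introduced only for illustration; it uses the same pre-decrement and post-increment convention as the fragments in the text:

```python
class Stack:
    """A downward-growing stack held in a fixed-size list."""
    def __init__(self, size):
        self.mem = [0] * size
        self.sp = size                 # Empty stack: pointer is one past the last location

    def push(self, value):
        self.sp = self.sp - 1          # Pre-decrement: the stack grows toward low addresses
        self.mem[self.sp] = value      # Store the item at the new top of stack

    def pop(self):
        value = self.mem[self.sp]      # Retrieve the item at the top of the stack
        self.sp = self.sp + 1          # Post-increment: the stack shrinks
        return value

s = Stack(8)
s.push(3)
s.push(2)
first = s.pop()
second = s.pop()
print(first, second, s.sp)
```

Items come off in the reverse of the order in which they went on, and the pointer returns to its initial value once the stack is empty.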
We’ve taken a shortcut. We could have pulled two elements off the stack,
added them, and pushed the result. Instead, we put the result back where
the second operand was and saved two stack pointer movements. The
following Python code illustrates a very simple stack machine
interpreter. It does not implement branch operations, so it is not a
realistic computation machine. Because a stack machine often operates
on the top of the stack and the element below it, the second element is
frequently called NOS. Note that the program is stored as a list of lists,
with each instruction consisting of either a two-element list (e.g.,
[‘push’, ‘2’]) or a single-element list (e.g., [‘mul’]):
# Stack machine simulator
prog = [['push',0],['push',1],['add'], ['push',2],['push',1], \
        ['sub'], ['push',3],['sub'], ['mul'], ['push',4], \
        ['swap'], ['dup'],['pull',5], ['stop']]
stack = [0] * 8                       # 8-location stack. Stack grows to lower addresses
mem = [3,2,7,4,6,0]                   # Data memory (first locations are preloaded 3, 2, 7, 4, 6)
run = True                            # Execution continues while run is true
pc = 0                                # Program counter - initialize
sp = 8                                # Initialize stack pointer to 1 past end of stack
while run:                            # Execute MAIN LOOP until run is false (STOP command)
    inst = prog[pc]                   # Read the next instruction
    pc = pc + 1                       # Increment program counter
    if inst[0] == 'push':             # Test for push operation
        sp = sp - 1                   # Pre-decrement stack pointer
        address = int(inst[1])        # Get data from memory
        stack[sp] = mem[address]      # Store it on the stack
    elif inst[0] == 'pull':           # Test for a pull instruction
        address = int(inst[1])        # Get destination address
        mem[address] = stack[sp]      # Store the item in memory
        sp = sp + 1                   # Increment stack pointer
    elif inst[0] == 'add':            # If operation add TOS to NOS and push result
        p = stack[sp]
        sp = sp + 1
        q = stack[sp]
        stack[sp] = p + q
    elif inst[0] == 'sub':            # sub
        p = stack[sp]
        sp = sp + 1
        q = stack[sp]
        stack[sp] = q - p
    elif inst[0] == 'mul':            # mul
        p = stack[sp]
        sp = sp + 1
        q = stack[sp]
        stack[sp] = p * q
    elif inst[0] == 'div':            # div (note floor division with integer result)
        p = stack[sp]
        sp = sp + 1
        q = stack[sp]
        stack[sp] = p//q
    elif inst[0] == 'dup':            # dup (duplicate top item on stack)
        p = stack[sp]                 # get current TOS
        sp = sp - 1                   # and push it on the stack to duplicate
        stack[sp] = p
    elif inst[0] == 'swap':           # swap (exchange top of stack and next on stack)
        p = stack[sp]
        q = stack[sp+1]
        stack[sp] = q
        stack[sp+1] = p
    elif inst[0] == 'stop':           # stop
        run = False
    if sp == 8: TOS = 'empty'         # Stack elements 0 to 7. Element 8 is before the TOS
    else: TOS = stack[sp]
    print('pc =', pc-1,'sp =',sp,'TOS =',TOS,'Stack',stack,'Mem',mem,'op',inst)
The following is the output from this program, which shows the program
counter, the stack pointer, the top of the stack, the stack itself, the
data memory, and the instruction being executed:
pc=0  sp=7 TOS=3 Stack [0,0,0,0,0,0,0,3] Mem [3,2,7,4,6,0] op ['push',0]
pc=1  sp=6 TOS=2 Stack [0,0,0,0,0,0,2,3] Mem [3,2,7,4,6,0] op ['push',1]
pc=2  sp=7 TOS=5 Stack [0,0,0,0,0,0,2,5] Mem [3,2,7,4,6,0] op ['add']
pc=3  sp=6 TOS=7 Stack [0,0,0,0,0,0,7,5] Mem [3,2,7,4,6,0] op ['push',2]
pc=4  sp=5 TOS=2 Stack [0,0,0,0,0,2,7,5] Mem [3,2,7,4,6,0] op ['push',1]
pc=5  sp=6 TOS=5 Stack [0,0,0,0,0,2,5,5] Mem [3,2,7,4,6,0] op ['sub']
pc=6  sp=5 TOS=4 Stack [0,0,0,0,0,4,5,5] Mem [3,2,7,4,6,0] op ['push',3]
pc=7  sp=6 TOS=1 Stack [0,0,0,0,0,4,1,5] Mem [3,2,7,4,6,0] op ['sub']
pc=8  sp=7 TOS=5 Stack [0,0,0,0,0,4,1,5] Mem [3,2,7,4,6,0] op ['mul']
pc=9  sp=6 TOS=6 Stack [0,0,0,0,0,4,6,5] Mem [3,2,7,4,6,0] op ['push',4]
pc=10 sp=6 TOS=5 Stack [0,0,0,0,0,4,5,6] Mem [3,2,7,4,6,0] op ['swap']
pc=11 sp=5 TOS=5 Stack [0,0,0,0,0,5,5,6] Mem [3,2,7,4,6,0] op ['dup']
pc=12 sp=6 TOS=5 Stack [0,0,0,0,0,5,5,6] Mem [3,2,7,4,6,5] op ['pull',5]
pc=13 sp=6 TOS=5 Stack [0,0,0,0,0,5,5,6] Mem [3,2,7,4,6,5] op ['stop']
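[Table 8.2: most of the table body is not recoverable. Surviving entries: “[M] [A] + L”; “accumulator/memory with zero”; “unconditionally to L”; “then [PC] ← L” (twice); and the final row “STOP  Stop  7”.]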
Table 8.2 – Typical operations of a register-to-memory computer
where CCC is the opcode field, D is the direction bit, M is the mode bit,
and LLLLL is the literal or memory address (Figure 8.3). The extreme
simplicity of this makes it easy to write a tiny simulator and leaves the
user with a lot of opportunities to expand the code into a more realistic
machine.
Figure 8.3 – TC2 instruction format
The TC2 code has a setup section and a while loop that includes a fetch
instruction and an execute instruction part. The structure of the while
loop part of the code (instruction fetch/execute cycle) consists of the
following:
while run == True:
    operation      # Body of while loop operation
    .
    .
statement          # Next operation after the while loop
Right shifts and ANDs extract fields from the instruction; for example,
the 3-bit opcode is extracted from the 10-bit CCCDMLLLLL instruction by
shifting seven places right to get 0000000CCC. The direction bit, Dir, is
extracted by performing six right shifts to get 000000CCCD and then
ANDing the result with 1 to get 000000000D. These two operations can be
combined and written as follows:
(IR >> 6) & 1    # 6-bit shift right with >> and AND with 1 using the AND operator, &
Similarly, we extract the mode bit by performing Mode = (IR >> 5) & 1.
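All four fields can be extracted in the same way. The following sketch wraps the shifts and masks in a hypothetical decode function. The field positions follow the CCCDMLLLLL layout, and the test instruction is an arbitrary bit pattern chosen for illustration:

```python
def decode(IR):
    """Split a 10-bit CCCDMLLLLL instruction into its four fields."""
    OpCode = (IR >> 7) & 0b111      # Three opcode bits
    Dir    = (IR >> 6) & 1          # Direction bit
    Mode   = (IR >> 5) & 1          # Mode bit
    Lit    = IR & 0b11111           # Five-bit literal or address
    return OpCode, Dir, Mode, Lit

print(decode(0b1011001101))         # Fields of CCC=101, D=1, M=0, LLLLL=01101
```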
In the execute phase, the three op-code bits, OpCode, select one of the
eight possible instructions. Of course, the use of if … elif would have
been more appropriate:
if OpCode == 0:
    Code for case 0
elif OpCode == 1:
    Code for case 1
.
.
elif OpCode == 7:
    Code for case 7
Each op-code is guarded by an if statement. Here’s the code for the load
and store accumulator instruction. We treat this as one operation and use
the direction flag, Dir, to select between LDA (direction memory to
accumulator) and STA (direction accumulator to memory):
if OpCode == 0:                   # Test for Load A or Store A instruction
    if Dir == 0:                  # If direction bit is 0, then it's a load accumulator
        if Mode == 0:             # Test for literal or direct memory operand
            Acc = Lit             # If mode is 0, then it's a literal operand
        else:                     # If mode is 1, then it's a memory access
            MAR = Lit             # Copy field (address) to MAR
            MBR = Memory[MAR]     # Do a read to get the operand in MBR
            Acc = MBR             # and send it to the accumulator
    else:
        MAR = Lit                 # If direction is 1 then it's a store accumulator
        MBR = Acc                 # Copy accumulator to MBR
        Memory[MAR] = MBR         # and write MBR to memory
To make it easier to read the code, we've divided it into two blocks (one
shaded in dark gray and one in light gray) guarded by the if Dir == 0
statement. Note the use of the Mode flag. When loading the accumulator
with LDA, the mode flag selects either a literal or the contents of
memory as the operand. When executing an STA (store accumulator), the
mode flag is ignored because only a memory store is possible.
We don’t need to describe the ADD and SUB operations because they are
simply extensions of the load and store operations. We’ve included a
clear operation, CLR, which sets either the accumulator to 0 or the
contents of memory to 0 depending only on the Mode flag.
We’ll now present the full simulator code. The Memory[MAR] notation
means the contents of memory whose address is in the MAR and is
conveniently identical to the RTL we’ve been using. In the execute
instruction block, alternate opcodes are shaded gray and blue to facilitate
reading.
This simplified computer has only a Z-bit (no N and C bits).
The branch group of instructions (BRA, BEQ, and BNE) loads the program
counter with a literal to force a jump. BRA performs an unconditional
branch, whereas BEQ and BNE branch depending on the state of the Z-bit,
which is set/cleared by add and subtract operations. The branch target
address is an absolute address provided by the literal field.
Now that we've loaded memory with the program and set up some
variables, we can enter the fetch/execute loop:
# MAIN LOOP - FETCH/EXECUTE
while run:                               # This is the fetch/execute cycle loop that continues until run is False
    MAR = PC                             # FETCH PC to mem Address Register
    pcOld = PC                           # Keep a copy of the PC for display
    PC = PC + 1                          # Increment PC
    MBR = mem[MAR]                       # Read the instruction, copy it to the mem Buffer Register
    IR = MBR                             # Copy instruction to Instruction Register - prior to decoding it
    OpCode = (IR >> 7) & 0x7             # Extract Op-Code from instruction bits 7 to 9 by shifting and masking
    Dir = (IR >> 6) & 1                  # Extract data direction from instruction (0 = read, 1 = write)
    Mode = (IR >> 5) & 1                 # Extract address mode from instruction (0 = literal, 1 = mem)
    Lit = IR & 0x1F                      # Extract literal/address field (bits 0 to 4)
    # EXECUTE The EXECUTE block is an if statement, one for each opcode
    if OpCode == 0:                      # Test for LDA and STA (Dir is 0 for load acc and 1 for store in mem)
        if Dir == 0:                     # If Direction is 0, then it's a load accumulator, LDA
            if Mode == 0:                # Test for Mode bit to select literal or direct mem operand
                Acc = Lit                # If mode is 0, then the accumulator is loaded with L
            else:                        # If mode is 1, then read mem to get operand
                MAR = Lit                # Literal (address) to MAR
                MBR = mem[MAR]           # Do a read to get operand in MBR
                Acc = MBR                # and send it to the accumulator
        else:
            MAR = Lit                    # If Direction is 1, then it's a store accumulator
            MBR = Acc                    # Copy accumulator to MBR
            mem[MAR] = MBR               # and write MBR to mem
    elif OpCode == 1:                    # Test for ADD to accumulator
        if Mode == 0:                    # Test for literal or direct mem operand
            total = Acc + Lit            # If mode is 0, then it's a literal operand
            if total == 0: z = 1         # Deal with z flag
            else: z = 0
        else:                            # If mode is 1, then it's a direct mem access
            MAR = Lit                    # Literal (address) to MAR
            MBR = mem[MAR]               # Do a read to get operand in MBR
            total = MBR + Acc            # And add it to the accumulator
        if Dir == 0: Acc = total         # Test for destination (accumulator)
        else: mem[MAR] = total           # Or mem
    elif OpCode == 2:                    # Test for SUB from accumulator
        if Mode == 0:                    # Test for literal or direct mem operand
            total = Acc - Lit            # If mode is 0 then it's a literal operand
        else:                            # If mode is 1 then it's a direct mem access
            MAR = Lit                    # Literal (address) to MAR
            MBR = mem[MAR]               # Do a read to get operand in MBR
            total = Acc - MBR            # and subtract it from the accumulator (operand is MBR, not the address)
        if total == 0: z = 1             # Now update z bit (in all cases)
        else: z = 0                      # Clear z when the result is non-zero
        if Dir == 0: Acc = total         # Test for destination (accumulator)
        else: mem[MAR] = total           # Or mem
You could argue that we should have inserted a break or exit here
because if we haven’t encountered a valid op-code by the end of the
execute loop, the source code must be invalid:
# End of main fetch-execute loop
    mnemon = mnemonics.get(OpCode)       # Get the mnemonic for printing
    print('PC',pcOld, 'Op ',OpCode, 'Mode = ', Mode, 'Dir = ',Dir, \
          'mem', mem[16:19], 'z',z, 'Acc', Acc, mnemon)
Running this program produces the following output:
PC 0  OpCode 0 Mode = 1 Dir = 0 mem [4, 5, 0]  z 0 Acc 4  LDA/STR
PC 1  OpCode 1 Mode = 1 Dir = 0 mem [4, 5, 0]  z 0 Acc 9  ADD
PC 2  OpCode 0 Mode = 1 Dir = 1 mem [4, 5, 9]  z 0 Acc 9  LDA/STR
PC 3  OpCode 2 Mode = 0 Dir = 0 mem [4, 5, 9]  z 0 Acc 6  SUB
PC 4  OpCode 5 Mode = 0 Dir = 0 mem [4, 5, 9]  z 0 Acc 6  BEQ
PC 5  OpCode 0 Mode = 0 Dir = 0 mem [4, 5, 9]  z 0 Acc 18 LDA/STR
PC 6  OpCode 0 Mode = 1 Dir = 1 mem [4, 5, 18] z 0 Acc 18 LDA/STR
PC 7  OpCode 3 Mode = 0 Dir = 0 mem [4, 5, 18] z 0 Acc 0  CLR
PC 8  OpCode 0 Mode = 0 Dir = 0 mem [4, 5, 18] z 0 Acc 2  LDA/STR
PC 9  OpCode 2 Mode = 0 Dir = 0 mem [4, 5, 18] z 1 Acc 0  SUB
PC 10 OpCode 5 Mode = 0 Dir = 0 mem [4, 5, 18] z 1 Acc 0  BEQ
PC 12 OpCode 7 Mode = 0 Dir = 0 mem [4, 5, 18] z 1 Acc 0  STOP
Operation   Dir   Mode
BRA         0     0
Undefined   0     1
BEQ         1     0
BNE         1     1
Table 8.3 – Re-purposing the direction and mode bits
We have used the Dir and Mode instruction bits to select the branch type.
As a bonus, we have a spare operation that is marked undefined. The
code for the branch group is as follows. We’ve used shading to help
identify the blocks. Note that in this example, we demonstrate how
branches can be made program counter relative:
if OpCode == 3:                                  # Test for the branch group
    if Dir == 0:                                 # Direction 0 for unconditional
        if Mode == 0: PC = PC + Lit - 1          # If Mode is zero then unconditional branch
        else: run = 0                            # If Mode is 1 then this is undefined so stop
    else:
        if Dir == 1:                             # If direction is 1, it's a conditional branch
            if Mode == 0:                        # If mode is 0 then we have a BNE
                if Z == 0: PC = PC + Lit - 1     # Branch on Z = 0 (not zero)
            else:                                # If Mode is 1 we have a BEQ
                if Z == 1: PC = PC + Lit - 1     # Branch on Z = 1 (zero)
This code looks a little more complex than it is, because we have if
statements nested four deep when we test for op-code, direction, mode,
and then Z-bit. However, this example demonstrates how instruction bits
can be reused to increase the number of instructions at the cost of
decoding complexity.
There’s still room to maneuver and squeeze more functionality out of the
instruction set. Look at the CLR instruction. We use the mode bit to clear
memory or the accumulator. How about being a little creative and using
the direction bit to provide another operation? Incrementing a register or
memory is a common operation, so let’s provide that. We can use Dir ==
0 for CLR and Dir == 1 for INC Memory/accumulator. The block shaded
in gray is the original clear and the block shaded in blue is the new
increment operation:
if OpCode == 6:                      # Test for clear mem/Acc or increment mem/Acc
    if Dir == 0:                     # Direction = 0 for clear operation
        if Mode == 0:                # If Mode = 0
            Acc = 0                  # Then clear accumulator
        else:
            MAR = Lit                # If Mode = 1
            Memory[MAR] = 0          # Then clear memory location
    else:                            # Direction = 1 for increment
        if Mode == 0:                # If Mode = 0
            Acc = Acc + 1            # Then increment accumulator
        else:
            MAR = Lit                # If Mode = 1
            MBR = Memory[MAR]        # Then increment memory location
            MBR = MBR + 1            # Increment memory in MBR
            Memory[MAR] = MBR        # Write back incremented memory value
Finally, consider the STOP (halt) instruction with the 111DMLLLLL format.
Here, the 7 bits DMLLLLL do nothing, which leaves 2^7 = 128 combinations.
If we were to reserve one code for halt, say, 1110000000, we could allocate
the remaining 127 codes, 1110000001 to 1111111111, to new instructions.
The next section extends this architecture to create a more realistic
simulator.
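The idea of reclaiming the unused STOP bits can be sketched as a second-level decode. This is only an illustration under stated assumptions: the function name decode_stop_group and the EXTn names for the new instructions are hypothetical placeholders, not part of TC2.

```python
def decode_stop_group(ir):
    """Second-level decode for 10-bit words whose op-code field is 111."""
    if (ir >> 7) & 0b111 != 0b111:
        return None                      # Not in the 111 group
    sub_op = ir & 0b1111111              # The 7 bits that STOP leaves unused
    if sub_op == 0:
        return 'STOP'                    # 1110000000 stays reserved for halt
    return 'EXT' + str(sub_op)           # Hypothetical name for a new instruction

print(decode_stop_group(0b1110000000))   # Prints STOP
print(decode_stop_group(0b1110000101))   # Prints EXT5
```

The cost is one extra level of decoding for the 111 group, which is the same trade-off made when the Dir and Mode bits were re-purposed for the branch group.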
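[Table fragments: TC3 addressing modes, with columns Mode, Address, Example, RTL, and Class. Only parts of the rows survive: mode 0, no operand (STOP), class 0; mode 3, reserved; register indirect operands such as [R2]; memory operands such as M:123,R2 and R1,M:123 (M[123]); modes 13-15, reserved.]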
This example uses literal, register direct, and register indirect (pointer-
based) addressing. We have provided the binary code of each instruction
with the class, op-code, addressing mode, registers, and literal fields.
The first part of the TC3 simulator is given as follows. We create two
lists: one for the program memory and one for the data memory (pMem
and dMem). The instructions in program memory are imported from a
file. The data memory is set up as 16 locations that are initialized to 0.
The text file containing the source program is src and is processed to
reformat instructions and remove assembler directives.
The shaded section of the code was added to detect the ‘END’ directive in
the source code, which terminates the assembly processing and acts as a
STOP when the code is executed. I added it for convenience. I sometimes
want to test one or two instructions but don’t want to write a new source
code program. I can put the code under test at the top of an existing
program, followed by END. All code after END is ignored. Later, I can
delete the new code and END.
sTab = {}                                    # Symbol table for equates and labels name:integerValue
pMem = []                                    # Program memory (initially empty)
dMem = [0]*16                                # Data memory. Initialized to 0, 16 locations
reg = [0]*8                                  # Register set
z,c,n = 0,0,0                                # Define status flags: zero, carry, negative
testCode = "E:\\AwPW\\TC3_NEW_1.txt"         # Source filename on my computer
with open(testCode) as src:                  # Open the source file containing the assembly program
    lines = src.readlines()                  # Read the program into lines
src = [i[0:-1].lstrip() for i in lines ]     # Remove the \n newline from each line of the source code
src = [i.split("@")[0] for i in src]         # Remove comments in the code
src = [i for i in src if i != '']            # Remove empty lines
for i in range(0,len(src)):                  # Scan source code line-by-line
    src[i] = src[i].replace(',',' ')         # Replace commas by a space
    src[i] = src[i].upper()                  # Convert to upper-case
    src[i] = src[i].split()                  # Split into tokens (label, mnemonic, operands)
src1 = []                                    # Set up dummy source file, initially empty
for i in range (0,len(src)):                 # Read source and stop on first END instruction
    src1.append(src[i])                      # Append each line to dummy source file
    if src[i][0] == 'END': break             # Stop on 'END' token
src = src1                                   # Copy dummy file to source (having stopped on 'END')
for i in range (0,len(src)):                 # Deal with equates of the form EQU PQR 25
    if src[i][0] == 'EQU':                   # If the line is 3 or more tokens and first token is EQU
        sTab[src[i][1]] = getL(src[i][2])    # Put token in symbol table as integer
src = [i for i in src if i.count('EQU') == 0]    # Remove lines with "EQU" from source code
The next section of the assembler does all the work. Here, we generate
the binary code. Unlike other simulators we've developed, we use
dictionaries and lists to detect registers, as the following (partial) code
shows:
rName = {'R0':0,'R1':1,'R2':2,'R3':3}               # Relate register name to numeric value (lookup table)
rNamInd = {'[R0]':0,'[R1]':1,'[R2]':2,'[R3]':3}     # Look for register indirect addressing (lookup table)
iClass0 = ['STOP', 'NOP', 'END']                    # Instruction class 00 mnemonic with no operands
iClass1 = ['BRA', 'BEQ', 'BNE','CZN' ]              # Instruction class 01 mnemonic with literal operand
Now, we can take a token and ask whether it’s in rName to detect R0 to
R7, or whether it’s in rNamInd to detect whether it’s [R0] to [R7].
Moreover, we can use the mnemonic from an instruction and ask
whether it’s in each class in turn in order to determine the two class bits
of the instruction; for example, if t0 is the first token (corresponding to
the mnemonic), we can write the following:
if t0 in iClass0: mode = 0
The most complex class of instructions is iClass3, which deals with two-
operand instructions, such as ADD [R3],R4. In this case, token t0 would
be 'ADD', token t1 would be '[R3]', and token t2 would be 'R4'. To
identify the mode of this instruction, we look for a first operand that is
an indirect register and a second operand that is a register, as follows:
if (t1 in rNamInd) and (t2 in rName): mode = 7
After extracting the instruction class, op-code, and mode, the final step is
to get the actual register numbers and any literals. In the following
fragment of code, we define the two register fields and the literal field,
respectively. These are rField1, rField2, and lField and are all
initialized to 0, because instructions without three fields have the
corresponding bits set to 0.
Here, we use the list as a very convenient method for extracting fields
rather than combined if and or operators. For example, register field 1 is
used by modes 4, 5, 6, and 11. We could write the following:
if (mode == 4) or (mode == 5) or (mode == 6) or (mode == 11):
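The list-membership form expresses the same condition more compactly. The short check below, which uses only the mode numbers quoted above, confirms that the two forms agree for every 4-bit mode value. The two function names are illustrative assumptions.

```python
def uses_rfield1_chain(mode):
    # Chained comparison, as in the line above
    return (mode == 4) or (mode == 5) or (mode == 6) or (mode == 11)

def uses_rfield1_list(mode):
    # Equivalent membership test against a list of modes
    return mode in [4, 5, 6, 11]

# Compare the two tests for all sixteen possible 4-bit mode values
agree = all(uses_rfield1_chain(m) == uses_rfield1_list(m) for m in range(16))
print(agree)    # Prints True
```

The membership test also makes it easier to add or remove a mode later, because only the list changes.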
The following code shows how the three register/literal fields are
evaluated:
binC = (mnemon[t0] << 18) + (mode << 14)       # Insert op_Code and mode fields in instruction
rField1, rField2, lField = 0, 0, 0             # Calculate register and literal fields. Initialize to zero
if mode in [4,5,6,11]: rField1 = rName[t1]     # Convert register names into register numbers
if mode in [7,8,12]:   rField1 = rNamInd[t1]
if mode in [5,7,9]:    rField2 = rName[t2]     # rField2 is second register field
if mode in [6,8,10]:   rField2 = rNamInd[t2]
if mode in [4,11,12]:  lField = getL(t2)       # if (mode==4) or (mode==11) or (mode==12): lField = getL(t2)
if mode in [9,10]:     lField = getL(t1)       # if (mode == 9) or (mode == 10): lField = getL(t1) Literal field
1. Printing data
This would print binary code only when debugging is required by setting
the variable to 5 or greater.
Here is the code used to deal with class 1 instructions. We do not have to
worry about decoding the mode as there is only one mode for this class.
Of course, the class could be extended (in the future) by the addition of
other modes.
elif opClass == 1:                                   # Class 1 operation instructions with literal operand
    if thisOp == 'BRA': pc = lit                     # BRA Branch unconditionally PC = L
    elif (thisOp == 'BEQ') and (z == 1): pc = lit    # BEQ Branch on zero
    elif (thisOp == 'BNE') and (z == 0): pc = lit    # BNE Branch on not zero
    elif thisOp == 'CZN':                            # Set/clear c, z, and n flags
        c = (lit & 0b100) >> 2                       # Bit 2 of literal is c
        z = (lit & 0b010) >> 1                       # Bit 1 of literal is z
        n = (lit & 0b001)                            # Bit 0 of literal is n
Class 1 instructions have an op-code and literal and are generally used to
implement branch operations. Notice that we compare the current
instruction with a name (e.g., ‘BRA’) rather than an op-code, as we did in
other simulators. The use of a table of reverse op-code-to-mnemonic
translations makes life much easier.
4. Handling literals
5. Result Writeback
Sample output
The following is a sample of the output from the simulator that
demonstrates integer handling. We have written a program with six
different ways of inputting a literal. In each case, we load the literal into
register r0. The source program is as follows:
EQU www,#42
MOV r0,#12
MOV r0,#%11010
MOV r0,#0xAF
MOV r0,#-5
MOV r0,M:7
MOV r0,#www
NOP
STOP
END
In the following code block, we have the output of TC3. This output has
been designed for the purpose of developing and testing the simulator
(for example, following the assembly process):
Source code                     This is the tokenized source code
['MOV', 'R0', '#12']
['MOV', 'R0', '#%11010']
['MOV', 'R0', '#0XAF']
['MOV', 'R0', '#-5']
['MOV', 'R0', 'M:7']
['MOV', 'R0', '#WWW']
['NOP']
['STOP']
['END']
Equate and branch table         This is the symbol table. Only one entry
WWW 42
The following is the output during the assembly and analysis phase:
Assembly loop
MOV R0 #12       pc=0  110000010000000000001100  Class=3 mode=4   MOV t1=R0 #12
MOV R0 #%11010   pc=1  110000010000000000011010  Class=3 mode=4   MOV t1=R0 #%11010
MOV R0 #0XAF     pc=2  110000010000000010101111  Class=3 mode=4   MOV t1=R0 #0XAF
MOV R0 #-5       pc=3  110000010000000011111011  Class=3 mode=4   MOV t1=R0 #-5
MOV R0 M:7       pc=4  110000101100000000000111  Class=3 mode=11  MOV t1=R0 M:7
MOV R0 #WWW      pc=5  110000010000000000101010  Class=3 mode=4   MOV t1=R0 #WWW
NOP              pc=6  000000000000000000000000  Class=0 mode=0   NOP t1 =
STOP             pc=7  001110000000000000000000  Class=0 mode=0   STOP t1 =
END              pc=8  001111000000000000000000  Class=0 mode=0   END t1 =
110000010000000000001100        This is the program in binary form
110000010000000000011010
110000010000000010101111
110000010000000011111011
110000101100000000000111
110000010000000000101010
000000000000000000000000
001110000000000000000000
001111000000000000000000
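The fields can be read back out of these binary words. The sketch below assumes a layout inferred from the sample output rather than stated explicitly in the text: a 2-bit class and a 4-bit op-code in bits 23 to 18, a 4-bit mode in bits 17 to 14, two 3-bit register fields in bits 13 to 8, and an 8-bit literal in bits 7 to 0. The function name decode_tc3 is hypothetical.

```python
def decode_tc3(word):
    """Unpack a 24-bit TC3 instruction word (layout inferred from the sample output)."""
    return {
        'class': (word >> 22) & 0b11,      # Bits 23..22: instruction class
        'op':    (word >> 18) & 0b1111,    # Bits 21..18: op-code within the class
        'mode':  (word >> 14) & 0b1111,    # Bits 17..14: addressing mode
        'reg1':  (word >> 11) & 0b111,     # Bits 13..11: first register field
        'reg2':  (word >> 8) & 0b111,      # Bits 10..8: second register field
        'lit':   word & 0xFF,              # Bits 7..0: literal field
    }

# MOV R0,#12 and MOV R0,M:7 from the listing above
print(decode_tc3(0b110000010000000000001100))
print(decode_tc3(0b110000101100000000000111))
```

Under this assumed layout, the first word decodes to class 3, mode 4, and literal 12, and the second to class 3, mode 11, and literal 7, which matches the Class and mode values printed by the assembler.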
The following two functions provide the ability to read integer operands
in various formats, and an ALU that performs arithmetic and logic
operations. Both of these functions can be expanded to provide
additional capabilities:
def getL(lit8):                                          # Convert string to integer
    lit8v = 9999                                         # Dummy default
    if type(lit8) == int: return lit8                    # If already an integer, return it (test first: an int cannot be sliced)
    if lit8[0:2] == 'M:': lit8 = lit8[2:]                # Strip M: prefix from memory literal addresses
    if lit8[0:1] == '#': lit8 = lit8[1:]                 # Strip # prefix from literal addresses
    if lit8.isnumeric(): lit8v = int(lit8)               # If decimal in text form convert to integer
    elif lit8 in sTab: lit8v = sTab[lit8]                # If in symbol table, retrieve it
    elif lit8[0] == '%': lit8v = int(lit8[1:],2)         # If binary string convert to int
    elif lit8[0:2] == '0X': lit8v = int(lit8[2:],16)     # If hex string convert to int
    elif lit8[0] == '-': lit8v = -int(lit8[1:]) & 0xFF   # If decimal negative convert to 8-bit two's complement
    return(lit8v)                                        # Return integer corresponding to text string
def alu(fun,op1,op2):                          # Perform arithmetic and logical operations on operands 1 and 2
    global z,n,c                               # Make flags global
    cIn = c                                    # Keep the incoming carry for ADC before the flags are cleared
    z,n,c = 0,0,0                              # Clear status flags initially
    if fun == 0: res = op2                     # MOV: Perform data copy from source to destination
    elif fun == 1:                             # ADD: Perform addition - and ensure 8 bits plus carry
        res = (op1 + op2)                      # Do addition of operands
        if thisOp == 'ADC': res = res + cIn    # If operation ADC then add carry bit
    elif fun == 2: res = (op1 - op2)           # SUB: Perform subtraction
    elif fun == 3: res = op1 - op2             # CMP: Same as subtract without writeback
    elif fun == 4: res = op1 & op2             # AND: Perform bitwise AND
    elif fun == 5: res = op1 | op2             # OR
    elif fun == 6: res = ~op2                  # NOT
    elif fun == 7: res = op1 ^ op2             # XOR
    elif fun == 8:
        res = op2 << 1                         # LSL: Perform single logical shift left
    elif fun == 9:
        res = op2 >> 1                         # LSR: Perform single logical shift right
    elif fun == 10:                            # ONES (Count number of 1s in register)
        onesCount = 0                          # Clear the 1s counter
        for i in range (0,8):                  # For i = 0 to 7 (test each bit) AND with 10000000 to get msb
            if op2 & 0x80 == 0x80:             # If msb is set
                onesCount = onesCount + 1      # increment the 1s counter
            op2 = op2 << 1                     # shift the operand one place left
        res = onesCount                        # Destination operand is 1s count
    elif fun == 11:                            # MRG (merge alternate bits of two registers)
        t1 = op1 & 0b10101010                  # Get even source operand bits
        t2 = op2 & 0b01010101                  # Get odd destination operand bits
        res = t1 | t2                          # Merge them using an OR
    elif fun == 12:                            # FFO (Find position of leading 1)
        res = 8                                # Set default position 8 (i.e., leading 1 not found)
        for i in range (0,8):                  # Examine the bits one by one
            temp = op2 & 0x80                  # AND with 10000000 to get leading bit and save
            op2 = op2 << 1                     # Shift operand left
            res = res - 1                      # Decrement place counter
            if temp == 128: break              # If the last tested bit was 1 then jump out of loop
    if res & 0xFF == 0: z = 1                  # TEST FLAGS z = 1 if bits 0 to 7 all 0
    if res & 0x80 == 0x80: n = 1               # If bit 7 is one, set the negative bit
    if res & 0x100 == 0x100: c = 1             # carry bit set if bit 8 set
    if (thisOp == 'LSR') and (op2 & 1 == 1): c = 1   # Deal with special case of shift right (carry out is lsb)
    return(res & 0xFF)                         # Return and ensure value eight bits
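The two bit-scanning loops in the ALU (ONES and FFO) can be cross-checked against Python's built-in facilities. The sketch below copies the two loops into standalone helper functions, whose names ones_loop and ffo_loop are assumptions made for this illustration, and compares them with bin().count() and int.bit_length() for every non-zero 8-bit value.

```python
def ones_loop(op2):
    """Count the 1 bits of an 8-bit value with the shift-and-test loop used by ONES."""
    count = 0
    for _ in range(8):
        if op2 & 0x80 == 0x80:       # Test the most-significant bit
            count += 1
        op2 = op2 << 1               # Move the next bit into the msb position
    return count

def ffo_loop(op2):
    """Return the bit position of the leading 1 with the loop used by FFO."""
    res = 8
    for _ in range(8):
        temp = op2 & 0x80
        op2 = op2 << 1
        res = res - 1
        if temp == 128:
            break
    return res

# Cross-check both loops against built-ins for every non-zero 8-bit value
ok = all(ones_loop(v) == bin(v).count('1') and ffo_loop(v) == v.bit_length() - 1
         for v in range(1, 256))
print(ok)    # Prints True
```

Note that an all-zero operand leaves ffo_loop at 0 after eight decrements, the same result as an operand of 1, so a caller that needs to tell the two cases apart would have to test the operand separately.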
The following is the instruction execution part of the program. Note that
instructions are executed in the order of their class:
### EXECUTE THE CODE
thisOp = mnemonR[opCode]                                  # Reverse assemble. Get mnemonic from op-code
if iClass == 0:                                           # Class 0 no-operand instructions
    if thisOp == 'END' or thisOp == 'STOP': run = False   # If END or STOP clear run flag to stop execution
    if thisOp == 'NOP': pass                              # If NOP then do nothing and "pass"
elif iClass == 1:                                         # Class 1 branch and instr with literal operands
    if thisOp == 'BRA': pc = lit                          # BRA Branch unconditionally PC = L
    elif (thisOp == 'BEQ') and (z == 1): pc = lit         # BEQ Branch on zero
    elif (thisOp == 'BNE') and (z == 0): pc = lit         # BNE Branch on not zero
    elif thisOp == 'CZN':                                 # Set/clear c, z, and n flags
        c = (lit & 0b100) >> 2                            # Bit 2 of literal is c
        z = (lit & 0b010) >> 1                            # Bit 1 of literal is z
        n = (lit & 0b001)                                 # Bit 0 of literal is n
elif iClass == 2:                                         # Class 2 single-register operand
    if thisOp == 'INC': reg[reg1] = alu(1,reg[reg1],1)    # Call ALU with second operand 1 to do increment
    elif thisOp == 'DEC': reg[reg1] = alu(2,reg[reg1],1)  # Decrement register
    elif thisOp == 'RND': reg[reg1] = random.randint(0,0xFF)   # Generate random number in range 0 to 0xFF
    elif thisOp == 'TST':                                 # Test a register: return z and n flags. Set c to 0
        z, n, c = 0, 0, 0                                 # Set all flags to 0
        if reg[reg1] == 0: z = 1                          # If operand 0 set z flag
        if reg[reg1] & 0x80 == 0x80: n = 1                # If operand ms bit 1 set n bit
elif iClass == 3:                                         # Class 3 operation: Two operands.
    if mode in [4,5,6,11]: op1 = reg[reg1]                # First operand in register, e.g. MOVE r1,#5 or ADD r3,#0xF2
    elif mode in [7,8,12]: op1 = dMem[reg[reg1]]          # First operand in memory pointed at by register
    elif mode in [9,10]:   op1 = lit                      # MOV M:12,r3 moves register to memory
    if mode in [4,11,12]:  op2 = lit                      # Modes with second operand literal
    elif mode in [5,7,9]:  op2 = reg[reg2]                # Modes with second operand contents of register
    elif mode in [6,8,10]: op2 = dMem[reg[reg2]]          # Second operand pointed at by register
    if thisOp == 'MOV' : fun = 0                          # Use mnemonic to get function required by ALU
    if thisOp == 'ADD' : fun = 1                          # ADD and ADC use same function
    if thisOp == 'ADC' : fun = 1
    if thisOp == 'SUB' : fun = 2
    if thisOp == 'AND' : fun = 4
    if thisOp == 'OR'  : fun = 5
    if thisOp == 'NOT' : fun = 6
    if thisOp == 'EOR' : fun = 7
    if thisOp == 'LSL' : fun = 8
    if thisOp == 'LSR' : fun = 9
    if thisOp == 'ONES': fun = 10
    if thisOp == 'MRG' : fun = 11
    if thisOp == 'FFO' : fun = 12
    op3 = alu(fun,op1,op2)                                # Call ALU to perform the function
    if mode in [4,5,6,11]: reg[reg1] = op3                # Writeback ALU result in op3 to a register
    elif mode in [7,8,12]: dMem[reg[reg1]] = op3          # Writeback result to mem pointed at by reg
    elif mode in [9,10]:   dMem[lit] = op3                # Writeback the result to memory
trace()                                                   # Display the results line by line
1. Source code
X 1
Y 5
Z 9
NEXT 1
LOOP 9
3. Assembly loop
4. EXECUTE
We have provided only a few lines of the traced output and reformatted
them to fit on the page.
>>>
MOV R0 #8 pc = 0 110000010000000000001000
Class = 3 mode = 4
Reg = 08 00 00 00 00 00 00 00
Mem = 00 00 00 00 00 00 00 00 00 00 00
C = 0 Z = 0 N = 0
NEXT: RND R5 pc = 1 100010001010100000000000
Class = 2 mode = 2
Reg = 08 00 00 00 00 8f 00 00
Mem = 00 00 00 00 00 00 00 00 00 00 00
C = 0 Z = 0 N = 0
MOV [R0] R5 pc = 2 110000011100010100000000
Class = 3 mode = 7
Reg = 08 00 00 00 00 8f 00 00
Mem = 00 00 00 00 00 00 00 00 8f 00 00
C = 0 Z = 0 N = 1
DEC R0 pc = 3 100001001000000000000000
Class = 2 mode = 2
Reg = 07 00 00 00 00 8f 00 00
Mem = 00 00 00 00 00 00 00 00 8f 00 00
C = 0 Z = 0 N = 0
BNE NEXT pc = 4 010010000100000000000001
Class = 1 mode = 1
Reg = 07 00 00 00 00 8f 00 00
Mem = 00 00 00 00 00 00 00 00 8f 00 00
C = 0 Z = 0 N = 0
NEXT: RND R5 pc = 1 100010001010100000000000
Class = 2 mode = 2
Reg = 07 00 00 00 00 35 00 00
Mem = 00 00 00 00 00 00 00 00 8f 00 00
C = 0 Z = 0 N = 0
NOTE
Output not displayed to save space
Note the calculation of overflow. The v-bit is set if the sign bits of the
two operands are the same and the sign bit of the result is different.
Overflow is valid only for addition and subtraction. The modulus
function returns a positive value if the input parameter is negative in
two’s complement terms. We do this by inverting the bits and adding 1.
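The overflow rule and the modulus operation can be illustrated with a small standalone sketch. The function names add8_overflow and modulus8 are assumptions made for this example, and the overflow test is shown for addition only.

```python
def add8_overflow(a, b):
    """Return (result, v) for an 8-bit two's complement addition."""
    a, b = a & 0xFF, b & 0xFF
    q = (a + b) & 0xFF
    sa, sb, sq = a >> 7, b >> 7, q >> 7           # Sign bits of operands and result
    v = 1 if (sa == sb) and (sq != sa) else 0     # Same operand signs, different result sign
    return q, v

def modulus8(a):
    """Absolute value of an 8-bit two's complement number (invert and add 1 if negative)."""
    a = a & 0xFF
    return (~a + 1) & 0xFF if a > 0x7F else a

print(add8_overflow(0x7F, 0x01))    # 127 + 1 prints (128, 1): overflow
print(add8_overflow(0x05, 0x03))    # 5 + 3 prints (8, 0): no overflow
print(modulus8(0xFB))               # -5 prints 5
```

Adding 127 and 1 gives 0x80, which is negative in two's complement terms even though both operands are positive, so the v-bit is set.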
# This function simulates an 8-bit ALU and provides 16 operations
# It is called by alu(op,a,b,cIn,display). Op defines the ALU function
# a,b and cIn are the two inputs and the carry in
# If display is 1, the function prints all input and output on the terminal
# Return values: (q, z, n, v, cOut) q is the result
def alu(op,a,b,cIn,display):
    allOps = {0:'clr', 1:'add',2:'sub',3:'mul',4:'div',5:'and',6:'or', \
              7:'not', 8:'eor', 9:'lsl',10:'lsr', 11:'adc',12:'sbc', \
              13:'min',14:'max',15:'mod'}
    a, b = a & 0xFF, b & 0xFF                  # Ensure the input is 8 bits
    cOut,z,n,v = 0,0,0,0                       # Clear all status flags
    if   op == 0:  q = 0                       # Code 0000 clear
    elif op == 1:  q = a + b                   # Code 0001 add
    elif op == 2:  q = a - b                   # Code 0010 subtract
    elif op == 3:  q = a * b                   # Code 0011 multiply
    elif op == 4:  q = a // b                  # Code 0100 divide
    elif op == 5:  q = a & b                   # Code 0101 bitwise AND
    elif op == 6:  q = a | b                   # Code 0110 bitwise OR
    elif op == 7:  q = ~a                      # Code 0111 bitwise negate (logical complement)
    elif op == 8:  q = a ^ b                   # Code 1000 bitwise EOR
    elif op == 9:  q = a << b                  # Code 1001 bitwise logical shift left b places
    elif op == 10: q = a >> b                  # Code 1010 bitwise logical shift right b places
    elif op == 11: q = a + b + cIn             # Code 1011 add with carry in
    elif op == 12: q = a - b - cIn             # Code 1100 subtract with borrow in
    elif op == 13:                             # Code 1101 q = minimum(a,b)
        if a > b: q = b
        else:     q = a
    elif op == 14:                             # Code 1110 q = maximum(a,b)
        if a > b: q = a                        # Note: in unsigned terms
        else:     q = b
    elif op == 15:                             # Code 1111 q = mod(a)
        if a > 0b01111111: q = (~a+1)&0xFF     # if a is negative q = -a (2s comp)
        else:              q = a               # if a is positive q = a
    # Prepare to exit: Setup flags
    cOut = (q&0x100)>>8                        # Carry out is bit 8
    q = q & 0xFF                               # Constrain result to 8 bits
    n = (q & 0x80)>>7                          # AND q with 10000000 and shift right 7 times
    if q == 0: z = 1                           # Set z bit if result zero
    p1 = ( (a&0x80)>>7)& ((b&0x80)>>7)&~((q&0x80)>>7)
    p2 = (~(a&0x80)>>7)&~((b&0x80)>>7)& ((q&0x80)>>7)
    if p1 | p2 == True: v = 1                  # Calculate v-bit (overflow)
    if display == 1:                           # Display parameters and results
        a,b = a&0xFF, b&0xFF                   # Force both inputs to 8 bits
        print('Op =',allOps[op],'Decimals: a =',a,' b =',b, \
              'cIn =',cIn,'Result =',q)
        print('Flags: Z =',z, 'N =',n, 'V =',v, 'C =',cOut)
        print('Binaries A =',format(a,'08b'), 'B =',format(b,'08b'), \
              'Carry in =',format(cIn,'01b'), 'Result =',format(q,'08b'))
        print ()
    return (q, z, n, v, cOut)                  # Return q (result), and flags as a tuple
I have arranged the code so that you can see only the parameters entered
as needed for the operation, for example, add 3,7, mod 5, or sbc 3 4 1. To
make it easier to test logic functions, you can enter parameters in binary
(%10110) or hexadecimal ($3B) format.
A feature of the test code is that I use a reverse dictionary. This allows
you to enter a function by its name, rather than number.
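A reverse dictionary of this kind can be built from the allOps table with a dictionary comprehension. The sketch below is one possible arrangement rather than the test code itself: the names opByName, parse_value, and parse_command are hypothetical, and the input formats follow the examples quoted above (decimal, %binary, and $hexadecimal).

```python
allOps = {0:'clr', 1:'add', 2:'sub', 3:'mul', 4:'div', 5:'and', 6:'or', 7:'not',
          8:'eor', 9:'lsl', 10:'lsr', 11:'adc', 12:'sbc', 13:'min', 14:'max', 15:'mod'}
opByName = {name: code for code, name in allOps.items()}    # Reverse dictionary: name -> number

def parse_value(text):
    """Convert a decimal, %binary, or $hex string to an integer."""
    if text.startswith('%'): return int(text[1:], 2)
    if text.startswith('$'): return int(text[1:], 16)
    return int(text)

def parse_command(line):
    """Turn a line such as 'add 3,7' into (operation number, [operands])."""
    tokens = line.replace(',', ' ').split()
    return opByName[tokens[0].lower()], [parse_value(t) for t in tokens[1:]]

print(parse_command('add 3,7'))           # Prints (1, [3, 7])
print(parse_command('and %10110 $3B'))    # Prints (5, [22, 59])
```

Because the reverse table is derived from allOps, adding an operation to allOps makes its name available at the prompt without any further change.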
We’ll present the code first and then add some comments via the labels
that indicate points of interest. Shaded parts of the code have comments
following the code:
import re                            # Library for regular expressions for removing spaces (See 1)
from random import *                 # Random number library
import sys                           # Operating system call library
from datetime import date            # Import date function (See 2)
bPt = []                             # Breakpoint table (labels and PC values)
bActive = 0
today = date.today()                 # Get today's date (See 2)
print('Simulator', today, '\n')
deBug, trace, bActive = 0, 0, 0      # Turn off debug, trace and breakpoint modes (See 3)
x1 = input('D for debug >>> ')       # Get command input
if x1.upper() == 'D': deBug = 1      # Turn on debug mode if 'D' or 'd' entered
x2 = input('T or B')                 # Get command input
x2 = x2.upper()                      # Convert to upper-case
if x2 == 'T': trace = 1              # Turn on trace mode if 'T' or 't' entered
elif x2 == 'B':                      # If 'B' or 'b' get breakpoints until 'Q' input (See 4)
    next = True
    bActive = 1                      # Set breakpoint active mode
    while next == True:              # Get breakpoint as either label or PC value
        y = input('Breakpoint ')
        y = y.upper()
        bPt.append(y)                # Put breakpoint (upper-case) in table
        if y == 'Q': next = False
if deBug == 1:                       # Display breakpoint table if in debug mode
    print ('\nBreakpoint table')
    for i in range (0,len(bPt)): print(bPt[i])
    print()
print()
The memProc() function deals with the data memory and allows you to
store data in memory and even ASCII code. This function processes
assembler directives:
def memProc(src):                    # Memory processing
    return()
Here, we carry out the usual cleaning up of the source text in the
assembly language file and prepare the text for later parsing and analysis.
Note that we use a regular expression to remove multiple spaces. Regular
expressions are not used elsewhere in this book, but they are worth
investigating if you are doing extensive text processing:
for i in range(0,len(src)):                          # Remove comments from source
    src[i] = src[i].split('@',1)[0]                  # Split line on first occurrence of @ and keep first item
src = [i.strip(' ') for i in src ]                   # Remove leading and trailing spaces
src = [i for i in src if i != '']                    # Remove blank lines
src = [i.upper() for i in src]                       # Convert lower- to upper-case
src = [re.sub(' +', ' ',i) for i in src ]            # Remove multiple spaces (See 1)
src = [i.replace(', ',' ') for i in src]             # Replace commas space by single space
src = [i.replace('[','') for i in src]               # Remove [ in register indirect mode
src = [i.replace(']','') for i in src]               # Remove ]
src = [i.replace(',',' ') for i in src]              # Replace commas by spaces
src = [i for i in src if i[0] != '@']                # Remove lines with just a comment
src = [i.split(' ') for i in src]                    # Tokenize
if deBug == 1:                                       # If in debug mode print the source file
    print('\nProcessed source file\n')
    [print(i) for i in src]
# Initialize key variables
# memP program memory, memD data memory
sTab = {}                                            # Set up symbol table for labels and equates
memP = [0] * 64                                      # Define program memory
memD = [0] * 64                                      # Define data memory
memPoint = 0                                         # memPoint points to next free location
[sTab.update({i[1]:i[2]}) for i in src if i[0] == '.EQU']    # Scan source file and deal with equates
src = [i for i in src if i[0] != '.EQU']             # Remove equates from source
src = memProc(src)                                   # Deal with memory-related directives
for i in range (0,len(src)):                         # Insert labels in symbol table
    if src[i][0][-1]== ':': sTab.update({src[i][0][0:-1]:i})   # Remove the colon from labels
print('\nSymbol table\n')
for x,y in sTab.items(): print("{:<8}".format(x),y)  # Display symbol table
if deBug == 1:
    print("\nListing with assembly directives removed\n")
    for i in range(0,len(src)):                      # Step through each line of code
        z = ''                                       # Create empty string for non-labels
        if src[i][0][-1] != ':': z = '        '      # Create 8-char empty first field
        for j in range(0,len(src[i])):               # Scan all tokens of instruction
            y = src[i][j]                            # Get a token
            y = y.ljust(8)                           # Pad it with spaces with a width of 8 characters
            z = z + y                                # Add it to the line
        print(str(i).ljust(3),z)                     # Print line number and instruction
if deBug == 1:                                       # Display data memory for debugging
    print("\nData memory")
    [print(memD[i]) for i in range(0,memPoint+1)]    # print pre-loaded data in memory
    print()
#### MAIN ASSEMBLY LOOP
if deBug == 1: print('Assembled instruction\n')      # If in debug mode print heading 4
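The source clean-up steps at the start of this listing can be tried out on a single line of text. The following sketch is a simplified, hypothetical helper (clean_line is not part of the simulator) that strips a trailing @ comment, treats commas as separators, and uses re.sub with the pattern ' +' to collapse runs of spaces before tokenizing.

```python
import re

def clean_line(line):
    """Strip a trailing @ comment, normalize spacing and commas, and tokenize."""
    line = line.split('@', 1)[0]         # Keep the text before the first @
    line = line.strip().upper()          # Trim the ends and convert to upper-case
    line = line.replace(',', ' ')        # Treat commas as separators
    line = re.sub(' +', ' ', line)       # Collapse runs of spaces into one space
    return line.split(' ') if line else []

print(clean_line('loop:   add  r1, r2,r3   @ sum'))
# Prints ['LOOP:', 'ADD', 'R1', 'R2', 'R3']
```

The pattern ' +' matches one or more consecutive spaces, so each run is replaced by a single space and the later split(' ') never produces empty tokens.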
Now, we can use opFormat to extract the required parameters from the
predicate:
# OP-CODE FORMATS
if opFormat[0] == 1:                       # Type 1 single register rD: inc r0
    rD = get_reg(pred,0)
if opFormat[0] == 2:                       # Type 2 literal operand: BEQ 24
    lit = get_lit(pred,-1)
if opFormat[0] == 3:                       # Type 3 two registers rD, rS1: MOV r3,R0
    rD = get_reg(pred,0)
    rS1 = get_reg(pred,1)
if opFormat[0] == 4:                       # Type 4 register and literal Rd, lit: LDRL R1,34
    rD = get_reg(pred,0)
    lit = get_lit(pred,-1)
if opFormat[0] == 5:                       # Type 5 three registers Rd, Rs1 Rs2: ADD R1,R2,R3
    rD = get_reg(pred,0)
    rS1 = get_reg(pred,1)
    rS2 = get_reg(pred,2)
if opFormat[0] == 6:                       # Type 6 two registers and lit Rd, Rs1 lit: ADD R1,R2,lit
    rD = get_reg(pred,0)
    rS1 = get_reg(pred,1)
    lit = get_lit(pred,-1)
if opFormat[0] == 7:                       # Type 7 two registers and lit Rd, Rs1 lit: LDR R1,(R2,lit)
    rD = get_reg(pred,0)
    pred[1] = pred[1].replace('(','')      # Remove brackets
    pred[2] = pred[2].replace(')','')
    rS1 = get_reg(pred,1)
    lit = get_lit(pred,-1)
if opFormat[0] == 8:                       # Type 8 UNDEFINED
    pass
This block initializes variables, registers, memory, and the stack pointer
before we enter the code execution loop. Note that we create a stack with
16 entries. The stack pointer is set to 16, which is one below the bottom
of the stack. When the first item is pushed, the stack pointer is pre-
decremented to 15, the bottom of the available stack area:
r = [0] * 8                              # Register set
stack = [0] * 16                         # stack with 16 locations See 7
sp = 16                                  # stack pointer initialize to bottom of stack + 1
lr = 0                                   # link register initialize to 0
run = 1                                  # run = 1 to execute code
pc = 0                                   # Initialize program counter
z,c,n = 0,0,0                            # Clear flag bits. Only z-bit is used
while run == 1:                          # Main loop
    instN = memP[pc]                     # Read instruction
    pcOld = pc                           # Remember the pc (for printing)
    pc = pc + 1                          # Point to the next instruction
    op = (instN >> 25) & 0b1111111       # Extract the op-code (7 most-significant bits)
    rD = (instN >> 22) & 0b111           # Extract the destination register
    rS1 = (instN >> 19) & 0b111          # Extract source register 1
    rS2 = (instN >> 16) & 0b111          # Extract source register 2
    lit = (instN ) & 0xFFFF              # Extract literal in least-significant 16 bits
    rDc = r[rD]                          # Read destination register contents
    rS1c = r[rS1]                        # Read source register 1 contents
    rS2c = r[rS2]                        # Read source register 2 contents
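The shifts and masks above imply a 32-bit instruction layout: a 7-bit op-code in bits 31 to 25, three 3-bit register fields in bits 24 to 16, and a 16-bit literal in bits 15 to 0. The following sketch packs and unpacks such a word. The decode function repeats the extraction shown above; the encode helper is hypothetical and is included only so that the decoder has something to work on:

```python
def encode(op, rD=0, rS1=0, rS2=0, lit=0):
    """Pack the fields into one 32-bit word (hypothetical helper, not from TC4)."""
    return (((op & 0x7F) << 25) | ((rD & 0b111) << 22) | ((rS1 & 0b111) << 19)
            | ((rS2 & 0b111) << 16) | (lit & 0xFFFF))

def decode(instN):
    """Unpack a 32-bit word using the same shifts and masks as the main loop."""
    op  = (instN >> 25) & 0b1111111      # 7 most-significant bits
    rD  = (instN >> 22) & 0b111          # Destination register
    rS1 = (instN >> 19) & 0b111          # Source register 1
    rS2 = (instN >> 16) & 0b111          # Source register 2
    lit = instN & 0xFFFF                 # 16-bit literal
    return op, rD, rS1, rS2, lit

word = encode(0b0110000, rD=3, lit=0x1234)   # Op-code 0b0110000 with register 3
print(hex(word), decode(word))
```

Every field is extracted on every cycle, whether or not the current instruction uses it. This keeps the decoder simple at the cost of a few redundant operations.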
In the following code we’ve included stack-based operations for the sake
of demonstrating stack usage and for versatility:
    if op == 0b0100000:                  # BSR
        sp = sp - 1
        stack[sp] = pc
        pc = lit
    if op == 0b0100001:                  # RTS
        pc = stack[sp]
        sp = sp + 1
    if op == 0b0100010:                  # PUSH (See 7)
        sp = sp - 1
        stack[sp] = rDc
    if op == 0b0100011:                  # POP (See 7)
        r[rD] = stack[sp]
        sp = sp + 1
    if op == 0b0100100:                  # BL branch with link (See 10)
        lr = pc
        pc = lit
    if op == 0b0100101:                  # RL return from link
        pc = lr
    if op == 0b0110000:                  # INC
        r[rD] = alu(rDc,1,1)
    if op == 0b0110001:                  # DEC
        r[rD] = alu(rDc,1,2)
    if op == 0b0000011:                  # PRT r0 displays the ASCII character in register r0 See 11
        character = chr(r[rD])
        print(character)
    if op == 0b0000000:                  # STOP
        run = 0
    # END OF CODE EXECUTION Deal with display
    if bActive == 1:                     # Are breakpoints active?
        if src[pcOld][0] in bPt:         # If the current label or mnemonic is in the table
            display()                    # display the data
        if str(pcOld) in bPt:            # If the current PC (i.e., pcOld) is in the table display
            display()
    if trace == 1:                       # If in trace mode, display registers
        x = input('<< ')                 # Wait for keyboard entry (any key will do)
        display()                        # then display current operation
    elif bActive != 1: display()         # If not trace and not breakpoints, display registers
    if run == 0:                         # Test for end of program See 12
        print('End of program')          # If end, say 'Goodbye'
        sys.exit()                       # and return
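The pre-decrement push and post-increment pop used by BSR, RTS, PUSH, and POP can be exercised on their own. This is a sketch rather than TC4's code: the Stack class and its overflow and underflow checks are assumptions added for illustration (the simulator above performs no such checks):

```python
class Stack:
    """A 16-entry stack that grows downward, as in the simulator above."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = size                   # One past the last entry: the stack is empty

    def push(self, value):
        if self.sp == 0:                 # Check added for illustration only
            raise OverflowError("stack full")
        self.sp -= 1                     # Pre-decrement, then store
        self.mem[self.sp] = value

    def pop(self):
        if self.sp == len(self.mem):     # Check added for illustration only
            raise IndexError("stack empty")
        value = self.mem[self.sp]        # Read, then post-increment
        self.sp += 1
        return value

s = Stack()
s.push(10)
s.push(20)
sp_after_pushes = s.sp                   # Two items pushed: 16 - 2
first, second = s.pop(), s.pop()         # Items come back in reverse order
print(sp_after_pushes, first, second, s.sp)
```

With this convention, the stack pointer always points at the most recently pushed item, and an empty stack is recognized by sp being equal to the stack size.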
Comments on TC4
We have not provided a detailed discussion of this program because it
follows the same pattern as earlier simulators. However, we have
highlighted some of its principal features. The following numbers
correspond to the numbers (at the end of the comment field) in the
shaded lines of the code:
6. The function processes the source code and deals with assembly
directives related to setting up the data memory, for example, loading
data values with the .WORD directive. It also supports storing an
ASCII character in memory and reserving a named memory location
for data.
8. These directives are removed from the source code after they have
done their job.
9. We have created a very simple ALU that implements only add and
subtract. This was done to keep the program small and concentrate
on more interesting instructions in this final example. Simple logic
operations are directly implemented in the code execution of the
program, in the style
10. TC4 provides several stack operations (push and pop). We initially
create a separate stack. TC4’s stack does not use the data memory.
This feature is for demonstration and can be expanded.
13. We have provided an interesting branch and return operation like that
of the ARM’s branch with link. The BL operation jumps to a target
address and saves the return address in a special register called the
link register, lr. At the end of the subroutine, the RL (return from
link) instruction returns to the instruction after the call. This
mechanism allows only one level of call, because a second call made
before the return would overwrite the return address in the link
register.
15. When the program has been executed, the sys.exit() library
function exits the program.
16. Here’s an example of code that can be executed by TC4. It’s been
badly set out in order to test TC4’s ability to process text:
@ TC4_test
@ 31 Oct 2021
.equ abc 4
.word aaa abc
.word bbb 5
.dsw dataA 6 @ data area to store numbers
.word end 0xFFFF
ldrl r0,0xF
addl r1,r7,2
bl lk
back: rnd r0
ldrl r3,dataA @ r3 points at data area
ldrm r4,bbb @ r4 contains value to store
ldrl r5,4 @number of words to store
loop: nop @
bsr sub1
dec r5
bne loop
stop
sub1: stri r4,[r3,0]
inc r3
addl r4,r4,2
cmpl r4,9
bne skip
addl r4,r4,6
skip: rts
lk: ldrl r6,%11100101
andl r7,r6,0xF0
rl
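Comment 13 notes that the link register supports only a single level of subroutine call. The following sketch contrasts a nested call made with a link register against the same call made with a stack. It is not part of TC4, and the addresses (3 and 11) are made up for illustration:

```python
def nested_call_with_link_register():
    """Return the address reached by the outer RL after main calls f and f calls g."""
    lr = 3            # BL f at address 2 saves return address 3 in lr
    lr = 11           # BL g at address 10 (inside f) overwrites lr
    # RL at the end of g correctly returns to 11.
    # RL at the end of f reads lr again, which still holds 11, not 3.
    return lr

def nested_call_with_stack():
    """Return the address reached by the outer RTS for the same call sequence."""
    stack = []
    stack.append(3)   # BSR f pushes return address 3
    stack.append(11)  # BSR g pushes return address 11
    stack.pop()       # RTS from g pops 11
    return stack.pop()  # RTS from f pops 3, the correct return address

print(nested_call_with_link_register(), nested_call_with_stack())
```

This is why the test program above uses bl and rl only for the single-level call to lk, and bsr and rts for the call to sub1.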
Summary
In this chapter, we have extended our overview of simulator design. We
started with one of the simplest simulators of them all, the zero-address
machine; that is, the stack computer, TC0. This simulator is not a true
computer, because it does not include conditional and branch operations.
However, it demonstrates the use of the stack as a means of performing
chained calculations.
In the next chapter, we’ll change course and introduce the ARM-based
Raspberry Pi computer, which can be used to write programs in
Python, and learn how to program a real 32-bit ARM microprocessor in
assembly language.
Part 2: Using Raspberry Pi to Study a
Real Computer Architecture
Now we will turn our attention to a real computer – the ARM that is
embedded at the heart of the Raspberry Pi single-board computer. We
begin by looking at the Raspberry Pi itself and explain how you can
enter an assembly program, execute it, and observe its execution by
examining registers and memory during the process of executing
instructions. Then, we look at the ARM computer in greater detail. First,
we examine the ARM’s instruction set, and then we demonstrate its
addressing modes and how it accesses memory. Finally, we provide
in-depth coverage of the way in which subroutines are handled by the
ARM.
This is not a handbook for Raspberry Pi. We are interested only in using
it to enter assembly language programs, run them, and observe their
behavior. We do not cover Raspberry Pi’s Windows-style GUI because it
is very similar to the corresponding PC and macOS user interfaces.
Moreover, the Raspberry Pi operating system includes utilities and a web
browser.
Technical requirements
This chapter is based on the Raspberry Pi 4. The software we use should
also be compatible with the earlier 3B model. In order to use Raspberry
Pi, you will need the following:
Raspberry Pi 4 (available with 2 GB, 4 GB, and 8 GB DRAM)
USB mouse
USB keyboard
The text was written using NOOBS (New Out Of the Box Software).
The Raspberry Pi Foundation no longer supports NOOBS and
recommends that you download the latest version of the operating
system using Raspberry Pi Imager, which runs under macOS, Windows,
and Ubuntu. You can find the necessary information at
https://www.raspberrypi.org/.
Raspberry Pi basics
Microcomputers have been around since the 1970s. In the 1970s, several
systems aimed at the enthusiast based on the Z80, 6502, and 6809 8-bit
microprocessors appeared. Operating systems, apps, and the web didn’t
exist then.
Then, in the late 1970s, Intel introduced the 8086 and Motorola its 68000
16-bit CPU (the 68000 microprocessor actually had a 32-bit instruction
set architecture, but Motorola marketed it initially as a 16-bit machine;
in my view, this was a catastrophic marketing mistake). 16-bit computers
were a giant leap up from their 8-bit predecessors for two reasons. First,
the technology had advanced, permitting designers to put far more
circuitry on a chip (i.e., more registers, more powerful instruction sets,
etc.), and second, processors were far faster due to the reduction in
feature size (i.e., smaller transistors). Finally, the declining cost of
memory meant that people could run larger and more sophisticated
programs.
In the 1960s, the giant corporation IBM was famous for its large-scale
data-processing machines. However, IBM wanted a change of direction
and IBM’s engineers decided to build a PC around Motorola’s 68000
processor. Unfortunately for Motorola, a version of that chip wasn’t yet
in production. Intel released the 8088, a version of its 16-bit 8086
processor with an 8-bit data bus, which made it easy to create a low-cost
microcomputer using 8-bit peripherals and memory components. The
8088 still had a 16-bit architecture but was able to interface to 8-bit
memory and I/O devices.
IBM formed a relationship with Intel, and the IBM PC in all its beige-
colored splendor arose in 1981. Unlike Apple, IBM created an open
architecture that anyone could use without paying a royalty. And a
million PC clones flowered. The rest is history. However, the PC and
Apple’s Mac left a hole in the market: an ultra-low-cost computer that
the young, the student, the experimenter, and the enthusiast can play
with. Raspberry Pi plugs this gap.
Low-cost computing has been around for a long time. For a few dollars,
you can buy a greeting card that plays “Happy Birthday” when you open
it. High-performance computing is more expensive. The cost of a
computer often lies not in the processor but in the supporting cast of
components and systems required to convert a microprocessor into a
computer system – in particular, the graphics and display interface, the
memory interface, and the communications interface (input/output).
That’s why the Raspberry Pi has been such an amazing success. On a
tiny, low-cost board, you have all the peripherals and interfaces that you
need to create a complete system comparable to a PC (although not in
terms of performance).
Video display and graphics logic system (you just need to plug the
card into a monitor)
Bluetooth 5.0
Ethernet port
In the 1980s, the Free Software Foundation, headed by Richard Stallman,
led the development of the GNU operating system, which was designed to
provide an open source version of Unix. In 1991, Linus Torvalds
released Linux, an open source kernel that supplied the piece GNU still
lacked.
Today, the Linux kernel plus the GNU tools and compilers have become
a free, open source alternative to proprietary operating systems such as
Windows. GNU/Linux is available in different flavors (distributions
written by various groups with the same basic structure but different
features). The original official Raspberry Pi operating system was called
Raspbian and is based on a version of Debian Linux optimized for
Raspberry Pi.
Another useful package is the Geany editor, which has built-in support
for more than 50 programming languages. You can get Geany at
https://www.geany.org/.
There is also a Terminal emulator window that lets you operate in the
Linux command-line mode – a feature that is useful when working with
the ARM assembly language utilities. Figure 9.2 shows the Raspberry Pi
screen on a 4K monitor with several windows open.
Figure 9.2 – Screenshot of Raspberry Pi’s multiple windows
While writing this book, I was also introduced to Visual Studio Code,
which is an editor and debugging platform. Visual Studio Code is free
and available on Linux, macOS, and Windows platforms. Figure 9.3
shows an example of a session using Visual Studio Code to write a
Python program.
Figure 9.3 – A VS Code session while developing a Python program
The top-level folder is / and is called the root folder. The / forward
slash is used to navigate the filing system in much the same way as the
backslash in Windows. A big difference between Linux and Windows is
that, in Linux, you don’t have to specify the disk on which the file
resides (e.g., Windows invariably uses C:\ for operating system files).
In Figure 9.4,
the MyFile.doc file is a text file whose location is
/home/pi/Documents/MyFile.doc.
Directory navigation
If you press the enter key, Raspberry Pi responds with an “I am here”
prompt, as shown in this example:
pi@raspberrypi:/var/log/apt $
This prompt gives the device name and the path to the current directory
(in bold font in this example). You can change the active directory with
the cd (change directory) command, as shown in this example:
cd .. # This means change directory to parent (the node above)
cd home
cd pi
cd Desktop
To list the files and subdirectories in the current directory, you can use
the ls command (list files).
File operations
We now introduce some of Linux’s basic file commands. The pwd
command looks as if it should mean password. Actually, it means print
working directory and displays the full path of the current directory. It’s a
“where am I?” command. Entering pwd will generate a response such as
/home/pi.
To create a new subdirectory, you use the mkdir command. Typing mkdir
newFolder creates a subdirectory called newFolder in the current
directory.
Typing rm tempData.py deletes the tempData.py file in the current
subdirectory. You can remove an entire directory with rm -r. This
deletes the named directory and everything in it, and is not
reversible. It is a dangerous command. The alternative is rm -d, which
removes the named directory only if it is empty (i.e., you must first
delete its contents).
Linux has a help command, man (i.e., manual) that provides details of
another command; for example, man ls would provide details of the ls
command.
In general, when working with Raspberry Pi, most users will be using the
graphical interface. However, we will be using the command-line input
to set up the Raspberry Pi and assemble, debug, and execute assembly
language programs.
NOTE
sudo apt-get update updates packages but does not install them.
To install a new package on Raspberry Pi, you use the apt-get install
The -h parameter tells the system to enter the halt state, and the now
parameter requests an immediate halt. A command to shut down is sudo
shutdown -h now. To reboot Raspberry Pi, you can enter either of the
following two
commands. These commands have the same effect on a single-user
system. You would use shutdown -r on a multi-user system:
sudo shutdown -r now
sudo reboot
First, you have to create an assembly language program in text form with
a .s file type. There are many text editors and the one you choose is a
personal preference. I initially used Geany, which is an IDE for
languages such as C. I later used Thonny on my desktop PC. Both Geany
and Thonny are excellent tools. If you create a text file on a desktop PC
(or any other device), you simply change the .txt extension to .s to
make it compatible with RPi’s assembler.
Note that ARM uses mov to load a literal and not ldr (as you might
expect).
The two GCC commands we need to assemble the source myProg.s text
file are as follows:
as -o myProg.o myProg.s
ld -o myProg myProg.o
The first command, as, takes the assembly language source file,
myProg.s, and creates an object code file, myProg.o. The second
command, ld, invokes a linker that uses the object file to create a binary
code file, myProg, that can be executed. The -o option is necessary to
build an output file. You can then run the assembled binary code
program by typing ./myProg.
./myProg ; echo $?
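The echo $? command prints the exit status of the program that has just
run. The print command is echo, and $? is a shell variable that holds
the exit status of the last command executed; here, that is the value
the program handed back to the operating system when it exited. Other
shell variables can be printed in the same way, but they hold shell
values, not ARM registers; for example, $3 is the third argument passed
to a shell script and not the contents of register r3.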
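The exit status that $? reports can also be read from Python, which is convenient when you want to check many test programs automatically. The following is a minimal sketch: because no assembled myProg binary is assumed to exist here, a short Python child process that exits with status 4 stands in for it:

```python
import subprocess
import sys

# A stand-in for ./myProg: a child process that exits with status 4
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(4)"])

status = child.returncode          # The same value the shell exposes as $?
print(status)
```

To test a real program, the command list would be replaced by the path to the executable, for example ["./myProg"].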
When you load a program into gdb, nothing appears to happen. If you try
to look at your assembly language code or the registers, you will get an
error message. You must explicitly run the program first.
Command   Effect
quit      Quit: leave the gdb debugger and return to the shell. Ctrl + D also exits gdb.
next      Single step (execute one instruction). This does not trace a function.
The label field beginning in the first column (bold in the preceding code)
provides a user-defined tag that must be terminated by a colon. The label
field is followed by the instruction consisting of an operation and any
required operands. It doesn’t matter if there is more than one space after
commas in argument lists. The text following the @ symbol is a comment
field and is ignored by the assembler. The GCC compiler also supports
the C language style of comments: text delimited by /* */ characters, as
this example shows.
Table 9.2 describes some of the ARM’s instructions. There is only one
surprise here; the mla multiply and add instruction that specifies four
registers. It multiplies two registers together and adds a third register,
and then puts the sum in a fourth register; that is, it can calculate A = B +
C.D:
Table 9.2 – ARM data processing, data transfer, and compare instructions
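The effect of mla is easy to model. The following sketch computes the multiply-and-add result and truncates it to 32 bits, as the destination register would hold it. The function name and the sample values are illustrative:

```python
MASK32 = 0xFFFFFFFF                    # Registers are 32 bits wide

def mla(rm, rs, rn):
    """Return rm * rs + rn truncated to 32 bits."""
    return (rm * rs + rn) & MASK32

small = mla(6, 7, 100)                 # 6 * 7 + 100
wrapped = mla(0x10000, 0x10000, 5)     # The product needs 33 bits, so it wraps
print(small, hex(wrapped))
```

The second call shows that only the low-order 32 bits of the result are kept, so a large product can silently overflow.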
By declaring a label as global, you are telling the linker that this label is
visible to other modules and they can refer to it. Labels without a global
directive are local to the current module and invisible to all other
modules; that is, you could use the same label in two modules and there
would not be a conflict.
The _start label indicates the point at which execution begins. The
linker and operating system deal with storing the program in memory;
that is, you don’t have to worry about where it is going to be actually
stored in the computer’s physical memory.
Finally, the last two operations (shaded) provide a means of getting back
to the operating system level once the code has been executed. ARM has
an svc instruction, which stands for service call and is used to invoke the
operating system. Most computers have an operation such as svc and it
has many names – for example, software interrupt. This instruction calls
the operating system and supplies one or more parameters. The
parameter can be part of the instruction itself or it can be loaded into a
register. When the operating system detects a service call, the parameter
is read and the appropriate operation is performed. This action is entirely
system dependent; that is, it is part of the operating system and not part
of the computer’s architecture.
In this case, the specific function required by the service call is pre-
loaded into r7. This mechanism is part of the Raspberry Pi’s operating
system.
Key points to note about the assembly language program are as follows:
Comments are preceded by an @ symbol (or the C language /* */
book ends)
The .global directive provides a label that indicates the entry point
of the program
The next step is to assemble and link the code, which we called a4.s (I
got fed up with typing long names and called the source program a4.s).
We can do this with the following:
pi@raspberrypi:~ $ cd Desktop                    # Change to Desktop directory
pi@raspberrypi:~/Desktop $ as -g -o a4.o a4.s    # Assemble the program a4.s
pi@raspberrypi:~/Desktop $ ld -o a4 a4.o         # Now link it to create executable
pi@raspberrypi:~/Desktop $ ./a4 ; echo $?        # Run the executable program a4
The text in bold is my input. These lines change the working directory to
Desktop where my source program is, and then assemble and link the
source program. The final line, ./a4 ; echo $?, runs the program and
prints its return value (showing it’s been successfully executed by
printing 4, the value in r0).
Figure 9.8 – The demonstration program in the Geany editor
The following four lines demonstrate how we call the gdb debugger and
set a breakpoint. Text in bold font indicates lines entered from the
keyboard. The other text is the debugger’s output:
pi@raspberrypi:~/Desktop $ gdb a4
Reading symbols from a4...done.
(gdb) b _start
Breakpoint 1 at 0x10074: file a4.s, line 14.
After entering a run command, the debugger begins execution and prints
the next line to be executed – that is, the line labeled by _start. The gdb
instruction i r (information registers) displays the ARM’s registers as
follows:
(gdb) i r
r0 0x0 0
r1 0x0 0
r2 0x0 0
r3 0x0 0
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
sp 0xbefff390 0xbefff390
lr 0x0 0
pc 0x10074 0x10074 <_start>
cpsr 0x10 16
fpscr 0x0 0
cpsr and fpscr are both status registers that contain information about
the state of the processor.
Finally, we will continue stepping until the code has been executed. You
can step just by using the enter key after the first si 1 command:
(gdb) si 1
21 mov r7,#1 @ Prepare to exit
(gdb)
22 svc 0 @ Go
(gdb)
[Inferior 1 (process 1163) exited with code 04]
gdb executes the mov and svc instructions, and the program exits. What
have we learned? This example demonstrates the following:
How to set breakpoints and run the code until a breakpoint is reached
Note that the code continues after my last instruction, svc. This is
because the disassembler reads a block of memory and displays it as
code (even if it is not part of your program). In this case, the data we
entered in memory with the .word directive is read and displayed as the
corresponding ARM instruction. Remember that the debugger does not
know whether a binary value in memory is an instruction or user data. If
it reads a data value corresponding to an instruction op-code, it prints
that op-code.
The bottommost panel contains the commands you enter. In this case,
I’ve used si 1 to step through the instructions.
Figure 9.9 – The TUI showing registers and memory contents
.equ symbol, value    Equates the symbolic name to its value (e.g., .equ hours, 24)
@ USING ADR @
        adr r3,v3           @ Load address of v3 into register r3 (a pseudo-instruction)
        ldr r4,[r3]         @ Read contents of v3 in memory
@ Read from memory, increment and store in next location
        ldr r0,adr_dat1     @ r0 is a pointer to dat1 in memory
        ldr r1,[r0]         @ Load r1 with the contents of dat1
        add r2,r1,#1        @ Add 1 to dat1 and put in r2
        add r3,r0,#4        @ Use r3 to point to next memory location after dat1
        str r2,[r3]         @ Store new data in location after dat1
"\n"
@ This string has 15 characters
This code illustrates several points – for example, the use of assembler
directives such as .equ, which binds a symbolic name to a value. I’ve
shaded interesting blocks of code so that we can discuss them.
We have used ARM’s pseudo-instructions. These are adr r3,v3 and ldr
@ PRINT STRING @
The first block demonstrates how we can print data from an assembly
program. Well, in fact, we can’t print the data but we can ask the
operating system to do it for us. Most processors have an instruction
called a software interrupt (or a system call, a trap, an exception, or an
extra code). All these terms refer to the same thing: an instruction
inserted by the programmer that invokes the operating system. In the
case of ARM, it’s the svc instruction (previously called swi). When used
by Linux, this instruction is called with the parameter 0 – that is svc 0.
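A high-level language makes the same kind of request to the operating system, although the system call is hidden inside a library function. The following sketch uses Python's os.write, which asks the operating system to write a sequence of bytes to a file descriptor (descriptor 1 is the console). The write_string helper is a name chosen for illustration, and the string is the 15-character message used in this chapter's example:

```python
import os

def write_string(fd, text):
    """Ask the OS to write text to file descriptor fd; return the byte count."""
    data = text.encode("ascii")        # One byte per character
    return os.write(fd, data)          # The library performs the system call for us

STDOUT = 1                             # File descriptor 1 is standard output
count = write_string(STDOUT, "\nTest printing\n")
```

As in the assembly language version, the caller supplies three things: where to write, the location of the data, and the number of bytes to write.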
@ USING ADR @
The second block demonstrates the use of an adr pseudo-instruction with
adr r3,v3. We are going to load register r3 with the address of a variable
we’ve called v3 and loaded into memory with a .word directive. One
practical consideration is that when you disassemble the code, you will
not see adr; you’ll see the actual code that the ARM assembler translated
it into.
Putting the address of the v3 variable into a register means we can use
that register as a pointer with a load instruction; for example,
ldr r4,[r3] loads the value of the variable (i.e., 0x1111) into r4. If you wish to
modify that variable, you might think that you could store it back in
memory with str r5,[r3]. Sadly not! The adr instruction generates code
that allows you to access only the current segment of the program. That
segment is read-only because it contains the code. You cannot alter
memory in that segment. If you wish to modify memory, you have to use
a different technique, as we will soon see.
This stores the 0x1234 value in memory and gives it the name dat1. As
we have seen, that name is used to create the address of the variable in
the code section by the following:
adr_dat1: .word dat1
The next step is to run the code. We’ve done this and have provided an
edited output from the session (removing empty prompt lines between
operations and some text) in Listing 9.1:
pi@raspberrypi:~ $ cd Desktop
pi@raspberrypi:~/Desktop $ as -g -o t1a.o t1a.s
pi@raspberrypi:~/Desktop $ ld -o t1a t1a.o
pi@raspberrypi:~/Desktop $ gdb
GNU gdb (Raspbian 8.2.1-2) 8.2.1
(gdb) file t1a
Reading symbols from t1a...done.
(gdb) b _start
Breakpoint 1 at 0x10074: file t1a.s, line 7.
(gdb) r 1
Starting program: /home/pi/Desktop/t1a 1
Breakpoint 1, _start () at t1a.s:7
7 _start: mov r0,#23          @ Just three dummy operations for debugging
(gdb) si 1
8         mov r1,#v1
9         add r2,r0,r1
11        ldr r1,=banner      @ Test display function (r1 has address of a string)
12        mov r2,#15          @ Number of characters to print (13 plus two newlines)
13        mov r0,#1           @ Tell the OS we want to print on console display
14        mov r7,#4           @ Tell the OS we want to perform a print operation
15        svc 0               @ Call the operating system to do the printing
Test printing
17        adr r3,v3           @ Load address of v3
18        ldr r4,[r3]         @ Read its contents in memory
21        ldr r0,adr_dat1     @ r0 is a pointer to dat1 in memory
22        ldr r1,[r0]         @ Load r1 with the contents of data1
23        add r2,r1,#1        @ Add 1 to dat1 and put in r2
24        add r3,r0,#4        @ Use r3 to point to next memory location after dat1
25        str r2,[r3]         @ Store new data in location after dat1
(gdb) i r r0 r1 r2 r3
r0 0x200e8 131304
r1 0x1234 4660
r2 0x1235 4661
r3 0x200ec 131308
(gdb) si 1
28        mov r0,#0           @ Exit status code (indicates OK)
29        mov r7,#1           @ Tell the OS we want to return from this program
30        svc 0               @ Call the operating system to return
(gdb) x/2xw 0x200e8
0x200e8: 0x00001234 0x00001235
(gdb) si 1
[Inferior 1 (process 7601) exited normally]
Accessing memory
We have demonstrated how you can step through a program and display
registers as instructions are executed. For example, gdb lets you display
the contents of registers r0 to r3 using the i r r0 r1 r2 r3 command.
We will now demonstrate how the contents of memory locations can be
displayed.
In Listing 9.1, we single-step the code through the first few instructions
(memory access and store operations) and then, after line 25, we can see
that the address of the dat1 variable is 0x200e8. Suppose we want to
check that its value is 0x1234, and that the next word location 4 bytes on,
0x200ec, contains the 0x1235 value.
You might reasonably expect that the gdb command to read the memory
location is m 0x200e8. As you can see from Listing 9.1, the command is the
rather less memorable:
x/2xw 0x200e8 Read the contents of two memory locations
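The memory access command is x/ and the three parameters used here are
2xw: the number of items to display (2), the display format (x), and the
size of each item (w, a 4-byte word). The format and size letters
include the following: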
o octal
d decimal
x hexadecimal
u unsigned integer
s string
b byte
x/1xw 0x1234 Print one 4-byte word in hex form at address 0x1234
x/6xh 0x1234 Print six 2-byte values in hex form at address 0x1234
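To see what the count, format, and size parameters do, it can help to imitate the command in Python. The following sketch is not gdb: the examine function is a hypothetical stand-in that reads little-endian values from a bytes object, and the memory image simply reproduces the two words shown in Listing 9.1:

```python
import struct

def examine(memory, base, address, count, size):
    """Return count little-endian values of size 1, 2, or 4 bytes as hex strings."""
    code = {1: "B", 2: "H", 4: "I"}[size]            # struct codes for byte, halfword, word
    offset = address - base                          # Position of address within the image
    values = struct.unpack_from("<" + str(count) + code, memory, offset)
    return ["0x{:0{w}x}".format(v, w=size * 2) for v in values]

BASE = 0x200e8                                       # Address of the first byte in the image
memory = struct.pack("<II", 0x1234, 0x1235)          # Two consecutive 32-bit words

words = examine(memory, BASE, 0x200e8, 2, 4)         # In the spirit of x/2xw 0x200e8
halfwords = examine(memory, BASE, 0x200e8, 4, 2)     # In the spirit of x/4xh 0x200e8
print(words)
print(halfwords)
```

Reading the same eight bytes as four halfwords rather than two words shows that the size letter changes only how the bytes are grouped, not what is stored.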
The location counter is advanced by four bytes so that the next .word or
instruction will be placed in the next word in memory. The term location
counter refers to the pointer to the next location in memory when a
program is being assembled and is similar, in concept, to the program
counter.
You don’t have to use 32-bit values in the ARM programs. The .byte and
.hword assembler directives store a byte and a 16-bit halfword in
memory, respectively, as in this example:
Q1:  .byte 25          @ Store the byte 25 in memory
Q2:  .byte 42          @ Store the byte 42 in memory
Tx2: .hword 12342      @ Store the 16-bit halfword 12,342 in memory
Although you could use .byte to store text strings in memory, it would
be very clumsy because you would have to look up the ASCII value of
each character. The GCC ARM assembler provides a simpler
mechanism. The .ascii directive takes a string and stores each character
as an 8-bit ASCII-encoded byte in consecutive memory locations. The
.asciz directive performs the same function but appends a byte of all
0s as a terminator:
Mess1: .ascii "This is message 1"     @ Store string in memory
Because the ARM aligns all instructions on 32-bit word boundaries, the
.balign 4 directive is required to align whatever follows on the next
word boundary (the 4 indicates a 4-byte boundary). In other words, if
you store three 8-bit characters in memory, the .balign 4 command
skips a byte to force the next address to a 32-bit boundary. Note that
.balign 2 forces alignment on a halfword boundary (you can use
.balign 16, or any other power of 2, to force the next memory access to
be appropriately aligned).
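The address arithmetic behind alignment is a rounding-up operation. The following sketch computes the next aligned address for a power-of-2 boundary; the function name mirrors the directive, and the addresses are made up for illustration:

```python
def balign(address, boundary):
    """Round address up to the next multiple of boundary (boundary is a power of 2)."""
    return (address + boundary - 1) & ~(boundary - 1)

after_three_bytes = balign(0x1003, 4)    # Three bytes stored at 0x1000..0x1002
already_aligned = balign(0x1004, 4)      # An aligned address is left unchanged
halfword = balign(0x1003, 2)             # Next halfword boundary
sixteen = balign(0x1001, 16)             # Next 16-byte boundary
print(hex(after_three_bytes), hex(already_aligned), hex(halfword), hex(sixteen))
```

Adding boundary - 1 and then clearing the low-order bits pushes any unaligned address up to the next boundary while leaving an aligned address where it is.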
The following ARM code demonstrates storage allocation and the use of
the .balign 4 directive:
        .global _start          @ Tell the linker where we start from
        .text                   @ This is a text (code) segment
_start: mov r0,#XX              @ Load r0 with 5 (i.e., XX)
        mov r1,#P1              @ Load r1 with P1 which is equated to 0x12 or 18 decimal
        add r2,r0,r1            @ Just a dummy instruction
        add r3,r2,#YY           @ Test equate to ASCII byte (should be 0x42 for 'B')
        adr r4,test             @ Let's load an address (i.e., location of variable test)
        ldr r5,[r4]             @ Now, access that variable (which should be 0xBB)
Again:  b Again                 @ Eternal endless loop (terminate here)
        .equ XX,5               @ Equate XX to 5
        .equ P1,0x12            @ Equate P1 to 0x12
        .equ YY,'B'             @ Equate YY to the ASCII value for 'B'
        .ascii "Hello"          @ Store the ASCII byte string "Hello"
        .balign 4               @ Ensure code is on a 32-bit word boundary
        .ascii "Hello"          @ Store the ASCII byte string "Hello"
        .byte 0xAA              @ Store the byte 0xAA in memory
test:   .byte 0xBB              @ Store the byte 0xBB in memory
        .balign 2               @ Ensure code is on a 16-bit halfword boundary
        .hword 0xABCD           @ Store the 16-bit halfword 0xABCD in memory
last:   .word 0x12345678        @ Store a 32-bit hex value in memory
        .end
Let’s assemble, link, and run this code on a Raspberry Pi using gdb. The
first few lines from the terminal windows show the loading of the
program, setting a breakpoint, and executing in a single-step mode:
pi@raspberrypi:~ $ cd Desktop
pi@raspberrypi:~/Desktop $ as -g -o labels.o labels.s
pi@raspberrypi:~/Desktop $ ld -o labels labels.o
pi@raspberrypi:~/Desktop $ gdb labels
GNU gdb (Raspbian 8.2.1-2) 8.2.1
Reading symbols from labels...done.
(gdb) b _start
Breakpoint 1 at 0x10054: file labels.s, line 3.
(gdb) run 1
Starting program: /home/pi/Desktop/labels 1
Breakpoint 1, _start () at labels.s:3
3 _start: mov r0,#XX        @ Load r0 with 5 (i.e., XX)
(gdb) si 1
4         mov r1,#P1        @ Load r1 with 0x12 (i.e., P1)
5         add r2,r0,r1      @ Dummy instruction (r2 is 5+0x12=0x17)
6         add r3,r2,#YY     @ Dummy instruction (r3 is 0x17+0x42=0x59)
7         adr r4,test
8         ldr r5,[r4]
Again () at labels.s:9
9 Again:  b Again           @ Eternal endless loop (enter control-C to exit)
So far, so good. Let’s see what the registers hold. We have deleted lines
with registers that we’re not interested in to make the output more
readable:
(gdb) i r
r0 0x5 5
r1 0x12 18
r2 0x17 23
r4 0x1007e 65662
r5 0xabcd00bb 2882339003
The problem is that ldr loads a 32-bit value into a register from memory.
The upper three bytes, 0xABCD00, are the null padding byte inserted by the
.balign 2 statement followed by the halfword 0xABCD that comes after 0xBB
in memory. We should have used a special “load a byte” instruction,
loaded four bytes and cleared three to zero, or aligned the byte correctly
in memory. The great strength of a computer is that it does what you tell
it. Alas, its great weakness is that…it does exactly what you tell it.
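The value in r5 can be reproduced by laying out the same four bytes in Python and reading them first as a 32-bit word and then as a single byte. This sketch assumes little-endian byte ordering (the least-significant byte is at the lowest address), which is how the Raspberry Pi stores data:

```python
import struct

# Bytes from the address of test onward: 0xBB, the padding byte inserted by
# .balign 2, and then the halfword 0xABCD stored low byte first
memory = bytes([0xBB, 0x00]) + struct.pack("<H", 0xABCD)

word = struct.unpack_from("<I", memory, 0)[0]   # What a 32-bit load reads: four bytes
byte = memory[0]                                # What a byte load would read: one byte
print(hex(word), hex(byte))
```

The word read picks up the three bytes that follow 0xBB in memory, whereas the byte read returns only the intended value. On the ARM, the byte-sized load instruction is ldrb.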
Next, we look at the data stored in memory using the x/7xw 0x1006c
We next look at a dilemma that affects all computers: how do you load a
constant (literal) that is the same size as the instruction word?
The ARM has 32-bit data words and instructions. You can’t load a 32-bit
literal into an ARM register in one instruction because you can’t specify
both the operation and the data in one instruction. CISC processors chain
two or more instructions together; for example, a 16-bit machine might
take 2 instruction words to create a 32-bit instruction containing a 16-bit
operation and a 16-bit literal. Some processors load a 16-bit literal (load
high) with one instruction and then load a second 16-bit literal (load low)
with a second instruction. The computer then concatenates the high and
low halfword 16-bit values into a 32-bit literal.
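The load-high/load-low scheme described above can be sketched in Python. This is only an illustration of the idea, not the instruction set of any particular processor; the function names load_high and load_low are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def load_high(literal16):
    """Place a 16-bit literal in the upper half of a 32-bit register."""
    return (literal16 & 0xFFFF) << 16

def load_low(register, literal16):
    """Replace the lower half of a 32-bit register with a 16-bit literal."""
    return ((register & 0xFFFF0000) | (literal16 & 0xFFFF)) & MASK32

r0 = load_high(0x1234)       # First instruction:  r0 = 0x12340000
r0 = load_low(r0, 0x5678)    # Second instruction: r0 = 0x12345678
print(hex(r0))
```

Two instructions, each carrying a 16-bit literal, are enough to build any 32-bit constant.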
The ARM has two pseudo-instructions that can load a 32-bit value into a
register by letting the assembler generate the actual code needed to do
this. The pseudo-instruction adr (load address) has the format adr
rdestination,label, where label indicates a line (address) in the program.
adr lets the assembler generate the appropriate machine code and
relieves the programmer of some housekeeping. The adr uses the ARM’s
add or sub instruction together with PC relative addressing to generate
the required address. Program counter-relative addressing specifies an
address by its distance from the current instruction. The following code
fragment demonstrates the use of adr:
          adr r0,someData   @ Set up r0 to point to someData in memory
          ldr r1,[r0]       @ Read someData using the pointer in r0
          . .
someData: .word 0x12345678  @ Here’s the data
The following is the edited output of a gdb debugger session. The code
has been executed to completion and the register contents are as follows.
The righthand column displays the data in decimal form:
r0 0x0 0
r1 0x10078 65656
r2 0x1007c 65660
r3 0xabcddcba 2882395322
r4 0xffffffff 4294967295
r5 0xaaaaaaaa 2863311530
pc 0x10074 0x10074 <wait+8>
The pointer registers, r1 and r2, have been loaded with the addresses of
the two data elements in memory (i.e., Table1 and Table2). These
pointers have been used to retrieve the two elements, and you can see
from the debugger that the operation worked.
The first load instruction loads register r0 with data from memory 36
bytes from the current program counter. At that location, the assembler
has stored the 0x12345678 constant to be loaded.
Let’s look at the data in memory. We use the x/6xw 0x10080 gdb
command to display six words of memory from address 0x10080:
(gdb) x/6xw 0x10080
0x10080 <Table2+4>: 0x12345678 0xaaaaaaaa 0x00001141 0x61656100
0x10090: 0x01006962 0x00000007
This shows the 0x12345678 constant that has been loaded in memory
following the program, together with the other constants we loaded.
A note on endianism
We’ve not mentioned one topic yet - endianism. The term is borrowed
from Gulliver’s Travels where the world is divided into those who eat
their boiled eggs from the big end and those who eat their eggs from the
little end. This divides the world into mutually hostile big enders and
little enders (it is, of course, satire).
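The same question arises in computing: is the 32-bit value 0x12345678
stored in memory as the byte sequence 12 34 56 78 or as 78 56 34 12?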
Figure 9.11 illustrates three memory systems. In all three cases, memory
is byte-addressed. In the 32-bit version, we have two 32-bit values
representing 0x12345678 stored in memory at addresses 0x1010 and 0x1014.
Notice that the individual bytes of the stored word have different byte
addresses. A big-endian number is arranged so that the most-significant
byte, 0x12, is stored at the lowest address of the word, 0x1010. A
little-endian number is stored with the most-significant byte at the
highest address, 0x1013, and the least-significant byte, 0x78, at the
lowest address.
Figure 9.11 – Memory organization
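The two byte orders can be checked quickly in Python, whose int.to_bytes and int.from_bytes methods take the byte order as an argument. This is a small sketch for illustration and is not part of the ARM example that follows.

```python
value = 0x12345678

big = value.to_bytes(4, "big")         # Most-significant byte at the lowest address
little = value.to_bytes(4, "little")   # Least-significant byte at the lowest address

print(big.hex(" "))      # 12 34 56 78
print(little.hex(" "))   # 78 56 34 12

# Reading little-endian bytes as if they were big-endian gives a different number
wrong = int.from_bytes(little, "big")
print(hex(wrong))        # 0x78563412
```

The bytes are the same in both cases; only the address at which each byte is stored differs.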
This example also demonstrates how memory data is displayed and how
to use the memory display function to read data. We have used gdb and
copied various screens during the debugging. These have been put
together in what follows. We have removed some material (e.g., status
registers and registers not accessed) and have slightly edited the format
for readability.
.end
The first steps are to assemble and load the program (called endian) and
invoke the gdb debugger. We use bold font to indicate input from the
keyboard:
alan@raspberrypi:~/Desktop $ as -g -o endian.o endian.s
alan@raspberrypi:~/Desktop $ ld -o endian endian.o
alan@raspberrypi:~/Desktop $ gdb endian
GNU gdb (Raspbian 10.11.7) 10.1.90.20210103git
We can use gdb to set a breakpoint at _start and then run the program to
that breakpoint:
(gdb) b _start
Breakpoint 1 at 0x10074: file endian.s, line 2.
(gdb) r
Starting program: /home/alan/Desktop/endian
Breakpoint 1, _start () at endian.s:2
2 _start: adr r0,mike @ r0 points to ASCII string "Mike"
Let’s look at the program that is actually loaded into memory. This
differs slightly from the one we wrote because pseudo-operations have
been replaced by actual code. Note that the adr is translated into an add
by taking the program counter and adding the distance of the required
variable to the current pc to generate its address.
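As a rough check of that arithmetic, the following Python sketch computes the offset the assembler would need for adr r0,mike in this session. It uses the breakpoint address 0x10074 for the adr instruction and the address 0x100B8 that gdb reports for the string, and it assumes the usual ARM behavior that pc reads as the address of the current instruction plus 8. The function name adr_offset is hypothetical.

```python
def adr_offset(instruction_address, target_address):
    """Offset to add to pc so that pc + offset equals target_address.

    Assumes ARM state, where pc reads as the current instruction address plus 8.
    """
    pc_value = instruction_address + 8
    return target_address - pc_value

offset = adr_offset(0x10074, 0x100B8)
print(offset)   # 60, so adr r0,mike can become add r0,pc,#60
```

A negative result would correspond to a sub rather than an add.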
You can see the 0x12345678 constant loaded by the assembler and some
of the markers.
The next step is to look at the registers before the program runs to
completion. We do this with gdb’s i r command. There’s not much to
see yet (it’s a partial listing), as we’ve executed only the first few
instructions. However, r0 now contains a pointer to the ASCII text string
“Mike” at address 0x100B8. If you look back at that address, you see that
it contains 0x656b694d, which is ekiM. That’s what little-endian does!
(gdb) i r
r0 0x100b8 65720
r1 0x656b694d 1701538125
r2 0x65 101
r3 0x0 0
r4 0x0 0
sp 0x7efff360 0x7efff360
lr 0x0 0
pc 0x10080 0x10080 <_start+12>
Continuing single-stepping:
(gdb) si 1
7 ldrb r4,[r3] @ Read byte of test Pointer in r0
8 ldrb r5,[r3,#1] @ Read single byte 1 offset
9 ldrh r6,[r3] @ Read halfword of test
11 ldr r7,a_adr @ r7 points at address of testRW
12 ldr r8,=0x12345678 @ r8 loaded with 32-bit 0x12345678
(gdb) i r
r0 0x100b8 65720
r1 0x656b694d 1701538125
r2 0x65 101
r3 0x100bc 65724
r4 0xce 206
r5 0xfa 250
r6 0xface 64206
r7 0x200cc 131276
sp 0x7efff360 0x7efff360
lr 0x0 0
pc 0x10094 0x10094 <_start+32>
Let’s look at the memory in the data section. Register r7 points to
testRW in the read/write data area. That area starts 4 bytes before
testRW; that is, at 0x200CC - 4 = 0x200C8. The four words beginning at
that address are as follows:
(gdb) x/4xw 0x200c8
0x200c8: 0x99999999 0xffffffff 0x77777777 0xbbbbbbbb
Finally, we step through the instructions until we meet the nop at the end:
(gdb) si 1
13 str r8,[r7] @ Store r8 in read/write memory at testRW
14 ldrh r9,[r7] @ Read halfword of testRW
15 mvn r10,r9 @ Logically negate r9
16 strh r10,[r7,#4] @ Store halfword in next word after testRW
17 nop @ Just a place to stop
Here’s our final look at the data memory. Note that 0xFFFFFFFF has been
replaced with the value 0x12345678 that we wrote to memory. This
demonstrates how you can access data memory using an ARM.
Note also the data value at 0x200D0; that is, 0x7777a987. We have
changed half the word using a halfword store:
(gdb) x/4xw 0x200c8
0x200c8: 0x99999999 0x12345678 0x7777a987 0xbbbbbbbb
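The last four instructions can be modeled in Python to confirm the memory contents gdb displays. This is a minimal sketch, assuming a little-endian memory that initially holds the four words from the first x/4xw display; the helper names store_word, load_half, and store_half are hypothetical.

```python
import struct

BASE = 0x200C8
# The four little-endian words first displayed by x/4xw 0x200c8
memory = bytearray(struct.pack("<4I", 0x99999999, 0xFFFFFFFF, 0x77777777, 0xBBBBBBBB))

def store_word(addr, value):
    struct.pack_into("<I", memory, addr - BASE, value & 0xFFFFFFFF)

def load_half(addr):
    return struct.unpack_from("<H", memory, addr - BASE)[0]

def store_half(addr, value):
    struct.pack_into("<H", memory, addr - BASE, value & 0xFFFF)

r7 = 0x200CC                  # Pointer to testRW
r8 = 0x12345678
store_word(r7, r8)            # str  r8,[r7]
r9 = load_half(r7)            # ldrh r9,[r7]       gives 0x5678
r10 = ~r9 & 0xFFFFFFFF        # mvn  r10,r9        gives 0xFFFFA987
store_half(r7 + 4, r10)       # strh r10,[r7,#4]   writes only 0xA987

words = struct.unpack("<4I", memory)
print([hex(w) for w in words])
```

Only the lower halfword of the word at 0x200D0 changes; its upper halfword keeps the original 0x7777 marker.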
Summary
In this chapter, we have introduced a real computer, the Raspberry Pi.
Instead of designing our own computer instruction sets, we’ve looked at
the ARM microprocessor that is at the heart of the Raspberry Pi and
most smartphones.
In the next chapter, we return to the ARM architecture and one of its
most important aspects: addressing and how data is transferred to and
from memory.
10
Technical requirements
Because this chapter is an extension of the previous chapter, no new
hardware or software is required. All you need is a Raspberry Pi
configured as a general-purpose computer. The only software needed is a
text editor to create assembly language programs and the GCC assembler
and loader.
Not only has ARM survived when many of the earlier microprocessors
failed – it has also prospered and successfully targeted the world of
mobile devices, such as netbooks, tablets, and cell phones. ARM
incorporates some interesting architectural features that have given it a
competitive advantage over its rivals.
ARM is, in fact, a fabless company – that is, it develops the architecture
of computers and allows other companies to manufacture those
computers. The term fabless is derived from fab (short for fabrication).
Because the ARM’s architecture has developed over the years, and
because there are different versions of the ARM architecture in use, a
teacher of it has a problem. Which version should be used to illustrate a
computer architecture course? In this chapter, we will use the ARMv4
32-bit architecture, which has 32-bit instructions. Some ARM processors
can switch between 32-bit and 16-bit instruction states (the 16-bit state is
called the Thumb state). The Thumb state is intended to run very
compact code in embedded control systems. We will not cover the
Thumb state here.
Arithmetic instructions
Bitwise instruction
Shifting operations
Register r15 is a truly different register from all the others and can never
be used as a general-purpose register (even though you can apply some
instructions to it as if it were general-purpose). Register r15 is the
program counter that contains the address of the next instruction to be
executed and is normally written pc rather than r15 in ARM code.
Putting the program counter in a general register is very rare in the world
of computer architecture. Note that, in practice, reading pc yields an
address that is 8 bytes ahead of the instruction currently being executed
because of the way that the ARM is internally organized.
We will look at the ARM’s data processing instructions first, rather than
the data movement operations. We take this approach because data
movement instructions are more complicated, since they involve
complex addressing modes.
Arithmetic instructions
Let’s begin with ARM’s arithmetic instructions that perform operations
on data representing numeric quantities:
Addition add
Subtraction sub
Multiplication mul
The second addition, adc, means add with carry, and it adds in any carry
out from the previous addition. We’ve used CL, AL, BL, and so on, rather
than r1, r2, and r3 to demonstrate that these are upper- and lower-order
parts of a number distributed between two registers. We can extend this
principle to perform extended-precision arithmetic with integers of any
length.
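The principle can be sketched in Python by adding two 64-bit numbers as pairs of 32-bit words. This is an illustration only; the function names adds, adc, and add64 are hypothetical and simply mirror the roles of the ARM instructions.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """32-bit add returning (result, carry_out), like an add that sets the carry."""
    total = a + b
    return total & MASK32, total >> 32

def adc(a, b, carry_in):
    """32-bit add with carry in, returning (result, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    c_lo, carry = adds(a_lo, b_lo)       # Lower-order words first
    c_hi, _ = adc(a_hi, b_hi, carry)     # Upper-order words plus the carry
    return c_hi, c_lo

c_hi, c_lo = add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(hex(c_hi), hex(c_lo))   # 0x2 0x0
```

Chaining further adc steps extends the same scheme to 96 bits, 128 bits, and so on.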
The ARM also provides a simple subtract operation, sub, together with an
sbc (subtract with carry) instruction to support extended-precision
subtraction, which operates like the corresponding adc.
As well as sub and sbc, the ARM has a reverse subtract operation, where
rsb r1,r2,r3 performs the subtraction of r2 from r3. This instruction
may seem strange and unnecessary because you can simply reverse the
order of the second two registers, can’t you? However, ARM lacks a
negation instruction that subtracts a number from zero; for example, the
negative of r0 is 0 – [r0]. The reverse subtraction operation can be used
to do this because rsb r1,r1,#0 is equivalent to neg r1.
The operation cmp r1,r2 evaluates [r1] – [r2] and updates the Z, C, N,
and V bits. We can then perform operations such as beq next that branch
to label next if r1 and r2 are equal. We said that you need to append s to
update condition codes. Comparison operations are exceptions because
setting condition codes is what they do. You can write cmps if you want,
since it’s the same as cmp.
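The effect of cmp on the four flags can be sketched in Python. This is a minimal model for 32-bit operands, not the processor's actual logic. It assumes the ARM convention that C is set when the subtraction does not need a borrow, and the function name cmp_flags is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Return the N, Z, C, and V flags that cmp a,b would produce for 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                      # Sign bit of the result
    z = int(result == 0)                  # Set if the operands are equal
    c = int(a >= b)                       # Set when no borrow occurs (unsigned a >= b)
    # Overflow: the operands differ in sign and the result's sign differs from a's
    v = ((a ^ b) & (a ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}

print(cmp_flags(5, 5))            # Z is set, so beq would be taken
print(cmp_flags(1, 2))            # N is set and C is clear
print(cmp_flags(0x80000000, 1))   # V is set: signed overflow
```

No result is stored anywhere; the flags are the only output of a comparison.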
There are two types of integer comparison. Consider (in 8 bits) the A =
Multiplication
ARM’s multiply instruction, mul Rd,Rm,Rs, generates the low-order 32
bits of the 64-bit product Rm x Rs. When using mul, you should ensure
that the result does not go out of range because multiplying two m-bit
numbers yields a 2m-bit product. This instruction doesn’t let you multiply
the contents of a register by a constant – that is, you can’t perform mul
r9,r4,#14. Moreover, you can’t use the same register to specify both the
Rd destination and the Rm operand. These restrictions are due to the
implementation of this instruction in hardware. The following code
demonstrates the use of ARM’s multiplication to multiply 23 by 25:
mov r4,#23 @ Load register r4 with 23
mov r7,#25 @ Load register r7 with 25
We’ve already seen that ARM has a multiply and accumulate instruction,
mla, with a four-operand format mla Rd,Rm,Rs,Rn, whose RTL definition
is [Rd] ← [Rm] x [Rs] + [Rn]. The 32-bit by 32-bit multiplication is
truncated to the lower-order 32 bits. Like the multiplication, Rd must not
be the same as Rm (although this restriction was removed in the ARMv6
and later architectures).
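The truncation to 32 bits is easy to see in a short Python sketch. The function names mul and mla below are hypothetical stand-ins that follow the RTL definitions given above.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """Lower-order 32 bits of rm x rs."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """Lower-order 32 bits of rm x rs + rn."""
    return (rm * rs + rn) & MASK32

print(mul(23, 25))                  # 575 fits comfortably in 32 bits
print(hex(mul(0x10000, 0x10000)))   # 0x0: the true product, 2**32, is out of range
print(mla(23, 25, 10))              # 585
```

The second call shows why you should check operand ranges: the upper 32 bits of the product are silently lost.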
The ARM’s NOT operation is written as mvn rd,rs. This move-not
instruction inverts the bits of the source register and copies the
result to the destination register.
When you design instruction sets, one of the major tasks is to construct
binary codes for instructions. These operations make it easy to
implement the manipulation of bits. For example, suppose variable sR1
specifies source register 1, and sR2 specifies source register 2, and we
have to construct a 16-bit binary code, C, with the format
xxxxxaaaxxbbbxxx. Source bits a are in sR1 and source bits b are in sR2
in the lower-order three bits.
We must insert the bits of sR1 and sR2 at the appropriate places without
changing any other bits of C. In Python, we can do this with the
following:
C = C & 0b1111100011000111   # Clear the two fields for sR1 and sR2
sR1 = sR1 << 8               # Move sR1 into position by shifting left 8 times
C = C | sR1                  # Insert sR1
sR2 = sR2 << 3               # Move sR2 into position by shifting left 3 times
C = C | sR2                  # Insert sR2
We can readily translate this into ARM assembly language using AND, OR,
and shift operations. Assume sR1 is in r1, sR2 is in r2, and C is in register
r0. Moreover, assume that the register bits are already in place in their
respective registers:
ldr r3,=0b1111100011000111 @ Load r3 with 1111100011000111 mask
and r0,r0,r3 @ Mask r0 to get xxxxx000xx000xxx
orr r0,r0,r1 @ Insert r1 to get xxxxxaaaxx000xxx
orr r0,r0,r2 @ Insert r2 to get xxxxxaaaxxbbbxxx
Shift operations
Python can shift bits left using the << operator, or right using the >>
operator. ARM’s assembly language lacks explicit instructions such as
LSR or LSL that shift bits right or left. However, it does have pseudo-
instructions such as lsl r1,r3,#4 that shift the contents of r3 four places
left, transferring the result to r1. The ARM’s actual approach to shifting
is rather more unusual, complicated, and versatile.
This instruction takes the second source operand, r3, and performs a
logical shift left. The number of left shifts is determined by the contents
of r4. You can also implement a fixed shift using a constant with the
following:
add r1,r2,r3,lsl #3
In this case, register r3 is shifted left by three bits before it is added to r2.
A shift is called dynamic if the number of shifts is specified by a register,
since you can change the number of shifts at runtime by changing the
shift count. If the number of shifts is given by a literal (constant), it
cannot be changed at runtime. This is a static shift.
Shift types
All shifts look the same from the middle of a string of bits – that is, the
bits move one (or more) places left or right. However, what happens to
the bits at the end? When bits are shifted in a register, at one end, a bit
will drop out. That bit can disappear into oblivion, go to the carry bit, or
move around to the other end in a circular fashion. At the end where a bit
is vacated, the new bit can be set to 0, 1, the same as the carry bit, or the
bit that fell off the other end.
The variations in the way that the bit shifted in is treated by computers
correspond to specific types of shift – logical, arithmetic, rotate, and
rotate through carry. Let’s look at some shift operations (Table 10.1):
The bits in the destination string in italic are the bits shifted in, and the
bits in the source string in bold are the bits lost (dropped) after the shift.
This type of shift is a logical shift:
Logical shift: The bits shifted are moved one or more places left or
right. Bits fall off at one end and zeros enter at the other end. The last
bit shifted out is copied to the carry flag. Figure 10.3 illustrates the
logical shift left and the logical shift right.
Figure 10.3 – Logical shifts
Figure 10.4 illustrates the arithmetic shift left and shift right. The ARM
has an asr operation but not asl, because asl is identical to LSL – that is,
you use a logical shift left because it is exactly the same as asl.
Figure 10.4 – Arithmetic shifts
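The difference between the logical and arithmetic shifts can be sketched in Python for 32-bit values. Python integers are unbounded, so the results are masked to 32 bits; the function names simply mirror the ARM shift names and the code is illustrative only.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: zeros enter at the right."""
    return (value << n) & MASK32

def lsr(value, n):
    """Logical shift right: zeros enter at the left."""
    return (value & MASK32) >> n

def asr(value, n):
    """Arithmetic shift right: the sign bit is replicated at the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32              # Reinterpret as a negative two's complement number
    return (value >> n) & MASK32      # Python's >> keeps the sign of a negative int

print(hex(lsr(0x80000000, 4)))   # 0x8000000
print(hex(asr(0x80000000, 4)))   # 0xf8000000
print(hex(lsl(0xF0000001, 4)))   # 0x10
```

For a negative number, asr divides by a power of 2 and keeps the sign, whereas lsr turns the value into a large positive number.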
Rotate: A rotate treats the value to be shifted as a ring – that is, the
two ends are adjacent. The bit shifted out at one end moves into the
position vacated at the other end. If you apply n rotates to an n-bit
value, you end up where you started. Consider the 8-bit value
01101110 being rotated left, one bit at a time:
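The following Python sketch rotates the 8-bit value left one place at a time and prints each step, so you can confirm that eight rotations restore the original pattern. The function name rol8 is hypothetical.

```python
def rol8(value, n=1):
    """Rotate an 8-bit value left by n places."""
    n %= 8
    return ((value << n) | (value >> (8 - n))) & 0xFF

value = 0b01101110
for step in range(9):
    print(f"{step}: {value:08b}")   # Step 8 shows the original pattern again
    value = rol8(value)
```

The bit that leaves at the left-hand end of each line reappears at the right-hand end of the next.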
A variation of the rotate operation is the rotate through carry, where the
carry bit is considered as part of the word being shifted – that is, an n-bit
word becomes an n+1 bit word. Figure 10.7 demonstrates a rotate
through carry operation, where the carry shifted out is copied into the
carry bit, and the old value of the carry bit becomes the new bit shifted
in. This operation is used in chained arithmetic (it’s the analog of the add
with carry and subtract with borrow operations).
Figure 10.7 – Rotate through carry
rrx, which rotates bits right through carry (Figure 10.7), behaves
differently from other shifts. First, only one direction of shift is
permitted; there is no left shift through carry. Second, the ARM supports
both static and dynamic shifts for all other shift operations, whereas rrx
allows only one single shift.
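A one-place rotate right through carry can be modeled in Python as follows. This is a sketch of the behavior described above for a 32-bit register; the function returns the new register value and the new carry as a pair, which is an assumption of this model rather than anything the hardware exposes.

```python
def rrx(value, carry):
    """Rotate a 32-bit value right one place through the carry bit.

    Returns (result, new_carry).
    """
    value &= 0xFFFFFFFF
    new_carry = value & 1                    # Bit 0 drops into the carry
    result = (carry << 31) | (value >> 1)    # The old carry enters at bit 31
    return result, new_carry

print(rrx(0x00000001, 0))   # (0, 1)
result, carry = rrx(0x00000000, 1)
print(hex(result), carry)   # 0x80000000 0
```

Because the carry makes the ring 33 bits long, it takes 33 applications to return to the starting state.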
Consider adcs r0,r0,r0 (add with carry and set status flags). This adds
the contents of r0 to the contents of r0, plus the carry bit, to generate 2 x
r2 = XXXXXXBB r2 is source 2
r3 = XXXXXXCC r3 is source 3
The next section looks at a class of instruction that does not move data or
process data; it determines which instruction will be executed next.
Conditional branches
Unconditional branches
ARM’s unconditional branch is expressed as b target, where target
denotes the branch target address (the address of the next instruction to
be executed). The unconditional branch forces a jump (branch) from one
point in a program to another. It is exactly the same as the unconditional
branch we introduced earlier. The following ARM code demonstrates
how the unconditional branch is used:
      ..  do this    @ Some code
      ..  then that  @ Some other code
      b Next         @ Now skip past the next instructions and jump to Next
      ..             @ …the code being skipped past
      ..             @ …the code being skipped past
Next: ..             @ Target address for the branch, denoted by label Next
Conditional branch
ARM’s conditional branches consist of a mnemonic Bcc and a target
address. The cc suffix defines one of 16 conditions that must be satisfied
for the branch to be taken. If the condition is true, execution continues at
the branch target address. If the condition is not true, the next instruction
in sequence is executed. Consider the following Python construct and its
implementation in ARM assembly language:
if x == y: y = y + 1
else: y = y + 2
       cmp r1,r2     @ Compare x and y (r1 contains y and r2 contains x)
       bne plus2     @ If not equal, then branch to the else part
       add r1,r1,#1  @ If equal, fall through to here and add one to y
       b leave       @ Now, skip past the else part
plus2: add r1,r1,#2  @ ELSE add 2 to y
leave: …             @ Continue from here
1111 NV Never
(reserved)
Conditional executions
Here, we will deal with just one topic, conditional execution, and we will
demonstrate how the processor can ignore an instruction if a specified
condition (related to the condition code status bits) is not satisfied.
This mechanism enables programmers to write more compact code.
Consider the add instruction. On almost every other computer, when an add
is read from memory, it is executed. The ARM is different; each of its
instructions is conditionally executed – that is, an instruction is
executed only if a specific condition is met; otherwise, it is
bypassed (annulled or squashed). Each ARM instruction is associated
with a logical condition (one of the 16 in Table 10.3). If the stated
condition is true, the instruction is executed.
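Conditional execution can be modeled in Python as a table of predicates over the flags. The sketch below covers only a few of the 16 conditions, and the names CONDITIONS and execute_if are hypothetical. The flag tests shown (for example, ge meaning N equals V) follow the standard ARM definitions.

```python
# Each predicate takes the flags as a dict with keys N, Z, C, V (values 0 or 1)
CONDITIONS = {
    "eq": lambda f: f["Z"] == 1,        # Equal
    "ne": lambda f: f["Z"] == 0,        # Not equal
    "mi": lambda f: f["N"] == 1,        # Minus (negative)
    "pl": lambda f: f["N"] == 0,        # Plus (positive or zero)
    "ge": lambda f: f["N"] == f["V"],   # Signed greater than or equal
    "lt": lambda f: f["N"] != f["V"],   # Signed less than
    "al": lambda f: True,               # Always
}

def execute_if(condition, flags, operation, unchanged):
    """Return operation() if the condition holds; otherwise return unchanged."""
    if CONDITIONS[condition](flags):
        return operation()
    return unchanged

flags = {"N": 0, "Z": 1, "C": 1, "V": 0}   # Flags after comparing two equal values
r1, r4 = 10, 3
r3 = 0
r3 = execute_if("eq", flags, lambda: r1 - r4, r3)   # In the style of a subeq
print(r3)   # 7
```

With the same flags, an ne-conditioned instruction would leave its destination untouched.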
Translated into ARM code using conditional execution, we can write the
following:
cmp r1,r2 @ Compare A == B
subeq r3,r1,r4 @ If (A== B) then C = D - E
After the test, the operation is either executed or not executed, depending
on the result of the test. Now, consider a construct with a compound
predicate:
if ((a == b)AND(c == d)): e = e + 1
cmp r0,r1 @ Compare a == b
cmpeq r2,r3 @ If a == b then test c == d
addeq r4,r4,#1 @ If a == b AND c == d THEN increment e
The first line, cmp r0,r1, compares a and b. The next line, cmpeq r2,r3,
executes a conditional comparison only if the result of the first line was
true (i.e., a == b). The third line, addeq r4,r4,#1, is executed only if the
previous line was true (i.e., c == d) to implement e = e + 1. Without
conditional execution, we might write the following:
      cmp r0,r1     @ Compare a == b
      bne Exit      @ Exit if a != b
      cmp r2,r3     @ Compare c == d
      bne Exit      @ Exit if c != d
      add r4,r4,#1  @ Else increment e
Exit:
We can use ARM’s teq instruction (test if equal) that tests whether two
values are equal. teq is similar to cmp, but teq does not set the V and C
flags during the test. teq is useful to test for negative values because the
N-bit is set to 1 if the number tested is negative:
teq r0,#0 @ Compare r0 with zero
rsbmi r0,r0,#0 @ If negative, r0 = 0 - r0
Here, the operand in r0 is tested, and the N-bit is set if it is negative and
is clear if it is positive. The conditional instruction, rsbmi, is not
executed if the tested operand was positive (no change is necessary). If
the number was negative, the reverse subtraction performs 0 – r0, which
reverses its sign and makes it positive.
The first instruction, cmp, checks whether the character is ‘A’ or greater
by subtracting the ASCII code for ‘A’. If it is, the rsbges checks that the
character is less than or equal to ‘Z’. This test is performed only if the
character in r0 is greater than or equal to ‘A’. We use reverse subtraction
because we want to test whether the ASCII code for ‘Z’ minus the ASCII code
for the character is positive. If we are in range, the conditional orr is
executed, and an uppercase to lowercase conversion is performed by setting bit 5.
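The same range test and conversion can be written in Python. This sketch follows the logic described above rather than reproducing the ARM code; the function name to_lower is hypothetical.

```python
def to_lower(char_code):
    """Convert an ASCII uppercase letter to lowercase; leave other codes unchanged."""
    if char_code - ord("A") >= 0:        # Is the character 'A' or greater?
        if ord("Z") - char_code >= 0:    # Reverse subtract: is it 'Z' or less?
            char_code |= 0x20            # Set bit 5 to convert to lowercase
    return char_code

print(chr(to_lower(ord("G"))))   # g
print(chr(to_lower(ord("[")))   )   # [ is just past 'Z', so it is unchanged
```

Setting bit 5 works because each lowercase ASCII letter is exactly 0x20 above its uppercase partner.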
In the next chapter, we will look at how operands are specified – that is,
we will look at addressing modes.
Summary
In this chapter, we’ve extended our knowledge of the ARM beyond the
basic data-processing instructions we encountered in the previous
chapter.
We began with the ARM’s register set, which is different from almost
every other processor. RISC processors generally have 32 general-
purpose registers. The ARM has only 16 registers.
Probably the most intriguing feature of the ARM is its ability to perform
conditional execution – that is, before an instruction is executed, the
condition code bits are checked. For example, addeq r0,r1,r2 performs
an addition only if the Z-bit is set to 1. This is a very powerful operation,
and you can use it to write compact code.
In the next chapter, we will look at the ARM’s addressing modes – one
of the highlights of this processor.
11
Literal addressing
Scaled literals
Auto-incrementing pointers
Literal addressing
The easiest addressing mode is literal addressing. Instead of saying
where an operand is in memory, you provide the operand in an
instruction (i.e., this is literally the value). Other addressing modes
require you to specify where an operand is in memory. Consider the
following Python expression, which has two literals, 30 and 12:
if A > 30: B = 12
Scaled literals
The ARM implements 12-bit literals in an unusual way, using a
technique borrowed from the world of floating-point arithmetic. Four of
the 12 bits of a literal are used to scale an 8-bit constant. That is, the 8-bit
constant is rotated right by twice the number in the 4-bit scaling field.
The four most-significant bits of the literal field specify the literal’s
alignment within a 32-bit word. If the 8-bit immediate value is N and the
4-bit alignment is n, then the value of the literal is given by N rotated
right by 2n places. For example, if the 8-bit literal is 0xAB and n is 4, the
resulting 32-bit literal is 0xAB000000 because of the eight-position right
rotation (2 x 4). Remember that an eight-position rotate right is
equivalent to a 32 - 8 = 24 bit shift left. Figure 11.1 demonstrates some
32-bit literals and the 12-bit literal codes that generate them.
Figure 11.1 – ARM’s literal operand encoding
You might find this rather strange. Why didn’t ARM use the 12-bit literal
field to provide a number in the range 0 to 4,095, rather than a number in
the range 0 to 255 scaled by a power of 2? The answer is that ARM’s
designers determined that the scaled literals were more useful in real-
world applications than unscaled numbers. For example, suppose you
wanted to clear all bits of a 32-bit word, except bits 8 to 15. You would
need to AND it with the literal 0b00000000000000001111111100000000 or
0x0000FF00 in hexadecimal. Using the scaling mechanism, we can take the 8
bits 0b11111111 (0xFF) and shift them left by 8 bits (i.e., rotate them
right by 24 bits) to get the required constant. Because the literal is
rotated right by 2n places, the scaling factor n is half the number of
right rotations; that is, (32 – 8)/2, which is 12. Consequently, the
literal stored in the 12-bit instruction field is the pair 12, 255,
or CFF in hexadecimal.
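The encoding can be checked with a short Python sketch that decodes a 12-bit field and searches for a field that generates a given constant. The function names ror32, decode_literal, and encode_literal are hypothetical, and the search simply tries all 16 scale values in turn.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, places):
    """Rotate a 32-bit value right by the given number of places."""
    places %= 32
    return ((value >> places) | (value << (32 - places))) & MASK32

def decode_literal(field12):
    """Expand a 12-bit literal field into the 32-bit value it represents."""
    n = (field12 >> 8) & 0xF       # 4-bit scaling field
    imm8 = field12 & 0xFF          # 8-bit constant
    return ror32(imm8, 2 * n)

def encode_literal(value):
    """Return a 12-bit field that generates value, or None if none exists."""
    for n in range(16):
        imm8 = ror32(value, 32 - 2 * n)   # Rotating left by 2n undoes the rotate right
        if imm8 <= 0xFF:
            return (n << 8) | imm8
    return None

print(hex(decode_literal(0xCFF)))       # 0xff00
print(hex(decode_literal(0x4AB)))       # 0xab000000
print(hex(encode_literal(0x0000FF00)))  # 0xcff
print(encode_literal(0x12345678))       # None: needs more than 8 significant bits
```

A constant such as 0x12345678 has no valid encoding, which is why it must be loaded from memory instead.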
Figure 11.2 illustrates the effect of ldr r1,[r0], where r0 is the pointer
and contains the value n.
The first instruction loads r1 with the 32-bit word pointed at by r0. The
second instruction increments r0 by 4 to point at the next word in
memory. Repeating this pair of instructions will allow you to step
through a table of values, element by element. We will soon see that the
ARM includes a mechanism to automatically increment or decrement the
pointer.
The next fragment of code demonstrates how you would add together the
elements of a table. Suppose that you have a table of daily expenditures
you have made over four weeks. Each item is stored consecutively in a
table with 4 x 7 = 28 entries:
       adr r0,table       @ r0 points to the table of data (pseudo-instruction)
       add r3,r0,#28 * 4  @ r3 points to the end of the table (28 x 4 bytes)
       mov r1,#0          @ Clear the sum in r1
loop:  ldr r2,[r0]        @ REPEAT: Get the next value in r2
       add r1,r1,r2       @ Add the new value to the running total
       add r0,r0,#4       @ Point to the next location in the table (4 bytes increment)
       cmp r0,r3          @ Are we at the end of the table?
       bne loop           @ UNTIL all elements added
table: .word 123          @ Data for day 1
       .word 456          @ Data for day 2
       . .
       .word 20           @ Data for day 28
In this simple example, we set up a loop and step through elements from
the first to the last. On each cycle, we read an element and add it to the
total. The shaded lines are where the action takes place – getting an
element and pointing to the next one.
bne loop
The ARM allows you to specify an address using a pointer register, plus
a 12-bit literal that supplies the offset. Note that this is a true 12-bit
literal rather than the 8-bit scaled value used as a literal operand. The
literal can be positive or negative (indicating that it’s to be added to or
subtracted from the base pointer). Consider ldr r5,[r2,#160], where the
address of the operand loaded into r5 is the contents of r2 plus 160.
You can apply a shift to the second operand, such as the following:
ldr r2,[r0,r1,lsl #3] @ Load r2 with the contents of the location pointed at by r0 plus r1 x 8
In this case, register r1 is scaled by 8. The scaling factor must be a power
of 2 (i.e., 2, 4, 8, 16…).
The two key lines of this program are the load and store instructions
(shaded), where data is read from the source and copied to the
destination. As we have stated, you can’t run this code directly on
Raspberry Pi without modification because of the way in which the
memory space is allocated to variables. The following code demonstrates
a runnable version for Raspberry Pi.
This is, essentially, the same code. As well as dealing with the memory
problem, we’ve added assembly directives and dummy data (complete
with markers that allow you to observe data in memory more easily).
There’s also a nop instruction. Note that some versions of ARM have a
true nop and some use a pseudo-instruction. I added this as a dummy
instruction to “land on” while testing. Remember that the address of the
actual data is stored in the program area, and then a pointer is loaded
with that address:
Figure 11.7 – The use of pointers when accessing read/write memory
Example of string-copying
The next example, Figure 11.9, uses post-indexing to copy a string from
one place to another in reverse order by moving one pointer down and
the other up. The destination pointer is incremented by len-1 to point to
the end of the string, initially. The following code includes assembly
language directives, enabling it to run on the RPi:
         .equ len,5         @ Length of string to reverse
         .text              @ Program (code) area
         .global _start
_start:  mov r0,#len        @ Number of characters to move
         adr r1,adr_st1     @ r1 points at source address 1
         adr r2,adr_st2     @ r2 points at source address 2
         ldr r1,[r1]        @ Register r1 points to source
         ldr r2,[r2]        @ Register r2 points to destination
         add r2,r2,#len-1   @ r2 points to bottom of destination
Loop:    ldrb r3,[r1],#1    @ Get char from source, increment pointer (note ldrb)
         strb r3,[r2],#-1   @ Store char in destination, decrement pointer
         subs r0,r0,#1      @ Decrement char count
         bne Loop           @ REPEAT until all done
         nop                @ Stop here
adr_st1: .word str1
adr_st2: .word str2
         .data
str1:    .ascii "Hello"     @ Source string
str2:    .byte 0,0,0,0,0    @ Destination string
         .end
Figure 11.9 – Reversing a string
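For comparison, the same two-pointer algorithm can be sketched in Python. The indices src_ptr and dst_ptr play the roles of r1 and r2, and count plays the role of r0; the function name reverse_copy is hypothetical.

```python
def reverse_copy(source):
    """Copy source into a new buffer in reverse order using two moving pointers."""
    length = len(source)
    dest = bytearray(length)
    src_ptr = 0               # Source pointer starts at the top of the source
    dst_ptr = length - 1      # Destination pointer starts at the bottom
    count = length
    while count != 0:
        char = source[src_ptr]    # Get char from source...
        src_ptr += 1              # ...then increment the pointer (post-indexing)
        dest[dst_ptr] = char      # Store char in destination...
        dst_ptr -= 1              # ...then decrement the pointer (post-indexing)
        count -= 1                # Decrement char count
    return bytes(dest)

print(reverse_copy(b"Hello"))   # b'olleH'
```

In the ARM version, each pointer update is folded into the load or store instruction itself, so the loop body needs only four instructions.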
Think about it. You give the address of data with respect to the program
that’s using it and not an absolute address in memory. If you move the
program in memory, the data is still the same distance from the
instructions that access it, using program counter-relative addressing.
The introduction of program counter relative addressing was one of the
major advances in computer architecture. By the way, most branch
instructions use program counter relative addressing because the
destination of a branch is specified with respect to the current instruction.
There are three ldr instructions. Two load addresses into registers and
the third, ldr, loads a 32-bit literal 0x11111111. There is an adr
instruction that loads a 32-bit address into register r2. What happens
when these codes are executed?
Note that these differ from the source code. That’s because the source
code uses pseudo-instructions that are translated. For example, ldr
r0,=pqr is translated into ldr r0,[pc,#32]. A 32-bit instruction cannot
contain a 32-bit literal. Instead, the translated version uses a
conventional load whose operand is located 32 bytes from
the current value of the program counter.
Let’s look at the registers when the code is executed up to nop using gdb.
We will examine both the contents of the registers at the end of the
program and then look at the memory locations. You can see the data
stored in memory and the constants accessed by program counter relative
addressing:
r0 0x00010078
r1 0x00010074
r2 0x00010078
r3 0x11111111
(gdb) x/8xw 0x10074
0x10074 <abc>: 0x22222222 0x33333333 0x00010078 0x00010074
0x10084 <pqr+12>: 0x11111111 0x00001141 0x61656100 0x00010069
Summary
Addressing modes comprise all the ways to express the location of an
item in memory. Addressing modes are simultaneously the easiest and
most difficult topic in assembly language programming. The concept is
simple, but indirect addressing modes that use pointers may take some
effort to visualize.
The stack
There are two basic ways of implementing subroutine calls and returns.
The classic CISC approach is BSR (branch to subroutine) and RTS (return
from subroutine). The typical code might be as follows:
     bsr abc  @ Call the subroutine on the line labeled abc
     …
     …
abc: …        @ Subroutine abc entry point
     …
     rts      @ Subroutine abc return to calling point
This is simplicity in action. You call a piece of code, execute it, and
return to the instruction after the calling point. Most RISC processors
reject this mechanism because the subroutine call and return are complex
instructions that save the return address on the stack during a call, and
then pull the return address off the stack during a return. This is very
convenient for a programmer but requires several CPU clock cycles to
execute, and it does not fit into the one-cycle-per-instruction paradigm of
the RISC processor.
You will soon see that you can implement this mechanism yourself on an
ARM, but not by using two dedicated instructions. You have to write
your own code.
If you want a simple subroutine call and return (the subroutine is called a
leaf), all you need to do is save the return address in a register (no
external memory or stack is required). Then, to return, you just put the
return address in the program counter – simple and fast. However, once
you are in the subroutine, you can’t do the same thing again and call
another subroutine. Doing that would destroy your existing saved return
address.
The ARM’s subroutine mechanism is called branch with link and has the
mnemonic bl target, where target is the symbolic address of the
subroutine. The actual address is program counter-relative and is encoded
as a 24-bit signed word offset, which gives you a branch range of 2^23
words (32 MB) in either direction from the current PC (i.e., branch
forward and branch back).
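The following Python sketch shows how a 24-bit signed word offset yields this range, and how the offset field could be computed and decoded. The function names and the addresses are hypothetical and serve only as an illustration. The sketch assumes the standard ARM-state rule that the PC reads as the instruction address plus 8.

```python
def bl_offset_field(instruction_address, target_address):
    """Return the 24-bit offset field for a b or bl instruction."""
    byte_offset = target_address - (instruction_address + 8)  # PC reads 8 bytes ahead
    if byte_offset % 4 != 0:
        raise ValueError('Target must be word aligned')
    word_offset = byte_offset // 4
    if not -2**23 <= word_offset < 2**23:
        raise ValueError('Target is out of branch range')
    return word_offset & 0xFFFFFF            # Keep 24 bits (two's complement)

def bl_target(instruction_address, field):
    """Recover the branch target from the 24-bit offset field."""
    if field & 0x800000:                     # Sign-extend a negative offset
        field -= 1 << 24
    return instruction_address + 8 + 4 * field

range_bytes = 2**23 * 4                      # 2^23 words of 4 bytes each
print(range_bytes // 2**20, 'MiB in each direction')

field = bl_offset_field(0x10000, 0x0FFF0)    # A short backward branch
print(hex(field), hex(bl_target(0x10000, field)))
```

A backward branch produces a field with its top bit set, which is why the decoder must sign-extend the 24-bit value before scaling it by four.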
The branch with link instruction behaves like a branch instruction, but it
also copies the return address (i.e., the address of the next instruction to
be executed following a return) into the link register r14. Let’s say you
execute the following:
bl sub_A    @ Branch to sub_A with link and save return address in r14
The ARM executes a branch to the target address specified by the label
sub_A. It also copies the program counter, held in register r15, into the
link register r14 to preserve the return address. At the end of the
subroutine, you return by transferring the return address in r14 to the
program counter. You don’t need a special return instruction; you just
write the following:
mov pc,lr @ We can also write this as mov r15,r14
All you need to create a subroutine is an entry point (a label such as
sub_A) and a return point that restores the address saved by bl in the
link register.
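To see why a single link register is enough for a leaf subroutine but not for nested calls, consider this small Python model. The register dictionary, the helper names bl and ret, and all the addresses are invented for the illustration.

```python
regs = {'pc': 0, 'lr': 0}

def bl(target, return_address):
    """Branch with link: save the return address in lr, then branch."""
    regs['lr'] = return_address
    regs['pc'] = target

def ret():
    """Return from subroutine: the equivalent of mov pc,lr."""
    regs['pc'] = regs['lr']

bl(0x2000, 0x1004)        # Main program calls sub_A; return address is 0x1004
bl(0x3000, 0x2008)        # sub_A calls sub_B without saving lr first
ret()                     # sub_B returns correctly to 0x2008 in sub_A
returned_to_sub_a = regs['pc']
ret()                     # sub_A tries to return, but lr still holds 0x2008
print(hex(returned_to_sub_a), hex(regs['pc']))
```

The second return goes back to 0x2008 rather than 0x1004, because the original return address was overwritten. This is the problem that saving the link register on a stack solves.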
The stack
We’ve already described the stack. We’ll go over it again here because
it’s probably the single most important data structure in computing. The
stack is a pile that you add things on at the top and take things off, also
from the top. If you take something off the stack, it is the last thing that
was added to the stack. Consequently, the stack is called a last in first
out queue (LIFO), in which items enter at one end and leave in the
reverse order.
There are four variations of the stack. They all do the same thing but are
implemented differently. The ARM supports all four variations, but we’ll
use only one here for the sake of simplicity. A stack is stored in memory,
which has no up or down in the normal human sense. When items are
added to the stack, they can be added to the next location with a lower
address or the next location with a higher address. By convention, most
stacks are implemented so that the next item is stored at the lower
address. We say that the stack grows up toward lower addresses (that’s
because we number lines in a book from top to bottom, with line one at
the top).
The other variation in the arrangement of stacks is that the stack pointer
can either point to the top item on the stack, TOS, or point to the next
free item on that stack. I will cover stacks where the stack pointer points
to the top item on the stack (again, this is the most common convention).
The stack pointer points at the top item on the stack, and when an item is
added to the stack (pushed), the stack pointer is first decremented. When
an item is removed from the stack, it is taken at the location indicated by
the stack pointer, and the stack pointer is incremented (i.e., moved
down). We can define the push and pull (pop) actions with relation to the
stack pointer (SP) as follows:
PUSH: [SP] ← [SP] - 4    @ Move stack pointer up one word (up toward lower addresses)
      [[SP]] ← data      @ Push data onto the stack. Push uses pre-decrementing.
PULL: data ← [[SP]]      @ Pull data off the stack by reading TOS
      [SP] ← [SP] + 4    @ Move stack pointer down one word (pull uses post-incrementing)
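The push and pull definitions above can be modeled directly in Python. This is a minimal sketch: the dictionary used as memory and the initial stack pointer value 0x1000 are assumptions made for the example.

```python
memory = {}               # Sparse memory: address -> 32-bit value
sp = 0x1000               # Assumed initial stack pointer

def push(data):
    global sp
    sp = sp - 4           # [SP] <- [SP] - 4   (pre-decrement)
    memory[sp] = data     # [[SP]] <- data

def pull():
    global sp
    data = memory[sp]     # data <- [[SP]]
    sp = sp + 4           # [SP] <- [SP] + 4   (post-increment)
    return data

push(0x11111111)
push(0x22222222)
first = pull()            # The last item pushed comes off first
second = pull()
print(hex(first), hex(second), hex(sp))
```

After two pushes and two pulls, the stack pointer is back at its starting value, and the items come off in the reverse of the order in which they were pushed.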
The following is a simple program that sets up a call and return using
this mechanism. Note that we don’t set up the initial stack pointer. The
ARM’s operating system does that:
        .section .text
        .global _start
_start: mov  r0,#9            @ Dummy operation
        sub  sp,sp,#4         @ Decrement stack
        str  pc,[sp]          @ Save pc on stack
        b    target           @ Branch to subroutine "target"
        mov  r2,#0xFFFFFFFF   @ Return here ... this is a marker
        nop                   @ Dummy operation
        mov  r7,#1            @ Set up exit code
        svc  0                @ Leave program
target: mov  r1,#0xFF         @ Subroutine ... dummy operation
        ldr  r12,[sp],#+4     @ Pull pc off the stack
        mov  r15,r12          @ Return
        .end
Figure 12.2 demonstrates the output of the ARM simulator after running
this code. We have included the disassembly window and the register
windows. Note how the mov r2,#0xFFFFFFFF instruction has been
transformed into the mvn r2,#0 operation. Recall that MVN (move NOT)
moves a literal to a register after inverting its bits. Note also how
ldr r12,[sp],#+4 has been renamed pop {r12}. This is equivalent to the
pop stack operation (removing an item from the top of the stack).
In the next section, we will look at one of the ARM’s most powerful and
least RISC-like operations – the ability to move blocks of data between
memory and multiple registers.
A great feature of some CISC processors was that you could push a
group of registers on the stack in a single instruction. RISC processors
generally don’t have such an instruction because it conflicts with the
one-operation-per-cycle design constraint that’s at the heart of the RISC
philosophy. Surprisingly, the ARM implements a block move instruction
that lets you copy a group of registers to or from memory in one
operation (i.e., an instruction). Without a block move, the following ARM
code is needed to load registers r1, r2, r3, and r5 from memory one at a
time:
adr r0,DataToGo   @ Load r0 with the address of the data area
ldr r1,[r0],#4    @ Load r1 with the word pointed at by r0 and update the pointer
ldr r2,[r0],#4    @ Load r2 with the word pointed at by r0 and update the pointer
ldr r3,[r0],#4    @ and so forth for the remaining registers r3 and r5…
ldr r5,[r0],#4
ARM has a block move to memory instruction, stm, and a block move
from memory, ldm, that copies groups of registers to and from memory.
Block move instructions require a two-character suffix to describe how
the data is accessed (e.g., stmia or ldmdb), as we shall see.
Let’s move the contents of registers r1, r2, r3, and r5 into sequential
memory locations with stm:
stmia r0!,{r1-r3,r5}   @ Note block move syntax. The register list is in braces
                       @ r0! is the destination register with auto indexing
                       @ The register list is {r1-r3,r5}: r1,r2,r3,r5
Consider the following example of the block move. Because it’s a little
more complicated than some instructions we’ve encountered, we will
demonstrate its execution. I’ve included several features that are not
strictly part of the demonstration but that I use when
experimenting. In particular, I use markers in both registers and memory
so that I can follow debugging more easily. For example, in the memory
block, I store the data words 0xFFFFFFFF and 0xAAAAAAAA. These serve no
function other than to show me, at a glance, where my data area starts
and stops when I debug memory. Similarly, I use values such as
0x11111111 as words to move from registers because I can easily follow
them in debugging:
         .text                    @ This is a code section
         .global _start           @ Define entry point for code execution
_start:  nop                      @ nop = no operation and is a dummy instruction
         ldr   r0,=0xFFFFFFFF     @ Dummy values for testing
         ldr   r1,=0x11111111
         ldr   r2,=0x22222222
         ldr   r3,=0x33333333
         ldr   r4,=0x44444444
         ldr   r5,=0x55555555
         ldr   r0,adr_mem         @ Load pointer r0 with the address of memory
         stmia r0!,{r1-r3,r5}     @ Do a multiple store to memory
         mov   r10,r0             @ Save r0 in r10 (for debugging)
         ldmdb r0!,{r6-r9}        @ Now load data from memory
         mov   r11,r0
         mov   r1,#1              @ Terminate command
         svc   0                  @ Call OS to leave
         .word 0xFFFFFFFF         @ A dummy value for testing
         .word 0xAAAAAAAA         @ Another dummy value
adr_mem: .word memory             @ The address of the memory for storing data
         .data                    @ Declare a memory segment for the data
memory:  .word 0xBBBBBBBB         @ Yet another memory marker
         .space 32                @ Reserve space for storage (8 words)
         .word 0xCCCCCCCC         @ Final memory marker
         .end
This code sets up five registers with data that is easily visible when
examining memory. Thirty-two bytes of memory are reserved between two
word markers at the end of the program with the .space directive. The
start of this block is labeled memory, and r0 points to it. Then four of
the registers (r1, r2, r3, and r5) are stored in memory.
The code we are initially interested in is for the five register loads that
preset registers r1 to r5 with 0x11111111 to 0x55555555, respectively.
Register r0 was set initially to 0xFFFFFFFF just as a marker for
debugging. The key instruction is stmia r0!,{r1-r3,r5}, whose purpose is
to store the contents of registers r1, r2, r3, and r5 in consecutive memory
locations pointed at by r0.
The following Raspberry Pi output is from the gdb debugger. The source
code is blockMove1.s. We’ve omitted some of the register values to make
the listing more readable when registers haven’t changed or haven’t been
used. Similarly, repetitive command lines have been omitted:
pi@raspberrypi:~/Desktop $ as -g -o blockMove1.o blockMove1.s
pi@raspberrypi:~/Desktop $ ld -o blockMove1 blockMove1.o
pi@raspberrypi:~/Desktop $ gdb blockMove1
(gdb) b 1
Breakpoint 1 at 0x10078: file blockMove1.s, line 6.
(gdb) r
Starting program: /home/pi/Desktop/blockMove1
Breakpoint 1, _start () at blockMove1.s:6
6            ldr r0,=0xFFFFFFFF      @ Dummy value for testing
(gdb) i r
r0             0x0                 0      # These are the initial registers before we start
r1 0x0 0
r2 0x0 0
r3 0x0 0
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
sp             0xbefff380          0xbefff380      # The OS sets up the stack pointer
lr             0x0                 0
pc             0x10078             0x10078 <_start+4>      # The OS sets up the program counter
Now, let’s look at the registers we set up. Only registers of interest have
been displayed. Note that r0 points to the data at 0x200CC. The system
software is responsible for this address:
r1 0x11111111 286331153
r2 0x22222222 572662306
r3 0x33333333 858993459
r4 0x44444444 1145324612
r5 0x55555555 1431655765
r6 0x0 0
You can see that the registers have been set up with their easy-to-trace
values. The next step is to examine memory using the x/16xw gdb
command. The two markers we’ve stored in memory are in bold font. Now,
let’s execute the store multiple registers instruction. After it, we copy
the updated pointer to r10 (again, that is just for my own debugging
purposes) so that we can see what it was before the block load that
follows. After the block move instruction, we display registers of
interest:
(gdb) si 1
14 mov r10,r0
15           ldmdb r0!,{r6-r9}       @ Now load data from memory
(gdb) i r
r0 0x200dc 131292
r1 0x11111111 286331153
r2 0x22222222 572662306
r3 0x33333333 858993459
r4 0x44444444 1145324612
r5 0x55555555 1431655765
r6 0x0 0
r10 0x200dc 131292
pc 0x1009c 0x1009c <_start+40>
Now for the proof of the pudding. Here’s the memory after the x/16xw
display command. Note that the contents of the four registers have been
stored in consecutive rising memory locations:
(gdb) x/16xw 0x200cc
0x200cc: 0x11111111 0x22222222 0x33333333 0x55555555
0x200dc: 0x00000000 0x00000000 0x00000000 0x00000000
0x200ec: 0x00000000 0xcccccccc 0x00001141 0x61656100
0x200fc: 0x01006962 0x00000007 0x00000108 0x0000001c
Finally, we will execute the last two commands and display the register
contents:
(gdb) si 1
16 mov r11,r0
18 mov r1,#1 @ Terminate command
(gdb) i r
r0 0x200cc 131276
r1 0x11111111 286331153
r2 0x22222222 572662306
r3 0x33333333 858993459
r4 0x44444444 1145324612
r5 0x55555555 1431655765
r6             0x11111111          286331153      Data copied to registers r6 - r9
r7 0x22222222 572662306
r8 0x33333333 858993459
r9 0x55555555 1431655765
r10 0x200dc 131292
You can see that the block move from memory operation, ldmdb r0!,{r6-r9},
has copied the four words from memory and placed them in registers r6 to
r9.
Consider the suffix of ldm, which is db. Why ldmdb? When we
transferred data to memory, we used the increment after suffix, where the
pointer register is used to move the data to a memory location, and then
it is incremented after the move. When we retrieve the data, we initially
point at the location after the last value is moved. Consequently, to
remove the items we stored in memory, we have to decrement the pointer
before each move – hence the decrement before (db) suffix. For this
reason, the instruction pair stmia and ldmdb correspond to the stack push
and pull operations, respectively.
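The way the pointer moves during stmia and ldmdb can be checked with a small Python model. This is a sketch under simplifying assumptions: registers are held in a dictionary indexed by register number, memory is a dictionary of words, and the function names simply mirror the mnemonics. The starting values are taken from the debugging session above.

```python
def stmia(memory, regs, base, reg_list):
    """Store multiple, increment after, with write-back (stmia base!,{...})."""
    address = regs[base]
    for r in sorted(reg_list):            # Lowest register goes to lowest address
        memory[address] = regs[r]
        address += 4
    regs[base] = address                  # Write-back (the ! in r0!)

def ldmdb(memory, regs, base, reg_list):
    """Load multiple, decrement before, with write-back (ldmdb base!,{...})."""
    start = regs[base] - 4 * len(reg_list)
    address = start
    for r in sorted(reg_list):            # Lowest register comes from lowest address
        regs[r] = memory[address]
        address += 4
    regs[base] = start                    # Write-back of the decremented pointer

regs = {n: 0 for n in range(16)}
regs[0] = 0x200CC                         # Pointer to the data area
regs[1], regs[2], regs[3], regs[5] = 0x11111111, 0x22222222, 0x33333333, 0x55555555
memory = {}

stmia(memory, regs, 0, [1, 2, 3, 5])
pointer_after_store = regs[0]
ldmdb(memory, regs, 0, [6, 7, 8, 9])
print(hex(pointer_after_store), hex(regs[0]), [hex(regs[n]) for n in (6, 7, 8, 9)])
```

The model reproduces the pointer values seen in gdb: 0x200dc after the store and 0x200cc after the load.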
Line 5 contains a nop instruction that does nothing (other than advance
the PC to the next instruction). It can provide a placeholder for later
code, or act as a debugging aid. Here, it provides a space for the first
instruction to land on. The ARM lacks a nop, and the assembler
translates nop to mov r0,r0. Like nop, this instruction achieves nothing!
For the full descending stack, the suffix used by load (pull) operations
is increment after, and the suffix used by store (push) operations is
decrement before.
Figure 12.2 – One of ARM’s four stack modes – full descending (FD, IA load, and DB
store)
In a full descending stack, the stack pointer points at the item at the top
of the stack (full), and when an item is added to the stack, the stack
pointer is decremented before. When an item is removed, the stack pointer
is incremented after.
Figure 12.3 – One of ARM’s four stack modes – full ascending (FA, DA load, and IB
store)
In a full ascending stack, the stack pointer points at the item at the top of
the stack (full), and when an item is added to the stack, the stack pointer
is incremented before. When an item is removed, the stack pointer is
decremented after.
Figure 12.4 – One of ARM’s four stack modes – empty descending (ED, IB load, and
DA store)
In an empty ascending stack, the stack pointer points at the free location
above the top of the stack (empty), and when an item is added to the
stack, the stack pointer is incremented after. When an item is removed,
the stack pointer is decremented before. Consequently, we have the
following:
Push r0 to r3 on the stack     stmea sp!,{r0-r3} or stmia sp!,{r0-r3}
Pull r0 to r3 off the stack    ldmea sp!,{r0-r3} or ldmdb sp!,{r0-r3}
Figure 12.5 – One of ARM’s four stack modes – empty ascending (EA, DB load, and
IA store)
We use the fd block move suffix to mean full descending. ARM lets you
use two different naming conventions for block move instructions. You
can write the pair stmdb and ldmia, or the pair stmfd and ldmfd; they are
the same. Yes, it is confusing:
                             @ Call abc and save some registers
      bl    abc              @ Call subroutine abc, save return address in lr (r14)
      .
abc:  stmfd sp!,{r0-r3,r8}   @ Subroutine abc. Block move saves registers on the stack
      .
      .                      @ Body of code
      .
      ldmfd sp!,{r0-r3,r8}   @ Subroutine complete. Now restore the registers
      mov   pc,lr            @ Copy the return address in lr to the PC
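In this example, the return address stays in the link register, and
mov pc,lr performs the return. If abc itself calls another subroutine,
add lr to the register list so that the return address is pushed onto the
stack (stmfd sp!,{r0-r3,r8,lr}). Then, at the end, we pull the saved
registers, with the return address placed directly in the PC
(ldmfd sp!,{r0-r3,r8,pc}), to return.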
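Because the two naming conventions are easy to mix up, the correspondence given in the four figure captions can be written as a small lookup table. The following Python sketch is only an aid to memory; the table name and the function name are my own.

```python
STACK_TO_ADDRESSING = {
    # stack suffix: (store/push suffix, load/pull suffix)
    'fd': ('db', 'ia'),   # Full descending
    'fa': ('ib', 'da'),   # Full ascending
    'ed': ('da', 'ib'),   # Empty descending
    'ea': ('ia', 'db'),   # Empty ascending
}

def equivalent(mnemonic):
    """Translate a stack-style block move (e.g., stmfd) to its addressing-mode form."""
    op, suffix = mnemonic[:3].lower(), mnemonic[3:].lower()
    store, load = STACK_TO_ADDRESSING[suffix]
    return op + (store if op == 'stm' else load)

for m in ('stmfd', 'ldmfd', 'stmea', 'ldmea'):
    print(m, '=', equivalent(m))
```

Note that the same stack suffix maps to different addressing suffixes for the store and the load, because a pull must undo the pointer movement of the push.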
This ends the section on Raspberry Pi and the ARM assembly language.
In this book, you have learned how a computer works and what it does.
We’ve examined instruction sets, their encoding, and their execution. In
the last four chapters, we looked at high-performance architecture with
an imaginative design.
Now, you should be able to write your own programs.
Summary
One of the key data structures in computing is the stack, or the LIFO
queue. A stack is a queue with only one end – that is, new items enter at
the same end as items leave. This single end is called the top of stack
(TOS).
We have looked at the ARM’s branch and link instruction, bl, that can be
used to call a subroutine without the overhead of the stack. However,
using the branch with link a second time overwrites the return address in
the link register, and you then have to use a stack to preserve previous
addresses.
There are four standard stack implementations. The stack pointer can
point either to the item at the top of the stack, or to the free space above
that item. Similarly, the stack can grow (as items are added) toward low
addresses or toward high addresses. This gives four possible
arrangements. However, most computers implement a stack that grows
toward low addresses, with a stack pointer that points to the top item.
Having reached the end of this book, you might like to consider
designing your own ARM simulator.
The fourth appendix covers some concepts that can cause students
confusion, such as the computer use of the terms up and down, which
sometimes mean something different from the normal meaning of up and
down. For example, adding something to a computer stack causes the
computer stack to grow up toward lower addresses.
The final appendix defines some of the concepts that we use when
discussing computer languages such as Python.
Using IDLE
The programs in this book have been written in Python, saved as
.py files, and then executed in an integrated development environment.
However, there is another approach to executing Python that you will see
mentioned in many texts. This is the Python IDLE environment
(included with the Python package) that lets you execute Python code
line by line.
Consider the following example, where the text in bold font is my input:
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC
v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more
information.
>>> x = 4
>>> y = 5
>>> print('Sum =', x+y)
Sum = 9
>>>
When you run a saved Python program, the output is displayed in the
run window. Here, as you can see, each input line after the >>> prompt is
read and interpreted, and the result is printed.
This window is, in fact, part of the IDLE environment. This means that if
your program crashes, you are able to examine variables after the crash.
Consider the following example, where we create and run a program that
contains an error:
# An error example
x = 5
y = input('Type a number = ')
z = x + y
print('x + y =',z)
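The program above stops at the line z = x + y, because input always returns a string, and Python will not add an integer to a string. The following sketch shows the error and one possible repair. To keep it runnable without a keyboard, the string '7' stands in for whatever the user would have typed; that substitution is an assumption of this example.

```python
x = 5
y = '7'                   # input() always returns a string; '7' stands in for typed input

try:
    z = x + y             # int + str is not allowed
except TypeError as error:
    message = str(error)
    print('Error:', message)

z = x + int(y)            # Convert the text to an integer before adding
print('x + y =', z)
```

In IDLE, you could reach the same conclusion after the crash by typing y or type(y) at the >>> prompt to inspect the variable.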
Linux
cd ..                      @ Change directory to parent
mkdir /home/pi/testProg    @ Create new directory called testProg in folder pi
ls /home/pi                @ List files in folder pi
as -g -o file.o file.s     @ Assemble source file file.s to create object file file.o
ld -o file file.o          @ Link object file file.o
gdb file                   @ Call debugger to debug file
sudo apt-get update        @ Download package information from your configured sources
sudo apt-get upgrade       @ Update all installed packages
Assembler directives
.text            @ This is a code section
.global _start   @ _start is a label (first instruction)
.word            @ Bind 32-bit value to label and store in memory
.byte            @ Bind 8-bit value to label and store in memory
.equ             @ .equ x,7 binds or equates 7 to the name x
.asciz           @ Bind ASCII string to label and store (terminated by 0)
.balign          @ .balign 4 aligns the next instruction/data on a word boundary
gdb debugger
file toDebug       @ Load code file toDebug for debugging
b address          @ Insert breakpoint at <address> (may be a line number or label)
x/4xw <address>    @ Display memory: four 32-bit words in hexadecimal format
x/7db <address>    @ Display memory: seven bytes in decimal format
r                  @ Run program (to a breakpoint or its termination)
s                  @ Step (execute) an instruction
n                  @ Like step, but steps over subroutine calls
i r                @ Display registers
i b                @ Display breakpoints
c                  @ Continue from breakpoint
We have located the source string, string1, in the body of the program,
in the .text section, because it is only read from and never written to.
The destination, str2, that will receive the reversed string is in read/write
memory in the .data section. Consequently, we have to use the technique
of indirect pointers – that is, the .text portion has a pointer at adr_str2
that contains the address of the actual string, str2.
The program contains several labels that are not accessed by the code
(e.g., preLoop and Wait). The purpose of these labels is to make it easy to
use breakpoints when debugging by giving them names.
A final feature is the use of markers. We have inserted markers in
memory that follow both strings – that is, 0xAAFFFFBB and 0xCCFFFFCC.
These make it easier to locate data when you look at memory because
they stand out.
This program tests pointer-based addressing, byte load and store, and
auto-incrementing and decrementing of pointer registers. We will step
through the execution of this program using gdb’s facilities:
          .equ    len,8             @ Length of string to reverse (8 bytes/chars)
          .text                     @ Program (code) area
          .global _start

_start:   mov     r1,#len           @ Number of characters to move in r1
          adr     r2,string1        @ r2 points at source string1 in this section
          adr     r3,adr_str2       @ r3 points at dest string str2 address in this section
          ldr     r4,[r3]           @ r4 points to dest str2 in data section
preLoop:  add     r5,r4,#len-1      @ r5 points to bottom of dest str2
Loop:     ldrb    r6,[r2],#1        @ Get byte char from source in r6, inc pointer
          strb    r6,[r5],#-1       @ Store char in destination, decrement pointer
          subs    r1,r1,#1          @ Decrement char count
          bne     Loop              @ REPEAT until all done
Wait:     nop                       @ Stop here for testing
Exit:     mov     r0,#0             @ Stop here
          mov     r7,#1             @ Exit parameter required by svc
          svc     0                 @ Call operating system to exit program
string1:  .ascii  "Hello!!!"        @ The source string
marker:   .word   0xAAFFFFBB        @ Marker for testing
adr_str2: .word   str2              @ POINTER to destination str2 in data area
          .data                     @ The read/write data area
str2:     .byte   0,0,0,0,0,0,0,0   @ Clear destination string
          .word   0xCCFFFFCC        @ Marker and terminator
          .end
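Before stepping through the assembly code, it can help to see the same loop in Python. The sketch below mirrors the register usage of the program, with one simplification that is an assumption of this model: r2 and r5 are indexes into two byte arrays rather than real memory addresses.

```python
source = bytearray(b'Hello!!!')
dest = bytearray(8)               # str2: eight zero bytes

r1 = len(source)                  # mov r1,#len        character counter
r2 = 0                            # adr r2,string1     index of the next source byte
r5 = len(dest) - 1                # add r5,r4,#len-1   last byte of the destination

while True:
    r6 = source[r2]               # ldrb r6,[r2],#1    load a byte ...
    r2 += 1                       #                    ... then post-increment
    dest[r5] = r6                 # strb r6,[r5],#-1   store the byte ...
    r5 -= 1                       #                    ... then post-decrement
    r1 -= 1                       # subs r1,r1,#1
    if r1 == 0:                   # bne Loop
        break

print(dest.decode('ascii'))
```

When the loop ends, r5 has moved one position below the start of the destination, which matches the final value of r5 in the debugging session (one byte below str2).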
The program is loaded into gdb and debugged as follows. Note
that my input is in bold font:
The first step is to place three breakpoints on the labels so that we can
execute code up to those points and then examine registers or memory:
(gdb) b _start
Breakpoint 1 at 0x10074: file pLoop.s, line 5.
(gdb) b preLoop
Breakpoint 2 at 0x10084: file pLoop.s, line 9.
(gdb) b Wait
Breakpoint 3 at 0x10098: file pLoop.s, line 14.
The next step is to run the program as far as the first instruction:
(gdb) r
Starting program: /home/alan/Desktop/pLoop
Breakpoint 1, _start () at pLoop.s:5
5       _start: mov r1,#len     @ Number of characters to move in r1
(gdb) c
Continuing.
There’s not a lot to see here. So, we hit c to continue to the next
breakpoint, and then enter i r to display the registers. Note we have not
displayed registers that have not been accessed:
Breakpoint 2, preLoop () at pLoop.s:9
9       preLoop: add r5,r4,#len-1  @ r5 points to bottom of dest str2
(gdb) i r
r0             0x0                 0
r1             0x8                 8
r2             0x100a8             65704      Pointer to string1
r3             0x100b4             65716      Pointer to str2 address
r4             0x200b8             131256     Pointer to str2 value
sp 0x7efff360 0x7efff360
lr 0x0 0
pc 0x10084 0x10084 <preLoop>
Let’s have a look at the data stored in the code section. Register r2
points at this area, and the x/4wx command displays four words of memory
in hexadecimal form, starting at 0x100A8:
(gdb) x/4wx 0x100a8
0x100a8 <string1>: 0x6c6c6548 0x2121216f 0xaaffffbb 0x000200b8
The three highlighted values present the string "Hello!!!" and the
marker 0xAAFFFFBB. Note how the characters appear back to front within
each word. This is a consequence of the little-endian byte ordering mode:
the byte at the lowest address is displayed as the least-significant byte
of the word. In terms of ASCII characters, these are lleH !!!o.
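You can reproduce this word view in Python with the standard struct module. The sketch below packs the string and the marker in the order they sit in memory and then reads them back as little-endian 32-bit words; the variable names are my own.

```python
import struct

# The eight characters followed by the marker word, as they are stored in memory
data = b'Hello!!!' + struct.pack('<I', 0xAAFFFFBB)

# '<' selects little-endian byte order, and 'I' is a 32-bit unsigned word
words = struct.unpack('<3I', data)
print([hex(w) for w in words])
```

The first word is 0x6c6c6548: the character H (0x48) is at the lowest address, so it appears at the right-hand end of the printed word.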
We next perform a single step and display the memory in the data region.
At this stage, the code had not been executed fully and this region should
be as originally set up:
(gdb) si 1
Loop () at pLoop.s:10
10      Loop: ldrb r6,[r2],#1    @ Get byte char from source in r6 inc pointer
(gdb) x/4wx 0x200b8
0x200b8: 0x00000000 0x00000000 0xccffffcc 0x00001141
Here, you can see the eight zero bytes and the marker following
them. We then enter c again and continue to the Wait breakpoint, by which
point the code should have been completed. Finally, we look at the registers
and then the data memory:
(gdb) c
Continuing.
Breakpoint 3, Wait () at pLoop.s:14
14 Wait: nop @ Stop here for testing
(gdb) i r
r0 0x0 0
r1 0x0 0
r2 0x100b0 65712
r3 0x100b4 65716
r4 0x200b8 131256
r5 0x200b7 131255
r6 0x21 33
sp 0x7efff360 0x7efff360
lr 0x0 0
pc 0x10098 0x10098 <Wait>
(gdb) x/4wx 0x200b8
0x200b8: 0x6f212121 0x48656c6c 0xccffffcc 0x00001141
Note that the data is changed. As you can see, the order has been
reversed. Again, note the effect of the little-endianism on the byte order
within words. The sequence of the data is now o!!! Hell. Finally, we
enter c again and the program is completed:
(gdb) c
Continuing.
[Inferior 1 (process 11670) exited normally]
(gdb)
Common confusions
The growth of computing from the 1960s to today was rapid and chaotic.
The chaos arose because the technology developed so rapidly that
systems became obsolete in months, and that meant much of the design
was obsolete but had been incorporated in systems that were now being
held back by it. Similarly, many different notations and conventions
arose – for example, does MOVE A,B move A to B, or B to A? Both
conventions were used at the same time by different computers. Here are
a few pointers to help with the confusion.
In this book, we will largely adopt the right-to-left convention for data
movement. For example, add r1,r2,r3 indicates the addition of r2 and
r3, and the sum is put in r1. As a means of highlighting this, I often put
the destination operand of an operation in bold font.
Symbols are often used with different meanings. This is particularly true
of #, @, and %.
Vocabulary
All specializations have their own vocabulary, and programming is no
exception. Here are a few words that you might find helpful in
understanding the text and its context.