Ch2&3 Lecture Notes
Computer Arithmetic
In general, if an n-bit sequence of binary digits a_{n-1}a_{n-2}…a_1a_0 is
interpreted as an unsigned integer A, its value is:

A = 2^{n-1}·a_{n-1} + 2^{n-2}·a_{n-2} + … + 2·a_1 + a_0 = Σ_{i=0}^{n-1} 2^i·a_i
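This weighted sum can be checked with a short Python sketch (the function name and the bit-string input format are illustrative, not part of the notes):

```python
# Value of an n-bit unsigned binary sequence: A = sum of 2^i * a_i,
# with a_{n-1} the most significant bit.
def unsigned_value(bits):
    """bits is a string such as '1011', most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)   # shift accumulated value left, add next bit
    return value

print(unsigned_value("1011"))   # 1*8 + 0*4 + 1*2 + 1*1 = 11
```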
Instruction Sets: Characteristics
What is an Instruction Set?
The operation of the processor is determined by the instructions
it executes, referred to as machine instructions or computer
instructions.
The collection of different instructions that the processor can
execute is referred to as the processor’s instruction set.
Elements of a Machine Instruction
Each instruction must contain the information required by the
processor for execution including:
Operation code (opcode): a binary code that specifies the operation
to be performed (e.g., ADD, I/O).
Source operand reference: the operation may involve one or more
source operands (operands that are inputs for the operation).
Result operand reference: the operation may produce a result. This
tells where to put the result produced.
Next instruction reference: this tells the processor where to fetch
the next instruction after the execution of this instruction is complete.
Source and result operands can be in one of four areas:
Main or virtual memory: as with next instruction references, the
main or virtual memory address must be supplied.
Processor register: with rare exceptions, a processor contains one or
more registers that may be referenced by machine instructions.
If only one register exists, reference to it may be implicit. If more than one
register exists, then each register is assigned a unique name or number, and the
instruction must contain the number of the desired register.
Immediate: the value of the operand is contained in a field in the
instruction being executed.
I/O device: the instruction must specify the I/O module and device
for the operation. If memory-mapped I/O is used, this is just another
main or virtual memory address.
Instruction Representation
Within the computer, each instruction is represented by a sequence of
bits.
The instruction is divided into fields, corresponding to the constituent
elements of the instruction.
A simple example of an instruction format is shown in Figure 1.
As another example, the IAS instruction format is shown in Figure 2.
With most instruction sets, more than one format is used.
During instruction execution, an instruction is read into an instruction
register (IR) in the processor.
The processor must be able to extract the data from the various
instruction fields to perform the required operation.
It is difficult for both the programmer and the reader of textbooks to
deal with binary representations of machine instructions.
Thus, it has become common practice to use a symbolic representation of
machine instructions. An example of this was used for the IAS instruction set,
in Table 1.
Table 1: The IAS Instruction Set
Opcodes are represented by abbreviations, called mnemonics, that
indicate the operation.
Common examples include:
ADD Add
SUB Subtract
MUL Multiply
DIV Divide
LOAD Load data from memory
STOR Store data to memory
Operands are also represented symbolically.
For example, the instruction ADD R,Y may mean add the value
contained in data location Y to the contents of register R.
In this example, Y refers to the address of a location in memory,
and R refers to a particular register. Note that the operation is
performed on the contents of a location, not on its address.
Figure 1. A Simple Instruction Format
Instruction Types
Consider a high-level language instruction that could be expressed in a
language such as C++ or Java.
For example, X = X + Y.
This statement instructs the computer to add the value stored in Y to the
value stored in X and put the result in X.
How might this be accomplished with machine instructions? Let us assume
that the variables X and Y correspond to locations 513 and 514.
If we assume a simple set of machine instructions, this operation could be
accomplished with three instructions:
1. Load a register with the contents of memory location 513.
2. Add the contents of memory location 514 to the register.
3. Store the contents of the register in memory location 513.
As can be seen, the single high-level language statement may require three
machine instructions.
This is typical of the relationship between a high-level language and a
machine language.
A high-level language expresses operations in a concise algebraic
form, using variables.
A machine language expresses operations in a basic form involving
the movement of data to or from registers.
A computer should have a set of instructions that allows the user
to formulate any data processing task.
Any program written in a high-level language must be translated
into machine language to be executed.
Thus, the set of machine instructions must be sufficient to express
any of the instructions from a high-level language.
With this in mind we can categorize instruction types as follows:
Data processing: Arithmetic and logic instructions
Data storage: Movement of data into or out of register and/or memory
locations
Data movement: I/O instructions
Control: Test and branch instructions
Arithmetic instructions: processing of numeric data.
Logic (Boolean) instructions: operate on the bits of a word as bits
rather than as numbers; thus, they provide capabilities for processing
any other type of data the user may wish to employ.
Data Storage instructions: arithmetic and Boolean operations are
performed primarily on data in processor registers.
Therefore, there must be memory instructions for moving data between
memory and the registers.
I/O instructions: are needed to transfer programs and data into
memory and the results of computations back out to the user.
Test instructions: are used to test the value of a data word or the
status of a computation.
Branch instructions: are then used to branch to a different set of
instructions depending on the decision made.
Number of Addresses
One way of describing processor architecture is in terms of the number of
addresses contained in each instruction.
What is the maximum number of addresses one might need in an
instruction?
Arithmetic and logic instructions will require the most operands.
All arithmetic and logic operations are either unary (one source operand)
or binary (two source operands).
Thus, we would need a maximum of two addresses to reference source
operands.
The result of an operation must be stored, suggesting a third address, which
defines a destination operand.
Finally, after completion of an instruction, the next instruction must be
fetched, and its address is needed.
Hence an instruction could contain four address references: two source
operands, one destination operand, and the address of the next instruction.
In most architectures, most instructions have one, two, or three operand
addresses, with the address of the next instruction being implicit (obtained
from the program counter).
Comparison of one-, two- and three-address instructions to compute:
Y = (A – B)/[C + (D X E)].
Exercise: Repeat the above using 0-address instructions
Y = (A – B)/[C + (D X E)]
PUSH A
PUSH B
SUB
PUSH C
PUSH D
PUSH E
MUL
ADD
DIV
POP Y
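The stack program above can be traced with a minimal Python sketch of a zero-address machine (the helper names and sample operand values are illustrative):

```python
# Zero-address (stack) machine evaluating Y = (A - B) / [C + (D * E)].
stack = []

def push(v):
    stack.append(v)

def binop(f):
    b = stack.pop()            # top of stack is the second operand
    a = stack.pop()
    push(f(a, b))

A, B, C, D, E = 20, 4, 2, 3, 2     # illustrative values

push(A); push(B)
binop(lambda a, b: a - b)          # SUB: A - B = 16
push(C); push(D); push(E)
binop(lambda a, b: a * b)          # MUL: D * E = 6
binop(lambda a, b: a + b)          # ADD: C + 6 = 8
binop(lambda a, b: a / b)          # DIV: 16 / 8
Y = stack.pop()                    # POP Y
print(Y)                           # 2.0
```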
Three-address instruction formats are not common because they
require a long instruction format to hold the three address references.
Two-address instruction format: one address doubles as operand and
result.
ADD a, b: carries out the calculation a+b and stores the result in a.
Reduces length of instruction
Requires some extra work (move instruction)
Temporary storage to hold some results
One-address instruction format: Implicit second address
Usually a register (accumulator, AC). Also holds the result.
Common on early machines
Zero-address instruction format: applicable to a special memory
organization called a stack.
A stack is a last-in-first-out set of locations.
The stack is in a known location and, often, at least the top two elements
are in processor registers.
Thus, zero-address instructions would reference the top two stack
elements.
Table 2. Utilization of Instruction Addresses
More addresses
• More complex instructions
• More registers
Inter-register operations are quicker
• Fewer instructions per program
Fewer addresses
• Less complex (powerful?) instructions
• More instructions per program
• Faster fetch/execution of instructions
Instruction Set Design
One of the most interesting, and most analyzed, aspects of
computer design
Very complex because it affects so many aspects of the
computer system.
The instruction set defines many of the functions performed by
the processor and thus has a significant effect on the
implementation of the processor.
The instruction set is the programmer’s means of controlling
the processor.
Thus, programmer requirements must be considered in
designing the instruction set.
The most important of the fundamental instruction set design issues
include:
Operation repertoire: How many and which operations to provide,
and how complex operations should be.
Data types: The various types of data upon which operations are
performed.
Instruction format: Instruction length (in bits), number of addresses,
size of various fields, and so on.
Registers: Number of processor registers that can be referenced by
instructions, and their use.
Addressing modes: The mode or modes by which the address of an
operand is specified.
TYPES OF OPERANDS
Machine instructions operate on data.
The most important general categories of data are
Addresses
Numbers
Characters
Logical data
All machine languages include numeric data types.
Even in nonnumeric data processing, numbers are needed to act as counters,
field widths, and so forth.
A distinction between numbers used in ordinary mathematics and
numbers stored in a computer is that the latter are limited:
there is a limit to the magnitude of numbers representable on a machine,
and, in the case of floating-point numbers, a limit to their precision.
Thus, the programmer is faced with understanding the consequences
of rounding, overflow, and underflow.
Three types of numerical data are common in computers:
Binary integer or binary fixed point
Binary floating point
Decimal
Human users deal with decimal numbers. Thus, there is a need
to convert from decimal to binary on input and from binary
to decimal on output.
For applications in which there is a great deal of I/O and
comparatively little and simple computation, it is preferable to store
and operate on the numbers in decimal form.
The most common representation for this purpose is packed
decimal.
With packed decimal, each decimal digit is represented by a 4-bit
code, in the obvious way, with two digits stored per byte.
Thus, 0 = 0000, 1 = 0001, 8 = 1000, and 9 = 1001.
To form numbers, 4-bit codes are strung together, in multiples
of 8 bits.
Example: the code for 246 is 0000 0010 0100 0110.
Less compact than a straight binary representation.
But avoids the conversion overhead.
Negative numbers can be represented by including a 4-bit sign
digit at either the left or right end of a string of packed decimal
digits.
Standard sign values are 1100 for positive (+) and 1101 for
negative (-).
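A minimal Python sketch of this packing scheme (the function name is illustrative; the sign digit is appended at the right end, one of the two placements mentioned above):

```python
# Packed decimal: one 4-bit code per decimal digit, two digits per byte.
PLUS, MINUS = 0b1100, 0b1101        # standard sign codes

def pack_decimal(n):
    """Return the list of 4-bit codes for integer n, sign digit last,
    zero-padded on the left to a whole number of bytes."""
    nibbles = [int(d) for d in str(abs(n))]
    nibbles.append(MINUS if n < 0 else PLUS)
    if len(nibbles) % 2:            # pad to a multiple of 8 bits
        nibbles.insert(0, 0)
    return nibbles

print(pack_decimal(246))    # [2, 4, 6, 12]: digits 2, 4, 6 plus the '+' sign
```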
Characters
A number of codes have been devised to represent characters by a
sequence of bits.
The earliest common example of this is the Morse code.
The most commonly used character code is the International
Reference Alphabet (IRA), also referred to as the American Standard
Code for Information Interchange (ASCII).
Each character in this code is represented by a unique 7-bit pattern;
thus, 128 different characters can be represented.
These represent printable characters and control characters.
ASCII-encoded characters are almost always stored and transmitted
using 8 bits per character.
The eighth bit may be set to 0 or used as a parity bit for error
detection. (odd parity/even parity)
Logical Data
Normally, each word or other addressable unit (byte,
halfword, and so on) is treated as a single unit of data.
Sometimes an n-bit unit can be considered as consisting of n
1-bit items.
When data are viewed this way, they are considered to be
logical data.
There are two advantages to the bit-oriented view.
To store an array of Boolean or binary data items, in which
each item can take on only the values 1 (true) and 0 (false).
To manipulate the bits of a data item. For example, if
floating-point operations are implemented in software, we
need to be able to shift significant bits in some operations.
x86 numeric data formats
Types of Operations
The number of different opcodes varies widely from machine to
machine.
However, the same general types of operations are found on all
machines.
A typical categorization is the following:
Data transfer
Arithmetic
Logical
Conversion
I/O
System control
Transfer of control
Common Instruction Set Operations
Common Instruction Set Operations …
Processor actions for various types of operations
Data Transfer
The most fundamental type of machine instruction.
Must specify
Location of source and destination operands:
Each location could be memory, a register, or the top of the stack
The length of data to be transferred.
The mode of addressing for each operand
May be different instructions for different movements or one
instruction and different addresses
If one or both operands are in memory, then the processor performs
the following:
Calculate the memory address, based on the address mode
If the address refers to virtual memory, translate from virtual to real
memory address.
Determine whether the addressed item is in cache.
If not, issue a command to the memory module.
Examples of IBM EAS/390 Data Transfer Operations
Arithmetic
Most machines provide the basic arithmetic operations:
add, subtract, multiply, and divide.
These are invariably provided for signed integer (fixed-point) numbers.
Often they are also provided for floating-point and packed decimal
numbers.
Other operations include single-operand instructions;
Absolute: Take the absolute value of the operand.
Negate: Negate the operand.
Increment: Add 1 to the operand.
Decrement: Subtract 1 from the operand.
Execution of an arithmetic instruction may involve data transfer
operations
Bring operands for input to the ALU
Deliver the output of the ALU
Logical
Operations for manipulating individual bits of a word or other
addressable units.
They are based upon Boolean operations.
Some of the basic logical operations that can be performed on
Boolean or binary data are shown in Table below.
The NOT operation inverts a bit.
AND, OR, and Exclusive-OR (XOR):
The most common logical functions with two operands.
EQUAL is a useful binary test.
Can be applied bitwise to n-bit logical data units.
If two registers contain the data:
(R1) = 10100101
(R2) = 00001111
(R1) AND (R2) = 00000101
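The same register contents can be checked in Python, which applies these operations bitwise to integers (the `format(..., '08b')` calls just display the 8-bit patterns):

```python
# Bitwise logical operations on the register contents from the example.
r1 = 0b10100101
r2 = 0b00001111

print(format(r1 & r2, '08b'))      # AND: 00000101
print(format(r1 | r2, '08b'))      # OR:  10101111
print(format(r1 ^ r2, '08b'))      # XOR: 10101010
print(format(~r1 & 0xFF, '08b'))   # NOT r1, kept to 8 bits: 01011010
```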
Shifting and Rotating Operations
Logical shift:
the bits of a word are shifted left or right. On one end, the bit
shifted out is lost.
On the other end, a 0 is shifted in.
Logical shifts are used to isolate fields within a word.
The 0s that are shifted into a word displace unwanted information
that is shifted off the other end
Example: We want to transmit characters of data to an I/O
device 1 character at a time.
Assume each memory word is 16 bits in length containing two
characters.
To send the two characters in a word: first the left-hand
character
1. Load the word into a register.
2. Shift to the right eight times. This shifts the remaining
character to the right half of the register.
3. Perform I/O. The I/O module reads the lower-order 8
bits from the data bus.
To send the right-hand character,
1. Load the word again into the register.
2. AND with 0000000011111111. (Masking)
3. Perform I/O.
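Both extractions can be sketched in Python (the 16-bit word value, holding the ASCII codes for 'H' and 'I', is an illustrative choice):

```python
# Extracting two 8-bit characters from a 16-bit memory word.
WORD = 0b0100100001001001           # 0x48 'H' in the left half, 0x49 'I' in the right

left_char = (WORD >> 8) & 0xFF      # shift right eight times: left character
right_char = WORD & 0xFF            # AND with 0000000011111111: right character

print(chr(left_char), chr(right_char))   # H I
```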
Arithmetic shift
Treats the data as a signed integer and does not shift the sign
bit.
On a right arithmetic shift, the sign bit is replicated into the bit
position to its right.
On a left arithmetic shift, a logical left shift is performed on all bits
but the sign bit, which is retained.
These operations can speed up certain arithmetic operations.
With numbers in twos complement notation, a right arithmetic shift
corresponds to a division by 2, with truncation for odd numbers.
Both an arithmetic left shift and a logical left shift correspond to a
multiplication by 2 when there is no overflow.
If overflow occurs, arithmetic and logical left shift operations
produce different results, but the arithmetic left shift retains the sign
of the number.
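A right arithmetic shift on a fixed-width twos-complement value can be modeled in Python as follows (Python's own `>>` already floors toward negative infinity on signed ints; this sketch instead replicates the sign bit of an 8-bit register, and the function name is illustrative):

```python
# Arithmetic shift right by 1 on a `bits`-wide twos-complement value.
def asr(value, bits=8):
    sign = value & (1 << (bits - 1))           # copy of the sign bit
    return sign | ((value % (1 << bits)) >> 1) # shift, then restore the sign bit

# -6 in 8-bit twos complement is 11111010; shifting gives 11111101, i.e. -3.
print(format(asr(0b11111010), '08b'))   # 11111101
print(format(asr(0b00000110), '08b'))   # 00000011 (6 -> 3)
```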
Rotate, or cyclic shift:
preserves all of the bits being operated on.
One use of a rotate is to bring each bit successively into the
leftmost bit, where it can be identified by testing the sign of the
data.
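A rotate-left step can be sketched like this (the function name and 8-bit width are illustrative); repeated rotation brings each bit in turn into the leftmost position:

```python
# Rotate left by 1 on a `bits`-wide word: the bit shifted out of the left
# end re-enters on the right, so no bits are lost.
def rol(value, bits=8):
    mask = (1 << bits) - 1
    value &= mask
    return ((value << 1) | (value >> (bits - 1))) & mask

print(format(rol(0b10100101), '08b'))   # 01001011
print(format(rol(0b10000000), '08b'))   # 00000001: the sign bit wraps around
```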
Examples of Shift and Rotate Operations
Conversion
Conversion instructions are those that change the format or
operate on the format of data. An example is converting from
decimal to binary.
Input/Output
May be specific instructions
May be done using data movement instructions (memory mapped)
May be done by a separate controller (DMA) or I/O processor
System Control
Those instructions that can be executed only while the processor is
in a certain privileged state or is executing a program in a special
privileged area of memory.
Instructions reserved for the use of the operating system.
Some examples:
A system control instruction may read or alter a control register.
An instruction to read or modify a storage protection key.
Access to process control blocks in a multiprogramming system.
Transfer of Control
For these instructions, the processor updates the program counter
to contain the address of some instruction in memory.
Why do we need them?
To execute each instruction more than once and perhaps many
thousands of times. If a table or a list of items is to be processed, a
program loop is needed.
Decision making. do one thing if one condition holds, and another
thing if another condition holds.
Breaking a bigger program up into smaller pieces that can be
worked on one at a time.
Branch
e.g. branch to x if result is zero
Conditional and unconditional branches:
Unconditional: the branch is always taken.
Conditional: the branch is taken only if a specified condition holds,
recorded in a condition code of one or more bits.
Example: Consider an arithmetic operation (ADD, SUBTRACT,
…) could set a 2-bit condition code with one of the four values:
zero, positive, negative, overflow.
On such a machine, there could be four different conditional
branch instructions:
BRP X Branch to location X if result is positive.
BRN X Branch to location X if result is negative.
BRZ X Branch to location X if result is zero.
BRO X Branch to location X if overflow occurs.
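A simplified Python sketch of how an arithmetic result could set such a 2-bit condition code (the function name is illustrative, and real hardware detects overflow from the operand signs rather than by range-checking the result as done here):

```python
# Map an 8-bit signed addition result onto one of four condition-code values.
def set_cc(result, bits=8):
    limit = 1 << (bits - 1)                 # 128 for 8-bit twos complement
    if result >= limit or result < -limit:
        return 'overflow'                    # result does not fit in the word
    if result == 0:
        return 'zero'
    return 'positive' if result > 0 else 'negative'

print(set_cc(3 + -3))      # zero     -> a BRZ X branch would be taken
print(set_cc(100 + 100))   # overflow -> a BRO X branch would be taken
```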
Another approach that can be used with a three-address instruction
format is to perform a comparison and specify a branch in the same
instruction.
Example:
BRE R1, R2, X
Branch to X if contents of R1 = contents of R2.
SKIP INSTRUCTIONS
Another form of transfer-of-control instruction;
includes an implied address;
the skip implies that one instruction be skipped;
the implied address = address of the next instruction plus one
instruction length.
Because the skip instruction does not require a destination address
field, it is free to do other things.
Example: increment-and-skip-if-zero (ISZ)
In this fragment, the two transfer-of-control instructions are
used to implement an iterative loop.
R1 is set with the negative of the number of iterations to be
performed.
At the end of the loop, R1 is incremented.
If it is not 0, the program branches back to the beginning of the
loop.
Otherwise, the branch is skipped, and the program continues
with the next instruction after the end of the loop.
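The control flow of that ISZ loop can be traced in Python (the loop body and iteration count are illustrative):

```python
# ISZ-style loop: R1 holds the negative of the iteration count.
iterations = 5
r1 = -iterations
count = 0               # stands in for the work done in the loop body

while True:
    count += 1          # body of the loop
    r1 += 1             # ISZ: increment R1...
    if r1 == 0:         # ...and skip the branch when the result is zero
        break           # fall through to the instruction after the loop
    # otherwise: branch back to the beginning of the loop

print(count)            # 5
```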
PROCEDURE CALL INSTRUCTIONS
Perhaps the most important innovation in the development of
programming languages is the procedure.
A procedure is a self-contained computer program that is incorporated
into a larger program.
At any point in the program the procedure may be invoked, or called.
The processor is instructed to go and execute the entire procedure and
then return to the point from which the call took place.
Two reasons for the use of procedures are economy and modularity.
A procedure allows the same piece of code to be used many times.
This is important for
economy in programming effort and
making the most efficient use of storage space in the system (the
program must be stored).
Procedures also allow large programming tasks to be subdivided into
smaller units.
This use of modularity greatly eases the programming task.
The procedure mechanism involves two basic instructions:
a call instruction that branches from the present location to the
procedure, and
a return instruction that returns from the procedure to the place from
which it was called.
Both of these are forms of branching instructions.
Figure below illustrates the use of procedures to construct a
program.
There is a main program starting at location 4000.
This program includes a call to procedure PROC1, starting at
location 4500.
When this call instruction is encountered, the processor suspends
execution of the main program and begins execution of PROC1 by
fetching the next instruction from location 4500.
Within PROC1, there are two calls to PROC2 at location 4800.
In each case, the execution of PROC1 is suspended and PROC2
is executed.
The RETURN statement causes the processor to go back to the
calling program and continue execution at the instruction after
the corresponding CALL instruction.
Three points to note:
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the
nesting of procedures to an arbitrary depth.
3. Each procedure call is matched by a return in the called
program.
The processor must save the return address so that the return can
take place appropriately.
There are three common places for storing the return address:
Register
Start of called procedure
Top of stack
Consider a machine-language instruction CALL X, (call procedure at
location X).
If the register approach is used, CALL X causes the following actions:
RN ← PC + Δ
PC ← X
where
RN = a register always used for this purpose,
PC = the program counter, and
Δ = the instruction length.
The called procedure can use the contents of RN for the later return.
A second option is to store the return address at the start of the
procedure.
In this case, CALL X causes
X ← PC + Δ
PC ← X + 1
Limitation: both these approaches complicate the use of
reentrant procedures.
A reentrant procedure is one in which it is possible to have
several calls open to it at the same time (e.g., a recursive
procedure, one that calls itself, is reentrant).
If parameters are passed via registers/memory, for a reentrant
procedure, some code must be responsible for saving the
parameters so that the registers or memory space are available
for other procedure calls.
Stacks
A powerful approach for storing return addresses and parameters.
When a call is executed, the return address is kept on the stack.
When a return is executed, the address on the stack is used.
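This discipline is easy to sketch in Python; the addresses below are hypothetical and echo the main/PROC1/PROC2 example earlier, and because the most recent return address is always popped first, nested (and reentrant) calls unwind correctly:

```python
# Return addresses kept on a stack: last call made is the first returned from.
stack = []

def call(return_address):
    stack.append(return_address)    # CALL pushes the return address

def ret():
    return stack.pop()              # RETURN pops the most recent one

call(4101)                 # main program calls PROC1
call(4601)                 # PROC1 calls PROC2
back_to_proc1 = ret()      # PROC2 returns into PROC1
back_to_main = ret()       # PROC1 returns into main
print(back_to_proc1, back_to_main)   # 4601 4101
```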
For parameter passing, registers or memory could be used:
for example, by keeping the parameters in memory just after the CALL, or by passing them directly in registers.
But these approaches are not flexible enough for a variable number of parameters.
Again, stacks are a more efficient and flexible way to pass parameters.
Example- Consider the following:
Assignment I:
1. Transmitting characters in GOLD to an I/O device, one character at a time.
2. Implement a loop using the skip instruction (ISZ)
3. Show the stack contents for return address of the above program
ADDRESSING MODES
Relatively small address field in instructions
But a need to reference a large range of memory locations
A variety of addressing techniques:
1. Immediate
2. Direct
3. Indirect
4. Register
5. Register indirect
6. Displacement
7. Stack
101
102
103
• All computer architectures support more than one of
these addressing modes.
• How does the processor determine the addressing mode
used in an instruction?
• Different opcodes use different modes
• One or more bits used in the instruction format (mode field)
Immediate Addressing
The simplest form of addressing is immediate addressing
The operand value is present in the instruction
Operand = A
Can be used to define and use constants or set initial values of
variables.
Typically, the number will be stored in twos complement form;
the leftmost bit of the operand field is used as a sign bit.
Advantage of immediate addressing:
No memory reference other than the instruction fetch is required
to obtain the operand
saving one memory or cache cycle in the instruction cycle.
Disadvantage: the size of the number is limited to the size of the
address field, which is small compared with the word length.
Direct Addressing
A very simple form of addressing
The address field contains the effective address of the operand:
EA = A
Disadvantage: provides only a limited address space.
Indirect Addressing
With direct addressing:
length of the address field < word length => limiting the address range
One solution is to have the address field refer to the address of a
word in memory,
This in turn contains a full-length address of the operand.
This is known as indirect addressing:
EA = (A), ( ) stands for “contents of ”
Advantage: for a word length of N, an address space of 2^N is
available.
Disadvantage: instruction execution requires two memory
references to fetch the operand:
one to get its address and another to get its value.
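The two memory references can be traced with a dictionary standing in for memory (the addresses and contents are illustrative):

```python
# Indirect addressing: EA = (A). Two memory references are needed.
memory = {300: 1200, 1200: 42}   # location 300 holds the operand's address

A = 300
ea = memory[A]          # first reference: fetch the full-length address
operand = memory[ea]    # second reference: fetch the operand itself
print(ea, operand)      # 1200 42
```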
Register Addressing
similar to direct addressing.
Difference: address field refers to a register rather than a main
memory address:
EA = R
That is: if the contents of a register address field in an
instruction is 5, then the operand value is contained in R5.
If n bits are used for the address field that references registers,
2^n registers can be referenced.
Typically n is 3 to 5 bits;
hence a total of 8 to 32 general-purpose registers can be
referenced.
Registers are used for operands that remain in use for multiple
operations. (for example for intermediate results)
It would be wasteful if every operand is brought into a register
from main memory, operated on once, and then returned to
main memory.
Register Indirect Addressing
Analogous to indirect addressing.
Difference: address field refers to a register.
EA = (R)
The advantages and limitations of register indirect addressing are the
same as for indirect addressing.
In both cases, the address space limitation (limited range of
addresses) of the address field is overcome by having that field refer
to a word-length location containing an address.
In addition, register indirect addressing uses one less memory
reference than indirect addressing.
Displacement Addressing
A very powerful mode of addressing combines the capabilities
of direct addressing and register indirect addressing.
It is known by a variety of names depending on the context of
its use, but the basic mechanism is the same.
We will refer to this as displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two
address fields, at least one of which is explicit.
The value contained in one address field (value = A) is used
directly.
The other address field, or an implicit reference based on
opcode, refers to a register whose contents are added to A to
produce the effective address.
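A minimal sketch of computing the effective address in this mode (the register name, displacement, and memory contents are all illustrative):

```python
# Displacement addressing: EA = A + (R).
registers = {'R1': 1000}          # R1 holds a base address
memory = {1024: 'operand'}

A = 24                            # displacement taken directly from the instruction
ea = A + registers['R1']          # add the register contents to A
print(ea, memory[ea])             # 1024 operand
```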
Three common uses of displacement addressing:
Relative addressing
Base-register addressing
Indexing
RELATIVE ADDRESSING (also called PC-relative
addressing)
The implicitly referenced register is the program counter (PC).
That is, the next instruction address is added to the address field to
produce the EA.
Typically, the address field is treated as a twos complement
number for this operation.
Thus, the effective address is a displacement relative to the address
of the current instruction.
BASE-REGISTER ADDRESSING
Interpreted as follows:
The referenced register contains a main memory address
The address field contains a displacement (usually an unsigned
integer representation) from that address.
The register reference may be explicit or implicit.
Either a single segment-base register is employed and is used
implicitly or the programmer has to choose a register to hold the
base address of a segment, and the instruction must reference it
explicitly.
In this latter case, if the length of the address field is K and the
number of possible registers is N, then one instruction can reference
any one of N areas of 2^K words:
(N base addresses, one per register) × (2^K displacements from a
K-bit address field) = N × 2^K addresses.
INDEXING
Interpreted as follows:
The address field references a main memory address, and
the referenced register contains a positive displacement from that
address.
Just the opposite of the interpretation for base-register addressing.
Because the address field is considered to be a memory address in
indexing, it generally contains more bits than an address field in a
comparable base-register instruction.
But the method of calculating the EA is the same for both base-
register addressing and indexing.
And, in both cases, the register reference is sometimes explicit and
sometimes implicit (for different processor types).
An important use of indexing is to provide an efficient
mechanism for performing iterative operations.
For example:
Consider a list of numbers stored starting at location A.
Suppose that we would like to add 1 to each element on the list.
We need to fetch each value, add 1 to it, and store it back.
The sequence of effective addresses that we need is A, A + 1, A +
2, . . . , up to the last location on the list.
With indexing, this is easily done.
The value A is stored in the instruction’s address field, and the
chosen register, called an index register, is initialized to 0.
After each operation, the index register is incremented by 1.
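The add-1-to-each-element loop can be sketched as follows (the list length and contents are illustrative):

```python
# Indexing: EA = A + (index register), with the index bumped each pass.
A = 100
memory = {100: 5, 101: 8, 102: 13}   # list of numbers starting at location A

index = 0                  # index register initialized to 0
for _ in range(3):
    ea = A + index         # effective addresses A, A+1, A+2, ...
    memory[ea] += 1        # fetch the value, add 1, store it back
    index += 1             # autoindexing would do this increment automatically

print(memory)              # {100: 6, 101: 9, 102: 14}
```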
Because index registers are commonly used for such iterative
tasks, it is typical that there is a need to increment or
decrement the index register after each reference to it.
Because this is such a common operation, some systems will
automatically do this as part of the same instruction cycle.
This is known as autoindexing.
If certain registers are devoted exclusively to
indexing, then autoindexing can be invoked implicitly and
automatically.
If general-purpose registers are used, the autoindex operation
may need to be signaled by a bit in the instruction.
Autoindexing using increment can be depicted as follows:
EA = A + (R)
(R) ← (R) + 1
In some machines, both indirect addressing and indexing are
provided, and it is possible to employ both in the same
instruction.
There are two possibilities: the indexing is performed either
before or after the indirection.
If indexing is performed after the indirection, it is termed
postindexing:
EA = (A) + (R)
First, the contents of the address field are used to access a
memory location containing a direct address.
This address is then indexed by the register value.
With preindexing, the indexing is performed before
the indirection:
EA = (A + (R))
An address is calculated as with simple indexing.
In this case, the calculated address contains not the operand, but
the address of the operand.
An example of the use of this technique is to construct a
multiway branch table.
At a particular point in a program, there may be a branch to
one of a number of locations depending on conditions.
A table of addresses can be set up starting at location A.
By indexing into this table, the required location can be found.
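The branch-table use of preindexing can be sketched like this (the table location, entries, and register value are illustrative):

```python
# Preindexing, EA = (A + (R)): index into a table of addresses at A,
# then use the fetched entry as the branch target.
A = 500
memory = {500: 4700, 501: 4800, 502: 4900}   # multiway branch table

r = 1                      # index register selects the second case
target = memory[A + r]     # the calculated address holds the branch address
print(target)              # 4800
```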
Stack Addressing
A stack is a linear array of locations.
It is also referred to as a pushdown list or last-in-first-out queue.
The stack is a reserved block of locations.
Items are appended to the top of the stack so that, at any given
time, the block is partially filled.
Associated with the stack is a pointer whose value is the address of
the top of the stack.
Alternatively, the top two elements of the stack may be in
processor registers.
In this case, the stack pointer references the third element of the
stack.
The stack pointer is maintained in a register.
Hence, references to stack locations in memory are register
indirect addresses.
119
The stack mode of addressing is a form of implied addressing.
The machine instructions need not include a memory reference
but implicitly operate on the top of the stack.
120
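The implicit, register-indirect behavior of stack addressing can be modeled in a few lines. This is only a sketch: the stack base address and the downward growth direction are illustrative conventions, not taken from the notes.

```python
# Sketch (assumption): stack addressing with the stack pointer in a register.
# PUSH/POP carry no memory address field; they implicitly use SP.

memory = {}
SP = 1000  # stack pointer register; stack grows downward from 1000 here

def push(value):
    """PUSH: decrement SP, then store via SP (register indirect)."""
    global SP
    SP -= 1
    memory[SP] = value

def pop():
    """POP: load via SP (register indirect), then increment SP."""
    global SP
    value = memory[SP]
    SP += 1
    return value

push(7)
push(9)
top = pop()
second = pop()
print(top, second)  # 9 7  (last in, first out)
```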
ASSEMBLY LANGUAGE
A processor can understand and execute machine instructions.
Such instructions are simply binary numbers stored in the
computer.
To program directly in machine language, the program would have
to be entered as binary data.
Consider the statement
N=I+J+K
Suppose we wished to program this statement in machine
language and to initialize I, J, and K to 2, 3, and 4, respectively.
121
ASSEMBLY LANGUAGE …
122
ASSEMBLY LANGUAGE …
The program starts in location 101 (hexadecimal).
Memory is reserved for the four variables starting at location
201.
The program consists of four instructions:
Load the contents of location 201 into the AC.
Add the contents of location 202 to the AC.
Add the contents of location 203 to the AC.
Store the contents of the AC in location 204.
This is clearly a tedious and very error-prone process.
A slight improvement is to write the program in hexadecimal
rather than binary notation. Shown in (b).
123
ASSEMBLY LANGUAGE …
We could write the program as a series of lines.
Each line contains the address of a memory location and the
hexadecimal code of the binary value to be stored in that
location.
Then we need a program that will accept this input, translate
each line into a binary number, and store it in the specified
location.
For more improvement, we can make use of the symbolic name
or mnemonic of each instruction.
This results in the symbolic program shown in (c).
Each line of input still represents one memory location.
Each line consists of three fields, separated by spaces.
124
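The translation step described above can be sketched as a toy assembler for the N = I + J + K program. The opcode encodings below (LDA = 1, ADD = 5, STA = 2, placed in the top 4 bits of a 16-bit word) are assumptions chosen to match the style of the example machine, not a real instruction set.

```python
# Sketch (assumption): a minimal assembler for the hypothetical
# one-address machine. Opcode values are illustrative.

OPCODES = {"LDA": 0x1, "ADD": 0x5, "STA": 0x2}

def assemble(lines):
    """Translate (address, mnemonic, operand) lines into 16-bit machine words."""
    image = {}
    for addr, mnemonic, operand in lines:
        image[addr] = (OPCODES[mnemonic] << 12) | operand  # 4-bit op, 12-bit addr
    return image

# The N = I + J + K program: code at 101h, variables at 201h..204h.
program = [
    (0x101, "LDA", 0x201),  # AC <- (201)
    (0x102, "ADD", 0x202),  # AC <- AC + (202)
    (0x103, "ADD", 0x203),  # AC <- AC + (203)
    (0x104, "STA", 0x204),  # (204) <- AC
]
image = assemble(program)
print(hex(image[0x101]))  # 0x1201
```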
Memory
Systems
125
Characteristics
We classify memory systems according to their characteristics.
The most important of these are:
Location: whether memory is internal or external to the
computer
Internal memory is often equated with main memory.
But there are other forms of internal memory. The processor
requires its own local memory, in the form of registers. Further, as
we shall see, the control unit portion of the processor may also
require its own internal memory.
Cache is another form of internal memory.
External memory: peripheral storage devices, such as disk and
tape, that are accessible to the processor via I/O controllers.
126
Characteristics …
Capacity: For internal memory, this is typically expressed in
terms of bytes (1 byte = 8 bits) or words.
Common word lengths are 8, 16, and 32 bits.
External memory capacity is typically expressed in terms of bytes.
Unit of transfer: For internal memory, this is equal to the
number of electrical lines into and out of the memory module.
This may be equal to the word length, but is often larger, such as
64, 128, or 256 bits.
To clarify this point, consider three related concepts for internal
memory:
Word: The “natural” unit of organization of memory.
The size of a word is typically equal
to the number of bits
used to represent an integer and to the instruction
length.
127
Characteristics …
Addressable units: In some systems, the addressable unit is the
word.
However, many systems allow addressing at the byte level.
In any case, the relationship between the length in bits A of an address
and the number N of addressable units is 2^A = N.
Unit of transfer: For main memory, this is the number of bits read
out of or written into memory at a time.
The unit of transfer need not equal a word or an addressable unit.
For external memory, data are often transferred in much larger units
than a word, called blocks.
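The relationship 2^A = N between address length and the number of addressable units can be checked with a short calculation; the specific sizes below are illustrative.

```python
# Sketch: address length A (bits) needed for N addressable units, 2^A = N.
import math

def address_bits(n_units):
    """Smallest address length A with 2**A >= n_units."""
    return math.ceil(math.log2(n_units))

print(address_bits(65536))       # 16 bits for 64K addressable units
print(address_bits(16 * 2**20))  # 24 bits for a 16 MByte byte-addressable memory
```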
Method of accessing:
Sequential Access: Memory is organized into units of data,
called records.
Access must be made in a specific linear sequence.
128
Characteristics …
A shared read–write mechanism is used, and this must be moved
from its current location to the desired location, passing and
rejecting each intermediate record.
Thus, the time to access an arbitrary record is highly variable.
E.g. Tape
Direct access: As with sequential access, direct access involves
a shared read–write mechanism.
However, individual blocks or records have a unique address based
on physical location.
Access is accomplished by direct access to reach a general vicinity
plus sequential searching, counting, or waiting to reach the final
location.
Access time is variable.
E.g. Disks
129
Characteristics …
Random access: Each addressable location in memory has a
unique, physically wired-in addressing mechanism.
Access time is independent of the sequence of prior accesses and is
constant.
Thus, any location can be selected at random and directly addressed and
accessed.
E.g. Main memory and some cache systems are random access.
Associative: This is a random access type of memory that enables
one to make a comparison of desired bit locations within a word for a
specified match, and to do this for all words simultaneously.
Thus, a word is retrieved based on a portion of its contents rather than
its address.
As with ordinary random-access memory, each location has its own
addressing mechanism, and retrieval time is constant independent of
location or prior access patterns.
E.g. Cache memories.
130
Characteristics …
Performance: Three performance parameters are used:
Access time (latency): For random-access memory, this is the time it
takes to perform a read or write operation;
That is, the time from the instant that an address is presented to the memory to
the instant that data have been stored or made available for use.
For non-random-access memory, access time is the time it takes to position the
read–write mechanism at the desired location.
Memory cycle time: This concept is primarily applied to random-access
memory
Consists of: access time + any additional time required before a second access
can commence.
This additional time may be required for transients to die out on signal lines or
to regenerate data if they are read destructively.
Note that memory cycle time is concerned with the system bus, not the
processor.
131
Characteristics …
Transfer rate: This is the rate at which data can be transferred
into or out of a memory unit.
For random-access memory, it is equal to 1/(cycle time).
For non-random-access memory, the following relationship holds:
Tn = TA + n/R
Where :
Tn = Average time to read or write n bits
TA = Average access time
n = Number of bits
R = Transfer rate, in bits per second (bps)
132
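The relationship Tn = TA + n/R can be illustrated with assumed numbers; the access time, block size, and transfer rate below are made up for the example.

```python
# Sketch: non-random-access timing, T_n = T_A + n / R.

def transfer_time(t_access, n_bits, rate_bps):
    """Average time to read or write n bits: T_n = T_A + n / R."""
    return t_access + n_bits / rate_bps

# e.g. 10 ms average access time, 1 Mbit/s transfer rate, 8000-bit block
t = transfer_time(0.010, 8000, 1_000_000)
print(round(t, 3))  # 0.018 seconds: positioning time dominates unless n is large
```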
Characteristics …
133
Types of Memory
Physical types:
The most common are:
Semiconductor memory
Magnetic surface memory, used for disk and tape, and
Optical and magneto-optical.
Physical Characteristics:
Volatile vs non-volatile
Magnetic-surface memories are nonvolatile.
Semiconductor memory (memory on integrated circuits) may be either
volatile or nonvolatile.
Non-erasable (ROM)
Cannot be altered (except by destroying the storage unit).
Semiconductor memory of this type is known as read-only memory (ROM).
134
The Memory Hierarchy
The design constraints on a computer’s memory:
How much?
How fast?
How expensive?
Trade-off among these key characteristics.
The following relationships hold for a variety of technologies used
to implement memory systems:
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
Dilemma:
Use large-capacity memory: capacity is needed and cost per bit is low.
But, for performance requirements, we need to use expensive, lower-
capacity memory.
135
Memory Hierarchy …
Way out of this dilemma:
Instead of relying on a single memory component or technology,
employ a memory hierarchy. A typical one shown below.
As one goes down the hierarchy:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor
Smaller, more expensive, faster memories are supplemented by
larger, cheaper, slower memories.
The key to the success of this organization is item (d):
Decreasing frequency of access.
The use of two levels of memory reduces average access time.
136
Memory Hierarchy …
137
Memory Hierarchy …
138
Memory Hierarchy …
139
Memory Hierarchy …
The basis for the validity of condition (d) is a principle known
as locality of reference.
The effectiveness of a memory hierarchy depends on the
principle of moving information into the fast memory
infrequently and accessing it many times before replacing it
with new information.
During execution of programs, memory references by the
processor, for both instructions and data, tend to cluster.
That is, within a given period of time, programs tend to
reference a relatively confined area of memory repeatedly.
140
Memory Hierarchy …
Programs contain a number of iterative loops and subroutines.
Repeated references to a small set of instructions once a loop or
subroutine is entered.
Similarly, operations on tables and arrays involve access to a
clustered set of data words.
Over a long period of time, the clusters in use change;
But over a short period of time, the processor is primarily
working with fixed clusters of memory references.
There exist two forms of locality: spatial and temporal
locality.
Accordingly, it is possible to organize data across the hierarchy
such that:
The percentage of accesses to each successively lower level is
substantially less than that of the level above.
141
Memory Hierarchy …
Spatial locality refers to the phenomenon that when a given
address has been referenced, it is most likely that addresses near
it will be referenced within a short period of time.
Example: consecutive instructions in a straight-line program.
Temporal locality refers to the phenomenon that once a
particular memory item has been referenced, it is most likely
that it will be referenced again in the near future.
Example: an instruction in a program loop. (refer to lmc loop)
When the processor makes a request for an item:
First, the item is sought in the first memory level of the memory
hierarchy.
The probability of finding the requested item in the first level is
called the hit ratio, h1.
The probability of not finding (missing) the requested item in the
first level of the memory hierarchy is called the miss ratio, (1-h1).
142
Memory Hierarchy …
When the requested item causes a “miss,” it is sought in the next subsequent
memory level.
The probability of finding the requested item in the second memory level,
the hit ratio of the second level, is h2.
The miss ratio of the second memory level is (1-h2).
The process is repeated until the item is found.
Upon finding the requested item, it is brought and sent to the processor.
In a memory hierarchy that consists of three levels, the average memory
access time can be expressed as follows:
tA = h1 × t1 + (1-h1)[t1 + h2 × t2 + (1 - h2)(t2 + t3)]
= t1 + (1 - h1)[t2 + (1 - h2)t3]
In general, for a hierarchy of n levels:
tA = t1 + (1 - h1)[t2 + (1 - h2)[t3 + … + (1 - hn-1)tn]]
The average access time of a memory level is defined as the time
required to access one word in that level.
In this equation, t1, t2, t3 represent, respectively, the access times of the
three levels.
143
Memory Hierarchy …
Example:
Two levels of memory: L1 and L2;
L1: 1000 words, t1= 0.01µs; L2: 100,000 words, t2 = 0.1µs
Assume: if word is in L1, access directly; if it is in L2, word first transferred to L1,
and then accessed by the processor.
Hit ratio, h1: the fraction of all memory accesses in the faster memory (e.g. cache).
t1=access time for L1, t2 = access time for L2.
Hence, average access time to access a word:
tA = h1 x t1 + (1-h1) x (t1 + t2)
In our example: Let's say the processor finds the words it needs 95% of the time
in level 1. Hence,
tA = 0.95(0.01µs) + 0.05(0.01µs + 0.1µs)
= 0.0095 + 0.0055 = 0.015µs
The average access time is much closer to 0.01µs, as desired.
In general, for high percentages of level 1 access, the average access time is much
closer to that of level 1 than that of level 2.
That is,
lim (H→1) [H × T1 + (1-H) × (T1 + T2)] = T1
144
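The two-level average access time formula and the example numbers above can be checked as follows:

```python
# Sketch: two-level average access time, t_A = h1*t1 + (1-h1)*(t1 + t2).

def avg_access_time(h1, t1, t2):
    """h1 = level-1 hit ratio; t1, t2 = access times of the two levels."""
    return h1 * t1 + (1 - h1) * (t1 + t2)

# Numbers from the notes: t1 = 0.01 us, t2 = 0.1 us, 95% level-1 hits.
t = avg_access_time(0.95, 0.01, 0.1)
print(round(t, 4))  # 0.015 (microseconds)

# As the hit ratio approaches 1, t_A approaches t1.
print(round(avg_access_time(0.999, 0.01, 0.1), 4))  # 0.0101
```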
Performance of Accesses Involving only Level 1 (hit ratio)
145
Cache Memory
• It is a small, very fast memory that retains copies of
recently used information from main memory.
• It operates transparently to the programmer,
automatically deciding which values to keep and which
to replace (overwrite).
146
Cache Memory …
A small high-speed memory that is near the CPU
Sits between normal main memory and CPU
May be located on CPU chip or module
The idea behind using a cache as the first level of the memory
hierarchy is to keep the information expected to be used more
frequently by the CPU in the cache.
At any given time some active portion of the main memory is
duplicated in the cache.
Therefore, when the processor makes a request for a memory
reference, the request is first sought in the cache.
If the request corresponds to an element that is currently in
the cache, we call that a cache hit.
147
Cache Memory …
On the other hand, if the request corresponds to an element that is
not currently in the cache, we call that a cache miss.
A cache hit ratio, hc: the probability of finding the requested
element in the cache.
A cache miss ratio (1 - hc): the probability of not finding the
requested element in the cache.
During cache miss, a block of main memory, consisting of some fixed
number of words, is read into the cache
Then the word is delivered to the processor.
Because of the phenomenon of locality of reference, when a block of
data is fetched into the cache to satisfy a single memory reference, it
is likely that there will be future references to that same memory
location or to other words in the block.
148
Cache and Main Memory
Instruction Cache/Data Cache
• It is common to split cache memory into cache
dedicated to data and cache dedicated to instructions
150
Cache/Main Memory Structure
M = 2^n/K blocks
Cache …
For mapping purposes, main memory is considered to consist of a
number of fixed-length blocks of K words each.
M = 2^n/K blocks in main memory (for an n-bit address).
The cache consists of m blocks, called lines: m << M
The term line (not block) is used for cache:
To differentiate it from memory blocks
It contains additional fields as tag and control bits
Control bits, such as a bit to indicate whether a line has been modified since
being loaded into the cache.
Each line contains K words
The length of a line (not including tag and control bits) is called
line size.
Line size could be as small as 32 bits:
with each word being one byte,
the line size is 4 bytes.
152
Cache …
More blocks in memory than cache lines: m << M
A line cannot be dedicated to a particular block.
Each line has a tag that identifies which particular block is
currently being stored.
A tag is usually a portion of the main memory address
153
Cache Read Operation
CPU generates read address (RA) of a word to be read.
If word is in cache, it is delivered to the processor (fast).
If not present, read required block from main memory to
cache and then deliver it to CPU
These two operations occur in parallel. (A typical organization)
In an alternative organization,
The cache is physically interposed between the processor and
the main memory for all data, address, and control lines.
In this case, for a cache miss, the desired word is first read into
the cache and then transferred from cache to processor.
Cache Read Operation: Another Organization
Typical Cache Organizations
Cache connects to the processor via data, control and address
lines
The data and address lines also attach to data and address
buffers, which attach to the system bus
Through which main memory is reached
When cache hit occurs:
Data and address buffers are disabled
Communication is only between processor and cache
When cache miss occurs:
Desired address is loaded onto the system bus
Data are copied to both the cache and processor
156
Typical Cache Organization
Cache Design
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Cache Addressing
Where does cache sit?
Between processor and virtual memory management unit
Between MMU and main memory
Logical cache (virtual cache) stores data using virtual addresses
Processor accesses cache directly, without going through the MMU
Cache access faster, before MMU address translation
Virtual addresses use same address space for different applications
Must flush cache on each context switch
Physical cache stores data using main memory physical addresses
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Comparison of Cache Sizes
Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes
i.e. cache is 16K (2^14) lines of 4 bytes
16 MBytes main memory
24 bit address (2^24 = 16M)
Mapping …
Algorithm is needed for mapping main memory blocks into
cache lines.
And a means is needed for determining which main memory
block currently occupies a cache line.
The choice of the mapping function dictates how the cache is
organized.
Three techniques:
Direct
Associative, and
Set associative
163
Direct Mapping
The simplest technique of mapping.
Maps each block of main memory into only one possible
cache line
i.e. if a block is in cache, it must be in one specific place
The mapping is expressed as:
i = j modulo m
where
o i = cache line number
o j = main memory block number
o m = number of lines in the cache
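The mapping i = j modulo m can be sketched directly, using the 16K-line cache from the earlier example:

```python
# Sketch: direct mapping, cache line i = block j mod m.

def cache_line(block, num_lines):
    """Cache line that main memory block j maps to."""
    return block % num_lines

m = 16 * 1024  # 16K lines, as in the 64 KByte cache example
print(cache_line(0, m))      # 0
print(cache_line(m, m))      # 0: block B_m also maps to line L0
print(cache_line(m + 1, m))  # 1: block B_m+1 maps to line L1
```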
Direct Mapping …
Figure below shows the mapping for the first m blocks of main
memory.
Each block of main memory maps into one unique line of the
cache.
The next m blocks of main memory map into the cache in the
same fashion;
That is, block Bm of main memory maps into line L0 of cache,
block Bm+1 maps into line L1, and so on.
165
Direct Mapping from Cache to Main Memory
Direct Mapping …
167
Direct Mapping …
To summarize:
Address length = (s + w) bits
Number of addressable units = 2s+w words or bytes
Block size = line size = 2w words or bytes
Number of blocks in main memory = 2s+w/2w = 2s
Number of lines in cache = m = 2r
Size of cache = 2r+w words or bytes
Size of tag = (s - r) bits
168
Direct Mapping Address Structure
Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks that map into the same line have the same Tag field.
Check contents of cache by finding line and checking Tag
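Extracting the tag, line, and word fields from the 8/14/2-bit split can be sketched as follows; the sample address is made up for illustration.

```python
# Sketch: splitting a 24-bit address into tag (8), line (14), word (2) fields.

def split_address(addr, tag_bits=8, line_bits=14, word_bits=2):
    word = addr & ((1 << word_bits) - 1)               # low 2 bits
    line = (addr >> word_bits) & ((1 << line_bits) - 1)  # next 14 bits
    tag = addr >> (word_bits + line_bits)              # top 8 bits
    return tag, line, word

tag, line, word = split_address(0xABCDEF)
print(tag, line, word)  # 171 13179 3
```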
Direct Mapping Cache Line Table
Cache line    Main memory blocks assigned
0             0, m, 2m, 3m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
…
m - 1         m-1, 2m-1, 3m-1, …, 2^s - 1
Associative Mapping Address Structure: Tag: 22 bits | Word: 2 bits
Set Associative Mapping Address Structure: Tag: 9 bits | Set: 13 bits | Word: 2 bits
[Figure: hit ratio (0.0 to 0.6) vs. cache size (1k to 1M bytes) for direct and 2-way, 4-way, 8-way, and 16-way set-associative mapping]
Replacement Algorithms (1)
Direct mapping
No choice
Each block only maps to one line
Replace that line
Replacement Algorithms (2)
Associative & Set Associative
To achieve high speed, such an algorithm must be
implemented in hardware.
A number of algorithms have been tried.
Least Recently used (LRU)
e.g. in 2 way set associative
Which of the 2 blocks is LRU?
First in first out (FIFO)
replace block that has been in cache longest
Least frequently used
replace block which has had fewest hits
Random
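LRU replacement for one set of a 2-way set-associative cache can be modeled in software; real caches implement this in hardware, so this sketch (tags and access sequence are made up) is only illustrative.

```python
# Sketch (assumption): LRU replacement for one cache set.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag):
        """Return True on hit; on miss, fill the line, evicting the LRU tag."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[tag] = None
        return False

s = LRUSet(ways=2)  # 2-way set associative
hits = [s.access(tag) for tag in ["A", "B", "A", "C", "B"]]
print(hits)  # [False, False, True, False, False]: C evicted B, the LRU block
```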
Write Policy
Two cases to consider.
If the old block in the cache has not been altered, then it may be
overwritten with a new block without first writing out the old block.
If at least one write operation has been performed on a word in that line of
the cache, then main memory must be updated by writing the line of
cache out to the block of memory before bringing in the new block.
Must not overwrite a cache block unless main memory is up to date
Multiple CPUs may have individual caches
If a word is altered in one cache, it could invalidate a word in other
caches.
I/O may address main memory directly
If a word has been altered only in the cache, then the corresponding
memory word is invalid.
Further, if the I/O device has altered main memory, then the cache word
is invalid.
Write through
The simplest technique.
All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic to keep local
(to CPU) cache up to date
Lots of traffic
Slows down writes
Write back
Updates initially made in cache only
Update bit (dirty bit) for cache slot is set when update occurs
If block is to be replaced, write to main memory only if update
bit is set
Other caches get out of sync
I/O must access main memory through cache
This makes for complex circuitry and a potential bottleneck.
(why?)
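The dirty-bit mechanism of write back can be sketched as follows; this is a software model with made-up tags and values, not a hardware description.

```python
# Sketch (assumption): write back. A line is written to main memory on
# replacement only if its dirty (update) bit is set.

class Line:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

writebacks = 0

def replace(line, memory):
    """Evict a line; write it back only when its dirty bit is set."""
    global writebacks
    if line.dirty:
        memory[line.tag] = line.data
        writebacks += 1

memory = {}
clean = Line("A", 1)
dirty = Line("B", 2)
dirty.data, dirty.dirty = 99, True  # a write updates cache only, sets dirty bit

replace(clean, memory)  # clean line: no memory traffic
replace(dirty, memory)  # dirty line: one write to main memory
print(writebacks)       # 1
print(memory)           # {'B': 99}
```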
Multilevel Caches
High logic density enables caches on chip
Faster than bus access
Frees bus for other transfers
Common to use both on and off chip cache
L1 on chip, L2 off chip in static RAM
L2 access much faster than DRAM or ROM
L2 often uses separate data path
L2 may now be on chip
Resulting in L3 cache
Bus access or now on chip…
Unified v Split Caches
One cache for data and instructions or two, one for data and
one for instructions
Advantages of unified cache
Higher hit rate
Balances load of instruction and data fetch
Only one cache to design & implement
Advantages of split cache
Eliminates cache contention between instruction fetch/decode
unit and execution unit
Important in pipelining