2co N3 PDF
2co N3 PDF
2co N3 PDF
In this lecture we develop models for the control unit, ALU and memory. It is the first
that will require most care.
Output Enable(s): We know that every register has an Output Enable (OE) input which
determines whether the tri-states on its output lines have a high or low resistance.
These are driven by levels such as OEac, OEpc and so on. We assume that OEac=1,
and similar, sets the tri-state for the AC in low resistance mode.
Next, as a more careful study of the CPU diagram Fig. 3.1 shows, some extra tri-state
buffers are required to satisfy every register input that has more than one potential
1
3/2 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
MAR
OE1 OEpc OEsp OE2 Memory
OEmar
PC SP CLKmem
INCpc/LOADpc WRITE/READ
OEmem
OEad MBR
IR OEmbr
IR(opcode) IR(address)
OE3 OE4 OE5 OE6 OE7
OEop
AC
OEac
CU Status
ALU
SETalu
SETshft
Control Lines
ALU: We develop internal hardware for the ALU in §3.2. For now, we treat it as
a black box with 8 or fewer functions, which thus require 3 level bits SETalu[2:0] to
define. These are specified in the following table.
SETalu Operation Comment
000 ALUnoop Do nothing. Let the AC input appear on output
001 ALUnot Invert each bit in the AC input
010 ALUor Output = AC .OR. MBR
011 ALUand Output = AC .AND. MBR
100 ALUadd Output = AC + MBR
.. ..
. .
PC: The Program Counter has a one-bit level input which tells it whether to load the
input or to increment when the clock pulse is received.
SP: The Stack Pointer has a two-bit level input which tells it whether to load the
input, increment, or decrement, when the clock pulse is received.
3.1. THE CONTROL UNIT 3/3
Now consider the execution phase of a few of the instructions in terms of levels and
pulses
LDA x (levels and pulses)
10. OEad=1; OE1=1; CLKmar;
11. OEmar=1; WRITE=0; OEmem=1; CLKmbr;
12. OEmbr=1; OE4=1; CLKac;
We can already start sketching out a one-hot controller design — that in Fig. 3.2. The
first three D-types handle the fetch, then at D-type #4 comes the decoding. Then if
STA were high, say, the “hot 1” would pass to D-type #13 then #14 which execute
STA, then back to the fetch, and so on.
3/4 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
IR(opcode)
CSL1 CSP1
Decoder
8−to−256
CLK
LDA
D Q D Q D Q D Q D Q D Q D Q
1 2 3 4 10 11 12
STA
CLK D Q D Q
13 14
ADD
D Q D Q D Q
15 16 17
AND
D Q
18
Figure 3.2: The control unit for the first four opcodes.
HALT Decoder
8−to−256
LDA
HALT
STA LDA
STA
ADD
ADD AND
and so on
Figure 3.3: Decoding the opcode. (a) With gates for using bits, and (b) as a black-box.
3.1. THE CONTROL UNIT 3/5
SETalu[2]
SETalu[1]
SETalu[0]
CLKmem
LOADpc
LOADsp
CLKmbr
CLKmar
OEmem
OEmbr
OEmar
CLKpc
CLKac
CLKsp
INCsp
CLKir
OEop
OEad
OEpc
OEac
What
OEsp
OE1
OE2
OE3
OE4
OE5
OE6
OE7
Line
Ftch 1. 1 X X X 1
2. 1 X X X 1 1
3. 1 X X X 1 1
Dcd 4. 1 X X X
LDA 10. 1 1 X X X 1
11. 1 X X X 1 1
12. 1 1 X X X 1
STA 13. 1 1 1 1 0 0 0 1 1
14. 1 1 1 1
ADD 15.
16.
17.
AND 18.
19.
20.
JMP 21.
BZ 22.
23.
NOT 24.
SHR 25.
HLT 99.
Now look down the columns for each signal. As a work in progress, after filling in the
Fetch, Decode, and the execute phases of LDA and STA, we have already
OEpc = CSL1
OEad = CSL10 .OR. CSL13
OEmar = CSL2 .OR. CSL11 .OR. CSL14
CLKmar = CSP1 .OR. CSP10 .OR. CSP13
3/6 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
ALU Out
Full Adder
Carry In
An
Bn
Decoder
setALU 0
0 1
1 2
2 3
4
Carry Out
carry is slow, and that speed-up can be achieved by inserting carry-look-ahead circuitry
between a number of bits.
The ALU contains a none/left/right shifter at its output. This is operated separately
from the other functions using a 2-bit setSHFT input (so one can add two numbers
and rightshift them all in one pass). Some designs use a shift register, but here we
regard it a black box which transparently allows bit[n] of the input to appear as one of
bit[n], bit[n-1], or bit[n+1] of the output.
BA 15 BA 14 BA 2 BA 1 BA 0
setALU
3 0
Carry O 15 O 14 O2 O1 O0
setSHFT Shifter
2
C
N ZNV
Z generation
V 16
Output
Figure 3.5: Multi-bit ALU. The carry-in on the right must be 0. A shifter is placed on the back end. In
our design it is not a clocked register.
Z Zero flag: This is set to 1 whenever the output from the ALU is zero.
N Negative flag: This is set to 1 whenever the most significant bit of the output is 1.
Note that it is not correct to say that it is set when the output of the ALU is neg-
3/8 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
ative: the ALU doesn’t know or care whether you are working in 2’s complement.
However, this flag is used by the controller for just such interpretations.
C Carry flag: Set to 1 when there is a carry from the adder.
V oVerflow flag: Set to 1 when Amsb = 1, Bmsb = 1, but Omsb = 0; or when
Amsb = 0, Bmsb = 0, but Omsb = 1. Allows the controller to detect overflow
during 2’s complement addition. Note (i) that the ALU does not know why it is
setting the flag, and (ii) in our BSA, the msb=15.
3.4 Memory
You will have gathered that memory is no more than a very large collection of registers
held on an array on a chip, one register being accessible at a time via an addressing
mechanism.
To write to memory, one needs to set the address, place the data on the data bus,
and clock the recipient register. To read from memory, one needs to set the address,
output enable the relevant register onto the data bus, and clock the MBR.
Address Bus
MAR D Q D Q D Q
Address Decoder
D Q D Q D Q
ChipSelect
Write/Read
OE
MBR
Data Bus
Figure 3.6: Memory hardware Note that the input data lines (top) and output data lines (bottom) are
different. They are not of course — they are the same bus and have been drawn like this to avoid
crossing wires.
Figure 3.6 shows the design of a memory with contents 3 bits wide.
The address lines enter a decoder, which selects one register, ie one row of three D-
type latches with couple clock inputs. Notice that that each output from the decoder
is used in two ways: first for writing it is ANDed with the clock signal, and second for
reading it is ANDed with the register outputs.
The ChipSelect (CS) input is unexpected, as one might expect the memory to be
permanently selected! We shall see in a moment that it actually used as part of
address selection in a memory made from multiple chips. Assume it is CS=1 for now.
3/10 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
The OE input is expected, but instead of a CLKmem input we now use the more usual
name WRITE/READ. (By the way, this notation means WRITE is the “equivalent” of
READ.)
At first this looks like a level selector signal but, by process of elimination, this must
be the clocked input.
1
W/R 0
Bus no longer floats
1 Bus floating
Data Valid
Data
0
Read access time Data hold time
Figure 3.7: Timing of signals for (a) memory read.
3.5. MEMORY ORGANIZATION IN HARDWARE 3/11
Figure 3.8: Time of signals for the memory write. Notice that the signal which clocks the register on
the memory during write is the WRITE/READ signal.
A 23 A 0 go to
Address Bus both chips
MAR
High Low
CS
Byte Byte
OE
W/R
MBR
Figure 3.9: Using two Byte wide chips to make a 16-bit wide memory.
to generate an array 2 chips high and, as before, 2 wide, as shown in Fig. 3.10.
But each chip has only 23 address lines, A0 − A22 . What happens to A23 ? It is input
into a 1-to-2 line decoder, whose output is connected to the ChipSelect inputs. If
A23 = 0, the lower pair is selected, and if A23 = 1 the upper pair is selected. The OE
and W/R inputs are connected to all chips.
A 22 A 0 go to
Address Bus all chips
MAR
A23
1 CS CS
1 to 2−line OE OE
Decoder W/R W/R
0
CS CS
OE OE OE
W/R W/R W/R
MBR
Figure 3.10: Using 8MByte chips to create a memory with 16M locations and a width of 2 Bytes.
3.6. MEMORY ORGANIZATION IN SOFTWARE: MEMORY ADDRESSING MODES 3/13
Data
Figure 3.11: Full address decoding, but using only 2× 1K memories in an 8K address space.
(1) Immediate, (2) Direct, (3) Indirect, and (4) Indexed addressing.
Immediate addressing does not involve further memory addressing after the instruction
fetch. It provides a method of specifying a constant number rather than an address
in the operand. For example LDA# x loads the accumulator immediately with the
operand x.
CPU MAR
Address Bus
PC SP Memory
LDA# x INCpc/LOADpc
CLKmem
AC←IR (address) IR
IR(opcode) IR(address) MBR
Data Bus
AC
ADD# 22
AC←AC + 22 CU Status
ALU
SETalu
Control Lines
to Registers, ALU, Memory, etc Outside the CPU
Looking back at our Standard architecture, you will see that there is a direct link from
the IR (address) to the AC to allow this to happen.
Immediate addressing allows statements like “n=n+10” written in some high level lan-
guage to be turned into assembler, using modified versions of other instructions. For
example ADD# 22. How would you actually realize this on our machine?
We have already use direct addressing in the lectures. This is where the operand is the
address of the data you require. Another way of saying this is that the operand is a
pointer to the data.
LDA x ♣ Quick Example: What does location 23 and
MAR←IR (address) the AC contain after this code snippet?
MBR←hMAR i Code AC Loc22 Loc23
AC←MBR LDA #21
STA 22
ADD x
ADD #1
MAR←IR (address)
etc ADD 22
MBR←hMAR i
STA 23
AC←MBR + AC
3.6. MEMORY ORGANIZATION IN SOFTWARE: MEMORY ADDRESSING MODES 3/15
LDA x,X
Cpus often provide a number of registers for temporary storage which avoid the need
to hold and access pointers in main memory (with obvious savings in time). Methods
which use these registers are known as register addressing modes.
Indexed addressing is a straightforward example of register addressing, and the only
one we consider in these lectures. In the example above, x is an address, and X is an
index register holding an offset. The effective address given to LDA is x+X, the sum
of the two.
The register X will be a counter register, and can be loaded separately, incremented
and decremented.
Here is some half-baked code for Matlab-esque addition of two arrays of length 100,
and placing the result in a third array ...
3/16 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
47 38 38 38 38
6 52 52 52 52
5
4
3
2 47 47 47 47
1
0
Register X
LDA #2 LDA 2 LDA (2) LDA 2,X
4
AC 2 47 38 52
Immediate Direct Indirect Indexed
Figure 3.12: Addressing modes: the contents of the AC after LDA immediate, direct, indirect and
indexed
3.7. A SMALL PROGRAM 3/17
Initialize PC to 0 and start the controller running by starting the clock. The AC gets
loaded with 5 (from location 20), and then has 300 (from location 22) subtracted from
it. Because 5 isn’t equal to 300, the Z flag does not get set, and so we reload the AC
with 5, add 1 (from location 21) to it, and then restore it in location 20. The program
eventually halts when location 20 is incremented up to 300. (Halting simply stops the
clock.)
Notice that there is no requirement for assembler mnemonics or for that matter a high
level language. The raw form of the executable program and data is binary code. In
the good old days, when folk were short of things to do, binary code was loaded into
the memory by hand.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING) 3/19
• it reads the file of high level statements, checks the syntax, and replaces each
statement with the relevant sequence of assembler instructions.
• it places declared variables, and the names of labels, into a symbol table.
• each time the variable or label is encountered, it replaces the name in the code by
its location in the symbol table.
The instructions are written out into a temporary file. Since each instruction is of
known length, the compiler can associate with it provisional memory addresses, relative
to the Beginning of the Program (BOP). So, after the first pass, the compiler knows
the full extent of the program — or, more strictly, the extent of the instructions. After
the first pass, the compiler therefor knows where it can start placing allocated data.
We will call this memory location BOD (for Beginning of Data).
On the second pass the compiler
After reading the declaration of variables, the compiler could start making the symbol
table, where BOD indicates the unknown Beginning Of Data.
Name SymTab Actual Location Words
a var0 BOD 1
b var1 BOD+1 1
c var2 BOD+2 1
Let the beginning of the program be located at BOP, and let us count in words. The
compiler would start making its temporary file:
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING) 3/21
Address Instruction
BOP LDA #1
BOP+1 STA var0
BOP+2 LDA #2
BOP+3 STA var1
BOP+4 ADD var0
BOP+5 STA var2
BOP+6 BZ lab0
At this point, the compiler knows that the label will be just after the closing bracket,
but doesn’t know where that is. So it adds an entry to the symbol table
Name SymTab Actual Location Words
a var0 BOD 1
b var1 BOD+1 1
c var2 BOD+2 1
lab0 ? -
then carries on.
Address Instruction
BOP LDA #1
BOP+1 STA var0
BOP+2 LDA #2
BOP+3 STA var1
BOP+4 ADD var0
BOP+5 STA var2
BOP+6 BZ label0
BOP+7 LDA #3
BOP+8 STA var0
BOP+9 LDA #4
BOP+10 STA var1
BOP+11 HALT
By the LDA #4 instruction, the compiler knows that lab0 is actual BOP+9, so the
symbol table is updated
Name SymTab Actual Location Words
a var0 BOD 1
b var1 BOD+1 1
c var2 BOD+2 1
lab0 BOP+9 -
3/22 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
When it reached the end of the program, the compiler knows that the beginning of
data BOD = BOP+12. It then rewrites the symbol table as:
Name SymTab Actual Location Words
a var0 BOP+12 1
b var1 BOP+13 1
c var2 BOP+14 1
lab0 BOP+9 -
There are now two possibilities for the second pass. The compiler can either rewrite
var0, var1 and var2 and lab0 leaving BOP as a variable to be filled in at run time, or it
can set a value a value for BOP. Choosing the latter with BOP=100 (dec), we end up
with:
Address Instruction
100 LDA #1
101 STA 112
102 LDA #2
103 STA 113
104 ADD 112
105 STA 114
106 BZ 109
107 LDA #3
108 STA 112
109 LDA #4
110 STA 113
111 HALT
112 0
113 0
114 0
Now simply replace the opcodes by the relevant binary, and we have executable machine
code. Notice that the assembler has to use a different opcode for LDA, LDA#, and
so on.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING) 3/23
Free memory
Memory big2
allocated 1097
on heap
Free memory during big1
Loc
execution
97
Free memory 96 address of big2 96 1097
95 address of big1 95 97
94 x 94 x 94 x
93 array[10] 93 array[10] 93 array[10]
Declared
Declared data
data
84 array[1] 84 array[1] 84 array[1]
83 d2 83 d2 83 d2
82 d1 82 d1 82 d1
81 instruction 81 instruction 81 instruction
80 instruction 80 instruction 80 instruction
Program Program
Figure 3.13: (a) All data of known size when the program is written. (b) Sizes of big1 and big2 unknown
beforehand. Space is allocated on the heap. Only the address of the address is known beforehand, not
the address itself.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING) 3/25
1
often called POPing
3/26 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
3.8.4 Subroutines
Subroutines allow the programmer to modularize code into small chunks which do a
specific task and which may be reused. For example,
main () /* main program */
{
int a ,b , v1 ;
a = 1; b = 10;
v1 = mysub (a , b ); /* subroutine call */
c = 0;
..
}
How can we call a subroutine in assembler language. We need to (1) jump to the
subroutine’s instructions, (2) transfer the necessary data to the subroutine, (3) arrange
for the result to be transferred back to the calling routine, and (4) jump back to handle
the instruction after the calling point.
Let us consider the assembler for a subroutine where there are two parameters. One
is in location 47, the other in location 48.
... etc ...
LDA #1 \\ a =1
STA 47
LDA #10 \\ b =10
STA 48
LDA 48 \\ 2 nd parameter is datum b in loc 48
PUSH \\ push it onto stack
LDA 47 \\ 1 st parameter is datum a in loc 47
PUSH \\ push it onto stack
JSR MYSUB \\ Now Jump to SubRoutine at location MYSUB
LDA #0 \\ rest of prog : these lines do c =0
STA 49
... etc ...
\\ Subroutine code from here
MYSUB LDA SP ,2
ADD SP ,3
SHR
RTS \\ ReTurn from Subroutine
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING) 3/27
Notice that the subroutine starts at labelled address, so JSR is very like JMP. The
difference is that we have to get back after the subroutine. After the fetch, the
program counter is already incremented to point at the next instruction in the calling
program (ADD in this example). So in its execute phase, JSR pushes the current value
of the PC onto the stack, and then loads the operand into the PC, causing the next
instruction fetched to be the first in the subroutine. The RTS command ends the
subroutine by pulling the stored program counter from the stack. Because the stack is
a LIFO buffer, subroutines can be “nested” to any level until memory runs out.
Missing out the detail the JSR and RTS instructions perform the following:
JSR x RTS
hSP i ←PC SP←SP +1
PC←IR (address); SP←SP-1 PC←hSP i
When the subroutine has parameters, we also have to worry how to transfer the param-
eters to the subroutine. There are various ways, described in Clements §6.5 and Hill
and Peterson § 15.2. Here we mention just one method which again uses the stack.
Two methods are
In the latter method, the calling routine pushes the parameters onto the stack in order,
then uses JSR which of course pushes the return PC value onto the stack.
Figure 3.15 shows a very correct way of using the parameters from the stack by popping
and pushing. The return PC is pulled and stored temporarily, and then the parameters
are pulled, and the return PC pushed.
The problem with this is that it is very time consuming. A more efficient method is to
use the parameters by indexed addressing relative to the stack pointer, SP. That is,
the subroutine would access parameter 1, for example, by writing
LDA SP,2
When the subroutine RTS’s it will pull the return PC off the stack, but the parameters
will be left on. However, the calling routine knows how many parameters there were.
However, rather than pulling them off, it can simply increase the stack pointer by the
number of parameters (ie, by three in our example). This leaves the parameters in
memory, but as they are now outside the stack, they will get overwritten on the next
PUSH.
3/28 LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
Param2 SP
During subroutine Param1
Access Param1 as SP+2 SP
Access Param2 as SP+3
Leave return value in AC
RTS SP:=SP+2