Computer Basics
Computer Basics
qxd
22/12/2005
6:17 PM
Page 293
7
7 Structure of the CPU
Chapters 5 and 6 describe what a computer does; in this chapter we show how a computer converts an instruction op-code into the actions that implement the op-code. We build on some of the material we covered in the section on digital logic. In this chapter we demonstrate how a computer is organized internally and how it reads instructions from memory, decodes them, and executes them.
8 Accelerating performance
The previous chapter described the structure of a simple computer. Here we describe how the performance of computers can be increased by overlapping the execution of instructions (pipelining). We also look at cache memory and introduce parallel processing.
INTRODUCTION
In Chapters 2 and 3 we introduced combinational and sequential logic elements and demonstrated how to build functional circuits. In Chapters 5 and 6 we introduced the instruction set architecture and low-level programming. This chapter bridges the gap between digital circuits and the computer by demonstrating how we can construct a computer from simple circuits; that is, we show how a computer instruction is interpreted (i.e. executed). We begin by describing the structure of a simple generic CPU. Once we see how a computer operates in principle, we can look at how it may be implemented. We describe the operation of a very simple one-and-a-half address machine whose instructions have two operands; one in memory and one a register. Instructions are written in the form ADD A, B that adds A to B and puts the result in B. Either A or B must be a register. Some readers will read this introduction to the CPU before the previous two chapters on assembly language programming. Consequently, some topics will be re-introduced. Instead of introducing the computer all at once, we will keep things simple and build up a CPU step by step. This approach helps demonstrate how an instruction is executed because the development of the computer broadly follows the sequence of events taking place during the execution of an instruction. In the next chapter we will nd that this computer is highly simplied; real computers dont execute an instruction from start to nish. Todays computers overlap the execution of instructions. As soon as one instruction is fetched from memory, the next instruction is fetched before the previous instruction has completed its execution. This mechanism is called pipelining and we examine it more closely in the next chapter.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 294
by step, on a very primitive hypothetical computer. We begin with the address paths that are used to locate the next instruction to be executed.
We now demonstrate how instructions like MOVE. W X,DO are read from memory and how the sequence of actions that implement this operation are carried out inside the CPU. Because the CPU is such a complex device, we have decided to walk through the execution of an instruction, step
A HYPOTHETICAL COMPUTER Anyone describing the internal operation of a computer must select an architecture and an organization for their target machine. We have two choices: register to memory or register to register. The register-to-memory model ts the architecture of processors like the Pentium and 68K, whereas the register-to-register model corresponds to processors like the ARM, which we introduce later. When describing the internal structure of a computer, we could
describe either a system that executes an instruction to completion before beginning the next instruction, or a computer that overlaps or pipelines the execution of instructions. I have decided to begin this chapter with the description of a register-to-memory, non-pipelined processor. A non-pipelined organization is easier to describe than one that overlaps the execution of instructions.
Address
The first stage in the execution of any instruction is to fetch it from memory. The program counter contains the address of the next instruction to be executed
The contents of the PC are incremented each time it is used Instruction op-code
Memory
Data
The main memory or immediate access store contains both instructions and data MBR
IR
Op-code
Address
Memory buffer register When you read an instruction, it is moved to the instruction register where it is decoded
When you read from memory, the contents of the memory location accessed are loaded into the MBR
Control signals
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 295
295
CONSTANTS, VARIABLES, POINTERSA REMINDER A constant or literal is a value that does not change during the execution of a program. Its value is set at compile time or assemble time. Literal addressing is used to handle constants. A variable is a value that changes during the execution of a program. A pointer is a variable that contains the address of a variable; that is, a pointer points to a variable in memory. An address register is a pointer register. The following code implements the expression X 5 Y Z3 and illustrates the nature of constants, variables, and pointers.
data, and control. Data comprises the instructions, constants, and variables that are stored in memory and registers. Control paths comprise the signals that trigger events, provide clocks, and control the ow of data and addresses throughout the computer.
a write cycle, or from which data is being read in a read cycle. At this stage, the MAR contains a copy of the (previous) contents of the PC. When a memory read cycle is performed, the contents of the memory location specied by the MAR are read from the memory and transferred to the memory buffer register (MBR). We can represent this read operation in RTL terms as
At the end of this operation, the program counter points to the next instruction while the current instruction is being executed. The program counter in incremented by 4 rather than 1 because many of todays high-performance computers have 32-bit instructions. Computers are byte-addressed but have 32-bit architectures; that is, they can address individual bytes in memory, although instructions and data normally fall on 32-bit (i.e. 4-byte) boundaries.1 The memory address register (MAR), holds the address of the location in the memory into which data is being written in
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 296
THE INSTRUCTION REGISTER A modern RISC processor like the ARM, which we introduce in Chapter 9 has a 32-bit instruction register which is sufcient for an op-code and three register addresses; for example, ADD R1, R2, R3. Processors like the Pentium and 68K have very complex variable-length instruction formats and you cannot use a simple xed-length instruction register. The processor has to fetch the rst 16 bits of an instruction from memory, decode it, and execute successive fetches to assemble the rest of the instruction.
All instructions are executed in a two-phase operation called a fetchexecute cycle. During the fetch phase, the instruction is read from memory and decoded by the control unit. The fetch phase is followed by an execute phase in which the control unit generates all the signals necessary to execute the instruction. Table 7.1 describes the sequence of operations taking place in a fetch phase. In Table 7.1 FETCH is a label that serves to indicate a particular line in the sequence of operations. The notation IR(op-code) means the operationcode eld of the instruction register.
The arithmetic and logic unit (ALU) is the workhorse of the CPU because it carries out all the calculations. Arithmetic and logical operations are applied either to the contents of the data register or MBR alone, or to the contents of the data register and the contents of the MBR. The output of the ALU is fed back to the data register or to the MBR. Two types of operation are carried out by the ALUarithmetic and logical. The fundamental difference between arithmetic and logical operations is that logical operations dont generate a carry when bit ai of word A and bit bi of B are operated upon. Table 7.2 provides examples of typical arithmetic and logical operations. A logical shift treats an operand as a string of bits that are moved left or right. An arithmetic shift treats a number as a signed twos complement value and propagates the sign bit during a right shift. Most of these operations are implemented by computers like the 68K, Pentium, and ARM. Having developed our computer a little further, we can now execute an elementary program. Consider the high-level language operation P Q R. Here the plus symbol means arithmetic addition. The assembly language program required to carry out this operation is given below. Remember that P, Q, and R are symbolic names that refer to the locations of the variables in memory.
Load data register D0 with the contents of memory location Q3 ADD R,D0 Add the contents of memory location R to data register D0 STORE D0,P Store the contents of data register D0 in memory location P LOAD Q,D0
2 A machine with a single accumulator does not need to specify it explicitly; for example, an 8-bit microprocessor may use ADD P to indicate add the contents of P to the accumulator. We write ADD P,DO and make D0 explicit to be consistent with the notation we used for 68K instructions. 3 We have dened explicit LOAD and STORE operations to be consistent with the CPU we construct later in this chapter. The 68K uses a single MOVE operation to indicate LOAD and STORE
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 297
297
Incrementer Operand address to memory address register Instruction read from memory
Address
Memory
Data
IR
Op-code
Address
Memory buffer register MBR Data read from memory or to be written to memory
Control unit
Control signals Output from ALU goes to D0 or MBR Address paths A Data is transmitted to the ALU where it is operated on
f(A,B)
ALU B
Data paths
Operation Addition Subtraction Negation Multiplication Division Divide by 2 Multiply by 2 AND OR NOT EOR Shift left Shift right
Class Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Logical Logical Logical Logical Logical Logical
Typical mnemonic ADD SUB NEG MULU DIVU ASR ASL AND OR NOT EOR LSL LSR
The one-address machine requires a rather cumbersome sequence of operations just to carry out the simple action of adding two numbers. If we had a computer with a three-address format, we could have written
ADD Q,R,P Add the contents of Q to the con-
tents of R and put the result in P Three-address machines are potentially4 faster than one-address machines, because they can do in one instruction things that take other machines three operations. However, the power of threeaddress machines can be achieved only by means of a more complex and expensive CPU and memory system. We can demonstrate how the CPU operates by examining the execution of ADD R, D0 in terms of register transfer language. Table 7.3
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 298
Move the contents of the PC to the MAR Increment the contents of the PC Read the current instruction from the memory Move the contents of the MBR to the IR Move the op-code from the IR to the CU Move the operand address to the MAR Read the data from memory Perform the addition Move the output of ALU to the data register Operations sharing the same line are executed simultaneously. ALU [MBR] and ALU [D0] are executed simultaneously.
Table 7.3 Expressing the FETCH/EXECUTE cycle for an ADD instruction in RTL.
MICROPROGRAMMING The terms microprogram, microprogramming, microcode, microinstruction, and micro-operation have nothing to do with microprocessor or microcomputer. A micro-operation is the smallest event that can take place within a computer; typical micro-operations are the clocking of data into a register, a memory read operation, putting data on a bus, or adding two numbers together. A microprogram consist of a sequence of microinstructions that, when executed, implement a machine-level instruction. For example, the machine-level or macro-level operation ADD P,D0 can be implemented by executing a sequence of micro-operations. The instructions that comprise a microprogram are called microcode.
gives the sequence of operations carried out during the fetch and execute phases of an ADD R, DO instruction. These operations tell us what is actually going on inside the computer. During the fetch phase the op-code is fed to the control unit by CU [IR(op-code)] and used to generate all the internal signals required to perform the additionthis includes programming the ALU to do addition by adding together the data at its two input terminals to produce a sum at its output terminals. Operations of the form [PC] [MAR] or [D0] [D0] [MBR] are often referred to as microinstructions. Each assembly level instruction (e.g. LOAD, ADD) is executed as a series of microinstructions. Microinstructions and microprogramming are the province of the computer designer. In the 1970s some machines were user-microprogrammable; that is, you could dene your own instruction set. We take a further look at microinstructions later in this chapter.
sequential mode; that is, the computer can execute only a stream of instructions, one by one in strict order. We covered conditional behavior in the previous chapter and we require a means of implementing instructions such as BEQ Target (branch on zero ag set to Target). The computer in Fig. 7.2 lacks a mechanism for making choices or repeating a group of instructions. To do this, the CPU must be able to execute conditional branches or jumps such as BEQ XYZ. Weve already met the branch that forces the CPU to execute an instruction out of the normal sequence. The block diagram of Fig. 7.3 shows the new address and data paths required by the CPU to execute conditional branches. Three items have been added to our computer in Fig. 7.3:
G
a condition code register, CCR a path between the CCR and the control unit a path between the address eld of the instruction register and the program counter.
4 A three-address machine is faster than a machine with registerto-register instructions only if the access time of memory is comparable to the access time of register. Memory takes much longer to access than internal registers and, therefore, a three-address machine is not currently practical.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 299
299
Incrementer
Address The next instruction may be the next instruction in sequence or the instruction at the branch target address
Memory
The IR's operand field provides the branch target address IR Op-code Address
Data
Clock
Control signals
The control units uses the CCR output to decide whether to execute the next instruction or to force a branch
f(A,B)
ALU B
CCR Condition code register program flow control paths The word in the CCR indicates whether the last operation gave a zero or a negative result or whether a carry was generated
The condition code register or processor status register takes a snapshot of the state of the ALU after each instruction has been executed and records the state of the carry, negative, zero, and overow ag bits. A conditional branch instruction interrogates the CCRs current state. The control unit then either forces the CPU to execute the next instruction in series or to branch another instruction somewhere in the program. Lets look at the details of the conditional branch. The CPU updates the bits of its condition code register after it carries out an arithmetic or a logical operation to reect the nature of the result. The following is a reminder of the operations of the condition code bits.
C carry Set if a carry was generated in the last operation. The C-bit is, of course, the same as the carry bit in the carry ip-op. Z zero Set if the last operation generated a zero result. N negative Set if the last result generated a negative result. V overow Set if the last operation resulted in an arithmetic overow; that is, an operation on one or two twos complement values gave a result that was outside its allowable range (an arithmetic overow occurs during addition if the sign bit of the result is different from the sign bit of both operands).
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 300
Figure 7.4 shows that an additional data path is required between the operand eld of the IR and the data register and ALU to deal with literal operands. In fact, the architecture of Fig. 7.4 can execute any computer program. Any further modications to this structure improve the CPUs performance without adding any fundamentally new feature. Figure 7.5 completes the design of the computer. We have added a second general-purpose data register D1 and a pointer register A0. In principle, there is nothing stopping us adding any number of registers. As you can see, three buses, A, B, and C are used to transfer data between the registers and ALU. The structure of Fig. 7.5 can implement instructions with more complex addressing modes than the simple direct (absolute) addressing we have used so far; for example MOVE (A0),D1 can be implemented by the following sequence of micro-operations.
Move source operand address to MAR Read the actual operand from memory Copy data to D1
Typical machine-level conditional operations expressed in RTL are 1. Branch on carry clear (jump to the target address if the carry bit in the CCR is 0)
BCC target: IF [C]= 0 THEN [PC][IR(address)]
2. Branch on equal (jump to the target address if the Z-bit in the CCR is 1)
BEQ target: IF [Z]= 1 THEN [PC][IR(address)]
An example of a conditional branch is as follows. Subtract X from contents of D0 If the result was zero then branch to Last, otherwise continue Target address of branch (if taken)
This sequence has been simplied because, you will see from Fig. 7.5, that there is no direct path between register A0 and the MBR. You would have to put the contents of A0 onto bus A, pass the contents of bus A through the ALU to bus C, and then copy bus C to the MAR. We will return to this theme when we look at the detailed design of computers. We have now demonstrated the ow of information that takes place during the execution of a single address computer instruction. In the next section we reinforce some of the things we have covered by showing how you can simulate a computer architecture in C.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 301
301
Incrementer
Address
Memory
Data
IR
Op-code
Operand
Clock
Control signals The operand field of the IR is fed to the data register or ALU to provide a literal operand The literal from the instruction register can by loaded into the data register Literal data paths CCR
A f(A,B)
ALU B The literal from the instruction register can be supplied to the ALU
have avoided all but Cs most basic elements. All data types are 8 bits and the only C constructs we use are the while, the if . . . then . . . else, and the switch constructs, which select one of several courses of action. We are going to construct two simulatorsthe rst is a very primitive CPU with an 8-bit instruction that simply demonstrates the fetch/execute cycle, and the second is not too dissimilar to typica rst-generation 8-bit microprocessors.
branch instructions. Only the store instruction performs a write to memory. Choosing an instruction set requires many compromises; for example, if the number of bits in an instruction is xed, increasing the number of different instructions reduces the number of bits left for other functions such as addressing modes or register selection. We can dene an instruction set for our primitive 8-bit machine as
Instruction
Mnemonic
LDA STA ADD BRA BEQ N N N N N
RTL definition
[D0] [N] [N] [D0] [D0] [D0] + [N] [PC] N IF [D0] = 0 THEN [PC] N
Load D0 from memory Store D0 in memory Our rst computer has a single data register (i.e. Add memory to D0 accumulator) called D0 and all instructions are Branch to location N memory to register apart from the store and the If [D0] = 0 then branch to N
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 302
Incrementer
Address
Memory Path between operand field of IR and bus A to literals MBR Op-code Address Memory buffer register
Data
Data register D1
Address register A0 Control signals A Bus A The output from the ALU is fed via bus C to the registers f(A,B) Bus C B Bus B CCR Condition code register ALU
We have provided only ve instructions because these are illustrative of all instructions. This computer has an 8-bit instruction format that includes both the op-code and the operand. If we chose a 3-bit op-code (eight instructions) and a 4-bit operand (a 16-bit memory), the remaining bit can be used to specify the addressing mode (absolute or literal). Real 8-bit microprocessors solve the problem of instruction set design by using 1 byte to provide an operation code and then 0, 1, or 2 succeeding bytes to provide an operand.
Figure 7.6 denes the structure of an 8-bit instruction for our simulated machine. The rst step in constructing a simulator is to describe the action of the computer in pseudocode.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 303
303
Op-code 000 = LDA 001 = STA 010 = ADD 011 = BRA 100 = BEQ
C allows you to operate on individual bits of a byte; for example, the operator n performs a right shift by n bits. The opcode is obtained from the three most-signicant bits of the IR by shifting right ve times. A bitwise logical AND can be performed between a variable and a hexadecimal value; for example IR & 0x0F ANDs the IR with 000011112 to extract the operand bits in the four least-signicant bit positions. Once weve extracted the addressing mode (bit 4 of the instruction register) with amode (IR & 0x10) 4, we can calculate the source operand for the load and add instruction by
We can now write a program to implement this algorithm. The following fragment of code is largely self-explanatory. The instruction in the 8-bit instruction register (IR) is decoded by the three operations
The following listing provides the C code for this CPU simulator.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 304
Most of the work done in the simulator takes place in the switch construct at the end of this program where each instruction is interpreted.
Instruction 7 6 5 4 3 2 1 0 7 6
Operand 5 4 3 2 1 0
Op-code
Addressing mode 00 = absolute 01 = literal (immediate) 10 = indexed 11 = PC relative Direction 0 = register to memory 1 = memory to register
provided program counter relative addressing (discussed in the next chapter) in which the operand is specied with respect to the current value of the program counter; for example, MOVE D0,12(PC) means store the conCopy contents of the PC to the MAR tents of data register D0 12 bytes on from the locaIncrement contents of the PC tion pointed at by the program counter. Read the instruction from memory The instruction itself is divided into four elds, Move the instruction to the IR as Fig. 7.7 demonstrates. A 4-bit op-code in bits 7, Save the op-code 6, 5, 4 provides up to 16 instructions. A 2-bit Copy contents of the PC to the MAR addressing mode in bits 1, 0 selects the way in Increment contents of the PC which the current operand is treated. When the Read the operand from memory addressing mode is 00, the operand provides the Move the operand to the IR address of the data to be used by the current Save the operand instruction. When the addressing mode is 01 the operand provides the actual (i.e. literal) operand. Modes 10 This multibyte instruction format is used by 8-bit and and 11 provide indexed and program counter relative 16-bit microprocessors. Indeed, the 68K has one 10-byte addressing respectively (i.e. the operand is added to the A0 register or the PC, respectively). instruction. Bit 2 of the instruction is a direction bit that determines The architecture of this computer is memory to register or register to memory; for example, it supports both ADD D0,M whether the source operand is in memory or is provided by and ADD M, D0 instructions. In addition to the direct and lit- the data register; for example, the difference between eral addressing modes, we have provided address register MOVE D0,123 and MOVE 123,D0 is determined by the value indirect addressing with a single A0 register. We have also of the direction bit.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 305
305
We can express the basic fetch cycle and decode instruction phase in C as
Each instruction is executed by means of the switch construct. Note that the CCR has only a zero ag (it would have been more complex to have provided a C- and V-bit). The following provides the complete code for the processor.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 306
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 307
307
Now that we have examined the sequence of events that take place during the execution of an instruction, the next step is to demonstrate how the binary code of an instruction is translated into the actions that
implement the instruction. In the next two sections we describe two different types of control unit; the microprogrammed control unit and the so-called random logic control unit.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 308
to execute each machine code instruction and what logic elements do we need to implement them? In other words, designers resort to the Boolean techniques we described in Chapter 2. The word random in the expression random logic element is employed in a specic sense and implies that the arrangement of gates from which the control unit is constructed varies widely from computer to computer. The same microprogrammed control unit can readily be adapted to suit many different computers with relatively little modication, whereas the random logic control unit is dedicated to a specic CPU and cannot easily be modied.
R W
Data out EMSR Data in Each tri-state buffer puts data on the bus when enabled
CMAR
MAR GMBR
CMBR
MBR
EMBR GIR
CIR
IR
EIR GPC
Bus
CPC
PC
EPC GDO The inputs to the ALU are fro register D0 and the system bus The output of the ALU is a function of inputs P and Q and is determined by control signals F0 and F1. The result is latched into the ALU register by CALU
CDO
D0
EDO P
F0 and F1 select the ALU F0 function and CALU clocks F the ALU output into the 1 C ALU register ALU
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 309
309
Signal number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Signal R W CMAR CMBR CPC CIR CD0 CALU EMSR EMBR EIR EPC ED0 EALU F0 F1
Type Memory control Memory control Register clock Register clock Register clock Register clock Register clock Register clock Bus control Bus control Bus control Bus control Bus control Bus control ALU control ALU control
Operation Read from memory Write to memory Clock data into MAR Clock data into MBR Clock data into program counter Clock data into instruction register Clock data into data register Clock data into ALU register Gate data from the memory onto the bus Gate data from the MBR onto the bus Gate operand from IR onto the bus Gate program counter onto the bus Gate data register onto the bus Gate ALU funtion register onto the bus Select ALU function bit 0 Select ALU funtion bit 1
comes only from the system bus. Note that this structure allows the memory to transfer data directly to or from any register; that is, all data does not have to pass through the MBR. The memory receives the address of the location to be accessed directly from the MAR, whose output is permanently connected to the memorys address input. A dedicated connection between the MAR and memory is possible because the memory never receives an address input from a source other than the memory address register. A permanent connection removes the need for bus control circuits. Two data paths link the memory to the system bus. In a read cycle when memory control input R is asserted, data is transferred from the memory to the system bus via tri-state gate GMSR. During a memory write cycle when memory control input W is asserted, data is transferred from the system bus directly to the memory. The MBR, data register, program counter, and instruction register are each connected to the system bus in the same way. When one of these registers wishes to place data on the bus, its tri-state gate is enabled. Conversely, data is copied into a register from the bus by clocking the register. The instruction register (IR) receives data from the memory directly, without the data having to pass through the MBR. The ALU receives data from two sources, the system bus and data register D0, and places its own output on the system bus. This arrangement begs the question, If the ALU gets data from the system bus how can it put data on the same bus at the same time it is receiving data from this bus? Figure 7.8 shows that the ALU contains an internal ALU register. When
F1 0 0 1 1
F0 0 1 0 1
this register is clocked by CALU, the output from the ALU is captured and can be put on the system bus by enabling gate GALU. Table 7.4 denes the 16 control signals in Fig. 7.8. Instruction decoding takes an instruction and uses it to create a sequence of 16-bit signals that control the system in Fig. 7.8. The ALU is controlled by a two-bit code, F1, F0, which determines its functions as dened in Table 7.5. These operations are representative of real instructions, although a practical ALU would implement, typically, 16 different functions. In order to keep the design of a random logic control unit as simple as possible, we will construct a 3-bit operation code giving a total of eight instructions. This instruction set dened in Table 7.6 presents a very primitive instruction set indeed, but it does include the types of instruction found in real rst-generation processors. We have dened explicit LOAD and STORE instructions rather than a single MOVE instruction which does the work of both LOAD and STORE. Having constructed an instruction set, we dene each of the instructions in terms of RTL and determine the
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 310
Op-code
Mnemonic
Operation
Table 7.6 A primitive instruction set for the CPU of Fig. 7.8.
Instruction Op-code Operations (RTL) Fetch [MAR] [IR] [ALU] [PC] 000 [MAR] [D0] [PC] [[MAR]] [PC] [ALU]
Control actions EPC R EPC E ALU E IR R =1 = 1, EMSR = 1, = 1, F 1,F 0 = 1,0, =1 =1 = 1, E MSR = 1, C MAR C IR C ALU C PC C MAR C D0 C MAR W=1 C MAR C MBR C ALU C D0 C MAR C MBR C ALU C D0 C MAR C MBR C ALU W=1 C MAR C MBR C ALU W=1 C PC IF Z = 1 THEN CPC
LOAD
[IR] [[MAR]]
STORE
001
[MAR] [IR] [[MAR]] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [[MAR]] [MAR] [MBR] [ALU] [[MAR]] [PC] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU]
E IR = 1 E D0 = 1 E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,0, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 1,1, =1 =1
ADD
010
SUB
011
INC
100
DEC
101
BRA BEQ
110 111
[IR]
E IR = 1
Table 7.7 Interpreting the instruction set of Table 7.6 in RTL and microinstructions.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 311
311
Consider the load D0 from memory operation; this requires the following two steps:
Then we have to put the memory in read mode, put the data from the memory onto the bus by enabling the GMSR gate, and nally capture the data in D0 by clocking register D0.
CMAR EMSR = 1,
CD0
We have to send the operand address in the instruction register to the memory address register by enabling the GIR gate and then clocking the data into the memory address register.
Instruction Operations (RTL) Control actions R Fetch [MAR] [IR] [ALU] [PC] LOAD [MAR] [D0] STORE [MAR] [PC] [PC] [ALU] [IR] 0 0 0 0 W 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 CMAR 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 CMBR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPC 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
CIR 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CD0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
CALU 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
EMSR 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
EMBR 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
EIR 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
EPC 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ED0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EALU 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0
F1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
F0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
[[MAR]] 1
[[MAR]] 1 [IR] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[[MAR]] [D0] ADD [MAR] [MBR] [ALU] [D0] SUB [MAR] [MBR] [ALU] [D0] INC [MAR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR]
[[MAR]] 1
[[MAR]] 1
[[MAR]] 1
[[MAR]] 1
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 312
Fetch phase
Clock Put [IR] on bus EIR Enable IR to bus CMAR Clock data into MAR R Read memory EMSR Enable memory to bus CD0 Clock data into D0 Capture data in D0 Put operand on bus Read operand
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 313
313
000 LOAD
N,D0
111 BEQ
CLR 1 Clock 1 K J C Q 1 K Q 1 J
CLR Q C Q 1 K 1 J
CLR Q C Q
T0
T7
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 314
MBR
IR
PC
E E
D0
ALU
MAR
MBR
W E
CIR
MSR
Figure 7.14 The OR gate array used to generate the actual microinstructions.
Each microinstruction is activated by one or more control signals from the nine columns of AND gates. Figure 7.14 shows the array of OR gates that combine the outputs from the AND gates to generate the control signals. The inputs from these OR gates come from the nine columns of AND gates.
always in one of two states (fetch or execute), an RS ip-op provides a convenient way of switching from one state to another. When Q 0 the current operation is a fetch phase, and when Q 1 an execute phase is being performed. Figure 7.15 is an extension of Figure 7.13 and demonstrates how the instruction decoder is enabled by the Q output of _ the fetchexecute ip-op, and the fetch decoder by the Q output. At the end of each fetch phase, a clock pulse from the timing generator sets the fetchexecute ip-op, permitting the current op-code to be decoded and executed. The timingpulse generator is reset at the end of each fetch. At the end of each execute phase, the fetchexecute ip-op is cleared and the sequencer reset, enabling the next fetch phase to begin. Table 7.9 shows how the machine-level instructions can be represented in terms of both timing signals and microinstructions. Note that weve included the micro-operation [MAR] [IR] in the fetch phase. The microinstructions are the enable signals to the bus drivers, the register clocks, the ALU function select bits, the memory controls (R and W), and the reset and set inputs of the fetchexecute ip-op. For each of the microinstructions we can write down a Boolean expression in terms of the machine-level instruction and the sequence of timing pulses. For example, consider expressions for EMBR, EIR, and CMAR. EMBR ADD T1 SUB T1 INC T1 DEC T1 EIR Fetch T4 BRA T0 BEQ T0 CMAR Fetch T0 Fetch T4
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 315
315
Q Fetchexecute flip-flop
EXECUTE
T1
T2 At the end of each operation the last T timing pulse is used 3 to reset the fethexecute flip-flop T4
In the next section we look at how the design of a control unit can be simplied by putting the sequence of microoperations in a table and then reading them from a table, rather than by synthesizing them in hard logic.
data at the registers D input terminals is transferred to its output terminals and held constant until the register is clocked again. The connections between the registers are by means of m-bit wide data highways, which are drawn as a single bold line. The output from each register can be gated onto the bus by enabling the appropriate tri-state buffer. We have used a multiplexer, labeled MPLX, to select the program counters input from either the incrementer or the operand eld of the instruction register. The multiplexer is controlled by the 1-bit signal Mux, where Mux 0 selects the incrementer path, and Mux 1 selects the branch target address from the address/operand eld of the instruction register, IRaddress. Suppose our computer performs a fetchexecute cycle in which the op-code is ADD N,D0. This instruction adds the contents of the memory location specied by the operand eld N to the contents of the data register (i.e. D0) and deposits the result in D0. We can write down the sequence of operations that take place during the execution of ADD not only in terms of register transfer language, but also in terms of the enabling of gates and the clocking of ip-ops. Table 7.10 illustrates the sequence of microinstructions executed during the fetchexecute cycle of an ADD instruction. It should be emphasized that the fetch phase of all instructions is identical and it is only the execute phase that varies according to the nature of the op-code read during the fetch phase.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 316
If, for each of the seven steps in Table 7.11, the 17 signals are fed to the various parts of the CPU in Fig. 7.16, then the fetchexecute cycle will be carried out. Real microprogrammed computers might use 64 to 200 control signals rather than the 17 in this example. One of the most signicant differences between a microinstruction and a macroinstruction is that the former contains many elds and may provide several operands, while the macroinstruction frequently species only an op-code and one or two operands. The seven steps in Table 7.11 represent a microprogram that interprets a fetch phase followed by an ADD instruction. We have demonstrated that a macroinstruction is interpreted by executing a microprogram, which comprises a sequence of microinstructions. Each of the CPUs instructions has its own microprogram. We now look at the
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 317
317
PC
Q D G1
Clock MAR
D Q
G3 G4 G5 Bus A
Q IR
Clock MBR
MBR
Q D
G6 G7
Bus B
Control units
D0
Q D A B
ALU
Copy bus A to IR
Step 1 1a 2 3 4 4a 5 6 7 7a 7b
Note 1 Where there is no entry in the column labeled Operations required, that operation happens automatically. For example, the output of the program counter is always connected to the input of the incrementer and therefore no explicit operation is needed to move the contents of the PC to the incrementer. Note 2 Any three-state gate not explicitly mentioned is not enabled. Note 3 Steps 1, 1a are carried out simultaneously, as are 4, 4a and 7, 7a, 7b.
Table 7.10 Interpreting a fetchexecute cycle for an ADD N,D0 instruction in terms of RTL.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 318
Table 7.11 Control signals generated during the fetch and execution phases of an ADD instruction.
Address mapper
Incrementer
Microprogram counter
Clock
Address Microprogram memory Data Microinstruction register Control signals to other parts of the CPU
Branch condition
Multiplexer
Branch on zero Branch on not zero Branch never (logical zero) Branch always (logical one)
microprogram itself and consider the hardware required to execute it. The microprogram is executed by the same type of mechanism used to execute the macroprogram (i.e., machine code) itself. This is a good example of the common expression wheels within wheels. Figure 7.17 describes the basic structure of a microprogrammed control unit that has a microprogram counter, a microprogram memory, and a microinstruction register (this structure is typical of the 1980s). The microinstruction
address from the microprogram counter is applied to the address input of the microprogram memory and the data output of the memory fed to the microinstruction register. As weve said, the structure of the control unit that executes the macroinstruction is very much like the structure of the CPU itself. However, there is one very big difference between the macroinstruction world and the microinstruction worldthe microinstruction register is very much longer than the macroinstruction register and the microinstructions
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 319
319
structure is much more complex that of the macroinstruction. Information in the microinstruction register is divided into four elds: next microinstruction address eld, microprogram counter load control eld, condition select eld, and CPU control eld. Most of the bits in the microinstruction register belong to the CPU control eld, which controls the ow of information within the CPU by enabling tri-state gates and clocking registers as weve described; for example, all the control signals in Table 7.11 belong to this eld. Our next task is to describe one of the principal differences between the micro- and macroinstruction. Each microinstruction is also a conditional branch instruction that determines the location of the next microinstruction to be executed. We will now explain how microinstructions are sequenced.
the contents of the next microinstruction address eld of the current microinstruction eld into the microprogram counter, a branch can be made to any point in the microprogram memory. In other words each microinstruction determines whether the next microinstruction is taken in sequence or whether it is taken from the next address eld of the current microinstruction. The obvious question to ask is, What determines whether the microprogram counter continues in sequence or is loaded from the next microinstruction address eld of the current microinstruction? The microprogram load control eld in the microinstruction register tells the microprogram counter how to get the next microinstruction address. This next address can come from the incrementer and cause the microprogram to continue in sequence. The next address can also be obtained from the address mapper (see below) or from the address in the next microinstruction address eld of the microinstruction register. The condition select eld in the microinstruction register implements conditional branches at the macroinstruction level by executing a conditional branch at the microinstruction level. In the simplied arrangement of Fig. 7.17, the condition select eld directly controls a 4-to-1 multiplexer that selects one of four ag bits representing the state of the CPU. These ag bits are obtained from the ALU and are usually the ag bits in the condition code register (e.g. Z, N, C, V). The condition select eld selects one of these ag bits for testing (in this example only the Z-bit is used). If the output of the multiplexer is true, a microprogram jump is made to the address specied by the contents of the next microinstruction address eld, otherwise the microprogram continues sequentially. In Fig. 7.17 two of the conditions are obtained from the CCR and two bits are permanently true and false. A false condition implies branch never (i.e. continue) and a true condition implies branch always (i.e. goto). To emphasize what weve just said, consider the hypothetical microinstruction of Fig. 7.18. This microinstruction is interpreted as:
IF Z = 1 THEN [PC] ADD3 ELSE [PC] [PC] + 1
where PC indicates the microprogram counter. A conditional branch at the macroinstruction level (e.g. BEQ) is interpreted by microinstructions in the following
Location of the next microinstruction to execute if a branch is taken Next address field ADD3
This field selects the This field determines where condition to be tested when making a the next microinstruction conditional branch address comes from
This field provide the CPU control signals that select source and destination registers, control buses, and determine the ALU function
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 320
An arithmetic unit of any length (as long as it is a multiple of 4) is constructed by connecting together bit-slice ALUs. Designers can use the ALUs internal registers in any way they desire. For example, they may choose to implement eight addressable data registers, two stack pointers (described later), two index registers, a program counter, and three scratchpad registers. Flexibility is the most powerful feature of bit-slice microprocessors. This description of the microprogrammed control unit is highly simplied. In practice the microprogram might include facilities for dealing with interrupts, the memory system, input/output, and so on. One of the advantages of a microprogrammed control unit is that it is possible to alter the content of the microprogram memory (sometimes called the control store) and hence design your own machine-level instructions. In fact it is perfectly possible to choose a set of microprograms that will execute the machine code of an entirely different computer. In this case the computer is said to emulate another computer. Such a facility is useful if you are changing your old computer to a new one whose own machine code is incompatible with your old programs. Emulation applies to programs that exist in binary (object) form on tape or disk. By writing microprograms (on the new machine) to interpret the machine code of the old machine, you can use the old software and still get the advantages of the new machine. One of the greatest problems in the design of a bit-slice computer lies in the construction and testing of the
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 321
321
RAM0
Q Q register
C Z N V
Bit slice control functions: ALU control, source and destination control
microprogram. You can, or course, write a program to emulate the bit-slice processor on another computer. A popular method of developing a microprogram is to replace the microprogram ROM with read/write memory and to access this memory with a conventional microprocessor. That is, the microprogram memory is common to both the bit-slice system and the microprocessor. In this way, the microprocessor can input a microprogram in mnemonic form, edit it, assemble it, and then pass control to the bit-slice system. The microprocessor may even monitor the operation of the bit-slice system. Such a microprogram memory is called a writable control store and once a writable control store was regarded as a big selling point of microprogrammed minicomputers and mainframes. However, we have already pointed out that a microprogrammable control store is of very little practical use due to the lack of applications
software. Even if a computer user has the expertise to design new microprogrammed macroinstructions, it is unlikely that the system software and compilers will be able to make use of these new instructions. Finally, RISC technology (as we shall see) does not use microprogramming and interest in microprogramming is much less than it once was. In the next chapter we look at how the performance of computers can be enhanced by three very different techniques. We begin with a brief introduction to the RISC revolution of the 1970s and 1980s and show how processors with regular instruction sets lend themselves to pipelining (the overlapping of instruction execution). We also look at cache memory and explain how a small quantity of very-high-speed random access memory can radically improve a computers performance. Finally, we describe the multiprocessora system that uses more than one processing unit to accelerate performance.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 322
I PROBLEMS
7.1 Within a CPU, what is the difference between an address path and a data path? 7.2 In the context of a machine-level instruction, what is an operand? 7.3 What is a literal operand? 7.4 How does a computer know whether an operand in its instruction register is a literal or a reference to memory (i.e. an address)? 7.5 Why is the program counter a pointer and not a counter? 7.6 Explain the function of the following registers in a CPU: (a) (b) (c) (d) PC MAR MBR IR
7.7 What is the CCR? 7.8 Does a computer need data registers? 7.9 Some microprocessors have one general-purpose data register, some two, some eight, and so on. What do you think determines the number of such general-purpose data registers in any given computer? 7.10 What is the signicance of the fetchexecute cycle? 7.11 What is the so-called von Neumann bottleneck? 7.12 Design a computer (at the register and bus level) to implement a zero address instruction set architecture. 7.13 In the context of CPU design, what is a random logic control unit? What is the meaning of the word random in this expression? 7.14 What is a microprogrammed control unit? 7.15 Microprogramming has now fallen into disfavor. Why do you think this is so? 7.16 For the computer structure of Fig. 7.20, state the sequence of micro-operations necessary to carry out the following instruction. Assume that the current instruction is in the IR.
ADDsquare D0, D1
This instruction reads the contents of register D0, squares that value, and then adds it to the contents of register D1. The result
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 323
323
CMBR
This computer has two internal buses A and B. All registers capture data from a bus when they are clocked. All tri-state gates can put data on a bus when they are enabled. The function of the ALU is controlled by inputs F2, F1, F0. This memory is controlled by a read signal and a write signal. The ALU performs an operation on one or both its P and Q inputs depending on the state of its control inputs. Data iputs for the ALU come from two registers, latch1 and latch 2.
CIR
CPC
CD0
D0
ED0 GD1
CD1 D1 CB_to_A EB_to_A ALU CALU f(P, Q) EALU Function select Q Latch 2 P CL1 Latch 1
ED1
F2,
F1,
F0
CL2
is put in register D1. The function codes F2,F1, and F0 are given below.
Assume that only one operand, A, is required by the instruction and that operands B and C are in the next consecutive two memory locations, respectively. 7.18 For the architecture of the hypothetical two-bus computer of Fig. 7.20, derive a microprogram to carry out the operation
Move D0,[D1]
F2 0 0 0 0 1 1
F1 0 0 1 1 0 0
F0 0 1 0 1 0 1
Operation Copy P to F Add P to Q Subtract Q from P Add 1 to P Add 1 to Q Multiply P by Q F=P F=P+Q F=PQ F=P+1 F=Q+1 F=Qx1
7.17 For the structure of Fig. 7.20 write a microprogram to implement the operation
D1=[A]+[B]+[C]+1
This operation copies the contents of register D0 into the memory location whose address is given by the contents of register D1. You should describe the actions that occur in plain English (e.g. Put data from this register on the B bus) and as a sequence of events (e.g. Read 1, EMSR). The table in Question 16 denes the effect of the ALUs function code. All data has to pass through the ALU to get from bus B to bus A. Note that the ALU has two input latches. Data has to be loaded into these latches before an ALU operation takes place.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 324
Your answer should explain what each of the microoperations does individually. You should also state what these actions achieve collectively; that is, what is the effect of the equivalent assembly language operation?
EALU,
EALU,