3.1.2 Additional Features
Next we examine some more advanced features of CPUs and look at representative
extensions. There are many ways in which the basic design of
Figure 3.3 can be improved. Most recent CPUs contain the following extensions,
which significantly improve their performance and ease of programming.
* Multipurpose register set for storing data and addresses. These replace the
accumulator AC and the auxiliary registers DR and AR of our basic CPU. The resulting CPU
is sometimes said to have the general register organization exemplified by the
third-generation IBM System/360-370 (Figure 1.17), which has 32 such registers. The set
of general registers is now usually referred to as a register file.
* Additional data, instruction, and address types. Most CPUs have instructions to
handle data and addresses with several different word sizes and formats. Although some
microprocessors have only add and subtract instructions in the arithmetic category,
relatively little extra circuitry is required for (fixed-point) multiply and divide
instructions, which simplify many programming tasks. Call and return instructions
also simplify program design.
* Register to indicate computation status. A status register (also called condition
code or flag register) indicates infrequent or exceptional conditions resulting from
the instruction execution. Examples are the appearance of an all-zero result or an
invalid instruction like divide by zero. A status register can also indicate the user and
supervisor states. Conditional branch instructions can test the status register, which
simplifies the programming of conditional actions.
SECTION 3.1
CPU Organization
* Various special registers and instructions facilitate the transfer of control among
programs due to procedure calling or external interrupts. A common mechanism
employs part of M as a push-down stack (see also Example 1.5). The stack
is intended for saving key information about an interrupted program via push
operations so that the saved information can be retrieved later via pop operations. A
special address register called a stack pointer automatically keeps track of the stack's
entry point.
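The push and pop operations described in the last item can be sketched as follows. This is only an illustration, not a model of any particular CPU: the class name `Stack`, the memory size, the downward growth direction, and the sample PC and status values are all assumptions made for the example.

```python
class Stack:
    """Push-down stack kept in a region of memory M and tracked by a stack pointer SP."""

    def __init__(self, size=16):
        self.M = [0] * size  # hypothetical memory region reserved for the stack
        self.SP = size       # assumption: the stack grows toward lower addresses

    def push(self, word):
        if self.SP == 0:
            raise OverflowError("stack full")
        self.SP -= 1             # SP moves to the new entry point
        self.M[self.SP] = word

    def pop(self):
        if self.SP == len(self.M):
            raise IndexError("stack empty")
        word = self.M[self.SP]
        self.SP += 1             # SP moves back to the previous entry point
        return word


def interrupt_demo():
    """Save the PC and status of an interrupted program, then restore them."""
    stack = Stack()
    pc, status = 2001, 0b0100    # illustrative values only
    stack.push(pc)
    stack.push(status)
    # ... an interrupt service routine would run here ...
    restored_status = stack.pop()  # last in, first out
    restored_pc = stack.pop()
    return restored_pc, restored_status
```

Because the stack is last-in, first-out, the items must be popped in the reverse of the order in which they were pushed.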
Figure 3.7 shows the organization of a processor with the foregoing features. It
has a register file in the DPU for data and/or address storage. The ALU obtains
most of its operands from the register file and also stores most of its results there. A
status register monitors the output of the ALU and other key points.
Special circuits are included for address computation, although the main ALU can
also be used for this purpose. The control circuits in the PCU derive their inputs
from the instruction register, which stores the opcode of the current instruction, and
the status register. Communication with the outside world is via a system bus that
transmits address, data, and control information among the CPU, M, and the IO
system. Various nonprogrammable "buffer" registers serve as temporary storage
points between the system bus and the CPU.

Figure 3.7
A typical CPU with the general register organization.
Modern CPUs employ instruction-level parallelism. Such parallelism may be present
in the internal organization of the DPU or in the overlapping of the operations
carried out by the DPU and PCU. These features add to the CPU's complexity and
will be explored in depth later in this book.

The considerable potential for parallel processing at the instruction level is
evident even in the simple CPU of Figure 3.3. We see from Figure 3.6 that the main
PCU and DPU activities take place in different clock cycles. If these activities do
not share a resource such as the system bus, they can be carried out at the same
time. For example, the three-instruction negation routine we gave earlier to change
AC to -AC can be executed as follows in the style of Figure 3.6.
Clock   Instruction
cycle   cycle         PC     PCU actions                     DPU actions
1       MOV DR,AC     2000   IR.AR := M(PC), PC := PC + 1
2                     2001                                   DR := AC
3       SUB           2001   IR.AR := M(PC), PC := PC + 1
4                     2002                                   AC := AC - DR
5       SUB           2002   IR.AR := M(PC), PC := PC + 1
6                     2003                                   AC := AC - DR
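The fetch step of each instruction can be overlapped with the execute step of the
preceding instruction, as shown below. (We use subscripts to distinguish the first and
second SUB instructions.)

Clock   Instruction
cycle   cycle         PC     PCU actions                     DPU actions
1       MOV           2000   IR.AR := M(PC), PC := PC + 1
2       MOV/SUB₁      2001   IR.AR := M(PC), PC := PC + 1    DR := AC
3       SUB₁/SUB₂     2002   IR.AR := M(PC), PC := PC + 1    AC := AC - DR
4       SUB₂          2003                                   AC := AC - DR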
This overlapping of instruction fetching and execution is an example of
instruction pipelining, which is an important speedup technique.
Figure 3.8 illustrates graphically the type of two-stage pipelining discussed above.
Each instruction can be thought of as passing through two consecutive stages of
processing: a fetch stage implemented mainly by the PCU and an execution stage
implemented mainly by the DPU. Hence two instructions can be processed
simultaneously in every CPU clock cycle, with one completing its fetch phase and the
other completing its execute phase. A two-stage pipeline can therefore double the
CPU's throughput, from one instruction every two clock cycles to one instruction
every clock cycle.

[Figure 3.8: two-stage pipeline timing for four instructions, one of them a branch,
plotted against clock cycles.]
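The cycle counts behind this claim can be checked with a small sketch. It assumes an idealized two-stage pipeline with no branches and no shared-resource conflicts; the function names `sequential_cycles`, `pipelined_cycles`, and `schedule` are hypothetical and chosen only for this illustration.

```python
def sequential_cycles(n):
    """No overlap: every instruction needs one fetch cycle plus one execute cycle."""
    return 2 * n


def pipelined_cycles(n):
    """Two-stage pipeline, no branches: the fetch of instruction k+1 overlaps the
    execution of instruction k, so only the first fetch is not overlapped."""
    return n + 1 if n > 0 else 0


def schedule(instrs):
    """Return (cycle, instruction being fetched, instruction being executed) rows."""
    if not instrs:
        return []
    rows = []
    for cycle in range(1, len(instrs) + 2):
        fetching = instrs[cycle - 1] if cycle <= len(instrs) else None
        executing = instrs[cycle - 2] if cycle >= 2 else None
        rows.append((cycle, fetching, executing))
    return rows


# The three-instruction negation routine: 6 cycles sequentially, 4 when pipelined.
for row in schedule(["MOV", "SUB1", "SUB2"]):
    print(row)
```

For n instructions the speedup is 2n/(n + 1), which approaches 2 as n grows; this is the sense in which the throughput doubles.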
A problem arises when a branch instruction is encountered, such as the BRA
Loop instruction stored in address (line) 17 of the multiplication program.
Immediately before this instruction is fetched in some clock cycle i, the program
counter PC stores the address 17. The fetch step increments PC to 18, so in clock
cycle i + 1 the instruction at address 18 is fetched;
that instruction is not even in the multiplication
program. In clock cycle i + 1, BRA is executed, which causes Loop = 5 to be loaded
into PC, implying that the next instruction should be taken from location 5. The
fetching of this instruction can't begin until cycle i + 2, however, as illustrated in
Figure 3.8 with i = 4. It follows that the instruction fetched in cycle i + 1 must be
discarded, so one clock cycle is wasted.
Thus we see that branch instructions reduce the efficiency of instruction
pipelining, although we will see later that steps can be taken to reduce this problem. We
will also see that instruction processing is usually broken into more than two stages
to increase the level of parallelism attainable.
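The lost cycle can be made concrete with a small simulation. This is a sketch under stated assumptions, not the book's CPU: the toy opcodes (`ADD`, `BRA`, `HALT`, `NOP`), the program layout, and the function `run` are invented for the illustration, and every branch is assumed to be taken.

```python
def run(program, start, max_cycles=100):
    """Simulate a two-stage (fetch/execute) pipeline.

    program maps address -> (opcode, branch target or None).
    Returns (cycles used, fetched instructions discarded, addresses executed).
    """
    pc = start
    fetched = None   # instruction waiting between the fetch and execute stages
    cycles = 0
    wasted = 0
    executed = []
    while cycles < max_cycles:
        cycles += 1
        to_execute = fetched
        # Fetch stage: read the instruction at PC, then PC := PC + 1.
        fetched = (pc, program.get(pc, ("NOP", None)))
        pc += 1
        # Execute stage: works on the instruction fetched in the previous cycle.
        if to_execute is not None:
            addr, (op, target) = to_execute
            executed.append(addr)
            if op == "HALT":
                break
            if op == "BRA":
                pc = target      # the next fetch comes from the branch target
                fetched = None   # the instruction prefetched this cycle is discarded
                wasted += 1
    return cycles, wasted, executed


straight = {0: ("ADD", None), 1: ("ADD", None), 2: ("HALT", None)}
branchy = {5: ("ADD", None), 6: ("BRA", 8), 7: ("ADD", None), 8: ("HALT", None)}
print(run(straight, 0))  # three instructions, no branch
print(run(branchy, 5))   # three instructions executed, one of them a branch
```

Both programs execute three instructions, but the one containing the branch needs one more clock cycle because the instruction fetched while BRA executes is thrown away.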
EXAMPLE 3.2 THE ARM6 MICROPROCESSOR [VAN SOMEREN AND ATACK
1994]. We now examine in some detail the ARM6 microprocessor, which embodies
RISC design principles in a relatively direct and elegant form. The
ARM has its origins in the Acorn RISC Machine, a microprocessor developed in the
United Kingdom in the 1980s to serve as the CPU of a personal computer.
Subsequently, the family name was changed (without changing its acronym, however) to
Advanced RISC Machine. The ARM family is primarily aimed at low-cost, low-power
applications such as portable computers and games. For example, the Newton, a
hand-held "personal digital assistant" introduced by Apple Corp. in 1993, employs the
ARM6, whose main features are described here.
The ARM6 has a load/store architecture in that only its load and store
instructions can address external memory M. As in most computers since the IBM
System/360, main memory is byte addressable; the maximum memory size is 2^32 bytes,
also referred to as 4 gigabytes. The ARM6 employs an instruction pipeline to meet
the goal of one instruction executed per CPU clock cycle. Note that it shares all
these features with a more powerful (and more expensive) RISC microprocessor, the
PowerPC (Example 1.7). The ARM6's instruction set is much smaller than the
PowerPC's; it has no floating-point instructions, for example.
The ARM6 has a 32-bit ALU and a file of 32-bit general-purpose registers. To allow
direct interaction between data and control registers, the ARM has the unusual
feature of placing its PC and status registers in the register file; conceptually, we
will continue to view these registers as part of the PCU. In user mode the register
file appears to contain sixteen 32-bit registers R0:R15, as well as a current program
status register designated CPSR. (Additional registers, which we will not discuss
here, are used when the CPU is in other operating modes; they are invisible in user
mode.) The ALU is designed to perform basic arithmetic operations on 32-bit integers.
It employs a method similar to that described in Example 2.7 for multiplication. A
separate address incrementer is used for address-manipulation operations such as
PC := PC + 1.

Access to external memory M (a cache or main memory) is
straightforward. The address of the desired location in M is placed in the PCU's address
register. In the case of a store instruction, the data to be stored is also placed in the
DPU's write data register. A load instruction causes a data word to be fetched from
memory and placed in the read data register. Several internal buses transfer data
efficiently among the DPU's registers and data processing circuits.
All ARM6 instructions are 32 bits long. The main instruction types are listed in the
accompanying figure. (We have omitted block move and coprocessor instructions.) The
number of instruction types is relatively small; however, instructions have options
that substantially increase the number of operations they can perform. Most
instructions can be applied either to 32-bit operands (words) or to 8-bit operands
(bytes). Operands and addresses are usually stored in registers that can be referred
to by short 4-bit names, allowing a single instruction to name several registers. The
address space is shared between memory and IO devices. Consequently, the
load/store instructions used for CPU-memory transfers are also used for IO
operations.
Any instruction can be conditionally executed, meaning that execution may or
may not occur depending on the value of the status flags.
The status flags are set by a previous instruction and include a negative flag N (the
previous result R computed by the ALU was a negative number), a zero flag Z (R was
zero), a carry flag C (R generated an output carry), and an overflow flag V (R generated
a sign overflow). Hence every ARM6 instruction is effectively combined with a
conditional branch instruction. The basic unconditional move instruction MOV R0,R1 can
have any of 15 conditions attached to it to determine if it is to be executed (see problem
3.8). Some examples:
    MOVCC R0,R1 ; If flag C = 0, then R0 := R1
    MOVCS R0,R1 ; If flag C = 1, then R0 := R1
    MOVHI R0,R1 ; If flag C = 1 and flag Z = 0, then R0 := R1
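The flag definitions and the three conditional moves above can be modeled in a few lines. This is a hedged sketch rather than a description of the ARM6 hardware: the helper names `add_flags` and `mov_cond`, the dictionary representation of registers and flags, and the choice of a 32-bit addition as the flag-setting operation are assumptions made for the example. Only the three conditions quoted in the text are included.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def add_flags(a, b):
    """Add two 32-bit values and return (result R, flags N, Z, C, V)."""
    a &= MASK
    b &= MASK
    full = a + b
    r = full & MASK
    flags = {
        "N": r >> 31,                          # R is negative (sign bit set)
        "Z": int(r == 0),                      # R is zero
        "C": int(full > MASK),                 # R generated an output carry
        "V": ((a ^ r) & (b ^ r)) >> 31,        # sign overflow: both operand signs differ from R's
    }
    return r, flags


# Only the conditions quoted in the text; the remaining ones are omitted here.
CONDITIONS = {
    "CC": lambda f: f["C"] == 0,
    "CS": lambda f: f["C"] == 1,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
}


def mov_cond(cond, regs, dst, src, flags):
    """Model of MOV<cond> dst,src: copy src to dst only if the condition holds."""
    if CONDITIONS[cond](flags):
        regs[dst] = regs[src]
    return regs


_, flags = add_flags(0xFFFFFFFF, 1)   # result 0 with a carry out: Z = 1, C = 1
regs = {"R0": 0, "R1": 7}
mov_cond("HI", regs, "R0", "R1", flags)  # not executed, because Z = 1
mov_cond("CS", regs, "R0", "R1", flags)  # executed, because C = 1
```

Attaching the condition to the move itself replaces the separate conditional branch that would otherwise be needed to skip over the instruction.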
[Figure: organization of the ARM6, showing the register file (general registers and
program counter PC), the arithmetic-logic unit, the write data and read data
registers, the address register, the address incrementer, the instruction register,
and the control circuits of the program control unit PCU, connected to the system bus.]
An instruction can also include a shift or rotation operation that is applied
to one of its operands. For instance,
    MOV R0,R1,LSL #2 ; R0 := R1 × 4
means logically left shift (LSL) the contents of R1 by 2 bits and move the result to R0.
This shift is tantamount to multiplying R1 by four before the move.
The opcode suffix S specifies whether or not the status flags are affected by an
instruction. If S is present, appropriate flags are changed; otherwise, the flags are
not changed. For example, the ARM6's move instructions affect the N, Z, and C flags,
so appending S to MOV causes these flags to be updated.
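The shift option in MOV R0,R1,LSL #2 can be modeled as follows. This is a minimal sketch assuming 32-bit registers held in a Python dictionary; the helper names `lsl` and `mov_lsl` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left of a 32-bit value; bits shifted past bit 31 are lost."""
    return (value << amount) & MASK32


def mov_lsl(regs, dst, src, amount):
    """Model of MOV dst,src,LSL #amount: dst := src shifted left by amount bits."""
    regs[dst] = lsl(regs[src], amount)
    return regs


regs = {"R0": 0, "R1": 5}
mov_lsl(regs, "R0", "R1", 2)   # R0 becomes 5 * 4 = 20; R1 is unchanged
```

The equivalence with multiplication by four holds only while the product fits in 32 bits; for larger values the high-order bits are discarded, so the result is the product modulo 2^32.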
Type             Instruction                  HDL format               Assembly-language format   Narrative format (comment)
Data transfer    Move register                R3 := R0                 MOV R3,R0                  Copy contents of register R0 to register R3.
                 Move register                R0 := 12                 MOV R0,#12                 Copy operand (decimal number 12) to register R0.
                 Move inverted                R7 := NOT R0             MVN R7,R0                  Copy bitwise inverted contents of R0 to R7.
                 Load                         R5 := M(adr)             LDR R5,adr                 Load R5 with contents of memory location adr.
                 Store                        M(adr) := R8             STR R8,adr                 Store contents of R8 in memory location adr.
Data processing  Add                          R3 := R5 + 25            ADD R3,R5,#25              Add 25 to R5; place sum in R3.
                 Add with carry               R3 := R5 + R6 + C        ADC R3,R5,R6               Add R6 and carry bit C to R5; place sum in R3.
                 Subtract                     R3 := R5 - 9             SUB R3,R5,#9               Subtract 9 from R5; place difference in R3.
                 Subtract with carry          R3 := R5 - 9 - C         SBC R3,R5,#9               Subtract 9 and borrow bit from R5; place difference in R3.
                 Reverse subtract             R3 := 9 - R5             RSB R3,R5,#9               Subtract R5 from 9; place difference in R3.
                 Reverse subtract with carry  R3 := 9 - R5 - C         RSC R3,R5,#9               Subtract R5 and borrow bit from 9; place difference in R3.
                 Multiply                     R1 := R3 × R2            MUL R1,R2,R3               Multiply R3 by R2; place result in R1.
                 Multiply and add             R1 := (R3 × R2) + R4     MLA R1,R2,R3,R4            Multiply R3 by R2; add R4; place result in R1.
                 And                          R4 := ...

If A0 > 1001, then A0 - 1001 > 0 and
CMPA sets the zero flag Z to 0, indicating a nonzero result. (It also sets various other
flags that are not used by this program.) When A0 finally reaches 1001, A0 - 1001 = 0, so
Z is set to 1. Now the last instruction BNE, which stands for branch if not equal to
zero, is a conditional branch instruction whose operation is described by
    if Z ≠ 1 then PC := START
With Z = 1 the branch is not taken, which allows the CPU to exit from the program.
It is interesting to compare this 680X0 program with similar programs.

The instruction set of the 68020 includes fixed-point instructions and instructions
for transferring control between programs. Hardware-implemented floating-point
instructions are not available directly; however, they are provided indirectly by
means of an auxiliary IC, the 68881 floating-point coprocessor. The coprocessor is
coupled to the microprocessor so that instructions to be executed by the coprocessor
can be included in programs fetched by the microprocessor. The
coprocessor serves as an extension to the microprocessor and forms part of the
CPU as indicated in Figure 3.14.
The 68881 (and the similar but faster 68882) contains a set of eight 80-bit
registers for storing floating-point numbers of various formats, including 32- and
64-bit numbers conforming to the standard IEEE 754 format (presented later).
Additional control registers in the 68881 allow it to communicate with the
68020. A set of coprocessor instructions are defined for the 68020; they contain
command fields specifying floating-point operations that the 68881 can execute.
When the 68020 fetches and decodes such an instruction, it transfers the
command portion to the coprocessor, which then executes it. Further exchanges take
place between the main processor and the coprocessor until the coprocessor
completes execution of its current operation, at which point the 68020 proceeds to
its next instruction. The commands executed by the 68881 include the basic
arithmetic operations (add, subtract, multiply, and divide), square root, and
trigonometric functions. Later members of the 680X0 family exploit advances in VLSI
to integrate a floating-point (co)processor into the CPU chip.

Figure 3.14
68020-based microcomputer with floating-point coprocessor.
The 68020 has other noteworthy design features. Like the IBM System/360-370 and
the ARM6, the CPU has a supervisor state intended for operating system use and a
user state for application programs. As Figures 3.11 and 3.12 indicate, certain
"privileged" control registers and instructions can be used only in the supervisor
state. User and supervisor programs are thus clearly separated (for example, they
employ different stack pointers), thereby improving system security. The 680X0
processors are also designed to allow easy implementation of virtual memory, whereby
the operating system makes the main memory appear larger to user programs than it
really is. Hardware support for virtual memory is provided by the 68851 memory
management unit (MMU), another 680X0 coprocessor.
Provided they meet certain independence conditions, up to three 68020 instruc-
tions can be processed simultaneously in pipeline fashion. This pipelining is com-
plicated by the fact that instruction lengths and execution times vary, a problem that
RISCs try to eliminate. Another speedup feature found in the 68020 is a small
instruction-only cache (I-cache). The 68020 prefetches instructions from main
memory while the system bus is idle; the instructions can subsequently be read
much more quickly from the on-chip cache than from the off-chip main memory.
An unusual feature of the 68020 noted in Figure 3.11 is its use of two levels of
microprogramming to implement the CPU’s control logic. For the manufacturer,
this feature increases design flexibility while reducing IC area compared with con-
ventional (one-level) microprogrammed control.