CPU Organization

3.1.2 Additional Features

Next we examine some more advanced features of CPUs and look at some representative examples. There are many ways in which the basic design of Figure 3.3 can be improved. Most recent CPUs contain the following extensions, which significantly improve their performance and ease of programming.

* Multipurpose register set for storing data and addresses. These registers replace the accumulator AC and the auxiliary registers DR and AR of our basic CPU. The resulting CPU is sometimes said to have the general register organization exemplified by the third-generation IBM System/360-370 (Figure 1.17), which has 32 such registers. The set of general registers is now usually referred to as a register file.

* Additional data, instruction, and address types. Most CPUs have instructions to handle data and addresses with several different word sizes and formats. Although some microprocessors have only add and subtract instructions in the arithmetic category, relatively little extra circuitry is required for (fixed-point) multiply and divide instructions, which simplify many programming tasks. Call and return instructions also simplify program design.

* Register to indicate computation status. A status register (also called a condition code or flag register) indicates infrequent or exceptional conditions resulting from instruction execution. Examples are the appearance of an all-zero result or an invalid instruction such as divide by zero. A status register can also indicate the user and supervisor states. Conditional branch instructions can test the status register, which simplifies the programming of conditional actions.

* Registers and instructions for program control. Various special registers and instructions facilitate the transfer of control among programs due to procedure calling or external interrupts, a mechanism that employs a push-down stack (see also Example 1.5).
The stack is intended for saving key information about an interrupted program via push operations so that the saved information can be retrieved later via pop operations. A special address register called a stack pointer automatically keeps track of the stack's entry point.

Figure 3.7 shows the organization of a processor with the foregoing features. It has a register file in the DPU for data and/or address storage. The ALU obtains most of its operands from the register file and also stores most of its results there. The status register monitors the output of the ALU and other key points. Special circuits are included for address computation, although the main ALU can also be used for this purpose. The control circuits in the PCU derive their inputs from the instruction register, which stores the opcode of the current instruction, and the status register. Communication with the outside world is via a system bus that transmits address, data, and control information among the CPU, M, and the IO system. Various nonprogrammable "buffer" registers serve as temporary storage points between the system bus and the CPU.

Figure 3.7 A typical CPU with the general register organization. (The diagram shows the data processing unit DPU, containing the register file, arithmetic-logic unit, and status register, and the program control unit PCU, containing the address register, instruction register, program counter, stack pointer, and control circuits, connected to M and the IO system via the system bus.)

Modern CPUs employ low-level parallelism. Such parallelism may be present in the internal organization of the DPU or in the overlapping of the operations carried out by the DPU and PCU. These features add to the CPU's complexity and will be explored in depth later in this book.
The considerable potential for parallel processing at the instruction level is evident even in the simple CPU of Figure 3.3. We see from Figure 3.6 that the main PCU and DPU activities take place in different clock cycles. If these activities do not share a resource such as the system bus, they can be carried out at the same time. For example, the three-instruction negation routine we gave earlier to change AC to -AC can be executed as follows in the style of Figure 3.6.

Clock   Instruction
cycle   cycle         PC     PCU actions                      DPU actions
1       MOV DR,AC     2000   IR.AR := M(PC), PC := PC + 1
2                     2001                                    DR := AC
3       SUB           2001   IR.AR := M(PC), PC := PC + 1
4                     2002                                    AC := AC - DR
5       SUB           2002   IR.AR := M(PC), PC := PC + 1
6                     2003                                    AC := AC - DR

If the fetching of each instruction is overlapped with the execution of its predecessor, the same routine takes four clock cycles instead of six, as shown below. (We use subscripts to distinguish the first and second SUB instructions.)

Clock   Instruction
cycle   cycle         PC     PCU actions                      DPU actions
1       MOV           2000   IR.AR := M(PC), PC := PC + 1
2       MOV/SUB1      2001   IR.AR := M(PC), PC := PC + 1     DR := AC
3       SUB1/SUB2     2002   IR.AR := M(PC), PC := PC + 1     AC := AC - DR
4       SUB2          2003                                    AC := AC - DR

This overlapping of instruction fetching and execution is an example of instruction pipelining, which is an important speedup technique. Figure 3.8 illustrates graphically the type of two-stage pipelining discussed above. Each instruction can be thought of as passing through two consecutive stages of processing: a fetch stage implemented mainly by the PCU and an execution stage implemented mainly by the DPU. Hence two instructions can be processed simultaneously in every CPU clock cycle, with one completing its fetch phase and the other completing its execute phase. A two-stage pipeline can therefore double the throughput of the CPU, with up to one instruction completed every clock cycle.

Figure 3.8 (Pipeline timing diagram showing successive instructions, including a branch instruction, plotted against clock cycles.)
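The overlap described above can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions: every fetch and every execute takes exactly one clock cycle, there are no branches, and the two stages never contend for the system bus. The function names and the labels SUB1 and SUB2 are introduced here for illustration and do not come from the text.

```python
def sequential_schedule(instructions):
    """Non-overlapped operation: each instruction uses one fetch cycle
    (PCU busy) followed by one execute cycle (DPU busy)."""
    schedule = []
    for name in instructions:
        schedule.append((name, None))   # fetch cycle, DPU idle
        schedule.append((None, name))   # execute cycle, PCU idle
    return schedule


def pipelined_schedule(instructions):
    """Two-stage pipeline: the fetch of instruction k+1 overlaps the
    execution of instruction k. Each entry is (fetching, executing)."""
    n = len(instructions)
    schedule = []
    for cycle in range(n + 1):
        fetching = instructions[cycle] if cycle < n else None
        executing = instructions[cycle - 1] if cycle >= 1 else None
        schedule.append((fetching, executing))
    return schedule


# The three-instruction negation routine: DR := AC, then AC := AC - DR twice.
routine = ["MOV", "SUB1", "SUB2"]
seq = sequential_schedule(routine)
pipe = pipelined_schedule(routine)

for cycle, (fetching, executing) in enumerate(pipe, start=1):
    print(cycle, "fetch:", fetching or "-", "execute:", executing or "-")
print(len(seq), "cycles without overlap,", len(pipe), "cycles with overlap")
```

For a branch-free sequence of n instructions this model needs 2n cycles without overlap and n + 1 cycles with overlap, so the speedup approaches two only as n grows; for the three-instruction routine it is 6 versus 4 cycles, matching the two tables.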
A problem arises when a branch instruction is encountered, such as the BRA LOOP instruction stored in address (line) 17 of the multiplication program. Immediately before this instruction is fetched in some clock cycle i, the program counter PC stores the address 17. PC is then incremented to 18, so the instruction at address 18 is the next to be fetched, but that instruction is not even in the multiplication program. In clock cycle i + 1, BRA is executed, which causes LOOP = 5 to be loaded into PC, implying that the next instruction should be taken from location 5. The fetching of this instruction can't begin until cycle i + 2, however, as illustrated in Figure 3.8 with i = 4. It follows that the instruction fetched in clock cycle i + 1 must be discarded, and no instruction completes execution in clock cycle i + 2.

Thus we see that branch instructions reduce the efficiency of instruction pipelining, although we will see later that steps can be taken to reduce this problem. We will also see that instruction processing is usually broken into more than two stages to increase the level of the parallelism attainable.

EXAMPLE 3.2 THE ARM6 MICROPROCESSOR [VAN SOMEREN AND ATACK 1994]. We now examine in some detail the ARM6, a microprocessor that embodies the foregoing features in a relatively direct and elegant form. The ARM has its origins in the Acorn RISC Machine, a microprocessor developed in the United Kingdom in the 1980s to serve as the CPU of a personal computer. Subsequently, the family name was changed (without changing its acronym, however) to Advanced RISC Machine. The ARM family is primarily aimed at low-cost, low-power applications such as portable computers and games. For example, the Newton, a hand-held "personal digital assistant" introduced by Apple Corp. in 1993, employs the ARM6, whose main features are described in this example.

The ARM6 has a load/store architecture, that is, only its load and store instructions can address external memory M. As in most computers since the IBM System/360, main memory is addressed in bytes. Addresses are 32 bits long, so M can contain up to 2^32 bytes, also referred to as 4 gigabytes. The ARM6 employs an instruction pipeline to meet the goal of one instruction executed per CPU clock cycle. Note that it shares all these features with a more powerful (and more expensive) RISC microprocessor, the PowerPC (Example 1.7).
The ARM6's instruction set is much smaller than the PowerPC's; it has no floating-point instructions, for example.

The ARM6 has a 32-bit ALU and a file of 32-bit general-purpose registers. To permit direct interaction between data and control registers, the ARM has the unusual feature of placing its PC and status registers in the register file; conceptually, we will continue to view these registers as part of the PCU. In user mode the register file appears to contain 16 registers (R0 to R15, with R15 serving as the PC), as well as a current program status register designated CPSR. (Additional registers, which we will not discuss here, are used when the CPU is in other operating modes; they are invisible in user mode.) The ALU is designed to perform basic arithmetic operations on 32-bit integers. For multiplication, it employs a technique similar to that described in Example 2.7. A separate address incrementer is provided for address-manipulation operations such as PC := PC + 1.

Access to external memory M (a cache or main memory) is straightforward. The address of the desired location in M is placed in the PCU's address register. In the case of a store instruction, the data to be stored is also placed in the DPU's write data register. A load instruction causes a data word to be fetched from memory and placed in the read data register. Several internal buses transfer data efficiently among the DPU's registers and data processing circuits.

All ARM6 instructions are 32 bits long, and they have a variety of formats. The main instruction types are listed in the table below. (We have omitted block move and coprocessor instructions.) The number of instruction types is small; however, instructions have options that substantially increase the number of operations they can perform. Most instructions can be applied either to 32-bit operands (words) or to 8-bit operands (bytes). Operands and addresses are usually stored in registers that can be referred to by short 4-bit names, allowing a single 32-bit instruction to name several registers. The address space is shared between memory and IO devices. Consequently, the load/store instructions used for CPU-memory transfers are also used for IO operations.

Any instruction can be conditionally executed, meaning that execution may or may not occur depending on the value of the status flags. The status flags are set by a previous instruction and include a negative flag N (the previous result R computed by the ALU was a negative number), a zero flag Z (R was zero), a carry flag C (R generated an output carry), and an overflow flag V (R generated a sign overflow). Hence every ARM6 instruction is effectively combined with a conditional branch instruction. The basic unconditional move instruction MOV R0,R1 can have any of 15 conditions attached to it to determine if it is to be executed (see problem 3.8). Some examples:

MOVCC R0,R1    ; If flag C = 0, then R0 := R1
MOVCS R0,R1    ; If flag C = 1, then R0 := R1
MOVHI R0,R1    ; If flag C = 1 and flag Z = 0, then R0 := R1

(Figure: organization of the ARM6. The diagram shows the register file, which includes the program counter PC, the arithmetic-logic unit, the write data and read data registers, and the program control unit PCU with its address register, address incrementer, instruction register, and control circuits, all connected to the system bus.)

An instruction can also specify a shift or rotation operation that is applied to one of its operands. For instance,

MOV R0,R1,LSL #2    ; R0 := R1 x 4

means logically left shift (LSL) the contents of R1 by two bits and move the result to R0. This shift is tantamount to multiplying R1 by four before the move.

The opcode suffix S specifies whether or not the status flags are to be affected by the instruction. If S is present, appropriate flags are changed; otherwise, the flags are not changed. For example, the ARM6's move instructions affect the N, Z, and C flags, so appending S to a move instruction causes these flags to be updated by the move.

Type             Instruction                   HDL format                 Assembly-language format   Narrative format (comment)
Data transfer    Move register                 R3 := R9                   MOV R3,R9                  Copy contents of register R9 to register R3.
                 Move register                 R0 := 12                   MOV R0,#12                 Copy operand (decimal number 12) to register R0.
                 Move inverted                 R7 := NOT R0               MVN R7,R0                  Copy bitwise inverted contents of R0 to R7.
                 Load                          R5 := M(adr)               LDR R5,adr                 Load R5 with contents of memory location adr.
                 Store                         M(adr) := R8               STR R8,adr                 Store contents of R8 in memory location adr.
Data processing  Add                           R3 := R5 + 25              ADD R3,R5,#25              Add 25 to R5; place sum in R3.
                 Add with carry                R3 := R5 + R6 + C          ADC R3,R5,R6               Add R6 and carry bit C to R5; place sum in R3.
                 Subtract                      R3 := R5 - 9               SUB R3,R5,#9               Subtract 9 from R5; place difference in R3.
                 Subtract with carry           R3 := R5 - 9 - C           SBC R3,R5,#9               Subtract 9 and borrow bit from R5; place difference in R3.
                 Reverse subtract              R3 := 9 - R5               RSB R3,R5,#9               Subtract R5 from 9; place difference in R3.
                 Reverse subtract with carry   R3 := 9 - R5 - C           RSC R3,R5,#9               Subtract R5 and borrow bit from 9; place difference in R3.
                 Multiply                      R1 := R3 x R2              MUL R1,R2,R3               Multiply R3 by R2; place result in R1.
                 Multiply and add              R1 := (R3 x R2) + R4       MLA R1,R2,R3,R4            Multiply R3 by R2; add R4; place result in R1.
                 And                           R4 […]

[…]

[…] 1001, then A0 - 1001 is nonzero and CMPA sets the zero flag Z to 0, indicating a nonzero result. (It also sets various other flags that are not used by this program.) When A0 finally reaches 1001, A0 - 1001 = 0, so Z is set to 1. Now the last instruction BNE, which stands for branch if not equal to zero, is a conditional branch instruction whose operation is described by

if Z ≠ 1 then PC := START

[…] to exit from the program.

It is interesting to compare this 680X0 program with the similar programs […].

The instruction set of the 68020 includes fixed-point multiply and divide instructions, as well as stack-based instructions for transferring control between programs. Hardware-implemented floating-point instructions are not available directly; however, they are provided indirectly by means of an auxiliary IC, the 68881 floating-point coprocessor. The coprocessor is coupled to the microprocessor so that instructions to be executed by the coprocessor can be included in programs fetched by the microprocessor. The coprocessor serves as an extension to the microprocessor and forms part of the CPU as indicated in Figure 3.14. The 68881 (and the similar but faster 68882) contains a set of eight 80-bit registers for storing floating-point numbers of various formats, including 32- and 64-bit numbers conforming to the standard IEEE 754 format (presented later). Additional control registers in the 68881 allow it to communicate with the 68020.
A set of coprocessor instructions is defined for the 68020; they contain command fields specifying floating-point operations that the 68881 can execute. When the 68020 fetches and decodes such an instruction, it transfers the command portion to the coprocessor, which then executes it. Further exchanges take place between the main processor and the coprocessor until the coprocessor completes execution of its current operation, at which point the 68020 proceeds to its next instruction. The commands executed by the 68881 include the basic arithmetic operations (add, subtract, multiply, and divide), square root, and trigonometric functions. […] in similar fashion. Later members of the 680X0 series take advantage of advances in VLSI to integrate a floating-point (co)processor into the CPU chip.

Figure 3.14 68020-based microcomputer with floating-point coprocessor. (The diagram shows the 68020 microprocessor and the floating-point coprocessor, which together form the CPU, connected by a 32-bit address bus and a 32-bit data bus of the system bus to main memory, consisting of read-write memory (RAM) and read-only memory (ROM), and to input-output interface circuits (IO ports) serving the IO devices.)

The 68020 has several other noteworthy design features. Like the IBM System/360-370 and the ARM6, the CPU has a supervisor state intended for operating system use and a user state for user programs. As Figures 3.11 and 3.12 indicate, certain "privileged" control registers and instructions can be used only in the supervisor state. User and supervisor programs are thus clearly separated (for example, they employ different stack pointers), thereby improving system security. The 680X0 microprocessors are also designed to allow easy implementation of virtual memory, whereby the operating system makes the main memory appear larger to user programs than it really is. Hardware support for virtual memory is provided by the 68851 memory management unit (MMU), another 680X0 coprocessor.

Provided they meet certain independence conditions, up to three 68020 instructions can be processed simultaneously in pipeline fashion.
This pipelining is complicated by the fact that instruction lengths and execution times vary, a problem that RISCs try to eliminate. Another speedup feature found in the 68020 is a small instruction-only cache (I-cache). The 68020 prefetches instructions from main memory while the system bus is idle; the instructions can subsequently be read much more quickly from the on-chip cache than from the off-chip main memory.

An unusual feature of the 68020 noted in Figure 3.11 is its use of two levels of microprogramming to implement the CPU's control logic. For the manufacturer, this feature increases design flexibility while reducing IC area compared with conventional (one-level) microprogrammed control.
