Microprocessor
Samuel O. Aletan
Louisiana Tech University
College of Engineering
Computer Science Department
Ruston, Louisiana 71272
dominated the computer industry for almost half a century [11, 16]. The ideas behind the RISC design principle had already been used in many supercomputers, such as the CRAY and CDC 6600 systems, even though most computer designs were based on the CISC philosophy.

The philosophy of CISC was never stated in the manner that the RISC philosophy was stated in the early 1980s, but the reasoning behind the development of CISCs could be considered its design philosophy. The design objectives of CISC are:
● reduce the amount of storage used and accessed (memory was small and slow),
● reduce the number of load and store operations (they dominate other operations even though they are considered overhead operations),
● support compatibility by making sure that new computers can execute the existing code,
● support system software by implementing some system software functions in hardware,
● make compiler writers’ jobs easier, and
● support instructions that will allow assembly language programmers to be creative in writing efficient code.

In order to meet these goals, CISC designers supported complex instructions: instructions that require several clock cycles to execute and are usually variable in length [12]. Each instruction was designed with the maximum possible functionality in order to minimize the number of instructions that a program needed. Control units were microprogrammed because large and complex instructions made it difficult to implement the control units in hardware. Variable length instructions were implemented in order to reduce code size and storage used. Higher functionality was used to reduce the number of storage accesses. Memory-to-memory operations were supported to reduce the number of load/store operations. Assembly programming was very common, while the use of high level languages (HLLs) was not as dominant as it is today. Advances in fabrication and the development of large scale integration (LSI) and very large scale integration (VLSI) allowed designers to support system functions with hardware.

Advances in technology have made some of the goals of CISC less important and some of them, such as the assumption that programming will be done in assembly language, obsolete. Semiconductor technology has improved in terms of memory speed and capacity [9, 14]. VLSI technology has also improved and more transistors are being packed on the same chip. Today, larger and faster memory makes “sacrificing execution speed for memory conservation” less attractive. The cache size has increased and compiler technology has also improved. These advancements allowed programmers to switch from assembly language to HLLs. Although concern for software costs contributed to the popularity of HLLs, it was the advances in memory technology and compiler design that made the transition less costly. The use of HLLs instead of assembly languages made compatibility problems a bit more manageable. Faster memory and a larger cache size also reduced memory access time.

The problem of semantic clash made most complex instruction sets less usable for compilers. Semantic clash is due to the problems caused by some HLL statements that seem to be functionally the same but are actually different in subtle ways. When compilers were faced with semantic clash problems, they usually opted for a series of simple instructions as a substitute for a complex instruction [9]. Complex instructions were also found to create what has been termed the “n+1” phenomenon. This is attributed to the negative impact, in terms of gate delays, that complex instructions have on simple instructions. Complex instructions were found to be more difficult to optimize than simple instructions. Simple instructions are more suitable for the pipelining technique than a mixture of complex and simple instructions.

The above stated problems were observed by many researchers at Berkeley, IBM, Stanford, and many other institutions. These researchers’ observations led to the formulation of new ideas and goals that were radically different from those of conventional (CISC) architecture designs. RISC design philosophy can be stated as follows [6]:
● Analyze the applications to be supported to determine the operations that are used more frequently and those that consume a lot of time.
● Design a datapath that is optimized to support these operations effectively.
● Add any other necessary instructions as long as they do not have a negative impact on the previously selected instructions, and they are relatively frequent.
● Design other supporting processors following these stated steps: a resource is to be included only if it can be shown that the resource is frequently used, and its addition to other previously accepted resources will not slow down the entire system.
● Move many of the conventional run-time complexities to compile-time software, i.e., charge the compiler with more responsibilities.

The adherence to this philosophy varies from one RISC design to another but they all have common traits. These traits are:
● most instructions are executed in one clock cycle,
● load/store architecture is supported,
● instructions are simple and they have fixed formats,
● relatively few instructions and address modes are supported,
● control unit is hardwired,
● compiler is made to optimize code for execution efficiency,
● several memory hierarchies are supported,
● highly pipelined data path is adopted for possible concurrency operations, and
● a large register set, usually thirty-two or more, or overlapped window registers, are implemented.

Although not all RISCs support these design characteristics, most adhere to many of them. In RISC philosophy emphasis is placed on increasing speed rather than reducing memory usage. Further, RISC also emphasizes making instructions a better target for the compiler rather than making the job of the compiler writer easier. The concern is how to design instructions that compilers can use more efficiently. Designers must target instruction sets towards compilers rather than towards assembly programmers. There is little emphasis on the compatibility issue at the HLL level than at the assembly language level. The focus is on how to support the most time consuming operations in typical programs, and little emphasis is placed on closing the semantic gap with higher level instructions and hardware implementation of conventional software functions.

First Generation of RISC

The resurgence of RISC, as it is known today, started with three independent projects at IBM, the University of California at Berkeley, and Stanford University. IBM’s research started in 1974 with an effort to design a large telephone-switching network that could handle about three hundred calls per second. The system was never completed but the research led to the design of a minicomputer. The design of the minicomputer used some of the RISC traits listed earlier: uniform instruction length, register-to-register operations (load/store architecture), and delayed branch. The minicomputer was named “the 801.” A compiler was designed as an integral part of the 801 project. Memory hierarchy, pipelining, and I/O organization were used to accomplish the objective of executing each instruction in one clock cycle. The first 801 was found to have several flaws; a second 801 machine was designed. All instructions were 32 bits long; thirty-two general purpose registers were provided on the second machine (the first machine had sixteen registers, which were found to be inadequate). The 801, at the time, was the fastest experimental processor by IBM. The IBM 3090 and 9370 used
the 801 as an I/O processor and microcomputer, respectively [3].

The research efforts at Berkeley resulted in the design of two processors, RISC I and II [9, 11, 12, 16]. Both RISCs had overlapped registers. The research at Stanford resulted in the design of a processor called “microprocessor without interlocked pipe stages” (Mips). The processor is referred to as MIPS but it will be referred to in this paper as Mips to avoid any confusion with the execution speed of computers measured in millions of instructions per second (MIPS). The processor was similar to the Berkeley RISCs because of the simplicity of its instruction set. It had a small number of instructions, all instructions had the same word length, and each instruction required the same amount of time to execute [11]. But Mips differed from the Berkeley RISCs in two major areas: it had no register overlay, and software was used extensively in cooperation with a five stage pipeline for the execution of instructions. Berkeley had a two stage pipeline: decode and execute. The Stanford researchers, like the IBM researchers but unlike the Berkeley researchers, designed a compiler as part of their project. The IBM project was unique in one way: the application in which the 801 would be used was considered in the design of the 801.

Table 1 shows the characteristics of the three processors discussed here. These processors could be considered the core of RISC architectures. Commercial based RISC (CBRISC) processors differ in many ways from each other and in some cases differ dramatically from these three research based RISCs (RBRISCs).

Table 1. Variations in RBCVMOC Implementations

Variations in Commercial RISCs

Superminicomputers were among the first series of computers marketed as RISC systems. The systems marketed included Celerity C1200, C1230, and C1260; Pyramid 90x and 98x; and Ridge 32, 3200, and 5100 [2, 5, 13]. Many microprocessors based on RISC have also been introduced. The microprocessors have enjoyed great success as the host processing units on workstations. They have not been widely used in PCs. Some of these processors are SPARC, Intel i860, Motorola 88000, Mips, and Transputer. Most of today’s RISC based processors are more “hybrid” than “true” RISC. Based on the earlier discussion on RISC and CISC philosophies, one can conclude that the two philosophies are diametrically opposed to each other. It is becoming clear that most commercial RISC based processors are actually similar to CISC based processors in many ways. The first generations of RISC processors and systems can be evaluated by examining their approaches to memory hierarchy, instruction set,
parallelism, and architecture evolution. The first three of this list represent the main elements of RISC: compiler, VLSI, and microarchitecture. The fourth, architecture evolution, is a by-product of the normal evolution of architecture.

Memory Hierarchy

Support for virtual memory space is part of the features of modern computers. There are usually three levels of memory hierarchy: main memory, cache, and registers. Both CISC and RISC support these levels of memory hierarchy and that seems to be the only common ground. Most RISCs have 32-bit addresses, a few support 24-bit addresses, but the amount of cache and number of registers and their implementations differ between RISCs. Cache and registers are crucial in RISC based architectures because of the memory bandwidth requirements and support for load/store architectures. All RISC based systems support this principle (load/store architecture). The cache is required because of the need to fetch at least one instruction per cycle. Cache sizes for data and instructions vary from one RISC implementation to another. Most RISCs use separate instruction and data caches to ensure high bandwidth. Also, most RISCs do not support overlapped register windows. Pyramid and SPARC are two of the few exceptions. Table 2 lists the cache sizes and number of registers for some RISCs.

Instruction Set

The basic principle of RISC is to support few and simple instructions. The instructions to be supported are those that are frequently used. The decision on frequency of use is determined by analyzing the dynamic use of instructions on various programs [9, 16]. In addition, RISC philosophy dictates that only a few simple instruction formats and addressing modes be supported. All instructions are expected to have fixed instruction length. The researchers at Berkeley coined the name RISC (reduced instruction set computer). The researchers at IBM had also thought about using RISC but they intended it to mean “regular instruction set computer” [3]. Although the word “regular” seems to fit better than “reduced”, RISC based systems are not concerned only with the instruction set. RISC designers are also concerned about making trade-offs between hardware and software, computing elements organization and chip area, and compile-time and execution-time optimization. These trade-offs involve the compiler, VLSI, and the microarchitecture. It seems appropriate to refer to RISC as compiler, VLSI, and microarchitecture optimized computers (CVMOCs). CBRISC and CBCVMOC are used interchangeably in this paper.

RISC I and II, designed at Berkeley, and Mips, designed at Stanford, had about 31, 39, and 55 instructions, respectively [11]. But these RBRISCs did not target any application and they virtually ignored floating-point operations. The IBM RT/PC and its successor, the IBM RS/6000, had 118 and 184 instructions, respectively. All CBCVMOC designs do consider floating point operations. Floating point units were usually implemented as coprocessors. This is changing as more processors are incorporating both integer and floating point units on the same chip. The Motorola 88000 and Mips R2000 supported 61 and 49 integer instructions, respectively. The ACCEL architecture, used for the Celerity superminicomputers (C1200, C1230, and C1260), supported 142 and 126 instructions on its integer and floating point processors, respectively. It is clear that reducing the number of instructions is not as important to CBCVMOC designers as it was to the designers of RBRISCs.

On CBCVMOCs, application plays an important role in choosing the type of operations to support in hardware. For instance, the floating point instructions on Celerity included sine, cosine, square root, logarithm, exponentiation and arctangent [2]. The Ridge architecture supported 16-, 32-, and 48-bit instruction formats with 16 bits always specifying the operation and target register pair for the instruction [5]. The 90x processor, the predecessor of the 9000 series, supported
only 90 instructions while the Pyramid 9000 series supported 128 instructions. The 9000 series included instructions to support I/O and multiprocessing [5]. The instruction set for the IBM RS/6000 included some non-RISC-like instructions: an update form of load/store operations (similar to autoincrement and decrement), character string handling, and a branch-and-count instruction that could be used as a loop counter.

Some of these instructions, just like the instructions implemented for the IBM RT/PC, required several cycles to execute. Multiply instructions take 3 to 5 cycles, and divide instructions can take up to 20 cycles [7]. On average, the RT/PC instructions required only 1.1 cycles to execute; some instructions required 6 cycles, and the supervisor call instruction (SVC) required 16-20 cycles to execute [15].
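
The cycle counts quoted above imply that multi-cycle instructions must be rare for the average to stay near 1.1 cycles. A minimal sketch of that weighted-average calculation follows. The instruction mix in `assumed_mix` is hypothetical, chosen only to illustrate the arithmetic; it is not taken from [15].

```python
def average_cpi(mix):
    """Return the weighted-average cycles per instruction.

    `mix` maps a cycle count to the fraction of executed instructions
    that take that many cycles. The fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(cycles * fraction for cycles, fraction in mix.items())


# Hypothetical dynamic mix (an assumption for illustration only):
# 98% single-cycle, 1.9% six-cycle, 0.1% eighteen-cycle (SVC-like).
assumed_mix = {1: 0.980, 6: 0.019, 18: 0.001}

if __name__ == "__main__":
    print(f"average CPI = {average_cpi(assumed_mix):.3f}")
```

Under this assumed mix the average is 0.980 + 0.114 + 0.018 = 1.112 cycles, so even a 2% share of slow instructions accounts for most of the excess over one cycle per instruction.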
System/Processor   Cache Size (Kbytes)   Number of Registers             Overlapped
                   Data/Instruction      General Purpose/Float/Others
Celerity/Accel     ?                     16/1 800/1 6384                 Yes, 32/window
Transputer         ?                     6                               No
[Figure 1, panels (a) and (b): successive instructions overlapped in a five-stage pipeline with stages IF, ID, OF, OE, WB]
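
The overlap that Figure 1 depicts can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reconstruction of the original figure: it assumes an ideal pipeline with no stalls, and it reuses the stage labels IF, ID, OF, OE, and WB from the figure without assigning them any meaning beyond their order. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "OF", "OE", "WB"]  # stage labels as they appear in Figure 1


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c.

    Assumes an ideal pipeline: instruction i enters the first stage in
    cycle i and advances one stage per cycle with no stalls.
    """
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as text, one instruction per line."""
    return "\n".join(" ".join(row) for row in rows)


if __name__ == "__main__":
    print(format_diagram(pipeline_diagram(3)))
```

With k stages and n instructions the ideal schedule needs k + n - 1 cycles, so three instructions finish in seven cycles rather than the fifteen a non-overlapped design would need.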
Many studies have shown that branch operations occur very frequently [4]. It was observed that a “branch” occurred once out of every eight instructions executed on the IBM 370 and once out of every three on the VAX 11 [8]. RISC designers adopted the delayed branch technique; this technique was used as early as 1952 in the Los Alamos MANIAC machine [3]. The NOOP (no operation) had been widely used before the delayed branch technique was resurrected by RISC researchers. NOOP in this case is a form of delayed branch in disguise. An optimizing compiler might eliminate the need for NOOP by moving the branch instruction ahead of other instruction(s) on which the branch does not depend (data dependency must be avoided). All the instructions that the branch instruction moved over must be executed before the branch is taken. But the pipeline can always fetch the proper sequence of instructions when the delayed branch technique is applied.

Many RISC designers advertised their systems as ones in which floating point processors (usually in the form of a coprocessor) could operate concurrently with the main CPU, the integer processor. The main problem has been accomplishing this using HLLs. We tried to do this on the RT/PC but with no success. If programs are written in HLL(s) that have no parallel features, then the compiler must be charged with the task of finding code that can safely execute concurrently. It is also essential to design architectures that would provide this feature and the necessary resources to support concurrency. The Ridge CPU was partitioned logically into an integer unit, a memory management unit, and a floating point unit; these units have the potential to operate in parallel. Each unit has its own copy of the general purpose registers and this enhances the system’s ability to support several functional units executing instructions in parallel. Real parallelism was supported in the Transputer. Transputers have a large number of registers incorporated on the same chip as the processor. The Transputers, unlike other CBRISCs, keep variables in memory and copy them to and from registers attached to the ALU when they are needed in a computation. The designers of the Transputer were influenced by the Occam language. The Transputer’s instructions were microcoded to provide greater efficiency for frequently used complex operations. All Transputer instructions are 1 byte long and usually execute in one to two cycles [17]. The T800 Transputer is a 32-bit processor supported with a 64-bit floating point unit on the same chip. Both the CPU and floating point unit can function in parallel. Several Transputers
can be connected together to form a “true” parallel system. Systems with well over 1000 Transputers have been demonstrated [18]. The Pyramid 9000 systems included several multiprocessors, called isoprocessors. The processors are symmetric and they handle data processing requirements equally well. The isoprocessor technology is totally transparent to the user. The models with multiprocessors were the 9820 with two processors, the 9830 with three processors, and the 9840 with four processors [13]. Data General Corporation also introduced a multiprocessing system, called Topgun, based on the Motorola 88000. The system can be expanded to four processors (CPUs).

Architecture Evolution

In the past, CISC architecture provided improved performance through architecture evolution. The Intel 80x86 and Motorola 680x0 are two prime examples. Compatibility is one of the objectives that CISC designers always strive for because of the large existing software base that has been built over the years for CISC systems. This was one problem that most RISC designers did not have to deal with since they either did not have previous machines or architectures to worry about, or they were introducing systems or processors as a new product line. Mips systems is an example of the first category, and IBM, HP, Motorola, and Intel qualify for the second category.

Many manufacturers concentrated on improving the performance of their processors through integration and use of faster technology such as Gallium Arsenide (GaAs). GaAs can offer up to a 20:1 speed advantage over NMOS silicon, or approximately a 6:1 speed advantage over current CMOS [10]. Manufacturers could also improve the processing speed of their processor(s) by providing a larger cache (sometimes on chip), a faster clock frequency, and/or integrating more functional units on the same chip. For example, Intel introduced the i860 XP in 1991 as a new implementation of the i860 XR that was released in 1989 [1]. The i860 XP microprocessor was a larger, faster version that implemented more functions than the i860 XR. The Motorola 88000 combined both integer and floating point units on the same chip. But the i860 had more features including [1]:
● an integer unit with thirty-one registers;
● thirty-two floating point registers, an adder and multiplier units;
● a 4-Kbyte cache for instruction, and a 4-Kbyte for the data;
● a memory management unit with 64-entry translation look-aside buffer; and
● a graphics unit.

One drawback of this high integration is that a larger cache size cannot, currently, be provided on chip. In the future, it is conceivable that a larger on-chip cache will be possible. As manufacturers are improving their first generation RISCs, they are also introducing processors that represent the second generation RISCs. When Ridge introduced the Ridge 5100, they claimed it was based on their fourth generation of RISC superminicomputers [5]. But the introduction of the i860 seemed to have marked the beginning of the second generation of RISC.

Second Generation CBCVMOCs

The second generation RISCs are achieving higher execution speed by using higher clocks, supporting concurrency among functional units, and extending the conventional pipeline stages. CBCVMOCs were introduced with low clock frequencies, 12 MHz to 20 MHz, but the clock frequency has gradually increased over the years. The processors used by Hewlett Packard (HP), called HP Precision architecture (HP-PA), in their new workstations, the HP Apollo 9000 Series 700, have a CPU clock of 50 MHz and 66 MHz. The HP-PA merged the PRISM (Apollo’s architecture) and PA-RISC (HP’s architecture) to develop the new architecture. Of the current processors used in workstations, the fastest in terms of MIPS is the HP-PA Model 730 running at 66 MHz and rated at 91 MIPS. This will not last for long as new processors with higher clock speed, better integration
of functional units, and improved compilers are constantly being released.

Second generation RISCs are also exploiting the idea of supporting parallelism by providing features that will allow multiple instructions to be executed concurrently, and by extending the number of stages in the pipeline. First generation RISCs do have functional units that can potentially function in parallel, but in most cases they do not. This is partly because compilers for the first generation RISCs do not take full advantage of the potential for concurrency. Compilers for the second generation RISCs are designed to take full advantage of this technique. The technique is referred to as superscaling. One of the first processors to use this method is the IBM RIOS, the processor for the IBM RS/6000. The current RS/6000 system consists of nine units: instruction-cache unit (ICU), fixed point unit (FXU), floating point unit (FPU), four data-cache units (DCUs), storage control unit (SCU), input/output interface unit (C), clock chip (CLK), data-multiplexing chips (D), and control chip (R) [7]. Many of these processing units can operate in parallel. Resource sharing could reduce the theoretical performance. Also, the program being executed must have an instruction sequence that can keep the functional units busy. The compiler must organize the instructions to keep the functional units occupied. On the RS/6000, four instructions can be executed simultaneously in one cycle: a branch instruction, a condition code register instruction, a fixed point instruction, and a floating point instruction [7]. The HP-PA uses a semi-superscalar approach to process three instructions per cycle: two floating point instructions and one integer instruction.

Execution speed in RISC has also been improved with an increase in the number of stages in pipelines. If the same notation used in Figure 1 with five pipeline stages is adopted, superpipelining can be represented as shown in Figure 2. The goal is to find ways to break down some of the pipeline stages into simpler steps. The clock must also be operated at a faster speed because each stage will now require less work (time). The execute stage of the floating point unit is a prime target for this technique.
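
The effect of splitting stages and raising the clock can be estimated with a short calculation. The sketch below is an idealized model under assumptions that the text does not state: every stage divides evenly into `degree` sub-stages, the cycle time shrinks by the same factor, and there are no hazards or latch overheads. The function names are hypothetical.

```python
def pipeline_time(num_instructions, num_stages, cycle_time):
    """Ideal time to run n instructions through a k-stage pipeline."""
    return (num_stages + num_instructions - 1) * cycle_time


def superpipelined_time(num_instructions, num_stages, cycle_time, degree):
    """Same pipeline with each stage split into `degree` simpler steps.

    Assumption: the clock runs `degree` times faster because each
    sub-stage does proportionally less work.
    """
    return pipeline_time(
        num_instructions, num_stages * degree, cycle_time / degree
    )


if __name__ == "__main__":
    base = pipeline_time(100, 5, 1.0)
    deep = superpipelined_time(100, 5, 1.0, 2)
    print(f"base: {base}  superpipelined: {deep}  speedup: {base / deep:.2f}")
```

For 100 instructions on a five-stage pipeline, the model gives 104 time units at the base clock and 54.5 when each stage is halved, a speedup just under 2. The speedup approaches the splitting degree only for long instruction sequences, since the deeper pipeline takes more cycles to fill.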
[Figure 2: superpipelined version of the five-stage pipeline of Figure 1]
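
The need for the compiler to order instructions so that the functional units stay busy, noted earlier in this section for the RS/6000, can also be sketched. The example below is a deliberately simplified model and not the RIOS dispatch logic: it assumes at most one instruction per unit per cycle, issues strictly in program order, and ignores data dependencies. The class names in `UNITS` and the function name are hypothetical labels for the four instruction types listed in [7].

```python
# Hypothetical labels for the four instruction types that can issue together.
UNITS = ("branch", "cond", "fixed", "float")


def group_for_issue(classes):
    """Pack an in-order stream of instruction classes into issue cycles.

    A new cycle starts whenever the next instruction needs a unit that is
    already used in the current cycle. Data dependencies are ignored.
    """
    cycles = []
    current = []
    for cls in classes:
        if cls not in UNITS:
            raise ValueError(f"unknown class: {cls}")
        if cls in current:
            cycles.append(current)
            current = []
        current.append(cls)
    if current:
        cycles.append(current)
    return cycles


if __name__ == "__main__":
    clustered = ["fixed", "fixed", "float", "float"]
    interleaved = ["fixed", "float", "fixed", "float"]
    print(len(group_for_issue(clustered)), "cycles when clustered")
    print(len(group_for_issue(interleaved)), "cycles when interleaved")
```

In this model the same four instructions take three cycles when like instructions are clustered and two when they are interleaved, which is the kind of reordering the compiler is expected to perform.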
References

[1]. Atkins, Mark, “Performance and the i860 Microprocessor,” IEEE MICRO, October 1991, pp. 24-27, 72-78.
[2]. Celerity Computing C1260/C1230, ACCEL Architecture Overview, 1985, PN 650, 920, Celerity Computing, 9692 via Excelencia, San Diego, CA 92126.
[3]. Cocke, J. and V. Markstein, “The Evolution of RISC Technology at IBM,” IBM Journal of Research and Development, Vol. 34, No. 1, January 1990, pp. 4-11.
[4]. DeRosa, J. A. and H. M. Levy, “An Evaluation of Branch Architectures,” Computer Architecture News, Vol. 1, No. 2, June 1987, pp. 10-16.
[5]. Electronics News, Vol. 33, No. 1674, Monday September 28, 1987.
[6]. Gimarc, C. E. and V. M. Milutinovic, “A Survey of RISC Processors and Computers of the Mid-1980s,” IEEE COMPUTER, Vol. 20, No. 9, September 1987, pp. 59-69.
[7]. IBM RISC System/6000 Processor, IBM Journal of Research and Development, Vol. 34, No. 1, January 1990, pp. 1-136.
[8]. Lazzerini, Beatrice, “Effective VLSI Processor Architectures for HLL Computers: The RISC Approach,” IEEE MICRO, February 1989, pp. 57-65.
[9]. Markoff, J., “RISC Chips,” BYTE, November 1984, pp. 191-206.
[10]. McNeley, Kevin J., and Veljko M. Milutinovic, “Emulating a Complex Instruction Set Computer with a Reduced Instruction Set Computer,” IEEE MICRO, Feb. 1987, pp. 60-72.
[11]. Patterson, D. A., “Reduced Instruction Set Computers,” Communications of the ACM, Vol. 28, No. 1, January 1985, pp. 8-21.
[12]. Patterson, D. A. and R. Piepho, “RISC Assessment: A High-level Language Experiment,” Symposium of Computer Architecture, ACM SIGARCH, Vol. 10, No. 3, April 1982, pp. 3-8.
[13]. Pyramid Technology Series 9000, A Datapro Report, August 1987, Datapro Research Corporation, McGraw-Hill, Delran, NJ 08075.
[14]. Sherburne, R. W. et al., “Datapath Design for RISC,” 1982 Conference on Advanced Research in VLSI, M.I.T., January 25, 1982, pp. 53-62.
[15]. Simpson, Richard O., “The IBM RT Personal Computer,” BYTE, Extra Edition, 1986, pp. 43-78.
[16]. Stallings, W., Computer Organization and Architecture, Macmillan Publishing Company, 1986.
[17]. Stein, Richard M., “T800 and Counting,” BYTE, November 1988, pp. 287-296.
[18]. The Transputer Handbook, SGS-THOMSON Microelectronics Inc., 1000 East Bell Road, Phoenix, AZ 85022.