Implementation of Precise Interrupts in Pipelined Processors
James E. Smith
Andrew R. Pleszkun
Abstract
An interrupt is precise if the saved process state corresponds with the sequential model of program execution where
one instruction completes before the next begins. In a pipelined processor, precise interrupts are difficult to achieve
because an instruction may be initiated before its predecessors have been completed. This paper describes and
evaluates solutions to the precise interrupt problem in pipelined processors.
The precise interrupt problem is first described. Then five solutions are discussed in detail. The first forces
instructions to complete and modify the process state in architectural order. The other four allow instructions to
complete in any order, but additional hardware is used so that a precise state can be restored when an interrupt occurs.
All the methods are discussed in the context of a parallel pipeline structure. Simulation results based on the
CRAY-1S scalar architecture are used to show that, at best, the first solution results in a performance degradation
of about 16%. The remaining four solutions offer similar performance, and three of them result in as little as a 3%
performance loss. Several extensions, including virtual memory and linear pipeline structures, are briefly discussed.
Figure 1. Pipelined implementation of our model architecture. Not shown is the result shift register used to
control the result bus.
2.2. Interrupts Prior to Instruction Issue

Before proceeding with the various precise interrupt methods, we discuss interrupts that occur prior to
instruction issue separately because they are handled the same way by all the methods.

In the pipeline implementation of Fig. 1, instructions stay in sequence until the time they are issued.
Furthermore, the process state is not modified by an instruction before it issues. This makes precise
interrupts a simple matter when an exception condition can be detected prior to issue. Examples of such
exceptions are privileged instruction faults and unimplemented instructions. This class also includes
external interrupts, which can be checked at the issue stage.

When such an interrupt condition is detected, instruction issuing is halted. Then, there is a wait while
all previously issued instructions complete. After they have completed, the process is in a precise state,
with the program counter value corresponding to the instruction being held in the issue register. The
registers and main memory are in a state consistent with this program counter value.

Because exception conditions detected prior to instruction issue can be handled easily as described above,
we will not consider them any further. Rather, we will concentrate on exception conditions detected after
instruction issue.

3. In-order Instruction Completion

With this method, instructions modify the process state only when all previously issued instructions are
known to be free of exception conditions. This section describes a strategy that is most easily implemented
when pipeline delays in the parallel functional units are fixed. That is, they do not depend on the
operands, only on the function. Thus, the result bus can be reserved at the time of issue.

First, we consider a method commonly used to control the pipelined organization shown in Fig. 1. This
method may be used regardless of whether precise interrupts are to be implemented. However, the precise
interrupt methods described in this paper are integrated into this basic control strategy. To control the
result bus, a “result shift register” is used; see Fig. 2. Here, the stages are labeled 1 through n, where
n is the length of the longest functional unit pipeline. An instruction that takes i clock periods reserves
stage i of the result shift register at the time it issues. If the stage already contains valid control
information, then issue is held until the next clock period, and stage i is checked once again. An issuing
instruction places control information in the result shift register. This control information identifies
the functional unit that will be supplying the result and the destination register of the result. This
control information is also marked “valid” with a validity bit. Each clock period, the control information
is shifted down one stage toward stage one. When it reaches stage one, it is used during the next clock to
control the result bus so that the functional unit result is placed in the correct result register.

Still disregarding precise interrupts, it is possible for a short instruction to be placed in the result
pipeline in stage i when previously issued instructions are in stage j, j > i. This leads to instructions
finishing out of the original program sequence. If the instruction at stage j eventually encounters an
exception condition, the interrupt will be imprecise because the instruction placed in stage i will
complete and modify the process state even though the sequential architecture model says i does not begin
until j completes.
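As an illustration only, the result shift register control described above can be sketched as a small Python
model. The names Control, ResultShiftRegister, try_issue, clock and simulate, as well as the latencies used
in the demonstration, are assumptions introduced for this sketch; they are not CRAY-1S figures and are not
taken from the paper. The timing is simplified: an entry that has reached stage one is treated as
completing on the next call to clock.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Control:
    """Control information held in one valid stage (field names are hypothetical)."""
    name: str  # label used only for tracing
    unit: str  # functional unit that will supply the result
    dest: int  # destination register number


class ResultShiftRegister:
    def __init__(self, n: int) -> None:
        # Index 0 is unused so stages are numbered 1..n as in the text.
        # None plays the role of a cleared validity bit.
        self.n = n
        self.stages: List[Optional[Control]] = [None] * (n + 1)

    def try_issue(self, latency: int, ctl: Control) -> bool:
        """Reserve stage `latency`; return False if issue must be held."""
        if not 1 <= latency <= self.n:
            raise ValueError("latency must be between 1 and n")
        if self.stages[latency] is not None:
            return False
        self.stages[latency] = ctl
        return True

    def clock(self) -> Optional[Control]:
        """Advance one clock period; return the entry leaving stage 1, if any."""
        done = self.stages[1]
        # Shift every entry one stage toward stage 1; stage n becomes empty.
        self.stages = [None] + self.stages[2:] + [None]
        return done


def simulate(program: List[Tuple[int, Control]], n: int) -> Tuple[List[str], int]:
    """Issue `program` in order, at most one instruction per clock.

    Returns the completion order and the number of clocks issue was held.
    """
    rsr = ResultShiftRegister(n)
    pending = list(program)
    completed: List[str] = []
    stalls = 0
    in_flight = 0
    while pending or in_flight:
        if pending:
            latency, ctl = pending[0]
            if rsr.try_issue(latency, ctl):
                pending.pop(0)
                in_flight += 1
            else:
                stalls += 1  # stage already valid: hold issue this clock
        done = rsr.clock()
        if done is not None:
            completed.append(done.name)
            in_flight -= 1
    return completed, stalls


if __name__ == "__main__":
    # A long instruction followed by a short one (hypothetical latencies).
    order, stalls = simulate(
        [(6, Control("mul", "multiply", 1)), (2, Control("add", "add", 2))], n=6
    )
    print(order, stalls)  # ['add', 'mul'] 0
```

In this sketch the short instruction leaves the register before the long one that was issued ahead of it.
This is the out-of-order completion that makes an interrupt imprecise if the longer instruction later
encounters an exception condition.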
Figure 3. (a) Reorder Buffer Organization. (b) Reorder Buffer and associated Result Shift Register.
Table 1. Relative Performance for the first 14 Lawrence Livermore Loops, with stores blocked until the
results pipeline is empty.
Table 2. Relative Performance for the first 14 Lawrence Livermore Loops, with stores held in the memory
pipeline after issue.
Five methods have been described that solve the precise interrupt problem. These methods were then
evaluated through simulations of a CRAY-1S implemented with these methods. These simulation results
indicate that, depending on the method and the way stores are handled, the performance degradation can
range from 25% to 3%. It is expected that the cost of implementing these methods could vary substantially,
with the method producing the smallest performance degradation probably being the most expensive. Thus,
selection of a particular method will depend not only on the performance degradation, but also on whether
the implementor is willing to pay for that method.

It is important to note that some indirect causes for performance degradation were not considered. These
include longer control paths that would tend to lengthen the clock period. Also, additional logic for
supporting precise interrupts implies greater board area, which implies more wiring delays, which could
also lengthen the clock period.

One of the authors (J. E. Smith) would like to thank R. G. Hintz and J. B. Pearson of the Control Data
Corp., with whom he was associated during the development of the CYBER 180/990. This paper is based upon
research supported by the National Science Foundation under grant ECS-8207277.