A Dynamic Instruction Set Computer*

Michael J. Wirthlin and Brad L. Hutchings


Dept. of Electrical and Computer Eng.
Brigham Young University
Provo, UT 84602

*This work was supported by ARPA/CSTO under contract number DABT63-94-C-0085 under a subcontract to National Semiconductor.

Abstract

A Dynamic Instruction Set Computer (DISC) has been developed that supports demand-driven modification of its instruction set. Implemented with partially reconfigurable FPGAs, DISC treats instructions as removable modules paged in and out through partial reconfiguration as demanded by the executing program. Instructions occupy FPGA resources only when needed and FPGA resources can be reused to implement an arbitrary number of performance-enhancing application-specific instructions. DISC further enhances the functional density of FPGAs by physically relocating instruction modules to available FPGA space.

1 Introduction
Developing customized stored-program processors is a convenient design technique that combines the enhanced performance of application-specific circuits with the flexibility of general-purpose programmable processors. Application-specific instruction sets, customized I/O and optimized control can substantially improve the performance of even the simplest programmable processors. FPGAs provide an excellent implementation platform for application specific processors because of the quick development time and simplified design process. In addition, SRAM based FPGAs provide the ability to reconfigure more than one distinct application-specific processor on a single device.

A number of general purpose processors have been developed to show the feasibility of implementing a processor architecture on an FPGA[5, 7, 17]. Several custom processors have successfully demonstrated the advantages of adding specialized hardware to general purpose processor cores. Application areas for these processors include digital audio processing[16], systems of linear equations[17], and statistical physics[12].

One limitation of building customized processors on FPGAs is the lack of hardware resources available for specialized instruction sets. A few hardware-intensive instruction modules can quickly consume all the resources of even the largest FPGAs available today. Reconfiguring an FPGA to replace idle circuitry during application execution can provide more hardware resources than are available on a one-time configured FPGA. This technique, known as run-time reconfiguration (RTR), has been shown to increase the functional density of reconfigurable FPGAs[6]. The DISC processor uses RTR to ameliorate FPGA hardware limitations and provide an essentially limitless application-specific instruction set.

Early attempts at modifying a processor instruction set involved a writable control store and generating custom micro-code for each application[14]. The PRISM project extended this idea by augmenting the instruction set of a standard RISC processor with application-specific instructions on a tightly coupled FPGA. Hardware images of these instructions are extracted and compiled from the source code transparent to the user[2]. The WASMII project discusses a more dynamic approach that involves swapping hardware compute configurations in and out of the FPGA resource as demanded by the data-flow token[9].

The DISC processor implements each instruction in the instruction set as an independent circuit module. The individual instruction modules are paged onto the hardware in a demand-driven manner as dictated by the application program. Hardware limitations are eliminated by replacing unused instruction modules with usable instructions at run-time. An application running on DISC contains source code, indicating instruction ordering, and a library of application-specific instruction circuit modules.

This paper will begin by describing the techniques used to implement DISC. These include partial reconfiguration, relocatable hardware, and the linear hardware model. The architecture of the DISC processor will be presented along with several example custom instructions. The DISC processing system, including software and hardware platform, will be described. The paper will conclude by presenting results from an algorithm implemented on DISC.

2 Partial FPGA Reconfiguration
DISC takes advantage of partial FPGA configuration to implement dynamic instruction paging. Partial reconfiguration provides the ability to configure a subsection of an FPGA while the remaining logic operates unaffected. Although all SRAM-based FPGAs can be reconfigured in-circuit, only the CAL[1], Atmel[3], and National Semiconductor[13] FPGAs support the ability to partially reconfigure hardware resources.

0-8186-7086-X/95 $04.00 © 1995 IEEE

Authorized licensed use limited to: ULAKBIM UASL - IZMIR YUKSEK TEKNOLOJI ENSTITUSU. Downloaded on October 19,2024 at 09:06:26 UTC from IEEE Xplore. Restrictions apply.
Although few partially reconfigurable systems have actually been implemented, several have been proposed such as hardware multi-tasking[10], a multi-phase serial communication algorithm[11], a data acquisition system[4], and a self-reconfiguring processor[8]. In addition, caching logic to increase hardware efficiency in standard digital systems has been proposed using partially reconfigurable FPGAs[15].

DISC uses partial configuration to implement custom-instruction caching. Instruction modules are implemented as partial configurations and individually configured on DISC as demanded by the application program. Before initiating execution of a custom-instruction, DISC queries the FPGA for the presence of the custom-instruction configuration. If the custom-instruction is on the FPGA, execution is initiated. Otherwise, program execution pauses while the custom-instruction is configured on the FPGA.

As a typical program executes, custom-instructions are configured onto the FPGA until all available hardware is consumed. When all hardware is used by the custom-instructions, new custom-instruction modules may not be configured on the FPGA until enough existing hardware is removed. By replacing the oldest custom-instruction modules on the FPGA with newer modules, the FPGA serves as a cache of the most-recently used custom-instruction modules.

2.1 Example
The following assembly language source code exemplifies the use of partial configuration on DISC:

begin:
    ;instruction INSTA operates on
    ;memory location mem1
    INSTA mem1
    INSTA mem2
    ;instruction INSTB operates on
    ;mem3 and mem2
    INSTB mem3,mem2
    ;"loopback" label defined
loopback:
    INSTC mem3
    ;instruction CMP compares
    ;mem1 with mem3
    CMP mem1,mem3
    ;instruction JNE jumps
    ;to loopback if not equal
    JNE loopback
continue:
    INSTD mem3
    INSTB mem2
    INSTE mem3
end:

Once each instruction in the previous program (INSTA, INSTB, INSTC, CMP, JNE, INSTD, and INSTE) has been designed as an independent partial configuration, the source code representing the program is loaded into DISC and the processor begins execution. The sequencing of instructions on a small FPGA may execute and configure as follows:

Operation   Instruction   Description
Configure   INSTA         Configure INSTA on FPGA
Execute     INSTA         Execute first INSTA
Execute     INSTA         Execute second INSTA
Configure   INSTB         Configure INSTB on FPGA
Execute     INSTB         Execute first INSTB
Configure   INSTC         Configure INSTC on FPGA
Execute     INSTC         Execute first INSTC
Execute     CMP           Execute CMP (always available)
Execute     JNE           Execute JNE (always available)
(continue looping to INSTC until JNE fails)
Remove      INSTA         FPGA full, remove oldest module
Configure   INSTD         Configure INSTD
Execute     INSTD         Execute INSTD
Execute     INSTB         Execute second INSTB
Remove      INSTC         FPGA full, remove oldest module
Configure   INSTE         Configure INSTE
Execute     INSTE         Execute INSTE

In the previous example, it is assumed that the first five instructions (INSTA, INSTB, INSTC, CMP, and JNE) consume all available space on a single FPGA. Partially configuring the FPGA allows two additional instructions (INSTD and INSTE) to execute on an otherwise full FPGA.

2.2 Advantages
Partial configuration provides a number of advantages for DISC over conventional configuration methods. First, idle instruction modules can be removed to make room for other usable modules. The ability to replace instruction modules in the system at run-time allows the implementation of an instruction set much larger than is possible on a single one-time configured FPGA.

Second, configuration time is substantially reduced. Although the DISC FPGA could be completely configured every time a new instruction is needed, configuration overhead can be dramatically reduced by configuring only the requested instruction. Reducing the size of hardware to configure significantly reduces the configuration bit-stream. Configuration bit-stream reductions for DISC instruction modules fall between [fraction illegible] and [fraction illegible] of a complete FPGA configuration. With a significantly smaller bit-stream, the corresponding configuration time is reduced. In an environment of run-time configuration, reducing the configuration time will limit the reconfiguration overhead.

Third, system state can be saved on the FPGA during configuration. Conventional configuration techniques prevent the preservation of system state during configuration by destroying the contents of all flip-flops. Implementing DISC with conventional configuration methods would require the saving and restoring of system state (program counter, register values, etc.) every time a configuration occurs. To prevent the time-consuming process of saving and restoring

state, DISC implements a global controller that remains on the FPGA at all times.

In summary, partial configuration allows DISC to implement an essentially infinite instruction set in hardware with limited configuration and state-preserving overhead.

3 Relocatable Hardware
The ability to partially configure custom-instruction modules allows DISC to implement an important strategy: relocatable hardware. Relocatable hardware, implemented only in partially configurable FPGAs, provides the ability to relocate or make placement decisions of partial configurations at run-time. Although not essential for a general purpose processor, it is used on DISC to substantially improve run-time hardware utilization.

Sub-modules in traditional digital systems require a single fixed location in hardware because of strict global and local physical constraints. Because sub-modules in traditional systems are not paged in and out of hardware, a fixed location does not pose any problems and global optimizations can be made on the static circuitry to improve hardware utilization. In a run-time partially reconfigurable system, however, fixed locations for partial configurations can pose serious performance problems.

If DISC modules are designed for a single physical location, instructions in the library will inevitably overlap each other on the hardware. Two overlapping instructions can never operate properly on the FPGA at the same time. If two overlapping instructions are used frequently together in an application program, the configuration overhead needed to replace the instructions quickly becomes the system bottleneck. DISC removes these problems by designing each custom-instruction module for multiple locations on the FPGA.

The flexibility of multiple locations for DISC custom-instructions significantly improves run-time utilization. Instruction modules are initially configured on the FPGA as close together as possible to avoid wasted hardware between modules. Once the hardware space is full, additional instruction modules are placed in locations where older unneeded instruction modules currently lie. Relocatable hardware allows run-time constraints and conditions to dictate instruction module placement for optimal hardware utilization.

Relocatable hardware is implemented by designing custom-instruction modules around a firmly defined global context. A global context provides physical placement positions and a communication network necessary for these modules to operate correctly. The global context partitions the available hardware into an array of potential placement locations for the relocatable instruction modules. The communication network is provided at each placement location to ensure adequate communication between the global controller and the instruction modules at any location.

In order to design instruction modules that fit within the global context, all instruction modules must be physically independent from each other. The physical layout of any instruction module must have no effect on the physical layout or placement of any other module in the library.

4 Linear Hardware Space
DISC implements relocatable hardware in the form of a linear hardware model. As the name suggests, the model is based on a linear, one-dimensional hardware space. The two-dimensional grid of configurable logic cells is organized as an array of rows: location is specified by vertical location and module size is specified by module height (in rows).

The global context for the linear hardware model consists of a uniform communication network and a global controller. The communication network is constructed by running each global signal vertically across the die and spreading the global signals across the width of the die parallel to each other (see Figure 1).

Figure 1: Linear Hardware Space.

The communication network provides access to global resources for all instruction modules and performs intermodule communication. The global controller specifies the communication protocol, controls global resources (such as I/O and global state) and monitors circuit execution. The global controller and the communication network remain in the same location throughout application execution to preserve the global context.

To gain access to all global signals, sub-modules within a linear hardware space are designed horizontally, across the width of the FPGA. The modules lie perpendicular to the global communication signals for full access to all global signals regardless of their vertical placement (see Figure 2). Although all sub-modules must span the entire width of the FPGA, each module may consume an arbitrary amount of hardware by varying its height.

Figure 2: Simplified Custom Instruction Module (horizontal axis: width of FPGA).

Relocatable circuit modules communicate as established by the global protocol and thus operate properly at any vertical location. In a run-time environment, these circuit modules can be relocated as needed to optimize the available hardware space.

5 DISC Architecture
The DISC architecture implements relocatable hardware with the linear hardware model on a single National Semiconductor CLAy31 FPGA coupled to an external RAM. The CLAy31 provides a 56 x 56 array of fine-grain logic cells allowing 56 complete rows in the linear hardware space. A complete processor is made by coupling a global controller to a library of custom-instruction circuit modules (see Figure 3).

Figure 3: DISC Linear Hardware Space (library modules shown: Add, Subtract, Multiply, AND, Custom Module 1, Custom Module 2, a+b-c*d, Edge Detection, FFT).

5.1 Global Controller
The global controller provides the circuitry for operating and monitoring global resources such as the external RAM, I/O, the internal communication network and global state. The global controller consumes ten complete rows (approximately 1/6 of the chip) leaving 46 rows available for custom-instruction modules. The physical layout of the global controller, estimated at 1007 gates, along with the communication network is seen in Figure 4.

Figure 4: DISC Global Controller Layout.

The architecture of the global controller is seen in Figure 5 and is comprised of the following sub-modules:
• Data Register (DR): stores intermediate results, provides inter-module communication buffering and assists in complex address generation (8 bits),
• Address Register (AR): provides standard addressing modes for memory access (16 bits),
• Program Counter (PC): provides the sequencing capability of the processor (16 bits),
• Status Register (SR): stores internal state of the processor (4 bits),
• Instruction Register (IR): stores the opcode of the current instruction (8 bits),
• Global Control Unit (GCU): contains the circuitry necessary to preserve communication protocol, sequence through processor states, and interface with I/O.

Figure 5: DISC Global Controller Architecture.

The global controller provides a consistent communication interface and standard protocol for all custom-instructions at every vertical location. The global signals available to the custom-instructions include the following:
• Data Register Value: accesses contents of Data Register (8 bits),

• Data Register Feedback: provides new values for Data Register (8 bits),
• Memory Address: allows address generation control by custom-instructions (16 bits),
• Memory Data: allows bi-directional access of memory data by custom-instructions (8 bits),
• Status Signals: provides control capability for custom-instructions (4 bits),
• Instruction Register: provides opcode of current instruction (8 bits).

The global controller is also responsible for sequencing through the instruction cycles for the custom-instruction modules. The following instruction cycles are implemented by the global controller:
• Instruction Fetch (IF),
• Operand Fetch (OF),
• Halt Processor (HP),
• Custom Cycle (CC),
• Instruction Execution (EX).

The IF cycle stores the current byte of program memory into the instruction register and increments the program counter. The OF cycle stores the current program byte into the address register and also increments the program counter. The HP cycle causes all processor resources to remain idle and is used during configuration. The CC cycle is used by complex custom-instruction modules for adding additional cycles and has no effect on global resources. The EX cycle loads the data register with the contents of the data register feedback path.

Each instruction in the library operates in one of two possible instruction cycle sequences: standard and custom. The standard instruction sequence follows a simple three-cycle execution: IF, OF, and EX. Any instruction that completes its computation or function in a single clock cycle, such as basic arithmetic and logic operations, will operate with this sequence.

The custom-instruction sequence offers additional cycles for complex custom-instructions. The custom sequence begins with the following two cycles: IF followed by OF. The sequence then varies by inserting as many CC cycles as necessary to complete a complex application-specific operation. The custom-instruction sequence completes with the EX instruction cycle. The custom-instruction module has complete control over the number of CC cycles needed for a particular function. Some instructions add as few as one cycle, while others require thousands of cycles for a single operation. Figure 6 displays the two instruction sequences.

Figure 6: DISC Instruction Sequences (standard instruction sequence: IF, OF, EX; custom instruction sequence).

The global control unit contains a number of default instructions necessary for controlling global resources. These instructions are used for sequencing, status control, and memory transfer and include the following:
• set carry: sets carry bit in status register,
• clear carry: clears carry bit in status register,
• store data register: store data register in memory,
• load data register: load data register from memory,
• conditional jump: jump with carry not set.

Each of these instructions follows the standard instruction sequence of three cycles. These instructions, coupled with the custom-instruction library designed for a particular application, provide the complete instruction set of the processor. An application can implement an instruction set of any size by paging instruction modules in a demand-driven manner from the instruction library.

5.2 Custom-instruction Modules
Custom-instruction modules vary in size and complexity, but each is designed to fit within the global context described above. Specifically, each module contains a decode and a data-path unit. Complex modules contain additional control structures.

The decode unit assigns a specific opcode to the custom instruction and is responsible for acknowledging its presence to the global controller. The decode unit compares the contents of the IR for a match against its own opcode during the OF cycle. On a positive match the module signals the global controller that the hardware is present and instruction sequencing continues.

The data-path is responsible for providing the proper connections to the global communication network and adhering to the established communication protocol. Instruction modules not executing refrain from sending any signals on the communication channel to prevent the corruption of other operating instructions. The data-path unit provides a new value for the data register during the EX stage. Most instructions perform their function by modifying the DR.

Several custom-instruction modules of varying size have been implemented on DISC. These vary from a simple single row shifter to a complex edge-detection module of 34 rows. Table 1 shows the current instructions available for DISC. The circuit layout for the Adder/Subtracter module is seen in Figure 7.

6 System Operation
The DISC processor was implemented on a PC-ISA custom board made exclusively for the study. The board includes static bus interface circuitry, two CLAy31 FPGAs, and memory. A configuration controller is implemented on the first FPGA to monitor

processor execution and request instructions from the host. DISC is implemented on the second FPGA and the application program memory is stored in the adjacent memory (see Figure 8). The board operates under a UNIX-based operating system and is controlled by a host device driver.

Table 1: Sample Custom Instruction Modules (only one row is legible: Comparator, 3, 155).

Figure 7: DISC Adder/Subtracter Custom Module Layout.

Figure 8: DISC System (blocks shown: processor, controller, ISA bus).

Performance has not been a main consideration as DISC was implemented primarily to study dynamic instruction set modification through partial reconfiguration. As a research tool, the processor is 8 bits and operates at the host bus speed of 7.5 MHz (maximum operating speed calculated at 12 MHz). Processor widths and operating speeds can be increased as device densities increase and tool enhancements become available.

A DISC application is initiated by first loading the program memory with the target application, and second, configuring the DISC FPGA with the global controller. During execution, the processor validates the presence of each instruction in the hardware. If the instruction requested by the application program does not exist on the hardware, the processor enters a halting state and requests the instruction module from the host.

Upon receiving a request for an instruction module, the host evaluates the current state of the DISC FPGA hardware and chooses a physical location for the requested module. The physical location is chosen based on available FPGA resources and the existence of idle instruction modules. If possible, the instruction module is loaded in an FPGA location not currently occupied by any other instruction module. If no empty hardware locations are available, a simple least-recently-used (LRU) algorithm is used to remove idle hardware. The host modifies the bit-stream of the requested hardware module to reflect the placement changes. The hardware module is then configured on the DISC platform by sending the new configuration to the system. Figure 9 provides a simplified flow chart of DISC instruction execution.

Figure 9: DISC Instruction Execution (decision shown: instruction present?).

One drawback of partially configuring the device during run-time is the overhead caused by continually reconfiguring instruction modules. The current board configures the DISC processor by sending the configuration bit-stream one bit per bus transfer over the PC-ISA bus. Operating at a maximum transfer rate of 1.5 Mb/sec, the PC-host is capable of configuring one row in 600 µs. At 7.5 MHz this represents 4500 processor cycles or 1500 simple instruction executions for each row configured. By removing the current system board and bus limitations, configuration speeds improve by a factor of 64 and operate at the device maximum of 12 MB/sec.

Custom instruction modules should remain resident in the processor for long periods of time to decrease the reconfiguration overhead. In addition, custom instruction modules should provide enough performance improvement over a sequence of general purpose ALU instructions to justify the cost of reconfiguration at run-

time. The following application example will demonstrate this tradeoff.

7 Application Example
A simple image mean filter was developed as both a sequence of general purpose instructions and as an application specific hardware module to demonstrate the performance improvements gained by tailoring the hardware to the application. Both demonstrations calculate the mean value of each pixel in an image, g(x, y), by obtaining an average over a 3x3 neighborhood as follows, where f(x, y) denotes the input image:

g(x, y) = \frac{1}{8} \sum_{i=-1}^{1} \sum_{j=-1}^{1} f(x+i, y+j)

A coefficient of 1/8 was used to simplify the design. The 128 x 64 grey scale image in Figure 10 was used as the test image for both cases.

Figure 10: Original Test Image.

7.1 General Purpose Approach
The general purpose approach required four instructions not found in the processor core: add, subtract, shift, and enhanced addressing modes. These additional modules comprised a total of 8 rows, leaving 38 rows free for other custom instruction modules.

Execution of the algorithm centered on the inner loop calculation of the 3x3 neighborhood mean value. Calculating each pixel value involved individually adding each pixel of the neighborhood. Many of the instructions used for this summing operation involved address calculation and pointer manipulations. Computation of each pixel finishes with three shifts for the division by eight.

Complete processing of a pixel required an average of 160 instructions or 560 clock cycles. Processing the complete image, including overhead, required 4.59 Mclocks or 610 ms (7.5 MHz).

7.2 Application Specific Approach
The application specific approach significantly improves performance of the algorithm by assuming control of address generation, buffering pixel values, and pipelining the arithmetic. With 31 rows of hardware, the extra registers, arithmetic operators and control logic consume significantly more hardware than the simple instructions used in the general purpose approach.

The MEAN instruction module calculates the average of a 3x3 neighborhood through the use of a sliding window as seen in Figure 11. Each numbered element of the sliding window represents a pixel register in the custom module. Instead of loading the entire window from memory at each pixel, register values are shifted to represent a sliding window (see Figure 12). Only registers 3, 6, and 9 are loaded at each new pixel.

Figure 11: Sliding Pixel Window.

With the window registers loaded, the custom instruction module adds all nine pixel values in parallel with eight custom adders as seen in Figure 12. The division by eight is achieved by shifting the results three bit positions.

Figure 12: Dataflow of MEAN Instruction Module.

The MEAN instruction requires only 7 clock cycles to evaluate each pixel of the image. The clock cycles are scheduled as follows:

1. Load register 3
2. Load register 6
3. Load register 9
4. Wait (add delay to parallel add)
5. Write results to image memory
6. Calculate new address
7. Shift register window

Reducing the pixel calculation to seven clock cycles and eliminating much of the address calculation overhead reduces the clock count from 4.59M in the general purpose case to 57k for an 80 times speedup. Operating at 7.5 MHz, the image is filtered in 7.6 ms. Figure 13 displays the image filtered with the MEAN custom instruction.
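
The sliding-window computation and the cycle counts quoted above can be checked with a short software model. The Python sketch below is an illustration only, not the MEAN hardware: the function name `mean_filter`, the list-of-rows image representation, and the choice to copy border pixels unchanged are assumptions, since the paper does not describe border handling. The cycle estimate assumes all 128 x 64 pixels cost seven cycles each.

```python
def mean_filter(image):
    """Sum each interior pixel's 3x3 neighborhood and shift right by 3.

    `image` is a list of equal-length rows of integers, at least 3x3.
    Border pixels are copied unchanged (an assumption, see lead-in).
    The result is not clamped to 8 bits; the paper does not say how
    sums above 255 * 8 are handled.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        # Each window column holds three vertically adjacent pixels.
        left = [image[y + dy][0] for dy in (-1, 0, 1)]
        mid = [image[y + dy][1] for dy in (-1, 0, 1)]
        for x in range(1, width - 1):
            # Only the rightmost column is loaded from memory
            # (registers 3, 6, and 9 in the paper).
            right = [image[y + dy][x + 1] for dy in (-1, 0, 1)]
            total = sum(left) + sum(mid) + sum(right)  # nine-pixel sum
            out[y][x] = total >> 3                     # coefficient 1/8
            left, mid = mid, right                     # slide the window
    return out


# Figures taken from the text.
WIDTH, HEIGHT = 128, 64            # test image size
CYCLES_PER_PIXEL = 7               # MEAN instruction schedule
GENERAL_PURPOSE_CYCLES = 4.59e6    # reported general purpose clock count
CLOCK_HZ = 7.5e6                   # host bus speed

mean_cycles = WIDTH * HEIGHT * CYCLES_PER_PIXEL
raw_speedup = GENERAL_PURPOSE_CYCLES / mean_cycles
filter_time_ms = mean_cycles / CLOCK_HZ * 1e3

print(mean_cycles, round(raw_speedup), round(filter_time_ms, 1))
```

The computed cycle count, speedup, and filter time agree with the rounded values of 57k, 80, and 7.6 ms given in the text. Loading one window column per pixel instead of nine pixels is what removes most of the memory traffic in this model.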

Figure 13: Test Image Filtered Through MEAN Custom Instruction.

7.3 Configuration Overhead
Because the cost of reconfiguring the application-specific instruction module is so high, configuration overhead must be considered when comparing the two approaches. The 31 row MEAN instruction requires an additional 140 kcycles for configuration, raising the total cycle count to 197 kcycles. The MEAN configuration overhead represents 71% of the total operating time. If device configuration speeds are maximized, this configuration overhead is reduced to 16% of the total operating time.

The extra four modules needed for the general purpose approach require only 36 kcycles for configuration. This represents less than 1% of the total operating time. When the high cost of configuration is included in the total operating time, the MEAN filter custom instruction provides a 23 times speedup over the general purpose approach (see Table 2).

                        General    Application
                        Purpose    Specific
Rows                    8          31
Raw Speedup             1          80
Area*Time               36.7M      1.8M
Configuration Cycles    36k        140k
Total Cycles            4.63M      197k
Actual Speedup          1          23.5

Table 2: Performance Comparison between General Purpose and Application Specific Approaches.

8 Conclusions
The DISC processor successfully demonstrates that application specific processors with arbitrarily large instruction sets can be constructed on partially reconfigurable FPGAs. The relocatable hardware model improved run-time utilization of FPGA resources and the linear hardware model provided a convenient framework for relocating custom instruction modules. DISC demonstrates the general concept of alleviating density constraints of FPGAs by partially reconfiguring a device at run-time.

Although the techniques of partial configuration, relocatable hardware, and the linear hardware model were implemented as a general purpose processor, they offer similar advantages to other digital architectures. They may enhance the usefulness of FPGA co-processors by providing demand-driven computation. In addition, these techniques may allow FPGA-based computing machines to operate in more dynamic environments such as multi-tasking operating systems. Any digital architecture that could benefit from demand-driven hardware may find these techniques useful.

References
[1] Algotronix, Edinburgh, UK. CAL1024 Preliminary Data Sheet, 1988.
[2] P. M. Athanas and H. F. Silverman. Processor reconfiguration through instruction-set metamorphosis. Computer, 26(3):11-18, March 1993.
[3] Atmel, San Jose, CA. Configurable Logic: Design and Application Book, 1993-1994.
[4] R. Camerota and J. Rosenberg. Data acquisition systems using Cache Logic FPGAs. In Configurable Logic: Design and Application Book, pages 7.15-7.18. Atmel, San Jose, CA, 1993-1994.
[5] J. Davidson. FPGA implementation of a reconfigurable microprocessor. In Proceedings of the IEEE 1993 Custom Integrated Circuits Conference, pages 3.2.1-3.2.4, 1993.
[6] J. G. Eldredge and B. L. Hutchings. Density enhancement of a neural network using FPGAs and run-time reconfiguration. In D. A. Buell and K. L. Pocek, editors, Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, pages 180-188, Napa, CA, April 1994.
[7] B. S. Fagin. Quantitative measurements of FPGA utility in special and general purpose processors. Journal of VLSI Signal Processing, 6(2):129-137, August 1993.
[8] P. C. French and R. W. Taylor. A self-reconfiguring processor. In D. A. Buell and K. L. Pocek, editors, Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, pages 50-59, Napa, CA, April 1993.
[9] X. P. Ling and H. Amano. WASMII: a data driven computer on a virtual hardware. In D. A. Buell and K. L. Pocek, editors, Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, pages 33-42, Napa, CA, April 1993.
[10] P. Lysaght. Dynamically reconfigurable logic in undergraduate projects. In W. Moore and W. Luk, editors, FPGAs: Proceedings of the 1991 International workshop on field-programmable logic and applications, Oxford, England, September 1991. Abingdon EE and CS Books.

[11] P. Lysaght and J. Dunlop. Dynamic reconfiguration of FPGAs. In W. Moore and W. Luk, editors, More FPGAs: Proceedings of the 1993 International workshop on field-programmable logic and applications, pages 82-94, Oxford, England, September 1993.
[12] S. Monaghan and C. P. Cowen. Reconfigurable multi-bit processor for DSP applications in statistical physics. In D. A. Buell and K. L. Pocek, editors, Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, pages 103-110, Napa, CA, April 1993.
[13] National Semiconductor. Configurable Logic Array (CLAY) Data Sheet, December 1993.
[14] T. G. Rauscher and A. K. Agrawala. Dynamic problem-oriented redefinition of computer architecture via microprogramming. IEEE Transactions on Computers, C-27(11):1006-1014, November 1978.
[15] J. Rosenberg. Implementing Cache Logic™ with FPGAs. In Configurable Logic: Design and Application Book, pages 7.11-7.14. Atmel, San Jose, CA, 1993-1994.
[16] M. J. Wirthlin, B. L. Hutchings, and K. L. Gilson. The Nano Processor: A low resource reconfigurable processor. In D. A. Buell and K. L. Pocek, editors, Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, pages 23-30, Napa, CA, April 1994.
[17] A. Wolfe and J. P. Shen. Flexible processors: a promising application-specific processor design approach. In Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21, pages 30-39, San Diego, CA, November 1988.
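
The overhead percentages and speedups quoted in Section 7.3 and Table 2 follow directly from the reported cycle counts. The short Python sketch below reproduces that arithmetic as a cross-check. It assumes that the total cycle count is the sum of execution and configuration cycles, as the wording of Section 7.3 suggests; the dictionary and function names are illustrative and do not come from the DISC implementation.

```python
# Cycle counts as reported in Section 7.3 and Table 2 (k = 1e3, M = 1e6).
# Assumption: total cycles = execution cycles + configuration cycles.
GENERAL = {"config_cycles": 36_000, "total_cycles": 4_630_000}
MEAN = {"config_cycles": 140_000, "total_cycles": 197_000}


def execution_cycles(run):
    """Cycles spent computing: total minus configuration."""
    return run["total_cycles"] - run["config_cycles"]


def overhead_fraction(run):
    """Share of the total operating time spent on configuration."""
    return run["config_cycles"] / run["total_cycles"]


def speedup(baseline_cycles, improved_cycles):
    """Ratio of baseline to improved cycle counts."""
    return baseline_cycles / improved_cycles


# Raw speedup ignores configuration; actual speedup includes it.
raw = speedup(execution_cycles(GENERAL), execution_cycles(MEAN))
actual = speedup(GENERAL["total_cycles"], MEAN["total_cycles"])

print(f"MEAN overhead:    {overhead_fraction(MEAN):.0%}")     # 71%
print(f"General overhead: {overhead_fraction(GENERAL):.2%}")  # 0.78%
print(f"Raw speedup:      {raw:.1f}")                         # 80.6
print(f"Actual speedup:   {actual:.1f}")                      # 23.5
```

The computed raw speedup (about 80.6) differs slightly from the 80 listed in Table 2, presumably because the published cycle counts are rounded.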
