Introduction
to Reconfigurable
Computing
Architectures, Algorithms,
and Applications
by
Christophe Bobda
University of Kaiserslautern, Germany
A C.I.P. Catalogue record for this book is available from the Library of Congress.
Published by Springer,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springer.com
Foreword vii
Preface xiii
About the Author xv
List of Figures xvii
List of Tables xxv
1. INTRODUCTION 1
1 General Purpose Computing 2
2 Domain-Specific Processors 5
3 Application-Specific Processors 6
4 Reconfigurable Computing 8
5 Fields of Application 9
6 Organization of the Book 11
2. RECONFIGURABLE ARCHITECTURES 15
1 Early Work 15
2 Simple Programmable Logic Devices 26
3 Complex Programmable Logic Devices 28
4 Field Programmable Gate Arrays 28
5 Coarse-Grained Reconfigurable Devices 49
6 Conclusion 65
3. IMPLEMENTATION 67
1 Integration 68
2 FPGA Design Flow 72
3 Logic Synthesis 75
4 Conclusion 98
9. APPLICATIONS 285
1 Pattern Matching 286
2 Video Streaming 294
3 Distributed Arithmetic 298
4 Adaptive Controller 307
5 Adaptive Cryptographic Systems 310
6 Software Defined Radio 313
7 High-Performance Computing 315
8 Conclusion 317
References 319
Appendices 336
A Hints to Labs 337
1 Prerequisites 338
2 Reorganization of the Project Video8 non pr 338
B Party 345
C Quick Part-Y Tutorial 349
About the Author
Dr. Bobda received the Licence degree in mathematics from the Univer-
sity of Yaounde, Cameroon, in 1992, the diploma of computer science and the
Ph.D. degree (with honors) in computer science from the University of Pader-
born in Germany in 1999 and 2003, respectively. In June 2003, he joined the
department of computer science at the University of Erlangen-Nuremberg in
Germany as post doc. In October 2005, he moved to the University of Kaiser-
slautern as Junior Professor, where he leads the working group Self-Organizing
Embedded Systems in the department of computer science. His research inter-
ests include reconfigurable computing, self-organization in embedded systems,
multiprocessor on chip and adaptive image processing.
Dr. Bobda received the Best Dissertation Award 2003 from the University of
Paderborn for his work on synthesis of reconfigurable systems using temporal
partitioning and temporal placement.
Dr. Bobda is member of The IEEE Computer Society, the ACM and the
GI. He has also served in the program committee of several conferences (FPL,
FPT, RAW, RSP, ERSA, DRS) and in the DATE executive committee as pro-
ceedings chair (2004, 2005, 2006, 2007). He served as reviewer of several
journals (IEEE TC, IEEE TVLSI, Elsevier Journal of Microprocessor and Mi-
crosystems, Integration the VLSI Journal ) and conferences (DAC, DATE, FPL,
FPT, SBCCI, RAW, RSP, ERSA).
INTRODUCTION
A Von Neumann (VN) computer consists of three main components:
- A memory for storing program and data. Harvard architectures contain two
parallel accessible memories for storing program and data separately.
- A control unit (also called control path) featuring a program counter that
holds the address of the next instruction to be executed.
- An arithmetic and logic unit (also called data path) in which instructions
are executed.
Figure 1.2. Sequential and pipelined execution of instructions on a Von Neumann Computer
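The components above can be sketched as a toy interpreter in which program and data share a single memory, as in the Von Neumann model. The opcodes and instruction format below are hypothetical, chosen only for illustration:

```python
# Minimal sketch of a Von Neumann machine: one memory holds both the
# program and the data; a program counter (pc) drives the control unit,
# and an accumulator plays the role of the data path. The opcodes and the
# (op, arg) instruction format are made up for this illustration.

def run(memory):
    pc, acc = 0, 0                      # program counter and accumulator
    while True:
        op, arg = memory[pc]            # fetch and decode
        pc += 1
        if op == "LOAD":                # acc <- memory[arg]
            acc = memory[arg]
        elif op == "ADD":               # acc <- acc + memory[arg]
            acc += memory[arg]
        elif op == "STORE":             # memory[arg] <- acc
            memory[arg] = acc
        elif op == "HALT":
            return acc

# Program and data share the same memory: addresses 0-3 hold the code,
# addresses 4-5 hold the data.
mem = [("LOAD", 4), ("ADD", 5), ("STORE", 5), ("HALT", 0), 3, 4]
print(run(mem))  # 3 + 4 = 7
```

Each loop iteration performs one full fetch-decode-execute cycle, which is exactly the sequential behaviour depicted in figure 1.2.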
2. Domain-Specific Processors
A domain-specific processor is a processor tailored for a class of algorithms.
As mentioned in the previous section, the data path is tailored for an optimal
execution of a common set of operations that mostly characterizes the algo-
rithms in the given class. Also, memory access is reduced as much as possible.
Digital Signal Processors (DSPs) are among the most widely used domain-specific
processors.
A DSP is a specialized processor used to speed up the computation of repetitive,
numerically intensive tasks in signal processing areas such as telecommunication,
multimedia, automobile, radar, sonar, seismic and image processing.
The most often cited feature of DSPs is their ability to perform one or more
multiply-accumulate (MAC) operations in a single cycle. Usually, MAC operations
have to be performed on a huge set of data. In a MAC operation, data
are first multiplied and then added to an accumulated value. A normal VN
computer would perform a MAC in 10 steps: the first instruction (multiply)
would be fetched, then decoded, then the operands would be read and multiplied,
and the result would be stored back; next, the accumulate instruction would be
read, the result stored in the previous step would be read again and added to
the accumulated value, and the result would be stored back. DSPs avoid those
steps by using specialized hardware that directly performs the addition after
the multiplication without having to access the memory.
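As a sketch, the MAC inner loop that DSPs accelerate is the core of dot-product and filtering kernels. The plain Python below only illustrates the operation; on a DSP each iteration of the loop body would take a single cycle:

```python
# A MAC operation multiplies two operands and adds the product to a
# running accumulator in one step. Repeated over a data set, this is the
# inner loop of dot products and digital filters.

def mac_dot(xs, ys):
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y        # one fused multiply-accumulate per data pair
    return acc

print(mac_dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```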
Because many DSP algorithms involve performing repetitive computations,
most DSP processors provide special support for efficient looping. Often a
special loop or repeat instruction is provided, which allows a loop implementation
without expending any instruction cycles for updating and testing the loop
counter or branching back to the top of the loop. DSPs are also customized for
data of a given width, according to the application domain. For example, if a
DSP is to be used for image processing, then pixels have to be processed. If the
pixels are represented in the Red Green Blue (RGB) system, where each colour is
represented by a byte, then an image processing DSP will not need more than
an 8-bit data path. Obviously, such an image processing DSP cannot be reused
for applications requiring 32-bit computation.
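The effect of a fixed data width can be sketched in software by masking every result to the width of the data path. The helper below is illustrative, not an actual DSP instruction:

```python
# Emulating an 8-bit data path: every result is truncated to one byte.
# A pixel-processing DSP would behave like add8 below; values wrap around
# at 256, which is why such a chip cannot simply be reused for 32-bit
# arithmetic.

MASK8 = 0xFF  # 8-bit data path

def add8(a, b):
    return (a + b) & MASK8

print(add8(200, 100))   # 300 does not fit in 8 bits and wraps to 44
```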
3. Application-Specific Processors
Although DSPs provide a degree of application-specific features, such
as MAC units and data width optimization, they still follow the VN approach
and, therefore, remain sequential machines with limited performance. If
a processor has to be used for only one application, which is known and fixed
in advance, then the processing unit could be designed and optimized for that
particular application. In this case, we say that ‘the hardware adapts itself to
the application’.
In multimedia processing, processors are usually designed to perform the
compression of video frames according to a video compression standard. Such
processors cannot be used for anything other than compression. Even in compression,
the standard must exactly match the one implemented in the processor.
A processor designed for only one application is called an Application-Specific
Processor (ASIP). In an ASIP, the instruction cycles (IR, D, EX, W)
are eliminated. The instruction set of the application is directly implemented
in hardware. Input data stream into the processor through its inputs, the processor
performs the required computation, and the results can be collected at the
outputs of the processor. ASIPs are usually implemented as single chips called
Application-Specific Integrated Circuits (ASICs).
Algorithm 1
if a < b then
    d = a + b
    c = a · b
else
    d = b + 1
    c = a − 1
end if
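For concreteness, Algorithm 1 can be written as an ordinary sequential function. This sketch shows the VN view of the computation; the ASIP discussed next evaluates both branches in parallel and selects the results with the comparison a < b:

```python
# Algorithm 1 as a sequential Python function. A VN processor executes
# one branch instruction by instruction; an ASIP would compute a+b, a*b,
# b+1 and a-1 concurrently and multiplex the outputs on a < b.

def algorithm1(a, b):
    if a < b:
        d = a + b
        c = a * b
    else:
        d = b + 1
        c = a - 1
    return c, d

print(algorithm1(2, 3))   # a < b: c = 2*3 = 6, d = 2+3 = 5
```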
With tcycle being the duration of one instruction cycle, the three instructions
of a branch, each requiring five cycles (IR, D, R, EX, W), will be executed in
3 ∗ 5 ∗ tcycle = 15 ∗ tcycle without pipelining.
Let us now consider the implementation of the same algorithm in an ASIP.
We can implement the instructions d = a + b and c = a ∗ b in parallel. The
same is also true for d = b + 1 and c = a − 1, as illustrated in figure 1.3.
The four instructions a+b, a∗b, b+1 and a−1, as well as the comparison a < b,
will be executed in parallel in a first stage. Depending on the value of the
comparison a < b, the correct values of the previous stage computations will be
assigned to c and d as defined in the program. Let tmax be the longest time
needed by a signal to move from one point to another in the physical implementation
of the processor (this will happen on the path input-multiply-multiplex).
tmax is also called the cycle time of the ASIP processor. For two inputs a and
b, the results c and d can be computed in time tmax. The VN processor can
compete with this ASIP only if 15 ∗ tcycle < tmax, i.e. tcycle < tmax/15: the
VN clock must be at least 15 times faster than the ASIP to be competitive.
Obviously, we have assumed a VN without a pipeline. The case of a VN computer
with a pipeline can be treated in the same way.
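The break-even condition can be checked numerically. The timing values below are invented purely for illustration:

```python
# Hypothetical timings (arbitrary units) to illustrate the break-even
# condition between the sequential VN processor and the ASIP.
t_cycle = 2.0              # assumed VN instruction-cycle time
t_max = 40.0               # assumed ASIP cycle time

vn_time = 15 * t_cycle     # 3 instructions x 5 steps, no pipelining
asip_time = t_max

# The VN competes only if its total time does not exceed the ASIP's,
# i.e. t_cycle <= t_max / 15.
print(vn_time <= asip_time)   # 30.0 <= 40.0 -> True
```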
ASIPs use a spatial approach to implement only one application. The func-
tional units needed for the computation of all parts of the application must be
available on the surface of the final processor. This kind of computation is
called ‘Spatial Computing’.
Once again, an ASIP that is built to perform a given computation cannot be
used for tasks other than those for which it was originally designed.
4. Reconfigurable Computing
From the discussion in the previous sections, where we studied three
different kinds of processing units, we can identify two main criteria to
characterize processors: flexibility and performance.
The VN computers are very flexible because they are able to compute any
kind of task. This is the reason why the terminology GPP (General Purpose
Processor) is used for the VN machine. They do not bring much
performance, because they cannot compute in parallel. Moreover, the five
steps (IR, D, R, EX, W) needed to perform one instruction become a major
drawback, in particular if the same instruction has to be executed on huge
sets of data. Flexibility is possible because ‘the application must always
adapt to the hardware’ in order to be executed.
ASIPs bring much more performance because they are optimized for a particular
application. The instruction set required for that application can then be
built into a chip. Performance is possible because ‘the hardware is always
adapted to the application’.
If we consider two scales, one for performance and the other for flexibility,
then the VN computers can be placed at one end and the ASIPs at the
other end, as illustrated in figure 1.4.
Between the GPPs and the ASIPs lies a large number of processors. Depending
on their performance and their flexibility, they can be placed near to or
far from the GPPs on the two scales.
Given this, how can we choose a processor adapted to our computation
needs? If the range of applications for which the processor will be used is large,
or if it is not even defined at all, then a GPP should be chosen. However, if
the processor is to be used for one application, as is the case in embedded
systems, then the best approach will be to design a new ASIP optimized for
that application.
Ideally, we would like to have the flexibility of the GPP and the performance
of the ASIP in the same device. We would like to have a device able ‘to adapt
to the application’ on the fly. We call such a hardware device a reconfigurable
hardware or reconfigurable device or reconfigurable processing unit (RPU)
in analogy the Central Processing Unit (CPU). Following this, we provide a
definition of the term reconfigurable computing. More on the taxonomy in
reconfigurable computing can be found in [111] [112].
Definition 1.2 (Reconfigurable Computing) Reconfigurable com-
puting is defined as the study of computation using reconfigurable devices.
For a given application, at a given time, the spatial structure of the device
will be modified so as to use the best computing approach to speed up that
application. If a new application has to be computed, the device structure will
be modified again to match the new application. Contrary to the VN computers,
which are programmed by a set of instructions to be executed sequentially,
the structure of a reconfigurable device is changed by modifying all or part
of the hardware at compile-time or at run-time, usually by downloading a so-
called bitstream into the device.
Definition 1.3 (Configuration, Reconfiguration) Configuration
and reconfiguration denote the process of changing the structure of a reconfigurable
device at start-up time and at run-time, respectively.
Progress in reconfiguration has been amazing in the last two decades. This is
mostly due to the wide acceptance of Field Programmable Gate Arrays (FPGAs),
which are now established as the most widely used reconfigurable devices.
The number of workshops, conferences and meetings dealing with this topic
has also grown with the FPGA evolution. Reconfigurable devices can be
used in a wide range of fields, some of which we list in the next section.
5. Fields of Application
In this section, we present a non-exhaustive list of fields where
the use of reconfiguration can be of great interest. Because the field is still
growing, several new fields of application are likely to emerge in the
future.