Paris July 02 Course 1
Reconfigurable Computing
July 8, 2002, ENST, Paris, France
part 1:
Reconfigurable Computing (RC)
Schedule
Xputer Lab
University of Kaiserslautern
time slot
xx.30 – xx.00 Reconfigurable Computing (RC)
xx.00 – xx.30 coffee break
xx.30 – xx.00 Design / Compilation Techniques for RC
xx.00 – xx.00 lunch break
xx.00 – xx.30 Resources for Data-Stream-based RC
xx.30 – xx.00 coffee break
xx.00 – xx.30 FPGAs: recent developments
© 2002, [email protected] http://kressarray.de
Opportunities by new patent laws?
• Emerging awareness:
– New mind set
– New curricular embedding
• coming dichotomy of CS
– SW <-> CW
– HW <-> FW
– computing in time <-> computing in space
[Figure: flexibility vs. efficiency trade-off. Fabrics: FPGA, coarse grain reconfigurable, hardwired. Scope: general purpose, domain-specific, application-specific.]
ASPP, application-specific programmable product, is:
• an application-specific standard product and
• embedded programmable logic
[Figure labels: Flash / RAM, Logic, Programmable Logic]
CSoC, configurable SoC (System on a Chip), is:
• an industry standard µProcessor,
• embedded reconfigurable array,
• memory,
• dedicated system bus
• ...
[Figure labels: DRAM/Flash/SRAM memory banks, Microprocessor (ARM, MIPS, ...), Reconfigurable Accelerator Array, Analog, programmable Logic]
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture
http://www.uni-kl.de
[Figure: Price (Normalized to Q1/1993), axis values 1.2, 0.4, 0.261. Source: Altera]
[Figure: Makimoto's wave, 1957 to 2007. Labels: custom; TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable; "?". Annotation: "What's coming next?"]
Makimoto’s 3rd Wave
How’s next Wave ?
[Figure: wave timeline 1957 to 2007. Labels: custom, FPGAs, coarse grain RAs, "?"; "Hartenstein's Curve": no further wave!]
Tredennick's Paradigm Shifts
Makimoto's Paradigm Shifts
• Personalization (CAD) before fabrication
• Procedural personalization via RAM-based Machine Paradigm: Software Industry's Secret of Success
• Structural personalization: RAM-based, before run time: Repeat Success Story by new Machine Paradigm!
[Figure: Makimoto's wave, 1957 to 2007. Labels: standard; custom; TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable]
Semiconductor Revolutions
Tredennick's Paradigm Shifts
• vN machine paradigm
• anti machine paradigm
programming domain:
• time domain (procedural): "von Neumann"
• time & space (hybrid): Reconfigurable Computing
• space domain (structural): Logic Synthesis
[Figure labels: compile time, loading time]
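The contrast between the time domain and the space domain can be sketched in a few lines of Python. This is a toy illustration and not taken from the slides: the expression (a + b) * (c - d), the instruction format and all function names are assumptions.

```python
# Toy contrast of "computing in time" vs "computing in space".
# All names and the example expression are hypothetical.
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Time domain (procedural): one shared ALU executes an instruction stream.
PROGRAM = [
    ("add", "a", "b", "t1"),
    ("sub", "c", "d", "t2"),
    ("mul", "t1", "t2", "y"),
]

def run_in_time(stream):
    """Return the results and the number of instructions fetched at run time."""
    results, fetches = [], 0
    for a, b, c, d in stream:
        regs = {"a": a, "b": b, "c": c, "d": d}
        for opcode, src1, src2, dst in PROGRAM:
            fetches += 1  # every data item pays for every instruction again
            regs[dst] = OPS[opcode](regs[src1], regs[src2])
        results.append(regs["y"])
    return results, fetches

# Space domain (structural): operators are wired together once, before run time.
def configure_datapath():
    """Build a fixed datapath; this models loading a configuration, not fetching code."""
    adder, subtractor, multiplier = OPS["add"], OPS["sub"], OPS["mul"]

    def datapath(a, b, c, d):
        return multiplier(adder(a, b), subtractor(c, d))

    return datapath

def run_in_space(stream):
    datapath = configure_datapath()  # done once, then only data flows
    return [datapath(*item) for item in stream]

data = [(1, 2, 5, 3), (4, 4, 10, 9), (0, 7, 2, 2)]
time_results, instruction_fetches = run_in_time(data)
space_results = run_in_space(data)
```

Both versions compute the same values; the difference is that the procedural version fetches three instructions per data item, while the structural version fixes the operators and their wiring before the data stream starts.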
• In a number of application areas throughput requirements are growing faster than Moore's law
• Fundamental flaws in software processor solutions
• Stream-based distributed processing is the way to go
It’s a Paradigm Shift !
[Figure: FPGA fabric with L and S blocks, CLBs, connection points, taps and configuration flip-flops (FF)]
• area used by application
• partly for configuration code storage
• resources needed for reconfigurability
• "hidden RAM" not shown
• > 1000 transistors at each cross bar
• > 40 transistors at each switching point
• > 15 transistors at each tap
• FF: part of the hidden RAM
• Routing Congestion [DeHon]: often 50% or less of CLBs used
• most FPGA vendors' gate count: 1 flipflop of configuration RAM = 4 gates
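The transistor and gate figures on this slide lend themselves to a back-of-the-envelope calculation. The sketch below is only an illustration: the function names and the resource counts in the usage lines are made up, and the per-resource transistor numbers are the lower bounds as read from the slide.

```python
# Back-of-the-envelope FPGA overhead arithmetic (illustrative sketch).
# The constants are read from the slide; every count passed in below is hypothetical.
TRANSISTORS_PER_CROSSBAR = 1000      # slide: > 1000 at each cross bar
TRANSISTORS_PER_SWITCH_POINT = 40    # slide: > 40 at each switching point
TRANSISTORS_PER_TAP = 15             # slide: > 15 at each tap
GATES_PER_CONFIG_FLIPFLOP = 4        # slide: 1 flipflop of configuration RAM = 4 gates

def routing_transistors_lower_bound(crossbars, switch_points, taps):
    """Lower bound on transistors spent on reconfigurable routing."""
    return (crossbars * TRANSISTORS_PER_CROSSBAR
            + switch_points * TRANSISTORS_PER_SWITCH_POINT
            + taps * TRANSISTORS_PER_TAP)

def vendor_gate_count(config_flipflops):
    """Gate count as most vendors quote it: configuration flipflops count as gates."""
    return config_flipflops * GATES_PER_CONFIG_FLIPFLOP

def usable_clbs(total_clbs, utilization=0.5):
    """CLBs left for the application when routing congestion limits utilization."""
    return int(total_clbs * utilization)

# Hypothetical device: 4 crossbars, 100 switching points, 200 taps.
overhead = routing_transistors_lower_bound(4, 100, 200)
quoted_gates = vendor_gate_count(250_000)
clbs_for_application = usable_clbs(1024)
```

With these made-up counts the routing overhead is at least 11 000 transistors, 250 000 configuration flipflops are quoted as 1 000 000 gates, and only 512 of 1024 CLBs remain usable at 50% utilization.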
[Figure: Transistors / chip (axis marks 100 000 000 and 10 000 000 000) for memory and FPGA, physical vs. logical, with annotations ~ 10 and ~ 10 000]
• rDPAs:
– KressArray (academic)
– Xtreme (PACT AG, Munich)
– ACM (Quicksilver Tech)
Foundries offer up to 8 metal layers and up to 3 poly layers
reconfigurable interconnect fabric laid out over the rDPU cell
[Figure: tailored KressArray rDPU example, panels e) to i): rDPU used for routing only; rDPU used for routing and function; port labels 16, 8, 32, 24, 2. rDPU external view: only 4 NNport Abutment Architecture shown.]
KressArray Family generic Fabrics: a few examples
• Select mode, number, width of NNports (figure labels: 16, 8, 32, 24, 2, 4)
• Select Function Repertory
• rDPU modes: rout-through only; rout-through and function
• more NNports: rich Rout Resources
• select Nearest Neighbour (NN) Interconnect: an example
• Examples of 2nd Level Interconnect: laid out over rDPU cell, no separate routing areas
SNN filter KressArray Mapping Example
• array size: 10 x 16 = 160 rDPUs
• rout thru only
Legend: rDPU not used; used for routing only; operator and routing; port location marker; backbus connect; backbus connect not used
port location
CFG
ALU Ctrl
PAE
core
• 2 X PACs (Cluster) • Full 32 or 24 Bit Design
• • 2 Configuration Hierarchies
128 X ALU-PAEs
• Evaluation Board (2001)
• 32 X 1Kbyte RAM-PAEs
• XDS Development Tool with
• 8X I/O Elements Simulator
library
placement
mapper & routing
scheduler
data stream assembly
0.001
2 1 0.5 0.25 0.13 0.1 0,07 µ feature size
multi-context:
Configuration Loading Resources:
• separate configuration fabrics (e.g. FPGA)
• wormhole routing (KressArray, Colt, PipeRench)
• RA part computes code for other RA part (self reconfiguration)
[Figure labels: host, RAM, Soft, Compiler, Mapper, RTOS, Data Path, dynamic]
- END -
Also as an autonomous Machine
Schedule
time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC
Primarily Mesh-based ….
• 1990: UC Berkeley (Jan Rabaey)
• 1993: PADDI-2 (Jan Rabaey)
• 1997: Pleiades (mesh & crossbar)
[Figure: EXUs, each with a CTL block, linked by a crossbar switch with I/O; 32 bit break-switches; Level-2 Network; 16 x 16b; ports P46, P47]
[Figure: array of 16b ALUs and RAM blocks with Sequencer, Memory Interface and State Machine]
• RISC processor and an array of 108 arithmetic processing units. Each of those
32-bit processing cores runs at 125 MHz.
• alternating data/instruction stream
• highly dynamic reconfiguration
RAW (M.I.T. 1997): Reconfigurable Architecture Workbench
• MIPS-like processor core
[Figure: BFU with Network Port A, Network Port B, 256x8 bit Mem, 8 bit ALU, multi-granular, WE mode, cross bar, global lines, Level-1 Network, C / R Network, compare / reduce 1, compare / reduce 2]
BFU opcodes:
opc   operation
0     ×
1     ×+
2     ×++
3     × const
4     insh
5     nsh
6     dsh
7     csh
8     +
9     +0
10    +1
11
12    :=
13    nand
14    nor
15    xor
[Figure: Datapath Registers, MULT, RAM, ALU, Multiplier, DP, I/O Pins, links to its neighbours; wormhole reconfiguration]
SOC Alternatives… not including C/C++ CAD Tools [Gordon Bell]
• The blank sheet of paper: FPGA
• Auto design of a basic system: Tensilica
• Standardized, committee designed components*, cells, and custom IP
• Standard components including more application specific processors*, IP add-ons and custom
• One chip does it all: SMOP **
*) Processors, Memory, Communication & Memory Links,
**) Simple Matter of Programming
[Figure: 1960 to 2010, log scale. Curves: Algorithmic Complexity (Shannon's Law), microprocessor / DSP (SH7752 marked), computational efficiency, processor speed, battery performance]
[Figure: Consumer PC vs. Information Appliances; av. Consumer PC sale ($) [Forrester]; axis mark 1500 $]
Stream-based Computing Arrays
• ignored by Curricula & most R&D scenes
array                 | applications                   | pipeline shape  | pipeline resources | mapping                                  | scheduling (data stream formation)
systolic array        | regular data dependencies only | linear only     | uniform only       | linear projection or algebraic synthesis |
super-systolic rDPA * | no restrictions                | no restrictions | no restrictions    | simulated annealing or P&R algorithm     | (e.g. force-directed) scheduling algorithm
*) KressArray [ASP-DAC-1995]
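For the super-systolic case the table names simulated annealing as one mapping option. The following is a minimal sketch of that idea and not the KressArray mapper itself: operators of a small dataflow graph are placed on an rDPU mesh so that the total Manhattan distance between connected operators shrinks. The example graph, the cost function, the cooling schedule and all names are assumptions.

```python
# Simulated-annealing placement on a mesh (illustrative sketch, hypothetical names).
import math
import random

def wire_length(placement, edges):
    """Total Manhattan distance between connected operators."""
    return sum(abs(placement[u][0] - placement[v][0])
               + abs(placement[u][1] - placement[v][1]) for u, v in edges)

def anneal_placement(nodes, edges, rows, cols, steps=5000, t0=2.0, seed=0):
    """Place each node on a distinct mesh cell; return (best placement, its cost)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("more operators than rDPUs")
    rng.shuffle(cells)
    placement = dict(zip(nodes, cells))   # random initial placement
    free = cells[len(nodes):]             # unused rDPUs
    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling
        a = rng.choice(nodes)
        if free and rng.random() < 0.5:
            # Move operator a to a free cell.
            i = rng.randrange(len(free))
            placement[a], free[i] = free[i], placement[a]
            undo = ("move", a, i)
        else:
            # Swap the cells of two operators.
            b = rng.choice(nodes)
            placement[a], placement[b] = placement[b], placement[a]
            undo = ("swap", a, b)
        new_cost = wire_length(placement, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost               # accept, possibly uphill
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        elif undo[0] == "move":
            _, a, i = undo                # reject: move a back
            placement[a], free[i] = free[i], placement[a]
        else:
            _, a, b = undo                # reject: swap back
            placement[a], placement[b] = placement[b], placement[a]
    return best, best_cost

# Hypothetical 8-operator pipeline mapped onto a 3 x 3 mesh.
operators = list("abcdefgh")
connections = list(zip(operators, operators[1:]))
placed, total_wire = anneal_placement(operators, connections, rows=3, cols=3)
```

Unlike linear projection, this approach places no restriction on the shape of the graph, which is the point of the "no restrictions" row; the price is a heuristic search whose result depends on the schedule and the seed.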
Extremes of configuration time:
Class of processor | product     | vendor       | configuration time
ASIP               | Tensilica   | Tensilica    | design time / fabrication time
Network Processor  | MECA family | Malleable    | compile time (statically reconfigurable)
Network Processor  | CALISTO     | SiliconSpice | compile time (statically reconfigurable)
Dissertation
Jürgen Becker: Professor at Univ. Karlsruhe
• ... Automatically partitioning Co-compiler (configware / software co-compilation)
• Resource-parameter-driven retargetable
• Profiler-driven optimization
• Accepts HLL "ALE-X" (extended C subset; pointers not supported)
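A profiler-driven, resource-parameter-driven partitioning step could look roughly like the sketch below. It is a hypothetical greedy heuristic written for illustration and does not reproduce the co-compiler described in the dissertation: the `Loop` record, the timing numbers and the rDPU budget are all assumptions.

```python
# Hypothetical sketch of profiler-driven configware / software partitioning.
from dataclasses import dataclass

@dataclass
class Loop:
    name: str
    sw_time: float   # profiled run time in software (arbitrary units)
    cw_time: float   # estimated run time as configware on the array
    rdpus: int       # rDPUs the configware version would occupy

def partition(loops, rdpu_budget):
    """Greedy: move loops with the best time saving per rDPU to configware."""
    candidates = [l for l in loops if l.cw_time < l.sw_time and l.rdpus > 0]
    candidates.sort(key=lambda l: (l.sw_time - l.cw_time) / l.rdpus, reverse=True)
    configware, used = [], 0
    for loop in candidates:
        if used + loop.rdpus <= rdpu_budget:   # resource parameter of the target
            configware.append(loop.name)
            used += loop.rdpus
    software = [l.name for l in loops if l.name not in configware]
    return configware, software

# Made-up profile of four loops and a target with 100 rDPUs.
profile = [
    Loop("filter", sw_time=90, cw_time=10, rdpus=40),
    Loop("parse", sw_time=20, cw_time=25, rdpus=10),
    Loop("fft", sw_time=60, cw_time=20, rdpus=80),
    Loop("scale", sw_time=12, cw_time=2, rdpus=5),
]
to_configware, to_software = partition(profile, rdpu_budget=100)
```

Changing `rdpu_budget` retargets the same profile to a larger or smaller array; with 100 rDPUs the "fft" loop no longer fits next to "filter" and "scale" and stays in software.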
Karin Schmidt
Dissertation
Karin Schmidt: DaimlerChrysler Research
• Compilation Techniques for Xputers
• modified loop transformations
• Modified parts of implementation used for Jürgen Becker's Ph. D. thesis