Paris July 02 Course 1

Combinational Design

Enabling Technologies for Reconfigurable Computing
and Software / Configware Co-Design

July 8, 2002, ENST, Paris, France

Reiner Hartenstein
University of Kaiserslautern

part 1: Reconfigurable Computing (RC)

Schedule
Xputer Lab, University of Kaiserslautern

time slot
xx.30 – xx.00 Reconfigurable Computing (RC)
xx.00 – xx.30 coffee break
xx.30 – xx.00 Design / Compilation Techniques for RC
xx.00 – xx.00 lunch break
xx.00 – xx.30 Resources for Data-Stream-based RC
xx.30 – xx.00 coffee break
xx.00 – xx.30 FPGAs: recent developments

© 2002, [email protected] http://kressarray.de

Opportunities by new patent laws?

• To clever guys keen on patents: do not file patents on the following details!
• Everything shown in this presentation was published years ago.


Reconfigurable: why?

• Exploding design cost and shrinking product life cycles of ASICs create a demand for RA usage for product longevity.
• Performance is only one part of the story. The time has come to fully exploit their flexibility to support turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades.
• A new “soft machine” paradigm and language framework is available for novel compilation techniques to cope with the new market structures transferring synthesis from vendor to customer.


A Decade of Research in Reconfigurable Computing

• Thanks to the achievements of numerous research projects throughout the 1990s, the breakthrough in commercialization has started, and a quite comprehensive methodology is already available.
• Dear colleague, the RC scene welcomes your contributions to improve it and to push for its inclusion in contemporary CS&E curricula.
• One of the goals of this talk is to stimulate you with highlights and an introduction to some key issues.


No longer a strange niche area

• It was “hardware” design for a strange platform
– CAD, but no compilation
• Emerging awareness:
– New mind set
– New curricular embedding
• Coming dichotomy of CS
– SW <-> CW
– HW <-> FW
– computing in time <-> computing in space


Flexibility / universality trade-off

[Figure: trade-off between flexibility and efficiency across three platform classes: FPGA (general purpose), coarse grain reconfigurable (domain-specific*), and hardwired (application-specific).]

*) by design space explorer (e. g. KressArray)

RAs are heading for Mainstream
... become indispensable for SoC products?

ASPP, application-specific programmable product, is:
• an application-specific standard product, and
• embedded programmable logic

CSoC, configurable SoC, is:
• an industry standard µProcessor,
• an embedded reconfigurable array,
• memory,
• a dedicated system bus,
• programmable logic
• ...

[Figure: SoC (System on a Chip) block diagrams with Flash / RAM and DRAM/Flash/SRAM memory banks, programmable logic, a microprocessor (ARM, MIPS, ...), a reconfigurable accelerator array, analog, and logic blocks.]


Reconfigurable Logic going Mainstream

• Fine grain: FPGAs shrinking the ASIC market
• Fastest growing segment of the semiconductor market
• Substantially improved design flow and libraries
• Coarse grain: PACT AG startup
• Comprehensive methodology
• Please lobby for new curricula.
• One of the goals of this talk: to motivate you with key issues and visionary highlights.


>> History

• History
• Paradigm Shift
• Coarse Grain: why?
• Coarse Grain Architectures
• Reconfiguration Architecture

http://www.uni-kl.de


Logic Gate Price Trend

Source: Altera

[Figure: price per logic element, normalized to Q1/1993, plotted for Q1 '93 through Q1 '00. The price falls by about 40% per year; labelled data points: 0.261, 0.086, 0.042, 0.029.]
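
The "40% lower per year" figure can be checked with a short compounding calculation. The sketch below is only an illustration: the helper name `normalized_price` is hypothetical, and it assumes a perfectly constant yearly decline starting at 1.0 in Q1 '93. Under that assumption the value after seven years is about 0.028, close to the 0.029 label on the chart, provided that label belongs to Q1 '00 (the extracted slide does not say which quarter each label belongs to).

```python
# Hypothetical helper: compound a constant yearly price decline.
def normalized_price(years: int, yearly_decline: float = 0.40) -> float:
    """Price relative to the start year after `years` years of constant decline."""
    return (1.0 - yearly_decline) ** years


# Q1 '93 is year 0 on the chart's axis; Q1 '00 is year 7.
trend = {1993 + n: round(normalized_price(n), 3) for n in range(8)}

for year, price in trend.items():
    print(year, price)
```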


The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years” (Makimoto’s Wave)

[Figure: Makimoto’s Wave with year ticks from 1957 to 2007, swinging between “standard” and “custom”. Standard: TTL, then µproc. and memory, then reconfigurable (marked “?”). Custom: LSI and MSI, then ASICs and accel’s. The 1st and 2nd Design Crisis are marked on the curve.]

“The Programmable System-on-a-Chip is the next wave”

Makimoto’s 3rd Wave

What is the next Wave?

[Figure: Makimoto’s Wave with year ticks from 1957 to 2007, between “standard” and “custom”, extended by Hartenstein’s Curve: FPGAs and coarse grain RAs; a “4th wave?” is marked.]

Tredennick’s Paradigm Shifts:
• hardwired: algorithm fixed, resources fixed
• procedural programming: algorithm variable, resources fixed
• structural programming: algorithm variable, resources variable

No further wave!


The Impact of Makimoto’s Paradigm Shifts

Dr. Makimoto: FPL 2000 keynote

• Personalization (CAD) before fabrication
• Software industry’s secret of success: procedural personalization via a RAM-based machine paradigm
• Repeat the success story by a new machine paradigm: structural personalization, RAM-based, before run time

[Figure: Makimoto’s Wave with year ticks from 1957 to 2007, between “standard” (TTL; µproc., memory; reconfigurable) and “custom” (LSI, MSI; ASICs, accel’s).]

Semiconductor Revolutions

Tredennick’s Paradigm Shifts:
• algorithm: fixed, resources: fixed
• algorithm: variable, resources: fixed (vN machine paradigm)
• algorithm: variable, resources: variable (anti machine paradigm)


Reconfigurable goes mainstream

Topic adopted by congresses: ASP-DAC, DAC, DATE, ISCAS, SPIE ...

• FCCM, FPGA (founded 1992), and FPL (founded 1991 at Oxford, UK): International Conference on Field-programmable Logic and Applications
• FPL 2002, La Grande Motte (Montpellier, France), Sept. 2 – 4, http://fpl.org
• FPL 2002: 214 submissions, a sensational increase by 83%

>> Paradigm Shift

• History
• Paradigm Shift
• Coarse Grain: why?
• Coarse Grain Architectures
• Reconfiguration Architecture


Sequential vs. structural RAM

[Figure: two routes side by side. “von Neumann”: software (procedural) is downloaded into sequential RAM, which feeds the instruction sequencer and data path with I/O. FPGA: logic synthesis and route and place produce a download into structural RAM, which configures the reconfigurable accelerator(s).]


Changing Models of Computing

[Figure: three models side by side.
“von Neumann”: software (procedural) downloaded into RAM; instruction sequencer, data path, I/O.
Contemporary (“the tail wagging the dog”): software downloaded into the host’s RAM, plus hardwired accelerator(s) designed by CAD; the accelerators occupy most silicon. Labelled “Hardware”.
Reconfigurable computing: software downloaded into the host’s RAM, plus configware (structural) downloaded into the RAM of the reconfigurable accelerator(s). Labelled “Flexware”.]

The Microprocessor is a Methuselah

9 technology generations ... the steam engine of the silicon age

• 1st 4004
• 2nd 8008
• 3rd 8086
• 4th 80286
• 5th 80386
• 6th 80486
• 7th P5 (Pentium)
• 8th P6 (Pentium Pro / Pentium II)
• 9th Pentium III


Basics of Binding Time

[Figure: binding-time axis for the time of “instruction fetch”, from compile time through loading time to run time. Microprocessor and parallel computer bind at run time; Reconfigurable Computing is placed around loading time.]


Binding Time vs. Computing Domain

Binding time: set-up of communication channels.

[Figure: binding time (before fabrication, later fabrication step, at compile time, at loading time, at run time) plotted against programming domain (time domain, procedural; time & space, hybrid; space domain, structural). Entries include microprocessor, parallel computer, array processor, Antimachine with rDPA, Reconfigurable Computing, Antimachine with DPA, ASICs, systolic arrays, super systolic, and full custom ICs.]

The KressArray is a generalization of the systolic array.


Applications

• next generations’ wireless
• network processors
• many other areas
• Image processing:
– for smart car (collision avoidance, others ...),
– smart traffic pilots, robotics, fast material inspection,
– smart stub finders, motion detection (MPEG-4, ...)
• Signal processing, speech processing, software radio,
• Correlation, encryption, comm. switching / protocols,
• Innovative consumer electronics:
– super smart cards, smart mobile phones, wearable,
– portable, set-top, laptop, desktop, embedded, ...
• many others, ...


Applications

• New cellular standard: up to 2 Mbit/sec. New CDMA standard: > 500 MIPS needed just for the RF receiver part.
• Wide variety of end-user devices: smart mobile phones, palm pilots, laptops, games, camcorder-likes, the internet car; many new types of devices to come ...
• Increasingly wide variety of services available from the network provider: download just what a particular customer is subscribed to.
• Expert group [Vissers]: > 20% of it will be accelerator code*
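
To get a feeling for the first bullet, the two quoted numbers can be combined into an instruction budget per received bit. This is a rough sketch only: it assumes the 500 MIPS and the 2 Mbit/sec refer to the same receiver at full rate, which the slide does not state explicitly, and the function name `instructions_per_bit` is made up for the illustration.

```python
# Hypothetical helper: instruction budget per received bit.
def instructions_per_bit(mips: float, mbit_per_s: float) -> float:
    """Instructions executed per bit for a given MIPS load and bit rate."""
    instructions_per_second = mips * 1e6
    bits_per_second = mbit_per_s * 1e6
    return instructions_per_second / bits_per_second


# Slide figures: > 500 MIPS for the RF receiver part, up to 2 Mbit/sec.
budget = instructions_per_bit(500, 2)
print(f"at least {budget:.0f} instructions per received bit")
```

On a sequential processor every one of these instructions also costs an instruction fetch, which is the overhead the talk argues against.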


Shannon’s Law

• In a number of application areas, throughput requirements are growing faster than Moore's law
• Fundamental flaws in software processor solutions
• 32 soft ARM cores fit onto a contemporary FPGA
• Stream-based distributed processing is the way to go

It’s a Paradigm Shift!

• Using FPGAs (fine grain reconfigurable) has mainly been classical logic synthesis on a “strange hardware” platform.
• Coarse grain reconfigurable arrays (rDPAs) (Reconfigurable Computing), however, mean a really fundamental paradigm shift.
• This is still ignored by CS and EE curricula and almost all R&D scenes.


>> Coarse Grain: why?

• History
• Paradigm Shift
• Coarse Grain: why?
• Coarse Grain Architectures
• Reconfiguration Architecture


Reconfigurability Overhead

[Figure: array of logic cells (L) and switch boxes (S). Only part of the area is used by the application; part is used for configuration code storage, and the rest are resources needed for reconfigurability. The “hidden RAM” is not shown.]


Principle of a Typical FPGA

[Figure: CLBs embedded in a routing fabric with connection points and taps, each controlled by a flip-flop (FF) of the hidden RAM.]


Routing Overhead in FPGAs

• > 1000 transistors at each cross bar
• > 40 transistors at each switching point
• > 15 transistors at each tap
• Each FF shown is part of the hidden RAM
• Routing congestion [DeHon]: often 50% or less of CLBs used
• Most FPGA vendors’ gate count: 1 flipflop of configuration RAM = 4 gates


>>> extremely high efficiency

1. avoiding address computation overhead
2. avoiding instruction fetch and interpretation overhead
3. high parallelism, massively multiple deep pipelines
4. much less configuration memory
5. no routing areas to configure functions from CLBs


Why Coarse Grain instead of FPGA?

Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld

[Figure: transistors / chip (log scale, 1000 to 100 000 000 000) versus year (1980 to 2010), with curves for memory, microprocessor, super systolic, FPGA physical, FPGA logical, and FPGA routed. Annotated gaps: ~ 10 and ~ 10 000.]

• reconfigurability overhead reduced by up to ~ 1000
• drastically smaller configuration memory
• much faster loading
• a lot more benefits


Environments available

• rDPAs:
– KressArray (academic)
– Xtreme (PACT AG, Munich)
– ACM (Quicksilver Tech)
• Compilation techniques feasibility studies:
– Partitioning Co-Compiler (academic)
– Design Space Explorer (academic)
– others


>> Coarse Grain Architectures

• History
• Paradigm Shift
• Coarse Grain: why?
• Coarse Grain Architectures
• Reconfiguration Architecture


rDPA (Reconfigurable Datapath Array)

[Figure: 4 x 8 array of rDPUs with a Reconfigurable Interconnect Fabric (RIF). Two variants: a separate routing area, or the RIF laid out over the rDPUs so that the rDPA is wired by abutment.]

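
The nearest-neighbour wiring of such an array can be sketched in a few lines. This is a minimal sketch under stated assumptions: a 4 x 8 mesh as drawn on the slide and four NN ports per rDPU (north, south, west, east). The function name `nn_neighbours` and the port count are chosen for illustration and are not taken from the KressArray tools.

```python
from typing import List, Tuple

ROWS, COLS = 4, 8  # array size as drawn on the slide


def nn_neighbours(row: int, col: int,
                  rows: int = ROWS, cols: int = COLS) -> List[Tuple[int, int]]:
    """Return the mesh positions reachable through the four assumed NN ports."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Ports on the array border have no partner and are dropped.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


# Every link is seen from both of its endpoints, hence the division by two.
links = sum(len(nn_neighbours(r, c))
            for r in range(ROWS) for c in range(COLS)) // 2
print("rDPUs:", ROWS * COLS, "NN links:", links)
```

Because each rDPU only talks to the cells it touches, the links need no separate routing channel, which is the point of wiring by abutment.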
CMOS interconnect resources

• Foundries offer up to 8 metal layers and up to 3 poly layers
• The reconfigurable interconnect fabric is laid out over the rDPU cell


Generically defined Fabrics: KressArray Family

[Figure: panels a) to i) showing KressArray fabric variants. Legend: rDPU used for routing only; rDPU used for routing and function.]

Some Application Areas, like e. g. Wireless Communication, need extraordinarily powerful Communication Resources.

Universal RAs are not always feasible

• A general purpose (coarse grain) rDPA is an illusion ...
• ... often functional resources are not the throughput bottleneck
• Some application areas, such as wireless communication, need extremely rich communication resources
• ... the case for domain-specific platform generators

KressArray Family Example

[Figure: tailored KressArray rDPU example with NNports labelled 16, 8, 32, 24, 2, and 4. External view: only the NNport abutment architecture is shown.]

KressArray Family generic Fabrics: a few examples

• Select mode, number, and width of NNports (example labels: 16, 8, 32, 24, 2, 4)
• Select Function Repertory
• rDPU modes: rout-through and function, or rout-through only
• More NNports: rich rout resources
• Select Nearest Neighbour (NN) Interconnect: an example
• Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas!

SNN filter KressArray Mapping Example

[Figure: SNN filter mapped onto a KressArray of 10 x 16 = 160 rDPUs. Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect; backbus connect not used.]


Dissertations

• Michael Herz, Agilent, Sindelfingen
• Ulrich Nageldinger, Infineon Technologies, Munich
• Rainer Kress, Infineon Technologies, Munich
• Prof. Jürgen Becker, Karlsruhe University
• Karin Schmidt, DaimlerChrysler Research, Ulm


Coarse Grain Architectures

style | project | first publ. | source | architecture | granularity | fabrics | mapping | intended target application
mesh | DP-FPGA | 1994 | [4] | 2-D array | 1 & 4 bit, multi-granular | inhomog. routing channels | switchbox routing | regular datapaths
mesh | KressArray | 1995 | [5,11] | 2-D mesh | family: sel. pathwidth | multiple NN & bus segments | (co-)compilation | (adaptable)
mesh | Colt | 1996 | [12] | 2-D array | 1 & 16 bit | inhomogeneous | run time reconfiguration | highly dynamic reconfig.
mesh | Matrix | 1996 | [15] | 2-D mesh | 8 bit, multi-granular | 8NN, length 4 & global lines | multi-length | general purpose
mesh | RAW | 1997 | [17] | 2-D mesh | 8 bit, multi-granular | 8NN switched connections | switchbox rout | experimental
mesh | Garp | 1997 | [16] | 2-D mesh | 2 bit | global & semi-global lines | heuristic routing | loop acceleration
mesh | REMARC | 1998 | [18] | 2-D mesh | 16 bit | NN & full length buses | (info not available) | multimedia
mesh | MorphoSys | 1999 | [19] | 2-D mesh | 16 bit | NN, length 2 & 3 global lines | manual P&R | (not disclosed)
mesh | CHESS | 1999 | [20] | hexagon mesh | 4 bit, multi-granular | 8NN and buses | JHDL compilation | multimedia
mesh | DReAM | 2000 | [21] | 2-D array | 8 & 16 bit | NN, segmented buses | co-compilation | next generation wireless
mesh | CS2000 family | 2000 | [23] | 2-D array | 16 & 32 bit | inhomogeneous array | (not disclosed) | communication
mesh | MECA family | 2000 | [24] | 2-D array | multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
mesh | CALISTO | 2000 | [25] | 2-D array | 16 bit, multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
mesh | FIPSOC | 2000 | [26] | 2-D array | 4 bit, multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
linear | RaPiD | 1996 | [27] | 1-D array | 16 bit | segmented buses | channel routing | pipelining
linear | PipeRench | 1998 | [29] | 1-D array | 128 bit | (sophisticated) | scheduling | pipelining
crossbar | PADDI | 1990 | [30] | crossbar | 16 bit | central crossbar | routing | DSP
crossbar | PADDI-2 | 1993 | [32] | crossbar | 16 bit | multiple crossbar | routing | DSP and others
crossbar | Pleiades | 1997 | [33] | mesh + crossbar | multi-granular | multiple segmented crossbar | switchbox routing | multimedia


Commercial rDPAs

• XPU family (IP cores), e. g. XPU128: PACT Corp., Munich
• CALISTO: Silicon Spice **
• CS2000 family: Chameleon Systems
• MECA family: Malleable **
• flexible array: MorphICs
• ACM: Quicksilver Tech *
• CHESS array: Elixent
• MorphoSys: Morpho Tech *
• FIPSOC: SIDSA

**) bought
*) here at SoC

PACT XPP: Reference Module: XPU128 Co-Processor

[Figure: XPP128 ALU array; each ALU-PAE contains a PAE core with ALU, Ctrl, and CFG blocks. Source: Jürgen Becker, Univ. Karlsruhe]

• 2 x PACs (cluster)
• 128 x ALU-PAEs
• 32 x 1 Kbyte RAM-PAEs
• 8 x I/O elements
• Full 32 or 24 bit design
• 2 configuration hierarchies
• Evaluation board (2001)
• XDS development tool with simulator
• PAE core is a 32- or 24-bit ALU with DSP instruction set and controller
• Connections: inputs + outputs (channels) + events

Datastream-based Compilation Principles

[Figure: compilation flow with a library, a mapper (placement & routing), a scheduler, and data stream assembly.]


Energy Efficiency vs. Flexibility

Sources: T. Claasen et al.: ISSCC 1999; *) R. Hartenstein: ISIS 1997

[Figure: throughput in MOPS / mW (log scale, 0.001 to 1000) versus feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07 µ). Curves: hardwired, rDPAs (reconfigurable computing)*, FPGAs (reconfigurable logic), DSP instruction set processors, standard microprocessor. A flexibility arrow runs from hardwired / anti machine towards von Neumann.]


>> Reconfiguration Architecture

• History
• Paradigm Shift
• Coarse Grain: why?
• Coarse Grain Architectures
• Reconfiguration Architecture


Configuration Architectures (dynamic vs. static)

[Figure: block diagrams of a host (soft: compiler, mapper, RTOS, etc.) with RAM, configuration RAM(s), and data path, shown for configuration caching*, multi-context, and dynamic configuration.]

*) not a straightforward cache as usual!

Configuration Loading Resources:
• separate configuration fabrics (e.g. FPGA)
• wormhole routing (KressArray, Colt, PipeRench)
• RA part computes code for other RA part (self reconfiguration)


- END -

Also as an autonomous Machine

• New machine paradigm (Xputer)
• is the counterpart of the so-called von Neumann paradigm
– CONS: confuses customers (paradigm switch: the brain hurts)
– PROS: strong guidance of EDA tool development
– more effective hardware/software APIs
– compilation techniques similar to traditional compilation
– better application development tools accepting C or Java
• easy to teach: simple machine principles
– scan patterns (data counter) similar to control flow (program counter)
– general model of hardware / software co-design
– fascination for freak effect: opening up a new R&D discipline
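
The analogy between a data counter and a program counter can be made concrete with a small sketch. It is a toy model under stated assumptions, not the Xputer implementation: `scan_pattern` plays the role of a data sequencer that generates addresses in a simple row-by-row scan, and `run_datapath` stands for one configured operator that stays fixed while the data stream passes through it. Both names are invented for this illustration.

```python
from typing import Callable, Iterator, List, Tuple


def scan_pattern(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical data sequencer: yields addresses in a row-by-row scan."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


def run_datapath(memory: List[List[int]],
                 addresses: Iterator[Tuple[int, int]],
                 operator: Callable[[int], int]) -> List[int]:
    """Stream operands through one fixed (configured) operator.

    The operator is set up once before the run; during the run only the
    data counter advances, and no instruction is fetched per operand.
    """
    return [operator(memory[r][c]) for r, c in addresses]


memory = [[1, 2, 3],
          [4, 5, 6]]
result = run_datapath(memory, scan_pattern(2, 3), lambda x: x * x)
print(result)
```

In a von Neumann machine the loop nest would be instructions stepped through by the program counter; here the loop nest only describes which data arrives when.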


Future Coarse Grain RA Development

• It is indispensable to operate within the convergence area of compilers, co-compilers, architecture and full-custom-style VLSI design (array cells).
• It is a must that products come with a development platform which encourages users, especially those with a limited hardware background.


Schedule

time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC

Primarily Mesh-based ...

market | project | bits granularity | source
research | KressArray | variable | U. Kaiserslautern
research | Garp | 2 | UC Berkeley
research | CHESS | 4 | Hewlett Packard
research | Matrix | 8 | M.I.T.
research | RAW | 8 | M.I.T.
research | Colt | 1 & 16 | Virginia Tech
research | DReAM | 8 & 16 | TU Darmstadt
research | REMARC | 16 | Stanford
research | MorphoSys | 16 | UC Irvine
commercial | CALISTO | 16 | Silicon Spice
commercial | MECA family | 16 | Malleable
commercial | CS2000 family | 16 & 32 | Chameleon Systems
commercial | FIPSOC | 16 & analog | SIDSA
commercial | XPP XPU128 | 32 | PACT Corp.


UC Berkeley (Jan Rabaey)

market | project | bits granularity | source
research | PADDI, PADDI-2, Pleiades | 16 | UC Berkeley


Crossbar-based Architectures

• 1990: PADDI, UC Berkeley (Jan Rabaey)
• 1993: PADDI-II (Jan Rabaey)
• 1997: Pleiades (mesh & crossbar)

[Figure: two rows of four EXUs, each with its own CTL block, connected through a central crossbar switch with I/O on both sides; widths of 16 bit and 32 bit are marked.]


PADDI-II Architecture

[Figure: processing elements P1 to P48 grouped into 4-PE clusters, with I/O blocks at the top and bottom. Labels: 16 x 6 switch matrix, 6 x 16b, 16 x 16b, break-switch, Level-1 Network, Level-2 Network.]


Some Players in Silicon Valley and ...

Company | Architecture | Business Model | Markets
Adaptive Silicon | not disclosed | sell cores | Embedded DSP
Chameleon Systems | 32 bit datapath array | sell chips | Networking
Malleable | not disclosed | sell chips | Voice over IP
MorphICs | not disclosed | sell cores | Wireless Commun.
Silicon Spice | not disclosed | sell solutions | Networking
Systolix | Bit Serial Systolic Array | sell cores | Signal Conditioning
Triscend | System on Chip | sell chips | Embedded Systems

Network Processors: > 20 players
Cisco: Xilinx’s largest customer


CHESS Array w. embedded RAM (Elixent)

multi-granular, e. g. 16 * 4 Bits = 64 Bits

[Figure: CHESS array of ALUs interleaved with embedded RAM blocks. Labels: User Registers, Clock Control, Sequencer, Memory Interface, RAM, 16b State Machine.]
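
The "16 * 4 Bits = 64 Bits" remark can be illustrated by chaining 4-bit slices into a wider operator. The following is only a sketch of the general idea of multi-granularity: it assumes a plain ripple-carry chain between slices, which the slide does not specify for CHESS, and the names `alu4_add` and `wide_add` are hypothetical.

```python
from typing import Tuple


def alu4_add(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """One 4-bit ALU slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def wide_add(a: int, b: int, slices: int = 16) -> int:
    """Add two operands using `slices` chained 4-bit ALUs (16 slices = 64 bits)."""
    result, carry = 0, 0
    for i in range(slices):
        nibble, carry = alu4_add(a >> (4 * i), b >> (4 * i), carry)
        result |= nibble << (4 * i)
    # The final carry is dropped, so the sum wraps modulo 2**(4 * slices).
    return result


print(hex(wide_add(0x1234, 0x0FFF, slices=4)))
```

Choosing `slices` at configuration time is what makes the same fabric usable for 4-bit, 16-bit, or 64-bit datapaths.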


Chameleon Systems

• RISC processor and an array of 108 arithmetic processing units. Each of those 32-bit processing cores runs at 125 MHz.
• The CS2112 is the industry's first Reconfigurable Communications Processor (RCP), a streaming data processor.
• The vendor claims a performance of 20 billion 16-bit operations per second, 2.4 billion 16-bit multiply-accumulates per second, and 1.6 GBytes / sec for its programmable I/O (PIO) banks.
• It also has a PCI interface.
• Tool suite C~SIDE for developing, verifying and optimizing.
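
The vendor figures above can be related to each other with simple arithmetic. This sketch uses only the numbers quoted on the slide (108 units, 125 MHz, 20 billion 16-bit operations per second); how the vendor actually counts operations is not stated, so the derived ratio is an interpretation rather than a specification.

```python
UNITS = 108               # arithmetic processing units (from the slide)
CLOCK_HZ = 125e6          # 125 MHz per unit (from the slide)
CLAIMED_OPS_PER_S = 20e9  # vendor claim: 16-bit operations per second

# Upper bound on unit activations per second if every unit is busy every cycle.
unit_cycles_per_s = UNITS * CLOCK_HZ

# Average number of 16-bit operations each unit must deliver per cycle
# for the claim to hold.
ops_per_unit_cycle = CLAIMED_OPS_PER_S / unit_cycles_per_s

print(f"{unit_cycles_per_s:.3g} unit-cycles/s, "
      f"{ops_per_unit_cycle:.2f} 16-bit ops per unit per cycle")
```

A ratio above one is plausible for 32-bit units that can process more than one 16-bit operand per cycle, but that reading is an assumption.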


MorphoSys


PipeRench Architecture (CMU 1998)

• alternating data/instruction stream
• highly dynamic reconfiguration


MATRIX (M.I.T. 1996)
Multiple Alu archiTecture with Reconfigurable Interconnect eXperiment

• 0.5 µ CMOS
• 8 bit, 10 x 10
• 1.8 mm2
• 100 MHz
• multi-granular

[Figure: MATRIX BFU with 256 x 8 bit memory, 8 bit ALU, Network Port A and Network Port B, Mem Func Port and ALU Func Port, crossbar, global lines, compare / reduce 1 and 2, Level-1 Network, C / R Network, and WE mode.]

BFU opcodes (opc: operation): 0 ×, 1 ×+, 2 ×++, 3 × const, 4 insh, 5 nsh, 6 dsh, 7 csh, 8 +, 9 +0, 10 +1, 11 (no entry), 12 :=, 13 nand, 14 nor, 15 xor

RAW (M.I.T. 1997): Reconfigurable Architecture Workbench, MIPS-like processor core


MATRIX Interconnect Fabrics

Communication resources are often the bottleneck.

[Figure: a BFU and its neighbouring BFUs.]


More Research Projects

Published between 1996 and 2000:
• Garp (UC Berkeley)
• RaPiD (U. Washington)
• REMARC (Stanford)
• DReAM (U. Karlsruhe)
• ... and others

Asia / Pacific: see also the embedded tutorials by Prof. Amano (ASP-DAC’99, FPL-2000)


RaPiD Architecture

[Figure: linear datapath of ALUs, RAMs, datapath registers, and a multiplier (MULT), attached to buses through bus connectors, input multiplexers, and output drivers.]


REMARC


Colt Architecture (P. Athanas 1996)

• wormhole routing
• studying highly dynamic reconfiguration

[Figure: array of IFUs, a smart crossbar, a multiplier, and DP blocks connected to I/O pins.]

SOC Alternatives ... not including C/C++ CAD Tools [Gordon Bell]

• The blank sheet of paper: FPGA
• Auto design of a basic system: Tensilica
• Standardized, committee-designed components*, cells, and custom IP
• Standard components including more application-specific processors*, IP add-ons and custom
• One chip does it all: SMOP**

*) Processors, Memory, Communication & Memory Links
**) Simple Matter of Programming


SoC Alternatives [Gordon Bell]

product | strategy | vendor
FPGA | “sea of uncommitted gate arrays” | Xilinx, Altera
compile a system | unique processor for every application | Tensilica
systolic array | many pipelined or parallel processors + custom |
DSP, VLIW | special purpose processor cores + custom | TI
processor + RAM + ASICs | general purpose cores, specialized by I/O, etc. | IBM, Intel
universal micro | multiprocessor array, programmable I/O | Cradle

Designer-oriented Innovation stalled?

• EDA industry: about 7 billion $
• leverages a > 200 billion $ semiconductor industry
• FPGAs (7 billion $): fastest growing segment
• EDA industry constantly redefining itself
• “except for logic synthesis, no really significant innovation in the past decade”
• CAD developers can’t deliver their ideas effectively
• CAD developers personally don’t appreciate the real problems facing designers

EDA the main bottleneck


Guess it! Biggest Mistake of EDA


Why coarse grain?
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld

[Figure: log-scale trend chart, 1960 to 2010. Axis labels: Transistors/chip (1 to 100 000 000); Normalized, mA/MIP (0.001 to 100). Curves: memory; microprocessor / DSP; Algorithmic Complexity (Shannon's Law) for wireless generations 1G to 4G; computational efficiency; processor speed; battery performance. Marked devices: SH7752, StrongARM.]


… Decline of Wintel Business Model

[Figure: trend chart, 1997 to 2002. Axis labels: Billion US-$, US market [Forrester]; Billion subscribers worldwide; Million devices delivered in the U.S. [IDC]. Curves: consumer PCs; information appliances; cellular; av. consumer PC sale ($) [Forrester], with a 1500 $ PC and a 1000 $ PC marked.]




Dataquest Predicts Programmability to be Predominant in SOC

• Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology
• With programmability as a standard feature, ASPPs will be predominant system-on-a-chip products in five years

Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.'s Semiconductors Group
EETimes 10/21/98
Dataquest Semiconductors '98 conference


It's a General Paradigm Shift!

• Using FPGAs (fine grain reconfigurable): just Logic Synthesis on a strange platform
• Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental Paradigm Shift
• Replacing Concurrent Processes by much more efficient parallelism: Stream-based Computing Arrays
• ignored by Curricula & most R&D scenes

systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000]
*) hardwired
**) reconfigurable


Fine-grained vs. coarse-grained

• Fine-grained reconfiguration versus coarse-grained reconfiguration.
• fine grain is general purpose
• slow and area-inefficient, but high parallelism
• coarse grain is application domain-specific
• coarse grain is highly area-efficient
• extremely high performance


Configurable Computing Systems

• combine a programmable sequential processor with Flexware (structurally programmable "hard"ware):
• capitalize on the strengths of both, flexware and software.
• early 1960s: Estrin (UCLA): enabling technology not available
• 1990s: significant increase of research activities (DARPA ...)
• FPGAs: not the enabling technology: hardware skills needed
• Verilog- or VHDL-based systems often result in poor performance


Super Pipe Networks

The key is mapping, rather than architecture

systolic array:
  applications: regular data dependencies only
  pipeline properties, shape: linear only
  pipeline properties, resources: uniform only
  mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic rDPA*:
  applications, pipeline shape, pipeline resources: no restrictions
  mapping: simulated annealing or P&R algorithm
  scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [1995]


Super Pipe Networks

The key is mapping, rather than architecture

systolic array:
  applications: regular data dependencies only
  pipeline properties, shape: linear only
  pipeline properties, resources: uniform only
  mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic RA*:
  applications, pipeline shape, pipeline resources: no restrictions
  mapping: simulated annealing or P&R algorithm
  scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [ASP-DAC-1995]
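The mapping step named above for the super-systolic case (simulated annealing placement) can be sketched roughly as follows. This is only a minimal illustration, not the KressArray mapper itself: the function names `wire_length` and `anneal_placement`, the Manhattan wire-length cost, the linear cooling schedule, and the swap move are all assumptions made for the example, and routing is not modeled.

```python
import math
import random

def wire_length(placement, edges):
    """Total Manhattan distance over all dataflow edges (assumed cost)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in edges)

def anneal_placement(nodes, edges, rows, cols, steps=5000, t0=2.0, seed=0):
    """Place operators on a rows x cols array (at least 2 cells) by annealing."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("array too small for this graph")
    # occupant[i] is the operator sitting in cells[i], or None if the cell is free
    occupant = list(nodes) + [None] * (len(cells) - len(nodes))
    rng.shuffle(occupant)

    def to_placement(occ):
        return {n: cells[i] for i, n in enumerate(occ) if n is not None}

    cost = wire_length(to_placement(occupant), edges)
    best, best_cost = list(occupant), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9      # linear cooling schedule
        i, j = rng.sample(range(len(cells)), 2)      # pick two cells to swap
        occupant[i], occupant[j] = occupant[j], occupant[i]
        new_cost = wire_length(to_placement(occupant), edges)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(occupant), cost
        else:
            occupant[i], occupant[j] = occupant[j], occupant[i]  # undo the swap
    return to_placement(best), best_cost
```

Calling `anneal_placement(["a", "b", "c"], [("a", "b"), ("b", "c")], rows=2, cols=2)` returns a dictionary from operator name to array cell plus its cost. Because any operator may go to any cell, the sketch places no restriction on shape or data dependencies, in contrast to the linear projection used for classical systolic arrays.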


Xplorer Plot: SNN Filter Example

[13] http://kressarray.de

[Figure: Xplorer plot of the SNN filter mapping. Array: 2 hor. NNports, 32 bit; 3 vert. NNports, 32 bit. Legend: operator, operand, result, route thru, route-thru-only rDPU, backbus connect.]


Dimensions of Reconfigurability
ASIPs* vs. Network Processors
*) Application-Specific Instruction set Processors

Extremes:
class of processor    product        vendor
ASIP                  Tensilica      Tensilica
Network Processor     MECA family    Malleable
                      CALISTO        SiliconSpice
                      many others    many others

configuration time:
design time           ASIP
fabrication time
compile time          statically reconfigurable
run time              dynamically reconfigurable (Network Processor)


KressArray: try it out yourself!

• You may experiment yourself
• You may use it over the internet
• Map an application onto a KressArray
• Start with a simple example
• Visit http://kressarray.de
• Click the link to Xplorer (try Netscape 4.7x)
• ... does not run on Internet Explorer ....
• ... since Bill Gates does not like Java

Communication Resource Requirements

... often Functional Resources are not the Throughput Bottleneck

In some Application Areas, e.g. Wireless Communication, Reconfigurable Computing Arrays need extraordinarily rich and powerful Communication Resources

The Solution: Generators for Domain-specific RA Platforms

Ulrich Nageldinger

Ulrich Nageldinger, Infineon Technologies, Munich

Dissertation Ulrich Nageldinger:
• ... on mapping applications onto KressArrays
• ... simultaneous routing and placement by simulated annealing
• Supporting a huge family of KressArrays
• fuzzy logic improvement proposal generator
• profiling
• design space exploration

Rainer Kress

Rainer Kress, Infineon Technologies, Munich

Dissertation Rainer Kress:
• ... on mapping applications onto his* KressArray
• DPSS datapath synthesis system
• Including a data scheduler (data stream scheduler)
• Generalization of the Systolic Array (KressArray is a super systolic array)
• 32 bit design via Eurochip support

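To make the systolic starting point of that generalization concrete, here is a small cycle-by-cycle simulation of a classical linear systolic FIR filter. It is only a sketch of the textbook case (linear shape, uniform cells, regular data dependencies), not of DPSS or the KressArray; the name `systolic_fir` and the chosen design (weights stay in the cells, the input stream advances one cell every two cycles, partial sums advance one cell per cycle) are assumptions for illustration.

```python
def systolic_fir(weights, stream):
    """Simulate a linear systolic array computing y[n] = sum_k w[k] * x[n-k]."""
    k = len(weights)
    x_line = [0] * (2 * k - 1)   # x_line[i] holds x[t - i]; cell j taps x_line[2*j]
    y_regs = [0] * k             # partial-sum register at the output of each cell
    out = []
    for t in range(len(stream) + k - 1):
        # One new input word enters per cycle; zeros flush the pipeline at the end
        x_in = stream[t] if t < len(stream) else 0
        x_line = [x_in] + x_line[:-1]
        y_prev = y_regs
        # Every cell fires in the same cycle: add own product to the neighbour's sum
        y_regs = [(y_prev[j - 1] if j > 0 else 0) + weights[j] * x_line[2 * j]
                  for j in range(k)]
        if t >= k - 1:           # the first complete result leaves after k-1 cycles
            out.append(y_regs[-1])
    return out

print(systolic_fir([1, 2, 3], [1, 0, 0, 2]))   # [1, 2, 3, 2]
```

The order and timing in which input words enter the array is exactly what a data stream scheduler has to produce; in this fixed design that schedule is trivial (one word per cycle), which is what the restrictions of the classical systolic array buy.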
Jürgen Becker

Jürgen Becker, Professor at Univ. Karlsruhe

Dissertation Jürgen Becker:
• ... Automatically partitioning Co-compiler (configware / software co-compilation)
• Resource-parameter-driven, retargetable
• Profiler-driven optimization
• Accepts HLL "ALE-X" (extended C subset; pointers not supported)

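A profiler-driven, resource-parameter-driven partitioning decision of the kind listed above might look like the following greedy sketch. It is not the co-compiler's actual algorithm: the `Kernel` record, its fields, the `partition` function, the gain-per-rDPU ranking, and the assumption that all configware kernels are resident at once are hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    sw_cycles: int          # profiled cycles when run as software
    cw_cycles: int          # estimated cycles when mapped as configware
    dpus: int               # array cells (rDPUs) the mapping would occupy
    mappable: bool = True   # e.g. False for code outside the accepted subset

def partition(kernels, dpu_budget):
    """Greedily move the most profitable kernels to configware within the budget."""
    candidates = [k for k in kernels
                  if k.mappable and k.dpus > 0 and k.cw_cycles < k.sw_cycles]
    # Rank by cycles saved per occupied rDPU (assumed figure of merit)
    candidates.sort(key=lambda k: (k.sw_cycles - k.cw_cycles) / k.dpus, reverse=True)
    configware, used = [], 0
    for k in candidates:
        if used + k.dpus <= dpu_budget:   # resource parameter of the target array
            configware.append(k.name)
            used += k.dpus
    software = [k.name for k in kernels if k.name not in configware]
    return configware, software
```

Changing `dpu_budget` retargets the same profile to a larger or smaller array, which is the sense in which the decision is resource-parameter-driven in this sketch.
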
Karin Schmidt

Karin Schmidt, DaimlerChrysler Research

Dissertation Karin Schmidt:
• Compilation Techniques for Xputers
• modified loop transformations
• Modified parts of implementation used for Jürgen Becker's Ph. D. thesis
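As a reminder of what a loop transformation does, the sketch below shows plain strip mining, a standard textbook transformation. It is not one of the modified transformations from the dissertation, which the slide does not describe; the function names and the strip width of 4 are assumptions for illustration only.

```python
def strip_mine(n, strip):
    """Yield (start, stop) index ranges that cover range(n) in strips."""
    for start in range(0, n, strip):
        yield start, min(start + strip, n)

def scaled_sum_original(a, b):
    """Original loop: one flat iteration space."""
    return [2 * x + y for x, y in zip(a, b)]

def scaled_sum_strip_mined(a, b, strip=4):
    """Same computation after strip mining into an outer and an inner loop."""
    out = [0] * len(a)
    for start, stop in strip_mine(len(a), strip):   # outer loop: one strip at a time
        for i in range(start, stop):                # inner loop: at most `strip` iterations
            out[i] = 2 * a[i] + b[i]
    return out
```

The inner loop now has a bounded trip count, so its body could be laid out on a fixed number of array cells while the outer loop steps through the data; both versions compute the same result.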


PACT AG

• Xtreme Processor Platform (XPP): family of IP cores; high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports
• Application development support software featuring a flow-graph-style algorithm mapping language, to minimize training requirements
• XPP's fabrics feature automatic DataFlow synchronization and a flagged Event Network to dynamically configure the execution flow
• Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order
• Automatic event-based task swapping along with data streams: released resources are automatically reconfigured immediately
