
Reconfigurable Computing

CS G553

Dr. A. Amalin Prince


BITS - Pilani K K Birla Goa Campus
Department of Electrical and Electronics Engineering

Lecture –30
High-Level Synthesis for Reconfigurable Devices
(Behavioral Synthesis): Spatial and Temporal Partitioning

CS G553 2
High-Level Synthesis

 Fundamental differences in RCS:

1.
o General: binding simply maps operators to resources, and scheduling decides which operator owns a resource at a given time.
o RCS: the architectural resources are created on the reconfigurable device according to the resource types needed by the operators mapped at a given time.
o Uniform resources:
• → Any task can be implemented on a given part of the device (provided the available resources are sufficient).

CS G553 3
General vs. RCS High-Level Synthesis

 Example (the slide shows the corresponding DFG with mul, add, and sub nodes):
x = ((a × b) + (c × d)) + ((c × d) − (e − f))
y = ((c × d) − (e − f)) − ((e − f) + (g − h))

• Assumptions on a "resource-fixed" device:

➢ A multiplication needs 100 basic resource units.
➢ The adder and the subtractor need 50 units each.
➢ Allocation selects one instance of each resource type.
− → Two subtractors cannot be used in the first level.
➢ The adder cannot be used in the first step
− due to a data dependency.
➢ Minimum execution time: 4 steps

CS G553 4
General vs. RCS High-Level Synthesis

Assumptions on a reconfigurable device


o A multiplication needs 100 LUTs.
o Adder/subtractor need 50 LUTs each.
o Total available amount of resources: 200 LUTs.
o The two subtractors can be assigned in the first step.
o Minimum execution time: 3 steps

CS G553 5
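The contrast between the two slides above can be sketched with a small greedy list scheduler. This is only a sketch: the node names, the no-chaining execution model, and the resource counts below are illustrative assumptions, so absolute step counts depend on the model; the point is the relative result, namely that the richer allocation enabled by reconfiguration shortens the schedule.

```python
def schedule(ops, deps, avail):
    """Greedy step-by-step (ASAP list) schedule.
    ops:   {name: resource type}
    deps:  {name: set of predecessor names}
    avail: {resource type: instances usable per step}"""
    done, steps = set(), 0
    while len(done) < len(ops):
        ready = [n for n in ops if n not in done and deps[n] <= done]
        used, fired = {}, []
        for n in ready:
            t = ops[n]
            if used.get(t, 0) < avail[t]:
                used[t] = used.get(t, 0) + 1
                fired.append(n)
        if not fired:
            raise RuntimeError("deadlock: check deps for cycles")
        done |= set(fired)
        steps += 1
    return steps

# DFG of the slide's expressions:
#   x = ((a*b) + (c*d)) + ((c*d) - (e-f))
#   y = ((c*d) - (e-f)) - ((e-f) + (g-h))
ops = {"ab": "mul", "cd": "mul", "ef": "sub", "gh": "sub",
       "s1": "add",  # (a*b) + (c*d)
       "s2": "sub",  # (c*d) - (e-f)
       "s3": "add",  # (e-f) + (g-h)
       "x": "add", "y": "sub"}
deps = {"ab": set(), "cd": set(), "ef": set(), "gh": set(),
        "s1": {"ab", "cd"}, "s2": {"cd", "ef"}, "s3": {"ef", "gh"},
        "x": {"s1", "s2"}, "y": {"s2", "s3"}}

# One instance of each unit (the "resource fixed" allocation):
fixed = schedule(ops, deps, {"mul": 1, "add": 1, "sub": 1})
# A richer allocation, as configurations on the 200-LUT device permit:
rcs = schedule(ops, deps, {"mul": 2, "add": 2, "sub": 2})
print(fixed, rcs)  # the reconfigurable allocation finishes in 3 steps
```

With two subtractors available, both e−f and g−h fire in the first step, exactly the situation the 200-LUT device enables.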
High-Level Synthesis
 Fundamental differences in RCS:
2.
o In general HLS:
• The application is specified as a structure that encapsulates a datapath and a control part.
• The synthesis process allocates resources to operators at different times according to a computed schedule.
• The control part is synthesized.

o In RCS:
• Hardware modules implemented as datapaths compete for execution on the chip.
• A processor controls the selection of hardware modules by means of reconfiguration.
• The same processor also activates the resources in the corresponding hardware accelerators.

Which category does your project fall under?

CS G553 6
Partitioning

CS G553 7
Partitioning - Motivation

 A design is often too big to be implemented on a single FPGA.
 Possible solutions:
o Spatial partitioning: the design is partitioned across many FPGAs. Each partition block is implemented in one single FPGA, and all the FPGAs operate simultaneously.
o Temporal partitioning: the design is partitioned into blocks, each of which is executed on one FPGA at a given time.

CS G553 8
Spatial Partitioning

CS G553 9
Spatial partitioning - Problem

 Partitioning constraints: each FPGA is characterized by:
o its size, i.e., the number of LUTs and FFs available
o its terminals, i.e., the number of I/O pins available on the device
o A partition is valid iff, for every block B produced by the partition:
• S(B) ≤ S(device), where S(X) = size of X
• T(B) ≤ T(device), where T(X) = number of terminals of X

CS G553 10
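The validity test above is mechanical. A minimal sketch, where the sizes and terminal counts are made-up illustrative values:

```python
def is_valid_block(block, device):
    """A block B fits a device iff S(B) <= S(device) and T(B) <= T(device)."""
    return (block["size"] <= device["size"] and
            block["terminals"] <= device["terminals"])

device = {"size": 4000, "terminals": 120}   # e.g. 4000 LUTs, 120 I/O pins
b1 = {"size": 3500, "terminals": 96}        # fits
b2 = {"size": 3500, "terminals": 150}       # too many terminals
print(is_valid_block(b1, device))  # True
print(is_valid_block(b2, device))  # False
```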
Spatial partitioning - Problem

 Objectives: the following objectives are possible:
o Minimize the number of cut nets
o Minimize the number of produced blocks
o Minimize the delay
 A difficult problem, since the constraints are not always compatible.
 Solution approaches:
o Heuristics for automatic partitioning
o Manual intervention

CS G553 11
Spatial Partitioning – Timing – Block Replication

CS G553 17
Temporal Partitioning

CS G553 18
Temporal Partitioning

o Resources on the device are not allocated to a single operator but to a set of operators that must be placed on, and later removed from, the device at the same time.
• The application must therefore be partitioned into sets of operators.
o The partitions are then successively implemented on the device at different times.

Temporal partitioning is challenging. Why?

CS G553 19
CS G553 20
Configuration

 Configuration:
o Given a reconfigurable processing unit H and
o a set of tasks T = {t1, ..., tn} available as cores C = {c1, ..., cn},
o we define the configuration ζi of the RPU at time si to be the set of cores ζi = {ci1, ..., cik} ⊆ C running on H at time si.
 A core (module) ci exists for each ti in the library:
o as a hard, soft, or firm module.

CS G553 21
Schedule

 Schedule:
o a function ς : V → Z+, where ς(vi) denotes the starting time of the node vi that implements task ti.
 Feasible schedule:
o ς is feasible if, ∀ eij = (vi, vj) ∈ E:
ς(tj) ≥ ς(ti) + T(ti) + tij
• eij defines a data dependency between tasks ti and tj,
• tij is the latency of the edge eij,
• T(ti) is the time it takes the node vi to complete execution.

CS G553 22
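The feasibility condition can be checked edge by edge. A sketch with hypothetical tasks t1 and t2 (names and times are assumptions for illustration):

```python
def is_feasible(sigma, T, edges):
    """sigma: task -> start time; T: task -> execution time;
    edges: (ti, tj, latency). Feasible iff, for every edge,
    sigma[tj] >= sigma[ti] + T(ti) + t_ij."""
    return all(sigma[j] >= sigma[i] + T[i] + lat for i, j, lat in edges)

# t1 -> t2 with edge latency 1; t1 runs for 3 time units:
T = {"t1": 3, "t2": 2}
edges = [("t1", "t2", 1)]
print(is_feasible({"t1": 0, "t2": 4}, T, edges))  # True:  4 >= 0 + 3 + 1
print(is_feasible({"t1": 0, "t2": 3}, T, edges))  # False: 3 <  0 + 3 + 1
```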
Ordering Relation

 Ordering relation ≤ among the nodes of G:

o vi ≤ vj ⇔ ∀ schedule ς, ς(vi) ≤ ς(vj).

• ≤ is a partial ordering, as it is not defined for all pairs of nodes in G.

CS G553 23
Partition

 Partition:
o A partition P of the graph G = (V,E) is its division into disjoint subsets P1, ..., Pm such that
∪k=1,…,m Pk = V

 Feasible partition:
o A partition is feasible with respect to a reconfigurable device H with area a(H) and pin count p(H) if:
o ∀ Pk ∈ P: a(Pk) = ∑vi∈Pk ai ≤ a(H)
o ∑eij∈E wij ≤ p(H)
• for crossing edges eij
 Crossing edge:
o an edge that connects a component inside a partition block with a component outside of it.

CS G553 24
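A sketch of this feasibility test, reading the pin constraint as applying to the crossing edges of each block (the node areas, edge weights, and limits below are illustrative assumptions):

```python
def is_feasible_partition(parts, area, edges, a_H, p_H):
    """parts: list of node sets; area: node -> area; edges: (u, v, weight).
    Feasible iff every block fits within the device area a(H) and the
    total weight of each block's crossing edges stays within p(H)."""
    block_of = {v: i for i, blk in enumerate(parts) for v in blk}
    for i, blk in enumerate(parts):
        if sum(area[v] for v in blk) > a_H:
            return False
        # a crossing edge has exactly one endpoint inside block i
        crossing = sum(w for u, v, w in edges
                       if (block_of[u] == i) != (block_of[v] == i))
        if crossing > p_H:
            return False
    return True

area = {"a": 30, "b": 40, "c": 50}
parts = [{"a", "b"}, {"c"}]
edges = [("a", "c", 4), ("b", "c", 3)]       # both edges cross the cut
print(is_feasible_partition(parts, area, edges, a_H=100, p_H=8))  # True
print(is_feasible_partition(parts, area, edges, a_H=100, p_H=6))  # False
```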
Run Time

 Run time r(Pi) of a partition:

o the maximum time from the input of the data to the output of the result.

CS G553 25
Ordering Relation

 Ordering relation for partitions:

o Pi ≤ Pj ⇔ ∀ vi ∈ Pi, ∀ vj ∈ Pj:
• either vi ≤ vj
• or vi and vj are not in relation.

 Ordered partitions:
o A partitioning P is ordered ⇔ an ordering relation ≤ exists on P.

o If P is ordered, then for any pair of partitions, one can always be implemented after the other with respect to any scheduling relation.

CS G553 26
Temporal Partitioning

 Temporal partitioning:
o Given a DFG G = (V,E) and a reconfigurable device H, a temporal
partitioning of G on H is an ordered partitioning P of G with respect
to H.

CS G553 27
Temporal Partitioning

o Cycles are not allowed in the DFG.
• Otherwise, the resulting partition may not be schedulable on the device.

CS G553 28
Temporal partitioning

 Goal:
o Computation and scheduling of a Configuration graph
 A configuration graph is a graph in which:
o Nodes are partitions or bitstreams
o Edges reflect the precedence constraints in a given DFG

Figure: a configuration graph with partitions P1–P5.

CS G553 29
Temporal partitioning

• Formal definition:
➢ Given a DFG G = (V,E) and a temporal partitioning P = {P1, ..., Pn} of G, we define the configuration graph of G relative to P, denoted Γ(G/P) = (P, EP), in which the nodes are the partitions in P. An edge eP = (Pi, Pj) ∈ EP ⇔ ∃ e = (vi, vj) ∈ E with vi ∈ Pi and vj ∈ Pj.
• Configuration:
➢ For a given partitioning P, each node Pi ∈ P has an associated configuration ζi, which is the implementation of Pi on the given device H.

CS G553 30
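The construction of Γ(G/P) follows directly from the definition above. A minimal sketch (node names are illustrative; intra-partition edges are dropped, since only inter-partition precedence matters here):

```python
def configuration_graph(edges, parts):
    """edges: DFG edges (vi, vj); parts: ordered list of partition node sets.
    Returns the edge set of Gamma(G/P): partitions Pi and Pj are connected
    iff some DFG edge runs from a node in Pi to a node in Pj."""
    block_of = {v: i for i, blk in enumerate(parts) for v in blk}
    return {(block_of[u], block_of[v]) for u, v in edges
            if block_of[u] != block_of[v]}

# a -> b, b -> c, a -> c, partitioned as P0 = {a}, P1 = {b, c}:
print(configuration_graph([("a", "b"), ("b", "c"), ("a", "c")],
                          [{"a"}, {"b", "c"}]))  # {(0, 1)}
```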
Temporal partitioning
 Whenever a new partition is downloaded, the partition that was running is destroyed.
o Communication is done through inter-configuration registers (or a communication memory):
• They may sit in main memory.
• They may sit at the boundary of the device to hold the input and output values.
o The configuration sequence is controlled by the host processor.

Figure: an FPGA whose IO registers connect via a bus to a processor and a memory block; the device's registers are mapped into the processor's address space. (Communication memory synthesis?)

CS G553 31
Temporal partitioning

 Steps (for partitions Pi and Pj with Pi ≤ Pj):

1. The configuration for Pi is first downloaded into the device.
2. Pi executes.
3. Pi copies all the data it needs to send to other partitions into the communication memory.
4. The device is reconfigured to implement the partition Pj.
5. Pj accesses the communication memory and collects the data.

CS G553 32
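The five steps above can be simulated with the communication memory as a plain dictionary. This is a toy sketch: the partition behaviors are stand-in functions, not real configurations, and all names are assumptions for illustration.

```python
comm_mem = {}  # inter-configuration registers / communication memory

def run_partition(compute, reads, writes):
    """Pretend-download a configuration, execute it, copy results out."""
    inputs = {k: comm_mem[k] for k in reads}          # step 5: collect inputs
    results = compute(inputs)                          # step 2: execute
    comm_mem.update({k: results[k] for k in writes})   # step 3: copy out

# Pi produces u; the device is then "reconfigured" and Pj consumes u.
comm_mem["x0"] = 7
run_partition(lambda d: {"u": d["x0"] * 2}, reads=["x0"], writes=["u"])
run_partition(lambda d: {"y": d["u"] + 1}, reads=["u"], writes=["y"])
print(comm_mem["y"])  # 15
```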
Temporal partitioning
 Objectives for optimization:
1. Number of interconnections: very important, since minimizing it reduces:
➢ the amount of exchanged data
➢ the amount of memory for temporarily storing the data
2. Number of produced blocks (partitions):
➢ reduces the number of reconfigurations (and hence the total time)
3. Overall computation delay, which depends on:
➢ the partition run times
➢ the processor used for reconfiguration
➢ the speed of data exchange
4. Similarity between consecutive partitions (for partial reconfiguration)
5. Overall amount of wasted resources on the chip:
➢ When components with shorter run times are placed in the same partition as components with longer run times, the shorter-running components remain idle for a long period of time.

CS G553 33
Wasted Resources

 Wasted resource wr(vi) of a node vi:
o the unused area occupied by the node vi during the computation of a partition:
wr(vi) = (t(Pi) − T(ti)) × ai
t(Pi): run time of partition Pi
T(ti): run time of the component vi
ai: area of vi
 Wasted resource wr(Pi) of a partition Pi = {vi1, ..., vin}:
wr(Pi) = ∑j=1,…,n wr(vij)
 Wasted resource of a partitioning P:
wr(P) = ∑j=1,…,k wr(Pj)

CS G553 34
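These three formulas compose directly. A sketch, taking t(Pi) as the longest node run time in the partition (an assumption; the actual partition run time could also include communication overhead), with illustrative node values:

```python
def wasted(partitions, t_node, area):
    """partitions: list of node lists; t_node: node -> run time;
    area: node -> area. Implements wr(v) = (t(P) - T(v)) * a(v),
    summed per partition and over the whole partitioning."""
    total = 0
    for P in partitions:
        t_P = max(t_node[v] for v in P)             # partition run time
        total += sum((t_P - t_node[v]) * area[v] for v in P)
    return total

# Two nodes share a partition: the faster one (v2) idles for 3 time units,
# wasting 3 * 40 = 120 area-time units.
print(wasted([["v1", "v2"]], {"v1": 5, "v2": 2}, {"v1": 100, "v2": 40}))  # 120
```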
Communication Overhead

Communication cost is modelled via graph connectivity.

Connectivity of a graph G = (V,E):
con(G) = 2·|E| / (|V|² − |V|)
o |V|² − |V| is the number of ordered node pairs, so con(G) is the fraction of possible edges that are present.

Figure: a 10-node example graph with connectivity = 0.24

CS G553 35
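The connectivity formula is a one-liner. A sketch with a complete 3-node graph as the example (the slide's 10-node graph is not reproduced here):

```python
def connectivity(n_nodes, edges):
    """con(G) = 2|E| / (|V|^2 - |V|): fraction of possible edges present."""
    return 2 * len(edges) / (n_nodes * n_nodes - n_nodes)

# A triangle contains all 3 possible edges, so its connectivity is 1.0:
print(connectivity(3, [(1, 2), (2, 3), (1, 3)]))  # 1.0
```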
Communication Overhead

Quality of a partitioning P = {P1, ..., Pn}:

o Average connectivity over P:
Q(P) = (1/n) ∑i=1,…,n con(Pi)

o High quality means the algorithm performs well; low quality means it performs poorly.

Figure: the 10-node graph (connectivity = 0.24) and two partitionings of it, with quality 0.25 and 0.45.

CS G553 36
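Q(P) averages the per-block connectivities. A sketch, with each block given as a (node count, edge list) pair; the two small blocks below are illustrative, not the slide's example:

```python
def quality(partitions):
    """Q(P): average connectivity over the blocks of P.
    partitions: list of (n_nodes, edge_list) pairs, one per block."""
    cons = [2 * len(e) / (n * n - n) for n, e in partitions]
    return sum(cons) / len(cons)

# Two blocks: a triangle (con = 1.0) and a 3-node path (con = 2/3):
print(quality([(3, [(1, 2), (2, 3), (1, 3)]),
               (3, [(4, 5), (5, 6)])]))  # (1.0 + 2/3) / 2
```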
Communication Overhead

• Communication overhead is minimized by minimizing the weighted sum of crossing edges among the partitions. This
− → minimizes the size of the communication memory and
− → minimizes the communication time.
• Heuristic:
➢ Place highly connected components in the same partition (high-quality partitioning).

CS G553 37
The End

 Questions ?

 Thank you for your attention

CS G553 38
