Reconfigurable Computing

CS G553

Dr. A. Amalin Prince


BITS - Pilani K K Birla Goa Campus
Department of Electrical and Electronics Engineering

Lecture – 31
High-Level Synthesis for Reconfigurable Devices
(Behavioral Synthesis): Temporal Partitioning Algorithms

Temporal partitioning & Scheduling

● Scheduling:
o Inputs:
• A DFG
• An architecture (i.e. a set of processing elements)
o Output:
• Starting time of each node on a given resource
● Temporal partitioning:
o Input:
• A DFG
• A reconfigurable device
o Output:
• A set of partitions
• Starting time of each node is the starting time of the partition to which it
belongs

Temporal partitioning & Scheduling

● Solution approaches:
o List scheduling
o Integer linear programming (exact method)
o Network flow
o Spectral method
• Recursive bi-partitioning approaches

Unconstrained Scheduling

● Unconstrained scheduling:
o Assumption: unlimited amount of resources
• Device with unlimited size

o Usually used as a pre-processing step for other algorithms
• e.g. computation of the upper and lower bounds on the starting time of operations:
o Lower bound: the earliest time at which a module can be scheduled,
o Upper bound: the latest time at which a module can be started.

Unconstrained Scheduling

● ASAP (as soon as possible)
o Defines the earliest starting time for each node in the DFG
o Computes the minimal latency
● ALAP (as late as possible)
o Defines the latest starting time for each node in the DFG, according to a given latency
● The mobility of a node:
o (ALAP starting time) − (ASAP starting time)
o Mobility = 0 → the node is on a critical path

ASAP Algorithm

ASAP(G(V,E), d) {
  FOREACH (vi without predecessor)
    s(vi) := 0;
  REPEAT {
    choose a node vi whose predecessors are all planned;
    s(vi) := max_{j:(vj,vi)∈E} {s(vj) + dj};
  }
  UNTIL (all nodes vi are planned);
  RETURN s;
}
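A minimal Python sketch of the ASAP pseudocode, assuming unit delays. The 11-node DFG (`OPS`, `EDGES`) is a hypothetical reconstruction and is not given on the slides: it is the classic differential-equation example, chosen because its ASAP schedule matches the figure (four multiplications and one addition at time 0, latency 4).

```python
from collections import defaultdict

# Hypothetical DFG (assumption): node -> operation, and data-dependency edges.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]


def asap(nodes, edges, d):
    """Return s[v], the earliest starting time of each node (graph must be acyclic)."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    # Nodes without predecessors start at time 0.
    s = {v: 0 for v in nodes if not preds[v]}
    # Repeatedly plan nodes whose predecessors are all planned.
    while len(s) < len(nodes):
        for v in nodes:
            if v not in s and all(u in s for u in preds[v]):
                s[v] = max(s[u] + d[u] for u in preds[v])
    return s


d = {v: 1 for v in OPS}  # unit delay for every operation (assumption)
s = asap(list(OPS), EDGES, d)
latency = max(s[v] + d[v] for v in OPS)
print(s)
print("latency:", latency)
```

Each pass of the `while` loop plans every node whose predecessors are already planned, which mirrors the REPEAT/UNTIL structure of the pseudocode.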

ASAP Example

Unconstrained scheduling with optimal latency: L = 4

Time 0: * * * * +
Time 1: * * + <
Time 2: -
Time 3: -
Time 4

ASAP Example

● Assumptions:
o Multiplication: latency of 100 clocks,
o Addition/subtraction: 50 clocks,
o Data transmission delay is neglected.

[Figure: DFG annotated with the computation delay of the previous node and each node's starting time as computed by the ASAP algorithm]
ALAP Algorithm

ALAP(G(V,E), d, L) {
  FOREACH (vi without successor)
    s(vi) := L - di;
  REPEAT {
    choose a node vi whose successors are all planned;
    s(vi) := min_{j:(vi,vj)∈E} {s(vj)} - di;
  }
  UNTIL (all nodes vi are planned);
  RETURN s;
}
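The ALAP pseudocode can be sketched the same way. This is a minimal Python version assuming unit delays and the same hypothetical DFG reconstruction (`OPS`, `EDGES`, not taken from the slides); with L = 4 it yields the schedule of the ALAP example on the next slide.

```python
from collections import defaultdict

# Hypothetical DFG (assumption): node -> operation, and data-dependency edges.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]


def alap(nodes, edges, d, L):
    """Return s[v], the latest starting time of each node for a total latency L."""
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
    # Nodes without successors start at L - d so that they finish at time L.
    s = {v: L - d[v] for v in nodes if not succs[v]}
    # Repeatedly plan nodes whose successors are all planned.
    while len(s) < len(nodes):
        for v in nodes:
            if v not in s and all(w in s for w in succs[v]):
                s[v] = min(s[w] for w in succs[v]) - d[v]
    return s


d = {v: 1 for v in OPS}  # unit delays (assumption)
s = alap(list(OPS), EDGES, d, L=4)
print(s)
```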

ALAP Example

Unconstrained scheduling with optimal latency: L = 4

Time 0: * *
Time 1: * *
Time 2: - * * +
Time 3: - + <
Time 4
Mobility

ALAP schedule, each operation annotated with its mobility in parentheses:
Time 0: * (0)  * (0)
Time 1: * (0)  * (1)
Time 2: - (0)  * (1)  * (2)  + (2)
Time 3: - (0)  + (2)  < (2)
Time 4
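Mobility follows directly from the two schedules. A minimal sketch, again assuming unit delays and the hypothetical reconstructed DFG (`OPS`, `EDGES` are not from the slides). Instead of the REPEAT loops it walks one topological order forwards (ASAP) and backwards (ALAP), using `graphlib.TopologicalSorter` from the Python 3.9+ standard library; the resulting mobility values match those in the figure.

```python
from graphlib import TopologicalSorter

# Hypothetical DFG (assumption): node -> operation, and data-dependency edges.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]


def asap_alap(ops, edges, d, L):
    """Compute ASAP and ALAP starting times with one topological pass each."""
    preds = {v: [] for v in ops}
    succs = {v: [] for v in ops}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first
    asap = {}
    for v in order:
        asap[v] = max((asap[u] + d[u] for u in preds[v]), default=0)
    alap = {}
    for v in reversed(order):  # successors first
        alap[v] = min((alap[w] for w in succs[v]), default=L) - d[v]
    return asap, alap


d = {v: 1 for v in OPS}  # unit delays (assumption)
asap, alap = asap_alap(OPS, EDGES, d, L=4)
mobility = {v: alap[v] - asap[v] for v in OPS}
critical = sorted(v for v in OPS if mobility[v] == 0)
print(mobility)
print("critical path nodes:", critical)
```

Nodes with mobility 0 form the critical path; under these assumptions that is the chain of three multiplications followed by the two subtractions.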
ALAP Example

● Assumptions:
o Multiplication: latency of 100 clocks,
o Addition/subtraction: 50 clocks,
o Overall computation time: 250
[Figure: DFG annotated with the computation delay of the previous node and each node's starting time as computed by the ALAP algorithm]
Constrained Scheduling

● Constrained scheduling:
o A fixed set of resources is available (ASIC).
o Many tasks compete for a given resource:
• → one of them must be chosen according to a given criterion, and the rest are scheduled later.
1. Extended ASAP, ALAP:
o Compute ASAP or ALAP.
o Assign the tasks earlier (ASAP) or later (ALAP) until the resource constraints (e.g. area) are fulfilled.
Extended ASAP
● Constraint:
o 2 multipliers, 2 ALUs (+, −, <)

Time 0: * * +
Time 1: * * <
Time 2: - * *
Time 3: - +
Time 4
Constrained Scheduling

● List scheduling:
o Sort nodes in topological order
o Assign priority to nodes
o Criteria can be:
• number of successors,
• depth (length of the longest path from the inputs),
• latency-weighted depth (path length where each node is weighted by w, the latency of its operation),
• mobility,
• connectivity,
• ...
Constrained Scheduling

● At any time step t:
o A ready set L is constructed (operations ready to be scheduled):
• L: operations whose predecessors have already been scheduled early enough to complete their execution by time t.
o Tasks are placed in L in decreasing order of priority.
o At a given step, a free resource is assigned the task with the highest priority.
Constrained Scheduling
o At a given step, a free resource is assigned the task with the highest priority.
[Flowchart: Are there enough resources of type k to implement all the operations of type k? Yes → assign resources to all these operations. No → assign resources to the high-priority operations.]
Constrained Scheduling (Example)

● Criterion: number of successors
● Resources: 1 multiplier, 1 ALU (+, -, <)

DFG, each operation labeled with its number of successors:
* (3)  * (3)  * (2)  * (1)  + (1)
* (2)  * (1)  + (0)  < (0)
- (1)
- (0)
Constrained Scheduling (Example)
Time 0: * +
Time 1: * <
Time 2: *
Time 3: * -
Time 4: *
Time 5: * -
Time 6: +
Time 7
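The list-scheduling loop (ready set, priorities, a fixed number of resources per type) can be sketched in Python as below. Assumptions that are not on the slides: every operation takes one time step, the DFG (`OPS`, `EDGES`) is a hypothetical reconstruction, and ties in priority are broken by the lower node number. With the number of (transitive) successors as priority and 1 multiplier plus 1 ALU, it reproduces the seven-step schedule of the example.

```python
# Hypothetical DFG (assumption): node -> operation, and data-dependency edges.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
RTYPE = {"*": "mult", "+": "alu", "-": "alu", "<": "alu"}  # operation -> resource type


def num_successors(ops, edges):
    """Priority: number of nodes reachable from each node."""
    succs = {v: [] for v in ops}
    for u, v in edges:
        succs[u].append(v)
    count = {}
    for v in ops:
        seen, stack = set(), list(succs[v])
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(succs[w])
        count[v] = len(seen)
    return count


def list_schedule(ops, edges, capacity, priority):
    """Resource-constrained list scheduling; every operation takes one time step."""
    preds = {v: [] for v in ops}
    for u, v in edges:
        preds[v].append(u)
    start, t = {}, 0
    while len(start) < len(ops):
        # Ready set: unscheduled ops whose predecessors have completed by time t.
        ready = [v for v in ops if v not in start
                 and all(u in start and start[u] + 1 <= t for u in preds[v])]
        ready.sort(key=lambda v: (-priority[v], v))  # highest priority first
        used = {k: 0 for k in capacity}  # unit delay: all resources are free again
        for v in ready:
            k = RTYPE[ops[v]]
            if used[k] < capacity[k]:
                start[v] = t
                used[k] += 1
        t += 1
    return start


prio = num_successors(OPS, EDGES)
start = list_schedule(OPS, EDGES, {"mult": 1, "alu": 1}, prio)
for step in range(max(start.values()) + 1):
    print("Time", step, [OPS[v] for v in sorted(start) if start[v] == step])
```

Resetting `used` at every step is only valid because of the unit-delay assumption; with multi-cycle operations a resource would have to stay busy until its operation finishes.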
List Scheduling: Example
● Resources: 1 multiplier, 1 adder
● Latency:
o Multiplication: 100 clocks,
o Add/sub: 50 clocks
[Figure: list schedule with the number of successors as priority; the figure shows the value 400]
Temporal Partitioning in RCS

Temporal Partitioning vs.
Constrained Scheduling
● In RCS,
o Resource types are not important.
• The amount of basic resources is what matters.
o Operators do not compete for resources.
• They compete for area.
o Usually only the starting time and the end time of the complete partition are considered.
Temporal Partitioning in RCS

● Temporal partitioning:
o Same as list scheduling
o Assignment criterion: there must be enough space left on the device to accommodate the new component.
Algorithm: List-scheduling algorithm for reconfigurable devices
sort the nodes of V according to their priorities
P0 := Ø
while V ≠ Ø do
  select a vertex v ∈ V with the highest priority whose predecessors are all placed
  if (a partition Pi exists with s(Pi) + s(v) ≤ s(H)) then
    Pi := Pi ∪ {v}
  else
    create a new partition Pi+1 and set Pi+1 := {v}
  end if
  V := V \ {v}
end while
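A minimal Python sketch of list-scheduling-based temporal partitioning. It reads the algorithm as: place the highest-priority ready node that still fits into the current partition, skip nodes that do not fit, and open a new partition once no ready node fits. This interpretation, the tie-breaking by node number and the reconstructed DFG (`OPS`, `EDGES`) are assumptions. The sizes are those of the example on the next slide (device 250, multiplier 100, adder/subtractor 20, comparator 10).

```python
# Hypothetical DFG (assumption): node -> operation, and data-dependency edges.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
SIZE = {"*": 100, "+": 20, "-": 20, "<": 10}  # module sizes from the example


def num_successors(ops, edges):
    """Priority: number of nodes reachable from each node."""
    succs = {v: [] for v in ops}
    for u, v in edges:
        succs[u].append(v)
    count = {}
    for v in ops:
        seen, stack = set(), list(succs[v])
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(succs[w])
        count[v] = len(seen)
    return count


def temporal_partition(ops, edges, size, capacity):
    """Greedy list-scheduling-based temporal partitioning."""
    preds = {v: set() for v in ops}
    for u, v in edges:
        preds[v].add(u)
    prio = num_successors(ops, edges)
    order = sorted(ops, key=lambda v: (-prio[v], v))  # highest priority first
    placed, partitions = set(), []
    while len(placed) < len(ops):
        current, used, added = [], 0, True
        while added:
            added = False
            for v in order:
                # Ready: all predecessors placed (here or in an earlier partition).
                if (v not in placed and preds[v] <= placed
                        and used + size[ops[v]] <= capacity):
                    current.append(v)
                    placed.add(v)
                    used += size[ops[v]]
                    added = True
                    break  # rescan: v may have made a higher-priority node ready
        if not current:
            raise ValueError("a node does not fit on the device")
        partitions.append(current)
    return partitions


parts = temporal_partition(OPS, EDGES, SIZE, capacity=250)
for i, p in enumerate(parts, start=1):
    print(f"P{i}:", p, "area:", sum(SIZE[OPS[v]] for v in p))
```

Under these assumptions the result is P1 = {1, 2, 10, 11} (area 230), P2 = {3, 6, 4} (area 220) and P3 = {7, 8, 5, 9} (area 240).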
Temporal Partitioning vs.
Constrained Scheduling

[Figure: the DFG partitioned into P1, P2 and P3; each node is labeled with its number of successors]
● Criterion: number of successors
● size(FPGA) = 250, size(mult) = 100, size(add) = size(sub) = 20, size(comp) = 10
● Connectivity: c(P1) = 1/6, c(P2) = 1/3, c(P3) = 2/6
● Quality: 0.28
Improvement

● Best criterion:
o Total computation time of the DFG:

$t_{DFG} = n \times C_H + \sum_{i=1}^{n} t_{P_i}$

o $C_H$: reconfiguration time of device H
o $t_{P_i}$: computation time of partition $P_i$
o n: number of partitions
● Optimization:
o If $C_H$ is too large, the optimization tends to minimize the number of partitions.
o If $C_H \ll t_{P_i}$, the algorithm tends to avoid long paths within partitions.
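The total-time criterion can be evaluated for a given partitioning. In this sketch the computation time $t_{P_i}$ of a partition is taken as its longest delay-weighted path using only edges inside the partition, assuming data from earlier partitions is already available in memory. The reconstructed DFG, the two partitionings (the LS-based one and the improved one from the following slides, as we read their figures), the comparator delay of 50 clocks and the values of `c_h` are all assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical DFG edges and per-node delays (assumptions: mult = 100, add/sub/compare = 50).
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
DELAY = {1: 100, 2: 100, 3: 100, 4: 50, 5: 50, 6: 100,
         7: 100, 8: 100, 9: 50, 10: 50, 11: 50}


def partition_time(part, edges, delay):
    """Longest delay-weighted path using only the edges inside one partition."""
    inside = set(part)
    preds = {v: [u for u, w in edges if w == v and u in inside] for v in part}
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        finish[v] = max((finish[u] for u in preds[v]), default=0) + delay[v]
    return max(finish.values())


def t_dfg(partitions, edges, delay, c_h):
    """t_DFG = n * C_H + sum of the partition computation times."""
    n = len(partitions)
    return n * c_h + sum(partition_time(p, edges, delay) for p in partitions)


ls_parts = [[1, 2, 10, 11], [3, 6, 4], [7, 8, 5, 9]]
improved_parts = [[1, 8, 9, 10, 11], [2, 3, 4], [6, 7, 5]]
for c_h in (0, 1000):
    print(c_h, t_dfg(ls_parts, EDGES, DELAY, c_h), t_dfg(improved_parts, EDGES, DELAY, c_h))
```

Under these assumptions the LS-based partitions need 400 clocks of computation and the improved ones 650, so better connectivity does not automatically give a shorter $t_{DFG}$. Both use three partitions, so $C_H$ adds the same $3 \times C_H$ to each.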
Improvement

● Advantage of LS-based temporal partitioning:
o Fast (linear-time algorithm)
o → Local optimization is possible
• e.g. configuration switching
[Figure: DFG of *, +, - and / operations arranged in levels 0 to 3]
• Disadvantage:
➢ Levelization:
− Modules are assigned to partitions based more on their level number than on their interconnectivity with other components.
➢ Interconnectivity (data exchange) must be optimized.
LS-Based Temporal Partitioning

[Figure: the LS-based partitioning into P1, P2 and P3 (criterion: number of successors; size(FPGA) = 250, size(mult) = 100, size(add) = size(sub) = 20, size(comp) = 10); connectivity c(P1) = 1/6, c(P2) = 1/3, c(P3) = 2/6; quality 0.28]
Improved Temporal Partitioning

[Figure: improved partitioning of the same DFG into P1, P2 and P3]
● Connectivity: c(P1) = 2/10, c(P2) = 2/3, c(P3) = 2/3
● Quality: 0.51
● The quality is better than that of the LS-based partitioning (0.28).
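The connectivity and quality values can be recomputed with the definitions below, which we infer from the numbers on the slides (they are not stated there): c(P) is the number of DFG edges inside P divided by the number of node pairs |P|(|P| − 1)/2, and the quality is the average connectivity over all partitions. The edge list and the partition contents are our reading of the figures and should be treated as assumptions.

```python
# Hypothetical DFG edges (assumption).
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]


def connectivity(part, edges):
    """Edges inside the partition divided by the number of node pairs."""
    n = len(part)
    if n < 2:
        return 0.0
    inside = set(part)
    internal = sum(1 for u, v in edges if u in inside and v in inside)
    return internal / (n * (n - 1) / 2)


def quality(partitions, edges):
    """Average connectivity over all partitions."""
    return sum(connectivity(p, edges) for p in partitions) / len(partitions)


ls_parts = [[1, 2, 10, 11], [3, 6, 4], [7, 8, 5, 9]]
improved_parts = [[1, 8, 9, 10, 11], [2, 3, 4], [6, 7, 5]]
print(round(quality(ls_parts, EDGES), 2))        # 0.28
print(round(quality(improved_parts, EDGES), 2))  # 0.51
```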
Improved List Scheduling
● Pairwise interchange
Temporal partitioning – ILP
● With ILP (Integer Linear Programming),
o the temporal partitioning constraints are formulated as equations;
o the equations are then solved using an ILP solver.
● The constraints usually considered are:
o Uniqueness constraint
o Temporal order constraint
o Memory constraint
o Resource constraint
o Latency constraint
● Notations:
$y_{vi} = 1 \Leftrightarrow v \in P_i$
$w_{uv} \neq 0 \Leftrightarrow (u \in P_i) \wedge (v \in P_j) \wedge (P_i \neq P_j)$
Temporal partitioning – ILP
● Unique assignment constraint: each task must be placed in exactly one partition (m = number of partitions):
$\forall v \in V: \sum_{i=1}^{m} y_{vi} = 1$

● Precedence constraint: for each edge e = (u, v) in the graph, u must be placed either in the same partition as v or in an earlier partition than the one in which v is placed:
$\forall (u, v) \in E: \sum_{i=1}^{m} i \cdot y_{ui} \le \sum_{i=1}^{m} i \cdot y_{vi}$
Temporal partitioning – ILP

● Resource constraint: the sum of the resources needed to implement the modules in one partition must not exceed the total amount of available resources.
• Device area constraint (s):
$\forall P_i \in P: \sum_{v \in V} y_{vi} \, s(v) \le s(\text{device})$
• Device terminal constraint (T: size of the communication memory):
$\forall P_i \in P: \sum_{(u \in P_i) \wedge (v \notin P_i)} w_{uv} + \sum_{(u \notin P_i) \wedge (v \in P_i)} w_{uv} \le T(\text{device})$
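Before handing the model to an ILP solver, it can help to check a candidate assignment against the constraints. The hypothetical helper below (not part of any solver API) evaluates the uniqueness, precedence, area and, optionally, terminal constraints for given 0/1 variables y[v][i]. The reconstructed DFG, the module sizes (taken from the temporal-partitioning example: device 250, multiplier 100, adder/subtractor 20, comparator 10) and the uniform 32-bit width per edge are assumptions.

```python
# Hypothetical DFG (assumption) and module sizes from the partitioning example.
OPS = {1: "*", 2: "*", 3: "*", 4: "-", 5: "-", 6: "*",
       7: "*", 8: "*", 9: "+", 10: "+", 11: "<"}
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
SIZE = {"*": 100, "+": 20, "-": 20, "<": 10}


def to_y(partitions):
    """Build the 0/1 variables y[v][i] (partitions numbered from 1)."""
    m = len(partitions)
    return {v: {k: int(k == i) for k in range(1, m + 1)}
            for i, part in enumerate(partitions, start=1) for v in part}


def check(y, edges, s, s_device, width=None, t_device=None):
    """Return the list of violated constraints (empty if y is feasible)."""
    m = len(next(iter(y.values())))

    def index(v):  # partition number of v: sum_i i * y_vi
        return sum(i * y[v][i] for i in y[v])

    bad = []
    for v in y:  # uniqueness: sum_i y_vi = 1
        if sum(y[v].values()) != 1:
            bad.append(("uniqueness", v))
    for u, v in edges:  # precedence: sum_i i*y_ui <= sum_i i*y_vi
        if index(u) > index(v):
            bad.append(("precedence", (u, v)))
    for i in range(1, m + 1):
        # Area: sum_v y_vi * s(v) <= s(device)
        if sum(y[v][i] * s[v] for v in y) > s_device:
            bad.append(("area", i))
        # Terminals: data on edges entering or leaving P_i
        if width is not None and t_device is not None:
            cut = width * sum(1 for u, v in edges if y[u][i] != y[v][i])
            if cut > t_device:
                bad.append(("terminal", i))
    return bad


s = {v: SIZE[OPS[v]] for v in OPS}
good = to_y([[1, 2, 10, 11], [3, 6, 4], [7, 8, 5, 9]])
swapped = to_y([[3, 2, 10, 11], [1, 6, 4], [7, 8, 5, 9]])
print(check(good, EDGES, s, 250))                          # []
print(check(swapped, EDGES, s, 250))                       # [('precedence', (1, 3))]
print(check(good, EDGES, s, 250, width=32, t_device=100))  # [('terminal', 2)]
```

A real ILP model would hand the same inequalities to a solver instead of checking one fixed assignment; the helper is only meant to make each constraint concrete.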
Temporal partitioning – ILP

● Communication memory constraint: the total amount of data to be temporarily saved must not exceed the size of the communication memory used. This constraint is captured by the following equation.
Temporal partitioning by ILP: Example

● Assignment constraint:
o y11 + y12 + y13 = 1
o y21 + y22 + y23 = 1
o …
o y71 + y72 + y73 = 1

o Partition P1:
o y22 = y23 = 0, y21 = 1
o y32 = y33 = 0, y31 = 1
o y42 = y43 = 0, y41 = 1

o Partition P2:
o y11 = y13 = 0, y12 = 1
o y51 = y53 = 0, y52 = 1
o y61 = y63 = 0, y62 = 1

o Partition P3:
o y71 = y72 = 0, y73 = 1
Temporal partitioning by ILP: Example

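● Precedence constraint: for each edge (u, v) of the DFG,
$\sum_{i=1}^{3} i \cdot y_{ui} \le \sum_{i=1}^{3} i \cdot y_{vi}$
o e.g. for the edge (1, 7): $y_{11} + 2y_{12} + 3y_{13} \le y_{71} + 2y_{72} + 3y_{73}$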
Temporal partitioning by ILP: Example

● Resource constraint:
o Device with a size of 200 LUTs; 100 LUTs for the multiplication and 50 LUTs each for the addition and the comparison
$\sum_{u \in P_1} s(u) \le 200$
$\sum_{u \in P_2} s(u) \le 200$
$\sum_{u \in P_3} s(u) \le 200$
Temporal partitioning by ILP: Example

● Communication memory constraint:
o Assume that a memory of 50 bytes is available for communication and that each datum is 32 bits wide.
o For P2 to P3: $w_{67} + w_{17} = 2 \times 32 = 64 \text{ bits} \le 8 \times 50 = 400 \text{ bits}$
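The communication-memory arithmetic can be checked with a small sketch. Assumption: at each reconfiguration boundary, every edge from a node in the current or an earlier partition to a node in a later partition carries one 32-bit datum that must be kept in memory. Only the two edges named on the slide, (1, 7) and (6, 7), are included; the rest of this example DFG is not given, so only the P2 to P3 value is meaningful here.

```python
def boundary_bits(partitions, edges, width):
    """Bits kept in memory at each boundary (index 0: P1 -> P2, index 1: P2 -> P3, ...)."""
    where = {v: i for i, part in enumerate(partitions) for v in part}
    return [sum(width for u, v in edges if where[u] <= b < where[v])
            for b in range(len(partitions) - 1)]


partitions = [[2, 3, 4], [1, 5, 6], [7]]  # P1, P2, P3 from the assignment example
edges = [(1, 7), (6, 7)]                  # only the edges named on the slide
memory_bits = 8 * 50                      # 50-byte communication memory
bits = boundary_bits(partitions, edges, width=32)
print(bits[1], "<=", memory_bits, bits[1] <= memory_bits)  # 64 <= 400 True
```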
The End

● Questions?

● Thank you for your attention
