Lect 5
Stephen A. Edwards
Forrest Brewer
Ryan Kastner
Philosophy of Dataflow Languages
• Drastically different way of looking at computation
[Figure: Process 1, Process 2, and Process 3 connected by three FIFO buffers]
Dataflow Communication
• Communication only through buffers
- No side effects (or shared memory)
• Buffers are unbounded for simplicity
- Causes model complexity issues
• The token sequence written into a link is the sequence read out of it
- Links are strictly FIFO
• Destructive read: reading a value from a buffer
removes the value
- Cannot ‘check’ to see new token without read
• Unlike shared memory, can always determine latency
Applications of Dataflow Models
• Poor fit for a word processor
- Dataflow models are weak on control-intensive behavior
• Common in signal-processing applications
- Ordered streams of data
- Simple map to pipelined hardware
• LabVIEW, Simulink, SystemC transactions
• Buffers used for signal processing applications anyway
- FIFO buffers absorb bursty flows up to the capacity of the buffer
• Producer and consumer rates must strictly agree on average
Applications of Dataflow
• Good fit for block-diagram specifications
- System Level RTL (directed links)
- Linear/nonlinear control systems (Feedback Networks)
- Network Computing
• Common in Electrical Engineering
• Value: reasoning about data rates, availability,
latency and performance can be done abstractly
- Used for top-level models before processes are
designed
- Allow reasoning about process requirements
Kahn Process Networks
• Proposed by Kahn in 1974 as a general-purpose
scheme for parallel programming
• Laid the theoretical foundation for dataflow
• Unique attribute: deterministic
• Difficult to schedule
• Too flexible to make efficient, not flexible enough for
a wide class of applications
• Never put to widespread use
Kahn Process Networks
• Key idea:
[Figure: network of processes f, g, and h; f has inputs u and v and output w]
process f(in int u, in int v, out int w)
{
  int i; bool b = true;
  for (;;) {
    i = b ? wait(u) : wait(v);
    printf("%i\n", i);
    send(i, w);
    b = !b;
  }
}
Process alternately reads from u and v, prints the data value, and writes it to w
A Kahn Process
• From Kahn’s original 1974 paper
/* Process interface includes FIFOs */
process f(in int u, in int v, out int w)
{
  int i; bool b = true;
  for (;;) {
    /* wait() returns the next token in an input FIFO, blocking if it's empty */
    i = b ? wait(u) : wait(v);
    printf("%i\n", i);
    /* send() writes a data value on an output FIFO */
    send(i, w);
    b = !b;
  }
}
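The blocking, destructive read of wait() can be imitated with Python's standard queue.Queue. The sketch below is a hypothetical translation of process f, not Kahn's notation: the name kahn_f and the steps bound are additions so the demo terminates, and the input tokens are preloaded so no threads are needed.

```python
import queue

def kahn_f(u, v, w, steps):
    """Alternately read one token from u and one from v, forwarding each to w.

    Queue.get() blocks while the queue is empty and removes the token it
    returns, which mirrors wait(); Queue.put() plays the role of send().
    """
    read_u = True
    for _ in range(steps):          # the original loops forever
        token = (u if read_u else v).get()
        print(token)
        w.put(token)
        read_u = not read_u

u, v, w = queue.Queue(), queue.Queue(), queue.Queue()
for x in (1, 3, 5):
    u.put(x)
for x in (2, 4, 6):
    v.put(x)

kahn_f(u, v, w, steps=6)
merged = [w.get() for _ in range(6)]
```

Because every read blocks on exactly one channel, the order of tokens on w depends only on the input sequences, not on when they arrive.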
A Kahn Process
• From Kahn’s original 1974 paper
• Behavior of process:
Compute … read … compute … write … read … compute
Scheduling Kahn Networks
• Challenge is running processes without
accumulating tokens
[Figure: A and B both feed C. A and B always emit tokens; C only consumes tokens from A, so tokens accumulate on the B->C buffer.]
Demand-driven Scheduling?
• Apparent solution: only run a process whose outputs
are being actively solicited
• However...
[Figure: processes A, B, C, and D. A and B always produce tokens; C and D always consume tokens.]
Other Difficult Systems
• Not all systems can be scheduled without token
accumulation
[Figure: one process produces two a's for every b; the other alternates between receiving one a and one b.]
Tom Parks’ Algorithm
• Schedules a Kahn Process Network in bounded
memory if it is possible
• Start with bounded buffers
• Use any scheduling technique that avoids buffer
overflow
• If the system deadlocks because a buffer is full, increase the size of the smallest full buffer and continue
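The steps above can be sketched as a small simulator. This is an illustration under stated assumptions, not Parks' implementation: processes are modelled as Python generators that yield hypothetical ("read", channel) or ("write", channel, value) requests, the scheduler is a plain round-robin, and max_steps cuts the otherwise endless run short.

```python
from collections import deque

def run_parks(processes, channels, max_steps=200):
    """Run generator-based processes over bounded FIFOs, growing them on demand."""
    capacity = {ch: 1 for ch in channels}          # start with buffers of size 1
    fifo = {ch: deque() for ch in channels}
    pending = {name: next(proc) for name, proc in processes.items()}
    for _ in range(max_steps):
        progressed = False
        for name, proc in processes.items():
            request = pending[name]
            ch = request[1]
            if request[0] == "read" and fifo[ch]:
                pending[name] = proc.send(fifo[ch].popleft())
                progressed = True
            elif request[0] == "write" and len(fifo[ch]) < capacity[ch]:
                fifo[ch].append(request[2])
                pending[name] = next(proc)
                progressed = True
        if progressed:
            continue
        # Nobody can move. Channels with a blocked writer are full.
        blocked = [req[1] for req in pending.values() if req[0] == "write"]
        if not blocked:
            break                                   # all blocked on reads: true deadlock
        smallest = min(blocked, key=lambda c: capacity[c])
        capacity[smallest] += 1                     # grow the smallest full buffer
    return capacity

def producer():
    while True:
        for _ in range(3):
            yield ("write", "a", "a")
        yield ("write", "b", "b")

def consumer(log):
    while True:
        log.append((yield ("read", "b")))
        for _ in range(3):
            log.append((yield ("read", "a")))

log = []
final_capacity = run_parks({"P": producer(), "Q": consumer(log)}, ["a", "b"])
```

In this hypothetical network the producer must write three tokens on a before the consumer's first read on b can succeed, so the run stalls twice and settles at a capacity of 3 for a and 1 for b.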
Parks’ Algorithm in Action
• Start with buffers of size 1
• Run A, B, C, D
[Figure: buffer occupancies over time: A->C 0-1-0, B->C 0-1, B->D 0-1-0. C only consumes tokens from A.]
Parks’ Algorithm in Action
• B blocked waiting for space in B->C buffer
• Run A, then C
• System will run indefinitely
[Figure: buffer occupancies over time: A->C 0-1-0, B->D 0. C only consumes tokens from A.]
Parks’ Scheduling Algorithm
• Neat trick
• Whether a Kahn network can execute in bounded
memory is undecidable
• Parks’ algorithm does not violate this
• It will run in bounded memory if possible, and use
unbounded memory if necessary
Using Parks’ Scheduling Algorithm
• It works, but…
Synchronous Dataflow (SDF)
• Balance equation for each channel: $f_A N = f_B M$
- $f_A$, $f_B$: number of firings per “iteration”
- $N$: number of tokens produced
- $M$: number of tokens consumed
• Schedulable statically
• Get a well-defined “iteration”
• Decidable:
- buffer memory requirements
- deadlock
[Figure: fire A { … produce N … } -> channel -> fire B { … consume M … }]
SDF and Signal Processing
• Restriction natural for multirate signal processing
• Unit-rate
- Adders, multipliers
• Upsamplers (1 in, n out)
• Downsamplers (n in, 1 out)
Operational Semantics
Firing Rule
• Any enabled actor may be fired to define
the “next state” of the computation
• An actor is fired by removing a token
from each of its input arcs and placing
tokens on each of its output arcs.
• Computation = a sequence of snapshots
- Many possible sequences as long as
firing rules are obeyed
- Determinacy
- “Locality of effect”
Multi-rate SDF System
• CD-to-DAT rate converter
• Converts a 44.1 kHz sampling rate to 48 kHz
[Figure: chain of upsampler and downsampler actors; port rates along the chain: 1 1 2 3 2 7 8 7 5 1]
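As a sanity check, the port rates in the figure can be multiplied out to confirm the overall conversion ratio. The grouping of the ten numbers into per-stage (consumed, produced) pairs below is an assumed reading of the figure: a source producing 1, four rate-changing stages, and a sink consuming 1.

```python
from fractions import Fraction

# Assumed reading of "1 1 2 3 2 7 8 7 5 1":
# source(1) -> [1 in, 2 out] -> [3 in, 2 out] -> [7 in, 8 out] -> [7 in, 5 out] -> sink(1)
stages = [(1, 2), (3, 2), (7, 8), (7, 5)]   # (consumed, produced) per firing

def overall_ratio(stages):
    """Multiply the per-stage output/input sample-rate ratios."""
    ratio = Fraction(1)
    for consumed, produced in stages:
        ratio *= Fraction(produced, consumed)
    return ratio

ratio = overall_ratio(stages)
target = Fraction(48000, 44100)             # 48 kHz over 44.1 kHz
```

Under this reading the chain raises the sample rate by 160/147, which is exactly 48 kHz divided by 44.1 kHz.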
Delays
• Kahn processes often have an initialization phase
• SDF doesn’t allow this because rates would not be constant during initialization
• Alternative: an SDF system may start with tokens in
its buffers
• These behave like delays (signal-processing)
• Delays are sometimes necessary to avoid deadlock
Example SDF System
• FIR Filter (all single-rate)
[Figure: FIR filter built from duplicate actors, one-cycle delays, and adders]
SDF Scheduling
• Schedule can be determined completely before the
system runs
• Two steps:
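[Figure: an edge from A to B that produces np tokens and consumes nc tokens per firing. Example graph: A -> B (produce 3, consume 1), B -> C (1, 1), and two A -> C edges (2, 1).]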
• Balance for each edge:
- $3\,v_S(A) - v_S(B) = 0$
- $v_S(B) - v_S(C) = 0$
- $2\,v_S(A) - v_S(C) = 0$
- $2\,v_S(A) - v_S(C) = 0$ (second A->C edge)
Balance equations
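[Figure: A -> B (produce 3, consume 1), B -> C (1, 1), and two A -> C edges (2, 1)]
$$M = \begin{pmatrix} 3 & -1 & 0 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 0 & -1 \end{pmatrix}$$
• $M v_S = 0$ iff S is periodic
• Full rank (as in this case)
- no non-zero solution
- no periodic schedule (too many tokens accumulate on A->B or B->C)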
Balance equations
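[Figure: A -> B (produce 2, consume 1), B -> C (1, 1), and two A -> C edges (2, 1)]
$$M = \begin{pmatrix} 2 & -1 & 0 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 0 & -1 \end{pmatrix}$$
• Non-full rank
- infinite solutions exist (linear space of dimension 1)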
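The two cases can be checked mechanically by propagating relative firing rates along the edges and then testing every balance equation. This is a minimal sketch, assuming a connected graph; the function name repetition_vector and the (src, produced, dst, consumed) edge encoding are illustrative choices, and the edge rates are read off the rows of the two matrices.

```python
from fractions import Fraction
from math import gcd

def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts, or None if inconsistent.

    edges is a list of (src, produced, dst, consumed). Assumes a connected graph.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                       # spread rates outward from the first actor
        changed = False
        for src, p, dst, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, p, dst, c in edges:         # every balance equation must hold
        if rate[src] * p != rate[dst] * c:
            return None
    scale = 1                            # lcm of all denominators
    for r in rate.values():
        scale = scale * r.denominator // gcd(scale, r.denominator)
    return {a: int(rate[a] * scale) for a in actors}

actors = ["A", "B", "C"]
full_rank = [("A", 3, "B", 1), ("B", 1, "C", 1), ("A", 2, "C", 1), ("A", 2, "C", 1)]
rank_two = [("A", 2, "B", 1), ("B", 1, "C", 1), ("A", 2, "C", 1), ("A", 2, "C", 1)]

inconsistent = repetition_vector(actors, full_rank)
consistent = repetition_vector(actors, rank_two)
```

For the full-rank graph the function returns None, and for the rank-2 graph it returns one firing of A and two each of B and C.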
[Figure: SDF graph over actors A, B, and C; port rates shown: 1, 2, 1, 3, 2, 3]
• No admissible schedule:
BACBA, then deadlock…
• Adding one token on A->C makes
BACBACBA valid
• Making a periodic schedule admissible is always possible, but changes specification...
From repetition vector to schedule
[Figure: A -> B (produce 2, consume 1), B -> C (1, 1), and two A -> C edges (2, 1)]
• Can find either ABCBC or ABBCC
• If the simulation deadlocks before returning to the original state, no valid schedule exists (Lee ‘86)
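One way to go from a repetition vector to a schedule is to simulate a single iteration, firing any actor that still has firings left and enough tokens on all of its inputs. The sketch below assumes a fixed scan order over the actors; build_schedule and the five-field edge tuples are hypothetical names, and a different firing policy would give a different valid order such as ABCBC.

```python
def build_schedule(reps, edges):
    """Simulate one iteration; return the firing order, or None on deadlock.

    reps maps actor -> firings per iteration (scanned in dict order).
    edges is a list of (src, produced, dst, consumed, initial_tokens).
    """
    tokens = [e[4] for e in edges]
    remaining = dict(reps)
    schedule = []
    while any(remaining.values()):
        for actor in reps:
            ready = remaining[actor] > 0 and all(
                tokens[i] >= c
                for i, (_, _, dst, c, _) in enumerate(edges)
                if dst == actor
            )
            if ready:
                break
        else:
            return None                  # firings remain but nothing is enabled
        for i, (src, p, dst, c, _) in enumerate(edges):
            if dst == actor:
                tokens[i] -= c           # consume from each input arc
            if src == actor:
                tokens[i] += p           # produce on each output arc
        remaining[actor] -= 1
        schedule.append(actor)
    return "".join(schedule)

graph = [("A", 2, "B", 1, 0), ("B", 1, "C", 1, 0),
         ("A", 2, "C", 1, 0), ("A", 2, "C", 1, 0)]
order = build_schedule({"A": 1, "B": 2, "C": 2}, graph)

# A two-actor cycle with unit rates: stuck without a delay, fine with one.
stuck = build_schedule({"a": 1, "b": 1}, [("a", 1, "b", 1, 0), ("b", 1, "a", 1, 0)])
freed = build_schedule({"a": 1, "b": 1}, [("a", 1, "b", 1, 0), ("b", 1, "a", 1, 1)])
```

The two-actor cycle also shows why initial tokens matter: its rates are consistent, yet it cannot fire at all until one token is placed on an arc.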
Calculating Rates
• Each arc imposes a constraint
3a - 2b = 0
4b - 3d = 0
b - 3c = 0
2c - a = 0
d - 2a = 0
[Figure: graph over actors a, b, c, and d with port rates on each arc]
Solution?
a = 2c
b = 3c
d = 4c
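The solution can be verified by solving three of the constraints for a, b, and d in terms of c and substituting into the remaining two. This is a worked check of the slide's equations and introduces no new symbols.

```latex
\begin{align*}
b - 3c &= 0 &&\Rightarrow\; b = 3c \\
2c - a &= 0 &&\Rightarrow\; a = 2c \\
d - 2a &= 0 &&\Rightarrow\; d = 2(2c) = 4c \\
3a - 2b &= 3(2c) - 2(3c) = 0 \\
4b - 3d &= 4(3c) - 3(4c) = 0
\end{align*}
```

All five constraints hold for any c, so the solution space is one-dimensional; taking c = 1 gives the smallest positive integer solution (a, b, c, d) = (2, 3, 1, 4).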
Calculating Rates
• Consistent systems have a one-dimensional solution
- Usually want the smallest integer solution
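[Figure: graph over actors a, b, and c with port rates on each arc]
a - c = 0
a - 2b = 0
3b - c = 0
3a - 2c = 0
• This system is inconsistent: a = c and 3a = 2c together force a = 0, so only the all-zero solution exists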
An Underconstrained System
• Two or more unconnected pieces
• Relative rates between pieces undefined
[Figure: two unconnected pieces: a -> b with rates 1, 1 and c -> d with rates 3, 2]
a - b = 0
3c - 2d = 0
Consistent Rates Not Enough
• A consistent system with no schedule
• Rates do not avoid deadlock
[Figure: actors a and b connected in a cycle; all port rates are 1]
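Cyclo-Static Dataflow
[Figure: rates 8 and 1, with cyclic rate sequences {1,1,1,1,1,1,1,1} and {1,0,0,0,0,0,0,0}]
• Balance equation:
$$f_A \sum_{i=0}^{R-1} N_{i \bmod P} = f_B \sum_{i=0}^{R-1} M_{i \bmod Q}, \qquad R = \operatorname{lcm}(P, Q)$$
- cyclic production sequence $N_0, \ldots, N_{P-1}$
- cyclic consumption sequence $M_0, \ldots, M_{Q-1}$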
• Scheduling similar to SDF
• Balance equations establish relative rates
• Key: avoid underflow of channel
• Advantages
- Increased schedule flexibility
• Easier to avoid large buffers
- Closer to parallel hardware model
• Links move single values at a time
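The cyclo-static balance equation can be evaluated directly once the two cyclic rate sequences are known. The sketch below is illustrative only: csdf_balance is a hypothetical helper, which sequence belongs to the producer and which to the consumer is an assumption, and the result is the smallest integer pair (f_A, f_B) that balances the token totals over R phases.

```python
from math import gcd

def csdf_balance(production, consumption):
    """Return (R, f_A, f_B) for cyclic production/consumption sequences.

    Assumes at least one nonzero entry in each sequence.
    """
    P, Q = len(production), len(consumption)
    R = P * Q // gcd(P, Q)                                   # R = lcm(P, Q)
    produced = sum(production[i % P] for i in range(R))      # sum of N_{i mod P}
    consumed = sum(consumption[i % Q] for i in range(R))     # sum of M_{i mod Q}
    g = gcd(produced, consumed)
    # f_A * produced == f_B * consumed, reduced to lowest terms
    return R, consumed // g, produced // g

# Sequences from the figure, with the producer/consumer roles assumed.
figure_case = csdf_balance([1] * 8, [1, 0, 0, 0, 0, 0, 0, 0])

# A made-up example with different cycle lengths (P = 2, Q = 3).
mixed_case = csdf_balance([2, 1], [1, 0, 2])
```

In the made-up example R is 6, the producer emits 9 tokens and the consumer absorbs 6 over those phases, so the balancing pair is (2, 3).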
Multidimensional SDF
(Lee, 1993)
• Production and consumption of N-dimensional arrays of data
Dynamic Dataflow (DDF)
• Undecidable:
- deadlock
- bounded buffer memory
- existence of an annotated schedule