Dataflow Programming Concept, Languages and Applications
Introduction
1.1
Motivation
1.2
Structure
This survey is composed of seven sections, of which this first one is the introduction. The history and concepts of Dataflow Programming (DFP) are described in
the next section. Section 3 gives examples of DFP languages, frameworks for
implementing the dataflow paradigm and known uses of it. Section 4 discusses
some well-known open issues in DFP and describes common
answers to some of them. In section 5 the author argues why DFP
is relevant knowledge for any developer. Section 6 details future work, and the
paper closes with a final section presenting the conclusions of
this survey.
History
DFP has been a subject of study in the area of Software Engineering for more
than 40 years, with its origins traced back to the Ph.D. thesis of Bert
Sutherland [30]. Sutherland used a light-pen and a TX-2 computer to create a
visual programming language on top of the SKETCHPAD framework. He also
contributed patterns for the graphical representation of procedures that are still
used in visual languages today.
In figure 1 Sutherland shows how arithmetic instructions can be represented
in both textual and visual forms. In that example, extracted from Sutherland's
thesis, we can see how parallel operations occur and why they
reduce the computation time even in such a small code snippet. We can
observe that the calculation of the value of W can be processed simultaneously
with the other arithmetic operations occurring in the two vertically aligned
nodes, as there are no data dependencies between them. In a DFP language, such
parallel computation is achieved automatically by the compiler. The compiler
analyses the source and creates an internal dataflow representation of it, based
on connected nodes, commonly with each node being processed by an individual
thread. DFP compilers exist to create such binaries from either textual or visual
languages.
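As a minimal sketch of this idea, the Python fragment below evaluates a small dependency graph and lets nodes without mutual data dependencies run on separate threads. The graph, the node names and the `run` helper are hypothetical and chosen only for illustration; they do not reproduce Sutherland's figure or the output of any particular DFP compiler.

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Hypothetical graph: node name -> (function, names of the input nodes).
GRAPH = {
    "a": (lambda: 2, ()),
    "b": (lambda: 3, ()),
    "c": (lambda: 4, ()),
    "sum": (operator.add, ("a", "b")),        # depends on a and b
    "square": (lambda c: c * c, ("c",)),      # independent of "sum"
    "result": (operator.mul, ("sum", "square")),
}


def run(graph, target):
    """Evaluate `target`; every node gets its own task in a thread pool."""
    futures = {}
    # One worker per node, so a task blocked on its inputs never starves
    # the tasks that produce those inputs.
    with ThreadPoolExecutor(max_workers=len(graph)) as pool:

        def schedule(name):
            if name not in futures:
                func, deps = graph[name]
                # Inputs are scheduled first, so they are already running
                # (or queued) when the dependent node starts waiting.
                dep_futures = [schedule(d) for d in deps]
                futures[name] = pool.submit(
                    lambda: func(*(f.result() for f in dep_futures))
                )
            return futures[name]

        return schedule(target).result()
```

Here `sum` and `square` share no inputs, so they may execute at the same time, while `result` waits for both. In CPython the sketch illustrates the scheduling order rather than a real speed-up for CPU-bound arithmetic.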
2.2
Architecture
With the increased need to compute large datasets and to enable common computers
to process more than a single thread at the same time, both in the industrial
and scientific worlds, the need for multi-core processor systems arose [9]. Despite
that, multi-threaded programming remained an error-prone task,
as it was subject to race conditions, which are very complex to debug. The
disadvantages and common problems of using threads were well summarized
by Ousterhout [24]. Dataflow programming was able to provide parallelism
without the added complexity of managing threads.
In dataflow programming, computation nodes are connected to each other whenever a node has a dependency on the value produced by another
node. Values are propagated to the dependent nodes as soon as they are computed,
triggering the computation on them.
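The propagation rule described above can be sketched in a few lines of Python. The `Node` class, its method names and the slot-based wiring are assumptions made for this illustration and are not taken from any specific DFP language or framework.

```python
class Node:
    """A computation node that fires once all of its inputs have arrived."""

    def __init__(self, func, n_inputs):
        self.func = func
        self.inputs = [None] * n_inputs
        self.pending = set(range(n_inputs))  # input slots still waiting
        self.subscribers = []                # (node, slot) pairs fed by this node
        self.value = None

    def connect(self, target, slot):
        """Declare that `target` depends on this node's value at `slot`."""
        self.subscribers.append((target, slot))

    def receive(self, slot, value):
        """Store an incoming value and fire when no input is missing."""
        self.inputs[slot] = value
        self.pending.discard(slot)
        if not self.pending:
            self.fire()

    def fire(self):
        """Compute the node's value and push it to every dependent node."""
        self.value = self.func(*self.inputs)
        for target, slot in self.subscribers:
            target.receive(slot, self.value)


# Usage: `double` depends on `add`, so it runs only after `add` has fired.
add = Node(lambda x, y: x + y, 2)
double = Node(lambda v: v * 2, 1)
add.connect(double, 0)
add.receive(0, 1)  # one input still missing, nothing fires
add.receive(1, 2)  # add fires, which in turn triggers double
```

The sketch is single-threaded on purpose: it shows only that computation is driven by the arrival of data rather than by an explicit sequence of calls.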
The dataflow paradigm has been used in a wide range of contexts, either supporting
massive computation of data or serving as the basis for visual languages
that provide end-user programming capabilities. The Journal of Visual Languages
and Computing is a reference point for the novel research being carried out on this
topic.
This section introduces DFP languages and relevant implementations using
them. It describes one textual and one visual dataflow language,
namely SISAL and Quartz Composer. Many more exist, however, including
relevant names such as LabVIEW [31], VHDL [29] and LUSTRE [12].
3.1
End-User Programming
Fig. 2. The Quartz Composer editor. Blocks and the connections between them are
clearly visible. The interface is visually attractive and easy to use.
3.3
The actor model is a very popular concurrency model introduced in the 1970s by Carl Hewitt from MIT.
With his team, he researched a method that allows developers not only to simplify the process of parallelizing their computations, but also to
increase confidence in the concurrent behavior of their programs [14]. Twitter
has adopted it for scaling their computations [21].
An actor is an agent that receives and sends messages, behaving independently
from other actors in the system. On each message, the actor is able to start new
actors, compute data or reply with messages to other existing actors. In the
dataflow paradigm, an actor is the equivalent of a node, and the messages passed
are equivalent to the connections between nodes.
This architecture fits the dataflow model well when actors are used as
processing nodes and the messages between them as communication channels.
In cases where there is a need to use an imperative or functional programming
language, the actor model can be applied to port the concepts of dataflow
programming into those languages, as has been done in [27,18,11,23].
Many implementations of the actor model are freely available for several
languages [26,28,32,1].
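To make the mapping between actors and dataflow nodes concrete, the following sketch builds a three-stage pipeline in which each actor is a thread with its own mailbox. It uses only the Python standard library rather than any of the implementations cited above; the `Actor` class, the `_STOP` sentinel and `run_pipeline` are hypothetical names introduced for this example.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells an actor to shut down


class Actor(threading.Thread):
    """A dataflow node: one thread, one mailbox, messages handled in order."""

    def __init__(self, func, downstream=None):
        super().__init__()
        self.func = func
        self.downstream = downstream  # the actor this node is connected to
        self.mailbox = queue.Queue()

    def send(self, message):
        self.mailbox.put(message)

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is _STOP:
                if self.downstream is not None:
                    self.downstream.send(_STOP)  # propagate the shutdown
                break
            result = self.func(message)
            if self.downstream is not None:
                self.downstream.send(result)


def run_pipeline(values):
    """Feed `values` through increment -> square -> sink and collect results."""
    results = []
    sink = Actor(results.append)
    square = Actor(lambda x: x * x, downstream=sink)
    increment = Actor(lambda x: x + 1, downstream=square)
    actors = [increment, square, sink]
    for actor in actors:
        actor.start()
    for value in values:
        increment.send(value)
    increment.send(_STOP)
    for actor in actors:
        actor.join()
    return results
```

Each actor touches only its own state and the messages it receives, so the stages can work on different values at the same time without explicit locks in user code. A production system would more likely rely on an existing actor library than on this hand-written loop.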
Open Problems
Dataflow programming is an area still open to further research, with several
issues left to answer. In fact, most of the open questions today were identified
long ago and, despite the improvements made, patterns for answering them are yet to
be established.
4.1
Visual Representations
Fig. 3. A While block in VIPERS. A represents a block (or set of blocks) inside the
loop that receives and generates new values of x and y. Whenever A returns an x y
the loop exits and continues execution to block B.
4.2
Debugging
The steps above can be followed by language designers to guide the development of visual debugging tools for DFP languages using a graph-based
representation of the application, obtained from either a textual or a visual language.
Discussion
This paper introduces the DFP paradigm and presents the two most relevant
features within it: DFP as a basis for most visual programming languages,
including as a way of providing end-user programming in applications, and the
ability to seamlessly provide developers with a parallel computational model
without introducing development complexity.
Visual Programming Languages allow experienced users to perform rapid
application development and non-technical users to extend their applications,
which is commonly called end-user programming, or to create their own
applications without requiring programming knowledge. A common issue with
these languages is the difficulty of providing abstractions capable of representing
an application without resulting in a huge, incomprehensible dataflow diagram;
this paper identifies two patterns that can be applied to prevent this situation.
Non-visual DFP languages also exist. Textual approaches to DFP rely on a compiler capable of inferring the internal dataflow representation of the application,
which defines how parallelism is achieved automatically.
Concurrency is also easily achieved thanks to the lack of side effects in a DFP processing node. Because data is transmitted as messages that
are processed sequentially as they arrive at a node, DFP languages
offer parallelism out of the box, a valuable feature for developers looking to
increase the performance of parallelizable applications and algorithms.
Future Work
Conclusions
computation applications. Due to the lack of good-quality visual editors and
frameworks available for creating such systems, the creation of a generic framework
for building end-user programming systems on top of a DFP architecture based
on the actor model would be of use in several scenarios and will be pursued as
future work.
References
1. Agha, G.: Actors: a Model of Concurrent Computation in Distributed Systems,
Series in Artificial Intelligence (Jun 1985)
2. Arvind, D.: Dataflow architectures and multithreading. Annual
review of computer science (1986)
3. Bernini, M.: VIPERS (1994)
4. Browne, J., Hyder, S., Dongarra, J.: Visual programming and
debugging for parallel computing (1995)
5. Cann, D.: Retire Fortran? A debate rekindled (1991)
6. Dennis, J.B.: Data Flow Supercomputers. Computer 13(11), 48-56 (1980)
7. Erwig, M., Meyer, B.: Heterogeneous visual languages-integrating visual and textual
programming pp. 318-325
8. Feo, J., DeBoni, T.: A tutorial introduction to Sisal (August 1991)
9. Feo, J., Cann, D.: A report on the Sisal language project (1990)
10. Ferreira, H., Aguiar, A., Faria, J.: Adaptive Object-Modelling: Patterns, Tools
and Applications. In: Software Engineering Advances, 2009. ICSEA '09. Fourth
International Conference on. pp. 530-535 (2009)
11. Gu, R., Janneck, J., Bhattacharyya, S., Raulet, M., Wipliez, M., Plishker, W.:
Exploring the Concurrency of an MPEG RVC Decoder Based on Dataflow Program
Analysis. Circuits and Systems for Video Technology, IEEE Transactions on 19(11),
1646-1657 (2009)
12. Halbwachs, N., Caspi, P., Raymond, P., Pilaud, D.: The synchronous data flow
programming language LUSTRE. Proceedings of the IEEE 79(9), 1305-1320 (Sep
1991)
13. Hermans, F., Pinzger, M., van Deursen, A.: Breviz: Visualizing Spreadsheets using
Dataflow Diagrams. arXiv.org cs.SE (Nov 2011), 9 Pages, 5 Colour Figures; Proc.
European Spreadsheet Risks Int. Grp. (EuSpRIG) 2011 ISBN 978-0-9566256-9-4
14. Hewitt, C., Bishop, P.: A universal modular ACTOR formalism for artificial intelligence. 3rd IJCAI-73 (1973)
15. Apple Inc.: Quartz Composer user guide (July 2007), http://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/QuartzComposerUserGuide/qc_intro/qc_intro.html#//apple_ref/doc/uid/TP40005381
16. Johnston, W., Hanna, J.: Advances in dataflow programming languages. ACM
Computing Surveys (CSUR) (2004)
17. Kahn, G.: The Semantics of a Simple Language for Parallel Programming. In:
Information Processing 74: Proceedings of the IFIP Congress. pp. 471-475 (1974)
18. Lee, E., Parks, T.: Dataflow process networks. In: Proceedings of the IEEE. pp.
773-801 (1995)
19. McGraw, J.: The VAL Language: Description and Analysis (1982)
20. Mellor, S.J., Balcer, M.B.J.I.: Executable UML: A Foundation for Model-Driven
Architectures. Addison-Wesley Longman Publishing Co., Inc. (Jun 2002)
21. Mok, W.: How Twitter is scaling (June 2009), https://waimingmok.wordpress.com/2009/06/27/how-twitter-is-scaling/
22. Mosconi, M.: Iteration constructs in data-flow
visual programming languages. Computer languages (2000)
23. Oh, H.: Constant Rate Dataflow Model with Intermediate Ports for Efficient Code
Synthesis with Top-Down Design and Dynamic Behavior. Quality Electronic Design,
2008. ISQED 2008. 9th International Symposium on pp. 190-193 (2008)
24. Ousterhout, J.: Why threads are a bad idea (for most purposes) (1996)
25. Petre, M.: Mental imagery in program design and visual programming. International Journal
of Human-Computer Studies (1999)
26. Philipp Haller, F.S.: Actors in Scala pp. 1139 (Mar 2011)
27. Plishker, W., Sane, N., Bhattacharyya, S.: A generalized scheduling approach for
dynamic dataflow applications. In: Design, Automation & Test in Europe Conference
& Exhibition, 2009. DATE '09. pp. 111-116 (2009)
28. Scherer, A., Gandhi, R.: Programming Concurrency on the JVM
29. Sjoholm, S., Lindh, L.: VHDL for Designers. Prentice Hall PTR, Upper Saddle
River, NJ, USA (1997)
30. Sutherland, W.: On-Line Graphical Specification of Computer Procedures. (1966)
31. Travis, J., Kring, J.: LabVIEW for Everyone: Graphical Programming Made Easy
and Fun (3rd Edition) (National Instruments Virtual Instrumentation Series).
Prentice Hall PTR, Upper Saddle River, NJ, USA (2006)
32. Vajda, A.: Programming Many-Core Chips. With contributions by M. Brorsson and D. Corcoran (2011)