Supernova - A Scalable Parallel Audio Synthesis Server For Supercollider
Tim Blechmann
[email protected]
Proceedings of the International Computer Music Conference 2011, University of Huddersfield, UK, 31 July - 5 August 2011
pipeline stages should take the same CPU time and the number of pipeline stages should match the number of CPU cores. While pipelining is a simple technique to increase the throughput, it usually increases the latency. In terms of computer music systems this would increase the latency of a signal, which reduces its usability for real-time applications.

[Figure labels: SinOsc.ar(440), Out.ar(53), In.ar(53)]
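
The latency cost of pipelining can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that every pipeline stage after the first delays the signal by one full block; the function name, the block size of 64 samples and the sample rate of 44100 Hz are illustrative assumptions and are not taken from the text.

```python
def pipeline_latency_ms(stages: int, block_size: int = 64,
                        sample_rate: float = 44100.0) -> float:
    """Extra latency of a pipelined signal chain, in milliseconds.

    Assumption: each stage hands its block to the next stage one block
    period later, so `stages` stages add (stages - 1) blocks of delay.
    """
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    block_ms = 1000.0 * block_size / sample_rate  # duration of one block
    return (stages - 1) * block_ms


if __name__ == "__main__":
    for stages in (1, 2, 4):
        extra = pipeline_latency_ms(stages)
        print(f"{stages} stage(s): {extra:.3f} ms of added latency")
```

Under these assumptions the added delay grows linearly with the number of stages, which is why matching the stage count to the core count trades latency for throughput.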
However, SuperCollider does not have a proper notion of a dependency graph. Instead, its node graph models a tree hierarchy, with synths and groups as tree nodes and a group as its root. Groups are lists of nodes, which can be used to structure the audio synthesis and to address multiple nodes as one entity. Listing 1 shows a small code example that would result in the node hierarchy shown in Figure 2.

Since the node graph is explicitly exposed to the language, the user needs to take care of the correct order of execution. While this imposes some responsibility on the user, the node graph can be extended so that the user specifies parallelism explicitly. To achieve this, we propose two extensions to the SuperCollider node graph: parallel groups and satellite nodes.

3.1. Parallel Groups

The first approach to specify parallelism is the concept of parallel groups. Parallel groups would be available as the ParGroup class in the SuperCollider language and have semantics similar to groups. Like groups, they can contain child nodes. However, instead of evaluating the child nodes sequentially, the order of execution is undefined, so all nodes can be evaluated in parallel. This provides the user with a simple facility to explicitly specify parallelism. Assuming that the generators of the earlier example can be evaluated in parallel, the code could be implemented using the proposed ParGroup class as shown in Listing 2. While the node hierarchy would be the same as shown in Figure 2, the dependency graph shown in Figure 3 is different.

Introducing parallel groups has the advantage of being compatible with scsynth. Scsynth can simply emulate parallel groups with sequential groups, since sequential evaluation is one valid ordering of an undefined execution order and therefore satisfies all semantic guarantees of a parallel group.

3.2. Satellite Nodes

While parallel groups easily fit into the concept of the SuperCollider node graph, they impose some limitations on parallelism. Members of parallel groups are synchronized in two directions: they are evaluated after all predecessors of the parallel group have been evaluated and before all successors. For many applications one does not need to specify both directions of synchronization, but one synchronization constraint is sufficient. In the example above the generators need to be evaluated before the effect node, but they do not necessarily depend on a result of one of the predecessors of the parent (parallel) group. To express these single dependency relations, we introduce another concept that we call satellite nodes. Satellite predecessors have to be evaluated before their reference node, while satellite successors are evaluated after their reference node.

Listing 3 shows how satellite predecessors can be used: all generator synths are instantiated as satellite predecessors of the effect node, so they would be initially runnable. Typical use cases for satellite successors would be audio analysis synths like peak meters for GUI applications, or sound file recorders.

Since satellite nodes provide a facility to specify dependencies more accurately, the parallelism for many use cases can be increased. It would even be possible to dispatch satellite nodes at a lower priority in order to optimize graph throughput.

The combination of parallel groups and satellite nodes should provide sufficient means to parallelize many use cases. They still do not model a dependency graph with arbitrary dependencies, so there are certain dependency graphs which may be cumbersome to formulate. But this limitation also avoids problems such as cyclic dependencies, and integrates well into the node hierarchy.

4. SUPERNOVA

Supernova is a parallel implementation of the SuperCollider server, which can be used as a drop-in replacement for scsynth. It implements an extended OSC interface, which supports the necessary commands to instantiate parallel groups and satellite nodes. Supernova can dynamically load SuperCollider unit generators, although the source code needs to be slightly modified if the unit generator accesses resources like buffers or busses.

4.1. Resource Access

In scsynth, unit generators are known to be evaluated sequentially. Obviously, this is not the case for unit generators in Supernova. In order to ensure data consistency for concurrent execution, some care needs to be taken to achieve thread safety. The main data structures that are shared among unit generators are busses and buffers. To allow multiple readers for a resource, the unit generator
API has been extended by adding reader-writer spinlocks for each bus or buffer. Before a unit generator can access a resource, it needs to acquire the corresponding spinlock. Since some unit generators require access to multiple resources, some care needs to be taken in order to prevent deadlocks. Therefore a simple locking policy is used: a total order of all resources is defined and locks need to be acquired in this order. If one lock cannot be acquired, all previously acquired locks need to be released before the acquisition is retried.

While this ensures atomicity for write access, it does not take all the responsibility away from the user. For example, two synths may use the Out.ar ugen in parallel to write to the same bus without problems, since the actual order of the write access does not matter, but for ReplaceOut.ar the semantics would differ.

4.2. Dependency Graph

Internally, Supernova does not interpret the node graph directly as is done in scsynth, but the node graph is used to create a dependency graph data structure. This data structure does not have the notion of groups and synths any more, but its nodes contain sequences of synths. In this representation, sequential synths are combined into a single queue node to avoid the overhead of scheduling each synth as a single entity. While the construction of the dependency graph introduces some run-time overhead when the signal graph is changed, benchmarks suggest that it is reasonably low (tens of microseconds, depending on the size of the graph). So this does not significantly affect the real-time safety of the audio synthesis engine.

Listing 3: Satellite Node example

    // the effect synth is the reference node
    var fx = Synth.new(\myFx);

    // four generators, each a satellite predecessor of fx
    4.do {
        Synth.preceding(fx, \myGenerator)
    };

4.3. Limitations

For low-latency applications, real-time computer music systems require a low worst-case scheduling latency. Therefore it is not feasible to use blocking primitives for synchronization of the main audio thread and the audio helper threads. Instead, Supernova wakes all helper threads at the beginning of the main audio callback and all threads poll a lock-free stack containing those queue nodes that are ready for execution. This greedy use of CPU resources is not friendly to other processes. Depending on the structure of the node graph, a significant amount of CPU time could be spent in the busy-waiting loops. Unless one uses highly tuned systems running the RT preemption patches for the Linux kernel, it seems to be the only way to dispatch threads quickly enough.

5. EXPERIMENTAL RESULTS

Supernova is designed for low-latency real-time operations. In order to evaluate the performance, we measured the execution times of the audio callback and stored them in a histogram with microsecond granularity. This approach has the advantage that it not only measures the throughput, but also shows more detailed performance characteristics. For real-time audio synthesis, the speedup of the worst case is more interesting than the speedup of the average case, since a missed deadline would result in a possibly audible audio dropout.

The tests were carried out on an Intel Core i7 workstation, running Linux with RT preemption patches. Its worst-case scheduling latency was measured to be about 20 microseconds. Different test graphs were examined. Each graph layout was tested with up to 4 threads using parallel groups and was compared against sequential groups.

Figure 4 shows a typical histogram. One can observe
different aspects: the execution time for each test case shows little spread. Most of the histogram samples are found around one peak, with a second small peak roughly 20 microseconds after the first. The time difference between the first and second peak is of the same order of magnitude as the worst-case scheduling latency that can be achieved by the workstation. So it is most likely a result of hardware effects. Since no samples can be found beyond the second peak, the implementation can be considered real-time safe.

Figure 4: Execution Time Histogram, One Parallel Group with 256 Lightweight Synths

Using the histograms, average-case and worst-case execution times can be determined and speedups can be computed. Figures 5 and 6 show the computed speedups for different use cases, both average-case and worst-case.

6. CONCLUSION

This paper introduces Supernova, a replacement for SuperCollider’s default synthesis server scsynth. Supernova supports two extensions to the SuperCollider node graph, so that the user can explicitly express parallelism in the node hierarchy. Its multiprocessor-aware synthesis engine is optimized for real-time audio synthesis, and scales well with the number of CPUs. The next release of SuperCollider will include Supernova as an alternative to scsynth.

7. ACKNOWLEDGEMENTS

This paper is a revised and extended version of the paper presented at the SuperCollider Symposium 2010. Many thanks to Dr. Dan Stowell and Dr. M. Anton Ertl for their valuable feedback and James McCartney for creating SuperCollider.

8. REFERENCES

[1] A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek, and S. Tomov, “The impact of multicore on math software,” in Proceedings of the 8th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing. Springer-Verlag, 2006, pp. 1–10.

[2] S. Gochman, R. Ronen, I. Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sperber, and R. C. Valentine, “The Intel Pentium M Processor: Microarchitecture and Performance,” Intel Technology Journal, vol. 7, no. 02, pp. 21–36, 2003.

[3] L. B. Kish, “End of Moore's law: thermal (noise) death of integration in micro and nano electronics,” Physics Letters A, vol. 305, no. 3-4, pp. 144–149, 2002.

[4] S. Letz, Y. Orlarey, and D. Fober, “Jack audio server for multi-processor machines,” in Proceedings of the International Computer Music Conference, 2005.

[5] J. McCartney, “SuperCollider, a new real time synthesis language,” in Proceedings of the International Computer Music Conference, 1996.

[6] J.-M. Muller, Elementary Functions: Algorithms and Implementation. Birkhäuser, 2006.

[7] K. Olukotun and L. Hammond, “The Future of Microprocessors,” Queue, vol. 3, no. 7, pp. 26–29, 2005.

[8] Y. Orlarey, S. Letz, and D. Fober, “Multicore Technologies in Jack and Faust,” in Proceedings of the International Computer Music Conference, 2008.

[9] M. Puckette, “Combining Event and Signal Processing in the MAX Graphical Programming Environment,” Computer Music Journal, vol. 15, no. 3, pp. 68–77, 1991.

[10] M. Puckette, “FTS: A Real-time Monitor for Multiprocessor Music Synthesis,” Computer Music Journal, vol. 15, no. 3, pp. 58–67, 1991.
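
The locking policy of Section 4.1 (acquire locks in one global order, and release everything and retry if any lock is busy) can be illustrated with a small sketch. This is not Supernova's implementation: the names `Resource`, `try_lock_all` and `with_resources` are hypothetical, and a plain mutex from Python's `threading` module stands in for the reader-writer spinlocks described in the text.

```python
import threading


class Resource:
    """Hypothetical stand-in for a bus or buffer guarded by one lock."""

    def __init__(self, index: int):
        self.index = index            # position in the global total order
        self.lock = threading.Lock()  # plain mutex, not a reader-writer lock
        self.value = 0.0


def try_lock_all(resources):
    """Try to take every lock in ascending index order.

    Returns the ordered list of held resources, or None after releasing
    all locks taken so far if one of them could not be acquired.
    """
    ordered = sorted(set(resources), key=lambda r: r.index)
    taken = []
    for res in ordered:
        if res.lock.acquire(blocking=False):
            taken.append(res)
        else:
            for held in reversed(taken):   # back off completely
                held.lock.release()
            return None
    return ordered


def with_resources(resources, action):
    """Spin until all locks are held, run `action`, then release them."""
    while True:
        held = try_lock_all(resources)
        if held is not None:
            break
    try:
        return action()
    finally:
        for res in reversed(held):
            res.lock.release()
```

Because every caller sorts its resources by the same index before locking, two callers that name the same resources in opposite order cannot wait on each other in a cycle, and the back-off step ensures that a caller never spins while holding a partial set of locks.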
[Figures 5 and 6: two speedup plots; y-axis "Speedup" from 0 to 4; x-axis categories: group, 1 Thread, 2 Thread, 3 Thread, 4 Thread. Captions and plotted values are not recoverable.]
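
The idea of Section 4.2, turning the node tree into a dependency graph whose nodes hold sequences of synths, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Supernova's algorithm: the classes `Synth`, `Group`, `ParGroup` and `QueueNode` and the function `build` are hypothetical, and satellite nodes are left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synth:
    name: str


@dataclass(eq=False)
class Group:                 # children are evaluated one after another
    children: list


@dataclass(eq=False)
class ParGroup:              # children may be evaluated in any order
    children: list


@dataclass(eq=False)
class QueueNode:
    synths: list                                # run back to back by one thread
    preds: list = field(default_factory=list)   # queue nodes that must finish first


def build(node, preds, owned, out):
    """Append the queue nodes for `node` to `out`.

    Returns (tails, owned): the queue nodes a successor has to wait for,
    and whether a successor may be appended to a lone tail instead of
    opening a new queue node.
    """
    if isinstance(node, Synth):
        if owned and len(preds) == 1:
            preds[0].synths.append(node.name)   # extend the sequential run
            return preds, True
        queue_node = QueueNode([node.name], list(preds))
        out.append(queue_node)
        return [queue_node], True
    if isinstance(node, Group):
        tails = preds
        for child in node.children:
            tails, owned = build(child, tails, owned, out)
        return tails, owned
    if isinstance(node, ParGroup):
        if not node.children:
            return preds, owned
        tails, all_owned = [], True
        for child in node.children:
            # siblings share `preds`, so none of them may extend it
            child_tails, child_owned = build(child, preds, False, out)
            all_owned = all_owned and child_owned
            tails.extend(t for t in child_tails if t not in tails)
        return tails, all_owned
    raise TypeError(f"unknown node type: {type(node).__name__}")


def queue_nodes(root):
    out = []
    build(root, [], False, out)
    return out
```

For a tree with four generators followed by an effect, a sequential group collapses into a single queue node, while placing the generators in a `ParGroup` yields four independent queue nodes plus one effect node that waits for all of them.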
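
Section 5 derives average-case and worst-case speedups from execution-time histograms. A minimal sketch of that computation is given below, assuming a histogram is stored as a mapping from execution time in microseconds to the number of callbacks observed; the function names are hypothetical, and the numbers in the usage example are invented for illustration and are not measurements from this paper.

```python
def average_us(hist):
    """Mean execution time of a histogram {microseconds: count}."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    return sum(t * n for t, n in hist.items()) / total


def worst_us(hist):
    """Largest execution time that was observed at least once."""
    observed = [t for t, n in hist.items() if n > 0]
    if not observed:
        raise ValueError("empty histogram")
    return max(observed)


def speedup(baseline, parallel, statistic):
    """Ratio of a statistic of the baseline run to that of the parallel run."""
    return statistic(baseline) / statistic(parallel)


if __name__ == "__main__":
    # Invented example data: sequential group versus a 4-thread run.
    sequential = {400: 90, 420: 10}
    four_threads = {100: 90, 120: 10}
    print("average-case speedup:", speedup(sequential, four_threads, average_us))
    print("worst-case speedup:  ", speedup(sequential, four_threads, worst_us))
```

Keeping the worst case as a separate statistic matches the argument in the text that a single missed deadline matters more for real-time audio than the mean execution time.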