Jacktrip/Soundwire Meets Server Farm: Center For Computer Research in Music and Acoustics (CCRMA) Stanford University

Proceedings of the SMC 2009 - 6th Sound and Music Computing Conference, 23-25 July 2009, Porto - Portugal
JACKTRIP/SOUNDWIRE MEETS SERVER FARM
Juan-Pablo Cáceres & Chris Chafe

Center for Computer Research in Music and Acoustics (CCRMA)
Stanford University
{jcaceres,cc}@ccrma.stanford.edu
ABSTRACT
Even though bidirectional, high-quality and low-latency audio systems for network performance are available, the complexity involved in setting up remote sessions calls for better tools and methods to assess and tune network parameters. We present an implementation of a system to intuitively evaluate the Quality of Service (QoS) on best-effort networks. In our implementation, musicians are able to connect to a multi-client server and tune the parameters of a connection using direct “auditory displays.” The server can scale up to hundreds of users by taking advantage of modern multi-core machines and multi-threaded programming techniques. It also serves as a central “mixing hub” when network performance involves several participants.

1 INTRODUCTION
Systems for real-time, high-quality and low-latency audio over the Internet that take advantage of high-speed networks are available and have been used over the last several years for distributed concerts and other musical applications [12]. The difficulty of setting up one of these distributed sessions is, however, still very high. Most musicians have experienced the disheartening amount of time that can be lost in rehearsal, where most of the time is spent adjusting the connection rather than playing music.

Keeping delay to a minimum is one of the main goals when tuning network parameters. Delay is known to be disruptive in musical performance [4], so a sensible goal is to minimize it as much as possible. Often, however, there is a trade-off with audio quality: under problematic network conditions, the longer the latency, the better the audio (i.e., fewer dropouts). For most users who are not familiar with TCP/IP network protocols¹ and delivery, understanding the meaning of these parameters can be daunting.

We present here a server-based application that can be used to intuitively tune these parameters using “auditory displays” [5]. With it, musicians tune their network connection much like they tune their instruments: using their ears. The implementation is part of the JackTrip application [3], a software application for low-latency, high-quality and multi-channel audio streaming over TCP/IP Wide Area Networks (WAN). The design and architecture are first geared towards implementation of this QoS evaluation method. The architecture has also been extended to provide other types of service, in particular a central “mixing hub” to control audio in a concert where multiple locations are involved.

¹ In particular, we use the User Datagram Protocol (UDP), which is part of the TCP/IP protocol suite.

2 QoS EVALUATION METRICS
Comer gives a good definition of QoS:

“The term Quality of Service (QoS) refers to statistical performance guarantees that a network system can make regarding loss, delay, throughput, and jitter.” [7, p. 510]

Most networks available today offer best-effort delivery, i.e., they do not provide any specific level of QoS. This infrastructure can be problematic because sound is unforgiving with regard to packet loss and jitter: any lost data is immediately audible. In evaluating a particular connection, we want to know its “instantaneous” QoS, i.e., its quality at any given moment. Users should be able to adjust their settings to achieve the optimal quality given the current bandwidth and congestion conditions. This should be a convenient and conscious part of setting up. The connection should also be monitored for longer-term changes: a connection that is perfectly clean at 1:00 a.m. can become congested at 9:00 a.m. A bad connection today can be a surprisingly good one a year from now, when intermediate network upgrades are put in place or when the user asks that their service be enhanced.

A connection is presently either tuned by trial and error, or set automatically by an adaptive mechanism that changes the data rate depending on bandwidth availability [11]. Adaptive methods are typically found in unidirectional streaming and have a disadvantage for bidirectional high-quality audio: latency is a parameter we want to keep constant. To accommodate changing amounts of jitter, adaptive methods can arbitrarily increase and decrease the local buffering, affecting total latency in a way that is very disruptive for musical performance.

We describe an implementation of a tool that lets musicians tune a connection completely by ear. Parameters like
buffer size, sampling rate, packet size, and packet redundancy, among others, can be adjusted using this “auditory display” mechanism.

2.1 Pinging the network, acoustically
The advantages of evaluating very fine-grained jitter and packet loss using these “auditory displays” have been previously discussed in the literature [5]. The method consists of listening to a pitched sound in order to assess delay, jitter, and loss. The procedure produces a tone by recirculating audio in the network path, and thus allows fine-grained listening to the packet flow². The acronym SoundWIRE describes the technique used in this project: sound waves on the Internet from real-time echoes. In principle, it uses the Karplus-Strong plucked-string synthesis algorithm [9] and simply replaces the string delay lines running in local host memory with network memory.

This technique can be extended to incorporate different-sounding auditory pings using other physical models [6], but the underlying approach is the same. In the case of, e.g., a string physical model, musicians want to tune their connection to get a sounding instrument that has the highest possible pitch (low delay) without vibrato (jitter). Users also want to minimize extraneous impulses coming from packet loss.

In the next section (Sec. 3), we present an architecture of a server that clients can use to evaluate and tune their connection solely based on auditory feedback, much like guitar players tune their instruments.

² The granularity is determined by the sampling rate and the packet size; e.g., at 48 kHz and 64 samples/packet, the granularity is 64/48000 s ≈ 1.3 milliseconds.

3 MULTI-CLIENT CONCURRENT SERVER
We extended the JackTrip platform to include a system for QoS evaluation. The new architecture provides a multi-client concurrent server that can be used as a QoS evaluation service or as a central network/mixer hub, among other uses. On multi-core computers, it is possible to run hundreds of clients concurrently with uncompressed real-time audio and processing plugins.

3.1 Server architecture
The User Datagram Protocol (UDP) is a connectionless protocol. Consequently, a new client’s IP number has to be identified on a packet-per-packet basis. Several techniques for handling multiple connecting clients are discussed in the literature [13], but no standard exists as in the case of Transmission Control Protocol (TCP) servers (see [7] or [10] for a good description of the differences between the TCP and UDP protocols).

Two different but related techniques are implemented. The first relies on a “smart” client that can switch to a new server port number after being assigned one for exclusive communication. The second uses Linux’s iptables rules to route clients into local sockets. The former technique has the advantage of being portable (it currently works on both Linux and OS X, and should be easily portable to Windows), is lightweight, and does not require root privileges. In turn, it expects its clients to change connection ports. The latter technique (Linux only) requires iptables privileges, but it provides a mechanism whereby “dumb” clients (e.g., embedded systems) can connect to a unique IP/port pair without any change to their behavior. It is also computationally more expensive because the kernel has to perform a port redirect by source IP number for every packet.

Figure 1. Multi-client concurrent server algorithm

Figure 1 describes the architecture of the system. The server listens on a well-known port for client connection requests. For every new request, the server has to check whether the originating address/port pair is new. If it is, the server registers the pair in an array of active address/port pairs and blocks the requests of new clients while this one is being processed. It then allocates a new port to communicate exclusively with this client, and informs the client of the new port. The client then
stops sending packets to the well-known port and starts to send them to its own assigned one. From then on, the whole JackTrip process is sent to a thread pool and runs independently, in its own thread. The server is freed to wait for new client requests. The thread runs until the client stops sending packets (or the server stops receiving them) for a certain amount of time. At that point a signal is emitted, and the server deletes the client’s IP/port pair from the registry of active clients and removes the process from the thread pool. The implementation is written in C++ using the Qt libraries [1] for networking and multithreading.

The architecture that uses Linux’s iptables is similar, except that all port determination work is done on the server side, and packets are redirected to a local IP/port pair assigned exclusively to the client. The client does not need to be notified of a new port.

4 SERVER APPLICATIONS

4.1 Quality of Service (QoS) evaluation
Each connection between the client and the server recirculates audio and implements a Karplus-Strong string model [9]. This configuration has been discussed in detail previously [6]. Figure 2 shows a basic implementation of the algorithm. The ipsi-lateral host (which in our system corresponds to the server) generates excitations (plucks or noise bursts). The contra-lateral host (the client) “echoes” them back, so the audio recirculates in a loop that includes a low-pass filter (LPF).

Figure 2. Karplus-Strong algorithm implemented in the network path recirculating audio.

To test a connection, a client connects to a known server IP number (e.g., CCRMA at Stanford). The path is sonified with this string model. As the network delay increases, the pitch of the string becomes lower. Variations in the latency are perceived as vibrato of the string model. Packet losses are translated into impulsive sounds (in the mode where the receiver plays zeros when it does not receive a packet) or into a wavetable-like sound (in the mode where the system keeps looping through the last received packet)³.

³ More details on these two modes can be found in [3].

Providing this service for intuitive and quick evaluation of connection QoS is the original intended application of this technology. By connecting to the server and “listening” to the path, users can tune their connection to its optimal settings. As mentioned above, there is a trade-off between latency and sound quality. In the presence of jitter/vibrato, the local buffering has to be increased to avoid late packets, but at the same time we do not want to increase it too much (to avoid unnecessary latency). Doing this by trial and error requires experience and can be frustrating for new users. If, instead, musicians can listen to and tune the connection in the same way they tune an instrument, the setup is much faster and more intuitive. Again, the goal for musicians is to tune their pitch to be as high as possible (lower latency) with the smallest possible vibrato (jitter).

4.2 Star topology connection/mixing hub
Mixing and managing a remote connection when more than two sites are involved can be very complicated. Engineers have to deal with audio channels coming from different places (sometimes on confusingly different channels), all with different levels. They also need to make sure local audio is sent to each peer with the proper gains. One solution for managing these situations centrally is to designate a master location that can mix and/or relay all the channels and send them back to the respective connected peers.

Figure 3. Multi-client server as a hub

⁴ JackTrip presently uses Jack [2] as its audio host. This has the limitation that the sampling rate and buffer size are fixed when Jack starts and cannot be tuned after the server has started.

The present server implementation allows a server to dynamically connect and disconnect audio from different clients. Each client can have a different number of channels and different network tuning parameters.⁴ In this case
the server will act as a “hub” between several locations. Figure 3 illustrates this for an example with three clients. The server can mix and re-route all the audio channels between the clients, allowing a multi-site performance in which one site acts as a master relay service and/or mixer.

5 CONCLUSIONS AND FUTURE WORK
The first decade of the 21st century saw a dramatic increase in the speed and reliability of high-speed networks, and this increase is expected to continue. We have provided a system that lets musicians tune and optimize their connections against a reference server, so that they can adapt to their given network situation. The server can also be used to interconnect multiple sites with an arbitrary number of channels, and to act as a “mixing hub” that distributes audio to all the locations from a central place.

Scalability in network performance is a big issue that still needs to be solved. Learning how to connect hundreds or even thousands of remote locations for a global jam session is a pending goal. Multicast at the network layer would provide a solution for a fully connected peer-to-peer mesh. Clients would select from a list of the peers they want to connect with, and then send just one packet via multicast (using its underlying network-layer implementation). Network routers and switches would determine when a copy needs to be made. The Access Grid implements this [8] for a fixed number of audio channels; however, this infrastructure is not yet ubiquitous. Furthermore, when the number of audio channels and other settings differ among the clients, a new and consistent solution is required so that they can interoperate.

Scaling up and distributing physical models embedded in the network path can also serve to perform “global string network symphonies,” where the global network becomes the instrument itself, an instrument distributed throughout the world.

6 ACKNOWLEDGMENTS
This work was carried out in cooperation with MusicianLink, Inc. and funded by National Science Foundation Award Grant No. IIP-0741278, with a sub-award to CCRMA. See the online Final Report, Technical Research Summary.⁵ Fernando Lopez-Lezcano and Carr Wilkerson from CCRMA have provided continuous assistance with the implementation and the server infrastructure setup.

⁵ http://ccrma.stanford.edu/~cc/pub/pdf/qosServer-nsfFinalReport.pdf

7 REFERENCES
[1] (2008–2009) Qt Software. [Online]. Available: http://www.qtsoftware.com/
[2] (2009) JACK: Connecting a world of audio. [Online]. Available: http://jackaudio.org/
[3] Cáceres, J.-P. and Chafe, C., “JackTrip: Under the hood of an engine for network audio,” in Proceedings of International Computer Music Conference, Montreal, 2009.
[4] Chafe, C. and Gurevich, M., “Network time delay and ensemble accuracy: Effects of latency, asymmetry,” in Proceedings of the AES 117th Convention, San Francisco, 2004.
[5] Chafe, C. and Leistikow, R., “Levels of temporal resolution in sonification of network performance,” in Proceedings of the 2001 International Conference on Auditory Display. Helsinki: ICAD, 2001.
[6] Chafe, C., Wilson, S., and Walling, D., “Physical model synthesis with application to internet acoustics,” in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Orlando, 2002.
[7] Comer, D. E., Internetworking with TCP/IP, Vol. 1, 5th ed. Prentice Hall, Jul. 2005.
[8] Daw, M., “Advanced collaboration with the access grid,” Ariadne, vol. 42, Jan. 2005. [Online]. Available: http://www.ariadne.ac.uk/issue42/daw/intro.html
[9] Karplus, K. and Strong, A., “Digital synthesis of Plucked-String and drum timbres,” Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.
[10] Peterson, L. L. and Davie, B. S., Computer Networks: A Systems Approach, 3rd ed. Morgan Kaufmann, May 2003.
[11] Qiao, Z., Venkatasubramanian, R., Sun, L., and Ifeachor, E., “A new buffer algorithm for speech quality improvement in VoIP systems,” Wireless Personal Communications, vol. 45, no. 2, pp. 189–207, Apr. 2008.
[12] Renaud, A. B., Carôt, A., and Rebelo, P., “Networked music performance: State of the art,” in Proceedings of the AES 30th International Conference, Saariselkä, Finland, 2007.
[13] Stevens, W. R., Fenner, B., and Rudoff, A. M., Unix Network Programming, Volume 1: The Sockets Networking API, 3rd ed. Addison-Wesley Professional, Nov. 2003.