
Networked Music Performance in PatchXR and FluCoMa

Jonathan Bell
Aix Marseille Univ, CNRS, PRISM, Marseille, France
[email protected]

ABSTRACT

The present study proposes to explore a sound corpus in VR, in which audio data is sliced and analysed in FluCoMa in order to obtain relatively large collections of samples clustered by timbre similarity in a 3d room. The recent implementation of the multiplayer feature in PatchXR lets us envisage a wide variety of gesture-based control interfaces querying those corpora, in which performers can interact remotely in order to simulate a chamber music situation.

1. INTRODUCTION

The recent emergence of multiplayer capabilities in VR software such as PatchXR [1] urges us to find meaningful instrument designs in order to interact musically with a digital instrument, remotely and collaboratively, in multiplayer VR (or the metaverse). In search of experiences that would relate to those found in traditional chamber music, the solution proposed here focuses on the exploration of a sound corpus projected onto a 3d space, which users can then navigate with their hand controllers (see Fig. 1): in an etude for piano and saxophone for instance 1, one musician plays the blue buttons (the saxophone samples), and the other the yellow buttons (the piano samples). [2]

Figure 1. A VR interface in which each button in the world corresponds to a slice of the sound file. Machine learning helps bring closer sounds that share common spectral characteristics.

An important reason here for using VR to explore a 3D dataset is that it allows users to interact with the data in a more natural and immersive way than on a 2d plane (Chapter 3 will show how the present study derives from the CataRT 2d interface project), using the experience both as a tool for performance and as a tool for data visualisation and analysis. Users can move around and explore the data from different angles, which can help them better understand the relationships between different data points and identify patterns, something that becomes more evident as the number of points increases. The use of machine learning (dimensionality reduction in our case) renders a world in which the absolute coordinates of each point no longer map onto the descriptor space (high sounds cannot be read off the y axis, for instance), but it offers compelling results for clustering information relating to the different playing styles of the instrument being analysed: as an example, in this extract 2 based on flute sounds, the opening shows a clear opposition between two types of gestures: 1/ (0'00) staccato notes and 2/ (0'05) legato, scale-like material. This contrast in timbral quality is made explicit by a movement of the avatar, which jumps from one cluster of buttons to another.

1 https://youtu.be/kIi7YdzP2Nw?t=89
2 https://youtu.be/777fqIIJCY4

Copyright: © 2023 Jonathan Bell et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2. VIRTUAL REALITY

2.1 Musical Metaverse

Turchet conducted a thorough study on what the musical metaverse means today [3], although the realm is still in its infancy. Berthaut [4] reviewed 3D interaction techniques and examined how they can be used for musical control. A survey of the emerging field of networked music performances in VR was offered by Loveridge [5]. Atherton and Wang [18] provided an overview of recent musical works
in VR, while Çamcı and Hamilton [6] identified research trends in the Musical XR field through a set of workshops focusing on Audio-first VR.
Amongst the plethora of tools available today, PatchXR has caught our attention primarily because of its resemblance to the Max/Pure Data environments (see Fig. 2).

Figure 2. The implementation of the FM algorithm 1/ in Pure Data (left) 2/ in PatchXR (right).

2.2 PatchXR

PatchXR [1] is a tool for creating immersive virtual reality experiences in which users can design and build interactive worlds, games, and music experiences. At its core, Patch is a modular synthesis platform that allows users to create and connect different building blocks, or "patches", together to create complex systems. These building blocks can include everything from 3D models and textures to physics simulations, lighting controls, and, most importantly, audio digital signal processing.
One of the key features of Patch is its ability to enable collaboration between users. Patches can be shared and remixed, allowing multiple users to work together on a single project or create something entirely new. In addition, Patch has a robust library of resources for users to draw from, including tutorials, documentation, and sample patches. The community around Patch is also very active, with regular competitions, events, and meet-ups happening around the world.
One of the most exciting aspects of Patch is its potential for use in music performance and composition. The modular design of the platform allows users to create complex audio and visual environments that can be controlled in real time, opening up new possibilities for live music and audiovisual performances. Patch has been used in a variety of contexts, from creating interactive installations and exhibits to developing VR training simulations and games. Its flexibility and modular design make it a powerful tool for anyone interested in exploring the creative possibilities of VR.

3. CORPUS-BASED CONCATENATIVE SOUND SYNTHESIS (CBCS)

Corpus-Based Concatenative Sound Synthesis (CBCS) is a technique used in computer music that involves constructing a sound or music piece by concatenating (joining together) smaller units of sound, such as phonemes in speech synthesis or musical phrases in music synthesis. It is used in our case to model an improvising instrumental musician, by creating a database of recorded musical phrases or segments that can be combined and rearranged in real time to create a musical performance that sounds as if it were being improvised.
Today nearly 20 years old if one refers to the first CataRT publications [7], CBCS enjoys an increasing popularity. Various apps today are based on similar principles (AudioStellar, AudioGuide, LjudMAP or XO). The democratisation of audio analysis and machine learning tools such as the FluCoMa package (for Max, SuperCollider and Pure Data) encourages computer music practitioners to engage in this field at the crux between music creation and data science/machine learning.

3.1 Timbre Space

In spite of promising advances in the domain of deep learning applied to sound synthesis [8] [9], CBCS tools may owe their popularity to a metaphor which leads back to the early days of computer music: the notion of timbre space, developed by Wessel [10] and Grey [11], according to which the multi-dimensional qualities of timbre may be better understood using spatial metaphors (e.g. the timbre of the English horn being closer to that of the bassoon than to that of the trumpet).

Figure 3. Multidimensional perceptual scaling of musical timbres (John M. Grey [11]). Sounds available at: https://muwiserver.synology.me/timbrespaces/grey.htm

Pioneers of timbre perception studies such as Grey [11], J.-C. Risset, D. Wessel [12] or Stephen McAdams [13] [14] most often define timbre by underlining what it is not. Risset and Wessel, for instance, define it as follows: it is the perceptual attribute that enables us to distinguish among orchestral instruments that are playing the same pitch and are equally loud. The co-variance of such parameters (pitch, loudness and timbre), however, leads Schwarz to distinguish the notions of timbre space and CBCS: "Note that this concept is similar but not equivalent to that of the timbre space put forward by Wessel and Grey [7, 24], since timbre is defined as those characteristics that serve to distinguish one sound from another, that remain after removing differences in loudness and pitch. Our sound space explicitly includes those differences that are very important to musical expression." [15]
The workflow described in Chapter 5 gave in practice strong evidence of the inter-dependence between register, timbre and dynamics, particularly when the analysis is run over a single-instrument sound file (e.g. 30 minutes of solo flute) chopped into short samples. The system will then precisely be able to find similarity between instrumental passages played in the same register, same dynamic, and same playing technique (e.g. a flute playing fast trills mezzo forte, in the mid-low register, with air).
3.2 Corpus-Based Concatenative Synthesis - State of the Art

A wide array of technologies today can be called corpus-based concatenative synthesis, in the sense that they allow, through segmentation and analysis, to explore large quantities of sound. Some of them are presented as "ready-made" solutions, such as the recent AudioStellar [16], or SCMIR 3 for SuperCollider. Hackbarth's AudioGuide [17] offers a slightly different focus because it uses the morphology/timeline of a soundfile to produce a concatenated output. Within the Max world, finally, two environments appear as highly customizable: IRCAM's MuBu [18] and the more recent EU-funded FluCoMa [19] project. CataRT is now fully integrated in MuBu, whose purpose encompasses multimodal audio analysis as well as machine learning for movement and gesture recognition [20]. This makes MuBu extremely general purpose, but also difficult to grasp. The data processing tools in MuBu are mostly exposed in the pipo plugin framework [21], which can compute for instance mfcc analysis on a given audio buffer 4 by embedding the pipo.mfcc plugin inside the mubu.process object. FluCoMa also aims to be general purpose, but seems particularly suited to two popular specific tasks. Even with only limited knowledge of the framework or of the theory lying behind the algorithms it uses (such as dimensionality reduction, mfcc analysis, or neural network training), the framework allows: 1/ to segment, analyse and represent/play back a sound corpus; 2/ to train a neural network to control a synthesizer, in a manner reminiscent of Fiebrink's Wekinator [22].
Only the tools for segmentation, analysis, representation and playback (described in detail in Chapter 5) were used here, for they precisely fit the needs of corpus-based synthesis.

3 A demo is available at: https://youtu.be/jxo4StjV0Cg
4 MFCC stands for Mel-Frequency Cepstral Coefficients. It is a feature extraction method commonly used in speech and speaker recognition systems. MFCCs represent the spectral characteristics of a sound in a compact form that is easier to analyze and process than the raw waveform. They are calculated by applying a series of transformations to the power spectrum of a sound signal, including a Mel-scale warping of the frequency axis, taking the logarithm of the power spectrum, and applying a discrete cosine transform (DCT) to the result. The resulting coefficients, called MFCCs, capture the spectral characteristics of the sound and are commonly used as features for training machine learning models for tasks such as speech recognition and speaker identification.

4. CONNECTING PATCHXR AND FLUCOMA

Porting an analysis made in Max/FluCoMa to PatchXR consists in generating a world in which each sonic fragment's 3d position follows the coordinates delivered by FluCoMa. The structure of a .patch file (a PatchXR world) follows the syntax of a .maxpat (for Max) or .pd file (for Pure Data) in the sense that it first declares the objects used, and then the connections between them. This simple structure made it possible to write a javascript routine that generates a template world, taking as input 1/ dictionaries (json files) with each segment's 3d coordinates and 2/ each segment's temporal position in the sound file, and giving as output a new .patch file (a world accessible in VR, see the general workflow in Fig. 4).

Figure 4. General workflow: from an input audio file to its .patch 3d representation in PatchXR.

After a few trials in which the x y z coordinates of a world directly represented audio descriptors such as loudness, pitch and centroid 5, I more systematically used mfcc analysis and dimensionality reduction, as will be shown in Section 5. Section 6 will present different javascript programs and Max patches that were developed in order to diversify the ways in which the FluCoMa analysis is represented in PatchXR, and how the user can interact with it.

5 https://youtu.be/1LHcbYh2KCI?t=19
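As a rough illustration of this generation step, the sketch below mimics such a routine in Node.js. The block names ("button", "sampler") and the output structure are simplified placeholders rather than the actual .patch syntax; only the overall logic described above (read the json analysis, declare one block per segment at its 3d coordinates, then declare the connections) is retained.

```javascript
// Hypothetical sketch of the world-generation routine described above.
// Input: a JSON file produced by the FluCoMa analysis, e.g.
//   [{ "x": 0.12, "y": 0.83, "z": 0.40, "onset": 0.00, "duration": 0.35 }, ...]
// Output: a template world with one button per segment placed at (x, y, z).
const fs = require('fs');

function generateWorld(segments, worldSize = 10) {
  const blocks = [];
  const connections = [];
  segments.forEach((seg, i) => {
    // Scale normalized UMAP coordinates (0..1) to the dimensions of the room.
    const pos = [seg.x, seg.y, seg.z].map(c => c * worldSize);
    blocks.push({ id: `button_${i}`, type: 'button', position: pos });
    blocks.push({ id: `player_${i}`, type: 'sampler', onset: seg.onset, duration: seg.duration });
    // Each button triggers playback of its own slice of the source sound file.
    connections.push([`button_${i}`, `player_${i}`]);
  });
  // Like a .maxpat or .pd file, the world first declares objects, then connections.
  return JSON.stringify({ blocks, connections }, null, 2);
}

const segments = JSON.parse(fs.readFileSync('analysis.json', 'utf8'));
fs.writeFileSync('corpus-world.patch', generateWorld(segments));
```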
5. WORKFLOW - ANALYSIS IN FLUCOMA

My experiments have focussed almost exclusively on musical instrument corpora 6. The tools presented here can efficiently generate plausible virtuosic instrumental music, but recent uses found more satisfying results in slower, quieter, "Feldman-like" types of textures. Various limitations on the playback side (either in standalone VR, or on the Pure Data sampler for Raspberry Pi described in Section 6.2) at first imposed restrictions on the amount of data the system could handle (less than 5 minutes of AIFF in PatchXR) or on the number of slices a sample could be chunked into (256, because of a limitation on lists in Max, which has also been surpassed since). Both limitations were later overcome (use of the compressed ogg format and of longer sound files since version 672 in PatchXR, and an increase of the internal buffer size in fluid.buf2list in FluCoMa), thus allowing for far more convincing models.

6 For cello: https://youtu.be/L-MiKmsIzjM For various instruments: https://www.youtube.com/playlist?list=PLc WX6wY4JtnNqu4Lwe2YzEUq9S1IMvUk For flute: https://www.youtube.com/playlist?list=PLc WX6wY4JtlbjLuLHDZhlx78sTDm

Using concatenative synthesis to model an improvising instrumental musician typically involves several steps:

1. Segmentation of a large soundfile: This involves dividing a large audio recording of the musician's performance into smaller units or segments.

2. Analysis: These segments are then organised in a database according to various descriptor data (mfcc in our case).

3. Scaling/pre-processing: scaling is applied for better visualisation.
4. Dimension reduction: Based on the mfcc descriptors, the dimensionality of the data is reduced in order to make it more manageable and easier to work with. This can be done using techniques such as principal component analysis (PCA), singular value decomposition (SVD), or Uniform Manifold Approximation and Projection (UMAP, preferred in our case).

5. Near neighbours sequencing: Once the segments have been organised and analysed, the algorithm selects and combines them in real time, based on certain input parameters or rules, to create a simulated musical performance that sounds as if it were being improvised by the musician. We use here a near neighbours algorithm, which selects segments that are similar in some way (e.g., in terms of pitch, loudness, or timbre, thanks to the similarities revealed by umap on mfccs in our case) to the current segment being played.

We will now describe these steps in further detail.

5.1 Slicing

Slicing a sound file musically allows various possible exploitations in the realm of CBCS. In MuBu, onset detection is done with pipo.onseg or pipo.gate. FluCoMa exposes five different onset detection algorithms:

1. fluid.ampslice: Amplitude-based detrending slicer
2. fluid.ampgate: Gate detection on a signal
3. fluid.onsetslice: Spectral difference-based audio slicer
4. fluid.noveltyslice: Based on a self-similarity matrix (SSM)
5. fluid.transientslice: Implements a de-clicking algorithm

Only onsetslice was extensively tested. The only tweaked parameters were a straightforward "threshold" as well as a "minslicelength" argument, determining the shortest slice allowed (or minimum duration of a slice), expressed in hopSize units. This introduces a common limitation in CBCS: the system strongly biases the user towards choosing short samples, for better analysis results and more interactivity when controlling the database with a gesture follower. Aaron Einbond remarks, regarding the use of CataRT, how short samples most suited his intention: "Short samples containing rapid, dry attacks, such as close-miked key-clicks, were especially suitable for a convincing impression of motion of the single WFS source. The effect is that of a virtual instrument moving through the concert hall in tandem with changes in its timbral content, realizing Wessel's initial proposal." [23]
A related limitation of concatenative synthesis lies in the fact that short samples will demonstrate the efficiency of the algorithm 7, but at the same time move away from the "plausible simulation" sought in the present study. A balance therefore must be found between the freedom imposed by large samples and the refined control one can obtain with short samples.
A direct concatenation of slices clicks in most cases at the edit point, which can be avoided through the use of ramps. The second most noticeable glitch on concatenation concerns the interruption of low-register resonances, which even a large reverb fails to make sound plausible. Having a low threshold and a large "minslicelength" results in equidistant slices, all of identical duration, as the pipo.onseg object in MuBu would produce. Because we listen to sound in time, this parameter, responsible for the duration of samples, is of prior importance.

7 e.g. https://youtu.be/LD0ivjyuqMA?t=3032
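To make the role of these two parameters concrete, here is a deliberately naive amplitude-based slicer. fluid.onsetslice actually works on spectral difference and expresses the minimum length in hops, so this sketch only illustrates how a threshold and a minimum slice length shape the segmentation, not the FluCoMa algorithm itself.

```javascript
// Naive illustration of "threshold" and "minslicelength" (here in samples).
// `samples` is a mono Float32Array; the function returns onset positions.
function sliceByAmplitude(samples, { threshold = 0.1, minSliceLength = 4410, hopSize = 64 } = {}) {
  const onsets = [0];
  let lastOnset = 0;
  for (let i = hopSize; i < samples.length; i += hopSize) {
    // Very crude envelope: mean absolute value over one hop, compared to the previous hop.
    let frame = 0, prev = 0;
    for (let j = 0; j < hopSize; j++) {
      frame += Math.abs(samples[i + j] || 0);
      prev += Math.abs(samples[i - hopSize + j] || 0);
    }
    const rise = (frame - prev) / hopSize;
    // A new slice starts when the envelope rises above the threshold,
    // but never closer than minSliceLength samples to the previous onset.
    if (rise > threshold && i - lastOnset >= minSliceLength) {
      onsets.push(i);
      lastOnset = i;
    }
  }
  return onsets;
}
```

Raising the threshold yields fewer, longer slices; lowering it while enforcing a large minimum length tends towards the equidistant, fixed-duration slices mentioned above.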
5.2 MFCC on each slice - across one whole slice/segment

Multidimensional MFCC analysis: MFCC (Mel-Frequency Cepstral Coefficient) analysis is a technique used to extract features from audio signals that are relevant for speech and music recognition. It involves calculating a set of coefficients that represent the spectral envelope of the audio signal, or, in other words, decomposing a sound signal into a set of frequency bands and representing the power spectrum of each band with a set of coefficients. The resulting MFCC coefficients capture important spectral characteristics of the sound signal (albeit characteristics hardly interpretable by the novice user), such as the frequency and magnitude of the spectral peaks. We will see that, combined with umap, this analysis is able to capture the spectral characteristics of the musician's playing style.

5.3 Statistical Analysis Over Each Slice

BufStats is used to calculate statistical measures on the data stored in a buffer channel. BufStats calculates seven statistics on the data in the buffer channel: mean, standard deviation, skewness, kurtosis, and low, middle, and high values. These statistics provide information about the central tendency of the data and how it is distributed around that tendency. In addition to calculating statistics on the original buffer channel, BufStats can also calculate statistics on up to two derivatives of the original data, apply weights to the data using a weights buffer, and identify and remove outlier frames. These statistical measures can be useful for comparing different time-series data, even if they have different lengths, and may provide better distinction between data points when used in training or analysis. The output of BufStats is a buffer with the same number of channels as the original data, with each channel containing the statistics for its corresponding data in the original buffer.

5.4 Normalization

The FluCoMa package proposes several scaling or preprocessing tools, amongst which normalization and standardization were used. Standardization and normalization are techniques used to transform variables so that they can be compared or combined in statistical analyses. Both techniques make data more comparable, but they work in slightly different ways. Standardization scales a variable to have a mean of 0 and a standard deviation of 1, while normalization scales a variable to have a minimum value of 0 and a maximum value of 1. Normalization scaling was found easier to use, both in 2-D (in FluCoMa, the fluid.plotter object) and in the VR 3D world, in which the origin corresponds to a corner of the world. The fluid.normalize object features a "@max" attribute (1 by default), which then maps directly to the dimensions of the VR world.
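The following sketch restates the two steps just described (Sections 5.3 and 5.4) on plain arrays: each slice's MFCC time series is reduced to the seven summary statistics computed by fluid.bufstats, and each resulting column is then min-max scaled in the spirit of fluid.normalize, with a max argument standing in for the "@max" attribute. Derivatives, weights and outlier removal are left out; this is a schematic reading of those objects, not their implementation.

```javascript
// Reduce one slice's MFCC values to seven statistics (mean, std, skewness,
// kurtosis, low, middle, high), as fluid.bufstats does per channel.
function sliceStats(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const central = p => values.reduce((a, b) => a + (b - mean) ** p, 0) / n;
  const std = Math.sqrt(central(2));
  const sorted = [...values].sort((a, b) => a - b);
  return [
    mean,
    std,
    std > 0 ? central(3) / std ** 3 : 0, // skewness
    std > 0 ? central(4) / std ** 4 : 0, // kurtosis
    sorted[0],                           // low
    sorted[Math.floor(n / 2)],           // middle
    sorted[n - 1],                       // high
  ];
}

// Min-max scaling of one column; `max` plays the role of the "@max" attribute
// (e.g. 10 for a 10-unit-wide VR room whose origin is one of its corners).
function normalizeColumn(column, max = 1) {
  const lo = Math.min(...column);
  const hi = Math.max(...column);
  const range = hi - lo || 1; // guard against a constant column
  return column.map(v => ((v - lo) / range) * max);
}

// slicesMfcc: one array of MFCC values per slice (a single channel, for brevity).
function describeCorpus(slicesMfcc, max = 1) {
  const stats = slicesMfcc.map(sliceStats); // one 7-value row per slice
  const columns = stats[0].map((_, c) => normalizeColumn(stats.map(row => row[c]), max));
  return stats.map((_, r) => columns.map(col => col[r])); // back to per-slice rows
}
```

The point of this step is that a variable-length time series per slice becomes a fixed-size feature vector, so that slices of different durations can be compared.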
5.5 Dimensionality Reduction - UMAP

Dimensionality reduction is a technique used in machine learning to reduce the number of features (dimensions) in a dataset. The goal of dimensionality reduction is to simplify the data without losing too much information. Various dimensionality reduction algorithms are presented in an early FluCoMa study [24], with, interestingly, no mention of UMAP, later favoured.
UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique based on the principles of topological data analysis. It can be used to visualize high-dimensional data in a lower-dimensional space. When applied to sound data analysed with MFCC (Mel-Frequency Cepstral Coefficients), UMAP reduces the dimensionality of the data and creates a visual representation of the sound in a 2- or 3-dimensional space. By applying UMAP to the MFCC coefficients of a sound signal, it is possible to create a visual representation of the sound that preserves the relationships between the different MFCC coefficients (see Fig. 5).

Figure 5. Dimensionality reduction of MFCCs helps revealing spectral similarities. UMAP outputs coordinates in 2d or 3d.

UMAP is therefore used for its clustering abilities in the first place, helping for classification purposes. It helps identify patterns or trends that may not be evident from the raw data. This can be useful for tasks such as exploring the structure of a sound dataset, identifying patterns or trends in the data, and comparing different sounds.
Most importantly, the non-linear dimensions proposed by UMAP (whether in 2d in Max or in 3 dimensions in PatchXR, and when compared to linear analyses in which, for instance, x, y and z correspond to pitch, loudness and centroid) gave far more "intelligent" clustering than more conventional, parameter-consistent types of representations.

5.6 Neighbourhood queries

The neighbourhood retrieval function is based, in FluCoMa, on k-d trees and the knn algorithm. In MuBu, the mubu.knn object serves similar tasks. The ml.kdtree object in the ml.star library [25] gives comparable results.
K-d trees (short for "k-dimensional trees") and k-nearest neighbours (k-NN) are two algorithms that are related to each other but serve different purposes: a k-d tree is a data structure used to store and efficiently query a set of points in a k-dimensional space, while the k-NN algorithm is a machine learning algorithm used for classification or regression. Both are often used in applications such as pattern recognition, image classification, and data mining.
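A brute-force version of such a neighbourhood query, combined with the "near neighbours sequencing" of step 5 in Section 5, can be sketched as follows. fluid.kdtree and mubu.knn answer the same kind of query far more efficiently through a k-d tree, and the random choice among neighbours is only one possible sequencing rule.

```javascript
// `points` holds the 3d coordinates produced by UMAP, one entry per slice.
function kNearest(points, queryIndex, k) {
  const q = points[queryIndex];
  return points
    .map((p, index) => ({ index, d: Math.hypot(p.x - q.x, p.y - q.y, p.z - q.z) }))
    .filter(({ index }) => index !== queryIndex) // exclude the query point itself
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map(({ index }) => index);
}

// Starting from a random slice, repeatedly move to one of its k nearest
// neighbours: the resulting index sequence stays timbrally consistent with
// the slice that has just been played, while never repeating a fixed path.
function improviseSequence(points, steps = 16, k = 8) {
  let current = Math.floor(Math.random() * points.length);
  const sequence = [current];
  for (let s = 1; s < steps; s++) {
    const neighbours = kNearest(points, current, k);
    if (neighbours.length === 0) break;
    current = neighbours[Math.floor(Math.random() * neighbours.length)];
    sequence.push(current);
  }
  return sequence; // slice indices to be played back one after the other
}
```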
6. WORKFLOW IN PATCHXR

I have most often used FluCoMa and PatchXR to generate monophonic instruments (one performer plays one instrument at a time), most typically in experiences where the players face one another 8. In the case of "button worlds" such as this one or those described in Section 6.1, there is no need for nearest neighbour retrieval, since the performer clicks exactly on the data point, and he (mediated by his avatar) reproduces what knn would do with an automated instrument: he will privilege in his choices the samples he can reach at hand, rather than constantly jumping large distances between items (see Fig. 1).
In the worlds developed in Sections 6.2 and 6.3, on the other hand, data points are not explicitly represented and some near neighbour strategies need to be implemented.
PatchXR exposes a wide range of blocks (a block corresponds to an object in Max or Pure Data), making it simple to access gesture data such as:

• The position/distance between hands/controllers and a reference.
• The rotation angles (x y z) of both hands' controllers.
• 2-d touchscreen-like controllers, where the user moves the xy position of a selector across a plane by manually grabbing it.
• 2-d laser-like controllers, where the user moves the xy position of a selector remotely, as if using a laser pointer towards a remote screen or board.
• 2-d pads, which give access to the velocity at which the pad is hit.
• 3-d slider or theremin-like controllers, where the user moves the xyz position of a selector within a volume by manually grabbing it.
• A block called "interaction box", similar to the 3-d slider, with the main difference that the user does not grab the selector, but instead comes in and out of the interactive zone.
• 1-d sliders, knobs, buttons...

One of the current challenges consists in diversifying the ways in which the corpus is queried.

8 https://youtu.be/WhuqOOuzzBw
One-to-one mappings of UMAP results such as those of Section 6.1 use buttons facing each other, in order to prompt the players to face each other 9. Playing alone while controlling many instruments at the same time (the one-man orchestra) encourages a higher-level type of control over automata, i.e. implementing the simple ability to concatenate automatically: play the next sample as soon as the previous one has stopped (see Section 6.2).
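A minimal sketch of this automatic concatenation is given below, assuming a playSlice() function and a nextIndex() selection rule provided elsewhere (for instance the near-neighbour choice of Section 5); it simply schedules each new slice at the moment the previous one ends.

```javascript
// Play slices back to back: as soon as one slice has finished, the next one
// (chosen by nextIndex) is started. playSlice() and nextIndex() are
// placeholders for the PatchXR or Pure Data mechanisms actually used.
function autoConcatenate(segments, playSlice, nextIndex, startIndex = 0) {
  let current = startIndex;
  const step = () => {
    playSlice(current);
    const durationMs = segments[current].duration * 1000;
    current = nextIndex(current);
    setTimeout(step, durationMs); // schedule the next slice at the end of this one
  };
  step();
}
```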

6.1 One to one mapping between data points and buttons
The first javascript routine developed was designed to simply map a data point in the sonic space to a button in the virtual 3d space. By iterating over an array, the routine generates a world in which each button's coordinates are dictated by the FluCoMa umap analysis described in Section 5.5. Albeit simpler than the method exposed later, this one-to-one mapping has several advantages, most importantly the haptic and visual feedback the performer gets when hitting each button.

6.2 Max/pd dependence


The second stage investigated the possibility of rendering the sounds on an array of Raspberry Pi computers [26]. While this method shows advantages in terms of patching (because of the convenience of using Max and Pure Data), the main drawback is that patches designed in this way cannot be accessed by the PatchXR community. Documentation, similarly, is harder to record, since the sound is produced outside of PatchXR. The haptic and visual feedback is very different here, in the sense that the user primarily controls the region of the space to be played, and when to start and stop playing (when his hand touches the interface or not). The way he plays is less rhythmical than in the button interface, where the player "hits" 10 each sample (Section 6.1). Here, on the contrary, the automaton simply keeps playing as long as he touches the interface 11.
FluCoMa exposes the fluid.kdtree object, which is able to find the k nearest neighbours once it is given as input the coordinates of each data point (see Fig. 6). This method proved more suitable to control automata in which the player selects a region of a 2d plane together with the number of neighbours he wants the automaton to improvise with.

Figure 6. The fluid.kdtree object is used here to retrieve the 8 nearest neighbours of a point in a 2-d space.

Most satisfying results were achieved by sending messages to each Raspberry Pi independently, according to its specific (static) IP address, with the simple syntax of a 2-integer list corresponding to: 1/ which buffer to look up, 2/ which slice in this buffer to play; each Pi/speaker is thus able to play each sound in a Pure Data patch, in which each slice looks up an array with the corresponding slice points.
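The sketch below shows what such a 2-integer message could look like when sent from Node.js over UDP. The plain-text "buffer slice;" format assumes a Pure Data netreceive object listening on each Pi, and the IP addresses and port are invented for the example, since the exact transport is not specified here.

```javascript
// Send "which buffer / which slice" messages to Pure Data samplers running on
// Raspberry Pis at static IP addresses (addresses and port are hypothetical).
const dgram = require('dgram');
const socket = dgram.createSocket('udp4');

const PI_ADDRESSES = ['192.168.1.101', '192.168.1.102']; // hypothetical static IPs
const PD_PORT = 3000; // hypothetical port of the Pure Data sampler

// bufferIndex: which buffer to look up; sliceIndex: which slice of it to play.
function playSlice(piIndex, bufferIndex, sliceIndex) {
  const message = Buffer.from(`${bufferIndex} ${sliceIndex};\n`);
  socket.send(message, PD_PORT, PI_ADDRESSES[piIndex], err => {
    if (err) console.error(err);
  });
}

playSlice(0, 2, 117); // ask the first Pi to play slice 117 of buffer 2
```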

6.3 Nearest neighbour in PatchXR

The currently most frequently used program is a javascript routine that generates a world (a .patch file) in which the x y z coordinates of each data point are stored in a "knobboard" block (the long rectangle in Fig. 7). To measure the distance in 3D using Thales' theorem (or the so-called distance formula), we need to find the distance between two points in three-dimensional space, i.e. find the diameter of the sphere that passes through both points:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}

Fig. 8 shows the corresponding implementation in PatchXR. This fragment of visual program (called "abstraction" or "subpatch" in Max, and "group" in PatchXR) is then used in a more complex patch which iterates (100 times per second, for instance) over an array (the rectangular "knobboard" object in Fig. 7), so as to output the index (the value 234 in the figure) of the point situated closest to the controller.

Figure 7. The coordinates of the player's controller (here 0.01, 0.06, 0.13) yield as a result index 234 as nearest neighbour (read from left to right).

Figure 8. The implementation of Thales' theorem in PatchXR (read from right to left).

9 https://youtu.be/
10 The opening of this video aptly conveys how the energy transfers from the player's gesture: https://www.youtube.com/watch?v=glFdzbAJVRU
11 https://youtu.be/vtob96F9cQw
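For reference, the logic of this group, iterating over all stored coordinates and returning the index of the closest one, can be written out in a few lines of javascript; this is only a textual restatement of the patch shown in Fig. 7 and Fig. 8, not the code actually running in PatchXR.

```javascript
// Return the index of the stored point closest to the controller, using the
// 3d distance formula given above.
function closestPointIndex(points, controller) {
  let best = 0;
  let bestDistance = Infinity;
  points.forEach((p, i) => {
    const d = Math.sqrt(
      (p.x - controller.x) ** 2 + (p.y - controller.y) ** 2 + (p.z - controller.z) ** 2
    );
    if (d < bestDistance) {
      bestDistance = d;
      best = i;
    }
  });
  return best; // e.g. 234 for the controller position shown in Fig. 7
}
```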
7. NETWORKED MUSIC PERFORMANCE

The concept of online virtual collaboration has gained significant attention, notably through Meta's promotion in 2021. The application of this concept within the realm of "tele-improvisations" [27], most commonly referred to as Networked Music Performance (NMP), holds the potential to overcome what was hitherto viewed as an intrinsic limitation of the field, both from a practitioner's and from an audience's point of view.
NMPs have indeed stimulated considerable research and experimentation whilst facing resistance at the same time. In his article "Not Being There", Miller Puckette argues: "Having seen a lot of networked performances of one sort or another, I find myself most excited by the potential of networked 'telepresence' as an aid to rehearsal, not performance." [28]. In Embodiment and Disembodiment in NMP [29], Georg Hajdu identifies an issue with a lack of readability from the audience's perspective, who cannot perceive the gesture-to-sound relationship as would be the case in a normal concert situation: "These performances take machine–performer–spectator interactions into consideration, which, to a great deal, rely on embodied cognition and the sense of causality [...]. Classical cause-and-effect relationships (which also permeate the 'genuine' musical sign of the index) are replaced by plausibility, that is the amount to which performers and spectators are capable of 'buying' the outcome of a performance by building mental maps of the interaction." Performances of laptop orchestras, along with various other experiments using technology collaboratively, whether in local or distributed settings, have reported similar concerns, most commonly expressing a lack of embodiment in the performance. Although still in its infancy, a first live performance staged/composed by the author in Paris, with participants distributed across Europe, showed a promising potential for tackling these issues, most importantly through its attempt at dramatising the use of avatars.

• JIM 23: https://youtu.be/npyfwqN02qE

A lot remains to be improved in order to give the audience an experience aesthetically comparable to that of the concert hall. Carefully orchestrated movements of cameras around the avatars, and a faithful translation of the headset experience of spatial audio with immersive visuals, need further exploration, but would go beyond the scope of the present article.

8. CONCLUSIONS

We have proposed a workflow for corpus-based concatenative synthesis (CBCS) in multiplayer VR (or metaverse), arguing that machine learning tools for data visualisation offer revealing and exploitable information about the timbral quality of the material being analysed. In a wider sense, the present approach can be understood as a reflexive practice on new media, according to which the notion of database may be considered an art form [30].
The discussed tools for "machine listening" (FluCoMa, MuBu) help build intelligent instruments with relatively small amounts of data; the duration of samples appears crucial in CBCS. A balance must be found between 1/ short-duration samples, which are easier to process and categorise, and 2/ long samples, which sound more natural in the context of instrument-based simulations.

Acknowledgments

I am grateful for the support of UCA/CTEL, whose artist residency research program has allowed these experiments to take place, and for the support of PRISM-CNRS.

9. REFERENCES

[1] Andersson, "Immersive audio programming in a virtual reality sandbox," Journal of the Audio Engineering Society, March 2019.

[2] L. Turchet, N. Garau, and N. Conci, "Networked Musical XR: where's the limit? A preliminary investigation on the joint use of point clouds and low-latency audio communication," in Proceedings of the 17th International Audio Mostly Conference, ser. AM '22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 226–230. [Online]. Available: https://doi.org/10.1145/3561212.3561237

[3] L. Turchet, "Musical Metaverse: vision, opportunities, and challenges," Personal and Ubiquitous Computing, 01 2023.

[4] F. Berthaut, "3D interaction techniques for musical expression," Journal of New Music Research, vol. 49, no. 1, pp. 60–72, 2020.
[5] B. Loveridge, "Networked music performance in virtual reality: current perspectives," Journal of Network Music and Arts, vol. 2, no. 1, p. 2, 2020.

[6] A. Çamcı and R. Hamilton, "Audio-first VR: new perspectives on musical experiences in virtual environments," Journal of New Music Research, vol. 49, no. 1, pp. 1–7, 2020.

[7] D. Schwarz, G. Beller, B. Verbrugghe, and S. Britton, "Real-Time Corpus-Based Concatenative Synthesis with CataRT," in 9th International Conference on Digital Audio Effects (DAFx), Montreal, Canada, Sep. 2006, pp. 279–282. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161358

[8] J.-P. Briot, G. Hadjeres, and F.-D. Pachet, Deep Learning Techniques for Music Generation – A Survey, Aug. 2019. [Online]. Available: https://hal.sorbonne-universite.fr/hal-01660772

[9] P. Esling, A. Chemla-Romeu-Santos, and A. Bitton, "Generative timbre spaces with variational audio synthesis," CoRR, vol. abs/1805.08501, 2018. [Online]. Available: http://arxiv.org/abs/1805.08501

[10] D. L. Wessel, "Timbre Space as a Musical Control Structure," Computer Music Journal, vol. 3, no. 2, pp. 45–52, 1979. [Online]. Available: http://www.jstor.org/stable/3680283

[11] K. Fitz, M. Burk, and M. McKinney, "Multidimensional perceptual scaling of musical timbre by hearing-impaired listeners," The Journal of the Acoustical Society of America, vol. 125, p. 2633, 05 2009.

[12] J.-C. Risset and D. Wessel, "Exploration of timbre by analysis and synthesis," Psychology of Music, pp. 113–169, 1999.

[13] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff, "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychological Research, vol. 58, pp. 177–192, 02 1995.

[14] A. Caclin, S. McAdams, B. Smith, and S. Winsberg, "Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones," The Journal of the Acoustical Society of America, vol. 118, pp. 471–482, 08 2005.

[15] D. Schwarz, "The Sound Space as Musical Instrument: Playing Corpus-Based Concatenative Synthesis," in New Interfaces for Musical Expression (NIME), Ann Arbor, United States, May 2012, pp. 250–253. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161442

[16] L. Garber, T. Ciccola, and J. C. Amusategui, "AudioStellar, an open source corpus-based musical instrument for latent sound structure discovery and sonic experimentation," 12 2020.

[17] B. Hackbarth, N. Schnell, P. Esling, and D. Schwarz, "Composing Morphology: Concatenative Synthesis as an Intuitive Medium for Prescribing Sound in Time," Contemporary Music Review, vol. 32, no. 1, pp. 49–59, 2013. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01577895

[18] N. Schnell, A. Roebel, D. Schwarz, G. Peeters, and R. Borghesi, "MuBu and friends - Assembling tools for content-based real-time interactive audio processing in Max/MSP," in Proceedings of the International Computer Music Conference (ICMC 2009), 01 2009.

[19] P. A. Tremblay, G. Roma, and O. Green, "Enabling Programmatic Data Mining as Musicking: The Fluid Corpus Manipulation Toolkit," Computer Music Journal, vol. 45, no. 2, pp. 9–23, 06 2021. [Online]. Available: https://doi.org/10.1162/comj_a_00600

[20] F. Bevilacqua and R. Müller, "A Gesture Follower for performing arts," 05 2005.

[21] N. Schnell, D. Schwarz, J. Larralde, and R. Borghesi, "PiPo, a Plugin Interface for Afferent Data Stream Processing Operators," in International Society for Music Information Retrieval Conference, 2017.

[22] R. Fiebrink and P. Cook, "The Wekinator: A System for Real-time, Interactive Machine Learning in Music," in Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010), 01 2010.

[23] A. Einbond and D. Schwarz, "Spatializing Timbre With Corpus-Based Concatenative Synthesis," 06 2010.

[24] G. Roma, O. Green, and P. A. Tremblay, "Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces," in New Interfaces for Musical Expression, 2019.

[25] B. D. Smith and G. E. Garnett, "Unsupervised Play: Machine Learning Toolkit for Max," in New Interfaces for Musical Expression, 2012.

[26] PrÉ: connected polyphonic immersion. Zenodo, Jul. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6806324

[27] R. Mills, "Tele-Improvisation: Intercultural Interaction in the Online Global Music Jam Session," in Springer Series on Cultural Computing, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:57428481

[28] M. Puckette, "Not Being There," Contemporary Music Review, vol. 28, no. 4-5, pp. 409–412, 2009. [Online]. Available: https://doi.org/10.1080/07494460903422354

[29] G. Hajdu, "Embodiment and disembodiment in networked music performance," 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:149523160

[30] L. Manovich, "Database as Symbolic Form," Convergence: The International Journal of Research into New Media Technologies, vol. 5, pp. 80–99, 1999.
