
Networked Music Performance in PatchXR and FluCoMa

Jonathan Bell
Aix Marseille Univ, CNRS, PRISM, Marseille, France
[email protected]

ABSTRACT

The present study proposes to explore a sound corpus in VR, in which audio data is sliced and analysed in FluCoMa in order to obtain relatively large collections of samples clustered by timbre similarity in a 3d room. The recent implementation of the multiplayer feature in PatchXR lets us envisage a wide variety of gesture-based control interfaces querying those corpora, in which performers can interact remotely in order to simulate a chamber music situation.

1. INTRODUCTION

The recent emergence of multiplayer capabilities in VR software such as PatchXR [1] urges us to find meaningful instrument designs in order to interact musically with a digital instrument, remotely and collaboratively, in multiplayer VR (or the metaverse). In search of experiences that would relate to those found in traditional chamber music, the solution proposed here focuses on the exploration of a sound corpus projected onto a 3d space, which users can then navigate with their hand controllers (see Fig. 1): in an etude for piano and saxophone for instance 1, one musician plays the blue buttons (the saxophone samples), and the other the yellow buttons (the piano samples). [2]

Figure 1. A VR interface in which each button in the world corresponds to a slice of the sound file. Machine learning helps bring closer sounds that share common spectral characteristics.

An important reason here for using VR to explore a 3D dataset is that it allows users to interact with the data in a more natural and immersive way than on a 2d plane (Chapter 3 will show how the present study derives from the CataRT 2d interface project), using the experience both as a tool for performance and as a tool for data visualisation and analysis. Users can move around and explore the data from different angles, which can help them better understand the relationships between different data points and identify patterns, something that becomes more evident as the number of points increases. The use of machine learning (dimensionality reduction in our case) renders a world in which the absolute coordinates of each point no longer map onto the descriptor space (high sounds cannot be read off the y axis, for instance), but it offers compelling results for clustering information relating to the different playing styles of the instrument being analysed: as an example, in this extract 2 based on flute sounds, the opening shows a clear opposition between two types of gestures: 1/ (0'00) staccato notes and 2/ (0'05) legato, scale-like material. This contrast in timbral quality is made explicit by a movement of the avatar, which jumps from one cluster of buttons to another.

1 https://youtu.be/kIi7YdzP2Nw?t=89
2 https://youtu.be/777fqIIJCY4

Copyright: © 2023 Jonathan Bell et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

2. VIRTUAL REALITY

2.1 Musical Metaverse

Turchet conducted a thorough study on what the musical metaverse means today [3], although the realm is still in its infancy. Berthaut [4] reviewed 3D interaction techniques and examined how they can be used for musical control. A survey of the emerging field of networked music performances in VR was offered by Loveridge [5]. Atherton and Wang [18] provided an overview of recent musical works
in VR, while Çamcı and Hamilton [6] identified research trends in the Musical XR field through a set of workshops focusing on Audio-first VR.
Amongst the plethora of tools available today, PatchXR has caught our attention primarily because of its resemblance to the Max/Pure Data environments (see Fig. 2).

Figure 2. The implementation of the FM algorithm 1/ in Pure Data (left) 2/ in PatchXR (right).

2.2 PatchXR

PatchXR [1] is a tool for creating immersive virtual reality experiences in which users can design and build interactive worlds, games, and music experiences. At its core, Patch is a modular synthesis platform that allows users to create and connect different building blocks, or "patches", together to create complex systems. These building blocks can include everything from 3D models and textures to physics simulations, lighting controls, and, most importantly, audio digital signal processing.
One of the key features of Patch is its ability to enable collaboration between users. Patches can be shared and remixed, allowing multiple users to work together on a single project or create something entirely new. In addition, Patch has a robust library of resources for users to draw from, including tutorials, documentation, and sample patches. The community around Patch is also very active, with regular competitions, events, and meet-ups happening around the world.
One of the most exciting aspects of Patch is its potential for use in music performance and composition. The modular design of the platform allows users to create complex audio and visual environments that can be controlled in real time, opening up new possibilities for live music and audiovisual performances. Patch has been used in a variety of contexts, from creating interactive installations and exhibits to developing VR training simulations and games. Its flexibility and modular design make it a powerful tool for anyone interested in exploring the creative possibilities of VR.

3. CORPUS-BASED CONCATENATIVE SOUND SYNTHESIS (CBCS)

Corpus-Based Concatenative Sound Synthesis (CBCS) is a technique used in computer music that involves constructing a sound or music piece by concatenating (joining together) smaller units of sound, such as phonemes in speech synthesis or musical phrases in music synthesis. It is used in our case to model an improvising instrumental musician, by creating a database of recorded musical phrases or segments that can be combined and rearranged in real time to create a musical performance that sounds as if it were being improvised.
Today nearly 20 years old if one refers to the first CataRT publications [7], CBCS enjoys an increasing popularity. Various apps today are based on similar principles (AudioStellar, AudioGuide, LjudMAP or XO). The democratisation of audio analysis and machine learning tools such as the FluCoMa package (for Max, SuperCollider and Pure Data) encourages computer music practitioners to engage in this field at the crux between music creation and data science/machine learning.

3.1 Timbre Space

In spite of promising advances in the domain of deep learning applied to sound synthesis [8] [9], CBCS tools may owe their popularity to a metaphor which leads back to the early days of computer music: the notion of timbre space, developed by Wessel [10] and Grey [11], according to which the multi-dimensional qualities of timbre may be better understood using spatial metaphors (e.g. the timbre of the English horn being closer to that of the bassoon than to that of the trumpet).

Figure 3. Multidimensional perceptual scaling of musical timbres (John M. Grey [11]). Sounds available at: https://muwiserver.synology.me/timbrespaces/grey.htm

Pioneers of timbre perception studies such as Grey [11], J.-C. Risset, D. Wessel [12] or Stephen McAdams [13] [14] most often define timbre by underlining what it is not. Risset and Wessel, for instance, define it as follows: it is the perceptual attribute that enables us to distinguish among orchestral instruments that are playing the same pitch and are equally loud. The co-variance of such parameters (pitch, loudness and timbre), however, leads Schwarz to distinguish the notions of timbre space and CBCS: "Note that this concept is similar but not equivalent to that of the timbre space put forward by Wessel and Grey [7, 24], since timbre is defined as those characteristics that serve to distinguish one sound from another, that remain after removing differences in loudness and pitch. Our sound space explicitly includes those differences that are very important to musical expression." [15]
The workflow described in Chapter 5 gave in practice strong evidence of the inter-dependence between register, timbre and dynamics, particularly when the analysis is run over a single-instrument sound file (e.g. 30 minutes of solo flute) chopped into short samples. The system will then precisely be able to find similarity between instrumental passages played in the same register, same dynamic, and same playing technique (e.g. a flute playing fast trills mezzo forte, in the mid-low register, with air).
3.2 Corpus-Based Concatenative Synthesis - State of the Art

A wide array of technologies today can be called corpus-based concatenative synthesis, in the sense that they allow, through segmentation and analysis, to explore large quantities of sound. Some of them are presented as "ready-made" solutions, such as the recent AudioStellar [16], or SCMIR 3 for SuperCollider. Hackbarth's AudioGuide [17] offers a slightly different focus because it uses the morphology/timeline of a soundfile to produce a concatenated output. Within the Max world, finally, two environments appear as highly customizable: IRCAM's MuBu [18] and the more recent EU-funded FluCoMa [19] project. CataRT is now fully integrated in MuBu, whose purpose encompasses multimodal audio analysis as well as machine learning for movement and gesture recognition [20]. This makes MuBu extremely general purpose, but also difficult to grasp. The data processing tools in MuBu are mostly exposed in the pipo plugin framework [21], which can compute for instance mfcc analysis on a given audio buffer 4 by embedding the pipo.mfcc plugin inside the mubu.process object. FluCoMa also aims to be general purpose, but seems particularly suited to two popular specific tasks. Even with only limited knowledge of the framework or of the theory lying behind the algorithms it uses (such as dimensionality reduction, mfcc analysis, or neural network training), the framework allows: 1/ to segment, analyse and represent/play back a sound corpus; 2/ to train a neural network to control a synthesizer, in a manner reminiscent of Fiebrink's Wekinator [22].
Only the tools for segmentation, analysis, representation and playback (described in detail in Chapter 5) were used here, for they precisely fit the needs of corpus-based synthesis.

3 A demo is available at: https://youtu.be/jxo4StjV0Cg
4 MFCC stands for Mel-Frequency Cepstral Coefficients. It is a feature extraction method commonly used in speech and speaker recognition systems. MFCCs represent the spectral characteristics of a sound in a compact form that is easier to analyze and process than the raw waveform. They are calculated by applying a series of transformations to the power spectrum of a sound signal, including a Mel-scale warping of the frequency axis, taking the logarithm of the power spectrum, and applying a discrete cosine transform (DCT) to the result. The resulting coefficients, called MFCCs, capture the spectral characteristics of the sound and are commonly used as features for training machine learning models for tasks such as speech recognition and speaker identification.

4. CONNECTING PATCHXR AND FLUCOMA

Porting an analysis made in Max/FluCoMa to PatchXR consists in generating a world in which each sonic fragment's 3d position follows the coordinates delivered by FluCoMa. The structure of a .patch file (a PatchXR world) follows the syntax of a .maxpat (for Max) or .pd file (for Pure Data) in the sense that it first declares the objects used, and then the connections between them. This simple structure made it possible to write a javascript routine that generates a template world, taking as input 1/ dictionaries (json files) with each segment's 3d coordinates and 2/ each segment's temporal position in the sound file, and giving as output a new .patch file (a world accessible in VR, see the general workflow in Fig. 4).

Figure 4. General workflow: from an input audio file to its .patch 3d representation in PatchXR.

After a few trials in which the x y z coordinates of a world directly represented audio descriptors such as loudness, pitch and centroid 5, I more systematically used mfcc analysis and dimensionality reduction, as will be shown in Section 5. Section 6 will present different javascript programs and Max patches that were developed in order to diversify the ways in which the FluCoMa analysis is represented in PatchXR, and how the user can interact with it.

5 https://youtu.be/1LHcbYh2KCI?t=19
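As a rough illustration of this generation step, the sketch below mimics such a routine in Node.js. The block names ("button", "sampler") and the output structure are simplified placeholders rather than the actual .patch syntax; only the overall logic described above (read the json analysis, declare one block per segment at its 3d coordinates, then declare the connections) is retained.

```javascript
// Hypothetical sketch of the world-generation routine described above.
// Input: a JSON file produced by the FluCoMa analysis, e.g.
//   [{ "x": 0.12, "y": 0.83, "z": 0.40, "onset": 0.00, "duration": 0.35 }, ...]
// Output: a template world with one button per segment placed at (x, y, z).
const fs = require('fs');

function generateWorld(segments, worldSize = 10) {
  const blocks = [];
  const connections = [];
  segments.forEach((seg, i) => {
    // Scale normalized UMAP coordinates (0..1) to the dimensions of the room.
    const pos = [seg.x, seg.y, seg.z].map(c => c * worldSize);
    blocks.push({ id: `button_${i}`, type: 'button', position: pos });
    blocks.push({ id: `player_${i}`, type: 'sampler', onset: seg.onset, duration: seg.duration });
    // Each button triggers playback of its own slice of the source sound file.
    connections.push([`button_${i}`, `player_${i}`]);
  });
  // Like a .maxpat or .pd file, the world first declares objects, then connections.
  return JSON.stringify({ blocks, connections }, null, 2);
}

const segments = JSON.parse(fs.readFileSync('analysis.json', 'utf8'));
fs.writeFileSync('corpus-world.patch', generateWorld(segments));
```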
5. WORKFLOW - ANALYSIS IN FLUCOMA

My experiments have focussed almost exclusively on musical instrument corpora 6. The tools presented here can efficiently generate plausible virtuosic instrumental music, but recent uses found more satisfying results in slower, quieter, "Feldman-like" types of textures. Various limitations on the playback side (either in standalone VR, or on the Pure Data sampler for Raspberry Pi described in Section 6.2) at first imposed restrictions on the amount of data the system could handle (less than 5 minutes of AIFF in PatchXR) or on the number of slices a sample could be chunked into (256, because of a limitation on lists in Max, which has also been surpassed since). Both limitations were later overcome (use of the compressed ogg format and of longer sound files since version 672 in PatchXR, and an increase of the internal buffer size in fluid.buf2list in FluCoMa), thus allowing for far more convincing models.

6 For cello: https://youtu.be/L-MiKmsIzjM For various instruments: https://www.youtube.com/playlist?list=PLc WX6wY4JtnNqu4Lwe2YzEUq9S1IMvUk For flute: https://www.youtube.com/playlist?list=PLc WX6wY4JtlbjLuLHDZhlx78sTDm

Using concatenative synthesis to model an improvising instrumental musician typically involves several steps:

1. Segmentation of a large soundfile: This involves dividing a large audio recording of the musician's performance into smaller units or segments.

2. Analysis: These segments are then organised in a database according to various descriptor data (mfcc in our case).

3. Scaling/pre-processing: scaling is applied for better visualisation.
4. Dimension reduction: Based on the mfcc descriptors, the dimensionality of the data is reduced in order to make it more manageable and easier to work with. This can be done using techniques such as principal component analysis (PCA), singular value decomposition (SVD), or Uniform Manifold Approximation and Projection (UMAP, preferred in our case).

5. Near neighbours sequencing: Once the segments have been organised and analysed, the algorithm selects and combines them in real time, based on certain input parameters or rules, to create a simulated musical performance that sounds as if it were being improvised by the musician. We use here a near neighbours algorithm, which selects segments that are similar in some way (e.g., in terms of pitch, loudness, or timbre, thanks to the similarities revealed by umap on mfccs in our case) to the current segment being played.

We will now describe these steps in further detail.

5.1 Slicing

Slicing a sound file musically allows various possible exploitations in the realm of CBCS. In MuBu, onset detection is done with pipo.onseg or pipo.gate. FluCoMa exposes five different onset detection algorithms:

1. fluid.ampslice: Amplitude-based detrending slicer
2. fluid.ampgate: Gate detection on a signal
3. fluid.onsetslice: Spectral difference-based audio slicer
4. fluid.noveltyslice: Based on a self-similarity matrix (SSM)
5. fluid.transientslice: Implements a de-clicking algorithm

Only onsetslice was extensively tested. The only tweaked parameters were a straightforward "threshold" as well as a "minslicelength" argument, determining the shortest slice allowed (or minimum duration of a slice), expressed in hopSize units. This introduces a common limitation in CBCS: the system strongly biases the user towards choosing short samples, for better analysis results and more interactivity when controlling the database with a gesture follower. Aaron Einbond remarks, regarding the use of CataRT, how short samples most suited his intention: "Short samples containing rapid, dry attacks, such as close-miked key-clicks, were especially suitable for a convincing impression of motion of the single WFS source. The effect is that of a virtual instrument moving through the concert hall in tandem with changes in its timbral content, realizing Wessel's initial proposal." [23]
A related limitation of concatenative synthesis lies in the fact that short samples will demonstrate the efficiency of the algorithm 7, but at the same time move away from the "plausible simulation" sought in the present study. A balance therefore must be found between the freedom imposed by large samples and the refined control one can obtain with short samples.
A direct concatenation of slices clicks in most cases at the edit point, which can be avoided through the use of ramps. The second most noticeable glitch on concatenation concerns the interruption of low-register resonances, which even a large reverb fails to make sound plausible. Having a low threshold and a large "minslicelength" results in equidistant slices, all of identical duration, as the pipo.onseg object in MuBu would produce. Because we listen to sound in time, this parameter, responsible for the duration of samples, is of prior importance.

7 e.g. https://youtu.be/LD0ivjyuqMA?t=3032
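To make the role of these two parameters concrete, here is a deliberately naive amplitude-based slicer. fluid.onsetslice actually works on spectral difference and expresses the minimum length in hops, so this sketch only illustrates how a threshold and a minimum slice length shape the segmentation, not the FluCoMa algorithm itself.

```javascript
// Naive illustration of "threshold" and "minslicelength" (here in samples).
// `samples` is a mono Float32Array; the function returns onset positions.
function sliceByAmplitude(samples, { threshold = 0.1, minSliceLength = 4410, hopSize = 64 } = {}) {
  const onsets = [0];
  let lastOnset = 0;
  for (let i = hopSize; i < samples.length; i += hopSize) {
    // Very crude envelope: mean absolute value over one hop, compared to the previous hop.
    let frame = 0, prev = 0;
    for (let j = 0; j < hopSize; j++) {
      frame += Math.abs(samples[i + j] || 0);
      prev += Math.abs(samples[i - hopSize + j] || 0);
    }
    const rise = (frame - prev) / hopSize;
    // A new slice starts when the envelope rises above the threshold,
    // but never closer than minSliceLength samples to the previous onset.
    if (rise > threshold && i - lastOnset >= minSliceLength) {
      onsets.push(i);
      lastOnset = i;
    }
  }
  return onsets;
}
```

Raising the threshold yields fewer, longer slices; lowering it while enforcing a large minimum length tends towards the equidistant, fixed-duration slices mentioned above.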
5.2 MFCC on each slice - across one whole slice/segment

Multidimensional MFCC analysis: MFCC (Mel-Frequency Cepstral Coefficient) analysis is a technique used to extract features from audio signals that are relevant for speech and music recognition. It involves calculating a set of coefficients that represent the spectral envelope of the audio signal, or, in other words, decomposing a sound signal into a set of frequency bands and representing the power spectrum of each band with a set of coefficients. The resulting MFCC coefficients capture important spectral characteristics of the sound signal (albeit characteristics hardly interpretable by the novice user), such as the frequency and magnitude of the spectral peaks. We will see that, combined with umap, this analysis is able to capture the spectral characteristics of the musician's playing style.

5.3 Statistical Analysis Over Each Slice

BufStats is used to calculate statistical measures on the data stored in a buffer channel. BufStats calculates seven statistics on the data in the buffer channel: mean, standard deviation, skewness, kurtosis, and low, middle, and high values. These statistics provide information about the central tendency of the data and how it is distributed around that tendency. In addition to calculating statistics on the original buffer channel, BufStats can also calculate statistics on up to two derivatives of the original data, apply weights to the data using a weights buffer, and identify and remove outlier frames. These statistical measures can be useful for comparing different time-series data, even if they have different lengths, and may provide better distinction between data points when used in training or analysis. The output of BufStats is a buffer with the same number of channels as the original data, with each channel containing the statistics for its corresponding data in the original buffer.

5.4 Normalization

The FluCoMa package proposes several scaling or preprocessing tools, amongst which normalization and standardization were used. Standardization and normalization are techniques used to transform variables so that they can be compared or combined in statistical analyses. Both techniques make data more comparable, but they work in slightly different ways. Standardization scales a variable to have a mean of 0 and a standard deviation of 1, while normalization scales a variable to have a minimum value of 0 and a maximum value of 1. Normalization scaling was found easier to use, both in 2-D (in FluCoMa, the fluid.plotter object) and in the VR 3D world, in which the origin corresponds to a corner of the world. The fluid.normalize object features a "@max" attribute (1 by default), which then maps directly to the dimensions of the VR world.
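The following sketch restates the two steps just described (Sections 5.3 and 5.4) on plain arrays: each slice's MFCC time series is reduced to the seven summary statistics computed by fluid.bufstats, and each resulting column is then min-max scaled in the spirit of fluid.normalize, with a max argument standing in for the "@max" attribute. Derivatives, weights and outlier removal are left out; this is a schematic reading of those objects, not their implementation.

```javascript
// Reduce one slice's MFCC values to seven statistics (mean, std, skewness,
// kurtosis, low, middle, high), as fluid.bufstats does per channel.
function sliceStats(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const central = p => values.reduce((a, b) => a + (b - mean) ** p, 0) / n;
  const std = Math.sqrt(central(2));
  const sorted = [...values].sort((a, b) => a - b);
  return [
    mean,
    std,
    std > 0 ? central(3) / std ** 3 : 0, // skewness
    std > 0 ? central(4) / std ** 4 : 0, // kurtosis
    sorted[0],                           // low
    sorted[Math.floor(n / 2)],           // middle
    sorted[n - 1],                       // high
  ];
}

// Min-max scaling of one column; `max` plays the role of the "@max" attribute
// (e.g. 10 for a 10-unit-wide VR room whose origin is one of its corners).
function normalizeColumn(column, max = 1) {
  const lo = Math.min(...column);
  const hi = Math.max(...column);
  const range = hi - lo || 1; // guard against a constant column
  return column.map(v => ((v - lo) / range) * max);
}

// slicesMfcc: one array of MFCC values per slice (a single channel, for brevity).
function describeCorpus(slicesMfcc, max = 1) {
  const stats = slicesMfcc.map(sliceStats); // one 7-value row per slice
  const columns = stats[0].map((_, c) => normalizeColumn(stats.map(row => row[c]), max));
  return stats.map((_, r) => columns.map(col => col[r])); // back to per-slice rows
}
```

The point of this step is that a variable-length time series per slice becomes a fixed-size feature vector, so that slices of different durations can be compared.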
5.5 Dimensionality Reduction - UMAP

Dimensionality reduction is a technique used in machine learning to reduce the number of features (dimensions) in a dataset. The goal of dimensionality reduction is to simplify the data without losing too much information. Various dimensionality reduction algorithms are presented in an early FluCoMa study [24], with, interestingly, no mention of UMAP, later favoured.
UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique based on the principles of topological data analysis. It can be used to visualize high-dimensional data in a lower-dimensional space. When applied to sound data analysed with MFCC (Mel-Frequency Cepstral Coefficients), UMAP reduces the dimensionality of the data and creates a visual representation of the sound in a 2- or 3-dimensional space. By applying UMAP to the MFCC coefficients of a sound signal, it is possible to create a visual representation of the sound that preserves the relationships between the different MFCC coefficients (see Fig. 5).

Figure 5. Dimensionality reduction of MFCCs helps revealing spectral similarities. UMAP outputs coordinates in 2d or 3d.

UMAP is therefore used for its clustering abilities in the first place, helping for classification purposes. It helps identify patterns or trends that may not be evident from the raw data. This can be useful for tasks such as exploring the structure of a sound dataset, identifying patterns or trends in the data, and comparing different sounds.
Most importantly, the non-linear dimensions proposed by UMAP (whether in 2d in Max or in 3 dimensions in PatchXR, and when compared to linear analyses in which, for instance, x, y and z correspond to pitch, loudness and centroid) gave far more "intelligent" clustering than more conventional, parameter-consistent types of representations.

5.6 Neighbourhood queries

The neighbourhood retrieval function is based, in FluCoMa, on k-d trees and the knn algorithm. In MuBu, the mubu.knn object serves similar tasks. The ml.kdtree object in the ml.star library [25] gives comparable results.
K-d trees (short for "k-dimensional trees") and k-nearest neighbours (k-NN) are two algorithms that are related to each other but serve different purposes: a k-d tree is a data structure used to store and efficiently query a set of points in a k-dimensional space, while the k-NN algorithm is a machine learning algorithm used for classification or regression. Both are often used in applications such as pattern recognition, image classification, and data mining.
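A brute-force version of such a neighbourhood query, combined with the "near neighbours sequencing" of step 5 in Section 5, can be sketched as follows. fluid.kdtree and mubu.knn answer the same kind of query far more efficiently through a k-d tree, and the random choice among neighbours is only one possible sequencing rule.

```javascript
// `points` holds the 3d coordinates produced by UMAP, one entry per slice.
function kNearest(points, queryIndex, k) {
  const q = points[queryIndex];
  return points
    .map((p, index) => ({ index, d: Math.hypot(p.x - q.x, p.y - q.y, p.z - q.z) }))
    .filter(({ index }) => index !== queryIndex) // exclude the query point itself
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map(({ index }) => index);
}

// Starting from a random slice, repeatedly move to one of its k nearest
// neighbours: the resulting index sequence stays timbrally consistent with
// the slice that has just been played, while never repeating a fixed path.
function improviseSequence(points, steps = 16, k = 8) {
  let current = Math.floor(Math.random() * points.length);
  const sequence = [current];
  for (let s = 1; s < steps; s++) {
    const neighbours = kNearest(points, current, k);
    if (neighbours.length === 0) break;
    current = neighbours[Math.floor(Math.random() * neighbours.length)];
    sequence.push(current);
  }
  return sequence; // slice indices to be played back one after the other
}
```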
6. WORKFLOW IN PATCHXR

I have most often used FluCoMa and PatchXR to generate monophonic instruments (one performer plays one instrument at a time), most typically in experiences where the players face one another 8. In the case of "button worlds" such as this one or those described in Section 6.1, there is no need for nearest neighbour retrieval, since the performer clicks exactly on the data point, and he (mediated by his avatar) reproduces what knn would do with an automated instrument: he will privilege in his choices the samples he can reach at hand, rather than constantly jumping large distances between items (see Fig. 1).
In the worlds developed in Sections 6.2 and 6.3, on the other hand, data points are not explicitly represented and some near neighbour strategies need to be implemented.
PatchXR exposes a wide range of blocks (a block corresponds to an object in Max or Pure Data), making it simple to access gesture data such as:

• The position/distance between hands/controllers and a reference.
• The rotation angles (x y z) of both hands' controllers.
• 2-d touchscreen-like controllers, where the user moves the xy position of a selector across a plane by manually grabbing it.
• 2-d laser-like controllers, where the user moves the xy position of a selector remotely, as if using a laser pointer towards a remote screen or board.
• 2-d pads, which give access to the velocity at which the pad is hit.
• 3-d slider or theremin-like controllers, where the user moves the xyz position of a selector within a volume by manually grabbing it.
• A block called "interaction box", similar to the 3-d slider, with the main difference that the user does not grab the selector, but instead comes in and out of the interactive zone.
• 1-d sliders, knobs, buttons...

One of the current challenges consists in diversifying the ways in which the corpus is queried.

8 https://youtu.be/WhuqOOuzzBw
One-to-one mappings of UMAP results such as those of Section 6.1 use buttons facing each other, in order to prompt the players to face each other 9. Playing alone while controlling many instruments at the same time (the one-man orchestra) encourages a higher-level type of control over automata, i.e. implementing the simple ability to concatenate automatically: play the next sample as soon as the previous one has stopped (see Section 6.2).
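A minimal sketch of this automatic concatenation is given below, assuming a playSlice() function and a nextIndex() selection rule provided elsewhere (for instance the near-neighbour choice of Section 5); it simply schedules each new slice at the moment the previous one ends.

```javascript
// Play slices back to back: as soon as one slice has finished, the next one
// (chosen by nextIndex) is started. playSlice() and nextIndex() are
// placeholders for the PatchXR or Pure Data mechanisms actually used.
function autoConcatenate(segments, playSlice, nextIndex, startIndex = 0) {
  let current = startIndex;
  const step = () => {
    playSlice(current);
    const durationMs = segments[current].duration * 1000;
    current = nextIndex(current);
    setTimeout(step, durationMs); // schedule the next slice at the end of this one
  };
  step();
}
```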

6.1 One to one mapping between data points and buttons
The first javascript routine developed was designed to simply map a data point in the sonic space to a button in the virtual 3d space. By iterating over an array, the routine generates a world in which each button's coordinates are dictated by the FluCoMa umap analysis described in Section 5.5. Albeit simpler than the method exposed later, this one-to-one mapping has several advantages, most importantly the haptic and visual feedback the performer gets when hitting each button.

6.2 Max/pd dependence


The second stage investigated the possibility of rendering the sounds on an array of Raspberry Pi computers [26]. While this method shows advantages in terms of patching (because of the convenience of using Max and Pure Data), the main drawback is that patches designed in this way cannot be accessed by the PatchXR community. Documentation, similarly, is harder to record, since the sound is produced outside of PatchXR. The haptic and visual feedback is very different here, in the sense that the user primarily controls the region of the space to be played, and when to start and stop playing (when his hand touches the interface or not). The way he plays is less rhythmical than in the button interface, where the player "hits" 10 each sample (Section 6.1). Here, on the contrary, the automaton simply keeps playing as long as he touches the interface 11.
FluCoMa exposes the fluid.kdtree object, which is able to find the k nearest neighbours once it is given as input the coordinates of each data point (see Fig. 6). This method proved more suitable to control automata in which the player selects a region of a 2d plane together with the number of neighbours he wants the automaton to improvise with.

Figure 6. The fluid.kdtree object is used here to retrieve the 8 nearest neighbours of a point in a 2-d space.

Most satisfying results were achieved by sending messages to each Raspberry Pi independently, according to its specific (static) IP address, with the simple syntax of a 2-integer list corresponding to: 1/ which buffer to look up, 2/ which slice in this buffer to play; each Pi/speaker is thus able to play each sound in a Pure Data patch, in which each slice looks up an array with the corresponding slice points.
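The sketch below shows what such a 2-integer message could look like when sent from Node.js over UDP. The plain-text "buffer slice;" format assumes a Pure Data netreceive object listening on each Pi, and the IP addresses and port are invented for the example, since the exact transport is not specified here.

```javascript
// Send "which buffer / which slice" messages to Pure Data samplers running on
// Raspberry Pis at static IP addresses (addresses and port are hypothetical).
const dgram = require('dgram');
const socket = dgram.createSocket('udp4');

const PI_ADDRESSES = ['192.168.1.101', '192.168.1.102']; // hypothetical static IPs
const PD_PORT = 3000; // hypothetical port of the Pure Data sampler

// bufferIndex: which buffer to look up; sliceIndex: which slice of it to play.
function playSlice(piIndex, bufferIndex, sliceIndex) {
  const message = Buffer.from(`${bufferIndex} ${sliceIndex};\n`);
  socket.send(message, PD_PORT, PI_ADDRESSES[piIndex], err => {
    if (err) console.error(err);
  });
}

playSlice(0, 2, 117); // ask the first Pi to play slice 117 of buffer 2
```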

6.3 Nearest neighbour in PatchXR

The currently most frequently used program is a javascript routine that generates a world (a .patch file) in which the x y z coordinates of each data point are stored in a "knobboard" block (the long rectangle in Fig. 7). To measure the distance in 3D using Thales' theorem (or the so-called distance formula), we need to find the distance between two points in three-dimensional space, i.e. find the diameter of the sphere that passes through both points:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}

Fig. 8 shows the corresponding implementation in PatchXR. This fragment of visual program (called "abstraction" or "subpatch" in Max, and "group" in PatchXR) is then used in a more complex patch which iterates (100 times per second, for instance) over an array (the rectangular "knobboard" object in Fig. 7), so as to output the index (the value 234 in the figure) of the point situated closest to the controller.

Figure 7. The coordinates of the player's controller (here 0.01, 0.06, 0.13) yield as a result index 234 as nearest neighbour (read from left to right).

Figure 8. The implementation of Thales' theorem in PatchXR (read from right to left).

9 https://youtu.be/
10 The opening of this video aptly conveys how the energy transfers from the player's gesture: https://www.youtube.com/watch?v=glFdzbAJVRU
11 https://youtu.be/vtob96F9cQw
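For reference, the logic of this group, iterating over all stored coordinates and returning the index of the closest one, can be written out in a few lines of javascript; this is only a textual restatement of the patch shown in Fig. 7 and Fig. 8, not the code actually running in PatchXR.

```javascript
// Return the index of the stored point closest to the controller, using the
// 3d distance formula given above.
function closestPointIndex(points, controller) {
  let best = 0;
  let bestDistance = Infinity;
  points.forEach((p, i) => {
    const d = Math.sqrt(
      (p.x - controller.x) ** 2 + (p.y - controller.y) ** 2 + (p.z - controller.z) ** 2
    );
    if (d < bestDistance) {
      bestDistance = d;
      best = i;
    }
  });
  return best; // e.g. 234 for the controller position shown in Fig. 7
}
```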
7. NETWORKED MUSIC PERFORMANCE

The concept of online virtual collaboration has gained significant attention, notably through Meta's promotion in 2021. The application of this concept within the realm of "tele-improvisations" [27], most commonly referred to as Networked Music Performance (NMP), holds the potential to overcome what was hitherto viewed as an intrinsic limitation of the field, both from a practitioner's and from an audience's point of view.
NMPs have indeed stimulated considerable research and experimentation whilst facing resistance at the same time. In his article "Not Being There", Miller Puckette argues: "Having seen a lot of networked performances of one sort or another, I find myself most excited by the potential of networked 'telepresence' as an aid to rehearsal, not performance." [28]. In Embodiment and Disembodiment in NMP [29], Georg Hajdu identifies an issue with a lack of readability from the audience's perspective, who cannot perceive the gesture-to-sound relationship as would be the case in a normal concert situation: "These performances take machine–performer–spectator interactions into consideration, which, to a great deal, rely on embodied cognition and the sense of causality [...]. Classical cause-and-effect relationships (which also permeate the 'genuine' musical sign of the index) are replaced by plausibility, that is the amount to which performers and spectators are capable of 'buying' the outcome of a performance by building mental maps of the interaction." Performances of laptop orchestras, along with various other experiments using technology collaboratively, whether in local or distributed settings, have reported similar concerns, most commonly expressing a lack of embodiment in the performance. Although still in its infancy, a first live performance staged/composed by the author in Paris, with participants distributed across Europe, showed a promising potential for tackling these issues, most importantly through its attempt at dramatising the use of avatars.

• JIM 23: https://youtu.be/npyfwqN02qE

A lot remains to be improved in order to give the audience an experience aesthetically comparable to that of the concert hall. Carefully orchestrated movements of cameras around the avatars, and a faithful translation of the headset experience of spatial audio with immersive visuals, need further exploration, but would go beyond the scope of the present article.

8. CONCLUSIONS

We have proposed a workflow for corpus-based concatenative synthesis (CBCS) in multiplayer VR (or metaverse), arguing that machine learning tools for data visualisation offer revealing and exploitable information about the timbral quality of the material being analysed. In a wider sense, the present approach can be understood as a reflexive practice on new media, according to which the notion of database may be considered an art form [30].
The discussed tools for "machine listening" (FluCoMa, MuBu) help build intelligent instruments with relatively small amounts of data; the duration of samples appears crucial in CBCS. A balance must be found between 1/ short-duration samples, which are easier to process and categorise, and 2/ long samples, which sound more natural in the context of instrument-based simulations.

Acknowledgments

I am grateful for the support of UCA/CTEL, whose artist residency research program has allowed these experiments to take place, and for the support of PRISM-CNRS.

9. REFERENCES

[1] Andersson, "Immersive audio programming in a virtual reality sandbox," Journal of the Audio Engineering Society, March 2019.

[2] L. Turchet, N. Garau, and N. Conci, "Networked Musical XR: where's the limit? A preliminary investigation on the joint use of point clouds and low-latency audio communication," in Proceedings of the 17th International Audio Mostly Conference, ser. AM '22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 226–230. [Online]. Available: https://doi.org/10.1145/3561212.3561237

[3] L. Turchet, "Musical Metaverse: vision, opportunities, and challenges," Personal and Ubiquitous Computing, 01 2023.

[4] F. Berthaut, "3D interaction techniques for musical expression," Journal of New Music Research, vol. 49, no. 1, pp. 60–72, 2020.
[5] B. Loveridge, "Networked music performance in virtual reality: current perspectives," Journal of Network Music and Arts, vol. 2, no. 1, p. 2, 2020.

[6] A. Çamcı and R. Hamilton, "Audio-first VR: new perspectives on musical experiences in virtual environments," Journal of New Music Research, vol. 49, no. 1, pp. 1–7, 2020.

[7] D. Schwarz, G. Beller, B. Verbrugghe, and S. Britton, "Real-Time Corpus-Based Concatenative Synthesis with CataRT," in 9th International Conference on Digital Audio Effects (DAFx), Montreal, Canada, Sep. 2006, pp. 279–282. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161358

[8] J.-P. Briot, G. Hadjeres, and F.-D. Pachet, Deep Learning Techniques for Music Generation – A Survey, Aug. 2019. [Online]. Available: https://hal.sorbonne-universite.fr/hal-01660772

[9] P. Esling, A. Chemla-Romeu-Santos, and A. Bitton, "Generative timbre spaces with variational audio synthesis," CoRR, vol. abs/1805.08501, 2018. [Online]. Available: http://arxiv.org/abs/1805.08501

[10] D. L. Wessel, "Timbre Space as a Musical Control Structure," Computer Music Journal, vol. 3, no. 2, pp. 45–52, 1979. [Online]. Available: http://www.jstor.org/stable/3680283

[11] K. Fitz, M. Burk, and M. McKinney, "Multidimensional perceptual scaling of musical timbre by hearing-impaired listeners," The Journal of the Acoustical Society of America, vol. 125, p. 2633, 05 2009.

[12] J.-C. Risset and D. Wessel, "Exploration of timbre by analysis and synthesis," Psychology of Music, pp. 113–169, 1999.

[13] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff, "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychological Research, vol. 58, pp. 177–192, 02 1995.

[14] A. Caclin, S. McAdams, B. Smith, and S. Winsberg, "Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones," The Journal of the Acoustical Society of America, vol. 118, pp. 471–482, 08 2005.

[15] D. Schwarz, "The Sound Space as Musical Instrument: Playing Corpus-Based Concatenative Synthesis," in New Interfaces for Musical Expression (NIME), Ann Arbor, United States, May 2012, pp. 250–253. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161442

[16] L. Garber, T. Ciccola, and J. C. Amusategui, "AudioStellar, an open source corpus-based musical instrument for latent sound structure discovery and sonic experimentation," 12 2020.

[17] B. Hackbarth, N. Schnell, P. Esling, and D. Schwarz, "Composing Morphology: Concatenative Synthesis as an Intuitive Medium for Prescribing Sound in Time," Contemporary Music Review, vol. 32, no. 1, pp. 49–59, 2013. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01577895

[18] N. Schnell, A. Roebel, D. Schwarz, G. Peeters, and R. Borghesi, "MuBu and friends - Assembling tools for content-based real-time interactive audio processing in Max/MSP," in Proceedings of the International Computer Music Conference (ICMC 2009), 01 2009.

[19] P. A. Tremblay, G. Roma, and O. Green, "Enabling Programmatic Data Mining as Musicking: The Fluid Corpus Manipulation Toolkit," Computer Music Journal, vol. 45, no. 2, pp. 9–23, 06 2021. [Online]. Available: https://doi.org/10.1162/comj_a_00600

[20] F. Bevilacqua and R. Müller, "A Gesture Follower for performing arts," 05 2005.

[21] N. Schnell, D. Schwarz, J. Larralde, and R. Borghesi, "PiPo, a Plugin Interface for Afferent Data Stream Processing Operators," in International Society for Music Information Retrieval Conference, 2017.

[22] R. Fiebrink and P. Cook, "The Wekinator: A System for Real-time, Interactive Machine Learning in Music," in Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010), 01 2010.

[23] A. Einbond and D. Schwarz, "Spatializing Timbre With Corpus-Based Concatenative Synthesis," 06 2010.

[24] G. Roma, O. Green, and P. A. Tremblay, "Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces," in New Interfaces for Musical Expression, 2019.

[25] B. D. Smith and G. E. Garnett, "Unsupervised Play: Machine Learning Toolkit for Max," in New Interfaces for Musical Expression, 2012.

[26] PrÉ: connected polyphonic immersion. Zenodo, Jul. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6806324

[27] R. Mills, "Tele-Improvisation: Intercultural Interaction in the Online Global Music Jam Session," in Springer Series on Cultural Computing, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:57428481

[28] M. Puckette, "Not Being There," Contemporary Music Review, vol. 28, no. 4-5, pp. 409–412, 2009. [Online]. Available: https://doi.org/10.1080/07494460903422354

[29] G. Hajdu, "Embodiment and disembodiment in networked music performance," 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:149523160

[30] L. Manovich, "Database as Symbolic Form," Convergence: The International Journal of Research into New Media Technologies, vol. 5, pp. 80–99, 1999.
