ICMC 2023 Template
Jonathan Bell
Aix Marseille Univ, CNRS, PRISM, Marseille, France
[email protected]
ABSTRACT
1. fluid.ampslice: Amplitude-based detrending slicer
2. fluid.ampgate: Gate detection on a signal
3. fluid.onsetslice: Spectral difference-based audio buffer slicer
4. fluid.noveltyslice: Based on a self-similarity matrix (SSM)
5. fluid.transientslice: Implements a de-clicking algorithm

Only onsetslice was tested extensively. The only parameters tweaked were a straightforward “threshold”, as well as a “minslicelength” argument determining the shortest slice allowed (the minimum duration of a slice), expressed in hopSize units.
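For illustration, the sketch below (plain Python, not FluCoMa code) shows the role these two parameters play in any threshold-based slicer: onsets are detected where an analysis envelope crosses the threshold upward, and candidates arriving sooner than the minimum slice length are discarded. The envelope here is a random placeholder standing in for a spectral-difference function.

```python
import numpy as np

def onset_slices(env, threshold=0.3, min_slice_len=10):
    """Naive slicer: cut where the analysis envelope crosses `threshold`
    upward, discarding candidates closer than `min_slice_len` frames to
    the previous onset (the role of "minslicelength")."""
    onsets = [0]
    for i in range(1, len(env)):
        crossed = env[i - 1] < threshold <= env[i]
        if crossed and i - onsets[-1] >= min_slice_len:
            onsets.append(i)
    return onsets

env = np.abs(np.random.randn(1000))  # placeholder onset/novelty envelope
print(onset_slices(env, threshold=1.5, min_slice_len=20))
```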
This introduces a common limitation of CBCS: the system strongly biases the user towards choosing short samples, for better analysis results and more interactivity when controlling the database with a gesture follower. Aaron Einbond remarks, about the use of CataRT, how short samples best suited his intention: “Short samples containing rapid, dry attacks, such as close-miked key-clicks, were especially suitable for a convincing impression of motion of the single WFS source. The effect is that of a virtual instrument moving through the concert hall in tandem with changes in its timbral content, realizing Wessel’s initial proposal.” [23]

A related limitation of concatenative synthesis lies in the fact that short samples demonstrate the efficiency of the algorithm 7, but at the same time move away from the “plausible simulation” sought in the present study. A balance must therefore be found between the freedom afforded by long samples and the refined control one can obtain with short ones.

A direct concatenation of slices clicks, in most cases, at the edit point, which can be avoided through the use of ramps. The second most noticeable glitch on concatenation concerns the interruption of low-register resonances,

7 e.g. https://youtu.be/LD0ivjyuqMA?t=3032

5.3 Statistical Analysis Over Each Slice

BufStats is used to calculate statistical measures on data stored in a buffer channel. BufStats calculates seven statistics on the data in the buffer channel: mean, standard deviation, skewness, kurtosis, and low, middle, and high values. These statistics provide information about the central tendency of the data and how it is distributed around that tendency. In addition to calculating statistics on the original buffer channel, BufStats can also calculate statistics on up to two derivatives of the original data, apply weights to the data using a weights buffer, and identify and remove outlier frames. These statistical measures can be useful for comparing different time series, even if they have different lengths, and may provide better distinction between data points when used in training or analysis. The output of BufStats is a buffer with the same number of channels as the original data, with each channel containing the statistics for its corresponding data in the original buffer.
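The following Python sketch is an illustrative re-implementation of these statistics, not FluCoMa’s code; “low, middle, and high” are taken here as minimum, median, and maximum, and the derivative statistics are computed with a simple finite difference.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def buffer_stats(channel, n_derivs=2):
    """Seven summary statistics per channel, in the spirit of BufStats:
    mean, standard deviation, skewness, kurtosis, low, middle, high
    (here: min, median, max), repeated on up to two derivatives."""
    out = {}
    data = np.asarray(channel, dtype=float)
    for d in range(n_derivs + 1):
        series = np.diff(data, n=d)  # n=0 returns the data unchanged
        out[f"deriv{d}"] = [series.mean(), series.std(),
                            skew(series), kurtosis(series),
                            series.min(), np.median(series), series.max()]
    return out

# Summarise e.g. one descriptor track of a slice:
print(buffer_stats(np.random.randn(512)))
```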
5.4 Normalization

The FluCoMa package offers several scaling or preprocessing tools, among which normalization and standardization were used. Standardization and normalization are techniques used to transform variables so that they can be compared or combined in statistical analyses. Both techniques make data more comparable, but they work in slightly different ways.

Standardization scales a variable to have a mean of 0 and a standard deviation of 1, while normalization scales a variable to have a minimum value of 0 and a maximum value of 1. Normalization scaling was found easier to use both in 2-D (in FluCoMa, the fluid.plotter object) and in the 3-D VR world, in which the origin corresponds to a corner of the world. The fluid.normalize object features a “@max” attribute (1 by default), which then maps directly to the dimensions of the VR world.
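Both scalings reduce to one line each. In the Python sketch below (illustrative, not FluCoMa code), the out_max argument plays the role of the “@max” attribute, scaling the normalized output to the dimensions of the VR world.

```python
import numpy as np

def standardize(x):
    # mean 0, standard deviation 1, per column
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalize(x, out_max=1.0):
    # rescale each column to [0, out_max]; out_max mirrors the @max
    # attribute, e.g. the side length of the VR world
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo) * out_max

points = np.random.randn(200, 3)          # e.g. a 3-D embedding
world = normalize(points, out_max=10.0)   # fills a 10-unit-wide world
```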
5.5 Dimensionality Reduction - UMAP

Dimensionality reduction is a technique used in machine learning to reduce the number of features (dimensions) in a dataset. The goal of dimensionality reduction is to simplify the data without losing too much information. Various dimensionality reduction algorithms are presented in an early FluCoMa study [24], with, interestingly, no mention of UMAP, which was later favoured.

UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique based on the principles of topological data analysis. It can be used to visualize high-dimensional data in a lower-dimensional space. When applied to sound data analysed with MFCCs (Mel-Frequency Cepstral Coefficients), UMAP reduces the dimensionality of the data and creates a visual representation of the sound in a 2- or 3-dimensional space.

By applying UMAP to the MFCC coefficients of a sound signal, it is possible to create a visual representation of the sound that preserves the relationships between the different MFCC coefficients (see Fig. 5).
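Outside Max, the same pipeline can be sketched in a few lines of Python with librosa and umap-learn; the file name and UMAP parameters below are placeholders rather than the settings used in this study.

```python
import librosa
import umap  # pip install umap-learn

# Describe each analysis frame of a (hypothetical) corpus slice by 13 MFCCs.
y, sr = librosa.load("slice.wav", sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x 13

# Project the 13-D timbre descriptors down to 3-D for display in a VR world.
reducer = umap.UMAP(n_components=3, n_neighbors=15, min_dist=0.1)
embedding = reducer.fit_transform(mfcc)                # frames x 3
```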
5.6 Neighbourhood queries

In FluCoMa, the neighbourhood retrieval function is based on k-d trees and the k-NN algorithm. In MuBu, the mubu.knn object serves similar tasks. The ml.kdtree object in the ml.star library [25] gives comparable results.

K-d trees (short for “k-dimensional trees”) and k-nearest neighbours (k-NN) are related but serve different purposes: a k-d tree is a data structure used to store and efficiently query a set of points in a k-dimensional space, while k-NN is a machine learning algorithm used for classification or regression. Both are often used in applications such as pattern recognition, image classification, and data mining.
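This division of labour is easy to demonstrate with SciPy (an illustration of the technique, not of FluCoMa’s internals): the k-d tree indexes the corpus once, after which each nearest-neighbour query is cheap.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 3)   # one 3-D point per corpus slice
tree = cKDTree(points)             # build the k-d tree once

controller = np.array([0.2, 0.5, 0.8])      # current query position
dists, idxs = tree.query(controller, k=5)   # indices of the 5 nearest slices
```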
6. WORKFLOW IN PATCHXR

I have most often used FluCoMa and PatchXR to generate monophonic instruments (one performer plays one instrument at a time), most typically in experiences where the players face one another 8. In the case of “button worlds” such as this one or those described in section 6.1, there is no need for nearest-neighbour retrieval, since the performer clicks exactly on the data point and, mediated by his avatar, reproduces what knn would do with an automated instrument: he will privilege in his choices the samples within reach, rather than constantly jumping large distances between items (see Fig. 1).

In the worlds developed in Sections 6.2 and 6.3, on the other hand, data points are not explicitly represented, and some nearest-neighbour strategies need to be implemented. PatchXR exposes a wide range of blocks (a block corresponds to an object in Max or Pure Data), making it simple to access gesture data such as:
6.3 Nearest neighbour in PatchXR

The currently most frequently used program is a javascript routine that generates a world (a .patch file) in which the x y z coordinates of each data point are stored in a “knobboard” 9. Fig. 8 shows the corresponding implementation in PatchXR. This fragment of visual program (called an “abstraction” or “subpatch” in Max, and a “group” in PatchXR) is then used in a more complex patch which iterates (100 times per second, for instance) over an array (the rectangular “knobboard” object in Fig. 7), so as to output the index (the value 234 in the figure) of the point situated closest to the controller.

Figure 8. The implementation of Thales’ theorem in PatchXR (read from right to left).

9 https://youtu.be/
10 The opening of this video aptly conveys how the energy transfers from the player’s gesture: https://www.youtube.com/watch?v=glFdzbAJVRU
11 https://youtu.be/vtob96F9cQw
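What the patch computes on every tick can be summarised in a few lines of Python (the Euclidean distance and array shapes are assumptions made for the sake of the example):

```python
import numpy as np

def closest_index(knobboard, controller):
    """Index of the stored point nearest to the controller, re-evaluated
    on every tick (e.g. 100 times per second) as the patch iterates
    over the knobboard array."""
    dists = np.linalg.norm(knobboard - controller, axis=1)
    return int(np.argmin(dists))

knobboard = np.random.rand(500, 3)   # x y z of each data point
print(closest_index(knobboard, np.array([0.1, 0.9, 0.4])))
```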
7. NETWORKED MUSIC PERFORMANCE

The concept of online virtual collaboration has gained significant attention, notably through Meta’s promotion of it in 2021. The application of this concept within the realm of “tele-improvisations” [27], most commonly referred to as Networked Music Performance (NMP), holds the potential to overcome what was hitherto viewed as an intrinsic limitation of the field, both from a practitioner’s and from an audience’s point of view.

NMPs have indeed stimulated considerable research and experimentation whilst facing resistance at the same time. In his article “Not Being There”, Miller Puckette writes: “Having seen a lot of networked performances of one sort or another, I find myself most excited by the potential of networked ‘telepresence’ as an aid to rehearsal, not performance.” [28] In Embodiment and Disembodiment in NMP [29], Georg Hajdu identifies a lack of readability from the audience’s perspective, which cannot perceive the gesture-to-sound relationship as it would in a normal concert situation: “These performances take machine–performer–spectator interactions into consideration, which, to a great deal, rely on embodied cognition and the sense of causality [...]. Classical cause-and-effect relationships (which also permeate the ‘genuine’ musical sign of the index) are replaced by plausibility, that is the amount to which performers and spectators are capable of ‘buying’ the outcome of a performance by building mental maps of the interaction.” Performances of laptop orchestras, along with various other experiments using technology collaboratively, whether in local or distributed settings, have reported similar concerns, most commonly expressing a lack of embodiment in the performance. Although still in its infancy, a first live performance staged/composed by the author in Paris, with participants distributed across Europe, showed promising potential for tackling these issues, most importantly through its attempt at dramatising the use of avatars.

8. CONCLUSIONS

We have proposed a workflow for corpus-based concatenative synthesis (CBCS) in multiplayer VR (or the metaverse), arguing that machine learning tools for data visualisation offer revealing and exploitable information about the timbral quality of the material being analysed. In a wider sense, the present approach can be understood as a reflexive practice on new media, according to which the notion of database may be considered an art form [30].

The discussed tools for “machine listening” (FluCoMa, MuBu) help build intelligent instruments with relatively small amounts of data; the duration of samples appears crucial in CBCS. A balance must be found between 1/ short samples, whose analyses are easier to process and categorise, and 2/ long samples, which sound more natural in the context of instrument-based simulations.

Acknowledgments

I am grateful for the support of UCA/CTEL, whose artist residency research program has allowed me to conduct these experiments, and for the support of PRISM-CNRS.

9. REFERENCES

[1] Andersson, “Immersive audio programming in a virtual reality sandbox,” Journal of the Audio Engineering Society, March 2019.

[2] L. Turchet, N. Garau, and N. Conci, “Networked Musical XR: where’s the limit? A preliminary investigation on the joint use of point clouds and low-latency audio communication,” in Proceedings of the 17th International Audio Mostly Conference, ser. AM ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 226–230. [Online]. Available: https://doi.org/10.1145/3561212.3561237

[3] L. Turchet, “Musical Metaverse: vision, opportunities, and challenges,” Personal and Ubiquitous Computing, 01 2023.

[4] F. Berthaut, “3D interaction techniques for musical expression,” Journal of New Music Research, vol. 49, no. 1, pp. 60–72, 2020.
[5] B. Loveridge, “Networked music performance in virtual reality: current perspectives,” Journal of Network Music and Arts, vol. 2, no. 1, p. 2, 2020.

[6] A. Çamcı and R. Hamilton, “Audio-first VR: new perspectives on musical experiences in virtual environments,” Journal of New Music Research, vol. 49, no. 1, pp. 1–7, 2020.

[7] D. Schwarz, G. Beller, B. Verbrugghe, and S. Britton, “Real-Time Corpus-Based Concatenative Synthesis with CataRT,” in 9th International Conference on Digital Audio Effects (DAFx), Montreal, Canada, Sep. 2006, pp. 279–282. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161358

[8] J.-P. Briot, G. Hadjeres, and F.-D. Pachet, Deep Learning Techniques for Music Generation – A Survey, Aug. 2019. [Online]. Available: https://hal.sorbonne-universite.fr/hal-01660772

[9] P. Esling, A. Chemla-Romeu-Santos, and A. Bitton, “Generative timbre spaces with variational audio synthesis,” CoRR, vol. abs/1805.08501, 2018. [Online]. Available: http://arxiv.org/abs/1805.08501

[10] D. L. Wessel, “Timbre Space as a Musical Control Structure,” Computer Music Journal, vol. 3, no. 2, pp. 45–52, 1979. [Online]. Available: http://www.jstor.org/stable/3680283

[11] K. Fitz, M. Burk, and M. McKinney, “Multidimensional perceptual scaling of musical timbre by hearing-impaired listeners,” The Journal of the Acoustical Society of America, vol. 125, p. 2633, 05 2009.

[12] J.-C. Risset and D. Wessel, “Exploration of timbre by analysis and synthesis,” Psychology of Music, pp. 113–169, 1999.

[13] S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff, “Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes,” Psychological Research, vol. 58, pp. 177–192, 02 1995.

[14] A. Caclin, S. McAdams, B. Smith, and S. Winsberg, “Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones,” The Journal of the Acoustical Society of America, vol. 118, pp. 471–482, 08 2005.

[15] D. Schwarz, “The Sound Space as Musical Instrument: Playing Corpus-Based Concatenative Synthesis,” in New Interfaces for Musical Expression (NIME), Ann Arbor, United States, May 2012, pp. 250–253. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01161442

[16] L. Garber, T. Ciccola, and J. C. Amusategui, “AudioStellar, an open source corpus-based musical instrument for latent sound structure discovery and sonic experimentation,” 12 2020.

[17] B. Hackbarth, N. Schnell, P. Esling, and D. Schwarz, “Composing Morphology: Concatenative Synthesis as an Intuitive Medium for Prescribing Sound in Time,” Contemporary Music Review, vol. 32, no. 1, pp. 49–59, 2013. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01577895

[18] N. Schnell, A. Roebel, D. Schwarz, G. Peeters, and R. Borghesi, “MuBu and Friends – Assembling Tools for Content Based Real-Time Interactive Audio Processing in Max/MSP,” in Proceedings of the International Computer Music Conference (ICMC 2009), 01 2009.

[19] P. A. Tremblay, G. Roma, and O. Green, “Enabling Programmatic Data Mining as Musicking: The Fluid Corpus Manipulation Toolkit,” Computer Music Journal, vol. 45, no. 2, pp. 9–23, 06 2021. [Online]. Available: https://doi.org/10.1162/comj_a_00600

[20] F. Bevilacqua and R. Müller, “A Gesture follower for performing arts,” 05 2005.

[21] N. Schnell, D. Schwarz, J. Larralde, and R. Borghesi, “PiPo, a Plugin Interface for Afferent Data Stream Processing Operators,” in International Society for Music Information Retrieval Conference, 2017.

[22] R. Fiebrink and P. Cook, “The Wekinator: A System for Real-time, Interactive Machine Learning in Music,” in Proceedings of the Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010), 01 2010.

[23] A. Einbond and D. Schwarz, “Spatializing Timbre With Corpus-Based Concatenative Synthesis,” 06 2010.

[24] G. Roma, O. Green, and P. A. Tremblay, “Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces,” in New Interfaces for Musical Expression, 2019.

[25] B. D. Smith and G. E. Garnett, “Unsupervised Play: Machine Learning Toolkit for Max,” in New Interfaces for Musical Expression, 2012.

[26] PrÉ: connected polyphonic immersion. Zenodo, Jul. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6806324

[27] R. Mills, “Tele-Improvisation: Intercultural Interaction in the Online Global Music Jam Session,” in Springer Series on Cultural Computing, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:57428481

[28] M. Puckette, “Not Being There,” Contemporary Music Review, vol. 28, no. 4-5, pp. 409–412, 2009. [Online]. Available: https://doi.org/10.1080/07494460903422354

[29] G. Hajdu, “Embodiment and disembodiment in networked music performance,” 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:149523160

[30] L. Manovich, “Database as Symbolic Form,” Convergence: The International Journal of Research into New Media Technologies, vol. 5, pp. 80–99, 1999.