Machine Learning With Max/MSP
Ali Momeni
Birmingham Conservatoire
Birmingham City University
Paradise Place, B3 3HG
[email protected]
ABSTRACT
This paper documents the development of ml.lib: a set of open-source tools designed for employing a wide range of machine
learning techniques within two popular real-time programming
environments, namely Max and Pure Data. ml.lib is a cross-platform, lightweight wrapper around Nick Gillian's Gesture
Recognition Toolkit, a C++ library that includes a wide range
of data processing and machine learning techniques. ml.lib
adapts these techniques for real-time use within popular dataflow IDEs, allowing instrument designers and performers to
integrate robust learning, classification and mapping approaches
within their existing workflows. ml.lib has been carefully designed to allow users to experiment with and incorporate machine learning techniques within an interactive arts context with
minimal prior knowledge. A simple, logical, consistent and
scalable interface has been provided across more than sixteen externals in order to maximize learnability and discoverability. A
focus on portability and maintainability has enabled ml.lib to
support a range of computing architectures, including ARM,
and operating systems such as Mac OS, GNU/Linux and Windows, making it the most comprehensive machine learning
implementation available for Max and Pure Data.
Author Keywords
Machine Learning, Max, Pure Data, Gesture, Classification,
Mapping, Artificial Neural Networks, Support Vector Machines, Regression
ACM Classification
I.2.6 [Artificial Intelligence] Induction, H.5.5 [Information
Interfaces and Presentation] Sound and Music Computing.
1. INTRODUCTION
The term Machine Learning refers to a scientific discipline and
associated range of techniques that explore the construction and study
of algorithms that can learn from data through induction or by
example [21]. Typically, machine learning techniques are "black box"
systems that can deduce appropriate outputs from given inputs based
on a statistical model generated from sufficient and appropriate training data. Supervised machine learning algorithms take a set of labeled
feature vectors (lists of values that describe features of a class), which
are used to train the machine learning algorithm and generate a
model. Once trained, a classification system can output an estimate
for the class of unlabeled feature vectors. In the case of regression
algorithms, a continuous value is given as output rather than a discrete class. Unsupervised machine learning algorithms take a set of
unlabeled feature vectors and partition them into clusters based on
Permission to make digital or hard copies of all or part of this work for personal
or classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice
and the full citation on the first page. To copy otherwise, to republish, to post on
servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME'15, May 31-June 3, 2015, Louisiana State University, Baton Rouge, LA.
Copyright remains with the author(s).
2. BACKGROUND
There is extensive use of machine learning within the domain of
sound and music research. Recent contributions to this field include
several categories of applications: gesture analysis, mapping and
Proceedings of the International Conference on New Interfaces for Musical Expression, Baton Rouge, LA, USA, May 31-June 3, 2015
control of sound synthesis [2, 4, 12, 14, 16, 20, 22, 29], parsing and
segmentation [3], and algorithmic composition [6]. A number of
existing projects implement various machine learning techniques in
Max. As early as 1991, Wessel et al. implemented a real-time artificial neural network for control of synthesis parameters in Max [30].
Later work from CNMAT includes Schmeder's use of support vector
machines applied to pitch predictions [27]. The MnM toolbox from
Bevilacqua et al. implements multidimensional linear mapping
as a basic module for building complex n-to-m mappings; the MnM package of Max externals also includes a principal component analysis
module, which can also be of use in creating complex real-time mappings [1]. It is worth noting that MnM relies on FTM, a shared
library for Max for static and dynamic creation of complex data structures, and therefore requires an additional step for integration into a
user's workflow, a step that may be non-trivial depending on the
complexity of the user's existing patch. Cont et al.'s range of work leading to the development of the sophisticated score-following engine Antescofo [7, 8, 9, 11] is among the most advanced examples
of machine learning at work within the Max environment. Cont et al.
also created a neural network implementation for PD, applied to gesture mapping [10]. Smith and Garnett developed a machine learning
library for Max that implements adaptive resonance theory, self-organizing maps and spatial encoding [28].
A number of externals also exist for the PD environment, the most
coherent and widely used being the ANN library by Davide Morelli¹.
This consists of a multilayer perceptron and a time-delay network,
implemented as a wrapper around the widely used FANN library
[24], and a self-organizing map implementation featuring Instar,
Outstar and Kohonen learning rules. A Genetic Algorithm implementation has been developed by Georg Holzman using the flext framework, and is therefore available for both Max and PD. There also
exists a k-NN (k-nearest neighbor) external, originally developed by
Fujinaga and MacMillan [15] and now maintained by Jamie Bullock.
A plan was proposed to develop an SVM external as part of the 2009
Google Summer of Code², but to the knowledge of the current authors, this was never realized.
3. IMPLEMENTATION
Our design goals in implementing a suite of machine learning externals are as follows:
To provide an exhaustive range of machine learning techniques for Max and PD
3.1. GRT
Given the range of existing machine learning libraries available for
C and C++, and the limited development resources available, we decided that the best approach would be to develop a wrapper
around an existing library rather than start from scratch. An
initial survey was conducted and a range of libraries was considered, including Dlib³, mlpack⁴ and Shark⁵. We also considered
using a collection of C libraries, for example libsvm (for Support
Vector Machines). After considering the pros and cons of each
library, we decided to base ml.lib on the Gesture Recognition
Toolkit by Nick Gillian [17] due to its wide range of implemented
algorithms, simple design, straightforward C++ interface, pre- and
post-processing functions and orientation towards artistic applications, specifically real-time gesture analysis.
3.2. flext
Max and PD both provide C APIs for developing external objects. Whilst the APIs are superficially similar, there are enough
differences that supporting both environments requires a strategy for effective code reuse. One approach
is to use C macros to conditionally include environment-specific code blocks. This may be sufficient for smaller projects, but for larger projects it degrades readability and creates
an unnecessary maintenance burden. An alternative approach is
to use the flext API by Thomas Grill [18], an abstract object-oriented C++ interface that provides a common layer compatible with both Max and PD. flext is a compile-time dependency,
meaning that it places no additional installation burden on the
[Figure: workflow diagram. Labels: offline, map, training vector, algorithm, in-memory model, input vector, output value, read / write, stored data, stored model, live]
¹ http://bit.ly/morelli_ann
² http://bit.ly/pd_svm
³ http://dlib.net/ml.html
⁴ http://www.mlpack.org
⁵ http://image.diku.dk/shark
rated into the codebase, whilst the GPL license forbids closed-source forks that may prevent fixes and enhancements being
contributed back to their upstream sources.
Another strategy used to ensure maintainability was to adhere
strongly to DRY (don't repeat yourself) principles in the development of the wrapper code [19]. This was achieved by developing a number of generic abstract base classes (ml_base, ml,
ml_classification and ml_regression) implementing functionality
common to the majority of wrapped classes in the GRT library.
These classes exploit C++'s runtime polymorphism to call
common child class methods in GRT through a reference to a
base class instance returned by a concrete child. That is:
ml_classification and ml_regression must both implement the pure
virtual method get_MLBase_instance(), and all children of
ml_classification and ml_regression must implement the pure virtual
methods get_Classifier_instance() and get_Regressifier_instance() respectively. This means that all common functionality can be
implemented in ml_base, by calling methods through a reference
to GRT::MLBase, from which the majority of GRT classes derive.
Only algorithm-specific attributes and methods are implemented in children of ml_classification and ml_regression, making the
wrapper code lean and readable and keeping repetition to a
minimum. The current ratio of wrapper code to original sources
is 5k SLOC to 41k SLOC, or approximately 1:8.
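The delegation pattern described above can be sketched in a few lines. What follows is a Python analogue for illustration only, not the ml.lib C++ source: abstract methods stand in for C++ pure virtual methods, and the names BackendModel, NearestLabel, MlBase, MlClassification and MlNearest are hypothetical. The point it tries to show is that shared behaviour (adding examples, training, clearing) is written once in the base wrapper, while a concrete wrapper only has to hand over its model.

```python
from abc import ABC, abstractmethod


class BackendModel:
    """Hypothetical stand-in for the library-side base class (the role GRT::MLBase plays)."""

    def __init__(self):
        self.trained = False

    def train(self, examples):
        self.trained = len(examples) > 0
        return self.trained

    def clear(self):
        self.trained = False


class NearestLabel(BackendModel):
    """Toy classifier: predicts the label of the closest stored example."""

    def train(self, examples):
        self.examples = list(examples)
        return super().train(examples)

    def predict(self, vector):
        def dist2(example):
            return sum((a - b) ** 2 for a, b in zip(example[1], vector))
        return min(self.examples, key=dist2)[0]


class MlBase(ABC):
    """Wrapper base: shared behaviour is written once, against BackendModel only."""

    def __init__(self):
        self.examples = []

    @abstractmethod
    def get_mlbase_instance(self):
        """Return the wrapped model, viewed through its base type."""

    def add(self, label, vector):
        self.examples.append((label, tuple(vector)))

    def train(self):
        return self.get_mlbase_instance().train(self.examples)

    def clear(self):
        self.examples.clear()
        self.get_mlbase_instance().clear()


class MlClassification(MlBase):
    """Adds behaviour common to all classifiers."""

    @abstractmethod
    def get_classifier_instance(self):
        """Return the wrapped model, viewed as a classifier."""

    def get_mlbase_instance(self):
        return self.get_classifier_instance()

    def map(self, vector):
        return self.get_classifier_instance().predict(vector)


class MlNearest(MlClassification):
    """Concrete wrapper: only owns its model and hands it to the base classes."""

    def __init__(self):
        super().__init__()
        self._model = NearestLabel()

    def get_classifier_instance(self):
        return self._model


obj = MlNearest()
obj.add(0, (-1.0, 0.0))
obj.add(1, (1.0, 0.0))
obj.train()
label = obj.map((0.8, 0.1))
```

In this sketch, adding a second algorithm would mean writing another small class like MlNearest; none of the add, train or clear logic would need to be repeated.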
4. LIBRARY DESIGN
From an end-user perspective, we aimed to provide the best
possible experience by maximizing learnability and discoverability within the ml.lib library. This was achieved by establishing a convention of consistent and logical object, message and
attribute naming, and by designing a simple and consistent
workflow across common object groups. In some cases, it was
sufficient to follow the well thought-out patterns established in
GRT, but in others further abstraction was necessary. Furthermore, the aim was not simply to wrap GRT, exposing every
detail of the GRT API, but rather to provide a somewhat abstracted set of objects conforming to the idioms and user expectations of dataflow environments. ml.lib objects follow the
naming convention ml.* where * is an abbreviated form of the
algorithm implemented by the object.
Objects fall into one of six categories:
Pre-processing: pre-process data prior to its use as input to a
classification or regression object
Post-processing: post-process data after it has been output from a
classification or regression object
Feature extraction: extract features from control data. Feature vectors can be used as input to classification or regression
objects
Classification: take feature vectors as input, and output a value
representing the class of the input. For example, an object detecting hand position might output 0 for left, 1 for right, 2 for
top and 3 for bottom.
Regression: perform an M x N mapping between an input vector and an output vector with one or more dimensions. For example, an object may map the x and y dimensions of hand position
to a single dimension representing the distance from the origin (0, 0)
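The classification and regression examples given in the list above can be made concrete with a small sketch. It assumes a k-nearest-neighbour model for both tasks, which is only one of many possible algorithms and is not tied to any particular ml.lib object; the function names and training points are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def nearest(examples, vector, k):
    """Return the k training examples whose feature vectors are closest to `vector`."""
    def dist2(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], vector))
    return sorted(examples, key=dist2)[:k]


def classify(examples, vector, k=3):
    """Classification: majority vote over the labels of the k nearest examples."""
    votes = Counter(label for _, label in nearest(examples, vector, k))
    return votes.most_common(1)[0][0]


def regress(examples, vector, k=3):
    """Regression: mean of the target values of the k nearest examples."""
    targets = [target for _, target in nearest(examples, vector, k)]
    return sum(targets) / len(targets)


# Hand position (x, y) -> class label: 0 = left, 1 = right, 2 = top, 3 = bottom.
positions = [
    ((-1.0, 0.0), 0), ((-0.8, 0.1), 0),
    ((1.0, 0.0), 1), ((0.8, -0.1), 1),
    ((0.0, 1.0), 2), ((0.1, 0.8), 2),
    ((0.0, -1.0), 3), ((-0.1, -0.8), 3),
]
hand_class = classify(positions, (-0.9, 0.05))

# Hand position (x, y) -> distance from the origin, learned from a coarse grid.
grid = [i * 0.5 for i in range(-2, 3)]
distances = [((x, y), math.hypot(x, y)) for x in grid for y in grid]
estimate = regress(distances, (0.4, 0.3))  # the true distance is 0.5
```

The classifier returns a discrete label, whereas the regressor returns a continuous value that only approximates the true distance, because the training grid is coarse.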
⁶ http://www.nickgillian.com/software/grt
6. INITIAL TESTING
A series of initial experiments was performed to test the functionality of ml.lib within Max and PD for creative applications.
These experiments were designed to be repeatable by students,
artists, designers and other potential users with minimal requirements besides Max and PD. The first three examples utilize sensor data, specifically three-axis accelerometer data, from a mobile phone. The tests were conducted by Momeni
in the ArtFab, a mixed media design and fabrication lab at Car-
⁷ http://hexler.net/software/touchosc
⁸ http://www.raspberrypi.org
⁹ http://bit.ly/artfab_video
seven-dimensional vector corresponding to the synthesis parameters. A training example for this application consists of the
three-dimensional feature vector and the corresponding seven-dimensional desired output vector (i.e. the synthesis parameters).
The system is therefore performing a 3-to-7 dimensional mapping. This approach to n-to-m mapping provides a useful counterpart to a weighted-interpolation technique used to similar
ends [23], as it provides opportunities for extrapolation, i.e.
generating synthesis parameters that are outside of the range of
possibilities achieved by mixing the predefined examples
arithmetically. In our experience, these extrapolations can be
very useful compositional discovery tools for synthesis schemes
that have many parameters whose influence on the resulting
sound is highly inter-related. This application is provided with
the help patch for the external ml.mlp as a subpatch named
test; the help patch also gives reference to a demonstration
video shared on YouTube⁹.
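The difference between extrapolation and arithmetic mixing of examples can be illustrated with a small sketch. It makes several assumptions that are not taken from the application above: a least-squares affine map stands in for the trained network, a generic inverse-distance weighting stands in for weighted interpolation in general, and the four training examples and all function names are invented for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A must be non-singular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_affine(inputs, outputs):
    """Least-squares affine map: one weight row (bias last) per output dimension."""
    X = [list(v) + [1.0] for v in inputs]
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    weights = []
    for k in range(len(outputs[0])):
        Xty = [sum(r[i] * y[k] for r, y in zip(X, outputs)) for i in range(d)]
        weights.append(solve(XtX, Xty))
    return weights


def predict(weights, v):
    """Apply the fitted affine map to one input vector."""
    x = list(v) + [1.0]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mix_by_distance(inputs, outputs, v):
    """Inverse-distance weighted average of the stored output vectors."""
    weights = []
    for p, y in zip(inputs, outputs):
        d2 = sum((a - b) ** 2 for a, b in zip(p, v))
        if d2 == 0.0:
            return list(y)  # query coincides with a stored example
        weights.append(1.0 / d2)
    total = sum(weights)
    return [sum(w * y[k] for w, y in zip(weights, outputs)) / total
            for k in range(len(outputs[0]))]


# Invented training set: 3-D feature vectors paired with 7 synthesis parameters.
inputs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
outputs = [
    [0.0, 0.5, 0.2, 0.1, 0.9, 0.3, 0.0],
    [1.0, 0.5, 0.4, 0.1, 0.7, 0.3, 0.2],
    [0.0, 1.0, 0.2, 0.6, 0.9, 0.1, 0.0],
    [0.0, 0.5, 0.8, 0.1, 0.1, 0.3, 1.0],
]

query = (2.0, 0.0, 0.0)  # lies outside the region covered by the examples
extrapolated = predict(fit_affine(inputs, outputs), query)
interpolated = mix_by_distance(inputs, outputs, query)
```

In this sketch the first parameter never exceeds 1.0 in the training examples. The weighted mix stays inside that range for any query, because it is an average with non-negative weights, whereas the fitted map continues the learned trend past it.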
8. ACKNOWLEDGMENTS
Many thanks to Nick Gillian and Thomas Grill for their support
in the development of ml.lib.
9. REFERENCES
[1] Bevilacqua, F., Müller, R., & Schnell, N. (2005). MnM: a Max/MSP mapping toolbox. Proceedings of the conference on New Interfaces for Musical Expression, 85-88. National University of Singapore.
[2] Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F., & Rasamimanana, N. H. (2009). Continuous Realtime Gesture Following and Recognition. Gesture Workshop, 5934(7), 73-84.
[3] Caramiaux, B., Wanderley, M. M., & Bevilacqua, F. (2012). Segmenting and Parsing Instrumentalists' Gestures. Journal of New Music Research, 41(1), 13-29.
[4] Carrillo, A. P., & Wanderley, M. M. (2012). Learning and extraction of violin instrumental controls from audio signal. Mirum, 25-30.
[5] Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27.
[6] Collins, N. (2012). Automatic Composition of Electroacoustic Art Music Utilizing Machine Listening. Computer Music Journal, 36(3), 8-23.
[7] Cont, A. (2006). Realtime Audio to Score Alignment for Polyphonic Music Instruments, using Sparse Non-Negative Constraints and Hierarchical HMMS. IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, 5, V-V.
[8] Cont, A. (2008a). Antescofo: Anticipatory Synchronization and Control of Interactive Parameters in Computer Music. In Proceedings of the International Computer Music Conference, Ann Arbor.
[10] Cont, A., Coduys, T., & Henry, C. (2004). Real-time Gesture Mapping in Pd Environment using Neural Networks. Proceedings of NIME, 39-42.
[11] Cont, A., Wessel, D., & Dubnov, S. (2014). Realtime Multiple-pitch and Multiple-instrument Recognition For Music Signals using Sparse Non-negative Constraints. In Proceedings of Digital Audio Effects Conference, Bordeaux, France.
[12] Fiebrink, R., & Cook, P. R. (2010). The Wekinator: a system for real-time, interactive machine learning in music. In Proceedings of The Eleventh International Society for Music Information Retrieval Conference. Utrecht.
[13] Fiebrink, R., Trueman, D., Britt, C., Nagai, M., Kaczmarek, K., Early, M., Daniel, M. R., Hege, A., & Cook, P. R. (2010). Toward understanding human-computer interaction in composing the instrument. In Proceedings of the International Computer Music Conference.
[14] Françoise, J. (2013). Gesture-sound mapping by demonstration in interactive music systems. ACM Multimedia, 1051-1054.
[15] Fujinaga, I., & MacMillan, K. (2000). Realtime recognition of orchestral instruments. In Proceedings of the international computer music conference (141), 43.
[16] Gillian, N., Knapp, B., & O'Modhrain, S. (2011). A Machine Learning Toolbox For Musician Computer Interaction. Proceedings of the conference on New Interfaces for Musical Expression, 343-348.
[17] Gillian, N., & Paradiso, J. A. (2014). The gesture recognition toolkit. The Journal of Machine Learning Research, 15(1), 3483-3487.
[18] Grill, T. (2004). flext - C++ programming layer for cross-platform development of PD and Max/MSP externals - An introduction. In Proceedings of The second Linux Audio Conference.
[19] Hunt, A., & Thomas, D. (2000). The pragmatic programmer: from journeyman to master. Addison-Wesley Professional.
[20] Knapp, R. B. (2011). Recognition Of Multivariate Temporal Musical Gestures Using N-Dimensional Dynamic Time Warping. Proceedings of the International Conference on New Interfaces for Musical Expression, 1-6.
[21] Kohavi, R., & Provost, F. (1998). Glossary of terms. Machine Learning, 30(2-3), 271-274.
[22] Malloch, J., Sinclair, S., & Wanderley, M. M. (2013). Libmapper: (a library for connecting things). CHI Extended Abstracts, 3087-3090.
[23] Momeni, A., & Wessel, D. (2003). Characterizing and Controlling Musical Material Intuitively with Geometric Models. Proceedings of the conference on New Interfaces for Musical Expression, 54-62.
[24] Nissen, S. (2003). Implementation of a fast artificial neural network library (fann). Report.
[25] Ono, M., Shizuki, B., & Tanaka, J. (2013). Touch & activate. Presented at the 26th annual ACM symposium, New York, New York, USA: ACM Press, 31-40.