Libxtract: A Lightweight Library For Audio Feature Extraction
Jamie Bullock
UCE Birmingham Conservatoire
Music Technology
The central idea behind libxtract is that the feature extraction functions should be modularised so they can be combined arbitrarily. Central to this approach is the idea of a cascaded extraction hierarchy. A simple example of this is shown in Figure 1. This approach serves a dual purpose: it avoids the duplication of 'subfeatures', making computation more efficient, and if the calling application allows, it enables a certain degree of experimentation. For example the user can easily create novel features by making unconventional extraction hierarchies.

libxtract seeks to provide a simple API for developers. This is achieved by using an array of function pointers as the primary means of calling extraction functions. A consequence of this is that all feature extraction functions have the same prototype for their arguments. The array of function pointers can be indexed using an enumeration of descriptively-named constants. A typical libxtract call in the DSP loop of an application will look like this:

    xtract[XTRACT_FUNCTION_NAME](input_vector, blocksize, argv, output_vector);

This design makes libxtract particularly suitable for use in modular patching environments such as Pure Data and Max/MSP, because it alleviates the need for the program making use of libxtract to provide mappings between symbolic 'xtractor' names and callback function names.

libxtract divides features into scalar features, which give the result as a single value, vector features, which give the result as an array, and delta features, which have some temporal element in their calculation process. To make the process of incorporating the wide variety of features (with their slightly varying argument requirements) easier, each extraction function has its own function descriptor. The purpose of the function descriptor is to provide useful 'self documentation' about the feature extraction function in question. This enables a calling application to easily determine the expected format of the data pointed to by *data and *argv, and what the 'donor' functions for any sub-features (passed in as part of the argument vector) would be.

The library has been written on the assumption that a contiguous block of data will be written to the input array of the feature extraction functions. Some functions assume the data represents a block of time-domain audio data, others use a special spectral data format, and others make no assumption about what the data represents. Some of the functions may therefore be suitable for analysing non-audio data.

Certain feature extraction functions require that the FFT of an audio block be taken. The inclusion of FFT processing is provided as a compile time option because it entails a dependency on the FFTW library. Signal windowing and zero padding are provided as helper functions.

4. LIST OF FEATURES

It is beyond the scope of this paper to list all the features provided by the library, but some of the most useful ones are listed in Table 1. If the feature is 'of a spectrum' it denotes that the input data will follow the format of libxtract's spectral data types.

Table 1. Some of the features provided by the library

5. PROGRAMS USING THE LIBRARY

Despite the fact that libxtract is a relatively recent library, it has already been incorporated into a number of useful programs.
5.1. Vamp libxtract plugin

The Vamp analysis plugin API was devised by Chris Cannam for the Sonic Visualiser^5 software. Sonic Visualiser is an application for visualising features extracted from audio files. It was developed at Queen Mary, University of London, and has applications in musicology, signal-processing research and performance practice [4]. Sonic Visualiser acts as a Vamp plugin host, with Vamp plugins supplying analysis data to it.

libxtract is used to provide analysis functions for Sonic Visualiser using the vamp-libxtract-plugin. The vamp-libxtract-plugin acts as a wrapper for the libxtract library, making nearly the entire set of libxtract features available to any Vamp host. This is done by providing only the minimum set of feature combinations, the implication being that the facility to experiment with different cascaded features is lost.

5.2. PD and Max/MSP externals

The libxtract library comes with a Pure Data (PD) external, which acts as a wrapper to the library's API. This is an ideal use case because it enables feature extraction functions to be cascaded arbitrarily, and allows non-validated data to be passed in as arguments through the [xtract~] object's right inlet. The disadvantage of this approach is that it requires the user to learn how the library works, and to understand in a limited way what each function does.

A Max/MSP external is available, which provides functionality that is analogous to the PD external.

5.3. SC3 ugen

There also exists a SuperCollider libxtract wrapper by Dan Stowell. This is implemented as a number of SuperCollider UGens, which are object-oriented, multiply instantiable DSP units^6. The primary focus of Stowell's libxtract work is a libxtract-based MFCC UGen, but several other libxtract-based UGens are under development.

6. EXTRACTING HIGHER LEVEL FEATURES

The primary focus of the libxtract library is the extraction of relatively low-level features. However, one of the main reasons for its development was that it could serve as a basis for extracting more semantically meaningful features. These could include psychoacoustic features such as roughness, sharpness and loudness [10], some of which are included in the library; instrument classification outputs [3]; or arbitrary user-defined descriptors such as 'Danceability' [7].

It is possible to extract these 'high level' features by using libxtract to extract a feature vector, which can be constructed using the results from a range of extraction functions. This vector can then be submitted to a mapping that entails a further reduction in dimensionality of the data. Possible algorithms for the dimension reduction task include Neural Networks, k-NN and Multidimensional Gauss [11]. Figure 2 shows a dimension reduction implementation in Pure Data using the [xtract~] libxtract wrapper, and the [ann_mlp] Fast Artificial Neural Network^7 (FANN) library wrapper by Davide Morelli^8.

An extended version of this system has recently been used in one of the author's own compositions for Piano and Live electronics. For this particular piece, a PD patch was created to detect whether a specific chord was being played, and to add a 'resonance' effect to the Piano accordingly. For the detection aspect of the patch, a selection of audio features, represented as floating point values, is 'packed' into a list using the PD [pack] object. This data is used to train the neural network (a multi-layer perceptron) by successively presenting it with input lists, each followed by the corresponding expected output. Once the network has been trained (giving the minimum possible error), it can operate in 'run' mode, whereby it should give appropriate output when presented with new data that shows similarity to the training data. With a close-mic'd studio recording in a dry acoustic, an average detection accuracy of 92% was achieved. This dropped to around 70% in a concert environment. An exploration of these results is beyond the scope of this paper.

Another possible use case for the library is as a source for continuous mapping to an output feature space. With a continuous mapping, the classifier outputs a location on a low-dimensional map rather than a discrete classification 'decision'. This has been implemented in one of the author's recent works for Flute and live electronics, whereby the flautist can control the classifier's output by modifying their instrumental timbre. Semantic descriptors were used to tag specific map locations, and proximity to these locations was measured and used as a source of realtime control data. The system was used to measure the 'breathiness' and 'shrillness' of the flute timbre. Further work could involve the recognition of timbral gestures in this resultant data stream.

5 http://www.sonicvisualiser.org
6 http://mcld.co.uk/supercollider
7 http://leenissen.dk/fann/
8 http://www.davidemorelli.it

7. EFFICIENCY AND REALTIME USE

The library in its current form makes no guarantee about how long it will take for a function to execute. For any given blocksize, there is no defined behaviour determining what will happen if the function does not return in the duration of the block.

Most of the extraction functions have an algorithmic efficiency of O(n) or better, meaning that computation time is usually proportional to the audio blocksize used. However, because of the way in which the library has been designed (flexibility of feature combination has been given priority), certain features end up being computed comparatively inefficiently. For example, if only the Kurtosis feature was required in the system shown in Figure 1, the functions xtract_mean(), xtract_variance(),
xtract_standard_deviation() and xtract_kurtosis() must all execute N iterations over their input (where N is the size of the input array). The efficiency of xtract_kurtosis() could be improved if the outputs from all the intermediate features were not exposed to the user or developer.

Tests show that all the features shown in Table 1 can be computed simultaneously with a blocksize of 512 samples and 20 Mel filter bands, with a load of 20-22% on a dual Intel 2.1GHz Macbook Pro laptop running GNU/Linux with a 2.6 series kernel. This increases to 70% for a blocksize of 8192, but removing xtract_f0() reduces this figure to 50%.

Figure 2. Audio feature vector construction and dimension reduction using libxtract and FANN bindings in Pure Data

8. CONCLUSIONS

In this paper I have described a new library that can be used for low level audio feature extraction. It is capable of being used inside a realtime application, and serves as a useful tool for experimentation with audio features. Use of the library inside a variety of applications has been discussed, along with a description of its role in extracting higher level, more abstract features. It can be concluded that the library is a versatile tool for low-level feature extraction, with a simple and convenient API.

9. REFERENCES

[1] Lindsay, T., Burnett, I., Quackenbush, S., Jackson, M. "Fundamentals of Audio Descriptors", Introduction to MPEG-7 Multimedia Content Description Interface, West Sussex, England, 2003.

[2] Peeters, G. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. IRCAM, Paris, 2003.

[3] Fujinaga, I., and MacMillan, K. "Realtime recognition of orchestral instruments." Proceedings of the International Computer Music Conference, 2000.

[4] Cannam, C., Landone, C., Sandler, M., and Bello, J. P. "The Sonic Visualiser: A Visualisation Platform for Semantic Descriptors from Musical Signals." Proceedings of the 7th International Conference on Music Information Retrieval, Victoria, Canada, 2006.

[5] Brossier, P. M. "Automatic Annotation of Musical Audio for Interactive Applications." PhD Thesis, Centre for Digital Music, Queen Mary, University of London, UK, 2006.

[6] Lerch, A. "FEAPI: A Low Level Feature Extraction Plugin API." Proceedings of the 8th International Conference on Digital Audio Effects (DAFx), Madrid, Spain, 2005.

[7] Amatriain, X., Massaguer, J., Garcia, D., and Mosquera, I. "The CLAM Annotator: A Cross-platform Audio Descriptors Editing Tool." Proceedings of the 6th International Conference on Music Information Retrieval, London, UK, 2005.

[8] Amatriain, X. "An Object-Oriented Metamodel for Digital Signal Processing with a focus on Audio and Music." PhD Thesis, Music Technology Group of the Institut Universitari de l'Audiovisual at the Universitat Pompeu Fabra, Barcelona, Spain, 2004.

[9] McEnnis, D., McKay, C., Fujinaga, I., Depalle, P. "jAudio: A Feature Extraction Library." Proceedings of the 6th International Conference on Music Information Retrieval, London, UK, 2005.

[10] Moore, B. C. J., Glasberg, B. R., and Baer, T. "A model for the prediction of thresholds, loudness and partial loudness." J. Audio Eng. Soc., vol. 45, pp. 224-240, New York, USA, 1997.

[11] Herrera-Boyer, P., Peeters, G., Dubnov, S. "Automatic Classification of Musical Instrument Sounds." Journal of New Music Research, Volume 32, Issue 1, pages 3-21, London, UK, 2003.