Timbre Id
William Brent
University of California, San Diego
Center for Research in Computing and the Arts
Several projects have been developed for the purpose of organizing sounds and/or querying an audio corpus based on timbral similarity. CataRT and Soundspotter are among the most widely recognized open source options [7][3]. The former is available as a Max/MSP implementation, while the latter is intended for multiple platforms, including Pd. Soundspotter's Pd realization is primarily designed for real-time target-driven concatenative synthesis. More general tools for creative work centered on timbre similarity are limited in Pd.
timbreID is a Pd external collection developed by the author. It is composed of a group of objects for extracting timbral features, and a classification object that manages the resulting database of information. The objects are designed to be easy to use and adaptable for a number of purposes, including real-time timbre identification, ordering of sounds by timbre, target-driven concatenative synthesis, and plotting of sounds in a user-defined timbre space that can be auditioned interactively. This paper will summarize the most relevant features of the toolkit and describe its use in the four applications listed above.

2. FEATURE EXTRACTION OBJECTS

In general, timbreID's feature extraction objects have four important qualities. First, each object maintains its own signal buffer based on a user-specified window size.
The following external objects for measuring basic features are provided with timbreID: magSpec∼, specBrightness∼, specCentroid∼, specFlatness∼, specFlux∼, specIrregularity∼, specKurtosis∼, specRolloff∼, specSkewness∼, specSpread∼, and zeroCrossing∼. The more processed features in the set (generated by barkSpec∼, cepstrum∼, mfcc∼, and bfcc∼) are generally the most powerful for classification. Mathematical definitions for many of these measurements are given in a previous paper, along with an evaluation of their effectiveness [1]. Detailed information on sound descriptors in general is available elsewhere [8][9]. Although an understanding of the various analysis techniques is not required for use, a general idea of what to expect can be very helpful. To that effect, a simple demonstration and straightforward explanation of each feature is given in its accompanying help file.
In order to facilitate as many types of usage as possible, non-real-time versions of all feature externals are provided for analyzing samples directly from graphical arrays in Pd.

2.2. Open-ended analysis strategies

Independent, modular analysis objects allow for flexible analysis strategies. Each of the objects reports its results as either a single number or a list that can be further manipulated in Pd. Feature lists of any size can be packed together so that users can design a custom approach that best suits their particular sound set. Figure 1 demonstrates how to generate a feature list composed of MFCCs, spectral centroid, and spectral brightness. Subsets of mel-frequency cepstral coefficients (MFCCs) are frequently used for economically representing spectral envelope, while spectral centroid and brightness provide information about the distribution of spectral energy in a signal. Each time the button in the upper right region of the patch is clicked, a multi-feature analysis snapshot composed of these features will be produced.
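Centroid and brightness are computed inside Pd by the specCentroid∼ and specBrightness∼ externals, but the underlying quantities are easy to sketch outside it. The NumPy fragment below is an illustration only, not timbreID code: it treats spectral centroid as the magnitude-weighted mean bin frequency, and brightness as the fraction of spectral magnitude above a boundary frequency, with the 1200 Hz boundary chosen arbitrarily for this example (the definitions actually used by the externals are given in [1]).

```python
import numpy as np

def spectral_centroid(mag, sr):
    """Magnitude-weighted mean of bin frequencies, in Hz."""
    freqs = np.fft.rfftfreq(2 * (len(mag) - 1), 1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

def spectral_brightness(mag, sr, boundary_hz=1200.0):
    """Fraction of total spectral magnitude at or above boundary_hz."""
    freqs = np.fft.rfftfreq(2 * (len(mag) - 1), 1.0 / sr)
    return np.sum(mag[freqs >= boundary_hz]) / np.sum(mag)

# One analysis frame: a Hann-windowed 440 Hz sine, 1024-sample window.
sr, n = 44100, 1024
t = np.arange(n) / sr
mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * 440.0 * t) * np.hanning(n)))

print(spectral_centroid(mag, sr))    # close to 440 Hz, pulled up slightly by leakage
print(spectral_brightness(mag, sr))  # near zero: almost no magnitude above 1200 Hz
```

A harmonically richer or noisier frame pushes both values upward, which is the behaviour that makes this pair informative alongside MFCCs in a packed feature list.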
should be struck a few times at different dynamic levels. For each strike, an onset detector like bonk∼ will send a bang message to bfcc∼, the bark-frequency cepstral analysis object. Once a training database has been accumulated in this manner, bfcc∼'s output can be routed to timbreID's second inlet, so that any new instrument onsets will generate a nearest match report from the first outlet. A match result is given as the index of the nearest matching instance as assigned during training. For each match, the second outlet reports the distance between the input feature and its nearest match, and the third outlet produces a confidence measure based on the ratio of the first and second best match distances.
For many sound sets, timbreID's clustering function will automatically group features by instrument. A desired number of clusters corresponding to the number of instruments must be given with the "cluster" message, and an agglomerative hierarchical clustering algorithm will group instances according to current similarity metric settings. Afterward, timbreID will report the associated cluster index of the nearest match in response to classification requests.
Once training is complete, the resulting feature database can be saved to a file for future use. There are four file formats available: timbreID's binary .timid format, a text format for users who wish to inspect the database, ARFF format for use in WEKA¹, and .mat format for use in either MATLAB or GNU Octave.

3.1. timbreID settings

Nearest match searches are performed with a k-nearest neighbor strategy, where k can be chosen by the user. Several other settings related to the matching process can also be adjusted.

¹ WEKA is a popular open source machine learning package described in [4].

4.1. Vowel recognition

Identification of vowels articulated by a vocalist is a task best accomplished using the cepstrum∼ object. Under the right circumstances, cepstral analysis can achieve a rough deconvolution of two convolved signals. In the case of a sung voiced vowel, glottal impulses at a certain frequency are convolved with a filter corresponding to the shape of the vocalist's oral cavity. Depending on fundamental frequency, the cepstrum of such a signal will produce two distinctly identifiable regions: a compact representation of the filter component at the low end, and higher up, a peak associated with the pitch of the note being sung. The filter region of the cepstrum should hold its shape reasonably steady in spite of pitch changes, making it possible to identify vowels no matter which pitch the vocalist happens to be singing. As pitch moves higher, the cepstral peak actually moves lower, as the so-called "quefrency" axis corresponds to period, the inverse of frequency. If the pitch is very high, it will overlap with the region representing the filter component, and destroy the potential for recognizing vowels regardless of pitch².
Having acknowledged these limitations, a useful pitch-independent vowel recognition system can nevertheless be arranged using timbreID objects very easily. Figure 4 shows a simplified excerpt of an example patch where cepstral coefficients 2 through 40 are sent to timbreID's training inlet every time the red snapshot button is clicked. Although identical results could be achieved without splitting off a specific portion of the cepstrum³, pre-processing the feature
from cepstrum∼ to timbreID, a nearest match is identified and its associated cluster index is sent out timbreID's first outlet. The example patch animates vowel classifications as they occur.

² These qualities of cepstral analysis can be observed by sending cepstrum∼'s output list to an array and graphing the analysis continuously in real time.
³ The alternative would be to pass the entire cepstrum, but set timbreID's active attribute range to use only the 2nd through 40th coefficients in similarity calculations.
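The quefrency behaviour described above can be reproduced outside Pd. The NumPy sketch below is an illustration, not the cepstrum∼ implementation: it computes a real cepstrum (inverse FFT of the log magnitude spectrum) of a crude harmonic stand-in for a voiced vowel. The low-quefrency coefficients capture the spectral envelope, while the pitch appears as a separate peak near quefrency sr/f0 samples, which is why keeping only coefficients 2 through 40 yields a largely pitch-independent feature.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))

sr, n, f0 = 44100, 4096, 220.0
t = np.arange(n) / sr
# Crude voiced-vowel stand-in: harmonics of f0 under a fixed spectral tilt.
x = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 40))
c = real_cepstrum(x * np.hanning(n))

envelope = c[2:41]  # the "filter" region: coefficients 2 through 40
# The pitch peak sits near quefrency sr/f0; search above the envelope region.
peak_q = int(np.argmax(c[50:n // 2])) + 50
print(sr / peak_q)  # pitch estimate, close to f0
```

Raising f0 moves peak_q lower, and at very high pitches the peak collides with the envelope region, which is exactly the failure mode noted in the text.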
Figure 7. 2400 string grains mapped with respect to amplitude and fundamental frequency.

Figure 7 shows a plot of string sample grains mapped according to RMS amplitude and fundamental frequency. Because the frequencies in this particular sound file fall into discrete pitch classes, its grains are visibly stratified along the vertical dimension.
Mapping is achieved by recovering features from timbreID's database with the "feature list" message, which is sent with a database index indicating which instance to report. The feature list for the specified instance is then sent out of timbreID's fifth outlet, and used to determine the instance's position in feature space.

5. CONCLUSION

This paper has introduced some important features of the timbreID analysis/classification toolkit for Pd, and demonstrated its adaptability to four unique tasks.

[5] S. König, http://www.popmodernism.org/scrambledhackz.
[6] M. Puckette, T. Apel, and D. Zicarelli, "Real-time audio analysis tools for Pd and MSP," in Proceedings of the International Computer Music Conference, 1998, pp. 109–112.
[7] D. Schwarz, G. Beller, B. Verbrugghe, and S. Britton, "Real-time corpus-based concatenative synthesis with CataRT," in Proceedings of the COST-G6 Conference on Digital Audio Effects (DAFx), Montreal, Canada, 2006, pp. 279–282.
[8] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[9] X. Zhang and Z. Ras, "Analysis of sound features for music timbre recognition," in Proceedings of the IEEE CS International Conference on Multimedia and Ubiquitous Engineering, 2007, pp. 3–8.