Librosa - Audio and Music Signal Analysis in Python SCIPY 2015

Librosa provides implementations of a variety of common functions used throughout the field of music information retrieval. In this document, a brief overview of the library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

Uploaded by

Kate Zen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

460 views

Librosa - Audio and Music Signal Analysis in Python SCIPY 2015

Uploaded by

Kate Zen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

PROC. OF THE 14th PYTHON IN SCIENCE CONF.

(SCIPY 2015) 1

librosa: Audio and Music Signal Analysis in Python

Brian McFee¶k∗ , Colin Raffel§ , Dawen Liang§ , Daniel P.W. Ellis§ , Matt McVicar‡ , Eric Battenberg∗∗ , Oriol Nietok

Abstract—This document describes version 0.4.0 of librosa: a Python pack- techniques readily available to the broader community of
age for audio and music signal processing. At a high level, librosa provides scientists and Python programmers.
implementations of a variety of common functions used throughout the field of
music information retrieval. In this document, a brief overview of the library’s
Design principles
functionality is provided, along with explanations of the design goals, software
development practices, and notational conventions. In designing librosa, we have prioritized a few key concepts.
First, we strive for a low barrier to entry for researchers
Index Terms—audio, music, signal processing familiar with MATLAB. In particular, we opted for a relatively
flat package layout, and following scipy [Jones01] rely upon
numpy data types and functions [VanDerWalt11], rather than
Introduction abstract class hierarchies.
Second, we expended considerable effort in standardizing
The emerging research field of music information retrieval
interfaces, variable names, and (default) parameter settings
(MIR) broadly covers topics at the intersection of musicol-
across the various analysis functions. This task was compli-
ogy, digital signal processing, machine learning, information
cated by the fact that reference implementations from which
retrieval, and library science. Although the field is relatively
our implementations are derived come from various authors,
young—the first international symposium on music informa-
and are often designed as one-off scripts rather than proper
tion retrieval (ISMIR)1 was held in October of 2000—it is
library functions with well-defined interfaces.
rapidly developing, thanks in part to the proliferation and
Third, wherever possible, we retain backwards compatibility
practical scientific needs of digital music services, such as
against existing reference implementations. This is achieved
iTunes, Pandora, and Spotify. While the preponderance of MIR
via regression testing for numerical equivalence of outputs.
research has been conducted with custom tools and scripts
All tests are implemented in the nose framework.4
developed by researchers in a variety of languages such as
Fourth, because MIR is a rapidly evolving field, we rec-
MATLAB or C++, the stability, scalability, and ease of use
ognize that the exact implementations provided by librosa
these tools has often left much to be desired.
may not represent the state of the art for any particular task.
In recent years, interest has grown within the MIR commu-
Consequently, functions are designed to be modular, allowing
nity in using (scientific) Python as a viable alternative. This has
practitioners to provide their own functions when appropriate,
been driven by a confluence of several factors, including the
e.g., a custom onset strength estimate may be provided to the
availability of high-quality machine learning libraries such as
beat tracker as a function argument. This allows researchers to
scikit-learn [Pedregosa11] and tools based on Theano
leverage existing library functions while experimenting with
[Bergstra11], as well as Python’s vast catalog of packages
improvements to specific components. Although this seems
for dealing with text data and web services. However, the
simple and obvious, from a practical standpoint the monolithic
adoption of Python has been slowed by the absence of a
designs and lack of interoperability between different research
stable core library that provides the basic routines upon which
codebases have historically made this difficult.
many MIR applications are built. To remedy this situation,
Finally, we strive for readable code, thorough documen-
we have developed librosa:2 a Python package for audio and
tation and exhaustive testing. All development is conducted
music signal processing.3 In doing so, we hope to both ease
on GitHub. We apply modern software development prac-
the transition of MIR researchers into Python (and modern
tices, such as continuous integration testing (via Travis5 ) and
software development practices), and also to make core MIR
coverage (via Coveralls6 ). All functions are implemented in
∗ Corresponding author: [email protected]
pure Python, thoroughly documented using Sphinx, and in-
¶ Center for Data Science, New York University clude example code demonstrating usage. The implementation
k Music and Audio Research Laboratory, New York University mostly complies with PEP-8 recommendations, with a small
§ LabROSA, Columbia University
‡ Department of Engineering Mathematics, University of Bristol 1. http://ismir.net
∗∗ Silicon Valley AI Lab, Baidu, Inc.
2. https://github.com/bmcfee/librosa
Copyright c 2015 Brian McFee et al. This is an open-access article dis- 3. The name librosa is borrowed from LabROSA : the LABoratory for the
tributed under the terms of the Creative Commons Attribution License, Recognition and Organization of Speech and Audio at Columbia University,
which permits unrestricted use, distribution, and reproduction in any medium, where the initial development of librosa took place.
provided the original author and source are credited. 4. https://nose.readthedocs.org/en/latest/
2 PROC. OF THE 14th PYTHON IN SCIENCE CONF. (SCIPY 2015)

set of exceptions for variable names that make the code Package organization
more concise without sacrificing clarity: e.g., y and sr are In this section, we give a brief overview of the structure
preferred over more verbose names such as audio_buffer of the librosa software package. This overview is intended
and sampling_rate. to be superficial and cover only the most commonly used
functionality. A complete API reference can be found at
Conventions
https://bmcfee.github.io/librosa.
In general, librosa’s functions tend to expose all relevant
parameters to the caller. While this provides a great deal Core functionality
of flexibility to expert users, it can be overwhelming to
The librosa.core submodule includes a range of com-
novice users who simply need a consistent interface to process
monly used functions. Broadly, core functionality falls into
audio files. To satisfy both needs, we define a set of general
four categories: audio and time-series operations, spectro-
conventions and standardized default parameter values shared
gram calculation, time and frequency conversion, and pitch
across many functions.
operations. For convenience, all functions within the core
An audio signal is represented as a one-dimensional numpy
submodule are aliased at the top level of the package hierarchy,
array, denoted as y throughout librosa. Typically the signal y is
e.g., librosa.core.load is aliased to librosa.load.
accompanied by the sampling rate (denoted sr) which denotes
Audio and time-series operations include functions
the frequency (in Hz) at which values of y are sampled. The
such as: reading audio from disk via the audioread
duration of a signal can then be computed by dividing the
package7 (core.load), resampling a signal at a desired
number of samples by the sampling rate:
rate (core.resample), stereo to mono conversion
>>> duration_seconds = float(len(y)) / sr
(core.to_mono), time-domain bounded auto-correlation
By default, when loading stereo audio files, the (core.autocorrelate), and zero-crossing detection
librosa.load() function downmixes to mono by (core.zero_crossings).
averaging left- and right-channels, and then resamples the Spectrogram operations include the short-time Fourier trans-
monophonic signal to the default rate sr=22050 Hz. form (stft), inverse STFT (istft), and instantaneous
Most audio analysis methods operate not at the native frequency spectrogram (ifgram) [Abe95], which provide
sampling rate of the signal, but over small frames of the much of the core functionality for down-stream feature anal-
signal which are spaced by a hop length (in samples). The ysis. Additionally, an efficient constant-Q transform (cqt)
default frame and hop lengths are set to 2048 and 512 samples, implementation based upon the recursive down-sampling
respectively. At the default sampling rate of 22050 Hz, this method of Schoerkhuber and Klapuri [Schoerkhuber10] is
corresponds to overlapping frames of approximately 93ms provided, which produces logarithmically-spaced frequency
spaced by 23ms. Frames are centered by default, so frame representations suitable for pitch-based signal analysis. Fi-
index t corresponds to the slice: nally, logamplitude provides a flexible and robust imple-
y[(t * hop_length - frame_length / 2): mentation of log-amplitude scaling, which can be used to avoid
(t * hop_length + frame_length / 2)], numerical underflow and set an adaptive noise floor when
where boundary conditions are handled by reflection-padding converting from linear amplitude.
the input signal y. Unless otherwise specified, all sliding- Because data may be represented in a variety of time or fre-
window analyses use Hann windows by default. For analyses quency units, we provide a comprehensive set of convenience
that do not use fixed-width frames (such as the constant- functions to map between different time representations: sec-
Q transform), the default hop length of 512 is retained to onds, frames, or samples; and frequency representations: hertz,
facilitate alignment of results. constant-Q basis index, Fourier basis index, Mel basis index,
The majority of feature analyses implemented by MIDI note number, or note in scientific pitch notation.
librosa produce two-dimensional outputs stored as Finally, the core submodule provides functionality to esti-
numpy.ndarray, e.g., S[f, t] might contain the mate the dominant frequency of STFT bins via parabolic in-
energy within a particular frequency band f at frame index terpolation (piptrack) [Smith11], and estimation of tuning
t. We follow the convention that the final dimension provides deviation (in cents) from the reference A440. These functions
the index over time, e.g., S[:, 0], S[:, 1] access allow pitch-based analyses (e.g., cqt) to dynamically adapt
features at the first and second frames. Feature arrays are filter banks to match the global tuning offset of a particular
organized column-major (Fortran style) in memory, so that audio signal.
common access patterns benefit from cache locality.
Spectral features
By default, all pitch-based analyses are assumed to be
relative to a 12-bin equal-tempered chromatic scale with a Spectral representations—the distributions of energy over a
reference tuning of A440 = 440.0 Hz. Pitch and pitch- set of frequencies—form the basis of many analysis tech-
class analyses are arranged such that the 0th bin corresponds niques in MIR and digital signal processing in general. The
to C for pitch class or C1 (32.7 Hz) for absolute pitch librosa.feature module implements a variety of spectral
measurements. representations, most of which are based upon the short-time
Fourier transform.
5. https://travis-ci.org
6. https://coveralls.io 7. https://github.com/sampsyo/audioread
LIBROSA: AUDIO AND MUSIC SIGNAL ANALYSIS IN PYTHON 3

11025 0 dB
STFT log power -6 dB >>> melspec = librosa.feature.melspectrogram(y=y,
8268 -12 dB
-18 dB ... sr=sr)
-24 dB >>> chroma = librosa.feature.chroma_cqt(y=y,
5512 -30 dB
Hz

-36 dB ... sr=sr)

2756 -42 dB
-48 dB >>> tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
-54 dB
0 -60 dB
10453
Mel spectrogram log power
0 dB
-6 dB
In addition to Mel and chroma features, the feature
-12 dB
4462 -18 dB submodule provides a number of spectral statistic
-24 dB
-30 dB representations, including spectral_centroid,
Hz

1905
-36 dB
799 -42 dB
-48 dB
spectral_bandwidth, spectral_rolloff
0
-54 dB
-60 dB [Klapuri07], and spectral_contrast [Jiang02].10
B
Chroma
1.0
0.9
Finally, the feature submodule provides a few func-
A 0.8 tions to implement common transformations of time-series
0.7
Pitch class

G 0.6
F 0.5 features in MIR. This includes delta, which provides a
E 0.4
D
0.3
0.2
smoothed estimate of the time derivative; stack_memory,
C 0.1 which concatenates an input feature array with time-lagged
0.0

M3y 0.60 copies of itself (effectively simulating feature n-grams); and

Tonnetz
M3x 0.45 sync, which applies a user-supplied aggregation function
0.30
Tonnetz

m3y 0.15 (e.g., numpy.mean or median) across specified column

m3x 0.00
5y 0.15 intervals.
0.30
5x
0.45
25.00s 27.86s 30.72s 33.58s 36.44s 39.30s 42.16s 45.02s Display
Fig. 1: First: the short-time Fourier transform of a The display module provides simple interfaces to visu-
20-second audio clip (librosa.stft). Second: the ally render audio data through matplotlib [Hunter07].
corresponding Mel spectrogram, using 128 Mel bands The first function, display.waveplot simply renders the
(librosa.feature.melspectrogram). Third: the amplitude envelope of an audio signal y using matplotlib’s
corresponding chromagram (librosa.feature.chroma_cqt).
Fourth: the Tonnetz features (librosa.feature.tonnetz). fill_between function. For efficiency purposes, the signal
is dynamically down-sampled. Mono signals are rendered
symmetrically about the horizontal axis; stereo signals are
The Mel frequency scale is commonly used to represent rendered with the left-channel’s amplitude above the axis
audio signals, as it provides a rough model of human fre- and the right-channel’s below. An example of waveplot is
quency perception [Stevens37]. Both a Mel-scale spectro- depicted in Figure 2 (top).
gram (librosa.feature.melspectrogram) and the The second function, display.specshow wraps mat-
commonly used Mel-frequency Cepstral Coefficients (MFCC) plotlib’s imshow function with default settings (origin and
(librosa.feature.mfcc) are provided. By default, Mel aspect) adapted to the expected defaults for visualizing
scales are defined to match the implementation provided by spectrograms. Additionally, specshow dynamically selects
Slaney’s auditory toolbox [Slaney98], but they can be made appropriate colormaps (binary, sequential, or diverging) from
to match the Hidden Markov Model Toolkit (HTK) by setting the data type and range.11 Finally, specshow provides a
the flag htk=True [Young97]. variety of acoustically relevant axis labeling and scaling pa-
rameters. Examples of specshow output are displayed in
While Mel-scaled representations are commonly used to
Figures 1 and 2 (middle).
capture timbral aspects of music, they provide poor reso-
lution of pitches and pitch classes. Pitch class (or chroma)
Onsets, tempo, and beats
representations are often used to encode harmony while
suppressing variations in octave height, loudness, or timbre. While the spectral feature representations described above
Two flexible chroma implementations are provided: one uses capture frequency information, time information is equally
a fixed-window STFT analysis (chroma_stft)8 and the important for many applications in MIR. For instance, it can
other uses variable-window constant-Q transform analysis be beneficial to analyze signals indexed by note or beat events,
(chroma_cqt). An alternative representation of pitch and rather than absolute time. The onset and beat submodules
harmony can be obtained by the tonnetz function, which es- implement functions to estimate various aspects of timing in
timates tonal centroids as coordinates in a six-dimensional in- music.
terval space using the method of Harte et al. [Harte06]. Figure 8. chroma_stft is based upon the reference implementation provided
1 illustrates the difference between STFT, Mel spectrogram, at http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/
chromagram, and Tonnetz representations, as constructed by 9. For display purposes, spectrograms are scaled by
the following code fragment:9 librosa.logamplitude. We refer readers to the accompanying
IPython notebook for the full source code to recontsruct figures.
>>> filename = librosa.util.example_audio_file() 10. spectral_* functions are derived from MATLAB reference imple-
>>> y, sr = librosa.load(filename, mentations provided by the METLab at Drexel University. http://music.ece.
... offset=25.0, drexel.edu/
... duration=20.0) 11. If the seaborn package [Waskom14] is available, its version of
>>> spectrogram = np.abs(librosa.stft(y)) cubehelix is used for sequential data.
4 PROC. OF THE 14th PYTHON IN SCIENCE CONF. (SCIPY 2015)

More specifically, the onset module provides two to detect musical structure. It is sometimes more convenient
functions: onset_strength and onset_detect. The to operate in time-lag coordinates, rather than time-time,
onset_strength function calculates a thresholded spec- which transforms diagonal structures into more easily
tral flux operation over a spectrogram, and returns a one- detectable horizontal structures (Figure 3, right) [Serra12].
dimensional array representing the amount of increasing spec- This is facilitated by the recurrence_to_lag (and
tral energy at each frame. This is illustrated as the blue curve in lag_to_recurrence) functions.
the bottom panel of Figure 2. The onset_detect function, Second, temporally constrained clustering can be
on the other hand, selects peak positions from the onset used to detect feature change-points without relying
strength curve following the heuristic described by Boeck et upon repetition. This is implemented in librosa by
al. [Boeck12]. The output of onset_detect is depicted as the segment.agglomerative function, which
red circles in the bottom panel of Figure 2. uses scikit-learn’s implementation of Ward’s
The beat module provides functions to estimate the global agglomerative clustering method [Ward63] to partition
tempo and positions of beat events from the onset strength the input into a user-defined number of contiguous
function, using the method of Ellis [Ellis07]. More specifically, components. In practice, a user can override the
the beat tracker first estimates the tempo, which is then used default clustering parameters by providing an existing
to set the target spacing between peaks in an onset strength sklearn.cluster.AgglomerativeClustering
function. The output of the beat tracker is displayed as the object as an argument to segment.agglomerative().
dashed green lines in Figure 2 (bottom).
Tying this all together, the tempo and beat positions for an Decompositions
input signal can be easily calculated by the following code Many applications in MIR operate upon latent factor represen-
fragment: tations, or other decompositions of spectrograms. For exam-
>>> y, sr = librosa.load(FILENAME)
ple, it is common to apply non-negative matrix factorization
>>> tempo, frames = librosa.beat.beat_track(y=y, (NMF) [Lee99] to magnitude spectra, and analyze the statistics
... sr=sr) of the resulting time-varying activation functions, rather than
>>> beat_times = librosa.frames_to_time(frames,
... sr=sr)
the raw observations.
The decompose module provides a simple interface to fac-
Any of the default parameters and analyses may be overridden. tor spectrograms (or general feature arrays) into components
For example, if the user has calculated an onset strength and activations:
envelope by some other means, it can be provided to the beat >>> comps, acts = librosa.decompose.decompose(S)
tracker as follows:
By default, the decompose() function constructs
>>> oenv = some_other_onset_function(y, sr) a scikit-learn NMF object, and applies its
>>> librosa.beat.beat_track(onset_envelope=oenv)
fit_transform() method to the transpose of S. The
All detection functions (beat and onset) return events as frame resulting basis components and activations are accordingly
indices, rather than absolute timing. The downside of this is transposed, so that comps.dot(acts) approximates S. If
that it is left to the user to convert frame indices back to the user wishes to apply some other decomposition technique,
absolute time. However, in our opinion, this is outweighed any object fitting the sklearn.decomposition interface
by two practical benefits: it simplifies the implementations, may be substituted:
and it makes the results directly accessible to frame-indexed >>> T = SomeDecomposer()
>>> librosa.decompose.decompose(S, transformer=T)
functions such as librosa.feature.sync.
In addition to general-purpose matrix decomposition tech-
Structural analysis
niques, librosa also implements the harmonic-percussive
Onsets and beats provide relatively low-level timing cues source separation (HPSS) method of Fitzgerald [Fitzgerald10]
for music signal processing. Higher-level analyses attempt to as decompose.hpss. This technique is commonly used in
detect larger structure in music, e.g., at the level of bars or MIR to suppress transients when analyzing pitch content, or
functional components such as verse and chorus. While this suppress stationary signals when detecting onsets or other
is an active area of research that has seen rapid progress in rhythmic elements. An example application of HPSS is illus-
recent years, there are some useful features common to many trated in Figure 4.
approaches. The segment submodule contains a few useful
functions to facilitate structural analysis in music, falling Effects
broadly into two categories. The effects module provides convenience functions for
First, there are functions to calculate and applying spectrogram-based transformations to time-domain
manipulate recurrence or self-similarity plots. The signals. For instance, rather than writing
segment.recurrence_matrix constructs a binary
k-nearest-neighbor similarity matrix from a given feature >>> D = librosa.stft(y)
>>> Dh, Dp = librosa.decompose.hpss(D)
array and a user-specified distance function. As displayed >>> y_harmonic = librosa.istft(Dh)
in Figure 3 (left), repeating sequences often appear as
diagonal bands in the recurrence plot, which can be used one may simply write
LIBROSA: AUDIO AND MUSIC SIGNAL ANALYSIS IN PYTHON 5

y (mono signal)

11025
STFT log power
2196

1098
Hz

452

Onset strength
Detected note onsets
Detected beats

25.00s 27.87s 30.73s 33.60s 36.46s 39.33s 42.20s 45.06s

Fig. 2: Top: a waveform plot for a 20-second audio clip y, generated by librosa.display.waveplot. Middle: the log-power short-time
Fourier transform (STFT) spectrum for y plotted on a logarithmic frequency scale, generated by librosa.display.specshow. Bottom:
the onset strength function (librosa.onset.onset_strength), detected onset events (librosa.onset.onset_detect), and
detected beat events (librosa.beat.beat_track) for y.

45.02s 0.00s
Recurrence plot Lag plot
42.16s
-10.01s
39.30s

36.44s
Lag

19.99s
33.58s

30.72s
9.98s
27.86s

25.00s 0.00s
25.00s 27.86s 30.72s 33.58s 36.44s 39.30s 42.16s 45.02s 25.00s 27.86s 30.72s 33.58s 36.44s 39.30s 42.16s 45.02s

Fig. 3: Left: the recurrence plot derived from the chroma features displayed in Figure 1. Right: the corresponding time-lag plot.

>>> y_harmonic = librosa.effects.harmonic(y) validation of time intervals. The resulting outputs are designed
to work with other common MIR tools, such as mir_eval
Convenience functions are provided for HPSS (retaining the [Raffel14] and sonic-visualiser [Cannam10].
harmonic, percussive, or both components), time-stretching The output module also provides the write_wav
and pitch-shifting. Although these functions provide no ad- function for saving audio in .wav format. The write_wav
ditional functionality, their inclusion results in simpler, more simply wraps the built-in scipy wav-file writer
readable application code. (scipy.io.wavfile.write) with validation and
optional normalization, thus ensuring that the resulting audio
Output files are well-formed.
The output module includes utility functions to save the
results of audio analysis to disk. Most often, this takes the form Caching
of annotated instantaneous event timings or time intervals,
MIR applications typically require computing a variety of fea-
which are saved in plain text (comma- or tab-separated values)
tures (e.g., MFCCs, chroma, beat timings, etc) from each audio
via output.times_csv and output.annotation, re-
signal in a collection. Assuming the application programmer
spectively. These functions are somewhat redundant with al-
is content with default parameters, the simplest way to achieve
ternative functions for text output (e.g., numpy.savetxt),
this is to call each function using audio time-series input, e.g.:
but provide sanity checks for length agreement and semantic
6 PROC. OF THE 14th PYTHON IN SCIENCE CONF. (SCIPY 2015)

harmonic Parameter Description Values

percussive
fmax Maximum frequency value (Hz) 8000, 11025
n_mels Number of Mel bands 32, 64, 128
aggregate Spectral flux aggregation function np.mean,
np.median
10453
Harmonic mel spectrogram ac_size Maximum lag for onset autocor- 2, 4, 8
4462 relation (s)
std_bpm Deviation of tempo estimates 0.5, 1.0, 2.0
Hz

1905
from 120.0 BPM
799
tightness Penalty for deviation from esti- 50, 100, 400
0
mated tempo
10453
Percussive mel spectrogram
4462
TABLE 1: The parameter grid for beat tracking optimization. The
Hz

1905

799
best configuration is indicated in bold.
0
25.00s 27.86s 30.72s 33.58s 36.44s 39.30s 42.16s 45.02s
The cache object is disabled by default, but can be activated
Fig. 4: Top: the separated harmonic and percussive waveforms.
by setting the environment variable LIBROSA_CACHE_DIR
Middle: the Mel spectrogram of the harmonic component. Bottom:
the Mel spectrogram of the percussive component. prior to importing the package. Because the Memory object
does not implement a cache eviction policy (as of version
0.8.4), it is recommended that users purge the cache after
>>> mfcc = librosa.feature.mfcc(y=y, sr=sr) processing each audio file to prevent the cache from filling
>>> tempo, beats = librosa.beat.beat_track(y=y, all available disk space13 . We note that this can potentially
... sr=sr) introduce race conditions in multi-processing environments
(i.e., parallel batch processing of a corpus), so care must be
However, because there are shared computations between the taken when scheduling cache purges.
different functions—mfcc and beat_track both compute
log-scaled Mel spectrograms, for example—this results in
Parameter tuning
redundant (and inefficient) computation. A more efficient
implementation of the above example would factor out the Some of librosa’s functions have parameters that require some
redundant features: degree of tuning to optimize performance. In particular, the
performance of the beat tracker and onset detection functions
>>> lms = librosa.logamplitude(
... librosa.feature.melspectrogram(y=y,
can vary substantially with small changes in certain key
... sr=sr)) parameters.
>>> mfcc = librosa.feature.mfcc(S=lms) After standardizing certain default parameters—sampling
>>> tempo, beats = librosa.beat.beat_track(S=lms,
... sr=sr)
rate, frame length, and hop length—across all functions, we
optimized the beat tracker settings using the parameter grid
Although it is more computationally efficient, the above ex- given in Table 1. To select the best-performing configura-
ample is less concise, and it requires more knowledge of tion, we evaluated the performance on a data set comprised
the implementations on behalf of the application programmer. of the Isophonics Beatles corpus14 and the SMC Dataset2
More generally, nearly all functions in librosa eventually de- [Holzapfel12] beat annotations. Each configuration was eval-
pend upon STFT calculation, but it is rare that the application uated using mir_eval [Raffel14], and the configuration was
programmer will need the STFT matrix as an end-result. chosen to maximize the Correct Metric Level (Total) metric
One approach to eliminate redundant computation is to [Davies14].
decompose the various functions into blocks which can be Similarly, the onset detection parameters (listed in Table 2)
arranged in a computation graph, as is done in Essentia were selected to optimize the F1-score on the Johannes Kepler
[Bogdanov13]. However, this approach necessarily constrains University onset database.15
the function interfaces, and may become unwieldy for com- We note that the "optimal" default parameter settings are
mon, simple applications. merely estimates, and depend upon the datasets over which
Instead, librosa takes a lazy approach to eliminating redun- they are selected. The parameter settings are therefore subject
dancy via output caching. Caching is implemented through an to change in the future as larger reference collections become
extension of the Memory class from the joblib package12 , available. The optimization framework has been factored out
which provides disk-backed memoization of function outputs. into a separate repository, which may in subsequent versions
The cache object (librosa.cache) operates as a decorator grow to include additional parameters.16
on all non-trivial computations. This way, a user can write
12. https://github.com/joblib/joblib
simple application code (i.e., the first example above) while 13. The cache can be purged by calling librosa.cache.clear().
transparently eliminating redundancies and achieving speed 14. http://isophonics.net/content/reference-annotations
comparable to the more advanced implementation (the second 15. https://github.com/CPJKU/onset_db
example). 16. https://github.com/bmcfee/librosa_parameters
LIBROSA: AUDIO AND MUSIC SIGNAL ANALYSIS IN PYTHON 7

Parameter Description Values [Slaney98] Slaney, Malcolm. Auditory toolbox. Interval Research Cor-
poration, Tech. Rep 10 (1998): 1998.
fmax Maximum frequency value 8000, 11025 [Young97] Young, Steve, Evermann, Gunnar, Gales, Mark, Hain,
(Hz) Thomas, Kershaw, Dan, Liu, Xunying (Andrew), Moore,
n_mels Number of Mel bands 32, 64, 128 Gareth, Odell, Julian, Ollason, Dave, Povey, Dan, Valtchev,
aggregate Spectral flux aggregation np.mean, Valtcho, and Woodland, Phil. The HTK book. Vol. 2. Cam-
function np.median bridge: Entropic Cambridge Research Laboratory, 1997.
[Harte06] Harte, C., Sandler, M., & Gasser, M. (2006). Detecting
delta Peak picking threshold 0.0--0.10 (0.07)
Harmonic Change in Musical Audio. In Proceedings of
the 1st ACM Workshop on Audio and Music Computing
Multimedia (pp. 21-26). Santa Barbara, CA, USA: ACM
TABLE 2: The parameter grid for onest detection optimization. The Press. doi:10.1145/1178723.1178727.
best configuration is indicated in bold. [Jiang02] Jiang, Dan-Ning, Lie Lu, Hong-Jiang Zhang, Jian-Hua
Tao, and Lian-Hong Cai. Music type classification by
spectral contrast feature. In ICME’02. vol. 1, pp. 113-116.
IEEE, 2002.
Conclusion [Klapuri07] Klapuri, Anssi, and Manuel Davy, eds. Signal processing
This document provides a brief summary of the design consid- methods for music transcription. Springer Science & Busi-
ness Media, 2007.
erations and functionality of librosa. More detailed examples, [Hunter07] Hunter, John D. Matplotlib: A 2D graphics environment.
notebooks, and documentation can be found in our devel- Computing in science and engineering 9, no. 3 (2007):
opment repository and project website. The project is under 90-95.
[Waskom14] Michael Waskom, Olga Botvinnik, Paul Hobson, John B.
active development, and our roadmap for future work includes Cole, Yaroslav Halchenko, Stephan Hoyer, Alistair Miles,
efficiency improvements and enhanced functionality of audio et al. Seaborn: v0.5.0 (November 2014). ZENODO, 2014.
coding and file system interactions. doi:10.5281/zenodo.12710.
[Boeck12] Böck, Sebastian, Florian Krebs, and Markus Schedl. Eval-
uating the Online Capabilities of Onset Detection Meth-
Citing librosa ods. In 11th International Society for Music Information
We request that when using librosa in academic work, authors Retrieval Conference (ISMIR 2012), pp. 49-54. 2012.
[Ellis07] Ellis, Daniel P.W. Beat tracking by dynamic programming.
cite the Zenodo reference [McFee15]. For references to the Journal of New Music Research 36, no. 1 (2007): 51-60.
design of the library, citation of the present document is [Serra12] Serra, Joan, Meinard Müller, Peter Grosche, and Josep
appropriate. Lluis Arcos. Unsupervised detection of music boundaries
by time series structure features. In Twenty-Sixth AAAI
Conference on Artificial Intelligence. 2012.
Acknowledgements
[Ward63] Ward Jr, Joe H. Hierarchical grouping to optimize an
BM acknowledges support from the Moore-Sloan Data Sci- objective function. Journal of the American statistical
ence Environment at NYU. Additional support was provided association 58, no. 301 (1963): 236-244.
[Lee99] Lee, Daniel D., and H. Sebastian Seung. Learning the parts
by NSF grant IIS-1117015. of objects by non-negative matrix factorization. Nature
401, no. 6755 (1999): 788-791.
R EFERENCES [Fitzgerald10] Fitzgerald, Derry. Harmonic/percussive separation using
median filtering. 13th International Conference on Digital
[Pedregosa11] Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Audio Effects (DAFX10), Graz, Austria, 2010.
Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu [Cannam10] Cannam, Chris, Christian Landone, and Mark Sandler.
Blondel et al. Scikit-learn: Machine learning in Python. Sonic visualiser: An open source application for viewing,
The Journal of Machine Learning Research 12 (2011): analysing, and annotating music audio files. In Proceed-
2825-2830. ings of the international conference on Multimedia, pp.
[Bergstra11] Bergstra, James, Frédéric Bastien, Olivier Breuleux, Pascal 1467-1468. ACM, 2010.
Lamblin, Razvan Pascanu, Olivier Delalleau, Guillaume [Holzapfel12] Holzapfel, Andre, Matthew E.P. Davies, José R. Zapata,
Desjardins et al. Theano: Deep learning on gpus with João Lobato Oliveira, and Fabien Gouyon. Selective sam-
python. In NIPS 2011, BigLearning Workshop, Granada, pling for beat tracking evaluation. Audio, Speech, and
Spain. 2011. Language Processing, IEEE Transactions on 20, no. 9
[Jones01] Jones, Eric, Travis Oliphant, and Pearu Peterson. SciPy: (2012): 2539-2548.
Open source scientific tools for Python. http://www.scipy. [Davies14] Davies, Matthew E.P., and Boeck, Sebastian. Evaluating
org/ (2001). the evaluation measures for beat tracking. In 15th Interna-
[VanDerWalt11] Van Der Walt, Stefan, S. Chris Colbert, and Gael Varo- tional Society for Music Information Retrieval Conference
quaux. The NumPy array: a structure for efficient numeri- (ISMIR 2014), 2014.
cal computation. Computing in Science & Engineering 13, [Raffel14] Raffel, Colin, Brian McFee, Eric J. Humphrey, Justin
no. 2 (2011): 22-30. Salamon, Oriol Nieto, Dawen Liang, and Daniel PW Ellis.
[Abe95] Abe, Toshihiko, Takao Kobayashi, and Satoshi Imai. Har- mir eval: A transparent implementation of common MIR
monics tracking and pitch extraction based on instanta- metrics. In 15th International Society for Music Infor-
neous frequency. International Conference on Acoustics, mation Retrieval Conference (ISMIR 2014), pp. 367-372.
Speech, and Signal Processing, ICASSP-95., Vol. 1. IEEE, 2014.
1995. [Bogdanov13] Bogdanov, Dmitry, Nicolas Wack, Emilia Gómez, Sankalp
[Schoerkhuber10] Schoerkhuber, Christian, and Anssi Klapuri. Constant-Q Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma,
transform toolbox for music processing. 7th Sound and Justin Salamon, José R. Zapata, and Xavier Serra. Essen-
Music Computing Conference, Barcelona, Spain. 2010. tia: An Audio Analysis Library for Music Information Re-
[Smith11] Smith, J.O. "Sinusoidal Peak Interpolation", in Spectral trieval. In 12th International Society for Music Information
Audio Signal Processing, https://ccrma.stanford.edu/~jos/ Retrieval Conference (ISMIR 2013), pp. 493-498. 2013.
sasp/Sinusoidal_Peak_Interpolation.html , online book, [McFee15] Brian McFee, Matt McVicar, Colin Raffel, Dawen Liang,
2011 edition, accessed 2015-06-15. Oriol Nieto, Josh Moore, Dan Ellis, et al. Librosa: v0.4.0.
[Stevens37] Stevens, Stanley Smith, John Volkmann, and Edwin B. Zenodo, 2015. doi:10.5281/zenodo.18369.
Newman. A scale for the measurement of the psychological
magnitude pitch. The Journal of the Acoustical Society of
America 8, no. 3 (1937): 185-190.

Boundary Representation Modelling Techniques - Ian Stroud PDF
No ratings yet
Boundary Representation Modelling Techniques - Ian Stroud PDF
789 pages
Astm A578
100% (3)
Astm A578
5 pages
A Matlab Script To Explore Linear Predictive Coding With Vocal
No ratings yet
A Matlab Script To Explore Linear Predictive Coding With Vocal
6 pages
NNG Reference Manual, Second Edition
From Everand
NNG Reference Manual, Second Edition
Garrett D'Amore
No ratings yet
A Digital Signal Processing Primer, With Applications To Digital Audio and Computer Music
No ratings yet
A Digital Signal Processing Primer, With Applications To Digital Audio and Computer Music
164 pages
Intro To Numerical Computing
No ratings yet
Intro To Numerical Computing
10 pages
(DFT) Julius O. Smith III-Mathematics of The Discrete Fourier Transform-W3K Publishing (2003)
No ratings yet
(DFT) Julius O. Smith III-Mathematics of The Discrete Fourier Transform-W3K Publishing (2003)
431 pages
Matlab Music Synthesis
0% (1)
Matlab Music Synthesis
11 pages
FIR and IIR Filter Design Using Matlab: Prof. C.M Kyung
No ratings yet
FIR and IIR Filter Design Using Matlab: Prof. C.M Kyung
12 pages
DAFX: Digital Audio Effects
From Everand
DAFX: Digital Audio Effects
Udo Zölzer
3.5/5 (2)
The Matrix in Theory
100% (8)
The Matrix in Theory
315 pages
BorgWarner Turboalimentadores
100% (2)
BorgWarner Turboalimentadores
108 pages
Python For Audio Processing
No ratings yet
Python For Audio Processing
8 pages
Alias-Free Digital Synthesis of Classic Analog Waveforms
No ratings yet
Alias-Free Digital Synthesis of Classic Analog Waveforms
12 pages
Toby Newman How To Write A VST Plugin
No ratings yet
Toby Newman How To Write A VST Plugin
77 pages
Waldorf Microwave Masterclass PDF
No ratings yet
Waldorf Microwave Masterclass PDF
10 pages
Matlab Implementation of Reverberation Algorithms
No ratings yet
Matlab Implementation of Reverberation Algorithms
6 pages
Audio Processing Using Matlab
No ratings yet
Audio Processing Using Matlab
12 pages
Fourier Transform (For Non-Periodic Signals)
No ratings yet
Fourier Transform (For Non-Periodic Signals)
27 pages
AudioMulch Help PDF
No ratings yet
AudioMulch Help PDF
437 pages
Modal Synthesis of Weapon Sounds PDF
No ratings yet
Modal Synthesis of Weapon Sounds PDF
7 pages
The Murmurator: A Flocking Simulation-Driven Multi-Channel Software Instrument For Collaborative Improvisation
100% (1)
The Murmurator: A Flocking Simulation-Driven Multi-Channel Software Instrument For Collaborative Improvisation
6 pages
Tutorial For Frequency Modulation Synthesis Barry Truax
No ratings yet
Tutorial For Frequency Modulation Synthesis Barry Truax
7 pages
Spectrogram Using Short-Time Fourier Transform - MATLAB Spectrogram PDF
No ratings yet
Spectrogram Using Short-Time Fourier Transform - MATLAB Spectrogram PDF
25 pages
Introduction To Wavelet
No ratings yet
Introduction To Wavelet
44 pages
A Not-So-Short Introduction To Version 0.x: - Draft
No ratings yet
A Not-So-Short Introduction To Version 0.x: - Draft
212 pages
Sound and Digital Audio
No ratings yet
Sound and Digital Audio
27 pages
Spectrogram
No ratings yet
Spectrogram
17 pages
Cottle Computer Music Using SC3
100% (1)
Cottle Computer Music Using SC3
403 pages
NEL Spectral Suite For Kyma - Class Descriptions (1st Draft)
No ratings yet
NEL Spectral Suite For Kyma - Class Descriptions (1st Draft)
25 pages
Lab 5
No ratings yet
Lab 5
14 pages
Stilson-Smith - Alias-Free Digital Synthesis of Classic Analog Waveforms (BLIT)
No ratings yet
Stilson-Smith - Alias-Free Digital Synthesis of Classic Analog Waveforms (BLIT)
12 pages
FFT Computation and Generation of Spectrogram
No ratings yet
FFT Computation and Generation of Spectrogram
32 pages
Iir Filter Python
No ratings yet
Iir Filter Python
11 pages
Mubu & Friends
No ratings yet
Mubu & Friends
4 pages
ICMC96 SuperCollider Paper
No ratings yet
ICMC96 SuperCollider Paper
3 pages
Sound Processing in Openmusic PDF
100% (1)
Sound Processing in Openmusic PDF
6 pages
State Machine
No ratings yet
State Machine
627 pages
NEL FFT Suite User Guide
No ratings yet
NEL FFT Suite User Guide
18 pages
Instrument Synthesizer Aman Singh Dilpreet: CMPT 468 Final Project by
No ratings yet
Instrument Synthesizer Aman Singh Dilpreet: CMPT 468 Final Project by
3 pages
926 Music DSP
100% (1)
926 Music DSP
265 pages
Newton's Fractals in Python
No ratings yet
Newton's Fractals in Python
19 pages
Little Lfo Manual
No ratings yet
Little Lfo Manual
24 pages
Chapter 11: Sound, The Auditory System, and Pitch Perception
No ratings yet
Chapter 11: Sound, The Auditory System, and Pitch Perception
49 pages
OM - Chroma
No ratings yet
OM - Chroma
18 pages
Devaney - A 1st Course in Chaotic Dynamical Systems (1992)
No ratings yet
Devaney - A 1st Course in Chaotic Dynamical Systems (1992)
161 pages
Fractal Music
100% (1)
Fractal Music
8 pages
FFT Window Functions - Limits On FFT Analysis
No ratings yet
FFT Window Functions - Limits On FFT Analysis
4 pages
Time Frequency Analysis of Musical Instruments
100% (1)
Time Frequency Analysis of Musical Instruments
20 pages
Combination Tones
100% (2)
Combination Tones
24 pages
Making Music With MATLAB
No ratings yet
Making Music With MATLAB
4 pages
Omchroma 1.0 PDF
100% (1)
Omchroma 1.0 PDF
82 pages
An 4VirtualAnalogFilters
No ratings yet
An 4VirtualAnalogFilters
24 pages
Faust Strings
No ratings yet
Faust Strings
35 pages
Instant Download (Ebook PDF) Digital Audio and Acoustics For The Creative Arts by Mark Ballora PDF All Chapter
100% (5)
Instant Download (Ebook PDF) Digital Audio and Acoustics For The Creative Arts by Mark Ballora PDF All Chapter
51 pages
Real Time Digital Audio Processing With Arduino
No ratings yet
Real Time Digital Audio Processing With Arduino
16 pages
DSP Lab Assignments
No ratings yet
DSP Lab Assignments
112 pages
Sound Processing in MATLAB
No ratings yet
Sound Processing in MATLAB
7 pages
Advanced Digital Signal Processing Lecture 1
0% (1)
Advanced Digital Signal Processing Lecture 1
42 pages
Digital Signal Processing for Audio Applications: Volume 2 - Code
From Everand
Digital Signal Processing for Audio Applications: Volume 2 - Code
Anton R Kamenov
5/5 (1)
Racket Unleashed: Building Powerful Programs with Functional and Language-Oriented Programming
From Everand
Racket Unleashed: Building Powerful Programs with Functional and Language-Oriented Programming
Robert Johnson
No ratings yet
Computational Geometry: Exploring Geometric Insights for Computer Vision
From Everand
Computational Geometry: Exploring Geometric Insights for Computer Vision
Fouad Sabry
No ratings yet
DirectX Demystified: A Comprehensive Guide to Game Development Essentials
From Everand
DirectX Demystified: A Comprehensive Guide to Game Development Essentials
Kameron Hussain
No ratings yet
OpenFlow Cookbook
From Everand
OpenFlow Cookbook
Kingston Smiler. S
5/5 (1)
Kafka Event System
75% (4)
Kafka Event System
166 pages
Right-Wing Chinese
No ratings yet
Right-Wing Chinese
28 pages
Rong Xiaoqing: The Fading American Dreams of China's Most Notorious Snakehead' - Foreign Policy
No ratings yet
Rong Xiaoqing: The Fading American Dreams of China's Most Notorious Snakehead' - Foreign Policy
9 pages
What It Means To Be Liberal' or Conservative' in China - Foreign Policy
No ratings yet
What It Means To Be Liberal' or Conservative' in China - Foreign Policy
5 pages
Reporting On Human Snakes in China - Pulitzer Center
No ratings yet
Reporting On Human Snakes in China - Pulitzer Center
11 pages
Western Liberalism Is Dying in China - Foreign Policy
No ratings yet
Western Liberalism Is Dying in China - Foreign Policy
7 pages
Bountouridis - Music Information Retrieval Using Biologically-Inspired Techniques
No ratings yet
Bountouridis - Music Information Retrieval Using Biologically-Inspired Techniques
167 pages
Nieuwenhuizen - Comparison of Algorithms For Audio Fingerprinting
No ratings yet
Nieuwenhuizen - Comparison of Algorithms For Audio Fingerprinting
2 pages
Nieuwenhuizen - The Study and Implementation of Shazam's Audio Fingerprinting Algorithm For Advertisement Identification
No ratings yet
Nieuwenhuizen - The Study and Implementation of Shazam's Audio Fingerprinting Algorithm For Advertisement Identification
4 pages
Choi - Deep Learning For Musical Info Retrieval
No ratings yet
Choi - Deep Learning For Musical Info Retrieval
16 pages
MNN - Basic Studio Production Handbook
No ratings yet
MNN - Basic Studio Production Handbook
55 pages
Optimal Binary Search Trees: Problem
No ratings yet
Optimal Binary Search Trees: Problem
16 pages
Electrical Storms in Tesla's Colorado Springs Notes
86% (7)
Electrical Storms in Tesla's Colorado Springs Notes
122 pages
Cnc-Machine and Its Components
No ratings yet
Cnc-Machine and Its Components
36 pages
Three Phase Electrical Circuit Analysis
No ratings yet
Three Phase Electrical Circuit Analysis
72 pages
4-1-Studies On Fly Ash-Based Geopolymer Concrete PDF
No ratings yet
4-1-Studies On Fly Ash-Based Geopolymer Concrete PDF
6 pages
BLV R: Series Type
No ratings yet
BLV R: Series Type
474 pages
ECE222 DP1 Binary Multiplier
No ratings yet
ECE222 DP1 Binary Multiplier
8 pages
Molines 2018
No ratings yet
Molines 2018
13 pages
Control Engineering and Automation
No ratings yet
Control Engineering and Automation
2 pages
Asas Algebra _ PDF
No ratings yet
Asas Algebra _ PDF
1 page
Mind Map Stoikiometri 1
0% (1)
Mind Map Stoikiometri 1
1 page
Muara Satui 12/22/2015: Heights Are in Meters Above Chart Datum Times Are in Local Time
No ratings yet
Muara Satui 12/22/2015: Heights Are in Meters Above Chart Datum Times Are in Local Time
45 pages
Vision Based Systems For UAV Applications: Aleksander Nawrat Zygmunt Kus
No ratings yet
Vision Based Systems For UAV Applications: Aleksander Nawrat Zygmunt Kus
348 pages
Test Management
No ratings yet
Test Management
57 pages
International Standard
100% (1)
International Standard
192 pages
Veloza-Fajardo Et Al., 2007
No ratings yet
Veloza-Fajardo Et Al., 2007
10 pages
Bedroom Remodel FINSIHED
No ratings yet
Bedroom Remodel FINSIHED
3 pages
Data Type Variable Name (Size)
No ratings yet
Data Type Variable Name (Size)
31 pages
Twincat 3: Josef Papenfort Twincat Product Management
No ratings yet
Twincat 3: Josef Papenfort Twincat Product Management
49 pages
Monetary 2012E.C
0% (1)
Monetary 2012E.C
434 pages
Ogden Foam Curve Fitting
No ratings yet
Ogden Foam Curve Fitting
7 pages
6 CE 414 UnitHG Examples
No ratings yet
6 CE 414 UnitHG Examples
7 pages
2005 August Prov Exam
No ratings yet
2005 August Prov Exam
30 pages
Recitation 10: December 1, 2020
No ratings yet
Recitation 10: December 1, 2020
6 pages
Rock_Paper_Scissors_Mini_Project_Report_With_Flowchart (1)
No ratings yet
Rock_Paper_Scissors_Mini_Project_Report_With_Flowchart (1)
5 pages
Beam Weld Splice Connection 1
No ratings yet
Beam Weld Splice Connection 1
3 pages
Bach - Well Tempered Clavier - Prelude I
No ratings yet
Bach - Well Tempered Clavier - Prelude I
2 pages