Journal of New Music Research, 2014

Vol. 43, No. 3, 251–254, http://dx.doi.org/10.1080/09298215.2014.930496

New Directions in Music and Machine Learning

Machine learning has permeated nearly every area of music informatics, driven by a profusion of recordings available in digital audio formats, steady improvements to the accessibility and quality of symbolic corpora, availability of powerful algorithms in standard machine learning toolboxes, and theoretical advances in machine learning and data mining. As a consequence, research on machine learning and music is an active and growing field reflected in international meetings such as the International Workshops on Machine Learning and Music (MML): 2008 (Helsinki, Finland), 2009 (Bled, Slovenia), 2010 (Florence, Italy), 2011 (Sierra Nevada, Spain), 2012 (Edinburgh, Scotland), and 2013 (Prague, Czech Republic). This Journal of New Music Research Special Issue on Music and Machine Learning presents some recent advanced research in the field.

The papers in this Special Issue range across the symbolic to audio spectrum, with several creatively employing both music data types. A diverse range of music informatics applications are approached using machine learning: audio transcription, score following, symbolic music classification, hit song prediction, expressive performance modelling, audio chord extraction, and melody harmonization. The Guest Editors welcome you to this Special Issue and hope that you find the papers interesting and relevant to your own research.

Randomized Matrix Decompositions and Exemplar Selection in Large Dictionaries for Polyphonic Piano Transcription

Automatic music transcription has been one of the fundamental problems in musical audio processing for many years. Although there are competent systems for the monophonic problem, polyphonic data remains a challenge for state-of-the-art systems. The two main categories of methods are those implementing aspects of the human auditory system and those based on learning spectral patterns of the sounds to be transcribed, finding complex superpositions of them in the audio data. The paper in this issue by Ari, Simsekli, Cemgil, and Akarun presents a study on how to efficiently use training data for a polyphonic piano transcription system based on machine learning.

Non-negative matrix factorization (NMF) has been shown to be successful for modelling non-negative data in many domains, including signal processing, bioinformatics, and natural language processing, among others. The main idea for automatic transcription is that the spectrogram of the input audio can be considered as a product of two matrices: one containing the inferred spectral templates (called a dictionary) and the other a matrix of weights from which the transcription can be obtained as a piano roll by thresholding.
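
The factorization at the heart of this approach can be illustrated with a short sketch. The following Python fragment is not the system described in the paper, only a minimal illustration of the idea under simple assumptions: the dictionary is held fixed, the activations are estimated with standard multiplicative updates for the Euclidean NMF objective, and the threshold value is arbitrary.

```python
import numpy as np

def nmf_activations(V, D, n_iter=200, eps=1e-9):
    """Estimate H >= 0 such that V is approximately D @ H, with D held fixed.

    V : (n_bins, n_frames) magnitude spectrogram of the input audio
    D : (n_bins, n_templates) dictionary of spectral templates
    """
    H = np.random.rand(D.shape[1], V.shape[1])
    for _ in range(n_iter):
        # multiplicative update for the Euclidean (Frobenius) NMF objective
        H *= (D.T @ V) / (D.T @ D @ H + eps)
    return H

def to_piano_roll(H, threshold=0.1):
    """Binarize the weight (activation) matrix to obtain a piano roll."""
    return H > threshold
```
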
It has been suggested that the training data can be utilized as the dictionary. This is semantically more meaningful, since the dictionary consists of real samples, but it leads to impractical algorithms in terms of computational and space requirements. The paper in this issue investigates fast and efficient methods for expressing very large dictionary matrices using substantially fewer dimensions. Algorithms such as singular value decomposition (SVD) using randomized techniques, and CUR factorization are utilized to reduce the size of the dictionary matrix within the NMF framework. Data selection algorithms such as k-medoids and affinity propagation are also used to reduce the amount of training data.
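
As a rough, generic illustration of these two families of reduction (not the algorithms evaluated in the paper), the sketch below selects exemplar columns from a large dictionary with affinity propagation and, alternatively, builds a low-rank approximation with randomized SVD; the function names and matrix shapes are assumptions.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.utils.extmath import randomized_svd

def reduce_by_exemplars(D):
    """Keep only representative columns of a dictionary D (n_bins x n_templates).

    The retained exemplars are real training spectra, so non-negativity is preserved.
    """
    ap = AffinityPropagation(random_state=0).fit(D.T)  # one sample per template
    return D[:, ap.cluster_centers_indices_]

def reduce_by_rank(D, k):
    """Approximate D by a rank-k factorization computed with randomized SVD.

    Unlike exemplar selection, the resulting factors are not sign-constrained.
    """
    U, s, Vt = randomized_svd(D, n_components=k)
    return (U * s) @ Vt
```
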
The experiments reported in this paper were performed using the MAPS (MIDI Aligned Piano Sounds) data set. The transcription accuracy and runtime of each proposed method were tested, along with the size of the dictionary matrix needed to attain a good transcription accuracy. The results show that the performance of the data-reduced methods is comparable to that of the full data solution, and though the latter far exceeds real-time transcription performance, the runtime of the method is reduced dramatically for one of the reduced configurations. Thorough experimentation was carried out to show that with only a fraction of the training set the accuracy of the system is maintained in a range that permits the system to be used for real-time transcription.

Estimating Onset and Offset Asynchronies in Polyphonic Score-Audio Alignment

Audio to score alignment has been a research issue for many years. It has numerous applications including expressive performance, automatic accompaniment generation, content-based processing, digital library synchronization, and performer modelling. An alignment method is said to be online when it is applied to a live performance and a reference score, and offline when there is a representation of the previously recorded sound (which can be an audio analysis or a symbolic representation, such as MIDI) and the score in any format.

The paper in this issue by Devaney presents an offline system for measuring note onset and offset asynchronies between musical lines that are marked as simultaneities in the score. Current state-of-the-art asynchrony measurement for onset and offset is not sufficient for particular applications like string quartets or choral voices, where timing is particularly difficult. The system has been developed and tested for the singing voice, estimating onset and offset locations for each voice in monaural recordings of polyphonic vocal performances.

The first pass of the proposed method employs dynamic time warping (DTW), a well-known string alignment algorithm, which was introduced in music for recognizing melodic fragments in an audio texture. The DTW creates a hypothetical harmonic template for each note in the score and calculates the peak spectral difference between the template and the audio in order to build a similarity matrix where DTW is used to find the best path, obtaining a series of note transition times. A novelty in this paper is the use of a Hidden Markov Model (HMM) in a second pass to refine the offset–onset transitions between groups of simultaneous notes in the alignment. This pass uses the processed output of a constant-Q filter bank as observations and finds the best sequence of note onsets and offsets for each part in the recording.
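
For readers unfamiliar with DTW, the sketch below shows the core dynamic-programming step in isolation. It is the textbook formulation, not Devaney's implementation, and it assumes the cost matrix has already been computed (for example, from spectral distances between the audio frames and the hypothetical note templates).

```python
import numpy as np

def dtw_path(cost):
    """Lowest-cost monotonic path through a (score events x audio frames) cost matrix.

    cost[i, j] is the dissimilarity between score event i and audio frame j.
    Returns the list of (i, j) index pairs on the optimal alignment path.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # backtrack from the end of both sequences to recover the path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```
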
The proposed system is evaluated with eleven a cappella recordings using two metrics: the number of onsets and offsets within a fixed interval of the ground truth, and the percentiles of the difference between the predictions and the ground truth, which provide information about the distribution of errors. The evaluation results using singing voice ensembles show that the system with the HMM improves the median accuracy of the DTW alignment both for onsets and offsets.

Predicting the Composer and Style of Jazz Chord Progressions

The task of supervised learning is the most deeply studied area of machine learning. Data labelled with classes is presented to a learning algorithm which must learn a function predicting with high accuracy the class label for new instances. In music informatics, supervised learning has been widely applied for the prediction of genre, composer, and geographic origin in both audio and symbolic data. The data representation therefore varies from low-level audio features through to abstract features derived from the symbolic music score. The paper in this issue by Hedges, Roy and Pachet falls into the latter category, using chord sequences as the basic representation. These are extracted from thousands of jazz standards in lead sheet format, and the sequences have been labelled along four different dimensions: composer, sub-genre, performance style, and meter.

In the first part of the paper, the authors consider classifying basic chord sequences, represented as linked features of chord root and chord type. For chord type the authors consider both the original labels appearing in the corpus, and chord types simplified to seven possibilities. Two broad classifier methods are applied: a Markov classifier based on n-gram models with classification based on the maximum a posteriori decision rule, and two further classification methods based on n-words (shared subsequences between the test piece and pieces in the corpus). In all four classification dimensions, the Markov classifier using original chord types outperforms the word-based methods, though the authors note that some bias may be introduced due to chord notational differences between classes.
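
A bare-bones version of such a Markov classifier can be written in a few lines; here n = 2 and add-one smoothing are assumptions made only to keep the sketch short, and the class and variable names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict

class BigramChordClassifier:
    """Per-class bigram models over chord symbols, with add-one smoothing.

    A new sequence is assigned to the class maximising
    log P(class) + log P(sequence | class), i.e. the MAP decision rule.
    Sequences are lists of chord symbol strings.
    """

    def fit(self, sequences, labels):
        self.priors = Counter(labels)
        self.bigrams = defaultdict(Counter)   # (label, previous chord) -> Counter(next chord)
        self.vocab = set()
        for seq, y in zip(sequences, labels):
            self.vocab.update(seq)
            for prev, nxt in zip(["<s>"] + seq, seq):
                self.bigrams[(y, prev)][nxt] += 1
        return self

    def _log_likelihood(self, seq, y):
        V = len(self.vocab) + 1
        ll = 0.0
        for prev, nxt in zip(["<s>"] + seq, seq):
            counts = self.bigrams[(y, prev)]
            ll += math.log((counts[nxt] + 1) / (sum(counts.values()) + V))
        return ll

    def predict(self, seq):
        total = sum(self.priors.values())
        return max(self.priors,
                   key=lambda y: math.log(self.priors[y] / total) + self._log_likelihood(seq, y))
```

The multiple viewpoint extension described next combines several such models, one per derived feature, by merging their predictions with a fusion function.
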
The authors then extend the chord representations using the method of multiple viewpoints, deriving multiple abstract features from chord labels and representing pieces by parallel streams of derived sequences using the multiple viewpoint method for music prediction and classification. In this extension the authors also consider a duration component in their models. Each viewpoint is associated with an n-gram model, with the predictions of each model interpolated using a fusion function. A full system of ten viewpoints, including duration, outperforms all other classifiers on the tasks of composer, sub-genre, and performance style. A slightly smaller multiple viewpoint model, again including duration, achieves nearly perfect classification accuracy on the meter classification task.

As a practical application of the classification results, the paper finally considers parsing a chord sequence for segments that are each classified using a multiple viewpoint model. Such a parse of a chord sequence may indicate sharing and combining of stylistic material between different performers.

Dance Hit Song Prediction

Hit song science is a subfield of music information retrieval, aiming to gain insight into what actually makes a song successful before it is released on the market. It is of particular interest to record companies in the music industry who invest large amounts of money in new talent. In the past, several approaches to predicting whether a song would reach the top positions in the charts or not have been proposed. Some of these approaches have obtained encouraging results by extracting standard and novel audio features from the songs and applying machine learning techniques to the resulting data.

The paper in this issue by Herremans, Martens and Sörensen focuses on dance music and proposes a machine learning approach to predicting if a dance song is going to be a top 10 hit versus a lower positioned dance song. With this aim, two dance hit archives available online were used to create a database: the singles dance archive from the Official Charts Company, and the singles dance archive from Billboard. The Official Charts Company is operated by both the British Phonographic Industry and the Entertainment Retailers Association, while Billboard is one of the oldest magazines in the world devoted to music and the music industry.

The Echo Nest was used to extract features from the dance songs. The authors divide the extracted features into three categories: meta-information (e.g. artist location, artist familiarity, artist hotness), basic features (e.g. duration, tempo, time signature, mode, key, loudness), and temporal features (e.g. timbre features). Different machine learning techniques were applied to the dataset. The algorithms used were decision trees (C4.5), rule induction (Ripper), naïve Bayes, logistic regression, and support vector machines. The results obtained show that the popularity of dance songs can be predicted by analysing music signals.
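
A minimal sketch of this kind of experiment is shown below; the feature matrix is a random placeholder standing in for the Echo Nest features, and logistic regression is only one of the classifiers named above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 15))        # placeholder: one row of song features per track
y = rng.integers(0, 2, 200)      # placeholder: 1 = top-10 dance hit, 0 = lower position

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(f"mean AUC over 10 folds: {scores.mean():.2f}")
```
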
The Sense of Ensemble: A Machine Learning Approach to Expressive Performance Modeling in String Quartets

Expressive music performance research investigates the manipulation of sound properties, e.g. timing and dynamics, in order to understand and recreate expression in performances. There have been several approaches to study expression in music performances using computational tools. Most of these studies have investigated solo expressive performances, and there has been little attention to the study of expression in ensemble performances.

The paper in this issue by Marchini, Ramirez, Papiotis and Maestre proposes a novel approach for building computational models of ensemble expressive performance, gaining insights into musician collaboration. The authors study string quartet performances with different expressive intentions: mechanical, normal, and exaggerated; and in different scenarios: solo and ensemble. The authors record both audio and motor information in string quartet performances. Different performance parameter descriptors are extracted from each note in the recordings: sound level, note lengthening, vibrato extent, and bow velocity. At the same time melodic, rhythmic, and harmonic descriptors are extracted from the music scores of the recorded pieces. Features are grouped into horizontal features (those representing information about a unique instrument or melodic line), and vertical features (those representing information about several instruments or concurrent melodic lines). Using machine learning techniques, models are created for predicting sound level, bow velocity, vibrato extent, and note lengthening. A number of machine learning techniques are applied to generate the models: model trees, support vector machines, and instance-based methods.

The results show that the features proposed contain sufficient information to train accurate predictive models for ensemble performance. Furthermore, by applying feature selection, it is possible to see clear tendencies of the models to prefer horizontal features (individual voice context) in the solo case, and to prefer vertical features (inter-voice relation context) in the ensemble case.

Combining Musicological Knowledge About Chords and Keys in a Simultaneous Chord and Local Key Estimation System

The paper in this issue by Pauwels and Martens deals with the harmonic analysis of audio recordings. This is a vast area in which many studies have tried to use different aspects of music theory, statistical and structural models, and signal processing algorithms for the extraction of harmonic information from the audio signal. Applications of audio chord extraction include tonal music analysis, cover identification, comparison metrics, automatic accompaniment, automatic playlist generation, and performance rendering.

The work focuses on the determination of both the key and the chord sequence of a music audio signal. The chord estimation problem is stated here as the process of converting an audio recording into a stream of chord symbols with associated times, and the key estimation as the process of extracting tonality symbols from the audio. For the latter task two main approaches have been proposed in the literature: global and local. Global approaches are less complex but inevitably lead to a loss of information when the key changes during the composition. The local approach copes with this problem but is harder because the decisions have to be made with fewer observations. The present work falls into the second category: both local key and chords are extracted simultaneously based on sequences of chroma vectors extracted from the spectrogram of the audio signal.

The authors introduce a statistical framework for both keys and chords that contains a musicological model with prior knowledge about harmony, and key and chord durations. The prior knowledge model can be decomposed into different components, namely duration and change models for both keys and chords. This permits the authors to study the role and relevance of exploiting background knowledge in audio chord detection. The probabilities are learned from a training set of manually annotated music pieces, and are embedded in a Hidden Markov Model (HMM) in which each state represents a key/chord combination. The HMM finds the most likely alignment of the extracted smoothed chroma vectors with its states using the Viterbi algorithm.
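
Decoding in such a model reduces to the standard Viterbi recursion over key/chord states. The sketch below is the generic algorithm in log space rather than the authors' system; the emission, transition and prior terms are assumed to be given (in the paper they come from the acoustic model and the musicological prior).

```python
import numpy as np

def viterbi(log_emission, log_transition, log_prior):
    """Most likely state sequence for an HMM whose states are key/chord combinations.

    log_emission   : (n_frames, n_states) log p(chroma_t | state)
    log_transition : (n_states, n_states) log p(state_t | state_{t-1})
    log_prior      : (n_states,) log p(state_0)
    """
    n_frames, n_states = log_emission.shape
    delta = np.empty((n_frames, n_states))
    backptr = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_prior + log_emission[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_transition   # rows: from-state, cols: to-state
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]
    # backtrack from the best final state through the stored pointers
    path = np.empty(n_frames, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path
```
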
The authors consider all keys, two modes (major and minor) and four types of triads (major, minor, augmented and diminished) for the chords. The performance of the system was evaluated in terms of the percentage of time the extracted key or chord equals the annotated key or chord. It was tested on two different corpora of western popular music, including one of those used for the MIREX 2009 chord detection contest.

The factorization capability of the presented model allows for independent testing of the different knowledge sources involved in the detection. The authors have tested the impact of the chord duration model, the key duration model, the chord change model, and the key change model, and also the influence of the acoustic model. The prior duration knowledge led to significant improvement for both key and chord estimation, but the chord change models only achieve marginal improvements in the chord performance, although they are still required to achieve a good key estimation. The authors also stress the importance of a good acoustic model for the overall performance.

Four-part Harmonization Using Bayesian Networks: Pros and Cons of Introducing Chord Nodes

The understanding of the music harmonization process has been tackled so far from different perspectives and objectives: from the harmonization of a melodic line to the analysis of existing harmonizations applied to music comparison, genre classification or composition applications. In general, most proposed systems use graph models where nodes represent harmonic information such as tonal functions or chords.

The paper in this issue by Suzuki and Kitahara explores a different approach using a Bayesian network where the nodes, rather than representing prebuilt chords or tonal functions, represent individual notes and the network edges describe the interrelation between successive notes in a voice and simultaneous notes in different voices. To study the suitability of this approach, a four-voice harmonization of a melodic line is produced and the results compared to those of two models using prebuilt chord nodes, the first with basic triad information, and the second one with inversions and extended notes represented in the nodes. In order to compare the results, some objective and subjective metrics are defined that evaluate how close the automatic harmonization is to traditional rules.
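
To make the idea of note-level nodes concrete, here is a deliberately small, hypothetical sketch in which one added voice depends on the melody note sounding at the same time (a vertical edge) and on its own previous note (a horizontal edge). Pitches are assumed to be MIDI numbers, decoding is greedy, and nothing here reflects the authors' actual network structure or inference procedure.

```python
from collections import Counter, defaultdict

class NoteLevelHarmonizer:
    """Toy note-level model: alto_t depends on soprano_t and on alto_{t-1}."""

    def fit(self, soprano_lines, alto_lines):
        # conditional counts: (soprano_t, previous alto note) -> Counter of alto_t
        self.cpt = defaultdict(Counter)
        for sop, alt in zip(soprano_lines, alto_lines):
            prev = None
            for s, a in zip(sop, alt):
                self.cpt[(s, prev)][a] += 1
                prev = a
        return self

    def harmonize(self, soprano):
        alto, prev = [], None
        for s in soprano:
            counts = self.cpt.get((s, prev)) or self.cpt.get((s, None))
            # pick the most frequent continuation, or fall back to a sixth below the melody
            note = counts.most_common(1)[0][0] if counts else s - 9
            alto.append(note)
            prev = note
        return alto
```

A real model of this kind would condition on all the inner voices and use proper probabilistic inference rather than a single greedy pass.
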
The results confirm the main hypothesis of the paper: the chord model with basic triad information is unable to generate smooth voicings; the chord model with full information cannot be learned due to the small amount of training data compared to the number of nodes; and finally, the chord labelling and voicing using the Bayesian network without prebuilt chord nodes performs better than the other two approaches. The authors also study the configuration of the Bayesian network, in terms of the edges to be included. The authors conclude that too many arcs lead to an unlearnable network, and too few arcs can produce an excess of dissonant harmonic intervals. Finally, the model is compared to three existing computational approaches for four-part harmonization and the authors propose several extensions and open issues for four-part harmonization systems based on Bayesian networks.

Acknowledgements

The Guest Editors point out that this Special Issue would not have been possible without the help and dedication of several experts in the area of music and machine learning, who provided detailed anonymous reviews of manuscripts submitted to this Special Issue. Further special thanks are given to Alan Marsden, who enthusiastically supported this Special Issue since its inception.

Darrell Conklin
University of the Basque Country UPV/EHU
IKERBASQUE, Basque Foundation for Science
E-mail: darrell.conklin@ehu.es

Rafael Ramirez
Universitat Pompeu Fabra
E-mail: rafael.ramirez@upf.edu

José M. Iñesta
Universidad de Alicante
E-mail: inesta@dlsi.ua.es
