Composing Music With Neural Networks and Probabilistic Finite-State Machines
1 Introduction
Artificial music composition systems have been created in the past using various
paradigms. Approaches using Recurrent Neural Network [7] and Long Short-Term
Memory (LSTM) [3] architectures to learn from a dataset of music and to create
new instances based on the learned information have been taken, as well as
approaches with genetic algorithms. The latter focus either on semi-objective
fitness functions [9], i.e. a combined computational and human evaluation of the
songs, or on fully objective fitness functions [8] to generate new songs. Associative
Memories have also been tried [5], using a context-sensitive grammar.
Classical algorithm-based automatic music composition systems, which aim
at following predefined rules to construct music, stand or fall by the human imple-
mentation of the underlying algorithms, which leaves the cumbersome task of
deriving sets of musical creation rules entirely to the human designer. Another
approach is to modify existing melodies by applying a specific noise function to
create new melodies [4], thus possibly reducing the dominance of the human
factor. Heuristic search algorithms like Genetic Algorithms, on the other hand,
suffer from the fitness bottleneck [2][6] and from a gigantic search space that is,
in musical terms, mostly unusable.
Our machine learning systems extract important features and key elements from
a dataset of music (created by humans) and are able to produce new song ma-
terial which inherits these ideas. They compose music based on the information
gained by inductive learning. In both of the following approaches, we use
machine learning techniques for the feature selection of musical information
from the music database.
3 Music Representation
The 19 songs in our database were written in the abc language [10]. Conversion
between the abc language format and the MIDI format can be done using
the abcMIDI package. MIDI (Musical Instrument Digital Interface) defines a
communications protocol for instruments and computers to transfer data.
Fig. 2. Beginning of “Claret and Oysters” in the abc language (left) and our integer
representation (right)
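The exact integer encoding of Fig. 2 is not reproduced in this excerpt; as a purely illustrative sketch, one possible abc-pitch-to-integer mapping (assuming MIDI note numbers as the integers, which may differ from the representation actually used) is:

```python
# Illustrative sketch only: map bare abc pitch tokens to MIDI note numbers.
# Accidentals, note lengths, bars, and the paper's actual encoding are omitted.

ABC_BASE = {"C": 60, "D": 62, "E": 64, "F": 65, "G": 67, "A": 69, "B": 71}

def abc_pitch_to_int(token: str) -> int:
    """Map an abc pitch token such as 'C', 'c', "c'" or 'C,' to a MIDI number."""
    letter = token[0]
    value = ABC_BASE[letter.upper()]
    if letter.islower():              # lowercase letters are one octave higher in abc
        value += 12
    value += 12 * token.count("'")    # each apostrophe raises one octave
    value -= 12 * token.count(",")    # each comma lowers one octave
    return value

if __name__ == "__main__":
    melody = ["G", "A", "B", "c", "d", "c", "B", "A"]
    print([abc_pitch_to_int(t) for t in melody])
    # e.g. [67, 69, 71, 72, 74, 72, 71, 69]
```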
5 Experimental Results
The reference song “Claret and Oysters” and one song composed by the FSM are
visualized in Figure 4 in our integer representation, with the integer numbers
on the x-axis and the time (sequential note indices) on the y-axis. As can be seen,
there are repeating patterns in the graph; the “landscape” of the song shares
similar “hills”, for example at notes 50-57 and 80-87.
4 http://www.wardsystems.com/products.asp?p=neuroshell2
Fig. 3. TDNN illustration using the Ward Net: the process of continuously generating four
notes based on the last ten notes, with Ti indicating the i-th note of the song
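The generation process of Fig. 3 amounts to a sliding-window loop: the last ten notes are fed to the network, four new notes are appended, and the window moves forward. The sketch below illustrates only this loop; the predict function is a placeholder standing in for the trained Ward Net (a proprietary NeuroShell 2 architecture), so everything except the window and step sizes is assumed:

```python
# Sketch of the sliding-window generation loop suggested by Fig. 3.
# `predict` stands in for the trained TDNN (Ward Net); only the window
# size (10 notes) and step size (4 notes) are taken from the figure.

from typing import Callable, List

WINDOW = 10   # notes fed to the network (Ti ... Ti+9)
STEP = 4      # notes generated per prediction

def generate(seed: List[int],
             predict: Callable[[List[int]], List[int]],
             length: int) -> List[int]:
    """Extend a seed melody (integer representation) until `length` notes exist."""
    song = list(seed)
    while len(song) < length:
        window = song[-WINDOW:]               # last ten notes
        song.extend(predict(window)[:STEP])   # append the four predicted notes
    return song[:length]

if __name__ == "__main__":
    # Toy placeholder predictor: simply echoes the last four notes of the window.
    echo = lambda w: w[-STEP:]
    seed = [67, 69, 71, 72, 74, 72, 71, 69, 67, 65]
    print(generate(seed, echo, 30))
```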
Several shorter and longer learned patterns can be recognized by ear and eye,
not only from “Claret and Oysters” in Figure 4, but from other songs of the
entire database as well. It is noteworthy that the overall quality of the song
does not change over time (in contrast to the NN songs described in the next
section). The beginning of this song is shown as a musical score in Figure 5.
Figure 6 shows the result from a TDNN-Ward Net that was trained over 72,000
epochs with an average error of 0.0006714 on the training set (the song “Claret
and Oysters”).
In the second half of the song, generated after a given starting seed, the NN
oscillates between extreme notes more often than at the beginning; it can no
longer predict the song. Including multiple songs in the knowledge database
did not significantly change the result. This means that even with a large number
of epochs the network architecture was not able to memorize the whole song.
Fig. 5. Sample song generated by the FSM, shown as a musical score
Fig. 6. Sample song generated by the NN, trained on one song (internal representation)
Our two advantages are that (a) we learn key elements from the analyzed songs,
and features such as the distinct scale, arpeggios, and the musical style expressed
through frequent musical patterns and subpatterns are identified and used to create
new songs without explicit human modeling, and (b) a bias is induced to produce
short coherent sequences at once.
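As an illustration of point (a), the sketch below learns a simple first-order probabilistic finite-state machine over the integer note representation and samples new material from it; the FSM actually used (its state definition and its handling of longer patterns and subpatterns) is not detailed in this excerpt, so the code is only a simplified stand-in:

```python
# Simplified stand-in for a probabilistic FSM over the integer note
# representation: states are single notes, transition probabilities are
# estimated from counts over the training songs, and new material is
# sampled by a random walk that follows those frequencies.

import random
from collections import defaultdict

def learn_fsm(songs):
    """Count note-to-note transitions over all training songs."""
    counts = defaultdict(lambda: defaultdict(int))
    for song in songs:
        for a, b in zip(song, song[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, start, length, rng=random):
    """Random walk through the FSM, weighted by the learned transition counts."""
    song = [start]
    while len(song) < length:
        nxt = counts.get(song[-1])
        if not nxt:                      # dead end: restart from the seed note
            song.append(start)
            continue
        notes, weights = zip(*nxt.items())
        song.append(rng.choices(notes, weights=weights)[0])
    return song

if __name__ == "__main__":
    training = [[67, 69, 71, 72, 74, 72, 71, 69, 67]]
    fsm = learn_fsm(training)
    print(sample(fsm, 67, 16))
```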
Future research needs to recognize and use the different coherent structures of a
song (such as refrain, chorus, solo, etc.). A newly composed song would then be struc-
tured by the learned musical structures and would inherit features which are present
only in specific parts of songs, such as the refrain.
References
1. A. Baratè, G. Haus, and L. A. Ludovico. Music analysis and modeling through Petri nets. In CMMR, volume 3902 of Lecture Notes in Computer Science, pages 201–218. Springer, 2005.
2. J. A. Biles. GenJam: A genetic algorithm for generating jazz solos, 1994.
3. D. Eck and J. Schmidhuber. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks, 2002.
4. Y.-W. Jeon, I.-K. Lee, and J.-C. Yoon. Generating and modifying melody using
editable noise function. In CMMR, volume 3902 of Lecture Notes in Computer
Science, pages 164–168. Springer, 2005.
5. T. Kohonen. A self-learning musical grammar, or “associative memory of the second kind”. In IJCNN, volume I, pages I-1–I-6, Washington, DC, 1989. IEEE.
6. E. R. Miranda and J. A. Biles, editors. Evolutionary Computer Music. Springer,
March 2007.
7. M. Mozer. Neural network music composition by prediction: Exploring the benefits
of psychoacoustic constraints and multi-scale processing, 1994.
8. J. Schoenberger. Genetic algorithms for musical composition with coherency through genotype, 2002.
9. M. Unehara and T. Onisawa. Construction of music composition system with interactive genetic algorithm, 2003.
10. C. Walshaw. The abc notation system. http://abcnotation.org.uk, 1993.