Functional Modelling of Musical Harmony


An experience report

José Pedro Magalhães, W. Bas de Haas


Department of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
{jpm,bash}@cs.uu.nl

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICFP’11, September 19–21, 2011, Tokyo, Japan.
Copyright © 2011 ACM 978-1-4503-0865-6/11/09... $10.00

Abstract

Music theory has been essential in composing and performing music for centuries. Within Western tonal music, from the early Baroque on to modern-day jazz and pop music, the function of chords within a chord sequence can be explained by harmony theory. Although Western tonal harmony theory is a thoroughly studied area, formalising this theory is a hard problem.

We present a formalisation of the rules of tonal harmony as a Haskell (generalized) algebraic datatype. Given a sequence of chord labels, the harmonic function of a chord in its tonal context is automatically derived. For this, we use several advanced functional programming techniques, such as type-level computations, datatype-generic programming, and error-correcting parsers. As a detailed example, we show how our model can be used to improve content-based retrieval of jazz songs.

We explain why Haskell is the perfect match for these tasks, and compare our implementation to an earlier solution in Java. We also point out shortcomings of the language and libraries that limit our work, and discuss future developments which may ameliorate our solution.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Functional Programming; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing - Modelling

General Terms Experimentation, Languages

1. Introduction

    [Tonality is] the art of combining tones in such successions and such harmonies or successions of harmonies, that the relation of all events to a fundamental tone is made possible.
    Arnold Schoenberg, in Problems of Harmony

The deep connection between mathematics and music has been known at least since the times of Plato (Mountford 1923). In the realm of tonal harmony in particular, when studying the relationships between sequential chords, we notice order and regularity; some combinations sound pleasing while others sound peculiar. These observations led music theorists to develop ways to analyse the function of a chord in its tonal context (e.g. Riemann 1893). Among the first to formalize these theories were Lerdahl and Jackendoff (1983), who gave an encompassing account of how experienced listeners hierarchically organise tonal music. More formally, Steedman (1984) proposes a generative grammar for twelve-bar blues chord progressions, and Rohrmeier (2007, 2011) describes the core of tonal harmony as a formal grammar. This grammar was implemented by De Haas et al. (2009) and used for modelling harmonic similarity. Models of tonal harmony are useful because they explain the role or function that a musical chord has within a piece of music. For instance, the same musical chord often has different functions depending on the context in which it occurs.

We present HarmTrace (Harmony Analysis and Retrieval of Music with Type-level Representations of Abstract Chords Entities), an adaptation and extension of the Java approach of De Haas et al. to a functional setting. Exploring the connection between a context-free grammar and an algebraic datatype, we represent different musical harmonies as values of a datatype. Unlike in previous work, we encode all the musical restrictions in the type itself; strong static typing guarantees that well-typed values represent harmonic sequences. Furthermore, a strongly-typed model gives us higher expressiveness and results in simpler code: through techniques such as datatype-generic programming and type-level computation, most of the code is automatically derived from the types. In a way, the types are the code: most of the code (that would otherwise have to be written manually) follows directly from the types.

A formal model of musical harmony can be used to improve many other typical music processing tasks. Content-based Music Information Retrieval (MIR, Downie 2003), for instance, is a rapidly expanding area within multimedia research which aims at keeping large repositories of digital music maintainable and accessible. Within MIR the notion of similarity is crucial: songs that are similar in one or more features to a given relevant song are likely to be relevant as well. The majority of approaches to notation-based music retrieval focus on melodic similarity. Using our harmony model, we present a method that allows the retrieval of music based on harmonic similarity. We compare harmonic analyses in a tree form (which explains the functions of chords within a sequence) using a generic edit-distance function, and show that this comparison predicts harmonic similarity better than an edit-distance between the original textual sequences of chords. Chord labels which do not “fit” in our model are automatically corrected at the parsing stage, allowing comparison to proceed.

Contribution In this paper we present a new functional model of Western tonal harmony and explain why Haskell is particularly well-suited for modelling harmony. We show how our model can be used to perform automatic harmony analysis of sequences of textual chord labels, and that such an analysis improves the task of retrieving harmonically similar pieces. Along the way, we explain how several features of Haskell, such as type-level computations,
error-correcting parsers, and generic programming, are essential to our approach. All the code of HarmTrace is available on Hackage (package HarmTrace-0.4).

The rest of this report is structured as follows: we first introduce basic concepts of harmony in Section 2, and then explain how we encode them in Haskell in Section 3. In Section 4 we show applications of our model, which we evaluate in Section 5. We conclude in Section 6, discussing the limitations of our system and pointing out directions for future development.

2. Harmony

The French-American composer Edgard Varèse once defined music as “organised sound”. In this section we present a very brief introduction to how tonal harmony organises sound in Western music; for a thorough approach, we refer the reader to Piston and DeVoto (1991).

We start with the most basic element in music: a tone. A tone is a sound with a fixed frequency or pitch which can be described in musical notation with a note. All notes have a name, e.g. C, D, E, etc., and represent tones of a specific pitch (Figure 1 on the left). The distance between two notes is called an interval and is measured in semitones, the semitone being the smallest interval in Western tonal music. A semitone is also the distance between two adjacent keys on a piano, whether a black key and a neighbouring white key or two white keys with no black key between them (Figure 1 on the right). Harmony arises when two or more tones sound at the same time. Simultaneously sounding notes form chords, which can in turn be used to form chord sequences. A chord is a group of tones sounding at the same time, and separated by intervals of roughly the same size. The two most important factors that characterize a chord are its structure, which is determined by the intervals between the notes of the chord, and the chord root. The root is the note on which the chord is built. Chords can be labelled by describing their root and the relative interval structure of the tones in the chord.

Figure 1. On the left: the C major scale (C D E F G A B C) with the names of the notes at the bottom and three example intervals (2nd, 3rd, 4th) above the notes. On the right: a schematic piano keyboard containing note names and highlighting the semitone interval.

Figure 2 displays a frequently occurring chord sequence, in the C major key. The key of a piece of music is the tonal center of the piece. It specifies the tonic, which is perceived as the most stable tone in that piece. Often pieces begin and end with chords rooted on the tonic of the key. Moreover, the key specifies the scale, which is the set of pitches that occur most frequently in the piece and that sound reasonably well together. For instance, the key of C major only contains the white keys of a piano keyboard.

Figure 2. An often occurring chord sequence. The chord labels (C F D7 G7 C) are printed below the score, and the scale degrees (I IV V/V V I) and functional analysis (Ton SDom Dom Ton) above the score.

The same chord sounds different in pieces of different keys. On the other hand, a chord sequence that is transposed to a different key, by moving all notes up or down by a fixed interval, sounds very similar to the original sequence. Scale degrees are used to abstract from key and absolute pitch. A scale degree represents the relative interval between a tone and the tonic of the piece. Scale degrees are typically denoted with Roman numerals, as seen in Figure 2.

In music, building up and releasing tension is crucial. In the development of harmonic tension, three functional roles can generally be discerned: tonic, dominant, and subdominant. The dominant induces maximal tension, the subdominant prepares a dominant by building up tension, and the tonic releases tension. Hence, every scale degree can be categorized as having a dominant, subdominant, or tonic role. Similarly to the preparation of a tonic by a dominant, or a dominant by a subdominant, any scale degree can be recursively preceded by the scale degree seven semitones (or a fifth) up, e.g. the D7 preceding G7 in Figure 2. This allows the creation of chains of scale degrees, so-called secondary dominants.

We have presented an extremely condensed view of harmony theory. Nevertheless, it is clear that within a sequence not every chord is equally important. Some chords can easily be removed leaving the global structure of the piece intact, whereas other chords cannot be removed without altering the way the piece is perceived. For instance, the D7 in Figure 2 can be removed leaving the general harmony structure intact, while removing the G7 or the C at the end would change the harmony structure. This suggests that the rules of tonal harmony can be formalized hierarchically, analogously to linguistics. This is what we do in the next section, building on ideas of Rohrmeier (2007, 2011) and the previous formalisation as a context-free grammar by De Haas et al. (2009). However, it is important to stress that formal modelling of tonal harmony is a difficult task, since the rules of harmony are highly ambiguous and often formulated imprecisely.

3. Encoding harmony as a datatype

We now discuss how we formalize general harmony theory as a datatype. Throughout the rest of the paper we elide most of the musical details and concentrate on a small but representative subset of the rules. The general idea is that we convert an input sequence of chord labels, such as "C:maj F:maj D:7 G:7 C:maj" (also shown in Figure 2), into a value of a Haskell datatype which captures the function of chords within the sequence. Since we want to abstract from specific keys, we first translate every chord label into a scale degree. For this to be possible, we assume we know the key of every input song. For instance, our previous example is in C major, so it translates to "I:maj IV:maj II:7 V:7 I:maj".

3.1 Naive approach

Using standard algebraic datatypes, we can encode alternatives as constructors, sequences as arguments to a constructor, and repetitions as lists. A first (and very simplified) approach could be the following:

    data Piece  = Piece [Phrase]
    data Phrase = PT Ton | PD Dom
    data Ton    = TIMaj Degree
    data Dom    = DVMaj Degree | DSD,D SDom Dom
    data SDom   = SIVMaj Degree

We see a piece as a list of phrases. A phrase is either a tonic or a dominant. A tonic is simply the first scale degree, while a dominant might branch into a subdominant and a dominant, or simply be the fifth degree.
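The translation from chord labels to scale degrees described at the start of Section 3 can be illustrated with a short sketch. It is not taken from HarmTrace: the function names (`semitone`, `degreeName`, `toDegree`, `toDegrees`) are hypothetical, the key is assumed to be a major key given by its tonic note name, and roots outside the major scale of the key are simply rejected, which is a simplification made for this example.

```haskell
-- Semitones above C for a note name such as "C", "F#" or "Bb".
semitone :: String -> Maybe Int
semitone []         = Nothing
semitone (n : accs) = do
  base  <- lookup n [('C',0),('D',2),('E',4),('F',5),('G',7),('A',9),('B',11)]
  shift <- fmap sum (mapM accidental accs)
  return ((base + shift) `mod` 12)
  where
    accidental '#' = Just 1
    accidental 'b' = Just (-1)
    accidental _   = Nothing

-- Roman numeral for an interval (in semitones) above the tonic.
-- Assumption of this sketch: only the seven degrees of the major scale exist.
degreeName :: Int -> Maybe String
degreeName st =
  lookup st [(0,"I"),(2,"II"),(4,"III"),(5,"IV"),(7,"V"),(9,"VI"),(11,"VII")]

-- Translate one label of the form "root:class" relative to a key.
toDegree :: String -> String -> Maybe String
toDegree key label = do
  let (root, cls) = break (== ':') label
  k <- semitone key
  r <- semitone root
  d <- degreeName ((r - k) `mod` 12)
  return (d ++ cls)

-- Translate a whole space-separated chord sequence.
toDegrees :: String -> String -> Maybe String
toDegrees key = fmap unwords . mapM (toDegree key) . words
```

With these definitions, `toDegrees "C" "C:maj F:maj D:7 G:7 C:maj"` evaluates to `Just "I:maj IV:maj II:7 V:7 I:maj"`, and the same sequence transposed to G major, `toDegrees "G" "G:maj C:maj A:7 D:7 G:maj"`, yields the same scale degrees, which is the abstraction from key that the text describes.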
The leaves of our tree are the input labelled scale degrees, which consist of a root degree (an integer between 1 and 7) together with a chord class:

    data Degree = Deg Int Class
    data Class  = Maj | Min | Dom7 | ...

The chord class is used to group chords into a small number of categories based on their internal interval structure. All major chords, which tend to be perceived as sounding joyful, are grouped under Maj. Similarly, Min groups all minor chords, which are generally perceived as sounding darker than major chords, and Dom7 groups chords that have an interval structure that induces tension.

We can now encode harmonic sequences as values of type Piece:

    goodPiece, badPiece :: Piece
    goodPiece = Piece [PT (TIMaj (Deg 1 Maj))]
    badPiece  = Piece [PT (TIMaj (Deg 2 Maj))]

The problem with this representation is evident: nonsensical sequences such as badPiece are allowed by the type-checker. We know that a tonic can never be the second scale degree: it must be the first degree. However, since we do not constrain the Degree argument in TIMaj, we have to make sure at the value-level that we only accept Deg 1 Maj as an argument. To guarantee that the model never deals with invalid sequences we would need a separate proof of code correctness.

3.2 More type information

Fortunately, we can make our model “more typed” simply by using phantom types to encode degrees and chord classes at the type level:

    data Ton        = TIMaj (Degree I Maj)
    data Dom        = DVMaj (Degree V Maj) | DSD,D SDom Dom
    data SDom       = SIVMaj (Degree IV Maj)
    data Degree δ γ = Deg Int Class

Now we detail precisely the root and class of the scale degree expected by each of the constructors. We need type-level scale degrees and classes to use as arguments to the new Degree type:

    data I; data II; data III; data IV; data V; data VI; data VII;
    data Maj; data Dom7; ...

It only remains to guarantee that Degrees are built correctly. An easy way to achieve this is to have a type class mediating type-to-value conversions, and a function to build degrees:

    class ToRoot δ where toRoot :: δ → Int
    instance ToRoot I where toRoot _ = 1
    ...
    class ToClass γ where toClass :: γ → Class
    instance ToClass Maj where toClass _ = Maj
    ...

    deg :: (ToRoot δ, ToClass γ) ⇒ δ → γ → Degree δ γ
    deg r c = Deg (toRoot r) (toClass c)

If we also make sure that the constructor Deg is not exported, we can be certain that our value-level Degrees correctly reflect their type. Sequences like badPiece above are no longer possible, since the term TIMaj (deg (⊥ :: II) (⊥ :: Maj)) is not well-typed.

3.3 Secondary dominants

So far we have seen how to encode simple harmonic rules and guaranteed that well-typed pieces make “sense”. However, we also need to encode harmonic rules that account for secondary dominants. According to harmony theory, every scale degree can be preceded by the scale degree of the dominant class a fifth interval (seven semitones) up. To encode this notion we need to compute transpositions on scale degrees. Since we encode the degree at the type-level this means we need type-level computations. For this we use GADTs (Peyton Jones et al. 2006) and type families (Schrijvers et al. 2008). GADTs allow us to conveniently restrict the chord root and class for certain constructors, while type families perform the necessary transpositions for relative degrees. To support chains of secondary dominants we change the Degree type as follows:

    data Degreen δ γ η where
      BaseDeg :: DegreeFinal δ γ → Degreen δ γ (Su η)
      ConsV   :: Degreen (V/ δ) Dom7 η → DegreeFinal δ γ
              → Degreen δ γ (Su η)
    data DegreeFinal δ γ = Deg Int Class

We now have two constructors for Degreen: BaseDeg is the base case, which stores a root and a class as before. In ConsV we encode the relative dominants. Its type says we can produce a Degreen for any root δ and class γ by having a DegreeFinal for that root and class preceded by a Degreen of root V/ δ of the dominant class. The type family V/ transposes its argument degree a fifth up:

    type family V/ δ
    type instance V/ I = V
    type instance V/ V = II
    ...

To avoid infinite recursion in the parser (Section 4.1) we use a type-level natural number in Degreen. This parameter also serves to control the number of allowed secondary dominants:

    data Su η
    data Ze

    type Degree δ γ = Degreen δ γ (Su (Su (Su (Su Ze))))

Typically we use values between 4 and 7 for η. Its value greatly affects compilation time; see the discussion in Section 6.1.

3.4 Examples

We have shown a very simplified description of our model of musical harmony as a Haskell datatype. In reality, our model is larger and more detailed, albeit still far simpler than the hundreds of pages of Piston and DeVoto (1991), for instance. To provide an idea of the kind of structure our datatype models, we show the chord sequence of Figure 2 as a pretty-printed tree in Figure 3. Every chord is classified as being part of a dominant, subdominant, or tonic structure, and the D:7 is classified as a secondary dominant of the G:7 (V/V).

Figure 3. The parse tree for the chord sequence shown in Figure 2 (leaves: C:maj F:maj D:7 G:7 C:maj). PT and PD represent phrase nodes. T, D, and S denote tonic, dominant, and subdominant, respectively, and the secondary dominant is denoted by V/V.
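The encoding of Sections 3.2 and 3.3 can be tried out with a compilable sketch. It follows the paper's definitions but makes several assumptions for the sake of valid, self-contained Haskell: the type family V/ is spelled `VOf`, `Degreen` is spelled `DegreeN`, only the degrees and `VOf` instances needed for the example are declared, and `deg` takes its root and class from the result type (via `ScopedTypeVariables`) rather than from ⊥ arguments. The helper `roots` and the values `secDom` and `tonic` are hypothetical names introduced here.

```haskell
{-# LANGUAGE GADTs, TypeFamilies, ScopedTypeVariables #-}

-- Type-level scale degrees and chord classes (types without values).
data I
data II
data V
data Maj
data Dom7

-- Value-level chord classes; constructors live in a separate namespace.
data Class = Maj | Dom7 deriving (Eq, Show)

-- Type-level naturals that bound the length of a dominant chain.
data Ze
data Su n

-- Transposition a fifth up; only the instances used below are listed.
type family VOf d
type instance VOf I = V
type instance VOf V = II

class ToRoot d where toRoot :: d -> Int
instance ToRoot I  where toRoot _ = 1
instance ToRoot II where toRoot _ = 2
instance ToRoot V  where toRoot _ = 5

class ToClass c where toClass :: c -> Class
instance ToClass Maj  where toClass _ = Maj
instance ToClass Dom7 where toClass _ = Dom7

-- A leaf: root and class are mirrored in the phantom type arguments.
data DegreeFinal d c = Deg Int Class

-- Smart constructor: the stored values are computed from the types.
deg :: forall d c. (ToRoot d, ToClass c) => DegreeFinal d c
deg = Deg (toRoot (undefined :: d)) (toClass (undefined :: c))

-- A degree optionally preceded by a chain of secondary dominants.
data DegreeN d c n where
  BaseDeg :: DegreeFinal d c -> DegreeN d c (Su n)
  ConsV   :: DegreeN (VOf d) Dom7 n -> DegreeFinal d c -> DegreeN d c (Su n)

-- The roots of a chain, in the order in which the chords sound.
roots :: DegreeN d c n -> [Int]
roots (BaseDeg (Deg r _))   = [r]
roots (ConsV pre (Deg r _)) = roots pre ++ [r]

-- II:7 preceding V:7, i.e. the D:7 G:7 fragment of Figure 2.
secDom :: DegreeN V Dom7 (Su (Su Ze))
secDom = ConsV (BaseDeg deg) deg

-- A plain tonic without any secondary dominant.
tonic :: DegreeN I Maj (Su Ze)
tonic = BaseDeg deg
```

Here `roots secDom` evaluates to `[2,5]`. The type checker computes that the chord preceding a V must have root `VOf V`, which reduces to `II`; asking for any other root in that position, for example by annotating the inner `deg` as `DegreeFinal I Dom7`, is rejected at compile time.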
Another example piece is displayed in Figure 4. Within this short piece, the IV:maj and G:7 are preceded by their secondary dominants. Because the model expects the first C:7 to resolve to an F:maj, the parser inserts the expected scale degree IV (see Section 4.1.2). Note that although C:7 sounds similar to C:maj, their harmonic functions in Figure 4 are distinct.

Figure 4. The pretty-printed parse tree generated for the chord sequence "G:min C:7 G:min C:7 F:maj D:7 G:7 C:maj".

4. From chord labels to harmonic structure

We have seen how to put Haskell’s advanced type system features to good use in the definition of a model of tonal harmony. In this section we further exploit the advantages of a well-typed model while defining a generic parser from labelled scale degrees (e.g. "I:maj IV:maj II:7 V:7 I:maj") to our datatype. We also show other operations on the model, like pretty-printing and diffing.

4.1 Parsing

From the high-level musical structure (e.g. the Ton and Dom datatypes of Section 3.2) we can easily build a parser in applicative style mimicking the structure of the types:

    data Parser α -- abstract

    class Parse α where parse :: Parser α
    instance Parse Ton where
      parse = TIMaj <$> parse
    instance Parse Dom where
      parse =   DVMaj <$> parse
            <|> DSD,D <$> parse <*> parse

For the purposes of this paper we keep Parser abstract; in our implementation we use the uu-parsinglib package (Swierstra 2009). We prefer uu-parsinglib over, say, parsec because our grammar is highly ambiguous and we can put error correction to good use, as we explain in Section 4.1.2.

The instances of Parse for Ton and Dom are trivial because they follow directly from the structure of the datatypes. They can even be obtained by syntactic manipulation of the datatype declaration: replace | by <|>, add <$> after the constructor name, separate constructor arguments by <*> and replace each argument by parse. The code is tedious to write, and since we have several similar datatypes it becomes repetitive and long.

To compound the problem, the rules of harmony are naturally ambiguous, and we often change the model in search of the best solution. Even more importantly, different musical styles can have significantly different harmony rules (e.g. baroque harmony versus jazz), so our solution should support multiple models. We solve all these problems by not writing instances like the one above. Instead, we use datatype-generic programming to derive a parser automatically in a type-safe way. We use the instant-generics package, which implements a library similar to that initially described by Chakravarty et al. (2009). Due to length considerations we cannot explain how generic programming works in this paper, but our generic parser is entirely trivial. The order of the constructors and their arguments determines the order of the alternatives and sequences; in particular, we avoid left-recursion in our datatypes, since we do not implement a grammar analysis like Devriese and Piessens (2011).

4.1.1 Ad hoc parsers

The only truly non-generic parser is that for DegreeFinal, which is also the only parser that consumes any input. It uses the type classes ToRoot and ToClass as described in Section 3.2.

Unfortunately, we are also forced to write the parser instances for GADTs such as Degreen, since instant-generics does not support GADTs. Although the code remains entirely trivial, the instance heads become more complicated, since they have to reflect the type equalities introduced by the GADT. As an example, we show the parser code for Degreen:

    instance ( Parse (DegreeFinal δ γ)
             , Parse (Degreen (V/ δ) Dom7 η) )
          ⇒ Parse (Degreen δ γ (Su η)) where
      parse =   BaseDeg <$> parse
            <|> ConsV <$> parse <*> parse

The context of the instance reflects the type of the constructors of Degreen: BaseDeg introduces the Parse (DegreeFinal δ γ) constraint, whereas ConsV requires Parse (Degreen (V/ δ) Dom7 η) too.

The need for type-level natural numbers becomes evident here; the instance above is “undecidable” for GHC, meaning that the rules for instance contexts become relaxed. Normally there are constraints on what can be written in the context to guarantee termination of type-checking. Undecidable instances lift these restrictions, with the side-effect that type-checking becomes undecidable in general. However, we are certain that type-checking will always terminate since we recursively decrease the type-level natural number η. This means we also need a “base case instance” where we use the empty parser which always fails; this is acceptable because it is never used.

    instance Parse (Degreen δ γ Ze) where parse = empty

Note how useful the type class resolution mechanism becomes: it recursively builds a parser for all possible alternatives, driven by the type argument η. This also means potentially long type-checking times; fortunately our current implementation still compiles in under a minute. We discuss parser performance issues in more detail in Section 6.2.

4.1.2 Error correction

We cannot hope to be able to model all valid harmonic relations in our datatype. Furthermore, songs often contain mistakes or mistyped chords, or sequences of dubious harmonic validity. However, these problems are often localized, and most of the song still makes sense. In our solution we rely on error correction while parsing: chords that do not fit the structure are automatically deleted or preceded by inserted chords, according to heuristics computed from the grammar structure. We keep track of the number of corrections, since the ratio of corrections to number of input chords provides a measure of meaningfulness of the parse tree. For most songs, parsing proceeds with no or very few corrections. Songs with a very high error ratio denote bad input or wrong key assignment, which results in meaningless scale degrees.
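The syntactic recipe of Section 4.1 (replace | by <|>, add <$> after the constructor, separate arguments by <*>) can be made concrete with a small stand-in for the abstract Parser type. The sketch below is not uu-parsinglib and performs no error correction: it is a plain list-of-successes parser over a list of scale-degree tokens, written only to show the shape of the instances. The names `Parser`, `token` and `parseAll`, the constructor name `DSDD`, and the use of plain strings at the leaves are assumptions of this example.

```haskell
import Control.Applicative (Alternative (..))

-- A minimal list-of-successes parser over a list of tokens.
newtype Parser a = Parser { runParser :: [String] -> [(a, [String])] }

instance Functor Parser where
  fmap f (Parser p) = Parser (\ts -> [ (f a, rest) | (a, rest) <- p ts ])

instance Applicative Parser where
  pure a = Parser (\ts -> [(a, ts)])
  Parser pf <*> Parser pa =
    Parser (\ts -> [ (f a, r2) | (f, r1) <- pf ts, (a, r2) <- pa r1 ])

instance Alternative Parser where
  empty = Parser (const [])
  Parser p <|> Parser q = Parser (\ts -> p ts ++ q ts)

-- Accept exactly one given token.
token :: String -> Parser String
token t = Parser step
  where
    step (x : xs) | x == t = [(x, xs)]
    step _                 = []

class Parse a where parse :: Parser a

-- Simplified versions of the datatypes of Section 3.2 with string leaves.
data Ton  = TIMaj String            deriving (Eq, Show)
data SDom = SIVMaj String           deriving (Eq, Show)
data Dom  = DVMaj String | DSDD SDom Dom deriving (Eq, Show)

-- Each instance mirrors the datatype declaration constructor by constructor.
instance Parse Ton  where parse = TIMaj  <$> token "I:maj"
instance Parse SDom where parse = SIVMaj <$> token "IV:maj"
instance Parse Dom  where
  parse =   DVMaj <$> token "V:maj"
        <|> DSDD  <$> parse <*> parse

-- All results that consume the complete input.
parseAll :: Parse a => [String] -> [a]
parseAll ts = [ a | (a, []) <- runParser parse ts ]
```

For instance, `parseAll (words "IV:maj V:maj") :: [Dom]` yields the single tree `DSDD (SIVMaj "IV:maj") (DVMaj "V:maj")`, while `parseAll ["I:maj"] :: [Dom]` is empty because this sketch rejects input that does not fit instead of correcting it. The recursive alternative of Dom starts with an SDom, so the parser consumes a token before recursing and does not loop.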
4.2 Visualising harmonic relations

In a way similar to the generic parser of Section 4.1, we also have a generic pretty-printer, which produces output suitable for generation of graphical representations such as that of Figure 4. Similar issues with ad hoc instances for GADTs arise, which we solve in the exact same way as described in Section 4.1.1.

4.3 Generic diff

A practical application of our tonal harmony model is estimating the harmonic similarity of two songs. An easy way to obtain a measure of similarity between two Pieces is to use a generic diff algorithm. Just like the parser and the pretty-printer, our generic diff is derived from the structure of the datatypes, and adapts automatically to any change. We have implemented it in the style of Lempsink et al. (2009) for the instant-generics library. This diff is based on four primitive generic functions: children, which returns a (heterogeneously-typed) list of all children of a term, build, which rebuilds a term given a list of new children, eqCon, which computes equality of terms based only on their top-level constructor, and typeOf, which returns a unique representation for the type of a term. For performance reasons we use typeOf from the standard Data.Typeable library, while the other functions are easily implemented in instant-generics. However, the generic diff is rather slow; we discuss this problem in detail in Section 6.3.

5. Evaluation

In this section we evaluate the parsing results of our system and compare the retrieval performance of the gdiff similarity measure with a simple baseline diff on the input tokens.

5.1 Datasets

We have performed our experiment with two datasets: the dataset of De Haas et al. (2009), which we call small, and a larger dataset (large). Both datasets consist of textual chord sequences extracted from user-generated Band-in-a-Box files that were collected from the Internet. Band-in-a-Box is a software package that generates accompaniment given a chord sequence provided by the user. The small dataset contains a selection of pieces that “harmonically make sense”, while the large dataset includes many songs that are harmonically atypical. This is because the files are user-generated, and contain peculiar and unfinished pieces, wrong key assignments and other errors; it can therefore be considered “real life” data. Within both datasets there are different chord sequences that describe the same piece in different ways; these can be used to do a retrieval experiment.

We summarize the statistics of each dataset in Table 1. The last column shows the average number of chord labels per song in the dataset, and the cluster columns give the number of clusters of each size, where a cluster is a group of mutually similar songs. For instance, in the small dataset, 35 songs have no similar songs, 11 clusters contain two similar songs each, and 5 clusters contain three similar songs each (for a total of 35 + 11 ∗ 2 + 5 ∗ 3 = 72 songs). The large dataset contains about 11 times more songs than small (854 songs), and the songs are also longer on average. Note also that songs with no similar songs are akin to noise for the retrieval task (see Section 5.3).

    Dataset   Number of clusters of size               Avg. labels/song
                1    2    3    4    5    6    7    8
    small      35   11    5    -    -    -    -    -   41.70
    large     485   71   27   21    7    2    1    1   54.73

Table 1. Cluster size distribution and average number of chord labels per song. The small dataset has cluster sizes ranging from 1 to 3, and the large dataset has cluster sizes ranging from 1 to 8.

5.2 Parsing results

The parsing results are shown in Table 2. For each dataset, we show the average time taken to parse a song and the average error ratio. The error ratio is a measure of how many corrections the parser performed. We define it as the ratio between the number of correction steps and the number of chord labels, but we remove sequences of duplicate chord labels from the input first. A ratio of 0.2, for instance, means that 20% of the significant labels of the sequence have been altered. Note that a single chord that doesn’t match the specification might cause multiple corrections, e.g. one deletion followed by one insertion. Lower ratios indicate that the song fits our harmony model better.

    Dataset   Error ratio   Time taken (ms)
    small     0.067          23.833
    large     0.200         381.837

Table 2. Error ratio and parsing runtime averaged over all songs

On the small dataset, which consists of “harmonically correct” chord sequences, our model performs very well. The songs are parsed quickly and with an average error ratio below 0.07. The large dataset is more problematic. The parsing time increases considerably, mostly because the ambiguity of our model can make the error-correction process rather expensive. The error ratio also increases considerably, but the parser never crashes or refuses to produce a valid output. A higher error ratio is also expected, since this dataset has many noisy or meaningless songs.

5.3 Matching results

To test gdiff as a similarity measure for musical harmony, we have performed a retrieval experiment. In this experiment, the task is to retrieve the similar (but not identical) songs based on the edit distance of the gdiff algorithm. The distance between all pairs of songs is calculated, and for every song a ranking is constructed by sorting all other songs on the basis of their distance. To place the performance of the gdiff algorithm and the difficulty of the task in perspective, we compare with a baseline algorithm. This method uses no harmony information whatsoever; we simply tokenize the input string into a list of Degrees and perform a standard diff on that list (using the diff package). We use this method to provide a baseline case; the generic diff, having all the harmony information available, has to perform better than this. We call this simple algorithm baseline, while the generic diff of Section 4.3 is named gdiff.

For our datasets we know all the clusters of similar songs. We can therefore analyse the rankings by calculating the Mean Average Precision (MAP). The MAP is a single-figure measure between 0 and 1 quantifying the precision of the retrieved results at all recall levels (Manning et al. 2008, Chap. 8, p. 160); a higher MAP value indicates a better ranking. For the small dataset, gdiff has a MAP of 0.853, while baseline scores 0.475. In the large dataset the difference is smaller, but gdiff still outperforms baseline with a score of 0.510 against 0.395.

We tested whether the difference in MAP is significant by performing a Wilcoxon signed-rank test¹. We chose the Wilcoxon signed-rank test because the underlying distribution of the average precision over the queries is unknown, and this test does not require the data to be normally distributed. The differences between baseline and gdiff were statistically significant, with W = 1058.5, p < 0.0001 on the small dataset, and also on the large dataset, with W = 80352, p < 0.0001.

¹ All statistical tests were performed with the R language.
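The two measures used in this evaluation, the error ratio of Section 5.2 and the Mean Average Precision of Section 5.3, can be written down as a short sketch. The code is illustrative and not taken from HarmTrace or from the R scripts used for the experiments. It rests on stated assumptions: "removing sequences of duplicate chord labels" is read as collapsing runs of consecutive identical labels, each ranking is given as a list of relevance flags in ranked order that contains every relevant song, and queries without any relevant song are left out of the mean. The function names are hypothetical.

```haskell
import Data.List (group)
import Data.Maybe (mapMaybe)

-- Error ratio: correction steps divided by the number of chord labels that
-- remain after collapsing runs of identical consecutive labels.
-- Assumes a non-empty list of labels.
errorRatio :: Int -> [String] -> Double
errorRatio corrections labels =
  fromIntegral corrections / fromIntegral (length (group labels))

-- Average precision of one ranking, given as relevance flags in ranked
-- order. Nothing if the ranking contains no relevant item.
averagePrecision :: [Bool] -> Maybe Double
averagePrecision relevant
  | null precisions = Nothing
  | otherwise       = Just (sum precisions / fromIntegral (length precisions))
  where
    -- Number of relevant items seen up to and including each rank.
    hitsSoFar  = scanl1 (+) (map fromEnum relevant)
    -- Precision at every rank that holds a relevant item.
    precisions = [ fromIntegral hits / fromIntegral rank
                 | (rank, hits, True) <- zip3 [1 :: Int ..] hitsSoFar relevant ]

-- Mean of the average precision over all queries that have relevant items.
meanAveragePrecision :: [[Bool]] -> Double
meanAveragePrecision rankings
  | null aps  = 0
  | otherwise = sum aps / fromIntegral (length aps)
  where
    aps = mapMaybe averagePrecision rankings
```

As a worked example, a song whose ranking has relevant songs at ranks 1 and 3 gets an average precision of (1/1 + 2/3) / 2 = 5/6, and a sequence of six labels that collapses to four significant labels and needed one correction has an error ratio of 0.25.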
5.4 Comparison with previous work

There are considerable differences between our HarmTrace system and the
context-free grammar approach of De Haas et al. (2009, hereafter referred
to as ISMIR 09).

5.4.1 Error-correcting parsers

One of the drawbacks of ISMIR 09 is that a sequence of chords that does
not match the context-free specification precisely will be rejected. For
instance, appending one nonsensical chord to an otherwise grammatically
correct sequence of symbols will still force the parser to reject the
complete sequence, not returning any partial information about what it
has parsed. HarmTrace solves this rejection problem by using
error-correcting parsers (Swierstra 2009). This allows us to formalize
the rules of tonal harmony that we are certain of, and leave the
borderline cases to the parser.

5.4.2 Ambiguity control

Music, and harmony in particular, is intrinsically ambiguous. Hence,
certain chords can have multiple meanings within a tonal context. This is
reflected in both ISMIR 09 and HarmTrace. A major drawback of ISMIR 09 is
that it is very limited in ways of controlling the ambiguity of the
grammar. ISMIR 09 uses weighting to order the grammar rules by adding low
weights to rules that explain rare phenomena. However, controlling
conditional execution would require some form of high-level grammar
generation system, since all rules are replicated for each scale degree
and chord class. On the other hand, HarmTrace supports more flexible
conditional execution, through the use of GADTs and type families. An
example is the restriction of secondary dominants to chords of the Dom7
class (Section 3.3).

5.4.3 Parsing performance

There are considerable differences in the parsing performance of
HarmTrace compared to ISMIR 09 on both datasets. HarmTrace takes 1.65s to
parse the small dataset, while ISMIR 09 takes more than 9m. When we
compare parsing performance on the large dataset the differences become
even more prominent: ISMIR 09 rejects 89.7% of the 854 pieces and 3.9% of
the dataset had to be excluded because the parsing process would not
terminate (due to unconstrained ambiguities). The remaining pieces parse
in 84m13s, while HarmTrace parses the entire dataset in 5m14s. All
measurements were done on the same Intel Core 2 Duo E6600, 2.4 GHz
machine using GHC 7.0.2 and Java SE 1.6.0_17.

5.4.4 Retrieval effectiveness

Both HarmTrace as well as ISMIR 09 have been evaluated on the small
dataset. When we compare the retrieval effectiveness of the gdiff
approach with the best performing variant of ISMIR 09 (MAP of 0.859), we
conclude that there is no statistically significant difference (W = 685,
p = 1.00, using the same test procedure as in Section 5.3). Because
ISMIR 09 rejects 89.7% of the pieces, no sensible comparison between the
two approaches on the large dataset can be performed.

5.4.5 Grammar simplicity

In ISMIR 09 all context-free rules were written by hand, which is not
only a tedious and error-prone enterprise, but can also result in very
large grammars. By using Haskell’s GADTs to represent the rules of tonal
harmony, we gain more expressive power, and the grammar becomes shorter
and easier to maintain. For instance, GADTs allow us to write rules that
hold for every Maj chord. In ISMIR 09 this is expressed by having one
rule for major I, II, III, etc.

5.4.6 Code repetition

Our Haskell system is more concise than the Java implementation of
ISMIR 09. An analysis of the number of significant source lines of code²
reveals that ISMIR 09 has 5545 lines, while HarmTrace has 1311, less than
one quarter.

6. Discussion and conclusion

We have shown how Haskell can be used to implement a model of musical
harmony. Our solution outperforms a previous Java approach in terms of
speed, functionality, and elegance. However, the current implementation
has a number of limitations, which we now describe in detail.

6.1 Type-checker performance

As mentioned in Section 4.1.1, it is easy to make the type-checker take
very long to compile our code. We managed to keep the type-checking time
acceptably low, but this is only because we are “helping” it. We
minimized the number of type families used (four in total, all similar to
V/), and we (automatically) place each instance declaration in a separate
module, since this speeds up compilation considerably. Furthermore, we
represent each scale degree as an independent type; type-level
computations, such as transposition, are then indexed over each type. A
more concise way of representing scale degrees would be to use type-level
naturals. Transposition is then simply summing modulo the total number of
scale degrees. Unfortunately this makes the compilation time unacceptably
high. We hope that native type-level naturals are added to GHC soon³ so
that we can simplify our type-level computations without a performance
penalty.

6.2 Parser performance

The higher average parsing time per song on the large dataset shown in
Table 2 is caused mostly by a few songs taking very long. In this
dataset, only about 6% of the songs take longer than one second to parse.
The three slowest songs take 41s, 24s, and 15s to parse. They are long
songs, and either contain chord sequences which our model does not
account for or are harmonically atypical. In these pathological cases the
parser combinators take very long to compute the possible corrections.
This is somewhat understandable, since our grammar is highly ambiguous
and there are multiple non-trivial possible corrections. However, such
long parsing times are undesirable; perhaps the number of steps to look
ahead in the parser could be dynamically adjusted based on the number of
possible alternatives. This would hopefully lead to shorter parsing
times, albeit at the cost of potentially worse corrections.

6.3 Matching performance

The generic diff is a powerful tool that solves the matching problem
almost “for free” (Section 5.3). However, to use it we need new generic
functions to be derived for every datatype. This means longer compilation
times, but also more ad hoc instances, since there is no suitable generic
programming library supporting GADTs. These instances amount to over 200
lines of repetitive and error-prone code. Worse, this code runs very
slowly; our implementation uses type-safe runtime casts, which prevents
fusion of the generic representations (Magalhães et al. 2010). This
prevents us from using the generic diff on datasets with thousands of
songs.

Besides addressing the limitations pointed out above, we also plan to add
new functionality to our system.

2 Using http://cloc.sourceforge.net/.
3 http://hackage.haskell.org/trac/ghc/ticket/4385
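
The type-level-natural encoding of scale degrees discussed in Section 6.1 (the subject of the GHC ticket in footnote 3) can be sketched with the built-in naturals of GHC.TypeLits that later GHC releases provide (Mod requires base 4.11 or newer). This is only an illustration and not the encoding used in HarmTrace: the names Degrees, Transpose, and degreeVal are hypothetical, and the choice of 12 chromatic degrees numbered from 0 (I) to 11 (VII) is an assumption.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Mod, Nat, natVal, type (+))

-- Assumption: 12 chromatic scale degrees, numbered 0 (I) to 11 (VII).
type Degrees = 12

-- | Transpose scale degree d by n steps: sum modulo the number of degrees.
type Transpose (n :: Nat) (d :: Nat) = Mod (d + n) Degrees

-- | Bring a type-level degree down to the value level for inspection.
degreeVal :: KnownNat d => Proxy d -> Integer
degreeVal = natVal
```

With this numbering, degree 7 transposed by 7 steps wraps around to degree 2: degreeVal (Proxy :: Proxy (Transpose 7 7)) evaluates to 2. A single type synonym replaces the per-degree instances; how such an encoding affects compilation time for the full model is not something this sketch measures.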
6.4 Mode and key

In Section 3 we only discussed the rules for pieces in a major key.
However, many songs are written in a minor key; this affects the expected
scale degrees at the leaves, invalidates some alternatives, and creates
others. Nevertheless, a large number of rules hold for both pieces in a
major and a minor key. Currently we handle this using a similar model for
pieces in a minor key:

  data Piece = PieceMaj [PhraseMaj] | PieceMin [PhraseMin]

However, this leads to unnecessary code duplication, since most of the
harmony rules are independent of mode. A better alternative would be to
index pieces by their mode:

  data MajMode; data MinMode;
  data Piece µ = Piece [Phrase µ]

The type variable µ would then be indexed with either MajMode or MinMode,
similarly to δ for degrees and γ for chord classes. We think this would
be an elegant way of expressing mode in the model.

Additionally, we currently restrict ourselves to songs in a single key,
but often songs change the key throughout their development. This means
that scale degree I no longer maps to chord C, but to F, for instance.
Indexing the model over the key, and introducing rules for modulation
which would change this key, would be a good way of encoding key changes.

Such changes would make the entire model indexed over one or more type
variables. We plan to see if the extensions to instant-generics reported
by Magalhães and Jeuring (2011) allow us to continue using generic
programming for our model.

6.5 Other applications

We show how to use our model for improving music retrieval, but we
believe other tasks can be improved similarly. For instance, algorithms
for computing chord labels from audio or images (scores) often recognize
a set of possible chords at each step, with different probabilities. Our
model could be used to check which chords are harmonically valid at each
step, therefore introducing harmony knowledge into the algorithm. Another
interesting development would be to implement a (generic) enumerator over
our datatypes; this would correspond to a generator of harmonically valid
sequences of chords.

6.6 Dependently-typed implementation

It would be interesting to see if we could easily port our system to a
dependently-typed setting. We plan to use Agda (Norell 2009), due to its
proximity to Haskell, or Idris (Brady 2011), since it has efficient
type-level naturals. We expect that deriving the parser automatically
will not be as easy, since generic programming support in
dependently-typed languages is more primitive than in Haskell. However,
we believe the model can benefit from a more expressive type language.
Having no barriers between values and types would reduce code duplication
and simplify the model. At the same time, we expect that the increased
expressiveness can be used to model more complex harmonic relations.

Overall, we are convinced that strong static typing and generic
programming are essential tools in modelling musical harmony. We hope
that our approach paves the way for future functional approaches to
musical modelling and processing.

Acknowledgments

This work has been partially funded by the Portuguese Foundation for
Science and Technology (FCT) via the SFRH/BD/35999/2007 grant, and by the
Dutch ICES/KIS III Bsik project MultimediaN. We thank Jurriaan Hage,
Johan Jeuring, Andres Löh, Frans Wiering, and the anonymous reviewers for
their helpful comments, and Doaitse Swierstra for his exhaustive
technical support in using his parser combinators.

References

E. Brady. Idris - systems programming meets full dependent types. In
  PLPV’11, pages 43–54, 2011.

M. M. T. Chakravarty, G. C. Ditu, and R. Leshchinskiy. Instant generics:
  Fast and easy, 2009. Draft version.

D. Devriese and F. Piessens. Explicitly recursive grammar combinators - a
  better model for shallow parser DSLs. In PADL’11, pages 84–98.
  Springer, 2011.

J. Stephen Downie. Music information retrieval. Annual Review of
  Information Science and Technology, 37(1):295–340, 2003.

W. B. de Haas, M. Rohrmeier, R. C. Veltkamp, and F. Wiering. Modeling
  harmonic similarity using a generative grammar of tonal harmony. In
  Proceedings of the Tenth International Conference on Music Information
  Retrieval (ISMIR’09), pages 549–554, 2009.

E. Lempsink, S. Leather, and A. Löh. Type-safe diff for families of
  datatypes. In WGP’09, pages 61–72. ACM, 2009.

F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. The MIT
  Press, 1983. ISBN 0-262-62107-X.

J. P. Magalhães and J. Jeuring. Generic programming for indexed
  datatypes. Technical Report UU-CS-2011-021, Department of Information
  and Computing Sciences, Utrecht University, 2011.

J. P. Magalhães, S. Holdermans, J. Jeuring, and A. Löh. Optimizing
  generics is easy! In PEPM’10, pages 33–42. ACM, 2010.

C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information
  Retrieval. Cambridge University Press, 2008.

J. F. Mountford. The musical scales of Plato’s Republic. The Classical
  Quarterly, 17(3/4):125–136, 1923.

U. Norell. Dependently typed programming in Agda. In AFP’08, volume 5832
  of LNCS, pages 230–266. Springer, 2009.

S. Peyton Jones, D. Vytiniotis, S. Weirich, and G. Washburn. Simple
  unification-based type inference for GADTs. In ICFP’06, pages 50–61.
  ACM, 2006.

W. Piston and M. DeVoto. Harmony. Victor Gollancz, 1991.

H. Riemann. Vereinfachte Harmonielehre; oder, die Lehre von den tonalen
  Funktionen der Akkorde. Augener, 1893.

M. Rohrmeier. A generative grammar approach to diatonic harmonic
  structure. In Proceedings of the 4th Sound and Music Computing
  Conference, pages 97–100, 2007.

M. Rohrmeier. Towards a generative syntax of tonal harmony. Journal of
  Mathematics and Music, 5(1):35–53, 2011.

R. Schrijvers, S. Peyton Jones, M. M. T. Chakravarty, and M. Sulzmann.
  Type checking with open type functions. In ICFP’08, pages 51–62. ACM,
  2008.

M. J. Steedman. A generative grammar for jazz chord sequences. Music
  Perception, 2(1):52–77, 1984.

S. Doaitse Swierstra. Combinator Parsing: A Short Tutorial, pages
  252–300. Springer-Verlag, 2009.
