Searching Music Incipits in Metric Space With Locality-Sensitive Hashing - CodeProject
GIT repository
Official Website
The source code can be downloaded only from the Git repository because it is larger than 100 MB and CodeProject allows only 10 MB.
Preface
The ideas presented in this article have been implemented in my incipit search engine for the RISM catalogue, which can be found
here. The project uses ASP.NET Core, Entity Framework, Blazor, Knockout.js, Manufaktura Controls and a MySQL database. The full
source code can be found at https://github.com/manufaktura-controls/rism. You can read about other technologies used in this
project in my other articles here.
The Problem
Many catalogues of written music, such as Oskar Kolberg’s Catalogue or the Répertoire International des Sources Musicales (RISM),
contain incipits, that is, short fragments of musical material written in musical notation. These fragments allow the user to search
musical sources by musical criteria such as melody and rhythm.
In this article, I am going to discuss querying by melody. The main criteria that such a search should meet are:
Searching with transpositions – the same melody can be written in different keys, and the search engine should find results regardless of key,
Ordering results by similarity – variants of the desired melody have to be included in the results and ordered by similarity to the query.
Various approaches to this problem have been proposed, two of which are worth mentioning:
Aligning sequences of pitches and calculating a similarity score (as done in the Monochord Project),
Calculating distances in metric space (as in Oskar Kolberg’s Catalogue).
The main disadvantage of these two methods is that every query traverses all the records in the database. This results in poor
performance on large sets of data. It is not a problem in Kolberg’s Catalogue because the dataset is relatively small.
The Monochord Project, however, must be deployed on a multi-core system and use parallel processing.
The solution to this problem is to pre-filter the data before calculating similarity. In this article I will explain how I did this in my search
engine for RISM data (http://musicalsources.org/).
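A melody can be expressed as a sequence of numbers, one for each interval between consecutive notes. The example melody used in this article is: 4, -2, -2, 7, 9, -5, -2, -2, 4, -2, -2.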
The similarity between two melodies is expressed as a distance between points in space that represent these two melodies. We
calculate it using this formula:
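Assuming the ordinary Euclidean distance (an assumption, but one that reproduces the 52.8% score obtained in the example that follows), the formula can be written as below. The symbols are chosen here for illustration: $a_i$ and $b_i$ are the $i$-th intervals of the two melodies and $n$ is the number of intervals compared.

```latex
d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
```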
Suppose the query is expressed as the sequence 4, 2, 2. We take into account only as many intervals as appear in the query, so the
distance between the two melodies is:
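As a worked calculation, assuming Euclidean distance and comparing the query 4, 2, 2 with the first three intervals of the melody, 4, -2, -2:

```latex
d = \sqrt{(4 - 4)^2 + (2 - (-2))^2 + (2 - (-2))^2}
  = \sqrt{0 + 16 + 16}
  = \sqrt{32} \approx 5.657
```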
To express the similarity as a percentage (which is easier for users to read), we have to assume that a specific arbitrary distance equals
0% and a distance of 0 equals 100%. We calculate the similarity from the following formula:
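A linear mapping that meets both conditions can be written as follows. The linear form is an assumption, and the symbols are chosen here: $d$ is the distance between the two melodies and $d_{\max}$ is the arbitrary distance that corresponds to 0%.

```latex
s = \left(1 - \frac{d}{d_{\max}}\right) \cdot 100\%
```

For a distance of about 5.657 and $d_{\max} = 12$, this gives $1 - 5.657 / 12 \approx 0.5286$.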
If we assume that the maximum distance is 12, the similarity score equals 52.8%.
These calculations have to be done for every record in the database before the results are sorted. Now I am going to show how to
filter the dataset before calculating similarity and sorting.
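The unfiltered approach can be sketched in Python as follows. This is a minimal illustration, not the article's implementation: it assumes Euclidean distance and a linear distance-to-percentage mapping, and the function names and the records `B` and `C` are hypothetical.

```python
import math

MAX_DISTANCE = 12.0  # arbitrary distance that maps to 0% (value from the example above)

def distance(query, melody):
    """Euclidean distance over as many intervals as the query contains."""
    n = len(query)
    return math.sqrt(sum((q - m) ** 2 for q, m in zip(query, melody[:n])))

def similarity(query, melody, max_distance=MAX_DISTANCE):
    """Map distance 0 to 100% and max_distance (or more) to 0%."""
    d = distance(query, melody)
    return max(0.0, 1.0 - d / max_distance) * 100.0

def brute_force_search(query, records):
    """Score every record that is long enough, then sort by descending similarity."""
    candidates = [(rid, m) for rid, m in records.items() if len(m) >= len(query)]
    scored = [(rid, similarity(query, m)) for rid, m in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

records = {
    "A": [4, -2, -2, 7, 9, -5, -2, -2, 4, -2, -2],  # melody from the text
    "B": [4, 2, 2, 1],                              # hypothetical record
    "C": [-3, 5, 0],                                # hypothetical record
}
results = brute_force_search([4, 2, 2], records)
for record_id, score in results:
    print(record_id, round(score, 2))
```

Every record is scored before sorting, so the cost grows linearly with the size of the table; this is the work that pre-filtering is meant to avoid.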
A space is a set that contains all the incipits, which are described as vectors or points. A space has a number of dimensions equal to
the number of intervals in the melody. For my purposes I use 1-, 2-, 3-, 4-, 5- and 6-dimensional spaces. If a melody has more than 6
intervals, I take only the first 6 into account.
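The mapping from a melody to a point can be sketched as below. The pitches are assumed to be MIDI-style semitone numbers, which the article does not state, and the function names and the sample pitches are hypothetical.

```python
MAX_DIMENSIONS = 6  # the article uses spaces of 1 to 6 dimensions

def to_intervals(pitches):
    """Differences between consecutive pitches (assumed to be semitone numbers)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def to_point(pitches, max_dimensions=MAX_DIMENSIONS):
    """Keep at most the first max_dimensions intervals as the point's coordinates."""
    return tuple(to_intervals(pitches)[:max_dimensions])

melody = [60, 64, 62, 60, 67, 76, 71, 69]  # hypothetical pitches, 7 intervals
transposed = [p + 5 for p in melody]       # the same melody in another key

print(to_point(melody))
print(to_point(melody) == to_point(transposed))
```

Because only the differences between notes are stored, a transposed melody maps to the same point, which is what makes the search independent of key.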
A plane (or hyperplane) is a space that has one dimension fewer than the space that surrounds it (the ambient space). It always divides the
higher-dimensional space in two. This table should clarify the concept:
Ambient space (dimensions)    Plane (dimensions)
1 (line)                      0 (point)
2 (plane)                     1 (line)
3 (space)                     2 (plane)
If we want a plane to pass through the origin of the coordinate system, we can describe it with a single vector. This approach,
however, will favor smaller intervals – bigger intervals will fall into larger “cells”. To make the hash distribution more even, we can
describe a plane with two vectors: the first defines the “alignment” of the plane and the second defines its translation.
For every point in space, we can determine whether it lies on one side of the plane or the other. We calculate this using a dot
product:
return sum;
}
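The side test can be sketched in Python as follows. This is an illustration under stated assumptions rather than the article's original listing: the plane is given as a normal ("alignment") vector plus a translation vector, and the choice of which side is labelled 1 is arbitrary. The function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def side_of_plane(point, normal, translation):
    """Return 1 if the point lies on the side the normal points to (or on the plane), else 0."""
    shifted = [p - t for p, t in zip(point, translation)]
    return 1 if dot(normal, shifted) >= 0 else 0

# A plane perpendicular to the first axis, moved 2 units along it.
normal = (1, 0, 0)
translation = (2, 0, 0)
print(side_of_plane((4, -2, -2), normal, translation))
print(side_of_plane((0, 5, 5), normal, translation))
```

Subtracting the translation first moves the plane away from the origin, so the sign of the dot product is measured relative to the shifted plane.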
The goal of the algorithm is to generate a fixed number of random planes. Then, for each point (a melody), we determine its relation
to every plane. If the point is on one side of the plane, we write 0; if it is on the other side, we write 1.
We can write the result as a binary number like 10010 or a decimal number like 18. This is an LSH hash. Note that we have to store
separate hashes for every number of dimensions. We can’t just take a higher-dimensional space and fill the remaining dimensions
with zeros, because zero is treated as a perfect unison.
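Computing such a hash can be sketched as follows. The number of planes (5), the ranges used for the random vectors, the seed and all names are assumptions made for illustration; the article does not specify them.

```python
import random

def random_plane(dimensions, rng, max_interval=12):
    """A plane as (normal, translation); the value ranges are arbitrary choices."""
    normal = [rng.uniform(-1.0, 1.0) for _ in range(dimensions)]
    translation = [rng.uniform(-max_interval, max_interval) for _ in range(dimensions)]
    return normal, translation

def lsh_hash(point, planes):
    """Append one bit per plane: 1 on the normal's side (or on the plane), 0 otherwise."""
    bits = 0
    for normal, translation in planes:
        s = sum(n * (p - t) for n, p, t in zip(normal, point, translation))
        bits = (bits << 1) | (1 if s >= 0 else 0)
    return bits

rng = random.Random(42)
# A separate set of planes for every supported number of dimensions (1 to 6).
planes_by_dim = {d: [random_plane(d, rng) for _ in range(5)] for d in range(1, 7)}

point = (4, -2, -2)
h = lsh_hash(point, planes_by_dim[len(point)])
print(format(h, "05b"), h)  # the same hash as a binary string and as a decimal number
```

The planes must be generated once and stored, because hashes computed against different plane sets are not comparable.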
A plane group is a concept that can be used to avoid situations in which similar melodies accidentally fall into different cells. We create a
few groups of planes and calculate hashes separately for each group, so each melody has more than one hash. This is optional: it
improves search accuracy but lowers performance.
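One way to use plane groups is to index every melody under one hash per group and to treat a record as a candidate when it shares a hash with the query in at least one group. The sketch below assumes this "match in any group" rule, which the article does not spell out; the planes, points and names are hypothetical and deliberately tiny.

```python
from collections import defaultdict

def lsh_hash(point, planes):
    """One bit per plane, as in a single-group hash."""
    bits = 0
    for normal, translation in planes:
        s = sum(n * (p - t) for n, p, t in zip(normal, point, translation))
        bits = (bits << 1) | (1 if s >= 0 else 0)
    return bits

def build_index(points, plane_groups):
    """Map (group number, hash) to the set of record ids stored in that cell."""
    index = defaultdict(set)
    for record_id, point in points.items():
        for group, planes in enumerate(plane_groups):
            index[(group, lsh_hash(point, planes))].add(record_id)
    return index

def candidates(query, plane_groups, index):
    """Union of the cells the query falls into, one cell per group."""
    found = set()
    for group, planes in enumerate(plane_groups):
        found |= index.get((group, lsh_hash(query, planes)), set())
    return found

plane_groups = [
    [((1, 0), (0.5, 0))],  # group 0: splits the space at x = 0.5
    [((0, 1), (0, 0))],    # group 1: splits the space at y = 0
]
points = {"a": (1, 3), "b": (0, 3), "c": (0, -3)}
index = build_index(points, plane_groups)
print(sorted(candidates((1, 3), plane_groups, index)))
```

Here record `b` is separated from the query by the plane in group 0 but is still found through group 1, at the price of a larger candidate set to score.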
Effect on Performance
The experiment was run on a MySQL database. This is the execution plan of a query without LSH optimization. As you can see, the
whole table is scanned and then the rows are ordered:
Note that LIMIT 100 OFFSET 0 doesn’t improve performance in any of the queries because sorting forces all rows to be
scanned. Sorting in the second query is faster because it operates on a pre-filtered subset of records.
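The pre-filtering idea can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for MySQL. The table and column names are hypothetical, the similarity is computed in Python rather than in SQL for brevity, and `toy_hash` is a simplified stand-in for a real LSH hash (it corresponds to three axis-aligned planes through the origin).

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incipits (id INTEGER PRIMARY KEY, intervals TEXT, hash INTEGER)")
conn.execute("CREATE INDEX idx_incipits_hash ON incipits (hash)")

def toy_hash(intervals):
    """One bit per interval (first three only): 1 if the interval is >= 0."""
    bits = 0
    for interval in intervals[:3]:
        bits = (bits << 1) | (1 if interval >= 0 else 0)
    return bits

rows = [(1, [4, -2, -2, 7]), (2, [4, 2, 2, 1]), (3, [5, 2, 1, 0]), (4, [-3, -5, 0, 2])]
conn.executemany(
    "INSERT INTO incipits VALUES (?, ?, ?)",
    [(rid, ",".join(map(str, iv)), toy_hash(iv)) for rid, iv in rows],
)

def search(query, limit=100, max_distance=12.0):
    """Fetch only rows in the query's cell, then score and sort that subset."""
    cursor = conn.execute(
        "SELECT id, intervals FROM incipits WHERE hash = ?", (toy_hash(query),)
    )
    scored = []
    for rid, text in cursor:
        melody = [int(x) for x in text.split(",")]
        d = math.sqrt(sum((q - m) ** 2 for q, m in zip(query, melody)))
        scored.append((rid, max(0.0, 1.0 - d / max_distance) * 100.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

results = search([4, 2, 2])
print(results)
```

Only the rows sharing the query's hash are scored and sorted. Record 1, which a full scan would score as a partial match, is not returned at all; this is the accuracy cost of pre-filtering.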
This table shows how locality-sensitive hashing affects performance of the queries:
As can be seen, pre-filtering data with a spatial hash reduces query time in most cases. Adding an index on the hash column vastly
reduces query time for most queries but can increase the cost in some cases. The best results are achieved when the user enters a
rarely occurring melody. If the user searches for common interval patterns such as scales or arpeggios, the effect on performance can
be worse because many melodies share the same spatial hash.
License
This article, along with any associated source code and files, is licensed under The BSD License
I graduated from Adam Mickiewicz University in Poznań, where I completed an MA degree in computer science (MA thesis:
Analysis of Sound of Viola da Gamba and Human Voice and an Attempt of Comparison of Their Timbres Using Various
Techniques of Digital Signal Analysis) and a bachelor's degree in musicology (BA thesis: Continuity and Transitions in European
Music Theory Illustrated by the Example of 3rd part of Zarlino's Institutioni Harmoniche and Bernhard's Tractatus Compositionis
Augmentatus). I also graduated from a solo singing class at the Fryderyk Chopin Musical School in Poznań. I'm a self-taught
composer and a member of the informal international group Vox Saeculorum, which brings together composers whose common goal is to revive
the old (mainly baroque) styles and composing traditions in contemporary written music. I'm an annual participant of the
International Summer School of Early Music in Lidzbark Warmiński.
Article Copyright 2018 by Ajcek84. Last Updated 27 Nov 2018. Everything else Copyright © CodeProject, 1999-2018.
https://www.codeproject.com/Articles/1268315/Searching-Music-Incipits-in-Metric-Space-with-Loca?display=Print