Query by Singing/Humming via Dynamic Programming: J.-S. Roger Jang (張智星)
http://www.soundhound.com (Mobile)
Approach to QBSH
Pitch tracking: Convert singing/humming into pitch vector
Retrieval: Find the distance between the pitch vector and
each song in the database
Our homework
Explore how we can use dynamic programming (DP) to find
the distance in the retrieval part of QBSH
Examples of Pitch Vectors and Music Notes
MIDI numbers
Used in MIDI files
AKA semitones
Music note vector
Integer semitones
Example
MIDI file of "Twinkle, Twinkle, Little Star" (小星星)
Note vector: [60 60 67 67 69 69 67
65 65 64 64 62 62 60]
Pitch vector
Real-number semitone
Example of singing clips
"Twinkle, Twinkle, Little Star" (小星星): pitch vector
"In That Faraway Place" (在那遙遠的地方): pitch vector
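A pitch vector in real-number semitones is typically obtained by mapping each tracked frequency onto the MIDI scale. The sketch below assumes the standard convention A4 = 440 Hz = MIDI 69; the slides do not say which formula the pitch tracker uses, and the function name `hz_to_semitone` is only illustrative.

```python
import math

def hz_to_semitone(freq_hz: float) -> float:
    """Map a frequency in Hz to a real-valued MIDI number.

    Assumes the common convention A4 = 440 Hz = MIDI note 69.
    """
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

# Music note vector from the slide: integer semitones.
note_vector = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]

# A sung pitch rarely lands exactly on an integer semitone.
print(hz_to_semitone(440.0))   # A4
print(hz_to_semitone(261.63))  # close to middle C (MIDI 60)
```

Rounding the result to the nearest integer would give a music note number, while the unrounded value is what a pitch vector stores.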
The alignment path: (1,1), (2,1), (3,2), (4,2), (5,2), (6,2), (7,3),
(8,3), (9,4), (10,4), (11,4), (12,4)
Indices are 1-based
Distance:
$$\mathrm{dist}(p, q) = \min_{u_1, \ldots, u_m} \sum_{i=1}^{m} \left| p(i) - q(u_i) \right|, \quad \text{with } u_1 = 1 \text{ and } u_1 \le u_2 \le u_3 \le \cdots \le u_m.$$
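To make the distance concrete, the cost of one given alignment can be computed directly. In the sketch below, `u` is taken from the second coordinates of the alignment path listed above; the vectors `p` and `q` are hypothetical values chosen for illustration, since the slide does not give the vectors behind that path.

```python
def path_cost(p, q, u):
    """Sum of |p(i) - q(u_i)| for a given assignment u (1-based note indices)."""
    if len(u) != len(p):
        raise ValueError("u needs one entry per pitch point")
    if u[0] != 1:
        raise ValueError("path must be anchored at the beginning: u_1 = 1")
    if any(b < a for a, b in zip(u, u[1:])):
        raise ValueError("u must be non-decreasing")
    return sum(abs(pi - q[ui - 1]) for pi, ui in zip(p, u))

# Second coordinates of the path (1,1), (2,1), (3,2), ..., (12,4).
u_slide = [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4]

# Hypothetical pitch vector (12 frames) and music note vector (4 notes).
p = [60.2, 59.8, 67.1, 66.9, 67.3, 67.0, 69.2, 68.9, 67.1, 66.8, 67.0, 67.2]
q = [60, 67, 69, 67]

print(path_cost(p, q, u_slide))  # about 1.7, up to floating-point rounding
```

The distance dist(p, q) is the smallest such cost over all admissible u, which is what the dynamic programming formulation searches for without enumerating every path.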
Three-step Formula of DP for Alignment
Three-step DP formula
Optimum-value function: D(i,j) is the min distance between
p(1:i) and q(1:j)
Recurrent equation:
$$D(i, j) = |p(i) - q(j)| + \min \{ D(i-1, j),\ D(i-1, j-1) \}, \quad \text{with } i > 1 \text{ and } j > 1.$$
$$D(1, 1) = |p(1) - q(1)|.$$
Assumption
Anchored at beginning ➔ p(1) is assigned to q(1)
No rest in p ➔ No zeros in p
No need to do key transposition for p
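The recurrence can be filled in row by row. The following is a minimal sketch with the boundary cases handled explicitly, assuming an anchored beginning (so D(1, j) is infinite for j > 1) and no key transposition; the function name and the toy vectors are illustrative only.

```python
import math

def dp_distance_table(p, q):
    """Fill D[i][j] (0-based here) following the recurrence on the slide."""
    m, n = len(p), len(q)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = abs(p[0] - q[0])          # D(1,1): p(1) is assigned to q(1)
    for i in range(1, m):
        for j in range(n):
            best_prev = D[i - 1][j]     # stay on the same note
            if j > 0:                   # or move on from the previous note
                best_prev = min(best_prev, D[i - 1][j - 1])
            D[i][j] = abs(p[i] - q[j]) + best_prev
    return D

# Hypothetical toy input: four pitch points, two notes.
toy_p = [60, 61, 67, 66]
toy_q = [60, 67]
toy_D = dp_distance_table(toy_p, toy_q)
for row in toy_D:
    print(row)
```

Each row of the printed table corresponds to one pitch point, and each column to one music note.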
Walk-through Example
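Local distance |p(i)-q(j)| (rows: music notes q, first note at the bottom; columns: pitch vector p)
q = 40 | 19 15 28 26 12 14 19  9 19 14 17
q = 20 |  1  5  8  6  8  6  1 11  1  6  3
q = 30 |  9  5 18 16  2  4  9  1  9  4  7
q = 10 | 11 15  2  4 18 16 11 21 11 16 13
q = 20 |  1  5  8  6  8  6  1 11  1  6  3
p      | 21 25 12 14 28 26 21 31 21 26 23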
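The local distance table can be regenerated from the two vectors, which is a quick way to check it. The sketch assumes p is the bottom row of the table and q is read from the row labels, bottom to top.

```python
# Vectors read off the walk-through table (an assumption about its layout).
p = [21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23]   # pitch vector (columns)
q = [20, 10, 30, 20, 40]                            # music notes, q(1) first

def local_distance(p, q):
    """Return one row per music note: row j holds |p(i) - q(j)| for all i."""
    return [[abs(pi - qj) for pi in p] for qj in q]

cost = local_distance(p, q)

# Print with the last note on top, matching the slide's orientation.
for qj, row in reversed(list(zip(q, cost))):
    print(f"q={qj:>2} | " + " ".join(f"{c:>2}" for c in row))
```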
Hints and Caveats
Useful implementation hints
Handle the recurrent equation carefully at the boundaries i=1 and j=1.
Pad the table with an extra layer of D(i, j)=inf to simplify the code.
Caveats
The optimum path may not be unique, but the minimum
distance is.
The last element in the pitch vector does not have to be
assigned to the last music note ➔ Anchored beginning, free
end
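The hints and caveats above can be combined in one sketch: an extra row and column of inf removes the special cases, the answer is the minimum over the last row (free end), and backtracking recovers one optimal path. The function name `align` and the tie-breaking rule are choices made here, not taken from the slides; the input vectors are those read off the walk-through table.

```python
import math

def align(p, q):
    """Anchored-beginning, free-end alignment.

    Returns (distance, path), where path is a list of 1-based (i, j) pairs.
    """
    m, n = len(p), len(q)
    # Row 0 and column 0 form the padding layer filled with inf.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[1][1] = abs(p[0] - q[0])                      # D(1,1)
    for i in range(2, m + 1):
        for j in range(1, n + 1):
            # D[i-1][0] is inf, so j = 1 needs no special case.
            D[i][j] = abs(p[i - 1] - q[j - 1]) + min(D[i - 1][j], D[i - 1][j - 1])
    # Free end: the last pitch point may land on any note.
    j = min(range(1, n + 1), key=lambda k: D[m][k])
    distance = D[m][j]
    # Backtrack. On a tie we stay on the same note, which yields one of
    # possibly several optimal paths.
    path = [(m, j)]
    for i in range(m, 1, -1):
        if D[i - 1][j - 1] < D[i - 1][j]:
            j -= 1
        path.append((i - 1, j))
    path.reverse()
    return distance, path

p = [21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23]
q = [20, 10, 30, 20, 40]
distance, path = align(p, q)
print(distance)  # 38 for these vectors
print(path)      # ends at note 4, not the last note 5 (free end)
```

Swapping the tie-breaking rule may change the returned path but not the distance, which is the point of the first caveat.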
Example
Retrieval Result
Min distance!
Other considerations
Key transposition
Anchor point
Music note duration
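Retrieval then amounts to computing the DP distance to every song and picking the minimum. The sketch below also tries a handful of integer key shifts on the query, which is one simple heuristic for key transposition and not necessarily the approach intended in the course. The database entry `scale_up`, the query values, and the names `dp_distance` and `retrieve` are all hypothetical.

```python
import math

def dp_distance(p, q):
    """Anchored-beginning, free-end DP distance, keeping only one row of D."""
    prev = [math.inf] * len(q)
    prev[0] = abs(p[0] - q[0])
    for x in p[1:]:
        cur = [abs(x - q[0]) + prev[0]]
        for j in range(1, len(q)):
            cur.append(abs(x - q[j]) + min(prev[j], prev[j - 1]))
        prev = cur
    return min(prev)

def retrieve(query, database, shifts=(0,)):
    """Rank songs by their smallest distance over the candidate key shifts."""
    scores = {}
    for title, notes in database.items():
        scores[title] = min(
            dp_distance([x + s for x in query], notes) for s in shifts
        )
    return sorted(scores.items(), key=lambda kv: kv[1])

database = {
    "twinkle": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
    "scale_up": [60, 62, 64, 65, 67, 69, 71, 72],   # hypothetical second song
}

# Hypothetical query: the opening of "twinkle", sung two semitones low,
# with two pitch points per note.
query = [58.1, 57.9, 58.0, 58.2, 65.1, 64.9, 65.0, 65.2,
         67.1, 66.9, 67.0, 66.8, 65.0, 65.1]

ranking = retrieve(query, database, shifts=range(-5, 6))
print(ranking[0][0])  # title with the minimum distance
```

Because every pitch point must be matched to some note and each step advances by at most one note, this formulation implicitly assumes the query has at least as many points per note as it has notes to traverse; handling note durations explicitly is left open, as on the slide.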
Exercise
Find the alignment between the following two
sequences, assuming anchored-beginning and free-end.
Singing pitch vector = [12 11 15 16 11 12 9 10]
Music note vector = [11 15 10 18]
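Local distance table (rows: music notes, first note at the bottom; columns: singing pitch vector)
Music note = 18 |  6  7  3  2  7  6  9  8
Music note = 10 |  2  1  5  6  1  2  1  0
Music note = 15 |  3  4  0  1  4  3  6  5
Music note = 11 |  1  0  4  5  0  1  2  1
Singing pitch   | 12 11 15 16 11 12  9 10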