Query by Singing/Humming via Dynamic Programming: J.-S. Roger Jang (張智星)

The document introduces query by singing/humming (QBSH), which allows users to identify songs by singing or humming. It discusses using pitch tracking to convert an audio clip into a pitch vector and then using dynamic programming (DP) to find the optimal alignment between the pitch vector and note vectors of songs in a database to calculate the distance and identify matching songs. It provides an example of using a three-step DP formula to recursively calculate the minimum distance between two sequences and find their optimal alignment path. It also notes some considerations for implementation like padding and the possibility of non-unique optimum paths.

Query by Singing/Humming via Dynamic Programming

J.-S. Roger Jang (張智星)


[email protected]
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Introduction to Query by Singing/Humming
• Query by singing/humming (QBSH; Chinese: 哼唱選歌)
  • Goal: identify a song from the user's singing or humming
  • Demos
    • http://mirlab.org/demo/miracle (PC)
    • http://www.midomi.com (PC)
    • http://www.soundhound.com (mobile)
• Approach to QBSH
  • Pitch tracking: convert the singing/humming into a pitch vector
  • Retrieval: compute the distance between the pitch vector and each song in the database
• Our homework
  • Explore how dynamic programming (DP) can be used to compute the distance in the retrieval part of QBSH
Examples of Pitch Vectors and Music Notes
• MIDI numbers
  • Used in MIDI files
  • Measured in semitones (MIDI note 60 is middle C)
• Music note vector
  • Integer semitones
  • Example: MIDI file of 小星星 ("Twinkle, Twinkle, Little Star")
    • Note vector: [60 60 67 67 69 69 67 65 65 64 64 62 62 60]
• Pitch vector
  • Real-valued semitones
  • Examples of singing clips
    • 小星星 ("Twinkle, Twinkle, Little Star"), pitch vector
    • 在那遙遠的地方 ("In That Faraway Place"), pitch vector
  • Pitch rate = 31.25 pitches per second
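The slides do not show how a pitch tracker's output in hertz becomes real-valued semitones. Here is a minimal sketch, assuming the standard MIDI convention (A4 = 440 Hz = note 69, 12 semitones per octave); the function names are hypothetical, and only the 31.25 pitches/second rate comes from the slide.

```python
import math

PITCH_RATE = 31.25  # pitch values per second, as stated on the slide


def hz_to_semitone(freq_hz: float) -> float:
    """Frequency in Hz -> real-valued MIDI semitone (A4 = 440 Hz = note 69)."""
    if freq_hz <= 0:
        # A zero frequency would correspond to a rest, which the DP assumes away.
        raise ValueError("frequency must be positive")
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone: float) -> float:
    """Inverse mapping: MIDI semitone -> frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


def frame_times(num_frames: int, rate: float = PITCH_RATE) -> list[float]:
    """Start time in seconds of each pitch frame at the given pitch rate."""
    return [k / rate for k in range(num_frames)]


print(hz_to_semitone(440.0))         # 69.0
print(round(semitone_to_hz(60), 2))  # 261.63 (middle C, first note of 小星星)
print(frame_times(3))                # [0.0, 0.032, 0.064]
```

At 31.25 pitches per second, each pitch value covers 32 ms, so a 12-element pitch vector spans about 0.38 s of singing.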


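Our Task: Optimal Alignment
• How do we find the distance between a pitch vector p(i), i = 1, ..., m, and a note vector q(j), j = 1, ..., n?
  • We need to find the optimal alignment between the two vectors.
  • This can be done with DP.
• Example (1-indexed):
  • Alignment path: (1,1), (2,1), (3,2), (4,2), (5,2), (6,2), (7,3), (8,3), (9,4), (10,4), (11,4), (12,4)
• Distance, where u_i is the index of the note assigned to p(i):

$$\mathrm{dist}(p, q) = \min_{u_1, \ldots, u_m} \sum_{i=1}^{m} \lvert p(i) - q(u_i) \rvert, \quad \text{with } u_1 = 1,\ u_{i+1} - u_i \in \{0, 1\},\ u_m \le n.$$

  • The step constraint gives u_1 ≤ u_2 ≤ ... ≤ u_m and matches the moves allowed by the DP recurrence. In the example path, u = (1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4).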
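To make the definition concrete, here is a brute-force sketch that enumerates every admissible u directly. It assumes each step u_{i+1} - u_i is 0 or 1 (the moves the DP allows) and that u_m may not run past the last note. It is exponential in m, so it is only a reference for checking a DP implementation on tiny inputs; the function name is hypothetical.

```python
from itertools import product


def brute_force_dist(p, q):
    """Minimum of sum |p(i) - q(u_i)| over all admissible alignments u.

    Indices are 0-based here: u[0] is anchored to the first note, and each
    later pitch either stays on the same note (step 0) or moves to the next
    one (step 1). Returns (distance, u). Runs in O(2^(m-1) * m) time.
    """
    m, n = len(p), len(q)
    best, best_u = float("inf"), None
    for steps in product((0, 1), repeat=m - 1):
        u = [0]
        for s in steps:
            u.append(u[-1] + s)
        if u[-1] >= n:  # ran past the last note: not a valid alignment
            continue
        cost = sum(abs(pi - q[ui]) for pi, ui in zip(p, u))
        if cost < best:
            best, best_u = cost, u
    return best, best_u


p = [12, 11, 15, 16, 11, 12, 9, 10]  # singing pitch vector from the exercise slide
q = [11, 15, 10, 18]                 # music note vector from the exercise slide
dist, u = brute_force_dist(p, q)
print(dist)                   # 6
print([ui + 1 for ui in u])   # 1-indexed note for each pitch: [1, 1, 2, 2, 3, 3, 3, 3]
```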
Three-step Formula of DP for Alignment
• Three-step DP formula
  • Optimum-value function: D(i, j) is the minimum distance between p(1:i) and q(1:j)
  • Recurrence:

$$D(i, j) = \lvert p(i) - q(j) \rvert + \min\{D(i-1, j),\ D(i-1, j-1)\}, \quad i > 1,\ j > 1,$$
$$D(1, 1) = \lvert p(1) - q(1) \rvert.$$

    The min chooses whether p(i) stays on the same note as p(i-1) or moves on to the next note.
  • Answer:

$$\mathrm{dist}(p, q) = \min_{1 \le j \le n} D(m, j).$$

• Assumptions
  • Anchored beginning ➔ p(1) is assigned to q(1)
  • No rests in p ➔ p contains no zeros
  • No key transposition is needed for p
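A direct transcription of the three-step formula, as a minimal sketch. Indices are 0-based, so D[i][j] here corresponds to D(i+1, j+1) on the slide. The i = 1 and j = 1 borders are handled explicitly rather than by padding: row 1 is infinite except D(1, 1) because of the anchored beginning, and column 1 can only be reached by staying on q(1). The function name dp_distance is hypothetical.

```python
def dp_distance(p, q):
    """Anchored-beginning, free-end DP distance between pitch vector p and note vector q.

    D[i][j] is the min distance between p[:i+1] and q[:j+1] (0-based), with
    p[0] forced onto q[0]. Returns (distance, D).
    """
    m, n = len(p), len(q)
    inf = float("inf")
    D = [[inf] * n for _ in range(m)]
    D[0][0] = abs(p[0] - q[0])  # anchored beginning: p(1) is assigned to q(1)
    for i in range(1, m):
        for j in range(n):
            stay = D[i - 1][j]                             # p(i) on the same note as p(i-1)
            advance = D[i - 1][j - 1] if j > 0 else inf    # p(i) moves to the next note
            D[i][j] = abs(p[i] - q[j]) + min(stay, advance)
    return min(D[m - 1]), D  # free end: best entry in the last row


dist, D = dp_distance([12, 11, 15, 16, 11, 12, 9, 10], [11, 15, 10, 18])
print(dist)   # 6
print(D[-1])  # last row D(m, j): [14, 18, 6, 14]
```

Filling the table takes O(mn) time and memory, one cell per (i, j) pair, instead of the exponential enumeration over all u.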
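Walk-through Example
Grid of |p(i) - q(j)|, with notes q(j) as rows (q(1) at the bottom) and pitch values p(i) as columns:

q(5) = 40 | 19 15 28 26 12 14 19  9 19 14 17
q(4) = 20 |  1  5  8  6  8  6  1 11  1  6  3
q(3) = 30 |  9  5 18 16  2  4  9  1  9  4  7
q(2) = 10 | 11 15  2  4 18 16 11 21 11 16 13
q(1) = 20 |  1  5  8  6  8  6  1 11  1  6  3
     p(i) | 21 25 12 14 28 26 21 31 21 26 23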
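The walk-through survives only as its |p(i) - q(j)| grid. Reading the rows bottom-up, the same layout the exercise slide uses, the data appear to be p = [21 25 12 14 28 26 21 31 21 26 23] and q = [20 10 30 20 40]; treat that reading as an assumption. The sketch below rebuilds the grid in the slide's orientation (last note on top) and fills in D row by row, which is what a walk-through steps through. The function names are hypothetical.

```python
INF = float("inf")


def fill_dp(p, q):
    """Return (cost, D) for the anchored-beginning DP; cost[i][j] = |p(i) - q(j)|."""
    m, n = len(p), len(q)
    cost = [[abs(pi - qj) for qj in q] for pi in p]
    D = [[INF] * n for _ in range(m)]
    D[0][0] = cost[0][0]  # anchored beginning
    for i in range(1, m):
        for j in range(n):
            prev = min(D[i - 1][j], D[i - 1][j - 1] if j > 0 else INF)
            D[i][j] = cost[i][j] + prev
    return cost, D


def print_grid(grid, q):
    """Print grid[i][j] with q(1) at the bottom and q(n) on top, as on the slide."""
    for j in reversed(range(len(q))):
        cells = " ".join(f"{row[j]:>4}" for row in grid)
        print(f"q={q[j]:>3} |{cells}")


p = [21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23]  # assumed reading of the slide
q = [20, 10, 30, 20, 40]                          # assumed reading of the slide
cost, D = fill_dp(p, q)
print("|p(i)-q(j)|:")
print_grid(cost, q)
print("D(i,j):")
print_grid(D, q)
print("dist =", min(D[-1]))  # 38 (our computation, not a value shown on the slide)
```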
Hints and Caveats
• Useful hints for implementation
  • Handle the recurrence carefully when i = 1 or j = 1.
  • Pad an extra layer of cells with D(i, j) = ∞ to simplify the code.
• Caveats
  • The optimal path may not be unique, but the minimum distance is.
  • The last element of the pitch vector does not have to be assigned to the last music note ➔ anchored beginning, free end.
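The padding hint and the free-end caveat fit together in one sketch: pad row 0 and column 0 with ∞, fill D, start backtracking from the smallest entry in the last row, and follow whichever predecessor produced each minimum. This is a minimal sketch; the function name align and the tie-breaking rule (prefer staying on the same note) are our assumptions, not the slides'.

```python
INF = float("inf")


def align(p, q):
    """Anchored-beginning, free-end DP alignment with an inf-padded table.

    D has an extra row 0 and column 0 filled with inf, so D[i][j] uses the
    slides' 1-based indices and D[i - 1][j - 1] never needs a bounds check.
    Returns (distance, path) with path as a list of 1-indexed (i, j) pairs.
    """
    m, n = len(p), len(q)
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[1][1] = abs(p[0] - q[0])  # anchored beginning; the rest of row 1 stays inf
    for i in range(2, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(p[i - 1] - q[j - 1]) + min(D[i - 1][j], D[i - 1][j - 1])

    # Free end: the path may finish on any note j in the last row.
    j = min(range(1, n + 1), key=lambda jj: D[m][jj])
    distance = D[m][j]
    path = [(m, j)]
    for i in range(m, 1, -1):
        # Step back to whichever predecessor gave the min. On a tie we keep j
        # (stay on the same note); the other choice would yield a different but
        # equally optimal path, which is why the path need not be unique.
        if D[i - 1][j - 1] < D[i - 1][j]:
            j -= 1
        path.append((i - 1, j))
    path.reverse()
    return distance, path


# Walk-through data, under our reading of its grid (an assumption).
dist, path = align([21, 25, 12, 14, 28, 26, 21, 31, 21, 26, 23], [20, 10, 30, 20, 40])
print(dist)  # 38
print(path)
```

Because only predecessors that achieve the minimum are followed, the sum of |p(i) - q(j)| along the returned path always equals the returned distance.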
Example
• Singing clip of 小星星 ("Twinkle, Twinkle, Little Star")
• MIDI file of 三輪車 ("Tricycle")
[Figure: alignment path and pitch vectors]
Example
• Singing clip of 小星星 ("Twinkle, Twinkle, Little Star")
• MIDI file of 小星星
[Figure: alignment path and pitch vectors]
Retrieval Result
[Figure: retrieval result, with the minimum distance highlighted]
• Other considerations
  • Key transposition
  • Anchor point
  • Music note duration
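The retrieval step itself is a ranking: compute the DP distance from the query to every song and sort in ascending order, so the top entry has the minimum distance. This is a minimal sketch; the toy database, the query, the function names, and the optional shifts parameter for key transposition are all hypothetical illustrations, not the method behind the slide's result. This version keeps only the previous row of D, which is enough when only the distance (not the path) is needed.

```python
INF = float("inf")


def dp_distance(p, q):
    """Anchored-beginning, free-end DP distance, keeping only one row of D."""
    prev = [INF] * len(q)
    prev[0] = abs(p[0] - q[0])  # row 1: only D(1, 1) is finite
    for x in p[1:]:
        cur = []
        for j, note in enumerate(q):
            best = min(prev[j], prev[j - 1] if j > 0 else INF)
            cur.append(abs(x - note) + best)
        prev = cur
    return min(prev)  # free end: best entry in the last row


def rank_songs(pitch, database, shifts=(0,)):
    """Sort songs by ascending distance to the pitch vector.

    `shifts` (in semitones) is a hypothetical key-transposition extension:
    each shift is added to the pitch vector and the smallest distance is kept.
    The default (0,) matches the slides' no-transposition assumption.
    """
    scores = []
    for name, notes in database.items():
        d = min(dp_distance([x + s for x in pitch], notes) for s in shifts)
        scores.append((name, d))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical toy database: song_a uses the 小星星 note vector from the slides,
# song_b is an arbitrary made-up melody.
database = {
    "song_a": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
    "song_b": [64, 62, 60, 62, 64, 64, 64],
}
query = [60.3, 59.8, 60.1, 66.8, 67.2, 69.1, 68.7, 67.0]  # made-up pitch vector
for name, d in rank_songs(query, database):
    print(name, round(d, 2))
```

Trying a range of shifts multiplies the cost by the number of shifts; how the real system handles transposition, anchor points, and note durations is not specified on the slide.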
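Exercise
• Find the alignment between the following two sequences, assuming an anchored beginning and a free end.
  • Singing pitch vector = [12 11 15 16 11 12 9 10]
  • Music note vector = [11 15 10 18]

Grid of |p(i) - q(j)|, with music notes q(j) as rows (q(1) at the bottom) and singing pitch p(i) as columns:

q(4) = 18 |  6  7  3  2  7  6  9  8
q(3) = 10 |  2  1  5  6  1  2  1  0
q(2) = 15 |  3  4  0  1  4  3  6  5
q(1) = 11 |  1  0  4  5  0  1  2  1
     p(i) | 12 11 15 16 11 12  9 10

[Figure: solution, with the optimal path marked on this grid]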
