Audio Fingerprinting Based On Multiple
Abstract—Audio fingerprinting techniques aim at successfully performing content-based audio identification even when the audio signals are slightly or seriously distorted. In this letter, we propose a novel audio fingerprinting technique based on multiple hashing. To improve the robustness of hashing, multiple hash strings are generated through the discrete cosine transform (DCT), which is applied to the temporal energy sequence in each subband. Experimental results show that the proposed algorithm outperforms the Philips Robust Hash (PRH) algorithm [1] under various distortions.

Index Terms—Audio fingerprinting, content-based audio identification, discrete cosine transform (DCT), robust hashing.

I. INTRODUCTION

... relies on the assumption that at least one of the so-called subfingerprints is invariant to noise. Although this assumption holds in mild conditions (e.g., MP3 compression, downsampling, and equalization), it fails for some seriously corrupted audio clips, such as those recorded in a noisy environment, and hence results in a serious performance degradation.

For such reasons, some efforts have been made to improve the robustness of subfingerprints. One possible method mentioned in [1] suggests generating a list of the most probable candidates for each subfingerprint based on the reliability information obtained from soft coding, although the reliability information is itself not reliable and the approach proves less effective in practical implementations. Another method is to view extracting subfingerprints from the spectrogram as 2-D filtering in the spectro-temporal domain and to try substituting the filters [7], which is empir-
Fig. 1. Overall block diagram of the fingerprint extraction stage in the PRH algorithm.

Fig. 3. Overall block diagram of the fingerprint extraction stage in the MLH method.
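
As a rough illustration of the multiple hashing idea summarized in the abstract (a DCT applied to the temporal energy sequence of each subband, with one hash string per retained coefficient order), a minimal Python sketch is given here. The function names, the layout of the energy matrix, the number of retained coefficients, and in particular the bit rule (comparing one coefficient across adjacent subbands) are assumptions made for illustration only; they are not taken from the letter and should not be read as the MLH method itself.

```python
import math


def dct_ii(x, num_coeffs):
    """Return the first num_coeffs unnormalized DCT-II coefficients of x."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(num_coeffs)
    ]


def multiple_hashes(energy, num_coeffs=4):
    """Build one hash string per DCT coefficient order.

    energy[t][b] is the energy of frame t in subband b (hypothetical layout).
    Hash string k uses only the k-th DCT coefficient of every subband.
    """
    num_bands = len(energy[0])
    # DCT along the time axis, one subband at a time.
    coeffs = [
        dct_ii([frame[b] for frame in energy], num_coeffs)
        for b in range(num_bands)
    ]
    hashes = []
    for k in range(num_coeffs):
        # Assumed bit rule: 1 if coefficient k is larger in band b than in band b + 1.
        bits = [
            "1" if coeffs[b][k] > coeffs[b + 1][k] else "0"
            for b in range(num_bands - 1)
        ]
        hashes.append("".join(bits))
    return hashes
```

Under these assumptions, each hash string could index its own hash table, so that a query whose lowest-order string is corrupted may still be matched through another coefficient order.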
... conventional recall rate in that the latter excludes only the first type of false negatives.

In addition, to show the performance variation with different combinations of hash tables, we set the threshold to be 0.35 as in [1] and measured the results. The four hash tables employed are denoted as HT1, HT2, HT3, and HT4. Here HT$k$ was built from the $k$th DCT coefficients; for example, HT1 was constructed from the DC components. For comparison, the hash table constructed in the PRH algorithm is represented as HT0. For each

REFERENCES
[1] J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system,” in Proc. 3rd Int. Conf. Music Information Retrieval, Oct. 2002, pp. 107–115.
[2] P. Cano, E. Batlle, T. Kalker, and J. Haitsma, “A review of audio fingerprinting,” J. VLSI Signal Process., vol. 41, no. 3, pp. 271–284, Nov. 2005.
[3] A. Wang, “An industrial strength audio search algorithm,” in Proc. 4th Int. Conf. Music Information Retrieval, Oct. 2003, pp. 7–13.
[4] F. Balado, N. Hurley, E. McCarthy, and G. Silvestre, “Performance analysis of robust audio hashing,” IEEE Trans. Inform. Forensics Security, vol. 2, no. 2, pp. 254–266, June 2007.
[5] P. Doets and R. Lagendijk, “Distortion estimation in compressed music using only audio fingerprints,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 2, pp. 302–317, Feb. 2008.
[6] J. Haitsma and T. Kalker, “Speed-change resistant audio fingerprinting using auto-correlation,” in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Apr. 2003, vol. 4, pp. 728–731.
[7] M. Park, H. Kim, Y. Ro, and M. Kim, “Frequency filtering for a highly robust audio fingerprinting scheme in a real-noise environment,” IEICE Trans. Inform. Syst., vol. E89-D, no. 7, pp. 2324–2327, July 2006.
[8] K. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York: Academic, 1990.
[9] N. Ahmed, T. Natarajan, and K. Rao, “Discrete cosine transform,” IEEE Trans. Comput., pp. 90–93, Jan. 1974.
[10] J. Xi and J. Chicharo, “Computing running DCTs and DSTs based on their second-order shift properties,” IEEE Trans. Circuits Syst. I: Fund. Theory Applicat., vol. 47, no. 5, pp. 779–783, May 2000.