


Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010

EXPLORING PHASE INFORMATION IN SOUND SOURCE SEPARATION APPLICATIONS

Estefanía Cano, Gerald Schuller and Christian Dittmar


Fraunhofer Institute for Digital Media Technology
Ilmenau, Germany
{cano,shl,dmr}@idmt.fraunhofer.de

ABSTRACT

Separation of instrument sounds from polyphonic music recordings is a desirable signal processing functionality with a wide variety of applications in music production, music video games and music information retrieval. In general, sound source separation algorithms attempt to exploit those characteristics of audio signals that differentiate one from the other. Many algorithms have studied spectral magnitude as a means for separation tasks. Here we propose the exploration of phase information of musical instrument signals as an alternative dimension in discriminating sound signals originating from different sources. Three cases are presented: (1) Phase contours of musical instrument notes as potential separation features. (2) Resolving overlapping harmonics using phase coupling properties of musical instruments. (3) Harmonic/percussive decomposition using calculated radian ranges for each frequency bin.

1. INTRODUCTION AND PREVIOUS WORK

1.1. Phase in Music Information Retrieval

The importance of phase in signals was thoroughly described by Oppenheim and Lim in the 1980s [1]. They describe different scenarios where important features of the signal are only preserved if the spectral phase, as opposed to the spectral magnitude, is retained. Among other applications, the authors present different examples of images and speech where relevant information of the signal is retained in phase-only reconstructions, where the spectral magnitude is either set to unity, randomly selected or averaged over an ensemble of signals. The contrasting case, where synthesis is performed by preserving the spectral magnitude with zero phase, i.e., magnitude-only reconstructions, preserves far less of the relevant features of the signal and decreases intelligibility. In this sense, the authors argue that many of the important features preserved in phase-only reconstructions are due to the fact that the location of events, e.g., lines and points in images, is retained. Bearing in mind that the spectral magnitude of speech and images tends to fall off at high frequencies, phase-only reconstructions with unity magnitude can be interpreted as a spectral whitening process where the signal experiences a high-frequency boost that accentuates lines, edges, and narrow events without modifying their location.¹ Furthermore, the authors address the cases and conditions where some or all the magnitude information of a signal can be extracted from its phase. The minimum-phase condition, where a signal can be recovered to within a scale factor from its phase, is discussed and iterative techniques for this purpose are presented in [2]. For the particular case of discrete time signals a set of conditions is also presented: a sequence which is zero outside the interval 0 ≤ n ≤ (N-1) is uniquely specified to within a scale factor by (N-1) samples of its phase in the interval 0 < ω < π if it has a z-transform with no zeros on the unit circle or in conjugate reciprocal pairs.

More recently, Dubnov has explored the use of phase information for musical instrument characterization, modelling, coding and classification. Based on the fact that second order statistics and power spectra are phase blind, he proposes the use of Higher Order Statistics (HOS) and their associated Fourier transforms, i.e., polyspectra, to describe phase variations that cannot be revealed by regular spectral analysis. Polyspectra are the mathematical generalization of the power spectrum, maintaining not only magnitude but also phase information. In [3], Dubnov uses HOS to estimate sinusoidality for quality coding of musical instruments. His system is based on the fact that the analysis of sinusoidal harmonics leads to linear or almost linear phases, as opposed to the analysis of stochastic harmonics, which leads to random phases. In synthetic, perfectly periodic signals, spectral components are integer multiples of the fundamental frequency and thus frequency coupled. Similarly, the relative phases of the harmonics follow the phase of the fundamental and consequently the spectral components are phase coupled. These non-linear interactions between spectral components of the signal are assessed using the bicoherence as a detector of frequency coupling and kurtosis as a measure of phase coupling. These measures are used to classify a certain frequency bin as noise or harmonic content.

In [4] Dubnov further explores the use of non-linear polyspectra methods in the development of a Harmonic + Noise model for analysis and synthesis of vocal and musical instrument sounds. Once again, the use of bicoherence is proposed as a sinusoidality measure in each frequency band. Based on the fact that spectral components of certain musical instruments can show considerable sinusoidal phase deviations without actually causing the spectral peak to become noise, the phase noise variance for harmonic partials is estimated using the phase coupling measure.

Dubnov & Rodet investigate in [5] the phase coupling phenomena in the sustained portion of musical instrument sounds. It is well known that acoustical musical instruments never produce waveforms that are exactly periodic. In this sense, two different conditions are analyzed: synchronous phase deviations of proportional magnitude, which preserve phase relations between partials, and asynchronous deviations, which do not preserve phase relations and consequently change the shape of the signal.

¹ An example of a phase-only reconstruction of a speech signal can be found at http://www.idmt.fhg.de/eng/business%20areas/analysis_audio_signals.htm.
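The phase-only and magnitude-only reconstructions discussed in [1] can be sketched in a few lines. The following numpy example is our illustration rather than anything from the original paper, and it operates on the whole signal instead of frame by frame:

```python
import numpy as np

def phase_only(x):
    # Keep the spectral phase, set the magnitude to unity: event locations
    # (transients, edges) survive, while the overall coloration is lost.
    X = np.fft.rfft(x)
    return np.fft.irfft(np.exp(1j * np.angle(X)), n=len(x))

def magnitude_only(x):
    # Keep the spectral magnitude, set the phase to zero: the result has the
    # original spectrum but loses the timing information carried by the phase.
    X = np.fft.rfft(x)
    return np.fft.irfft(np.abs(X), n=len(x))
```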


A measure of phase coupling called Quadratic Phase Coupling (QPC) is used, and its equivalency under certain assumptions to the discrete bispectrum, i.e., the 2D Fourier transform of the third order cumulant function, is presented. Phase correlation is analyzed by calculating the instantaneous frequency by means of the unwrapped phase and obtaining the fluctuations around an ideal theoretical value derived from the fundamental frequency, i.e., f_k = k · f_0. This procedure eliminates phase deviations due to vibrato and slight pitch changes. Flute, trumpet and cello sounds are analyzed and the results suggest that different instruments, and possibly different instrument families, have distinct phase coupling characteristics, i.e., the trumpet signal exhibits high QPC values and thus strong phase coupling among partials, whereas the flute signal shows some correlation but its phase deviations are mostly uncoupled.

In [6] Cont & Dubnov expand the concept of phase coupling in musical instrument sounds to a real time multiple pitch and multiple instrument recognition system. They propose the use of the modulation spectrum presented in [7], as it is a good representation of phase coupling in musical instruments, shows both short-term and long-term information about the signal and is a non-negative representation.

In [8] Paraskevas and Chilton present an audio classification system that uses both magnitude and phase information as statistical features. The problem of phase discontinuity is addressed and two different types of such discontinuities are described: extrinsic discontinuities caused by the computation of the inverse tangent function, and intrinsic discontinuities that arise from properties of the physical system producing the data and that occur due to simultaneous zero crossings of the real and imaginary components of the Fourier spectrum. An alternative method to calculate phase, which overcomes both discontinuity problems and uses the z-transform of the signal, is proposed. The classification system was tested using gunshot signals and the results show a 14% performance improvement for certain classes compared to the case where only magnitude features are used. Furthermore, classification rates are also evaluated with phase information only. In general, classification rates are lower for phase-only features than for magnitude-only features; however, certain classes prove to be very well characterized by their phase information.

In [9] Woodruff, Li & Wang propose the use of common amplitude modulation (CAM), pitch information and a sinusoidal signal model to resolve overlapping harmonics in monaural musical sound separation. To estimate the amplitude of the overlapping harmonic, the amplitude envelopes of sinusoidal components of the same source are assumed to be correlated. This means that the unknown amplitude can be approximated from the amplitude envelopes of non-overlapped harmonics of the same source. Pitch information is used to predict the phase of the overlapped harmonic by calculating the phase change of the spectral component on a frame by frame basis:

Δφ_{h_n}(m) = 2π h_n F_n(m) T    (1)

where m denotes the time frame, h_n and F_n the harmonic number and fundamental frequency of source n respectively, and T the frame shift in seconds. A least-squares solution approach is used to obtain the sinusoidal parameters. The phase change prediction error is calculated and the results show that reliable estimations can be obtained for lower numbered harmonics.

1.2. Sound Source Separation

Many algorithms designed for sound source separation are solely based on analysis and processing of the spectral magnitude information of an audio file. Virtanen, for example, proposes in [10] a separation algorithm based on Nonnegative Matrix Factorization (NMF) of the magnitude spectrogram into a sum of components with fixed magnitude spectra and time varying gains. The system uses an iterative approach to minimize the reconstruction error and incorporates a cost function that favors temporal continuity by means of the sum of squared differences between gains in adjacent frames. A sparseness measure is also included by penalizing nonzero gains. Fitzgerald et al. [11] have extensively explored the use of Nonnegative Tensor Factorization as an extension of matrix factorization techniques for source separation. Tensors are built with magnitude spectrograms from the different audio channels and iterative techniques are used to find the different components in the mix. Shift invariance in the frequency domain has been explored and a sinusoidal shifted 2D nonnegative tensor factorization (SSNTF) algorithm has been proposed where the signal is modeled as a summation of weighted harmonically related sinusoids. In [12] Burred uses the evolution of the spectral envelope in time together with a Principal Component Analysis (PCA) approach to create a prototype curve in the timbral space which is then used as a template for grouping and separation of sinusoidal components from an audio mixture. Every and Szymanski propose in [13] a spectral filtering approach to source separation. The system detects salient spectral peaks and creates pitch trajectories over all frames. The peaks are then matched to note harmonics and filters are created to remove the individual spectrum of each note from within the mixture. Ono et al. proposed in [14] a harmonic/percussive separation algorithm that exploits the anisotropy of the gradient spectrograms with an auxiliary function approach to separate the mix into its constituent harmonic and percussive components.

The remainder of this paper is organized as follows: Section 2 presents three scenarios where the use of phase information is relevant and describes three proposed algorithms. Section 3 presents some conclusions and a final discussion of possible future approaches. Sections 4 and 5 contain the acknowledgements and references.

2. PHASE IN SOUND SOURCE SEPARATION: THREE PROPOSED ALGORITHMS

Spectral magnitude can be informative, intuitive, and numerically simple. However, working with magnitude information in source separation presents several difficulties: many separation algorithms rely on the use of instrument models or spectral envelope templates, which suffer from the large diversity in terms of recordings, playing styles, register, instrument models and performers, and this can make them unreliable. Furthermore, some algorithms rely on assumptions about the magnitude spectrum, such as spectral smoothness or harmonicity, which might not always be fulfilled. Some systems also rely on pitch tracking algorithms to find the evolution of the harmonic components in time, and even though solid results can be achieved, the performance of such algorithms will suffer under noisy conditions.

As opposed to spectral magnitude, interpreting phase information is a more challenging task as it is not visually intuitive, presents numerical discontinuities that need to be dealt with and in its pure form might not be very informative. It is also clear that phase on its own might not be sufficient to achieve solid sound separation. However, it is our belief that phase can be complementary to magnitude information, increase robustness and enhance performance in separation algorithms.
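For reference, eq. (1) from Section 1.1 translates directly into code. The sketch below is our illustration; the wrap to the principal value is added so the prediction can be compared with measured frame-to-frame STFT phase differences:

```python
import numpy as np

def predicted_phase_change(h_n, f0_hz, frame_shift_s):
    """Eq. (1): expected frame-to-frame phase change (radians) of harmonic h_n
    of a source with fundamental frequency F_n(m), for a frame shift of T seconds."""
    dphi = 2.0 * np.pi * h_n * f0_hz * frame_shift_s
    return np.angle(np.exp(1j * dphi))  # wrap to the principal value in [-pi, pi]

# Example: third harmonic of a C5 fundamental (about 523.25 Hz),
# with a 512-sample frame shift at 44.1 kHz.
dphi = predicted_phase_change(3, 523.25, 512 / 44100.0)
```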


Three scenarios will be presented where phase information has been used in separation tasks.

2.1. Phase contours of musical instrument notes as separation features

In general, the principle of Common Fate states that different parts of the spectrum that change in the same way in time will probably belong to the same environmental sound [15]. In this sense, two types of changes can be studied: frequency modulation changes and amplitude modulation changes. Amplitude modulation changes in sound separation applications have been studied in [10, 11, 12]. Here we are concerned with changes in the frequency and phase of harmonic components belonging to the same source: in a mixture of sounds, any partials that change in frequency in an exactly synchronous way and whose temporal paths are parallel on a log-frequency scale are probably partials that have been derived from a single acoustic source [15]. Furthermore, we explore the importance of micromodulations in harmonic partials as a sound separation feature. Micromodulations refer to small frequency modulations that occur naturally in the human voice and musical instruments and that have potent effects on the perceptual grouping of the component harmonics.

Four different signals are studied: (1) C5 violin note, (2) C5 trumpet note, (3) C5 clarinet note and (4) C5 piano note. All signals are monophonic tracks with a sampling frequency Fs = 44100 Hz taken from the University of Iowa Musical Instruments Database [16]. A 4096-sample Hann window is used with a hop size T = 512 samples. For the different frequency bins, the Fourier phase is differentiated in time and the phase increments between time frames are found. Inherent discontinuities in the phase values are resolved and kept within a [-π, π] range. This procedure would be equivalent to finding the instantaneous frequency if the phase values were divided by the hop size T and normalized with the sampling period. The basic assumption behind this procedure is that if there is a tonal component, some linearity within the phase values can be expected without placing any constraint in terms of pitch variations or applied vibrato. If such variations are large enough, frequency bin shifting might occur, i.e., the observed harmonic component might present itself in different frequency bins through the duration of the signal. Frequency bin shifting in harmonic components can in general be expected to happen in adjacent frequency bins, making it easier to track changes.

The phase contours obtained for the four signals are presented in Figures 1-4. For all cases the pitch detection algorithm described in [17] was used to detect relevant peaks in the audio track. Phase contours are presented for the fundamental frequency and the most prominent harmonic components. It can be seen that for the violin, clarinet and trumpet notes the micromodulations in frequency follow similar trajectories and the principle of Common Fate can be observed. However, for the piano note, the micromodulations in frequency seem to be completely uncorrelated. Figure 2 shows the phase contours for the trumpet C5 note with vibrato. The large variations in the phase contours exhibit both the extent and frequency of the vibrato and how it presents itself in the different harmonics. Instead of removing vibrato as in [5], this approach sees vibrato as a potential feature for sound separation. It can be seen in both the clarinet (Figure 3) and violin (Figure 1) notes that for the attacks and decays of the notes Common Fate is not so clear and the micromodulations are not so correlated. It is important to mention that for harmonic components whose magnitude is very close to zero, phase values are completely uninformative and do not provide solid information for separation applications.

This approach presents several benefits that can be exploited: (1) No assumption has to be made regarding the harmonicity of musical instruments, as harmonic components can be tracked by looking for similar phase trajectories in time. (2) Common onset and decay of harmonic components can also be potentially exploited, as phase values fall out of a predicted range (see Section 2.3) when a tone is not present. In contrast, this approach also presents several difficulties: collisions of harmonic components can be misleading, as it has been observed that in such cases the phase trajectory of the harmonic component with the largest magnitude prevails, showing once more the intricate relationship between spectral phase and magnitude. Figure 5 shows an example where a clarinet C5 note and a trumpet G5 note have been mixed and a harmonic collision is present between the clarinet's second harmonic (H2) and the trumpet's first harmonic (H1). In this signal, the amplitude of the trumpet note was much higher than that of the clarinet note, and it can be seen that the phase trajectory for this harmonic follows the trajectory of the trumpet's F0. In Section 2.2 a method to resolve harmonic collisions is presented. Particularly for higher harmonics, frequency bin shifting makes tracking a more complex task. However, by exploring alternative frequency resolutions in the time-frequency transform, both overlapping between harmonics and frequency bin shifting can be minimized. Approaches like multiresolution Fourier transforms [18] or logarithmic frequency resolution can be explored. As shown in Figure 4 for the piano note, not all instruments show properties in the phase trajectories that exhibit Common Fate, and consequently the approach is instrument dependent.

2.2. Resolving overlapping harmonics using phase coupling properties of musical instruments

As discussed in Section 1, phase coupling is an important characteristic of musical instrument sounds. In general, phase coupling implies that for a triplet of harmonically related partials with harmonic numbers j, k, and h, with h = j + k, any deviations that occur in their respective phases φ_j, φ_k will sum up to occur identically in φ_h:

φ_j + φ_k − φ_h ≈ 0    (2)

As presented by Dubnov & Rodet in [5], the phase coupling characteristics of musical instruments differ across instrument families and types. In general, musical instruments are never perfectly phase-coupled and deviations are always expected. However, when it comes to resolving overlapped harmonics, where the frequency information of the different components is hidden within the mix, we propose the use of phase coupling properties to estimate the frequency information of the overlapped component. For such an estimation, one condition must be fulfilled: to be able to estimate information of harmonic h, it must be guaranteed that the information of at least two harmonics j and k which fulfill the condition h = j + k is available.

Two signals were analyzed for this purpose: (1) Trumpet C5 note. The sixth harmonic H6 is reconstructed using H2 and H4. (2) Violin C5 note. The third harmonic H3 is reconstructed using H1 and H2.
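Sections 2.1 and 2.2 can be summarized in a short sketch. The following numpy/scipy code is our illustration, not the authors' implementation; the analysis settings follow Section 2.1, the function names are ours, and the caller is assumed to know the bins of the non-overlapped harmonics (e.g., from the pitch detector of [17]):

```python
import numpy as np
from scipy.signal import stft

def phase_contours(x, fs=44100, win=4096, hop=512):
    """Frame-to-frame phase increments per frequency bin, wrapped to [-pi, pi]
    (the phase contours of Section 2.1)."""
    _, _, X = stft(x, fs=fs, window='hann', nperseg=win, noverlap=win - hop)
    dphi = np.diff(np.angle(X), axis=1)   # differentiate the Fourier phase in time
    return np.angle(np.exp(1j * dphi))    # resolve discontinuities, keep in [-pi, pi]

def predict_overlapped_contour(dphi_j, dphi_k):
    """Eq. (2) with h = j + k: deviations in phi_j and phi_k add up in phi_h,
    so the increments of the overlapped harmonic are estimated from j and k."""
    return np.angle(np.exp(1j * (dphi_j + dphi_k)))

# Usage sketch: with bin_j, bin_k the bins of two non-overlapped harmonics
# (e.g. H2 and H4 of the trumpet note), the contour of H6 is estimated as
#   dphi = phase_contours(x)
#   h6_estimate = predict_overlapped_contour(dphi[bin_j], dphi[bin_k])
```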


Figure 1: Phase contours obtained for a C5 violin note. The fundamental frequency F0 and the first three harmonic components are
shown.

Figure 2: Phase contours obtained for a C5 trumpet note with vibrato. The fundamental frequency F0 and the first three harmonic components are shown.

Figure 3: Phase contours obtained for a C5 clarinet note. The fundamental frequency F0 and the first four harmonic components are
shown.


Figure 4: Phase contours obtained for a C5 piano note. The fundamental frequency F0 and the first three harmonic components are
shown.

Figure 5: Phase contours obtained for a C5 clarinet note + G5 trumpet note mix. The fundamental frequencies for both instruments and
an overlapped harmonic are shown.

Prediction errors of the overlapped harmonics are presented in Figure 6 and the corresponding phase contours obtained are shown in Figures 7 and 8. For visualization purposes and to avoid contour overlapping, the estimated contours in Figures 7 and 8 have been given a 0.3 vertical offset. Consequently, the upper contour in both figures represents the estimated harmonic and the lower contour represents the true harmonic. The results show that as long as condition (1) is fulfilled, reconstructing the frequency information of overlapped harmonics is possible for certain instrument types by exploiting the phase coupling properties of musical instruments. As for the magnitude information, the approach used by Woodruff [9] or the iterative techniques proposed in [2] to reconstruct magnitude from phase can be explored.

Figure 6: Prediction Error for overlapped harmonics. Column one represents the trumpet and column two the violin harmonic.

As in most source separation algorithms, being able to determine where the harmonic collisions appear is not a simple task. However, if a certain number of harmonic components that exhibit similar phase trajectories as in Section 2.1 have been detected, the prediction of a missing overlapped harmonic can be made using harmonicity pointers and searching the spectrogram for prominent harmonics. Furthermore, it is important to mention that phase coupling properties are different for all musical instruments and consequently the performance of such a system will also be instrument dependent.

2.3. Harmonic/Percussive decomposition using calculated radian ranges for every frequency bin

In this case we exploit the fact that for a certain frequency bin, the phase values of tonal components will fall within a radian range determined by the frequency band covered by the frequency bin and the hop size T of the time-frequency transform. In particular, the condition of phase linearity is relaxed and micromodulations of frequency are allowed within the radian range of the frequency bin. Phase values outside the calculated range are assumed non-tonal and consequently classified as percussive components. A percussive-harmonic spectral mask is created for both the phase and magnitude spectrograms and applied for synthesizing the harmonic and percussive tracks. It has been observed that when percussive and tonal components are simultaneously present in a particular time frame and frequency bin, the phase values of the percussive component prevail and in general they do not lie within the radian ranges calculated for every frequency bin. In this sense, a strict sound separation task is not being performed, as phase values outside the range imply the presence of a percussive component but not necessarily the absence of a tonal one. In such a case, no estimation of the hidden tonal component is performed and the information of that frequency bin in that time frame is assumed to be percussive.
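One plausible realization of this per-bin radian-range test is sketched below. This is our illustrative numpy/scipy code, not the authors' implementation; it assumes the analysis settings of Section 2.1 and expresses the radian range as a tolerance of half a bin width around each bin's expected phase advance per hop:

```python
import numpy as np
from scipy.signal import stft, istft

def harmonic_percussive_masks(x, fs=44100, win=4096, hop=512):
    """Binary harmonic/percussive masks from per-bin radian ranges (sketch)."""
    _, _, X = stft(x, fs=fs, window='hann', nperseg=win, noverlap=win - hop)
    k = np.arange(X.shape[0])[:, None]            # frequency bin indices
    expected = 2.0 * np.pi * k * hop / win        # phase advance per hop at the bin centre
    tol = np.pi * hop / win                       # half a bin width, in radians per hop
    dphi = np.angle(np.exp(1j * np.diff(np.angle(X), axis=1)))
    dev = np.angle(np.exp(1j * (dphi - expected)))  # deviation from the expected advance
    tonal = np.abs(dev) <= tol                    # inside the radian range -> tonal
    harmonic = np.concatenate([tonal[:, :1], tonal], axis=1)  # repeat mask for frame 0
    return harmonic, ~harmonic, X

# The masks are applied to the complex STFT (i.e. to magnitude and phase alike) and the
# harmonic and percussive tracks are obtained with the inverse STFT, e.g.:
#   harm, perc, X = harmonic_percussive_masks(x)
#   _, x_harm = istft(X * harm, fs=44100, window='hann', nperseg=4096, noverlap=4096 - 512)
```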


Figure 7: Estimated and true phase trajectories for the sixth harmonic H6 of a trumpet note. Blue: Estimated. Red: True.

Figure 8: Estimated and true phase trajectories for the third harmonic H3 of a violin note. Blue: Estimated. Red: True.

The algorithm is summarized as follows:

1. Calculate the STFT of the input audio signal.
2. For every subband k in the STFT, calculate the minimum and maximum radian changes using eq. (1).
3. Create the binary spectral masks: for every time frame m and subband k, check if the phase values fall within the calculated radian ranges. Values within these ranges are assumed tonal and values outside the ranges percussive.
4. Apply the masks to both the phase and magnitude spectrograms.
5. Obtain the percussive and harmonic audio signals with the inverse STFT.

To test the algorithm, three mixtures were created from multi-track recordings available in [19]. The three tracks used for evaluation are (1) Natural Minor (Nm), (2) Seven Years of Sorrow (7Y) and (3) Wreck (WR). The algorithm was used to create independent harmonic and percussive tracks, and the SISEC evaluation toolbox [20] was used to assess the algorithm's performance using the original multi-track recordings for comparison. The Signal to Distortion Ratio (SDR), Signal to Artefacts Ratio (SAR) and Signal to Interference Ratio (SIR) are presented for the three signals. A thorough description of these measures and their calculation is presented in [21]. The performance measures obtained are presented in Table 1 and the audio tracks obtained can be heard on the project's web site.² For comparison purposes, the percussive and harmonic tracks obtained with Ono's [14] algorithm are also available on the web site. For Ono's algorithm the following parameters, as proposed by the authors for best performance, were used: α = 0.3, γ = 0.3 and the maximum number of iterations was set to 50. No direct numeric comparison with Ono's algorithm is presented, as the performance measures used do not correlate directly to any perceptual attribute, and in certain cases, particularly when the perceived loudness of interference or artifacts is much smaller than the power of the corresponding signals, the numbers can be misleading. For this reason only a perceptual comparison is presented.

² http://www.idmt.fhg.de/eng/business%20areas/analysis_audio_signals.htm

Especially for the harmonic tracks obtained, the performance measures show good results with positive ratios in all cases. As expected, performance for the percussive tracks is much lower, falling into negative values. In an auditory evaluation, the harmonic and percussive components are well separated into the respective tracks. Bass drums and singing voice are particularly challenging, as in both cases elements from each source are placed in both the harmonic and percussive tracks.

Table 1: Performance measures obtained for the three analyzed tracks.

Track   Component      SDR        SAR        SIR
1. Nm   Harmonic     6.8141    12.4192     8.4533
        Percussive  -7.9250    -3.5217    -0.8490
2. 7Y   Harmonic       2.88       9.78       4.30
        Percussive    -8.02      -4.59       0.49
3. WR   Harmonic     2.1900     8.1912     4.0591
        Percussive  -5.2766    -2.2619     2.0162

3. CONCLUSIONS

Three cases have been presented where phase information has been used in sound separation problems. In all cases, phase information appears to be informative and somewhat complementary to the use of magnitude information. Phase contours for musical instruments exhibit similar micromodulations in frequency for certain instruments and can be an alternative to spectral instrument templates or instrument models. For the case of overlapped harmonics, phase coupling properties can be exploited for certain instruments. For the two instruments presented, the estimated harmonics show prediction errors lower than 0.05 radians. For the harmonic/percussive decomposition, radian ranges have been calculated for every frequency bin and, by relaxing phase linearity and allowing frequency variations, tonal components have been detected. The spectral mask created allows discriminating not only magnitude but also phase information belonging to the harmonic and percussive components. Both the harmonic and percussive tracks obtained can be used to facilitate transcription applications. Further studies have to be made in order to assess the performance and robustness of the algorithms in more complex and demanding scenarios.


A possible extension to this approach is the use of Modulation Spectra as a means of exhibiting frequency variations in the different frequency bins.

4. ACKNOWLEDGMENTS

The Thuringian Ministry of Economy, Employment and Technology supported this research by granting funds of the European Fund for Regional Development to the project Songs2See, enabling transnational cooperation between Thuringian companies and their partners from other European regions.

5. REFERENCES

[1] Oppenheim, Alan V. and Lim, Jae S., "The importance of phase in signals," Proceedings of the IEEE, pp. 529-550, 1981.
[2] Quatieri, Thomas F. and Oppenheim, Alan V., "Iterative techniques for minimum phase signal reconstruction from phase or magnitude," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 29, no. 6, pp. 1187-1192, 1981.
[3] Dubnov, Shlomo, "Higher order statistical estimation of sinusoidality with applications for quality coding of musical instruments," in AES 17th International Conference on High Quality Audio Coding, Florence, Italy, Sept. 1999.
[4] Dubnov, Shlomo, "Improved harmonic + noise model for vocal and musical instrument sounds," in AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, June 2002.
[5] Dubnov, Shlomo and Rodet, Xavier, "Investigating the phase coupling phenomena in sustained portion of musical instruments sound," Journal of the Acoustical Society of America, vol. 113, no. 1, pp. 348-359, 2003.
[6] Cont, Arshia and Dubnov, Shlomo, "Real time multi-pitch and multi-instrument recognition for music signals using sparse non-negative constraints," in Proceedings of the 10th International Conference on Digital Audio Effects (DAFx), Bordeaux, France, 2007.
[7] Sukittanon, Somsak, Atlas, Les E. and Pitton, James W., "Modulation-scale analysis for content identification," IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 3023-3035, 2004.
[8] Paraskevas, Ioannis and Chilton, Edward, "Combination of magnitude and phase statistical features for audio classification," Acoustics Research Letters Online (ARLO), vol. 5, no. 3, pp. 111-117, 2004.
[9] Woodruff, John, Li, Yipeng and Wang, DeLiang, "Resolving overlapped harmonics for monaural musical sound separation using pitch and common amplitude modulation," in International Conference on Music Information Retrieval (ISMIR), Philadelphia, Sept. 2008, pp. 538-543.
[10] Virtanen, Tuomas, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 3, pp. 1066-1074, 2007.
[11] Fitzgerald, Derry, Cranitch, Matt and Coyle, Eugene, "Extended nonnegative tensor factorization models for musical sound source separation," Computational Intelligence and Neuroscience, Hindawi Publishing Corporation, 2008.
[12] Burred, Juan Jose, "From Sparse Models to Timbre Learning: New Methods for Musical Sound Separation," PhD Thesis, Elektrotechnik und Informatik der Technischen Universität Berlin, 2009.
[13] Every, Mark R. and Szymanski, John E., "A spectral filtering approach to music signal separation," in 7th International Conference on Digital Audio Effects (DAFx), Naples, Italy, 2004.
[14] Ono, Nobutaka et al., "Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram," in 16th European Signal Processing Conference (EUSIPCO), Lausanne, Switzerland, Aug. 2008.
[15] Bregman, Albert S., Auditory Scene Analysis: The Perceptual Organization of Sound, Cambridge: MIT Press, 1999.
[16] University of Iowa Musical Instrument Samples. Available at http://theremin.music.uiowa.edu/. Accessed October 10, 2010.
[17] Cano, Estefanía and Cheng, Corey, "Melody line detection and source separation in classical saxophone recordings," in Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Como, Italy, Sept. 1-4, 2009.
[18] Dressler, Karin, "Sinusoidal extraction using an efficient implementation of a multi-resolution FFT," in 9th International Conference on Digital Audio Effects (DAFx), Montreal, Canada, Sept. 2006.
[19] Multitrack recordings. Available at http://bass-db.gforge.inria.fr/BASS-dB/?show=browse&id=mtracks. Accessed March 10, 2010.
[20] SISEC evaluation software. Available at http://www.irisa.fr/metiss/members/evincent/software. Accessed March 10, 2010.
[21] Vincent, Emmanuel, Gribonval, Rémi and Févotte, Cédric, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1462-1469, 2006.

