Noise Reduction For Periodic Signals Using High-Resolution Frequency Analysis


Yoshizawa et al.

EURASIP Journal on Audio, Speech, and Music Processing 2011, 2011:5


http://asmp.eurasipjournals.com/content/2011/1/5

RESEARCH Open Access

Noise reduction for periodic signals using high-resolution frequency analysis
Toshio Yoshizawa, Shigeki Hirobayashi* and Tadanobu Misawa

Abstract
The spectrum subtraction method is one of the most common methods by which to remove noise from a spectrum.
Like many noise reduction methods, the spectrum subtraction method uses discrete Fourier transform (DFT) for
frequency analysis. There is generally a trade-off between frequency and time resolution in DFT. If the frequency
resolution is low, then the noise spectrum can overlap with the signal source spectrum, which makes it difficult to
extract the latter signal. Similarly, if the time resolution is low, rapid frequency variations cannot be detected. In order
to solve this problem, as a frequency analysis method, we have applied non-harmonic analysis (NHA), which has high
accuracy for detached frequency components and is only slightly affected by the frame length. Therefore, we
examined the effect of the frequency resolution on noise reduction using NHA rather than DFT as the preprocessing
step of the noise reduction process. The accuracy in extracting single sinusoidal waves from a noisy environment was
first investigated. The accuracy of NHA was found to be higher than the theoretical upper limit of DFT. The
effectiveness of NHA and DFT in extracting music from a noisy environment was then investigated. In this case, NHA
was found to be superior to DFT, providing an approximately 2 dB improvement in SNR.

* Correspondence: [email protected]
Department of Intellectual Information Systems Engineering, Faculty of Technology, University of Toyama, 3190 Gofuku, Toyama-shi, Toyama, Japan

© 2011 Yoshizawa et al; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction
Noise reduction to recover a target signal from an input waveform is important in a number of fields. We usually use a frequency spectrum to remove noise from the input waveform. Although it is difficult to distinguish a signal from the noise in the time domain, this task tends to become easier in the frequency domain. However, it is difficult to filter out noise that is similar to a signal. For example, a consonant is a part of speech whose frequency spectrum is similar to that of noise. This study proposes a basic technology by which to remove noise from musical sound that includes several periodic signals. We selected white noise and pink noise as the noise signals. These noises are common in cities as well as in nature and have a continuous spectrum. Based on this study, we can remove wideband noise, such as pulse and white noise, from an old music recording in order to apply digital remastering in multimedia industries. We will also be able to remove noise from a recording of a singing voice because this is a periodic signal. When listening to music in a high-noise environment, difficulty in hearing the music and the presence of ambient noise can decrease the level of enjoyment. Therefore, various noise reduction methods are being investigated, and a number of noise reduction techniques have been proposed. The spectral subtraction method (SS method) is a widely used approach [1] in which the target signal is extracted from a noisy signal by measuring the noise in advance and modeling the statistical spectral envelope characteristics [2-4]. The SS method does not require multiple microphones, and highly effective results can be obtained by using a relatively simple algorithm. For this reason, many techniques for improving the SS method have been proposed. Sorensen and Andersen [5] used the SS method in combination with speech presence detection. Soon and Koh [6] and Ding et al. [7] treated audio signals as graphics and applied 2D and 1D Wiener filters in the frequency domain for noise reduction. The advantage of this method is that frame-to-frame correlation can be exploited. In addition, the amplitude in the frequency domain can be adjusted and an unmodified initial phase can be used. Finally, Virag [8] and Udrea et al. [9] suggested an SS method based on the characteristics of the human auditory system.
However, using unmodified noisy phases limits the noise reduction effect. In general, the discrete Fourier
transform (DFT) is used to obtain the spectral characteristics during preprocessing for the SS method. The frequency resolution of the DFT is restricted because it depends on the analytical frame length and the window function. If the frequency resolution is low, the noise spectrum can overlap the spectrum of the signal source, which makes it difficult to extract the original signal. Energy leaks into other bands and side lobes are generated when the frequency of the analyzed signal does not correspond to an integral multiple of the base frequency. In harmonic frequency analysis, there is then a high probability of overlap between the side-lobes of the source spectrum and the noise spectrum. If the side-lobes are removed, then the signal source cannot fully be recovered. Similarly, if the time resolution is low, then rapid frequency variations cannot be detected. In order to solve this problem, Kauppinen and Roth attempted to increase the frequency resolution by applying an extrapolation method to the signal frame in the time domain [10]. In this study, we have applied non-harmonic analysis (NHA), which has a high frequency resolution with limited influence of the frame length [11], to the problem of noise reduction. For a similar frame length, NHA is expected to achieve better frequency resolution than the length extrapolation method used in [10]. Therefore, we investigated the use of NHA as an alternative preprocessing method to DFT for noise reduction. Since the effects of frequency resolution can best be evaluated for periodic signals, sounds produced by musical instruments were used in this study, and preliminary noise reduction experiments were performed.
The remainder of this article is organized as follows. In Section 2, we provide an introduction to the NHA algorithm. In Section 3, we investigate noise reduction using single sinusoidal waves. Section 4 describes the side-lobe suppression experiments. In Section 5, noise reduction experiments are carried out using sounds produced by musical instruments, and the results are described in Section 6.

2. The NHA method
2.1 Background
The DFT is generally used for frequency analysis. A discrete spectrum X of the discrete time signal x(n) of length N can be expressed as

$$X(k) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)\,e^{-j\frac{2\pi kn}{N}} \quad (k = 0, 1, 2, \ldots, N-1). \tag{1}$$

When the sampling interval is Δt and the original signal x(n) has a period of NΔt/k, X(k) can accurately reflect the spectral structure. However, if a period other than NΔt/k appears in x(n), X(k) is expressed as a combination of several frequency components with periods NΔt/k, and the spectral structure is not accurately reflected in X(k).
In order to increase the frequency resolution, the value of N is generally increased. If the frequency is accompanied by a temporal fluctuation, however, then the average period is extracted and the analytical accuracy deteriorates as N is increased. Some techniques use an analysis window function for x(n) in preprocessing. However, this does not improve the apparent frequency resolution.
Figure 1 shows some of the problems associated with frequency analysis. Even when analyzing the simplest frequency signal shown at the top of Figure 1, one portion of the section is removed when determining the periodicity of the analyzed signal. The center left section of Figure 1 shows the analytical accuracy. The period can accurately be identified only if the frame length is a multiple of the period of the analyzed signal. Otherwise, a group of different spectra appears near the true frequency because the analyzed signal is expressed in terms of the periods NΔt/k. In order to prevent this, an analysis window function may be used, as shown in the center right section of Figure 1. However, this merely concentrates the spectra around the true value, and it remains difficult to determine the true value. We, therefore, noted that the Fourier coefficient could be estimated by solving a nonlinear equation based on the assumption of a stationary signal (see the bottom of Figure 1). Thus, the NHA developed in this study achieves a high analytical accuracy because it reduces the influence of the analysis window.

2.2 Algorithm of NHA
Figure 2 shows the algorithm used by NHA. First, a frequency analysis of the input signal is carried out by fast Fourier transform (FFT) to obtain the initial value. Next, the frequency and initial phase of the spectral component that has the largest amplitude are converged using a cost function with the steepest descent method. At this time, a weighting coefficient based on the retardation method is applied to convert the cost functions calculated by the recurrence formulas into a monotonically decreasing sequence. The amplitude is then converged using Newton's method. Following this, Newton's method is applied again to converge both the frequency and the initial phase to a high degree of accuracy. Following a final convergence of the amplitude using Newton's method, we obtain the fully converged spectrum.
Finally, we describe the motivation for the structure shown in Figure 2. For the cost function given by Equation 2, although the convergence speed is slow, the steepest descent method can find the stationary point within a wide range. In contrast, the Newton method can quickly find a nearby stationary point.

Figure 1 Fourier transform and NHA technique.
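
The leakage problem illustrated in Figure 1 can be reproduced numerically from Equation 1. The following is a minimal sketch, assuming a 1 s frame with N = 512 and fs = 512 Hz (so that the DFT bin spacing is 1 Hz); the helper names `dft_spectrum` and `count_active_bins` and the magnitude tolerance are illustrative choices, not part of the original method.

```python
import numpy as np

def dft_spectrum(x):
    """Eq. 1: X(k) = (1/N) * sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    return np.fft.fft(x) / len(x)

def count_active_bins(x, tol=1e-6):
    """Count non-negative-frequency bins whose magnitude exceeds tol (assumed threshold)."""
    half = np.abs(dft_spectrum(x))[: len(x) // 2 + 1]
    return int(np.sum(half > tol))

N, fs = 512, 512.0                      # assumed: 1 s frame, 1 Hz bin spacing
n = np.arange(N)
on_bin = np.cos(2 * np.pi * 100.0 * n / fs)    # period equals N*dt/k with k = 100
off_bin = np.cos(2 * np.pi * 100.5 * n / fs)   # period is not of the form N*dt/k

bins_on = count_active_bins(on_bin)
bins_off = count_active_bins(off_bin)
print("bins above tolerance:", bins_on, "(on-bin) versus", bins_off, "(off-bin)")
```

When the period matches NΔt/k, the energy stays in a single bin; shifting the frequency by half a bin spreads it over many bins, which is the group of spectra near the true frequency described above.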

Therefore, we first use the steepest descent method to find the stationary point within a wide range. Then, we use the Newton method to quickly find a stationary point. In both stages, we separate the convergence calculation of the amplitude A from that of the other parameters, so that the local stationary point will not be calculated incorrectly.

Figure 2 NHA algorithm.

2.3 Details of NHA
In this section, we present a more detailed description of the NHA method. Since the Fourier coefficient is estimated by solving a nonlinear equation, NHA enables the frequency and its associated parameters to be accurately estimated without being significantly affected by the frame length. In order to minimize the sum of
squares of the difference between the object signal and the sinusoidal model signal, the frequency f̂, amplitude Â, and initial phase φ̂ are calculated using the cost function, as follows:

$$F(\hat{A}, \hat{f}, \hat{\phi}) = \frac{1}{N}\sum_{n=0}^{N-1}\left\{x(n) - \hat{A}\cos\left(2\pi\frac{\hat{f}}{f_s}n + \hat{\phi}\right)\right\}^2, \tag{2}$$

where N is the frame length and fs is the sampling frequency (fs = 1/Δt).

2.3.1. Steepest descent method
George and Smith [12,13] attempted to introduce the signal parameter A and the initial phase φ by applying the least mean squares method to the difference signal between the analyzed signal and the modulated harmonic sinusoidal wave.
However, this method is strongly dependent on the frame length and is difficult to apply to the analysis of signals that do not have a simple frequency harmonic structure because frequencies that are dependent on the frame length are used for the group of harmonic frequencies, as in DFT. In other words, small frequency changes cannot be detected.
By focusing on the problem of solving a nonlinear equation, we apply the nonlinear equation process to Equation 2 for optimum calculation of the frequency f, as well as the amplitude A and initial phase φ.
Figure 3 shows an example of the characteristics of f̂ and φ̂ in the evaluation function of Equation 2, enlarged around the true value, where N is 512, fs is 512, and the true values of A, f, and φ are 1, 100 Hz, and 0.5π rad, respectively. Since small values are given in black, troughs appear as black and peaks as white. As the figure shows, Equation 2 is a multimodal nonlinear evaluation function. Around the true value (f̂ = 100, φ̂/(2π) = 0.5), minimum and maximum values are aligned vertically. This is because the true value is a minimum but becomes a maximum for the antiphase case (φ̂/(2π) = 0, 1). Since the trough at the minimum value is 2 Hz wide, the minimum of the evaluation function can be estimated only if the initial value lies in the trough when solving the nonlinear equation. Since the DFT frequency resolution is 1 Hz, one or two points can be contained in a trough that is 2 Hz wide. At the point on the frequency axis where the DFT amplitude becomes maximum (i.e., the integral frequency when the frame length is 1 s), the evaluation function of Equation 2 is minimized at the initial phase determined by DFT.

Figure 3 Distribution of the cost function.

If the maximum amplitude A determined by DFT and the corresponding frequency f and initial phase φ are used as initial values (A_{0,0}, f_{0,0}, φ_{0,0}), then the initial values lie inside the trough containing the minimum of the cost function in Figure 3.
Therefore, in order to obtain an accurate spectrum, we use the initial value (A_{0,0}, f_{0,0}, φ_{0,0}), which is converged using the nonlinear equation process. Considering Equation 2 as the cost function, this nonlinear problem is converted into a minimization problem, and f̂_{m,p} and φ̂_{m,p} are determined using the steepest descent
method and the retardation method to obtain the following expressions:

$$\hat{f}_{m,p} = \hat{f}_{m,0} - \mu_{m,p}\frac{\partial F_{m,0,0}}{\partial f}, \tag{3}$$

$$\hat{\phi}_{m,p} = \hat{\phi}_{m,0} - \mu_{m,p}\frac{\partial F_{m,0,0}}{\partial \phi}, \tag{4}$$

where p is the number of times the retardation method has been applied for the frequency and the phase, and m is the number of iterations of the steepest descent method. We use the following shorthand

$$F_{m,p,q} = F(\hat{A}_{m,q}, \hat{f}_{m,p}, \hat{\phi}_{m,p}), \tag{5}$$

where q is the number of iterations of the retardation method for the amplitude. These variables are iterated as shown in Figure 4.
In the above equations, μ_{m,p} is a weighting coefficient based on the retardation method and has a value between 0 and 1 to convert the cost functions calculated by recurrence formulas into a monotonically decreasing sequence [14-16]. In this article, we use this weighting coefficient as follows

$$\mu_{m,p+1} = 0.5\,\mu_{m,p}, \tag{6}$$

where μ_{m,1} is set to 1.
This series of calculations is repeated to cause f̂_{m,p} and φ̂_{m,p} to converge with high accuracy until the following condition is satisfied:

$$F_{m,p,0} < (1 - 0.5\,\mu_{m,p}) \cdot F_{m,0,0}. \tag{7}$$

The next step is the convergence of the amplitude.

Figure 4 Convergence process for the steepest descent and the retardation method.

2.3.2. Amplitude convergence
Here, A can be uniquely determined only if f̂_{m,p} and φ̂_{m,p} are known, and the following formula is used to cause A to converge:

$$\hat{A}_{m,q} = \hat{A}_{m,0} - \nu_{m,q}\frac{\partial F_{m,p,0}}{\partial A}. \tag{8}$$

As with μ_{m,p}, ν_{m,q} is a weighting coefficient based on the retardation method [14-16] and is given by

$$\nu_{m,q+1} = 0.5\,\nu_{m,q}, \tag{9}$$

with ν_{m,1} = 1. This causes Â_{m,q} to converge with a high degree of accuracy until

$$F_{m,p,q} < (1 - 0.5\,\nu_{m,q}) \cdot F_{m,p,0}. \tag{10}$$

Then, Â_{m+1,0}, f̂_{m+1,0}, and φ̂_{m+1,0} are set to Â_{m,q}, f̂_{m,p}, and φ̂_{m,p}, and q and p are reset to 1.
Next, the steepest descent method and the amplitude converging algorithm are repeated until the cost function becomes partially converged. Newton's method is then applied.

2.3.3. Newton's method
Although the steepest descent method causes values to converge over a comparatively wide range, a single series of operations cannot ensure sufficient accuracy. In order to achieve a highly accurate convergence, NHA uses Newton's method following the lower accuracy steepest descent method. The following recurrence formula is used for Newton's method:

$$\hat{f}_{m,p} = \hat{f}_{m,0} - \frac{\mu_{m,p}}{J}\begin{vmatrix} \dfrac{\partial F_{m,0,0}}{\partial f} & \dfrac{\partial^2 F_{m,0,0}}{\partial f\,\partial \phi} \\[2ex] \dfrac{\partial F_{m,0,0}}{\partial \phi} & \dfrac{\partial^2 F_{m,0,0}}{\partial \phi^2} \end{vmatrix}, \tag{11}$$

$$\hat{\phi}_{m,p} = \hat{\phi}_{m,0} - \frac{\mu_{m,p}}{J}\begin{vmatrix} \dfrac{\partial^2 F_{m,0,0}}{\partial f^2} & \dfrac{\partial F_{m,0,0}}{\partial f} \\[2ex] \dfrac{\partial^2 F_{m,0,0}}{\partial f\,\partial \phi} & \dfrac{\partial F_{m,0,0}}{\partial \phi} \end{vmatrix}, \tag{12}$$

where

$$J = \begin{vmatrix} \dfrac{\partial^2 F_{m,0,0}}{\partial f^2} & \dfrac{\partial^2 F_{m,0,0}}{\partial f\,\partial \phi} \\[2ex] \dfrac{\partial^2 F_{m,0,0}}{\partial f\,\partial \phi} & \dfrac{\partial^2 F_{m,0,0}}{\partial \phi^2} \end{vmatrix}, \tag{13}$$

and m is the number of iterations of Newton's method. In addition, μ_{m,p} is similarly obtained from Equation 6. This series of calculations is also repeated
to cause f̂_m and φ̂_m to converge accurately. After applying Equations 11 and 12, Â_m is made to converge by applying Equation 8 in the same manner as in the steepest descent method, and the series of calculations is repeated. The only difference is that the converging algorithm is repeated using Newton's method instead of the steepest descent method. Thus, the frequency parameters are estimated to a high degree of accuracy and at high speed by using a hybrid process combining the steepest descent and Newton's method.

2.3.4. Sequential reduction
Even for the case in which there are several sinusoidal waves, the spectral parameters can approximately be derived by sequential reduction. Here, x(n) is expressed as the sum of K sinusoidal waves in the following manner:

$$x(n) = \sum_{k=1}^{K} A_k \cos\left(2\pi\frac{f_k}{f_s}n + \phi_k\right). \tag{14}$$

According to Parseval's theorem, if the object signal frequency f_k and the model signal's frequency f̂ do not match, i.e., if

$$f_k \neq \hat{f}, \tag{15}$$

then

$$F(\hat{A}, \hat{f}, \hat{\phi}) = \hat{A}^2 + \sum_{k=1}^{K} A_k^2. \tag{16}$$

In addition, if the pair of f̂ and φ̂ matches one pair f_j and φ_j of the object signal, then

$$F(\hat{A}, \hat{f}, \hat{\phi}) = (\hat{A} - A_j)^2 + \sum_{k=1,\,k\neq j}^{K} A_k^2. \tag{17}$$

If A_j and Â also match, then a frequency component of an estimated spectrum can completely be removed from an object signal. Therefore, the problem of acquiring an optimum solution is frequency independent and is applicable even to a signal consisting of several sinusoidal waves by sequential and individual estimation from the object signal. In other words, even when the object signal is a composite sinusoidal wave, several sinusoidal waves can be extracted by performing similar processing on sequential residual signals. If the frequencies of two spectra are adjacent to each other, the other spectrum generates another trough in the trough around the true value shown in Figure 3 and distorts the evaluation function. This may result in an error, as discussed later herein.

2.4. Accuracy of NHA
Among the techniques based on DFT, generalized harmonic analysis (GHA or Hirata's algorithm) is generally considered to have the highest accuracy [17-20].
According to these analyses, the frequency resolution depends on the frame length because one analysis window apparently has the length of several windows. However, the decomposition frequency has a finite length, and an object signal of any other frequency cannot be analyzed.
Figure 5 shows the numbers of frequencies that can be analyzed by DFT and GHA at each frame length. Successful frequency analysis means that the number of spectra of the object signal matches the number of spectra after analysis. That is, if the frame length is unique, then DFT has N decomposition frequencies (0, fs/N, 2fs/N, ..., (N - 1)fs/N [Hz]). Compared to DFT of approximately half the data length, GHA is one order of magnitude more accurate. If the spectrum of the object signal is not in the group of the harmonic spectra, the group of harmonic spectra appears near the true frequency.

Figure 5 Frequency resolution of DFT and GHA.

In order to verify the frequency resolution of NHA, we compared it with DFT and GHA experimentally, as shown in Figure 6. With the frame length set to 1 s (512 samples), we analyzed a single sinusoidal wave. With each technique, one sinusoidal wave was extracted, and the square of the error from the original signal was examined.
DFT exhibited low analytical accuracy except when the signals had frequencies that were integral multiples of the fundamental frequency. At frequencies above 1 Hz, GHA exhibited accuracies that were two to five orders of magnitude greater. At the same frequencies, NHA was 10 or more orders of magnitude more accurate than DFT. At frequencies below 1 Hz, DFT and GHA were equally accurate, but NHA was able to estimate the frequency
and other parameters correctly without being affected by the frame length. Thus, NHA was demonstrated to have an even greater analysis accuracy than GHA, which was developed from DFT.

Figure 6 Square error (frame length: 512).

Accurate estimation at frequencies below 1 Hz means that even object signals having periods longer than the frame length can accurately be analyzed. Therefore, it may be possible to accurately estimate the spectral structures of signals representing stock prices and other fluctuation factors.
Figures 7 and 8 show the square errors of two sinusoidal waves. A similar evaluation to that in Figure 6 was performed by adding another sinusoidal wave (f = 0.6 Hz) in order to determine whether both sinusoidal waves could be correctly extracted.
The ratio of the amplitudes of the two sinusoidal waves is 1:1 in Figure 7 and 1:10 in Figure 8, where the larger amplitude belongs to the added sinusoidal wave at f = 0.6 Hz. In both cases, the accuracy increases in the order of DFT, GHA, and NHA. If the two sinusoidal waves have similar amplitudes, the evaluation functions shown in Figure 3 interfere with each other, increasing the distortion, which results in a greater error than that when only one sinusoidal wave is used. As mentioned above, this tendency becomes more noticeable as the frequencies become closer to each other. However, the NHA error remains smaller on average than the errors of DFT and GHA.

3. Extracting single sinusoidal waves
In this section, a quantitative comparison of the extraction accuracy and the calculation time of DFT and NHA is performed. A single sinusoidal wave in a noisy environment was used for the experiment. For each method, an optimum spectrum (closest to the target signal frequency) was selected and converted to a waveform for evaluation. For DFT, f is necessarily an integral multiple of the fundamental frequency. For the calculations, the frame length was set to 256, and the sampling frequency was set to 8 kHz. The sinusoidal wave was set to 488 Hz in order to investigate frequencies that DFT could not estimate.
Figure 9 shows the sinusoidal wave extracted by DFT and NHA from a white-noise environment in which the SNR was 0 dB, where (a) is the 488 Hz target signal and (b) is the added white noise signal.
Figure 9c and 9e are the signals detected by NHA and DFT, respectively, and (d) and (f) are the residual signals obtained by subtracting (c) and (e) from the target signal. This figure shows that NHA more accurately extracts the original signal. When noise is added to the signal, DFT produces errors if the frequency is not a multiple of the fundamental frequency. The output SNR was approximately 24 dB when NHA was used for extraction and approximately 4 dB when DFT was used. Thus, an improvement of approximately 20 dB was confirmed.
These calculations were performed using a personal computer (CPU: Intel Core [email protected] GHz, Memory: 6 GB). The times required for calculating a signal consisting of 256 samples by DFT and NHA are 2.8 and 12.0 ms, respectively. Note that the DFT is computed using a radix-2 FFT in this article.

Figure 7 Square error of the obstruction sine wave (A = 1, f = 0.6).

Figure 8 Square error of the obstruction sine wave (A = 10, f = 0.6).
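
The two-wave tests of Figures 7 and 8 rely on the sequential reduction of Section 2.3.4, in which one sinusoid at a time is estimated and subtracted from the residual. The sketch below illustrates only that outer loop. As a loudly stated assumption, the per-component estimator is a simple stand-in (zero-padded FFT peak picking followed by a linear least-squares fit), not NHA, and the frequencies, amplitudes, and function names are illustrative values that do not come from the paper.

```python
import numpy as np

def fit_strongest(x, fs, oversample=64):
    """Stand-in estimator (not NHA): zero-padded FFT peak, then linear least squares."""
    N = len(x)
    n = np.arange(N)
    spec = np.abs(np.fft.rfft(x, N * oversample))
    k = int(np.argmax(spec[1:])) + 1                  # skip the DC bin
    f = k * fs / (N * oversample)
    basis = np.column_stack([np.cos(2 * np.pi * f * n / fs),
                             np.sin(2 * np.pi * f * n / fs)])
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)  # amplitude and phase in one step
    return f, float(np.hypot(coef[0], coef[1])), basis @ coef

def sequential_reduction(x, fs, K):
    """Extract K sinusoids one at a time from the running residual (cf. Eq. 14)."""
    residual = np.asarray(x, dtype=float).copy()
    found = []
    for _ in range(K):
        f, A, wave = fit_strongest(residual, fs)
        found.append((f, A))
        residual = residual - wave                    # remove the estimated component
    return found, residual

fs, N = 512.0, 512                                    # assumed 1 s frame
n = np.arange(N)
x = (1.0 * np.cos(2 * np.pi * 100.3 * n / fs + 0.4)
     + 0.5 * np.cos(2 * np.pi * 180.6 * n / fs - 1.0))
found, residual = sequential_reduction(x, fs, K=2)
for f, A in found:
    print(f"f = {f:.2f} Hz, A = {A:.3f}")
```

The example uses well-separated components on purpose; as noted in the text, closely spaced frequencies distort each other's cost troughs and make the sequential estimates less reliable.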



Figure 9 Sinusoidal waves extracted by DFT and NHA from a white-noise environment (SNR: 0 dB).
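
To see why a single DFT component cannot recover an off-bin tone even without noise, the extraction step can be sketched as follows. This is a minimal illustration assuming an 8 kHz sampling rate and a 256-sample frame; the helper names `dft_single_bin_extract` and `snr_db` are hypothetical, and the SNR is computed against the clean target waveform.

```python
import numpy as np

def dft_single_bin_extract(x):
    """Rebuild the waveform carried by the largest-magnitude DFT bin only."""
    N = len(x)
    X = np.fft.rfft(x)
    k = int(np.argmax(np.abs(X[1:-1]))) + 1   # skip DC and Nyquist
    Y = np.zeros_like(X)
    Y[k] = X[k]
    return np.fft.irfft(Y, N)

def snr_db(target, estimate):
    """Output SNR in dB of an extracted waveform relative to the clean target."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum((target - estimate) ** 2))

fs, N = 8000.0, 256                            # assumed frame settings
n = np.arange(N)
on_bin = np.cos(2 * np.pi * 500.0 * n / fs)    # 500 Hz = 16 * fs / N, an exact bin
off_bin = np.cos(2 * np.pi * 488.0 * n / fs)   # 488 Hz lies between two bins

snr_on = snr_db(on_bin, dft_single_bin_extract(on_bin))
snr_off = snr_db(off_bin, dft_single_bin_extract(off_bin))
print(f"on-bin: {snr_on:.1f} dB, off-bin: {snr_off:.1f} dB")
```

For the on-bin tone the reconstruction is limited only by rounding error, whereas the off-bin tone is limited to a few decibels because both the frequency and the amplitude of the selected bin are wrong.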

For statistical verification at various target signal frequencies, an extraction experiment was conducted in which the frequency f and the initial phase φ of the target signal were varied 1,000 times in different noise environments using uniformly distributed random numbers. The ranges of f and φ were 0 < f < 4000 Hz and -π < φ < π, respectively. In this case, the amplitude A was maintained constant. The input signal was generated by adding white noise to a single sinusoidal wave. Throughout the experiments, the input SNR was maintained in the range from -10 to +10 dB and was varied in 5-dB steps.
Figure 10 shows the results for a white-noise environment. The upper dotted line indicates the theoretical limit of recovery using DFT. This corresponds to the case in which the extracted spectrum could be converted back to a waveform with the original amplitude. As shown in Figure 10, NHA performed much better in white-noise environments. Because of the finite frequency resolution, recovery of a single spectrum using DFT was limited, particularly in a low-noise environment. Recovery using NHA yielded results well above the theoretical limit of DFT and showed a linear improvement even in a low-noise environment, thus confirming the importance of improved frequency resolution.

Figure 10 SNR changes of sinusoidal waves extracted by DFT and NHA in a white-noise environment.

4. Suppression of side-lobes
In this section, the ability of NHA to suppress side-lobes is discussed. A frequency analysis was performed on a waveform composed of four sinusoidal waves (see Table 1). Figure 11 shows the resulting waveform, and Figure 12 shows the frequency spectra of this waveform as determined by DFT (zero-padding indicates interpolation of the DFT) and NHA. In the case of DFT, side-lobes exist around the main-lobe because of the limited frequency resolution. In the case of NHA, a line spectrum that is similar to that of the original waveform is obtained, and no side-lobes are produced. Even spectral components that are weaker than the DFT side-lobes can be extracted, as shown in Figure 12c.

Table 1 Parameters of sinusoidal waves
Mark   Amplitude   Target frequency (Hz)
(a)    0.8         4.2
(b)    1           10.3
(c)    0.1         13.7
(d)    0.6         20.3

Figure 11 Composite wave synthesized by four sinusoidal waves.

In a case such as that shown in Figure 13, in which the source spectrum is mixed with a noise spectrum, side-lobe suppression can lead to greater noise reduction. The black line indicates the signal source spectrum, and the gray line represents the noise signal spectrum.
Figure 13a shows the case for DFT. The side-lobes of the source spectrum overlap the noise spectrum, making it difficult to estimate the amplitude. In addition, the phase information of the target signal is lost. If the side-lobes are removed, then the signal source cannot fully be recovered. On the other hand, the possibility of any overlap between
the source and noise spectrum decreases because NHA is a high-frequency-resolution analysis, as shown in Figure 13b. Therefore, there is a high possibility that the information contained in the source spectrum is isolated from the noise spectrum and can be recovered.

Figure 12 Frequency characteristics of four sinusoidal waves.

Figure 13 Spectrographs for a noise signal and a signal source. (a) low resolution, (b) high resolution.

Using DFT and NHA, we performed a frequency analysis on a part of the sound for which the input SNR with respect to the white noise is 0 dB. Figure 14a is the original voice signal, and Figure 14b is the voice signal to which noise was added. We removed noise by the SS method using DFT
and NHA; the results are shown in Figure 14c, d, respectively. Figure 14e shows the variation of the output SNR when the threshold of the SS method is changed. This figure shows that the maxima of output SNR using DFT and NHA are 9.1 and 17.4 dB, respectively. Therefore, the proposed technique using NHA is more useful for noise reduction than that using DFT. In addition, it is important to appropriately determine the threshold for each noise because, as shown in Figure 14e, the output SNR changes significantly near the threshold that distinguishes between signal and noise. One part of the output SNR using NHA is a straight line because small side lobes appear from the signal. However, NHA does not reveal the spectrum components of a sound in the side lobes. DFT is inferior to NHA because, in DFT, noise is mixed with the sound in the side lobes. Therefore, in NHA, the threshold can be increased and more of the noise can be suppressed, thereby improving the output SNR.

Figure 14 Noise reduction of the vowel sound. (a) Vowel sound. (b) Vowel sound with white noise. (c) Noise reduction using DFT. (d) Noise reduction using NHA. (e) Relationship between threshold and output SNR. Panels (a)-(d) plot amplitude against time (ms); panel (e) plots output SNR (dB) against threshold. The solid line indicates the DFT results, and the dotted line indicates the NHA results.

5. Constant threshold experiment
5.1. Experimental conditions for the constant threshold experiments
In order to investigate the relationship between the frequency resolution obtained by DFT, NHA, and the Ismo method [21,22], and the noise compression obtained by the SS method, we evaluate the results obtained by the segmental SNR method. In general, in the SS method, musical noises occur and affect the subjective evaluation. Although the spectral floor [23] has been proposed to eliminate these noises, in order to determine only the improvement in the results, we do not use this method in this study. In DFT, NHA, and the Ismo method,
various window functions were chosen. In DFT and the Ismo method, a Hanning window and a rectangular window were used. In NHA, only a rectangular window was used. In a previous study [11], the Ismo method applied a Hanning window at points at which the signal changed suddenly and a rectangular window at the other points. In this article, to consider frequency resolution, we use a Hanning window and a rectangular window separately in different experiments. The signal sources are musical sounds in the form of MIDI data (Do-Re-Mi, Für Elise) that are played by a YAMAHA XG WDM SoftSynthesizer for 2 s. Based on the findings of a previous study [11], the order of the filter used for the prediction of the Ismo method is less than one frame length, and the half-frame-length sections before and after the signal frame are extrapolated. Here, the frequency resolution of the Ismo method is theoretically twice that of DFT. In most cases considered herein, NHA is used to extract 512 spectra per frame. In addition, after subtracting the signals of 3/4-frame-length sections before and after the signal frame, we evaluate the result of the NHA to consider the overlap of the signal frames. We then determined whether the same tendency was observed for each method, for four window lengths of 256, 512, 1024, and 2048. Table 2 lists the experimental conditions.

5.2 Details of the methods used to obtain the amplitude-modified spectra
First, the spectra $(A_k, f_k, \phi_k)$, $X(k)$, and $X_{\mathrm{ISM}}(k)$ are calculated by NHA, DFT, and the Ismo method, respectively. The previously estimated noise spectrum is then subtracted from the calculated spectrum. The output signal $\hat{s}_{\mathrm{DFTsub}}$ obtained by DFT using the SS method is as follows:

$$
\begin{aligned}
\hat{s}_{\mathrm{DFTsub}}(n) &= \mathrm{IFFT}\left[\,|\hat{X}(k)| \exp\bigl(j \angle X(k)\bigr)\right], \quad k = 0, 1, 2, \ldots, N-1, \\
|\hat{X}(k)| &= \begin{cases} |X(k)| - \alpha|\hat{D}(k)| & \text{if } |X(k)| - \alpha|\hat{D}(k)| > 0 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{18}
$$

where $|X(k)|$, $k$, and $\alpha$ denote the spectral amplitude, the spectral number, and the most suitable threshold of the input signal, respectively. In general, the SS method used in noise compression yields the most suitable output by adjusting the noise spectrum model by means of a subtraction factor [23]. However, we calculate the segmental SNR using a few suitable threshold values for each analysis method because it is predicted that the most suitable values of the variable used in noise compression differ depending on the analysis method. The obtained results confirm that the most suitable threshold values do differ depending on the analysis method. Consequently, we calculated the suitable values for each signal waveform and compared the analysis methods with the most suitable segmental SNR. For the case of white Gaussian noise, we use $|\hat{D}(k)|$ that is constant for $k$, because the power spectrum density is uniform in any frequency band. We select the most suitable value of $\alpha$ so that the segmental SNR becomes maximum by gradually increasing $\alpha$ from a small value and use the selected value of $\alpha$ in the experiments. For the case of pink noise, we use the noise model $|\hat{D}(k)|$ that varies linearly along the frequency axis and select the most suitable value of $\alpha$ using the above-mentioned method. In this study, we also remove the noise by the spectrum extraction (SE) method based on the concept of high frequency resolution preventing spectrum mixture. In the SE method, the output signal of DFT $\hat{s}_{\mathrm{DFTex}}$ is given as

$$
\begin{aligned}
\hat{s}_{\mathrm{DFTex}}(n) &= \mathrm{IFFT}\left[\,|\hat{X}(k)| \exp\bigl(j \angle X(k)\bigr)\right], \quad k = 0, 1, 2, \ldots, N-1, \\
|\hat{X}(k)| &= \begin{cases} |X(k)| & \text{if } |X(k)| - \alpha|\hat{D}(k)| > 0 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{19}
$$

Substituting $X_{\mathrm{ISM}}(k)$ obtained using the Ismo method for $X(k)$ in Equations 18 and 19, we calculate these equations in a similar manner and obtain the output $\hat{s}_{\mathrm{ISMsub}}$ by the SS method, and the output $\hat{s}_{\mathrm{ISMex}}$ by the SE method.
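As a rough illustration of Equations 18 and 19, the sketch below applies the two amplitude modifications to a single frame. It is not the authors' implementation: the names `dft`, `idft`, and `modify_frame` are hypothetical, a naive O(N^2) transform stands in for the FFT only to keep the example dependency-free, and windowing and overlap-add between frames are left out.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a stand-in for the FFT to avoid dependencies."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def modify_frame(frame, noise_mag, alpha, mode="ss"):
    """Apply Eq. 18 (mode="ss") or Eq. 19 (mode="se") to one frame.

    noise_mag holds the estimated noise amplitude |D(k)| for every bin and
    alpha is the threshold; both are supplied by the caller (assumption).
    """
    if mode not in ("ss", "se"):
        raise ValueError("mode must be 'ss' or 'se'")
    modified = []
    for xk, dk in zip(dft(frame), noise_mag):
        mag = abs(xk)
        margin = mag - alpha * dk
        if margin > 0:
            # SS subtracts the noise estimate; SE keeps the bin untouched.
            new_mag = margin if mode == "ss" else mag
        else:
            new_mag = 0.0
        # The phase of the noisy input is reused unchanged.
        modified.append(new_mag * cmath.exp(1j * cmath.phase(xk)))
    return [v.real for v in idft(modified)]


if __name__ == "__main__":
    n = 32
    frame = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    flat_noise = [1.0] * n  # constant |D(k)|, as for white noise
    print(modify_frame(frame, flat_noise, alpha=1.0, mode="ss")[:4])
    print(modify_frame(frame, flat_noise, alpha=1.0, mode="se")[:4])
```

The only difference between the two modes is what happens to bins that pass the threshold test: SS lowers their amplitude by the scaled noise estimate, whereas SE passes them through unchanged.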

Table 2 Experimental conditions

Analysis method          DFT (rectangular), DFT (Hanning), Ismo (rectangular), Ismo (Hanning), NHA
Amplitude modification   Spectral extraction, SS
Sampling frequency       44.1 kHz
Length of music          2 s
Frame length             256, 512, 1024, 2048
Shift length             (Frame length)/4
Added noise              White Gaussian noise, Pink noise
Input SNR (dB)           -10, -5, 0, 5, 10
Instrument of MIDI       Flute, Grand piano, Reed organ, Overdrive guitar, Trumpet
Music (MIDI)             Do-Re-Mi, Für Elise
Software synthesizer     YAMAHA XG WDM SoftSynthesizer
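To make the conditions in Table 2 concrete, the following sketch shows one way to add white Gaussian noise at a prescribed input SNR and to lay out analysis frames with a shift of one quarter of the frame length. The helper names `add_white_noise` and `frame_starts` are hypothetical, and scaling the noise so that the realized SNR matches the target exactly is an assumption; the paper does not say how the noise level was set. Pink noise is not covered here.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return (noisy, noise) with noise scaled to give exactly snr_db."""
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    sig_power = sum(s * s for s in signal) / len(signal)
    raw_power = sum(v * v for v in raw) / len(raw)
    # Noise power required for the requested signal-to-noise ratio.
    target_power = sig_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_power / raw_power)
    noise = [scale * v for v in raw]
    noisy = [s + v for s, v in zip(signal, noise)]
    return noisy, noise


def frame_starts(n_samples, frame_len):
    """Start indices of full frames with shift = frame_len / 4 (Table 2)."""
    shift = frame_len // 4
    return list(range(0, n_samples - frame_len + 1, shift))


if __name__ == "__main__":
    fs = 44100
    n_samples = 2 * fs  # 2 s of audio at 44.1 kHz
    tone = [math.sin(2 * math.pi * 440.0 * t / fs) for t in range(n_samples)]
    noisy, noise = add_white_noise(tone, snr_db=0.0, rng=random.Random(0))
    for frame_len in (256, 512, 1024, 2048):
        print(frame_len, len(frame_starts(n_samples, frame_len)))
```

With a quarter-frame shift, each sample away from the signal edges is covered by four frames, which is why overlapping outputs need to be combined before the SNR is evaluated.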

As mentioned earlier, we investigated both $X(k)$ and $X_{\mathrm{ISM}}(k)$ using a Hanning window and a rectangular window ($\alpha$ is optimally selected for each window function). The output signal $\hat{s}_{\mathrm{NHAsub}}$ of NHA obtained by the SS method is given by the following equation:

$$
\begin{aligned}
\hat{s}_{\mathrm{NHAsub}}(n) &= \sum_{k=0}^{K} \tilde{A}_k \cos\left(2\pi \frac{\hat{f}_k}{f_s} n + \hat{\phi}_k\right), \\
\tilde{A}_k &= \begin{cases} \hat{A}_k - 2\alpha|\hat{D}(f_k)| & \text{if } \hat{A}_k - 2\alpha|\hat{D}(f_k)| > 0 \\ 0 & \text{otherwise,} \end{cases}
\end{aligned}
\tag{20}
$$

where $(\hat{A}_k, \hat{f}_k, \hat{\phi}_k)$ is the spectrum component obtained from the noisy signal by NHA. Here, $\alpha$ is doubled so that the threshold is on the same scale as $|X(k)|$. Similarly, the output signal $\hat{s}_{\mathrm{NHAex}}$ of NHA obtained by the SE method is as follows:

$$
\begin{aligned}
\hat{s}_{\mathrm{NHAex}}(n) &= \sum_{k=0}^{K} \tilde{A}_k \cos\left(2\pi \frac{\hat{f}_k}{f_s} n + \hat{\phi}_k\right), \\
\tilde{A}_k &= \begin{cases} \hat{A}_k & \text{if } \hat{A}_k - 2\alpha|\hat{D}(f_k)| > 0 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{21}
$$

5.3. Results of the fixed-threshold experiment
The variation with respect to time of the output SNR for input signals in which white Gaussian noise is added to a grand piano sound source is shown in Figures 15, 16, and 17. In these figures, (a), (b), and (c) show the output SNR obtained by the SE method, the output SNR obtained by the SS method, and the time waveform of the original signal, respectively. The window length is 2048.
For the SE method, NHA, indicated by blue solid lines, provided the best results, followed by the Ismo method with a Hanning window, and DFT with a rectangular window provided the worst results. Similarly, for the SS method, NHA provided the best results, and DFT with a rectangular window provided the worst results.
For this sound source, the output SNR calculated by each method has a different magnitude, but these magnitudes change at approximately the same time and exhibit a similar trend.
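Equations 20 and 21 act on the sinusoidal parameters returned by NHA rather than on DFT bins, so the modified signal is resynthesized as a sum of cosines. The sketch below illustrates only this thresholding and resynthesis step; the NHA parameter estimation itself is not implemented, and the function names `nha_modify` and `synthesize` as well as the example component values are hypothetical.

```python
import math


def nha_modify(components, noise_mag, alpha, mode="ss"):
    """Threshold NHA components (A, f, phi) following Eq. 20 or Eq. 21.

    noise_mag[i] is the noise amplitude estimate |D(f)| at the frequency of
    component i (assumed to be provided by the caller).
    """
    if mode not in ("ss", "se"):
        raise ValueError("mode must be 'ss' or 'se'")
    out = []
    for (amp, freq, phase), dk in zip(components, noise_mag):
        margin = amp - 2.0 * alpha * dk  # alpha is doubled, as in the text
        if margin > 0:
            new_amp = margin if mode == "ss" else amp
        else:
            new_amp = 0.0
        out.append((new_amp, freq, phase))
    return out


def synthesize(components, fs, n_samples):
    """Sum of cosines: s(n) = sum_k A_k cos(2*pi*f_k*n/fs + phi_k)."""
    return [sum(a * math.cos(2.0 * math.pi * f * n / fs + p)
                for a, f, p in components)
            for n in range(n_samples)]


if __name__ == "__main__":
    fs = 44100.0
    # Illustrative values only: one strong tone and one weak component.
    comps = [(1.0, 440.0, 0.0), (0.05, 1234.5, 0.3)]
    kept = nha_modify(comps, noise_mag=[0.1, 0.1], alpha=1.0, mode="ss")
    print(kept)
    print(synthesize(kept, fs, 4))
```

Because the component frequencies are not tied to a bin grid, the weak component is removed entirely and no side-lobe energy from it remains in the output.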

Figure 15 Change with respect to time in the output SNR of the signal source of a grand piano in a white Gaussian noise
environment for which the input SNR is 0 dB. (a) SE method, (b) SS method, (c) signal source.

Figure 16 Change with respect to time in the output SNR of the signal source of a grand piano in a white Gaussian noise
environment for which the input SNR is 10 dB. (a) SE method, (b) SS method, (c) signal source.

Figure 17 Change with respect to time in the output SNR of the signal source of a grand piano in a white Gaussian noise
environment for which the input SNR is -10 dB. (a) SE method, (b) SS method, (c) signal source.
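Figures 15 to 17 plot the output SNR frame by frame, and the thresholds are chosen to maximize the segmental SNR. The sketch below shows a per-frame SNR trace, its average as a segmental SNR, and a simple sweep over candidate thresholds. It assumes the common definition of segmental SNR (the mean of per-frame SNRs in dB over non-overlapping frames), which may differ in detail from the definition used in the paper; the names `frame_snr_db`, `segmental_snr_db`, and `best_alpha` are hypothetical, and `denoise` is any caller-supplied function that maps a threshold to an output signal.

```python
import math


def frame_snr_db(clean, estimate, frame_len, eps=1e-12):
    """SNR in dB for each non-overlapping frame of length frame_len."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        sig = sum(v * v for v in c)
        err = sum((a - b) ** 2 for a, b in zip(c, e))
        # eps keeps the ratio finite for silent or perfectly restored frames.
        snrs.append(10.0 * math.log10((sig + eps) / (err + eps)))
    return snrs


def segmental_snr_db(clean, estimate, frame_len):
    """Average of the per-frame SNRs (assumed definition)."""
    snrs = frame_snr_db(clean, estimate, frame_len)
    if not snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(snrs) / len(snrs)


def best_alpha(clean, denoise, alphas, frame_len):
    """Try thresholds in increasing order; keep the one with the best score."""
    best = None
    for alpha in sorted(alphas):
        score = segmental_snr_db(clean, denoise(alpha), frame_len)
        if best is None or score > best[1]:
            best = (alpha, score)
    return best


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 440.0 * t / 44100.0) for t in range(1024)]
    # Toy "denoiser" whose error is smallest at alpha = 0.3.
    toy = lambda a: [(1.0 - abs(a - 0.3)) * c for c in clean]
    print(best_alpha(clean, toy, [0.1, 0.2, 0.3, 0.4], frame_len=256))
```

Note that this selection needs the clean reference signal, so it describes an experimental upper bound for each method rather than a procedure available when only the noisy recording exists.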

The results obtained for all of the analysis methods were poor during the periods of sudden changes in amplitude. In regions of stable amplitude, the high frequency resolution analysis methods that use a Hanning window function provided good results. Examples of signals for which a stable envelope was maintained are shown in Figures 18, 19, and 20.
The signal used here is stable and exhibits only a few changes in its envelope for both the SE and SS methods, as shown in Figures 18, 19, and 20. The calculated results for that signal were ranked in order of NHA, the Ismo method, and DFT. For the SE method, the Ismo method and NHA provided better results than DFT by approximately 5 and 3 dB, respectively, when the envelope changed markedly. For the SS method, the Ismo method and NHA provided better results than DFT by approximately 1.5 and 0.7 dB, respectively, when the envelope changed markedly. The results obtained by NHA may have been superior because the signal source spectrum was not dispersed and the frequency resolution was high. In addition, the results of the Ismo method are comparatively good, in part because the prediction of the signal became easy.
Figure 21 shows the average segmental SNR for the music signal as obtained by ten noise reduction methods, which are the combinations of two noise subtraction methods and five frequency analysis methods in an analysis frame. Similar magnitude correlations appeared among the methods, even when the window length changed in Figure 21a-f. Similar results are observed for SNRs of 10, 0, and -10 dB.
Figure 21a-c shows the results for input SNRs of 10, 0, and -10 dB, respectively, in a white Gaussian noise environment. Based on the results, the average segmental SNR obtained by NHA is the highest for the SE method, followed by the Ismo method using a Hanning window. For the SS method, the average segmental SNR obtained by NHA is high compared to other techniques. Unlike in a previous study [11], the improvement in precision by the Ismo method for the SS method could not be confirmed in the present experiment. However, the higher values are thought to have been obtained using transient

Figure 18 Change with respect to time in the output SNR of the signal source of a reed organ in a white Gaussian noise environment
for which the input SNR is 10 dB. (a) SE method, (b) SS method, (c) signal source.

Figure 19 Change with respect to time in the output SNR of the signal source of a reed organ in a white Gaussian noise environment
for which the input SNR is 0 dB. (a) SE method, (b) SS method, (c) signal source.

Figure 20 Change with respect to time in the output SNR of the signal source of a reed organ in a white Gaussian noise environment
for which the input SNR is -10 dB. (a) SE method, (b) SS method, (c) signal source.

Figure 21 Average segmental SNR of a white Gaussian noise and a pink noise environment.

detection [21]. In this study, the threshold is chosen so that the segmental SNR is maximized each time the segmental SNR is calculated. The Ismo method is thought to be well suited to real applications (e.g., threshold decision method that considers either human hearing [8] or musical noise [23]) and provides good affinity. Figure 21d-f shows the results for input SNRs of 10, 0, and -10 dB, respectively, in a pink noise environment. In this

case, the best NHA results were obtained using either the SE method or the SS method. Moreover, the combination of the Ismo method and a Hanning window provides good results compared to DFT by the SE method.

6. Summary
Previous studies have confirmed that the precision of the noise suppression is improved by increased frequency resolution for quality enhancement of sound in a previously existing recording. In this study, we demonstrate that NHA provides high frequency resolution by suppressing the influence of the window length. The limit to the precision improvement of noise suppression by NHA is examined. Since a frequency spectrum using NHA is not affected by the window length at the time of frequency conversion, the frequency resolution width is regarded as theoretically infinitesimal.
We added white Gaussian noise and pink noise to a music signal and performed experiments to examine the effects of noise suppression by the basic SS method. Segmental SNR was used to evaluate the effectiveness of noise suppression through a fixed-threshold experiment, and NHA and the conventional SS method were compared. The precision of the noise suppression obtained by NHA was confirmed to be better than that obtained by the conventional method. A similar magnitude correlation was confirmed to appear among the methods even if the window length changed. In addition, the improvement in precision of noise suppression by high frequency resolution was confirmed when the envelope was stable. Based on these results, an improvement in noise suppression precision, as compared to that provided by the conventional method, can be expected in various applications by incorporating NHA with a theoretically infinitesimal frequency resolution.
In this study, we attempt only to re-master old music sources. Therefore, the main noise sources are usually generated by the old recording device and the deterioration of the recording media as impulsive noise and white noise. We do not assume noise encountered in a noisy environment, such as a subway or a roadside. It may be feasible to apply the proposed technique to sound sources of daily conversations. It appears that we can recover enough even if noise is mixed because the vowel sound is a periodic signal over a short time period. However, in the frequency analysis of the consonant, the calculation using NHA is approximately equivalent to the calculation using FFT.
In addition, we examined pink noise as a representative colored noise. Other steady noises can be reduced in the same manner if the outline of the power spectrum is known. However, it appears that we must incorporate new methods other than the proposed method, and the new methods must be dynamically devised because the characteristic of an unsteady noise must be predicted.
At this stage, we have not incorporated the proposed method into an embedded system or a portable device because the calculation time of the proposed method is several times longer than that of DFT (equivalent to the fastest FFT using a radix-2 number in this article). The high-speed SS method appears to be advantageous if the application is research on speech recognition in daily conversations. Although the calculation time is increased, the proposed technique will be effective if used in an application that requires high precision. We believe that the defects of the proposed method are best left for consideration in a future study if the proposed method is applied to a portable product or to research on speech recognition.

Acknowledgements
This work was supported by Grants-in-Aid for Challenging Exploratory Research, MEXT (No. 23650110).

Competing interests
The authors declare that they have no competing interests.

Received: 27 June 2011 Accepted: 21 September 2011
Published: 21 September 2011

References
1. SF Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech, Signal Process ASSP. 27(2), 113–120 (1979). doi:10.1109/TASSP.1979.1163209
2. CT Lin, Single-channel speech enhancement in variable noise-level environment. IEEE Trans Syst Man Cybernet A. 33(1), 137–143 (2003)
3. SD Kamath, PC Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in Proceedings of the ICASSP, pp. 4164–4167 (2002)
4. Z Goh, KC Tan, BTG Tan, Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Trans Speech Audio Process. 6, 287–292 (1998). doi:10.1109/89.668822
5. K Sorensen, S Andersen, Speech enhancement with natural sounding residual noise based on connected time-frequency speech presence regions. EURASIP J Appl Signal Process. 18, 2954–2964 (2005)
6. IY Soon, SN Koh, Speech enhancement using 2-D Fourier transform. IEEE Trans Speech Audio Process. 11, 717–724 (2003). doi:10.1109/TSA.2003.816063
7. H Ding, IY Soon, SN Koh, CK Yeo, A spectral filtering method based on hybrid wiener filters for speech enhancement. Speech Commun. 51, 259–267 (2009). doi:10.1016/j.specom.2008.09.003
8. N Virag, Single channel speech enhancement based on masking properties of the human auditory system. IEEE Trans Speech Audio Process. 7(2), 126–137 (1999). doi:10.1109/89.748118
9. R Udrea, N Vizireanu, S Ciochina, An improved spectral subtraction method for speech enhancement using a perceptual weighting filter. Digital Signal Process. 18(4), 581–587 (2008). doi:10.1016/j.dsp.2007.08.002
10. I Kauppinen, K Roth, Improved noise reduction in audio signals using spectral resolution enhancement with time-domain signal extrapolation. IEEE Trans Speech Audio Process. 13, 1210–1216 (2005)
11. S Hirobayashi, F Ito, T Yoshizawa, T Yamabuchi, Estimation of the frequency of non-stationary signals by the steepest descent method, in Proceedings of the Fourth Asia-Pacific Conference of Industrial Engineering and Management Systems, pp. 788–791 (2002)
12. EB George, MJT Smith, Analysis-by-synthesis/overlap add sinusoidal modeling applied to the analysis and synthesis of musical tones. J Audio Eng Soc. 125(40), 497–516 (1992)

13. EB George, MJT Smith, Speech analysis/synthesis and modification using an
analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech
Audio Process. 5(5), 398–406 (1997)
14. JW Tukey, AE Beaton, The fitting of power series, meaning polynomials,
illustrated on band-spectroscopic-data. Technometrics. 16, 189–192 (1974).
doi:10.2307/1267938
15. JM Chambers, Computational Methods for Data Analysis. Wiley, New York
(1977)
16. PE Gill, W Murray, Quasi-Newton methods for unconstrained optimization. J
Inst Math Appl. 9, 91–108 (1972). doi:10.1093/imamat/9.1.91
17. T Terada et al., Non-stationary waveform analysis and synthesis using
generalized harmonic analysis, in IEEE-SP International Symposium on Time-
Frequency and Time-Scale Analysis, pp. 429–432 (1994)
18. N Wiener, in The Fourier Integral and Certain of Its Applications, (Dover
Publications, Inc., New York, 1958), pp. 158–199
19. T Muraoka, S Kiriu, Y Kamiya, Fast algorithm for generalized harmonic
analysis (GHA), in The 47th IEEE International Midwest Symposium on Circuit
and Systems, pp. 153–156 (2004)
20. Y Hirata, Non-harmonic Fourier analysis available for detecting very low-
frequency components. J Sound Vib. 287(3), 611–613 (2005)
21. I Kauppinen, K Roth, An adaptive technique for modeling audio signals, in
Proceedings of the 4th International Conference on Digital Audio Effects (DAFx-
01), (Limerick, Ireland, 2001), pp. 1–4
22. I Kauppinen, K Roth, Audio signal extrapolation–theory and applications, in
Proceedings of the 5th International Conference on Digital Audio Effects (DAFx-
02), (Hamburg, Germany, 2002), pp. 105–110
23. M Berouti, R Schwartz, J Makhoul, Enhancement of speech corrupted by
acoustic noise, in Proc IEEE ICASSP’79, pp. 208–211 (April 1979)

doi:10.1186/1687-4722-2011-426794
Cite this article as: Yoshizawa et al.: Noise reduction for periodic signals
using high-resolution frequency analysis. EURASIP Journal on Audio,
Speech, and Music Processing 2011 2011:5.
