
A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement

Yangyang Xia¹, Richard M. Stern¹,²
¹Department of Electrical and Computer Engineering, Carnegie Mellon University
²Language Technologies Institute, Carnegie Mellon University
[email protected], [email protected]

Abstract

Speech enhancement under highly non-stationary noise conditions remains a challenging problem. Classical methods typically attempt to identify a frequency-domain optimal gain function that suppresses noise in noisy speech. These algorithms typically produce artifacts such as "musical noise" that are detrimental to machine and human understanding, largely due to inaccurate estimation of noise power spectra. The optimal gain function is commonly referred to as the ideal ratio mask (IRM) in neural-network-based systems, and the goal becomes estimation of the IRM from the short-time Fourier transform amplitude of degraded speech. While these data-driven techniques are able to enhance speech quality with reduced artifacts, they are frequently not robust to types of noise that they have not been exposed to in the training process. In this paper, we propose a novel recurrent neural network (RNN) that bridges the gap between classical and neural-network-based methods. By reformulating the classical decision-directed approach, the a priori and a posteriori SNRs become latent variables in the RNN, from which the frequency-dependent estimated likelihood of speech presence is used to update the latent variables recursively. The proposed method provides substantial enhancement of speech quality and objective accuracy in machine interpretation of speech.

Index Terms: robust speech enhancement, a priori SNR estimation, decision-directed, recurrent neural networks.

1. Introduction

Speech enhancement (SE) has been one of the enabling technologies for robust speech processing applications for decades. SE algorithms strive to improve the quality and intelligibility of speech signals degraded by additive noise [1]. Enhanced speech signals benefit subsequent human listening or the performance of machine tasks, such as automatic speech recognition and speaker verification. Classical signal processing methods for SE typically work in the frequency domain, with optimization criteria associated with the spectral components of the enhanced speech. The techniques range from heuristically estimating the power spectra [2], to finding a linear filter that optimizes the mean squared error of the complex spectra [3], to minimum mean-squared error (MMSE) estimators that optimize the (log) short-time spectral amplitude (STSA) [4, 5].

The a priori signal-to-noise ratio (SNR) and the a posteriori SNR arise as two important concepts from the derivation of the MMSE-STSA estimator [4]. The a priori SNR can be understood as the true instantaneous power ratio between each spectral component of clean speech and noise, while the a posteriori SNR can be viewed as the instantaneous power ratio between each spectral component of the observed noisy speech and noise. Within this framework, the optimal gain functions in the STSA domain for well-known methods such as spectral subtraction [2], the Wiener filter [3], the maximum likelihood (ML) estimator [6], and MMSE estimation [4] can all be expressed in terms of the a priori and a posteriori SNRs [7]. The enhancement problem thus becomes an a priori SNR and a posteriori SNR estimation problem. For estimating the a priori SNR, a closed-form maximum-likelihood method and a recursive "decision-directed" method have been proposed [4]. For estimating the a posteriori SNR, or equivalently the noise power, the minima-controlled recursive-averaging (MCRA) algorithm can be employed [7, 8, 9]. Despite the robustness of the decision-directed approach even in highly nonstationary noise environments, inexact heuristics in the estimation procedure often produce artifacts called musical noise, which are sometimes even more detrimental to machine tasks and the human listening experience than the noisy speech itself.

Independent of the classical approaches, researchers in the neural network (NN) community formulate the speech enhancement task as a supervised learning problem. Recognizing the ideal ratio mask (IRM) in the STSA domain as a better training target than clean signal power or magnitude spectra [10], various neural network architectures have been explored to learn the IRM for SE. Examples include feedforward deep neural networks [11, 12], deep denoising autoencoders [13], and recurrent neural networks (RNNs) with long short-term memory [14]. Although these NN-based SE algorithms work well under noise conditions that appear in the training set, they typically suffer from degraded performance on unseen noise types because they attempt to learn a nonlinear mapping between noisy speech and the IRM.

A fusion system that combines the robustness and interpretability of the classical approach with the learning ability of the NN approach is clearly desirable. One previous study that attempts this fusion [15] proposes a NN version of spectral subtraction, with dedicated NNs for estimating the noise alone, the noise in noisy speech, and the enhanced speech. Although their NN structure is reminiscent of spectral subtraction, our experiments show that the latent variables do not learn the intended representation. Others [16, ?] have attempted to improve a priori SNR estimation using NNs, but their systems are shallow combinations of multiple approaches at the input and output levels.

We propose a novel RNN that addresses these issues. We slightly modify the decision-directed approach to form a recurrent estimation of both the a priori and a posteriori SNRs, eliminating the need to estimate the noise explicitly. This reformulation leads to a ratio-based representation for all variables, which have already been shown to be superior training targets for neural network learning [10]. Among them, the a priori SNR, the a posteriori SNR, and the speech-presence likelihood ratio are interpreted as latent recurrent cells of a recurrent neural network. This enables us to insert feedforward NNs to learn parameters that are normally determined heuristically in classical approaches. In addition, we introduce a learning objective function that jointly optimizes the MSE of the STSA as well as the frame-level speech-presence detection accuracy.
2. The Signal-to-Noise Ratio Recurrent Neural Network (SNRNN)

Our signal-to-noise ratio recurrent neural network (SNRNN) consists of a slightly modified version of the classical decision-directed a priori SNR estimation and a neural network component. Throughout the discussion, we assume additive noise in the short-time Fourier transform (STFT) domain:

X[m, k] = S[m, k] + N[m, k]    (1)

where X[m, k], S[m, k], and N[m, k] denote the STFT at time frame m and frequency bin k of the observed noisy speech, clean speech, and noise, respectively. The end goal is to seek the optimal gain function or IRM in the STSA domain, G[m, k], such that the clean speech estimate Ŝ[m, k] can be obtained from the modified STSA and the phase of the noisy input:

Ŝ[m, k] = G[m, k] |X[m, k]| e^{j∠X[m, k]}    (2)

The a priori SNR ξ[m, k] is defined as the ratio of the expected value of the clean speech power to the expected value of the noise power:

ξ[m, k] = E[|S[m, k]|²] / E[|N[m, k]|²]    (3)

The a posteriori SNR γ[m, k] is defined as the ratio of the instantaneous noisy speech power to the expected value of the noise power:

γ[m, k] = |X[m, k]|² / E[|N[m, k]|²]    (4)

In estimating the a priori and a posteriori SNRs, we replace the expected values with the corresponding instantaneous values:

ξ̂[m, k] = |Ŝ[m, k]|² / |N̂[m, k]|²,   γ̂[m, k] = |X[m, k]|² / |N̂[m, k]|²    (5)

Assuming that S[m, k] and N[m, k] are statistically independent zero-mean complex Gaussian random variables, Eq. 1 implies an additive relationship in the spectral power domain:

E[|X[m, k]|²] = E[|S[m, k]|²] + E[|N[m, k]|²]    (6)

which leads to the definition of ξ[m, k] in terms of γ[m, k]:

ξ[m, k] = E[γ[m, k]] − 1    (7)

The decision-directed approach [4] calculates ξ̂[m, k] by linearly averaging the past and present estimates of the a priori SNR:

ξ̂[m, k] = a Ĝ²[m−1, k] γ̂[m−1, k] + (1 − a) max{γ̂[m, k] − 1, 0}    (8)

where 0 < a < 1 is a weighting coefficient, and max{·} is the element-wise maximum operator that prevents the current estimate from going below 0. The gain function Ĝ[m, k] is expressed in terms of ξ̂[m, k], depending on the method to be used [7]. We use the Wiener solution [3, 7], because the partial derivative of Ĝ[m, k] with respect to ξ̂[m, k] does not involve a potential division by zero, which would cause gradient explosion during training:

Ĝ[m, k] = ξ̂[m, k] / (ξ̂[m, k] + 1)    (9)
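To make the recursion concrete, the following is a minimal NumPy sketch of one frame of Eqs. 5, 8, and 9, assuming a per-bin noise power estimate (e.g., from MCRA) is available; the function and variable names are ours, not from a released implementation.

```python
import numpy as np

def decision_directed_step(X_pow, N_pow, gamma_prev, G_prev, a=0.98):
    """One frame of decision-directed estimation (Eqs. 5, 8, 9).

    X_pow:      |X[m, k]|^2, noisy power spectrum of frame m (K-dim)
    N_pow:      |N_hat[m, k]|^2, noise power estimate, e.g. from MCRA
    gamma_prev: a posteriori SNR of frame m-1
    G_prev:     gain applied to frame m-1
    """
    gamma_hat = X_pow / N_pow                                  # Eq. 5
    xi_hat = (a * G_prev**2 * gamma_prev
              + (1.0 - a) * np.maximum(gamma_hat - 1.0, 0.0))  # Eq. 8
    G = xi_hat / (xi_hat + 1.0)                                # Wiener gain, Eq. 9
    return xi_hat, gamma_hat, G
```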

Noise estimation is needed to calculate γ̂[m, k] by definition. Acknowledging the importance of the decision-directed approach, we adopt the MCRA algorithm [7, 8, 9] for noise power estimation. Specifically, a speech-absence hypothesis H₀ᵏ and a speech-presence hypothesis H₁ᵏ are assumed for each frequency bin k of each frame m of the noisy signal:

H₀ᵏ: |N̂[m, k]|² = b |N̂[m−1, k]|² + (1 − b) |X[m−1, k]|²
H₁ᵏ: |N̂[m, k]|² = |N̂[m−1, k]|²    (10)

where 0 < b < 1 is a weighting coefficient. In other words, the noise power in a specific frequency bin is recursively updated with a fraction of the signal power from the previous frame only if that bin is classified as speech-absent. This decision is made by thresholding the likelihood ratio of speech-presence uncertainty:

Λ[m, k] ≜ P(X[m, k] | H₁ᵏ) / P(X[m, k] | H₀ᵏ)    (11)

The previous assumption that the noise and speech DFT coefficients are independent, complex, and Gaussian leads to:

Λ[m, k] = (1 / (1 + ξ̂[m, k])) · exp(γ̂[m, k] ξ̂[m, k] / (1 + ξ̂[m, k]))    (12)

In our system, we replace the hard threshold used in Eq. 10 by a soft threshold to enable gradient backpropagation. We also rewrite γ̂[m, k] as a recursive function, eliminating the notion of noise estimation completely. Finally, we introduce the neural network component, along with the loss function.

2.1. Recurrent A Priori and A Posteriori SNR Estimation

The noise update rule in Eq. 10 can be interpreted as a recurrent nonlinear activation function. Specifically, let δ be a hard threshold on the log-likelihood ratio of speech-presence uncertainty above which the noisy frame is classified as speech-present. The update rule can then be rewritten as:

|N̂[m, k]|² / |N̂[m−1, k]|² = β(Λ[m−1, k]) + (1 − β(Λ[m−1, k])) γ̂[m−1, k]    (13)

where β(Λ[m, k]) is the scaled and shifted unit step function:

β(Λ[m, k]) = b + (1 − b) u[log(Λ[m, k]) − δ]    (14)

To enable gradient backpropagation in our RNN, we propose two nonlinearities, sigmoid and piecewise-linear, that have nonzero gradients around the decision boundary δ to replace the unit step function:

β_sig(Λ[m, k]) = b + (1 − b) / (1 + e^{−(log(Λ[m, k]) − δ)})    (15)
β_pwl(Λ[m, k]) = min{1, max{b, ((1 − b)/(2ε)) [log(Λ[m, k]) − (δ − ε)] + b}}

where ε is a small positive constant that controls the width of the linear region.
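A sketch of the two soft gates of Eq. 15 in the same NumPy style; b and δ follow the values reported in Sec. 3, while the width ε of the linear region is a placeholder, since its value is not specified here.

```python
import numpy as np

def beta_sig(Lambda, b=0.98, delta=0.15):
    """Sigmoid soft gate of Eq. 15; b and delta follow Sec. 3."""
    return b + (1.0 - b) / (1.0 + np.exp(-(np.log(Lambda) - delta)))

def beta_pwl(Lambda, b=0.98, delta=0.15, eps=0.1):
    """Piecewise-linear soft gate: ramps from b to 1 over a region of
    width 2*eps centered at delta (eps = 0.1 is a placeholder value)."""
    ramp = (1.0 - b) / (2.0 * eps) * (np.log(Lambda) - (delta - eps)) + b
    return np.minimum(1.0, np.maximum(b, ramp))
```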
[log(Λ[m, k]) − (δ − )] + b}}

ˆ
ξ[m, k] = aĜ2 [m − 1, k]γ̂[m − 1, k] + (1 − a)max{γ̂[m, k] − 1, 0} (8) where  is a small positive constant that controls the width of
the linear region. Combining the new update Eq. 13 with Eq. 5
where 0 < a < 1 is the weighting coefficient, and max{·} is we obtain:
the element-wise maximum operator that prevents the current
estimate from going below 0. The gain function Ĝ[m, k] is ex- |X[m, k]|2 γ̂[m − 1, k]
γ̂[m, k] = (16)
pressed in terms of ξ[m, k] depending on the method to be used |X[m − 1, k]|2 β + (1 − β)γ̂[m − 1, k]
[7]. We use the Wiener estimate solution [3, 7], because the
ˆ
partial derivative of Ĝ[m, k] with respect to ξ[m, k] does not where β is shorthand for β(Λ[m − 1, k]). Equations 8, 12, and
involve potential division by zero, which would result in gradi- 16 complete the recurrent estimation of both a priori and a pos-
ent explosion during training: teriori SNR, without the need to explicitly estimate noise power.
This distinction is important from the neural network learning
ˆ
ξ[m, k] perspective, as estimating a ratio mask rather than direct signal
Ĝ[m, k] = (9)
ˆ
ξ[m, k] + 1 is desirable [10]. We now present the full system.
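Putting Eqs. 8, 9, 12, 15, and 16 together gives the complete noise-free recurrence on which the SNRNN is built. Below is a sketch of one time step over all K frequency bins, again under our naming conventions; a practical implementation would guard the exponential in Eq. 12 against overflow.

```python
import numpy as np

def snr_recurrent_step(X_pow, X_pow_prev, gamma_prev, G_prev, Lambda_prev,
                       a=0.98, b=0.98, delta=0.15):
    """One recurrence of Eqs. 8, 9, 12, 15, and 16 over K frequency bins,
    with no explicit noise-power estimate."""
    # Soft speech-presence gate (sigmoid form of Eq. 15)
    beta = b + (1.0 - b) / (1.0 + np.exp(-(np.log(Lambda_prev) - delta)))
    # A posteriori SNR via the ratio-based recursion (Eq. 16)
    gamma_hat = (X_pow / X_pow_prev) * gamma_prev / (
        beta + (1.0 - beta) * gamma_prev)
    # Decision-directed a priori SNR (Eq. 8)
    xi_hat = (a * G_prev**2 * gamma_prev
              + (1.0 - a) * np.maximum(gamma_hat - 1.0, 0.0))
    # Speech-presence likelihood ratio (Eq. 12); clip the exponent in practice
    Lambda = np.exp(gamma_hat * xi_hat / (1.0 + xi_hat)) / (1.0 + xi_hat)
    # Wiener gain (Eq. 9)
    G = xi_hat / (xi_hat + 1.0)
    return xi_hat, gamma_hat, Lambda, G
```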
We now present the full system.

2.2. RNN for A Priori SNR Estimation

The recurrent structure described in the previous subsection naturally lends itself to a recurrent neural network framework. Specifically, we place a feedforward neural network immediately after each weighting factor, so that the RNN can learn the recursive averaging coefficients rather than applying heuristics:

ξ̂[m, k] = â₁[m, k] Ĝ²[m−1, k] γ̂[m−1, k] + â₂[m, k] max{γ̂[m, k] − 1, 0}
γ̂[m, k] = (|X[m, k]|² / |X[m−1, k]|²) · γ̂[m−1, k] / (b̂₁[m, k] + b̂₂[m, k] γ̂[m−1, k])    (17)
â₁[m, k] = FF(a[m, k]),  â₂[m, k] = FF(1 − a[m, k]),  b̂₁[m, k] = FF(β(Λ[m, k])),  b̂₂[m, k] = FF(1 − β(Λ[m, k]))

where FF(·) represents a feedforward neural network. Although these equations look very similar to Eqs. 8 and 16, we note two key differences. First, each coefficient is now parametrized by both time and frequency. Because of the interconnections of the neural network, these coefficients depend not only on the frequency bin they belong to, but also on all other frequency bins. This is a useful generalization that is hard to carry out systematically in the classical framework due to the lack of a closed-form solution, but is easily realized in the neural network framework. Second, the heuristic constraint that the weighting coefficients add up to 1 is removed.
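The corresponding SNRNN step of Eq. 17 is sketched below with plain NumPy feedforward networks rather than an autodiff framework. Our reading of Eq. 17, which is an assumption of this sketch, is that each FF(·) receives its heuristic coefficient broadcast across the K frequency bins (the constant a or 1 − a, and the soft gate β(Λ) or 1 − β(Λ)), so that the b̂ coefficients vary with both time and frequency through Λ.

```python
import numpy as np

def ff(x, params):
    """3-layer ReLU feedforward network (one of the four in Fig. 1)."""
    for W, b in params:
        x = np.maximum(W @ x + b, 0.0)
    return x

def snrnn_step(X_pow, X_pow_prev, gamma_prev, G_prev, Lambda_prev, nets,
               a=0.98, b=0.98, delta=0.15):
    """One SNRNN recurrence over K frequency bins (Eq. 17).
    `nets` holds the parameter lists of the four feedforward networks."""
    K = X_pow.shape[0]
    beta = b + (1.0 - b) / (1.0 + np.exp(-(np.log(Lambda_prev) - delta)))
    a1 = ff(a * np.ones(K), nets[0])            # replaces a in Eq. 8
    a2 = ff((1.0 - a) * np.ones(K), nets[1])    # replaces 1 - a
    b1 = ff(beta, nets[2])                      # replaces beta in Eq. 16
    b2 = ff(1.0 - beta, nets[3])                # replaces 1 - beta
    gamma_hat = (X_pow / X_pow_prev) * gamma_prev / (b1 + b2 * gamma_prev)
    xi_hat = a1 * G_prev**2 * gamma_prev + a2 * np.maximum(gamma_hat - 1.0, 0.0)
    Lambda = np.exp(gamma_hat * xi_hat / (1.0 + xi_hat)) / (1.0 + xi_hat)  # Eq. 12
    return xi_hat, gamma_hat, Lambda
```

A trainable version would implement the same operations in a framework that backpropagates through the recurrence.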
The loss function of the SNRNN is twofold. We adopt the mean squared error in the STSA domain of the enhanced speech, not only because it is a popular objective function in deep learning SE methods, but also because it is the principle upon which the a priori SNR estimation is derived [4]:

E_mse[m] = (1/K) Σ_{k=0}^{K−1} (|S[m, k]| − |Ŝ[m, k]|)²    (18)

In addition to the MSE-STSA loss, we introduce a frame-level voice activity detection (VAD) loss. Because the recurrent structure is derived directly from the decision-directed approach, we can obtain the frame-level speech-presence log-likelihood ratio by assuming statistical independence across frequency:

log Λ[m] = Σ_{k=0}^{K−1} log Λ[m, k]    (19)

where K is the total number of frequency bins. Assuming equal prior probabilities of speech presence and absence, the speech-presence probability given the noisy frame can be expressed as:

P(speech | X[m]) = Λ[m] / (1 + Λ[m])    (20)

We define the VAD loss for this two-class classification problem as the cross-entropy between the true and predicted speech-presence probabilities:

E_vad[m] = −vad[m] log(P(speech | X[m])) − (1 − vad[m]) log(1 − P(speech | X[m]))    (21)

where vad[m] ∈ {0, 1} is the true speech-presence probability for frame m. The overall objective function is:

E[m] = α E_mse[m] + (1 − α) E_vad[m]    (22)

where 0 ≤ α ≤ 1 is the weighting coefficient.
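For reference, a sketch of the per-frame objective of Eqs. 18-22, using the fact that Eq. 20 is a logistic sigmoid of the frame-level log-likelihood ratio; the small guard on the logarithms is our addition.

```python
import numpy as np

def snrnn_loss(S_mag, S_hat_mag, log_Lambda, vad, alpha=0.2):
    """Per-frame objective of Eqs. 18-22.

    S_mag, S_hat_mag: clean and enhanced STSA for frame m (K-dim)
    log_Lambda:       per-bin log-likelihood ratios log Lambda[m, k]
    vad:              1 if frame m contains speech, else 0
    """
    e_mse = np.mean((S_mag - S_hat_mag) ** 2)           # Eq. 18
    log_Lambda_m = np.sum(log_Lambda)                   # Eq. 19
    p_speech = 1.0 / (1.0 + np.exp(-log_Lambda_m))      # Eq. 20 as a sigmoid
    eps = 1e-12                                         # our guard for the logs
    e_vad = -(vad * np.log(p_speech + eps)
              + (1 - vad) * np.log(1.0 - p_speech + eps))  # Eq. 21
    return alpha * e_mse + (1.0 - alpha) * e_vad        # Eq. 22
```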
Figure 1: SNRNN computation at frame m in the dashed box. Octagons hold input and output. Latent variables are bounded by rings. Feedforward networks are highlighted by bold capsules. Rectangles and circles are element-wise operations.

We conclude our description of the SNRNN from the neural network perspective. In the zoomed-in view of the recurrent structure shown in Fig. 1, the a priori SNR, the a posteriori SNR, and the speech-presence likelihood ratio are interpreted as latent variables that carry information across time frames. The four feedforward neural networks interact with instantaneous power ratios rather than direct speech or noise power, which potentially makes the system robust against unseen noise types. In fact, all variables in the network are represented as ratios, motivated by the finding that ratio masks are superior learning targets [10].

3. Experimental Results and Discussions

We conducted experiments using the RATS speech activity detection dataset [17] and the TIMIT dataset [18]. We selected the RATS dataset to train our system because it contains extremely challenging noise conditions. It also contains ample examples of both speech-present and speech-absent regions that are needed for training. To demonstrate the enhancement quality of our system, we chose speakers from four of the RATS channels for a speaker verification (SV) evaluation. To demonstrate the robustness of our system against other unseen noise types, we performed global and local signal-to-distortion ratio (SDR) tests [19] on speech segments from the TIMIT dataset with digitally added noise samples taken from the NOIZEUS dataset [20]. In both experiments, we compared the SNRNN's performance (denoted NN in all tables and figures) with the classical Wiener solution using decision-directed a priori SNR estimation with MCRA noise estimation (denoted DD).

To train the SNRNN, we used a total of 56 hours of audio from 320 recordings sampled at 16 kHz from Channels A and H in the development partition of the RATS SAD dataset. For the SV task, we also used Channel D in training. During the training phase, 1000 320-ms audio segments were randomly sampled from all recordings to form one minibatch. We used oracle VAD information to maintain approximately equal numbers of speech-present and speech-absent frames within each minibatch. Next, we computed the STFT of each segment with a 32-ms Hamming window, 75% overlap between frames, and 512-point DFTs. Finally, we computed the magnitude STFT, retaining the first 257 frequency dimensions as the input to the SNRNN.
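A sketch of this feature extraction using scipy.signal.stft under the stated analysis parameters; at 16 kHz a 32-ms window is exactly 512 samples, and the one-sided 512-point DFT yields the 257 retained bins.

```python
import numpy as np
from scipy.signal import stft

def snrnn_features(x, fs=16000):
    """Magnitude STFT input features (Sec. 3): 32-ms Hamming window,
    75% overlap, 512-point DFT; the one-sided spectrum has 257 bins."""
    nperseg = int(0.032 * fs)          # 512 samples at 16 kHz
    noverlap = int(0.75 * nperseg)     # 75% overlap, i.e. an 8-ms hop
    _, _, X = stft(x, fs=fs, window='hamming',
                   nperseg=nperseg, noverlap=noverlap, nfft=512)
    return np.abs(X)                   # shape (257, num_frames)
```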

For each neural network inside the SNRNN, we used a 3-layer feedforward network with 257 neurons in each of the input, hidden, and output layers. We used rectified linear units (ReLUs) as the activation function at all layers because SNRs are non-negative. We used the sigmoid function in Eq. 15. We chose the constants a = 0.98, b = 0.98, and δ = 0.15, which are effective for the classical DD approach [1]. The network parameters were initialized so that each network is an identity function prior to learning. γ̂, ξ̂, and Λ are initialized the same way as in the decision-directed method [1]. α = 0.2 is a good weighting constant for the loss function. We used stochastic gradient descent with a learning rate of 10⁻⁴ and a momentum of 0.9 to update all network weights.
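The identity initialization can be written down directly: because the inputs to each network are non-negative SNR-like ratios, identity weights and zero biases make every ReLU layer a pass-through, so the untrained SNRNN should reproduce the decision-directed system. A sketch, under the reading that each of the three layers is a 257 × 257 ReLU layer:

```python
import numpy as np

def identity_ff_params(dim=257, n_layers=3):
    """Identity initialization for one of the SNRNN's feedforward nets."""
    return [(np.eye(dim), np.zeros(dim)) for _ in range(n_layers)]

def ff_apply(x, params):
    """Stacked ReLU layers; with identity parameters and non-negative
    input (SNRs are non-negative), each layer is a pass-through."""
    for W, b in params:
        x = np.maximum(W @ x + b, 0.0)
    return x
```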
We evaluated our enhancement system in terms of the equal error rate (EER) obtained for the SV task. The baseline SV system was trained using the ALIZE i-vector system setup described in [21]. Farsi speakers in the training partition of the RATS SAD dataset were used for evaluation. The enrollment consisted of 30, 28, 28, and 30 speakers from RATS Channels A, B, C, and H, respectively; 28, 35, 25, and 37 recordings from Channels A, B, C, and H, respectively, were tested against every enrolled speaker from their corresponding channels. Table 2 shows that the NN provides significant improvement for all channels except Channel B. The improvement for unseen Channel C is even greater than the improvement for Channel A. In addition, NN provides better performance than DD in all cases. One notable finding in our experiments is that Ĝ[m, k] in Eqs. 8 and 9 no longer needs to be identical in the SNRNN. The results in Table 2 were obtained using the power subtraction rule [7] for Eq. 8, and the Wiener filter rule for Eq. 9.

Table 2: Speaker Verification Performance on RATS Speakers

EER (%)     Noisy   DD     NN     Clean
Channel A   28.6    32.2   24.9   10.7
Channel B   36.6    37.2   36.6   11.5
Channel C   44.8    40.1   36.7    7.93
Channel H   43.2    29.7   23.9   10.8

Table 1 shows the improvement of global and 512-point segmental SDR after applying DD and NN to noisy TIMIT utterances. The four types of noise we include are perceptually very different from the noise in the RATS channels. Our results show that the SNRNN consistently improves segmental SDR under all conditions, even though the global SDR is sometimes worse than that of DD at high SNRs (which are rare in our training data). We illustrate this phenomenon in Fig. 2, where we show compensated waveforms after DD and NN processing, respectively. The impulse-like speech waveform at around 3.8 s is wrongly suppressed by NN. However, the bursts of noise during 0-3 s and 5-10 s are better contained using NN. In addition, NN produces far fewer musical artifacts during 4-5 s. The parameter α controls the tradeoff between speech smearing and noise suppression. Using the MSE-STSA loss alone yields almost an identical system to the decision-directed approach, while using the VAD loss alone results in a system that heavily suppresses noise and smears speech. Overall, the robustness of SNRNN processing was expected because, as we note in Sec. 2, the neural networks "see" only instantaneous SNRs.

Table 1: Improvement of SDR and 512-point Segmental SDR on 6300 TIMIT Utterances

              Cafeteria Babble          Train                 Flight                Car
SNR (dB)   SDR        SegSDR       SDR        SegSDR      SDR        SegSDR      SDR        SegSDR
           DD    NN   DD    NN     DD    NN   DD    NN    DD    NN   DD    NN    DD    NN   DD    NN
10         2.25  0.96 5.21  8.20   3.30  1.76 5.60  8.30  4.25  2.68 5.17  8.53  6.92  4.01 7.13  10.5
5          3.05  2.88 7.09  9.43   4.20  4.11 7.53  9.76  5.25  5.24 6.86  9.90  8.83  7.66 9.18  12.7
0          3.67  3.90 8.93  10.4   4.85  5.61 9.31  11.0  6.19  6.88 9.01  11.4  10.7  10.7 11.6  15.3
-5         3.44  3.65 9.54  10.5   5.27  6.37 10.7  11.9  7.00  7.69 11.3  12.5  12.3  13.1 13.8  17.5
-10        1.39  1.66 9.41  9.80   4.39  5.57 11.0  12.3  6.72  7.24 12.4  13.1  13.1  14.5 14.8  18.4
Mean       2.76  2.61 8.04  9.67   4.40  4.68 8.83  10.7  5.88  5.95 8.95  11.1  10.4  9.99 11.3  14.9

Figure 2: Denoised waveforms. Top left and bottom left: denoised signal using DD and NN, respectively. Top right and bottom right: residual noise at 4-5 s after 39.3 dB amplification.

4. Conclusions

In this paper, we have proposed a neural-network equivalent of decision-directed a priori SNR estimation. We strongly advocate the use of instantaneous SNRs as internal representations in neural networks, to accompany the use of IRMs as learning targets [10], for noise robustness. Our system preserves the robustness of the classical method while improving the accuracy of the recurrent approximations of the a priori and a posteriori SNRs. Our results have shown that SNRNN processing can preserve speech and greatly suppress noise, while producing very few residual artifacts. In addition, our system can handle unseen nonstationary noise conditions when trained on very few noise types. We introduced a joint STSA-MSE and VAD loss function, and highlighted the importance of the VAD loss for balancing the level of noise suppression and speech distortion. In the future, we will attempt to improve the quality of the enhanced speech in speech-present regions, and extend the additive-noise framework to linear filtering for channel compensation.

5. Acknowledgments

We thank Benjamin Martinez Elizalde for useful suggestions on many prior drafts of this paper. We also thank our colleagues at the Afeka College of Engineering for their support.
6. References

[1] P. C. Loizou, Speech Enhancement: Theory and Practice, Second Edition. CRC Press, 2013.

[2] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.

[3] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proceedings of the IEEE, vol. 67, no. 12, pp. 1586-1604, 1979.

[4] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1122, Dec. 1984.

[5] ——, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.

[6] R. J. McAulay and M. L. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 2, pp. 137-145, Apr. 1980.

[7] P. Scalart and J. Filho, "Speech enhancement based on a priori signal to noise estimation," in Proc. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, 1996, pp. 629-632.

[8] J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 1998, pp. 365-368.

[9] I. Cohen and B. Berdugo, "Noise estimation by minima controlled recursive averaging for robust speech enhancement," IEEE Signal Processing Letters, vol. 9, no. 1, 2002.

[10] Y. Wang, A. Narayanan, and D. L. Wang, "On training targets for supervised speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1849-1858, 2014.

[11] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, Jan. 2015.

[12] A. Kumar and D. Florencio, "Speech enhancement in multiple-noise conditions using deep neural networks," 2016.

[13] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," in Proc. INTERSPEECH 2013, pp. 436-440, 2013.

[14] F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux, J. R. Hershey, and B. Schuller, "Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR," in Lecture Notes in Computer Science, vol. 9237. Springer, Cham, 2015, pp. 91-99.

[15] K. Osako, R. Singh, and B. Raj, "Complex recurrent neural networks for denoising speech signals," in Proc. 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.

[16] S. Suhadi, C. Last, and T. Fingscheidt, "A data-driven approach to a priori SNR estimation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 186-195, 2011.

[17] K. Walker, X. Ma, D. Graff, S. Strassel, and K. Jones, "RATS speech activity detection LDC2015S02," Philadelphia: Linguistic Data Consortium, 2015.

[18] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1," NASA STI/Recon Technical Report N, vol. 93, Feb. 1993.

[19] E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462-1469, 2006.

[20] Y. Hu and P. C. Loizou, "Subjective comparison of speech enhancement algorithms," in Proc. 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2006, pp. I-I.

[21] A. Larcher, J.-F. Bonastre, B. Fauve, K. A. Lee, C. Lévy, H. Li, J. S. D. Mason, and J.-Y. Parfait, "ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition," in Proc. Interspeech, pp. 2768-2772, 2013.
