Classification of Reverberant Situations
Introduction
In daily communication, speech intelligibility depends on the acoustic surroundings, i.e. the acoustic situation. Particularly for hearing-impaired persons, speech understanding is often problematic if speech is distorted by (room) reverberation, noise or competing talkers. Acoustic situations are characterized by different dominating types of distortion. Hearing aids may provide dedicated algorithms to enhance speech intelligibility in the different acoustic situations. A robust and fast automatic classification of the acoustic situation could therefore select the appropriate hearing aid algorithm without requiring any action by the hearing aid wearer. This study is concerned with the automatic estimation of the reverberation time (T60) in natural situations with an unknown excitation signal.
Acoustic test situations were generated by convolving speech signals with artificial and real room impulse responses with T60 times ranging from 0.05 to 4 s. Features derived from the cepstral mean, the autocorrelation function and the distribution of modulation energy were used to blindly estimate the reverberation times.
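$$h(t) = A_{\mathrm{exp}}\, e^{-t/\tau}\, n_1(t) + A_{\mathrm{noise}}\, n_2(t), \qquad (1)$$
where $A_{\mathrm{exp}}$ and $A_{\mathrm{noise}}$ are scalars, $\tau$ is the decay parameter in seconds, t is the time in seconds, and $n_1(t)$ and $n_2(t)$ represent two independent noise processes.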
A common measure for reverberation is the time until the impulse response has decayed by 60 dB. For the model in (1), the reverberation time T60 can be calculated directly from the decay parameter $\tau$:
$$T_{60} = \ln(10^3)\,\tau \approx 6.908\,\tau. \qquad (2)$$
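The generation of artificial test situations can be sketched as follows. This is a minimal sketch, assuming that Eq. (1) is the exponentially decaying noise model $h(t) = A_{\mathrm{exp}} e^{-t/\tau} n_1(t) + A_{\mathrm{noise}} n_2(t)$ with Gaussian white noise for $n_1$ and $n_2$. The helper names `synthetic_rir` and `reverberate`, the sampling rate and the impulse response length of 1.5 T60 are hypothetical choices, not taken from the study. The decay parameter is obtained by solving Eq. (2) for $\tau$.

```python
import numpy as np


def synthetic_rir(t60, fs, a_exp=1.0, a_noise=0.0, seed=0):
    """Hypothetical helper: artificial room impulse response following the
    assumed model h(t) = a_exp * exp(-t/tau) * n1(t) + a_noise * n2(t)."""
    tau = t60 / np.log(1e3)              # Eq. (2) solved for the decay parameter
    n = int(round(1.5 * t60 * fs))       # assumed IR length: 1.5 * T60
    t = np.arange(n) / fs                # time axis in seconds
    rng = np.random.default_rng(seed)
    n1 = rng.standard_normal(n)          # noise process shaped by the decay
    n2 = rng.standard_normal(n)          # independent noise floor
    return a_exp * np.exp(-t / tau) * n1 + a_noise * n2


def reverberate(speech, h):
    """Reverberant test signal: linear convolution of speech with the IR."""
    return np.convolve(speech, h)


# Usage: T60 = 1 s at 16 kHz; the energy 1 s after onset should be ~60 dB lower.
fs = 16000
h = synthetic_rir(t60=1.0, fs=fs)
win = 800                                # 50 ms analysis window
early = np.mean(h[:win] ** 2)
late = np.mean(h[fs:fs + win] ** 2)
decay_db = 10 * np.log10(late / early)   # close to -60 dB
```

Measuring the energy ratio over two equally long windows exactly T60 apart checks the 60 dB criterion directly, independent of the noise realization up to small statistical fluctuations.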
Cepstral Mean
Blind homomorphic deconvolution [3], [4], [5] provides a way to estimate the impulse response from an unknown reverberant signal. Here, a reverberant speech signal is modeled as
$$s_{ir}(t) = s(t) * h(t), \qquad (5)$$
where $*$ denotes convolution.
Figure 1: Calculation of the cepstrum from a convolved input signal: s(t) * h(t) → F → S(f) H(f) → log → Ŝ(f) + Ĥ(f) → F^{-1} → ŝ(q) + ĥ(q).
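The additivity illustrated in Figure 1 can be checked numerically. The sketch below, assuming NumPy and random stand-ins for s(t) and h(t) (hypothetical test data, not the speech material of the study), computes the real cepstrum $F^{-1}(\log|F(\cdot)|)$ with an FFT long enough for linear convolution. The cepstrum of s(t) * h(t) then equals the sum of the individual cepstra up to rounding errors. The real cepstrum discards phase, so this illustrates only the magnitude part of homomorphic deconvolution.

```python
import numpy as np


def real_cepstrum(x, n_fft):
    """Real cepstrum F^-1(log|F(x)|) computed with an n_fft-point FFT."""
    spectrum = np.fft.fft(x, n_fft)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum))))


rng = np.random.default_rng(1)
s = rng.standard_normal(400)                                    # stand-in for clean speech
h = np.exp(-np.arange(200) / 30.0) * rng.standard_normal(200)   # decaying stand-in IR

# n_fft >= len(s) + len(h) - 1, so the FFT product equals the linear convolution
n_fft = 1024
c_sir = real_cepstrum(np.convolve(s, h), n_fft)
c_sum = real_cepstrum(s, n_fft) + real_cepstrum(h, n_fft)
max_error = np.max(np.abs(c_sir - c_sum))   # on the order of rounding error
```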
[Figure: block diagram of the cepstral mean feature. Recoverable block labels: |F(·)|, F^{-1}(log(·)) (cepstrogram over time), mean over T frames (1/T Σ_{t=1}^{T}), causal filtering, inverse cepstrum F^{-1}(exp(F(·))).]
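One possible reading of the cepstral mean processing chain (cepstrogram, mean over frames, causal filtering, inverse cepstrum) is sketched below. The real cepstrum is computed frame by frame, averaged over all T frames, made causal by folding it onto positive quefrencies, and inverted with $F^{-1}(\exp(F(\cdot)))$. The folding step is the standard minimum-phase construction from the real cepstrum and is an assumption for the "causal filtering" block; the Hann window, frame length, hop size and function names are likewise hypothetical. The idea behind averaging, as read here, is that the frame-varying speech cepstrum tends to average out while the constant room contribution remains.

```python
import numpy as np


def real_cepstrum(frame, n_fft):
    """Real cepstrum F^-1(log|F(x)|); a small floor avoids log(0)."""
    mag = np.abs(np.fft.fft(frame, n_fft))
    return np.real(np.fft.ifft(np.log(np.maximum(mag, 1e-12))))


def cepstral_mean(x, win_len, hop, n_fft):
    """Mean over the cepstrogram: average the frame cepstra over all T frames."""
    window = np.hanning(win_len)
    frames = range(0, len(x) - win_len + 1, hop)
    cepstrogram = [real_cepstrum(window * x[i:i + win_len], n_fft) for i in frames]
    return np.mean(cepstrogram, axis=0)


def min_phase_from_cepstrum(c):
    """Causal lifter (fold onto positive quefrencies), then F^-1(exp(F(.))).
    Returns a minimum-phase impulse response; len(c) must be even."""
    n = len(c)
    folded = np.zeros(n)
    folded[0] = c[0]
    folded[1:n // 2] = 2.0 * c[1:n // 2]
    folded[n // 2] = c[n // 2]
    return np.real(np.fft.ifft(np.exp(np.fft.fft(folded))))


# Usage: a known minimum-phase filter is recovered from its real cepstrum.
h_true = np.array([1.0, 0.5])
h_est = min_phase_from_cepstrum(real_cepstrum(h_true, 64))

# Cepstral mean of a stand-in signal, followed by the impulse response estimate.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
h_hat = min_phase_from_cepstrum(cepstral_mean(x, win_len=512, hop=256, n_fft=1024))
```

The minimum-phase inversion cannot recover the true phase of a room impulse response, but its envelope decay, which is what a T60 estimate needs, is preserved by this construction only approximately; the result should be read as a sketch of the data flow.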
Autocorrelation
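The autocorrelation function $R_{s_{ir},s_{ir}}(t)$ of a reverberant signal $s_{ir}(t)$ is the convolution of the autocorrelation functions of the underlying clean speech s(t) and the room impulse response h(t):
$$R_{s_{ir},s_{ir}}(t) = R_{s,s}(t) * R_{h,h}(t). \qquad (6)$$
If the autocorrelation function of clean speech is assumed to be sharply peaked, i.e. close to a scaled delta function, the convolution in (6) leaves $R_{h,h}(t)$ essentially unchanged, and the following approximation holds:
$$R_{s_{ir},s_{ir}}(t) \approx R_{h,h}(t). \qquad (7)$$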
For an exponential function, the autocorrelation function for positive lags has the same exponential decay parameter $\tau$. We therefore assume that the autocorrelation function of the reverberant signal decays like the underlying impulse response. To reduce estimation errors, the autocorrelation function was averaged over overlapping windows. From the averaged autocorrelation function, the T60 time was estimated according to equation (4).
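The autocorrelation approach can be sketched as follows, assuming NumPy. For $h(t) = e^{-t/\tau}$ with $t \ge 0$, $R_{h,h}(t) = \int_0^\infty e^{-u/\tau} e^{-(u+t)/\tau}\,du = \frac{\tau}{2} e^{-t/\tau}$ for $t \ge 0$, so the logarithm of the normalized autocorrelation falls off with slope $-1/\tau$. As a stand-in for the estimation rule of equation (4), the sketch uses a least-squares line fit to ln R over the first 20 dB of decay and converts $\tau$ to T60 with Eq. (2). The fit range, window parameters and function names are hypothetical.

```python
import numpy as np


def averaged_autocorrelation(x, win_len, hop):
    """Autocorrelation (non-negative lags) averaged over overlapping windows."""
    acfs = []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len]
        full = np.correlate(frame, frame, mode="full")
        acfs.append(full[win_len - 1:])      # keep lags 0 .. win_len-1
    return np.mean(acfs, axis=0)


def t60_from_autocorrelation(r, fs, fit_range_db=20.0):
    """Fit ln(r) ~ -lag/tau over the initial decay, then T60 = ln(10^3) * tau."""
    r = r / r[0]                              # normalize to lag 0
    below = np.nonzero(r <= 10 ** (-fit_range_db / 20))[0]
    stop = below[0] if below.size else len(r)
    if stop < 2:
        raise ValueError("decay too fast for a line fit at this sampling rate")
    lags = np.arange(stop) / fs               # lag in seconds
    slope, _ = np.polyfit(lags, np.log(r[:stop]), 1)
    tau = -1.0 / slope
    return np.log(1e3) * tau                  # Eq. (2)


# Usage: a pure exponential decay with T60 = 0.5 s.
fs = 8000
tau_true = 0.5 / np.log(1e3)
x = np.exp(-np.arange(fs) / (fs * tau_true))
t60_single = t60_from_autocorrelation(averaged_autocorrelation(x, fs, fs), fs)
t60_avg = t60_from_autocorrelation(averaged_autocorrelation(x, fs // 2, fs // 4), fs)
```

With real reverberant speech, the averaged autocorrelation only approximates $R_{h,h}$ as in Eq. (7), so the window length and fit range can be expected to influence the estimate.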
$$\mathrm{SRMR} = \frac{\sum_{g,k=1}^{G,K} SE^{\mathrm{low}}_{g,k}}{\sum_{g,k=1}^{G,K} SE^{\mathrm{high}}_{g,k}}$$
Results
Cepstral Mean
The means of 200 T60 estimates for the artificial IR setup are plotted in Fig. 4 (solid lines) as a function of the analysis window duration for three different real T60 times, indicated by the dotted lines.
The T60 estimates depend on the analysis window duration,
[Fig. 4: estimated T60 /s versus window length /s for the cepstral mean feature. Further panel: estimated T60 /s versus real T60 /s (0.05 to 3.2 s).]
Autocorrelation
For the autocorrelation feature, the same sound material and parameters as for the cepstral mean feature were used (see above).
The means of 200 T60 estimates per window and impulse response of the artificial IR setup are plotted in Figure 6, comparable to Figure 4. Comparable to
[Figure residue. Panel "blind estimated T60s with different window lengths": estimated T60 /s versus window length /s for real T60 of 0.1 s, 1.6 s and 3.2 s. Further panels: estimated T60 /s versus real T60 /s, and SRMR means versus real T60 /s.]
Acknowledgements
This work was supported by the Bundesministerium für Bildung und Forschung (BMBF) project Modellbasierte Hörsysteme.
References
[1] M. Karjalainen, P. Antsalo, A. Mäkivirta, T. Peltonen, V. Välimäki, "Estimation of Modal Decay Parameters from Noisy Response Measurements," J. Audio Eng. Soc., Vol. 50, No. 11, pp. 867-878, Nov. 2002.
Conclusions
Three different methods for the estimation of the reverberation time T60 were presented. It was shown that for the cepstral mean and the autocorrelation feature, T60 can be estimated via the N/S and slope criteria with very good accuracy above about 200 ms. The lower limit of estimated T60 times at about 200 ms is most likely related to the statistical properties of speech. Both methods assume that the speech signal is statistically independent in successive time windows, which is not the case. Shorter T60 times could only be estimated with an input signal of significantly