Combining visual and acoustic speech signals with a neural network improves intelligibility

TJ Sejnowski, B Yuhas, M Goldstein, R Jenkins
Advances in Neural Information Processing Systems, 1989 - proceedings.neurips.cc
Acoustic speech recognition degrades in the presence of noise. Compensatory information is available from the visual speech signals around the speaker's mouth. Previous attempts at using these visual speech signals to improve automatic speech recognition systems have combined the acoustic and visual speech information at a symbolic level using heuristic rules. In this paper, we demonstrate an alternative approach to fusing the visual and acoustic speech information by training feedforward neural networks to map the visual signal onto the corresponding short-term spectral amplitude envelope (STSAE) of the acoustic signal. This information can be directly combined with the degraded acoustic STSAE. Significant improvements are demonstrated in vowel recognition from noise-degraded acoustic signals. These results are compared to the performance of humans, as well as other pattern matching and estimation algorithms.
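The fusion idea in the abstract can be illustrated with a minimal sketch, assuming synthetic data in place of real lip images and speech spectra. A small feedforward network is trained to map a visual feature vector onto a spectral envelope, and its estimate is then blended with a noise-degraded envelope. The layer sizes, the tanh hidden units, the plain gradient-descent training loop, the weighted-average fusion rule, and every name here (`init_mlp`, `train`, `fuse_envelopes`, `visual_weight`) are assumptions made for illustration; the abstract states only that the two envelopes "can be directly combined" and does not specify how.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    # One hidden layer; the sizes are illustrative, not taken from the paper.
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    # Returns the hidden activations and the estimated spectral envelope.
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def mse(params, x, y):
    _, pred = forward(params, x)
    return float(np.mean((pred - y) ** 2))

def train(params, x, y, lr=0.05, steps=500):
    # Full-batch gradient descent on squared error (an assumed training setup).
    n = x.shape[0]
    for _ in range(steps):
        h, pred = forward(params, x)
        err = (pred - y) / n                              # gradient w.r.t. the output
        grad_h = (err @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        params["W2"] -= lr * (h.T @ err)
        params["b2"] -= lr * err.sum(axis=0)
        params["W1"] -= lr * (x.T @ grad_h)
        params["b1"] -= lr * grad_h.sum(axis=0)
    return params

def fuse_envelopes(acoustic, visual_estimate, visual_weight=0.5):
    # Hypothetical fusion rule: convex combination of the two envelopes.
    return (1.0 - visual_weight) * acoustic + visual_weight * visual_estimate

rng = np.random.default_rng(0)
n_frames, n_visual, n_bands = 200, 5, 32

# Synthetic stand-ins: "visual" features and a "clean" envelope that depends on them.
visual = rng.normal(size=(n_frames, n_visual))
clean = visual @ rng.normal(0.0, 0.5, (n_visual, n_bands))
noisy = clean + rng.normal(0.0, 1.0, clean.shape)         # noise-degraded acoustic envelope

params = init_mlp(n_visual, 16, n_bands, rng)
loss_before = mse(params, visual, clean)
params = train(params, visual, clean)
loss_after = mse(params, visual, clean)

_, visual_estimate = forward(params, visual)
fused = fuse_envelopes(noisy, visual_estimate, visual_weight=0.5)
```

In a sketch like this, `visual_weight` would presumably be raised as the acoustic signal-to-noise ratio falls, so that the visual estimate contributes more when the acoustic envelope is less reliable. That weighting scheme is an assumption here, not a detail reported in the abstract.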