8.5 Multilingual Speech Processing
SPEECH PROCESSING
Veronica Gosteva – 1002
Linguistic and marketing
◦ Multilingual speech processing
provides a great opportunity to
revisit lingering challenges.
There has been much progress in the past few years in the areas of large vocabulary speech recognition, dialog systems, and robustness of recognizers to noisy environments, making speech processing systems ready for real-world applications.
Non-native speech
◦ Non-native speech tends to have a large impact on the accuracy of current speech recognition systems. This is the case for small vocabulary, isolated word recognition tasks as well as for large vocabulary, spontaneous speech recognition tasks.
◦ The differences between native and non-native speech
can be quantified in a variety of ways, all relevant to the
problem of improving recognition for non-native speakers.
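One standard way to quantify such differences is to score recognizer output against reference transcripts for native and non-native test sets and compare word error rates (WER). A minimal sketch of WER via edit distance; the transcripts below are invented purely for illustration:

```python
# Levenshtein-based word error rate (WER): substitutions, insertions,
# and deletions between reference and hypothesis word strings,
# normalized by reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical recognizer outputs for the same utterance:
print(wer("show me flights to boston", "show me flights to boston"))  # → 0.0
print(wer("show me flights to boston", "show me lights at boston"))   # → 0.4
```

In practice the same metric is computed over whole test sets (e.g. with NIST's sclite), and the native/non-native WER gap is the quantity being reduced.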
Spoken Dialog Systems
◦ Spoken dialog systems (SDS) combine technologies for processing spoken information, including speech recognition, natural language understanding, dialog modeling, and speech synthesis. Hence, the user can present queries to the system by speaking naturally, and the SDS can respond in real time in synthetic speech. Numerous commercial SDS have been deployed for multiple languages.
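The component chain just described can be sketched as a simple pipeline. Every function below is a hypothetical stub standing in for a real component (recognizer, NLU, dialog manager, synthesizer), wired together for a stocks-style query:

```python
# Minimal SDS pipeline sketch: ASR -> NLU -> dialog manager -> TTS.
# All four stages are toy stand-ins, not real components.

def recognize(audio: str) -> str:
    # Stub ASR: a real system decodes audio into a word string;
    # here we assume the "audio" is already its transcript.
    return audio

def understand(text: str) -> dict:
    # Stub NLU: map the word string onto a domain-specific semantic frame.
    frame = {"intent": "quote", "symbol": None}
    for word in text.split():
        if word.isupper():          # toy heuristic: ticker symbols are uppercase
            frame["symbol"] = word
    return frame

def dialog_manager(frame: dict) -> str:
    # Stub dialog model: decide the system's next move from the frame.
    if frame["symbol"] is None:
        return "Which stock would you like a quote for?"
    return f"Here is the latest quote for {frame['symbol']}."

def synthesize(text: str) -> str:
    # Stub TTS: a real system renders this text as synthetic speech.
    return f"[TTS] {text}"

reply = synthesize(dialog_manager(understand(recognize("quote for IBM"))))
print(reply)  # → [TTS] Here is the latest quote for IBM.
```

The design point is the loose coupling: each stage consumes the previous stage's output, so individual components can be swapped per language while the dialog logic stays shared.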
[Figure: An example dialog in the stocks domain illustrating the capabilities of a state-of-the-art spoken dialog system. (source: www.speechworks.com)]
Multilingual Speech Recognition
◦ The speech recognition component uses
an HMM-based approach with context-
dependent acoustic models. In order to
efficiently capture contextual and
temporal variations in the input while
constraining the number of parameters, the
system uses the successive state splitting
(SSS) algorithm in combination with a
minimum description length criterion.
◦ This algorithm constructs appropriate context-
dependent model topologies by iteratively
identifying an HMM state that should be split into
two independent states. It then reestimates the
parameters of the resulting HMMs based on the
standard maximum-likelihood criterion. Two types
of splitting are supported:
◦ Contextual splitting
◦ Temporal splitting
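As a rough illustration of the MDL trade-off behind these splits, the toy sketch below scores a candidate split of each state by its likelihood gain minus an MDL-style parameter penalty, then splits the best-scoring state. The 1-D "states" and median-cut split are drastic simplifications of the real HMM topology operations, and the sample values are invented:

```python
import math

def gaussian_loglik(xs):
    # Maximized log-likelihood of samples xs under a single 1-D Gaussian
    # with ML-estimated mean and variance.
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def mdl_split_gain(xs, params_per_state=2):
    # Likelihood gain from splitting the state's data at the median,
    # minus an MDL penalty of (k/2) * log N for the k extra parameters
    # (here: one new mean and variance) the split introduces.
    xs = sorted(xs)
    left, right = xs[: len(xs) // 2], xs[len(xs) // 2:]
    gain = gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(xs)
    return gain - 0.5 * params_per_state * math.log(len(xs))

# Hypothetical per-state acoustic samples: s1 looks bimodal (two phonetic
# contexts collapsed into one state), s2 looks unimodal.
states = {
    "s1": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
    "s2": [0.0, 0.05, 0.1, 0.15, 0.2, 0.25],
}

# One SSS-style iteration: split the state with the largest penalized gain;
# iteration would continue only while the best gain stays positive.
best = max(states, key=lambda s: mdl_split_gain(states[s]))
print(best)  # → s1
```

The real SSS algorithm additionally distinguishes *how* to split the winning state (contextual vs. temporal) and reestimates all HMM parameters after each split; this sketch only shows the MDL-penalized selection step.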
[Figure: Contextual splitting and temporal splitting.]
◦ In the past decade, the performance of
automatic speech processing systems
(such as automatic speech recognizers,
speech translation systems, and speech
synthesizers) has improved dramatically,
resulting in an increasingly widespread use
of speech technology in real-world
scenarios.
◦ The challenge of rapidly adapting existing speech processing systems to new languages is currently one of the major bottlenecks in the development of multilingual speech technology.