
MULTILINGUAL SPEECH PROCESSING
Veronica Gosteva – 1002
Linguistic and marketing
◦ Multilingual speech processing provides a great opportunity to revisit lingering challenges.

◦ First, current speech technology is challenged by the peculiarities of many languages, which increases the likelihood of detecting inappropriate modeling assumptions.
◦ Second, recognition of multiple languages,
especially their simultaneous recognition,
can be viewed as an extreme instance of
model mismatch and can therefore serve
as a testbed for model adaptation and
other robustness techniques.
◦ Third, due to the need for technology-educated language experts, we are forced to think about speech and language technology education in general. Last but not least, the high demand for speech processing systems in new languages encourages the development of tools and methods that automate the building process.
◦ The difficulties of speech processing are compounded in multilingual systems, and few if any commercial multilingual speech services exist to date. Yet intense research activity is underway in areas of potential commercial interest, aiming at:
Spoken Language Identification: By determining a speaker's language automatically, callers could be routed to human translation services. This is of particular interest to public services such as police and government offices.

Multilingual Speech Recognition and Understanding: Future spoken language services could be provided in multiple languages. Dictation systems and spoken language database access systems, for example, could operate in multiple languages and deliver text or information in the language of the input speech.

Speech Translation: Voice-activated dictionaries, phrase books or spoken language translators, telephone-based speech translation services, and/or automatic translation of foreign broadcasts and speeches.
Statistical Language Modeling
◦ A language model is a probability assignment over all
possible word sequences in a natural (human)
language. Its goal, loosely stated, is to assign relatively
large probability to meaningful, grammatical, or merely
frequent word sequences compared to rare,
ungrammatical, or nonsensical ones.
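To make the definition concrete, here is a minimal sketch (not from the slides) of a bigram language model with add-one smoothing; the toy corpus and function names are invented for illustration, and real systems use far larger corpora and better smoothing.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over a toy corpus (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def sentence_probability(words, unigrams, bigrams):
    """P(w1..wn) ~ product of P(w_i | w_{i-1}), with add-one smoothing."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return prob

# A frequent, grammatical word order should score higher than a scrambled one.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
uni, bi = train_bigram_lm(corpus)
print(sentence_probability(["the", "cat", "sat"], uni, bi))
print(sentence_probability(["sat", "the", "cat"], uni, bi))
```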

◦ The classical communication channel model of automatic speech recognition.
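Written out, the channel model usually takes the following standard textbook form (not a reproduction of the slide's figure), where A denotes the acoustic observations, W a candidate word sequence, P(A | W) the acoustic model, and P(W) the language model:

```latex
\hat{W} \;=\; \operatorname*{arg\,max}_{W} \, P(W \mid A)
        \;=\; \operatorname*{arg\,max}_{W} \, P(A \mid W)\, P(W)
```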
Translation-Aware Language Modeling
◦ A speech recognition system, which contains a language model, often serves as the front end of a translation system. The back end of such a system is either another human language or a database access and manipulation language. In the case of spoken interaction with databases, considerable progress has been made by paying special attention to the parts of the source-language sentence that matter for the application.
Non-native Speech

There has been much progress in the past few years in the areas of large vocabulary speech recognition, dialog systems, and robustness of recognizers to noisy environments, making speech processing systems ready for real-world applications.

Non-native speech, however, tends to have a large impact on the accuracy of current speech recognition systems. This is the case for small vocabulary, isolated word recognition tasks as well as for large vocabulary, spontaneous speech recognition tasks.
Non-native speech
◦ The differences between native and non-native speech can be quantified in a variety of ways, all relevant to the problem of improving recognition for non-native speakers.

◦ Differences in articulation, speaking rate, and pause distribution can affect acoustic modeling, which looks for patterns in phone pronunciation, duration, and cross-word behavior.

◦ Differences in disfluency distribution, word choice, syntax, and discourse style can affect language modeling. And, of course, as these components are not independent of one another, all affect overall recognizer performance.
◦ When speaking a foreign language, one must
concentrate not only on the meaning of the
message but also on getting the syntax right,
articulating the sounds, capturing the cadence of
the sequence of words, speaking with the right
level of formality, and mastering other elements
of spoken language that are more or less
automatic for a native speaker. The additional
cognitive load can result in slower speech, with
more pauses as the speaker stops to think. The
fluidity of speech is called fluency, and offers a
number of opportunities for quantification.
◦ The table compares fluency for native speakers of English, Japanese, and Chinese speaking English in read and spontaneous speech tasks.

◦ The overall word rate (number of words per second) is much lower for the non-native speakers for both types of speaking tasks. The main factor in the decrease, though, seems to be the number of pauses inserted.
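As a hedged illustration of how such fluency measures could be computed (the pause threshold and data layout below are assumptions, not the measurements behind the table), word rate and pause counts can be derived from time-aligned transcripts:

```python
def fluency_stats(word_intervals, min_pause=0.3):
    """Compute words per second and pause count from (word, start, end) tuples.

    word_intervals: list of (word, start_sec, end_sec), sorted by start time.
    min_pause: minimum silent gap (seconds) counted as a pause (assumed value).
    """
    if not word_intervals:
        return {"words_per_sec": 0.0, "pauses": 0}
    total_words = len(word_intervals)
    duration = word_intervals[-1][2] - word_intervals[0][1]
    pauses = sum(
        1
        for (_, _, prev_end), (_, next_start, _) in zip(word_intervals, word_intervals[1:])
        if next_start - prev_end >= min_pause
    )
    return {"words_per_sec": total_words / duration, "pauses": pauses}

# Example: a short utterance with one long hesitation between "is" and "difficult".
utterance = [("speaking", 0.0, 0.5), ("is", 0.6, 0.8), ("difficult", 1.6, 2.2)]
print(fluency_stats(utterance))
```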
Coupling Speech Recognition and Translation

◦ Due to the peculiarities of spoken language, an effective solution to speech translation cannot be expected to be a mere sequential connection of automatic speech recognition (ASR) and machine translation components, but rather a coupling between the two.
This coupling can be characterized by three orthogonal dimensions:

1) the complexity of the search algorithm,

2) the incrementality,

3) the tightness, which describes how closely ASR and MT interact while searching for a solution (Ringger, 1995).

A sketch contrasting a loose and a slightly tighter coupling is given below.
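The following minimal sketch shows the loosest form of coupling, where only the recognizer's single best hypothesis is handed to the translation component, next to a slightly tighter variant that rescores the ASR n-best list with the MT score. All component interfaces and names here are placeholders invented for the example, not any particular system's API.

```python
from typing import Callable, List, Tuple

# Assumed interfaces: an ASR returning n-best hypotheses with scores,
# and an MT component returning a translation plus a score.
ASRHypotheses = List[Tuple[str, float]]          # (transcript, ASR score)
Recognizer = Callable[[bytes], ASRHypotheses]
Translator = Callable[[str], Tuple[str, float]]  # (translation, MT score)

def loosely_coupled(audio: bytes, asr: Recognizer, mt: Translator) -> str:
    """Sequential connection: translate only the 1-best ASR hypothesis."""
    best_transcript, _ = max(asr(audio), key=lambda h: h[1])
    translation, _ = mt(best_transcript)
    return translation

def nbest_coupled(audio: bytes, asr: Recognizer, mt: Translator,
                  weight: float = 0.5) -> str:
    """Tighter coupling: rescore the ASR n-best list with the MT score."""
    scored = []
    for transcript, asr_score in asr(audio):
        translation, mt_score = mt(transcript)
        scored.append((weight * asr_score + (1 - weight) * mt_score, translation))
    return max(scored)[1]
```

Truly tight coupling would go further and let the MT model search over the recognizer's word lattice directly rather than over a fixed n-best list.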
◦ State-of-the-art translation systems use a variety of different coupling strategies. Examples of loosely coupled systems are IBM's MASTOR (Liu et al., 2003), ATR-MATRIX (Takezawa et al., 1998c), and NESPOLE! (Lavie et al., 2001a), which uses the interlingua-based JANUS system. Examples of tightly coupled systems are EuTrans (Pastor et al., 2001), developed at UPV, and AT&T's Transnizer (Mohri and Riley, 1997).
◦ A generic SDS consists of (a minimal pipeline sketch follows this list):

• A speech recognizer that transcribes input speech into text

• A natural language understanding component that transforms the recognition output into a semantic representation (typically via parsing)

• A discourse and dialog manager that handles the inheritance of discourse history, content retrieval (often via database access), and dialog turn-taking between the human and the computer

• A spoken response generator that verbalizes the retrieved content

• A text-to-speech synthesizer to generate a spoken presentation of the verbalized content
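A minimal, hedged sketch of how these five components might be chained; every class, method, and stub below is invented for illustration rather than taken from any particular SDS.

```python
class SpokenDialogSystem:
    """Toy pipeline mirroring the five components listed above."""

    def __init__(self, recognizer, parser, dialog_manager, generator, synthesizer):
        self.recognizer = recognizer          # speech -> text
        self.parser = parser                  # text -> semantic frame
        self.dialog_manager = dialog_manager  # frame + history -> retrieved content
        self.generator = generator            # content -> response text
        self.synthesizer = synthesizer        # response text -> audio

    def handle_turn(self, audio):
        text = self.recognizer(audio)
        frame = self.parser(text)
        content = self.dialog_manager(frame)
        response_text = self.generator(content)
        return self.synthesizer(response_text)

# Stub components so the pipeline can be exercised end to end (stocks domain).
sds = SpokenDialogSystem(
    recognizer=lambda audio: "quote for acme",
    parser=lambda text: {"intent": "get_quote", "ticker": "ACME"},
    dialog_manager=lambda frame: {"ticker": frame["ticker"], "price": 12.3},
    generator=lambda content: f"{content['ticker']} is trading at {content['price']}",
    synthesizer=lambda text: text.encode("utf-8"),  # stand-in for TTS audio
)
print(sds.handle_turn(b"..."))
```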
Development of a multilingual SDS sees a drastic increase in system complexity for every additional language supported.

Very often, a multilingual SDS involves multiple speech recognizers, one for each supported language.

---> This naturally creates the need for language identification as a preprocessing step unless the selected language is explicitly stated by the user (a routing sketch is given below).
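One way to realize that preprocessing step, sketched under the assumption of a separate language-ID classifier and one recognizer per language; all names and stubs here are placeholders.

```python
def route_by_language(audio, identify_language, recognizers, default="en"):
    """Run language ID first, then dispatch to the matching recognizer.

    identify_language: callable returning a language code, e.g. "en" or "de".
    recognizers: dict mapping language code -> speech recognizer callable.
    """
    language = identify_language(audio)
    recognizer = recognizers.get(language, recognizers[default])
    return language, recognizer(audio)

# Hypothetical per-language recognizers (stubs standing in for real ASR systems).
recognizers = {
    "en": lambda audio: "english transcript",
    "de": lambda audio: "deutsches Transkript",
}
lang, transcript = route_by_language(b"...", lambda audio: "de", recognizers)
print(lang, transcript)
```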
Multilingual Spoken Dialog Systems

An SDS encompasses a suite of speech and language technologies to offer a conversational interface to dynamic information, including speech recognition, natural language understanding, dialog modeling, and speech synthesis. Hence, the user can present queries to the system by speaking naturally, and the SDS can respond in real time in synthetic speech. Numerous commercial SDS have been deployed for multiple languages.
An example dialog in the stocks domain illustrating the capabilities of a state-of-the-art spoken dialog system (source: www.speechworks.com).
Multilingual Speech Recognition
◦ The speech recognition component uses
an HMM-based approach with context-
dependent acoustic models. In order to
efficiently capture contextual and
temporal variations in the input while
constraining the number of parameters, the
system uses the successive state splitting
(SSS) algorithm in combination with a
minimum description length criterion.
◦ This algorithm constructs appropriate context-
dependent model topologies by iteratively
identifying an HMM state that should be split into
two independent states. It then reestimates the
parameters of the resulting HMMs based on the
standard maximum-likelihood criterion. Two types
of splitting are supported:

◦ Contextual splitting

◦ Temporal splitting
◦ Contextual splitting and temporal splitting.
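A highly simplified sketch of the splitting loop described above; the HMM object, the MDL-gain evaluation, the split operation, and the re-estimation routine are all placeholder callables, since the actual SSS implementation is considerably more involved.

```python
def successive_state_splitting(hmm, max_states, mdl_gain, apply_split, reestimate):
    """Grow an HMM topology by repeated contextual/temporal state splits.

    hmm: placeholder model object assumed to expose .states and .num_states.
    mdl_gain(hmm, state, kind): improvement in the MDL criterion if `state`
        is split contextually or temporally (positive = worth splitting).
    apply_split(hmm, state, kind): return a new HMM with that state split in two.
    reestimate(hmm): maximum-likelihood re-estimation of the HMM parameters.
    """
    while hmm.num_states < max_states:
        # Evaluate every candidate (state, split type) pair.
        candidates = [
            (mdl_gain(hmm, state, kind), state, kind)
            for state in hmm.states
            for kind in ("contextual", "temporal")
        ]
        gain, state, kind = max(candidates, key=lambda c: c[0])
        if gain <= 0:            # no split improves the MDL criterion
            break
        hmm = apply_split(hmm, state, kind)
        hmm = reestimate(hmm)    # standard maximum-likelihood re-estimation
    return hmm
```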
◦ In the past decade, the performance of
automatic speech processing systems
(such as automatic speech recognizers,
speech translation systems, and speech
synthesizers) has improved dramatically,
resulting in an increasingly widespread use
of speech technology in real-world
scenarios.
◦ The challenge of rapidly adapting existing speech processing systems to new languages is currently one of the major remaining bottlenecks in their development.
