Creating a Mexican Spanish Version of the CMU Sphinx-III Speech Recognition System

Armando Varela¹, Heriberto Cuayáhuitl¹, and Juan Arturo Nolazco-Flores²

¹ Universidad Autónoma de Tlaxcala,
Department of Engineering and Technology,
Intelligent Systems Research Group,
Apartado Postal #140, 90300 Apizaco, Tlaxcala, Mexico.
{avarela, hcuayahu}@ingenieria.uatx.mx
http://orion.ingenieria.uatx.mx:8080/si/si.jsp
² Instituto Tecnológico de Estudios Superiores de Monterrey,
Sucursal de Correos “J”, 64849, Monterrey, Nuevo León, Mexico.
[email protected]

Abstract. In this paper we present the creation of a Mexican Spanish version of the CMU Sphinx-III speech recognition system. We trained acoustic and N-gram language models with a phonetic set of 23 phonemes. Our speech data for training and testing was collected from an auto-attendant system in telephone environments. We present experiments with different language models. Our best result scored an overall word error rate of 6.32%. Using this version, it is now possible to develop speech applications for Spanish-speaking communities. This version of the CMU Sphinx system is freely available for non-commercial use upon request.

1 Introduction
Today, building a new robust Automatic Speech Recognition (ASR) system takes many years of effort. At the Autonomous University of Tlaxcala, Mexico, we have two goals in the ASR field: to do research toward a robust speech recognizer, and to build speech applications for automating services. To achieve these goals in a short time, we needed an existing system to serve as a baseline. We found that the CMU (Carnegie Mellon University) Sphinx speech recognition system is freely available and is currently one of the most robust speech recognizers for English. The CMU Sphinx system enables research groups with modest budgets to begin conducting research and developing applications quickly. This is particularly pertinent in Latin America, where the financial support and experience otherwise necessary for such research are not readily available. In the past, few research efforts have addressed Spanish. These include work from CMU on broadcast news transcription [1, 2], which mainly involved training acoustic and language models. Our motivation for this work is twofold: many applications require a speech recognizer for Spanish, and Spoken Dialogue Systems (SDS) require a robust speech recognizer that can be reconfigured and retrained as needed.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 251–258, 2003.
© Springer-Verlag Berlin Heidelberg 2003

In this research, we generated a lexicon and trained acoustic and language models with Mexican Spanish speech data for the CMU Sphinx speech recognition system. Our experiments are based on data collected from an auto-attendant application (CONMAT) deployed in Mexico [3]. Its vocabulary has 2,288 entries, consisting of names of people and places inside a university, including synonyms. The speech data used for training and testing was filtered to exclude noisy utterances. Results are given in terms of the well-known Word Error Rate (WER) metric. The remainder of the paper is organized as follows. Section 2 gives an overview of the system. Section 3 describes the components of the Sphinx system and how they were trained. Section 4 presents experimental results. Finally, Section 5 gives our conclusions and future directions.

2 System Overview

The Carnegie Mellon University Sphinx-III system is a frame-based, HMM-based, speaker-independent, continuous speech recognition system capable of handling large vocabularies (see Fig. 1). Words are modeled in terms of subword units, and every word in the dictionary is transcribed using these units. Each subword unit in its immediate left and right context (a triphone) is modeled by a 5-state left-to-right HMM. Data is shared across states of different triphones. A group of HMM states that share a single output distribution is called a senone [4].
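The triphone expansion and HMM topology described above can be sketched as follows. This is illustrative, not Sphinx-III code. The `left-center+right` notation, the `SIL` boundary symbol, the self-loop probability of 0.6, and the senone IDs are all assumptions. The topology also omits any skip transitions a real model might use.

```python
import numpy as np


def triphones(phones, boundary="SIL"):
    """Expand a phone sequence into left-center+right triphones."""
    padded = [boundary] + list(phones) + [boundary]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


def left_to_right_transitions(n_states=5, p_stay=0.6):
    """Transition matrix of a left-to-right HMM without skips:
    each state either loops on itself or advances to the next one."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s] = p_stay
        A[s, s + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0  # exit transition to the next model is not modeled here
    return A


# Senone tying: (triphone, state index) -> shared output distribution ID.
# These IDs are made up; in practice the tying is learned from data.
SENONE_OF = {
    ("k-a+s", 2): 17,
    ("k-a+m", 2): 17,  # middle states of two triphones share senone 17
    ("p-a+s", 2): 42,
}

print(triphones(["k", "a", "s", "a"]))  # phones of "casa"
# ['SIL-k+a', 'k-a+s', 'a-s+a', 's-a+SIL']
```

Tying states through a lookup table like `SENONE_OF` is what lets triphones that rarely occur in training borrow data from acoustically similar ones.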

[Fig. 1 diagram: raw audio (16-bit samples, 16 kHz sampling rate) → Signal Features Computation → 13-dimensional real-valued cepstrum vectors → Feature Stream → 39-dimensional speech feature vectors → Viterbi & Beam search → 1-best hypothesis / N-best list. The search draws on the Acoustic Models (model definition, mixture weights, state transition matrix, subvector quantized model), the Lexicon, and the Language Models (LM probabilities).]

Fig. 1. Architecture of the CMU Sphinx-III speech recognition system. The lexical or pronunciation model contains pronunciations for all the words of interest to the decoder. Acoustic models are based on statistical Hidden Markov Models (HMMs). Sphinx-III uses a conventional backoff bigram or trigram language model. The result is a recognition hypothesis with a word lattice representing an N-best list.

The feature vector computation is a two-stage process. In the first stage, an off-line front-end module converts the raw audio sample stream into a cepstral stream. The input is windowed into frames of 25.625 ms duration, and the output is a stream of 13-dimensional real-valued cepstrum vectors. Because the frames overlap, the output rate is 100 vectors/sec. In the second stage, the stream of cepstrum vectors is converted into a feature stream. This stage consists of a Cepstral Mean Normalization (CMN) step and an Automatic Gain Control (AGC) step. The final speech feature vector is created by augmenting the cepstrum vector (after CMN and AGC) with its time derivatives. Concatenating the first and second derivatives to the 13-dimensional cepstrum vector gives a 39-dimensional vector in each frame.
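The following is a minimal NumPy sketch of the two-stage computation. It assumes a 16 kHz input (as in Fig. 1) and cepstra that have already been computed. The 160-sample frame shift follows from the rate of 100 vectors/sec. The derivative windows (±2 frames for the first derivative, ±1 for the second) are assumptions, and the AGC step is omitted.

```python
import numpy as np

SAMPLE_RATE = 16_000                        # Hz, per Fig. 1
FRAME_LEN = round(0.025625 * SAMPLE_RATE)   # 25.625 ms -> 410 samples
FRAME_SHIFT = SAMPLE_RATE // 100            # 100 frames/s -> 160 samples


def num_frames(n_samples):
    """Number of full, overlapping analysis windows in a signal."""
    if n_samples < FRAME_LEN:
        return 0
    return 1 + (n_samples - FRAME_LEN) // FRAME_SHIFT


def cmn(cepstra):
    """Cepstral mean normalization: subtract the per-utterance mean."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def delta(x, k):
    """Symmetric difference x[t+k] - x[t-k], repeating the edge frames."""
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    return padded[2 * k:] - padded[:-2 * k]


def feature_stream(cepstra):
    """(T, 13) cepstra -> (T, 39) vectors [c, delta c, delta-delta c]."""
    c = cmn(cepstra)
    d = delta(c, k=2)
    dd = delta(d, k=1)
    return np.hstack([c, d, dd])
```

With these settings, one second of audio (16,000 samples) yields 98 full frames. The 39-dimensional width comes directly from stacking three 13-dimensional blocks.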

3 System Components

3.1 Lexicon

The lexicon development process consisted of defining a phonetic set and generating the word pronunciations for training acoustic and language models.

Table 1. ASCII Phonetic Symbols for Mexican Spanish.

Manner       Label   Example   Worldbet word
Plosives     p       punto     p u n t o
             b       baños     b a ñ o s
             t       tino      t i n o
             d       donde     d o n d e
             k       casa      k a s a
             g       ganga     g a n g a
Fricatives   f       falda     f a l d a
             s       mismo     m i s m o
             x       jamás     x a m a s
Affricates   tS      chato     tS a t o
Nasals       m       mano      m a n o
             n       nada      n a d a
             ñ       baño      b a ñ o
Semivowels   l       lado      l a d o
             L       pollo     p o L o
             r(      pero      p e r( o
             r       perro     p e r o
             w       hueso     w e s o
Vowels       i       piso      p i s o
             e       mesa      m e s a
             a       caso      k a s o
             o       modo      m o d o
             u       cura      k u r( a

To model Mexican Spanish phonetic sounds in the CMU Sphinx-III speech recognition system, we adapted the WORLDBET Castilian Spanish phonetic set [5], which resulted in the 23 phonemes listed in Table 1. The adaptation consisted of manually comparing spectrograms of words containing a common phoneme. Sounds that we found to be common were merged into a single phoneme in our final list. The following modifications were made to the Castilian Spanish sound set to produce a Mexican Spanish version:
– Fricative /s/ as in “kasa” and fricative /z/ as in “mizmo” merged into /s/,
– Plosive /b/ as in “baños” and fricative /V/ as in “aVa” merged into /b/,
– Plosive /d/ as in “donde” and fricative /D/ as in “deDo” merged into /d/,
– Plosive /g/ as in “ganga” and fricative /G/ as in “lago” merged into /g/,
– Semi-vowels /j/ as in “majo” and /L/ as in “poLo”, and affricate /dZ/ as
in “dZugo” merged into /L/,
– Nasal /n/ as in “nada” and nasal /N/ as in “baNko” merged into /n/,
– Fricative /T/ as in “luTes” was removed because this sound does not exist in Mexican Spanish.
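The merges above amount to a many-to-one mapping from Castilian symbols onto the Mexican set. This sketch applies that mapping to an existing Castilian transcription. The function name is hypothetical. A /T/ raises an error here rather than being silently dropped, because the list only says that /T/ was removed from the set. It does not say what replaces it in a transcription.

```python
# Castilian WORLDBET symbol -> Mexican Spanish symbol, per the list above.
MERGES = {
    "z": "s",    # mizmo -> mismo
    "V": "b",    # aVa
    "D": "d",    # deDo
    "G": "g",    # lago
    "j": "L",    # majo
    "dZ": "L",   # dZugo
    "N": "n",    # baNko
}
REMOVED = {"T"}  # /T/ does not exist in Mexican Spanish


def to_mexican(phones):
    """Map a Castilian phone sequence onto the reduced Mexican set."""
    out = []
    for p in phones:
        if p in REMOVED:
            raise ValueError(f"/{p}/ has no counterpart in the Mexican set")
        out.append(MERGES.get(p, p))  # unmerged phones map to themselves
    return out


print(to_mexican("b a N k o".split()))  # ['b', 'a', 'n', 'k', 'o']
```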
The vocabulary has 2,288 words, based on names of people and places inside a university, including synonyms. Pronunciations were generated automatically using a simple list of rules and exceptions. The rules map clusters of letters into phonemes, and the exceptions list covers words with irregular pronunciations. A Finite State Machine (FSM) was used to generate the pronunciations from the word list.
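A rule-and-exception pronunciation generator of this kind might look like the following sketch. The rule table is a small illustrative subset written for this example, not the authors' rules. A greedy longest-match scan stands in for their FSM; such a scan is itself a simple deterministic finite-state process. The entry for “México” shows why an exceptions list is needed: its “x” is pronounced /x/, not the /k s/ that the general rule produces.

```python
# Letter clusters -> phonemes (Table 1 symbols). Illustrative subset only.
RULES = {
    "gue": ["g", "e"], "gui": ["g", "i"], "que": ["k", "e"], "qui": ["k", "i"],
    "hue": ["w", "e"],
    "ce": ["s", "e"], "ci": ["s", "i"], "ge": ["x", "e"], "gi": ["x", "i"],
    "ch": ["tS"], "ll": ["L"], "rr": ["r"],
    "á": ["a"], "é": ["e"], "í": ["i"], "ó": ["o"], "ú": ["u"],
    "c": ["k"], "z": ["s"], "j": ["x"], "v": ["b"],
    "y": ["L"],                  # consonantal y only; the word "y" needs an exception
    "x": ["k", "s"], "h": [],    # h is silent
}
# Irregular words that the rules would get wrong.
EXCEPTIONS = {"méxico": ["m", "e", "x", "i", "k", "o"]}
MAX_CLUSTER = max(len(k) for k in RULES)


def pronounce(word):
    """Generate a phone sequence for one word: exceptions first, then rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phones, i = [], 0
    while i < len(word):
        # Try the longest letter cluster starting at position i first.
        for n in range(MAX_CLUSTER, 0, -1):
            chunk = word[i:i + n]
            if len(chunk) == n and chunk in RULES:
                phones.extend(RULES[chunk])
                i += n
                break
        else:
            letter = word[i]
            if letter == "r":
                # Word-initial r is a trill (r); otherwise a tap (r().
                phones.append("r" if i == 0 else "r(")
            else:
                phones.append(letter)  # the letter maps to itself
            i += 1
    return phones


print(pronounce("chato"))  # ['tS', 'a', 't', 'o']
```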

3.2 Acoustic Models

Training acoustic models requires a set of feature files computed from the audio training data, one for each recording in the training corpus. Each recording is transformed into a sequence of feature vectors consisting of Mel-Frequency Cepstral Coefficients (MFCCs). The acoustic models were trained only on utterances without noise. Training used 3,375 utterances of speech data from an auto-attendant system whose domain is the names of people and places inside a university.
The training process (see Fig. 2) consists of two steps. First, obtain a corpus of training data and process each utterance as follows. Convert the audio data to a stream of feature vectors. Convert the text into a linear sequence of triphone HMMs using the pronunciation lexicon. Then find the best state sequence, or state alignment, through the resulting sentence HMM for the corresponding feature vector sequence. Second, for each senone, gather all the frames in the training corpus that were mapped to that senone in the first step, and build a suitable statistical model for that collection of feature vectors. This process is circular: alignments require models, and models require alignments. The circularity is resolved using the iterative Baum-Welch (forward-backward) training algorithm. Because continuous density acoustic models are computationally expensive, a model can also be built by sub-vector quantizing the acoustic model densities. Sub-vector quantization was turned off in our work.
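The align-then-pool loop above can be illustrated with Viterbi training. This is a simpler, hard-decision relative of the Baum-Welch algorithm the paper actually uses. Each utterance is force-aligned to its sentence HMM, and each state's model is then re-estimated from the frames assigned to it. The sketch assumes one unit-variance Gaussian per state instead of Gaussian mixtures, and all function names are hypothetical.

```python
import numpy as np


def left_to_right_log_trans(n_states, p_stay=0.5):
    """Log transition matrix for a left-to-right chain of states."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s], A[s, s + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden moves
        return np.log(A)


def gaussian_loglik(frames, means):
    """(T, S) log-likelihoods under unit-variance Gaussians, constants dropped."""
    diff = frames[:, None, :] - means[None, :, :]
    return -0.5 * (diff ** 2).sum(axis=-1)


def viterbi_align(log_emis, log_trans):
    """Best state path that starts in state 0 and ends in the last state."""
    T, S = log_emis.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_emis[0, 0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[from, to]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emis[t]
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def reestimate_means(frames, path, n_states):
    """Pool the frames aligned to each state and average them.
    A state that receives no frames would yield NaN here."""
    path = np.asarray(path)
    return np.array([frames[path == s].mean(axis=0) for s in range(n_states)])
```

One training iteration calls `viterbi_align` on every utterance with the current means, then calls `reestimate_means`. Repeating this until the alignments stop changing plays the role that Baum-Welch iterations play in the real trainer.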

[Fig. 2 diagram, repeated for each utterance in the corpus of training data: audio data → Acoustic Features Computation → 13-dimensional real-valued cepstrum vectors; text data → linear sequence of triphone HMMs → Sentence HMM → state sequence. Outputs: Gaussian mixture senone models and HMM state transition probability matrices, followed by sub-vector quantizing (top-scoring Gaussian densities).]

Fig. 2. A block schematic diagram for training acoustic models.

3.3 Language Models

The main Language Model (LM) used by the Sphinx decoder is a conventional bigram or trigram backoff language model. Our LMs were constructed from the 2,288-word dictionary using version 2.0 of the CMU-Cambridge Statistical Language Modeling Toolkit [6] (see Fig. 3). The training data consisted of 3,375 transcribed utterances of speech data from an auto-attendant system. We trained bigrams and trigrams with four discounting strategies: Good Turing, Absolute, Linear, and Witten Bell. The LM probability of an entire sentence is the product of the individual word probabilities, each conditioned on the preceding one (bigram) or two (trigram) words. The CMU-Cambridge toolkit outputs an ASCII text file. Because this file can be very slow to load into memory, the LM must be compiled into binary form, and the decoder reads the binary file into memory using a disk-based LM strategy. Although the CMU Sphinx recognizer is capable of handling out-of-vocabulary speech, we did not set any filler models. Finally, the recognizer raises the LM probability to the power of a language weight before combining it with the acoustic likelihood.
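The parts of this paragraph fit together as shown in the sketch below: a discounted bigram, the sentence probability as a product of word probabilities, and the language weight. For brevity the sketch uses interpolated Witten-Bell smoothing over an add-one unigram, whereas the CMU-Cambridge toolkit builds backoff models. The class name, the `<s>`/`</s>` sentence markers, and the weight value in the usage line are illustrative assumptions. Scores are kept in log10, so raising P_LM to a power becomes a multiplication.

```python
import math
from collections import Counter


class WittenBellBigram:
    """Interpolated Witten-Bell bigram LM (illustrative, not the CMU toolkit)."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # counts of predicted tokens
        self.bigrams = Counter()   # counts of (history, word) pairs
        self.followers = {}        # history -> set of distinct next words
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.unigrams.update(tokens[1:])
            for h, w in zip(tokens, tokens[1:]):
                self.bigrams[(h, w)] += 1
                self.followers.setdefault(h, set()).add(w)
        self.history_counts = Counter()
        for (h, _), c in self.bigrams.items():
            self.history_counts[h] += c
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams)

    def p_unigram(self, w):
        # Add-one smoothing leaves a little mass for unseen words.
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size + 1)

    def prob(self, w, h):
        c_h = self.history_counts[h]          # times h was a history
        t_h = len(self.followers.get(h, ()))  # distinct words seen after h
        if c_h == 0:
            return self.p_unigram(w)
        # Witten-Bell: give t_h / (c_h + t_h) of the mass to the unigram.
        return (self.bigrams[(h, w)] + t_h * self.p_unigram(w)) / (c_h + t_h)

    def log10_sentence(self, words):
        """log10 P(sentence) = sum of log10 P(w_i | w_{i-1})."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        return sum(math.log10(self.prob(w, h)) for h, w in zip(tokens, tokens[1:]))


def combined_score(acoustic_log10, lm_log10, language_weight):
    """log10(P_ac * P_lm ** lw) = log10 P_ac + lw * log10 P_lm."""
    return acoustic_log10 + language_weight * lm_log10


lm = WittenBellBigram([["juan", "pérez"], ["juan", "lópez"], ["biblioteca"]])
print(combined_score(-100.0, lm.log10_sentence(["juan", "pérez"]), language_weight=10.0))
```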

[Fig. 3 diagram: corpus of training data (text) → CMU-Cambridge Toolkit → N-grams LM → binary conversion → binary LM file → disk-based LM strategy, with an LM weight.]

Fig. 3. A block schematic diagram for training language models.


4 Experimental Results
4.1 Experimental Setup
We performed two experiments to evaluate the performance of the CMU Sphinx system trained with Mexican Spanish speech data (872 utterances) in the context of an auto-attendant application. The first experiment treated names of people and places as independent words (i.e., any combination of first names and last names was allowed). The second experiment treated each name of a person or place as a single word. Each experiment was evaluated with two different LMs.
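The two experiments differ in how directory entries become vocabulary words. The helper names, the underscore joining, and the directory entries below are hypothetical. The sketch shows that the experiment 1 vocabulary admits word sequences that match no real entry, which is the problem discussed in Section 4.3.

```python
def experiment1_tokens(name):
    """Experiment 1: every first or last name is its own vocabulary word."""
    return name.lower().split()


def experiment2_tokens(name):
    """Experiment 2: a whole name becomes a single vocabulary entry."""
    return ["_".join(name.lower().split())]


# Hypothetical directory entries, for illustration only.
DIRECTORY = ["Juan Pérez López", "Biblioteca Central"]

VOCAB1 = {w for entry in DIRECTORY for w in experiment1_tokens(entry)}
VOCAB2 = {w for entry in DIRECTORY for w in experiment2_tokens(entry)}

# "juan central" uses only experiment 1 words, yet names nothing that exists;
# under experiment 2 it cannot be produced at all.
bogus = "Juan Central"
print(set(experiment1_tokens(bogus)) <= VOCAB1)  # True
print(experiment2_tokens(bogus)[0] in VOCAB2)     # False
```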

4.2 Evaluation Criteria

Each experiment was evaluated by recognition accuracy, measured with the Word Error Rate (WER) metric defined in Equation (1). The recognized word string is aligned against the correct word string. The numbers of substitutions (S), deletions (D), and insertions (I) are then counted relative to the number of words in the correct sentence (N):

$$\mathrm{WER} = \frac{S + D + I}{N} \times 100\% \qquad (1)$$
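Equation (1) presupposes a minimum-edit-distance alignment between the reference and the hypothesis. A self-contained sketch follows. When several alignments tie, this backtrace prefers a match or substitution, so individual S/D/I counts may differ from those reported by dedicated scoring tools even when the total is the same.

```python
def align_counts(ref, hyp):
    """Return (S, D, I) for a minimum-edit-distance alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # i deletions
    for j in range(1, m + 1):
        d[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Walk back from the corner, preferring match/substitution on ties.
    s = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, dels, ins


def wer(reference, hypothesis):
    """Equation (1): (S + D + I) / N * 100, with N = reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    s, dels, ins = align_counts(ref, hyp)
    return 100.0 * (s + dels + ins) / len(ref)


print(wer("a b c d", "a x c"))  # 50.0 (one substitution, one deletion)
```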

4.3 Results
Recognition results of CMU Sphinx on the Mexican Spanish test data are shown in Tables 2 and 3. Table 2 (experiment 1) shows that Good Turing discounting is not a good choice, and that the n-gram order makes little difference. Larger training and test sets might reveal significant differences. For now, the best option for this experiment is bigrams with Witten Bell discounting. However, we observed a problem with this approach: it can yield incorrect hypotheses, i.e., nonexistent names of people and places. Another solution was therefore necessary. Table 3 (experiment 2) shows that, under the conditions of this experiment, different n-gram orders yield no further significant improvement. Nevertheless, the best result is obtained with trigrams and Witten Bell discounting.

Table 2. Word error rate (%) on the test set after decoding in experiment 1, which treated names of people and places as independent words.

Discounting strategy   Bigrams (WER %)   Trigrams (WER %)
Good Turing            12.95             12.88
Absolute               7.82              7.63
Linear                 7.94              8.07
Witten Bell            7.63              7.75

Table 3. Word error rate (%) on the test set after decoding in experiment 2, which treated each name of a person or place as a single word.

Discounting strategy   Bigrams (WER %)   Trigrams (WER %)
Good Turing            6.88              6.44
Absolute               6.38              6.38
Linear                 6.50              6.57
Witten Bell            6.38              6.32
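As a worked check on Tables 2 and 3, the best configurations can be compared directly. Treating each name as a single word lowers the best WER from 7.63% to 6.32%, a relative reduction of about 17%. The sketch below recomputes this from the table values. The dictionary layout and helper names are only a convenient encoding of the tables.

```python
# WER (%) from Tables 2 and 3: {discounting strategy: (bigram, trigram)}
TABLE2 = {"Good Turing": (12.95, 12.88), "Absolute": (7.82, 7.63),
          "Linear": (7.94, 8.07), "Witten Bell": (7.63, 7.75)}
TABLE3 = {"Good Turing": (6.88, 6.44), "Absolute": (6.38, 6.38),
          "Linear": (6.50, 6.57), "Witten Bell": (6.38, 6.32)}


def best_wer(table):
    """Lowest WER over all discounting strategies and n-gram orders."""
    return min(w for pair in table.values() for w in pair)


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


best1, best2 = best_wer(TABLE2), best_wer(TABLE3)
print(f"{best1:.2f} -> {best2:.2f}: {relative_reduction(best1, best2):.1f}% relative")
# prints: 7.63 -> 6.32: 17.2% relative
```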

5 Conclusions and Future Work

We described the training and evaluation processes of the CMU Sphinx-III speech recognition system for Mexican Spanish. We performed two experiments in which the word dictionary entries were grouped differently. Our best results were obtained when each name was treated as a single dictionary entry, which avoids hypothesizing nonexistent names of people and places inside the university. With a simple lexicon and set of acoustic and language models, we obtained an accurate recognizer with an overall word error rate of 6.32% on in-vocabulary speech data. We achieved the goal of this work: we now have a baseline system for research in speech recognition, an important component of spoken language systems. This work also lets us start developing speech applications, with the advantage that we can retrain and adapt the recognizer to our needs. Our motivation was that people around the world need to develop applications involving speech recognition for Spanish-speaking communities. Therefore, the resulting lexicon, acoustic models, and language models are freely available for non-commercial purposes upon request.
An immediate goal for future work is to provide a bridge for invoking the recognizer as a black box, for example by building a DLL or an interface similar to SAPI. This is indispensable for programmers who need to develop speech applications in different programming environments. Because this development considers only in-vocabulary speech, another important direction is to retrain the recognizer to handle Out-Of-Vocabulary (OOV) speech and to measure the resulting computational overhead. OOV speech is an important factor in spoken dialogue systems and significantly degrades their performance [7]. We also plan to train Sphinx in different domains and to optimize its configuration parameters. Finally, we plan to train Sphinx-4, which is implemented in Java, and to compare Sphinx-III and Sphinx-4 in Spanish domains. All of this work will be carried out on a bigger corpus.

Acknowledgements. This research was made possible by the availability of the CMU Sphinx speech recognizer. We thank the people involved in the development of CMU Sphinx-III and, of course, the original creators of the recognizer [8]. We also thank Ben Serridge for revising the writing of this paper.

References
1. J. M. Huerta, E. Thayer, M. Ravishankar, and R. M. Stern: The Development of the 1997 CMU Spanish Broadcast News Transcription System. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, Virginia, Feb. 1998.
2. J. M. Huerta, S. J. Chen, and R. M. Stern: The 1998 Carnegie Mellon University Sphinx-III Spanish Broadcast News Transcription System. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Herndon, Virginia, Mar. 1999.
3. H. Cuayáhuitl and B. Serridge: Out-Of-Vocabulary Word Modeling and Rejection for Spanish Keyword Spotting Systems. Lecture Notes in Computer Science, Vol. 2313, pp. 158–167. Berlin Heidelberg New York (2002).
4. M.-Y. Hwang: Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition. Ph.D. thesis, Carnegie Mellon University, 1993.
5. J. L. Hieronymus: ASCII Phonetic Symbols for the World's Languages: Worldbet. Technical report, Bell Labs, 1993.
6. P. Clarkson and R. Rosenfeld: Statistical Language Modeling Using the CMU-Cambridge Toolkit. In Proceedings of Eurospeech, Rhodes, Greece, 1997, pp. 2707–2710.
7. F. Farfán, H. Cuayáhuitl, and A. Portilla: Evaluating Dialogue Strategies in a Spoken Dialogue System for Email. In Proceedings of the IASTED Artificial Intelligence and Applications, ACTA Press, Benalmádena, Spain, Sep. 2003.
8. CMU Robust Speech Group, Carnegie Mellon University. http://www.cs.cmu.edu/afs/cs/user/robust/www/
