The Arabic Speech Database: PADAS
Cherif Adnan
Abstract
This work describes the construction of PADAS (Phonetic Arabic Database Automatically
Segmented), based on a data-driven Markov process. A segmented database is necessary for
speech synthesis and speech recognition. Manual segmentation is precise but inconsistent,
since it is often produced by more than one labeler, and it costs time and money. The MAUS
segmentation and labeling system exists for German and other languages, but not for Arabic.
It is therefore necessary to adapt MAUS in order to establish a segmented database for Arabic.
The speech corpus contains a total of 600 sentences recorded by 3 native Arabic speakers from
Tunisia (2 male and 1 female), 200 sentences each.
Keywords: HTK, MAUS, Phonetic Database, Automatic Segmentation.
1. INTRODUCTION
Much research in areas such as automatic speech recognition and speech synthesis is now based
on databases, e.g. for English [1, 2, 3, 4]. To obtain good results, the database must be
balanced, segmented, and free of noise introduced during recording. In order to build a robust
speaker-independent system for continuous Arabic speech, a set of speech recordings that is rich
and balanced is required. It is rich in the sense that it must contain all the phonemes of the
Arabic language. It must also be balanced, preserving the phonetic distribution of the Arabic
language. This set of speech recordings must be based on a proper written set of sentences and
phrases created by experts. Therefore, it is crucial to create a high-quality written (text) set of
sentences and phrases before recording them. Any work that involves a learning step requires a
database to train the system and then evaluate it. There are several international databases in
the field of speech, such as TIMIT, which was developed by the DARPA Committee for American
English. Databases also exist for other widely studied languages, such as French and German,
and for less studied ones, such as Vietnamese and Turkish.
For Arabic, we have not found a standard database, although we did find a few references, such
as the KACST database [5], developed by the King Abdulaziz City for Science and Technology in
Saudi Arabia.
1.1 KACST
KACST created a database of Arabic language sounds in 1997. The aim was to create the smallest
possible set of phonetically rich Arabic words. The result was a list of 663 phonetically rich
words containing all Arabic phonemes.
Signal Processing: An International Journal (SPIJ), Volume (8) : Issue (2) : 2014
It is intended for Arabic ASR and text-to-speech synthesis applications.
In 2003, KACST produced a technical report for the project Database of Arabic Sounds:
Sentences. The sentences of this database were written using the 663 phonetically rich
words. The database consists of 367 sentences of 2 to 9 words each.
The purpose is to produce Arabic phrases and sentences that are phonetically rich and balanced,
based on the previously created list of 663 phonetically rich words [6].
1.2 ALGASD
The Algerian Arabic Speech Database (ALGASD) [7] was developed for processing Algerian
Arabic speech, taking into account the different accents of the country's regions.
The unavailability and scarcity of audio database resources prompted us to build our own
database for recognizing the numbers and operations of a standard calculator in Arabic for a
single user. We made 27 recordings of 28 vocabulary words.
A database is the most important tool in many domains, such as speech synthesis and speech
recognition. To be useful, a database must contain all the acoustic units in all possible
linguistic combinations. The quality of the final synthesis result depends directly on the
quality of the recordings made during the development of the acoustic units; a dictionary
filtering step is therefore mandatory.
The implementation stages can be summarized as follows:
a)
b)
c)
d)
2. ARABIC LANGUAGE
Statistics show that Arabic is the first language (mother tongue) of 206 million native speakers,
ranking fourth after Mandarin, Spanish and English [8]. Arabic is a derivational and
inflectional language. The original Arabic is the language spoken by the Arabs. In addition, it is
the sacred language of the Koran and Islam. With the spread of Islam and of the Koran, it
became a liturgical language. It is spoken in 22 countries, and the number of speakers exceeds
280 million [9].
2.1 Alphabet
2.1.1 Consonants
The Arabic alphabet basically consists of twenty-eight consonants (see Table 1), although some
authors treat the letter (alif) as a twenty-ninth consonant. The (alif) behaves as a long
vowel and is never found as a consonant of the root.
Unlike consonants, vowels are rarely written. They are written only to resolve ambiguities,
in editions of the Koran or in academic literature. Vowels nevertheless play an important role
in Arabic words, not only because they remove ambiguity, but also because they indicate the
grammatical function of a word regardless of its position in the sentence. In other words, vowels
have a dual function: one morphological or semantic, the other syntactic. Arabic has two
sets of vowels, short and long.
2.1.2 Short Vowels
2.2 The Diacritics
Short vowels are represented by symbols called diacritics (see Figure 5). There are three of
these symbols, transcribed as follows:
(/AIan/)
(/AIun/)
(/AIin/)
FIGURE 1: Example of a sentence / jalAIabuuna limuddati saAItin / ("They play for an hour").
[Table 1: Arabic graphemes and their transcription symbols. The grapheme column is not
recoverable here; the surviving symbols are: t, T, Z, X, x, d, D, s, S, s, d, t, D, ?, q, k, l, m,
n, h, w, a:, i, i:, u, u:]
Syllable type   Arabic example   English meaning
cv              li               to
cvv             fii              in
cvc             qul              say
cvcc            bahr             sea
cvvc            maAl             money
cvvcc           zaArr            visit
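The syllable types in the table above can be checked mechanically against the romanized examples. The following is a minimal Python sketch, not part of the original work. It assumes that a, i, u and A are the vowel symbols of the romanization used in the examples; the names VOWELS and cv_pattern are illustrative.

```python
# Assumption: in the romanization of the examples, these symbols denote vowels.
VOWELS = set("aiuA")


def cv_pattern(syllable: str) -> str:
    """Return the consonant/vowel skeleton of a romanized syllable."""
    return "".join("v" if ch in VOWELS else "c" for ch in syllable)


# Romanized examples taken from the syllable table.
EXAMPLES = ["li", "fii", "qul", "bahr", "maAl", "zaArr"]

if __name__ == "__main__":
    for syllable in EXAMPLES:
        print(syllable, cv_pattern(syllable))
```

Each example maps to the syllable type listed beside it in the table, which gives a quick consistency check when new entries are added.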
average speed (from 10 to 12 phonemes/second) by Tunisian speakers, two male and one female.
The recordings were sampled at 16 kHz with 16 bits per sample.
3.2 Corpus Analysis
We carried out a statistical study of our corpus. Table 3 shows the results of this study. We
note the following:
a) The short vowel [a] and the long vowel [a:] appear with a frequency of 17%, followed by the
vowels [i] and [i:] with a frequency of 14.3%. The vowels [u] and [u:] represent 7%.
b) Vowels (short and long) account for about 37% of occurrences.
c) The most frequent Arabic consonants are [?] (15%), [n] (6.66%), [l] (6.63%), [m] (5.59%),
etc.
Consonant and vowels   Phoneme repetitions   %
?                      523                   13.34%
b                      102                   2.60%
t                      92                    2.35%
T                      70                    1.79%
x                      19                    0.48%
/X                     20                    0.51%
G                      35                    0.89%
d                      39                    0.99%
D                      40                    1.02%
r                      102                   2.60%
z                      35                    0.89%
s                      48                    1.22%
S                      73                    1.86%
s                      18                    0.46%
d                      24                    0.61%
t                      19                    0.48%
D                      24                    0.61%
?                      61                    1.56%
g                      23                    0.59%
f                      61                    1.56%
q                      61                    1.56%
k                      80                    2.04%
l                      260                   6.63%
m                      219                   5.59%
n                      261                   6.66%
h                      123                   3.14%
w                      51                    1.30%
j                      80                    2.04%
a                      400                   10.20%
a:                     254                   6.48%
i                      400                   10.20%
i:                     28                    0.71%
u                      252                   6.43%
u:                     23                    0.59%
total                  3920                  100%
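The percentage column follows from dividing each repetition count by the total. The following is a minimal Python sketch of that computation, with the counts copied from the table in row order. Several symbols appear twice because their distinguishing marks are not visible in the transcription, so the rows are kept as a list rather than a dictionary. The names ROWS, shares and VOWEL_ROWS are illustrative, and treating the last six rows as the vowels is an assumption based on the row labels.

```python
# (symbol, repetitions) pairs copied from the phoneme table, in row order.
ROWS = [
    ("?", 523), ("b", 102), ("t", 92), ("T", 70), ("x", 19), ("/X", 20),
    ("G", 35), ("d", 39), ("D", 40), ("r", 102), ("z", 35), ("s", 48),
    ("S", 73), ("s", 18), ("d", 24), ("t", 19), ("D", 24), ("?", 61),
    ("g", 23), ("f", 61), ("q", 61), ("k", 80), ("l", 260), ("m", 219),
    ("n", 261), ("h", 123), ("w", 51), ("j", 80), ("a", 400), ("a:", 254),
    ("i", 400), ("i:", 28), ("u", 252), ("u:", 23),
]

# Assumption: the last six rows (a, a:, i, i:, u, u:) are the vowels.
VOWEL_ROWS = ROWS[-6:]


def shares(rows):
    """Return (symbol, count, percentage) triples, rounded to two decimals."""
    total = sum(count for _, count in rows)
    return [(sym, count, round(100.0 * count / total, 2)) for sym, count in rows]


if __name__ == "__main__":
    total = sum(count for _, count in ROWS)
    vowel_total = sum(count for _, count in VOWEL_ROWS)
    for sym, count, pct in shares(ROWS):
        print(f"{sym:>3} {count:>5} {pct:6.2f}%")
    print("total", total)
    print("vowel share (%)", round(100.0 * vowel_total / total, 2))
```

Recomputing the shares this way makes it easy to compare the tabulated values with the rounded figures quoted in the running text.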
for a S&L k in a space of all possible S&Ls, which can be formulated as [13,14]:
$$\hat{k} = \arg\max_{k} P(k \mid O) = \arg\max_{k} \frac{P(k)\, p(O \mid k)}{P(O)} \qquad (1)$$
where $O$ is the acoustic observation of the corresponding speech signal. The MAUS system
models $P(k)$ for each recording $O$. Each path from the start node to the end node represents a
possible $k$ and accumulates to the probability $P(k)\, p(O \mid k)$, which is determined by HMMs
for each phonemic segment, and a simple Viterbi search through the graph yields the maximal
$P(k)\, p(O \mid k)$.
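To illustrate the search step, the following is a minimal Python sketch of a Viterbi search in the log domain. It is a toy example, not the MAUS implementation: the two-state left-to-right model, the transition probabilities and the per-frame likelihoods are hand-picked assumptions, and the function names viterbi and segments are illustrative. Transition terms play the role of $P(k)$ and frame likelihoods the role of $p(O \mid k)$.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path).

    log_start[s]    : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best_p = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            score.append(prev[best_p] + log_trans[best_p][s] + log_emit[t][s])
            ptr.append(best_p)
        back.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


def segments(path):
    """Collapse a frame-level state path into (state, first_frame, last_frame)."""
    out, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            out.append((path[start], start, t - 1))
            start = t
    return out


# Toy model (assumed values): state 0 must come first, state 1 is absorbing.
LOG_START = [0.0, NEG_INF]
LOG_TRANS = [[math.log(0.6), math.log(0.4)], [NEG_INF, 0.0]]
LOG_EMIT = [[math.log(a), math.log(b)]
            for a, b in [(0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9)]]

if __name__ == "__main__":
    best, path = viterbi(LOG_START, LOG_TRANS, LOG_EMIT)
    print(path, segments(path), math.exp(best))
```

The segment boundaries fall where the best path changes state, which is how a frame-level alignment yields start and end times for each phoneme.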
The Munich Automatic Segmentation (MAUS) system was developed by the Department of Phonetics,
University of Munich. For more details about the MAUS method, refer to [15], [16] and [17].
Its purpose is to analyze a spoken utterance. The input is a speech wave and some
orthographic form of the spoken text. The text is parsed into a chain of single words (punctuation
marks are stripped) and passed to a text-to-phoneme algorithm, which is either rule-based or a
combination of lexicon lookup and fallback to the rule-based system.
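The text preprocessing described above (word splitting, punctuation stripping, lexicon lookup with a rule-based fallback) can be sketched as follows. This is a minimal Python illustration, not the MAUS code: the toy LEXICON, the one-letter-per-phoneme fallback rule and all function names are hypothetical, and the input is assumed to be romanized text with ASCII punctuation.

```python
import string

# Hypothetical toy lexicon mapping romanized words to phoneme lists.
LEXICON = {
    "qul": ["q", "u", "l"],
    "bahr": ["b", "a", "h", "r"],
}

# Translation table that deletes ASCII punctuation marks.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text):
    """Split the text into a chain of words with punctuation removed."""
    return text.translate(_STRIP_PUNCT).split()


def rule_based(word):
    """Placeholder fallback rule: one letter maps to one phoneme."""
    return list(word)


def to_phonemes(text):
    """Return (word, phonemes) pairs, using the lexicon first, then the rules."""
    result = []
    for word in tokenize(text):
        phonemes = LEXICON.get(word)
        if phonemes is None:
            phonemes = rule_based(word)
        result.append((word, phonemes))
    return result


if __name__ == "__main__":
    print(to_phonemes("qul, fii!"))
```

A real system would replace the placeholder rule with language-specific grapheme-to-phoneme rules and would also handle Arabic-script punctuation, which `string.punctuation` does not cover.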
FIGURE 5: Example MAUS segmentation and labeling taken from the Arabic corpus with SAMPA code.
FIGURE 6: Temporal description of each phoneme; start time and end time.
4.2 Evaluation
In total, 600 sentences were segmented: 400 sentences for the two male speakers (200
sentences each) and 200 sentences for the third speaker (female). From each speaker's 200
segmented sentences, we randomly selected 10 sentences to be segmented manually. This task
involved 6 students from our research laboratory, two for each set of 10 sentences. The results
are summarized in the following table:
Speaker               Manual segmentation   Automatic segmentation
First male speaker    99%                   94%
Second male speaker   99%                   94.4%
Female speaker        99%                   94.1%
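The text does not state how the percentages in this table were computed. One common way to compare two segmentations of the same utterance is to count the boundaries that agree within a fixed tolerance. The following Python sketch illustrates that idea only; the 20 ms tolerance, the function name boundary_agreement and the example boundary times are all assumptions, not values from this work.

```python
def boundary_agreement(reference, candidate, tolerance=0.020):
    """Fraction of boundaries (in seconds) that agree within the tolerance.

    Both lists must hold the boundaries of the same phoneme sequence,
    in the same order.
    """
    if len(reference) != len(candidate):
        raise ValueError("both segmentations must have the same number of boundaries")
    if not reference:
        raise ValueError("at least one boundary is required")
    hits = sum(abs(r - c) <= tolerance for r, c in zip(reference, candidate))
    return hits / len(reference)


# Made-up boundary times, for illustration only.
MANUAL = [0.10, 0.25, 0.40, 0.62]
AUTOMATIC = [0.11, 0.25, 0.45, 0.63]

if __name__ == "__main__":
    print(boundary_agreement(MANUAL, AUTOMATIC))
```

With two manual labelers per sentence set, the same function could also be applied between the two manual segmentations to estimate inter-labeler agreement.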
5. CONCLUSION
This paper reports our work towards developing PADAS (Phonetic Arabic Database
Automatically Segmented), based on a phonetically rich and balanced speech corpus that is
automatically segmented with the MAUS system. This work includes creating the phonetically
rich and balanced speech corpus, building an Arabic phonetic dictionary, reducing noise with a
wavelet method, and evaluating the automatic segmentation. The current release of our database
contains 1 female and 2 male voices. The purpose of this work is to build a database that can be
used in all areas of speech processing. This variety is useful for speech synthesis and speech
recognition.
6. REFERENCES
[1] A. Black and K. Tokuda, The Blizzard Challenge Evaluating Corpus-Based Speech
Synthesis on Common Datasets, in Proceedings of Interspeech, Portugal, pp. 77-80, 2005.
[2] S. D'Arcy and M. Russell, Experiments with the ABI (Accents of the British Isles) Speech
Corpus, in Proceedings of Interspeech 08, Australia, pp. 293-296, 2008.
[3] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, and V. Zue, TIMIT
Acoustic-Phonetic Continuous Speech Corpus, Technical Document, Trustees of the
University of Pennsylvania, Philadelphia, 1993.
[4] K. Tokuda, H. Zen, and A.W. Black, An HMM-based speech synthesis system applied to
English, in IEEE Speech Synthesis Workshop, 2002.
[5] M. Alghamdi, A. Alhamid, and M. Aldasuqi, Database of Arabic Sounds: Sentences,
Technical Report, Saudi Arabia, 2003.
[6] M.A. Mansour, KACST Arabic Phonetics Database, Riyadh, Kingdom of Saudi Arabia, 2004.
[7] G. Droua-Hamdani, Algerian Arabic Speech Database (ALGASD), December 2010.
[8] R. Gordon, Ethnologue: Languages of the World, Texas: Dallas, SIL International, 2005.
[9] A. Omar, Dirasat AlSwat AlLugawi, Cairo: Alam Al Kutub, 1985.
[10] L. Pineda, Gómez M., D. Vaufreydaz and J. Serignat, Experiments on the Construction of a
Phonetically Balanced Corpus from the Web, in Proceedings of 5th International Conference
on Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer
Science, Korea, pp. 416-419, 2004.