Static Dictionary For Pronunciation Modeling
ISSN: 2319-1163
Abstract
Speech recognition is the process by which a computer recognizes spoken words: the user talks to the computer, and the computer correctly identifies what was said. This work aims to improve speech recognition accuracy for the Telugu language using a pronunciation model. Speech recognizers based on Hidden Markov Models have achieved a high level of performance in controlled environments. Many techniques have been proposed for robust speech recognition in order to handle the mismatch between training and testing environments; some methods work on the acoustic model and some on the language model. The pronunciation dictionary is one of the most important components of a speech recognition system. In this work, new pronunciations for words are considered in order to improve speech recognition accuracy.
DICTIONARY REFINEMENT:
Dictionary pruning is sometimes used to improve speech recognition accuracy. Pruning is based on the speech training database, which gives rise to two types of problems:
1. Words that do not occur in the training data have no information on which pruning can be based, and
2. Some words tend to keep pronunciations that were rarely observed.
To solve the unobserved-words problem, central or summary pronunciations can be used in the pruned dictionary [3]. The aim of this kind of pronunciation is to capture the phonetic content of the set of pronunciation variants of each word and to consolidate it in a reduced pronunciation set.
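The pruning idea described above can be illustrated with a minimal sketch. It is not the method of [3]: the data structures, the function name `prune_dictionary`, the threshold `min_share`, the word MEERU, and the letter-like phone symbols are all assumptions made for illustration. Rarely observed variants are dropped, and words that never occur in the training data simply keep all their variants, a simpler fallback than central pronunciations.

```python
def prune_dictionary(lexicon, observed_counts, min_share=0.1):
    """Drop pronunciation variants that were rarely observed in training data.

    lexicon:         dict mapping word -> list of pronunciations (tuples of phones)
    observed_counts: dict mapping (word, pronunciation) -> count in the training data
    min_share:       hypothetical threshold on a variant's relative frequency
    """
    pruned = {}
    for word, variants in lexicon.items():
        counts = [observed_counts.get((word, v), 0) for v in variants]
        total = sum(counts)
        if total == 0:
            # Unobserved word: there is no evidence to prune on, so keep everything.
            pruned[word] = list(variants)
            continue
        kept = [v for v, c in zip(variants, counts) if c / total >= min_share]
        if not kept:
            # Threshold removed every variant: fall back to the most frequent one.
            kept = [variants[counts.index(max(counts))]]
        pruned[word] = kept
    return pruned


# Hypothetical entries; the phone symbols are illustrative only.
lexicon = {
    "EKKADA": [("E", "K", "K", "A", "D", "A"), ("E", "K", "A", "D", "A")],
    "MEERU": [("M", "EE", "R", "U")],
}
observed_counts = {
    ("EKKADA", ("E", "K", "K", "A", "D", "A")): 19,
    ("EKKADA", ("E", "K", "A", "D", "A")): 1,
}
pruned = prune_dictionary(lexicon, observed_counts)
```

In this example the second EKKADA variant accounts for 1 of 20 observations, falls below the 10% threshold, and is removed, while the unobserved word MEERU is left untouched.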
__________________________________________________________________________________________
Volume: 01 Issue: 02 | Oct-2012, Available @ http://www.ijret.org
185
Database
The speech database consists of recordings from 24 speakers, each of whom spoke 40 sentences.
The system was evaluated several times with increasing amounts of training data, and we observed improved accuracy.
TRAINING                                       AFTER DICTIONARY ADDITION
(number of speakers)  ACCURACY(%)  ERRORS(%)   ACCURACY(%)  ERRORS(%)
12                    51.159       85.797      69.420       88.478
16                    58.370       71.196      76.413       65.000
20                    59.826       73.696      79.130       65.870

Transcription Modification:
When certain recognition errors occurred, I modified the transcription of the affected words and observed improved accuracy. The following are examples of the errors:
MEEREKKADA - MEE
EKKADIKI - EKKADA
HYDERAABAD - MEERAEMI
BHAARATHADESAM - BHAARATHEEYULANDARU
The modified transcription words are:
MEERREKKADA
EKKADAKKI
HHYDERAABAD
BHAARATTHADESAM
The following are the results when the wave files are noisy:

EXPERIMENT NAME   WORD ACCURACY(%)   ERRORS(%)
50                61.364             59.091
51                88.696             21.739
52                84.348             31.304
53                80.870             65.217
54                93.913             16.522
55                87.826             24.348
56                73.913             68.696
57                96.522             16.522
58                89.565             18.261
59                77.391             61.739
60                88.696             38.261
61                62.609             95.652
62                90.435             21.739
63                93.043             16.522
64                93.043             11.304
65                92.174             26.087
66                81.739             20.870
67                91.304             20.000
68                93.913             12.174
69                66.957             72.714
70                83.478             24.348
71                83.478             36.522
72                76.522             31.304
73                55.000             50.833

EXPERIMENT NAME   WORD ACCURACY(%)   ERRORS(%)
50                99.130             22.609
51                95.652             12.174
52                98.261             10.435
53                96.522             28.696
54                99.130             10.435
55                97.391             8.696
56                97.391             16.522
57                98.261             8.696
58                99.130             5.217
59                96.522             23.478
60                99.130             7.826
61                94.696             30.739
62                98.261             6.957
63                98.261             6.957
64                96.522             8.696
65                100.000            15.652
66                94.783             7.826
67                99.130             6.957
68                97.391             6.957
69                93.913             36.522
70                93.913             16.522
71                93.913             13.913
72                96.522             9.565
73                99.130             4.348

EXPERIMENT NAME   WORD ACCURACY(%)   ERRORS(%)
50                80.870             40.000
51                88.696             19.130
52                86.957             21.739
53                81.739             45.217
54                91.304             18.261
55                92.174             14.783
56                87.826             21.739
57                94.783             12.174
58                93.913             10.435
59                84.348             37.391
60                96.522             10.435
61                73.913             53.043
62                92.174             13.043
63                93.043             12.174
64                92.174             13.043
65                94.696             22.957
66                81.739             20.870
67                93.043             14.783
68                93.913             12.174
69                78.261             51.304
70                80.870             29.565
71                85.217             21.739
72                86.957             19.130
73                93.913             9.565
CONCLUSIONS
The approaches discussed in this work perform well. Initially the wave files were noisy. I removed the noise with Sound Recorder, but Sound Recorder can remove only the noise present at the two ends of a file, not in the middle, so I used Praat to remove the noise present in the middle part of the wave files. I gave these wave files to the Sphinx tool and obtained the word accuracy, errors, and confusion pairs. These confusion pairs were added to the dictionary using the approaches discussed in this work. Finally, I observed improved accuracy and decreased errors.
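As an illustration of adding confusion pairs to the dictionary, the sketch below shows one plausible reading, not necessarily the exact procedure used here: for each pair (reference word, recognized word), the pronunciation of the recognized word is added as an alternative pronunciation of the reference word. The function names, the in-memory dictionary layout, and the letter-like phone symbols are assumptions; the `WORD(2)` marker follows the convention used by CMU Sphinx dictionaries for alternate pronunciations.

```python
def add_confusion_variants(dictionary, confusion_pairs):
    """Return a copy of `dictionary` extended with variants from confusion pairs.

    dictionary:      dict mapping word -> list of pronunciations (lists of phones)
    confusion_pairs: iterable of (reference_word, recognized_word) tuples
    """
    updated = {w: [list(p) for p in prons] for w, prons in dictionary.items()}
    for ref, hyp in confusion_pairs:
        if ref not in updated or hyp not in dictionary:
            continue  # skip pairs that involve out-of-dictionary words
        for pron in dictionary[hyp]:
            if list(pron) not in updated[ref]:
                updated[ref].append(list(pron))
    return updated


def format_sphinx_dict(dictionary):
    """Render entries as 'WORD PH1 PH2 ...', marking alternates as WORD(2), WORD(3), ..."""
    lines = []
    for word in sorted(dictionary):
        for i, pron in enumerate(dictionary[word], start=1):
            label = word if i == 1 else f"{word}({i})"
            lines.append(f"{label} {' '.join(pron)}")
    return lines


# Hypothetical phone strings; the word pair is one of the error examples above.
dictionary = {
    "EKKADIKI": [["E", "K", "K", "A", "D", "I", "K", "I"]],
    "EKKADA": [["E", "K", "K", "A", "D", "A"]],
}
updated = add_confusion_variants(dictionary, [("EKKADIKI", "EKKADA")])
dict_lines = format_sphinx_dict(updated)
```

Returning a copy rather than modifying the input keeps the original dictionary available, so accuracy before and after the addition can be compared on the same test set.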
Step  Description
1. Set n to be the length of s. Set m to be the length of t. If n = 0, return m and exit. If m = 0, return n and exit. Construct a matrix containing 0..m rows and 0..n columns.
2. Initialize the first row to 0..n. Initialize the first column to 0..m.
3. Examine each character of s (i from 1 to n).
4. Examine each character of t (j from 1 to m).
5. If s[i] equals t[j], the cost is 0. If s[i] doesn't equal t[j], the cost is 1.
6. Set cell d[i,j] of the matrix equal to the minimum of:
   a. The cell immediately above plus 1: d[i-1,j] + 1.
   b. The cell immediately to the left plus 1: d[i,j-1] + 1.
   c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
7. After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].
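The steps above describe what is commonly known as the Levenshtein edit distance between two strings s and t. A minimal Python sketch of those steps follows; the function name `edit_distance` is an assumption, and the matrix is indexed with rows for s and columns for t so that d[i][j] matches the cell references in step 6.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    n, m = len(s), len(t)
    # Step 1: trivial cases.
    if n == 0:
        return m
    if m == 0:
        return n
    # Steps 1-2: (n+1) x (m+1) matrix with the first row and column initialized.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Steps 3-6: fill in the remaining cells.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1  # strings are 0-indexed in Python
            d[i][j] = min(
                d[i - 1][j] + 1,         # cell above: deletion
                d[i][j - 1] + 1,         # cell to the left: insertion
                d[i - 1][j - 1] + cost,  # diagonal cell: match or substitution
            )
    # Step 7: the distance is in the last cell.
    return d[n][m]


# Original and modified transcriptions from the Transcription Modification examples.
print(edit_distance("MEEREKKADA", "MEERREKKADA"))  # 1 (one inserted R)
print(edit_distance("EKKADIKI", "EKKADAKKI"))      # 2 (I -> A, one inserted K)
```

Applied to the transcription examples, the distance gives a simple measure of how far a modified transcription is from the original spelling.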
REFERENCES:
[1] Gopala Krishna Anumanchipalli, Modeling Pronunciation Variation for Speech Recognition, M.S. Thesis, IIIT Hyderabad, February 2008.
[2] Eric John Fosler-Lussier, Dynamic Pronunciation Models for Automatic Speech Recognition, Ph.D. Thesis, University of California, Berkeley, Berkeley, CA, 1999.
[3] Gustavo Hernandez-Abrego, Lex Olorenshaw, Dictionary Refinement based on Phonetic Consensus and Non-uniform Pronunciation Reduction, INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, October 4-8, 2004.
[4] Hua Yu and Tanja Schultz, Enhanced Tree Clustering with Single Pronunciation Dictionary for Conversational Speech Recognition, in Proceedings of Eurospeech, pp. 1869-1872, 2003.
[5] Dan Jurafsky, Wayne Ward, Zhang Jianping, Keith Herold, Yu Xiuyang, and Zhang Sen, What Kind of Pronunciation