Hidden Markov Model and Persian Speech Recognition
In Press, 1–9
ISSN: 2008-6822 (electronic)
http://dx.doi.org/10.22075/ijnaa.2022.27851.3735
Abstract
Nowadays, speech recognition, which simply refers to the process of converting an audio signal into its equivalent text,
has become one of the most important research topics. Although many studies have been conducted in this field for
many languages of the world, comparatively little work has addressed the Persian language, and further studies in
this area are therefore necessary. Since Persian is a rich language in which many new words can be created by adding
a suffix or prefix to a root, the number of units a recognizer must distinguish grows quickly, and a well-chosen
recognition unit can therefore significantly improve the success rate of speech recognition systems for this language.
In this study, a practical approach to Persian speech recognition based on syllables, a unit between phonemes and
words, was developed using the hidden Markov model. After obtaining syllable utterances, multiple feature coefficients
are calculated for all syllables. Finally, suitable models were created and the success rate was measured by testing
the systems, using the word error rate as the performance criterion. The results of this study show that the word
error rate for the hidden Markov model was 18.3%, and post-processing improved system performance by
approximately 16%.
Keywords: hidden Markov model, Persian language, speech recognition, syllable, syllable-based speech recognition
2020 MSC: 62M05
1 Introduction
Phonemes are linguistic units that differentiate the words of a language from each other. A phoneme appears at
the signal level as a roughly static function per unit of time; for this reason, in speech processing, the values of each
representation parameter in the frequency domain in successive frames that belong to a phoneme, or part of a phoneme,
do not show statistically significant differences. The phoneme has therefore become extremely important in speech
processing as a structural fact in the speech signal [23]. Every spoken language has a certain set of phonemes,
and linguistic research shows that the general characteristics of this set differ between languages. For example, the
set of English speech phonemes has 44 elements, while the set of Persian speech phonemes has 19 elements [26].
Nowadays, due to the expansion of multimedia content, searching audio, image, text and video has become very
important. Spoken term recognition makes it possible to retrieve a phrase from the text representation of speech,
and can therefore be considered a preliminary step for spoken document retrieval [25]. The main uses of spoken
term recognition can thus be summarized as Audio Indexing (AI) and Speech Data Mining (SDM). The most
important challenge in speech recognition is that audio signals, sound transmission and recording environments
differ from one speech to another. Another challenge is the variation in tone of voice caused by the speaker's
emotional state [14], and yet another is the wide variety of spoken documents. The existence of discrete or continuous
speech and of clean or noisy speech also causes the speed and accuracy of the proposed solutions to vary. For these
reasons, these methods are not yet as accurate and powerful as text retrieval methods [13].
Speech recognition methods can generally be classified into two categories, "pattern-based" and "model-based",
according to the characteristics of the method used. Dynamic Time Warping (DTW) and Linear Time Alignment
(LTA) are examples of pattern-based speech recognition. The multilayer perceptron (MLP), the support vector
machine (SVM) and the hidden Markov model (HMM) are model-based methods. In pattern-based methods, a
template is created for each sound class and incoming samples are compared against these templates. In model-based
methods, on the other hand, the system is trained on sound samples, general features are extracted, and a final
model is created [22].
In another classification, speech search methods can be divided into two categories: (a) direct phonetic matching
and (b) automatic speech recognition (ASR). In the direct phonetic matching approach, such as the dynamic time
warping method, an attempt is made to directly match phonetic features between the speech and the keyword. In
the second approach, the speech is first converted into text by an automatic speech recognition system, and the
search is then performed on the ASR output text using text retrieval methods [27].
The first speech recognition studies began in the late 1940s, but these studies have gained momentum in the last
30 years. Most of them have used phonemes and lexical units as the basic components of speech recognition.
However, it must be acknowledged that determining the boundaries between phoneme units is a very difficult process
that must be handled carefully. Systems based on lexical units, although they avoid the problems of systems using
phonetic units, involve a great deal of computation and data processing [14]. In a simple classification, speech
recognition systems can be grouped by vocabulary size into small scale (1-100 words), medium scale (100-1000
words) and large scale (more than 1000 words).
Speech recognition is the process of converting a speech signal into a sequence of words, and various approaches
have been used for it; the main classes of features and methods are summarized below.
Usually, the preferred features in speech recognition are Linear Prediction Coefficients (LPC) [7, 17, 18, 22], MFCC
coefficients and PARCOR coefficients. The most widely used methods in the literature are Dynamic Time Warping
(DTW) [12, 15, 16], Artificial Neural Networks (ANN) and the hidden Markov model (HMM) [14, 22, 24].
Extensive studies of speech retrieval in languages such as English [5, 6] have been conducted on a large scale (more
than 1000 words). In recent years, new research on languages such as Pashto, Vietnamese and Turkish has been
conducted on limited and medium scales, reaching acceptable results [1, 26]. Studies on the Persian language,
carried out with both limited and large data, remain very scarce, and it is necessary to carry out more work in this
field. This study has therefore been conducted to expand the scope of domestic studies and to pay more attention
to applied research and theoretical findings in this area.
The rest of the article is organized as follows. In the second section, a summary of the findings of previous domestic
studies is presented, followed by general information about the overall structure of the system. The article then
explains how the boundaries of syllables are determined and how the features used are extracted from the sound
signals of the syllables. Next, the hidden Markov method, the incremental (post-processing) algorithm and the
system tests are explained, and finally the conclusions are summarized.
2 Research background
Among the previous studies related to this subject that were mentioned in the preceding section, this section reviews
only a number of domestic works whose methods are consistent with that of the present study.
Khanzadi et al. [11] recognized Persian phonemes and syllables with neural networks and, for the first time, launched
a comprehensive system for this purpose. Various recognition modules were implemented, including a phoneme
recognition system for the phoneme segmentation task, a syllable recognition system for the syllable segmentation
task, and a sub-word recognition system for three types of phoneme deletion tasks covering initial, middle and final
phoneme deletions. The findings of this study show that the accuracy rate is 85.5% for phoneme recognition and
89.4% for syllable recognition. The accuracy rates for the initial, middle and final phoneme deletions are 96.76%,
98.21% and 95.9%, respectively.
Asadolahzade and Homayounpour [2] investigated improving phoneme sequence recognition using a hidden
semi-Markov model (HSMM) combined with neural networks (HSMM-DNN). Furthermore, they investigated the
performance of a post-processing method that corrects the phoneme sequence obtained from the neural network
based on knowledge about phonemes. The experimental results obtained on the Persian FarsDat corpus show that
using the extended Viterbi algorithm on the HSMM achieves phoneme recognition accuracy improvements of 2.68%
and 0.56% over the conventional methods using Gaussian mixture model-hidden Markov models (GMM-HMMs) and
Viterbi on the HMM, respectively. In addition, the post-processing method further increases the accuracy.
Zoughi and Homayounpour [28], in a study on the Persian language, presented an adaptive windows convolutional
neural network (AWCNN) and investigated speech recognition with respect to differences in expression between
speakers and within the expressions of a single speaker. The results and analysis obtained on the FARSDAT and
TIMIT datasets show that, for the phone recognition task, the proposed structure achieves absolute error reductions
of 1.2% and 1.1% with respect to CNN models, respectively, which is a considerable improvement for this problem.
They conclude that the use of speaker information is very beneficial for recognition accuracy.
Sheikh Zadegan [23] studied the effectiveness of Persian speech phonemes for speaker recognition. To estimate
phoneme efficiency, he used a criterion defined as the ratio of the "inter-speaker distance (IerSD)" of phonemes to
the "intra-speaker distance (IraSD)". The results of tests and calculations performed on the FARSDAT dataset
showed that vowels and semi-vowels rank first in terms of efficiency for speaker recognition.
Homayounpour and Mousavi [9] used hidden Markov methods to model the parameters of speech units in order
to implement a speech synthesis system. To generate the synthesis parameters with HMMs, they used an algorithm
based on Mel frequency cepstral coefficients and pitch frequency, together with their first and second derivatives.
The results of this study show that, according to the determined parameters, the scores obtained for the training
sentences (sentences available in the data set used) were 4.2, 4.4 and 4.1, respectively, and for the test sentences
(sentences outside the data set used) were 4.3, 4.2 and 3.4, respectively.
Salehi [20] used hidden Markov models and artificial neural networks to create a speech recognition system capable
of recognizing Persian digits. She used the CSLU toolbox to implement the combined ANN/HMM model for Persian
speech recognition, collected 210 speech samples from a male speaker and, after removing noise, manually labeled
47 of the samples. The remaining training samples were then labeled automatically, and new ANN networks were
created for the final recognition by a three-layer MLP. To extract the features, four methods were used: MEL (12
coefficients), MEL derivative (12 coefficients), energy (1 coefficient) and energy derivative (1 coefficient). Applying
recognition to the data, the test success rate was 99.4%, which, considering the small amount of speech data, is a
very good result.
3 The overall structure of the system
In order to calculate the feature pattern of each syllable, the small Persian database FARSDAT [3] is used. In this
dataset, 304 speakers each read twenty randomly selected sentences. The data were recorded in a low-noise
environment with an average signal-to-noise ratio of about 31 dB and a sampling rate of 22050 Hz, and are segmented
and labeled at the phoneme level. All 450 unique sentences of this database have been used. From these data, which
contain 1414 unique words whose phonetic equivalents are available in the dataset, samples were taken and
pre-processed using 16-bit pulse code modulation (PCM). In the pre-processing stage, the audio signals were shifted
so that their average became zero.
To obtain the new sound signal $y_n$, equation (1) is used, where $x_n$ is the sound signal and $m$ is the average
of the sound signal:

$y_n = x_n - m, \qquad m = \frac{1}{k}\sum_{i=1}^{k} x_i$ (1)
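As a minimal sketch of equation (1) in Python (the function name is our own, not from the paper):

import numpy as np

def zero_mean(x: np.ndarray) -> np.ndarray:
    # returns y with y_n = x_n - m, where m is the average of the signal
    m = x.sum() / len(x)    # m = (1/k) * sum_{i=1}^{k} x_i
    return x - m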
Before the syllable boundaries are determined and feature extraction is performed, the sample sounds are
pre-emphasized. Then the syllable boundaries are determined. The sound samples of each syllable are divided into
20 ms frames and a Hamming window is applied to each frame, with an overlap of 10 milliseconds between frames.
For each frame of the syllable, 8 feature values are then obtained from the LPC, PARCOR and MFCC feature vectors.
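To make the framing concrete, the following Python sketch frames a signal under the stated assumptions (20 ms
frames, 10 ms frame step, Hamming window, 22050 Hz sampling); the function name and the handling of the final
partial frame are our own choices:

import numpy as np

def frame_signal(x, fs=22050, frame_ms=20, hop_ms=10):
    # split a signal into overlapping Hamming-windowed frames;
    # assumes len(x) >= one frame
    frame_len = int(fs * frame_ms / 1000)   # 441 samples at 22050 Hz
    hop = int(fs * hop_ms / 1000)           # 220 samples -> 10 ms frame step
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # LPC, PARCOR and MFCC features would then be computed per frame
    return frames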
In the HMM speech recognition method, an HMM syllable model is created for each syllable in the training phase.
Then, by calculating the similarity between syllable sound signals and the syllable models, the recognized syllables
in the word are determined. Post-processing is performed at the end of the process to achieve better recognition.
In this study, the applications were coded in Matlab, version 2019.
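The paper implements this scheme in Matlab and does not give implementation details. As one hedged sketch of
the same train-one-HMM-per-syllable, pick-the-most-likely-model idea, the Python library hmmlearn could be used
as follows; the number of states, the data layout and all names here are our assumptions:

from hmmlearn import hmm  # third-party library, not named in the paper

def train_syllable_models(train_data, n_states=5):
    # train_data: dict mapping syllable label -> (features, lengths), where
    # features stacks the per-frame feature vectors (n_frames x n_features)
    # of all training utterances of that syllable, and lengths lists the
    # frame count of each utterance
    models = {}
    for syllable, (features, lengths) in train_data.items():
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
        model.fit(features, lengths)          # Baum-Welch training
        models[syllable] = model
    return models

def recognize_syllable(models, features):
    # return the syllable whose HMM assigns the highest log-likelihood
    return max(models, key=lambda s: models[s].score(features))

The post-processing described later then operates on the ranked candidate syllables rather than only on the single
best one.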
4 Determining syllable boundaries

The boundaries of the syllables in a word are determined from the sound vector x̃ by the following steps.

1. The exact start index SB and end index SS of the sound in the x̃ vector are determined, and the vector n is
formed from the samples of x̃ between these boundaries.

2. The n vector, of length K, is divided into non-overlapping windows of L samples each. The vector n̄ holds the
average of each window:

$\bar{n} = (\bar{n}_1, \bar{n}_2, \ldots, \bar{n}_p), \qquad p = \frac{K}{L}$ (2)
$\bar{n}_i = \frac{1}{L} \sum_{m=(i-1)L+1}^{iL} n_m, \qquad i = 1, 2, \ldots, p$ (3)
3. The slopes between successive values of the n̄ vector are calculated and the slope vector is formed. For
i = 1, 2, \ldots, p − 1:

$\bar{n}^E = (\bar{n}^E_1, \bar{n}^E_2, \ldots, \bar{n}^E_{p-1})$ and $\bar{n}^E_i = \bar{n}_{i+1}/\bar{n}_i$ (4)
4. A new vector $a = (a_1, a_2, \ldots, a_{p-1})$, consisting of +1 and −1 values, is calculated from the slope
vector; it marks where the window averages are increasing and where they are decreasing:

For k = 1 To p − 1
    If $\bar{n}^E_k \geq 1$ then $a_k = +1$, otherwise $a_k = -1$ (5)
End
5. H, the number of syllables in the word, is obtained by counting the falls from +1 to −1 in vector a:

H = 0
For k = 2 To p − 1
    If $a_{k-1} = 1$ and $a_k = -1$ then $H = H + 1$ (6)
End
6. Groups of consecutive −1 values in vector a mark the main syllable boundaries, and the number of boundaries
between syllables is H − 1. The vector $s = (s_1, s_2, \ldots, s_{H-1})$ of approximate syllable boundaries is
calculated; the $s_k$ values hold indices of the x̃ vector:

For k = 1 To H − 1
    If W is the middle index of the kth group of consecutive −1 values in vector a, then $s_k = SB + L \cdot W$
End
7. So far, the exact start value SB and end value SS of the sound in the x̃ vector have been determined, and the
vector s holds the approximate boundary indices between syllables. To find more precise boundaries, the following
procedure is performed, producing the vector $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_{H+1})$ with
$\tilde{s}_1 = SB$ and $\tilde{s}_{H+1} = SS$:

For i = 1 To H − 1
    In the interval between $s_i - 500$ and $s_i + 500$, windows of 20 samples are formed; after calculating the
    average of each window, the window with the smallest average is selected, and its middle index q gives
    $\tilde{s}_{i+1} = q$.
End
8. The syllable boundary indices in the x̃ sound vector are thus obtained in the form of the s̃ vector: the kth
syllable begins at index $\tilde{s}_k$ and ends at index $\tilde{s}_{k+1}$, and each word contains H syllables. A
Python sketch of this boundary detection procedure is given below.
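As a minimal sketch of steps 2-6 above (only the approximate boundaries; the ±500-sample refinement of step 7 is
omitted), the following Python code can be used. The window length L, the use of absolute sample values to form
n, and all function and variable names are our own assumptions, not taken from the paper:

import numpy as np

def syllable_boundaries(x, sb, ss, L=200):
    # n: samples of the spoken part; taking absolute values is our assumption,
    # so that the window averages track the energy envelope
    n = np.abs(x[sb:ss])
    p = len(n) // L
    n_bar = n[:p * L].reshape(p, L).mean(axis=1)   # window averages, eq. (2)-(3)
    slope = n_bar[1:] / n_bar[:-1]                 # eq. (4); assumes no zero averages
    a = np.where(slope >= 1.0, 1, -1)              # +1 rising, -1 falling, eq. (5)
    # H: number of syllables = number of +1 -> -1 transitions, eq. (6)
    H = int(np.sum((a[:-1] == 1) & (a[1:] == -1)))
    bounds = []                                    # approximate boundaries s_k
    k = 0
    while k < len(a):
        if a[k] == -1 and k > 0 and a[k - 1] == 1: # start of a falling run
            j = k
            while j < len(a) and a[j] == -1:
                j += 1
            if j < len(a):                         # run ends before the signal does
                w = (k + j) // 2                   # middle index of the -1 group
                bounds.append(sb + L * w)          # s_k = SB + L * W
            k = j
        else:
            k += 1
    return H, bounds

Given the sound vector x and its detected start sb and end ss, syllable_boundaries(x, sb, ss) returns the estimated
syllable count H and the approximate boundary indices.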
6 Hidden Markov model

A hidden Markov model with N states is characterized, first, by the state transition probability matrix A:

$A = \{a_{ij}\}, \qquad a_{ij} = p\{q_{t+1} = j \mid q_t = i\}, \qquad 1 \leq i, j \leq N$ (7)

where $q_t$ represents the current state. The transition probabilities satisfy the usual probability constraints of
equations (8) and (9):

$a_{ij} \geq 0, \qquad 1 \leq i, j \leq N$ (8)

$\sum_{j=1}^{N} a_{ij} = 1, \qquad 1 \leq i \leq N$ (9)
The model is further characterized by the observation probability distribution B:

$B = \{b_j(k)\}, \qquad b_j(k) = p\{O_t = v_k \mid q_t = j\}, \qquad 1 \leq j \leq N, \; 1 \leq k \leq M$ (10)

where $v_k$ represents the kth symbol of the observation alphabet and $O_t$ is the current observation vector. The
corresponding probability constraints, $b_j(k) \geq 0$ and $\sum_{k=1}^{M} b_j(k) = 1$, must also be satisfied.
If the observations are continuous, a probability density function must be used instead of discrete probabilities, and
the parameters of this density must be specified. In general, as seen in equation (11), the density is approximated
by a weighted sum of M Gaussian distributions:

$b_j(O_t) = \sum_{m=1}^{M} c_{jm} \, \Omega(\mu_{jm}, \Sigma_{jm}, O_t)$ (11)

where $c_{jm}$ are the mixture weights, which must satisfy $c_{jm} \geq 0$ and $\sum_{m=1}^{M} c_{jm} = 1$, and
$\mu_{jm}$ and $\Sigma_{jm}$ are the mean vector and covariance matrix of the mth mixture component of state j.
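As a small worked illustration of equation (11), the following Python sketch evaluates the mixture emission density
for one state; the function name and data layout are our own, and $\Omega$ is taken to be the multivariate Gaussian
density:

from scipy.stats import multivariate_normal

def emission_density(o_t, weights, means, covs):
    # b_j(O_t) = sum_m c_jm * Omega(mu_jm, Sigma_jm, O_t) for one state j
    # weights : (M,) mixture weights c_jm (nonnegative, summing to 1)
    # means   : (M, D) mean vectors mu_jm
    # covs    : (M, D, D) covariance matrices Sigma_jm
    return sum(c * multivariate_normal.pdf(o_t, mean=mu, cov=cov)
               for c, mu, cov in zip(weights, means, covs))

In practice, log-densities are accumulated instead, to avoid numerical underflow over long observation sequences.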
7 Post-processing algorithm
After the syllable recognition process with the HMM method is completed, the recognized syllables are combined
and the resulting word is identified. However, because of misrecognitions, this word may not be a valid Persian
word. In order to increase recognition success, the ten most similar candidates for each syllable are therefore ranked.
Persian words are then searched for by combining syllable candidates starting from the highest-ranked ones, and if
a Persian word is found, the identification process ends.
N: the number of syllables of the word retrieved from the test database.
$H_k(s)$: the sth most similar candidate for the kth syllable of the tested word.

1. For each syllable position, the ten most similar syllables $s = 1, 2, \ldots, 10$ are considered. Syllables are
combined as $H_1(s_1) H_2(s_2) \ldots H_N(s_N)$ to form new words; a total of $10^N$ words are obtained.

2. A level is determined for each word: the sum of the ranks of the syllables that make up the word in step 1 is
calculated, and this sum is the level of that word.

3. Starting from the word with the smallest level, if the word exists in the word database, it is accepted and the
process ends, regardless of the other words. If none of the words exist in the database, the system cannot find a
word. A sketch of this search is given after the list.
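The following Python sketch enumerates the combinations described above and returns the in-lexicon word with
the smallest level; the data layout and all names are our own assumptions:

from itertools import product

def postprocess(candidates, lexicon):
    # candidates: list of N lists; candidates[k] holds the 10 most similar
    # syllables for position k, ordered best-first
    # lexicon: set of valid Persian words (concatenated syllables)
    scored = []
    for choice in product(*(range(len(c)) for c in candidates)):
        word = "".join(candidates[k][r] for k, r in enumerate(choice))
        # ranks are 0-based here; the ordering of levels matches the
        # 1-based ranks used in the text
        level = sum(choice)
        scored.append((level, word))
    for level, word in sorted(scored):      # smallest level first
        if word in lexicon:
            return word
    return None                             # no Persian word found

The exhaustive enumeration of $10^N$ candidates is only practical for short words; a best-first search over levels
would scale better, but the enumeration above mirrors the description in the text.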
9 Conclusion
In this paper, speech recognition systems for discrete, speaker-dependent Persian words, based on syllables, were
developed using the hidden Markov model method. Linear predictive coding (LPC), PARCOR and MFCC features
were selected as the main features, and the corresponding programs were implemented and compared. The test
results showed that the post-processing method included in the system greatly increases its performance. The most
successful feature set was MFCC, for which the word error rate was determined to be 18.7%; after MFCC, in order
of success, came PARCOR and LPC.
References
[1] A. Asliyan, K. Günel and T. Yakhno, Syllable based speech recognition using dynamic time warping, Academic
Informatics, Canakkale Onsekiz Mart University, Canakkale, 2008.
[2] M. Asadolahzade Kermanshahi and M.M. Homayounpour, Improving phoneme sequence recognition using
phoneme duration, J. AI and Data Min. 7 (2018), no. 1, 137–147.
[3] M. Bijankhan, J. Sheikhzadegan, M.R. Rohani, Y. Samareh, C. Lucas and M. Tebyani, FARSDAT - the speech
database of Farsi spoken language, Proc. Aust. Conf. Speech Sci. Technol. 2 (1994), 826–831.
[4] M. Farsinejad, B. Zamani Dehkordi and A. Akbari, Proposing a two-stage sound detector method based on the
hidden Markov model, The fourteenth Ann. Nat. Conf. Iran. Comput. Assoc., Amirkabir University of Technology,
2007.
[5] J.G. Fiscus, J. Ajot, J.S. Garofolo and G. Doddington, Results of the 2006 spoken term detection evaluation,
Proc. ACM SIGIR Workshop, 2006, pp. 51–55.
[6] J.S. Garofolo, C.G.P. Auzanne and E.M. Voorhees, The TREC spoken document retrieval track: a success story,
Proc. TREC-8 8940 (1999), no. 500-246, 109–130.
[7] A. Harma, A comparison of warped and conventional linear predictive coding, IEEE Trans. Speech Audio Process.
9 (2001), no. 5, 579–588.
[8] A. Harma, Linear predictive coding with modified filter structures, IEEE Trans. Speech Audio Process. 9 (2001),
no. 8, 769–777.
[9] M.M. Homayounpour and S.M. Mousavi, Generation of Persian speech synthesis parameters using hidden Markov
and decision tree models, J. Comput. Sci. Engin. 2 (2007), no. 1–3.
[10] R.J. Jones, S. Downey and J.S. Mason, Continuous speech recognition using syllables, Proc. Eurospeech 3 (1997),
1171–1174.
[11] M. Khanzadi, H. Veisi, R. Alinaghizade and Z. Soleymani, Persian phoneme and syllable recognition using recur-
rent neural networks for phonological awareness assessment, J. Artif. Intell. Data Min. 10 (2022), no. 1, 117–126.
[12] J. Kruskall and M. Liberman, The symmetric time warping problem: From continuous to discrete. In Time Warps,
String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley Publishing
Co., 1983.
[13] L. Lee, J. Glass, H. Lee and C. Chan, Spoken content retrieval beyond cascading speech recognition with text
retrieval, IEEE/ACM Trans. Audio Speech Lang. Process. 23 (2015), no. 9, 1389–1420.
[14] E. Mengusoglu and O. Deroo, Turkish LVCSR: database preparation and language modeling for an agglutinative
language, ICASSP'2001, Student Forum, Salt Lake City, May 2001.
[15] C.S. Myers, L.R. Rabiner and A.E. Rosenberg, Performance tradeoffs in dynamic time warping algorithms for
isolated word recognition, IEEE Trans. Acous. Speech Sig. Process. ASSP-28 (1980), no. 6, 623–635.
[16] K.K. Paliwal, A. Agarwal and S.S. Sinha, A modification over Sakoe and Chiba’s dynamic time warping algorithm
for isolated word recognition, Signal Process. 4 (1982), no. 4, 329–333.
[17] J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles and Applications, Prentice-Hall, Upper
Saddle River, NJ, 1996.
[18] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[19] A.E. Rosenberg, L.R. Rabiner, S.E. Levinson and J.G. Wilpon, A preliminary study on the use of demisyllables
in automatic speech recognition, Conf. Rec. Int. Conf. Acous. Speech Sig. Process. GA, 1981, pp. 967–970.
[20] F. Salehi, Speech recognition using methods of hidden Markov models and artificial neural networks and hybrid
speech recognition systems, Nat. Conf. Engin. Sci. New Ideas, 2013.
[21] Y. Samareh, Phonology of the Persian Language, University Publishing Center, second edition, 1368.
[22] I. Shafran, Clustering wide context and HMM topologies for spontaneous speech recognition, Ph.D. Thesis, Uni-
versity of Washington, 2001.
[23] J. Sheikh Zadegan, Ranking of Persian speech phonemes from the point of view of efficiency in speaker recognition,
J. Language Res. 7 (2015), no. 1, 77–96.
[24] T. Svendsen, K.K. Paliwal, E. Harborg and P.O. Husoy, A modified acoustic sub-word unit based speech recognizer,
Proc. IEEE Int. Conf. Acoustics Speech Signal Process. 1989, pp. 108–111.
[25] J. Tejedor, D.T. Toledano, P. Lopez-Otero, L. Docio-Fernandez, L. Serrano, I. Hernaez, A. Coucheiro-Limeres,
J. Ferreiros, J. Olcoz and J. Llombart, ALBAYZIN 2016 spoken term detection evaluation: an international open
competitive evaluation in Spanish, EURASIP J. Audio Speech Music Process. 2017 (2017), no. 1, 1–23.
[26] J. Trmal, M. Wiesner, V. Peddinti, X. Zhang, P. Ghahremani, Y. Wang, V. Manohar, H. Xu, D. Povey and
S. Khudanpur, The Kaldi open KWS system: improving low resource keyword search, Interspeech, 2017, pp.
3597–3601.
[27] H. Veisi, S.A. Qureshi and A. Bastan Fard, Recognition of speech phrases for Farsi news of the Islamic Republic
of Iran, Signal Data Process. Quart. 4 (2019), no. 46.
[28] T. Zoughi and M.M. Homayounpour, Adaptive windows convolutional neural network for speech recognition,
Signal Data Process. Quart. 3 (2017), no. 37.