Azad Ameen
Charmo University
Abstract–Dialect recognition is one of the most active topics in the speech analysis area. In this study, a system for dialect and language recognition is developed using phonetic and style-based features. The study suggests a new set of features using the one-dimensional local binary pattern (LBP). The results show that the proposed LBP feature set is useful for improving dialect and language recognition accuracy. The data acquired for this study cover three Kurdish dialects (Sorani, Badini, and Hawrami) and three neighbor languages (Arabic, Persian, and Turkish). The study proposes a new method to interpret the closeness of the Kurdish dialects and their neighbor languages using the confusion matrix and a non-metric multi-dimensional visualization technique. The results show that the Kurdish dialects can be clustered and linearly separated from the neighbor languages.

Index Terms–Dialect recognition, Language processing, Speech analysis, Machine learning, Local binary pattern.

ARO-The Scientific Journal of Koya University
Volume V, No 1(2017), Article ID: ARO.10167, 4 pages
DOI: 10.14500/aro.10167
Received 27 October 2016; Accepted 04 April 2017
Regular research paper: Published 24 April 2017
Corresponding author’s e-mail: [email protected]
Copyright © 2017 Abdulbasit K. Faeq, Zrar K. Abdul and Azad A. Ameen. This is an open-access article distributed under the Creative Commons Attribution License.

I. Introduction
A dialect is the language variation of a population, established based on various real-life conditions (Chen, et al., 2010). Recently, dialect recognition (DR) has become a hot topic because of its wide applications in speech recognition and forensics. An adapted speech recognition system needs different tools, such as recognition of the dialect or the accent, to normalize the speech samples. For example, Hirayama, et al. (2015) develop an automatic speech recognition system that accepts a mixture of various kinds of dialects.

There are several challenges in the DR research area, such as the collection of speech data, which needs to model the diversity of the studied dialects and/or languages (Diakoloukas, et al., 1997). The conclusions drawn by research on DR are mostly restricted to the available collected data. Consequently, generalizing the developed algorithms, whether in the feature set used or in the classification methods, is generally unconvincing. For this reason, some studies focus on using data collected under specific conditions that “preserve” the real-life characteristics of the data. A study by Huang and Hansen (2007) addresses novel advances in unsupervised spontaneous DR in English and Spanish. The problem considers the case where no transcripts are available for training and test data, and speakers are talking spontaneously. In this study, we adopt the use of spontaneous speech signals recorded from TV show and debate programs.

In the literature, some studies focus on investigating the nature of dialect speech signals. For example, in Bahari, et al. (2014), a non-negative factor analysis approach is developed for Gaussian mixture model (GMM) weight decomposition and adaptation. Their study shows that GMM weights carry less, yet complementary, information to GMM means for language and DR. In addition, in Patil and Basu (2009), a new machine learning method called modified polynomial networks is proposed for the DR problem in an Indian language. The proposed algorithm is interpreted as designing a neural network by viewing it as a curve-fitting (approximation) problem in a high-dimensional space with the help of radial-basis functions.

Research on language and DR widely uses template-based and/or phonetic-based techniques. Template-based DR uses global parameters of the speech signal regardless of the specific characteristics of the phonemes available in each dialect. This kind of approach has been used frequently, as in Choueiter, et al. (2008), who find that a purely acoustic approach based on a combination of heteroscedastic linear discriminant analysis and maximum mutual information training is very effective. However, phonetic-based DR has also been adopted and compared with acoustic and token-based DR, and was also found to be effective, as in Diakoloukas, et al. (1997).

Another approach adopted for DR is phonetic-based recognition of dialect. This approach uses local features that reflect the presence of various phonemes in each language or dialect. For example, Chen, et al. propose supervised and unsupervised learning algorithms to extract
language, 15 different speakers are involved; each speaker has three different recordings of 2 s duration. Consequently, 45 samples are recorded for each dialect and each language. The total length is 6 s for one speaker and 90 s for each individual dialect or language.

III. Methodology
The DR procedure in this study uses three sets of features: MFCC, LPC, and LBP. Individual feature sets and their fusions at the feature level are fed to a pairwise support vector machine classifier with a linear kernel function, trained with the sequential minimal optimization method. The protocol used in this study takes the whole data set and validates on it using a leave-one-sample-out approach. Fusion at the feature level is computed for each pair of feature sets and for all feature sets together. To visualize the relations between the classes, a non-metric multi-dimensional scaling (NMDS) of the confusion matrix (CM) is adopted. NMDS is an optimization procedure that aims to estimate the non-metric relations between different objects. To show the significance of the improvement, a chi-square test is used and a p value is computed for each comparison made between the results.

IV. Result and Discussion
Using the method presented in the previous section, experiments are conducted for the three types of features (MFCC, LPC, and LBP) and their fusion at the feature level. Table I shows the recognition accuracy obtained for both the Kurdish dialects and the involved languages. Given the phonetic character of the MFCC and LPC features, we can observe that both features contribute similarly to dialect and language recognition. In contrast, the pattern for the LBP feature, which reflects the style characteristic of the speech signal, differs sharply between Kurdish dialect recognition on one side and language recognition on the other (76% versus 46%). This could be explained by the observation that dialects of the same language differ mostly in speech style, while languages differ phonetically.

TABLE I
Recognition accuracy (%) of experiments using various features and their fusions

Feature sets   Kurdish DR accuracy   Languages DR accuracy
MFCC           71                    67.8
LPC            78                    71.8
LBP            76                    46
LBP-MFCC       88.9                  73
LBP-LPC        89.6                  81.1
LPC-MFCC       74.8                  70
ALL            88.2                  80

MFCC: Mel frequency cepstrum coefficients, LPC: Linear prediction coefficients, LBP: Local binary pattern, DR: Dialect recognition

On the other hand, from the fusion-based experiments, it can be clearly observed that fusing LBP with either MFCC or LPC significantly improves the recognition accuracy for Kurdish dialects (from 71% to 88.9% with p=5.1E-9 for MFCC, and from 78% to 89.6% with p=0.001 for LPC) and also for language recognition (from 67.8% to 73% with p=0.02 for MFCC, and from 71.8% to 81.1% with p=0.002 for LPC). This improvement reflects the complementarity of the LBP feature to the widely used phonetic-based features (MFCC and LPC in this study). This complementarity of LBP to both MFCC and LPC is also supported by the lack of improvement in recognition accuracy when MFCC and LPC are fused. The best result for dialect and language recognition is obtained by fusing LPC and LBP features.

The second aim of this study is to show how close each Kurdish dialect is to the neighbor languages, as an attempt to study the influence of the neighbor languages and the Kurdish dialects on each other from a phonetic and style-based point of view. For this purpose, the CM of the accuracy results is used and visualized by an NMDS technique using SPSS software. The CM of the highest result, obtained by fusing LPC and LBP, and its visualized form using NMDS are shown in Table II and Fig. 2, respectively. This study suggests interpreting the relations and the influence of different languages and dialects through the CM of the recognition procedure.

TABLE II
CM for all involved classes using LBP and LPC features

LBP_LPC   Sorani   Hawrami   Badini   Arabic   Persian   Turkish
Sorani    36       3         5        1        0         0
Hawrami   2        39        0        2        2         0
Badini    2        0         40       3        0         0
Arabic    4        0         2        36       0         3
Persian   1        2         0        0        38        4
Turkish   7        0         0        2        6         30

CM: Confusion matrix, LPC: Linear prediction coefficients, LBP: Local binary pattern

Fig. 2. Non-metric multi-dimensional scaling figure for the confusion matrix shown in Table II.

From Fig. 2, we can clearly observe that the Kurdish dialects are clustered at the top of the graph such that they can
be separated linearly from the involved languages. Another observation is that the Sorani and Badini dialects are closer to each other than to the Hawrami dialect, and the nearest language to these two dialects is Arabic, while the closest language to the Hawrami dialect is Persian.

V. Conclusion
The results obtained in this study show that the LBP features are useful for DR, especially when fused with a phonetic-based feature such as LPC. LBP characterizes the speech style, and therefore it is more useful for DR than for language recognition. The first contribution of this study is the use of the LBP feature set for DR, which has not been used so far. The study also contributes the use of NMDS to visualize the CM in order to interpret the relations among different languages. For future work, it might be useful to investigate the fusion of more models at the decision level.

VI. Acknowledgment
The authors would like to thank Koya University and Charmo University for their support during the preparation of this work.

References
Abdul, Z.K., Al-Talabani, A. and Abdulrahman, A.O., 2016. A new feature extraction technique based on 1D local binary pattern for gear fault detection. Shock and Vibration, 2016, pp.6.

Bahari, M.H., Dehak, N., Burget, L., Ali, A.M. and Glass, J., 2014. Non negative factor analysis of gaussian mixture model weight adaptation for language and dialect recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(7), pp.1117-1129.

Chen, N.F., Shen, W. and Campbell, J.P., 2010. A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp.5014-5017.

Chen, N.F., Shen, W., Campbell, J.P. and Torres-Carrasquillo, P.A., 2011. Informative dialect recognition using context-dependent pronunciation modeling. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp.4396-4399.

Choueiter, G., Zweig, G. and Nguyen, P., 2008. An empirical study of automatic accent classification. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp.4265-4268.

Diakoloukas, V., Digalakis, V., Neumeyer, L. and Kaja, J., 1997. April. Development of dialect-specific speech recognizers using adaptation methods. In: Acoustics, Speech, and Signal Processing, 1997. ICASSP-97. 1997 IEEE International Conference. Vol. 2. IEEE, pp.1455-1458.

Guo, Z., Zhang, L. and Zhang, D., 2010. Rotation invariant texture classification using LBP variance (LBPV) with global matching. Pattern Recognition, 43(3), pp.706-719.

Hirayama, N., Yoshino, K., Itoyama, K., Mori, S. and Okuno, H.G., 2015. Automatic speech recognition for mixed dialect utterances by mixing dialect language models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2), pp.373-382.

Huang, R. and Hansen, J.H., 2007. Unsupervised discriminative training with application to dialect classification. IEEE Transactions on Audio, Speech, and Language Processing, 15(8), pp.2444-2453.

Patil, H.A. and Basu, T.K., 2009. A novel modified polynomial network design for dialect recognition. In: Advances in Pattern Recognition, 2009. ICAPR’09. Seventh International Conference. IEEE, pp.175-178.
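
As a check on Table II, the sketch below recomputes the overall accuracy from the confusion matrix and builds a symmetric dissimilarity matrix of the kind an NMDS routine takes as input. Each row of Table II sums to 45, the number of samples per class, so rows are read here as the true class. The dissimilarity definition (one minus the mean of the two mutual confusion rates) and the function names are assumptions made for illustration; the paper does not state how the CM was converted before NMDS was run in SPSS.

```python
# Confusion matrix values copied from Table II.
# Rows are taken as the true class, columns as the predicted class.
LABELS = ["Sorani", "Hawrami", "Badini", "Arabic", "Persian", "Turkish"]
CM = [
    [36, 3, 5, 1, 0, 0],
    [2, 39, 0, 2, 2, 0],
    [2, 0, 40, 3, 0, 0],
    [4, 0, 2, 36, 0, 3],
    [1, 2, 0, 0, 38, 4],
    [7, 0, 0, 2, 6, 30],
]


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def confusion_dissimilarity(cm):
    """Symmetric dissimilarity between classes (hypothetical definition).

    For classes i and j, the two confusion rates (i taken for j, and
    j taken for i) are averaged; the dissimilarity is one minus that
    average, so classes that are confused more often end up closer.
    """
    n = len(cm)
    row_totals = [sum(row) for row in cm]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rate_ij = cm[i][j] / row_totals[i]
                rate_ji = cm[j][i] / row_totals[j]
                dist[i][j] = 1.0 - (rate_ij + rate_ji) / 2
    return dist


if __name__ == "__main__":
    print(f"Overall accuracy: {overall_accuracy(CM) * 100:.1f}%")
    dist = confusion_dissimilarity(CM)
    for label, row in zip(LABELS, dist):
        print(label.ljust(8), " ".join(f"{d:.3f}" for d in row))
```

With the Table II values, the diagonal sums to 219 of 270 samples, about 81.1%, which agrees with the LBP-LPC entry in the Languages DR accuracy column of Table I. Under this assumed dissimilarity, Sorani and Badini come out closer to each other than Sorani and Hawrami, in line with the reading of Fig. 2.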