Kurdish Dialects and Neighbor Languages Automatic Recognition

Article  in  ARO-The Scientific Journal of Koya University · April 2017


DOI: 10.14500/aro.10167


Abdulbasit K. Al-Talabani¹, Zrar K. Abdul², Azad A. Ameen²
¹ Department of Software Engineering, Faculty of Engineering, Koya University, Daniel Mitterrand Boulevard, Koya KOY45, Kurdistan Region, Iraq
² Department of Computer Science, College of Basic Education, Charmo University, Kurdistan Region, Iraq

Abstract–Dialect recognition is one of the hottest topics in the speech analysis area. In this study, a system for dialect and language recognition is developed using phonetic and style-based features. The study suggests a new feature set based on the one-dimensional local binary pattern (LBP). The results show that the proposed LBP feature set improves dialect and language recognition accuracy. The data acquired for this study cover three Kurdish dialects (Sorani, Badini, and Hawrami) and three neighbor languages (Arabic, Persian, and Turkish). The study also proposes a new method to interpret the closeness of the Kurdish dialects and their neighbor languages using the confusion matrix and a non-metric multi-dimensional scaling visualization technique. The results show that the Kurdish dialects can be clustered and linearly separated from the neighbor languages.

Index Terms–Dialect recognition, Language processing, Speech analysis, Machine learning, Local binary pattern.

ARO-The Scientific Journal of Koya University, Volume V, No 1 (2017), Article ID: ARO.10167, 4 pages
DOI: 10.14500/aro.10167
ARO p-ISSN: 2410-9355, e-ISSN: 2307-549X
Received 27 October 2016; Accepted 04 April 2017
Regular research paper: Published 24 April 2017
Corresponding author’s e-mail: [email protected]
Copyright © 2017 Abdulbasit K. Faeq, Zrar K. Abdul and Azad A. Ameen. This is an open-access article distributed under the Creative Commons Attribution License.

I. Introduction
A dialect is a variation of a language used by a population, established under various real-life conditions (Chen, et al., 2010). Recently, dialect recognition (DR) has become a hot topic because of its wide applications in speech recognition and forensics. An adapted speech recognition system needs tools such as dialect or accent recognition to normalize the speech samples before recognition. For example, Hirayama, et al. (2015) develop an automatic speech recognition system that accepts a mixture of various kinds of dialects.

There are several challenges in the DR research area, such as the collection of speech data, which needs to model the diversity of the studied dialects and/or languages (Diakoloukas, et al., 1997). The conclusions drawn by DR studies are mostly restricted to the available collected data. Consequently, generalizing the developed algorithms, whether the feature sets or the classification methods, is generally unconvincing. For this reason, some studies focus on data collected under specific conditions that “preserve” the real-life characteristics of the data. A study by Huang and Hansen (2007) addresses novel advances in unsupervised spontaneous DR in English and Spanish. The problem considers the case where no transcripts are available for training and test data, and speakers are talking spontaneously. In this study, we adopt the use of spontaneous speech signals recorded from TV show and debate programs.

In the literature, some studies focus on investigating the nature of dialect speech signals. For example, Bahari, et al. (2014) develop a non-negative factor analysis approach for Gaussian mixture model (GMM) weight decomposition and adaptation. Their study shows that GMM weights carry less, yet complementary, information compared with GMM means for language and DR. In addition, Patil and Basu (2009) propose a new machine learning method, called modified polynomial networks, for the DR problem in an Indian language. The proposed algorithm is interpreted as designing a neural network by viewing it as a curve fitting (approximation) problem in a high-dimensional space with the help of radial-basis functions.

Research on language and DR widely uses template-based and/or phonetic-based techniques. Template-based DR uses global parameters of the speech signal regardless of the specific characteristics of the phonemes related to each dialect. This kind of approach has been used frequently, as in Choueiter, et al. (2008), who find that a purely acoustic approach based on a combination of heteroscedastic linear discriminant analysis and maximum mutual information training is very effective. However, phonetic-based DR has also been adopted and compared with acoustic and token-based DR, and has also been found to be effective, as in Diakoloukas, et al. (1997).

Another approach adopted for DR is phonetic-based recognition of dialect. This approach uses local features that reflect the presence of various phonemes in each language or dialect. For example, Chen, et al. propose supervised and unsupervised learning algorithms to extract
dialect discriminating phonetic rules and use these rules to adapt biphones to identify dialects. They discovered that the dialect-discriminating biphones are compatible with the linguistic literature while outperforming a baseline monophone system by 7.5% (Chen, et al., 2010). In Chen, et al. (2011), the authors propose an informative DR system that learns phonetic transformation rules and uses them to identify dialects. A hidden Markov model is used to align reference phones with dialect-specific pronunciations to characterize when and how often substitutions, insertions, and deletions occur.

This study adopts template-based DR from the speech signal using global phonetic-based features. It also introduces a new style-based feature, the one-dimensional local binary pattern (1DLBP), which has not been used in DR so far. The study uses data recorded from three Kurdish dialects (Sorani, Badini, and Hawrami). It also involves Arabic, Persian, and Turkish as three neighbor languages, to study how independent the Kurdish dialects are from those languages, which are supposed to influence each other through cultural and geographical interactions. The study proposes a method to visualize the recognizer confusion between different dialects and languages.

The rest of this paper is structured as follows: Section II presents the feature extraction procedures and describes the data used, Section III presents the methodology, Section IV discusses the results, and Section V concludes.

II. Feature Extraction
As in any pattern recognition process, DR includes some major steps, starting with feature extraction. In DR, features based on mel frequency cepstrum coefficients (MFCC) and linear prediction coefficients (LPC) are well known for their capability to model the phonetic characteristics of the speech signal (Choueiter, et al., 2008; Patil and Basu, 2009). In this study, global features (average and standard deviation) of 12 MFCC and 12 LPC, computed on windows of length 30 ms with 15 ms overlap, are used. Besides the MFCC and LPC, the study introduces a 1DLBP feature, which models the style of the speech, and investigates its benefit for DR. 1DLBP is adopted in many other applications, such as Guo, et al. (2010) and Abdul, et al. (2016).

The 1DLBP operator labels every sample of the signal by considering its neighborhood and using the value at the center position as a threshold for the neighbors. If a neighbor value is less than the center value, it is mapped to 0; otherwise, it is mapped to 1. An LBP code for the neighborhood is then produced. The decimal value of the LBP binary code represents the local structural knowledge around the center value.

The histogram of the 1DLBP signal shows how often these various patterns appear in a given signal. The distribution of the patterns describes the whole structure of the signal. The 1DLBP operation on a sample value can be defined as:

$$LBP_P(x[i]) = \sum_{r=0}^{P/2-1} \left\{ f\left(x[i+r-P/2] - x[i]\right) 2^{r} + f\left(x[i+r+1] - x[i]\right) 2^{r+P/2} \right\} \qquad (1)$$

where f is the sign function:

$$f(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} \qquad (2)$$

Here x[i] is the signal and P is the number of considered neighbors. The sign function f(x) transforms the differences into a P-bit binary code.

In this paper, only eight neighbors are considered (four to the left of the center and four to the right). Equation (1) illustrates how the 1DLBP is evaluated. Hence, the values of the new signal range between 0 and 255. These values are divided into two groups, uniform and non-uniform numbers. The uniform numbers are those with at most two bit transitions (from 1 to 0 or from 0 to 1) in their circular bit patterns. The non-uniform numbers have more than two transitions. For instance, the patterns 11111111 (0 transitions) and 10001111 (2 transitions) are uniform, while the patterns 10101 (4 transitions) and 01010111 (6 transitions) are non-uniform. There are 58 uniform numbers in the range 0–255, and the rest are non-uniform. The histogram is computed such that a separate bin represents each uniform number, while all the non-uniform numbers are represented in one bin. Therefore, the feature set consists of 59 bins: 58 for the uniform numbers and one for all non-uniform numbers. These bins are used as features of the dialect speech signals. The number of bins in the histogram depends on how many neighbors are considered. Fig. 1 demonstrates a 1DLBP operator for six neighbors (P=6), with the center sample as given. After 1DLBP processing, the six neighbor samples in the example produce the code 100101. The code is then converted to its decimal value (=37) and placed at the index of the center sample.

Fig. 1. One-dimensional local binary pattern, number of neighbors (P=6).

A. Data Description
Data acquisition is an important task in any classification process. The data collected in this paper consist of three Kurdish dialects (Sorani, Badini, and Hawrami) and three different languages (Arabic, Persian, and Turkish), recorded from TV broadcasts. For each dialect and individual
language, 15 different speakers are involved; each speaker has three different recordings of 2 s duration. Consequently, 45 samples are recorded for each dialect and each language. The total length is 6 s for one speaker and 90 s for each individual dialect or language.

III. Methodology
The DR procedure in this study uses three sets of features: MFCC, LPC, and LBP. Individual feature sets and their fusions at the feature level feed a pairwise support vector machine classifier with a linear kernel function, trained with the sequential minimal optimization method. The protocol used in this study uses the whole data set and validates it with a leave-one-sample-out approach. Fusion at the feature level is computed for each pair of feature sets and for all three feature sets together. To visualize the relations between the classes, non-metric multi-dimensional scaling (NMDS) of the confusion matrix (CM) is adopted. NMDS is an optimization procedure that aims to estimate the non-metric relations between different objects. To show the significance of the improvement, a chi-square test is used, and a p value is computed for each comparison made between the results.

IV. Result and Discussion
Using the method presented in the previous section, experiments are conducted for the three types of features (MFCC, LPC, and LBP) and their fusions at the feature level. Table I shows the recognition accuracy obtained for both the Kurdish dialects and the involved languages. Given the phonetic character of the MFCC and LPC features, we can observe that both contribute similarly to dialect and language recognition. The pattern for the LBP feature, which reflects the style characteristic of the speech signal, is totally different between Kurdish dialect recognition on one side and language recognition on the other (76% versus 46%). This could be explained by the observation that dialects of the same language differ mostly in the style of speech, while languages differ phonetically.

TABLE I
Recognition accuracy (%) of experiments using various features and their fusions

Feature sets    Kurdish DR accuracy    Languages DR accuracy
MFCC            71                     67.8
LPC             78                     71.8
LBP             76                     46
LBP-MFCC        88.9                   73
LBP-LPC         89.6                   81.1
LPC-MFCC        74.8                   70
ALL             88.2                   80

MFCC: Mel frequency cepstrum coefficients, LPC: Linear prediction coefficients, LBP: Local binary pattern, DR: Dialect recognition

On the other hand, the fusion-based experiments clearly show that fusing LBP with either MFCC or LPC significantly improves the recognition accuracy for Kurdish dialects (from 71% to 88.9% with p=5.1E-9 for MFCC, and from 78% to 89.6% with p=0.001 for LPC) and also for language recognition (from 67.8% to 73% with p=0.02 for MFCC, and from 71.8% to 81.1% with p=0.002 for LPC). This improvement reflects the complementarity of the LBP feature to the widely used phonetic-based features (MFCC and LPC in this study). This complementarity is also supported by the lack of improvement in recognition accuracy when MFCC and LPC are fused. The best result for dialect and language recognition is obtained by fusing the LPC and LBP features.

The second aim of this study is to show how close each Kurdish dialect is to the neighbor languages, as an attempt to study the influence of the neighbor languages and the Kurdish dialects on each other from a phonetic and style-based point of view. For this purpose, the CM of the accuracy results is used and visualized by an NMDS technique using SPSS software. The CM of the highest result, obtained by fusing LPC and LBP, and its visualization using NMDS are shown in Table II and Fig. 2, respectively.

TABLE II
CM for the whole involved classes using LBP and LPC features

LBP_LPC    Sorani    Hawrami    Badini    Arabic    Persian    Turkish
Sorani     36        3          5         1         0          0
Hawrami    2         39         0         2         2          0
Badini     2         0          40        3         0          0
Arabic     4         0          2         36        0          3
Persian    1         2          0         0         38         4
Turkish    7         0          0         2         6          30

CM: Confusion matrix, LPC: Linear prediction coefficients, LBP: Local binary pattern

Fig. 2. Non-metric multi-dimensional scaling figure for the confusion matrix shown in Table II.

This study suggests interpreting the relations and the influence of different languages and dialects through the CM of the recognition procedure.

From Fig. 2, we can clearly observe that the Kurdish dialects are clustered in the top of the graph such that they can
be separated linearly from the involved languages. Another observation is that the Sorani and Badini dialects are closer to each other than to the Hawrami dialect, and the nearest language to these two dialects is Arabic, while the closest language to the Hawrami dialect is Persian.

V. Conclusion
The results obtained in this study show that the LBP features are useful for DR, especially when fused with phonetic-based features such as the LPC. The LBP characterizes the speech style, and therefore it is more useful for DR than for language recognition. The first contribution of this study is the use of the LBP feature set for DR, which has not been used so far. The study also contributes by using NMDS to visualize the CM in order to interpret the relations among different languages. For future work, it might be useful to investigate the fusion of more models at the decision level.

VI. Acknowledgment
The authors would like to thank Koya University and Charmo University for their support during the preparation of this work.

References

Abdul, Z.K., Al-Talabani, A. and Abdulrahman, A.O., 2016. A new feature extraction technique based on 1D local binary pattern for gear fault detection. Shock and Vibration, 2016, pp.6.

Bahari, M.H., Dehak, N., Burget, L., Ali, A.M. and Glass, J., 2014. Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(7), pp.1117-1129.

Chen, N.F., Shen, W. and Campbell, J.P., 2010. A linguistically-informative approach to dialect recognition using dialect-discriminating context-dependent phonetic models. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp.5014-5017.

Chen, N.F., Shen, W., Campbell, J.P. and Torres-Carrasquillo, P.A., 2011. Informative dialect recognition using context-dependent pronunciation modeling. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp.4396-4399.

Choueiter, G., Zweig, G. and Nguyen, P., 2008. An empirical study of automatic accent classification. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, pp.4265-4268.

Diakoloukas, V., Digalakis, V., Neumeyer, L. and Kaja, J., 1997. Development of dialect-specific speech recognizers using adaptation methods. In: Acoustics, Speech, and Signal Processing, 1997. ICASSP-97. 1997 IEEE International Conference. Vol. 2. IEEE, pp.1455-1458.

Guo, Z., Zhang, L. and Zhang, D., 2010. Rotation invariant texture classification using LBP variance (LBPV) with global matching. Pattern Recognition, 43(3), pp.706-719.

Hirayama, N., Yoshino, K., Itoyama, K., Mori, S. and Okuno, H.G., 2015. Automatic speech recognition for mixed dialect utterances by mixing dialect language models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2), pp.373-382.

Huang, R. and Hansen, J.H., 2007. Unsupervised discriminative training with application to dialect classification. IEEE Transactions on Audio, Speech, and Language Processing, 15(8), pp.2444-2453.

Patil, H.A. and Basu, T.K., 2009. A novel modified polynomial network design for dialect recognition. In: Advances in Pattern Recognition, 2009. ICAPR’09. Seventh International Conference. IEEE, pp.175-178.
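
As an illustration of the 1DLBP feature described in Section II, Equations (1) and (2) and the 59-bin uniform histogram can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names (lbp_1d, circular_transitions, uniform_histogram) are hypothetical, and skipping the first and last P/2 samples, which lack a full neighborhood, is an assumption, since the paper does not state how signal borders are handled.

```python
from collections import Counter


def lbp_1d(x, P=8):
    """Return the 1DLBP code of every sample that has P/2 neighbors on each side.

    Assumption: border samples without a full neighborhood are skipped.
    """
    half = P // 2
    codes = []
    for i in range(half, len(x) - half):
        code = 0
        for r in range(half):
            # Sign function of Equation (2): 1 if the difference is >= 0, else 0.
            left = 1 if x[i + r - half] - x[i] >= 0 else 0
            right = 1 if x[i + r + 1] - x[i] >= 0 else 0
            # Weights of Equation (1): 2^r for left, 2^(r + P/2) for right neighbors.
            code += left * 2 ** r + right * 2 ** (r + half)
        codes.append(code)
    return codes


def circular_transitions(code, P=8):
    """Count 0/1 changes between adjacent bits, wrapping around the ends."""
    bits = [(code >> k) & 1 for k in range(P)]
    return sum(bits[k] != bits[(k + 1) % P] for k in range(P))


def uniform_histogram(codes, P=8):
    """One bin per uniform code, plus one shared bin for all non-uniform codes."""
    uniform = [c for c in range(2 ** P) if circular_transitions(c, P) <= 2]
    counts = Counter(codes)
    hist = [counts[c] for c in uniform]
    hist.append(sum(n for c, n in counts.items()
                    if circular_transitions(c, P) > 2))
    return hist
```

For P = 8 this sketch finds 58 uniform codes, so the histogram has 59 bins, in line with the count given in Section II. The code 100101 of the Fig. 1 example corresponds to int("100101", 2), which is 37.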

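
The global features of Section II and the feature-level fusion of Section III can be illustrated with a short sketch. It assumes a 16 kHz sampling rate and the population standard deviation, neither of which is stated in the paper, and all function names are hypothetical. The per-frame MFCC or LPC vectors are taken as given; how they are computed is outside the scope of the sketch.

```python
from statistics import mean, pstdev


def frame_starts(n_samples, sample_rate, win_ms=30, overlap_ms=15):
    """Start indices of 30 ms windows with 15 ms overlap, and the window length."""
    win = int(sample_rate * win_ms / 1000)
    hop = win - int(sample_rate * overlap_ms / 1000)
    return list(range(0, n_samples - win + 1, hop)), win


def global_stats(frames):
    """Average and standard deviation of each coefficient over all frames.

    frames: list of per-frame coefficient vectors (for example 12 MFCC each).
    Assumption: population standard deviation (pstdev).
    """
    columns = list(zip(*frames))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def fuse(*feature_sets):
    """Feature-level fusion by concatenating the feature vectors."""
    fused = []
    for features in feature_sets:
        fused.extend(features)
    return fused
```

Under these assumptions, a 2 s recording at 16 kHz gives 132 frames of 480 samples, 12 coefficients per frame give a 24-value global vector, and fusing a 24-value LPC vector with the 59-bin LBP histogram gives an 83-value vector.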

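
The confusion matrix of Table II can be checked and prepared for a scaling method with a few lines of Python. The overall accuracy follows directly from the table. The conversion to a symmetric dissimilarity matrix is only one possible choice and is an assumption of this sketch: the paper does not describe how the CM was passed to the NMDS procedure in SPSS, and the function names are hypothetical.

```python
LABELS = ["Sorani", "Hawrami", "Badini", "Arabic", "Persian", "Turkish"]

# Rows of Table II, with columns in the same order as LABELS.
CM = [
    [36, 3, 5, 1, 0, 0],
    [2, 39, 0, 2, 2, 0],
    [2, 0, 40, 3, 0, 0],
    [4, 0, 2, 36, 0, 3],
    [1, 2, 0, 0, 38, 4],
    [7, 0, 0, 2, 6, 30],
]


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def confusion_dissimilarity(cm):
    """Symmetric dissimilarity: classes confused more often are closer.

    Assumption: d[i][j] = 1 - (cm[i][j] + cm[j][i]) / (n_i + n_j),
    where n_i is the number of samples in row i.
    """
    n = len(cm)
    totals = [sum(row) for row in cm]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                mixed = cm[i][j] + cm[j][i]
                d[i][j] = 1.0 - mixed / (totals[i] + totals[j])
    return d
```

Here overall_accuracy(CM) is 219/270, which rounds to the 81.1% reported for the LBP-LPC fusion in Table I. A matrix such as the one returned by confusion_dissimilarity could then serve as input to a non-metric scaling routine.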