Multithreaded Java Approach To Speaker Recognition: Radosław Weychan, Tomasz Marciniak, Adam Dąbrowski
point of view (no direct access to speaker model parameters when using GMM). Another fact is that almost none of the presented toolkits makes use of multithreading, while modern CPUs have up to 8 or, in some cases, even more cores. The only package which uses multithreading is the MSR Identity Toolbox, but it cannot be used outside Matlab. The number of papers related to parallel implementation of speaker recognition is very limited; the presented solutions are based on ineffective methods like VQ [13] or were not shared to be used by others [14].

In this paper we present a multithreaded approach to speaker recognition with the use of the general GMM modeling algorithm. The main idea of this paper is to measure the reduction of the recognition time achieved by multithreading as a function of the number of processors and threads. The proposed approach was used in two scenarios: as a technique for reducing the total time of the training/testing part of speaker recognition when using a large number of input files, and as a technique for reducing the time of a single speaker recognition, e.g. in security and access systems. We used the Java programming language because it can be used on most modern platforms, including small computers, smartphones and tablets, as the implementations of the algorithms are independent of the processor type. The second reason is that Java has a very extensive multithreading library. It is also very fast in computations. ANSI C and C++ are faster in general, but code written in these languages is CPU-dependent, in contrast to Java.

The provided implementation of GMM is based on the implementation available in the scikit-learn Python package, as it is very intuitive to use, and also on our previous solutions presented in [15, 16, 17, 18, 19]. The experiments showed that the training time was lower in the case of the Python implementation, but it uses a lot of C++ code in the time-consuming maximization parts of K-means and GMM. In the testing part (here: computing the log-likelihood), where no C++ parts were used, Java is much faster. The detailed data will be presented in the following sections.

III. CONCURRENCY IN JAVA

The methods, classes and interfaces related to concurrency are provided by the java.util.concurrent package. The basic way of creating a new thread is to put the task code inside the run() method of a class which implements the Runnable interface:

class MyRunnableClass implements Runnable {
    public void run() {
        // the task code
    }
}

In order to run the thread, a new instance of the class must be created and passed as a parameter to a new Thread object:

Runnable r = new MyRunnableClass();
Thread newThread = new Thread(r);
newThread.start();

As long as the threads do not share any resources and are not dependent on each other, they are commonly called processes, and the system automatically assigns time slots for them. The situation is much more complicated when the threads process the same data source or return data which is used by other threads, so that they must be synchronized.

There are many solutions for synchronization between threads. Critical sections can be blocked for other threads with the use of the lock() method of the ReentrantLock class, or with the use of the synchronized keyword. A primitive variable can be declared volatile, which informs the compiler and the Java virtual machine that it may be modified by more than one thread at a time; having this information, the Java virtual machine can handle the access properly.

The low-level concurrency methods can, in most cases, be replaced by high-level mechanisms like queues with producers and consumers. The producer threads put elements (results) into the queue, and the consumer threads take them and process them further (sending, displaying, etc.). The queue allows for safe data exchange between the threads. In the java.util.concurrent package the automatically synchronized queue is a blocking queue: it blocks a thread attempting to put an element when the queue is full, or attempting to remove an element when the queue is empty. Blocking queues are used for coordinating many threads: specific threads put results into the queue, while other (dedicated) threads take them (removing them from the queue) and continue to process them. The queue automatically controls the flow of the work: if one set of threads works slower than the second set, the latter must wait for the former to finish its job. There are a few kinds of blocking queues in Java, such as ArrayBlockingQueue, LinkedBlockingQueue, PriorityBlockingQueue, DelayQueue and SynchronousQueue. As the speaker recognition task does not require a special order of data acquisition in the training and testing parts, is not time-triggered, and needs no priorities for the training/testing data, ArrayBlockingQueue was chosen as the best solution in this case. It is a bounded FIFO blocking queue backed by an array. The length of the array must be set in advance, in contrast to LinkedBlockingQueue, but this prevents high memory consumption.

IV. SOFTWARE DESCRIPTION

The main part of the research was to develop the code related to all steps of speaker recognition: reading unprocessed wav files, computing the MFCC set from the speech, and modeling the coefficients with the GMM-EM algorithm initialized by K-means++. The software uses the FFT function from the jtransforms library. It was written in pure Java, so it can be used on all machines, even tablets and smartphones. The producer-consumer scenario was chosen as the basis for further improvements due to its functionality and simplicity.

The presented software contains three independent parts, related to offline training and testing with the use of speaker databases, and an approach for fast single speaker
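The queue-based producer-consumer cooperation described in Section III can be sketched as a minimal, self-contained example. The file names, the queue capacity and the QueueDemo/process names are illustrative only; where the dummy "processing" step is marked, the real software would read the wav file, compute the MFCC set and train or test a GMM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    // Sentinel object telling the consumers that no more work will arrive.
    private static final String POISON = "<end>";

    /** Runs one producer and nConsumers consumer threads over nFiles dummy file names. */
    static List<String> process(int nFiles, int nConsumers) throws InterruptedException {
        // Bounded FIFO queue: put() blocks when it is full, take() when it is empty.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        List<String> processed = Collections.synchronizedList(new ArrayList<>());

        // Producer: consecutively fills the queue with (hypothetical) input files.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < nFiles; i++)
                    queue.put("speaker_" + i + ".wav");
                for (int c = 0; c < nConsumers; c++)
                    queue.put(POISON); // one sentinel per consumer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumers: take file names from the queue and "process" them.
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < nConsumers; i++) {
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        String file = queue.take();
                        if (file.equals(POISON)) break;
                        // Dummy processing step (real code: read wav, MFCC, GMM).
                        processed.add(file);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }
        producer.start();
        producer.join();
        for (Thread w : workers) w.join();
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(10, 2).size() + " files processed");
    }
}
```

Because the queue is bounded, a fast producer is automatically throttled by slow consumers, which is exactly the flow control described above; the sentinel objects are one common way to shut the consumers down cleanly.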
recognition which can be used in a real-time access system scenario.

The multithreaded offline speaker training runs one thread for consecutively filling the queue with training files, and 2 or more threads for taking file handles from the queue and processing them (reading the wav file, computing the MFCC and modeling them with the use of GMM). Saving all the data (speaker IDs and GMM parameters) is done with the use of serialization. The database (an ArrayList<> of speaker models) is shared among the threads and must be synchronized individually. The queue and the cooperating consumer and producer threads are presented in Fig. 1.

The last solution, for multithreaded real-time testing, is suitable only for testing a single sentence. One thread is used for consecutively filling the queue with the models to be tested, and 2 or more threads for computing the log-likelihood ratio. The result variable is shared between the threads and access to it must be synchronized. The workflow is presented in Fig. 3.
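The synchronized access to a shared result variable in this testing scenario can be sketched as below. The speaker models are reduced to plain strings and score() is a deterministic stand-in for the real GMM log-likelihood; only the thread coordination (blocking queue of models, synchronized best-score update) is meant to be illustrative. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RealTimeTestDemo {
    private static final String END = "<end>";

    /** Shared best result; every access must go through synchronized methods. */
    static class Result {
        private String bestSpeaker = null;
        private double bestScore = Double.NEGATIVE_INFINITY;

        synchronized void update(String speaker, double score) {
            if (score > bestScore) { bestScore = score; bestSpeaker = speaker; }
        }
        synchronized String best() { return bestSpeaker; }
    }

    /** Dummy, deterministic stand-in for the GMM log-likelihood of the sentence. */
    static double score(String speakerModel) {
        return -Math.abs(speakerModel.hashCode() % 100);
    }

    static String recognize(List<String> models, int nWorkers) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        Result result = new Result();

        // One thread consecutively fills the queue with the models to be tested.
        Thread producer = new Thread(() -> {
            try {
                for (String m : models) queue.put(m);
                for (int i = 0; i < nWorkers; i++) queue.put(END); // sentinel per worker
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();

        // Two or more threads score the models and update the shared result.
        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    String m;
                    while (!(m = queue.take()).equals(END))
                        result.update(m, score(m)); // synchronized shared access
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            workers[i].start();
        }
        producer.join();
        for (Thread w : workers) w.join();
        return result.best();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("best: " + recognize(Arrays.asList("alice", "bob", "carol"), 2));
    }
}
```

Without the synchronized methods, two workers could read and overwrite bestScore concurrently and a better-scoring model could be lost, which is why the shared result variable must be protected.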
The last experiment differed from the previous experiments: the queue was filled with the speaker models instead of files. This queue-related approach is suitable for processing a single file/sentence. In other cases (training/testing with the use of all input sentences, as in the previous experiments), other synchronization techniques have to be applied: the queue might be filled again with models already processed while the current recognition part had not yet finished and the next file started to be processed; in this case the number of threads might also increase unexpectedly. In this more complex testing case, all 5670 files were tested, but the solution for …

Fig. 9. Influence of number of threads on processing time

VI. CONCLUSIONS

Multithreading can be used in two ways: as a technique to accelerate speaker recognition by speeding up the general training and testing parts and, moreover, to speed up a single recognition, e.g. in security applications. In the first case, the total time of training and testing has been reduced over 5 times with the use of an 8-core CPU and 8-9 threads. In the second case, the acceleration was about 3.5
and allowed for real-time recognition with the use of speaker databases containing even thousands of speakers. The presented software is available in the GitHub source code repository [21].

REFERENCES

[1] F. Zheng, G. Zhang, S. Zhanjiang, Comparison of different implementations of MFCC, Journal of Computer Science and Technology, vol. 16, no. 6, 2001, pp. 582-589.
[2] D. Reynolds, Robust text-independent speaker identification using Gaussian mixture models, IEEE Trans. Speech Audio Proc., vol. 3, no. 1, 1995, pp. 72-83.
[3] D. Reynolds, T. Quatieri, R. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, no. 10, 2000, pp. 19-41.
[4] W. M. Campbell, D. E. Sturim, and D. A. Reynolds, Support vector machines using GMM supervectors for speaker verification, IEEE Signal Processing Letters, vol. 13, no. 5, 2006, pp. 308-311.
[5] P. Kenny, G. Boulianne, P. Ouellet, P. Dumouchel, Joint factor analysis versus eigenchannels in speaker recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, 2007, pp. 1435-1447.
[6] M. Brooks, Voicebox: Speech processing toolbox for Matlab. [Online]. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
[7] C. M. Bishop, Pattern Recognition and Machine Learning. Springer Science & Business Media, 2006.
[8] S. O. Sadjadi, M. Slaney, and L. Heck, MSR Identity Toolbox v1.0: A MATLAB toolbox for speaker recognition research, Speech and Language Processing Technical Committee Newsletter, 2013.
[9] Cambridge University Engineering Department, HTK: The Hidden Markov Model Toolkit, 2004. [Online]. Available: http://htk.eng.cam.ac.uk/
[10] A. Larcher, J.-F. Bonastre, B. G. Fauve, K.-A. Lee, C. Levy, H. Li, J. S. Mason, and J.-Y. Parfait, Alize 3.0 - open source toolkit for state-of-the-art speaker recognition. [Online]. Available: http://mistral.univ-avignon.fr/index_en.html
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol. 12, 2011, pp. 2825-2830.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, The WEKA Data Mining Software: An Update, SIGKDD Explorations, vol. 11, no. 1, 2009.
[13] R. Soganci, F. Gurgen, H. Tupcuoglu, Parallel Implementation of a VQ-Based Text-Independent Speaker Identification, Advances in Information Systems, 2005, pp. 291-300.
[14] T. Herbig, F. Gerl, W. Minker, Self-Learning Speaker Identification: A System for Enhanced Speech Recognition, Springer, 2011.
[15] R. Weychan, T. Marciniak, A. Dąbrowski, Implementation aspects of speaker recognition using Python language and Raspberry Pi platform, IEEE Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, 2015, pp. 162-167.
[16] T. Marciniak, R. Weychan, A. Krzykowska, Speaker recognition based on telephone quality short Polish sequences with removed silence, Przeglad Elektrotechniczny, vol. 88, no. 6, 2012, pp. 42-46.
[17] A. Dąbrowski, T. Marciniak, A. Krzykowska and R. Weychan, Influence of silence removal on speaker recognition based on short Polish sequences, IEEE Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, 2011, pp. 159-163.
[18] T. Marciniak, R. Weychan, Sz. Drgas, A. Dąbrowski, A. Krzykowska, Speaker recognition based on short Polish sequences, IEEE Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2010, pp. 95-98.
[19] R. Weychan, T. Marciniak, A. Stankiewicz, A. Dąbrowski, Real Time Recognition Of Speakers From Internet Audio Stream, Foundations of Computing and Decision Sciences, vol. 40, no. 3, September 2015, pp. 223-233, DOI: 10.1515/fcds-2015-0014.
[21] Source code repository for the multithreaded speaker recognition project. [Online]. Available: https://github.com/audiodsp/Java-multithreaded-GMM-speaker-recognition

This work was prepared within the DS-2016 project.