How To Use An Existing DNN Recognizer For Decoding in Kaldi
Introduction
If you’re reading this, I’m assuming that you’ve already downloaded and installed Kaldi and
successfully trained a DNN-HMM acoustic model along with a decoding graph.
If you’ve run one of the DNN Kaldi run.sh scripts from the example directory egs/, then
you should be ready to go. You may want to start with the baseline nnet2 script in the
Wall Street Journal example, run_nnet2_baseline.sh.
I originally wrote this same post for GMM models, and now I want to do the same for DNNs.
We normally generate transcriptions for new audio with the Kaldi testing and scoring scripts,
so I simply dug out the most important parts of these scripts to demonstrate concisely
how decoding works.
What you see here is what I gather to be the simplest way to do decoding with a DNN in
Kaldi; it is by no means guaranteed to be the best way to do decoding.
http://jrmeyer.github.io/asr/2017/01/10/Using-built-DNN-model-Kaldi.html
6/8/2018 How to use an Existing DNN Recognizer for Decoding in Kaldi
wav.scp
The first file you need is wav.scp. This is the only file that you need to make for your new
audio files. All the other files listed below should have already been created during the
training phase.
This should be in the same format as the wav.scp files generated during training and testing:
a two-column file, with the utterance ID in the left column and the path to the audio
file in the right column.
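As a minimal sketch, assuming a single recording whose utterance ID is utt1 and whose audio lives at audio/utt1.wav (both hypothetical placeholders), this writes a one-line wav.scp and checks that every line has exactly two whitespace-separated fields. Kaldi also accepts a command ending in | in place of a plain path, and this simple check would flag such lines, so treat it as a check for plain file paths only.

```shell
# Sketch: write a wav.scp for one new recording and sanity-check its format.
# The utterance ID (utt1) and the audio path are hypothetical placeholders.
mkdir -p transcriptions
printf '%s %s\n' utt1 audio/utt1.wav > transcriptions/wav.scp

# check_scp FILE: report every line that does not have exactly two
# whitespace-separated fields (<utterance-id> <path-to-audio>);
# returns non-zero if any such line exists.
check_scp() {
  awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
       END { exit (bad ? 1 : 0) }' "$1"
}

check_scp transcriptions/wav.scp && echo "wav.scp looks OK"
```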
I’m just going to decode one audio file, so my wav.scp file is one line long, and it looks like
this:
mfcc.conf
Next, you should have a configuration file specifying how to extract MFCCs. You need to
extract exactly the same features (same type and number) for this new audio file as you did in training. If not,
the existing acoustic model and the new audio feature vectors will have different
dimensions. Comparing the two would be like asking where a 3-D point exists in 2-D
space; it doesn’t make sense. So reuse your training config file without adjusting anything. I used
MFCCs, and my config file looks like this:
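Whatever your config contains, the safest way to keep decoding features identical to training features is to copy the training config rather than retype it. A minimal sketch, assuming the training config lives at conf/mfcc.conf and the decoding config at config/mfcc.conf (both paths are assumptions; the one-line stand-in config exists only so the sketch runs on its own):

```shell
# Sketch: reuse the exact MFCC config from training for decoding.
# Both paths are assumptions; point TRAIN_CONF at the config your
# training recipe actually used.
TRAIN_CONF=conf/mfcc.conf
DECODE_CONF=config/mfcc.conf

mkdir -p conf config
# Stand-in training config so the sketch runs on its own; in practice the
# file already exists and is left untouched. --use-energy=false is a real
# compute-mfcc-feats option, used here only as example content.
[ -f "$TRAIN_CONF" ] || printf '%s\n' '--use-energy=false' > "$TRAIN_CONF"

# Copy rather than retype, then confirm the files are byte-for-byte identical.
cp "$TRAIN_CONF" "$DECODE_CONF"
if cmp -s "$TRAIN_CONF" "$DECODE_CONF"; then
  echo "decode config matches training config"
else
  echo "config mismatch: features will not match the model" >&2
fi
```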
final.mdl
Next, you need a trained DNN acoustic model, such as final.mdl. This should have been
produced in your training phase, and should be located somewhere like
egs/your-model/your-model-1/exp/nnet2/final.mdl. It doesn’t make too much sense to a
human, but here’s what the head of the file looks like:
josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/nnet2_online/nnet_
B<TransitionModel> <Topology>
[the rest of the output is unreadable binary data]
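The B in B<TransitionModel> is a clue: Kaldi objects written in binary mode begin with a null byte followed by B. A minimal sketch of checking for that marker with standard tools; the helper name is_kaldi_binary and the stand-in file fake_final.mdl are hypothetical, so the sketch runs without a trained model:

```shell
# Sketch: detect whether a Kaldi object file was written in binary mode.
# Kaldi's binary format starts with the two bytes "\0B" (a NUL, then "B").
is_kaldi_binary() {
  # od -a names each byte, so a NUL followed by B prints as "nul B".
  [ "$(head -c 2 "$1" | od -An -a | tr -d ' \n')" = "nulB" ]
}

# Hypothetical stand-in for final.mdl so the sketch runs on its own.
tmp=$(mktemp -d)
printf '\000B<TransitionModel> <Topology>' > "$tmp/fake_final.mdl"

if is_kaldi_binary "$tmp/fake_final.mdl"; then
  echo "fake_final.mdl is a Kaldi binary file"
fi
```

On a real model you would pass the path to your own final.mdl instead of the stand-in.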
HCLG.fst
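The compiled decoding graph, HCLG.fst, is a key part of the decoding process, as it
combines the HMM topology (H), the phonetic context dependency (C), the pronunciation
lexicon (L), and the language model (G). The acoustic scores themselves come from the
acoustic model (final.mdl) at decoding time.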
You will notice this graph is not located in the same directory as the trained DNN acoustic
model. This is not a mistake. You must train a GMM-HMM before you train a DNN-HMM, and
because the DNN is trained on the GMM-HMM’s alignments and shares its decision tree of tied states,
you can use the graph from the GMM-HMM in decoding.
This file, like the acoustic model shown above, doesn’t make too much sense to humans, but
in any case, here’s what the head of mine looks like:
josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/triphones_lda_mllt
[unreadable binary data; the only legible text is the OpenFst header, vector (the FST type) followed by standard (the arc type)]
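HCLG.fst is a binary OpenFst file, and OpenFst binary files begin with the 32-bit magic number 2125659606 (0x7EB2FDD6). A minimal sketch that checks for it, assuming the file was written and is read on little-endian machines such as x86; the helper is_openfst_binary and the stand-in file are hypothetical:

```shell
# Sketch: check that a file starts with OpenFst's binary magic number,
# 2125659606 (0x7EB2FDD6), stored as a 32-bit integer.
is_openfst_binary() {
  # od reads the first 4 bytes as one unsigned 32-bit integer in host
  # byte order; this matches the file only if the machine that wrote it
  # has the same endianness (assumed little-endian here).
  [ "$(od -An -tu4 -N4 "$1" | tr -d ' \n')" = "2125659606" ]
}

# Hypothetical stand-in: the magic number's bytes in little-endian order
# (0xD6 0xFD 0xB2 0x7E, written as octal escapes).
tmp=$(mktemp -d)
printf '\326\375\262\176' > "$tmp/fake_HCLG.fst"

if is_openfst_binary "$tmp/fake_HCLG.fst"; then
  echo "fake_HCLG.fst has an OpenFst header"
fi
```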
words.txt
Lastly, if we want to read our transcriptions as words instead of a
list of integers, we need to provide the mapping from word IDs to the words themselves.
HCLG.fst uses integers to represent words without knowing what the words
are. As such, we need words.txt to map the list of integers we get from decoding to
something readable.
This file should have been generated during the data preparation (training) phase.
josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/triphones_lda_mllt
<eps> 0
<SIL> 1
<unk> 2
а 3
аа 4
аалы 5
аарчы 6
аарчып 7
аарчысы 8
аарчысын 9
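Each line of words.txt is a word followed by its integer ID, so lookups in either direction are one-liners. A minimal sketch using a few entries from the excerpt above as a stand-in file; the helper names word_to_id and id_to_word are hypothetical:

```shell
# Sketch: look up words.txt entries in either direction.
# Stand-in file with a few entries from the excerpt above;
# a real words.txt has one "<word> <integer-id>" pair per line.
words=$(mktemp)
printf '%s\n' '<eps> 0' '<SIL> 1' '<unk> 2' 'аалы 5' > "$words"

# word_to_id WORD: print the integer ID of WORD.
word_to_id() { awk -v w="$1" '$1 == w { print $2 }' "$words"; }
# id_to_word ID: print the word whose integer ID is ID.
id_to_word() { awk -v i="$1" '$2 == i { print $1 }' "$words"; }

word_to_id 'аалы'   # prints 5
id_to_word 1        # prints <SIL>
```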
Step-by-Step Decoding
Assuming you’ve got all the files listed above in the right place, I’m now going to go step by
step through the decoding process. First, extract MFCCs from the new audio with the same mfcc.conf used in training:
compute-mfcc-feats \
--config=config/mfcc.conf \
scp:transcriptions/wav.scp \
ark,scp:transcriptions/feats.ark,transcriptions/feats.scp
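Each line of the resulting feats.scp has the form <utt-id> <ark-file>:<byte-offset>, so a quick sanity check is that every utterance ID in wav.scp also appears in feats.scp. A minimal sketch with hypothetical stand-in files in a temporary directory; missing_feats is a made-up helper name:

```shell
# Sketch: list utterances from wav.scp that have no entry in feats.scp.
# feats.scp lines look like "<utt-id> <ark-file>:<byte-offset>"; only the
# first column is used. The files below are hypothetical stand-ins.
tmp=$(mktemp -d)
printf '%s\n' 'utt1 audio/utt1.wav' 'utt2 audio/utt2.wav' > "$tmp/wav.scp"
printf '%s\n' "utt1 $tmp/feats.ark:5" > "$tmp/feats.scp"

# missing_feats WAV_SCP FEATS_SCP: read FEATS_SCP first to collect the IDs
# that have features, then print each wav.scp ID not among them.
# FILENAME == ARGV[1] (rather than NR == FNR) still works if FEATS_SCP is empty.
missing_feats() {
  awk 'FILENAME == ARGV[1] { have[$1] = 1; next }
       !($1 in have) { print $1 }' "$2" "$1"
}

missing_feats "$tmp/wav.scp" "$tmp/feats.scp"   # prints utt2
```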
Next, we can go straight to decoding from the MFCCs. Even though you probably
trained your GMM-HMM with deltas and delta-deltas, DNN acoustic models typically don’t
use them, because they splice neighboring frames together at the input layer to capture temporal context.
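To put numbers on the splicing argument: with an assumed 13-dimensional MFCC vector per frame and an assumed context of 4 frames on each side (illustrative values, not read from this model), the network sees 9 frames at once. Deltas are linear combinations of neighboring frames, so a network with that window can learn equivalent information itself. A sketch of the arithmetic, with spliced_dim as a hypothetical helper:

```shell
# Sketch: input dimension of a DNN that splices neighboring frames.
# The numbers are illustrative assumptions, not this model's settings.
spliced_dim() {
  feat_dim=$1; left=$2; right=$3
  # left context + the current frame + right context
  echo $(( feat_dim * (left + 1 + right) ))
}

spliced_dim 13 4 4   # prints 117: 9 frames of static MFCCs
spliced_dim 39 4 4   # prints 351: the same window with deltas appended
```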
nnet-latgen-faster \
--word-symbol-table=experiment/triphones_lda_mllt_sat/graph/words.tx
experiment/nnet2_online/nnet_a_baseline/final.mdl \
experiment/triphones_lda_mllt_sat/graph/HCLG.fst \
ark:transcriptions/feats.ark \
ark,t:transcriptions/lattices.ark;
nnet-latgen-faster writes a lattice of competing hypotheses for each utterance. To get a single transcription, take the best path through each lattice:
lattice-best-path \
--word-symbol-table=experiment/triphones_lda_mllt_sat/graph/words.tx
ark:transcriptions/lattices.ark \
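ark,t:transcriptions/one-best.tra;
The best path is still a sequence of integer word IDs, so the last step maps them back to words with words.txt: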
utils/int2sym.pl -f 2- \
experiment/triphones_lda_mllt_sat/graph/words.txt \
transcriptions/one-best.tra \
> transcriptions/one-best-hypothesis.txt;
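utils/int2sym.pl is a Perl script that ships in Kaldi's utils/ directory; with -f 2- it leaves field 1 (the utterance ID) alone and maps every later field from an integer ID to a word. As a rough illustration of that behavior, not a replacement for the script, here is an awk equivalent run on hypothetical stand-in files (int2sym here is a shell function, not the Perl script):

```shell
# Sketch: roughly what "utils/int2sym.pl -f 2-" does, written in awk.
# Field 1 (utterance ID) is kept; fields 2..NF are mapped from integer
# word IDs to words via words.txt. The sample files are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' '<eps> 0' '<SIL> 1' '<unk> 2' 'а 3' 'аа 4' 'аалы 5' > "$tmp/words.txt"
printf '%s\n' 'utt1 5 3 4' > "$tmp/one-best.tra"

int2sym() {
  awk 'FILENAME == ARGV[1] { sym[$2] = $1; next }   # words.txt: id -> word
       {
         out = $1
         for (i = 2; i <= NF; i++) {
           if (!($i in sym)) { print "unknown id " $i > "/dev/stderr"; exit 1 }
           out = out " " sym[$i]
         }
         print out
       }' "$1" "$2"
}

int2sym "$tmp/words.txt" "$tmp/one-best.tra"   # prints: utt1 аалы а аа
```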
Conclusion
If you run all the above programs successfully, you should end up with a new file
transcriptions/one-best-hypothesis.txt, which lists each utterance ID followed by its
transcription.
Comments
nnet-latgen-faster
--word-symbol-table=/exp/tri4_nnet/graph/words.txt
/exp/tri4_nnet/final.mdl /exp/tri4_nnet/graph/HCLG.fst
ark:/decoding/mfcc/raw_mfcc_decoding.1.ark ark,t:/decoding/lattices.ark
what to do??
please help....
Sreelakshmi
Dear Sreelakshmi,
Please look around the comments on this post and others, this question has been
answered before.
-josh
Cheers!
Prashant
Hi Prashant,
I've not worked on this exact issue, but if you look at some of the
multilingual scripts for nnet3 in the multilingual babel egs dir you might find
something.
Those scripts will switch final layers between languages and share hidden
layers... maybe you could use them somehow.
-josh
Hi Hari!
It's definitely possible... but you would ideally have to use the same senone
targets for the output layer.
Otherwise you strip off the last weight matrix and train a new one with your new
data.
You'll really want the input vectors to be exactly the same, tho.
Check out the literature on acoustic model adaptation... there's a lot out there. And
there are some kaldi scripts on multi-task learning with rm and wsj I think
-Josh
Thanks again !
Hari
Dear Hari,
I would love to do such a tutorial, but I won't have time in the near
future I think.
Let me know where you get with it, and I'll try to help, but I can't
devote enough time to do a walk through yet:(
-Josh
Thanks
Ravi
Dear Ravi,
I haven't, no; I've mostly been working on my own data set lately.
-josh
Goodluck!
Thanks
Ravi
I'm interested as to what you mean exactly here... do you mean if you have one
audio file with multiple speakers you want to have a transcript where the speakers
are explicitly marked?
Such as:
I don't know how to do this right now... did you find an answer?
-josh
I've been working on this problem for my job (where we're trying to
transcribe phone calls). Most of this challenge is not `kaldi`-specific. By that
I mean, most of it was preprocessing. For example, in our case, each
speaker is on a channel of a stereo recording. So we simply process each
channel separately and are able to recover the speaker of each utterance.
Thanks
Hi Venki,
To be honest, I hadn't heard of the Aspire Challenge, and in case people reading
this are interested, here's the official site: https://www.innocentive.com...
It's a very relevant task for anyone working with ASR in reverberant settings.
-josh
Thanks
Ravi
But I have the directory and the file very much there
Any idea ?
Thanks
Ravi
I think you just have a PATH issue... did you figure this out?
-josh
Hi Ashwin!
You have to make sure you have the appropriate nnet & nnet2 bin dirs from
**kaldi/src/** included in your **path.sh** script.
Or you can call the compiled C++ programs from their absolute or relative paths
(relative to the current working directory).
-Josh
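A minimal sketch of checking the PATH before decoding, assuming the three programs used in this post are the ones you need; check_kaldi_path is a hypothetical helper, and the hint text only restates the path.sh advice above:

```shell
# Sketch: check that the Kaldi programs used in this post are on PATH
# (for example after sourcing your recipe's path.sh).
check_kaldi_path() {
  status=0
  for prog in compute-mfcc-feats nnet-latgen-faster lattice-best-path; do
    # command -v prints the resolved path and succeeds only if prog is found.
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "not on PATH: $prog" >&2
      status=1
    fi
  done
  return $status
}

check_kaldi_path || echo "hint: source path.sh or add the relevant kaldi/src/*bin dirs to PATH"
```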
Josh Meyer's Website (JRMeyer, joshmeyerphd)
I'm an NSF Graduate Research Fellow and PhD candidate at the University of Arizona.
I work on automatic speech recognition, NLP, and machine learning. This blog is
some of what I'm learning along the way.
All opinions are my own.