
6/8/2018 How to use an Existing DNN Recognizer for Decoding in Kaldi


How to use an Existing DNN Recognizer for Decoding in Kaldi
Jan 10, 2017

This post is essentially a walk-through of this shell script.

Introduction
If you’re reading this, I’m assuming that you’ve already downloaded and installed Kaldi and
successfully trained a DNN-HMM acoustic model along with a decoding graph.

If you’ve run one of the DNN Kaldi run.sh scripts from the examples directory egs/ , then
you should be ready to go. A good starting point is the baseline script for nnet2 in the
Wall Street Journal example: run_nnet2_baseline.sh

I originally wrote this same post for GMM models, and this is its DNN counterpart.

We normally generate transcriptions for new audio with the Kaldi testing and scoring scripts,
so I simply dug out the most important parts of those scripts to demonstrate concisely
how decoding can work.

What you see here is what I gather to be the simplest way to do decoding with a DNN in
Kaldi. It is by no means guaranteed to be the best way to do decoding.

http://jrmeyer.github.io/asr/2017/01/10/Using-built-DNN-model-Kaldi.html

Things you need


# INPUT:
# transcriptions/
# wav.scp
#
# config/
# mfcc.conf
#
# experiment/
# nnet2_online/
# nnet_a_baseline/
# final.mdl
#
# triphones_lda_mllt_sat/
# graph/
# HCLG.fst
# words.txt

wav.scp
The first file you need is wav.scp . This is the only file that you need to make for your new
audio files. All the other files listed below should have already been created during the
training phase.

This should be in the same format as the wav.scp file generated during training and testing:
a two-column file, with the utterance ID in the left column and the path to the audio
file in the right column.

I’m just going to decode one audio file, so my wav.scp file is one line long, and it looks like
this:

josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/transcriptions$ cat wav.scp


atai_45 input/audio/atai_45.wav
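If you have a whole directory of audio files, wav.scp can be generated with a short shell loop. This is just a sketch using the directory layout from my example above (the stand-in files are created here so the snippet is self-contained; adjust the paths to your own setup):

```shell
# Build wav.scp from a directory of .wav files, using each file's
# basename (minus the extension) as the utterance ID.
cd "$(mktemp -d)"
mkdir -p input/audio transcriptions
touch input/audio/atai_45.wav input/audio/atai_46.wav   # stand-in files

for f in input/audio/*.wav; do
    utt=$(basename "$f" .wav)
    echo "$utt $f"
done | sort > transcriptions/wav.scp

cat transcriptions/wav.scp
# -> atai_45 input/audio/atai_45.wav
#    atai_46 input/audio/atai_46.wav
```

The sort matters: Kaldi expects its .scp files to be sorted by utterance ID.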

mfcc.conf


Next, you should have a configuration file specifying how to extract MFCCs. You need to
extract exactly the same features for this new audio file as you did in training. If not,
the existing acoustic model and the new audio feature vectors will have different
dimensions. Comparing the two would be like asking where a 3-D point exists in 2-D
space; it doesn’t make sense. So, you don’t need to adjust anything in the config file. I used
MFCCs, and my config file looks like this:

josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/config$ cat mfcc.conf


--sample-frequency=16000
--num-mel-bins=40 # similar to Google's setup.
--frame-length=25 # the default is 25
--frame-shift=10 # default is 10
--high-freq=0 # relative to Nyquist (i.e. 8000 +/- x for 16k sampling rate)
--low-freq=0 # low cutoff frequency for mel bins
--num-ceps=13
--window-type=hamming # Dan's window ("povey") is the default
--use-energy=true
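As a quick sanity check on what those frame settings mean: with a 25 ms window and a 10 ms shift at 16 kHz, a 2-second file yields about 198 frames. This is back-of-the-envelope arithmetic assuming Kaldi's default behavior of snipping incomplete edge frames; the numbers are illustrative, not read from the config at runtime:

```shell
# Frame-count arithmetic for a 2.0 s file at 16 kHz with the settings
# above: 25 ms window (400 samples) and 10 ms shift (160 samples).
samples=32000   # 2.0 s * 16000 Hz
win=400         # 25 ms * 16 samples/ms
hop=160         # 10 ms * 16 samples/ms
frames=$(( (samples - win) / hop + 1 ))
echo "$frames"  # -> 198
```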

final.mdl
Next, you need a trained DNN acoustic model, such as final.mdl . This should have been
produced in your training phase, and should be located somewhere like egs/your-
model/your-model-1/exp/nnet2/final.mdl . It doesn’t make too much sense to a
human, but here’s what the head of the file looks like:

josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/nnet2_online/nnet_
B<TransitionModel> <Topology>
(…binary data…)

HCLG.fst
The compiled decoding graph, HCLG.fst is a key part of the decoding process, as it
combines the acoustic model ( HC ), the pronunciation dictionary ( lexicon ), and the
language model ( G ).

You will notice this graph is not located in the same directory as the trained DNN acoustic
model. This is not a mistake. You must train a GMM-HMM before you train a DNN-HMM, and
you use the graph from the GMM-HMM in decoding.

This file, like the acoustic model shown above, doesn’t make too much sense to humans, but
in any case, here’s what the head of mine looks like:

josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/triphones_lda_mllt
(…binary data…)

words.txt
Lastly, if we want to be able to read our transcriptions as an utterance of words instead of a
list of intergers, we need to provide the mapping of word-IDs to words themselves.
HCLG.fst uses the intergers representing words without worrying about what the words
are. As such, we need words.txt to map from the list of intergers we get from decoding to
something readable.

This file should have been generated during the data preparation (training) phase.

josh@yoga:~/git/kaldi/egs/kgz/kyrgyz-model/experiment/triphones_lda_mllt
<eps> 0
<SIL> 1
<unk> 2
а 3
аа 4
аалы 5
аарчы 6
аарчып 7
аарчысы 8
аарчысын 9


Step-by-Step Decoding
Assuming you’ve got all the files listed above in the right place, I’m now going to go step-by-
step through the decoding process.

Audio –> Feature Vectors


First, we’re going to extract MFCCs from the audio according to the specifications in
the mfcc.conf file. At this point, we give as input (1) our configuration file and (2) our list
of audio files, and we get as output feature files in ark and scp format.

compute-mfcc-feats \
--config=config/mfcc.conf \
scp:transcriptions/wav.scp \
ark,scp:transcriptions/feats.ark,transcriptions/feats.scp
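After feature extraction, a quick way to confirm that every utterance got features is to compare the ID columns of wav.scp and feats.scp. This is plain shell, not a Kaldi tool, and the two files are stand-ins faked here for illustration:

```shell
# Check that the utterance IDs in wav.scp and feats.scp line up.
# (Stand-in files are created so the sketch is self-contained.)
cd "$(mktemp -d)"
mkdir transcriptions
printf 'atai_45 input/audio/atai_45.wav\n' > transcriptions/wav.scp
printf 'atai_45 feats.ark:12\n'            > transcriptions/feats.scp

awk '{print $1}' transcriptions/wav.scp   > wav.ids
awk '{print $1}' transcriptions/feats.scp > feat.ids
if diff wav.ids feat.ids > /dev/null; then
    echo "all utterances have features"
else
    echo "mismatch between wav.scp and feats.scp"
fi
# -> all utterances have features
```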

Next, we can go straight to decoding from the MFCCs: even though you probably
trained your GMM-HMM with deltas and delta-deltas, DNN acoustic models typically don’t
use them, because they splice frames at the input layer to take time information into account.

Trained DNN-HMM + Feature Vectors –> Lattice


Now that we have feature vectors from our new audio in the appropriate shape, we can use
our acoustic model and decoding graph to generate lattices of hypothesized transcriptions.
This program takes as input (1) our word-to-symbol table, (2) a trained acoustic model, (3) a
compiled decoding graph, and (4) the features from our new audio, and it returns
a file of lattices.

nnet-latgen-faster \
--word-symbol-table=experiment/triphones_lda_mllt_sat/graph/words.txt \
experiment/nnet2_online/nnet_a_baseline/final.mdl \
experiment/triphones_lda_mllt_sat/graph/HCLG.fst \
ark:transcriptions/feats.ark \
ark,t:transcriptions/lattices.ark;


Lattice –> Best Path Through Lattice


Some people might be happy to stop with the lattice and do their own post-processing, but I
think many people will want a single best-guess transcription for the audio. The following
program takes as input (1) the generated lattices from above and (2) the word-to-symbol
table, and returns the best path through the lattice.

lattice-best-path \
--word-symbol-table=experiment/triphones_lda_mllt_sat/graph/words.txt \
ark:transcriptions/lattices.ark \
ark,t:transcriptions/one-best.tra;

Best Path Integers –> Best Path Words


The best path that we get above will display a line of integers for each transcription. This
isn’t very useful for most applications, so here is how we can replace the integers with the
words they represent.

utils/int2sym.pl -f 2- \
experiment/triphones_lda_mllt_sat/graph/words.txt \
transcriptions/one-best.tra \
> transcriptions/one-best-hypothesis.txt;
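For illustration only, here is roughly what int2sym.pl is doing, as a small awk sketch: load the words.txt table in a first pass, then replace every field from the second onward, keeping field 1 (the utterance ID) intact. The words.txt and one-best.tra contents below are tiny made-up stand-ins, not the real files:

```shell
cd "$(mktemp -d)"
# Stand-in symbol table and one-best output (word IDs per utterance).
printf '<eps> 0\nа 3\nаа 4\n' > words.txt
printf 'atai_45 3 4 3\n'      > one-best.tra

# First pass (NR==FNR) loads word -> ID pairs; second pass maps the
# IDs in fields 2..NF back to words.
awk 'NR == FNR { sym[$2] = $1; next }
     { out = $1
       for (i = 2; i <= NF; i++) out = out " " sym[$i]
       print out }' words.txt one-best.tra > one-best-hypothesis.txt

cat one-best-hypothesis.txt   # -> atai_45 а аа а
```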

Conclusion
If you run all the above programs successfully, you should end up with a new file
transcriptions/one-best-hypothesis.txt , which will list your files and their
transcriptions.

I hope this was helpful!

If you have any feedback or questions, don’t hesitate to leave a comment!


27 Comments

Sreelakshmi K R • a year ago


I am trying to decode wav files using existing DNN model in kaldi.
This helps me a lot.... Thank you...

While running the following command:

nnet-latgen-faster
--word-symbol-table=/exp/tri4_nnet/graph/words.txt
/exp/tri4_nnet/final.mdl /exp/tri4_nnet/graph/HCLG.fst
ark:/decoding/mfcc/raw_mfcc_decoding.1.ark ark,t:/decoding/lattices.ark

I got this error message...

ERROR (nnet-latgen-faster:NnetComputer():nnet-compute.cc:71) Feature dimension is 13 but network expects 40

what to do??

please help....

Sreelakshmi

Josh Meyer Mod > Sreelakshmi K R • a year ago

Dear Sreelakshmi,

Please look around the comments on this post and others, this question has been
answered before.

You're feeding in raw mfccs to a system that expects lda probably.

-josh

Sreelakshmi K R > Josh Meyer • 2 months ago



I understand. But I need to use the model that is trained with LDA+MLLT+SAT features. How can I extract these features from wave files and use them for decoding?

Any help appreciated.


I know it's an old post, but I need help.

Hari • a year ago


Hi! Would you know if it is possible to continue training an already existing acoustic
model with new data, like for instance the pre-trained Aspire model?

prashant maheshwari > Hari • 8 months ago


Hi Hari/Josh,

Is there any development on this as I am also looking to do this.


Any lead will be helpful.

P.S. I am new to kaldi, so don't have much knowledge in it.

Cheers!
Prashant

Josh Meyer Mod > prashant maheshwari • 8 months ago

Hi Prashant,

I've not worked on this exact issue, but if you look at some of the
multilingual scripts for nnet3 in the multilingual babel egs dir you might find
something.

Those scripts will switch final layers between languages and share hidden
layers... maybe you could use them somehow.

-josh

Josh Meyer Mod > Hari • a year ago

Hi Hari!

It's definitely possible... but you would ideally have to use the same senone
targets for the output layer.

Otherwise you strip off the last weight matrix and train a new one with your new
data.

You'll really want the input vectors to be exactly the same, tho.

Check out the literature on acoustic model adaptation... there's a lot out there. And
there are some kaldi scripts on multi-task learning with rm and wsj I think

http://jrmeyer.github.io/asr/2017/01/10/Using-built-DNN-model-Kaldi.html 9/14
6/8/2018 How to use an Existing DNN Recognizer for Decoding in Kaldi

-Josh

Hari > Josh Meyer • a year ago


Thank you very much Josh! Would you know a tutorial for Kaldi in order to
adapt the acoustic model, considering that I use the same senone targets?
Furthermore, if I want to adapt it for a new language, do you know if your
second method would perform well (strip off the last weight matrix and train
a new one, and of course modify all the files linked to the
language model)?

Thanks again !
Hari

Josh Meyer Mod > Hari • a year ago

Dear Hari,

I would love to do such a tutorial, but I won't have time in the near
future I think.

I would recommend you go to the kaldi-help Google group and piece together what others have done.

Let me know where you get with it, and I'll try to help, but I can't
devote enough time to do a walk through yet:(

-Josh

Venki Ravi • a year ago


Josh,

A QQ. Have you got a chance to work on SRE10 Kaldi recipe?

Thanks

Ravi

Josh Meyer Mod > Venki Ravi • a year ago


I've been saying "Venki" as your first name... sorry!

Dear Ravi,

I haven't no, I've been mostly working on my own data set lately.

-josh

Venki Ravi > Josh Meyer • a year ago


Sure.

Goodluck!

Venki Ravi • a year ago


Josh

Any idea how to break the transcription by speaker id?

Thanks

Ravi

Josh Meyer Mod > Venki Ravi • a year ago

Have you figured this out yet, Venki?

I'm interested as to what you mean exactly here... do you mean if you have one
audio file with multiple speakers you want to have a transcript where the speakers
are explicitly marked?

Such as:

speaker 1: the sky is blue


speaker 2: that's exactly what I thought!
speaker 1: wow

Is that what you're going for?

I don't know how to do this right now... did you find an answer?

Another cool problem:)

-josh

Michael Capizzi > Josh Meyer • a year ago


Hi guys -

I've been working on this problem for my job (where we're trying to
transcribe phone calls). Most of this challenge is not `kaldi`-specific. By that
I mean, most of it was preprocessing. For example, in our case, each
speaker is on a channel of a stereo recording. So we simply process each
channel separately and are able to recover the speaker of each utterance.

If Ravi means "break the transcription by speaker id" to simply mean sorting by speaker, I'd imagine you'd be able to use `spk2utt` to sort the output transcripts. `spk2utt` has the `speaker_id` and then a space-delimited list of all of her/his `utterance_id`s.

Venki Ravi • a year ago


Josh

I was using nnet3 for the Aspire model.

Thanks

Josh Meyer Mod > Venki Ravi • a year ago

Hi Venki,

Sorry I missed these two messages from 19 days ago...

I haven't done anything with nnet3 yet, no.

I also haven't done anything with the aspire dataset, either.

To be honest, I hadn't heard of the Aspire Challenge, and in case people reading
this are interested, here's the official site: https://www.innocentive...

It's a very relevant task for anyone working with ASR in reverberant settings.

-josh

Venki Ravi • a year ago


Josh,

I am getting the following error:

ERROR (nnet-latgen-faster:ExpectToken():io-funcs.cc:201) Expected token "<nnet>", got instead "<dimension>".

Thanks
Ravi

Sam J. > Venki Ravi • a year ago


Not sure, but I've gotten this error before when I'm trying to pass a model
'final.mdl' trained using a gmm to nnet-latgen-faster.

Josh Meyer Mod > Venki Ravi • a year ago

what's the program/command/script you're running?



Sreelakshmi K R > Josh Meyer • a year ago


What's the answer to this question?

Venki Ravi • a year ago


Josh,

I am getting the following error:

--word-symbol-table=/experiment/triphones_lda_mllt_sat/graph/words.txt:
No such file or directory

But I have the directory and the file very much there.

Also, I get the decoded output that was cached earlier.

Any idea ?

Thanks

Ravi

Josh Meyer Mod > Venki Ravi • a year ago


Hi Ravi,

I think you just have a PATH issue... did you figure this out?

-josh

AshwinRaju • a year ago


Hi,
One quick question: how can we execute the scripts in the file you gave, like
"nnet-latgen-faster", "lattice-best-path", "compute-mfcc-feats", etc.? It shows
"command not found".

Josh Meyer Mod > AshwinRaju • a year ago

Hi Ashwin!

You have to make sure you have the appropriate nnet & nnet2 bin dirs from
**kaldi/src/** included in your **path.sh** script.

Or you can call the compiled C++ programs from their absolute or relative paths
(relative to the current working directory).

-Josh

AshwinRaju • a year ago


Thanks, Exactly what i needed.

Josh Meyer Mod > AshwinRaju • a year ago


Glad to hear, Ashwin:)

Josh Meyer's Website. I'm an NSF Graduate Research Fellow and PhD candidate at the University of Arizona. I work on automatic speech recognition, NLP, and machine learning. This blog is some of what I'm learning along the way. All opinions are my own.
