HMM Toolkit (HTK)

Presentation by
Daniel Whiteley

AME department
What is HTK?
The Hidden Markov Model Toolkit (HTK) is a
portable toolkit for building and manipulating
hidden Markov models. HTK is primarily used
for speech recognition research although it has
been used for numerous other applications
including research into speech synthesis,
character recognition and DNA sequencing. HTK
is in use at hundreds of sites worldwide.
What is HTK?
HTK consists of a set of library modules and tools
available in C source form. The tools provide
sophisticated facilities for speech analysis, HMM
training, testing and results analysis. The software
supports HMMs using both continuous density
mixture Gaussians and discrete distributions and
can be used to build complex HMM systems.
Basic HTK command format

The commands in HTK follow a basic command
line format:
HCommand [options] files

Options are indicated by a dash followed by the
option letter. Options common to all tools use
capital letters.

File extensions are not necessary in HTK; each tool
reads a file's header to determine its format.
Configuration files

You can also configure HTK modules using config files.
A config file is passed with the -C option, or applied
globally with the command setenv HCONFIG myconfig, where
myconfig is your own config file.

All possible configuration variables can be found in
chapter 18 of the HTK manual. However, for most of
our purposes, we only need to create a config file with
these lines:

# the user-defined file format (not sound)
SOURCEKIND = USER
# keep the file in the same format
TARGETKIND = ANON_D
Using HTK

Parts of HMM modeling
– Data Preparation
– Model Training
– Pattern Recognition
– Model Analysis
Data Preparation

One small problem:

HTK was tailored for speech recognition, so most of
the data preparation tools are for audio.
– Because of this, we need to coerce our data into the HTK
parameterized data file format.

HTK parameter files consist of a sequence of samples
preceded by a header. The samples are simply data
vectors whose components are 2-byte integers or 4-byte
floating-point numbers.

For us, these vectors will be a sequence of joint angles
received from a motion capture session.
HTK file format

The file begins with a 12-byte header containing
the following information:
– nSamples (4-byte int): number of samples
– samplePeriod (4-byte int): sample period, in units
of 100 ns
– sampleSize (2-byte int): number of bytes per vector
– parameterKind (2-byte int): defines the type of data

For our purposes, this last parameter will be either 0x2400,
the user-defined parameter kind, or 0x2800, the
discrete kind.
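To make the header layout concrete, here is a sketch in Python (the function names are our own, not part of HTK) that packs joint-angle vectors into this format. It assumes big-endian byte order, 4-byte float components, and the base HTK code for USER data (9); the 0x2400/0x2800 values quoted above may additionally encode qualifier bits.

```python
import struct

def pack_htk_user(vectors, sample_period=1):
    """Pack float vectors into HTK parameter-file bytes (header + samples).

    sample_period is in units of 100 ns, per the HTK header convention.
    parm_kind uses the base USER code (9) -- an assumption; the slide's
    0x2400 value may include qualifier bits.
    """
    n_samples = len(vectors)
    sample_size = 4 * len(vectors[0])        # 4-byte floats per component
    parm_kind = 9                            # base code for USER data
    # 12-byte header: two 4-byte ints, then two 2-byte ints, big-endian
    header = struct.pack(">iihh", n_samples, sample_period,
                         sample_size, parm_kind)
    body = b"".join(struct.pack(">%df" % len(v), *v) for v in vectors)
    return header + body

def unpack_htk_header(data):
    """Return (nSamples, samplePeriod, sampleSize, parameterKind)."""
    return struct.unpack(">iihh", data[:12])
```

Writing the result to disk gives a file HTK can read as SOURCEKIND = USER, assuming the byte order matches your HTK build's expectations.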
HMM model creation

In order to model the motion capture sequence, we need
to create a prototype of the HMM. In this prototype, the
values of B and π are arbitrary. The same is true for the
transition matrix A, save that any transition probability
you set to zero will remain zero.

Models are written in a markup language similar to
HTML.

Models in HTK also have a non-emitting beginning and
ending state. These states are not defined in the script.
HMM Model Example

~h "prototype"
<BeginHMM>
<VectorSize> 4 <USER>
<NumStates> 5
<State> 2 <NumMixes> 3
<Mixture> 1 0.3
<Mean> 4
0.0 0.0 0.0 0.0
<Variance> 4
1.0 1.0 1.0 1.0
<Mixture> 2 0.4
...
<State> 3
...
<TransP> 5
0.0 0.4 0.3 0.3 0.0
0.0 0.2 0.5 0.3 0.0
0.0 0.2 0.2 0.4 0.2
0.0 0.1 0.2 0.3 0.4
0.0 0.0 0.0 0.0 0.0
<EndHMM>

– ~h gives the name of the file.
– <VectorSize> is the sample size; <NumStates> is the number of states.
– <NumMixes> is the number of Gaussian distributions for a state.
– <Mixture> gives each distribution's ID and weight.
– <Mean> is the mean observation vector; <Variance> is the diagonal of the covariance matrix.
– <TransP> is the transition matrix A. All the transition probabilities for the ending state (the last row) are always zero.
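Writing prototype files like this by hand is tedious. The sketch below (our own helper, not an HTK tool) generates a minimal single-mixture prototype in this layout, with zero means, unit variances, and a left-to-right transition matrix:

```python
def make_prototype(name, num_states, vec_size):
    """Generate a minimal HTK-style prototype definition as a string:
    one Gaussian per emitting state, zero means, unit variances, and a
    left-to-right transition matrix. A sketch only, not an HTK tool."""
    lines = ['~h "%s"' % name,
             "<BeginHMM>",
             "<VectorSize> %d <USER>" % vec_size,
             "<NumStates> %d" % num_states]
    for s in range(2, num_states):           # states 1 and N are non-emitting
        lines += ["<State> %d <NumMixes> 1" % s,
                  "<Mixture> 1 1.0",
                  "<Mean> %d" % vec_size,
                  " ".join(["0.0"] * vec_size),
                  "<Variance> %d" % vec_size,
                  " ".join(["1.0"] * vec_size)]
    lines.append("<TransP> %d" % num_states)
    for i in range(1, num_states + 1):
        row = [0.0] * num_states
        if i == 1:
            row[1] = 1.0                     # entry state goes straight to state 2
        elif i < num_states:
            row[i - 1] = 0.5                 # self-loop
            row[i] = 0.5                     # advance to the next state
        # the last row stays all zeros: no transitions out of the exit state
        lines.append(" ".join("%.1f" % p for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines)
```

The 0.5/0.5 split and unit variances are arbitrary starting values; as noted above, HInit/HRest will re-estimate everything except the zeroed transitions.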
Vector Quantization

In order to reduce computation, we can make the
HMM discrete.

In order to use a discrete HMM, we must first
quantize the data into a set of standard vectors.

Warning: quantizing the data inherently
introduces error.

Before quantizing the data, we must first have a
standard set of vectors, or a VQ "codebook".
This is made with HQuant.
HQuant

HQuant takes the training data and uses a K-means
algorithm to evenly partition the data; the centroids
of these partitions become our quantization vectors (QVs).

A sample command:

HQuant -C config -n 1 64 -S train.scp vqcook

– -C config: use the configuration variables found in config.
– -n 1 64: the number of QVs (64) for a certain data stream (stream 1).
– -S train.scp: a script listing all of your training files.
– vqcook: the file our codebook will be written to.

To reduce quantization time, a codebook using a binary-tree
search algorithm can be made using the -t option.
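For intuition, here is a toy version of the K-means step in pure Python. This is an illustration only, not HQuant's implementation; all names are our own.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(data, k, iters=20, seed=0):
    """Toy K-means: partition the data and return the k partition
    centroids as the codebook. Illustrative only, not HQuant."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)          # pick k starting points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:                       # assign each vector to its
            nearest = min(range(k),          # nearest centroid
                          key=lambda i: dist2(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:                      # keep old centroid if cluster empties
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def quantize(v, codebook):
    """Replace a vector by the index of its nearest codebook entry."""
    return min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
```

The linear nearest-neighbor search in quantize is what HQuant's -t option speeds up by organizing the codebook as a binary tree.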
Converting to Discrete

The conversion of data files is done using the HCopy
command. In order to quantize our data, we do this:

HCopy -C quantize rawdata qvdata

where rawdata is our original data, qvdata is our
quantized data, and quantize is a config file with
these lines:

# we start with our original data
SOURCEKIND = USER
# convert it into discrete data
TARGETKIND = DISCRETE
# throw away the continuous data
SAVEASVQ = T
# use our previously made codebook to quantize the data
VQTABLE = vqcook
Discrete HMM
Discrete HMMs are very similar to their continuous
counterparts, save for a few changes:

~o <Discrete> <StreamInfo> 1 1
~h "dhmm"
<BeginHMM>
<NumStates> 5
<State> 2 <NumMixes> 10
<DProb> 5461*10
....
<EndHMM>

– <NumMixes> here gives the number of discrete symbols.
– The *10 is a duplicate function: it repeats the value 5461 ten times.
– Discrete probabilities are stored in logarithmic form, where:
P(v) = exp(-d(v)/2371.8)
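This mapping is easy to check with a couple of lines of Python (the helper names are our own):

```python
import math

def dprob_to_prob(d):
    """HTK stores discrete probabilities as scaled negative logs:
    P(v) = exp(-d(v) / 2371.8)."""
    return math.exp(-d / 2371.8)

def prob_to_dprob(p):
    """Inverse mapping, rounded to the integer form stored in <DProb>."""
    return int(round(-2371.8 * math.log(p)))
```

Note that 2371.8 × ln 10 ≈ 5461, so the <DProb> 5461*10 line above is simply a uniform distribution: probability 1/10 for each of the ten symbols.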
Model Training (token HMM)

The initialization of our prototype can be done
using HInit:

HInit [options] hmm data1 data2 data3 ...

where hmm is the HMM being trained.

HInit is used mainly for left-right HMMs. A more
ergodic HMM can instead be initialized with a
flat start, setting all means and variances to their
global counterparts using HCompV:

HCompV -m -S trainlist hmm
Retraining

The model is then retrained using the Baum-Welch
algorithm found in HRest:

HRest -w 1.0 -v 0.0001 -S trainlist hmm

The -w and -v options set floors for the mixture
probabilities and variances respectively. The float
used with -w represents a multiplier of 10^-5.

This can be iterated as many times as needed to
achieve the desired results.
Dictionary Creation

In order to create a recognition program or script,
we must first create a dictionary.

A dictionary in HTK gives each word and its
pronunciation. For our purposes, it will just
consist of the token HMMs that we trained:

RUNNING run
WALKING walk
JUMPING [SKIPPING] jump

Each line gives the word, an optional displayed output
in brackets (if not specified, the word itself is
displayed), and the tokens used to form the word.
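A dictionary in this layout can be read with a few lines of Python (a sketch for this word / optional [output] / tokens format only, not HTK's full dictionary syntax):

```python
def parse_dictionary(text):
    """Parse lines of the form 'WORD [OUTPUT] token ...' into
    {word: (output, [tokens])}. If no [OUTPUT] is given, the word
    itself is used as the displayed output."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        word, rest = parts[0], parts[1:]
        output = word
        if rest and rest[0].startswith("["):
            output = rest[0].strip("[]")     # bracketed displayed output
            rest = rest[1:]
        entries[word] = (output, rest)
    return entries
```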
Label Files

Label files contain a transcription of what is
going on in the data sequence. Each line gives the
start of the frame in samples, the end of the frame
in samples, and the token found in that time frame:

000000 100000 walk
100001 200000 run
200001 300000 jump
Master Label Files (MLFs)
During training and recognition, we may have many
test files and their accompanying label files. The
label files can be condensed into one file called a
master label file, or MLF:

#!MLF!#
"*/a.lab"
000000 100000 walk
100001 200000 run
200001 300000 jump
.
"*/b.lab"
run
.
"*/jump*.lab"
jump
.

– Each entry has the same format as an original label file
and is terminated by a period.
– If the entire file is one token, it can be labeled with
just the token.
– The wildcard operator can be used to label multiple
files at once.
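As a sketch, an MLF in this layout can be parsed with a short Python helper (our own code, with no claim to cover the full MLF syntax):

```python
def parse_mlf(text):
    """Parse a master label file into {pattern: [(start, end, token), ...]}.
    Entries with a bare token (no times) are stored as (None, None, token)."""
    entries, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):             # a new labelled-file pattern
            current = entries.setdefault(line.strip('"'), [])
        elif line == ".":                    # '.' terminates an entry
            current = None
        else:
            parts = line.split()
            if len(parts) == 3:              # start, end, token
                current.append((int(parts[0]), int(parts[1]), parts[2]))
            else:                            # whole file is one token
                current.append((None, None, parts[0]))
    return entries
```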
Pattern Recognition

The recognition of a motion sequence is done by
using HVite.

To receive a transcription of the recognition data
in MLF format, we use:

HVite -a -i results -o SWT -H hmmlist \
      -I transcripts.mlf -S testfiles

– -a: create the word network from the given transcriptions.
– -i results: write the output transcription file, in MLF
format, to results.
– -o SWT: throw away unnecessary data in the label files.
– -H hmmlist: a text file containing the list of HMMs used.
– -I transcripts.mlf: the MLF file that has the test files'
transcriptions.
– -S testfiles: the motion capture data to be recognized.
Model Analysis

The analysis of the recognition results is done by
HResults:

HResults -I transcripts.mlf -H hmmlist results

– -I transcripts.mlf: the MLF containing the reference labels.
– -H hmmlist: the list of HMMs used.
– results: the MLF containing the result labels.

Note: the reference labels and the result labels
must have different file extensions.