MFA Instructions
I installed the Montreal Forced Aligner (MFA) via Miniconda, a minimal installer for conda (a
package and environment manager commonly used for Python and R) that kept things nice and
neat. Presumably this would all work with Python 3 or the like as well.
There are methods of training your own model, and a ton of dictionaries for various languages
already available, but here we'll install the ones for US English. These use ARPAbet
(essentially ASCII-friendly IPA for US English); others are available in X-SAMPA and other
formats as well.
Great, the model and dictionary are ready. You can specify a directory to install these in (check
the website for instructions there); I just let them sit in the default folder, Documents/MFA.
You can track down the dictionary's wordlist here and, if you like, check whether a given token
is included:
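If you have a local copy of the dictionary file, that check can be scripted. The sketch below is a minimal example, assuming a plain-text pronunciation dictionary with one entry per line and the word as the first whitespace-separated field. The function names and the two sample entries are made up for illustration and are not part of the MFA.

```python
import tempfile
from pathlib import Path


def load_wordlist(dict_path):
    """Collect the first whitespace-separated field of each line (the word), lowercased."""
    words = set()
    with open(dict_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                words.add(fields[0].lower())
    return words


def missing_tokens(transcript, words):
    """Return transcript tokens absent from the wordlist, in order, without repeats."""
    missing = []
    for token in transcript.split():
        # Strip surrounding punctuation; apostrophes and <bracketed> tags are kept.
        cleaned = token.lower().strip('.,;:!?"()')
        if cleaned and cleaned not in words and cleaned not in missing:
            missing.append(cleaned)
    return missing


# Demo with a tiny stand-in dictionary (illustrative entries only).
with tempfile.TemporaryDirectory() as tmp:
    demo_dict = Path(tmp) / "demo_dictionary.txt"
    demo_dict.write_text("the\tDH AH0\nrainbow\tR EY1 N B OW2\n", encoding="utf-8")
    demo_words = load_wordlist(demo_dict)
    print(missing_tokens("The rainbow, the prism", demo_words))
```

Anything this reports as missing is a candidate for a spelling fix in the transcript, or for <unk> if the word really isn't in the dictionary.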
Prepare your dataset. This should be a single folder with paired .wav and .txt files, where the
latter are transcriptions of the former. They should have the exact same filename, except for
the extension.
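A quick way to confirm the pairing before running anything is to compare filenames in the folder. This is a minimal sketch, assuming all files sit directly in one folder (no subfolders); find_unpaired is a hypothetical helper name, and the demo files are empty placeholders.

```python
import tempfile
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (wavs_without_txt, txts_without_wav) as sorted lists of filename stems."""
    corpus = Path(corpus_dir)
    wav_stems = {p.stem for p in corpus.iterdir() if p.suffix.lower() == ".wav"}
    txt_stems = {p.stem for p in corpus.iterdir() if p.suffix.lower() == ".txt"}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)


# Demo on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("speaker1.wav", "speaker1.txt", "speaker2.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_result = find_unpaired(tmp)
    print(demo_result)
```

Both lists should come back empty for a dataset that is ready to go.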
Transcription conventions may depend on the dictionary you're using. For this one,
capitalization of letters does not matter. Nor does spacing, except between words, though I
like to use line breaks between clauses for readability.
This dictionary also has a few "words" for non-speech sounds:
In my experience it's not great at identifying/aligning laughter; when in doubt, go with <unk>.
A transcription will look something like this:
IMPORTANT NOTE: Ensure the encoding for this .txt file is UTF-8.
The aligner will throw a rather unhelpful-to-diagnose error if it's in ANSI.
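If you have many transcripts, the encoding can be checked and fixed in bulk. The sketch below assumes that "ANSI" means Windows-1252 (the usual case for Notepad on English-language Windows); ensure_utf8 is a hypothetical helper, and it overwrites the file in place, so try it on a copy first.

```python
import tempfile
from pathlib import Path


def ensure_utf8(path, fallback="cp1252"):
    """Rewrite the file as UTF-8 if needed; return True when a conversion happened."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # Already valid UTF-8 (plain ASCII counts); leave it untouched.
    except UnicodeDecodeError:
        # Assumption: the file was saved in the fallback encoding.
        text = raw.decode(fallback)
        Path(path).write_bytes(text.encode("utf-8"))
        return True


# Demo: write a Windows-1252 file, convert it, then confirm a second pass is a no-op.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "sample.txt"
    demo_file.write_bytes("a caf\u00e9 near the fianc\u00e9e".encode("cp1252"))
    first_pass = ensure_utf8(demo_file)
    second_pass = ensure_utf8(demo_file)
    demo_text = demo_file.read_text(encoding="utf-8")
    print(first_pass, second_pass, demo_text)
```

Transcripts containing only plain ASCII characters are already valid UTF-8, so the conversion only matters when accented letters or curly quotes are present.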
Validate your dataset to check for errors, replacing the directory as relevant:
This will take a little while! It does provide helpful progress bars.
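The validate and align steps can also be driven from a script. This is a sketch under stated assumptions: it assumes the MFA 2.x-style command layout (mfa validate CORPUS DICTIONARY ACOUSTIC_MODEL and mfa align CORPUS DICTIONARY ACOUSTIC_MODEL OUTPUT) and that the US English ARPAbet dictionary and acoustic model are both installed under the name english_us_arpa. Check both against the MFA documentation for your version. The helper names and the my_corpus / my_output folders are placeholders.

```python
import shutil
import subprocess


def build_mfa_command(step, corpus_dir, output_dir=None,
                      dictionary="english_us_arpa", acoustic_model="english_us_arpa"):
    """Build the argument list for `mfa validate` or `mfa align` (assumed layout)."""
    if step not in ("validate", "align"):
        raise ValueError("step must be 'validate' or 'align'")
    command = ["mfa", step, str(corpus_dir), dictionary, acoustic_model]
    if step == "align":
        if output_dir is None:
            raise ValueError("align needs an output directory")
        command.append(str(output_dir))
    return command


def run_mfa(command):
    """Run the command if the mfa executable is on PATH; otherwise just report it."""
    if shutil.which(command[0]) is None:
        print("mfa not found on PATH; would have run:", " ".join(command))
        return None
    return subprocess.run(command, check=True)


# Placeholder folders; replace with your own corpus and output locations.
demo_validate = build_mfa_command("validate", "my_corpus")
demo_align = build_mfa_command("align", "my_corpus", "my_output")
print(" ".join(demo_validate))
print(" ".join(demo_align))
```

Calling run_mfa(demo_validate) and then run_mfa(demo_align) from inside the activated conda environment would run the two steps in order; the demo above only prints the commands.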
The results should be .TextGrid files in your output folder, with the same filenames:
These can then be loaded into Praat alongside their corresponding audio file.
Behold! It's so beautiful.
These TextGrids are formatted with two tiers: the first is word-level, the second is
phone-level in ARPAbet (with numeric stress marking on vowels). You should manually scan
through the output for discrepancies and errors – I had very little trouble with Rainbow
Passage text readings (this example is pretty typical, and a very clean result), but
spontaneous speech with disfluencies, filler, etc. was more likely to spark some oddities.
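When post-processing the phone tier, it can help to separate each ARPAbet label from its stress digit. In ARPAbet as used by CMUdict-style dictionaries, vowels carry 0 (unstressed), 1 (primary stress), or 2 (secondary stress), and consonants carry no digit. The sketch below assumes you already have the labels as plain strings; split_stress is a hypothetical helper, and reading the TextGrid itself is left out.

```python
# Meaning of the stress digits on ARPAbet vowels.
STRESS_NAMES = {0: "unstressed", 1: "primary", 2: "secondary"}


def split_stress(label):
    """Split a label such as 'EY1' into (phone, stress); stress is None if no digit."""
    if label and label[-1] in "012":
        return label[:-1], int(label[-1])
    return label, None


# Demo on the phones of "rainbow".
demo_labels = ["R", "EY1", "N", "B", "OW2"]
demo_split = [split_stress(label) for label in demo_labels]
for phone, stress in demo_split:
    print(phone, STRESS_NAMES.get(stress, "no stress mark"))
```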
Thou hast aligned successfully! Now go forth, and make use of Praat scripts!