Montreal Forced Aligner: Notes and Instructions

I installed the MFA via Miniconda, a minimal installer for conda (a Python/R package-management distribution), which kept things nice and neat. This would presumably all work with a plain Python 3 setup or the like as well.

Miniconda available here:


https://docs.conda.io/en/latest/miniconda.html

Detailed instructions on MFA installation may be found here:


https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html

Once conda/Anaconda/Miniconda/Python/what-have-you is installed, open the prompt:

conda create -n aligner -c conda-forge montreal-forced-aligner

This creates a new conda environment (named "aligner") and installs the MFA inside it.


There are other installation methods if this does not work (e.g. via pip) – see the site above.

Enter the new environment:

conda activate aligner

Now to install other dependencies/tools – use the first PyTorch line for a CPU-only setup, or the second if you have an NVIDIA GPU with CUDA, then add speechbrain:

conda install pytorch torchvision torchaudio cpuonly -c pytorch

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

pip install speechbrain
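
To quickly confirm that PyTorch installed correctly (and, if relevant, that it can see your GPU), you can run a couple of lines in the Python interpreter from inside the aligner environment:

import torch                      # should import without errors
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only if the CUDA build found a usable GPU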


Great, the MFA is installed.
It works by taking a dictionary and an acoustic model and applying them to your corpus.

There are methods of training your own model, and a ton of dictionaries for various languages already available, but here we'll install the ones for US English. These use the ARPAbet (essentially ASCII-friendly IPA for US English) – there are ones available in X-SAMPA and other formats as well.
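
For a sense of what ARPAbet looks like: a dictionary entry maps a word to a sequence of phone symbols, with a digit on each vowel marking stress. For example, "cat" is K AE1 T, and "pronunciation" is roughly P R OW0 N AH2 N S IY0 EY1 SH AH0 N.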

Re-ensure you are in the aligner environment:

conda activate aligner

And install the dictionary and model:

mfa model download acoustic english_us_arpa


mfa model download dictionary english_us_arpa

Test this by fetching information about the model:

mfa model inspect acoustic english_us_arpa

Great, the model and dictionary are ready. You can specify a directory to install these (check the website for instructions there) – I just let them sit in the default folder, Documents/MFA. You can track down the dictionary's wordlist in that folder and check, if you like, whether a given token is included.
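
If you'd rather check from Python, a minimal sketch like the one below works – note that the dictionary path is an assumption based on my default install location (Documents/MFA), so adjust it to wherever your copy of english_us_arpa.dict actually lives:

from pathlib import Path

# Assumed default location of the downloaded dictionary (adjust if yours differs).
dict_path = Path.home() / "Documents" / "MFA" / "pretrained_models" / "dictionary" / "english_us_arpa.dict"

# Each non-empty line starts with a word, followed by its ARPAbet pronunciation.
words = {line.split()[0] for line in dict_path.read_text(encoding="utf-8").splitlines() if line.strip()}

for token in ["rainbow", "pavement", "zigzag"]:
    print(token, token in words)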

Prepare your dataset. This should be a single folder with paired .wav and .txt files, where the
latter are transcriptions of the former. They should have the exact same filename, except for
the extension.
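
For example (with made-up filenames), the input folder might contain:

speaker01_rainbow.wav
speaker01_rainbow.txt
speaker02_interview.wav
speaker02_interview.txt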

Transcription convention may depend on the dictionary you're using – for this one, capitalization of letters does not matter. Nor does spacing, except between words, though I like to use line breaks between clauses and for readability.
This dictionary also has a few "words" for non-speech sounds, such as <unk> for unknown or unintelligible material.

In my experience it's not great at identifying/aligning laughter; when in doubt, go with <unk>.
A transcription will look something like this:
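
For instance, a transcription of the opening of the Rainbow Passage (a made-up sample – your text will obviously differ) could simply read:

when the sunlight strikes raindrops in the air
they act as a prism and form a rainbow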

IMPORTANT NOTE: Ensure the encoding for this .txt file is UTF-8.
The aligner will throw a rather unhelpful-to-diagnose error if it's in ANSI.
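
If you're not sure how your files are encoded, or you have a batch saved as ANSI, a small sketch like this (assuming Windows-1252/ANSI as the fallback source encoding) can re-save everything in the input folder as UTF-8:

from pathlib import Path

input_dir = Path(r"C:\Users\graye\Desktop\MFAProcessing\Input")  # adjust to your input folder

for txt in input_dir.glob("*.txt"):
    raw = txt.read_bytes()
    try:
        text = raw.decode("utf-8")          # already UTF-8? keep the content as is
    except UnicodeDecodeError:
        text = raw.decode("cp1252")         # otherwise assume ANSI (Windows-1252)
    txt.write_text(text, encoding="utf-8")  # re-save as UTF-8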

Put these in the same folder as your audio files. You can run more than one at a time; if any run into a critical problem, the aligner will skip over them.

My input path was: C:\Users\graye\Desktop\MFAProcessing\Input

My output path was: C:\Users\graye\Desktop\MFAProcessing\Output

(Yes, my folder naming methods are very creative.)


Again, make sure you're in the aligner environment:

conda activate aligner

Validate your dataset to check for errors, replacing the directory as relevant:

mfa validate C:\Users\graye\Desktop\MFAProcessing\Input english_us_arpa english_us_arpa

Those last two arguments are the dictionary and the acoustic model.


Assuming there are no errors, align with the following.

mfa align C:\Users\graye\Desktop\MFAProcessing\Input english_us_arpa english_us_arpa C:\Users\graye\Desktop\MFAProcessing\Output --clean

Do not forget the --clean at the end!

This will take a little while! It does provide helpful progress bars.

The results should be .TextGrid files in your output folder, with the same filenames.

These can then be loaded into Praat alongside their corresponding audio file.
Behold! It's so beautiful.

These TextGrids are formatted with two tiers – the first is word-level, the second is phone-level in ARPAbet (with numeric stress markers on the vowels). You should manually scan through the output for discrepancies and errors – I had very little trouble with Rainbow Passage text readings (this example is pretty typical, and a very clean result), but spontaneous speech with disfluencies, filler, etc. was more likely to spark some oddities.
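
If you want to pull the intervals into Python rather than (or before) Praat, one option is the third-party textgrid package (pip install textgrid). A minimal sketch, assuming that package and a made-up output filename:

import textgrid  # third-party package: pip install textgrid

# Hypothetical output file – substitute one of your own .TextGrid files.
tg = textgrid.TextGrid.fromFile(r"C:\Users\graye\Desktop\MFAProcessing\Output\speaker01_rainbow.TextGrid")

for tier in tg:                      # tier 1 = words, tier 2 = phones
    print(tier.name)
    for interval in tier:
        if interval.mark:            # skip empty (silent) intervals
            print(f"  {interval.minTime:.3f}-{interval.maxTime:.3f}  {interval.mark}")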

Thou hast aligned successfully! Now go forth, and make use of Praat scripts!
