Speech Understanding Content
Contact
Email: [email protected]
Lecture Contents
No. | Topic/Activity | Reading | Assignments Due
No. 1
Topic/Activity: Praat
Reading: https://www.researchgate.net/publication/270819326_PRAAT_--_Short_Tutorial_--_An_introduction, pages 1-12
Assignments Due: Install Praat. Use it to record yourself saying a sentence with 5-10 words. Use Praat to label each word (begin time, end time, word). Capture a screenshot that shows the spectrogram and the labels (note: to show the spectrogram, you may need to zoom in, so that you are only showing a maximum of ten seconds of audio). Submit the screenshot. Due: by the start of the second lecture.
No. 2
Topic/Activity: Setting up Python, Anaconda, and Spyder. Scalar variables.
Reading: Chapter 1 and Section 2.1 (Variables and Values)
Assignments Due: Problems 1.1-1.3, 2.1-2.8. Due: by the start of the third lecture.
No. 3
Topic/Activity: Python loops, functions, modules, lists, dicts, and tuples
Reading: Remainder of chapter 2
Assignments Due: Problems 2.9-2.22. Due: start of lecture 4.
No. 4
Topic/Activity: Numpy, Matplotlib, and PyAudio
Reading: the first part of Section 3.1 (about PyAudio), and https://people.csail.mit.edu/hubert/pyaudio/, https://numpy.org/doc/stable/user/absolute_beginners.html, and https://matplotlib.org/stable/users/getting_started/
Assignments Due: Use np.sin to create sine waves, one second long, at 22050 samples per second, at three different frequencies: 220Hz, 440Hz, and 880Hz. Use PyAudio to play them back. Use matplotlib.pyplot.plot to plot just the first 20ms (441 samples) from each waveform. Use fig.savefig to save these three figures as PNG files. Now use np.abs(np.fft.fft()) to find the absolute value of the Fourier transform of each sine wave, and then plot it using matplotlib.pyplot.plot. Figure out what values you should give to the X-axis in matplotlib.pyplot.plot so that the Fourier transform of the 220Hz tone has a peak at 220, the Fourier transform of the 440Hz tone has a peak at 440, and the Fourier transform of the 880Hz tone has a peak at 880. Once you have the X-axis correctly labeled for each plot, use fig.savefig to save these three PNG files, and submit these spectral plots, along with the previous three waveform plots. Due: start of lecture 5.
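
The relationship between FFT bins and frequencies in Hz, which the lecture 4 assignment asks about, can be sketched as follows. This is a minimal illustration, not a model answer: the sample rate (8000) and tone frequency (1000 Hz) are illustrative values chosen here, not the ones in the assignment, and the variable names are hypothetical. It uses only NumPy, so playback and plotting are left out.

```python
import numpy as np

sample_rate = 8000   # samples per second (illustrative, not the assignment's value)
duration = 1.0       # seconds
frequency = 1000.0   # Hz (illustrative tone)

# Time stamp of every sample: 0, 1/sample_rate, 2/sample_rate, ...
n_samples = int(round(sample_rate * duration))
t = np.arange(n_samples) / sample_rate
wave = np.sin(2 * np.pi * frequency * t)

# The first 20 ms contains 0.020 * sample_rate samples.
first_20ms = wave[: int(round(0.020 * sample_rate))]

# Magnitude spectrum. Bin k of an N-point FFT corresponds to
# k * sample_rate / N Hz.
magnitude = np.abs(np.fft.fft(wave))
freq_axis = np.arange(n_samples) * sample_rate / n_samples

# Search only the first half (0 Hz up to the Nyquist frequency);
# for a real-valued signal the second half mirrors the first.
half = n_samples // 2
peak_hz = freq_axis[np.argmax(magnitude[:half])]
print(peak_hz)
```

Passing `freq_axis[:half]` and `magnitude[:half]` as the X and Y arguments of matplotlib.pyplot.plot would then place the peak at the tone's frequency.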
No. 5
Topic/Activity: Do-it-yourself speech synthesis using Numpy and Librosa
Reading: https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520
Assignments Due: Use Praat to record yourself saying "aaa". Use librosa.load to load the waveform file into Python. Use matplotlib.pyplot.plot to plot the entire vowel (one plot), and then to make a second plot containing just 100ms (2205 samples) and a third plot containing just 20ms (441 samples) from the loudest part of the vowel. Use fig.savefig to save these three plots as PNG files. Now use np.abs(np.fft.fft()) to identify the frequencies and amplitudes of five different sine waves that you can add together to make a waveform that sounds like this vowel. Add together those five sine waves, with those five different frequencies and amplitudes, with a length of about 0.5 seconds. Play the waveform back. Does it sound like "aaa"? Use matplotlib.pyplot.plot to plot a 100ms section of your synthetic vowel, and hand in this fourth plot, together with the other three plots. Due: start of lecture 6.
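
The peak-picking and additive-synthesis idea in the lecture 5 assignment can be sketched roughly as below. This is one possible approach under stated assumptions, not the course's reference solution: the helper names `strongest_sinusoids` and `additive_synthesis` are hypothetical, the "vowel" is a synthetic stand-in rather than a recording, and np.fft.rfft is used in place of np.fft.fft so that only non-negative frequencies are returned.

```python
import numpy as np

def strongest_sinusoids(signal, sample_rate, count=5):
    """Return (frequencies_hz, amplitudes) of the `count` tallest spectral peaks."""
    n = len(signal)
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # A bin is a local peak if it is larger than both of its neighbours.
    interior = magnitude[1:-1]
    is_peak = (interior > magnitude[:-2]) & (interior > magnitude[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    # Keep the `count` tallest peaks, then order them by frequency.
    top = np.sort(peak_bins[np.argsort(magnitude[peak_bins])[-count:]])
    # For a sine that completes a whole number of cycles, |FFT| = amplitude * n / 2.
    return freqs[top], 2.0 * magnitude[top] / n

def additive_synthesis(freqs_hz, amplitudes, sample_rate, duration):
    """Sum sine waves with the given frequencies and amplitudes."""
    t = np.arange(int(round(sample_rate * duration))) / sample_rate
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs_hz, amplitudes))

# Hypothetical stand-in for a recorded vowel: five harmonics of 110 Hz.
sample_rate = 22050
true_freqs = [110.0, 220.0, 330.0, 440.0, 550.0]
true_amps = [1.0, 0.6, 0.8, 0.3, 0.5]
vowel = additive_synthesis(true_freqs, true_amps, sample_rate, 1.0)

found_freqs, found_amps = strongest_sinusoids(vowel, sample_rate)
synthetic = additive_synthesis(found_freqs, found_amps, sample_rate, 0.5)
print(found_freqs, found_amps)
```

On a real recording the peaks are broader and the amplitude estimates are only approximate, so the recovered values should be treated as a starting point to adjust by ear.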
No. 6
Topic/Activity: Do-it-yourself speech recognition using Numpy and Librosa
Reading: https://librosa.org/doc/latest/tutorial.html, https://librosa.org/doc/latest/generated/librosa.display.specshow.html, https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html
Assignments Due: Record six waveforms: "aaa", "eee", and "ooo", twice each. Use librosa to create melspectrogram image plots of all six vowels, give them appropriate titles, and turn them in. Now use np.average to average each of these six melspectrograms along the time axis, creating six spectral vectors. Use matplotlib.pyplot.plot to plot all six of these vectors on the same axes, and use matplotlib.pyplot.legend to insert a legend into the plot, showing which is which. Use fig.savefig to save this file as a PNG, and turn it in. You should see that the two "aaa" vowels are similar to each other, the two "eee" vowels are similar to each other, and the two "ooo" vowels are similar to each other. Due: start of lecture 7.
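
The time-averaging step in the lecture 6 assignment can be illustrated with the sketch below. It assumes each melspectrogram is a 2-D array with mel bands along axis 0 and time frames along axis 1, which is the layout librosa.feature.melspectrogram returns for a mono signal. The arrays here are hypothetical random stand-ins rather than real recordings, and the Euclidean distance at the end is an added illustration of "similar", not something the assignment requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mels, n_frames = 40, 50

def fake_melspectrogram(center_band):
    """Hypothetical stand-in: energy concentrated near one mel band, plus noise."""
    bands = np.arange(n_mels)
    shape = np.exp(-0.5 * ((bands - center_band) / 3.0) ** 2)
    noise = 0.05 * rng.random((n_mels, n_frames))
    return shape[:, np.newaxis] + noise

recordings = {
    "aaa 1": fake_melspectrogram(10), "aaa 2": fake_melspectrogram(10),
    "eee 1": fake_melspectrogram(20), "eee 2": fake_melspectrogram(20),
    "ooo 1": fake_melspectrogram(30), "ooo 2": fake_melspectrogram(30),
}

# Average over the time axis (axis=1), leaving one value per mel band.
vectors = {name: np.average(S, axis=1) for name, S in recordings.items()}

def distance(a, b):
    """Euclidean distance between two averaged spectral vectors."""
    return float(np.linalg.norm(vectors[a] - vectors[b]))

print(distance("aaa 1", "aaa 2"), distance("aaa 1", "eee 1"))
```

Each vector in `vectors` could then be drawn with matplotlib.pyplot.plot(vector, label=name), followed by a call to matplotlib.pyplot.legend().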
No. 7
Topic/Activity: The SpeechRecognition module, part 1
Reading: Sections 3.1 (Install), 3.2 (Test), and 3.3 (Voice-Controlled Web Search)
Assignments Due: Problem 3.1. Due: lecture 8.
No. 8
Topic/Activity: The SpeechRecognition module, part 2
Reading: Sections 3.4 (Open Files), 3.5 (Local Module)
Assignments Due: Problems 3.2, 3.3. Due: lecture 9.
No. 10
Topic/Activity: Speech: Guess the Number
Reading: Sections 5.1 (Local Package) and 5.2 (Guess the Number Game)
Assignments Due: Problem 5.1. Due: lecture 11.
No. 11
Topic/Activity: Web Scraping
Reading: Sections 6.1 (Primer on Web Scraping) and 6.2 (Scrape Live Web Pages)
Assignments Due: Problems 6.1, 6.2. Due: lecture 12.
No. 12
Topic/Activity: Voice-Activated Podcasts
Reading: Sections 6.3 (Voice-Activated Podcasts), 6.4 (Radio), and 6.5 (Videos)
Assignments Due: Problem 6.3. Due: lecture 13.
No. 13
Topic/Activity: Personal Assistant
Reading: Sections 7.1 (Overview) through 7.5 (Tell a Joke)
Assignments Due: Problems 7.1, 7.2. Due: lecture 14.
No. 14
Topic/Activity: World Languages
Reading: Chapter 16
Assignments Due: Create a modified personal assistant that uses some other language, instead of English (you can choose Japanese, Chinese, or any other language that you wish). Hand in your code. Due: lecture 15.
No. 15
Topic/Activity: World Languages
Reading: None
Assignments Due: None