Speech Understanding Content

Uploaded by

Chamod Kanishka
Copyright: © All Rights Reserved

Office Hours: Reserve via email.

Contact
Email: [email protected]

Course Description: This course introduces the basic signal processing and artificial intelligence concepts that underlie modern speech understanding applications, then shows you how to create your own speech understanding applications. Fundamental concepts to be introduced include waveform segmentation and labeling, sampling frequency, frequency-domain analysis, spectrograms, and mel spectrograms. Open-source speech recognition and speech synthesis toolkits will be introduced, along with methods that use these toolkits to create a voice-activated web browser and a personal assistant.

Goals: By the end of this course, students should know how to segment a waveform in the time domain, how to identify the most important frequencies present in a waveform, how to use open-source toolkits to perform speech recognition and speech synthesis, and how to create their own voice-activated web browser and personal assistant.

Audience: Students interested in developing artificial intelligence applications using open-source toolkits.

Topics: Speech analysis, speech synthesis, speech recognition, internationalization.

Prerequisite/Required knowledge: Students must be able to program in at least one object-oriented programming language (Python, Ruby, C++, Java, etc.). The course will be taught primarily in Python, so students with a background in that language will have an advantage.

Textbooks: Make Python Talk: Build Apps with Voice Control and Speech Recognition, by Mark Liu, 2021

Educational Media: None

References: Auxiliary readings will be assigned from several web tutorials, as described under each of the lectures.
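One of the course goals is to identify the most important frequencies present in a waveform. The sketch below is only an illustration of that idea, not course material: it computes a naive discrete Fourier transform with the Python standard library and reports the strongest non-DC frequency. The function name `dominant_frequency` and the two-tone test signal are hypothetical, chosen for this example; in practice an FFT routine from a numerical library would be far faster.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC DFT bin.

    Naive O(n^2) DFT, written for clarity rather than speed.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # For a real-valued signal only bins 1..n//2 carry distinct information.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mag = abs(coeff)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    # Bin k corresponds to k * sample_rate / n Hz.
    return best_bin * sample_rate / n


# Hypothetical test signal: a 440 Hz tone plus a weaker 1000 Hz tone,
# sampled at 8000 Hz for 400 samples (20 Hz per DFT bin).
sample_rate = 8000
num_samples = 400
signal = [
    math.sin(2 * math.pi * 440 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 1000 * t / sample_rate)
    for t in range(num_samples)
]

peak = dominant_frequency(signal, sample_rate)
print(f"Strongest frequency: {peak} Hz")
```

The frequency resolution of this analysis is the sampling frequency divided by the number of samples, which is why the sampling frequency appears among the fundamental concepts in the course description.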

Lecture Contents

No. | Topic/Activity | Reading
1 | Praat | https://www.researchgate.net/publication/270819326_PRAAT_--_Short_Tutorial_--_An_introduction, pages 1-12
2 | Setting up Python, Anaconda, and Spyder. Scalar variables. | Chapter 1 and Section 2.1 (Variables and Values)
3 | Python loops, functions, modules, lists, dicts, and tuples | Remainder of Chapter 2
4 | NumPy, Matplotlib, and PyAudio | The first part of Section 3.1 (about PyAudio), and https://people.csail.mit.edu/hubert/pyaudio/, https://numpy.org/doc/stable/user/absolute_beginners.html, and https://matplotlib.org/stable/users/getting_started/
5 | Do-it-yourself speech synthesis using NumPy and Librosa | https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520
6 | Do-it-yourself speech recognition using NumPy and Librosa | https://librosa.org/doc/latest/tutorial.html, https://librosa.org/doc/latest/generated/librosa.display.specshow.html, https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html
7 | The SpeechRecognition module, part 1 | Sections 3.1 (Install), 3.2 (Test), and 3.3 (Voice-Controlled Web Search)
8 | The SpeechRecognition module, part 2 | Sections 3.4 (Open Files), 3.5 (Local Module)
9 | Speech Synthesis | Chapter 4
10 | Speech: Guess the Number | Sections 5.1 (Local Package) and 5.2 (Guess the Number Game)
11 | Web Scraping | Sections 6.1 (Primer on Web Scraping) and 6.2 (Scrape Live Web Pages)
12 | Voice-Activated Podcasts | Sections 6.3 (Voice-Activated Podcasts), 6.4 (Radio), and 6.5 (Videos)
13 | Personal Assistant | Sections 7.1 (Overview) through 7.5 (Tell a Joke)
14 | World Languages | Chapter 16
15 | World Languages | None
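Lectures 5 and 6 above work with spectrograms and mel spectrograms. As a rough illustration of what the mel scale does, the sketch below converts between hertz and mels using the widely cited formula m = 2595 log10(1 + f/700) and lists band centers that are equally spaced in mels. This is an assumption made for illustration: the course readings and Librosa may use a different mel variant by default, and the helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_centers`) are hypothetical, not taken from any toolkit.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(num_bands, f_min, f_max):
    """Center frequencies in Hz of num_bands bands equally spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # num_bands centers split the mel range into num_bands + 1 equal steps.
    step = (m_max - m_min) / (num_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(num_bands)]


# Hypothetical settings: 8 bands between 0 Hz and 4000 Hz.
centers = mel_band_centers(num_bands=8, f_min=0.0, f_max=4000.0)
for index, center in enumerate(centers, start=1):
    print(f"band {index}: {center:.1f} Hz")
```

The printed centers sit close together at low frequencies and spread out at high frequencies, which is the property a mel spectrogram uses to allocate more resolution to the lower part of the spectrum.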