Lecture 2

Uploaded by Rakshith Kamath


ELEN 6820

Speech and audio signal processing

Instructor: Nima Mesgarani (nm2764)
3 credits
TA: Yi Luo (yl3364)
Office hours: TBD
Course overview
• Brief history of speech recognition
• Discrete Signal Processing (DSP) overview
• Pattern recognition and deep learning overview
• Speech signal production
• Speech signal representation
• Auditory Scene Analysis, speech enhancement and separation
• Speech processing in the auditory system
• Acoustic modeling
• Sequence recognition and Hidden Markov Models
• Language models
• Music signal processing
Homeworks
• HW1: Discrete signal processing (written) (W2)
• HW2: Neural networks and voice activity detection (programming) (W3&4)
• HW3: Speech signal production and representation (written) (W5)
• HW4: Speech enhancement and separation (programming) (W6)
• HW5: Acoustic event detection and speaker identification (programming) (W7&8)
• HW6: Phoneme recognition and automatic speech recognition (programming) (W9&10)
• Final project (programming) (W11-13)
Week | Topic | HW (W = written, P = programming)
1 | Introduction and history | -
2 | Discrete signal processing | DSP (W)
3 | Machine learning 1 | Neural network and VAD (P)
4 | Machine learning 2 | -
5 | Speech signal production | Speech production (W)
6 | Speech signal representation | Speech enhancement (P)
7 | Speech enhancement and separation | -
8 | Human speech perception | Acoustic event detection (P)
9 | Acoustic modeling | -
10 | Sequence modeling and HMMs | Phoneme recognition and ASR (P)
11 | Language modeling | -
12 | Automatic speech recognition | Project
13 | Music signal processing | -
Course evaluation

• Two written homework assignments (20%)
• Four programming homework assignments (60%)
• Final project (20%)

• Late submission: 10% penalty per day
Final project

• Preferably choose the predefined course project

• Alternatively, define a project that is similar in scope and workload, in discussion with me and Yi
How to install Python with a graphical interface on Mac/Windows/Linux
Install Jupyter Notebook using Anaconda and conda
Download it from the following address and follow the instructions:
www.anaconda.com/products/individual

• Anaconda will simultaneously install Python and Jupyter Notebook, as well as some necessary packages (e.g. numpy, scipy, etc.)
• On Mac OS you can use either the graphical installer or the command-line installer
• If you have Windows 10 and want to use bash commands, it is highly recommended that you enable the Windows Subsystem for Linux bash environment and install a Linux version of Anaconda on it (using the command-line installer)
• After installing Anaconda, you can run Jupyter Notebook either from the Anaconda app or by running the command jupyter notebook in a terminal
• You can also use other environments for interacting with Python, but the one recommended for this course is Jupyter Notebook, especially if you want to run your code on a server (e.g. for TensorFlow)
• More information is available at: http://jupyter.readthedocs.io/en/latest/index.html
• More instructions and tutorials for getting started with Python will be given in the next session
• The assignments will be checked in Jupyter Notebook using Python 3
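Since the assignments are checked in Jupyter Notebook under Python 3, a quick sanity check of a fresh Anaconda install can save time. This is a minimal sketch of one way to do it, not a course requirement: `check_environment` is a hypothetical helper that reports the Python version and the version of each package it manages to import (`notebook` is the package that provides Jupyter Notebook; the package list is an assumption).

```python
import importlib
import sys


def check_environment(packages=("numpy", "scipy", "notebook")):
    """Report the Python version and each package's version (None if it cannot be imported)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    assert sys.version_info >= (3,), "the assignments assume Python 3"
    for name, version in check_environment().items():
        print(f"{name:10s} {version if version is not None else 'NOT INSTALLED'}")
```

Running it in a notebook cell (or with `python` in a terminal) should list a version for every package; a `NOT INSTALLED` line points to a package to add with conda.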
Signal processing background (chapter 2)

Speech communication
[Figure: speech communication chain from production to perception (ear drum)]
Cocktail party problem, Cherry (1953)
Discrete Signal Processing
• Discrete-time signals and systems
• Discrete-time Fourier transform, z-transform
• Digital filters, IIR and FIR
• Sampling theorem, changing the sampling rate
• Emphasis on intuition
Discrete-time Signals and Systems
• Speech signal: a continuously varying pattern, represented as a function of a continuous variable t that represents time.
• Discrete signal: x[n] = x_a(nT), where T = 1/Fs
• Telephone-bandwidth speech: Fs = 6.4 kHz
• Wide-band speech: Fs = 16 kHz
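The relation x[n] = x_a(nT) with T = 1/Fs can be made concrete in a few lines. This is a minimal sketch, assuming a hypothetical 440 Hz cosine as the stand-in for the continuous-time signal x_a(t) and using the two sampling rates listed above; `sample` and `tone` are illustrative names, not part of the lecture.

```python
import numpy as np


def sample(xa, fs, duration):
    """Return x[n] = xa(n*T), T = 1/fs, for n = 0, ..., round(duration*fs) - 1."""
    T = 1.0 / fs
    n = np.arange(int(round(duration * fs)))
    return xa(n * T)


def tone(t):
    """Hypothetical stand-in for x_a(t): a 440 Hz cosine."""
    return np.cos(2 * np.pi * 440.0 * t)


x_tel = sample(tone, fs=6400, duration=0.02)    # telephone-bandwidth rate from the slide
x_wide = sample(tone, fs=16000, duration=0.02)  # wide-band rate from the slide
print(len(x_tel), len(x_wide))  # 128 320
```

The same 20 ms of signal costs 2.5 times as many samples at the wide-band rate; what is gained is the bandwidth above 3.2 kHz (half of 6.4 kHz).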
A few basics

• Unit impulse function, unit step function, exponential sequence

• Convolution
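The basic sequences and the convolution sum y[n] = Σ_k x[k] h[n − k] can be sketched directly in NumPy. This is an illustrative sketch: the index range, a = 0.8 and the helper name `convolve` are assumptions, and `np.convolve` computes the same sum for finite sequences that start at n = 0.

```python
import numpy as np

n = np.arange(-5, 11)                                   # indices n = -5, ..., 10
delta = (n == 0).astype(float)                          # unit impulse  δ[n]
step = (n >= 0).astype(float)                           # unit step     u[n]
a = 0.8
expo = np.where(n >= 0, a ** n.astype(float), 0.0)      # exponential   a^n u[n]


def convolve(x, h):
    """Direct evaluation of y[n] = sum_k x[k] h[n - k] for finite sequences that start at n = 0."""
    y = np.zeros(len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        y[k:k + len(h)] += xk * h       # each input sample adds a scaled copy of h shifted by k
    return y


x = np.array([1.0, 2.0, 3.0])
print(convolve(x, np.array([0.0, 0.0, 1.0])))   # convolving with δ[n-2] delays x by 2: [0. 0. 1. 2. 3.]
print(convolve(np.ones(3), np.ones(3)))         # box * box = triangle: [1. 2. 3. 2. 1.]
```

The "shifted copy of h" loop is exactly the superposition argument behind convolution: x is a sum of scaled, delayed impulses, so the output is the same sum of scaled, delayed impulse responses.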
Transformations of Signals and Systems

• Fourier Transform

• z-Transform
The Continuous-Time Fourier Transform

• What did Fourier show?

• What's the big deal? (1822)

• Decomposing signals into fast and slow components

• Importance of sine functions for linear systems
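One way to see why sinusoids matter for linear systems: a complex exponential e^{jω0 n} is an eigenfunction of every LTI system, so it leaves the system multiplied by the frequency response H(e^{jω0}) and otherwise unchanged. The check below is stated in discrete time, where the rest of the lecture works; the 3-tap filter and ω0 = 0.3π are arbitrary illustrative choices.

```python
import numpy as np

h = np.array([0.25, 0.5, 0.25])             # illustrative 3-tap smoothing filter (assumption)
omega0 = 0.3 * np.pi
n = np.arange(200)
x = np.exp(1j * omega0 * n)                 # complex exponential input

y = np.convolve(x, h)[: len(n)]             # causal LTI filtering
H = np.sum(h * np.exp(-1j * omega0 * np.arange(len(h))))   # H(e^{jω0})

# Once the filter fully overlaps the input (n >= len(h) - 1), the output is H times the input.
steady = slice(len(h) - 1, len(n))
assert np.allclose(y[steady], H * x[steady])

# For a real cosine only the amplitude (|H|) and the phase (angle of H) change.
y_cos = np.convolve(np.cos(omega0 * n), h)[: len(n)]
assert np.allclose(y_cos[steady], np.abs(H) * np.cos(omega0 * n[steady] + np.angle(H)))
```

The first len(h) − 1 samples are a start-up transient caused by switching the input on at n = 0; after that, the frequency of the sinusoid is preserved exactly.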
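The z-Transform
• A powerful tool for analyzing linear systems described by difference equations

• Definition: $X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n}$

• Inverse z-Transform: $x[n] = \frac{1}{2\pi j} \oint_C X(z)\, z^{n-1}\, dz$, where C is a counterclockwise closed contour in the region of convergence that encircles the origin

• Examples: delayed unit impulse, box pulse, exponential

• Properties of the z-Transform: linearity, shift, exponential weighting, linear weighting, convolution, multiplication of sequences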
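The example transforms and several properties can be checked numerically by evaluating X(z) = Σ x[n] z^{-n} at one test point inside the region of convergence. This is a sketch under stated assumptions: `ztransform` is a hypothetical helper for finite sequences, the test point z = 1.3e^{j0.4} and a = 0.8 are arbitrary, and the exponential a^n u[n] is truncated to 200 terms, which is accurate only because |a/z| < 1.

```python
import numpy as np


def ztransform(x, z, n0=0):
    """Evaluate X(z) = sum_n x[n] z^(-n) for a finite sequence whose first sample is at index n0."""
    n = n0 + np.arange(len(x))
    return np.sum(np.asarray(x) * z ** (-n.astype(float)))


z = 1.3 * np.exp(0.4j)          # arbitrary test point with |z| > a
a, N = 0.8, 8

# Delayed unit impulse δ[n - 3]  <->  z^-3
assert np.isclose(ztransform([1.0], z, n0=3), z ** -3)

# Box pulse u[n] - u[n - N]  <->  (1 - z^-N) / (1 - z^-1)
assert np.isclose(ztransform(np.ones(N), z), (1 - z ** -N) / (1 - z ** -1))

# Exponential a^n u[n]  <->  1 / (1 - a z^-1), |z| > |a| (truncated sum)
n = np.arange(200)
assert np.isclose(ztransform(a ** n, z), 1 / (1 - a / z))

# Exponential weighting: a^n x[n]  <->  X(z / a)
x = np.array([1.0, -2.0, 0.5, 3.0])
k = np.arange(len(x))
assert np.isclose(ztransform(a ** k * x, z), ztransform(x, z / a))

# Convolution: x[n] * h[n]  <->  X(z) H(z)
h = np.array([0.5, 0.25])
assert np.isclose(ztransform(np.convolve(x, h), z), ztransform(x, z) * ztransform(h, z))
```

A single test point does not prove an identity, but a mismatch would immediately reveal a wrong sign or index.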
The Discrete-Time Fourier Transform
• Definition:
$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n}$$
• Inverse DTFT:
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$
• Sufficient condition for convergence:
$$\sum_{n=-\infty}^{+\infty} \left| x[n] \right| < +\infty$$
• Although x[n] is discrete, X(e^{jω}) is continuous and periodic with period 2π.
• Convolution/multiplication duality:
$$y[n] = x[n] * h[n] \;\Longleftrightarrow\; Y(e^{j\omega}) = X(e^{j\omega})\, H(e^{j\omega})$$
$$y[n] = x[n]\, w[n] \;\Longleftrightarrow\; Y(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta})\, W(e^{j(\omega-\theta)})\, d\theta$$
• DTFT of a cosine signal
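For a finite-length sequence the DTFT sum has finitely many terms, so it can be evaluated on any frequency grid. The sketch below checks the properties listed above: 2π-periodicity, convolution becoming multiplication, the inverse DTFT (a 64-point Riemann sum of the integral is exact for a sequence of length at most 64), and a windowed cosine peaking near ±ω0. The helper `dtft`, the random test sequences and ω0 = 0.2π are assumptions for illustration.

```python
import numpy as np


def dtft(x, omega, n0=0):
    """Evaluate X(e^{jω}) = sum_n x[n] e^{-jωn} at each ω in `omega` (finite x starting at index n0)."""
    n = n0 + np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ np.asarray(x)


rng = np.random.default_rng(0)
x = rng.standard_normal(10)
h = rng.standard_normal(5)
omega = np.linspace(-np.pi, np.pi, 257)

# Periodic with period 2π
assert np.allclose(dtft(x, omega), dtft(x, omega + 2 * np.pi))

# Convolution in time <-> multiplication in frequency
assert np.allclose(dtft(np.convolve(x, h), omega), dtft(x, omega) * dtft(h, omega))

# Inverse DTFT: (1/2π)∫ over [-π, π) approximated by the mean over M equally spaced frequencies
M = 64
grid = -np.pi + 2 * np.pi * np.arange(M) / M
X = dtft(x, grid)
x_rec = np.array([np.mean(X * np.exp(1j * grid * k)) for k in range(len(x))])
assert np.allclose(x_rec, x)

# A finite-length cosine has its DTFT peak near ω0 (and, symmetrically, near -ω0)
omega0 = 0.2 * np.pi
c = np.cos(omega0 * np.arange(256))
w_pos = np.linspace(0, np.pi, 2048)
peak = w_pos[np.argmax(np.abs(dtft(c, w_pos)))]
assert abs(peak - omega0) < 2 * np.pi / 256
```

The last check hints at the next topic: truncating the cosine to 256 samples turns its ideal line spectrum into peaks of finite width, which is the windowing effect discussed below.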
The Discrete Fourier Transform

• Sampling the DTFT: the Discrete Fourier Transform (DFT)

Practical implications

• Periodic signals, or finite-length sequences

• Which frequency does each DFT coefficient correspond to?

• Circular shift of x[n]

• Boundary conditions, importance of windowing
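Three of the practical points above can be checked directly with NumPy's FFT: the N-point DFT equals the DTFT sampled at ω_k = 2πk/N, bin k corresponds to f_k = k·Fs/N Hz (bins above N/2 represent negative frequencies), and a circular shift by m samples multiplies the DFT by e^{-j2πkm/N}. This is a minimal sketch; N = 512, m = 5 and the random test signal are arbitrary choices.

```python
import numpy as np

fs = 16000                        # wide-band sampling rate from the slides
N = 512
n = np.arange(N)
x = np.random.default_rng(1).standard_normal(N)

# 1) The DFT samples the DTFT at ω_k = 2πk/N.
X = np.fft.fft(x)
omega_k = 2 * np.pi * np.arange(N) / N
X_dtft = np.exp(-1j * np.outer(omega_k, n)) @ x
assert np.allclose(X, X_dtft)

# 2) Bin k <-> f_k = k * fs / N Hz (31.25 Hz spacing here); bins above N/2 are negative frequencies.
freqs = np.fft.fftfreq(N, d=1 / fs)
tone = np.cos(2 * np.pi * 1000 * n / fs)
peak_bin = np.argmax(np.abs(np.fft.fft(tone))[: N // 2])   # 1000 / 31.25 = bin 32

# 3) A circular shift by m multiplies the DFT by e^{-j2πkm/N}.
m = 5
k = np.arange(N)
assert np.allclose(np.fft.fft(np.roll(x, m)), X * np.exp(-2j * np.pi * k * m / N))
```

Point 3 is why the DFT implicitly treats x[n] as one period of a periodic signal: samples shifted off one end reappear at the other, which is the boundary problem that windowing addresses.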
Time-Dependent Fourier Transform

Create a finite-length sequence by windowing:

[Figure: the signal x[m] with the analysis windows w[50 − m], w[100 − m] and w[200 − m], i.e. the window positioned at n = 50, 100 and 200]

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{+\infty} w[n-m]\, x[m]\, e^{-j\omega m}$$

If n is fixed, then it can be shown that:

$$X_n(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, X(e^{j(\omega+\theta)})\, d\theta$$

The above equation is meaningful only if we assume that X(e^{jω}) represents the Fourier transform of a signal whose properties continue outside the window, or simply that the signal is not zero outside the window.

In order for X_n(e^{jω}) to correspond to X(e^{jω}), W(e^{jω}) must resemble an impulse.
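For a fixed n, X_n(e^{jω}) is simply the DTFT of the windowed segment w[n − m]x[m]. Assuming the window is nonzero only for 0 ≤ k ≤ N − 1, only the samples m = n − N + 1, ..., n contribute, and sampling at ω_k = 2πk/N reduces the sum to an FFT of the (time-reversed) window times the segment, multiplied by the phase factor e^{-jω_k(n−N+1)}. The sketch compares the direct sum with that FFT route; `stft_at`, the Hamming window, N = 64 and n = 300 are illustrative assumptions.

```python
import numpy as np


def stft_at(x, w, n, omega):
    """X_n(e^{jω}) = sum_m w[n - m] x[m] e^{-jωm}, with w[k] nonzero only for 0 <= k < len(w)."""
    N = len(w)
    m = np.arange(n - N + 1, n + 1)              # the only m for which w[n - m] can be nonzero
    m = m[(m >= 0) & (m < len(x))]               # stay inside the recorded signal
    segment = w[n - m] * x[m]
    return np.exp(-1j * np.outer(omega, m)) @ segment


rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
N = 64
w = np.hamming(N)
n = 300

# Sampled at ω_k = 2πk/N, X_n equals an FFT of the windowed segment times a linear-phase factor.
k = np.arange(N)
omega_k = 2 * np.pi * k / N
direct = stft_at(x, w, n, omega_k)
segment = w[::-1] * x[n - N + 1 : n + 1]        # w[n - m] runs backwards through the window
via_fft = np.exp(-1j * omega_k * (n - N + 1)) * np.fft.fft(segment)
assert np.allclose(direct, via_fft)
```

For a symmetric window such as the Hamming window the reversal `w[::-1]` changes nothing; it is kept so the identity also holds for asymmetric windows.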
Rectangular window

$$w[n] = 1, \quad 0 \le n \le N-1$$

6.345 Automatic Speech Recognition (2003) Speech Signal Representation 4

Hamming window

$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

Comparison of Windows
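The usual comparison is between main-lobe width and sidelobe level: the Hamming window roughly doubles the main-lobe width of the rectangular window but lowers the highest sidelobe from about −13 dB to about −43 dB. The sketch measures both quantities from a finely zero-padded FFT. `window_stats` is a hypothetical helper, N = 64 and nfft = 8192 are arbitrary, and finding the first null as "the first bin where |W| rises again" assumes the main lobe decreases monotonically, which holds for these two windows.

```python
import numpy as np


def rectangular(N):
    """w[n] = 1, 0 <= n <= N-1."""
    return np.ones(N)


def hamming(N):
    """w[n] = 0.54 - 0.46 cos(2πn / (N-1)), 0 <= n <= N-1 (same as np.hamming)."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))


def window_stats(w, nfft=8192):
    """Return (first spectral null in rad/sample, highest sidelobe level in dB relative to the peak)."""
    mag = np.abs(np.fft.rfft(w, nfft))                  # zero-padded to sample W(e^{jω}) finely
    mag_db = 20 * np.log10(mag / mag[0] + 1e-12)        # both windows peak at ω = 0
    first_null = int(np.argmax(np.diff(mag) > 0))      # first bin where |W| starts rising again
    return 2 * np.pi * first_null / nfft, mag_db[first_null:].max()


for name, w in [("rectangular", rectangular(64)), ("hamming", hamming(64))]:
    null, sidelobe = window_stats(w)
    print(f"{name:12s} first null {null:.4f} rad/sample, peak sidelobe {sidelobe:.1f} dB")
```

For the rectangular window the first null lands exactly at 2π/N; the Hamming window's null is about twice as far out, which is the price paid for the much lower sidelobes.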

Spectrogram

• Use a sliding window over the signal, and display the magnitude of the DFT at each step.

• Large vs. small window?

• Overlapping vs. non-overlapping windows?
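The sliding-window procedure above can be sketched in NumPy. The 25 ms Hamming window and 10 ms hop (so consecutive frames overlap) are common choices in speech processing but are assumptions here, as is rounding the DFT length up to a power of two; `spectrogram` is a hypothetical helper, and it assumes the signal is at least one window long.

```python
import numpy as np


def spectrogram(x, fs, win_ms=25.0, hop_ms=10.0):
    """Magnitude spectrogram: slide a Hamming window over x and take |DFT| of each frame.

    Returns (S, times, freqs), where S has shape (num_frames, nfft // 2 + 1).
    """
    win_len = int(round(win_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    nfft = int(2 ** np.ceil(np.log2(win_len)))       # next power of two >= window length
    w = np.hamming(win_len)
    num_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * w for i in range(num_frames)])
    S = np.abs(np.fft.rfft(frames, n=nfft, axis=1))  # one magnitude spectrum per frame
    times = (np.arange(num_frames) * hop + win_len / 2) / fs   # frame centres in seconds
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return S, times, freqs


fs = 16000
x = np.cos(2 * np.pi * 2000 * np.arange(fs) / fs)    # 1 s test tone at 2 kHz
S, times, freqs = spectrogram(x, fs)
print(S.shape, freqs[np.argmax(S[0])])               # (98, 257) 2000.0
```

For display, the magnitude is usually converted to dB (e.g. 20·log10 S) and plotted with time on the horizontal axis and frequency on the vertical axis.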
A Wideband Spectrogram

[Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]

A Narrowband Spectrogram

[Figure: narrowband spectrogram of the utterance "Two plus seven is less than ten"]

The DFT (window) length sets a tradeoff between temporal resolution and spectral resolution: a short window gives good temporal resolution (wideband), a long window gives good spectral resolution (narrowband).
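A small experiment shows the spectral side of the tradeoff. With a Hamming window of N samples, the main lobe is about 4·Fs/N Hz wide, so two tones 100 Hz apart are resolved by a long (narrowband) window but merge under a short (wideband) one; in exchange, the short window localizes events in time far better. The tone frequencies, the 5 ms and 64 ms window lengths, the −6 dB peak threshold and the helper `count_peaks` are illustrative assumptions.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1100 * t)   # two tones 100 Hz apart


def count_peaks(x, win_len, fs, band=(800.0, 1300.0), nfft=16384, floor_db=-6.0):
    """Count spectral peaks inside `band` within floor_db of the largest, for one Hamming-windowed frame."""
    frame = x[:win_len] * np.hamming(win_len)
    mag = np.abs(np.fft.rfft(frame, nfft))
    mag_db = 20 * np.log10(mag / mag.max() + 1e-12)
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    mid = mag_db[1:-1]
    is_peak = (mid > mag_db[:-2]) & (mid >= mag_db[2:]) & (mid > floor_db)   # local maxima
    peak_freqs = freqs[1:-1][is_peak]
    return int(np.sum((peak_freqs >= band[0]) & (peak_freqs <= band[1])))


narrow = count_peaks(x, win_len=1024, fs=fs)   # 64 ms window: main lobe about 62.5 Hz wide
wide = count_peaks(x, win_len=80, fs=fs)       # 5 ms window: main lobe about 800 Hz wide
print(narrow, wide)
```

The same reasoning explains the two spectrograms above: the long window resolves individual pitch harmonics, while the short window resolves individual pitch periods in time.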

Digital filters

• A digital filter is a discrete-time linear shift-invariant system

• Convolution equation: unit impulse response, transfer function, system function

• All systems of practical interest satisfy a linear constant-coefficient difference equation
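The linear constant-coefficient difference equation referred to above can be written as y[n] = Σ_{k=0}^{M} b_k x[n−k] − Σ_{k=1}^{N} a_k y[n−k] (with a_0 = 1). A direct, loop-based sketch follows, assuming zero initial conditions; `difference_equation` is a hypothetical name, but the (b, a) convention is the same one used by `scipy.signal.lfilter(b, a, x)`. With a = [1] the recursion disappears and the filter is FIR; any nonzero a_k (k ≥ 1) gives feedback and, in general, an IIR filter.

```python
import numpy as np


def difference_equation(b, a, x):
    """y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with coefficients normalized by a[0]."""
    a0 = float(a[0])
    b = np.asarray(b, dtype=float) / a0
    a = np.asarray(a, dtype=float) / a0
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):               # feed-forward (FIR) part
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):            # feedback (recursive) part
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


impulse = np.zeros(20)
impulse[0] = 1.0
h_iir = difference_equation([1.0], [1.0, -0.9], impulse)    # y[n] = x[n] + 0.9 y[n-1] -> 0.9^n
h_fir = difference_equation([0.5, 0.5], [1.0], impulse)     # two-point average -> [0.5, 0.5, 0, ...]
```

The impulse responses make the FIR/IIR distinction visible: the averaging filter's response stops after two samples, while 0.9^n never becomes exactly zero.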
FIR vs. IIR filters

• Linear vs. nonlinear phase

• Large vs. small impulse response duration
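The phase difference can be checked numerically. A symmetric FIR filter (h[n] = h[M−1−n]) has exactly linear phase: after removing a pure delay of (M−1)/2 samples its frequency response is real. A one-pole IIR filter 1/(1 − a e^{-jω}) instead has a group delay that varies with frequency, from a/(1−a) samples at ω = 0 to −a/(1+a) at ω = π. The coefficients below are illustrative assumptions.

```python
import numpy as np

omega = np.linspace(0, np.pi, 512)

# Symmetric FIR (h[n] = h[M-1-n]); the coefficients are illustrative only.
h = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
M = len(h)
H_fir = np.exp(-1j * np.outer(omega, np.arange(M))) @ h

# Undoing a pure delay of (M-1)/2 samples leaves a real function, so the phase is exactly linear.
A = H_fir * np.exp(1j * omega * (M - 1) / 2)
assert np.allclose(A.imag, 0.0, atol=1e-12)

# One-pole IIR H(e^{jω}) = 1 / (1 - a e^{-jω}): numerical group delay -dφ/dω varies with ω.
a = 0.9
H_iir = 1.0 / (1.0 - a * np.exp(-1j * omega))
group_delay = -np.diff(np.unwrap(np.angle(H_iir))) / np.diff(omega)
assert abs(group_delay[0] - a / (1 - a)) < 0.1        # about 9 samples near ω = 0
assert abs(group_delay[-1] + a / (1 + a)) < 0.1       # about -0.47 samples near ω = π
```

A constant group delay means every frequency component is delayed by the same amount, so the waveform shape is preserved; the IIR filter delays low frequencies far more than high ones.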
Sampling
• Represent a continuous-time signal as a sequence of numbers

• The Sampling Theorem
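The sampling theorem states that a signal band-limited to frequencies below Fs/2 is fully determined by its samples; components above Fs/2 fold back (alias) onto lower frequencies. The sketch below, assuming a hypothetical 8 kHz sampling rate, shows a 7 kHz tone producing exactly the same samples as a 1 kHz tone. `apparent_frequency` is an illustrative helper for real tones.

```python
import numpy as np


def apparent_frequency(f, fs):
    """Frequency (between 0 and fs/2) at which a real tone of frequency f appears after sampling at fs."""
    return abs(f - fs * np.round(f / fs))


fs = 8000                                   # hypothetical sampling rate, Nyquist frequency = 4 kHz
n = np.arange(64)
x_7k = np.cos(2 * np.pi * 7000 * n / fs)    # 7 kHz > fs/2: the sampling theorem is violated
x_1k = np.cos(2 * np.pi * 1000 * n / fs)    # 1 kHz tone
assert np.allclose(x_7k, x_1k)              # the two sets of samples are indistinguishable
print(apparent_frequency(7000, fs), apparent_frequency(3000, fs))   # 1000.0 3000.0
```

Once the samples are taken, no processing can tell the two tones apart, which is why an analog anti-aliasing lowpass filter precedes the sampler.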
Changing the sampling rate of a signal
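A common approach for integer factors, sketched here as an illustration rather than as the lecture's method: to reduce the rate by M, lowpass filter to Fs/(2M) first (so nothing above the new Nyquist frequency can alias) and then keep every M-th sample; to increase the rate by L, insert L − 1 zeros between samples and lowpass filter with gain L to remove the spectral images. The windowed-sinc design, the 101-tap length and the helper names `lowpass_fir`, `decimate` and `upsample` are assumptions; `scipy.signal.resample_poly` is a production implementation of the same idea.

```python
import numpy as np


def lowpass_fir(cutoff, num_taps=101):
    """Hamming-windowed sinc lowpass FIR; cutoff in cycles/sample (0 < cutoff < 0.5), unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()


def decimate(x, M, num_taps=101):
    """Reduce the sampling rate by an integer factor M: lowpass at fs/(2M), keep every M-th sample."""
    h = lowpass_fir(0.5 / M, num_taps)
    y = np.convolve(x, h, mode="same")      # centred output, so no added delay
    return y[::M]


def upsample(x, L, num_taps=101):
    """Increase the sampling rate by an integer factor L: insert L-1 zeros, lowpass with gain L."""
    z = np.zeros(len(x) * L)
    z[::L] = x
    h = L * lowpass_fir(0.5 / L, num_taps)
    return np.convolve(z, h, mode="same")


fs = 16000
t = np.arange(1600) / fs
y1 = decimate(np.cos(2 * np.pi * 1000 * t), 2)   # 1 kHz survives the 16 kHz -> 8 kHz conversion
y6 = decimate(np.cos(2 * np.pi * 6000 * t), 2)   # 6 kHz is removed instead of aliasing to 2 kHz
```

The first and last few output samples are affected by the filter running off the ends of the signal, so measurements are best taken on the interior.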
