Lecture 2

Uploaded by

Rakshith Kamath
Copyright
© All Rights Reserved

ELEN 6820

Speech and audio signal processing


Instructor: Nima Mesgarani (nm2764)
3 credits
TA: Yi Luo (yl3364)
Office hours: TBD
Course overview
• Brief history of speech recognition
• Discrete Signal Processing (DSP) overview
• Pattern recognition and deep learning overview
• Speech signal production
• Speech signal representation
• Auditory Scene Analysis, speech enhancement and separation
• Speech processing in the auditory system
• Acoustic modeling
• Sequence recognition and Hidden Markov Models
• Language models
• Music signal processing

Homeworks
• HW1: Discrete signal processing (written) (W2)
• HW2: Neural networks and voice activity detection (programming) (W3&4)
• HW3: Speech signal production and representation (written) (W5)
• HW4: Speech enhancement and separation (programming) (W6)
• HW5: Acoustic event detection and speaker identification (programming) (W7&8)
• HW6: Phoneme recognition and automatic speech recognition (programming) (W9&10)
• Final project (programming) (W11-13)

Week  Topic                                HW

1     Introduction and history             -
2     Discrete signal processing           DSP (W)
3     Machine learning 1                   Neural network and VAD (P)
4     Machine learning 2                   -
5     Speech signal production             Speech production (W)
6     Speech signal representation         Speech enhancement (P)
7     Speech enhancement and separation    -
8     Human speech perception              Acoustic event detection (P)
9     Acoustic modeling                    -
10    Sequence modeling and HMMs           Phoneme recognition and ASR (P)
11    Language modeling                    -
12    Automatic speech recognition         Project
13    Music signal processing              -

Course evaluation

• Two written homeworks (20%)
• Four programming homeworks (60%)
• Final project (20%)
• Late submission: 10% penalty per day

Final project

• Preferably choose the predefined course project
• Alternatively, define a project that is similar in scope and workload,
in discussion with me and Yi

How to install Python with a graphical interface on Mac/Windows/Linux
Install Jupyter Notebook using Anaconda and conda
Download it from the following address and follow the instructions:
www.anaconda.com/products/individual

• Anaconda installs Python and Jupyter Notebook together, along with
some necessary packages (e.g. numpy, scipy, etc.)
• On Mac OS you can use either the graphical installer or the command-line installer
• If you have Windows 10 and want to use bash commands, it is highly
recommended that you enable the Linux subsystem bash environment and
install a Linux version of Anaconda on it (using the command-line installer)
• After installing Anaconda, you can run Jupyter Notebook either from the
Anaconda app or by running the command jupyter notebook in a terminal
• You can also use other environments for interacting with Python, but the one
recommended for this course is Jupyter Notebook, especially if you want to
run your code on a server (e.g. for TensorFlow)
• More information is available at: http://jupyter.readthedocs.io/en/latest/index.html
• More instructions and tutorials for getting started with Python will be given next session
• The assignments will be checked in Jupyter Notebook using Python 3
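
After installation, a quick check in a notebook cell can confirm that the interpreter is Python 3 and that the packages mentioned above are visible. This is only a sketch: the helper name check_environment and the default package list are illustrative assumptions, not part of the course material.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "scipy")):
    """Report whether Python 3 is running and whether each package can be found."""
    report = {"python3": sys.version_info.major == 3}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(sys.version)
print(check_environment())
```

Any entry reported as False points to a package that still needs to be installed in the active environment.
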
Signal processing background (chapter 2)

Speech communication

[Figure: the speech communication chain, from production to perception (ear drum)]

Cocktail party problem (Cherry, 1953)

Discrete Signal Processing
• Discrete-time signals and systems
• Discrete-time Fourier transform, z-transform
• Digital filters, IIR and FIR
• Sampling theorem, changing the sampling rate
• Emphasis on intuition

Discrete-Time Signals and Systems
• Speech signal: a continuously varying pattern represented as a
function of a continuous variable t, which represents time.
• Discrete signal: x[n] = x_a(nT), where T = 1/Fs
• Telephone bandwidth speech: Fs = 6.4 kHz
• Wide-band speech: Fs = 16 kHz
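
The relation x[n] = x_a(nT) can be tried out directly. The sketch below samples a hypothetical 1 kHz sine at the wide-band rate Fs = 16 kHz. The function names sample and tone, and the choice of test signal, are assumptions made for illustration.

```python
import math


def sample(xa, fs, num_samples):
    """Return x[n] = xa(n * T) for n = 0..num_samples-1, with T = 1 / fs."""
    T = 1.0 / fs
    return [xa(n * T) for n in range(num_samples)]


def tone(t, f0=1000.0):
    """Hypothetical continuous-time signal: a sine at f0 Hz."""
    return math.sin(2 * math.pi * f0 * t)


fs = 16000  # wide-band speech sampling rate
x = sample(tone, fs, 32)  # 32 samples cover 2 ms
print([round(v, 3) for v in x[:5]])
```

At this rate one period of the 1 kHz tone spans 16 samples, so x[16] returns to (approximately) zero.
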
A few basics

• Unit impulse function, unit step function, exponential sequence
• Convolution
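
The basic sequences and the convolution sum y[n] = sum_k x[k] h[n-k] can be written out in a few lines. This is a minimal sketch in plain Python for finite sequences that start at n = 0. The function names are illustrative, and numpy.convolve would be the usual library call for the same operation.

```python
def unit_impulse(n):
    """delta[n]: 1 at n = 0, 0 elsewhere."""
    return 1.0 if n == 0 else 0.0


def unit_step(n):
    """u[n]: 1 for n >= 0, 0 otherwise."""
    return 1.0 if n >= 0 else 0.0


def exponential(n, a=0.5):
    """a^n u[n], with a = 0.5 as an arbitrary example value."""
    return a ** n if n >= 0 else 0.0


def convolve(x, h):
    """Linear convolution of two finite lists whose first sample is at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


delta = [unit_impulse(n) for n in range(4)]
step = [unit_step(n) for n in range(4)]
expo = [exponential(n) for n in range(4)]

print(convolve(expo, delta))  # convolving with the impulse reproduces expo
print(convolve(step, step))   # two length-4 box pulses give a triangle
```
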
Transformations of Signals and Systems

• Fourier Transform
• z-Transform

The Continuous-Time Fourier Transform

• What did Fourier show?
• What's the big deal? (1822)
• Decomposing signals into fast and slow components
• Importance of the sine function for linear systems

The z-Transform
• A powerful tool for analyzing linear systems described by difference equations
• Definition
• Inverse z-Transform
• Examples: delayed unit response, box pulse, exponential
• Properties of the z-transform: linearity, shift, exponential weighting,
linear weighting, convolution, multiplication of sequences
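
As a numerical illustration, the standard definition X(z) = sum_n x[n] z^(-n) can be evaluated for finite sequences and used to check two of the items above: the shift property (a delay by n0 multiplies X(z) by z^(-n0)) and the exponential example (a^n u[n] has X(z) = 1 / (1 - a z^(-1)) for |z| > |a|). The helper name z_transform, the test point z, and the truncation to 200 terms are assumptions of this sketch.

```python
def z_transform(x, z, start=0):
    """Evaluate X(z) = sum_n x[n] z^(-n) for a finite list whose first sample is at n = start."""
    return sum(xn * z ** (-(start + i)) for i, xn in enumerate(x))


z = complex(0.8, 0.6)  # arbitrary test point with |z| = 1
x = [1.0, 2.0, 3.0]

# Shift property: delaying x by n0 samples multiplies X(z) by z^(-n0)
n0 = 2
shifted = z_transform(x, z, start=n0)
predicted = z ** (-n0) * z_transform(x, z)

# Exponential sequence a^n u[n], truncated to 200 terms (a numerical approximation)
a = 0.5
expo = [a ** n for n in range(200)]
approx = z_transform(expo, z)
closed_form = 1 / (1 - a / z)  # valid because |z| > |a|

print(abs(shifted - predicted), abs(approx - closed_form))
```

Both printed differences should be at the level of floating-point round-off.
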
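The Discrete-Time Fourier Transform

$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n}$$

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$

• Definition, periodic
• Sufficient condition for convergence: $\sum_{n=-\infty}^{+\infty} |x[n]| < +\infty$
• Although $x[n]$ is discrete, $X(e^{j\omega})$ is continuous and periodic with period $2\pi$.
• Inverse DTFT
• Convolution/multiplication duality:

$$y[n] = x[n] * h[n] \quad\Longleftrightarrow\quad Y(e^{j\omega}) = X(e^{j\omega})\, H(e^{j\omega})$$

$$y[n] = x[n]\, w[n] \quad\Longleftrightarrow\quad Y(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\omega-\theta)})\, d\theta$$

• DTFT of a Cosine Signal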
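
For finite sequences the DTFT sum can be evaluated at any frequency, which makes it easy to check the periodicity in ω and the convolution side of the duality numerically. This is a sketch in plain Python. The names dtft and convolve, the example sequences, and the test frequency are arbitrary choices.

```python
import cmath


def dtft(x, omega):
    """X(e^{jw}) = sum_n x[n] e^{-jwn} for a finite list whose first sample is at n = 0."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


def convolve(x, h):
    """Linear convolution of two finite lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


x = [1.0, 0.5, -0.25, 2.0]
h = [0.5, 0.5]
y = convolve(x, h)
omega = 0.7  # arbitrary frequency in radians per sample

# Convolution in time corresponds to multiplication in frequency
err_conv = abs(dtft(y, omega) - dtft(x, omega) * dtft(h, omega))
# The DTFT is periodic in omega with period 2*pi
err_periodic = abs(dtft(x, omega) - dtft(x, omega + 2 * cmath.pi))

print(err_conv, err_periodic)
```
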
The Discrete Fourier Transform

• Sampling the DTFT: Discrete Fourier Transform (DFT)

Practical implications

• Periodic signals, or finite-length sequences
• Which frequency does each DFT coefficient correspond to?
• Circular shift of x[n]
• Boundary conditions, importance of windowing
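
Two of these points lend themselves to a short numerical check: bin k of an N-point DFT corresponds to the frequency k·Fs/N, and circularly shifting x[n] by m samples multiplies X[k] by e^{-j2πkm/N}, leaving the magnitudes unchanged. The sketch below uses a direct O(N²) DFT in plain Python. The function names and the test sequence are illustrative, and numpy.fft.fft is the usual library routine.

```python
import cmath


def dft(x):
    """N-point DFT: X[k] = sum_n x[n] e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def bin_frequencies(N, fs):
    """Frequency in Hz of each DFT bin; bins above N/2 mirror negative frequencies."""
    return [k * fs / N for k in range(N)]


def circular_shift(x, m):
    """Return x[(n - m) mod N]."""
    N = len(x)
    return [x[(n - m) % N] for n in range(N)]


fs, N, m = 16000, 8, 3
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
X = dft(x)
Xs = dft(circular_shift(x, m))

print(bin_frequencies(N, fs))
# Largest deviation from the predicted phase factor
print(max(abs(Xs[k] - X[k] * cmath.exp(-2j * cmath.pi * k * m / N)) for k in range(N)))
```
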
Time-Dependent Fourier Transform

Create a finite-length sequence: windowing

[Figure: the signal x[m] with the sliding window shown at n = 50, n = 100, and n = 200, i.e. w[50 - m], w[100 - m], w[200 - m]]

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{+\infty} w[n-m]\, x[m]\, e^{-j\omega m}$$

• If n is fixed, then it can be shown that:

$$X_n(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, X(e^{j(\omega+\theta)})\, d\theta$$

• The above equation is meaningful only if we assume that $X(e^{j\omega})$ represents
the Fourier transform of a signal whose properties continue outside the window,
or simply that the signal is zero outside the window.
• In order for $X_n(e^{j\omega})$ to correspond to $X(e^{j\omega})$, $W(e^{j\omega})$ must resemble an impulse.

Rectangular Window

$$w[n] = 1, \qquad 0 \le n \le N-1$$

6.345 Automatic Speech Recognition (2003), Speech Signal Representation

Hamming Window

$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

Comparison of Windows
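
One way to compare the two windows numerically is to look at how large the transform magnitude remains away from the main lobe. The sketch below builds both windows from the formulas above and reports the peak of |W(e^{jω})| over [6π/N, π] relative to ω = 0. The window length N = 51, the starting frequency 6π/N (assumed here to lie beyond the main lobe of both windows), the grid size, and the helper name peak_sidelobe_db are all illustrative assumptions.

```python
import cmath
import math


def rectangular(N):
    """w[n] = 1 for 0 <= n <= N-1."""
    return [1.0] * N


def hamming(N):
    """w[n] = 0.54 - 0.46 cos(2 pi n / (N-1)) for 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def peak_sidelobe_db(w, omega_start, grid=400):
    """Largest |W(e^{jw})| on [omega_start, pi], in dB relative to |W(e^{j0})|."""
    dc = abs(sum(w))
    peak = 0.0
    for i in range(grid + 1):
        omega = omega_start + (math.pi - omega_start) * i / grid
        W = sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w))
        peak = max(peak, abs(W))
    return 20 * math.log10(peak / dc)


N = 51
start = 6 * math.pi / N  # assumed to be past the main lobe of both windows
rect_db = peak_sidelobe_db(rectangular(N), start)
ham_db = peak_sidelobe_db(hamming(N), start)
print(f"rectangular: {rect_db:.1f} dB, Hamming: {ham_db:.1f} dB")
```

The Hamming window tapers to 0.08 at its edges, and in this measurement its out-of-main-lobe level comes out well below that of the rectangular window, at the cost of a wider main lobe.
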


Spectrogram

• Use a sliding window over the signal, and display the magnitude of the DFT for each step.
• Large vs. small window?
• Overlapping vs. non-overlapping?
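
The sliding-window procedure can be sketched in plain Python as follows. The Hamming window, the window length of 64 samples, the hop of 32 samples (50% overlap), the 8 kHz rate, and the 1 kHz test tone are all assumptions chosen for illustration. A practical implementation would use an FFT routine such as numpy.fft.rfft instead of the direct DFT written here.

```python
import cmath
import math


def hamming(N):
    """Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2 (enough for a real-valued frame)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]


def spectrogram(x, win_len, hop):
    """Slide a window over x and return one magnitude spectrum per step."""
    w = hamming(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = [x[start + n] * w[n] for n in range(win_len)]
        frames.append(dft_magnitude(frame))
    return frames


fs = 8000  # hypothetical sampling rate
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]  # 1 kHz test tone
S = spectrogram(x, win_len=64, hop=32)

peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(len(S), "frames,", len(S[0]), "bins, peak near", peak_bin * fs / 64, "Hz")
```

A hop smaller than the window length gives overlapping frames, and a hop equal to the window length gives non-overlapping ones.
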
A Wideband Spectrogram

[Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]

A Narrowband Spectrogram

[Figure: narrowband spectrogram of the utterance "Two plus seven is less than ten"]

Tradeoff between DFT length (temporal resolution) and spectral resolution
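
The tradeoff can be made concrete with a little arithmetic: a window of N samples at rate Fs spans N/Fs seconds, while the spacing between DFT bins is Fs/N Hz, so improving one worsens the other. The window lengths below (64 and 512 samples at 16 kHz) are hypothetical values chosen for illustration. The bin spacing is only a rough proxy for spectral resolution, which also depends on the window shape.

```python
def resolution(win_len, fs):
    """Return (window duration in ms, DFT bin spacing in Hz) for win_len samples at rate fs."""
    return 1000.0 * win_len / fs, fs / win_len


fs = 16000
for label, win_len in (("short window", 64), ("long window", 512)):
    duration_ms, spacing_hz = resolution(win_len, fs)
    print(f"{label}: {duration_ms:.1f} ms, {spacing_hz:.2f} Hz per bin")
```

A short window corresponds to the wideband spectrogram (fine in time, coarse in frequency), and a long window to the narrowband one.
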


Digital filters

• A digital filter is a discrete-time shift-invariant system
• Convolution equation: unit response, transfer function, system function
• All useful systems satisfy the linear difference equation
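
A linear constant-coefficient difference equation can be run sample by sample. The sketch below assumes the common convention y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k] with a[0] = 1 and zero initial conditions, which is the convention used by scipy.signal.lfilter(b, a, x). The function name difference_equation and the one-pole example are illustrative.

```python
def difference_equation(b, a, x):
    """Compute y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


# Example system: y[n] = x[n] + 0.5 y[n-1]
impulse = [1.0] + [0.0] * 7
h = difference_equation([1.0], [1.0, -0.5], impulse)
print(h)  # the response to a unit impulse follows 0.5**n
```
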
FIR vs. IIR filters

• Linear vs. nonlinear phase
• Large vs. small impulse response duration
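
Both contrasts can be seen on two small example filters. In the sketch below, a symmetric 3-tap FIR filter has an impulse response that ends after 3 samples and a phase of exactly -ω (linear), while a one-pole IIR filter has an impulse response that decays but never becomes exactly zero. The coefficient values and the helper names are hypothetical choices for illustration.

```python
import cmath


def impulse_response(b, a, length):
    """First `length` samples of the impulse response, assuming a[0] == 1."""
    x = [1.0] + [0.0] * (length - 1)
    y = []
    for n in range(length):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def fir_frequency_response(b, omega):
    """H(e^{jw}) = sum_k b[k] e^{-jwk} for an FIR filter."""
    return sum(bk * cmath.exp(-1j * omega * k) for k, bk in enumerate(b))


fir_b = [0.25, 0.5, 0.25]  # symmetric taps
fir_h = impulse_response(fir_b, [1.0], 10)
iir_h = impulse_response([1.0], [1.0, -0.9], 10)  # y[n] = x[n] + 0.9 y[n-1]

print(fir_h)
print([round(v, 4) for v in iir_h])
# For these symmetric taps the phase equals -omega (a delay of one sample)
print(cmath.phase(fir_frequency_response(fir_b, 1.0)))
```
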
Sampling
• Represent a continuous-time signal as a sequence of numbers
• The Sampling Theorem
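
The sampling theorem says that a band-limited signal can be recovered from its samples when the sampling rate exceeds twice its highest frequency. The sketch below illustrates what goes wrong otherwise: at Fs = 16 kHz, a 13 kHz cosine (above Fs/2) produces the same samples as a 3 kHz cosine, so the two cannot be told apart after sampling. The frequencies and the function name sample_cosine are illustrative assumptions.

```python
import math


def sample_cosine(f, fs, num_samples):
    """Samples of cos(2 pi f t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(num_samples)]


fs = 16000
low = sample_cosine(3000, fs, 64)        # below fs / 2
high = sample_cosine(fs - 3000, fs, 64)  # 13 kHz, above fs / 2

# The two sample sequences agree up to floating-point round-off
max_diff = max(abs(u - v) for u, v in zip(low, high))
print(max_diff)
```
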
Changing the sampling rate of a signal
