Lecture 2

Uploaded by

Rakshith Kamath
Copyright
© All Rights Reserved
ELEN 6820

Speech and audio signal processing


Instructor: Nima Mesgarani (nm2764)
3 credits
TA: Yi Luo (yl3364)
Office hours: TBD
Course overview
• Brief history of speech recognition
• Discrete Signal Processing (DSP) overview
• Pattern recognition and deep learning overview
• Speech signal production
• Speech signal representation
• Auditory Scene Analysis, speech enhancement and separation
• Speech processing in the auditory system
• Acoustic modeling
• Sequence recognition and Hidden Markov Models
• Language models
• Music signal processing
Homeworks
• HW1: Discrete signal processing (written) (W2)
• HW2: Neural networks and voice activity detection (programming) (W3&4)
• HW3: Speech signal production and representation (written) (W5)
• HW3: Speech enhancement and separation (programming) (W6)
• HW4: Acoustic event detection and Speaker identification (programming) (W7&8)
• HW5: Phoneme recognition and automatic speech recognition (programming) (W9&10)
• Final project (programming) (W11-13)
Week  Topic                              HW

1     Introduction and history           -
2     Discrete signal processing         DSP (W)
3     Machine learning 1                 Neural network and VAD (P)
4     Machine learning 2                 -
5     Speech signal production           Speech production (W)
6     Speech signal representation       Speech enhancement (P)
7     Speech enhancement and separation  -
8     Human speech perception            Acoustic event detection (P)
9     Acoustic modeling                  -
10    Sequence modeling and HMMs         Phoneme recognition and ASR (P)
11    Language modeling                  -
12    Automatic speech recognition       Project
13    Music signal processing            -

Course evaluation

• Two written homeworks (20%)
• Four programming homeworks (60%)
• Final project (20%)

• Late submission: 10% penalty per day

Final project

• Preferably choose the predefined course project

• Alternatively, define a project that is similar in scope and workload, in
discussion with me and Yi
How to install Python with graphical interface on Mac/Windows/Linux
Install Jupyter Notebook using Anaconda and cond
Download it from the following address and follow the instructions:
www.anaconda.com/products/individual

• Anaconda will simultaneously install Python and Jupyter Notebook as well as
some necessary packages (e.g. numpy, scipy, etc.)
• You can use either the graphical installer or the command-line installer on Mac OS
• If you have Windows 10 and want to use bash commands, it is highly
recommended that you enable the Linux subsystem bash environment and
install a Linux version of Anaconda on it (using the command-line installer)
• After installing Anaconda, you can either run Jupyter Notebook from the
Anaconda app or run the command, jupyter notebook, in a terminal
• You can also use other environments for interacting with Python, but the one
recommended for this course is Jupyter Notebook, especially if you want to
run your code on a server (e.g. for TensorFlow)
• More information is available at: http://jupyter.readthedocs.io/en/latest/index.html
• More instructions and tutorials for getting started with Python will be given next session
• The assignments will be checked in Jupyter Notebook using Python 3
Signal processing background (chapter 2)

Speech communication

[Figure: the speech communication chain, from production to perception (ear drum)]

Cocktail party problem (Cherry, 1953)

Discrete Signal Processing
• Discrete-time signals and systems

• Discrete-time Fourier transform, z-transform

• Digital filters, IIR and FIR

• Sampling theorem, changing the sampling rate

• Emphasis on intuition
Discrete-time Signals and Systems
• Speech signal: a continuously varying pattern represented as a function of a
continuous variable t, which represents time.

• Discrete signal: x[n] = x_a(nT), where T = 1/Fs

• Telephone bandwidth speech: Fs = 6.4 kHz

• Wide-band speech: Fs = 16 kHz

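The relation between a continuous-time signal and its samples can be sketched in a few lines of NumPy. This is only an illustration: the 440 Hz cosine and the helper name `sample` are assumptions made for the example, not part of the lecture. The sampling rate is the wide-band value Fs = 16 kHz listed above.

```python
import numpy as np

def sample(x_a, fs, duration):
    """Return (n, x) with x[n] = x_a(n * T), T = 1 / fs, for `duration` seconds."""
    T = 1.0 / fs
    n = np.arange(int(round(duration * fs)))
    return n, x_a(n * T)

# Hypothetical continuous-time signal: a 440 Hz cosine (assumption for the demo).
f0 = 440.0
x_a = lambda t: np.cos(2 * np.pi * f0 * t)

fs = 16_000                          # wide-band speech sampling rate
n, x = sample(x_a, fs, duration=0.01)
print(len(x))                        # 160 samples cover 10 ms at 16 kHz
```

Lowering `fs` reduces the number of samples per second and, as discussed under the sampling theorem, limits the highest frequency that can be represented.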
A few basics

• Unit impulse function, unit step function, exponential sequence

• Convolution
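The basic sequences and the convolution sum y[n] = Σ_k x[k] h[n − k] can be tried out numerically. The following is a minimal sketch; the helper names and the example values are chosen here for illustration only.

```python
import numpy as np

def unit_impulse(n):
    """delta[n]: 1 where n == 0, else 0."""
    return (n == 0).astype(float)

def unit_step(n):
    """u[n]: 1 where n >= 0, else 0."""
    return (n >= 0).astype(float)

def exponential(n, a):
    """a**n * u[n]: a causal exponential sequence."""
    return np.where(n >= 0, a ** np.maximum(n, 0), 0.0)

n = np.arange(-5, 10)
delta, u, e = unit_impulse(n), unit_step(n), exponential(n, 0.5)

# Convolution of two short sequences (both assumed to start at n = 0).
x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 1.0])
y = np.convolve(x, h)
print(y)                                   # [1. 3. 5. 3.]

# Convolving with a unit impulse returns the input unchanged.
print(np.allclose(np.convolve(x, [1.0]), x))   # True
```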
Transformations of Signals and Systems

• Fourier Transform

• z-Transform
The Continuous-Time Fourier Transform

• What did Fourier show?

• What's the big deal? (1822)

• Decomposing signals into fast and slow components

• Importance of the sine function for linear systems

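One way to see why sinusoids matter for linear systems: a complex sinusoid passed through a linear shift-invariant system comes out as the same sinusoid, only scaled by a complex constant. The sketch below checks this numerically in discrete time; the impulse response `h` and the frequency `omega` are arbitrary values assumed for the demonstration.

```python
import numpy as np

h = np.array([0.5, 0.3, 0.2])            # hypothetical FIR impulse response
omega = 0.2 * np.pi                      # input frequency in rad/sample
n = np.arange(200)
x = np.exp(1j * omega * n)               # complex sinusoid input

y = np.convolve(x, h)[:len(n)]           # system output

# Frequency response at omega: H = sum_k h[k] e^{-j omega k}
H = np.sum(h * np.exp(-1j * omega * np.arange(len(h))))

# After the start-up transient (len(h) - 1 samples), y[n] = H * x[n].
steady = slice(len(h) - 1, None)
print(np.allclose(y[steady], H * x[steady]))   # True
```

The output differs from the input only in amplitude |H| and phase arg(H), which is why decomposing a signal into sinusoids makes the effect of a linear system easy to describe.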
The z-Transform
• A powerful tool for analyzing linear systems of difference equations

• Definition

• Inverse z-Transform

• Examples: delayed unit response, box pulse, exponential

• Properties of z-Transform: linearity, shift, exponential weighting, linear
weighting, convolution, multiplication of sequences
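As a small numerical illustration of the definition X(z) = Σ_n x[n] z^{−n}, the sketch below evaluates the sum for a truncated exponential sequence and compares it with the closed form 1 / (1 − a z^{−1}), which holds for |z| > |a|. It also checks the shift property. The values of `a` and `z` and the helper name `z_transform` are assumptions made for the example.

```python
import numpy as np

def z_transform(x, z):
    """Evaluate X(z) = sum_n x[n] z^{-n} for a finite sequence x[0..N-1]."""
    n = np.arange(len(x), dtype=float)
    return np.sum(x * z ** (-n))

a = 0.8
x = a ** np.arange(200)                  # a^n u[n], truncated to 200 terms
z = 1.2 * np.exp(1j * 0.3)               # a point with |z| > |a|

approx = z_transform(x, z)
closed = 1.0 / (1.0 - a / z)             # closed form for the infinite sum
print(np.isclose(approx, closed))        # True (truncation error is negligible)

# Shift property: delaying by one sample multiplies X(z) by z^{-1}.
x_delayed = np.concatenate(([0.0], x))
print(np.isclose(z_transform(x_delayed, z), approx / z))   # True
```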
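The Discrete-Time Fourier Transform

• Definition, periodic:

$$X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n}$$

• Inverse DTFT:

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$

• Sufficient condition for convergence:

$$\sum_{n=-\infty}^{+\infty} \left| x[n] \right| < +\infty$$

• Although $x[n]$ is discrete, $X(e^{j\omega})$ is continuous and periodic with period $2\pi$.
• Convolution/multiplication duality:

$$y[n] = x[n] * h[n] \;\Longleftrightarrow\; Y(e^{j\omega}) = X(e^{j\omega})\, H(e^{j\omega})$$

$$y[n] = x[n]\, w[n] \;\Longleftrightarrow\; Y(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\omega-\theta)})\, d\theta$$

• DTFT of a cosine signal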
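For finite-length sequences the DTFT sum can be evaluated directly on a grid of frequencies, which makes it possible to check the periodicity and the convolution/multiplication duality numerically. This is a sketch under the assumption that the sequences start at n = 0; the helper name `dtft` and the example sequences are illustrative.

```python
import numpy as np

def dtft(x, omega):
    """X(e^{j omega}) = sum_n x[n] e^{-j omega n} for a finite x starting at n = 0."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

omega = np.linspace(-np.pi, np.pi, 513)
x = np.array([1.0, 2.0, 0.0, -1.0])
h = np.array([0.5, 0.5])
y = np.convolve(x, h)                    # y[n] = x[n] * h[n]

X, H, Y = dtft(x, omega), dtft(h, omega), dtft(y, omega)
print(np.allclose(Y, X * H))                        # True: convolution <-> product
print(np.allclose(dtft(x, omega + 2 * np.pi), X))   # True: period 2*pi
```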
The Discrete Fourier Transform

• Sampling the DTFT: Discrete Fourier Transform (DFT)

Practical implications

• Periodic signals, or finite-length sequences

• Which frequency does each DFT bin correspond to?

• Circular shift of x[n]

• Boundary conditions, importance of windowing

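Two of these points are easy to check with NumPy's FFT routines: bin k of an N-point DFT corresponds to k · Fs / N Hz, and a circular shift of x[n] by m samples multiplies the DFT by e^{−j2πkm/N}. The sampling rate, DFT length and test tone below are assumptions picked so that the tone falls exactly on a bin.

```python
import numpy as np

fs = 16_000                              # sampling rate in Hz (assumed)
N = 512                                  # DFT length (assumed)
f0 = 1_000.0                             # hypothetical test tone
n = np.arange(N)
x = np.cos(2 * np.pi * f0 * n / fs)

# Frequency of each bin: k * fs / N.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(N, d=1 / fs)
k = np.argmax(np.abs(X))
print(k, freqs[k])                       # 32 1000.0

# Circular shift by m samples <-> linear phase factor in the DFT.
m = 5
k_all = np.arange(N)
X_shift = np.fft.fft(np.roll(x, m))
expected = np.fft.fft(x) * np.exp(-2j * np.pi * k_all * m / N)
print(np.allclose(X_shift, expected))    # True
```

If `f0` does not fall exactly on a bin, the energy spreads over neighbouring bins, which is one reason windowing matters.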
Time-Dependent Fourier Transform

Create a finite-length sequence by windowing:

[Figure: the signal x[m] with the sliding window shown at three positions, w[50 - m], w[100 - m] and w[200 - m], i.e. n = 50, 100, 200]

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{+\infty} w[n-m]\, x[m]\, e^{-j\omega m}$$

If n is fixed, then it can be shown that:

$$X_n(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, X(e^{j(\omega+\theta)})\, d\theta$$

The above equation is meaningful only if we assume that $X(e^{j\omega})$ represents
the Fourier transform of a signal whose properties continue outside the window,
or simply that the signal is zero outside the window.

In order for $X_n(e^{j\omega})$ to correspond to $X(e^{j\omega})$, $W(e^{j\omega})$ must resemble an impulse.

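The windowed sum for a single analysis time n can be evaluated directly. The sketch below assumes a causal window of length L (so only m = n − L + 1, ..., n contribute) and uses a Hamming window and a single cosine as hypothetical inputs; `stft_frame` is a name introduced for this example.

```python
import numpy as np

def stft_frame(x, w, n, omega):
    """X_n(e^{j omega}) = sum_m w[n - m] x[m] e^{-j omega m} for one analysis time n.

    w is treated as a causal window of length L, so only m = n-L+1 .. n contribute.
    """
    L = len(w)
    m = np.arange(max(0, n - L + 1), min(len(x), n + 1))
    return np.exp(-1j * np.outer(omega, m)) @ (w[n - m] * x[m])

x = np.cos(0.25 * np.pi * np.arange(400))    # hypothetical signal at 0.25*pi rad/sample
w = np.hamming(100)                          # hypothetical 100-sample window
omega = np.linspace(0, np.pi, 257)

Xn = stft_frame(x, w, n=200, omega=omega)
peak = omega[np.argmax(np.abs(Xn))]
print(peak / np.pi)                          # close to 0.25
```

The peak is smeared over a range of frequencies set by the window's main lobe, which is the sense in which the window transform should resemble an impulse.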
Rectangular Window

$$w[n] = 1, \qquad 0 \le n \le N-1$$

6.345 Automatic Speech Recognition (2003), Speech Signal Representation

Hamming Window

$$w[n] = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

Comparison of Windows
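A rough numerical comparison of the two windows can be made by looking at the highest sidelobe of each window's spectrum. The sketch below is illustrative only: the window length N = 64, the FFT size and the helper `peak_sidelobe_db` are assumptions, and the sidelobe search simply takes everything past the first spectral minimum.

```python
import numpy as np

N = 64                                        # window length (assumed)
n = np.arange(N)
rect = np.ones(N)
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
print(np.allclose(hamming, np.hamming(N)))    # True: same formula as np.hamming

def peak_sidelobe_db(w, nfft=8192):
    """Highest sidelobe level in dB relative to the main-lobe peak."""
    W = np.abs(np.fft.rfft(w, nfft))
    W_db = 20 * np.log10(W / W.max() + 1e-12)
    # The first point where the spectrum starts rising again ends the main lobe.
    first_min = np.argmax(np.diff(W_db) > 0)
    return W_db[first_min:].max()

print(peak_sidelobe_db(rect))       # about -13 dB
print(peak_sidelobe_db(hamming))    # far lower, roughly -40 dB or below
```

The Hamming window trades a wider main lobe for much lower sidelobes, so it leaks less energy into distant frequencies than the rectangular window.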


Spectrogram

• Use a sliding window over the signal, and display the magnitude of the DFT
for each step.

• Large vs. small window?

• Overlapping vs. non-overlapping?

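The sliding-window procedure can be sketched as follows. The function `spectrogram`, the Hamming window, and the two-tone test signal are assumptions made for illustration; the two calls use a short and a long window with 50% overlap to show how the window length changes the shape of the result.

```python
import numpy as np

def spectrogram(x, win_len, hop, nfft=None):
    """Magnitude of the DFT of Hamming-windowed frames, shape (frames, nfft // 2 + 1)."""
    nfft = nfft or win_len
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * w for s in starts])
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))

fs = 16_000
t = np.arange(fs) / fs                                   # one second of signal
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1100 * t)

wide = spectrogram(x, win_len=64, hop=32)                # 4 ms window
narrow = spectrogram(x, win_len=512, hop=256)            # 32 ms window
print(wide.shape, narrow.shape)                          # (499, 33) (61, 257)
```

The short window yields many frames with few frequency bins, the long window few frames with many bins. With the long window the two tones 100 Hz apart fall into clearly separate bins, while the short window cannot separate them.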
A Wideband Spectrogram

[Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]

A Narrowband Spectrogram

[Figure: narrowband spectrogram of the utterance "Two plus seven is less than ten"]

Tradeoff between DFT length (temporal resolution) and spectral resolution: a
short window follows rapid changes in time but blurs frequency detail
(wideband), while a long window resolves fine frequency detail but blurs
changes in time (narrowband).

Digital filters

• A digital filter is a discrete-time shift-invariant system

• Convolution equation: unit response, transfer function, system function

• All useful systems satisfy the linear difference equation

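A linear constant-coefficient difference equation can be written, with a[0] = 1, as y[n] = Σ_r b[r] x[n − r] − Σ_{k≥1} a[k] y[n − k]. The sketch below implements this recursion directly, assuming zero initial conditions; the function name and the two example filters are illustrative choices, and the loop is written for clarity rather than speed.

```python
import numpy as np

def difference_equation(b, a, x):
    """y[n] = sum_r b[r] x[n-r] - sum_{k>=1} a[k] y[n-k], with a[0] == 1, zero initial state."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[r] * x[n - r] for r in range(len(b)) if n - r >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc
    return y

# Feeding a unit impulse returns the impulse (unit) response.
impulse = np.zeros(8)
impulse[0] = 1.0

h_fir = difference_equation([0.5, 0.5], [1.0], impulse)     # two-point average
h_iir = difference_equation([1.0], [1.0, -0.5], impulse)    # y[n] = x[n] + 0.5 y[n-1]
print(h_fir)    # [0.5 0.5 0.  0.  0.  0.  0.  0. ]
print(np.allclose(h_iir, 0.5 ** np.arange(8)))              # True
```

Without feedback terms (a = [1]) the impulse response ends after len(b) samples; with feedback it decays but never becomes exactly zero.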
FIR vs. IIR filters

• Linear vs. nonlinear phase

• Large vs. small impulse response duration

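The linear-phase point can be illustrated with a symmetric FIR filter: if h[n] = h[M − n], the frequency response is a real-valued function times e^{−jωM/2}, i.e. a pure delay of M/2 samples. The coefficients below are arbitrary values assumed for the demonstration.

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0, 2.0, 1.0])     # symmetric FIR: h[n] == h[M - n]
M = len(h) - 1
omega = np.linspace(0, np.pi, 256)
n = np.arange(len(h))

# Frequency response H(e^{j omega}) = sum_n h[n] e^{-j omega n}
H = np.exp(-1j * np.outer(omega, n)) @ h

# Removing the linear-phase term e^{-j omega M / 2} leaves a purely real function.
A = H * np.exp(1j * omega * M / 2)
print(np.allclose(A.imag, 0.0))              # True
```

A stable causal IIR filter cannot have such a symmetric impulse response, since its response extends indefinitely in one direction only, so its phase is in general nonlinear.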
Sampling
• Represent a continuous-time signal as a sequence of numbers

• The Sampling Theorem

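The sampling theorem says a signal is recoverable from its samples only if the sampling rate exceeds twice its highest frequency. The sketch below shows what goes wrong otherwise: at a sampling rate of 8 kHz (a value assumed for this example), tones at 1 kHz, 7 kHz and 9 kHz produce identical sample sequences.

```python
import numpy as np

fs = 8_000                                   # sampling rate in Hz (assumed)
n = np.arange(64)

x_1k = np.cos(2 * np.pi * 1_000 * n / fs)    # below fs / 2: represented correctly
x_7k = np.cos(2 * np.pi * 7_000 * n / fs)    # above fs / 2: aliases to fs - 7000 = 1000 Hz
x_9k = np.cos(2 * np.pi * 9_000 * n / fs)    # above fs / 2: aliases to 9000 - fs = 1000 Hz

print(np.allclose(x_1k, x_7k))               # True
print(np.allclose(x_1k, x_9k))               # True
```

Once sampled, the three tones cannot be told apart, which is why the signal must be band-limited to below Fs/2 before sampling.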
Changing the sampling rate of a signal
