Introduction & DSP: EE E6820: Speech & Audio Processing & Recognition

This document provides an introduction and overview of a course on speech and audio processing and recognition. It discusses the topics that will be covered over the course, including fundamentals of digital signal processing, audio processing techniques, and applications like speech recognition and music information retrieval. The course structure is outlined, which includes weekly assignments, a midterm, and a final project. It also provides a review of relevant digital signal processing concepts like timescale modification algorithms.


EE E6820: Speech & Audio Processing & Recognition

Lecture 1:
Introduction & DSP
Dan Ellis <[email protected]>
Mike Mandel <[email protected]>
Columbia University Dept. of Electrical Engineering
http://www.ee.columbia.edu/~dpwe/e6820/
January 22, 2009
1 Sound and information
2 Course Structure
3 DSP review: Timescale modification
Dan Ellis (Ellis & Mandel) Intro & DSP January 22, 2009 1 / 33
Sound and information
Sound is air pressure variation
Mechanical vibration → pressure waves in air → motion of sensor → time-varying voltage
[Figure: sensor output voltage v(t) plotted against time t]
Transducers convert between air pressure and voltage
What use is sound?
Footsteps examples:
[Figure: two example footstep waveforms, amplitude (-0.5 to 0.5) vs. time / s (0 to 5)]
Hearing confers an evolutionary advantage
useful information, complements vision
. . . at a distance, in the dark, around corners
listeners are highly adapted to natural sounds (including speech)
The scope of audio processing
The acoustic communication chain
message → signal → channel → receiver → decoder
[Diagram: the chain above, annotated with synthesis, audio processing, and recognition]
Sound is an information bearer
Received sound reflects source(s) plus effect of environment (channel)
Levels of abstraction
Much processing concerns shifting between levels of abstraction
[Diagram: levels from concrete to abstract: sound p(t), representation (e.g. t-f energy), information. Analysis moves toward the abstract; synthesis moves toward the concrete.]
Different representations serve different tasks
separating aspects, making things explicit, . . .
Course structure
Goals
  - survey topics in sound analysis & processing
  - develop an intuition for sound signals
  - learn some specific technologies
Course structure
  - weekly assignments (25%)
  - midterm event (25%)
  - final project (50%)
Text
  Speech and Audio Signal Processing
  Ben Gold & Nelson Morgan
  Wiley, 2000
  ISBN: 0-471-35154-7
Web-based
Course website:
http://www.ee.columbia.edu/~dpwe/e6820/
for lecture notes, problem sets, examples, . . .
+ student web pages for homework, etc.
Course outline
Fundamentals
  L1: DSP
  L2: Acoustics
  L3: Pattern recognition
  L4: Auditory perception
Audio processing
  L5: Signal models
  L6: Music analysis/synthesis
  L7: Audio compression
  L8: Spatial sound & rendering
Applications
  L9: Speech recognition
  L10: Music retrieval
  L11: Signal separation
  L12: Multimedia indexing
Weekly assignments
Research papers
  - journal & conference publications
  - summarize & discuss in class
  - written summaries on web page + Courseworks discussion
Practical experiments
  - Matlab-based (+ Signal Processing Toolbox)
  - direct experience of sound processing
  - skills for project
Book sections
Final project
Most significant part of course (50% of grade)
Oral proposals mid-semester;
Presentations in final class + website
Scope
  - practical (Matlab recommended)
  - identify a problem; try some solutions
  - evaluation
Topic
  - few restrictions within world of audio
  - investigate other resources
  - develop in discussion with me
Citation & plagiarism
Examples of past projects
Automatic prosody classification
Model-based note transcription
DSP review: digital signals
$x_d[n] = Q(x_c(nT))$
Discrete-time sampling limits bandwidth
Discrete-level quantization limits dynamic range
sampling interval $T$
sampling frequency $\Omega_T = \frac{2\pi}{T}$
quantizer $Q(y) = \epsilon \left\lfloor \frac{y}{\epsilon} \right\rfloor$
[Figure: continuous signal sampled at interval T along the time axis and quantized in steps of ε]
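A minimal Python sketch of the sampling and quantization step, x_d[n] = Q(x_c(nT)) with a floor-based uniform quantizer of step eps. The function names, the 100 Hz test sine, the 1 kHz sampling rate, and the step size 1/8 are illustrative assumptions, not values from the lecture.

```python
import math

def quantize(y, eps):
    """Uniform quantizer Q(y) = eps * floor(y / eps)."""
    return eps * math.floor(y / eps)

def sample_and_quantize(x_c, T, eps, num_samples):
    """Return x_d[n] = Q(x_c(n * T)) for n = 0 .. num_samples - 1."""
    return [quantize(x_c(n * T), eps) for n in range(num_samples)]

# Hypothetical example values: 100 Hz sine, 1 kHz sampling, step 1/8.
T = 1.0 / 1000.0
eps = 0.125
x_d = sample_and_quantize(lambda t: math.sin(2 * math.pi * 100.0 * t), T, eps, 10)

# Sampling frequency in rad/s, Omega_T = 2*pi / T.
omega_T = 2 * math.pi / T
```

Every output value is a multiple of eps, and the quantization error y - Q(y) stays in [0, eps), which is what limits the dynamic range.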
The speech signal: time domain
Speech is a sequence of different sound types
[Figure: waveform of a speech excerpt (time/s, 1.4 to 2.6) labeled with the words "has", "a", "watch", "thin", "as", "a", "dime", with four zoomed panels:]
Vowel: periodic ("has")
Fricative: aperiodic ("watch")
Glide: smooth transition ("watch")
Stop burst: transient ("dime")
Timescale modification (TSM)
Can we modify a sound to make it slower?
i.e. speech pronounced more slowly
e.g. to help comprehension, analysis
or more quickly for speed listening?
Why not just slow it down?
$x_s(t) = x_o\left(\frac{t}{r}\right)$, $r$ = slowdown factor ($r > 1$ → slower)
equivalent to playback at a different sampling rate
[Figure: original waveform and the same waveform 2x slower (r = 2), time/s 2.35 to 2.6]
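The drawback of simply slowing the waveform down, x_s(t) = x_o(t/r), can be sketched in a few lines of Python: the signal gets longer, but every cycle is stretched too, so the pitch drops by the factor r. The helper names, the linear interpolation, and the test sinusoid (period 40 samples) are assumptions chosen for illustration.

```python
import math

def slow_down(x, r):
    """Resample x so that y[n] ~ x(n / r), using linear interpolation."""
    n_out = int((len(x) - 1) * r) + 1
    y = []
    for n in range(n_out):
        t = n / r                          # read position in the input
        i = min(int(t), len(x) - 2)
        frac = t - i
        y.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return y

def count_upward_zero_crossings(x):
    """Number of negative-to-nonnegative transitions (about one per cycle)."""
    return sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)

# Hypothetical test signal: sinusoid with a period of 40 samples.
x = [math.sin(2 * math.pi * n / 40.0 + 0.1) for n in range(800)]
y = slow_down(x, 2.0)

cycles_in = count_upward_zero_crossings(x)
cycles_out = count_upward_zero_crossings(y)
```

The output has about twice as many samples but the same number of cycles, so at a fixed playback rate the frequency is halved. Timescale modification aims to avoid exactly this coupling.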
Time-domain TSM
Problem: want to preserve local time structure but alter global time structure
Repeat segments
  - but: artifacts from abrupt edges
Cross-fade & overlap
$y^m[mL + n] = y^{m-1}[mL + n] + w[n] \cdot x\left[\left\lfloor \frac{m}{r} \right\rfloor L + n\right]$
[Figure: input waveform (time / s, 2.35 to 2.6) cut into numbered segments 1 to 6, and output waveform (time / s, 4.7 to 4.95) built by repeating segments with overlap: 1 1 2 2 3 3 4 4 5 5 6]
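A rough Python sketch of the cross-fade and overlap recursion: output frame m starts at sample mL and adds a windowed copy of the input starting at floor(m/r)·L. The window length of 2L and the Hann-shaped window (chosen so overlapping windows sum to one) are assumptions of this sketch, as is the function name.

```python
import math

def ola_stretch(x, r, L):
    """Overlap-add TSM: y[mL + n] += w[n] * x[floor(m / r) * L + n]."""
    N = 2 * L                                  # window length (assumed 2x hop)
    # Hann-shaped window; w[n] + w[n + L] == 1, so overlaps cross-fade cleanly.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N) for n in range(N)]
    num_frames = int(r * (len(x) - N) / L) + 1
    y = [0.0] * ((num_frames - 1) * L + N)
    for m in range(num_frames):
        # Input read position; clamped so the segment stays inside x.
        src = min(math.floor(m / r) * L, len(x) - N)
        for n in range(N):
            y[m * L + n] += w[n] * x[src + n]
    return y

# Hypothetical usage: stretch a constant signal by r = 2 with hop L = 50.
stretched = ola_stretch([1.0] * 1000, 2.0, 50)
```

With r = 2 each input segment is used twice, as in the repeat-segments picture. Because the input segments are placed without regard to waveform phase, periodic sounds can still show discontinuities where copies overlap.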
Synchronous overlap-add (SOLA)
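Idea: allow some leeway in placing window to optimize alignment of waveforms
[Figure: overlapping segments 1 and 2; offset K_m maximizes alignment of 1 and 2]
Hence,
$y^m[mL + n] = y^{m-1}[mL + n] + w[n] \cdot x\left[\left\lfloor \frac{m}{r} \right\rfloor L + n + K_m\right]$
Where $K_m$ chosen by cross-correlation:
$K_m = \arg\max_{0 \le K \le K_u} \frac{\sum_{n=0}^{N_{ov}} y^{m-1}[mL + n] \cdot x\left[\left\lfloor \frac{m}{r} \right\rfloor L + n + K\right]}{\sqrt{\sum_n \left(y^{m-1}[mL + n]\right)^2 \sum_n \left(x\left[\left\lfloor \frac{m}{r} \right\rfloor L + n + K\right]\right)^2}}$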
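The lag search at the heart of SOLA can be sketched as a normalized cross-correlation over candidate offsets K. This is only the search step, not a full SOLA implementation; `best_lag`, `y_overlap`, `start`, and `k_max` are hypothetical names standing in for K_m, the overlapping part of the output so far, the nominal read position floor(m/r)·L, and K_u.

```python
import math

def best_lag(y_overlap, x, start, k_max):
    """Return K in [0, k_max] maximizing the normalized cross-correlation
    between y_overlap[n] and x[start + n + K]."""
    best_k, best_score = 0, -float("inf")
    e_y = sum(v * v for v in y_overlap)
    for k in range(k_max + 1):
        seg = x[start + k : start + k + len(y_overlap)]
        e_x = sum(v * v for v in seg)
        denom = math.sqrt(e_y * e_x)
        num = sum(a * b for a, b in zip(y_overlap, seg))
        score = num / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Hypothetical check: the overlap was cut from x at sample 100, the nominal
# read position is 90, so the aligning offset should be 10.
x = [math.sin(2 * math.pi * n / 40.0) for n in range(400)]
k_m = best_lag(x[100:160], x, 90, 30)
```

Normalizing by the segment energies keeps the search from favoring loud segments over well-aligned ones.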
The Fourier domain
Fourier Series (periodic continuous x)
$\Omega_0 = \frac{2\pi}{T}$
$x(t) = \sum_k c_k e^{jk\Omega_0 t}$
$c_k = \frac{1}{T} \int_{-T/2}^{T/2} x(t) e^{-jk\Omega_0 t} \, dt$
[Figure: a periodic waveform x(t) vs. t, and its coefficient magnitudes |c_k| vs. k = 1 ... 7]
Fourier Transform (aperiodic continuous x)
$x(t) = \frac{1}{2\pi} \int X(j\Omega) e^{j\Omega t} \, d\Omega$
$X(j\Omega) = \int x(t) e^{-j\Omega t} \, dt$
[Figure: x(t) vs. time / sec (0 to 0.008), and |X(jΩ)| as level / dB vs. freq / Hz (0 to 8000)]
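As a numerical illustration of the Fourier series analysis integral, the sketch below approximates c_k with a midpoint Riemann sum, using the standard normalization c_k = (1/T) ∫ x(t) e^{-jkΩ0 t} dt over one period. The odd square wave is an assumed test signal (not necessarily the one plotted in the slide); for it, |c_k| = 2/(πk) for odd k and 0 for even k.

```python
import cmath
import math

def fourier_coeff(x, T, k, steps=4000):
    """Approximate c_k = (1/T) * integral_{-T/2}^{T/2} x(t) e^{-j k W0 t} dt."""
    w0 = 2 * math.pi / T
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + (i + 0.5) * dt        # midpoint of the i-th slice
        total += x(t) * cmath.exp(-1j * k * w0 * t) * dt
    return total / T

def square(t):
    """Hypothetical test signal: odd square wave with period 1."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

c1 = fourier_coeff(square, 1.0, 1)
c2 = fourier_coeff(square, 1.0, 2)
c3 = fourier_coeff(square, 1.0, 3)
```

The magnitudes fall off as 1/k, which is the slow decay expected for a waveform with jump discontinuities.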
Discrete-time Fourier
DT Fourier Transform (aperiodic sampled x)
$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega}) e^{j\omega n} \, d\omega$
$X(e^{j\omega}) = \sum_n x[n] e^{-j\omega n}$
[Figure: x[n] vs. n, and |X(e^{jω})| vs. ω]
Discrete Fourier Transform (N-point x)
$x[n] = \frac{1}{N} \sum_k X[k] e^{j\frac{2\pi kn}{N}}$
$X[k] = \sum_n x[n] e^{-j\frac{2\pi kn}{N}}$
[Figure: x[n] vs. n, and |X[k]| vs. k overlaid on |X(e^{jω})|]
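A direct (O(N²)) Python sketch of the DFT pair, written with the usual 1/N factor in the inverse, together with a DTFT evaluator to show that the N-point DFT samples the DTFT at ω = 2πk/N. The function names and the 8-point test sequence are illustrative assumptions; in practice an FFT routine would be used instead.

```python
import cmath
import math

def dft(x):
    """X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x[n] = (1/N) sum_k X[k] exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def dtft(x, omega):
    """X(e^{j omega}) = sum_n x[n] exp(-j omega n) for a finite-length x."""
    return sum(x[n] * cmath.exp(-1j * omega * n) for n in range(len(x)))

# Hypothetical 8-point sequence.
x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
X = dft(x)
x_back = idft(X)
```

Here `X[k]` agrees with `dtft(x, 2 * math.pi * k / 8)` for every k, and `idft(dft(x))` recovers x up to rounding error.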
Sampling and aliasing
Discrete-time signals equal the continuous time signal at discrete sampling instants:
$x_d[n] = x_c(nT)$
Sampling cannot represent rapid fluctuations
[Figure: sampled sinusoid, amplitude -1 to 1 vs. sample index 0 to 10]
$\sin\left(\left(\Omega_M + \frac{2\pi}{T}\right) Tn\right) = \sin(\Omega_M Tn) \quad \forall n \in \mathbb{Z}$
Nyquist limit ($\Omega_T / 2$) from periodic spectrum:
[Figure: periodic spectrum G_p(jΩ) showing the baseband spectrum G_a(jΩ) and its copy around Ω_T (between Ω_T - Ω_M and Ω_T + Ω_M), labeled "alias of baseband signal"]
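The aliasing identity can be checked numerically: a sinusoid at Ω_M and one at Ω_M + 2π/T give the same samples when sampled at interval T. The 8 kHz sampling rate and 1 kHz tone below are assumed example values.

```python
import math

def sample_sine(omega, T, num_samples):
    """Samples of sin(omega * t) taken at t = n * T."""
    return [math.sin(omega * T * n) for n in range(num_samples)]

T = 1.0 / 8000.0                           # hypothetical sampling interval (8 kHz)
omega_M = 2 * math.pi * 1000.0             # 1 kHz sinusoid, in rad/s
omega_alias = omega_M + 2 * math.pi / T    # 9 kHz: one sampling frequency higher

low = sample_sine(omega_M, T, 32)
high = sample_sine(omega_alias, T, 32)
max_diff = max(abs(a - b) for a, b in zip(low, high))
```

`max_diff` is at the level of floating-point rounding, so after sampling the 9 kHz tone cannot be told apart from the 1 kHz tone.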
Speech sounds in the Fourier domain
[Figure: four speech sounds, each shown in the time domain (time / s) and in the frequency domain (energy / dB vs. freq / Hz):]
Vowel: periodic ("has")
Fricative: aperiodic ("watch")
Glide: transition ("watch")
Stop: transient ("dime")
$\mathrm{dB} = 20 \log_{10}(\text{amplitude}) = 10 \log_{10}(\text{power})$
Voiced spectrum has pitch + formants
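A small Python sketch of the decibel conversion, showing that the amplitude and power forms agree when power is amplitude squared. The function names are illustrative.

```python
import math

def amplitude_to_db(amplitude):
    """dB = 20 * log10(amplitude)."""
    return 20.0 * math.log10(amplitude)

def power_to_db(power):
    """dB = 10 * log10(power)."""
    return 10.0 * math.log10(power)

# An amplitude ratio of 0.1 and the matching power ratio of 0.01
# both correspond to -20 dB.
db_from_amplitude = amplitude_to_db(0.1)
db_from_power = power_to_db(0.1 ** 2)
```

Doubling the amplitude adds about 6 dB; doubling the power adds about 3 dB.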
Short-time Fourier Transform
Want to localize energy in time and frequency
break sound into short-time pieces
calculate DFT of each one
[Figure: waveform (time / s, 2.35 to 2.6) cut by short-time windows at offsets 0, L, 2L, 3L; the DFT of window m = 0, 1, 2, 3 gives one column of frequency bins k (freq / Hz, 0 to 4000)]
Mathematically,
$X[k, m] = \sum_{n=0}^{N-1} x[n] \, w[n - mL] \exp\left(-j \frac{2\pi k (n - mL)}{N}\right)$
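A minimal pure-Python sketch of the short-time Fourier transform: slide a window of length N along the signal in hops of L samples and take the DFT of each windowed piece, with the phase referenced to the start of each window. The Hann window, the function names, and the test tone are assumptions made for this example; only bins 0 to N/2 are kept because the input is real.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, w, L):
    """Return frames[m][k] = sum_i x[mL + i] w[i] exp(-j 2 pi k i / N)."""
    N = len(w)
    frames = []
    m = 0
    while m * L + N <= len(x):
        seg = [x[m * L + i] * w[i] for i in range(N)]
        frames.append([
            sum(seg[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N))
            for k in range(N // 2 + 1)
        ])
        m += 1
    return frames

# Hypothetical test: a cosine with exactly 4 cycles per 32-sample window.
N, L = 32, 8
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(128)]
frames = stft(x, hann(N), L)
peak_bins = [max(range(N // 2 + 1), key=lambda k: abs(f[k])) for f in frames]
```

For this stationary tone every frame peaks at bin 4; for speech, the peak pattern changes from frame to frame, which is what the spectrogram displays.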
The Spectrogram
Plot STFT X[k, m] as a gray-scale image
[Figure: waveform and gray-scale spectrogram (freq / Hz, 0 to 4000, vs. time / s; intensity / dB, -50 to 10), shown for a longer excerpt and for a zoomed segment from 2.35 to 2.6 s]
Time-frequency tradeoff
Longer window w[n] gains frequency resolution at cost of time resolution
[Figure: two spectrograms of the same excerpt (freq / Hz, 0 to 4000, vs. time / s, 1.4 to 2.6; level / dB, -50 to 10): narrowband with window = 256 pt, and wideband with window = 48 pt]
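The tradeoff can be put into numbers for the two window lengths in the figure (256 and 48 points). The sketch assumes a sampling rate of 8000 Hz, inferred from the 0 to 4000 Hz frequency axis rather than stated in the slides, and uses the DFT bin spacing as a rough proxy for frequency resolution.

```python
def window_resolution(n_points, sample_rate):
    """Return (window duration in seconds, DFT bin spacing in Hz)."""
    duration_s = n_points / sample_rate
    bin_spacing_hz = sample_rate / n_points
    return duration_s, bin_spacing_hz

SAMPLE_RATE = 8000.0   # assumed; the plots only show a 0-4000 Hz axis

narrow_dur, narrow_df = window_resolution(256, SAMPLE_RATE)   # narrowband
wide_dur, wide_df = window_resolution(48, SAMPLE_RATE)        # wideband
```

Under this assumption the 256-point window spans 32 ms with bins 31.25 Hz apart, while the 48-point window spans 6 ms with bins about 167 Hz apart. The product of duration and bin spacing is 1 in both cases, so one kind of resolution can only be bought with the other.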
Speech sounds on the Spectrogram
Most popular speech visualization
[Figure: spectrogram (freq / Hz, 0 to 4000, vs. time/s, 1.4 to 2.6) of the excerpt labeled with the words "has", "a", "watch", "thin", "as", "a", "dime", with the sound types marked:]
Vowel: periodic ("has")
Fricative: aperiodic ("watch")
Glide: transition ("watch")
Stop: transient ("dime")
Wideband (short window) better than narrowband (long window) to see formants
TSM with the Spectrogram
Just stretch out the spectrogram?
[Figure: two spectrogram panels, Frequency (0 to 4000) vs. Time (0 to 1.4)]
how to resynthesize?
spectrogram is only |Y[k, m]|
The Phase Vocoder
Timescale modification in the STFT domain
Magnitude from stretched spectrogram:
$|Y[k, m]| = \left| X\left[k, \frac{m}{r}\right] \right|$
  - e.g. by linear interpolation
But preserve phase increment between slices:
$\dot{\theta}_Y[k, m] = \dot{\theta}_X\left[k, \frac{m}{r}\right]$
  - e.g. by discrete differentiator
Does right thing for single sinusoid
  - keeps overlapped parts of sinusoid aligned
[Figure: sinusoid vs. time with phase advance Δθ over hop ΔT: $\dot{\theta} = \frac{\Delta\theta}{\Delta T}$, $\Delta\theta' = \dot{\theta} \cdot 2\Delta T$]
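The two phase vocoder rules (interpolate the magnitude at fractional frame position m/r, but accumulate the phase increment observed between neighboring input slices) can be sketched on a list of STFT frames. This is a simplified illustration, not the lecture's implementation: it operates on precomputed complex frames, leaves out the STFT and overlap-add resynthesis, and all names are hypothetical.

```python
import cmath
import math

def pv_stretch_frames(frames, r):
    """Time-stretch STFT frames (frames[m][k], complex) by factor r.

    Magnitude: linear interpolation between input frames at position m / r.
    Phase: running sum of the input's per-hop phase increments."""
    n_in, n_bins = len(frames), len(frames[0])
    n_out = int((n_in - 1) * r)
    phase = [cmath.phase(c) for c in frames[0]]    # start from input phase
    out = []
    for m in range(n_out):
        t = m / r                                  # fractional input frame
        i = int(t)
        frac = t - i
        a, b = frames[i], frames[min(i + 1, n_in - 1)]
        spec = []
        for k in range(n_bins):
            mag = (1.0 - frac) * abs(a[k]) + frac * abs(b[k])
            spec.append(cmath.rect(mag, phase[k]))
            # Discrete differentiator: phase advance between slices i and i+1,
            # wrapped to [-pi, pi], added once per output frame.
            dphi = cmath.phase(b[k]) - cmath.phase(a[k])
            dphi -= 2 * math.pi * round(dphi / (2 * math.pi))
            phase[k] += dphi
        out.append(spec)
    return out

# Hypothetical single-bin input: unit magnitude, phase advancing 0.3 rad/frame.
frames = [[cmath.rect(1.0, 0.3 * m)] for m in range(5)]
stretched = pv_stretch_frames(frames, 2.0)
steps = [cmath.phase(stretched[m + 1][0] / stretched[m][0])
         for m in range(len(stretched) - 1)]
```

For this single sinusoid the output has twice as many frames, yet the phase still advances by 0.3 rad per frame, so overlapped pieces of the sinusoid stay aligned when the frames are overlap-added at the original hop.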
General issues in TSM
Time window
  - stretching a narrowband spectrogram
Malleability of different sounds
  - vowels stretch well, stops lose nature
Not a well-formed problem?
  - want to alter time without frequency
  - . . . but time and frequency are not separate!
  - satisfying result is a subjective judgment
  - solution depends on auditory perception. . .
Summary
Information in sound
  - lots of it, multiple levels of abstraction
Course overview
  - survey of audio processing topics
  - practicals, readings, project
DSP review
  - digital signals, time domain
  - Fourier domain, STFT
Timescale modification
  - properties of the speech signal
  - time-domain
  - phase vocoder
References
J. L. Flanagan and R. M. Golden. Phase vocoder. Bell System Technical Journal, pages 1493-1509, 1966.
M. Dolson. The Phase Vocoder: A Tutorial. Computer Music Journal, 10(4):14-27, 1986.
M. Puckette. Phase-locked vocoder. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 222-225, 1995.
A. T. Cemgil and S. J. Godsill. Probabilistic Phase Vocoder and its application to Interpolation of Missing Values in Audio Signals. In 13th European Signal Processing Conference, Antalya, Turkey, 2005.
