Spectral Modeling and Signal Processing Intro421
Course Overview
Administrative Information
Units
You may sign up for either 3 or 4 units:
• 3 units = lectures + assignments + final
• 4 units adds a final project based on outside reading
and/or a software project
Important Pointers
• The course schedule and outline [1] (reachable from the
class home page [2]) lists the following information:
– Assignments
– Weekly class schedule
– Pointers to all lecture overheads
• The 421 home page further contains pointers to
– Programming examples
– Sound examples
– Related items of interest online
• The MUS421/EE367B Overview [3] contains this
administrative info and more.
[1] http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
[2] http://ccrma.stanford.edu/CCRMA/Courses/421/
[3] http://ccrma.stanford.edu/~jos/intro421/
Why The Fourier Transform
[4] http://ccrma.stanford.edu/~jos/pdf/AES-Heyser.pdf
Applications of the
Short-Time Fourier Transform (STFT)
Emerging Applications in Audio Coding
• First transmit
objects = synthesizer patches = “decoder”
• Next transmit
messages = performance data (like MIDI) =
“encoded bit stream”
• Main challenge: to develop classifiers and coders for
general purpose audio
• Project Types
– Programming project and report
– Reading and report
• Suggested Outside Reading: See
http://ccrma.stanford.edu/~jos/refs421/
(also available in Appendix P of the text [5].)
• Example Project Topics
– Windows
∗ New FFT window types
∗ Explore window types not covered in class
– Spectrum Analysis
∗ Short-time spectrum analysis of recorded data
∗ Study of statistical spectrum estimation
∗ Matching spectrum analysis parameters to
human hearing
∗ Alternative time-frequency representations
(Wavelets, Wigner, ...)
– Sinusoidal Modeling
∗ Readings in additive synthesis
∗ Implement your own sines+noise
analysis/synthesis system
∗ Noise reduction based on sinusoidal modeling
∗ Source separation based on sinusoidal modeling
[5] http://ccrma.stanford.edu/~jos/sasp/
∗ Transient detection (for “sines + noise +
transients” modeling)
– Short-Time Fourier Transform (STFT) based
Analysis, Modification, and Resynthesis
∗ Software system development
∗ Noise reduction
∗ Pitch detection/tracking
∗ Time compression/expansion
∗ Transform coding
∗ Pitch-synchronous phase vocoder (adapt
window to pitch period)
∗ Signal reconstruction from the STFT magnitude
only (phase discarded)
∗ Spectral interpolation schemes to compensate
for lost frames
– Software Development
∗ Modify course Matlab examples to be
compatible with Octave
(See http://www.octave.org/.)
∗ Develop missing components of the Signal
Processing Toolbox for Octave (looking only at
Matlab help info).
∗ PD FFT-processing patches (see
http://www.pure-data.org/)
∗ LADSPA spectral-processing plug-ins (see
http://www.ladspa.org/)
Main Pointer
• Assignments
• Weekly class schedule
• Pointers to all lecture overheads
[6] http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html
[7] http://ccrma.stanford.edu/CCRMA/Courses/421/
Notation
Introduction to Audio Spectrum Analysis
Example of Windowing
Notes:
The following is a diagram of a typical window function:
[Figure: Zero-Phase Window. Amplitude (0 to 1) versus Time (samples), −1000 to 1000.]
We might also require that our window be zero for
negative time. Such a window is said to be ‘causal’.
Causal windows are necessary for real-time processing:
[Figure: Linear Phase Window (Causal). Amplitude (0 to 1) versus Time (samples), −1000 to 1000.]
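The relationship between the two windows can be checked numerically. The following is a minimal sketch, not course code: it assumes a Hamming-type taper, an odd length M = 21, and a test frequency of 0.3 rad/sample, all chosen only for illustration, and the helper name `dtft` is hypothetical. It shows that the zero-phase window has a real transform, and that delaying it by (M − 1)/2 samples to make it causal only multiplies the transform by a linear-phase term.

```python
import cmath
import math

def dtft(samples, n_start, omega):
    """DTFT at radian frequency omega of samples whose first index is n_start."""
    return sum(x * cmath.exp(-1j * omega * (n_start + i))
               for i, x in enumerate(samples))

M = 21            # odd window length (illustrative assumption)
K = (M - 1) // 2  # half-length, which is also the delay of the causal version

# Zero-phase Hamming-type window, symmetric about n = 0 (illustrative assumption).
w = [0.54 + 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(-K, K + 1)]

omega = 0.3                        # arbitrary test frequency in rad/sample
W_zero_phase = dtft(w, -K, omega)  # window occupies n = -K..K
W_causal = dtft(w, 0, omega)       # same samples, occupying n = 0..M-1

# A delay of K samples multiplies the transform by exp(-j * omega * K).
predicted = cmath.exp(-1j * omega * K) * W_zero_phase

print(abs(W_zero_phase.imag))      # essentially zero: the transform is real
print(abs(W_causal - predicted))   # essentially zero: linear phase only
```

The magnitude of the transform is the same in both cases, so the choice between the two forms affects only the phase.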
Putting all this together, we get the following:
Our original signal (unwindowed, infinite duration) is
$$x(n) = e^{j\omega_0 n T}, \quad n \in \mathbb{Z}$$
A portion of the real part, $\cos(\omega_0 n T)$, is plotted below:
[Figure: real part of the signal. Amplitude (−1 to 1) versus Time (samples), −2000 to 2000.]
The Fourier Transform of this infinite duration signal is a
delta function at ω0: X(ω) = δ(ω − ω0)
[Figure: impulse δ(ω − ω0) located at ω = ω0 on the ω axis.]
[Figure: Amplitude (−1 to 1) versus Time (samples), −2500 to 2500.]
[Figure: Amplitude (dB), 0 to −50, versus ωT, −3 to 3, showing a main lobe at ω0 and sidelobes.]
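The effect of windowing on the spectrum of a single complex sinusoid can be sketched numerically. This example assumes T = 1, a zero-phase rectangular window of odd length M = 41, and ω0 = 1.0 rad/sample; these values and the function names are illustrative only. It checks that the transform of the windowed sinusoid is the window transform shifted to ω0, so the main lobe peaks at ω0 with height M.

```python
import cmath
import math

M = 41            # odd rectangular-window length (illustrative assumption)
K = (M - 1) // 2
omega0 = 1.0      # sinusoid frequency in rad/sample (illustrative assumption)

def windowed_sinusoid_dtft(omega):
    """DTFT of exp(j*omega0*n) restricted to n = -K..K."""
    return sum(cmath.exp(1j * omega0 * n) * cmath.exp(-1j * omega * n)
               for n in range(-K, K + 1))

def rect_window_dtft(omega):
    """DTFT of the length-M zero-phase rectangular window."""
    return sum(cmath.exp(-1j * omega * n) for n in range(-K, K + 1))

peak = windowed_sinusoid_dtft(omega0)                          # main-lobe peak
first_null = windowed_sinusoid_dtft(omega0 + 2 * math.pi / M)  # first zero crossing
at_offset = windowed_sinusoid_dtft(omega0 + 0.25)              # 0.25 rad above omega0
window_at_offset = rect_window_dtft(0.25)                      # window transform at 0.25

print(abs(peak))                          # close to M
print(abs(first_null))                    # close to 0
print(abs(at_offset - window_at_offset))  # close to 0: same function, shifted
```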
Summary
The Rectangular Window
[Figure: rectangular window. Amplitude versus Time (samples), −20 to 20.]
To see what happens in the frequency domain, we need
to look at the DTFT of the window:
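$$W_R(\omega) \triangleq \mathrm{DTFT}_\omega(w_R) = \sum_{n=-\infty}^{\infty} w_R(n)\, e^{-j\omega n}$$
$$= \sum_{n=-\frac{M-1}{2}}^{\frac{M-1}{2}} e^{-j\omega n} = \frac{e^{j\omega\frac{M-1}{2}} - e^{-j\omega\frac{M+1}{2}}}{1 - e^{-j\omega}}$$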
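As a numerical sanity check on the closed form, the sketch below compares the direct sum with the geometric-series result for odd M. It also compares both with the ratio sin(Mω/2)/sin(ω/2), which follows from factoring e^{−jω/2} out of the numerator and denominator. The function names are hypothetical and the test frequencies are arbitrary.

```python
import cmath
import math

def rect_dtft_direct(M, omega):
    """Direct sum of exp(-j*omega*n) over n = -(M-1)/2 .. (M-1)/2 (M odd)."""
    K = (M - 1) // 2
    return sum(cmath.exp(-1j * omega * n) for n in range(-K, K + 1))

def rect_dtft_closed(M, omega):
    """Geometric-series closed form (undefined where exp(-j*omega) = 1)."""
    num = cmath.exp(1j * omega * (M - 1) / 2) - cmath.exp(-1j * omega * (M + 1) / 2)
    return num / (1 - cmath.exp(-1j * omega))

def rect_dtft_sine_ratio(M, omega):
    """Equivalent real form sin(M*omega/2) / sin(omega/2)."""
    return math.sin(M * omega / 2) / math.sin(omega / 2)

M = 11
for omega in (0.1, 1.0, 2.5, -0.7):
    direct = rect_dtft_direct(M, omega)
    closed = rect_dtft_closed(M, omega)
    ratio = rect_dtft_sine_ratio(M, omega)
    print(omega, abs(direct - closed), abs(direct - ratio))
```

Both differences should be at the level of floating-point rounding. At ω = 0 the closed form is 0/0 and the sum itself gives M.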
[Figure: Complex Amplitude versus freq, −6 to 6.]
[Figure: Magnitude (Linear) versus ω/ΩM, −6 to 6.]
[Figure: Phase (0 to π) versus ω/ΩM, −6 to 6, with the main lobe marked.]
In audio work, we more typically plot the window
transform magnitude on a decibel (dB) scale:
[Figure: DFT of a Rectangular Window, M = 11. Magnitude (dB), 0 to −40, versus Normalized Frequency ω (rad/sample), −3 to 3. Annotations: main lobe, sidelobes (first sidelobe 13 dB down), nulls.]
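The "13 dB down" figure for the first sidelobe can be reproduced with a short search. This is a sketch under the assumption that the magnitude is normalized to 1 at ω = 0 and that the first sidelobe lies between the first two nulls at 2π/M and 4π/M; the function name and grid size are arbitrary choices.

```python
import math

def rect_mag(M, omega):
    """|W_R(omega)| for a length-M rectangular window, normalized to 1 at omega = 0."""
    if abs(math.sin(omega / 2)) < 1e-12:
        return 1.0
    return abs(math.sin(M * omega / 2) / (M * math.sin(omega / 2)))

M = 11
null_1 = 2 * math.pi / M   # first null of the window transform
null_2 = 4 * math.pi / M   # second null

# Search a dense grid between the first two nulls for the sidelobe peak.
grid = [null_1 + (null_2 - null_1) * i / 10000 for i in range(10001)]
sidelobe_peak = max(rect_mag(M, w) for w in grid)
sidelobe_db = 20 * math.log10(sidelobe_peak)

print(round(sidelobe_db, 2))   # roughly -13 dB relative to the main-lobe peak
```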
Since the DTFT of the rectangular window approximates
the sinc function, it should “roll off” at approximately 6
dB per octave, as verified in the log-log plot below:
[Figure: DFT of a Rectangular Window, M = 20, log-log plot. Amplitude (dB), 0 to about −54 in steps of 6.02 dB, with the ideal −6 dB per octave line and a partial main lobe marked.]
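The roll-off rate can also be estimated numerically. The sketch below samples the normalized window transform midway between adjacent nulls, at ω_k = (2k + 1)π/M, where |sin(Mω/2)| = 1 and the magnitude is exactly 1/(M sin(ω_k/2)). These points are close to, but not exactly at, the sidelobe peaks. The function names and the choice of sidelobes k = 1 and k = 4 are assumptions made for illustration.

```python
import math

def sidelobe_level_db(M, k):
    """Level in dB, relative to the main-lobe peak, at omega_k = (2k+1)*pi/M."""
    omega_k = (2 * k + 1) * math.pi / M
    return 20 * math.log10(1.0 / (M * math.sin(omega_k / 2)))

def rolloff_db_per_octave(M, k_low, k_high):
    """Average slope of the sidelobe envelope between two sidelobes."""
    w_low = (2 * k_low + 1) * math.pi / M
    w_high = (2 * k_high + 1) * math.pi / M
    octaves = math.log2(w_high / w_low)
    return (sidelobe_level_db(M, k_high) - sidelobe_level_db(M, k_low)) / octaves

slope_m20 = rolloff_db_per_octave(20, 1, 4)    # M = 20, as in the plot above
slope_m201 = rolloff_db_per_octave(201, 1, 4)  # longer window, same sidelobes

print(round(slope_m20, 2), round(slope_m201, 2))
```

For the longer window the sampled frequencies are small, so sin(ω/2) ≈ ω/2 and the slope is very close to −6.02 dB per octave. For M = 20 the same sidelobes sit at higher frequencies and the slope is somewhat shallower, which is consistent with "approximately 6 dB per octave."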
Sidelobe Roll-Off Rate
[Figure: four panels of Magnitude versus Frequency ωT (rad/sample), 0 to 3; two of the panels are labeled M = 40 and M = 80.]
One Sine and One Cosine
(“Phase Quadrature” Case)
[Figure: four panels of Magnitude versus Frequency ωT (rad/sample), 0 to 3; two of the panels are labeled M = 40 and M = 80.]
[Figure: One Sine and One Cosine (“Phase Quadrature” Case), All Four Resolutions Overlaid. Magnitude (0 to 0.4) versus Frequency ωT (rad/sample), 0 to 3.]
[Figure: rectangular-window transform WR(ω) for M = 7, with the main-lobe width labeled 2πBw and the ω axis marked at −3Ω, −2Ω, −Ω, Ω, 2Ω, 3Ω.]
[Figure: Amplitude versus ωT (radians per sample), 0 to 3, with the separation ∆ and the main-lobe widths Bw marked.]
For the rectangular window, $B_w$ can be expressed as
$$B_w = 2\,\frac{f_s}{M}$$
Hence we need:
$$B_w = 2\,\frac{f_s}{M} \le \Delta \quad\Rightarrow\quad M \ge 2\,\frac{f_s}{\Delta}$$
or
$$M \ge 2\,\frac{f_s}{|f_2 - f_1|}$$
Thus, to resolve the frequencies f1 and f2 under a
rectangular window, it is sufficient for the window length
M to span at least 2 periods of the difference frequency
f2 − f1, measured in samples, where 2 is the width of the
main lobe, measured in sidelobe-widths.
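As a worked example of this bound, the sketch below computes the smallest integer window length that satisfies M ≥ 2 fs / |f2 − f1|. The function name and the numbers (an 8 kHz sampling rate with sinusoids at 1000 Hz and 1100 Hz) are hypothetical values chosen for easy arithmetic.

```python
import math

def min_rect_window_length(fs, f1, f2):
    """Smallest integer M with M >= 2 * fs / |f2 - f1| (rectangular window)."""
    return math.ceil(2 * fs / abs(f2 - f1))

# Hypothetical case: fs = 8000 Hz, sinusoids at 1000 Hz and 1100 Hz.
M_min = min_rect_window_length(8000.0, 1000.0, 1100.0)
print(M_min)   # → 160, i.e. two periods of the 100 Hz difference frequency
```

One period of the 100 Hz difference frequency is 80 samples at 8 kHz, so the result is the "at least 2 periods" rule expressed in samples.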
A rectangular window of length $2 f_s / |f_2 - f_1|$ or greater is said to
resolve the sinusoidal frequencies f1 and f2.
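The resolution criterion can be illustrated with two equal-amplitude complex sinusoids. This sketch assumes T = 1, zero-phase rectangular windows of odd length, and a separation of 4π/41 rad/sample, so the bound works out to M ≥ 41; all names and values are illustrative. It compares the spectrum magnitude midway between the two frequencies with the magnitude at one of them.

```python
import cmath
import math

def two_tone_spectrum(M, omega1, omega2, omega):
    """DTFT at omega of exp(j*omega1*n) + exp(j*omega2*n) over n = -(M-1)/2..(M-1)/2."""
    K = (M - 1) // 2
    return sum((cmath.exp(1j * omega1 * n) + cmath.exp(1j * omega2 * n))
               * cmath.exp(-1j * omega * n) for n in range(-K, K + 1))

delta = 4 * math.pi / 41   # separation in rad/sample, so the bound is M >= 41
omega1 = 1.0               # lower frequency in rad/sample (assumed)
omega2 = omega1 + delta
mid = (omega1 + omega2) / 2

def dip_ratio(M):
    """Magnitude midway between the tones divided by the magnitude at omega1."""
    peak = abs(two_tone_spectrum(M, omega1, omega2, omega1))
    valley = abs(two_tone_spectrum(M, omega1, omega2, mid))
    return valley / peak

print(dip_ratio(41))   # near 0: the two main lobes just touch at a null
print(dip_ratio(21))   # above 1: the lobes merge into a single hump
```

With M = 41 the midpoint falls on a null of both shifted window transforms, so the two peaks are cleanly separated. With M = 21 the midpoint is higher than either sinusoid's own frequency bin, and the two components cannot be told apart from the magnitude spectrum.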