SSP 5 3 Music Coding
Sverre Holm
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2: Design filter pairs for distortion cancellation (much better)
Part II
• Quadrature mirror filter bank design
– Filter bank without quantization, SNR = 84 dB
• 32-channel filter bank – violin
– Uniform quantization, SNR = 10 dB
– Uniform adaptive quantization, SNR = 25 dB
– Perceptual quantization, SNR = 39 dB
27.04.2022 2
Principles
• Vocoders
– "Robotic voice"
• Hybrid coders (speech/music)
– Mobile phone
• Waveform coders (music)
– MPEG-1 (1993)
• Layer 2 = mp2 = DAB
• Layer 3 = mp3
– MPEG-2
• AAC (Advanced Audio Coding, 1997): iTunes, Tidal
– MPEG-4
• HE-AAC (High-Efficiency AAC, 2004): for low bit rates: DAB+
MPEG = Moving Picture Experts Group of the International Organization for Standardization (ISO)
Cover the center!
Intro to perceptual coding
• Repetition from IN3190 / INF3470
The frequency filters of the ear: mapping frequency to a location
[Figure: unwound cochlea]
Bandpass filters are essential
[Figure: encoder block diagram. Source, LP filter, A/D, filterbank, scaling and quantization in each band, MUX, bit stream; with psychoacoustic analysis and bit allocation blocks.]
Decoding is much simpler
[Figure: decoder block diagram. Bit stream, scaling in each band, filterbank, summation, D/A, LP filter.]
What is the psychoacoustics used in the encoder?
Masking
We do not hear all sounds.
1. Absolute threshold of hearing
2. Masking: one sound is inaudible in the presence of another sound.
– Simultaneous masking
• Noise masking tone
• Tone masking noise
• Noise masking noise
– Nonsimultaneous masking
• Pre-masking (2 ms)
• Post-masking (100 ms)
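The absolute threshold of hearing listed above can be approximated numerically. The slides give no formula, so the sketch below assumes Terhardt's closed-form approximation of the threshold in quiet, which is widely used in the perceptual coding literature; the function name is illustrative.

```python
import math

def absolute_threshold_db_spl(f_hz):
    """Approximate threshold in quiet (dB SPL), Terhardt's formula (assumed here)."""
    f_khz = f_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# The ear is most sensitive around 3-4 kHz and much less so at the band edges.
for f in (100, 1000, 3300, 10000, 16000):
    print(f"{f:6d} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```

Any spectral component below this curve is inaudible even without a masker, so a coder does not need to spend bits on it.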
Noise Masking Tone
• Demonstration signals:
– Filtered noise: center 410 Hz, width 111 Hz
– Tone 1: 820 Hz, 5 dB below noise
– Tone 2: 410 Hz, 5 dB below noise
– Noise + Tone 1
– Noise + Tone 2
• Frequency analysis of the signal to find the masking threshold.
• Put the quantization noise under the masking threshold and we won't hear the effect of quantization.
Processing in an mp3 player
Subband coder
BP filtering, downsampling, quantization, upsampling, BP filtering, summation
Down/up-sampling (repeat IN3190)
[Figure: two spectrograms (frequency vs. time) of channel 0 after down-/up-sampling.
1. Channel 0: no h0, no g0. Labels mark the image and the component generated by aliasing.
2. Channel 0: insert g0.]
2-filter bank and individual optimization of each filter
• Approach of first examples with chirp
• Three kinds of distortion:
– Amplitude distortion (in pass band)
– Phase distortion (in pass band)
– Aliasing distortion due to stop band leakage
Perfect Reconstruction
• Up to now: optimization of each filter branch independently
• Instead: accept some aliasing per sub-band, provided that aliasing from adjacent bands cancels in the final summation
• Illustrated by pairs of filters
Filterbank
[Figure: magnitude responses (dB) of H0(f) and H1(f) versus normalized frequency (×π rad/sample), annotated "Best!".]
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2 (better): Design filter pairs for distortion cancellation
Part II
• Quadrature mirror filter bank design
– Filter bank without quantization: SNR = 84.2 dB
• 32-channel filter bank – Violin example
– Uniform quantization, uniform adaptive quantization, perceptual quantization (SNR = 10.3, 25, 38.6 dB)
Quadrature mirror expression of up-sampling of sub-band signal
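The equation on this slide did not survive extraction. As a reminder, and as an assumption about what was shown, the standard identity for downsampling by 2 followed by upsampling by 2 is given here; $x(n)$ is the sub-band signal and $v(n)$ the result with every odd sample set to zero.

```latex
\begin{align}
v(n) &= \tfrac{1}{2}\left[1 + (-1)^n\right] x(n) \\
V(z) &= \tfrac{1}{2}\left[X(z) + X(-z)\right] \\
V(e^{j\omega}) &= \tfrac{1}{2}\left[X(e^{j\omega}) + X(e^{j(\omega-\pi)})\right]
\end{align}
```

The second term is the spectrum shifted by $\pi$. For a real signal its magnitude is the mirror image of $|X(e^{j\omega})|$ about the quadrature frequency $\omega = \pi/2$, which is where the name quadrature mirror comes from.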
Aliasing in decimation-interpolation
• Aliasing interpreted in the framework of quadrature mirrors
[Figure: magnitude responses (dB) of H0(f) and H1(f) versus normalized frequency (×π rad/sample).]
Output of filter bank from down-/up-sampling
Include input filters
Aliasing:
Aliasing cancellation
• H = input filters (anti-aliasing)
• G = output filters (anti-imaging), coupled to H:
1. Condition: unity gain
2. Condition: cancel aliasing
– $g_0(n) = (-1)^n h_1(n)$
– $g_1(n) = -(-1)^n h_0(n)$
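The two conditions can be read off the z-domain output of the 2-channel bank. The derivation below is the standard textbook form, given as a sketch because the slide equations themselves were lost; $d$ denotes an overall delay in samples and is an assumed symbol.

```latex
\begin{align}
Y(z) &= \tfrac{1}{2}\left[H_0(z)G_0(z) + H_1(z)G_1(z)\right] X(z)
      + \tfrac{1}{2}\left[H_0(-z)G_0(z) + H_1(-z)G_1(z)\right] X(-z) \\
\intertext{Cancel aliasing: choose the synthesis filters so that the $X(-z)$ term vanishes,}
G_0(z) &= H_1(-z), \qquad G_1(z) = -H_0(-z)
  \quad\Longleftrightarrow\quad
  g_0(n) = (-1)^n h_1(n), \quad g_1(n) = -(-1)^n h_0(n) \\
\intertext{Unity gain: the remaining transfer function should be a pure delay,}
\tfrac{1}{2}\left[H_0(z)H_1(-z) - H_1(z)H_0(-z)\right] &= z^{-d}
\end{align}
```

Substituting the coupled filters into the aliasing bracket gives $H_0(-z)H_1(-z) - H_1(-z)H_0(-z) = 0$ for any choice of $H_0$ and $H_1$, which is why even poor filters cancel aliasing exactly.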
Quadrature Mirror Filters
- Unity gain constraint
- Couple h0 with h1
ASP_mp3.m
• Sections 1-2
• May use rather poor LP, HP filters
• Despite aliasing in each filter, almost perfect reconstruction due to near-perfect cancellation of aliasing errors
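The point about poor filters can be checked with a few lines of code. The sketch below is not taken from ASP_mp3.m; it assumes the shortest possible filter pair (2-tap Haar filters, which satisfy $h_1(n) = (-1)^n h_0(n)$) and the coupling $g_0(n) = (-1)^n h_1(n)$, $g_1(n) = -(-1)^n h_0(n)$. All function names are illustrative.

```python
import math

def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def down_up_sample(v):
    """Downsample by 2, then upsample by 2: odd-indexed samples become zero."""
    return [s if n % 2 == 0 else 0.0 for n, s in enumerate(v)]

def two_channel_bank(x, h0, h1):
    """Analysis, down/up-sampling and synthesis; returns the two branch outputs."""
    g0 = [(-1) ** n * c for n, c in enumerate(h1)]      # g0(n) = (-1)^n h1(n)
    g1 = [-((-1) ** n) * c for n, c in enumerate(h0)]   # g1(n) = -(-1)^n h0(n)
    low = convolve(down_up_sample(convolve(x, h0)), g0)
    high = convolve(down_up_sample(convolve(x, h1)), g1)
    return low, high

a = 1 / math.sqrt(2)
h0 = [a, a]    # very poor low-pass filter
h1 = [a, -a]   # very poor high-pass filter
x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.7, -0.9]

low, high = two_channel_bank(x, h0, h1)
y = [l + h for l, h in zip(low, high)]

# The sum equals the input delayed by one sample; one branch alone does not.
max_err = max(abs(y[n + 1] - x[n]) for n in range(len(x)))
low_err = max(abs(low[n + 1] - x[n]) for n in range(len(x)))
print(max_err, low_err)
```

Each branch on its own is heavily distorted, yet the aliasing terms of the two branches have opposite signs and vanish in the sum.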
Frequency-selective quadrature mirror filters?
- Oh no!
M=2:
f0=1/8, f1=3/8
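The values for M = 2 are consistent with band centers at $f_k = (2k+1)/(4M)$ cycles per sample, the usual choice in cosine-modulated (pseudo-QMF) banks. The sketch below assumes that pattern, which the slide does not state explicitly, and a 44.1 kHz sampling rate for the 32-band case.

```python
from fractions import Fraction

def center_frequencies(num_bands):
    """Assumed band centers f_k = (2k + 1) / (4M), in cycles per sample."""
    return [Fraction(2 * k + 1, 4 * num_bands) for k in range(num_bands)]

print(center_frequencies(2))   # [Fraction(1, 8), Fraction(3, 8)]

fs = 44100  # assumed sampling rate in Hz
centers_hz = [float(f) * fs for f in center_frequencies(32)]
print(centers_hz[:3])
```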
ASP_mp3.m
• Section 3:
• Build 32 PQMF
• Listen to band 3 – lots of aliasing error
• Listen to reconstruction (no quantization)
– snr_PQMF = 84.2 dB
– For all practical purposes aliasing error has been cancelled
Violin spectrogram
[Figure: violin spectrogram (frequency vs. time), with amplitude plots including "Sub-band#2, original" against time (samples at Fs/32). From ASP_mp3.m.]
Adaptive quantization in layer 1
• The max value of the quantizer is adapted to the actual max level that exists: automatic gain control
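A small numerical experiment shows why adapting the quantizer range matters. This is a sketch, not the code in ASP_mp3.m: it assumes a 4-bit mid-tread uniform quantizer, a block of 12 sub-band samples, and a quiet test tone chosen for illustration.

```python
import math

def quantize(block, bits, full_scale):
    """Uniform mid-tread quantizer with 2**(bits - 1) - 1 steps on each side of zero."""
    steps = 2 ** (bits - 1) - 1
    return [round(s / full_scale * steps) / steps * full_scale for s in block]

def snr_db(signal, approx):
    """Signal-to-noise ratio in dB between a block and its quantized version."""
    sig_power = sum(s * s for s in signal)
    err_power = sum((s - a) ** 2 for s, a in zip(signal, approx))
    return math.inf if err_power == 0 else 10 * math.log10(sig_power / err_power)

# Hypothetical quiet block: a sinusoid at 5 % of full scale.
block = [0.05 * math.sin(2 * math.pi * n / 12 + 0.3) for n in range(12)]

# Fixed range [-1, 1]: every sample rounds to zero, so the error equals the signal.
fixed = quantize(block, bits=4, full_scale=1.0)

# Adaptive range: full scale follows the largest sample in the block (the scale factor).
scale = max(abs(s) for s in block)
adaptive = quantize(block, bits=4, full_scale=scale)

snr_fixed = snr_db(block, fixed)
snr_adaptive = snr_db(block, adaptive)
print(snr_fixed, snr_adaptive)
```

The price of the adaptive scheme is that the scale factor must be transmitted as side information for every block.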
ASP_mp3.m
• Also perceptual bit allocation
• The final threshold is the maximum of the global masking threshold and the absolute auditory threshold.
• Signal-to-mask ratios per band
• snr_scaled_perceptual = 38.6 dB
• bit_rate = 192.25 kbps
Perceptual allocation of bits
[Figure: signal PSD, minimum threshold per sub-band, and absolute threshold; magnitude (dB) versus frequency (Hz), 0 to 2 × 10^4 Hz.]
Spectrum of signal and error
[Figure: two panels showing signal PSD and error PSD, magnitude (dB) versus frequency (Hz)]
– Left: 4-bit adaptive Q per band (~192 kbps), SNR = 25.0 dB
– Right: 192 kbps perceptual coding, SNR = 38.6 dB
• Bark scale
Deviation from auditory process
• Edges of critical bands: not uniformly spaced, follow the Bark scale
– But most sub-band coders have equal-bandwidth channels
– Former NTNU colleagues: Ramstad, Tor A., and Joar P. Tanem. "Cosine-modulated analysis-synthesis filterbank with critical sampling and perfect reconstruction." IEEE Int. Conf. Acoust., Speech, Signal Process., 1991.
– Abstract: The authors present a simple derivation of a parallel filterbank based on cosine-modulated versions of a model low-pass filter. With a nonuniform channel separation an efficient implementation consisting of a DFT (discrete Fourier transform) related transform and subfilters is possible. Using critical sampling of each channel and FIR (finite impulse response) filters, the conditions for perfect reconstruction are given. The computational complexity of the derived FIR filterbank is much lower than for a tree-structured FIR filterbank but cannot compete with the most efficient IIR filterbanks.
• Masking depends on absolute loudness, but level is unknown
– Assumes that the least significant bit of the 16-bit signal can just be heard
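The mismatch between critical bands and equal-bandwidth channels can be quantified. The sketch below assumes the Zwicker and Terhardt approximation for converting Hz to Bark, which is not given in the slides, together with 32 uniform sub-bands at a 44.1 kHz sampling rate.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation, assumed here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

fs = 44100.0            # assumed sampling rate in Hz
num_bands = 32
band_width_hz = fs / 2 / num_bands   # about 689 Hz per uniform sub-band

edges = [k * band_width_hz for k in range(num_bands + 1)]
widths_bark = [hz_to_bark(hi) - hz_to_bark(lo) for lo, hi in zip(edges, edges[1:])]

# The lowest sub-band spans several critical bands, the highest only a fraction of one.
print(f"band  0: {widths_bark[0]:.2f} Bark")
print(f"band 31: {widths_bark[-1]:.2f} Bark")
```

A uniform bank therefore has too little frequency resolution at low frequencies and more than needed at high frequencies.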
MPEG-1
• CD: 2 channels × 16 bit/sample × 44.1 ksamples/s = 1.4 Mbps
• Layer 1: 1:4 compression (down to 384 kbps)
• Layer 2 (mp2): 1:6-8 (192-256 kbps)
– Original DAB coder
– DAB uses down to 128 kbps (Norway), even lower in the UK
• Layer 3 (mp3): 1:10-12 (112-128 kbps, rather poor quality mp3!)
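The compression ratios follow directly from the CD bit rate. A minimal sketch of the arithmetic, with an illustrative function name:

```python
def compression_ratio(raw_bps, coded_kbps):
    """Ratio between the raw bit rate and a coded bit rate given in kbps."""
    return raw_bps / (coded_kbps * 1000)

# CD: 2 channels x 16 bit/sample x 44100 samples/s
cd_bps = 2 * 16 * 44100
print(cd_bps)   # 1411200

for kbps in (384, 256, 192, 128, 112):
    print(f"{kbps:3d} kbps -> 1:{compression_ratio(cd_bps, kbps):.1f}")
```

The ratios quoted on the slide are rounded versions of these numbers.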
MPEG-1 Layer 1
• 32-channel filterbank
• Frame size 384 samples, decimated to 384/32 = 12 samples per sub-band
• 512-point FFT for local power estimate as input to psychoacoustic model
• SMR = signal-to-mask ratio determined
• Uniform quantization of sub-bands so that SNR > SMR (quantization noise stays below the masking threshold)
• 0 bits for bands 27-32
• Adaptive quantizer, normalized over 12 samples
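The link between SMR and the number of bits can be sketched with the rule of thumb that each bit of a uniform quantizer adds about 6.02 dB of SNR. This is a simplification: the actual standard uses an iterative allocation with tabulated quantizers, and the SMR values below are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb for a uniform quantizer (assumption)

def bits_for_smr(smr_db):
    """Smallest number of bits whose SNR reaches the signal-to-mask ratio."""
    if smr_db <= 0:
        return 0   # signal is already below the masking threshold
    return math.ceil(smr_db / DB_PER_BIT)

frame_samples = 384
num_bands = 32
samples_per_band = frame_samples // num_bands   # 12 samples per sub-band
frame_ms = 1000 * frame_samples / 44100         # frame duration at 44.1 kHz (assumed rate)

smr_per_band = [25.0, 14.0, 6.5, -3.0]          # hypothetical SMR values in dB
allocation = [bits_for_smr(s) for s in smr_per_band]
print(samples_per_band, allocation)             # 12 [5, 3, 2, 0]
```

Bands whose signal lies below the mask get no bits at all, which is where most of the savings come from.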
MPEG-1 Layer-II (mp2, DAB)
• Longer FFT for spectral estimation (1024 samples)
• Scale factors from groups of three blocks
• Temporal masking in addition to spectral masking
MPEG-1 Layer 3
• Each of the 32 original sub-bands is split into 18 channels by a Modified Discrete Cosine Transform (MDCT) when needed: better frequency resolution
• Dynamic coding scheme: more bits to frames that require it
MPEG-2 Advanced Audio Coding (AAC)
• More sampling rates (down to 8 kHz)
• 5.1 surround
• AAC (1997) improves on mp3
• Single filter bank with adaptive block size
– 1024 samples for stationary sounds
– 128 samples for transient sounds
• Apple's iTunes
• Wimp (Tidal)
MPEG-4 AAC
• MPEG-4 HE AAC (High efficiency)
– Spectral band replication (SBR), exploits correlation between LF (<4-8 kHz) and HF to reproduce HF
– Claims stereo transparency at 48 kbps (?)
– My measurements: very poor stereo at 48 kbps
• MPEG-4 HE AAC v2 (AAC+) 2004
– Also parametric stereo (PS), a pseudo stereo
– Claims stereo transparency at 24 kbps (?)
– DAB+ in Norway
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2: Design filter pairs for distortion cancellation (much better)
Part II
• Quadrature mirror filter bank design
– Filter bank without quantization, SNR = 84.2 dB
• 32-channel filter bank – violin
– Uniform quantization, SNR = 10.3 dB
– Uniform adaptive quantization, SNR = 25 dB
– Perceptual quantization, SNR = 38.6 dB
Audio coding in practice
• Raw uncoded 44.1 kHz/16 bit: 1411 kbps
• Mobile (mono, speech): 12.2 kbps
• Radio - DAB+, HE-AAC: 48-96 kbps
1. Mono, low bandwidth: weather, news
2. Most stations: average
3. Niche channels: better than FM at its best: classical, often P2
• iTunes before 2007, AAC: 128 kbps comm.
• iTunes+ today, AAC: 256 kbps
• Spotify (Vorbis), Tidal (AAC): 320 kbps
• Tidal Hi-Fi, lossless FLAC: ~1000 kbps
– Free Lossless Audio Codec, 50-80% of CD
• 96-192 kHz, 24 bit stereo: 4600-9200 kbps
• Video lecture: "The Good, the Bad, the Ugly. Fra vinyl til høyoppløst digitallyd" (From vinyl to high-resolution digital audio), 2020