SSP 5 3 Music Coding

The document discusses how sound is processed in an MP3 player. It involves separating the sound into frequency bands using a filter bank. This allows exploiting properties of human perception like masking to reduce file size by quantizing different frequency bands individually and placing quantization noise below masking thresholds so it cannot be heard. The key aspects are the filter bank design and its properties, which split the signal into subbands that can then be quantized and processed individually.

Uploaded by

Gabor Gereb
Copyright
© © All Rights Reserved

How is sound processed in an mp3 player?

Sverre Holm
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2: Design filter pairs for distortion cancellation (much better)

Part II
• Quadrature mirror filter bank design
– Filter bank without quantization, SNR = 84 dB
• 32-channel filter bank – violin
– Uniform quantization, SNR = 10 dB
– Uniform adaptive quantization, SNR = 25 dB
– Perceptual quantization, SNR = 39 dB

27.04.2022 2
Principles
Coder types, from speech to music:
• Vocoders
– "Robotic voice"
• Hybrid coders
– Mobile phone
• Waveform coders (music)
– MPEG-1 (1993)
• Layer 2 = mp2 = DAB
• Layer 3 = mp3
– MPEG-2
• AAC (Advanced Audio Coding, 1997): iTunes, Tidal
– MPEG-4
• HE-AAC (High-Efficiency AAC, 2004): for low bit rates, DAB+
MPEG = Moving Picture Experts Group of the International Organization for Standardization (ISO)

Cover the center!

Intro to perceptual coding
• Review of IN3190 / INF3470

The frequency filters of the ear:
Mapping frequency to a location

Unwound
cochlea

Bandpass filters are essential

The Bark scale: "...a frequency scale on which equal distances correspond with perceptually equal distances. Above about 500 Hz this scale is more or less equal to a logarithmic frequency axis. Below 500 Hz the Bark scale becomes more and more linear."
Figures: Evangelista, G., Dörfler, M., & Matusiak, E. (2013). Arbitrary phase vocoders by means of warping. Musica/Tecnologia, 7, 91-118.
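
The behavior described in the quote (roughly linear below 500 Hz, roughly logarithmic above) can be checked numerically. The sketch below assumes the Zwicker and Terhardt (1980) closed-form approximation of the Bark scale, which the slides do not specify; the function name `hz_to_bark` is illustrative only.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Print a few points: doubling the frequency roughly doubles the Bark value
# at low frequencies, but adds a roughly constant amount at high frequencies.
for f in (100, 200, 500, 1000, 2000, 4000, 8000, 16000):
    print(f"{f:6d} Hz -> {hz_to_bark(f):5.2f} Bark")
```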
Filterbank Approach. Encoding

[Block diagram: source → LP filter → A/D → filterbank → scaling and quantization per band → MUX → bit stream. A psychoacoustic analysis block drives the bit allocation.]
Decoding is much simpler

[Block diagram: bit stream → scaling per band → filterbank → summation → D/A → LP filter.]
What is the psychoacoustics that is used in the encoder?

Masking
We do not hear all sounds.
1. Absolute threshold of hearing
2. Masking: One sound is inaudible in the presence of another sound.
a. Simultaneous masking
– Noise Masking Tone
– Tone Masking Noise
– Noise Masking Noise
b. Nonsimultaneous masking
– Pre-masking (2 ms)
– Post-masking (100 ms)
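
As an illustration of the first item, the absolute threshold of hearing is often modeled with Terhardt's closed-form approximation. The slides do not say which model is used, so the formula and the function name below are assumptions; the sketch only shows the typical shape, with the ear most sensitive around 3 to 4 kHz.

```python
import math

def absolute_threshold_db_spl(f_hz):
    """Approximate threshold in quiet (dB SPL) using Terhardt's formula (assumed model)."""
    k = f_hz / 1000.0  # frequency in kHz
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

for f in (100, 500, 1000, 3300, 10000, 16000):
    print(f"{f:6d} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```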
Noise Masking Tone
Demo signals:
• Filtered noise: center 410 Hz, width 111 Hz
• Tone 1: 820 Hz, 5 dB below noise. Noise + Tone 1: not masked
• Tone 2: 410 Hz, 5 dB below noise. Noise + Tone 2: masked

You cannot hear a sinusoid that lies in the same critical band as a filtered noise if its sound pressure level is below a certain threshold.

This effect also extends beyond the critical band.
Tone Masking Noise
Demo signals:
• Filtered noise: center 1 kHz, width 162 Hz, 15 dB below the tone
• Tone 1: 2 kHz. Noise + Tone 1: not masked
• Tone 2: 1 kHz. Noise + Tone 2: masked

You cannot hear a filtered noise that lies in the same critical band as a sinusoid if its sound pressure level is below a certain threshold.

This effect also extends beyond the critical band.
Exploit Masking
• If a sound is masked, we cannot hear it.
• Frequency analysis of the signal to find the masking threshold.
• Put the quantization noise under the masking threshold and we won't hear the effect of quantization.
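
A rough way to turn this idea into a bit count: for an ideal uniform quantizer each extra bit lowers the quantization noise by about 6.02 dB, so the number of bits needed in a band follows from how far the signal lies above the masking threshold (the signal-to-mask ratio, SMR, used later for Layer 1). The sketch below is a rule of thumb under that assumption, not the allocation routine of ASP_mp3.m; `bits_for_smr` is a hypothetical helper.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB of SNR per extra bit

def bits_for_smr(smr_db):
    """Smallest bit count whose ideal quantization SNR reaches the SMR."""
    if smr_db <= 0.0:
        return 0  # the signal itself lies below the masking threshold
    return math.ceil(smr_db / DB_PER_BIT)

for smr in (-5.0, 3.0, 12.0, 30.0):
    print(f"SMR {smr:5.1f} dB -> {bits_for_smr(smr)} bits")
```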

Processing in an mp3 player

• The frequency domain is the key to exploiting perceptual properties
• Implies a split of the signal into frequency bands and individual quantization per band
• Focus of this lecture is therefore on the filter bank, its properties and its design

Subband coder
Band-pass (BP) filtering, downsampling, quantization, upsampling, BP filtering, summation

ASP_mp3.m: Upwards chirp
Down/up-sampling (review of IN3190): throw out 50% of the samples; replace them by 0's
1. Channel 0: No h0, no g0
• Aliasing distortion due to downsampling, image distortion due to upsampling
2. Channel 0: Insert g0
• Images disappear
3. Channel 0: Use both h0 and g0
» Aliasing in 2nd part gone. No sampling distortion
[Figures: three spectrograms, frequency (0 to 4000) versus time (0 to 3.5), one per case; the first marks the image and the component generated by aliasing.]
2-filter bank and individual optimization of each filter
• Approach of first examples with chirp
• Three kinds of distortion:
– Amplitude distortion (in pass band)
– Phase distortion (in pass band)
– Aliasing distortion due to stop band leakage
• Here shown as error signal in transition between filters
• Requires ideal filters to avoid this distortion
[Figure: error signal, amplitude (-0.4 to 0.3) versus sample index (0 to 3.5 × 10^4).]
Perfect Reconstruction
• Up to now: optimization of each filter branch independently
• Instead: Accept some aliasing per sub-band, provided that aliasing from adjacent bands cancels in the final summation
• Illustrated by pairs of filters

Filterbank

Near-ideal low pass filter versus aliased low/high-pass pair (best!)
[Figure: magnitude responses (dB, 10 to -80) of H0(f) and H1(f) versus normalized frequency (×π rad/sample).]
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2 (better): Design filter pairs for distortion cancellation

Part II
• Quadrature mirror filter bank design
– Filter bank without quantization: SNR = 84.2 dB
• 32-channel filter bank – Violin example
– Uniform quantization, Uniform adaptive quantization, Perceptual quantization
(SNR = 10.3, 25, 38.6 dB)

Quadrature mirror expression of up-sampling of sub-band signal

Down-/up-sampled sub-band signal:
= x_i(n), for n even
= 0, for n odd
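
The two cases can be combined into a single expression, which makes the mirror term explicit. The symbol $v_i(n)$ for the down-/up-sampled signal is an assumed name, since the slide's own symbol did not survive extraction.

```latex
v_i(n) = \tfrac{1}{2}\left[1 + (-1)^n\right] x_i(n)
\quad\Longleftrightarrow\quad
V_i(z) = \tfrac{1}{2}\left[X_i(z) + X_i(-z)\right]
```

On the unit circle, $X_i(-z)$ is the spectrum of $x_i$ shifted by half the sampling rate. For a real signal its magnitude is the original spectrum mirrored about a quarter of the sampling rate, which is where the name quadrature mirror comes from.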

Aliasing in decimation-interpolation
• Aliasing interpreted in the framework of quadrature mirrors

[Figure: magnitude responses (dB, 10 to -80) of H0(f) and H1(f) versus normalized frequency (×π rad/sample).]
Output of filter bank

← from down-/up-sampling

Include input filters

• Perfect reconstruction: Out = In, x̃(n) = x(n), if

Aliasing:
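
The equations on this slide did not survive extraction. For reference, the standard z-domain analysis of a two-channel bank with input filters $H_0, H_1$ and output filters $G_0, G_1$ gives the relation below; the notation is assumed to match the slides.

```latex
\tilde{X}(z) =
\tfrac{1}{2}\left[H_0(z)G_0(z) + H_1(z)G_1(z)\right] X(z)
+ \tfrac{1}{2}\left[H_0(-z)G_0(z) + H_1(-z)G_1(z)\right] X(-z)
```

The first bracket is the gain seen by the signal and the second bracket is the aliasing term. Perfect reconstruction requires the first bracket to equal 2 (in practice a pure delay, $2z^{-d}$, is accepted) and the second bracket to be zero.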

Aliasing cancellation
• H = input filters (anti-aliasing)
• G = output filters (anti-imaging) coupled to H:
Condition 2 → cancel aliasing:
g0(n) = h1(n)(-1)^n
g1(n) = -h0(n)(-1)^n
Condition 1 → unity gain
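
The coupling of G to H can be tried out on the shortest possible filter pair. The sketch below uses the 2-tap Haar low-pass/high-pass pair as an assumed example (the filters in ASP_mp3.m are not given in the slides) and builds g0 and g1 from h1 and h0 exactly as stated above. With this choice the output equals the input delayed by one sample, even though each branch on its own is heavily aliased.

```python
import math

def fir(x, h):
    """Causal FIR filter; the output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def down_up(x):
    """Downsample by 2, then upsample by 2: keep even samples, zero the odd ones."""
    return [v if n % 2 == 0 else 0.0 for n, v in enumerate(x)]

def two_channel_bank(x, h0, h1):
    """Analysis with h0/h1, down-/up-sampling, synthesis with the coupled g0/g1."""
    g0 = [((-1) ** n) * c for n, c in enumerate(h1)]   # g0(n) = h1(n)(-1)^n
    g1 = [-((-1) ** n) * c for n, c in enumerate(h0)]  # g1(n) = -h0(n)(-1)^n
    y0 = fir(down_up(fir(x, h0)), g0)
    y1 = fir(down_up(fir(x, h1)), g1)
    return [a + b for a, b in zip(y0, y1)]

# Haar pair (assumed example filters, not taken from the lecture code).
h0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
h1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

x = [0.3, -1.0, 0.5, 2.0, -0.7, 0.1, 0.9, -0.4]
y = two_channel_bank(x, h0, h1)
print(y)  # compare with x shifted by one sample
```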

Quadrature Mirror Filters
- Unity gain constraint
- Couple h0 with h1

All four filters are now coupled

ASP_mp3.m
• Sections 1-2
• May use rather poor LP, HP filters
• Despite aliasing in each filter, almost perfect
reconstruction due to near perfect
cancellation of aliasing errors

Frequency-selective quadrature mirror filters?

- Oh no!
- Must have frequency-selective filters, otherwise the whole point of perceptual coding is lost
- Impossible to satisfy (3.13) if also frequency-selective!
- In practice allow some amplitude distortion
Pseudo Quadrature Mirror Filters

M=2:
f0=1/8, f1=3/8
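
The two center frequencies for M=2 fit the usual layout of a uniform cosine-modulated bank, where band k is centered at (2k+1)/(4M) of the sampling rate. That formula is an assumption here (the slide's equations are missing), and `pqmf_center_frequencies` is an illustrative name; the sketch reproduces the M=2 values and extends them to the 32 bands used later.

```python
def pqmf_center_frequencies(num_bands):
    """Band centers in cycles/sample, assuming f_k = (2k + 1) / (4M)."""
    return [(2 * k + 1) / (4 * num_bands) for k in range(num_bands)]

print(pqmf_center_frequencies(2))  # [0.125, 0.375], i.e. 1/8 and 3/8

centers_32 = pqmf_center_frequencies(32)
print(centers_32[0], centers_32[-1])  # first and last of the 32 bands
```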

ASP_mp3.m
• Section 3:
• Build 32 PQMF
• Listen to band 3 – lots of aliasing error
• Listen to reconstruction (no quantization)
– snr_PQMF = 84.2 dB
– For all practical purposes aliasing error has been
cancelled

Violin spectrogram
What is this spectral line?
[Figure: spectrogram of the violin signal, frequency (0 to about 2 × 10^4) versus time (0 to 1.8).]
Morse Code hidden in Mike Oldfield's Tubular Bells.
ASP_mp3.m
• Section 5: back to Pseudo QMF
• 4-bit (+/-8 levels) uniform quantizer per band
– snr_4bits = 10.3 dB – very poor!
– strong high frequency tonal noise at each sub-band transition i.e. every 1378 Hz (aliasing)
– Effectively only +/- 2 levels in e.g. sub-band 2 since amplitude < +/-0.25 and quantizer spans +/- 1
• Adaptive quantizer per band
– snr_4bits_scaled = 25.0 dB
[Figures: sub-band #2, original and quantized, amplitude (-0.4 to 0.4) versus time (samples at Fs/32, 0 to 120), for the fixed and for the adaptive quantizer.]
Adaptive quantization in layer 1
• The max value of the quantizer is adapted to the actual max level that exists → automatic gain control
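
The gain from adapting the quantizer range can be reproduced on a toy signal. The sketch below is not the code of ASP_mp3.m: it assumes a sinusoid of amplitude 0.25 as a stand-in for a weak sub-band signal and a symmetric quantizer with index range ±8, as on the earlier slide (one level more than a strict 4-bit code). With a fixed range of ±1 only indices ±2 are used; scaling the range to the actual maximum uses all levels.

```python
import math

def quantize(x, x_max, bits=4):
    """Uniform quantizer spanning +/- x_max with index range +/- 2**(bits-1)."""
    levels = 2 ** (bits - 1)
    step = x_max / levels
    out = []
    for v in x:
        idx = max(-levels, min(levels, round(v / step)))  # clip to the range
        out.append(idx * step)
    return out

def snr_db(x, y):
    """Signal-to-noise ratio in dB between a signal x and its approximation y."""
    signal = sum(v * v for v in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10.0 * math.log10(signal / noise)

# Toy sub-band signal with amplitude 0.25 (assumed, for illustration only).
x = [0.25 * math.sin(2 * math.pi * 0.0371 * n) for n in range(1000)]

fixed = quantize(x, x_max=1.0)                       # quantizer spans +/- 1
adaptive = quantize(x, x_max=max(abs(v) for v in x)) # range follows the signal

snr_fixed = snr_db(x, fixed)
snr_adaptive = snr_db(x, adaptive)
print(f"fixed range:    {snr_fixed:.1f} dB")
print(f"adaptive range: {snr_adaptive:.1f} dB")
```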

ASP_mp3.m
• Now with perceptual bit allocation as well
• Max of global threshold and the absolute
auditory threshold is the final threshold.
• Signal-to-mask thresholds per band
• snr_scaled_perceptual = 38.6 dB
• bit_rate = 192.25 kbps

Perceptual allocation of bits

[Figure: signal PSD, minimum threshold per sub-band, and absolute threshold; magnitude (dB, -20 to 100) versus frequency (0 to 2 × 10^4 Hz).]
Spectrum of signal and error
• Left: 4-bit adaptive Q per band (~192 kbps), SNR = 25.0 dB
• Right: 192 kbps perceptual coding, SNR = 38.6 dB
[Figures: signal PSD and error PSD, magnitude (dB, 0 to -120) versus frequency (Hz, × 10^4). Annotations: "Important" and "Not so important".]
Masking is level-dependent
Left: Absolute auditory threshold

Right: Masking due to a tone at 1kHz for various intensities


Ex: A pure tone at 1 kHz @ 80 dB SPL makes
another tone at 2 kHz @ 40 dB SPL inaudible
Critical bands: not uniformly spaced

• Bark scale

Deviation from auditory process
• Edges of critical bands: not uniformly spaced, follow Bark scale
– But most sub-band coders have equal bandwidth channels
– Former NTNU colleagues: Ramstad, Tor A., and Joar P. Tanem. "Cosine-modulated analysis-synthesis filterbank with critical sampling and perfect reconstruction." IEEE Int. Conf. Acoust., Speech, Signal Process., 1991.
• The authors present a simple derivation of a parallel filterbank based on cosine-modulated versions of a model low-pass filter. With a nonuniform channel separation an efficient implementation consisting of a DFT (discrete Fourier transform) related transform and subfilters is possible. Using critical sampling of each channel and FIR (finite impulse response) filters, the conditions for perfect reconstruction are given. The computational complexity of the derived FIR filterbank is much lower than for a tree-structured FIR filterbank but cannot compete with the most efficient IIR filterbanks.
• Masking depends on absolute loudness, but level is unknown
– Assumes that the least significant bit of the 16-bit signal can just be heard

MPEG-1
• CD: 2 channels x 16 bit/sample x 44.1 ksamples/s = 1.4 Mbps
• Layer 1: 1:4 compression (down to 384 kbps)
• Layer 2 (mp2): 1:6-8 (192-256 kbps)
– Original DAB coder.
– DAB uses down to 128 kbps (Norway), even lower in the UK
• Layer 3 (mp3): 1:10-12 (112-128 kbps, rather poor quality mp3!)
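
The compression ratios follow directly from the CD bit rate. The short sketch below recomputes them for the bit rates listed above; the helper name `pcm_bit_rate` is illustrative, and the printed ratios only approximate the round figures on the slide.

```python
def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Bit rate of uncompressed PCM audio in bit/s."""
    return channels * bits_per_sample * sample_rate_hz

cd_rate = pcm_bit_rate(2, 16, 44_100)  # 1_411_200 bit/s, about 1.4 Mbps

coded_rates_kbps = (
    ("Layer 1", 384),
    ("Layer 2", 256),
    ("Layer 2", 192),
    ("Layer 3", 128),
    ("Layer 3", 112),
)
for name, kbps in coded_rates_kbps:
    print(f"{name} at {kbps} kbps: about 1:{cd_rate / (kbps * 1000):.1f}")
```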
MPEG-1 Layer 1
• 32-channel filter bank
• Frame size 384 samples, decimated to 384/32 = 12 samples per sub-band
• 512-point FFT for local power estimate as input to psychoacoustic model
• SMR = signal-to-mask ratio determined
• Uniform quantization of sub-bands so that SNR > SMR, i.e. the quantization noise stays below the masking threshold
• 0 bits for bands 27-32
• Adaptive quantizer, normalized over 12 samples
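
The frame arithmetic can be written out as a quick check. The sketch below assumes the CD sampling rate of 44.1 kHz and counts only the bits spent on sub-band samples, ignoring scale factors and header; `frame_payload_bits` is a hypothetical helper, not part of any MPEG reference code.

```python
FRAME_SAMPLES = 384
NUM_BANDS = 32
SAMPLE_RATE_HZ = 44_100  # assumed CD sampling rate

samples_per_band = FRAME_SAMPLES // NUM_BANDS        # 12 samples per sub-band
frame_ms = 1000.0 * FRAME_SAMPLES / SAMPLE_RATE_HZ   # frame duration in ms

def frame_payload_bits(bits_per_band):
    """Bits spent on sub-band samples in one frame for a given allocation."""
    return samples_per_band * sum(bits_per_band)

# Example allocation: 4 bits in every band, as in the fixed 4-bit experiment.
payload = frame_payload_bits([4] * NUM_BANDS)
rate_kbps = payload * SAMPLE_RATE_HZ / FRAME_SAMPLES / 1000.0
print(samples_per_band, round(frame_ms, 2), payload, round(rate_kbps, 1))
```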
MPEG-1 Layer-II (mp2, DAB)
• Longer FFT for spectral estimation (1024 samples)
• Scale factors from groups of three blocks
• Temporal masking in addition to spectral masking

MPEG-1 Layer 3
• Each of the 32 original sub-bands is split into 18 channels by the Modified Discrete Cosine Transform (MDCT) when needed, for better frequency resolution
• Dynamic coding scheme: more bits to frames that require it

MPEG-2 Advanced Audio Coding (AAC)
• More sampling rates (down to 8 kHz)
• 5.1 surround
• AAC (1997) improves on mp3
• Single filter bank with adaptive block size
– 1024 samples for stationary sounds
– 128 samples for transient sounds
• Apple's iTunes
• Wimp (Tidal)
MPEG-4 AAC
• MPEG-4 HE AAC (High efficiency)
– Spectral band replication (SBR), exploits correlation between LF (<4-8 kHz) and HF to reproduce HF
– Claims stereo transparency at 48 kbps (?)
– My measurements: very poor stereo at 48 kbps
• MPEG-4 HE AAC v2 (AAC+), 2004
– Adds parametric stereo (PS), a pseudo stereo
– Claims stereo transparency at 24 kbps (?)
– DAB+ in Norway
Overview
Part I
• Perceptual coding requires separation in frequency bands
• 2-channel filter bank
– Approach 1: Design individual filters for minimal distortion: aliasing, imaging
– Approach 2: Design filter pairs for distortion cancellation (much better)

Part II
• Quadrature mirror filter bank design
– Filter bank without quantization, SNR = 84.2 dB
• 32-channel filter bank – violin
– Uniform quantization, SNR = 10.3 dB
– Uniform adaptive quantization, SNR = 25 dB
– Perceptual quantization, SNR = 38.6 dB

Audio coding in practice
• Raw uncoded 44.1 kHz/16 bit: 1411 kbps
• Mobile communication (mono, speech): 12.2 kbps
• Radio - DAB+, HE-AAC: 48-96 kbps
1. Mono, low bandwidth: Weather, News
2. Most stations: average
3. Niche channels: better than FM at its best: Classical, often P2
• iTunes before 2007, AAC: 128 kbps
• iTunes+ today, AAC: 256 kbps
• Spotify (Vorbis), Tidal (AAC): 320 kbps
• Tidal Hi-Fi, lossless FLAC: ~1000 kbps
– Free Lossless Audio Codec, 50-80% of CD
• 96-192 kHz, 24 bit stereo: 4600-9200 kbps
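
The lossless and high-resolution figures can be checked with the same arithmetic as the CD rate (channels × bits per sample × sampling rate); the rounding to 4600-9200 and ~1000 kbps is the slide's.

```latex
\begin{aligned}
2 \times 24\ \text{bit} \times 96\ \text{kHz} &= 4608\ \text{kbps} \\
2 \times 24\ \text{bit} \times 192\ \text{kHz} &= 9216\ \text{kbps} \\
(0.5\ \text{to}\ 0.8) \times 1411\ \text{kbps} &\approx 706\ \text{to}\ 1129\ \text{kbps (FLAC)}
\end{aligned}
```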

• Video lecture: The Good, the Bad, the Ugly. Fra vinyl til høyoppløst digitallyd (From vinyl to high-resolution digital audio), 2020
