MPEG Standards For Audio

The document discusses MPEG audio standards and psychoacoustics. It describes how MPEG audio compression works by exploiting properties of human hearing such as frequency masking and temporal masking. The MPEG audio algorithm filters the audio signal into critical bands, computes masking thresholds from a psychoacoustic model, and allocates bits only where those thresholds require them. The MPEG audio standards include Layers I, II, and III; Layer III provides the highest compression ratio and the best quality at lower bit rates by using both sub-band coding and a modified discrete cosine transform.

Uploaded by

Rajnish Kumar

ECN-516

Topic: MPEG Standards for Audio

Psychoacoustics
These methods are related to how humans actually hear sounds:
Human hearing and voice
Frequency range is about 20 Hz to 20 kHz; hearing is most sensitive at 1 to 5 kHz.
Dynamic range (quietest to loudest) is about 96 dB
Normal voice range is about 500 Hz to 2 kHz
Low frequencies are vowels and bass
High frequencies are consonants

Human Hearing and Voice

Experiment:
Put a person in a quiet room. Raise the level of a 1 kHz tone until it is just
barely audible. Vary the frequency and plot the threshold level.

[Figure: threshold of hearing; level in dB (0 to 40) versus frequency in kHz]

Psychoacoustics
How sensitive is human hearing?
To answer this question we look at the following concepts:
Threshold of hearing
Describes the notion of quietness

Frequency Masking
A component (at a particular frequency) masks components at neighboring
frequencies. Such masking may be partial.

Temporal Masking
When two tones (samples) are played close together in time, one can mask the
other.

Threshold of hearing
The ear is most sensitive to frequencies between 1 and 5
kHz, where we can actually hear signals below 0 dB.

Two tones of equal power and different frequencies will
not be equally loud.
Sensitivity decreases at low and high frequencies.
[Figure: threshold of hearing; level in dB (0 to 40) versus frequency in kHz. Annotation: "Will not be heard anyway; discard!"]
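
The shape of the threshold curve can be sketched with a closed-form fit that is common in the audio-coding literature and usually attributed to Terhardt. This fit is an assumption brought in for illustration; it does not come from these slides, and the function name is hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt-style fit, an assumption)."""
    f = freq_hz / 1000.0  # work in kHz
    return (3.64 * f ** -0.8                          # rise toward low frequencies
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)   # dip of best sensitivity near 3.3 kHz
            + 1e-3 * f ** 4)                          # rise toward high frequencies

for freq in (100, 1000, 3300, 16000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB")
```

Under this fit the threshold drops below 0 dB near 3 kHz and climbs steeply toward both ends of the audible range, which is consistent with the points above.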

Frequency Masking
Question: Do receptors interfere with each other?
Experiment: Play a 1 kHz tone (the masking tone) at a fixed level (60 dB).
Play a test tone at a different frequency (e.g., 1.1 kHz), and raise its level
until it is just distinguishable.
Vary the frequency of the test tone and plot the threshold at which it
becomes audible:

Repeat for various frequencies of masking tones

Frequency Masking
A tone at a certain frequency will raise the threshold in a
critical band around that frequency.
The masker raises the threshold of audibility so that the
adjacent tone above it is no longer audible.

Critical bands
Perceptually uniform measure of frequency, not proportional to the width of the masking curve
About 100 Hz wide for masking frequencies below 500 Hz, growing
larger and larger above 500 Hz.
This width is called the size of the critical band

Critical bands
The human auditory system has a limited,
frequency-dependent resolution.
This frequency dependence is expressed in the form of
critical bandwidths, less than 100 Hz for low and more
than 4 kHz for high frequencies.
The human ear blurs the various signal components inside
a critical band.
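
A rough way to put numbers on these widths is the analytic approximation usually credited to Zwicker and Terhardt, together with the matching Hz-to-Bark conversion. Both formulas are assumptions brought in for illustration, not part of these slides, and the function names are hypothetical.

```python
import math

def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth in Hz around freq_hz (Zwicker-Terhardt style fit)."""
    f = freq_hz / 1000.0  # work in kHz
    return 25.0 + 75.0 * (1.0 + 1.4 * f * f) ** 0.69

def hz_to_bark(freq_hz):
    """Approximate position on the critical-band (Bark) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

for freq in (100, 500, 1000, 5000, 15000):
    print(f"{freq:>6} Hz: width {critical_bandwidth_hz(freq):7.0f} Hz, "
          f"{hz_to_bark(freq):5.1f} Bark")
```

With this fit the width stays near 100 Hz at low frequencies and reaches several kHz at the top of the audible range.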

Temporal Masking
If we hear a loud sound, and then it stops, it takes a little while until we
can hear a soft tone nearby (in frequency).
Question: how to quantify?
Experiment:
Play a 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. The test tone
can't be heard (it's masked).
Stop the masking tone, then stop the test tone after a short delay.

Adjust the delay to the shortest time at which the test tone can be heard (e.g., 5 ms).
Repeat with different levels of the test tone and plot the result.
Try other frequencies for the test tone (masking tone duration constant) to see the total effect of
masking.

The temporal masking effect is the masking that occurs when a sound
raises the audibility threshold for a brief interval preceding and
following the sound.

[Figure: temporal masking; energy versus time around a strong sound (the masker). Backward (pre) masking lasts less than 10 ms before the masker; forward (post) masking lasts approximately 100 ms after it.]

Observation
If we have a loud tone at, say, 1 kHz, then nearby quieter
tones are masked.
Best compared on critical band scale - range of masking is
about 1 critical band
Two factors for masking - frequency masking and temporal
masking
Question: How to use this for compression?

MPEG
Moving Picture Experts Group (MPEG)
Established in 1988
Standards under the International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC)
Official name is: ISO/IEC JTC1 SC29 WG11

MPEG
First High Fidelity Audio standard
Part of a multiple standard for
Video compression
Audio compression
Audio, Video and Data synchronization at an aggregate rate of 1.5 Mbit/sec

MPEG Audio
Physically lossy compression algorithm
Perceptually lossless, transparent algorithm
Exploits perceptual properties of human ear

Psychoacoustic modeling

MPEG Audio Standard

Ensures inter-operability
Defines coded bitstream syntax
Defines decoding process

Guarantees decoder accuracy

MPEG audio features

No assumptions about the nature of the audio source
Exploitation of human auditory system perceptual
limitations
Removal of perceptually irrelevant parts of audio signal

MPEG audio sampling rates

32 kHz

44.1 kHz
48 kHz

MPEG Audio Overview

Facts
The two most common advanced (beyond simple ADPCM) techniques
for audio coding are:
Sub-Band Coding (SBC) based
Adaptive Transform Coding based

MPEG audio coding is comprised of three independent layers. Each
layer is a self-contained SBC coder with its own time-frequency
mapping, psychoacoustic model, and quantizer.
Layer I: Uses sub-band coding
Layer II: Uses sub-band coding (longer frames, more compression)
Layer III: Uses both sub-band coding and transform coding.

MPEG-1 Audio is intended to take a PCM audio signal sampled at a
rate of 32, 44.1 or 48 kHz, and encode it at a bit rate of 32 to 192 kbps
per audio channel (depending on layer).

MPEG Audio Compression

MPEG Coding Algorithm

[Block diagram: Input -> Filter into critical bands (sub-band filtering) -> Allocate bits (quantization) -> Format bitstream -> Output. A parallel branch, Compute masking (psychoacoustic model), takes the input and drives the bit allocation.]

Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32
frequency sub-bands (sub-band filtering).

Determine the amount of masking for each band caused by nearby bands using
the psychoacoustic model.

If the power in a band is below the masking threshold, don't encode it.

Otherwise, determine the number of bits needed to represent the coefficient such
that the noise introduced by quantization is below the masking effect (recall
that one fewer bit of quantization introduces about 6 dB of noise).

Format the bitstream.
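
The "about 6 dB per bit" rule used in the steps above can be checked numerically. The sketch below is only an illustration under stated assumptions: it quantizes a full-scale sine with a plain uniform quantizer (not the MPEG quantizer) and measures the resulting signal-to-noise ratio. The function name and the test frequency are hypothetical choices.

```python
import math

def quantization_snr_db(bits, num_samples=4096):
    """Measure the SNR in dB of a full-scale sine after uniform quantization to `bits` bits."""
    step = 2.0 / (2 ** bits)  # quantizer step for a signal in [-1, 1]
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        # Frequency chosen so the samples do not repeat periodically.
        x = math.sin(2.0 * math.pi * (math.sqrt(2.0) / 10.0) * n)
        q = round(x / step) * step  # nearest quantization level
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10.0 * math.log10(signal_power / noise_power)

for bits in (6, 7, 8):
    print(f"{bits} bits: {quantization_snr_db(bits):5.1f} dB")
```

Each bit removed from the quantizer raises the noise floor by roughly 6 dB, which is why the bit count can be traded directly against the masking threshold.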

MPEG Coding Specifics

[Figure: the audio samples feed sub-band filters 0 to 31. Each filter output is grouped into blocks of 12 samples; a Layer I frame spans one block of 12 samples per sub-band, and a Layer II or III frame spans three blocks of 12.]

The Polyphase Filter Bank

Key component common to all layers

Divides the audio signal into 32 equal-width frequency
subbands
The filters provide good time and reasonable frequency
resolution
Critical bands associated with psychoacoustic models
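
The arithmetic behind the equal-width sub-bands and the 12-sample blocks can be sketched as follows. The helper names are hypothetical. The only inputs are the 32 sub-bands, the blocks of 12 samples, and the one-block (Layer I) versus three-block (Layer II, III) frames described in this deck.

```python
NUM_SUBBANDS = 32        # sub-bands produced by the polyphase filter bank
SAMPLES_PER_BLOCK = 12   # samples per sub-band in one block

def subband_width_hz(sample_rate_hz):
    """Width of each equal-width sub-band: the 0..fs/2 range split 32 ways."""
    return (sample_rate_hz / 2.0) / NUM_SUBBANDS

def frame_samples(blocks_per_subband):
    """PCM samples covered by one frame (1 block for Layer I, 3 for Layer II/III)."""
    return NUM_SUBBANDS * SAMPLES_PER_BLOCK * blocks_per_subband

def frame_duration_ms(blocks_per_subband, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples(blocks_per_subband) / sample_rate_hz

print(subband_width_hz(48000))      # sub-band width at 48 kHz
print(frame_samples(1), frame_duration_ms(1, 48000))
print(frame_samples(3), frame_duration_ms(3, 48000))
```

At 48 kHz each sub-band is 750 Hz wide, far wider than the roughly 100 Hz critical bands at low frequencies, so the lowest sub-band spans several critical bands.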

MPEG Audio Psycho-acoustic Model

MPEG audio compresses by removing acoustically
irrelevant parts of audio signals
Takes advantage of the human auditory system's inability to
hear quantization noise under auditory masking

Analyzes the audio signal and computes the amount of
noise masking as a function of frequency
The encoder decides how best to represent the input signal
with a minimum number of bits

Basic Steps in Psychoacoustic Model

Time align audio data
Convert audio to frequency domain representation

Process spectral values into tonal and non-tonal components

Apply a spreading function
Set a lower bound for threshold values
Find the threshold values for each subband
Calculate the signal to mask ratio
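
The spreading step can be illustrated with the Schroeder spreading function, a textbook model of how masking falls off with distance on the critical-band (Bark) scale. Using it here is an assumption for illustration only: the MPEG psychoacoustic models define their own spreading functions, and the function name below is hypothetical.

```python
import math

def schroeder_spreading_db(delta_bark):
    """Relative masking level in dB at delta_bark critical bands away from the masker.

    Positive delta_bark means the masked component is above the masker in frequency.
    """
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

for dz in (-3, -1, 0, 1, 3):
    print(f"{dz:+d} Bark: {schroeder_spreading_db(dz):6.1f} dB")
```

The function peaks near 0 dB at the masker itself and falls off more slowly toward higher frequencies than toward lower ones, so a masker affects the bands above it more than the bands below it.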

Masking and Quantization (Example)

Say, performing the sub-band filtering step on the input results in the
following values (for demonstration, we are only looking at the first
16 of the 32 bands):
[Table: Band versus Level (dB)]

The 60 dB level of the 8th band gives a masking of 12 dB in the 7th
band and 15 dB in the 9th (according to the psychoacoustic model).
The level in the 7th band is 10 dB (< 12 dB), so ignore it.
The level in the 9th band is 35 dB (> 15 dB), so send it.

We only send the amount above the masking level.
Therefore, instead of using 6 bits to encode it, we can use 4 bits, a
saving of 2 bits (= 12 dB).

Determine the number of bits needed to represent the coefficient such that the
noise introduced by quantization is below the masking effect [noise
introduced = 12 dB; masking = 15 dB].
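
The arithmetic of this example can be written out as a small sketch. It is one reading of the slide's reasoning, assuming a flat 6 dB per bit; it is not the bit-allocation procedure of the standard. The function name is hypothetical.

```python
import math

DB_PER_BIT = 6.0  # approximate noise change per quantization bit (assumed flat)

def bits_to_send(level_db, mask_db):
    """Bits kept for a band given its level and the masking level imposed on it."""
    if level_db <= mask_db:
        return 0  # fully masked: do not encode the band
    full_bits = math.ceil(level_db / DB_PER_BIT)   # bits to cover the whole level
    saved_bits = math.floor(mask_db / DB_PER_BIT)  # bits whose noise stays under the mask
    return full_bits - saved_bits

print(bits_to_send(10, 12))  # 7th band: level 10 dB under a 12 dB mask
print(bits_to_send(35, 15))  # 9th band: level 35 dB over a 15 dB mask
```

For the 9th band this gives 6 - 2 = 4 bits, matching the example: dropping 2 bits adds about 12 dB of noise, which stays under the 15 dB masking level.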

MPEG Audio Layer I

Simplest coding
Suitable for bit rates above 128 kbits/sec per channel
Philips Digital Compact Cassette

MPEG Audio Layer II

Intermediate complexity

Bit rates around 128 kbits/sec per channel

Digital Audio Broadcasting (DAB)
Synchronized Video and Audio on CD-ROM
Full motion CD-I
Video-CD

MPEG Audio Layer III

Most complex coding

Best audio quality

Bit rates around 64 kbits/sec per channel
Suitable for audio over ISDN

MPEG Layer III coding

Based on Layer I & II filter banks
Compensation of filter deficiencies by processing outputs
with a Modified Discrete Cosine Transform

MPEG Layer III enhancements

Alias reduction

Non uniform quantization

Scalefactor bands
Entropy coding of data values
Use of a bit reservoir

Effectiveness of MPEG Audio

*Quality factor:
5 - perfect
4 - just noticeable
3 - slightly annoying
2 - annoying
1 - very annoying

Layer      Target bit rate per channel   Ratio   Quality* at 64 kbps   Quality at 128 kbps
Layer I    192 kbps                      4:1
Layer II   128 kbps                      6:1     2.1 to 2.6
Layer III  64 kbps                       12:1    3.6 to 3.8

16-bit samples at 48 kHz => 768 kbits/sec per channel

Layer I: 192 kbits/sec => Compression ratio of (768/192) = 4:1

Layer II: 128 kbits/sec => Compression ratio of (768/128) = 6:1

Layer III: 64 kbits/sec => Compression ratio of (768/64) = 12:1
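
The ratios follow from simple arithmetic, sketched below with hypothetical helper names. The uncompressed rate is computed per channel so that it lines up with the per-channel target bit rates in the table.

```python
def pcm_bit_rate_kbps(bits_per_sample, sample_rate_hz):
    """Uncompressed PCM bit rate for one channel, in kbits/sec."""
    return bits_per_sample * sample_rate_hz / 1000.0

def compression_ratio(pcm_kbps, coded_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps

pcm = pcm_bit_rate_kbps(16, 48000)
for layer, target_kbps in (("Layer I", 192), ("Layer II", 128), ("Layer III", 64)):
    print(f"{layer}: {compression_ratio(pcm, target_kbps):.0f}:1")
```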

MPEG 1
First standard to be published by the MPEG organization
(in 1992)
A standard for storage and retrieval of moving pictures and
audio on storage media
Example formats: VideoCD (VCD), mp3, mp2

MPEG-1 Layers I, II, III

MPEG layer differences lie in processing power and
resulting audio/sound quality
MP1: little processing needed, poor quality

MP2: minimal processing, okay quality

MP3: massive processing, high (CD-like) quality

MPEG-1 Audio Layer II

Called MP2
Dominant standard for audio broadcasting
DAB digital radio and DVB digital television

Sampling rates: 32, 44.1, 48 kHz

Bit rates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384 kbps
Format: mono, stereo, dual channel, joint stereo
MP2 is a sub-band audio encoder that operates in the time domain

MPEG-1 Audio Layer III

MPEG-1 Layer III is called MP3 format
Popular for PC and Internet applications
The goal is to compress to 128 kbps, but higher or lower bit rates can be used,
with correspondingly higher or lower quality

Utilization of psychoacoustics
Scientific study of sound perception

MPEG-1 Audio Encoding

Characteristics
Precision 16 bits
Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
Layer 3: 32-320 kbps, target 64 kbps
Layer 2: 32-384 kbps, target 128 kbps
Layer 1: 32-448 kbps, target 192 kbps

MPEG-2
Extends video & audio compression of MPEG-1
Substantially reduces bandwidth required for high-quality transmissions

Optimizes balance between resolution (quality) and bandwidth (speed)

HDTV(Grand Alliance)
ITU-R HDTV
International Telecommunication Union Radiocommunication Sector
16:9 aspect ratio

Audio: Dolby AC-3

DVB HDTV
Digital video broadcasting
4:3 aspect ratio

MPEG audio layer 2

MPEG-2 Advanced Audio Coding (AAC) codec

Sampling frequencies from 8 kHz to 96 kHz

1 to 48 channels per stream

Temporal Noise Shaping (TNS) smooths quantization
noise by making frequency domain predictions
Prediction: Allows predictable sound patterns such as
speech to be predicted and compressed with better quality

MPEG-4
Submergence
Handle specific requirements from rapidly developing multimedia applications

Advantages over MPEG-1 and MPEG-2

Object-oriented coding

Applications:
Digital TV
TV logos, Customized advertising, Multi-window screen

Mobile multimedia
Cell phones and palm computers

Games
Personalize games

Streaming Video
News updates and live music shows over Internet

MPEG 7
Content representation standard for information search
Makes searching the Web for multimedia content as easy
as searching for text-only files
Operates in both real-time and non real-time environments

MPEG 21

Multimedia framework

Based on two essential concepts:

Digital Item

Concept of Users interacting with Digital Item

More universal framework for digital content protection
