Low Bit Rate Coding
CHALLENGES AND FUTURE DIRECTIONS
Karlheinz Brandenburg
Ilmenau Technical University &
Fraunhofer IIS Arbeitsgruppe Elektronische Medientechnologie
Ilmenau, Germany
ABSTRACT
Perceptual encoding of high quality audio has found its way to
many applications including digital radio, Electronic Music
Distribution (EMD) systems and portable audio devices. An
overview of the basics of high quality low bit-rate audio coding
is followed by a look at widely used and newer,
state-of-the-art coding systems such as MP3 and MPEG-2
Advanced Audio Coding (AAC). The rapid deployment of older
(1992) technologies such as MP3, followed by news of new
and improved algorithms such as AAC, raises the question of
what further improvements are possible. The paper analyses some
candidates for such improvements and gives a view of some current
research activities.
INTRODUCTION
High quality audio compression has found its way from research
to widespread applications within a couple of years. Early
research of 15 years ago was translated into standardization
efforts of ISO/IEC and ITU-R 10 years ago. Since the
finalization of MPEG-1 in 1992, many applications have been
devised. In the last couple of years, Internet audio delivery has
emerged as a powerful category of applications. These
techniques made headline news in many parts of the world
because of their potential to change the way the music industry
does business.
Currently, the following applications, among others, employ
low bit-rate audio coding techniques:
-
Figure 1: Block diagram of a perceptual encoding/decoding
system. Blocks: Analysis Filterbank, Quantization and Coding,
Encoding of Bitstream, Perceptual Model. Signal labels:
Quantized Samples, Encoded Bitstream.
It consists of the following building blocks:
-
THE BASICS OF HIGH QUALITY AUDIO CODING
All current high quality low bit-rate audio coding systems follow
the basic paradigm described above. They differ in the types of
filterbanks used, in the quantization and coding techniques and
in the use of additional features.
STANDARDIZED CODECS
MPEG-1
MPEG-1 is the name for the first phase of MPEG work, started
in 1988 and finalized with the adoption of ISO/IEC IS 11172 in
late 1992. The audio coding part of MPEG-1 (ISO/IEC IS
11172-3, see [1]) describes a generic coding system, designed to
fit the demands of many applications. MPEG-1 audio consists of
three operating modes called layers, with increasing complexity
and performance from Layer-1 to Layer-3. Layer-3 (in recent
years nicknamed MP3 because of the use of .mp3 as a file
extension for music files in Layer-3 format) is the highest
complexity mode, optimised to provide the highest quality at low
bit-rates (around 128 kbit/s for a stereo signal).
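To put the 128 kbit/s figure into perspective, the short calculation below compares it with uncompressed PCM. It is only an illustrative sketch: the source format (44.1 kHz, 16 bit, two channels, as on a Compact Disc) is an assumption not stated in the text, the function names are hypothetical, and the frame length of 1152 samples per channel is the Layer-3 frame size at this sampling rate.

```python
# Assumed source format (not given in the text): CD-style PCM.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

CODED_BITRATE = 128_000    # bit/s, the Layer-3 operating point quoted above
SAMPLES_PER_FRAME = 1152   # samples per channel in one Layer-3 frame


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps, coded_bps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bps / coded_bps


def mean_bits_per_frame(coded_bps, sample_rate_hz, samples_per_frame):
    """Average number of bits available to code one frame."""
    return coded_bps * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    pcm = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
    print("PCM bit rate [bit/s]:", pcm)
    print("compression ratio:", compression_ratio(pcm, CODED_BITRATE))
    print("mean bits per frame:",
          mean_bits_per_frame(CODED_BITRATE, SAMPLE_RATE_HZ, SAMPLES_PER_FRAME))
```

Under these assumptions the coder has to remove roughly ten out of every eleven bits of the PCM representation, which is why the perceptual model described next matters so much.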
The following paragraphs describe the Layer-3 encoding
algorithm, following the basic blocks of a perceptual encoder. More
details about Layer-3 can be found in [1] and [2]. Fig. 2 shows
the block diagram of a typical MPEG-1/2 Layer-3 encoder.
Perceptual Model
The perceptual model largely determines the quality of a
given encoder implementation. A lot of additional work has gone
into this part of an encoder since the original informative part in
[1] was written. The perceptual model either uses a separate
filterbank as described in [1] or combines the calculation of
energy values (for the masking calculations) and the main
filterbank. The output of the perceptual model consists of values
for the masking threshold or allowed noise for each coder
partition. In Layer-3, these coder partitions are roughly
equivalent to the critical bands of human hearing. If the
quantization noise can be kept below the masking threshold for
each coder partition, then the compression result should be
indistinguishable from the original signal.
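The transparency criterion can be expressed as a per-partition noise-to-mask check. The sketch below is a simplified illustration, not the model from [1]: it assumes the allowed noise per partition is already given, that partitions are contiguous index ranges of spectral values, and all function names are hypothetical.

```python
import numpy as np


def partition_energies(values, band_edges):
    """Sum the energy of `values` in each partition [edges[i], edges[i+1])."""
    return np.array([np.sum(values[lo:hi] ** 2)
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])


def noise_to_mask_db(original, decoded, band_edges, allowed_noise):
    """Noise-to-mask ratio in dB per partition (negative means masked)."""
    noise = partition_energies(original - decoded, band_edges)
    eps = 1e-12  # avoids log(0) for partitions without any noise
    return 10.0 * np.log10((noise + eps) / (np.asarray(allowed_noise) + eps))


def is_transparent(original, decoded, band_edges, allowed_noise):
    """True if the quantization noise stays below the threshold everywhere."""
    nmr = noise_to_mask_db(original, decoded, band_edges, allowed_noise)
    return bool(np.all(nmr <= 0.0))


if __name__ == "__main__":
    spectrum = np.array([1.0, 2.0, 3.0, 4.0])
    coded = spectrum + np.array([0.1, -0.1, 0.2, 0.2])
    edges = [0, 2, 4]                      # two partitions of two lines each
    print(noise_to_mask_db(spectrum, coded, edges, [0.05, 0.05]))
    print(is_transparent(spectrum, coded, edges, [0.05, 0.05]))
```

A real encoder derives `allowed_noise` from the signal itself; here it is simply an input so that the decision rule is visible in isolation.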
Quantization and Coding
A system of two nested iteration loops is the common solution
for quantization and coding in a Layer-3 encoder. Quantization is
done via a power-law quantizer. In this way, larger values are
automatically coded with less accuracy and some noise shaping
is already built into the quantization process. The quantized
values are coded by Huffman coding. To adapt the coding
process to different local statistics of the music signals the
optimum Huffman table is selected from a number of choices.
The Huffman coding works on pairs or quadruples. To get even
better adaptation to signal statistics, different Huffman code tables
can be selected for different parts of the spectrum. Since
Huffman coding is basically a variable code length method and
noise shaping has to be done to keep the quantization noise
below the masking threshold, a global gain value (determining
the quantization step size) and scalefactors (determining noise
shaping factors for each scalefactor band) are applied before
actual quantization. The process to find the optimum gain and
scalefactors for a given block, bit-rate and output from the
perceptual model is usually done by two nested iteration loops in
an analysis-by-synthesis way:
-
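The interaction of power-law quantization, global gain and scalefactors can be sketched as follows. This is a strongly simplified illustration under stated assumptions, not the procedure of [1]: the exponent 3/4 is the power law used in Layer-3, but the bit count is a hypothetical stand-in for the Huffman tables, the step and amplification increments are arbitrary choices, and all names are invented for this example.

```python
import numpy as np


def quantize(x, step):
    """Power-law quantizer: compress magnitudes with exponent 3/4, then round."""
    return np.sign(x) * np.rint((np.abs(x) / step) ** 0.75)


def dequantize(q, step):
    """Inverse mapping: expand magnitudes with exponent 4/3 and rescale."""
    return np.sign(q) * np.abs(q) ** (4.0 / 3.0) * step


def estimate_bits(q):
    """Hypothetical stand-in for the Huffman coder: zeros are free, and the
    cost of a nonzero value grows with the logarithm of its magnitude."""
    mags = np.abs(q)
    return float(np.sum(np.where(mags > 0, 1.0 + 2.0 * np.log2(1.0 + mags), 0.0)))


def rate_loop(x, bit_budget, step=1.0, max_iter=200):
    """Inner loop: coarsen the global step size until the block fits the budget."""
    q = quantize(x, step)
    for _ in range(max_iter):
        if estimate_bits(q) <= bit_budget:
            break
        step *= 2.0 ** 0.25
        q = quantize(x, step)
    return q, step


def encode_block(x, band_edges, allowed_noise, bit_budget, max_outer=20):
    """Outer loop: amplify bands whose noise exceeds the allowed noise, then
    rerun the inner loop. band_edges must start at 0 and end at len(x)."""
    widths = np.diff(band_edges)
    scale = np.ones(len(widths))            # per-band amplification factors
    for _ in range(max_outer):
        gain = np.repeat(scale, widths)
        q, step = rate_loop(x * gain, bit_budget)
        error = x - dequantize(q, step) / gain
        noise = np.add.reduceat(error ** 2, band_edges[:-1])
        result = (q, step, scale, noise)
        too_noisy = noise > allowed_noise
        # Stop when every band is fine, or when amplifying all bands is pointless.
        if not too_noisy.any() or too_noisy.all():
            break
        scale = np.where(too_noisy, scale * np.sqrt(2.0), scale)
    return result
```

Calling `encode_block(x, [0, 8, 16, 32], np.ones(3), 150.0)` on a 32-value block returns the quantized values, the final step size, the band amplifications and the remaining noise per band. The sketch reconstructs the signal inside the loop to measure the noise, which is the analysis-by-synthesis aspect mentioned above.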
Filterbank
The filterbank used in MPEG-1 Layer-3 belongs to the class of
hybrid filterbanks. It is built by cascading two different kinds of
filterbank: first a polyphase filterbank (as used in Layer-1 and
Layer-2) and then an additional Modified Discrete Cosine
Transform (MDCT).
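As a rough illustration of the second stage only, the following sketch implements a plain MDCT with 50% overlap and shows that overlap-add of the inverse transforms cancels the time-domain aliasing. The sine window, the block size and all function names are assumptions made for this example; the polyphase stage and other Layer-3 details are not modelled.

```python
import numpy as np


def mdct_basis(n_half):
    """Cosine basis of size (N, 2N) for a block of 2N samples."""
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    phase = (n[None, :] + 0.5 + n_half / 2.0) * (k[:, None] + 0.5)
    return np.cos(np.pi / n_half * phase)


def sine_window(n_half):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * n_half) + 0.5) / (2 * n_half))


def mdct(block, window, basis):
    """2N windowed time samples -> N spectral coefficients."""
    return basis @ (window * block)


def imdct(coeffs, window, basis):
    """N coefficients -> 2N windowed time samples that still contain aliasing."""
    return window * (basis.T @ coeffs) * (2.0 / basis.shape[0])


def analysis_synthesis(signal, n_half):
    """Run MDCT and inverse MDCT with overlap-add.

    len(signal) is assumed to be a multiple of n_half."""
    window = sine_window(n_half)
    basis = mdct_basis(n_half)
    padded = np.concatenate([np.zeros(n_half), signal, np.zeros(n_half)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = padded[start:start + 2 * n_half]
        out[start:start + 2 * n_half] += imdct(mdct(block, window, basis),
                                               window, basis)
    return out[n_half:-n_half]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=64)
    y = analysis_synthesis(x, 8)
    print("max reconstruction error:", np.max(np.abs(x - y)))
```

Each block of 2N samples yields only N coefficients, so the transform is critically sampled even though consecutive blocks overlap by half.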
Tools to enhance audio quality
There are other improvements in AAC which help to retain high
quality for classes of very difficult signals.
-
MPEG-2
MPEG-2 denotes the second phase of MPEG. It introduced a lot
of new concepts into MPEG video coding, including support for
interlaced video signals. The main application area for MPEG-2
is digital television. The original MPEG-2 Audio standard [3],
finalized in 1994, consists of just two extensions to MPEG-1:
-
[AAC encoder block diagram, labels: Perceptual Model, Pre-Processing, Filter Bank, Legend]
MPEG-2 Advanced Audio Coding
[AAC encoder block diagram, labels: TNS, Intensity/Coupling, Prediction (Quantized Spectrum of Previous Frame), M/S, Iteration Loops]
Tools to enhance coding efficiency
The following changes compared to Layer-3 help to get the same
quality at lower bit-rates:
[AAC encoder block diagram, labels: Data, Control, Scale Factors, Quantizer, Noiseless Coding, Bitstream Formatter, 13818-7 Coded Audio Stream]
CANDIDATES FOR NEXT GENERATION CODECS
Other audio codecs proposed for Electronic Music Distribution
- Dolby AC-3 has been recommended by the ITU for the 5.1
  multichannel sound of DTV systems.
- Lucent EPAC is an audio coding system similar to MPEG
  AAC. Wavelet based coding is used in addition to the
  MDCT filterbank.
- Sony ATRAC-3 is a recent system for EMD. The author is
  not aware of publications describing ATRAC-3 in detail.
- Microsoft WMA (Windows Media Audio) has also been
  proposed for EMD.
Parametric coding
CONCLUSION AND FUTURE WORK