MPEG Digital Audio Coding
The Moving Pictures Expert Group (MPEG) within the International Organization for Standardization (ISO) has developed a series of audio-visual standards known as MPEG-1 and MPEG-2. These audio-coding standards are the first international standards in the field of high-quality digital audio compression. MPEG-1 covers coding of stereophonic audio signals at high sampling rates aiming at transparent quality, whereas MPEG-2 also offers stereophonic audio coding at lower sampling rates. In addition, MPEG-2 introduces multichannel coding with and without backwards compatibility to MPEG-1 to provide an improved acoustical image for audio-only applications and for enhanced television and video-conferencing systems. MPEG-2 audio coding without backwards compatibility, called MPEG-2 Advanced Audio Coding (AAC), offers the highest compression rates.

Typical application areas for MPEG-based digital audio are in the fields of audio production, program distribution and exchange, digital sound broadcasting, digital storage, and various multimedia applications. In this article we describe in some detail the key technologies and main features of MPEG-1 and MPEG-2 audio coders. We also present a short section on the upcoming MPEG-4 standard, and we discuss some of the typical applications for MPEG audio compression.

Dealing with Bit Rates
PCM Bit Rates
Typical audio signal classes are telephone speech, wideband speech, and wideband audio, all of which differ in bandwidth, dynamic range, and in listener expectation of offered quality. The quality of telephone-bandwidth speech is acceptable for telephony and for some videotelephony services. Higher bandwidths (7 kHz for wideband speech) may be necessary to improve the
Fig. 3. Temporal masking (acoustic events in the gray areas will not be audible).
uncorrelated spectral components and by quantizing these components separately. Two coding categories exist, transform coding (TC) and subband coding (SBC). The differentiation between these two categories is mainly due to historical reasons. Both use an analysis filterbank in the encoder to decompose the input signal into subsampled spectral components. The spectral components are called subband samples if the filterbank has low frequency resolution, otherwise they are called spectral lines or transform coefficients. These spectral components are recombined in the decoder via synthesis filterbanks.

In SBC, the source signal is fed into an analysis filterbank consisting of M bandpass filters that are contiguous in frequency so that the set of subband signals can be recombined additively to produce the original signal or a close version thereof. Each filter output is critically decimated (i.e., sampled at twice the nominal bandwidth) by a factor equal to M, the number of bandpass filters. This decimation results in an aggregate number of subband samples that equals that in the source signal. In the receiver, the sampling rate of each subband is increased to that of the source signal by filling in the appropriate number of zero samples. Interpolated subband signals appear at the bandpass outputs of the synthesis filterbank. The sampling processes may introduce aliasing distortion due to the overlapping nature of the subbands. If perfect filters, such as two-band quadrature mirror filters or polyphase filters, are applied, aliasing terms will cancel and the sum of the bandpass outputs equals the source signal in the absence of quantization [18-21]. With quantization, aliasing components will not cancel ideally; nevertheless, the errors will be inaudible in MPEG/Audio coding if a sufficient number of bits is used. However, these errors may reduce the original dynamic range of 20 bits to around 18 bits [15].

In TC, a block of input samples is linearly transformed via a discrete transform into a set of near-uncorrelated
bling of the filter impulse responses resulting from the overlap.

Hybrid filterbanks, i.e., combinations of discrete transform and filterbank implementations, have frequently been used in speech and audio coding [22,23]. One of the advantages is that different frequency resolu-

Fig. 10. 1152-sample block of an audio signal.

posed by Edler (window switching) [24]; typical block sizes are between N = 64 and N = 1024. The small blocks are only used to control pre-echo artifacts during nonstationary periods of the signal, otherwise the coder switches back to long blocks. It is clear that block size selection has to be based on an analysis of the characteristics of the actual audio-coding block. Fig. 5 demonstrates the effect in transform coding: if the block size is N = 1024 (Fig. 5b) pre-echoes are clearly (visible and) audible whereas a block size of 256 will reduce these effects because they are limited to the block where the signal attack and the corresponding quantization errors occur (Fig. 5c). In addition, pre-masking can become effective.
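One way to read the block-size argument is as simple arithmetic: if the quantization error stays inside the block that contains the attack, it can precede the attack by at most one block length. The sketch below converts the two block sizes discussed for Fig. 5 into durations. The 48 kHz sampling rate and the function name `max_pre_echo_ms` are assumptions made for illustration; the text does not state the sampling rate used for Fig. 5.

```python
def max_pre_echo_ms(block_size, fs_hz=48_000):
    """Upper bound on how long quantization noise can precede an attack.

    The noise is confined to the block containing the attack, so the bound
    is one block duration. fs_hz = 48 kHz is an ASSUMED sampling rate.
    """
    return 1000.0 * block_size / fs_hz


if __name__ == "__main__":
    for n in (1024, 256):
        print(f"N = {n:4d}: pre-echo confined to at most {max_pre_echo_ms(n):.1f} ms")
```

At the assumed rate, the long block spans roughly 21.3 ms and the short block roughly 5.3 ms, which is why the short block leaves far less room for an audible pre-echo.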
The Basics
Structure
The basic structure of MPEG-1 audio coders follows that of perception-based coders (see Fig. 4). In the first step the audio signal is converted into spectral components via an analysis filterbank; Layers I and II make use of a subband filterbank and Layer III employs a hybrid filterbank. Each spectral component is quantized and coded with the goal of keeping the quantization noise below the masking

Fig. 11. Frequency distributions of various important MPEG parameters taken from the audio block of Fig. 10. MPEG-1 Layer II coding with an overall bit rate of 128 kb/s. (a) Sound-pressure level (SPL) of input frame vs. index of subbands (each subband is 750 Hz wide); (b) Global masking threshold vs. frequency; (c) Signal-to-mask ratio vs. frequency; (d) Bit allocation vs. frequency; (e) SPL of reconstruction error vs. frequency.
Fig. 13. Bit allocations for MPEG-1 Layer II coding with overall bit rates of 128 kb/s and 64 kb/s.

cies are exploited to reduce the overall bit rate by using an irrelevancy-reducing technique called intensity stereo. It is known that, above 2 kHz and within each critical band, the human auditory system bases its perception of stereo imaging more on the temporal envelope of the audio signal than on its temporal fine structure. Therefore, the MPEG audio-compression algorithm supports a stereo redundancy coding mode called intensity stereo coding, which reduces the total bit rate without violating the spatial integrity of the stereophonic signal.

In this mode the encoder codes some upper-frequency subband outputs with a single sum signal L + R (or some linear combination thereof) instead of sending independent left (L) and right (R) subband signals. The decoder reconstructs the left and right channels based only on the single L + R signal and on independent left- and right-channel scalefactors. Hence, the spectral shape of the left and right outputs is the same within each intensity-coded subband but the magnitudes are different [35]. The op-

Psychoacoustic Models
We have already mentioned that the adaptive bit allocation algorithm is controlled by a psychoacoustic model. This model computes SMRs and takes into account the short-term spectrum of the audio block to be coded and knowledge about noise masking. The model is only needed in the encoder, which makes the decoder less complex; this asymmetry is a desirable feature for audio playback and audio broadcasting applications.

The normative part of the standard describes the decoder and the meaning of the encoded bitstream, but the encoder is not standardized, thus leaving room for an evolutionary improvement of the encoder. In particular, different psychoacoustic models can be used that range from very simple (or none at all) to very complex based on quality and implementability requirements. Information about the short-term spectrum can be derived in various ways; for example, as an accurate estimate from an FFT-based spectral analysis of the audio input samples, or, less accurately, directly from the spectral components as in the conventional ATC [14] (see also Fig. 6). Encoders can also be optimized for a certain application. All these encoders can be used with complete compatibility with all existing MPEG-1 audio decoders.

The informative part of the standard gives two examples of FFT-based models (see also [8, 29, 36]). Both models identify, in different ways, tonal and nontonal spectral components and use the corresponding results of tone-masks-noise and noise-masks-tone experiments in
thresholds and the absolute masking threshold. The SMR is then the ratio of the maximum signal level within a given subband and the minimum value of the global masking threshold in that given subband (see Fig. 2).

Model 2, which may be used for all layers, is more complex: tonality is assumed when a simple prediction indicates a high prediction gain, the masking thresholds are calculated in the cochlea domain, i.e., properties of the inner ear are taken into account in more detail, and, finally, in case of potential pre-echoes the global masking threshold is adjusted appropriately.

750 Hz; hence, at low frequencies, a single subband covers a number of adjacent critical bands. The subband signals are resampled (critically decimated) at a rate of 1500 Hz. The impulse response of subband k, $h_{\mathrm{sub}(k)}(n)$, is obtained by multiplication of the impulse response of a single prototype lowpass filter, $h(n)$, by a modulating function that shifts the lowpass response to the appropriate subband frequency range:

$$h_{\mathrm{sub}(k)}(n) = h(n)\cos\left[\frac{(2k-1)\pi n}{2M} + \varphi(k)\right]; \quad M = 32; \quad k = 1, 2, \ldots, 32; \quad n = 1, 2, \ldots, 512$$
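The modulation formula can be sketched in a few lines of Python to show how the 32 subband filters are derived from one lowpass prototype. This is a minimal sketch under stated assumptions: the prototype coefficients and the phase terms φ(k) are not given in this text, so `placeholder_prototype` (a Hann-windowed sinc) and the zero default phase are hypothetical stand-ins. They illustrate the frequency shift only and are not expected to give the aliasing cancellation of the standardized filterbank.

```python
import math

M = 32      # number of subbands (from the text)
TAPS = 512  # prototype length, n = 1..512 (from the text)


def placeholder_prototype(taps=TAPS, num_bands=M):
    """Hann-windowed sinc lowpass with cutoff pi/(2M) rad/sample.

    ASSUMPTION: stand-in only; the real prototype is tabulated elsewhere.
    """
    cutoff = math.pi / (2 * num_bands)
    centre = (taps + 1) / 2.0
    h = []
    for n in range(1, taps + 1):
        t = n - centre
        ideal = cutoff / math.pi if t == 0 else math.sin(cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n - 1) / (taps - 1))
        h.append(ideal * window)
    return h


def subband_impulse_responses(prototype, phase=lambda k: 0.0, num_bands=M):
    """h_sub(k)(n) = h(n) * cos((2k - 1) * pi * n / (2M) + phi(k)), k = 1..M.

    `phase` defaults to zero, which is an ASSUMPTION for illustration.
    """
    responses = []
    for k in range(1, num_bands + 1):
        responses.append([
            h_n * math.cos((2 * k - 1) * math.pi * n / (2 * num_bands) + phase(k))
            for n, h_n in enumerate(prototype, start=1)
        ])
    return responses


def centre_frequency_hz(k, fs_hz=48_000, num_bands=M):
    """Centre frequency of subband k implied by the modulating cosine."""
    return (2 * k - 1) * fs_hz / (4 * num_bands)


if __name__ == "__main__":
    bank = subband_impulse_responses(placeholder_prototype())
    print(len(bank), len(bank[0]))  # number of filters, taps per filter
    print(centre_frequency_hz(1), centre_frequency_hz(2))
```

At 48 kHz the cosine places adjacent subband centres 750 Hz apart, starting at 375 Hz, which matches the 750 Hz subband width quoted in the text.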
at odd multiples thereof (all values at 48 kHz sampling rate). Therefore, the subsampled filter outputs exhibit a significant overlap. However, the design of the prototype filter and the inclusion of appropriate phase shifts in the

The Layer II coder achieves a better performance, mainly because the overall scalefactor side information is reduced by exploiting redundancies between the scalefactors. Additionally, a slightly finer quantization is provided.

The following figures demonstrate the way MPEG-1 Layer II encodes audio signals. Figure 10 shows an individual 1152-sample audio block to be coded. Figure 11 shows the frequency dependencies of various important MPEG parameters; the frequency axes are divided in accordance with the subband separations. The sampling rate is 48 kHz, hence each subband index represents a subband of bandwidth 750 Hz. We have chosen an overall bit rate of 128 kb/s.

i.e., blocks of decimated samples are formed and divided by a scalefactor such that the sample of largest magnitude is unity. In Layer I, blocks of 12 decimated and scaled samples are formed in each subband (and for the left and right channel) and there is one bit allocation for each block. At a 48 kHz sampling rate, 12 subband samples correspond to 8 ms of audio. There are 32 blocks, each with 12 decimated samples, representing 32 × 12 = 384 audio samples.
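The block arithmetic and the scaling step can be checked with a short sketch. The function names are hypothetical, and using the exact peak as the scalefactor is a simplifying assumption taken from the wording "such that the sample of largest magnitude is unity"; a practical coder would typically choose the scalefactor from a finite set of values.

```python
def frame_size(samples_per_subband, num_subbands=32, fs_hz=48_000):
    """Return (PCM samples covered, duration in ms) for one block per subband."""
    pcm_samples = samples_per_subband * num_subbands
    return pcm_samples, 1000.0 * pcm_samples / fs_hz


def compand_block(block):
    """Divide a block by its peak magnitude so the largest sample becomes +/-1.

    ASSUMPTION: the exact peak serves as the scalefactor (idealized).
    Returns (scalefactor, scaled samples); an all-zero block is left unchanged.
    """
    scalefactor = max(abs(x) for x in block)
    if scalefactor == 0.0:
        return 0.0, list(block)
    return scalefactor, [x / scalefactor for x in block]


if __name__ == "__main__":
    print(frame_size(12))  # one 12-sample block per subband
    print(frame_size(36))  # three consecutive 12-sample blocks per subband
    print(compand_block([0.5, -2.0, 1.0, 0.25]))
```

With 12 samples per subband the function reproduces the 384 samples and 8 ms stated for Layer I; passing 36 gives the 1152 samples and 24 ms of a three-block grouping.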
In Layer II, in each subband a 36-sample superblock is formed of three consecutive blocks of 12 decimated samples corresponding to 24 ms of audio at 48 kHz sampling rate. There is one bit allocation for each 36-sample superblock. All 32 superblocks, each with 36 decimated samples, represent, altogether, 32 × 36 = 1152 audio samples. As in Layer I, a scalefactor is computed for each 12-sample block. A redundancy reduction technique is used for the transmission of the scalefactors: depending on the significance of the changes between the three consecutive scalefactors, one, two, or all three scalefactors are transmitted, together with a 2-bit scalefactor select information. Compared with Layer I, the bit rate for the scale-

Figure 11(a) shows the frequency distribution of the sound-pressure level of the audio block. From this distribution and from the threshold in quiet a global masking threshold can be derived (Fig. 11(b)). For each subband, the SMR (in dB) is the difference between the level of the masker and the minimum value of the global masking threshold (Fig. 11(c)). Note that, for subbands of index 23 and higher, the signal power is significantly below that of the global masking threshold. Accordingly, the corresponding subband signals need not be transmitted. In the next step, the number of bits per subband quantizer is chosen such that its quantization noise is kept sufficiently below the global masking threshold (Fig. 11(d)). Therefore, the bit allocation for those subbands that have to be transmitted roughly follows the SMR. The spectrum of the reconstruction error is shown in Fig. 11(e) (please take into account that the dB values of Fig. 11(e) cover only the range 0 to 35 dB). If we compare it with the spectrum of the global masking threshold, we note that the power of the reconstruction error is below the threshold,

Fig. 15. Typical sequence of windows in adaptive window switching.
Decoding
The decoding is straightforward: the subband sequences are reconstructed on the basis of blocks of 12 subband samples taking into account the decoded scalefactor and bit-allocation information. If a subband has no bits allocated to it, the samples in that subband are set to zero. Each time the subband samples of all 32 subbands have been calculated, they are applied to the synthesis filterbank, and 32 consecutive 16-bit, PCM-format audio samples are calculated. If available, as in bidirectional communications or in recorder systems, the encoder (analysis) filterbank can be used in a reverse mode in the decoding process.

Layer III
Layer III of the MPEG-1/Audio coding standard introduces many new features (see Fig. 14), in particular a switched hybrid filterbank. In addition, it employs an

Quantization and Coding
The MDCT output samples are nonuniformly quantized, thus providing both smaller mean-squared errors and masking because larger errors can be tolerated if the samples to be quantized are large. Huffman coding, based on 32 code tables, and additional run-length coding are applied to represent the quantizer indices in an efficient way. The encoder maps the variable wordlength codewords of the Huffman code tables into a constant bit rate by monitoring the state of a bit reservoir. The bit reservoir ensures that the decoder buffer neither underflows nor overflows when the bitstream is presented to the decoder at a constant rate.

In order to keep the quantization noise in all critical bands below the global masking threshold (noise allocation) an iterative analysis-by-synthesis method is employed whereby the process of scaling, quantization, and coding of spectral data is carried out within two nested iteration loops. The decoding follows that of the encoding process.

which varies in Layer II; and (iii) the ancillary data field, the length of which is not specified.
Subjective Quality
(MPEG-1; Stereophonic Audio Signals)
The standardization process included extensive subjective tests and objective evaluations of parameters such as complexity and overall delay. The MPEG (and equivalent ITU-R) listening tests were carried out under very similar and carefully defined conditions with around 60 experi-

MPEG Multichannel Audio Coding
Multichannel Audio Representations
A logical further step in digital audio is the definition of multichannel audio representation systems to create a realistic surround-sound field both for audio-only applications and for audiovisual systems, including video

tion, components of the multichannel signal, which are irrelevant with respect to the spatial perception of the

R. The factors, α, β, and δ, attenuate the signals to avoid overload when calculating the compatible stereo signal

Fig. 23. Data format of MPEG-2 audio bit stream with extension part for multichannel data.
The predictor reduces the bit rate for coding subsequent subband samples in a given subband, and it bases its prediction on the quantized spectrum of the previous block, which is also available in the decoder (in the absence of channel errors). Finally, for quantization and noiseless coding, an iterative method is employed so as to keep the quantization noise in all critical bands below the global masking threshold.

Profiles. In order to serve different needs, the standard provides three profiles: (i) the main profile offers highest quality, (ii) the low-complexity profile works without prediction, and (iii) the sampling-rate-scaleable profile offers the lowest complexity. For example, in its main profile the filterbank is a 1024-line MDCT with 50% overlap (block length of 2048 samples). The filterbank is switchable to eight 128-line MDCTs (block lengths of 256 samples). Hence, it allows for a frequency resolution of 23.43 Hz and a time resolution of 2.6 ms (both at a sampling rate of 48 kHz). In the case of the long block length, the window shape can vary dynamically as a function of the signal.

The low-complexity profile does not employ temporal noise shaping and time-domain prediction (the prediction adds significantly to the complexity), whereas in the sampling-rate-scaleable profile a hybrid filterbank is used.

MPEG-2 AAC supports up to 46 channels for various multichannel loudspeaker configurations and other applications; the default loudspeaker configurations are the monophonic channel, the stereophonic channel, and the 5.1 system (five channels plus LFE channel).

Fig. 24. MPEG-2 advanced audio coding (multichannel configuration).

The above-listed selected modules define the MPEG-2 audio AAC standard that became an International Standard in April 1997 as an extension to MPEG-2 (ISO/MPEG 13818-7). A more detailed description of the MPEG-2 AAC multichannel standard can be found in the literature [43]. The standard offers high quality at the lowest possible bit rates between 320 and 384 kb/s for five channels; it will find many applications in both consumer and professional use.

Backwards Compatibility via Simulcast Transmission
If bit rates are not of high concern, a simulcast transmission may be employed where a full MPEG-1 bitstream is multiplexed with a full nonbackwards-compatible multichannel bitstream in order to support backwards compatibility without matrixing techniques (Fig. 26).

Subjective Tests
(MPEG-2, Multichannel Audio Signals)
The first subjective tests, independently run at German Telekom and BBC (UK) under the umbrella of the MPEG-2 standardization process, had shown a satisfactory average performance of nonbackwards-compatible and of backwards-compatible coders. The tests had been carried out with experienced listeners and critical test items at low bit rates (320 and 384 kb/s). However, all codecs showed significant deviations from transparency for some of the test items [47, 48]. Recently, extensive formal subjective tests have been carried out to compare MPEG-2 AAC versions, operating, respectively, at 256 and 320 kb/s, and a backward-compatible MPEG-2 Layer II coder, operating at 640 kb/s [49] (a 1995 version of this latter coder was used, therefore its test results do not reflect any subsequent enhancements). All coders performed very well with a slight advantage to the nonbackwards-compatible 320 kb/s MPEG-2 AAC coder compared with the backwards-compatible 640 kb/s MPEG-2 Layer II coder. The performances of those coders are indistinguishable from the original in the sense of the EBU definition of indistinguishable quality [50].
From these subjective tests, it has become clear that the concept of backwards compatibility implies the need for higher bit rates.

MPEG-4 Audio Coding
Activities within MPEG-4 aim at proposals for a broad field of applications including multimedia. (We note in passing that MPEG has started a new work item called "Multimedia content description interface" (in short "MPEG-7"). MPEG-7 does not cover coding; its goal is rather to specify a standardized description of various types of multimedia information. A typical application will be the search for video, graphics, or audio material in the sense of today's text-based search engines in the World Wide Web.) MPEG-4 will offer higher compression rates, and it will merge the whole range of audio from high-fidelity audio coding and speech coding down to synthetic speech and synthetic audio, supporting applications from high-fidelity audio systems down to mobile-access multimedia terminals. In order to represent, integrate, and exchange pieces of audio-visual information, MPEG-4 offers standard tools that can be combined to satisfy specific user requirements [51]. A number of such configurations may be standardized. A syntactic description will be used to convey to a decoder the choice of tools made by the encoder. This description can also be used to describe new algorithms and download their configuration to the decoding processor for execution.

The current toolset supports audio and speech compression at monophonic bit rates ranging from 2 to 64 kb/s. Three core coders are used:
• a parametric coding scheme for low bit rate speech coding (2 to 10 kb/s)
• an analysis-by-synthesis coding scheme for medium bit rates (6 to 16 kb/s)
• a subband/transform-based coding scheme for bit rates below 64 kb/s.

The three core coders have been integrated into a so-called verification model that describes the operations of encoders and decoders and that is used to carry out simulations and optimizations. In the end, the verification model will be the embodiment of the standard [51].

Let us also note that MPEG-4 will offer new functionalities such as time-scale changes, pitch control, editability, database access, and scalability, which allows one to extract from the transmitted bit stream a subset sufficient to generate audio signals with lower bandwidth and/or lower quality depending on channel capacity or decoder complexity. MPEG-4 will become an international standard in November 1998.
lack of available bandwidth, traditional channel coding techniques may not be able to sufficiently improve the reliability of the channel.

MPEG audio coders are controlled by psychoacoustic models that may be improved, thus leaving room for an evolutionary improvement of codecs. In the future, we will see new solutions for encoding. A better understanding of binaural perception and of stereo representation will lead to new proposals.

Digital multichannel audio improves stereophonic images and will be of importance both for audio-only and multimedia applications. MPEG-2/Audio offers both backwards-compatible and nonbackwards-compatible coding schemes to serve different needs. Ongoing research will result in enhanced multichannel representations by making better use of interchannel correlations and interchannel masking effects to bring the bit rates further down. We can also expect solutions for special presentations for people with impairments of hearing or vision which can make use of the multichannel configurations in various ways.

Current activities of the ISO/MPEG expert group aim at proposals for audio coding that will offer higher compression rates, and which will merge the whole range of audio, from high-fidelity audio coding and speech coding down to synthetic speech and synthetic audio (ISO/IEC MPEG-4). MPEG-4 will be the future multimedia standard. Because the basic audio quality will be more important than compatibility with existing standards, this activity has opened the door for completely new solutions.

7. P. Noll, "Digital Audio Coding for Visual Communications," Proc. of the IEEE, Vol. 83, No. 6, June 1995.
8. ISO/IEC JTC1/SC29, "Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s - IS 11172 (Part 3, Audio)," 1992.
9. ISO/IEC JTC1/SC29, "Information Technology - Generic Coding of Moving Pictures and Associated Audio Information - IS 13818 (Part 3, Audio)," 1994.
10. ISO/MPEG, Doc. N0821, Proposal Package Description - Revision 1.0, Nov. 1994.
11. G.T. Hathaway, "A NICAM Digital Stereophonic Encoder," in Audiovisual Telecommunications (Editor: N.D. Nightingale), Chapman & Hall, 1992, pp. 71-84.
12. E. Zwicker and R. Feldtkeller, Das Ohr als Nachrichtenempfänger. Stuttgart: S. Hirzel Verlag, 1967.
13. N.S. Jayant, J.D. Johnston, and R. Safranek, "Signal compression based on models of human perception," Proc. of the IEEE, Vol. 81, No. 10, pp. 1385-1422, 1993.
14. R. Zelinski and P. Noll, "Adaptive Transform Coding of Speech Signals," IEEE Trans. on Acoustics, Speech and Signal Proc., Vol. ASSP-25, pp. 299-309, August 1977.
15. A. Hoogendorn, "Digital compact cassette," Proc. of the IEEE, Vol. 82, No. 10, pp. 1479-1489, Oct. 1994.
16. P. Noll, "On predictive quantizing schemes," Bell System Technical Journal, Vol. 57, pp. 1499-1532, 1978.
17. J. Makhoul and M. Berouti, "Adaptive noise spectral shaping and entropy coding in predictive coding of speech," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 27, No. 1, pp. 63-73, Feb. 1979.
18. D. Esteban and C. Galand, "Application of Quadrature Mirror Filters to Split Band Voice Coding Schemes," Proc. ICASSP, pp. 191-195, 1987.
19. J.H. Rothweiler, "Polyphase Quadrature Filters, a New Subband Coding Technique," Proc. International Conference ICASSP'83, pp. 1280-1283, 1983.
21. H.S. Malvar, "Signal Processing with Lapped Transforms," Artech House Inc., 1992.
22. F.S. Yeoh, C.S. Xydeas, "Split-band coding of speech signals using a transform technique," Proc. ICC, 1984, Vol. 3, pp. 1183-1187.
23. W. Granzow, P. Noll, C. Volmary, "Frequency-domain coding of speech signals," (in German), NTG-Fachbericht No. 94, VDE-Verlag, Berlin,
24. B. Edler, "Coding of Audio Signals with Overlapping Block Transform and Adaptive Window Functions," (in German), Frequenz, Vol. 43, pp. 252-256, 1989.
25. M. Iwadare, A. Sugiyama, F. Hazu, A. Hirano, and T. Nishitani, "A 128 kb/s Hi-Fi Audio CODEC Based on Adaptive Transform Coding with Adaptive Block Size," IEEE J. on Sel. Areas in Commun., Vol. 10, No. 1, pp. 138-144, January 1992.
26. R. Zelinski and P. Noll, "Adaptive Blockquantisierung von Sprachsignalen," Technical Report No. 181, Heinrich-Hertz-Institut für Nachrichtentechnik, Berlin, 1975.
27. R.G. van der Waal, K. Brandenburg, and G. Stoll, "Current and future standardization of high-quality digital audio coding in MPEG," Proc. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y., 1993.
28. P. Noll and D. Pan, "ISO/MPEG Audio Coding," International Journal of High Speed Electronics and Systems, Vol. 8, No. 1, pp. 69-118, 1997.
29. K. Brandenburg and G. Stoll, "The ISO/MPEG-Audio Codec: A Generic Standard for Coding of High Quality Digital Audio," Journal of the Audio Engineering Society (AES), Vol. 42, No. 10, pp. 780-792, Oct. 1994.
30. L.M. van de Kerkhof and A.G. Cugnini, "The ISO/MPEG audio coding standard," Widescreen Review, 1994.
31. Y.F. Dehery, G. Stoll, L. v.d. Kerkhof, "MUSICAM Source Coding for Digital Sound," 17th International Television Symposium, Montreux, Record pp. 612-617, June 1991.
32. "ASPEC: Adaptive Spectral Perceptual Entropy Coding of High Quality Music Signals," 90th Audio Engineering Society Convention, Paris, preprint 3011, 1991.
33. H.G. Musmann, "The ISO Audio Coding Standard," Proc. IEEE Globecom, Dec. 1990.
34. R.G. van der Waal, A.W.J. Oomen, and F.A. Griffiths, "Performance comparison of CD, noise-shaped CD and DCC," 96th Audio Engineering Society Convention, Amsterdam, preprint 3845, 1994.
35. J. Herre, K. Brandenburg, and D. Lederer, "Intensity stereo coding," 96th Audio Engineering Society Convention, Amsterdam, preprint no. 3799, 1994.
36. D. Pan, "A Tutorial on MPEG/Audio Compression," IEEE Trans. on Multimedia, Vol. 2, No. 2, 1995, pp. 60-74.
37. K. Brandenburg et al., "Variable data-rate recording on a PC using MPEG-Audio Layer III," 95th Audio Engineering Society Convention, New York, 1993.
38. P.A. Sarginson, "MPEG-2: Overview of the system layer," BBC Research and Development Report, BBC RD 1996/2, 1996.
40. H. Fuchs, "Report on the MPEG/Audio subjective listening tests in Hannover," ISO/IEC JTC1/SC29/WG11: Doc.-No. MPEG 91/331, November 1991.
41. G. Stoll et al., "Extension of ISO/MPEG-Audio Layer II to multi-channel coding: The future standard for broadcasting, telecommunication, and multimedia application," 94th Audio Engineering Society Convention, Berlin, Pre
42. B. Grill et al., "Improved MPEG-2 audio multi-channel encoding," 96th Audio Engineering Society Convention, preprint 3865, Amsterdam, 1994.
43. M. Bosi et al., "ISO/IEC MPEG-2 advanced audio coding," 101st Audio Engineering Society Convention, preprint 4382, Los Angeles, 1996.
44. J.D. Johnston et al., "NBC-Audio - Stereo and multichannel coding methods," 101st Audio Engineering Society Convention, Los Angeles, 1996, preprint 4383.
45. W.R.Th. Ten Kate et al., "Matrixing of bit rate reduced audio signals," Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '92), Vol. 2, pp. II-205 - II-208, 1992.
46. W.R.Th. Ten Kate, "Compatibility matrixing of multi-channel bit-rate reduced audio signals," 96th Audio Engineering Society Convention, preprint 3792, Amsterdam, 1994.
47. F. Feige and D. Kirby, "Report on the MPEG/Audio Multichannel Formal Subjective Listening Tests," ISO/IEC JTC1/SC29/WG11: Doc. N 0685, March 1994.
48. D. Meares and D. Kirby, "Brief Subjective Listening Tests on MPEG-2 Backwards Compatible Multichannel Audio Codecs," ISO/IEC JTC1/SC29/WG11: August 1994.
49. ISO/IEC/JTC1/SC29, "Report on the formal subjective listening tests of MPEG-2 NBC multichannel audio coding," Document N1371, Oct. 1996.
50. ITU-R Document TG 10-2/3, Oct. 1991.
51. IEC/JTC1/SC29, "Description of MPEG-4," Document N1410, Oct.
52. D.S. Burpee and P.W. Shumate, "Emerging residential broadband telecommunications," Proc. IEEE, Vol. 82, No. 4, pp. 604-614, 1994.
53. N.S. Jayant, "High Quality Networking of Audio-visual Information," IEEE Commun. Mag., pp. 84-95, 1993.
54. C. Todd et al., "AC-3: Flexible perceptual coding for audio transmission and storage," 96th Audio Engineering Society Convention, Preprint 3796, Amsterdam, 1994.
55. R. Hopkins, "Choosing an American digital HDTV terrestrial broadcasting system," Proc. of the IEEE, Vol. 82, No. 4, pp. 554-563, 1994.
56. "The Grand Alliance," IEEE Spectrum, pp. 36-45, April 1995.
57. ETSI, European Telecommunication Standard, Draft prETS 300 401, Jan. 1994.
58. Ch. Weck, "The error protection of DAB," Audio Engineering Society Conference "DAB - The Future of Radio," London, May 1995.
59. R.D. Jurgen, "Broadcasting with Digital Audio," IEEE Spectrum, pp. 52-59, March 1996.