AC3 Documentation
(AC-3, E-AC-3)
NOTE: The user's attention is called to the possibility that compliance with this standard may
require use of an invention covered by patent rights. By publication of this standard, no position
is taken with respect to the validity of this claim or of any patent rights in connection therewith.
One or more patent holders have, however, filed a statement regarding the terms on which such
patent holder(s) may be willing to grant a license under these rights to individuals or entities
desiring to obtain such a license. Details may be obtained from the ATSC Secretary and the
patent holder.
Table of Contents
1. SCOPE
2. INTRODUCTION
2.1 Motivation
2.2 Encoding
2.3 Decoding
3. REFERENCES
3.1 Normative References
3.2 Informative References
4. NOTATION, DEFINITIONS, AND TERMINOLOGY
4.1 Compliance Notation
4.2 Definitions
4.3 Terminology Abbreviations
5. BIT STREAM SYNTAX
5.1 Synchronization Frame
5.2 Semantics of Syntax Specification
5.3 Syntax Specification
5.3.1 syncinfo: Synchronization Information
5.3.2 bsi: Bit Stream Information
5.3.3 audblk: Audio Block
5.3.4 auxdata: Auxiliary Data
5.3.5 errorcheck: Error Detection Code
5.4 Description of Bit Stream Elements
5.4.1 syncinfo: Synchronization Information
5.4.1.1 syncword: Synchronization Word, 16 bits
5.4.1.2 crc1: Cyclic Redundancy Check 1, 16 bits
5.4.1.3 fscod: Sample Rate Code, 2 bits
5.4.1.4 frmsizecod: Frame Size Code, 6 bits
5.4.2 bsi: Bit Stream Information
5.4.2.1 bsid: Bit Stream Identification, 5 bits
5.4.2.2 bsmod: Bit Stream Mode, 3 bits
5.4.2.3 acmod: Audio Coding Mode, 3 bits
5.4.2.4 cmixlev: Center Mix Level, 2 bits
5.4.2.5 surmixlev: Surround Mix Level, 2 bits
5.4.2.6 dsurmod: Dolby Surround Mode, 2 bits
5.4.2.7 lfeon: Low Frequency Effects Channel On, 1 bit
5.4.2.8 dialnorm: Dialogue Normalization, 5 bits
5.4.2.9 compre: Compression Gain Word Exists, 1 bit
5.4.2.10 compr: Compression Gain Word, 8 bits
5.4.2.11 langcode: Language Code Exists, 1 bit
5.4.2.12 langcod: Language Code, 8 bits
5.4.2.13 audprodie: Audio Production Information Exists, 1 bit
5.4.2.14 mixlevel: Mixing Level, 5 bits
7.4.1 Overview
7.4.2 Sub-Band Structure for Coupling
7.4.3 Coupling Coordinate Format
7.5 Rematrixing
7.5.1 Overview
7.5.2 Frequency Band Definitions
7.5.2.1 Coupling Not in Use
7.5.2.2 Coupling in Use, cplbegf > 2
7.5.2.3 Coupling in Use, 2 ≥ cplbegf > 0
7.5.2.4 Coupling in Use, cplbegf = 0
7.5.3 Encoding Technique
7.5.4 Decoding Technique
7.6 Dialogue Normalization
7.6.1 Overview
7.7 Dynamic Range Compression
7.7.1 Dynamic Range Control; dynrng, dynrng2
7.7.1.1 Overview
7.7.1.2 Detailed Implementation
7.7.2 Heavy Compression; compr, compr2
7.7.2.1 Overview
7.7.2.2 Detailed Implementation
7.8 Downmixing
7.8.1 General Downmix Procedure
7.8.2 Downmixing Into Two Channels
7.9 Transform Equations and Block Switching
7.9.1 Overview
7.9.2 Technique
7.9.3 Decoder Implementation
7.9.4 Transformation Equations
7.9.4.1 512-Sample IMDCT Transform
7.9.4.2 256-Sample IMDCT Transforms
7.9.5 Channel Gain Range Code
7.10 Error Detection
7.10.1 CRC Checking
7.10.2 Checking Bit Stream Consistency
8. ENCODING THE AC-3 BIT STREAM
8.1 Introduction
8.2 Summary of the Encoding Process
8.2.1 Input PCM
8.2.1.1 Input Word Length
8.2.1.2 Input Sample Rate
8.2.1.3 Input Filtering
8.2.2 Transient Detection
8.2.3 Forward Transform
8.2.3.1 Windowing
8.2.3.2 Time to Frequency Transformation
ATSC Standard:
Digital Audio Compression Standard
1. SCOPE
This standard defines how to create a coded representation of audio information, how to describe
this representation, how to arrange the coded representation for storage or transmission and how
to decode the data to create audio. The coded representation defined herein is intended for use in
digital audio transmission and storage applications.
A short form designation of the audio coding algorithm specified in the body of this Standard
is “AC-3”. The short form designation of the audio coding algorithm specified in Annex E is “E-
AC-3”.
2. INTRODUCTION
The United States Advanced Television Systems Committee (ATSC), Inc., was formed by the
member organizations of the Joint Committee on InterSociety Coordination (JCIC)1, recognizing
that the prompt, efficient and effective development of a coordinated set of national standards is
essential to the future development of domestic television services.
One of the activities of the ATSC is exploring the need for and, where appropriate,
coordinating the development of voluntary national technical standards for Advanced Television
Systems (ATV). The ATSC Executive Committee assigned the work of documenting the U.S.
ATV standard to a number of specialist groups working under the Technology Group on
Distribution (T3). The Audio Specialist Group (T3/S7) was charged with documenting the ATV
audio standard.
This document was prepared initially by the Audio Specialist Group as part of its efforts to
document the United States Advanced Television Broadcast Standard. It was approved by the
Technology Group on Distribution on 26 September 1994, and by the full ATSC membership as
an ATSC Standard on 10 November 1994. Annex A, “AC-3 Elementary Streams in an MPEG-2
Multiplex,” was approved by the Technology Group on Distribution on 23 February 1995, and by
the full ATSC membership on 12 April 1995. Annex B, “AC-3 Data Stream in IEC958 Interface,”
and Annex C, “AC-3 Karaoke Mode,” were approved by the Technology Group on Distribution
on 24 October 1995 and by the full ATSC Membership on 20 December 1995.
Revision A of this standard was approved by the full ATSC membership on 20 August 2001.
Revision A corrected some errata in the detailed specifications, revised Annex A to include
additional information about the DVB standard, removed Annex B that described an interface
specification (superseded by IEC and SMPTE standards), and added a new annex, “Alternate Bit
Stream Syntax,” which contributes (in a compatible fashion) some new features to the AC-3 bit
stream.
Revision B of this standard was approved by the full ATSC membership on 14 June 2005.
Revision B corrected some errata in the detailed specifications, and added a new annex,
“Enhanced AC-3 Bit Stream Syntax,” which specifies a non-backwards-compatible syntax that
offers additional coding tools and features. Informative references were removed from the body
of the document and placed in a new Annex B.
1. The JCIC is presently composed of: the Electronic Industries Association (EIA), the Institute of Electrical and
Electronic Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable Television
Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE).
Note: Revision A of this standard removed the informative annex “AC-3 Data
Stream in IEC958 Interface” (Annex B). With this action, the former Annex C
“AC-3 Karaoke Mode” became Annex B, and a new annex, “Alternate Bit Stream
Syntax” became Annex C. Revision B of this standard restored the Annex “AC-3
Karaoke Mode” to its original designation of Annex C, moved the informative
references to a bibliography in a new Annex B, changed the designation of the
Annex “Alternate Bit Stream Syntax” to Annex D, and added a new Annex E,
“Enhanced AC-3 Bit Stream Syntax,” documenting an enhanced syntax for audio
coding (E-AC-3).
ATSC Standard A/53, “Digital Television Standard”, references this document and describes
how the audio coding algorithm described herein is applied in the ATSC DTV standard. The
DVB/ETSI TS 101 154 document describes how AC-3 is applied in the DVB DTV standard.
2.1 Motivation
In order to more efficiently broadcast or record audio signals, the amount of information required
to represent the audio signals may be reduced. In the case of digital audio signals, the amount of
digital information needed to accurately reproduce the original pulse code modulation (PCM)
samples may be reduced by applying a digital compression algorithm, resulting in a digitally
compressed representation of the original signal. (The term compression used in this context
means the compression of the amount of digital information which must be stored or recorded,
and not the compression of dynamic range of the audio signal.) The goal of the digital
compression algorithm is to produce a digital representation of an audio signal which, when
decoded and reproduced, sounds the same as the original signal, while using a minimum of digital
information (bit-rate) for the compressed (or encoded) representation. The AC-3 digital
compression algorithm specified in this document can encode from one to five full bandwidth
audio channels, along with a low frequency enhancement channel. The six channels of source
audio can be encoded from a PCM representation into a serial bit stream at data rates ranging
from 32 kbps to 640 kbps. When all six channels are present this is referred to as 5.1 channels.
The 0.1 channel refers to a fractional bandwidth channel intended to convey only low frequency
(subwoofer) signals.
While a wide range of encoded bit-rates is supported by this standard, a typical application of
the algorithm is shown in Figure 2.1. In this example, a 5.1 channel audio program is converted
from a PCM representation requiring more than 5 Mbps (6 channels × 48 kHz × 18 bits = 5.184
Mbps) into a 384 kbps serial bit stream by the AC-3 encoder. Satellite transmission equipment
converts this bit stream to an RF transmission which is directed to a satellite transponder. The
amount of bandwidth and power required by the transmission has been reduced by more than a
factor of 13 by the AC-3 digital compression. The signal received from the satellite is
demodulated back into the 384 kbps serial bit stream, and decoded by the AC-3 decoder. The
result is the original 5.1 channel audio program.
Digital compression of audio is useful wherever there is an economic benefit to be obtained
by reducing the amount of digital information required to represent the audio. Typical
applications are in satellite or terrestrial audio broadcasting, delivery of audio over metallic or
optical cables, or storage of audio on magnetic, optical, semiconductor, or other storage media.
Figure 2.1 Example application of AC-3 to satellite audio transmission (5.1 channel input, AC-3 encoder, 384 kb/s encoded bit stream, satellite transmission and reception equipment, AC-3 decoder, 5.1 channel output).
2.2 Encoding
The AC-3 encoder accepts PCM audio and produces an encoded bit stream consistent with this
standard. The specifics of the audio encoding process are not normative requirements of this
standard. Nevertheless, the encoder must produce a bit stream matching the syntax described in
Section 5, which, when decoded according to Sections 6 and 7, produces audio of sufficient
quality for the intended application. Section 8 contains informative information on the encoding
process. The encoding process is briefly described below.
The AC-3 algorithm achieves high coding gain (the ratio of the input bit-rate to the output bit-
rate) by coarsely quantizing a frequency domain representation of the audio signal. A block
diagram of this process is shown in Figure 2.2. The first step in the encoding process is to
transform the representation of audio from a sequence of PCM time samples into a sequence of
blocks of frequency coefficients. This is done in the analysis filter bank. Overlapping blocks of
512 time samples are multiplied by a time window and transformed into the frequency domain.
Due to the overlapping blocks, each PCM input sample is represented in two sequential
transformed blocks. The frequency domain representation may then be decimated by a factor of
two so that each block contains 256 frequency coefficients. The individual frequency coefficients
are represented in binary exponential notation as a binary exponent and a mantissa. The set of
exponents is encoded into a coarse representation of the signal spectrum which is referred to as
the spectral envelope. This spectral envelope is used by the core bit allocation routine, which
determines how many bits to use to encode each individual mantissa. The spectral envelope and
the coarsely quantized mantissas for six audio blocks (1536 audio samples per channel) are
formatted into an AC-3 frame. The AC-3 bit stream is a sequence of AC-3 frames.
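For example, at a 48 kHz sample rate a frame of 1536 samples spans 32 ms, so the 384 kbps bit
stream of the Figure 2.1 example carries 12,288 bits (1,536 bytes) per AC-3 frame.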
The actual AC-3 encoder is more complex than indicated in Figure 2.2. The following
functions not shown above are also included:
1. A frame header is attached which contains information (bit-rate, sample rate, number of
encoded channels, etc.) required to synchronize to and decode the encoded bit stream.
2. Error detection codes are inserted in order to allow the decoder to verify that a received frame
of data is error free.
3. The analysis filterbank spectral resolution may be dynamically altered so as to better match
the time/frequency characteristic of each audio block.
4. The spectral envelope may be encoded with variable time/frequency resolution.
5. A more complex bit allocation may be performed, and parameters of the core bit allocation
routine modified so as to produce a more optimum bit allocation.
6. The channels may be coupled together at high frequencies in order to achieve higher coding
gain for operation at lower bit-rates.
7. In the two-channel mode, a rematrixing process may be selectively performed in order to
provide additional coding gain, and to allow improved results to be obtained in the event that
the two-channel signal is decoded with a matrix surround decoder.
2.3 Decoding
The decoding process is basically the inverse of the encoding process. The decoder, shown in
Figure 2.3, must synchronize to the encoded bit stream, check for errors, and de-format the
various types of data such as the encoded spectral envelope and the quantized mantissas. The bit
allocation routine is run and the results used to unpack and de-quantize the mantissas. The
spectral envelope is decoded to produce the exponents. The exponents and mantissas are
transformed back into the time domain to produce the decoded PCM time samples.
The actual AC-3 decoder is more complex than indicated in Figure 2.3. The following
functions not shown above are included:
3. REFERENCES
At the time of publication, the editions indicated were valid. All referenced documents are subject
to revision, and users of this Standard are encouraged to investigate the possibility of applying the
most recent edition of the referenced document.
[5] ATSC: “Digital Television Standard: Part 3 - Service Multiplex and Transport Subsystem
Characteristics,” Doc. A/53 Part 3:2009, Advanced Television Systems Committee,
Washington, D.C., 7 August 2009.
[6] ATSC: “Digital Television Standard: Part 5 - AC-3 Audio System Characteristics,” Doc.
A/53 Part 5:2010, Advanced Television Systems Committee, Washington, D.C., 6 July 2010.
[7] ATSC: “Digital Television Standard: Part 6 - Enhanced AC-3 Audio System
Characteristics,” Doc. A/53 Part 6:2010, Advanced Television Systems Committee,
Washington, D.C., 6 July 2010.
[8] DVB/ETSI: TS 101 154 V1.9.1, “Specification for the use of Video and Audio Coding in
Broadcasting Applications based on the MPEG-2 Transport Stream,” 2009-09.
[9] ETSI: TS 102 366 V1.2.1, “Digital Audio Compression (AC-3, Enhanced AC-3) Standard,”
2008-08.
[10] ITU: ITU-R BT.1300-3, “Service multiplex, transport, and identification methods for digital
terrestrial television broadcasting,” 2005.
[11] DVB/ETSI: EN 300 468 V1.9.1, “Specification for Service Information (SI) in DVB
systems,” 2009-03.
[12] ATSC: “Program and System Information Protocol for Terrestrial Broadcast and Cable
(PSIP),” Doc. A/65:2009, Advanced Television Systems Committee, Washington, D.C., 14
April 2009.
4.2 Definitions
A number of terms are used in this document. Below are definitions that explain the meaning of
some of the terms used.
audio block – A set of 512 audio samples consisting of 256 samples of the preceding audio block,
and 256 new time samples. A new audio block occurs every 256 audio samples. Each audio
sample is represented in two audio blocks.
bin – The number of the frequency coefficient, as in frequency bin number n. The 512 point
TDAC transform produces 256 frequency coefficients or frequency bins.
coefficient – The time domain samples are converted into frequency domain coefficients by the
transform.
coupled channel – A full bandwidth channel whose high frequency information is combined into
the coupling channel.
coupling band – A band of coupling channel transform coefficients covering one or more
coupling channel sub-bands.
coupling channel – The channel formed by combining the high frequency information from the
coupled channels.
coupling sub-band – A sub-band consisting of a group of 12 coupling channel transform
coefficients.
downmixing – Combining (or mixing down) the content of n original channels to produce m
channels, where m < n.
exponent set – The set of exponents for an independent channel, for the coupling channel, or for
the low frequency portion of a coupled channel.
full bandwidth (fbw) channel – An audio channel capable of full audio bandwidth. All channels
(left, center, right, left surround, right surround) except the lfe channel are fbw channels.
independent channel – A channel whose high frequency information is not combined into the
coupling channel. (The lfe channel is always independent.)
low frequency effects (lfe) channel – An optional single channel of limited (<120 Hz)
bandwidth, which is intended to be reproduced at a level +10 dB with respect to the fbw
channels. The optional lfe channel allows high sound pressure levels to be provided for low
frequency sounds.
spectral envelope – A spectral estimate consisting of the set of exponents obtained by decoding
the encoded exponents. Similar (but not identical) to the original set of exponents.
synchronization frame – A unit of the serial bit stream capable of being fully decoded. The
synchronization frame begins with a sync code and contains 1536 coded audio samples.
window – A time vector which is multiplied by an audio block to provide a windowed audio
block. The window shape establishes the frequency selectivity of the filterbank, and provides
for the proper overlap/add characteristic to avoid blocking artifacts.
any computer source code, or other hardware or software implementation documentation. Table
4.1 lists the abbreviations used in this document, their terminology and section reference.
values). Fields or elements contained in the bit stream are indicated with bold type. Syntactic
elements are typographically distinguished by the use of a different font (e.g., dynrng).
Some AC-3 bit stream elements naturally form arrays. This syntax specification treats all bit
stream elements individually, whether or not they would naturally be included in arrays. Arrays
are thus described as multiple elements (as in blksw[ch] as opposed to simply blksw or blksw[]), and
control structures such as for loops are employed to increment the index ([ch] for channel in this
example).
Syntax
AC-3_bitstream()
{
while(true)
{
syncframe() ;
}
} /* end of AC-3 bit stream */
The syncframe consists of the syncinfo and bsi fields, the 6 coded audblk fields, the auxdata field,
and the errorcheck field.
Syntax
syncframe()
{
syncinfo() ;
bsi() ;
for (blk = 0; blk < 6; blk++)
{
audblk() ;
}
auxdata() ;
errorcheck() ;
} /* end of syncframe */
Each of the bit stream elements, and their length, are itemized in the following tables. Note
that all bit stream elements arrive most significant bit first, or left bit first, in time.
If dsurmod is set to the reserved code, the decoder should still reproduce audio. The reserved code
may be interpreted as “not indicated”.
5.4.2.26 timecod1e, timecod2e: Time Code (first and second) Halves Exist, 2 bits
These values indicate, as shown in Table 5.13, whether time codes follow in the bit stream. The
time code can have a resolution of 1/64th of a frame (one frame = 1/30th of a second). Since only
the high resolution portion of the time code is needed for fine synchronization, the 28 bit time
code is broken into two 14 bit halves. The low resolution first half represents the code in 8 second
increments up to 24 hours. The high resolution second half represents the code in 1/64th frame
increments up to 8 seconds.
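As an informative illustration only (not part of the syntax), the following C sketch shows how the
two resolutions described above partition a time value; the actual bit-field layout of the timecod1
and timecod2 elements is defined by the standard's time code tables, which are not reproduced here,
so the packing of sub-fields is deliberately left out.
#include <stdio.h>

/* Split a time value into the two 14-bit resolutions described above:
 * an 8 second count covering 24 hours, and a 1/64 frame count covering
 * 8 seconds, assuming 30 frames per second. */
int main(void)
{
    int hours = 1, minutes = 23, seconds = 45;   /* example time of day */
    int frames = 12, frame64ths = 30;            /* example fine position */

    int total_seconds = (hours * 3600) + (minutes * 60) + seconds;

    /* low resolution half: 8 second increments up to 24 hours (0..10799) */
    int low_res_count = total_seconds / 8;

    /* high resolution half: 1/64 frame increments within the current
     * 8 second window (0..15359) */
    int high_res_count = (((total_seconds % 8) * 30) + frames) * 64 + frame64ths;

    printf("low resolution count = %d, high resolution count = %d\n",
           low_res_count, high_res_count);
    return 0;
}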
5.4.3.42 lfefsnroffst: Low Frequency Effects Channel Fine SNR Offset, 4 bits
This 4-bit code specifies the fine SNR offset parameter used in the bit allocation process for the
lfe channel.
5.4.3.43 lfefgaincod: Low Frequency Effects Channel Fast Gain Code, 3 bits
This 3-bit code specifies the fast gain parameter used in the bit allocation process for the lfe
channel.
mute. This parameter shall not be set to ‘00’ in block 0, or in any block for which coupling is
enabled but was disabled in the previous block.
The number of bits in the frame can be determined from the frame size code (frmsizecod) and
Table 5.18. The number of bits used includes all bits used by bit stream elements with the
exception of auxbits. Any dummy data which has been included with skip fields (skipfld) is included
in the used bit count. The length of the auxbits field is adjusted by the encoder such that the crc2
element falls on the last 16-bit word of the frame.
If the number of user bits indicated by auxdatal is smaller than the number of available aux bits
nauxbits, the user data is located at the end of the auxbits field. This allows a decoder to find and
unpack the auxdatal user bits without knowing the value of nauxbits (which can only be determined
by decoding the audio in the entire frame). The order of the user data in the auxbits field is forward.
Thus the aux data decoder (which may not decode any audio) may simply look to the end of the
AC-3 syncframe to find auxdatal, back up auxdatal bits (from the beginning of auxdatal) in the data
stream, and then unpack auxdatal bits moving forward in the data stream.
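The following C fragment is an informative sketch of the locating step described above. It assumes
that auxdatal has already been read from near the end of the frame and that the number of bits
between the start of the auxdatal field and the end of the frame (here called TAIL_BITS) is known
from the auxdata and errorcheck syntax; that constant is an assumption of this sketch, not a value
taken from this section.
/* Informative sketch: compute the bit offset (from the start of the syncframe)
 * at which the auxdatal user bits begin, by backing up from the end of the
 * frame as described above. */
#define TAIL_BITS 32L   /* assumed size of the fields following the user data */

static long aux_user_data_start(long frame_size_bits, long auxdatal)
{
    /* the user data ends where the auxdatal field begins, so back up
     * auxdatal bits from that point */
    return frame_size_bits - TAIL_BITS - auxdatal;
}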
In burst mode operation, either the data source or the decoder may be the master controlling the burst
timing. The AC-3 decoder input buffer may be smaller in size if the decoder can request bursts of
data on an as-needed basis. However, the external buffer memory may be larger in this case.
6.1.7 Decoupling
When coupling is in use, the channels which are coupled must be decoupled. Decoupling involves
reconstructing the high frequency section (exponents and mantissas) of each coupled channel,
from the common coupling channel and the coupling coordinates for the individual channel.
Within each coupling band, the coupling channel coefficients (exponent and mantissa) are
multiplied by the individual channel coupling coordinates. The coupling process is described in
detail in Section 7.4.
6.1.8 Rematrixing
In the 2/0 audio coding mode rematrixing may be employed, as indicated by the rematrix flags
(rematflg[rbnd]). Where the flag indicates a band is rematrixed, the coefficients encoded in the bit
stream are sum and difference values instead of left and right values. Rematrixing is described in
detail in Section 7.5.
6.1.12 Downmixing
If the number of channels required at the decoder output is smaller than the number of channels
which are encoded in the bit stream, then downmixing is required. Downmixing in the time
domain is shown in this example decoder. Since the inverse transform is a linear operation, it is
also possible to downmix in the frequency domain prior to transformation. Section 7.8 describes
downmixing and specifies the downmix coefficients which decoders shall employ.
7. ALGORITHMIC DETAILS
The following sections describe various aspects of AC-3 coding in detail.
7.1.1 Overview
The actual audio information conveyed by the AC-3 bit stream consists of the quantized
frequency coefficients. The coefficients are delivered in floating point form, with each coefficient
consisting of an exponent and a mantissa. This section describes how the exponents are encoded
and packed into the bit stream.
Exponents are 5-bit values which indicate the number of leading zeros in the binary
representation of a frequency coefficient. The exponent acts as a scale factor for each mantissa,
equal to 2^(-exp). Exponent values are allowed to range from 0 (for the largest value coefficients with
no leading zeroes) to 24. Exponents for coefficients which have more than 24 leading zeroes are
fixed at 24, and the corresponding mantissas are allowed to have leading zeros. Exponents require
5 bits in order to represent all allowed values.
AC-3 bit streams contain coded exponents for all independent channels, all coupled channels,
and for the coupling and low frequency effects channels (when they are enabled). Since audio
information is not shared across frames, block 0 of every frame will include new exponents for
every channel. Exponent information may be shared across blocks within a frame, so blocks 1
through 5 may reuse exponents from previous blocks.
AC-3 exponent transmission employs differential coding, in which the exponents for a
channel are differentially coded across frequency. The first exponent of a fbw or lfe channel is
always sent as a 4-bit absolute value, ranging from 0–15. The value indicates the number of
leading zeros of the first (dc term) transform coefficient. Successive (going higher in frequency)
exponents are sent as differential values which must be added to the prior exponent value in order
to form the next absolute value.
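For example, if the absolute exponent for a channel is 8 and the next three differential values are
+1, 0, and −2, the absolute exponents for bins 0 through 3 are 8, 9, 9, and 7.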
The differential exponents are combined into groups in the audio block. The grouping is done
by one of three methods, D15, D25, or D45, which are referred to as exponent strategies. The
number of grouped differential exponents placed in the audio block for a particular channel
depends on the exponent strategy and on the frequency bandwidth information for that channel.
The number of exponents in each group depends only on the exponent strategy.
An AC-3 audio block contains two types of fields with exponent information. The first type
defines the exponent coding strategy for each channel, and the second type contains the actual
coded exponents for channels requiring new exponents. For independent channels, frequency
bandwidth information is included along with the exponent strategy fields. For coupled channels,
and the coupling channel, the frequency information is found in the coupling strategy fields.
The differential exponents are limited to the five values −2, −1, 0, +1, and +2. These five values
are mapped into the values 0, 1, 2, 3, 4 before being grouped, as shown in Table 7.1.
In the D15 mode, the above mapping is applied to each individual differential exponent for
coding into the bit stream. In the D25 mode, each pair of differential exponents is represented by
a single mapped value in the bit stream. In this mode the second differential exponent of each pair
is implied as a delta of 0 from the first element of the pair as indicated in Table 7.2.
The D45 mode is similar to the D25 mode except that quads of differential exponents are
represented by a single mapped value, as indicated by Table 7.3.
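As a brief informative illustration, the encoder-side grouping implied by the decoder pseudo code
later in this section packs three mapped values (each in the range 0–4) into one 7-bit grouped
value:
/* Pack three mapped differential exponents (each 0..4) into a single 7-bit
 * grouped value; this is the inverse of the unpacking pseudo code shown
 * later (gexp / 25, (gexp % 25) / 5, gexp % 5). */
static int group_mapped_exponents(int m1, int m2, int m3)
{
    return (25 * m1) + (5 * m2) + m3;   /* result is in the range 0..124 */
}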
The exponent field for a given channel in an AC-3 audio block consists of a single absolute
exponent followed by a number of these grouped values.
When the low frequency effects channel is enabled the lfeexpstr field is present. It is decoded as
shown in Table 7.5.
Following the exponent strategy fields in the bit stream is a set of channel bandwidth codes,
chbwcod[ch].
These are only present for independent channels (channels not in coupling) that have
new exponents in the current block. The channel bandwidth code defines the end mantissa bin
number for that channel according to the following
endmant[ch] = ((chbwcod[ch] + 12) * 3) + 37; /* (ch is not coupled) */
For coupled channels the end mantissa bin number is defined by the starting bin number of the
coupling channel
endmant[ch] = cplstrtmant; /* (ch is coupled) */
where cplstrtmant is as derived below. By definition the starting mantissa bin number for
independent and coupled channels is 0
strtmant[ch] = 0
For the coupling channel, the frequency bandwidth information is derived from the fields
cplbegf and cplendf found in the coupling strategy information. The coupling channel starting and
ending mantissa bins are defined as
cplstrtmant = (cplbegf * 12) + 37
cplendmant = ((cplendf + 3) * 12) + 37
The low frequency effects channel, when present, always starts in bin 0 and always has the
same number of mantissas
lfestrtmant = 0
lfeendmant = 7
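For example, chbwcod[ch] = 0 gives endmant[ch] = ((0 + 12) * 3) + 37 = 73, and cplbegf = 6 gives
cplstrtmant = (6 * 12) + 37 = 109.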
The second set of fields contains coded exponents for all channels indicated to have new
exponents in the current block. These fields are designated as exps[ch][grp] for independent and
coupled channels, cplexps[grp] for the coupling channel, and lfeexps[grp] for the low frequency effects
channel. The first element of the exps fields (exps[ch][0]) and the lfeexps field (lfeexps[0]) is always a 4-
bit absolute number. For these channels the absolute exponent always contains the exponent value
of the first transform coefficient (bin #0). These 4 bit values correspond to a 5-bit exponent which
has been limited in range (0 to 15, instead of 0 to 24); i.e., the most significant bit is zero. The
absolute exponent for the coupled channel, cplabsexp, is only used as a reference to begin decoding
the differential exponents for the coupling channel (i.e., it does not represent an actual exponent).
The cplabsexp is contained in the audio block as a 4-bit value, however it corresponds to a 5-bit
value. The LSB of the coupled channel initial exponent is always 0, so the decoder must take the
4-bit value which was sent, and double it (left shift by 1) in order to obtain the 5-bit starting value.
For each coded exponent set the number of grouped exponents (not including the first
absolute exponent) to decode from the bit stream is derived as follows:
For independent and coupled channels, the number of grouped exponents follows from the
exponent strategy and the channel bandwidth, as described above. For the lfe channel:
nlfegrps = 2
Decoding a set of coded grouped exponents will create a set of 5-bit absolute exponents. The
exponents are decoded as follows:
1. Each 7 bit grouping of mapped values (gexp) is decoded using the inverse of the encoding
procedure:
2. Each mapped value is converted to a differential exponent (dexp) by subtracting the mapping
offset:
dexp = mapped value − 2
3. The differential exponents are converted to absolute exponents by adding each differential
exponent to the previous absolute exponent (starting from the absolute exponent sent in the
bit stream).
4. For the D25 and D45 modes, each absolute exponent is copied to the remaining members of
the pair or quad.
The above procedure can be summarized as follows:
Pseudo Code
/* unpack the mapped values */
for (grp = 0; grp < ngrps; grp++)
{
expacc = gexp[grp] ;
dexp[grp * 3] = truncate (expacc / 25) ;
expacc = expacc - ( 25 * dexp[grp * 3]) ;
dexp[(grp * 3) + 1] = truncate ( expacc / 5) ;
expacc = expacc - (5 * dexp[(grp * 3) + 1]) ;
dexp[(grp * 3) + 2] = expacc ;
}
/* unbias the mapped values */
for (grp = 0; grp < (ngrps * 3); grp++)
{
dexp[grp] = dexp[grp] - 2 ;
}
/* convert from differentials to absolutes */
prevexp = absexp ;
for (i = 0; i < (ngrps * 3); i++)
{
aexp[i] = prevexp + dexp[i] ;
prevexp = aexp[i] ;
}
/* expand to full absolute exponent array, using grpsize */
exp[0] = absexp ;
for (i = 0; i < (ngrps * 3); i++)
{
for (j = 0; j < grpsize; j++)
{
exp[(i * grpsize) + j +1] = aexp[i] ;
}
}
Where:
ngrps = number of grouped exponents (nchgrps[ch], ncplgrps, or nlfegrps)
grpsize = 1 for D15
= 2 for D25
= 4 for D45
absexp = absolute exponent (exps[ch][0], (cplabsexp<<1), or lfeexps[0])
For the coupling channel the above output array, exp[n], should be offset to correspond to the
coupling start mantissa bin:
For the remaining channels exp[n] will correspond directly to the absolute exponent array for
that channel.
7.2.1 Overview
The bit allocation routine analyzes the spectral envelope of the audio signal being coded with
respect to masking effects to determine the number of bits to assign to each transform coefficient
mantissa. In the encoder, the bit allocation is performed globally on the ensemble of channels as
an entity, from a common bit pool. There are no preassigned exponent or mantissa bits, allowing
the routine to flexibly allocate bits across channels, frequencies, and audio blocks in accordance
with signal demand.
The bit allocation contains a parametric model of human hearing for estimating a noise level
threshold, expressed as a function of frequency, which separates audible from inaudible spectral
components. Various parameters of the hearing model can be adjusted by the encoder depending
upon signal characteristics. For example, a prototype masking curve is defined in terms of two
piecewise continuous line segments, each with its own slope and y-axis intercept. One of several
possible slopes and intercepts is selected by the encoder for each line segment. The encoder may
iterate on one or more such parameters until an optimal result is obtained. When all parameters
used to estimate the noise level threshold have been selected by the encoder, the final bit
allocation is computed. The model parameters are conveyed to the decoder with other side
information. The decoder executes the routine in a single pass.
The estimated noise level threshold is computed over 50 bands of nonuniform bandwidth (an
approximate 1/6 octave scale). The banding structure, defined by tables in the next section, is
independent of sampling frequency. The required bit allocation for each mantissa is established by
performing a table lookup based upon the difference between the input signal power spectral
density (PSD) evaluated on a fine-grain uniform frequency scale, and the estimated noise level
threshold evaluated on the coarse-grain (banded) frequency scale. Therefore, the bit allocation
result for a particular channel has spectral granularity corresponding to the exponent strategy
employed. More specifically, a separate bit allocation will be computed for each mantissa within a
D15 exponent set, each pair of mantissas within a D25 exponent set, and each quadruple of
mantissas within a D45 exponent set.
The bit allocation must be computed in the decoder whenever the exponent strategy (chexpstr,
cplexpstr, lfeexpstr) for one or more channels does not indicate reuse, or whenever baie, snroffste, or
deltbaie = 1. Accordingly, the bit allocation can be updated at a rate ranging from once per audio
block to once per 6 audio blocks, including the integral steps in between. A complete set of new
bit allocation information is always transmitted in audio block 0.
Since the parametric bit allocation routine must generate identical results in all encoder and
decoder implementations, each step is defined exactly in terms of fixed-point integer operations
and table lookups. Throughout the discussion below, signed two's complement arithmetic is
employed. All additions are performed with an accumulator of 14 or more bits. All intermediate
results and stored values are 14-bit values.
implement. Alternatively, the seven steps can be executed horizontally, in which case multiple
passes through all seven steps are made for separate subsets of the input exponent set.
The choice of vertical vs. horizontal execution depends upon the relative importance of
execution time vs. memory usage in the final implementation. Vertical execution of the algorithm
is usually faster due to reduced looping and context save overhead. However, horizontal
execution requires less RAM to store the temporary arrays generated in each step. Hybrid
horizontal/vertical implementation approaches are also possible which combine the benefits of
both techniques.
7.2.2.1 Initialization
Compute start/end frequencies for the channel being decoded. These are computed from
parameters in the bit stream as follows:
Pseudo Code
/* for fbw channels */
for (ch=0; ch<nfchans; ch++)
{
strtmant[ch] = 0;
if (chincpl[ch]) endmant[ch] = 37 + (12 × cplbegf) ; /* channel is coupled */
else endmant[ch] = 37 + (3 × (chbwcod + 12)) ; /* channel is not coupled */
}
/* for coupling channel */
cplstrtmant = 37 + (12 × cplbegf) ;
cplendmant = 37 + (12 × (cplendf + 3)) ;
/* for lfe channel */
lfestrtmant = 0 ;
lfeendmant = 7 ;
Pseudo Code
sdecay = slowdec[sdcycod] ; /* Table 7.6 */
fdecay = fastdec[fdcycod] ; /* Table 7.7 */
sgain = slowgain[sgaincod] ; /* Table 7.8 */
dbknee = dbpbtab[dbpbcod] ; /* Table 7.9 */
floor = floortab[floorcod] ; /* Table 7.10 */
Pseudo Code
start = strtmant[ch] ;
end = endmant[ch] ;
lowcomp = 0 ;
fgain = fastgain[fgaincod[ch]]; /* Table 7.11 */
snroffset[ch] = (((csnroffst − 15) << 4) + fsnroffst[ch]) << 2 ;
Pseudo Code
start = cplstrtmant ;
end = cplendmant ;
fgain = fastgain[cplfgaincod] ; /* Table 7.11 */
snroffset = (((csnroffst − 15) << 4) + cplfsnroffst) << 2 ;
fastleak = (cplfleak << 8) + 768 ;
slowleak = (cplsleak << 8) + 768 ;
Pseudo Code
start = lfestrtmant ;
end = lfeendmant ;
lowcomp = 0 ;
fgain = fastgain[lfefgaincod] ;
snroffset = (((csnroffst - 15) << 4) + lfefsnroffst) << 2 ;
Pseudo Code
for (bin=start; bin<end; bin++)
{
psd[bin] = (3072 - (exp[bin] << 7)) ;
}
Since exp[k] assumes integral values ranging from 0 to 24, the dynamic range of the psd[] values
is from 0 (for the lowest-level signal) to 3072 for the highest-level signal. The resulting function
is represented on a fine-grain, linear frequency scale.
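For example, exp[bin] = 0 gives psd[bin] = 3072, while exp[bin] = 24 gives psd[bin] = 3072 − (24 << 7) = 0.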
contain duplicate information, all of which need not be available in an actual implementation.
They are shown here for simplicity of presentation only.
The integration of PSD values in each band is performed with log-addition. The log-addition
is implemented by computing the difference between the two operands and using the absolute
difference divided by 2 as an address into a length 256 lookup table, latab[], shown in Table 7.14.
Pseudo Code
j = start ;
k = masktab[start] ;
do
{
lastbin = min(bndtab[k] + bndsz[k], end);
bndpsd[k] = psd[j] ;
j++ ;
for (i = j; i < lastbin; i++)
{
bndpsd[k] = logadd(bndpsd[k], psd[j]) ;
j++ ;
}
k++ ;
}
while (end > lastbin) ;
logadd(a, b)
{
c = a − b ;
address = min((abs(c) >> 1), 255) ;
if (c >= 0)
{
return(a + latab(address)) ;
}
else
{
return(b + latab(address)) ;
}
}
Pseudo Code
bndstrt = masktab[start] ;
bndend = masktab[end - 1] + 1 ;
{
fastleak -= fdecay ;
fastleak = max(fastleak, bndpsd[bin] - fgain) ;
slowleak -= sdecay ;
slowleak = max(slowleak, bndpsd[bin] - sgain) ;
excite[bin] = max(fastleak, slowleak) ;
}
calc_lowcomp(a, b0, b1, bin)
{
if (bin < 7)
{
if ((b0 + 256) == b1)
{
a = 384 ;
}
else if (b0 > b1)
{
a = max(0, a - 64) ;
}
}
else if (bin < 20)
{
if ((b0 + 256) == b1)
{
a = 320 ;
}
else if (b0 > b1)
{
a = max(0, a - 64) ;
}
}
else
{
a = max(0, a - 128) ;
}
return(a) ;
}
Pseudo Code
for (bin = bndstrt; bin < bndend; bin++)
{
if (bndpsd[bin] < dbknee)
{
excite[bin] += ((dbknee - bndpsd[bin]) >> 2) ;
}
mask[bin] = max(excite[bin], hth[fscod][bin]) ;
}
Pseudo Code
if ((deltbae == 0) || (deltbae == 1))
{
band = 0 ;
for (seg = 0; seg < deltnseg+1; seg++)
{
band += deltoffst[seg] ;
if (deltba[seg] >= 4)
{
delta = (deltba[seg] - 3) << 7 ;
}
else
{
delta = (deltba[seg] - 4) << 7 ;
}
for (k = 0; k < deltlen[seg]; k++)
{
mask[band] += delta ;
band++ ;
}
}
}
Pseudo Code
i = start ;
j = masktab[start] ;
do
{
lastbin = min(bndtab[j] + bndsz[j], end) ;
mask[j] -= snroffset ;
mask[j] -= floor ;
if (mask[j] < 0)
{
mask[j] = 0 ;
}
mask[j] &= 0x1fe0 ;
mask[j] += floor ;
for (k = i; k < lastbin; k++)
{
address = (psd[i] - mask[j]) >> 5 ;
address = min(63, max(0, address)) ;
bap[i] = baptab[address] ;
i++ ;
}
j++;
}
while (end > lastbin) ;
Table 7.13 Bin Number to Band Number Table, masktab[bin], bin = (10 * A) + B
B=0 B=1 B=2 B=3 B=4 B=5 B=6 B=7 B=8 B=9
A=0 0 1 2 3 4 5 6 7 8 9
A=1 10 11 12 13 14 15 16 17 18 19
A=2 20 21 22 23 24 25 26 27 28 28
A=3 28 29 29 29 30 30 30 31 31 31
A=4 32 32 32 33 33 33 34 34 34 35
A=5 35 35 35 35 35 36 36 36 36 36
A=6 36 37 37 37 37 37 37 38 38 38
A=7 38 38 38 39 39 39 39 39 39 40
A=8 40 40 40 40 40 41 41 41 41 41
A=9 41 41 41 41 41 41 41 42 42 42
A=10 42 42 42 42 42 42 42 42 42 43
A=11 43 43 43 43 43 43 43 43 43 43
A=12 43 44 44 44 44 44 44 44 44 44
A=13 44 44 44 45 45 45 45 45 45 45
A=14 45 45 45 45 45 45 45 45 45 45
A=15 45 45 45 45 45 45 45 46 46 46
A=16 46 46 46 46 46 46 46 46 46 46
A=17 46 46 46 46 46 46 46 46 46 46
A=18 46 47 47 47 47 47 47 47 47 47
A=19 47 47 47 47 47 47 47 47 47 47
A=20 47 47 47 47 47 48 48 48 48 48
A=21 48 48 48 48 48 48 48 48 48 48
A=22 48 48 48 48 48 48 48 48 48 49
A=23 49 49 49 49 49 49 49 49 49 49
A=24 49 49 49 49 49 49 49 49 49 49
A=25 49 49 49 0 0 0
7.3.1 Overview
All mantissas are quantized to a fixed level of precision indicated by the corresponding bap.
Mantissas quantized to 15 or fewer levels use symmetric quantization. Mantissas quantized to
more than 15 levels use asymmetric quantization which is a conventional two’s complement
representation.
Some quantized mantissa values are grouped together and encoded into a common codeword.
In the case of the 3-level quantizer, 3 quantized values are grouped together and represented by a
5-bit codeword in the data stream. In the case of the 5-level quantizer, 3 quantized values are
grouped and represented by a 7-bit codeword. For the 11-level quantizer, 2 quantized values are
grouped and represented by a 7-bit codeword.
In the encoder, each transform coefficient (which is always < 1.0) is left-justified by shifting
its binary representation left the number of times indicated by its exponent (0 to 24 left shifts).
The amplified coefficient is then quantized to a number of levels indicated by the corresponding
bap.
The following table indicates which quantizer to use for each bap. If a bap equals 0, no bits are
sent for the mantissa. Grouping is used for baps of 1, 2, and 4 (3-, 5-, and 11-level quantizers).
During the decode process, the mantissa data stream is parsed into single mantissas of
varying length, interspersed with groups representing combined coding of either triplets or pairs
of mantissas. In the bit stream, the mantissas in each exponent set are arranged in frequency
ascending order. However, groups occur at the position of the first mantissa contained in the
group. Nothing is unpacked from the bit stream for the subsequent mantissas in the group.
The mantissa number k, of length qntztab[bap[k]], is extracted from the bit stream. Conversion
back to a fixed point representation is achieved by right shifting the mantissa by its exponent. This
process is represented by the formula
transform_coefficient[k] = mantissa[k] >> exponent[k] ;
The resulting mantissa value is right shifted by the corresponding exponent to generate the
transform coefficient value
transform_coefficient[k] = quantization_table[mantissa_code[k]] >> exponent[k] ;
The mapping of coded mantissa values into actual mantissa values is shown in Table 7.19
through Table 7.23.
Encoder equations
bap = 1:
group_code = 9 * mantissa_code[a] + 3 * mantissa_code[b] + mantissa_code[c] ;
bap = 2:
group_code = 25 * mantissa_code[a] + 5 * mantissa_code[b] + mantissa_code[c] ;
bap = 4:
group_code = 11 * mantissa_code[a] + mantissa_code[b] ;
Decoder equations
bap = 1:
mantissa_code[a] = truncate (group_code / 9) ;
mantissa_code[b] = truncate ((group_code % 9) / 3 ) ;
mantissa_code[c] = (group_code % 9) % 3 ;
bap = 2:
mantissa_code[a] = truncate (group_code / 25) ;
mantissa_code[b] = truncate ((group_code % 25) / 5 ) ;
mantissa_code[c] = (group_code % 25) % 5 ;
bap = 4:
mantissa_code[a] = truncate (group_code / 11) ;
mantissa_code[b] = group_code % 11 ;
where mantissa a comes before mantissa b, which comes before mantissa c.
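As an informative check, the following C fragment exercises the bap = 2 (5-level quantizer)
equations above as a round trip; it is an illustration of the arithmetic, not part of the normative
text.
#include <assert.h>

int main(void)
{
    int a = 3, b = 0, c = 4;                    /* mantissa codes, each 0..4 */

    /* encoder equation for bap = 2 */
    int group_code = (25 * a) + (5 * b) + c;

    /* decoder equations for bap = 2 */
    int da = group_code / 25;
    int db = (group_code % 25) / 5;
    int dc = (group_code % 25) % 5;

    assert(da == a && db == b && dc == c);      /* round trip recovers a, b, c */
    return 0;
}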
7.4.1 Overview
If enabled, channel coupling is performed on encode by averaging the transform coefficients
across channels that are included in the coupling channel. Each coupled channel has a unique set
of coupling coordinates which are used to preserve the high frequency envelopes of the original
channels. The coupling process is performed above a coupling frequency that is defined by the
cplbegf value.
The decoder converts the coupling channel back into individual channels by multiplying the
coupled channel transform coefficient values by the coupling coordinate for that channel and
frequency sub-band. An additional processing step occurs for the 2/0 mode. If the phsflginu bit = 1
or the equivalent state is continued from a previous block, then phase restoration bits are sent in
the bit stream via phase flag bits. The phase flag bits represent the coupling sub-bands in a
frequency ascending order. If a phase flag bit = 1 for a particular sub-band, all the right channel
transform coefficients within that coupled sub-band are negated after modification by the
coupling coordinate, but before inverse transformation.
common coupling channel. The coupling channel is coded up to the frequency (or tc #) indicated by
cplendf, which indicates the last coupling sub-band which is coded. The parameter cplendf is
interpreted by adding 2 to its value, so the last coupling sub-band which is coded can range from
2–17.
The coupling sub-bands are combined into coupling bands for which coupling coordinates are
generated (and included in the bit stream). The coupling band structure is indicated by
cplbndstrc[sbnd]. Each bit of the cplbndstrc[] array indicates whether the sub-band indicated by the
index is combined into the previous (lower in frequency) coupling band. Coupling bands are thus
made from integral numbers of coupling sub-bands. (See Section 5.4.3.13.)
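For example, if cplbndstrc[] indicates that sub-bands 1 and 2 are each combined into the previous
band while sub-bands 3 and 4 are not, then sub-bands 0 through 2 form a single coupling band and
sub-bands 3 and 4 each form their own coupling band, giving three coupling bands in total.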
Coupling coordinate dynamic range is increased beyond what the 4-bit exponent can provide
by the use of a per channel 2-bit master coupling coordinate (mstrcplco[ch]) which is used to range
all of the coupling coordinates within that channel. The exponent values for each channel are
increased by 3 times the value of mstrcplco which applies to that channel. This increases the
dynamic range of the coupling coordinates by an additional 54 dB.
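For example, mstrcplco[ch] = 3 increases every coupling coordinate exponent in that channel by 9,
scaling that channel's coupling coordinates down by an additional 9 × 6 dB ≈ 54 dB.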
The following pseudo code indicates how to generate the coupling coordinate (cplco) for each
coupling band [bnd] in each channel [ch].
Pseudo Code
if (cplcoexp[ch, bnd] == 15)
{
cplco_temp[ch,bnd] = cplcomant[ch,bnd] / 16 ;
}
else
{
cplco_temp[ch,bnd] = (cplcomant[ch,bnd] + 16) / 32 ;
}
cplco[ch,bnd] = cplco_temp[ch,bnd] >> (cplcoexp[ch,bnd] + 3 * mstrcplco[ch]) ;
Using the cplbndstrc[] array, the values of coupling coordinates which apply to coupling bands
are converted (by duplicating values as indicated by values of ‘1’ in cplbndstrc[]) to values which
apply to coupling sub-bands.
Individual channel mantissas are then reconstructed from the coupled channel as follows:
Pseudo Code
for (sbnd = cplbegf; sbnd < 3 + cplendf; sbnd++)
{
for (bin = 0; bin < 12; bin++)
{
chmant[ch, sbnd*12+bin+37] = cplmant[sbnd*12+bin+37] * cplco[ch, sbnd] * 8 ;
}
}
7.5 Rematrixing
7.5.1 Overview
Rematrixing in AC-3 is a channel combining technique in which sums and differences of highly
correlated channels are coded rather than the original channels themselves. That is, rather than
code and pack left and right in a two channel coder, we construct
left' = 0.5 * (left + right) ;
right' = 0.5 * (left - right) ;
The usual quantization and data packing operations are then performed on left' and right'.
Clearly, if the original stereo signal were identical in both channels (i.e., two-channel mono), this
technique will result in a left' signal that is identical to the original left and right channels, and a
right' signal that is identically zero. As a result, we can code the right' channel with very few bits,
and increase accuracy in the more important left' channel.
This technique is especially important for preserving Dolby Surround compatibility. To see
this, consider a two channel mono source signal such as that described above. A Dolby Pro Logic
decoder will try to steer all in-phase information to the center channel, and all out-of-phase
information to the surround channel. If rematrixing is not active, the Pro Logic decoder will
receive the following signals
received left = left + QN1 ;
received right = right + QN2 ;
where QN1 and QN2 are independent (i.e., uncorrelated) quantization noise sequences, which
correspond to the AC-3 coding algorithm quantization, and are program-dependent. The Pro
Logic decoder will then construct center and surround channels as
center = 0.5 * (left + QN1) + 0.5 * (right + QN2) ;
surround = 0.5 * (left + QN1) - 0.5 * (right + QN2) ;
In the case of the center channel, QN1 and QN2 add, but remain masked by the dominant
signal left + right. In the surround channel, however, left – right cancels to zero, and the surround
speakers are left to reproduce the difference in the quantization noise sequences (QN1 – QN2).
If channel rematrixing is active, the center and surround channels will be more easily
reproduced as
center = left' + QN1 ;
surround = right' + QN2 ;
In this case, the quantization noise in the surround channel QN2 is much lower in level, and it is
masked by the difference signal, right'.
Pseudo Code
if (minimum sum for a rematrixing sub-band n is L or R)
{
the variable rematflg[n] = 0 ;
transmitted left = input L ;
transmitted right = input R ;
}
if (minimum sum for a rematrixing sub-band n is L+R or L-R)
{
the variable rematflg[n] = 1 ;
transmitted left = 0.5 * input (L+R) ;
transmitted right = 0.5 * input (L-R) ;
}
This selection of matrix combination is done on a block by block basis. The remaining
encoder processing of the transmitted left and right channels is identical whether the
rematrixing flags are 0 or 1.
Note that if coupling is not in use, the two channels may have different bandwidths. As such,
rematrixing is only applied up to the lower bandwidth of the two channels. Regardless of the
actual bandwidth, all four rematrixing flags are sent in the data stream (assuming the rematrixing
strategy bit is set).
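For reference, the decoder-side inverse follows directly from the sum and difference definitions
above: for a band whose rematrixing flag is 1, left = left' + right' and right = left' − right'. A
minimal informative sketch of that algebra (not the normative decoding procedure, which is given
in Section 7.5.4):
/* Informative: invert the rematrixing sum/difference for one transform
 * coefficient in a band whose rematrixing flag is set. Since
 * left' = 0.5 * (left + right) and right' = 0.5 * (left - right),
 * left = left' + right' and right = left' - right'. */
static void derematrix(double left_prime, double right_prime,
                       double *left, double *right)
{
    *left  = left_prime + right_prime;
    *right = left_prime - right_prime;
}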
7.6.1 Overview
When audio from different sources is reproduced, the apparent loudness often varies from source
to source. The different sources of audio might be different program segments during a broadcast
(i.e., the movie vs. a commercial message); different broadcast channels; or different media (disc
vs. tape). The AC-3 coding technology solves this problem by explicitly coding an indication of
loudness into the AC-3 bit stream.
The subjective level of normal spoken dialogue is used as a reference. The 5-bit dialogue
normalization word which is contained in bsi, dialnorm, is an indication of the subjective loudness
of normal spoken dialogue compared to digital 100 percent. The 5-bit value is interpreted as an
unsigned integer (most significant bit transmitted first) with a range of possible values from 1 to
31. The unsigned integer indicates the headroom in dB above the subjective dialogue level. This
value can also be interpreted as an indication of how many dB the subjective dialogue level is
below digital 100 percent.
The dialnorm value is not directly used by the AC-3 decoder. Rather, the value is used by the
section of the sound reproduction system responsible for setting the reproduction volume; e.g., the
system volume control. The system volume control is generally set based on listener input as to
the desired loudness, or sound pressure level (SPL). The listener adjusts a volume control which
generally directly adjusts the reproduction system gain. With AC-3 and the dialnorm value, the
reproduction system gain becomes a function of both the listener's desired reproduction sound
pressure level for dialogue, and the dialnorm value which indicates the level of dialogue in the
audio signal. The listener is thus able to reliably set the volume level of dialogue, and the
subjective level of dialogue will remain uniform no matter which AC-3 program is decoded.
Example:
The listener adjusts the volume control to 67 dB. (With AC-3 dialogue
normalization, it is possible to calibrate a system volume control directly in sound
pressure level, and the indication will be accurate for any AC-3 encoded audio
source). A high quality entertainment program is being received, and the AC-3 bit
stream indicates that dialogue level is 25 dB below 100 percent digital level. The
reproduction system automatically sets the reproduction system gain so that full
scale digital signals reproduce at a sound pressure level of 92 dB. The spoken
dialogue (down 25 dB) will thus reproduce at 67 dB SPL.
The broadcast program cuts to a commercial message, which has dialogue level at
–15 dB with respect to 100 percent digital level. The system level gain
automatically drops, so that digital 100 percent is now reproduced at 82 dB SPL.
The dialogue of the commercial (down 15 dB) reproduces at a 67 dB SPL, as
desired.
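The arithmetic of this example can be checked with the following informative C fragment, which simply adds the dialnorm headroom value to the listener's desired dialogue SPL to obtain the SPL at which digital full scale reproduces. The program and its variable names are illustrative only.
Example C code (informative)
#include <stdio.h>

int main(void)
{
    double desired_dialogue_spl = 67.0;  /* listener volume setting, dB SPL        */
    int dialnorm_values[2] = { 25, 15 }; /* film, then commercial (from the example) */

    for (int i = 0; i < 2; i++) {
        double full_scale_spl = desired_dialogue_spl + dialnorm_values[i];
        printf("dialnorm = %2d dB -> full scale reproduces at %.0f dB SPL, "
               "dialogue at %.0f dB SPL\n",
               dialnorm_values[i], full_scale_spl, desired_dialogue_spl);
    }
    return 0;                            /* prints 92 dB and 82 dB, as in the text  */
}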
In order for the dialogue normalization system to work, the dialnorm value must be
communicated from the AC-3 decoder to the system gain controller so that dialnorm can interact
with the listener adjusted volume control. If the volume control function for a system is performed
as a digital multiply inside the AC-3 decoder, then the listener selected volume setting must be
communicated into the AC-3 decoder. The listener selected volume setting and the dialnorm value
must be brought together and combined in order to adjust the final reproduction system gain.
Adjustment of the system volume control is not an AC-3 function. The AC-3 bit stream
simply conveys useful information which allows the system volume control to be implemented in
a way which automatically removes undesirable level variations between program sources. It is
mandatory that the dialnorm value and the user selected volume setting both be used to set the
reproduction system gain.
7.7.1.1 Overview
A consistent problem in the delivery of audio programming is that different members of the
audience wish to enjoy different amounts of dynamic range. Original high quality programming
(such as feature films) is typically mixed with quite a wide dynamic range. Using dialogue as a
reference, loud sounds like explosions are often 20 dB or more louder, and faint sounds like
leaves rustling may be 50 dB quieter. In many listening situations it is objectionable to allow the
sound to become very loud, and thus the loudest sounds must be compressed downwards in level.
Similarly, in many listening situations the very quiet sounds would be inaudible, and must be
brought upwards in level to be heard. Since most of the audience will benefit from a limited
program dynamic range, soundtracks which have been mixed with a wide dynamic range are
generally compressed: the dynamic range is reduced by bringing down the level of the loud
sounds and bringing up the level of the quiet sounds. While this satisfies the needs of much of the
audience, it removes the ability of some in the audience to experience the original sound program
in its intended form. The AC-3 audio coding technology solves this conflict by allowing dynamic
range control values to be placed into the AC-3 bit stream.
The dynamic range control values, dynrng, indicate a gain change to be applied in the decoder
in order to implement dynamic range compression. Each dynrng value can indicate a gain change
of ±24 dB. The sequence of dynrng values forms a compression control signal. An AC-3 encoder (or
a bit stream processor) will generate the sequence of dynrng values. Each value is used by the AC-
3 decoder to alter the gain of one or more audio blocks. The dynrng values typically indicate gain
reduction during the loudest signal passages, and gain increases during the quiet passages. For the
listener, it is desirable to bring the loudest sounds down in level towards dialogue level, and the
quiet sounds up in level, again towards dialogue level. Sounds which are at the same loudness as
the normal spoken dialogue will typically not have their gain changed.
The compression is actually applied to the audio in the AC-3 decoder. The encoded audio has
full dynamic range. It is permissible for the AC-3 decoder to (optionally, under listener control)
ignore the dynrng values in the bit stream. This will result in the full dynamic range of the audio
being reproduced. It is also permissible (again under listener control) for the decoder to use some
fraction of the dynrng control value, and to use a different fraction for positive and negative values.
The AC-3 decoder can thus reproduce either fully compressed audio (as intended by the
compression control circuit in the AC-3 encoder); full dynamic range audio; or audio with
partially compressed dynamic range, with different amounts of compression for high level signals
and low level signals.
Example:
A feature film soundtrack is encoded into AC-3. The original program mix has
dialogue level at –25 dB. Explosions reach full scale peak level of 0 dB. Some
quiet sounds which are intended to be heard by all listeners are 50 dB below
dialogue level (or –75 dB). A compression control signal (sequence of dynrng
values) is generated by the AC-3 encoder. During those portions of the audio
program where the audio level is higher than dialogue level the dynrng values
indicate negative gain, or gain reduction. For full scale 0 dB signals (the loudest
explosions), gain reduction of –15 dB is encoded into dynrng. For very quiet
signals, a gain increase of 20 dB is encoded into dynrng.
A listener wishes to reproduce this soundtrack quietly so as not to disturb anyone,
but wishes to hear all of the intended program content. The AC-3 decoder is
allowed to reproduce the default, which is full compression. The listener adjusts
dialogue level to 60 dB SPL. The explosions will only go as loud as 70 dB (they
are 25 dB louder than dialogue but get –15 dB of gain applied), and the quiet
sounds will reproduce at 30 dB SPL (20 dB of gain is applied to their original level
of 50 dB below dialogue level). The reproduced dynamic range will be 70 dB – 30
dB = 40 dB.
The listening situation changes, and the listener now wishes to raise the
reproduction level of dialogue to 70 dB SPL, but still wishes to limit how loud the
program plays. Quiet sounds may be allowed to play as quietly as before. The
listener instructs the AC-3 decoder to continue using the dynrng values which
indicate gain reduction, but to attenuate the values which indicate gain increases
by a factor of 1/2. The explosions will still reproduce 10 dB above dialogue level,
which is now 80 dB SPL. The quiet sounds are now increased in level by 20 dB / 2
= 10 dB. They will now be reproduced 40 dB below dialogue level, at 30 dB SPL.
The reproduced dynamic range is now 80 dB – 30 dB = 50 dB.
Another listener wishes the full original dynamic range of the audio. This listener
adjusts the reproduced dialogue level to 75 dB SPL, and instructs the AC-3
decoder to ignore the dynamic range control signal. For this listener the quiet
sounds reproduce at 25 dB SPL, and the explosions hit 100 dB SPL. The
reproduced dynamic range is 100 dB – 25 dB = 75 dB. This reproduction is exactly
as intended by the original program producer.
In order for this dynamic range control method to be effective, it should be used by all
program providers. Since all broadcasters wish to supply programming in the form that is most
usable by their audience, nearly all broadcasters will apply dynamic range compression to any
audio program which has a wide dynamic range. This compression is not reversible unless it is
implemented by the technique embedded in AC-3. If broadcasters make use of the embedded AC-
3 dynamic range control system, then listeners can have some control over their reproduced
dynamic range. Broadcasters must be confident that the compression characteristic that they
introduce into AC-3 will, by default, be heard by the listeners. Therefore, the AC-3 decoder shall,
by default, implement the compression characteristic indicated by the dynrng values in the data
stream. AC-3 decoders may optionally allow listener control over the use of the dynrng values, so
that the listener may select full or partial dynamic range reproduction.
7.7.1.2 Detailed Implementation
The dynrng field is an 8-bit value of the form X0 X1 X2 . Y3 Y4 Y5 Y6 Y7. The meaning of the X
values (the three most significant bits) is most simply described by considering X to represent a
3-bit signed integer with values from –4 to 3. The gain indicated by X is then (X + 1) * 6.02 dB.
Table 7.29 shows this in detail.
Partial Compression
The dynrng value may be operated on in order to make it represent a gain change
which is a fraction of the original value. In order to alter the amount of compression, the dynrng
value may be considered to be a signed fractional number of the form X0 . X1 X2 Y3 Y4 Y5 Y6 Y7,
where X0 is the sign bit and X1 X2 Y3 Y4 Y5 Y6 Y7 are a 7-bit fraction. This 8 bit
signed fractional number may be multiplied by a fraction indicating the fraction of
the original compression to apply. If this value is multiplied by 1/2, then the
compression range of ±24 dB will be reduced to ±12 dB. After the multiplicative
scaling, the 8-bit result is once again considered to be of the original form X0 X1
X2 . Y3 Y4 Y5 Y6 Y7 and used normally.
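The following informative C fragment sketches one way to implement this partial-compression scaling: the 8-bit dynrng word is reinterpreted as a two's complement signed fraction, multiplied by the chosen fraction, and the 8-bit result is then used in place of the original value. The rounding and type handling shown are implementation choices, not requirements of this standard.
Example C code (informative)
#include <stdint.h>

uint8_t scale_dynrng(uint8_t dynrng, double fraction)    /* fraction in [0, 1] */
{
    /* interpret the 8 bits as a two's complement signed value */
    int    signed_val = (dynrng & 0x80) ? (int)dynrng - 256 : (int)dynrng;
    double scaled     = (double)signed_val * fraction;    /* scale the word     */
    int    rounded    = (int)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);

    return (uint8_t)(rounded & 0xFF);    /* reinterpreted in the original form  */
}
/* Example: scale_dynrng(value, 0.5) halves the indicated gain change,
 * reducing the compression range from roughly ±24 dB to ±12 dB. */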
7.7.2.1 Overview
Some products which decode the AC-3 bit stream will need to deliver the resulting audio via a
link with very restricted dynamic range. One example is the case of a television signal decoder
which must modulate the received picture and sound onto an RF channel in order to deliver a
signal usable by a low cost television receiver. In this situation, it is necessary to restrict the
maximum peak output level to a known value with respect to dialogue level, in order to prevent
overmodulation. Most of the time, the dynamic range control signal, dynrng, will produce adequate
gain reduction so that the absolute peak level will be constrained. However, since the dynamic
range control system is intended to implement a subjectively pleasing reduction in the range of
perceived loudness, there is no assurance that it will control instantaneous signal peaks adequately
to prevent overmodulation.
In order to allow the decoded AC-3 signal to be constrained in peak level, a second control
signal, compr (compr2 for Ch2 in 1+1 mode), may be present in the AC-3 data stream. This control
signal should be present in all bit streams which are intended to be receivable by, for instance, a
television set top decoder. The compr control signal is similar to the dynrng control signal in that it
is used by the decoder to alter the reproduced audio level. The compr control signal has twice the
control range of dynrng (±48 dB compared to ±24 dB) with 1/2 the resolution (0.5 dB vs. 0.25 dB).
Also, since the compr control signal lives in BSI, it only has a time resolution of an AC-3 frame (32
ms) instead of a block (5.3 ms).
Products which require peak audio level to be constrained should use compr instead of dynrng
when compr is present in BSI. Since most of the time the use of dynrng will prevent large peak
levels, the AC-3 encoder may only need to insert compr occasionally; i.e., during those instants
when the use of dynrng would lead to excessive peak level. If the decoder has been instructed to
use compr, and compr is not present for a particular frame, then the dynrng control signal shall be
used for that frame.
In some applications of AC-3, some receivers may wish to reproduce a very restricted
dynamic range. In this case, the compr control signal may be present at all times. Then, the use of
compr instead of dynrng will allow the reproduction of audio with very limited dynamic range. This
might be useful, for instance, in the case of audio delivery to a hotel room or an airplane seat.
7.7.2.2 Detailed Implementation
The meaning of the X values is most simply described by considering X to represent a 4-bit
signed integer with values from –8 to +7. The gain indicated by X is then (X + 1) * 6.02 dB. Table
7.30 shows this in detail.
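As an informative illustration of the coarse gain steps just described, the following C fragment prints the gain (X + 1) * 6.02 dB for each possible value of the 4-bit signed integer X. It is a convenience for the reader only and does not reproduce Table 7.30 itself.
Example C code (informative)
#include <stdio.h>

int main(void)
{
    for (int x = -8; x <= 7; x++)
        printf("X = %3d  ->  gain = %+7.2f dB\n", x, (x + 1) * 6.02);
    return 0;
}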
7.8 Downmixing
In many reproduction systems, the number of loudspeakers will not match the number of encoded
audio channels. In order to reproduce the complete audio program, downmixing is required. It is
important that downmixing be standardized so that program providers can be confident of how
their program will be reproduced over systems with various numbers of loudspeakers. With
standardized downmixing equations, program producers can monitor how the downmixed version
will sound and make any alterations necessary so that acceptable results are achieved for all
listeners. The program provider can make use of the cmixlev and smixlev syntactical elements in
order to affect the relative balance of center and surround channels with respect to the left and
right channels.
Downmixing of the lfe channel is optional. An ideal downmix would have the lfe channel
reproduce at an acoustic level of +10 dB with respect to the left and right channels. Since the
inclusion of this channel is optional, any downmix coefficient may be used in practice. Care
should be taken to assure that loudspeakers are not overdriven by the full scale low frequency
content of the lfe channel.
Pseudo code
downmix()
{
if (acmod == 0) /* 1+1 mode, dual independent mono channels present */
{
if (output_nfront == 1) /* 1 front loudspeaker (center) */
{
if (dualmode == Chan 1) /* Ch1 output requested */
{
route left into center ;
}
else if (dualmode == Chan 2) /* Ch2 output requested */
{
route right into center ;
}
else
{
mix left into center with –6 dB gain ;
mix right into center with –6 dB gain ;
}
}
else if (output_nfront == 2) /* 2 front loudspeakers (left, right) */
{
if (dualmode == Stereo) /* output of both mono channels requested */
{
route left into left ;
route right into right ;
}
else if (dualmode == Chan 1)
{
mix left into left with –3 dB gain ;
mix left into right with –3 dB gain ;
}
else if (dualmode == Chan 2)
{
mix right into left with –3 dB gain ;
mix right into right with –3 dB gain ;
}
else /* mono sum of both mono channels requested */
{
mix left into left with –6 dB gain ;
mix right into left with –6 dB gain ;
mix left into right with –6 dB gain ;
mix right into right with –6 dB gain ;
}
}
else /* output_nfront == 3 */
{
if (dualmode == Stereo)
{
route left into left ;
route right into right ;
}
else if (dualmode == Chan 1)
{
route left into center ;
}
else if (dualmode == Chan 2)
{
route right into center ;
}
else
{
mix left into center with –6 dB gain ;
mix right into center with –6 dB gain ;
}
}
}
else /* acmod > 0 */
{
for i = { left, center, right, leftsur/monosur, rightsur }
{
if (exists(input_chan[i])) and (exists(output_chan[i]))
{
route input_chan[i] into output_chan[i] ;
}
}
if (output_mode == 2/0 Dolby Surround compatible) /* 2 ch matrix encoded output requested */
{
if (input_nfront != 2)
{
mix center into left with –3 dB gain ;
mix center into right with –3 dB gain ;
}
if (input_nrear == 1)
{
mix -mono surround into left with –3 dB gain ;
mix mono surround into right with –3 dB gain ;
}
else if (input_nrear == 2)
{
mix -left surround into left with –3 dB gain ;
mix -right surround into left with –3 dB gain ;
mix left surround into right with –3 dB gain ;
mix right surround into right with –3 dB gain ;
}
}
else if (output_mode == 1/0) /* center only */
{
if (input_nfront != 1)
{
mix left into center with –3 dB gain ;
mix right into center with –3 dB gain ;
}
if (input_nfront == 3)
{
mix center into center using clev and +3 dB gain ;
}
if (input_nrear == 1)
{
{
mix left srnd into mono surround with –3 dB gain ;
mix right srnd into mono surround with –3 dB gain ;
}
}
}
}
}
The actual coefficients used for downmixing will affect the absolute level of the center
channel. If dialogue level is to be established with absolute SPL calibration, this should be taken
into account.
7.8.2 Downmixing Into Two Channels
The downmix equations for the LoRo stereo signal are
Lo = 1.0 * L + clev * C + slev * Ls ;
Ro = 1.0 * R + clev * C + slev * Rs ;
If Lo and Ro are subsequently combined for monophonic reproduction, the effective mono
downmix equation becomes
M = 1.0 * L + 2.0 * clev * C + 1.0 * R + slev * Ls + slev * Rs ;
If only a single surround channel, S, is present (3/1 mode) the downmix equations are
Lo = 1.0 * L + clev * C + 0.7 * slev * S ;
Ro = 1.0 * R + clev * C + 0.7 * slev * S ;
The values of clev and slev are indicated by the cmixlev and surmixlev bit fields in the bsi data, as
shown in Table 5.9 and Table 5.10, respectively.
If the cmixlev or surmixlev bit fields indicate the reserved state (value of ‘11’), the decoder should
use the intermediate coefficient values indicated by the bit field value of ‘01’. If the Center channel
is missing (2/1 or 2/2 mode), the same equations may be used without the C term. If the surround
channels are missing, the same equations may be used without the Ls, Rs, or S terms.
Prior to the scaling needed to prevent overflow, the 3/2 downmix equations for an LtRt stereo
signal are
Lt = 1.0 * L + 0.707 * C – 0.707 * Ls – 0.707 * Rs ;
Rt = 1.0 * R + 0.707 * C + 0.707 * Ls + 0.707 * Rs ;
If only a single surround channel, S, is present (3/1 mode) these equations become
Lt = 1.0 * L + 0.707 * C – 0.707 * S ;
Rt = 1.0 * R + 0.707 * C + 0.707 * S ;
If the center channel is missing (2/2 or 2/1 mode) the C term is dropped.
The actual coefficients used must be scaled downwards so that arithmetic overflow does not
occur if all channels contributing to a downmix signal happen to be at full scale. For each audio
coding mode, a different number of channels contribute to the downmix, and a different scaling
could be used to prevent overflow. For simplicity, the scaling for the worst case may be used in all
cases. This minimizes the number of coefficients required. The worst case scaling occurs when
clev and slev are both 0.707. In the case of the LoRo downmix, the sum of the unscaled coefficients
is 1 + 0.707 + 0.707 = 2.414, so all coefficients must be multiplied by 1/2.414 = 0.4143
(downwards scaling by 7.65 dB). In the case of the LtRt downmix, the sum of the unscaled
coefficients is 1 + 0.707 + 0.707 + 0.707 = 3.121, so all coefficients must be multiplied by 1/
3.121, or 0.3204 (downwards scaling by 9.89 dB). The scaled coefficients will typically be
converted to binary values with limited wordlength. The 6-bit coefficients shown below have
sufficient accuracy.
In order to implement the LoRo 2-channel downmix, scaled (by 0.4143) coefficient values are
needed which correspond to the values of 1.0, 0.707, 0.596, 0.500, and 0.354.
In order to implement the LtRt 2-ch downmix, scaled (by 0.3204) coefficient values are
needed which correspond to the values of 1.0 and 0.707.
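The following informative C fragment combines the LoRo and LtRt equations above with the worst-case scale factors (0.4143 and 0.3204) into a simple 3/2 two-channel downmix routine. Floating-point samples and the function name are illustrative; a practical decoder would use quantized coefficients as described above.
Example C code (informative)
void downmix_3_2(const double *L, const double *C, const double *R,
                 const double *Ls, const double *Rs, int n,
                 double clev, double slev, int ltrt,
                 double *outL, double *outR)
{
    for (int i = 0; i < n; i++) {
        if (ltrt) {   /* matrix surround compatible (LtRt) */
            outL[i] = 0.3204 * (L[i] + 0.707 * C[i] - 0.707 * Ls[i] - 0.707 * Rs[i]);
            outR[i] = 0.3204 * (R[i] + 0.707 * C[i] + 0.707 * Ls[i] + 0.707 * Rs[i]);
        } else {      /* conventional stereo (LoRo) */
            outL[i] = 0.4143 * (L[i] + clev * C[i] + slev * Ls[i]);
            outR[i] = 0.4143 * (R[i] + clev * C[i] + slev * Rs[i]);
        }
    }
}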
7.9.1 Overview
The choice of analysis block length is fundamental to any transform-based audio coding system.
A long transform length is most suitable for input signals whose spectrum remains stationary, or
varies only slowly, with time. A long transform length provides greater frequency resolution, and
hence improved coding performance for such signals. On the other hand, a shorter transform
length, possessing greater time resolution, is more desirable for signals which change rapidly in
time. Therefore, the time vs. frequency resolution tradeoff should be considered when selecting a
transform block length.
The traditional approach to solving this dilemma is to select a single transform length which
provides the best tradeoff of coding quality for both stationary and dynamic signals. AC-3
instead adapts the frequency/time resolution of the transform to the spectral and temporal
characteristics of the signal being processed.
This approach is very similar to behavior known to occur in human hearing. In transform coding,
the adaptation occurs by switching the block length in a signal dependent manner.
7.9.2 Technique
In the AC-3 transform block switching procedure, a block length of either 512 or 256 samples
(time resolution of 10.7 or 5.3 ms for sampling frequency of 48 kHz) can be employed. Normal
blocks are of length 512 samples. When a normal windowed block is transformed, the result is
256 unique frequency domain transform coefficients. Shorter blocks are constructed by taking the
usual 512 sample windowed audio segment and splitting it into two segments containing 256
samples each. The first half of an MDCT block is transformed separately but identically to the
second half of that block. Each half of the block produces 128 unique non-zero transform
coefficients representing frequencies from 0 to fs/2, for a total of 256. This is identical to the
number of coefficients produced by a single 512 sample block, but with two times improved
temporal resolution. Transform coefficients from the two half-blocks are interleaved together on a
coefficient-by-coefficient basis to form a single block of 256 values. This block is quantized and
transmitted identically to a single long block. A similar, mirror image procedure is applied in the
decoder during signal reconstruction.
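As an informative illustration, the following C fragment shows the coefficient-by-coefficient interleaving just described for a block-switched channel: the 128 coefficients of the first half-block occupy the even positions and those of the second half-block the odd positions, consistent with the de-interleaving step shown in Section 7.9.4.2. Names are illustrative only.
Example C code (informative)
void interleave_short_block(const double *first_half,   /* 128 coefficients */
                            const double *second_half,  /* 128 coefficients */
                            double *out)                /* 256 values       */
{
    for (int k = 0; k < 128; k++) {
        out[2 * k]     = first_half[k];
        out[2 * k + 1] = second_half[k];
    }
}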
Transform coefficients for the two 256 length transforms arrive in the decoder interleaved
together bin-by-bin. This interleaved sequence contains the same number of transform
coefficients as generated by a single 512-sample transform. The decoder processes interleaved
sequences identically to noninterleaved sequences, except during the inverse transformation
described below.
Prior to transforming the audio signal from time to frequency domain, the encoder performs
an analysis of the spectral and/or temporal nature of the input signal and selects the appropriate
block length. This analysis occurs in the encoder only, and therefore can be upgraded and
improved without altering the existing base of decoders. A one bit code per channel per transform
block (blksw[ch]) is embedded in the bit stream which conveys length information: (blksw[ch] = 0 or
1 for 512 or 256 samples, respectively). The decoder uses this information to deformat the bit
stream, reconstruct the mantissa data, and apply the appropriate inverse transform equations.
7.9.4.1 512-Sample IMDCT Transform
Pseudo Code
for (k=0; k<N/4; k++)
{
/* Z[k] = (X[N/2-2*k-1] + j * X[2*k]) * (xcos1[k] + j * xsin1[k]) ; */
Z[k]=(X[N/2-2*k-1]*xcos1[k]-X[2*k]*xsin1[k])+j*(X[2*k]*xcos1[k]+X[N/2-2*k-1]*xsin1[k]);
}
Where:
xcos1[k] = –cos (2π * (8 * k + 1)/(8 * N))
xsin1[k] = –sin (2π * (8 * k + 1)/(8 * N))
Pseudo Code
for (n=0; n<N/4; n++)
{
z[n] = 0 ;
for (k=0; k<N/4; k++)
{
z[n] += Z[k] * (cos(8*π*k*n/N) + j * sin(8*π*k*n/N)) ;
}
}
Pseudo Code
for (n=0; n<N/4; n++)
{
/* y[n] = z[n] * (xcos1[n] + j * xsin1[n]) ; */
y[n] = (zr[n] * xcos1[n] - zi[n] * xsin1[n]) + j * (zi[n] * xcos1[n] + zr[n] * xsin1[n]) ;
}
Where:
zr[n] = real(z[n])
zi[n] = imag(z[n])
xcos1[n] and xsin1[n] are as defined in step 2 above
5) Windowing and de-interleaving step.
Compute windowed time-domain samples x[n]:
Pseudo Code
for (n=0; n<N/8; n++)
{
x[2*n] = -yi[N/8+n] * w[2*n] ;
x[2*n+1] = yr[N/8-n-1] * w[2*n+1] ;
x[N/4+2*n] = -yr[n] * w[N/4+2*n] ;
x[N/4+2*n+1] = yi[N/4-n-1] * w[N/4+2*n+1] ;
x[N/2+2*n] = -yr[N/8+n] * w[N/2-2*n-1] ;
x[N/2+2*n+1] = yi[N/8-n-1] * w[N/2-2*n-2] ;
x[3*N/4+2*n] = yi[n] * w[N/4-2*n-1] ;
x[3*N/4+2*n+1] = -yr[N/4-n-1] * w[N/4-2*n-2] ;
}
Where:
yr[n] = real(y[n])
yi[n] = imag(y[n])
w[n] is the transform window sequence (see Table 7.33)
The first half of the windowed block is overlapped with the second half of the previous block
to produce PCM samples (the factor of 2 scaling undoes headroom scaling performed in the
encoder):
Pseudo Code
for (n=0; n<N/2; n++)
{
pcm[n] = 2 * (x[n] + delay[n]) ;
delay[n] = x[N/2+n] ;
}
Note that the arithmetic processing in the overlap/add processing must use saturation
arithmetic to prevent overflow (wraparound). Since the output signal consists of the original
signal plus coding error, it is possible for the output signal to exceed 100 percent level even
though the original input signal was less than or equal to 100 percent level.
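The following informative C fragment shows one way to perform this overlap/add with saturating 16-bit output, as required by the note above. The assumption that x[] and delay[] are normalized to ±1 before conversion to 16-bit PCM, and all names, are illustrative only.
Example C code (informative)
#include <stdint.h>

void overlap_add_block(const double *x, double *delay, int N, int16_t *pcm)
{
    for (int n = 0; n < N / 2; n++) {
        double sample = 2.0 * (x[n] + delay[n]) * 32768.0;   /* to 16-bit range */

        if (sample > 32767.0)                                /* saturate, do    */
            sample = 32767.0;                                /* not wrap        */
        else if (sample < -32768.0)
            sample = -32768.0;

        pcm[n]   = (int16_t)sample;
        delay[n] = x[N / 2 + n];                             /* save for next block */
    }
}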
7.9.4.2 256-Sample IMDCT Transforms
Pseudo Code
for (k=0; k<N/4; k++)
{
X1[k] = X[2*k] ;
X2[k] = X[2*k+1] ;
}
Pseudo Code
for (k=0; k<N/8; k++)
{
/* Z1[k] = (X1[N/4-2*k-1] + j * X1[2*k]) * (xcos2[k] + j * xsin2[k]); */
Z1[k]=(X1[N/4-2*k-1]*xcos2[k]-X1[2*k]*xsin2[k])+j*(X1[2*k]*xcos2[k]+X1[N/4-2*k-1]*xsin2[k]) ;
/* Z2[k] = (X2[N/4-2*k-1] + j * X2[2*k]) * (xcos2[k] + j * xsin2[k]) ; */
Z2[k]=(X2[N/4-2*k-1]*xcos2[k]-X2[2*k]*xsin2[k])+j*(X2[2*k]*xcos2[k]+X2[N/4-2*k-1]*xsin2[k]) ;
}
Where:
xcos2[k] = –cos (2π * (8 * k + 1)/(4 * N)), xsin2[k] = –sin (2π * (8 * k + 1)/(4 * N))
Pseudo Code
for (n=0; n<N/8; n++)
{
z1[n] = 0. ;
z2[n] = 0. ;
for (k=0; k<N/8; k++)
{
z1[n] += Z1[k] * (cos(16*π*k*n/N) + j * sin(16*π*k*n/N)) ;
z2[n] += Z2[k] * (cos(16*π*k*n/N) + j * sin(16*π*k*n/N)) ;
}
}
Pseudo Code
for (n=0; n<N/8; n++)
{
/* y1[n] = z1[n] * (xcos2[n] + j * xsin2[n]) ; */
y1[n] = (zr1[n] * xcos2[n] - zi1[n] * xsin2[n]) + j * (zi1[n] * xcos2[n] + zr1[n] * xsin2[n]) ;
/* y2[n] = z2[n] * (xcos2[n] + j * xsin2[n]) ; */
y2[n] = (zr2[n] * xcos2[n] - zi2[n] * xsin2[n]) + j * (zi2[n] * xcos2[n] + zr2[n] * xsin2[n]) ;
}
Where:
zr1[n] = real(z1[n])
zi1[n] = imag(z1[n])
zr2[n] = real(z2[n])
zi2[n] = imag(z2[n])
xcos2[n] and xsin2[n] are as defined in step 2 above
5) Windowing and de-interleaving step.
Compute windowed time-domain samples x[n].
Pseudo Code
for (n=0; n<N/8; n++)
{
x[2*n] = -yi1[n] * w[2*n] ;
x[2*n+1] = yr1[N/8-n-1] * w[2*n+1] ;
x[N/4+2*n] = -yr1[n] * w[N/4+2*n] ;
x[N/4+2*n+1] = yi1[N/8-n-1] * w[N/4+2*n+1] ;
Where:
yr1[n] = real(y1[n])
yi1[n] = imag(y1[n])
yr2[n] = real(y2[n])
yi2[n] = imag(y2[n])
w[n] is the transform window sequence (see Table 7.33)
The first half of the windowed block is overlapped with the second half of the previous block
to produce PCM samples (the factor of 2 scaling undoes headroom scaling performed in the
encoder):
Pseudo Code
for (n=0; n<N/2; n++)
{
pcm[n] = 2 * (x[n] + delay[n]) ;
delay[n] = x[N/2+n] ;
}
Note that the arithmetic processing in the overlap/add processing must use saturation
arithmetic to prevent overflow (wraparound). Since the output signal consists of the original
signal plus coding error, it is possible for the output signal to exceed 100 percent level even
though the original input signal was less than or equal to 100 percent level.
If the encoder does not perform the step of finding the maximum absolute value within each
block then the value of gainrng should be set to 0.
The decoder may use the value of gainrng to pre-scale the transform coefficients prior to the
transform and to post-scale the values after the transform. With careful design, the post-scaling
process can be performed right at the PCM output stage allowing a 16-bit output buffer RAM to
provide 18-bit dynamic range audio.
The data integrity may be checked using the embedded CRCs. Also, some
simple consistency checks on the received data can indicate that errors are present. The decoder
strategy when errors are detected is user-definable. Possible responses include muting, block
repeats, or frame repeats. The amount of error checking performed, and the behavior in the
presence of errors are not specified in this standard, but are left to the application and
implementation.
Both CRC words are generated using the generator polynomial
x^16 + x^15 + x^2 + 1
The portion of the frame covered by crc1, in 16-bit words, may be computed as
5/8_framesize = (int) (framesize>>1) + (int) (framesize>>3) ;
where framesize is in units of 16-bit words. Table 7.35 shows the value of 5/8 of the frame size as a
function of AC-3 bit-rate and audio sample rate.
The CRC calculation may be implemented by one of several standard techniques. A
convenient hardware implementation is a linear feedback shift register (LFSR). An example of an
LFSR circuit for the above generator polynomial is given in Figure 7.1.
Checking for valid CRC with the above circuit consists of resetting all registers to zero, and
then shifting the AC-3 data bits serially into the circuit in the order in which they appear in the
data stream. The sync word is not covered by either CRC (but is included in the indicated 5/
8_framesize) so it should not be included in the CRC calculation. crc1 is considered valid if the
above register contains all zeros after the first 5/8 of the frame has been shifted in. If the
calculation is continued until all data in the frame has been shifted through, and the value is again
equal to zero, then crc2 is considered valid. Some decoders may choose to only check crc2, and not
check for a valid crc1 at the 5/8 point in the frame. If crc1 is invalid, it is possible to reset the
registers to zero and then check crc2. If crc2 then checks, then the last 3/8 of the frame is probably
error free. This is of little utility however, since if errors are present in the initial 5/8 of a frame it
is not possible to decode any audio from the frame even if the final 3/8 is error free.
Note that crc1 is generated by encoders such that the CRC calculation will produce zero at the
5/8 point in the frame. It is not the value generated by calculating the CRC of the first 5/8 of the
frame using the above generator polynomial. Therefore, decoders should not attempt to save crc1,
calculate the CRC for the first 5/8 of the frame, and then compare the two.
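The following informative C fragment is a software equivalent of the serial check described above: bits are shifted in MSB first, in stream order, using the generator polynomial x^16 + x^15 + x^2 + 1 (0x8005 with the x^16 term implicit), and crc1 is accepted when the register is zero after the first 5/8 of the frame less the 2-byte sync word. The zero-remainder criterion is taken from the text above; the function names and the exact correspondence to the circuit of Figure 7.1 are illustrative assumptions.
Example C code (informative)
#include <stdint.h>
#include <stddef.h>

static uint16_t crc16_ac3(const uint8_t *data, size_t nbytes)
{
    uint16_t reg = 0;
    for (size_t i = 0; i < nbytes; i++) {
        for (int b = 7; b >= 0; b--) {
            unsigned bit      = (data[i] >> b) & 1u;
            unsigned feedback = ((reg >> 15) & 1u) ^ bit;
            reg = (uint16_t)(reg << 1);
            if (feedback)
                reg ^= 0x8005;
        }
    }
    return reg;
}

/* frame points at the start of the sync frame; framesize is in 16-bit words */
int crc1_is_valid(const uint8_t *frame, unsigned framesize)
{
    unsigned five_eighths = (framesize >> 1) + (framesize >> 3);   /* words  */
    /* skip the 2-byte sync word, which is not covered by either CRC */
    return crc16_ac3(frame + 2, five_eighths * 2 - 2) == 0;
}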
Table 7.35 5/8_framesize Table; Number of Words in the First 5/8 of the Frame
frmsizecod   Nominal Bit-Rate   5/8_framesize (fs = 32 kHz)   5/8_framesize (fs = 44.1 kHz)   5/8_framesize (fs = 48 kHz)
‘000000’ (0) 32 kbps 60 42 40
‘000001’ (0) 32 kbps 60 43 40
‘000010’ (1) 40 kbps 75 53 50
‘000011’ (1) 40 kbps 75 55 50
‘000100’ (2) 48 kbps 90 65 60
‘000101’ (2) 48 kbps 90 65 60
‘000110’ (3) 56 kbps 105 75 70
‘000111’ (3) 56 kbps 105 76 70
‘001000’ (4) 64 kbps 120 86 80
‘001001’ (4) 64 kbps 120 87 80
‘001010’ (5) 80 kbps 150 108 100
‘001011’ (5) 80 kbps 150 108 100
‘001100’ (6) 96 kbps 180 130 120
‘001101’ (6) 96 kbps 180 130 120
‘001110’ (7) 112 kbps 210 151 140
‘001111’ (7) 112 kbps 210 152 140
‘010000’ (8) 128 kbps 240 173 160
‘010001’ (8) 128 kbps 240 173 160
‘010010’ (9) 160 kbps 300 217 200
‘010011’ (9) 160 kbps 300 217 200
‘010100’ (10) 192 kbps 360 260 240
‘010101’ (10) 192 kbps 360 261 240
‘010110’ (11) 224 kbps 420 303 280
‘010111’ (11) 224 kbps 420 305 280
‘011000’ (12) 256 kbps 480 347 320
‘011001’ (12) 256 kbps 480 348 320
‘011010’ (13) 320 kbps 600 435 400
‘011011’ (13) 320 kbps 600 435 400
‘011100’ (14) 384 kbps 720 521 480
‘011101’ (14) 384 kbps 720 522 480
‘011110’ (15) 448 kbps 840 608 560
‘011111’ (15) 448 kbps 840 610 560
‘100000’ (16) 512 kbps 960 696 640
‘100001’ (16) 512 kbps 960 696 640
‘100010’ (17) 576 kbps 1080 782 720
‘100011’ (17) 576 kbps 1080 783 720
‘100100’ (18) 640 kbps 1200 870 800
‘100101’ (18) 640 kbps 1200 871 800
Syntactical block size restrictions within each frame (enforced by encoders) guarantee that
blocks 0 and 1 are completely covered by crc1. Therefore, decoders may immediately begin
processing block 0 when the 5/8 point in the data frame is reached. This may allow smaller input
buffers in some applications. Decoders that are able to store an entire frame may choose to
process only crc2. These decoders would not begin processing block 0 of a frame until the entire
frame is received.
Note that some of these conditions (such as #17 through #20) can only be tested for at low
levels within the decoder software, resulting in a potentially significant MIPS impact. So long as
these conditions do not affect system stability, they do not need to be specifically prevented.
8.1 Introduction
This section provides some guidance on AC-3 encoding. Since AC-3 is specified by the syntax
and decoder processing, the encoder is not precisely specified. The only normative requirement
on the encoder is that the output elementary bit stream follow AC-3 syntax. Encoders of varying
levels of sophistication may be produced. More sophisticated encoders may offer superior audio
performance, and may make operation at lower bit-rates acceptable. Encoders are expected to
improve over time. All decoders will benefit from encoder improvements. The encoder described
in this section, while basic in operation, provides good performance. The description which
follows indicates several avenues of potential improvement. A flow diagram of the encoding
process is shown in Figure 8.1.
Transient detection is broken down into four steps: 1) high-pass filtering, 2) segmentation of the block into submultiples, 3)
peak amplitude detection within each sub-block segment, and 4) threshold comparison. The
transient detector outputs a flag blksw[n] for each full-bandwidth channel, which when set to “one”
indicates the presence of a transient in the second half of the 512 length input block for the
corresponding channel.
1. High-pass filtering: The high-pass filter is implemented as a cascaded biquad direct form I
IIR filter with a cutoff of 8 kHz.
2. Block Segmentation: The block of 256 high-pass filtered samples is segmented into a
hierarchical tree of levels in which level 1 represents the 256 length block, level 2 is two
segments of length 128, and level 3 is four segments of length 64.
3. Peak Detection: The sample with the largest magnitude is identified for each segment on
every level of the hierarchical tree. The peaks for a single level are found as follows:
P[j][k] = max(x(n))
Where:
x(n) = the nth sample in the 256 length block
j = 1, 2, 3 is the hierarchical level number
k = the segment number within level j
Note that P[j][0], (i.e., k = 0) is defined to be the peak of the last segment on level j of the tree
calculated immediately prior to the current tree. For example, P[3][4] in the preceding tree is
P[3][0] in the current tree.
4. Threshold Comparison: The first stage of the threshold comparator checks to see if there is
significant signal level in the current block. This is done by comparing the overall peak value
P[1][1] of the current block to a “silence threshold”. If P[1][1] is below this threshold then a long
block is forced. The silence threshold value is 100/32768. The next stage of the comparator
checks the relative peak levels of adjacent segments on each level of the hierarchical tree. If
the peak ratio of any two adjacent segments on a particular level exceeds a pre-defined
threshold for that level, then a flag is set to indicate the presence of a transient in the current
256 length block. The ratios are compared as follows:
mag(P[j][k]) × T[j] > mag(P[j][(k-1)])
Where:
T[j] is the pre-defined threshold for level j, defined as
T[1] = .1
T[2] = .075
T[3] = .05
If this inequality is true for any two segment peaks on any level, then a transient is indicated
for the first half of the 512 length input block. The second pass through this process
determines the presence of transients in the second half of the 512 length input block.
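As an informative illustration, the following C fragment sketches the peak-detection and threshold-comparison stages of steps 3 and 4 for one 256-sample half-block of high-pass filtered audio. The bookkeeping of the previous-tree peaks and all names are illustrative; the high-pass filter of step 1 is omitted.
Example C code (informative)
#include <math.h>

/* Returns 1 if a transient is indicated; prev[1..3] holds, per level, the peak
 * of the last segment of the previous tree and is updated for the next call. */
int detect_transient(const double *hpf, double prev[4])
{
    static const double T[4]      = { 0.0, 0.1, 0.075, 0.05 };
    static const int    nseg[4]   = { 0, 1, 2, 4 };
    static const int    seglen[4] = { 0, 256, 128, 64 };
    const double silence = 100.0 / 32768.0;

    double P[4][5];                     /* P[j][k], k = 0 is the previous tree */
    int transient = 0;

    for (int j = 1; j <= 3; j++) {
        P[j][0] = prev[j];
        for (int k = 1; k <= nseg[j]; k++) {
            double peak = 0.0;
            for (int n = (k - 1) * seglen[j]; n < k * seglen[j]; n++)
                if (fabs(hpf[n]) > peak)
                    peak = fabs(hpf[n]);
            P[j][k] = peak;
        }
        prev[j] = P[j][nseg[j]];        /* becomes P[j][0] of the next tree    */
    }

    if (P[1][1] < silence)              /* no significant level: long block    */
        return 0;

    for (int j = 1; j <= 3; j++)
        for (int k = 1; k <= nseg[j]; k++)
            if (P[j][k] * T[j] > P[j][k - 1])
                transient = 1;          /* sharp level increase detected       */

    return transient;
}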
8.2.3.1 Windowing
The audio block is multiplied by a window function to reduce transform boundary effects and to
improve frequency selectivity in the filter bank. The values of the window function are included
in Table 7.33. Note that the 256 coefficients given are used back-to-back to form a 512-point
symmetrical window.
XD[k] = (–2/N) * sum (n = 0 to N–1) { x[n] *
cos( (2π/(4*N)) * (2*n+1) * (2*k+1) + (π/4) * (2*k+1) * (1 + α) ) } , for 0 ≤ k < N/2
Where:
α = –1 for the first short transform
0 for the long transform
+1 for the second short transform
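The following informative C fragment evaluates the transform equation above directly, without the fast algorithms a practical encoder would use. N and alpha are as defined in the text; names are illustrative only.
Example C code (informative)
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

void forward_transform(const double *x, double *XD, int N, int alpha)
{
    for (int k = 0; k < N / 2; k++) {
        double acc = 0.0;
        for (int n = 0; n < N; n++)
            acc += x[n] * cos(2.0 * M_PI / (4.0 * N) * (2 * n + 1) * (2 * k + 1)
                              + M_PI / 4.0 * (2 * k + 1) * (1 + alpha));
        XD[k] = -2.0 / N * acc;
    }
}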
Coupling coordinates for all channels may be transmitted for every other block; i.e. blocks 0,
2, and 4. During blocks 1, 3, and 5, coupling coordinates are reused.
8.2.6 Rematrixing
Rematrixing is active only in the 2/0 mode. Within each rematrixing band, power measurements
are made on the L, R, L+R, and L–R signals. If the maximum power is found in the L or R
channels, the rematrix flag is not set for that band. If the maximum power is found in the L+R or
L–R signal, then the rematrix flag is set. When the rematrix flag for a band is set, the encoder
codes L+R and L–R instead of L and R. Rematrixing is described in Section 7.5.
The encoder described here uses a fixed, static set of bit allocation parameters:
sdcycod = 2;
fdcycod = 1;
sgaincod = 1;
dbpbcod = 2;
floorcod = 4;
cplfgaincod = 4;
fgaincod[ch] = 4;
lfegaincod = 4;
cplsnroffst = fsnroffst[ch] = lfesnroffst = fineoffset;
Since the bit allocation parameters are static, they are only sent during block 0. Delta bit
allocation is not used, so deltbaie = 0. The core bit allocation routine (described in Section 7.2) is
run, and the coarse and fine SNR offsets are adjusted until all available bits in the frame are used
up. The coarse SNR offset adjusts in 3 dB increments, and the fine offset adjusts in 3/16 dB
increments. Bits are allocated globally from a common bit pool to all channels. The combination
of csnroffst and fineoffset is chosen which uses the largest number of bits without exceeding the
frame size. This involves an iterative process. When, for a given iteration, the number of bits
exceeds the pool, the SNR offset is decreased for the next iteration. On the other hand, if the
allocation is less than the pool, the SNR offset is increased for the next iteration. When the SNR
offset is at its maximum without causing the allocation to exceed the pool, the iterating is
complete. The results of the bit allocation routine are the final values of csnroffst and fineoffset, and
the set of bit allocation pointers (baps). The SNR offset values are included in the bit stream so
that the decoder does not need to iterate.
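As an informative illustration, the following C fragment sketches the iteration described above as a simple linear search over a composite SNR offset. bits_used is a caller-supplied function standing in for a run of the core bit allocation routine with a given offset; all names are illustrative. A practical encoder may instead use a binary search over the same range to reduce the number of allocation passes.
Example C code (informative)
int choose_snr_offset(int bit_pool, int max_offset,
                      int (*bits_used)(int snroffset, void *ctx), void *ctx)
{
    int best = 0;
    for (int offset = 0; offset <= max_offset; offset++) {
        if (bits_used(offset, ctx) > bit_pool)
            break;                    /* first offset that overflows the pool */
        best = offset;                /* keep the largest offset that fits    */
    }
    return best;                      /* split into csnroffst and fineoffset  */
}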
A/52:2010, Annex A:
AC-3 Elementary Streams in the MPEG-2 Multiplex
(Normative)
A1. SCOPE
This Annex contains certain syntax and semantics needed to enable the transport of one or more
AC-3 elementary streams in an MPEG-2 Transport Stream per ISO/IEC 13818-1 [1].
A2. INTRODUCTION
When an AC-3 elementary bit stream is included in an MPEG-2 Transport Stream, the AC-3 bit
stream is packetized into PES packets. MPEG-2 Transport Streams containing AC-3 elementary
streams can be constrained by the STD model in System A or System B. Signaling is required in
order to indicate unambiguously that an AC-3 stream is, in fact, an AC-3 stream and to which
System (A/B) the stream conforms. Since the MPEG-2 Systems standard does not explicitly
define codes to be used to indicate an AC-3 stream, stream_type values need to be defined.
It is important to note that the stream_type values assigned for AC-3 streams can be different for
different systems, two of which are covered below. Also, the MPEG-2 standard does not have an
audio descriptor adequate to describe the contents of the AC-3 bit stream in the PSI tables. This
Annex defines syntax and semantics to address these issues.
The AC-3 audio access unit (AU) or presentation unit (PU) is an AC-3 sync frame. The AC-3
sync frame contains 1536 audio samples. The duration of an AC-3 access (or presentation) unit is
32 ms for audio sampled at 48 kHz, approximately 34.83 ms for audio sampled at 44.1 kHz, and
48 ms for audio sampled at 32 kHz.
The items which need to be specified in order to include AC-3 within the MPEG-2 Transport
Stream are: stream_type, stream_id, AC-3 audio descriptor, and the MPEG-2 registration descriptor.
Some constraints are placed on the PES layer for the case of multiple audio streams intended to be
reproduced in exact sample synchronism. In System A, the AC-3 audio descriptor is titled “AC-
3_audio_stream_descriptor” while in System B the AC-3 audio descriptor is titled “AC-
3_descriptor”. It should be noted that the syntax of these descriptors differs significantly between
the two systems.
This annex does not place any constraint on the values in any of the fields defined herein or on
placement of any of the data structures defined herein. It does establish values for fields defined
by other standards, in particular ISO/IEC 13818-1 [1]. Standards developing organizations
referencing this Standard may place their own usage and placement constraints. ATSC has done
so to complete the standardization process for System A.
2. For example, as required by either "System A" or "System B," which are defined in Recommendation
ITU-R BT.1300-3 [10].
Advanced Television Systems Committee, Inc. Document A/52:2010
A4.2 Stream ID
The value of stream_id in the PES header shall be 0xBD (indicating private_stream_1). Multiple AC-3
streams may share the same value of stream_id since each stream is carried within TS packets
identified by a unique PID value within that TS. The association of the PID value for each stream,
with its stream_type, is found in the transport stream program map table (PMT).
sample_rate_code – This is a 3-bit field that indicates the sample rate of the encoded audio. The
indication may be of one specific sample rate, or may be of a set of values which include the
sample rate of the encoded audio (see Table A4.2).
bsid – This is a 5-bit field that is set to the same value as the bsid field in the AC-3 elementary
stream.
bit_rate_code – This is a 6-bit field. The lower 5 bits indicate a nominal bit rate. The MSB indicates
whether the indicated bit rate is exact (MSB = 0) or an upper limit (MSB = 1) (see Table
A4.3).
dsurmod – This is a 2-bit field that may be set to the same value as the dsurmod field in the AC-3
elementary stream, or which may be set to ‘00’ (not indicated) (see Table A4.4).
bsmod – This is a 3-bit field that is set to the same value as the bsmod field in the AC-3 elementary
stream.
num_channels – This is a 4-bit field that indicates the number of channels in the AC-3 elementary
stream. When the MSB is 0, the lower 3 bits are set to the same value as the acmod field in the
AC-3 elementary stream. When the MSB field is 1, the lower 3 bits indicate the maximum
number of encoded audio channels (counting the lfe channel as 1). See Table A4.5.
full_svc – This is a 1-bit field that indicates whether this audio service is a full service suitable
for presentation on its own, or a partial service which should be combined with another audio
service before presentation. This bit should be set to a “1” if
this audio service is sufficiently complete to be presented to the listener without being
combined with another audio service (for example, a visually impaired service which contains
all elements of the program; music, effects, dialogue, and the visual content descriptive
narrative). This bit should be set to a “0” if the service is not sufficiently complete to be
presented without being combined with another audio service (e.g., a visually impaired
service which only contains a narrative description of the visual program content and which
needs to be combined with another audio service which contains music, effects, and dialogue).
langcod – This field is deprecated. If the langcod field is present in the descriptor then it shall be set
to 0xFF. (This field is immediately after the first allowed termination point in the descriptor.)
Note: This field is retained with the prescribed length at the prescribed location for
backwards compatibility with deployed receiving systems. In the AC-3 bit stream,
langcod is an optional field that may be present in the elementary stream. It was
initially specified to indicate language. The field language replaces this field’s
function in this descriptor.
langcod2 – This field is deprecated. If the langcod2 field is present in the descriptor then it shall be
set to 0xFF.
Note: This field is retained with the prescribed length at the prescribed location for
backwards compatibility with deployed receiving systems. The field language_2
replaces this field’s function in this descriptor.
mainid – This is a 3-bit field that contains a number in the range 0–7 which identifies a main audio
service. Each main service should be tagged with a unique number. This value is used as an
identifier to link associated services with particular main services.
priority – This is a 2-bit field that indicates the priority of the audio service. This field allows a
Main audio service (bsmod equal to 0 or 1) to be marked as the primary audio service. Other
audio services may be explicitly marked or not specified. Table A4.6 below shows how this
field is encoded.
asvcflags – This is an 8-bit field. Each bit (0–7) indicates with which main service(s) this
associated service is associated. The left most bit, bit 7, indicates whether this associated
service may be reproduced along with main service number 7. If the bit has a value of 1, the
service is associated with main service number 7. If the bit has a value of 0, the service is not
associated with main service number 7.
textlen – This is an unsigned integer which indicates the length, in bytes, of a descriptive text field
that follows.
text_code – This is a 1-bit field that indicates how the following text field is encoded. If this bit is a
‘1’, the text is encoded as 1-byte characters using the ISO Latin-1 alphabet (ISO 8859-1). If
this bit is a ‘0’, the text is encoded with 2-byte unicode characters.
text[i] – The text field may contain a brief textual description of the audio service.
language_flag – This is a 1-bit flag that indicates whether or not the 3-byte language field is present
in the descriptor. If this bit is set to ‘1’, then the 3-byte language field is present. If this bit is
set to ‘0’, then the language field is not present.
3. The semantics of the langcod field in the elementary stream were changed in 2001.
language_flag_2 – This is a 1-bit flag that indicates whether or not the 3-byte language_2 field is
present in the descriptor. If this bit is set to ‘1’, then the 3-byte language_2 field is present. If
this bit is set to ‘0’, then the language_2 field is not present. This bit shall always be set to ‘0’,
unless the num_channels field is set to ‘0000’ indicating the audio coding mode is 1+1 (dual
mono). If the num_channels field is set to ‘0000’ then this bit may be set to ‘1’ and the
language_2 field may be included in this descriptor.
language – This field is a 3-byte language code defining the language of this audio service which
shall correspond to a registered language code contained in the ISO 639-2 Code column of the
ISO 639-2 registry [2], and shall be the code marked ‘(B)’ in that registry if two codes are
present. If the AC-3 stream audio coding mode is 1+1 (dual mono), this field indicates the
language of the first channel (channel 1, or “left” channel). Each character is coded into 8 bits
according to ISO 8859-1 [3] (ISO Latin-1) and inserted in order into the 24-bit field. The
coding is identical to that used in the MPEG-2 ISO_639_language_code value in the
ISO_639_language_descriptor specified in ISO/IEC 13818-1 [1].
language_2 – This field is only present if the AC-3 stream audio coding mode is 1+1 (dual mono).
This field is a 3-byte language code defining the language of the second channel (channel 2, or
“right” channel) in the AC-3 bit stream which shall correspond to a registered language
code contained in the ISO 639-2 registry [2], and shall be the code marked ‘(B)’ in that
registry if two codes are present. Each character is coded into 8 bits according to ISO 8859-1
[3] (ISO Latin-1) and inserted in order into the 24-bit field. The coding is identical to that used
in the MPEG-2 ISO_639_language_code value in the ISO_639_language_descriptor specified in ISO/
IEC 13818-1 [1].
additional_info[j] – This is a set of additional bytes filling out the remainder of the descriptor. The
purpose of these bytes is not currently defined. This field is provided to allow the ATSC to
extend this descriptor. No other use is permitted.
Where:
BSmux = 736 bytes
BSoh = PES header overhead
BSdec = access unit buffer
ISO/IEC 13818-1 [1] specifies a fixed value for BSn (3584 bytes) and indicates that any
excess buffer may be used for additional multiplexing.
When an AC-3 elementary stream is carried by an MPEG-2 transport stream, the transport
stream shall be compliant with a main audio buffer size of
BSn = BSmux + BSpad + BSdec
Where:
BSmux = 736 bytes
BSpad = 64 bytes
The value of BSdec employed shall be that of the highest bit rate supported by the system (i.e.,
the buffer size is not decreased when the audio bit rate is less than the maximum value allowed by
a specific system). The 64 bytes in BSpad are available for BSoh and additional multiplexing.
This constraint makes it possible to implement decoders with the minimum possible memory
buffer.
A5.2 Stream ID
The value of stream_id in the PES header shall be 0xBD (indicating private_stream_1). Multiple AC-3
streams may share the same value of stream_id since each stream is carried with a unique PID
value. The mapping of values of PID to stream_type can be indicated in the transport stream
program map table (PMT).
descriptor_tag − The descriptor tag is an 8-bit field that identifies each descriptor. The AC-3
descriptor_tag shall have a value of 0x6A.
descriptor_length − This 8-bit field specifies the total number of bytes of the data portion of the
descriptor following the byte defining the value of this field. The AC-3 descriptor has a
minimum length of one byte but may be longer depending on the use of the optional flags and
the additional_info loop.
AC-3_type_flag − This 1-bit field is mandatory. It should be set to ‘1’ to include the optional AC-
3_type field in the descriptor.
bsid_flag − This 1-bit field is mandatory. It should be set to ‘1’ to include the optional bsid field in
the descriptor.
mainid_flag − This 1-bit field is mandatory. It should be set to ‘1’ to include the optional mainid field
in the descriptor.
asvc_flag − This 1-bit field is mandatory. It should be set to ‘1’ to include the optional asvc field in
the descriptor.
reserved flags − These 1-bit fields are reserved for future use. They should always be set to ‘0’.
AC-3_type − This optional 8-bit field indicates the type of audio carried in the AC-3 elementary
stream. It is set to the same value as the component type field of the component descriptor
(refer to Table A7).
bsid − This optional 8-bit field indicates the AC-3 coding version. The three MSBs should always
be set to ‘0’. The five LSBs are set to the same value as the bsid field in the AC-3 elementary
stream, ‘01000’ (=8) in the current version of AC-3.
mainid − This optional 8-bit field identifies a main audio service and contains a number in the
range 0–7 which identifies a main audio service. Each main service should be tagged with a
unique number. This value is used as an identifier to link associated services with particular
main services.
asvc − This 8-bit field is optional. Each bit (0–7) identifies with which main service(s) this
associated service is associated. The left most bit, bit 7, indicates whether this associated
service may be reproduced along with main service number 7. If the bit has a value of 1, the
service is associated with main service number 7. If the bit has a value of 0, the service is not
associated with main service number 7.
additional_info − These optional bytes are reserved for future use.
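As an informative illustration, the following C fragment parses this descriptor from the field descriptions above. The bit positions of the four flags (assumed here to occupy the four most significant bits of the first payload byte, in the order listed) and the ordering of the optional one-byte fields are assumptions, since the descriptor syntax table is not reproduced in this text; names are illustrative only.
Example C code (informative)
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int     has_type, has_bsid, has_mainid, has_asvc;
    uint8_t ac3_type, bsid, mainid, asvc;
} ac3_descriptor_b;

int parse_ac3_descriptor_b(const uint8_t *d, size_t len, ac3_descriptor_b *out)
{
    if (len < 3 || d[0] != 0x6A)       /* descriptor_tag must be 0x6A         */
        return -1;
    size_t body = d[1];                /* descriptor_length                   */
    if (body + 2 > len || body < 1)
        return -1;

    uint8_t flags = d[2];
    out->has_type   = (flags >> 7) & 1;
    out->has_bsid   = (flags >> 6) & 1;
    out->has_mainid = (flags >> 5) & 1;
    out->has_asvc   = (flags >> 4) & 1;

    size_t pos = 3, end = 2 + body;
    if (out->has_type)   { if (pos >= end) return -1; out->ac3_type = d[pos++]; }
    if (out->has_bsid)   { if (pos >= end) return -1; out->bsid     = d[pos++]; }
    if (out->has_mainid) { if (pos >= end) return -1; out->mainid   = d[pos++]; }
    if (out->has_asvc)   { if (pos >= end) return -1; out->asvc     = d[pos++]; }
    /* any remaining bytes up to 'end' are additional_info                     */
    return 0;
}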
A6.1 Encoding
In some applications, the audio decoder may be capable of simultaneously decoding two
elementary streams containing different program elements, and then combining the program
elements into a complete program.
Most of the program elements are found in the main audio service. Another program element
(such as a narration of the picture content intended for the visually impaired listener) may be
found in the associated audio service.
In order to have the audio from the two elementary streams reproduced in exact sample
synchronism, it is necessary for the original audio elementary stream encoders to have encoded
the two audio program elements frame synchronously; i.e., if audio stream 1 has sample 0 of
frame n taken at time t0, then audio stream 2 should also have frame n beginning with its sample
0 taken at the identical time t0. If the encoding of multiple audio services is done frame and sample
synchronous, and decoding is intended to be frame and sample synchronous, then the PES packets
of these audio services shall contain identical values of PTS which refer to the audio access units
intended for synchronous decoding.
Audio services intended to be combined together for reproduction shall be encoded at an
identical sample rate.
A6.2 Decoding
If audio access units from two audio services which are to be simultaneously decoded have
identical values of PTS indicated in their corresponding PES headers, then the corresponding
audio access units shall be presented to the audio decoder for simultaneous synchronous
decoding. Synchronous decoding means that for corresponding audio frames (access units),
corresponding audio samples are presented at the identical time.
If the PTS values do not match (indicating that the audio encoding was not frame
synchronous) then the audio frames (access units) of the main audio service may be presented to
the audio decoder for decoding and presentation at the time indicated by the PTS. An associated
service which is being simultaneously decoded may have its audio frames (access units), which
are in closest time alignment (as indicated by the PTS) to those of the main service being decoded,
presented to the audio decoder for simultaneous decoding. In this case the associated service may
be reproduced out of sync by as much as 1/2 of a frame time. (This is typically satisfactory; a
visually impaired narration does not require highly precise timing.)
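As an informative illustration, the following C fragment selects the associated-service access unit whose PTS is closest to that of the main-service access unit being decoded, as described above. PTS wrap-around and buffering details are ignored; names are illustrative only.
Example C code (informative)
#include <stdint.h>
#include <stdlib.h>

int closest_access_unit(const int64_t *assoc_pts, int count, int64_t main_pts)
{
    int best = -1;
    int64_t best_diff = 0;

    for (int i = 0; i < count; i++) {
        int64_t diff = llabs(assoc_pts[i] - main_pts);
        if (best < 0 || diff < best_diff) {
            best = i;
            best_diff = diff;
        }
    }
    return best;   /* index of the AU to present alongside the main AU */
}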
A6.3 Byte-Alignment
This section applies to both System A and System B. The AC-3 elementary stream shall be byte-
aligned within the MPEG-2 data stream. This means that the initial 8 bits of an AC-3 frame shall
reside in a single byte which is carried by the MPEG-2 data stream.
A/52:2010, Annex B:
Bibliography (Informative)
The following documents contain information on the algorithm described in this standard, and
may be useful to those who are using or attempting to understand this standard. In the case of
conflicting information, the information contained in this standard should be considered correct.
[1] Todd, C., et al., “AC-3: Flexible Perceptual Coding for Audio Transmission and Storage”,
AES 96th Convention, Preprint 3796, Audio Engineering Society, New York, NY, February
1994.
[2] Fielder, L. D., M. A. Bosi, G. A. Davidson, M. F. Davis, C. Todd, and S. Vernon; “AC-2 and
AC-3: Low-Complexity Transform-Based Audio Coding,” Collected Papers on Digital
Audio Bit-Rate Reduction, Neil Gilchrist and Christer Grewin eds., pp. 54–72, Audio
Engineering Society, New York, NY, 1996.
[3] Davidson, G. A.; The Digital Signal Processing Handbook, V. K. Madisetti and D. B.
Williams eds., pp. 41-1 – 41-21, CRC Press LLC, Boca Raton, FL, 1997.
[4] Princen, J., and A. Bradley; “Analysis/synthesis filter bank design based on time domain
aliasing cancellation,” IEEE Trans. Acoust. Speech and Signal Processing, vol. ASSP-34,
pp. 1153–1161, IEEE, New York, NY, October 1986.
[5] Davidson, G. A., L. D. Fielder, and B. D. Link; “Parametric Bit Allocation in Perceptual
Audio Coder,” AES 97th Convention, Preprint 3921, Audio Engineering Society, New York,
NY, November 1994.
[6] Vernon, Steve; “Dolby Digital: Audio Coding for Digital Television and Storage
Applications,” AES 17th International Conference: High-Quality Audio Coding, August
1999.
[7] Vernon, Steve, Vlad Fruchter, and Sergio Kusevitzky; “A Single-Chip DSP Implementation
of a High-Quality Low Bit-Rate Multichannel Audio Coder,” AES 95th Convention,
Preprint 3775, Audio Engineering Society, New York, NY, September 1993.
[8] Rao, R., and P. Yip; Discrete Cosine Transform, Academic Press, Boston, MA, pg. 11, 1990.
[9] Cover, T. M., and J. A. Thomas; Elements of Information Theory, Wiley Series in
Telecommunications, Wiley, New York, NY, pg. 13, 1991.
[10] Gersho, A., and R. M. Gray; Vector Quantization and Signal Compression, Kluwer
Academic Publisher, Boston, MA, pg. 309, 1992.
[11] Truman, M. M., G. A. Davidson, A. Ubale, and L. D. Fielder; “Efficient Bit Allocation,
Quantization, and Coding in an Audio Distribution System,” AES 107th Convention,
Preprint 5068, Audio Engineering Society, New York, NY, August 1999.
[12] Fielder, Louis D. and Grant A. Davidson; “Audio Coding Tools for Digital Television
Distribution,” AES 108th Convention, Preprint 5104, Audio Engineering Society, New
York, NY, January 2000.
[13] Crockett, B.; “High Quality Multi-Channel Time-Scaling and Pitch-Shifting using Auditory
Scene Analysis,” AES 115th Convention, Preprint 5948, Audio Engineering Society, New
York, NY, October 2003.
[14] Crockett, B.; “Improved Transient Pre-Noise Performance of Low Bit Rate Audio Coders
Using Time Scaling Synthesis,” AES 117th Convention, Audio Engineering Society, New
York, NY, October 2004.
[15] Fielder, L. D., R. L. Andersen, B. G. Crockett, G. A. Davidson, M. F. Davis, S. C. Turner, M.
S. Vinton, and P. A. Williams; “Introduction to Dolby Digital Plus, an Enhancement to the
Dolby Digital Coding System,” AES 117th Convention, Audio Engineering Society, New
York, NY, October 2004.
A/52:2010, Annex C:
AC-3 Karaoke Mode
(Informative)
C1. SCOPE
This Annex contains specifications for how karaoke aware and karaoke capable AC-3 decoders
should reproduce karaoke AC-3 bit streams. A minimum level of functionality is defined which
allows a karaoke aware decoder to produce an appropriate 2/0 or 3/0 default output when
presented with a karaoke mode AC-3 bit stream. An additional level of functionality is defined for
the karaoke capable decoder so that the listener may optionally control the reproduction of the
karaoke bit stream.
C2. INTRODUCTION
The AC-3 karaoke mode has been defined in order to allow the multi-channel AC-3 bit stream to
convey audio channels designated as L, R (e.g., 2-channel stereo music), M (e.g., guide melody),
and V1, V2 (e.g., one or two vocal tracks). This Annex does not specify the contents of L, R, M,
V1, and V2, but does specify the behavior of AC-3 decoding equipment when receiving a karaoke
bit stream containing these channels. An AC-3 decoder which is karaoke capable will allow the
listener to optionally reproduce the V1 and V2 channels, and may allow the listener to adjust the
relative levels (mixing balance) of the M, V1, and V2 channels. An AC-3 decoder which is
karaoke aware will reproduce the L, R, and M channels, and will reproduce the V1 and V2
channels at a level indicated by the encoded bit stream.
The 2-channel karaoke aware decoder will decode the karaoke bit stream using the Lo, Ro
downmix. The L and R channels will be reproduced out of the left and right outputs, and the M
channel will appear as a phantom center. The precise level of the M channel is determined by
cmixlev which is under control of the program provider. The level of the V1 and V2 channels which
will appear in the downmix is determined by surmixlev, which is under control of the program
provider. A single V channel (V1 only) will appear as a phantom center. A pair of V channels (V1
and V2) will be reproduced with V1 in left output and V2 in right output.
The 5-channel karaoke aware decoder will reproduce the L, R channels out of the left and
right outputs, and the M channel out of the center output. A single V channel (V1 only) will be
reproduced in the center channel output. A pair of V channels (V1 and V2) will be reproduced
with V1 in left output and V2 in right output. The level of the V1 and V2 channels which will
appear in the output is determined by surmixlev.
The karaoke capable decoder gives some control of the reproduction to the listener. The V1,
V2 channels may be selected for reproduction independent of the value of surmixlev in the bit
stream. The decoder may optionally allow the reproduction level and location of the M, V1, and
V2 channels to be adjusted by the listener. The detailed implementation of the flexible karaoke
capable decoder is not specified; it is left up to the implementation as to the degree of adjustability
to be offered to the listener.
Lk = L + a * V1 + b * V2 + c * M
Ck = d * V1 + e * V2 + f * M
Rk = R + g * V1 + h * V2 + i * M
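As an informative illustration only, the mix equations above can be applied per sample as shown below; the coefficient values a through i are determined by the bit stream (and, in a karaoke capable decoder, optionally by listener controls), and the structure and function names here are illustrative.

/* Informative sketch: apply the karaoke mix equations above to one sample. */
typedef struct { double a, b, c, d, e, f, g, h, i; } KaraokeCoefs;

void karaoke_mix(double L, double R, double M, double V1, double V2,
                 const KaraokeCoefs *k, double *Lk, double *Ck, double *Rk)
{
    *Lk = L + k->a * V1 + k->b * V2 + k->c * M;
    *Ck =     k->d * V1 + k->e * V2 + k->f * M;
    *Rk = R + k->g * V1 + k->h * V2 + k->i * M;
}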
Additional flexibility may be offered optionally to the user of the karaoke decoder. For
instance, the coefficients a, d, and g might be adjusted to allow the V1 channel to be reproduced in
a different location and with a different level. Similarly the level and location of the V2 and M
channels could be adjusted. The details of these additional optional user controls are not specified
and are left up to the implementation. Also left up to the implementation is what use might be
made of the Ls, Rs outputs of the 5-channel decoder, which would naturally reproduce the V1, V2
channels.
A/52:2010, Annex D:
Alternate Bit Stream Syntax
(Normative)
D1. SCOPE
This Annex contains specifications for an alternate bit stream syntax that may be implemented by
some AC-3 encoders and interpreted by some AC-3 decoders. The new syntax redefines certain
bit stream information (bsi) fields to carry new meanings. It is not necessary for decoders to be
aware of this alternate syntax in order to properly reconstruct an audio soundfield; however, those
decoders that are aware of this syntax will be able to take advantage of the new system features
described in this Annex. This alternate bit stream syntax is identified by setting the bsid to a value
of 6.
This Annex is Normative to the extent that when bsid is set to the value of 6, the alternate
syntax elements shall have the meaning described in this Annex. Thus, this Annex may be
considered Normative on encoders that set bsid to 6.
This Annex is Informative for decoders. Interpretation and use of the new syntactical elements
is optional for decoders.
The new syntactical elements defined in this Annex are placed in the two 14-bit fields that are
defined as timecod1 and timecod2 in the body of this document (these fields have never been applied
for their originally anticipated purpose).
D2. SPECIFICATION
Table D2.1 Bit Stream Information; Alternate Bit Stream Syntax (Continued)
Syntax Word Size
compre 1
if (compre) {compr} 8
langcode 1
if (langcode) {langcod} 8
audprodie 1
if (audprodie)
{
mixlevel 5
roomtyp 2
}
if (acmod == 0) /* if 1+1 mode (dual mono, so some items need a second value) */
{
dialnorm2 5
compr2e 1
if (compr2e) {compr2} 8
langcod2e 1
if (langcod2e) {langcod2} 8
audprodi2e 1
if (audprodi2e)
{
mixlevel2 5
roomtyp2 2
}
}
copyrightb 1
origbs 1
xbsi1e 1
if (xbsi1e)
{
dmixmod 2
ltrtcmixlev 3
ltrtsurmixlev 3
lorocmixlev 3
lorosurmixlev 3
}
xbsi2e 1
if (xbsi2e)
{
dsurexmod 2
dheadphonmod 2
adconvtyp 1
xbsi2 8
encinfo 1
}
addbsie 1
if (addbsie)
{
addbsil 6
addbsi (addbsil+1)×8
}
} /* end of bsi */
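As an informative illustration only, unpacking of the xbsi1 and xbsi2 fields listed in Table D2.1 might be sketched as follows; getbits() is a hypothetical MSB-first bit-reading helper and the structure is illustrative, not part of the standard.

/* Informative sketch: read the alternate-syntax xbsi1 and xbsi2 fields per Table D2.1. */
extern unsigned getbits(int nbits);   /* hypothetical helper: next nbits, MSB first */

typedef struct {
    int dmixmod, ltrtcmixlev, ltrtsurmixlev, lorocmixlev, lorosurmixlev;   /* xbsi1 */
    int dsurexmod, dheadphonmod, adconvtyp, xbsi2, encinfo;                /* xbsi2 */
} AltBsiExt;

void parse_alt_bsi_ext(AltBsiExt *x)
{
    if (getbits(1)) {                    /* xbsi1e */
        x->dmixmod       = getbits(2);
        x->ltrtcmixlev   = getbits(3);
        x->ltrtsurmixlev = getbits(3);
        x->lorocmixlev   = getbits(3);
        x->lorosurmixlev = getbits(3);
    }
    if (getbits(1)) {                    /* xbsi2e */
        x->dsurexmod    = getbits(2);
        x->dheadphonmod = getbits(2);
        x->adconvtyp    = getbits(1);
        x->xbsi2        = getbits(8);
        x->encinfo      = getbits(1);
    }
}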
Note: The meaning of this field is only defined as described if the audio coding
mode is 3/0, 2/1, 3/1, 2/2 or 3/2. If the audio coding mode is 1+1, 1/0 or 2/0 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 3/0, 3/1 or 3/2. If the audio coding mode is 1+1, 1/0, 2/0, 2/1 or 2/2 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 2/1, 3/1, 2/2 or 3/2. If the audio coding mode is 1+1, 1/0, 2/0 or 3/0 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 3/0, 3/1 or 3/2. If the audio coding mode is 1+1, 1/0, 2/0, 2/1 or 2/2 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 2/1, 3/1, 2/2 or 3/2. If the audio coding mode is 1+1, 1/0, 2/0 or 3/0 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 2/2 or 3/2. If the audio coding mode is 1+1, 1/0, 2/0, 3/0, 2/1 or 3/1 then
the meaning of this field is reserved.
Note: The meaning of this field is only defined as described if the audio coding
mode is 2/0. If the audio coding mode is 1+1, 1/0, 3/0, 2/1, 3/1, 2/2 or 3/2 then the
meaning of this field is reserved.
A/52:2010, Annex E:
Enhanced AC-3 Bit Stream Syntax
(Normative)
E1. SCOPE
This Annex defines the bit stream syntax that shall be used by Enhanced AC-3 bit streams, and a
reference decoding process. Enhanced AC-3 bit streams are similar in nature to standard AC-3 bit
streams, but are not backwards compatible (i.e., they are not decodable by standard AC-3
decoders). This Annex outlines the differences between the stream types, and specifies the
reference decoding process for Enhanced AC-3 bit streams. This Annex is normative in
applications that specify the use of Enhanced AC-3. Encoders shall construct bit streams that can
be decoded using the decoding process specified in this Annex.
E2. SPECIFICATION
Syntax
bit stream()
{
while(true)
{
syncframe() ;
}
} /* end of bit stream */
The syncframe consists of the syncinfo, bsi and audfrm fields, up to 6 coded audblk fields, the auxdata
field, and the errorcheck field.
Syntax
syncframe()
{
syncinfo() ;
bsi() ;
audfrm() ;
for (blk = 0; blk < number_of_blocks_per_syncframe; blk++)
{
audblk() ;
}
auxdata() ;
errorcheck() ;
} /* end of syncframe */
Each of the bit stream elements, and their length, are itemized in the following tables. Note
that all bit stream elements arrive most significant bit first, or left bit first, in time.
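As an informative illustration only, an MSB-first ("left bit first") extraction routine consistent with the statement above might look like the following sketch; the BitReader type is illustrative.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;     /* bit stream data, byte aligned */
    size_t bitpos;          /* number of bits consumed so far */
} BitReader;

/* Informative sketch: return the next n bits (n <= 24), most significant bit first. */
static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        unsigned bit = 7u - (unsigned)(br->bitpos & 7u);   /* MSB of each byte first */
        value = (value << 1) | ((br->buf[byte] >> bit) & 1u);
        br->bitpos++;
    }
    return value;
}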
Type 2: These frames comprise an independent stream or substream that was previously coded in
AC-3. Type 2 streams must be independently decodable, and may not have any dependent
streams associated with them.
Type 3: Reserved.
E2.3.1.5 numblkscod / fscod2: Number of Audio Blocks / Sample Rate Code 2, 2 bits
numblkscod – This 2-bit code, as shown in Table E2.9, indicates the number of audio blocks per
syncframe if the fscod indicates 32, 44.1, or 48 kHz sampling rate:
fscod2 – If the fscod field indicates that the fscod2 field is present, then this 2-bit code indicates the reduced sample rate, as
shown in Table E2.10. When using reduced sample rates, numblkscod shall be ‘11’ (6 blocks per
syncframe).
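As an informative summary only (the values are assumed from Tables E2.9 and E2.10, which are not reproduced in this excerpt), the two codes can be represented as small lookup tables:

/* Informative sketch: blocks per syncframe from numblkscod, and reduced sample
 * rate in Hz from fscod2 (0 marks the reserved code). Values assumed from
 * Tables E2.9 and E2.10. */
static const int blks_per_syncframe[4] = { 1, 2, 3, 6 };                 /* numblkscod 0..3 */
static const int reduced_sample_rate_hz[4] = { 24000, 22050, 16000, 0 }; /* fscod2 0..3 */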
dependent substream is associated. Non-shaded entries in Table E2.11 represent channel locations
not present in the independent substream with which the dependent substream is associated.
The custom channel map indicates both which coded channels are present in the dependent
substream and the order of the coded channels in the dependent substream. For each channel
present in the dependent substream, the corresponding location bit in the chanmap is set to ‘1’. The
order of the coded channels in the dependent substream is the same as the order of the enabled
location bits in the chanmap. For example, if bits 0, 3, and 4 of the chanmap field are set to ‘1’, and
the dependent stream is coded with acmod = 3 and lfeon = 0, the first coded channel in the
dependent stream is the Left channel, the second coded channel is the Left Surround channel, and
the third coded channel is the Right Surround channel. Note that the number of channel locations
indicated by the chanmap field must equal the total number of coded channels present in the
dependent substream, as indicated by the acmod and lfeon bit stream parameters.
For more information about usage of the chanmap parameter, please refer to Section E3.7.
Valid values for the LFE mix level code are 0 to 31, and valid values for the LFE mix level are
therefore +10 to –21 dB. For more information on LFE mixing, please refer to Section E3.8.
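As an informative illustration only, the consistency check on chanmap described above, and a conversion of the LFE mix level code to decibels (assuming 1 dB steps over the stated +10 to –21 dB range), might be sketched as follows; both helpers are illustrative, not part of the standard.

/* Informative sketch: the number of '1' bits in the 16-bit chanmap must equal the
 * number of coded channels implied by acmod and lfeon; channels are decoded in
 * enabled-bit order. */
static int chanmap_channel_count(unsigned chanmap16)
{
    int count = 0;
    for (int bit = 0; bit < 16; bit++)
        if (chanmap16 & (1u << bit))
            count++;
    return count;
}

/* Informative sketch: LFE mix level code 0..31 mapped to +10..-21 dB (assumed 1 dB steps). */
static int lfe_mix_level_db(int lfemixlevcod)
{
    return 10 - lfemixlevcod;
}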
SNR Offset Strategy 1: When SNR Offset Strategy 1 is used, one coarse SNR offset value and
one fine SNR offset value are transmitted in the bit stream. These SNR offset values are used
for every channel of every block in the frame, including the coupling and LFE channels.
SNR Offset Strategy 2: When SNR Offset Strategy 2 is used, one coarse SNR offset value and
one fine SNR offset value are transmitted in the bit stream as often as once per block. The fine
SNR offset value is used for every channel in the block, including the coupling and LFE
channels. For blocks in which coarse and fine SNR offset values are not transmitted in the bit
stream, the decoder must reuse the coarse and fine SNR offset values from the previous block.
One coarse and one fine SNR offset value must be transmitted in block 0.
SNR Offset Strategy 3: When SNR Offset Strategy 3 is used, coarse and fine SNR offset values
are transmitted in the bit stream as often as once per block. Separate fine SNR offset values
are transmitted for each channel, including the coupling and LFE channels. For blocks in
which coarse and fine SNR offset values are not transmitted in the bit stream, the decoder
must reuse the coarse and fine SNR offset values from the previous block. Coarse and fine
SNR offset values must be transmitted in block 0.
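As an informative illustration only, the per-block reuse rule for SNR Offset Strategies 2 and 3 described above might be sketched as follows; the snroffste[] flag and the array names are illustrative, not bit stream element names.

/* Informative sketch: carry coarse/fine SNR offsets forward to blocks in which
 * none are transmitted (block 0 always transmits). */
void resolve_snr_offsets(int nblocks, const int snroffste[],
                         const int coarse_in[], const int fine_in[],
                         int coarse_out[], int fine_out[])
{
    for (int blk = 0; blk < nblocks; blk++) {
        if (snroffste[blk]) {            /* offsets transmitted in this block */
            coarse_out[blk] = coarse_in[blk];
            fine_out[blk]   = fine_in[blk];
        } else if (blk > 0) {            /* reuse offsets from the previous block */
            coarse_out[blk] = coarse_out[blk - 1];
            fine_out[blk]   = fine_out[blk - 1];
        }
    }
}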
where numblks is derived from the numblkscod in Table E2.15 and ceiling(n) is a function which
rounds the fractional number n up to the next higher integer.
For example, ceiling(2.1) = 3.
The frame size, in 16-bit words, is given by:
words_per_frame = frmsiz + 1
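As a purely hypothetical worked example, if frmsiz were 511, words_per_frame would be 512 sixteen-bit words, corresponding to 1024 bytes per syncframe.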
any other block, the band structure from the previous block is reused. The default enhanced
coupling banding structure defecplbndstrc[] is shown in Table E2.17.
The total number of enhanced coupling bands, necplbnd, may be computed as follows:
A default setting of ecplbndstrc[], when all bands are used in enhanced coupling, is given in
Table E2.17.
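The computation itself is not reproduced in this excerpt. As an informative sketch only, consistent with the band-structure semantics described in Section E3.4 (each ‘1’ in ecplbndstrc[] merges a sub-band into the previous band), the band count could be derived as follows; the sub-band range names mirror those used later in this Annex.

/* Informative sketch: count enhanced coupling bands, treating every sub-band whose
 * ecplbndstrc[] bit is 0 as the start of a new band. */
int count_ecpl_bands(const int ecplbndstrc[], int ecpl_start_sbnd, int ecpl_end_sbnd)
{
    int necplbnd = 0;
    for (int sbnd = ecpl_start_sbnd; sbnd < ecpl_end_sbnd; sbnd++)
        if (ecplbndstrc[sbnd] == 0)
            necplbnd++;
    return necplbnd;
}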
E3.3.1 Overview
The Adaptive Hybrid Transform (AHT) is composed of two linear transforms connected in
cascade. The first transform is identical to that employed in AC-3 – a windowed Modified
Discrete Cosine Transform (MDCT) of length 128 or 256 frequency samples. This feature
provides compatibility with AC-3 without the need to return to the time domain in the decoder.
For frames containing audio signals which are not time-varying in nature (stationary), a second
transform can optionally be applied by the encoder, and inverted by the decoder. The second
transform is composed of a non-windowed, non-overlapped Discrete Cosine Transform (DCT
Type II). When the DCT is employed, the effective audio transform length increases from 256 to
1536 audio samples. This results in significantly improved coding gain and perceptual coding
performance for stationary signals.
The AHT is enabled by setting the ahte bit stream parameter to 1. If ahte is 1, at least
one of the independent channels, the coupling channel, or the LFE channel has been coded with
the AHT. The chahtinu[ch], cplahtinu, and lfeahtinu bit stream parameters indicate which
channels are coded with the AHT.
In order to realize gain made available by the AHT, the AC-3 scalar quantizers have been
augmented with two new coding tools. When AHT is in use, both 6-dimensional vector
quantization (VQ) and gain-adaptive quantization (GAQ) are employed. VQ is employed for the
largest step sizes (coarsest quantization), and GAQ is employed for the smallest step sizes (finest
quantization). The selection of quantizer step size is performed using the same parametric bit
allocation method as AC-3, except the conventional bit allocation pointer (bap) table is replaced
with a high-efficiency bap table (hebap[]). The hebap[] table employs finer granularity than the
conventional bap table, enabling more efficient allocation of bits.
Pseudo Code
/* Only compute ncplregs if coupling in use for all 6 blocks */
ncplregs = 0;
/* AHT is only available in 6 block mode (numblkscod ==0x3) */
for (blk = 0; blk < 6; blk++)
{
if ( (cplstre[blk] == 1) || (cplexpstr[blk] != reuse) )
{
ncplregs++;
}
}
Pseudo Code
for (ch = 0; ch < nfchans; ch++)
{
nchregs[ch] = 0;
/* AHT is only available in 6 block mode (numblkscod ==0x3) */
for (blk = 0; blk < 6; blk++)
{
if (chexpstr[blk][ch] != reuse)
{
nchregs[ch]++;
}
}
}
Pseudo Code
nlferegs = 0;
/* AHT is only available in 6 block mode (numblkscod ==0x3) */
for (blk = 0; blk < 6; blk++)
{
if ( lfeexpstr[blk] != reuse)
{
nlferegs++;
}
}
A second set of helper variables is required for identifying which and how many mantissas
employ GAQ. The arrays identifying which bins are GAQ coded are called chgaqbin[ch][bin],
cplgaqbin[bin], and lfegaqbin[bin]. Since the number and position of GAQ-coded mantissas varies from
frame to frame, these variables need to be computed after the corresponding hebap[] array is
available, but prior to mantissa unpacking. This procedure is shown in the following pseudo-code.
Pseudo Code
if (cplahtinu == 0)
{
for (bin = cplstrtmant; bin < cplendmant; bin++)
{
cplgaqbin[bin] = 0;
}
}
else
{
if (cplgaqmod < 2)
{
endbap = 12;
}
else
{
endbap = 17;
}
cplactivegaqbins = 0;
for (bin = cplstrtmant; bin < cplendmant; bin++)
{
if (cplhebap[bin] > 7 && cplhebap[bin] < endbap)
{
cplgaqbin[bin] = 1; /* Gain word is present */
cplactivegaqbins++;
}
else if (cplhebap[bin] >= endbap)
{
cplgaqbin[bin] = -1; /* Gain word is not present */
}
else
{
cplgaqbin[bin] = 0;
}
}
}
Pseudo Code
for (ch = 0; ch < nfchans; ch++)
{
if (chahtinu[ch] == 0)
{
for (bin = 0; bin < endmant[ch]; bin++)
{
chgaqbin[ch][bin] = 0;
}
}
else
{
if (chgaqmod < 2)
{
endbap = 12;
}
else
{
endbap = 17;
}
chactivegaqbins[ch] = 0;
for (bin = 0; bin < endmant[ch]; bin++)
{
if (chhebap[ch][bin] > 7 && chhebap[ch][bin] < endbap)
{
chgaqbin[ch][bin] = 1; /* Gain word is present */
chactivegaqbins[ch]++;
}
else if (chhebap[ch][bin] >= endbap)
{
chgaqbin[ch][bin] = -1;/* Gain word not present */
}
else
{
chgaqbin[ch][bin] = 0;
}
}
}
}
Pseudo Code
if (lfeahtinu == 0)
{
for (bin = 0; bin < lfeendmant; bin++)
{
lfegaqbin[bin] = 0;
}
}
else
{
if (lfegaqmod < 2)
{
endbap = 12;
}
else
{
endbap = 17;
}
lfeactivegaqbins = 0;
for (bin = 0; bin < lfeendmant; bin++)
{
if (lfehebap[bin] > 7 && lfehebap[bin] < endbap)
{
lfegaqbin[bin] = 1; /* Gain word is present */
lfeactivegaqbins++;
}
else if (lfehebap[bin] >= endbap)
{
lfegaqbin[bin] = -1; /* Gain word is not present */
}
else
{
lfegaqbin[bin] = 0;
}
}
}
In a final set of helper variables, the number of gain words to be read from the bitstream is
computed. These variables are called chgaqsections[ch], cplgaqsections, and lfegaqsections for the
independent channels, coupling channel, and LFE channel, respectively. They denote the number
of GAQ gain words transmitted in the bit stream, and are computed as shown in the following
pseudo code.
Pseudo Code
if (cplahtinu == 0)
{
cplgaqsections = 0;
}
else
{
switch(cplgaqmod)
{
case 0: /* No GAQ gains present */
{
cplgaqsections = 0;
break;
}
case 1: /* GAQ gains 1 and 2 */
case 2: /* GAQ gains 1 and 4 */
{
cplgaqsections = cplactivegaqbins;/* cplactivegaqbins was computed earlier */
break;
}
case 3: /* GAQ gains 1, 2, and 4 */
{
cplgaqsections = cplactivegaqbins / 3;
if (cplactivegaqbins % 3) cplgaqsections++;
break;
}
}
}
Pseudo Code
for (ch = 0; ch < nfchans; ch++)
{
if (chahtinu[ch] == 0)
{
chgaqsections[ch] = 0;
}
else
{
switch(chgaqmod[ch])
{
case 0: /* No GAQ gains present */
{
chgaqsections[ch] = 0;
break;
}
case 1: /* GAQ gains 1 and 2 */
case 2: /* GAQ gains 1 and 4 */
{
chgaqsections[ch] = chactivegaqbins[ch]; /* chactivegaqbins[ch] was computed earlier */
break;
}
case 3: /* GAQ gains 1, 2, and 4 */
{
chgaqsections[ch] = chactivegaqbins[ch] / 3;
if (chactivegaqbins[ch] % 3) chgaqsections[ch]++;
break;
}
}
}
}
Pseudo Code
if (lfeahtinu == 0)
{
lfegaqsections = 0;
}
else
{
sumgaqbins = 0;
for (bin = 0; bin < lfeendmant; bin++)
{
sumgaqbins += lfegaqbin[bin];
}
switch(lfegaqmod)
{
case 0: /* No GAQ gains present */
{
lfegaqsections = 0;
break;
}
case 1: /* GAQ gains 1 and 2 */
case 2: /* GAQ gains 1 and 4 */
{
lfegaqsections = lfeactivegaqbins; /* lfeactivegaqbins was computed earlier */
break;
}
case 3: /* GAQ gains 1, 2, and 4 */
{
lfegaqsections = lfeactivegaqbins / 3;
if (lfeactivegaqbins % 3) lfegaqsections++;
break;
}
}
}
If the gaqmod bit stream parameter bits are set to 0, conventional scalar quantization is used in
place of GAQ coding. If the gaqmod bits are set to 1 or 2, a 1-bit gain is present for each mantissa
coded with GAQ. If the gaqmod bits are set to 3, the GAQ gains for three individual mantissas are
compositely coded as a 5-bit word.
bit allocation routines defined in the main body of this document in order to achieve higher
precision allocation.
Pseudo Code
if (ahtinu == 1) /* cplahtinu, chahtinu[ch], or lfeahtinu */
{
i = start ;
j = masktab[start] ;
do
{
lastbin = min(bndtab[j] + bndsz[j], end);
mask[j] -= snroffset ;
mask[j] -= floor ;
if (mask[j] < 0)
{
mask[j] = 0 ;
}
mask[j] &= 0x1fe0 ;
mask[j] += floor ;
for (k = i; k < lastbin; k++)
{
address = (psd[i] - mask[j]) >> 5 ;
address = min(63, max(0, address)) ;
hebap[i] = hebaptab[address] ;
i++ ;
}
j++;
}
while (end > lastbin) ;
}
else
{
i = start ;
j = masktab[start] ;
do
{
lastbin = min(bndtab[j] + bndsz[j], end);
mask[j] -= snroffset ;
mask[j] -= floor ;
if (mask[j] < 0)
{
mask[j] = 0 ;
}
mask[j] &= 0x1fe0 ;
mask[j] += floor ;
for (k = i; k < lastbin; k++)
{
address = (psd[i] - mask[j]) >> 5 ;
address = min(63, max(0, address)) ;
bap[i] = baptab[address] ;
i++ ;
}
j++;
}
while (end > lastbin) ;
}
Table E3.2 Quantizer Type, Quantizer Level, and Mantissa Bits vs. hebap
hebap Quantizer Type Levels Mantissa Bits
0 NA NA 0
1 VQ NA (2/6)
2 VQ NA (3/6)
3 VQ NA (4/6)
4 VQ NA (5/6)
5 VQ NA (7/6)
6 VQ NA (8/6)
7 VQ NA (9/6)
8 symmetric + GAQ 7 3
9 symmetric + GAQ 15 4
10 symmetric + GAQ 31 5
11 symmetric + GAQ 63 6
12 symmetric + GAQ 127 7
13 symmetric + GAQ 255 8
14 symmetric + GAQ 511 9
15 symmetric + GAQ 1023 10
16 symmetric + GAQ 2047 11
17 symmetric + GAQ 4095 12
18 symmetric + GAQ 16,383 14
19 symmetric + GAQ 65,535 16
E3.3.4 Quantization
Depending on the bit allocation pointer (hebap) calculated in Section E3.3.3.1, the mantissa values
are coded using either vector quantization or gain-adaptive quantization. The following section
describes both of these coding techniques.
mantissa vector and the table vector. The index of the closest matching vector is then transmitted
to the decoder.
In the decoder, the index is read from the bit stream and the mant values are replaced with the
values from the appropriate vector table.
produce the gain attenuation element corresponding to each DCT mantissa block identified in the
bit stream. The switch position is also derived by the deformatter for each GAQ-coded mantissa.
The switch position is determined from the presence or absence of a unique bit stream tag, as
discussed in the next paragraph. When the deformatting operation is complete, the dequantized
and level-adjusted mantissas are available for the next stage of processing.
As a means for signaling the two mantissa lengths to the decoder, quantizer output symbols
for large mantissas are flagged in the bit stream using a unique identifier tag. In Enhanced AC-3,
the identifier tag is the quantizer symbol representing a full-scale negative output (e.g., the ‘100’
symbol for a 3-bit two’s complement quantizer). In a conventional mid-tread quantizer, this
symbol is often deliberately unused since it results in an asymmetric quantizer characteristic. In
gain-adaptive quantization, this symbol is employed to indicate the presence of a large mantissa.
The tag length is equal to the length of the small mantissa codeword (computed from hebap[] and
gaqgain[]), allowing unique bit stream decoding. If an identifier tag is found, additional bits
immediately following the tag (also of known length) convey the quantizer output level for the
corresponding large mantissas.
Four different gain transmission modes are available for use in the encoder. The different
modes employ switched 0, 1 or 1.67-bit gains. For each independent, coupling, and LFE channel
in which AHT is in use, a 2-bit parameter called gaqmod is transmitted once per frame to the
decoder. The bitstream parameters, values, and active hebap range are shown for each mode in
Table E3.3. If gaqmod = 0x0, GAQ is not in use and no gains are present in the bitstream. If gaqmod
= 0x1, a 1-bit gain value is present for each block of DCT coefficients having an hebap value
between 8 and 11, inclusive. Coefficients with hebap higher than 11 are decoded using the same
quantizer as gaqmod 0x0. If gaqmod = 0x2 or 0x3, gain values are present for each block of DCT
coefficients having an hebap value between 8 and 16, inclusive. Coefficients with hebap higher
than 16 are decoded using the same quantizer as gaqmod 0x0. The difference between the two last
modes lies in the gain word length, as shown in Table E3.3.
For the case of gaqmod = 0x1 and 0x2, the gains are coded using binary 0 to signal Gk = 1, and
binary 1 to signal Gk = 2 or 4. For the case of gaqmod = 0x3, the gains are composite-coded in
triplets (three 3-state gains packed into 5-bit words). The gains are unpacked in a manner similar
to exponent unpacking as described in the main body of this document. For example, for a 5-bit
composite gain triplet grpgain:
M1 = truncate (grpgain / 9)
M2 = truncate ((grpgain % 9) / 3)
M3 = (grpgain % 9) % 3
In this example, M1, M2, and M3 correspond to mapped values derived from consecutive
gains in three ascending frequency blocks, respectively, each ranging in value from 0 to 2
inclusive as shown in Table E3.4.
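As an informative illustration only, unpacking one composite gain triplet per the expressions above might be sketched as follows; the mapping of the values M = 0, 1, 2 to gains Gk = 1, 2, 4 follows Table E3.4, which is not reproduced in this excerpt and is assumed here.

/* Informative sketch: unpack a 5-bit composite GAQ gain triplet (gaqmod = 0x3). */
void unpack_gaq_triplet(int grpgain, int gk[3])
{
    static const int gain_of_m[3] = { 1, 2, 4 };   /* assumed Table E3.4 mapping */
    int m1 = grpgain / 9;                          /* truncate(grpgain / 9) */
    int m2 = (grpgain % 9) / 3;                    /* truncate((grpgain % 9) / 3) */
    int m3 = (grpgain % 9) % 3;
    gk[0] = gain_of_m[m1];
    gk[1] = gain_of_m[m2];
    gk[2] = gain_of_m[m3];
}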
Details of the GAQ quantizer characteristics are shown in Table E3.5. If the received gain is 1,
or no gain was received at all, a single quantizer with no tag is used. If the received gain is either
2 or 4, both the small and large mantissas (and associated tags) must be decoded using the
quantizer characteristics shown. Both small and large mantissas are decoded by interpreting them
as signed two’s complement fractional values. The variable m in the table represents the number
of mantissa bits associated with a given hebap value as shown in Table E3.2.
Since the large mantissas are coded using a dead-zone quantizer, a post-processing step is
required to transform (remap) large mantissa codewords received by the decoder into a
reconstructed mantissa. This remapping is applied when Gk = 2 or 4. An identical post-processing
step is required to implement a symmetric quantizer characteristic when Gk = 1, and for all gaqmod
= 0x0 quantizers. The post-process is a computation of the form y = x + ax + b. In this equation, x
represents a mantissa codeword (interpreted as a signed two’s complement fractional value), and
the constants a and b are provided in Table E3.6. The constants are also interpreted as 16-bit
signed two’s complement fractional values. The expression for y was arranged for
implementation convenience so that all constants will have magnitude less than one. For decoders
where this is not a concern, the remapping can be implemented as y = a’x + b, where the new
coefficient a’ = 1 + a. The sign of x must be tested prior to retrieving b from the table. Remapping
is not applicable to the table entries marked N/A.
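As an informative illustration only, the remapping described above might be sketched as follows; the constants a and b are taken from Table E3.6, which is not reproduced in this excerpt, and the separate b values for positive and negative x reflect the sign test described above.

/* Informative sketch: remap a mantissa codeword x (interpreted as a signed
 * fraction) as y = x + a*x + b, equivalently y = (1 + a)*x + b. */
double remap_mantissa(double x, double a, double b_positive, double b_negative)
{
    double b = (x >= 0.0) ? b_positive : b_negative;   /* sign of x selects b */
    return x + a * x + b;
}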
In the AHT inverse transform equation (not reproduced in this excerpt):
Rj = 1 for j ≠ 0
Rj = 1/2 for j = 0
where k is the bin index, m is the block index, and j is the AHT transform index.
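As an informative illustration only, the general form of a length-6 inverse DCT (Type II) using the Rj weighting above is sketched below; the exact normalization follows the standard's equation, which is not reproduced in this excerpt, and is omitted here.

#include <math.h>

/* Informative sketch: reconstruct the six per-block values for one bin from the six
 * AHT DCT coefficients coeff[j], applying the Rj weighting defined above. */
void aht_inverse_dct6(const double coeff[6], double block[6])
{
    const double pi = 3.14159265358979323846;
    for (int m = 0; m < 6; m++) {                 /* m: block index */
        double sum = 0.0;
        for (int j = 0; j < 6; j++) {             /* j: AHT transform index */
            double rj = (j == 0) ? 0.5 : 1.0;     /* Rj as defined above */
            sum += rj * coeff[j] * cos(pi * j * (2 * m + 1) / 12.0);
        }
        block[m] = sum;                           /* overall scale factor omitted */
    }
}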
E3.4.1 Overview
Enhanced channel coupling is a spatial coding technique that elaborates on conventional channel
coupling, principally by adding phase compensation, a de-correlation mechanism, variable time
constants, and more compact amplitude representation. The intent is to reduce coupling
cancellation artifacts in the encode process by adjusting inter-channel phase before downmixing,
and to improve dimensionality of the reproduced signal by restoring the phase angles and degrees
of correlation in the decoder. This also allows the process to be used at lower frequencies than
conventional channel coupling.
The decoder converts the enhanced coupling channel back into individual channels
principally by applying an amplitude scaling and phase adjustment for each channel and
frequency sub-band. Additional processing occurs when transients are indicated in one or more
channels.
Note: At 32 kHz sampling rate the sub-band frequency ranges are 2/3 the values of
those for 48 kHz.
The enhanced coupling sub-bands are combined into enhanced coupling bands for which
coupling coordinates are generated (and included in the bit stream). The coupling band structure
is indicated by ecplbndstrc[sbnd]. Each bit of the ecplbndstrc[] array indicates whether the sub-band
indicated by the index is combined into the previous (lower in frequency) enhanced coupling
band. Enhanced coupling bands are thus made from integral numbers of enhanced coupling sub-
bands. (See Section E2.3.3.19.)
1. Define the MDCT transform coefficient buffers for the previous, current and next blocks (of
length k = 0, 1,…,N/2–1 where N = 512) as:
2. Compute the windowed time domain samples xPREV[n], xCURR[n] and xNEXT[n] using the 512-
sample IMDCT (as described in steps 1 to 5 of Section 7.9.4.1 in the main body of this
document).
3. Overlap and add the second half of the previous sample block and the first half of the next
sample block with the current sample block as follows:
Pseudo Code
for (n=0; n<N/2; n++)
{
pcm[n] = xPREV[n+N/2] + xCURR[n];
pcm[n+N/2] = xCURR[n+N/2] + xNEXT [n];
}
4. Adjust the enhanced coupling channel samples such that the following DFT (FFT) output is
an oddly stacked filterbank (as per the MDCT). The window w[n] is defined in Table 7.33 in
the main body of this document.
Pseudo Code
for (n=0; n<N/2; n++)
{
pcm_real[n] = pcm[n] * w[n] * xcos3[n];
pcm_real[n+N/2] = pcm[n+N/2] * w[N/2-n-1] * xcos3[n+N/2];
pcm_imag[n] = pcm[n] * w[n] * xsin3[n];
pcm_imag[n+N/2] = pcm[n+N/2] * w[N/2-n-1] * xsin3[n+N/2];
}
Where:
xcos3[n] = cos(π * n / N) ;
xsin3[n] = -sin(π * n / N) ;
5. Perform a Discrete Fourier Transform (as an FFT) on the complex samples to create the
complex frequency coefficients Z[k], k = 0, 1,…,N–1
Z[k] = (1/N) × Σ (n = 0 to N–1) [ ( pcm_real[n] + j × pcm_imag[n] ) × ( cos(2πkn/N) – j × sin(2πkn/N) ) ]
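As an informative illustration only, a direct (non-FFT) evaluation of the equation above is sketched below; in practice an FFT of length N would normally be used.

#include <math.h>

/* Informative sketch: compute Z[k] = Zr[k] + j*Zi[k] from the windowed real and
 * imaginary sample arrays, exactly as written in the equation above. */
void dft_direct(const double pcm_real[], const double pcm_imag[], int N,
                double Zr[], double Zi[])
{
    const double pi = 3.14159265358979323846;
    for (int k = 0; k < N; k++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            double c = cos(2.0 * pi * k * n / N);
            double s = sin(2.0 * pi * k * n / N);
            /* (pcm_real + j*pcm_imag) * (cos - j*sin) */
            re += pcm_real[n] * c + pcm_imag[n] * s;
            im += pcm_imag[n] * c - pcm_real[n] * s;
        }
        Zr[k] = re / N;
        Zi[k] = im / N;
    }
}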
Pseudo Code
if (ecplamp[ch][bnd] == 31)
{
amp[ch][bnd] = 0;
}
else
{
amp[ch][bnd] = ( ecplampmanttab[ecplamp[ch][bnd]] / 32 ) >> ecplampexptab[ecplamp[ch][bnd]];
}
Modifications are made to the amplitude values using the transmitted chaos measure and
transient parameter. Firstly, chaos values for each enhanced coupling band [bnd] in each channel
[ch] are obtained from the ecplchaos parameters as follows.
Pseudo Code
if (ch == firstchincpl)
{
chaos[ch][bnd] = 0;
}
else
{
chaos[ch][bnd] = ecplchaostab[ecplchaos[ch][bnd]];
}
Pseudo Code
if ( (ecpltrans[ch] == 0) && (ch != firstchincpl) )
{
amp[ch][bnd] *= 1 + 0.38 * chaos[ch][bnd];
}
Using the ecplbndstrc[] array, the amplitude values amp[ch][bnd] which apply to enhanced
coupling bands are converted to values which apply to enhanced coupling sub-bands amp[ch][sbnd]
by duplicating values as indicated by values of ‘1’ in ecplbndstrc[]. Amplitude values for individual
transform coefficients [bin] are then reconstructed as follows.
Pseudo Code
bnd = -1;
for (sbnd=ecpl_start_sbnd; sbnd<ecpl_end_sbnd; sbnd++)
{
if (ecplbndstrc[sbnd] == 0)
{
bnd++;
}
for (bin=ecplsubbndtab[sbnd]; bin<ecplsubbndtab[sbnd+1]; bin++)
{
amp[ch][bin] = amp[ch][bnd];
}
}
Pseudo Code
if (ch == firstchincpl)
{
angle[ch][bnd] = 0;
}
else
{
angle[ch][bnd] = ecplangletab[ecplangle[ch][bnd]];
}
The above band angle values are used to derive bin angle values associated with individual
transform coefficients in one of two ways depending on the ecplangleintrp flag.
If ecplangleintrp is set to 0, then no interpolation is used and the band angle values are applied to
bin angle values according to the ecplbndstrc[] array.
If ecplangleintrp is set to 1, then the band angle values are converted to bin angle values using
linear interpolation between the centers of each band. The following pseudo code interpolates the
band angles (angle[ch][bnd]) into bin angles (angle[ch][bin]) for channel [ch].
Pseudo Code
if (ecplangleintrp == 1)
{
bin = ecplsubbndtab[ecpl_start_subbnd];
for (bnd = 1; bnd < nbands; bnd++)
{
nbins_prev = nbins_per_bnd_array[bnd-1]; /* array of length nbands containing band sizes */
nbins_curr = nbins_per_bnd_array[bnd];
angle_prev = angle[ch][bnd-1];
angle_curr = angle[ch][bnd];
while ((angle_curr - angle_prev) > 1.0) angle_curr -= 2.0;
while ((angle_prev - angle_curr) > 1.0) angle_curr += 2.0;
slope = (angle_curr - angle_prev)/((nbins_curr + nbins_prev)/2.0); /* floating point calculation */
/* do lower half of first band */
if ((bnd == 1) && (nbins_prev > 1))
{
if (iseven(nbins_prev)) /* iseven() returns 1 if value is even, 0 if value is odd */
{
y = angle_prev - slope/2;
bin = nbins_prev/2 - 1;
}
else
{
y = angle_prev - slope;
bin = (nbins_prev - 3)/2;
}
count = bin + 1;
for (j = 0; j < count; j++)
{
ytmp = y;
while (y > 1.0) y -= 2.0;
while (y < (-1.0)) y += 2.0;
angle[ch][bin--] = y;
y = ytmp;
y -= slope;
}
bin = count;
}
if (iseven(nbins_prev))
{
y = angle_prev + slope/2;
count = nbins_curr/2 + nbins_prev/2; /* integer calculation */
}
else {
y = angle_prev;
count = nbins_curr/2 + (nbins_prev + 1)/2; /* integer calculation */
}
across each bin within a subband. The chaos and random values are then used to modify each
angle value as follows.
Pseudo Code
if (ecpltrans[ch] == 0)
{
rand[ch][bin] = rand_notrans[ch][bin]
}
else
{
rand[ch][bin] = rand_trans[ch][bin]
}
angle[ch][bin] += chaos[ch][bin] * rand[ch][bin];
if (angle[ch][bin] < -1.0)
{
angle[ch][bin] += 2.0;
}
else if(angle[ch][bin] >= 1.0)
{
angle[ch][bin] -= 2.0;
}
Pseudo Code
Zr[ch][bin] = Zr[bin] * amp[ch][bin] * cos(π * angle[ch][bin]) - Zi[bin] * amp[ch][bin] * sin(π * angle[ch][bin]);
Zi[ch][bin] = Zi[bin] * amp[ch][bin] * cos(π * angle[ch][bin]) + Zr[bin] * amp[ch][bin] * sin(π * angle[ch][bin]);
chmant[ch][bin] = -2 * ( y[bin] * Zr[ch][bin] + y[N/2-1-bin] * Zi[ch][bin] );
Where:
Zr[bin] = real(Z[k]);
Zi[bin] = imag(Z[k]);
y[bin] = cos(2π * (N/4 + 0.5) / N * (k + 0.5));
for bin=k=0,1,…,N/2–1
E3.5.1 Overview
When spectral extension is in use, high frequency transform coefficients of the channels that are
participating in spectral extension are synthesized. Transform coefficient synthesis involves
copying low frequency transform coefficients, inserting them as high frequency transform
coefficients, blending the inserted transform coefficients with pseudo-random noise, and scaling
the blended transform coefficients to match the coarse (banded) spectral envelope of the original
signal. To enable the decoder to scale the blended transform coefficients to match the spectral
envelope of the original signal, scale factors are computed by the encoder and transmitted to the
decoder on a banded basis for all channels participating in the spectral extension process. For a
given channel and spectral extension band, the blended transform coefficients for that channel and
band are multiplied by the scale factor associated with that channel and band.
The spectral extension process is performed beginning at the spectral extension begin
frequency, and ending at the spectral extension end frequency. The spectral extension begin
frequency is derived from the spxbegf bit stream parameter. The spectral extension end frequency
is derived from the spxendf bit stream parameter.
In some cases, it may be desirable to use channel coupling for a mid-range portion of the
frequency spectrum and spectral extension for the higher-range portion of the frequency
spectrum. In this configuration, the highest coupled transform coefficient number must be 1 less
than the lowest transform coefficient number generated by spectral extension.
Pseudo Code
nspxbnds = 1;
spxbndsztab[0] = 12;
for (bnd = spxbegf+1; bnd < spxendf; bnd ++)
{
if (spxbndstrc[bnd] == 0)
{
spxbndsztab[nspxbnds] = 12;
nspxbnds++;
}
else
{
spxbndsztab[nspxbnds - 1] += 12;
}
}
previous spectral extension coordinates should be reused. If (spxcoe[ch] == 1), spectral extension
coordinates are present in the bit stream for channel [ch].
When present in the bit stream, spectral extension coordinates are transmitted in a floating
point format. The exponent is sent as a 4-bit value (spxcoexp[ch][bnd]) indicating the number of right
shifts which should be applied to the fractional mantissa value. The mantissas are sent as 2-bit
values (spxcomant[ch][bnd]) which must be properly scaled before use. Mantissas are unsigned
values so a sign bit is not used. Except for the limiting case where the exponent value = 15, the
mantissa value is known to be between 0.5 and 1.0. Therefore, when the exponent value < 15, the
msb of the mantissa is always equal to ‘1’ and is not transmitted; the next 2 bits of the mantissa
are transmitted. This provides one additional bit of resolution. When the exponent value = 15 the
mantissa value is generated by dividing the 2-bit value of spxcomant by 4. When the exponent
value is < 15 the mantissa value is generated by adding 4 to the 2-bit value of spxcomant and then
dividing the sum by 8.
Spectral extension coordinate dynamic range is increased beyond what the 4-bit exponent can
provide by the use of a per channel 2-bit master spectral extension coordinate (mstrspxco[ch]) which
is used to scale all of the spectral extension coordinates within that channel. The exponent values
for each channel are increased by 3 times the value of mstrspxco which applies to that channel. This
increases the dynamic range of the spectral extension coordinates by an additional 54 dB.
The following pseudo code indicates how to generate the spectral extension coordinate (spxco)
for each spectral extension band [bnd] in each channel [ch].
Pseudo Code
if (spxcoexp[ch][bnd] == 15)
{
spxco_temp[ch][bnd] = spxcomant[ch][bnd] / 4;
}
else
{
spxco_temp[ch][bnd] = (spxcomant[ch][bnd] + 4) / 8;
}
spxco[ch][bnd] = spxco_temp[ch][bnd] >> (spxcoexp[ch][bnd] + 3*mstrspxco[ch]);
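As a purely hypothetical worked example, with spxcomant = 1, spxcoexp = 2, and mstrspxco = 1, spxco_temp = (1 + 4) / 8 = 0.625 and spxco = 0.625 / 2^(2 + 3) = 0.625 / 32 ≈ 0.0195 (the right shift denotes division by the corresponding power of two).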
derived from the bit stream parameter of the same name, is used as the index into a table to
determine the last transform coefficient to be inserted.
Transform coefficient translation is performed on a banded basis. For each spectral extension
band, coefficients are copied sequentially starting with the transform coefficient at copyindex and
ending with the transform coefficient at (copyindex + bandsize – 1). Transform coefficients are
inserted sequentially starting with the transform coefficient at insertindex and ending with the
transform coefficient at (insertindex + bandsize – 1).
Prior to beginning the translation process for each band, the value of (copyindex + bandsize – 1)
is compared to the copyendmant parameter. If (copyindex + bandsize – 1) is greater than or equal to the
copyendmant parameter, the copyindex parameter is reset to the copystartmant parameter and
wrapflag[bnd] is set to 1. Otherwise, wrapflag[bnd] is set to 0.
The following pseudo code indicates how the spectral component translation process is
carried out for channel [ch].
Pseudo Code
copystartmant = spxbandtable[spxstrtf];
copyendmant = spxbandtable[spxbegf];
copyindex = copystartmant;
insertindex = copyendmant;
for (bnd = 0; bnd < nspxbnds; bnd++)
{
bandsize = spxbndsztab[bnd];
if ((copyindex + bandsize) > copyendmant)
{
copyindex = copystartmant;
wrapflag[bnd] = 1;
}
else
{
wrapflag[bnd] = 0;
}
for (bin = 0; bin < bandsize; bin++)
{
if (copyindex == copyendmant)
{
copyindex = copystartmant;
}
tc[ch][insertindex] = tc[ch][copyindex];
insertindex++;
copyindex++;
}
}
Pseudo Code
noffset[ch] = spxblend[ch] / 32.0;
spxmant = spxbandtable[spxbegf];
if (spxcoe[ch])
{
for (bnd = 0; bnd < nspxbnds; bnd++)
{
bandsize = spxbndsztab[bnd];
nratio = ((spxmant + 0.5*bandsize) / spxbandtable[spxendf]) - noffset[ch];
The following pseudo code indicates how to compute the banded RMS energy of the
translated transform coefficients for channel [ch].
Pseudo Code
spxmant = spxbandtab[spxbegf];
for (bnd = 0; bnd < nspxbnds; bnd++)
{
bandsize = spxbndsztab[bnd];
accum = 0;
for (bin = 0; bin < bandsize; bin++)
{
accum = accum + (tc[ch][spxmant] * tc[ch][spxmant]);
spxmant++;
}
rmsenergy[ch][bnd] = squareroot(accum / bandsize);
}
Pseudo Code
if (chinspxatten[ch])
{
/* apply notch filter at baseband / extension region border */
filtbin = spxbandtable[spxbegf] - 2;
for (binindex = 0; binindex < 3; binindex++)
{
tc[ch][filtbin] *= spxattentab[spxattencod[ch]][binindex];
filtbin++;
}
for (binindex = 1; binindex >= 0; binindex--)
{
tc[ch][filtbin] *= spxattentab[spxattencod[ch]][binindex];
filtbin++;
}
filtbin += spxbndsztab[0];
/* apply notch at all other wrap points */
for (bnd = 1; bnd < nspxbnds; bnd++)
{
if (wrapflag[bnd]) /* wrapflag[bnd] set during transform coefficient translation */
{
filtbin = filtbin - 5;
for (binindex = 0; binindex < 3; binindex++)
{
tc[ch][filtbin] *= spxattentab[spxattencod[ch]][binindex];
filtbin++;
}
for (binindex = 1; binindex >= 0; binindex--)
{
tc[ch][filtbin] *= spxattentab[spxattencod[ch]][binindex];
filtbin++;
}
}
filtbin += spxbndsztab[bnd];
}
}
The following pseudo code indicates how the translated transform coefficients and pseudo-
random noise for a channel [ch] are blended. The function noise() returns a pseudo-random number
generated from a zero-mean, unity-variance noise generator.
Pseudo Code
spxmant = spxbandtable[spxbegf];
for (bnd = 0; bnd < nspxbnds; bnd++)
{
bandsize = spxbndsztab[bnd];
nscale = rmsenergy[ch][bnd] * nblendfact[ch][bnd];
sscale = sblendfact[ch][bnd];
for (bin = 0; bin < bandsize; bin++)
{
tctemp = tc[ch][spxmant];
ntemp = noise();
tc[ch][spxmant] = tctemp * sscale + ntemp * nscale;
spxmant++;
}
}
Pseudo Code
spxmant = spxbandtable[spxbegf];
for (bnd = 0; bnd < nspxbnds; bnd++)
{
bandsize = spxbndsztab[bnd];
spxcotemp = spxco[ch][bnd];
for (bin = 0; bin < bandsize; bin++)
{
tctemp = tc[ch][spxmant];
tc[ch][spxmant] = tctemp * spxcotemp * 32;
spxmant++;
}
}
E3.6.1 Overview
When transient pre-noise processing is used, decoded PCM audio located prior to transient
material is used to overwrite the transient pre-noise, thereby improving the perceived quality of
low-bit rate audio coded transient material. To enable the decoder to efficiently perform transient
pre-noise processing with minimal decoding complexity, transient location detection and time
scaling synthesis analysis are performed by the encoder and the resulting information is transmitted to the
decoder. The encoder performs transient pre-noise processing for each full bandwidth audio
channel and transmits the information once per frame. The transmitted transient location and time
scaling synthesis information are relative to the first decoded PCM sample contained in the audio
frame containing the bit stream information. It should be noted that it is possible for the time
scaling synthesis parameters contained in audio frame N, to reference PCM samples and
transients located in audio frame N + 1, but this does not create a requirement for multi-frame
decoding.
[Figure E3.2 (not reproduced in this excerpt): panels a) through c) illustrate the transient location (4 × transprocloc[ch] samples from the first decoded PCM sample), the processing length transproclen[ch], the span transproclen[ch] + PN + TC1, and the time scaling synthesis buffer.]
TC1 is a time scaling synthesis system parameter equal to 256 samples. The first sample of the time scaling synthesis buffer is
located (2*TC1 + 2*PN) samples before the location of the transient.
Figure E3.2c outlines how the time scaling synthesis buffer is used along with the
transproclen[ch] parameter to remove the transient pre-noise. As shown in Figure E3.2c the original
decoded audio data is cross-faded with the time scaling synthesis buffer starting at the sample
located (PN + TC1 + transproclen[ch]) samples before the location of the transient. The length of the
cross-fade is TC1 or 256 samples. Nearly any pair of constant amplitude cross-fade windows may
be used to perform the overlap-add between the original data and the synthesis buffer, although
standard Hanning windows have been shown to provide good results. The time scaling synthesis
buffer is then used to overwrite the decoded PCM audio data that is located before the transient,
including the transient pre-noise. This overwriting continues until TC2 samples before the
transient where TC2 is another time scaling synthesis system parameter equal to 128 samples. At
TC2 samples before the transient, the time scaling synthesis audio buffer is cross-faded with the
original decoded PCM data using a set of constant amplitude cross-fade windows.
The following pseudo code outlines how to implement the transient pre-noise time scaling
synthesis functionality in the decoder for a single full bandwidth channel, [ch].
Where:
win_fade_out1 = TC1 sample length cross-fade out window (unity to zero in value)
win_fade_in1 = TC1 sample length cross-fade in window (zero to unity in value)
win_fade_out2 = TC2 sample length cross-fade out window (unity to zero in value)
win_fade_in2 = TC2 sample length cross-fade in window (zero to unity in value)
Pseudo Code
/* unpack the transient location relative to first decoded pcm sample. */
transloc = transprocloc[ch];
/* unpack time scaling length relative to first decoded pcm sample. */
translen = transproclen[ch];
/* compute the transient pre-noise length using audio coding block first sample, aud_blk_samp_loc. */
pnlen = (transloc - aud_blk_samp_loc);
/* compute the total number of samples corrected in the output buffer. */
tot_corr_len = (pnlen + translen + TC1);
/* create time scaling synthesis buffer from decoded output pcm buffer, pcm_out[ ]. */
for (samp = 0; samp < (2*TC1 + pnlen); samp++)
{
synth_buf[samp] = pcm_out[(transloc - (2*TC1 + 2*pnlen) + samp)];
}
/* use time scaling synthesis buffer to overwrite and correct pre-noise in output pcm buffer. */
start_samp = (transloc - tot_corr_len);
for (samp = 0; samp < TC1; samp++)
{
pcm_out[start_samp + samp] = (pcm_out[start_samp + samp] * win_fade_out1[samp]) +
(synth_buf[samp] * win_fade_in1[samp]);
}
for (samp = TC1; samp < (tot_corr_len - TC2); samp++)
{
pcm_out[start_samp + samp] = synth_buf[samp];
}
for (samp = (tot_corr_len - TC2); samp < tot_corr_len; samp++)
{
pcm_out[start_samp + samp] = (pcm_out[start_samp + samp] * win_fade_in2[samp]) +
(synth_buf[samp] * win_fade_out2[samp]);
}
E3.7.1 Overview
An Enhanced AC-3 bit stream must consist of at least one independently decodable stream (type 0
or 2). Optionally, Enhanced AC-3 bit streams may consist of multiple independent substreams
(type 0 or 2) or a combination of multiple independent (type 0 and 2) and multiple dependent
(type 1) substreams.
Figure E3.3 Bitstream with a single program of greater than 5.1 channels.
The reference enhanced AC-3 decoder must be able to decode independent substream 0, and
skip over any additional independent and dependent substreams present in the bit stream.
Optionally, Enhanced AC-3 decoders may use the information present in the acmod, lfeon,
strmtyp, substreamid, chanmape, and chanmap bit stream parameters to decode bit streams with a single
program with greater than 5.1 channels, multiple programs of up to 5.1 channels, or a mixture of
programs with up to 5.1 channels and programs with greater than 5.1 channels.
E3.7.4 Decoding a Mixture of Programs with up to 5.1 Channels and Programs with Greater than
5.1 Channels
When an Enhanced AC-3 bit stream contains multiple independent and dependent substreams,
each independent substream and its associated dependent substreams correspond to an
independent audio program. The application interface may inform the decoder which independent
audio program should be decoded by selecting a specific independent substream ID. The decoder
should then only decode the desired independent substream and all its associated dependent
substreams, and skip over all other independent substreams and their associated dependent
substreams. If the selected independent audio program contains greater than 5.1 channels, the
decoder should decode the selected independent audio program as explained in Section E3.7.2.
The default program selection should always be Program 1.
In some cases, it may be desirable to decode multiple independent audio programs. In these
cases, the application interface should inform the decoder which independent audio programs to
decode by selecting specific independent substream ID’s. The decoder should then decode the
desired independent substreams and their associated dependent substreams, and skip over all
other independent substreams and associated dependent substreams present in the bit stream. (See
Figure E3.5.)
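As an informative illustration only, the selection behavior described above might be sketched as follows. The association of a dependent substream with its program is assumed here to follow from bit stream order (a dependent substream belongs to the most recently received independent substream); the types and the selected[] array are illustrative, not defined by this Annex.

/* Informative sketch: decide, per syncframe and in bit stream order, whether to
 * decode or skip a substream given the set of selected program IDs. */
typedef struct { int strmtyp; int substreamid; } SubstreamInfo;

static int program_is_selected(int program_id, const int selected[], int nselected)
{
    for (int i = 0; i < nselected; i++)
        if (program_id == selected[i])
            return 1;
    return 0;
}

int should_decode(const SubstreamInfo *s, const int selected[], int nselected,
                  int *current_program)
{
    if (s->strmtyp != 1)                     /* independent substream starts a program */
        *current_program = s->substreamid;
    return program_is_selected(*current_program, selected, nselected);
}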
E3.7.5 Dynamic Range Compression for Programs Containing Greater than 5.1 Channels
A program using channel extensions to convey greater than 5.1 channels may require two
different sets of compr and dynrng metadata words: one set for the 5.1 channel downmix carried by
independent substream 0 and a separate set for the complete (greater than 5.1 channel) mix. If a
decoder is reproducing the complete mix, the compr and dynrng metadata words carried in
independent substream 0 shall be ignored. The decoder shall instead use the compr and dynrng
metadata words carried by the associated dependent substream. If multiple associated dependent
substreams are present, only the last dependent substream may carry compr and dynrng metadata
words, and these metadata words shall apply to all substreams in the program, including the
independent substream.
Figure E3.5 Bitstream with mixture of programs of up to 5.1 channels and programs of greater
than 5.1 channels.
The compre bit is used by the decoder to determine which dependent substream in a program is
the last dependent substream of the program. Therefore, the compre bit in the last dependent
substream of a program must be set to 1, and the compre bit in all other dependent substreams of
the program must be set to 0. Additionally, the compr2e, dynrnge, and dynrng2e bits for all but the last
dependent substream of a program must be set to 0. The compr2e, dynrnge, and dynrng2e bits for the
last dependent substream shall be set as required to transmit the proper compr2, dynrng, and dynrng2
words for the program.
Note that the compr2e, compr2, dynrng2e, and dynrng2 metadata words are only present in the bit
stream when acmod = 0.
Pseudo Code
if ((output mode == 1/0 or 2/0) && (lfeoutput == disabled) && (lfemixlevcode == 1))
{
mix LFE into left with (LFE mix level - 4.5) dB gain
mix LFE into right with (LFE mix level - 4.5) dB gain
}
if (output mode == 1/0)
{
mix left into center with -6 dB gain
mix right into center with -6 dB gain
}
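For concreteness, here is a minimal C rendering of the pseudo code above; the OutputMode enumeration, buffer layout, and parameter names are assumptions made for illustration, and the gain values simply mirror the pseudo code.

    #include <math.h>
    #include <stdbool.h>

    typedef enum { MODE_1_0, MODE_2_0, MODE_OTHER } OutputMode;  /* illustrative */

    static double db_to_lin(double db) { return pow(10.0, db / 20.0); }

    /* left, right, center, and lfe each hold n PCM samples of one audio block
       (hypothetical buffer layout). */
    void mix_lfe_and_downmix(double *left, double *right, double *center,
                             const double *lfe, int n, OutputMode mode,
                             bool lfe_output_enabled, bool lfemixlevcode,
                             double lfe_mix_level_db)
    {
        if ((mode == MODE_1_0 || mode == MODE_2_0) &&
            !lfe_output_enabled && lfemixlevcode) {
            /* Mix LFE into left and right with (LFE mix level - 4.5) dB gain. */
            double g = db_to_lin(lfe_mix_level_db - 4.5);
            for (int i = 0; i < n; i++) {
                left[i]  += g * lfe[i];
                right[i] += g * lfe[i];
            }
        }
        if (mode == MODE_1_0) {
            /* Mix left and right into center with -6 dB gain each. */
            double g = db_to_lin(-6.0);
            for (int i = 0; i < n; i++)
                center[i] += g * (left[i] + right[i]);
        }
    }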
Table E4.7 VQ Table for hebap 7; 16-bit two’s complement (Continued)
Index   val[index][0]   val[index][1]   val[index][2]   val[index][3]   val[index][4]   val[index][5]
(all val[index][n] values are 16-bit two's complement)
102 0x1842 0x25e2 0x3b33 0x9fa6 0xa815 0xf061
103 0xf934 0xfdaf 0x0447 0xe19d 0x61e2 0x15e1
104 0x53a7 0xfe50 0xf986 0xe50e 0xfa62 0xc78a
105 0xe4e1 0x02bc 0xd095 0xfd17 0xa185 0x57c2
106 0x188f 0x0cd3 0x2afe 0x0f04 0x4af0 0x39bd
107 0xa81a 0x3baa 0x1543 0xf508 0xfc36 0xf2f1
108 0x0cb9 0xf184 0x1288 0xdf93 0x591e 0xd820
109 0x5f1a 0xae16 0x4d86 0x03db 0xd14a 0xe77b
110 0x0f42 0xb30b 0x3304 0xf9b7 0x48d1 0x1d2a
111 0x98d7 0xa7eb 0x3fb1 0x07de 0x2adf 0x4670
112 0xe481 0x122f 0xc61e 0x4933 0x3dad 0x0510
113 0x245e 0xf96f 0x394b 0xf302 0x67a7 0xd1b3
114 0x1660 0x171d 0x3458 0x2724 0xf744 0x9f80
115 0x06cd 0xe5b9 0x3197 0xaa07 0x0ff0 0x154a
116 0xf5c3 0x24b1 0x5297 0x9aae 0xf3a6 0xf61f
117 0x50c0 0x49ce 0xc98d 0x1b4e 0xdfbc 0x3dc3
118 0xa2f6 0x2baf 0xcab9 0x2e5c 0x3ead 0x0a46
119 0x47b9 0xd814 0x033d 0x0358 0xfc0e 0x009d
120 0x3840 0xedba 0x1421 0xcc16 0x94d6 0xd4ec
121 0x546d 0x2bf8 0x442d 0x1db4 0x334a 0xfe1c
122 0x0007 0x04d4 0x023d 0x1076 0x15c8 0xf3f7
123 0x0394 0xdc7c 0x0505 0xdd02 0x04a1 0x8fe5
124 0x5453 0x5c8f 0x4aac 0xf4bb 0xc836 0xdf0a
125 0x5b76 0xe7ef 0x32b2 0x0bf5 0xdb79 0x08bc
126 0xf402 0xe350 0xb154 0x169c 0x0246 0xfdd9
127 0xf067 0x013b 0xe1a3 0x2020 0x924e 0xcf4f
128 0x35c6 0xc403 0x4b05 0xaf70 0x32f3 0xb4d1
129 0x0ec1 0xff4f 0x1f5d 0xfc17 0x4594 0x142a
130 0xe374 0xef19 0xb950 0xfd94 0xfaba 0x3a54
131 0x39a4 0xfb3b 0xcded 0xc5b6 0xfddd 0x69f5
132 0x08ba 0x06ac 0x0acc 0x1528 0x1f32 0x9db5
133 0x0b39 0x0e34 0x0f98 0x14e0 0x279e 0x530b
134 0x0486 0x1503 0x01fc 0xd6ee 0x0122 0xf9b1
135 0x045a 0x60d5 0x40bf 0x9db0 0xfed6 0xf4f0
136 0xfbad 0xe800 0xf882 0xe191 0xf465 0xa514
137 0x0fb0 0x2a29 0x43a5 0xef0a 0xae0a 0xf2c9
138 0xee72 0xff31 0xd921 0xf209 0x1f0b 0x0482
139 0xe268 0x1fb5 0xc921 0x4256 0x98a7 0x946c
140 0xc4c4 0x3ee0 0xbe34 0xdd4a 0xa358 0x3e22
141 0x615a 0x1630 0xf8ae 0x01a4 0x0084 0x0075
142 0xfe06 0xb492 0xff3a 0x019c 0xfec9 0x02f0
143 0xf88e 0x0f8d 0xe1f8 0x40b6 0xb4a5 0xc67e