
Wang et al. EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:9
http://asmp.eurasipjournals.com/content/2013/1/9

RESEARCH Open Access

Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate

Jing Wang*, Xuan Ji, Shenghui Zhao, Xiang Xie and Jingming Kuang

Abstract
This paper presents a novel lossless compression technique of the context-based adaptive arithmetic coding which
can be used to further compress the quantized parameters in audio codec. The key feature of the new technique is
the combination of the context model in time domain and frequency domain which is called time-frequency context
model. It is used for the lossless compression of audio coding parameters such as the quantized modified discrete
cosine transform (MDCT) coefficients and the frequency band gains in ITU-T G.719 audio codec. With the proposed
adaptive arithmetic coding, a high degree of adaptation and redundancy reduction can be achieved. In addition, an
efficient variable rate algorithm is employed, which is designed based on both the baseline entropy coding method of
G.719 and the proposed adaptive arithmetic coding technique. Experiments show that the proposed technique is more efficient than the conventional Huffman coding and the common adaptive arithmetic coding when used in the lossless compression of audio coding parameters. For a set of audio samples used in the G.719 application, the proposed technique achieves an average bit rate saving of 7.2% in the low bit rate coding mode while producing audio quality equal to that of the original G.719.
Keywords: Adaptive arithmetic coding, Time-frequency context, Lossless compression, Variable rate, MDCT

1. Introduction

Natural digital audio signals require large bandwidth for transmission and enormous amounts of storage space. Developments in entropy coding, i.e., Huffman coding [1,2] and arithmetic coding [3,4], have made it practical to reduce these requirements without information loss. These methods exploit the non-stationary statistical behavior of the source signal to remove redundant information. In contrast to such lossless compression methods, vector quantization and other lossy compression methods are adopted in audio coding systems to remove irrelevancy inaudible to humans and to improve the coding efficiency. Many audio codecs use only lossy compression methods to quantize and encode the audio parameters. In fact, when the quantization and encoding procedure is further combined with lossless entropy coding, an audio codec can achieve better coding efficiency than with lossy compression alone.

With the development of modern multimedia communication, high-quality full-band speech and audio coding becomes significant and is increasingly needed at low bit rates. Besides the lossy compression through parametric and transform coding, many audio codecs introduce a lossless coding algorithm to further compress the coding bits, such as Moving Picture Experts Group-4 advanced audio coding (MPEG-4 AAC) [5], MPEG unified speech and audio coding (USAC) [6], and ITU-T G.719 [7]. ITU-T G.719 is a low-complexity full-band (20 Hz to 20 kHz) audio codec for high-quality speech and audio, which operates from 32 to 128 kbps [7]. As with most transform audio coding, G.719 uses the modified discrete cosine transform (MDCT) to realize the time-frequency transform and to avoid artifacts stemming from the block boundaries.

* Correspondence: [email protected]
Research Institute of Communication Technology (RICT), School of Information and Electronic Engineering, Beijing Institute of Technology, Beijing 100081, China

© 2013 Wang et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the MDCT domain [8], statistical and subjective redundancies of the signals can be better understood, exploited, and removed in most cases. After the lossy compression with vector quantization has removed the irrelevancy inaudible to humans, the further compression performance is largely determined by the entropy coding efficiency of the quantized MDCT coefficients. In G.719, Huffman coding is applied, and the coding procedure has to be driven by an estimated probability distribution of the quantized MDCT coefficients along with the norms (frequency band gains).

Although Huffman coding removes some of the redundancy of the quantized MDCT coefficients, it suffers from several shortcomings which limit further coding gains. For instance, in Huffman coding the distribution of MDCT coefficients is pre-defined from training statistics, and the adaptation mechanism is not flexible enough to combat a possible statistics mismatch, even with techniques such as switching between different codebooks and multi-dimensional codebooks, which are exploited in AAC. Furthermore, if the symbols are not grouped into blocks, symbols whose probabilities are greater than 0.5 cannot be efficiently coded due to the intrinsic limit of 1 bit per symbol of a Huffman code. Hence, entropy coding schemes based on adaptive arithmetic coding [9] are adopted in audio codecs like MPEG USAC. The adaptive model measures the statistics of the source symbols and is updated continuously during the encoding and decoding processes. In addition, the context formed by the neighboring symbols is taken into account in order to further improve the coding efficiency.

The context was first introduced in image and video coding. Here, context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC [10] is taken as an example. CABAC is one of the two entropy coding methods of the ITU-T/ISO/IEC video coding standard H.264/AVC and plays a very important role in its coding efficiency. By combining an adaptive binary arithmetic coding technique with context modeling of the neighboring symbols in the binary bit stream and the macroblock, a high degree of adaptation and redundancy reduction is achieved. The encoding process of CABAC consists of three elementary steps: binarization, context model selection, and adaptive binary arithmetic encoding. The last step consists of probability estimation and the binary arithmetic encoder.

In the second step of CABAC [10], a context model is chosen, and a model probability distribution is assigned to the given symbols. In the subsequent coding stage, the binary arithmetic coding engine generates a sequence of bits that represents the symbols. The model determines the coding efficiency in the first place, so it is of paramount importance to design an adequate model that explores the statistical dependencies to a large degree. At the same time, this model needs to be continuously updated during encoding. Suppose a pre-defined set T of past symbols, a so-called context template, and a related set C = {0,…,C−1} of contexts are given, where the contexts are specified by a modeling function F: T → C operating on the template T. For each symbol x to be coded, a conditional probability p(x|F(z)) is estimated by switching between different probability models according to the already coded neighboring symbols z ∊ T. Generally speaking, the context model makes use of the information related to the encoded symbols and describes the mapping between a sequence of symbols and the assignment of the symbols' probability distributions.

Lately, arithmetic coding schemes based on a bit-plane context have also appeared in the field of audio coding, such as in USAC, similar to their applications in video coding. The spectral noiseless coding scheme is based on arithmetic coding in conjunction with a dynamically adaptive context. The noiseless coding is fed by the quantized spectral values and uses context-dependent cumulative frequency tables derived from the two previously decoded neighboring two-tuples of quantized spectral coefficients. The coding separately considers the sign, the two most significant bits (MSBs), and the remaining least significant bits. The context adaptation is applied only to the two MSBs of the unsigned spectral values; the sign and the least significant bits are assumed to be uniformly distributed.

By now, entropy coding schemes based on arithmetic coding are quite frequently used in the field of block-based video coding. The CABAC design is based on the key elements of binarization, context modeling, and binary arithmetic coding. Binarization enables efficient binary arithmetic coding via a unique mapping of non-binary syntax elements to a sequence of bits, which are called bins. Now, arithmetic coding as a lossless data compression scheme also plays an essential role in the processing chain of audio signal coding. The correlation in the bit plane of the quantized MDCT coefficients is exploited in USAC [11]. However, the concept of a context model for adaptive arithmetic coding has been neither deeply investigated nor widely used in audio coding, especially for efficient compression by setting up a context model from the point of view of the quantized audio parameters. When arithmetic coding is used to compress the coding parameters directly, probability estimation based on the bit-plane context model may not be suitable. In this situation, the correlation of the audio coding parameters, which leads to a lower information entropy, can be considered in both the time and frequency domains and can be deeply investigated in theory and carefully designed in practice.
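The context mechanism described for CABAC, a modeling function F: T → C that selects among continuously updated probability models to estimate p(x|F(z)), can be sketched in code. The following is a minimal illustration, not CABAC itself: the binary alphabet, the order-1 template (context = previous symbol), and the toy sequence are all assumptions made for the example.

```python
# A sketch of context-based adaptive modeling: F maps the template of
# already coded symbols to a context index, and each context keeps its
# own adaptive frequency counts for p(x | F(z)).

class ContextModel:
    def __init__(self, num_contexts, alphabet_size):
        # one count table per context, initialized to 1 so no symbol
        # has zero probability before adaptation starts
        self.counts = [[1] * alphabet_size for _ in range(num_contexts)]

    def prob(self, ctx, x):
        row = self.counts[ctx]
        return row[x] / sum(row)

    def update(self, ctx, x):
        # adaptive model: updated continuously as symbols are coded
        self.counts[ctx][x] += 1

def F(template):
    """Modeling function F: T -> C; here the context is the previous symbol."""
    return template[-1]

model = ContextModel(num_contexts=2, alphabet_size=2)
seq = [0, 0, 0, 1, 1, 0, 0, 0]
for prev, cur in zip(seq, seq[1:]):
    ctx = F([prev])
    p = model.prob(ctx, cur)   # p(x | F(z)) would drive the arithmetic coder
    model.update(ctx, cur)
```

After the loop, each context's counts reflect only the symbols seen under that context, which is exactly the switching between probability models that the text describes.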
Thus, a novel time-frequency plane context model will be given in this paper, and the adaptive arithmetic coding will be used directly for the audio coding parameters. Furthermore, a variable rate coding scheme is introduced to advance the efficiency.

In our work on arithmetic coding, an entropy coding method based on an adaptive arithmetic coding technique with a time-frequency plane context model (both the time and frequency domains are taken into account) was developed, which has led to improved coding of the quantized MDCT coefficients and the frequency band gains. The adaptive arithmetic coding is applied to further compress the coding parameters in the audio codec frame by frame, and its probability estimation makes use of the inter-frame (time domain) correlation and the intra-frame (frequency domain) correlation of the coding parameters. In fact, most alternative approaches to audio coding are based on the MDCT, and one of its main distinguishing features is related to the time-frequency plane: given a source of quantized transform coefficients, for instance, it was found useful to utilize the correlation in the time and frequency domains to increase the probability of the encoding symbol for arithmetic coding. An experiment on G.719 is carried out as an application of the proposed technique, in which compatibility with the G.719 baseline is required, and good compression performance is achieved. With this method, the bits allocated for coding the quantized parameters vary over consecutive analysis frames, while the quality of the decoded audio remains constant. Therefore, the average bit rate is lower than that of the fixed bit rate codec while the same audio quality is sustained. Hence, a variable rate operation is introduced into the novel context-based adaptive arithmetic coding algorithm, which achieves better performance in terms of coding efficiency.

This paper is organized as follows. Section 2 outlines the novel adaptive arithmetic coding of the parameters produced in the audio encoding. Section 3 describes in detail the novel techniques and the underlying ideas of our entropy coding modules. Section 4 presents the experimental results and the performance comparison. Section 5 concludes this paper with a summary.

2. Modules of the novel adaptive entropy coding

2.1. Preliminary principle

The information entropy of a discrete memoryless source X which has different symbols (x_0, …, x_{I−1}) is given by [12,13]

H(X) = −∑_{i=0}^{I−1} p(x_i) log2 p(x_i),   (1)

where p(x_i) is the probability of the symbol x_i.

The entropy establishes the lower bound of the average bit rate achieved by source coding. However, when the source is correlated, this bound can be further lowered by taking into account a higher order of the entropy, like the conditional entropy

H(X|S) = −∑_{j=0}^{J−1} p(s_j) ∑_{i=0}^{I−1} p(x_i|s_j) log2 p(x_i|s_j),   (2)

where s_j, a so-called context, is a specific state of the source and J represents the total number of the considered states. With such a context, the distribution of the symbols (x_0, …, x_{I−1}) is more concentrated in the vicinity of the encoding symbol, which means the probability of the encoded symbol can be increased by establishing the context model. Consequently, a suitable context design that considers the correlation of the source means a lower entropy. In the applications of audio coding, because of the similarity of sequential frames as well as of adjacent frequency bands, some audio parameters like frequency band gains and frequency spectral values are correlated in the time and frequency domains, and a context model based on the neighboring parameters can be designed to lower the entropy of the coding source; thus, the compression efficiency can be higher. In Sections 2.3 and 2.4, the proposed context model and the way to utilize it are presented in theory. The practical behavior and design in the case of the G.719 codec are investigated in Section 3.

2.2 Integer arithmetic coding

The performance of arithmetic coding is optimal without the need for blocking of input data. It encourages a clear separation between the probability distribution model and the encoding of information. For example, the model may assign a predetermined probability to each symbol. These probabilities can be determined by counting frequencies in representative samples to be transmitted. Such a fixed model is communicated in advance to both the encoder and the decoder. Alternatively, the probabilities that an adaptive model assigns may change as each symbol is transmitted: the encoder's model changes as each symbol is transmitted, and the decoder's model changes as each symbol is received. If the context is involved, the adaptive model is based on the context.

In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1. As the message becomes longer, the interval becomes smaller, and it is necessary for the decoder to know the final interval at the end of the arithmetic coding. However, integer arithmetic coding [14-16] can be employed without knowing the final interval, i.e., the decoding algorithm can be carried out even if the encoding procedure has not been completed.
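The relation between Equations 1 and 2 can be checked numerically: for a correlated source, conditioning on a context lowers the empirical entropy. A minimal sketch, in which the toy binary sequence and the choice of the previous symbol as context are illustrative assumptions:

```python
# Numerical check of Equations 1 and 2 on a correlated toy source.
import math
from collections import Counter

def entropy(symbols):
    """H(X) = -sum_i p(x_i) log2 p(x_i), from empirical frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def conditional_entropy(pairs):
    """H(X|S) = -sum_j p(s_j) sum_i p(x_i|s_j) log2 p(x_i|s_j),
    estimated from (context, symbol) pairs."""
    n = len(pairs)
    by_ctx = {}
    for s, x in pairs:
        by_ctx.setdefault(s, []).append(x)
    return sum((len(xs) / n) * entropy(xs) for xs in by_ctx.values())

# symbols tend to repeat their predecessor, so the source is correlated
seq = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
pairs = list(zip(seq, seq[1:]))   # context s = previous symbol
h = entropy(seq)                  # 1.0 bit/symbol (8 zeros, 8 ones)
h_cond = conditional_entropy(pairs)
```

Here h_cond comes out strictly below h, illustrating that a context exploiting the source correlation lowers the bound on the average bit rate.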
Meanwhile, the interval required to represent the message can grow in the process of encoding. Integer arithmetic coding [14] is done by subdividing the current interval, initialized to [0, N−1], according to the symbol probabilities, where N is the upper limit of the 32-bit integer in the computer. The probabilities in the model are represented as integer frequency counts [16], and the cumulative counts are stored in the array c(). Each time a symbol arrives, its subinterval is taken as the current interval. To put it simply, the subinterval to be encoded is represented in the form [l, u], where l is called the base or starting point of the subinterval and u is the ending point of the subinterval. The subintervals in the arithmetic coding process are defined by the following equations:

Φ_0 = [l_0, u_0] = [0, N−1],   (3)

Φ_i = [l_i, u_i] = [l_{i−1} + (c(x_{i−1})/c(x_{I−1}))(u_{i−1} − l_{i−1} + 1), l_{i−1} + (c(x_i)/c(x_{I−1}))(u_{i−1} − l_{i−1} + 1) − 1], i = 0, 1, …, I−1.   (4)

The properties of the intervals guarantee that 0 ≤ l_i ≤ l_{i+1} < N and 0 ≤ u_i ≤ u_{i+1} < N. The expression (c(x_i) − c(x_{i−1}))/c(x_{I−1}) is equivalent to p(x_i) in Equation 1. To obtain incremental output, i.e., the code word, during the encoding process and to resolve the need for high-precision computations, the algorithm is performed through three mappings as follows. 'Scale' is defined as an intermediate variable that counts the number of applications of the third mapping; it represents the bits that follow the previous output bit in steps I and II.

- I: If the subinterval [l, u] lies entirely in the lower half of [0, N − 1], i.e., [0, N/2 − 1], then the coder emits a bit 0, outputs a bit 1 scale times until scale is reduced to 0, and linearly expands [l, u] to [2l, 2u + 1]. Scale is reset to 0.
- II: If the subinterval [l, u] lies entirely in the upper half of [0, N − 1], i.e., [N/2, N − 1], then the coder emits a bit 1, outputs a bit 0 scale times until scale is reduced to 0, and linearly expands [l, u] to [2l − N, 2u − N + 1]. Scale is reset to 0.
- III: If the subinterval [l, u] lies entirely in the interval [N/4, 3N/4 − 1], then the coder linearly expands [l, u] to [2l − N/2, 2u − N/2 + 1] and increases the value of scale by 1.

The three mapping steps end when the interval [l, u] meets none of the above looping conditions. As the subinterval shortens, the number of loops increases, which leads to more output bits. Thus, the larger the subinterval is, the fewer bits the coder outputs. Since the context model can be established to increase the probability of the encoded symbol, the subinterval representing that probability is correspondingly enlarged.

2.3. Time-frequency context model

Generally, in current applications, the context consists of neighbors of the current symbol to be encoded. In CABAC, the context models the neighboring symbols in the binary bit stream, and in USAC, the adaptive arithmetic coding is established based on the bit plane. This paper instead deals with the correlation of neighboring parameters in transform audio coding, and some basic rules are designed to help select the proper context model for the adaptive arithmetic coding. The time-frequency context associated with the currently coded element is shown in Figure 1, and it differs from the bit-plane context used in CABAC and USAC. In the proposed model, the time-frequency context-based arithmetic coding only makes use of the neighboring parameters in past frames when considering the time domain, so there is no extra algorithmic delay when the arithmetic coder accesses the time-frequency plane, as shown in Figure 1.

Figure 1 A context template consisting of two neighboring elements A and B, which are on the left and on the bottom of the current element C, respectively. The x-axis represents frequency, and the y-axis represents time.

A family of contexts is defined by means of the function T(m). The parameter m represents the number of symbols lying in the vicinity of the present coded symbol, with 0 ≤ m ≤ 2. For each symbol C to be coded, the conditional probability p(C|T(m)) is estimated by switching between different probability models according to the already coded neighboring symbols. In Figure 1, T(0) represents no context, T(1) = A or B, and T(2) = A and B. A represents the context in the frequency domain, while B represents the context in the time domain, and they correspond to the quantized parameters in the transform audio codec.
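The interval subdivision of Equation 4 and the renormalization mappings I to III of Section 2.2 can be sketched as a toy integer encoder. This is a minimal sketch, not the codec's implementation: N is taken as 2^16 for readability rather than the 32-bit limit, and the three-symbol alphabet with frequency counts 1, 2, 5 is hypothetical.

```python
# Toy integer arithmetic encoder with the three renormalization
# mappings; `cum` holds the cumulative counts c(x_0), ..., c(x_{I-1}).

N = 1 << 16

def encode(symbols, cum):
    l, u, scale, bits = 0, N - 1, 0, []

    def emit(b):
        nonlocal scale
        bits.append(b)
        bits.extend([1 - b] * scale)   # flush the bits counted by mapping III
        scale = 0

    total = cum[-1]
    for x in symbols:
        width = u - l + 1
        low_count = cum[x - 1] if x > 0 else 0
        u = l + (width * cum[x]) // total - 1    # Equation 4, upper end
        l = l + (width * low_count) // total     # Equation 4, lower end
        while True:
            if u < N // 2:                        # mapping I: lower half
                emit(0); l, u = 2 * l, 2 * u + 1
            elif l >= N // 2:                     # mapping II: upper half
                emit(1); l, u = 2 * l - N, 2 * u - N + 1
            elif l >= N // 4 and u < 3 * N // 4:  # mapping III: middle
                scale += 1
                l, u = 2 * l - N // 2, 2 * u - N // 2 + 1
            else:
                break
    scale += 1              # flush: one more bit pins down the final interval
    emit(0 if l < N // 4 else 1)
    return bits

# three symbols with frequency counts 1, 2, 5 -> cum = [1, 3, 8]
code = encode([2, 2, 1, 2, 0], [1, 3, 8])
```

High-count symbols cost well under one bit each, while the rare symbol 0 at the end triggers several mapping-I expansions and emits several bits, matching the text's point that a larger subinterval means fewer output bits.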
Their conditional probabilities are estimated by different methods, which are introduced in the following sections.

2.3.1. Context model in the frequency domain

When the neighboring elements satisfy the following equations

x = {x_0, x_1, …, x_{I−1}},   (5)

s = {s_0, s_1, …, s_{J−1}},   (6)

(c(x_i|s_j) − c(x_{i−1}|s_j)) / c(x_{I−1}|s_j) > (c(x_i) − c(x_{i−1})) / c(x_{I−1}),   (7)

where x represents the symbols and s represents the contexts, the context dependence in the frequency domain is given primary consideration. This guarantees a larger subinterval, which we explain as follows:

l_1 = l + (u − l + 1) c(x_{i−1}|s_j) / c(x_{I−1}|s_j),   (8)

u_1 = l + (u − l + 1) c(x_i|s_j) / c(x_{I−1}|s_j) − 1,   (9)

l_2 = l + (u − l + 1) c(x_{i−1}) / c(x_{I−1}),   (10)

u_2 = l + (u − l + 1) c(x_i) / c(x_{I−1}) − 1.   (11)

In the case of Equation 7, the result u_1 − l_1 > u_2 − l_2 is obtained. Since the subinterval calculated from the conditional probability is larger, fewer bits are produced. The conditional entropy can be smaller than the entropy:

H(X|S) = −∑_{j=0}^{J−1} ((c(s_j) − c(s_{j−1}))/c(s_{J−1})) ∑_{i=0}^{I−1} ((c(x_i|s_j) − c(x_{i−1}|s_j))/c(x_{I−1}|s_j)) log2((c(x_i|s_j) − c(x_{i−1}|s_j))/c(x_{I−1}|s_j)) < H(X) = −∑_{i=0}^{I−1} ((c(x_i) − c(x_{i−1}))/c(x_{I−1})) log2((c(x_i) − c(x_{i−1}))/c(x_{I−1})).   (12)

The length of the context-based sequence is defined as the order of the context model. A key issue in context modeling for the input symbol sequence is to balance the model order against the model cost: a higher order means a higher computational cost. To solve this problem, a one-order context model [17] can be chosen in the frequency domain, owing to its good compression and low complexity in the audio coding application.

2.3.2. Context model in the time domain

When the neighboring elements are correlated and the current symbol C is distributed around the encoded symbol B, i.e., C ∊ (B − δ, B + δ), where δ represents the rescaling parameter, the model probability distribution is reassigned for the current symbol C.

For the m-ary (m is the number of symbols) adaptive arithmetic coding, the encoded symbol B is taken as the center; the 2δ symbols located in the vicinity of B are chosen and a large number λ is added to their original frequency counts, which rearranges the distribution of the model. λ is the cumulative count of all symbols, which can change the subinterval adaptively. That is,

f(x_i) = c(x_i) − c(x_{i−1}),   (13)

λ = ∑_{i=0}^{I−1} f(x_i),   (14)

f′(x_i) = f(x_i) + λ for i = B − δ + 1, …, B, …, B + δ; f′(x_i) = f(x_i) otherwise,   (15)

where f(x_i) is the original frequency count of the symbol and f′(x_i) represents the final frequency count distribution assigned to drive the arithmetic coder. The subinterval is changed to (l′_1, u′_1):

l′_1 = l + (u − l + 1) c′(x_{i−1}) / c′(x_{I−1}),   (16)

u′_1 = l + (u − l + 1) c′(x_i) / c′(x_{I−1}) − 1,   (17)

l_2 = l + (u − l + 1) c(x_{i−1}) / c(x_{I−1}),   (18)

u_2 = l + (u − l + 1) c(x_i) / c(x_{I−1}) − 1.   (19)

As f′(x_i) increases for i = B − δ + 1, …, B, …, B + δ, the inequality (c′(x_i) − c′(x_{i−1}))/c′(x_{I−1}) > (c(x_i) − c(x_{i−1}))/c(x_{I−1}) is obtained. The subinterval u′_1 − l′_1 is then larger than u_2 − l_2 under the above condition. Consequently, the higher the encoding symbol's frequency count, the better the designed coding scheme performs, with a larger subinterval for the encoding symbol.

As to the context model in the time domain, we consider only a one-state context which models the past symbol B close to the current symbol C, because the states before B have a weaker correlation with C, while more states mean higher complexity.
3. Scheme of the novel context adaptive arithmetic coding in G.719

3.1 State-of-the-art techniques of G.719

The ITU-T G.719 codec [7] makes use of the transform coding technique for low-complexity full-band conversational speech and audio, operating from 32 up to 128 kbps. The input signal, sampled at 48 kHz, is first processed by a transient detector based on the energy ratio between the short-term energy and the long-term energy. An adaptive window switching technique is used depending on the detection of transient and stationary signals. Then, time domain aliasing and MDCT techniques are designed to process the different kinds of input signal. The transformed spectral coefficients are grouped into subbands of unequal lengths. The gain of each band (i.e., the norm) is estimated, and the resulting spectral envelope consisting of the norms of all bands is quantized and encoded. The quantized norms are further adjusted based on adaptive spectral weighting and used as the input for bit allocation. The spectral coefficients are normalized by the quantized norms, and the normalized MDCT coefficients are then lattice vector quantized and encoded based on the bits allocated to each frequency band. In the process of bit allocation, Huffman coding is applied to encode the indices of both the encoded spectral coefficients and the encoded norms. The bits saved by Huffman coding are used for the subsequent bit allocation and noise adjustment in order to generate better audio quality. Finally, the fixed bit stream is obtained and transmitted to the decoder.

3.2 The novel structure of G.719

In this section, the novel context-based adaptive arithmetic coding is introduced to improve the coding scheme of G.719, and the probability statistics of the entropy coding are established separately for transient and stationary audio. The key elements will be discussed in the next section.

Figure 2 shows the basic structure of the proposed method. The input signals (sampled at 48 kHz) are first processed by a transient detector [7] to be classified into transient and stationary signals, which are assigned different statistical models for the adaptive arithmetic coding. After the modified discrete cosine transform, the obtained spectral coefficients are first grouped into subbands of unequal lengths. Then, the norm of each band, i.e., the frequency band gain, is estimated, and the resulting spectral envelope, consisting of the norms of all bands, is quantized and encoded. Regarding the good correlation of the quantized norms of neighboring bands, we apply the time-frequency context-based adaptive arithmetic coding. The time-frequency context aims to remove the redundancies in the frequency domain and in the time domain.

Figure 2 The basic structure of the proposed method. (Block diagram: input signal → transient detector → transform → norms estimation → context-based adaptive arithmetic coding of the norms; spectrum normalization and lattice quantization → norms adjustment → bit allocation → coding according to the different bit allocations, where the 1-bit, 2- to 4-bit, and 5- to 9-bit branches each use adaptive arithmetic coding with separate transient and stationary models.)

When the coding procedure of the quantized norms is over, the coefficients are normalized by the quantized norms, and then the normalized spectral coefficients are lattice vector quantized according to the bit allocation, which leads to different dynamic ranges in the subbands. For the so-called bit allocation, the maximum number of bits assigned to each normalized transform coefficient is set to Rmax = 9 in G.719 by default.
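The energy-ratio transient classification that routes frames to separate statistical models can be sketched as follows. This is only an illustration of the idea the text attributes to G.719's detector: the frame length, smoothing factor, and threshold are assumptions for the example, not the standard's actual values.

```python
# Minimal sketch of an energy-ratio transient classifier.

def classify(frame, long_term_energy, threshold=8.0, alpha=0.1):
    short = sum(s * s for s in frame) / len(frame)        # short-term energy
    is_transient = short / max(long_term_energy, 1e-12) > threshold
    # smooth the long-term energy estimate for the next frame
    long_term_energy = (1 - alpha) * long_term_energy + alpha * short
    return is_transient, long_term_energy

lt = 1.0
steady = [1.0] * 32        # steady-state frame: energy ratio around 1
onset = [4.0] * 32         # sudden onset: energy ratio around 16
t1, lt = classify(steady, lt)
t2, lt = classify(onset, lt)
```

The steady frame stays in the stationary class while the onset frame is flagged as transient; each class would then select its own adaptive arithmetic coding model, as in Figure 2.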


Wang et al. EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:9 Page 7 of 13
http://asmp.eurasipjournals.com/content/2013/1/9

Rmax = 9 in G.719 by default. Thus, nine statistical models


for the adaptive arithmetic coding to be updated are Conditional probability
0.5
employed, and all bands will be rearranged in order from probability
low band to high band for the arithmetic coding so that the
quantized coefficients in the subbands with the same allo-
0.4
cated bits are encoded continuously. Considering that the
1-bit subband, the 2- to 4-bit subband, and the 5- to 9-bit

Probability
subband have different correlations in the time domain and 0.3
in the frequency domain, we use different context models
when the bit allocation is different. The subbands of 5 to
9 bits are designed to exploit the correlation in the time 0.2
domain for compression, while the subbands of 2 to 4 bits
make good use of the correlation in frequency domain.
Finally, the subband of 1 bit uses the normal adaptive 0.1
arithmetic coding.

0
0 5 10 15
3.3 Time-frequency context model in G.719 Index of quantized MDCT coefficients
Through a large number of experiments, we have found Figure 3 The probability and conditional probability of the
that the quantized norms and the quantized MDCT coeffi- encoding symbol MDCT indexes. The x-axis represents the index
cients with 2 to 4 bits have the context statistical character- of quantized MDCT coefficients, and the y-axis represents the
probability of the encoding symbol. The solid line represents the
istic in the frequency domain, while the quantized norms
conditional probability, and the dotted line describes the probability.
and the quantized MDCT coefficients with 5 to 9 bits have
the characteristic in the time domain, as is discussed in Section 2.3.

In the frequency domain, if the spectral parameters are correlated, the conditional probability of the current encoding symbol will be larger than its unconditional probability. For the 2- to 4-bit subbands of the quantized MDCT coefficients, Figure 3 gives an example of the probability p(C) (C ∈ {0, …, 15}) and the conditional probability p(C|A) (A is the neighboring encoded symbol) of the current encoding symbol C, i.e., the code indexes of the quantized MDCT coefficients. The solid line represents the conditional probability p(C|A), and the dotted line describes the probability p(C). It can be seen in Figure 3 that the conditional probability distribution p(C|A) is more concentrated in the vicinity of the current encoding index 0 than the probability p(C), and the relationship between the two kinds of probability (shown by the dotted and solid lines) satisfies Equation 7.

Thus, for the 2- to 4-bit subbands, the context in the frequency domain is defined as the encoded symbol A preceding the input one C, as shown in Figure 1. Then the conditional cumulative counts c(C|A) can be obtained. Let c(C|A) be the estimated conditional cumulative counts that drive the integer arithmetic coder.

In the time domain, γ_j(n) is defined as the correlation coefficient of subband j between two adjacent frames with the same bit allocation:

γ_j(n) = 1 − (1/n) Σ_{i=1}^{n} |D′_{i,j}^{(t)} − D′_{i,j}^{(t+1)}| / (2^b/8),   (20)

with the folded indices

D′_{i,j}^{(t)} = D_{i,j}^{(t)},  0 ≤ D_{i,j}^{(t)} ≤ 2^(b−1) − 1;   D′_{i,j}^{(t)} = 2^b − 1 − D_{i,j}^{(t)},  2^(b−1) < D_{i,j}^{(t)} ≤ 2^b − 1,   (21)

D′_{i,j}^{(t+1)} = D_{i,j}^{(t+1)},  0 ≤ D_{i,j}^{(t+1)} ≤ 2^(b−1) − 1;   D′_{i,j}^{(t+1)} = 2^b − 1 − D_{i,j}^{(t+1)},  2^(b−1) < D_{i,j}^{(t+1)} ≤ 2^b − 1,   (22)

where D_{i,j}^{(t)} represents the i-th code index in subband j of frame t, with 1 ≤ j ≤ 44, j being the subband index. The subbands have different sizes n = 8, 16, 24, 32 that increase with increasing frequency. The character b represents the bits allocated for the current frame, and 2^b is just the number of symbols for the m-ary (m symbols) adaptive arithmetic coding, i.e., m = 2^b.

If γ_j(n) ≥ 0.5, then the context in the time domain is employed for the present adjacent subbands with the same bit allocation. By statistical analysis, we have found that the audio coding parameters of music signals have higher correlation in the time domain than those of speech signals. As to the quantized norms in G.719, a large percentage, 98.9%, of all the frames show correlation between adjacent frames (i.e., a correlation coefficient higher than 0.5), which enables larger compression.

Given the encoded symbol in the previous frame, referred to as B, there is a large possibility of the input symbol C being distributed around B. In G.719, for the m-ary
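As we read Equations 20 to 22, the folding mirrors the upper half of the b-bit index range onto the lower half before differencing. A minimal Python sketch (the function names and the example data are ours, not part of G.719):

```python
def fold(d, b):
    """Fold a b-bit code index into [0, 2**(b-1) - 1].

    This mirrors Equations 21 and 22: indices in the upper half of the
    range are reflected onto the lower half before differencing.
    """
    half = 2 ** (b - 1)
    return d if d < half else (2 ** b - 1 - d)


def gamma(prev, curr, b):
    """Correlation coefficient of Equation 20 between the same subband
    in two adjacent frames (prev = frame t, curr = frame t+1), given as
    lists of quantized MDCT code indices sharing the bit allocation b."""
    step = 2 ** b / 8  # normalization step 2^b / 8 from Equation 20
    diffs = [abs(fold(p, b) - fold(c, b)) / step for p, c in zip(prev, curr)]
    return 1 - sum(diffs) / len(diffs)


# Example: an 8-coefficient subband coded with b = 3 bits in two frames.
prev = [3, 3, 2, 3, 4, 3, 3, 2]
curr = [3, 2, 2, 3, 4, 4, 3, 2]
use_time_context = gamma(prev, curr, b=3) >= 0.5  # True: gamma = 0.875
```

In this example only one of the eight folded indices differs between the frames, so γ is high and the time-domain context would be used.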
(m symbols) adaptive arithmetic coding, the encoded symbol B is the center; the m/2 symbols located in the ranges around B and around m − B (the latter provided to avoid negative symbols) are chosen to add λ = Σ_{i=1}^{m} f(i) to their original frequency counts, with the half-width δ = m/8, which guarantees that the probability of half of all symbols is increased. That is,

f′(i) = f(i) + λ,  i = B − m/8 + 1, …, B, …, B + m/8;
f′(i) = f(i) + λ,  i = (m − B) − m/8 + 1, …, (m − B), …, (m − B) + m/8;
f′(i) = f(i),  otherwise.   (23)

As depicted in Figure 4, which gives the behavior of the MDCT parameters with 5 to 9 bits, the dotted line describes the original frequency counts (f(i) in formula (23)) of all symbols, while the solid line presents the final frequency counts (f′(i) in formula (23)) of all symbols. The solid line is higher than the dotted line, which indicates that the subintervals for i = B − m/8 + 1, …, B, …, B + m/8 will be larger as a result of their higher final frequency counts. After the encoding operation, the model frequency distribution returns to the original probability distribution, and then its updating takes place.

3.4 Variable rate in G.719
Variable rate coding methods [18-20] are important for source compression, and they have been studied for many years, especially in speech codecs. This paper introduces an efficient variable rate algorithm for G.719 based on the proposed adaptive arithmetic coding together with the original Huffman coding module. Figure 5 shows the block diagram of the variable rate scheme.

The bit rate is determined in three steps. The Huffman coding module is kept to calculate the saved bits and to prepare for the bit allocation. Let Sum be the total bits at a fixed bit rate. Firstly, the norms are coded simultaneously by both the original Huffman coding, consuming h1 bits, and the context-based adaptive arithmetic coding, consuming a1 bits. Compared to the Huffman coding, the context-based adaptive arithmetic coding thus saves L1 = h1 − a1 bits. The remaining bits num1 = Sum − h1 are used for the bit allocation of the quantized MDCT coefficients. In the second step, the subbands with different bits assigned by the bit allocation are encoded by the proposed adaptive arithmetic coding. The quantized MDCT coefficients are also encoded by Huffman coding, consuming h2 bits, in order to calculate the remaining bits num2 = Sum − h1 − h2 used for the noise level adjustment. Compared to the Huffman coding, the number of bits used for coding the quantized MDCT coefficients with the context-based adaptive arithmetic coding is a2, which saves L2 = h2 − a2 bits. Finally, the noise level is adjusted according to num2. The total bits and the bits used for the bit allocation and the noise level adjustment in the improved encoder remain the same as those in the primary fixed rate G.719; hence, the saved bits L1 + L2 (provided by the context-based adaptive arithmetic coding compared to the original Huffman coding) lead to the variable rate of G.719. To ensure correct decoding, the field in the G.719 header [7] that specifies the number of bits used for encoding is changed to indicate variable bits instead of fixed bits.
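The bit accounting of these three steps can be sketched as follows (a simplified Python sketch; the bit counts in the example are hypothetical, and h1, a1, h2, a2 would in practice come from running the two entropy coders on a frame):

```python
def variable_rate_accounting(total_bits, h1, a1, h2, a2):
    """Bit accounting of the variable rate scheme.

    total_bits: Sum, the frame budget at the fixed bit rate.
    h1 / a1: bits for the quantized norms with Huffman coding and with
             the context-based adaptive arithmetic coding;
    h2 / a2: the same two counts for the quantized MDCT coefficients.
    """
    saved_norms = h1 - a1        # L1 = h1 - a1
    saved_mdct = h2 - a2         # L2 = h2 - a2
    num1 = total_bits - h1       # bits driving the bit allocation
    num2 = total_bits - h1 - h2  # bits for the noise level adjustment
    # Bits actually transmitted: the fixed budget minus L1 + L2.
    frame_bits = total_bits - (saved_norms + saved_mdct)
    return num1, num2, frame_bits


# Hypothetical 32 kb/s frame with a budget of Sum = 640 bits.
num1, num2, frame_bits = variable_rate_accounting(640, h1=150, a1=122,
                                                  h2=430, a2=418)
# frame_bits = 640 - (28 + 12) = 600 bits transmitted for this frame
```

Note that num1 and num2 are computed from the Huffman bit counts, so the bit allocation and the noise level adjustment stay identical to the fixed rate G.719, as described above.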
Figure 4 The estimation of the probability by the context model in time domain. The x-axis represents the index of the quantized MDCT coefficients, and the y-axis represents frequency counts. The solid line represents the final frequency counts of all symbols, and the dotted line indicates the original frequency counts of all symbols.

4. Experimental results
4.1 Bit rate comparison
In this section, the performance of the variable rate coder, which employs the novel context-based adaptive arithmetic coding, is evaluated from the point of view of the average bit rate. The samples used in the bit rate measurement are ten speech and 29 music items, including three classical music, ten mixed music (music and speech), three orchestra, one folk, one guitar, two harp, two percussion, one pop, three saxophone, and three trumpet samples. Each sample is sampled at the rate of 48 kHz and lasts 10 s. Table 1 summarizes the average bit rates of the improved variable rate G.719 at the low bit rate mode compared with those of the fixed rate G.719 at 32 kb/s.

As shown in Table 1, our scheme achieves an average bit rate from 29.4817 to 29.9606 kb/s at the low bit rate coding mode, compared with the fixed rate of 32 kb/s. The coding gains of the three types of signal range from 6.4% to 7.9%, with an average coding gain of 7.2% over all the test samples. In particular, the bit rate saving for the music signal is the largest compared with the mixed music
Figure 5 The basic structure of the variable rate encoder with the introduction of the context-based adaptive arithmetic coding. [Block diagram: the quantized norms are coded both by Huffman coding (h1 bits) and by the context-based adaptive arithmetic coding (a1 bits), saving L1 = h1 − a1 bits; the remaining bits num1 = Sum − h1 drive the bit allocation for the quantized MDCT coefficients, which are likewise coded both ways (h2 and a2 bits, saving L2 = h2 − a2); the remaining bits num2 = Sum − h1 − h2 drive the noise level adjustment, and the outputs are multiplexed.]
signal and the speech signal because of its good correlation in the time domain and the frequency domain.

Table 2 shows the coding modes in G.719, and we carried out experiments at all coding modes. As the bit rate increases, the context-based adaptive arithmetic coding scheme achieves a better coding gain over the original Huffman coding, especially for the highest bit rate coding mode. The test shows an average coding gain of 9.1% at the highest bit rate (coding mode 7 in Table 2). Specifically, music processing shows an average coding gain of 10.9% at the highest bit rate, which indicates the good statistical characteristics of pure music.

In order to assess the performance of the proposed adaptive arithmetic coding further, we also carried out experiments comparing the different improved coders. Figure 6 presents the bit rates of the G.719 fixed rate coder with Huffman coding and of the G.719 variable rate coder with the adaptive arithmetic coding and with the context-based adaptive arithmetic coding at the different coding modes listed in Table 2. Compared with the common adaptive arithmetic coding, the context-based adaptive arithmetic coding has a better performance. The lower the bit rate is, the higher the average coding gain achieved when the context-based adaptive arithmetic coding is compared with the common adaptive arithmetic coding. The test shows a gain of 2.3% with the context-based adaptive arithmetic coding at the lowest bit rate (coding mode 1 in Table 2).

Table 1 Average bit rate of different signal types

Signal type | Average bit rate in fixed rate G.719 (kb/s) | Average bit rate in variable rate G.719 (kb/s)
Music | 32 | 29.4817
Mixed music | 32 | 29.6640
Speech | 32 | 29.9606
Total | 32 | 29.6512

Table 2 Coding modes in G.719

Coding mode | Fixed rate (kb/s) | Variable rate (kb/s)
1 | 32 | 29.6512
2 | 48 | 44.8484
3 | 64 | 59.8988
4 | 80 | 74.5056
5 | 96 | 88.6831
6 | 112 | 102.5974
7 | 128 | 116.3237

4.2 Investigation of the short-term coding efficiency
The bit rate comparison in Section 4.1 shows the overall bit rate reduction, which reflects the long-term average coding efficiency. In order to investigate the short-term coding efficiency of the proposed variable rate arithmetic coding, the bit allocation is evaluated frame by frame, and the performance is shown in Table 3.

As can be seen from Table 3, the minimum bits per frame in the variable rate G.719 are less than those in the fixed rate G.719, and the maximum bits per frame
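As a quick check of the arithmetic, the coding gains quoted above follow directly from the rates in Table 2 (a short Python sketch; the helper name is ours):

```python
# Fixed and variable rates (kb/s) per coding mode, taken from Table 2.
modes = {1: (32, 29.6512), 7: (128, 116.3237)}


def gain_percent(fixed, variable):
    """Relative bit rate saving of the variable rate coder."""
    return (fixed - variable) / fixed * 100


lowest = gain_percent(*modes[1])   # 7.34% at coding mode 1
highest = gain_percent(*modes[7])  # about 9.1% at coding mode 7
```

The mode-7 value reproduces the 9.1% gain reported for the highest bit rate; the mode-1 value is the overall gain against the fixed 32 kb/s budget, while the 7.2% figure above is the average over the individual test samples.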
in the variable rate G.719 are more than those in the fixed rate G.719, only because the context model tends to become stable after the first several input frames. Through statistical analysis, an extraordinarily large percentage, 99.1%, of all the frames needs less than the fixed 640 bits, which guarantees the short-term coding efficiency of the proposed variable rate arithmetic coding. Owing to the good correlation in both the time domain and the frequency domain, the minimum bits in the variable rate G.719 for the music signal show the best performance.

Table 3 The performance of bit allocation of each frame

Signal type | Bits of each frame in fixed rate G.719 (bits/frame) | The minimum bits in variable rate G.719 (bits/frame) | The maximum bits in variable rate G.719 (bits/frame)
Music | 640 | 495 | 725
Mixed music | 640 | 540 | 718
Speech | 640 | 550 | 714

Figure 6 Average bit rate at different coding modes. The x-axis represents the coding modes (1 to 7), and the y-axis represents the average bit rate (kb/s). The solid line represents the fixed bit rate of G.719 using Huffman coding, the dotted line represents the variable bit rate of G.719 using the adaptive arithmetic coding, and the dashed line represents the variable bit rate of G.719 using the context-based adaptive arithmetic coding.

4.3 The performance comparison of different entropy coding
A comparative study of different entropy coding schemes is presented in this section, covering Huffman coding, the adaptive arithmetic coding, and the context-based adaptive arithmetic coding, respectively. Table 4 shows the average number of bits needed to code the quantized norms using the different coding schemes under the different coding modes, while Table 5 presents the average number of bits needed to code the quantized MDCT coefficients using the different coding schemes under the different coding modes.

Table 4 The average number of bits when coding the quantized norms

Coding bits for the quantized norms | Huffman coding | Adaptive arithmetic coding | Context-based adaptive arithmetic coding
Modes 1 to 7 | 147.1909 | 132.0607 | 119.8233

Table 5 The average number of bits when coding the quantized MDCT coefficients

Coding bits for the quantized MDCT coefficients | Huffman coding | Adaptive arithmetic coding | Context-based adaptive arithmetic coding
Mode 1 | 429.8683 | 418.1427 | 417.3993
Mode 2 | 720.0894 | 695.6415 | 694.4265
Mode 3 | 1,011.484 | 972.396 | 970.7839
Mode 4 | 1,305.252 | 1,250.046 | 1,247.866
Mode 5 | 1,604.597 | 1,532.503 | 1,530.054
Mode 6 | 1,860.674 | 1,773.074 | 1,770.578
Mode 7 | 2,239.736 | 2,131.121 | 2,128.619

As can be seen from the two tables, the coding bits required for the quantized norms and the quantized MDCT coefficients are the least when the proposed context-based adaptive arithmetic coding is used. Since the energy of the subbands does not change with the coding mode, the coding bits of the quantized norms remain the same across the different modes.

In order to further quantify the compression of the adaptive arithmetic coding over Huffman coding, and of the context-based adaptive arithmetic coding over Huffman coding, the compression percentages can be calculated according to the following formulas:

Δ1 = (h_bits − a_bits) / h_bits × 100%,   (24)

Δ2 = (h_bits − ca_bits) / h_bits × 100%,   (25)

where h_bits represents the bits for encoding the audio parameters by Huffman coding, a_bits represents the bits for encoding the parameters by the adaptive arithmetic coding, and ca_bits represents the bits for encoding the parameters by the proposed context-based adaptive arithmetic coding. Tables 6 and 7 present the compression percentages of the quantized norms and the quantized MDCT coefficients, calculated by Equations 24 and 25.

As can be seen from Tables 6 and 7, the compression percentage of the quantized norms is higher than that of the quantized MDCT coefficients. Since the variation of the quantized norms is smaller than that of the quantized MDCT coefficients, the conditional probability of the encoding symbol of the quantized norms is larger than that of the quantized MDCT coefficients. Moreover, the
correlation in the time domain of the quantized norms is higher than that of the quantized MDCT coefficients because of the smaller variation of the norms. As a result, the context-based adaptive arithmetic coding performs better on the quantized norms than on the quantized MDCT coefficients.

Table 6 The compression percentage of the quantized norms

Compression percentage (%) | Adaptive arithmetic coding | Context-based adaptive arithmetic coding
Modes 1 to 7 | 10.27932 | 18.59329

Table 7 The compression percentage of the quantized MDCT coefficients

Compression percentage (%) | Adaptive arithmetic coding | Context-based adaptive arithmetic coding
Mode 1 | 2.727712 | 2.900664
Mode 2 | 3.395124 | 3.563846
Mode 3 | 3.864423 | 4.023802
Mode 4 | 4.229515 | 4.396522
Mode 5 | 4.492942 | 4.645568
Mode 6 | 4.707953 | 4.842141
Mode 7 | 4.849440 | 4.961144

Figure 7 presents the compression percentages of all kinds of parameters under the different entropy coding schemes. The solid line presents the compression percentage of the quantized norms coded by the context-based adaptive arithmetic coding compared to Huffman coding. The dashed line presents that of the quantized norms coded by the adaptive arithmetic coding compared to Huffman coding. The dotted line presents that of the quantized MDCT coefficients coded by the context-based adaptive arithmetic coding compared to Huffman coding. The dash-dotted line presents that of the quantized MDCT coefficients coded by the adaptive arithmetic coding compared to Huffman coding. It can be seen that the proposed context-based adaptive arithmetic coding performs better than the adaptive arithmetic coding when coding both the norms and the MDCT coefficients, especially when the frequency band gains are coded.

Figure 7 Average compression percentages of quantized norms and quantized MDCT coefficients. The x-axis represents the coding modes (1 to 7), and the y-axis represents the compression percentage of the parameters with the different entropy coding schemes; the line styles are as described in the text.

4.4 Audio quality
The proposed context-based arithmetic coding is performed directly on the quantized audio parameters, and the technique is lossless, so the decoded parameters obtained with the proposed arithmetic coding method should show no distortion. In the quality tests evaluating the arithmetic coding, objective comparison tests were first used to verify the lossless coding. By objective comparison, i.e., PEAQ [21] over a large number of speech and music samples, all samples generated by the proposed variable rate G.719 appear the same as those of the fixed rate G.719. Secondly, we carried out preference listening tests to verify that the proposed scheme does not introduce any kind of undesirable effect, although there is no need for subjective listening tests if the sample values are not changed. It is thus verified that the proposed variable rate coder has the same audio quality as the original G.719 under the different coding modes. Besides, we used the audio comparison tool 'CompAudio' [22] to check that all the sample values are equal before and after the arithmetic coding.

Table 8 Average complexity comparison test results in terms of WMOPS

Signal type | Fixed rate G.719 encoder | Fixed rate G.719 decoder | Variable rate G.719 encoder | Variable rate G.719 decoder | Proposed modules encoder | Proposed modules decoder
Music | 6.5986 | 6.2988 | 9.6376 | 8.8378 | 3.0390 | 2.5390
Mixed music | 6.6356 | 6.3944 | 9.6641 | 8.9429 | 3.0285 | 2.5485
Speech | 6.6854 | 6.3825 | 9.7223 | 9.2516 | 3.0369 | 2.8691
Total | 6.6298 | 6.3288 | 9.6643 | 8.9808 | 3.0345 | 2.6520
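The entries of Tables 6 and 7 can be reproduced from the bit counts in Tables 4 and 5 via Equations 24 and 25; for example (a short Python check, with the helper name ours):

```python
def compression_percent(h_bits, x_bits):
    """Compression percentage of Equations 24 and 25: bit saving
    relative to Huffman coding (h_bits) for a coder using x_bits."""
    return (h_bits - x_bits) / h_bits * 100


# Quantized norms, modes 1 to 7 (Table 4): Huffman 147.1909 bits,
# adaptive arithmetic 132.0607 bits, context-based 119.8233 bits.
delta1 = compression_percent(147.1909, 132.0607)  # ~10.279 (Table 6)
delta2 = compression_percent(147.1909, 119.8233)  # ~18.593 (Table 6)

# Quantized MDCT coefficients, mode 1 (Table 5).
delta1_mdct = compression_percent(429.8683, 418.1427)  # ~2.728 (Table 7)
```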
Through careful audio quality evaluation and the value comparison, the proposed context-based adaptive arithmetic coding indeed yields lossless compression of the quantized audio parameters. Since the proposed technique is verified to be lossless, the detailed test results need not be reported. As to the audio quality of the full codec (e.g., ITU-T G.719), the formal test results can be found in [23,24].

4.5 Complexity test
The computational complexity obtained per frame can be specified in terms of weighted million operations per second (WMOPS) and can be evaluated by the average running time. The coding rate is set to 32 kb/s. The processor of the test computer is an Intel Core 2 Duo (Intel, Santa Clara, CA, USA) with a base frequency of 1.8 GHz. Each frame has a length of 960 samples. Table 8 shows the average complexity of the original fixed rate G.719, the proposed variable rate G.719, and the proposed context-based adaptive arithmetic coding modules. The encoder and decoder complexities are computed separately.

In fact, the proposed adaptive arithmetic coding itself causes the increase of complexity in the new scheme. The additional complexity of the proposed entropy coding modules can be acceptable in some applications because of the intrinsically low complexity of the G.719 codec. However, the almost 50% increase in total complexity should be optimized if very low complexity is actually needed.

5. Conclusions
The novel context-based adaptive arithmetic coding technique proposed in this paper is promising and significant for lossless compression when both the time and frequency planes of the audio coding parameters are considered. The proposed technique has been introduced to compress the quantized MDCT coefficients and the quantized norms in G.719. A variable rate coding structure has also been investigated and adopted to obtain high coding efficiency compared with the original fixed rate G.719. Experiments have shown that the new technique achieves a coding gain of 6% to 10% at all coding modes for different types of signals, appearing advantageous over the conventional Huffman coding. To evaluate the performance of the proposed algorithm, objective and subjective quality tests have been done for a variety of speech and audio samples. The average bit rates and the computational complexity have also been computed at the different coding modes. It is verified that the proposed variable rate coder with the adaptive arithmetic coding based on the time-frequency context produces the same audio quality as the original G.719 coder while achieving a high coding gain. The proposed method can easily be applied to other audio codecs that need to lower the coding bit rate by means of entropy coding.

Competing interests
The authors declare that they have no competing interests.

Acknowledgements
The authors would like to thank the reviewers for their suggestions, which have contributed greatly to the improvement of the manuscript. The work in this paper is supported by the National Natural Science Foundation of China (no. 11161140319) and the cooperation between BIT and Ericsson.

Received: 24 September 2012 Accepted: 2 May 2013 Published: 21 May 2013

References
1. PM Fenwick, Huffman code efficiencies for extensions of sources. IEEE Trans. Commun. 43(234), 163–165 (1995). doi:10.1109/26.380027
2. DA Huffman, A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952). doi:10.1109/JRPROC.1952.273898
3. GG Langdon, An introduction to arithmetic coding. IBM J. Res. Dev. 28(2), 135–149 (1984). doi:10.1147/rd.282.0135
4. K Hyungjin, W Jiangtao, JD Villasenor, Secure arithmetic coding. IEEE Trans. Signal Process. 55(5), 2263–2272 (2007). doi:10.1109/TSP.2007.892710
5. Information technology, Coding of Audio-Visual Objects - Part 3, Audio, Subpart 4: Time/Frequency Coding, International Organization for Standardization. ISO/IEC 14496-3:1999, 1999
6. M Neuendorf, P Gournay, M Multrus, J Lecomte, B Bessette, R Geiger, S Bayer, G Fuchs, J Hilpert, N Rettelbach, R Salami, G Schuller, R Lefebvre, B Grill, Unified speech and audio coding scheme for high quality at low bitrates, in Proc of IEEE Int Conf Acoustics, Speech and Signal Processing, 2009, pp. 1–4. doi:10.1109/ICASSP.2009.4959505
7. ITU-T Recommendation, G.719 (06/08), Low-complexity full-band audio coding for high-quality conversational applications (Int Telecomm Union, Geneva, 2008)
8. L Zhang, X Wu, N Zhang, W Gao, Q Wang, D Zhao, Context-based arithmetic coding reexamined for DCT video compression, in IEEE International Symposium on Circuits and Systems (New Orleans, 2007), pp. 3147–3150. doi:10.1109/ISCAS.2007.378098
9. B Ryabko, J Rissanen, Fast adaptive arithmetic code for large alphabet sources with asymmetrical distributions. IEEE Commun. Lett. 7(1), 33–35 (2003). doi:10.1109/LCOMM.2002.807424
10. D Marpe, H Schwarz, T Wiegand, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE T Circ Syst Vid 13(7), 620–636 (2003)
11. Information technology - MPEG audio technologies, International Organization for Standardization (ISO/IEC). ISO/IEC 23003-3:2012
12. CE Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
13. G Fuchs, V Subbaraman, M Multrus, Efficient context adaptive entropy coding for real-time applications, in Proc of IEEE Int Conf Acoustics, Speech and Signal Processing, 2011, pp. 493–496. doi:10.1109/ICASSP.2011.5946448
14. H Moradmand, A Payandeh, MR Aref, Joint source-channel coding using finite state integer arithmetic codes, in IEEE International Conference on Electro/Information Technology (Windsor, 2009), pp. 19–22. doi:10.1109/EIT.2009.5189577
15. YM Huang, YC Liang, A secure arithmetic coding algorithm based on integer implementation, in International Symposium on Communications and Information Technologies (Hangzhou, 2011), pp. 518–521. doi:10.1109/ISCIT.2011.6092162
16. IH Witten, RM Neal, JG Cleary, Arithmetic coding for data compression. Communications of the ACM 30(6), 520–540 (1987). doi:10.1145/214762.214771
17. Y Chen, H Zhu, H Jin, X-H Sun, Improving the effectiveness of context-based prefetching with multi-order analysis, in International Conference on Parallel Processing Workshops (San Diego, 2010), pp. 428–435. doi:10.1109/ICPPW.2010.64
18. O Pasi, Toll quality variable-rate speech codec. Int Conf Acoust Spee 2, 747–750 (1997)
19. E Dong, H Zhao, Y Li, Low bit and variable rate speech coding using local cosine transform. Proceedings of TENCON on Computers, Communications, Control and Power Engineering 1, 28–31 (2002)
20. S McClellan, JD Gibson, Variable rate CELP based on subband flatness. IEEE T Speech Audi P 5(2), 120–130 (1997). doi:10.1109/89.554774
21. ITU-R Recommendation, BS.1387-1 (11/01), Method for Objective Measurements of Perceived Audio Quality (Int Telecomm Union, Geneva, 2001)
22. P Kabal, CompAudio (1996). http://www.csee.umbc.edu/help/sound/AFsp-V2R1/html/audio/CompAudio.html. Accessed 20 January 2013
23. M Xie, P Chu, A Taleb, M Briand, ITU-T G.719: a new low-complexity full-band (20 kHz) audio coding standard for high-quality conversational applications, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (New Paltz, 2009), pp. 265–268. doi:10.1109/ASPAA.2009.5346487
24. A Taleb, S Karapetkov, G.719: the first ITU-T standard for high-quality conversational full-band audio coding. IEEE Communications Magazine 47(10), 124–130 (2009). doi:10.1109/MCOM.2009.5273819

doi:10.1186/1687-4722-2013-9
Cite this article as: Wang et al.: Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate. EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:9.