New Implementation Techniques of an Efficient MPEG Advanced Audio Coder
Abstract — MPEG-AAC is the current state of the art in audio compression technology. The CD quality promised at bit rates as low as 64 kbps makes AAC a strong candidate for high-quality, low-bandwidth audio streaming applications over wireless networks. Besides meeting this low bit rate requirement, the codec must be able to run on personal wireless handheld devices, with their inherently low power budgets. While the AAC standard is definite enough to ensure that a valid AAC stream is correctly decodable by all AAC decoders, it is flexible enough to accommodate implementation variations suited to different available resources and application areas. This paper reviews various implementation techniques for the encoder. We then propose our method for an optimized software implementation of MPEG-AAC (LC profile). The coder is able to perform the encoding task using half the processing power of a standard implementation without significant degradation in quality, as shown by both a subjective listening test and an ITU-R compliant quality testing program (OPERA).

Index Terms — Audio Compression, MPEG-AAC, Psychoacoustics Model, Quantization.

I. INTRODUCTION

Audio technology has evolved tremendously over the last century. With the advent of digital systems, sound reproduction has reached state-of-the-art performance in terms of quality. However, the high bit rate of digital music does not suit applications with limited bandwidth, for example digital audio streaming. To achieve efficient transmission, compression needs to be employed.

Efficient coding systems are those that optimally eliminate the irrelevant and redundant parts of an audio stream. The first is achieved by reducing psychoacoustical irrelevancy through psychoacoustic analysis. The term “perceptual audio coder” was coined to refer to compression schemes that exploit the properties of human auditory perception. Further ...

Among the perceptual audio coding schemes available today, MPEG-AAC is the leading option, giving transparent CD quality at 64 kbps. In this scheme, each AAC frame is independently decodable. With the time domain aliasing cancellation concept, the information is carried by two consecutive AAC frames. These features make the scheme favourable for audio streaming applications. The recent advances in wireless networks bring about the challenge of developing applications, including digital audio streaming, on portable devices. Not only is a low bit rate desired, but the encoder and decoder pair must also be able to run on these low-power portable devices. These are the motivations behind our research.

The AAC decoder is less demanding computationally, particularly because it lacks the psychoacoustics and bit allocation modules. These two modules are the focus of our discussion. Section 2 gives a brief description of AAC and its efficiency issues. Section 3 discusses psychoacoustics and the time-to-frequency transformation in greater detail, and section 4 focuses on the bit allocation-quantization module. Finally, section 5 highlights the experimental results, and the conclusion is presented in section 6.

II. MPEG-ADVANCED AUDIO CODER (AAC)

AAC is the latest audio compression standard released by the Moving Picture Experts Group (MPEG). Being a perceptual encoder, it follows the basic structure depicted in figure 1 ...

[Figure 1 (basic perceptual encoder). Blocks: input, filter bank, quantisation, entropy coding, output; spectral analysis, masking threshold calc.]
Contributed Paper
Manuscript received February 16, 2004 0098 3063/04/$20.00 © 2004 IEEE
656 IEEE Transactions on Consumer Electronics, Vol. 50, No. 2, MAY 2004
... block is used to reduce redundant components and consists mostly of prediction tools.

AAC uses the Modified Discrete Cosine Transform (MDCT) with 50% overlap in its filterbank module. After the overlap-add process, thanks to time domain aliasing cancellation, we should be able to obtain a perfect reconstruction of the original signal. In practice this is not the case, because error is introduced during the quantization process. The idea of a perceptual coder is to hide this quantization error so that our hearing does not notice it. Spectral components that we would not be able to hear are also eliminated from the coded ... properties of the human ear (more details on this are given in a subsequent section). The quality of a perceptual coder depends on the psychoacoustics module, because this is where all the psychoacoustical analysis is performed. The calculation of the masking threshold is among the most computationally intensive tasks of the encoder.

... inner loop quantizes the input vector and increases the quantizer step size until the output vector can be coded with ...

... profile is tailored to have a lower computational burden than the other profiles. However, the overall efficiency still depends on the detailed implementation of the encoder itself.

[AAC encoder block diagram (ISO/IEC 14496-3). Blocks: input time signal; AAC gain control tool; window length decision; filter bank; perceptual model; spectral processing (TNS, LTP, intensity/coupling, prediction); Bark scale to scalefactor band mapping; quantization and coding (scalefactor coding, spectrum normalization and interleaved VQ, quantization, noiseless coding); bitstream formatter; 14496-3 coded audio stream.]
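The perfect-reconstruction argument above can be checked with a short sketch. This pure-Python code is an illustration under stated assumptions, not the paper's implementation. It uses the MDCT form X(k) = Σ h(n)x(n)cos(2π/N (n + n0)(k + 1/2)) with n0 = (N/2 + 1)/2, a sine window (one of the window shapes AAC allows), and a toy block length N = 16 instead of AAC's 2048. The 2/(N/2) scaling in the inverse is our own choice, made so that overlap-add is exact. The optional `step` argument is a hypothetical stand-in for quantization: rounding the coefficients breaks the cancellation, producing exactly the kind of error a perceptual coder then has to hide.

```python
import math
import random


def sine_window(n_win):
    """Sine window; w[n]^2 + w[n + n_win/2]^2 = 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / n_win) for n in range(n_win)]


def mdct(block, window):
    """Windowed MDCT: n_win samples in, n_win/2 coefficients out."""
    n_win = len(block)
    n0 = (n_win / 2 + 1) / 2
    return [
        sum(window[n] * block[n] * math.cos(2 * math.pi / n_win * (n + n0) * (k + 0.5))
            for n in range(n_win))
        for k in range(n_win // 2)
    ]


def imdct(coeffs, window):
    """Windowed IMDCT scaled by 2/M (M = number of coefficients)."""
    m = len(coeffs)
    n_win = 2 * m
    n0 = (n_win / 2 + 1) / 2
    return [
        window[n] * (2.0 / m) * sum(
            coeffs[k] * math.cos(2 * math.pi / n_win * (n + n0) * (k + 0.5))
            for k in range(m))
        for n in range(n_win)
    ]


def analysis_synthesis(signal, n_win, step=0.0):
    """50%-overlap MDCT, optional coefficient rounding, IMDCT and overlap-add."""
    hop = n_win // 2
    window = sine_window(n_win)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - n_win + 1, hop):
        coeffs = mdct(signal[start:start + n_win], window)
        if step > 0:  # crude uniform rounding, an illustrative stand-in for quantization
            coeffs = [step * round(c / step) for c in coeffs]
        for n, y in enumerate(imdct(coeffs, window)):
            out[start + n] += y  # overlap-add: aliasing of adjacent blocks cancels
    return out


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(64)]
clean = analysis_synthesis(x, 16)
noisy = analysis_synthesis(x, 16, step=0.5)
# Only samples covered by two overlapping blocks (indices 8..55) are fully reconstructed.
print(max(abs(a - b) for a, b in zip(x[8:56], clean[8:56])))  # floating-point noise only
print(max(abs(a - b) for a, b in zip(x[8:56], noisy[8:56])))  # clearly nonzero
```

The first and last half-blocks are excluded from the comparison because they are covered by only one window, so their aliasing terms have no partner to cancel against.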
where klow and khigh are the lowest and highest frequency ...

... expression for the spreading function (for each partition) is ...

... in Fig. 4 that the masking effect slopes gently at the higher frequency end, while on the lower side it is considerably steep. This accounts for the fact that it is easier to mask higher frequency components than lower ones.

4. Determination of the tonality index
Tones and noise have different masking capabilities (the latter being the better masker). A precise assessment of tonality is crucial to avoid under-coding and over-coding. In AAC, this parameter is estimated using the unpredictability measure. Here, let Xp(k) be the predicted value for coefficient X(k). Xp(k) is computed by extrapolating the values of X(k) over the previous two frames. A ...

[Figure 4 (spreading function). Labels: Offset, Spreading Function; x-axis: critical band rate [Bark].]

... where f is the frequency in kHz.

The computational complexity of steps 1, 2 and 3 is N log N, N and N² respectively. Our ear analyzes sounds according to the Bark scale. Therefore, conversion to the frequency domain (step 1), grouping of the spectral lines to 1/3-Bark resolution (step 2) and convolution with the spreading function (step 3) are unavoidable. PAM implementations differ mostly in the 4th step. Furthermore, the quality of the masking threshold depends greatly on how accurate this tonality index estimation is. The last two steps have negligible computational cost compared to the previous ones.
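The step 4 unpredictability measure can be illustrated with a small sketch. It follows the usual ISO psychoacoustic model 2 formulation, which we assume here: magnitude and phase are each extrapolated linearly from the two previous frames to form Xp(k), and c(k) = |X(k) − Xp(k)| / (|X(k)| + |Xp(k)|). The helper `average_unpredictability` mirrors the per-partition average average_u(b) that this paper proposes later in this section. The function names and the toy spectra (steady sinusoids in one partition, random noise in the other) are hypothetical.

```python
import cmath
import math
import random


def unpredictability(x_t, x_t1, x_t2):
    """Per-line unpredictability c(k) in [0, 1] from three consecutive complex spectra.

    x_t is the current frame, x_t1 and x_t2 the two previous frames.
    Values near 0 indicate tonal (predictable) lines, values near 1 noise-like lines.
    """
    c = []
    for cur, p1, p2 in zip(x_t, x_t1, x_t2):
        r_pred = 2 * abs(p1) - abs(p2)                      # extrapolated magnitude
        phi_pred = 2 * cmath.phase(p1) - cmath.phase(p2)    # extrapolated phase
        x_pred = r_pred * complex(math.cos(phi_pred), math.sin(phi_pred))  # Xp(k)
        denom = abs(cur) + abs(r_pred)
        c.append(abs(cur - x_pred) / denom if denom > 0 else 1.0)
    return c


def average_unpredictability(c, partitions):
    """average_u(b) for partitions given as inclusive (k_low, k_high) pairs."""
    return [sum(c[lo:hi + 1]) / (hi - lo + 1) for lo, hi in partitions]


def toy_spectra(n_tonal, n_noise, seed=1):
    """Hypothetical test data: frames (t-2, t-1, t), steady sinusoids first, then noise."""
    rng = random.Random(seed)
    frames = ([], [], [])
    for _ in range(n_tonal):
        amp = rng.uniform(0.5, 2.0)
        phi0 = rng.uniform(-math.pi, math.pi)
        dphi = rng.uniform(-math.pi, math.pi)               # constant phase advance per frame
        for lag, frame in enumerate(frames):
            frame.append(amp * cmath.exp(1j * (phi0 + lag * dphi)))
    for _ in range(n_noise):
        for frame in frames:
            frame.append(complex(rng.gauss(0, 1), rng.gauss(0, 1)))
    return frames


x_t2, x_t1, x_t = toy_spectra(8, 64)
c = unpredictability(x_t, x_t1, x_t2)
avg_u = average_unpredictability(c, [(0, 7), (8, 71)])
print(avg_u)  # first partition essentially 0 (tonal), second much larger (noise-like)
```

Averaging inside a partition touches each line once, which is why it avoids the N² convolution used by the weighted variant.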
The standard tonality calculation using the weighted unpredictability highlighted above involves a convolution process of N² complexity. Instead, we propose identifying the nature of the spectrum locally within each Bark band, thus avoiding this convolution process. This isolates the calculation strictly within a partition. The unpredictability is averaged within the partition:

$$\mathrm{average\_u}(b) = \frac{1}{k_{high}(b) - k_{low}(b) + 1} \sum_{k = k_{low}(b)}^{k_{high}(b)} u_b(k)$$

[Figure: curves plotted against partition index (0 to 70), vertical axis 0 to 40.]

... non-linearity involved in the human auditory system [5]. However, using this non-linear PAM would increase the computational weight, which is against the goal of our experiment.

A further improvement in efficiency was realized in conjunction with the filter bank module. A transform is a costly process, and the fact that AAC has an MDCT in its filter bank module and a DFT in the PAM makes this a computational overhead. The MDCT used in AAC [1] is formulated as follows:
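A minimal pure-Python sketch of this MDCT follows, together with a numerical check of the ODFT route described later in this section. It assumes the common form MDCT(k) = Σ h(n)x(n)cos(2π/N (n + n0)(k + 1/2)), n0 = (N/2 + 1)/2, k = 0, ..., N/2 − 1; the standard's own definition carries an extra constant scale factor, which is omitted here. The check confirms that MDCT(k) = Re{Xo(k)} cos θ(k) + Im{Xo(k)} sin θ(k), with θ(k) = (π/N)(k + 1/2)(1 + N/2), reproduces that MDCT. The direct O(N²) sums are for clarity only. In practice Xo(k) is one FFT of h(n)x(n)e^{−jπn/N}, since the π/N bin shift is just a modulation of the input. The function names, the sine window and N = 32 are illustrative choices.

```python
import cmath
import math
import random


def odft(frame, window):
    """Xo(k) = sum_n h(n) x(n) exp(-j 2 pi (k + 1/2) n / N), k = 0..N-1.

    For real input Xo(N-1-k) = conj(Xo(k)), so the first N/2 bins carry all the information.
    """
    n_len = len(frame)
    return [
        sum(window[n] * frame[n] * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_len)
            for n in range(n_len))
        for k in range(n_len)
    ]


def mdct_from_odft(xo):
    """MDCT(k) = Re{Xo(k)} cos(theta(k)) + Im{Xo(k)} sin(theta(k)), k = 0..N/2-1."""
    n_len = len(xo)
    out = []
    for k in range(n_len // 2):
        theta = math.pi / n_len * (k + 0.5) * (1 + n_len / 2)
        out.append(xo[k].real * math.cos(theta) + xo[k].imag * math.sin(theta))
    return out


def mdct_direct(frame, window):
    """Reference MDCT with phase offset n0 = (N/2 + 1)/2 and no leading constant."""
    n_len = len(frame)
    n0 = (n_len / 2 + 1) / 2
    return [
        sum(window[n] * frame[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
            for n in range(n_len))
        for k in range(n_len // 2)
    ]


random.seed(2)
N = 32
x = [random.uniform(-1.0, 1.0) for _ in range(N)]
h = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]   # sine window (assumed)
xo = odft(x, h)                 # one transform, reusable by the psychoacoustic model
via_odft = mdct_from_odft(xo)   # filter bank coefficients derived from the same transform
print(max(abs(a - b) for a, b in zip(via_odft, mdct_direct(x, h))))  # floating-point noise only
```

The identity holds because cos(2π/N (n + n0)(k + 1/2)) is the real part of e^{−j2π(k+1/2)n/N}·e^{−jθ(k)}, with θ(k) = 2πn0(k + 1/2)/N.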
... managed to catch the two frequency components of the signal, whereas the MDCT coefficients in figure 6c give zero results due to the problem highlighted earlier. This misdetection poses a problem when one tries to track the signal tonality with the unpredictability function. One way to work around this is to use the SFM to determine the tonality [6][7]. If unpredictability ... frame due to phase and/or resolution, it will not be ignored [10].

Instead of using the MDCT in the PAM, we propose the use of the Odd DFT (ODFT), which can easily be manipulated to obtain the MDCT coefficients [12]. The ODFT corresponds to a DFT with the discrete frequency bins shifted by π/N. With this modification the complexity is also reduced by one transform process, but in this case we do not have to deal with the misdetection problem stated earlier. The restructured coder is illustrated in figure 7.

[Figure 7 blocks: input, ODFT, ODFT to MDCT, quantization, entropy coding, output; psychoacoustics module: spectral analysis, masking threshold calc.]
Fig 7. Restructured AAC-LC encoder

The ODFT is defined as:

$$X_o(k) = \sum_{n=0}^{N-1} h(n)\, x(n)\, e^{-j \frac{2\pi}{N}\left(k + \frac{1}{2}\right) n}$$

where x(n) is the time domain sample and h(n) is the window function. This ODFT output is fed into the psychoacoustics module for further spectral analysis, whereas for the filter bank module the MDCT coefficients are obtained as:

$$\mathrm{MDCT}(k) = \mathrm{Re}\{X_o(k)\}\cos\theta(k) + \mathrm{Im}\{X_o(k)\}\sin\theta(k)$$

where

$$\theta(k) = \frac{\pi}{N}\left(k + \frac{1}{2}\right)\left(1 + \frac{N}{2}\right)$$

... employed for the psychoacoustics analysis. Figure 8 illustrates the minor differences in the masking threshold obtained with different window functions. A more thorough comparison of the use of each window in the PAM is given in [13].

[Figure 8 plot: masking threshold in dB versus scalefactor band (0 to 40); curves: hann, sine, kbd.]
Fig 8. Masking threshold comparison with different window functions

IV. BIT ALLOCATION-QUANTIZATION

AAC Quantization module: AAC uses a non-uniform quantizer:

$$\text{x\_quantized}(i) = \mathrm{int}\left[\frac{x^{3/4}}{2^{\frac{3}{16}\left(gl - scf(i)\right)}} + 0.4054\right] \qquad (1)$$

where i is the scale factor band index, x denotes the spectral values within that band to be quantized, gl is the global scale factor (the rate controlling parameter), and scf(i) is the scale factor value (the distortion controlling parameter). Figure 9 illustrates the nested loop in this module, which obtains the parameters gl and scf(i) from the inner and outer loops respectively.

[Figure 9 (partial labels): Begin; 1: Initialized gl, Initialized scf(i); exit criteria (no / yes); end.]
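Equation (1), the dequantization rule of step 4 and the initial global scale factor of equation (2) take only a few lines each. The sketch below also shows the loop optimizations of steps 2 and 5 below: a binary search for gl and a max-heap that always amplifies the band with the highest NMR. All names are hypothetical, and this is not the paper's code. Its assumptions, flagged in the comments, are as follows. The real-valued gl of equation (2) is rounded up so the peak line never exceeds 8191. `bits_used` can be any function that is non-increasing in gl; real Huffman bit counts only approximately behave this way. Each scale factor increment is modeled as lowering the band's noise by 10·log10(2^{1/2}) ≈ 1.5 dB, instead of re-measuring the noise. That figure follows from the 2^{(1/4)(gl − scf(i))} dequantizer step under a uniform-noise approximation.

```python
import heapq
import math

MAX_QUANT = 8191  # largest quantized magnitude an AAC decoder accepts
# Assumed noise drop per scf increment: step shrinks by 2^(1/4), noise power by 2^(1/2).
DB_PER_SCF_STEP = 10 * math.log10(2 ** 0.5)


def quantize(x, gl, scf):
    """Equation (1) applied to |x|, with the sign restored afterwards."""
    q = int(abs(x) ** 0.75 / 2 ** (3 / 16 * (gl - scf)) + 0.4054)
    return -q if x < 0 else q


def dequantize(q, gl, scf):
    """Decoder-side inverse: sign(q) * |q|^(4/3) * 2^((1/4)(gl - scf))."""
    mag = abs(q) ** (4 / 3) * 2 ** (0.25 * (gl - scf))
    return -mag if q < 0 else mag


def initial_gl(spectrum):
    """Equation (2), rounded up (assumption) so the peak line quantizes to <= 8191."""
    peak = max(abs(x) for x in spectrum)
    return math.ceil(16 / 3 * math.log2(peak ** 0.75 / MAX_QUANT))


def find_global_gain(bits_used, budget, lo, hi):
    """Step 2: smallest gl in [lo, hi] with bits_used(gl) <= budget, in O(log N) calls.

    Assumes bits_used is non-increasing in gl; returns hi if no gl fits the budget.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if bits_used(mid) <= budget:
            hi = mid
        else:
            lo = mid + 1
    return lo


def amplify_by_nmr(nmr_db, step_budget):
    """Step 5: spend up to step_budget scf increments, always on the highest-NMR band.

    heapq is a min-heap, so NMR values are stored negated. Bands whose NMR drops
    to zero or below are not pushed back, so the heap shrinks as bands are satisfied.
    """
    scf = [0] * len(nmr_db)
    heap = [(-nmr, band) for band, nmr in enumerate(nmr_db) if nmr > 0]
    heapq.heapify(heap)
    while heap and step_budget > 0:
        neg_nmr, band = heapq.heappop(heap)      # band with the highest NMR
        scf[band] += 1
        step_budget -= 1
        new_nmr = -neg_nmr - DB_PER_SCF_STEP     # modeled, not re-measured
        if new_nmr > 0:
            heapq.heappush(heap, (-new_nmr, band))
    return scf


spectrum = [1000.0 * math.sin(0.37 * i) for i in range(64)]   # toy spectrum
gl0 = initial_gl(spectrum)
print(gl0, max(abs(quantize(x, gl0, 0)) for x in spectrum))  # peak stays <= 8191
print(amplify_by_nmr([6.0, 0.5, -3.0, 3.0], step_budget=3))   # [2, 0, 0, 1]
```

In the last call, the budget of three increments goes twice to the 6 dB band and then to the 3 dB band, while the 0.5 dB band is never reached. This is the "amplify the worst band first" behaviour described for low bit rates.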
... bit_used is below the chosen bit rate and the quantization noise in all scale factor bands is below the masking threshold. However, this is not always achievable, especially at very low bit rates. Two more exit criteria are defined in the standard: first, when all scale factor bands have been amplified, and second, when the difference between two consecutive scale factors exceeds 60 (the maximum difference that can be decoded). A time constraint has to be employed as well when real-time encoding is desired.

Improving the efficiency of this module involved optimizing each of the steps outlined in figure 9.

1. Initialization of gl and scf(i)
Normally the initial value of scf(i) is zero, and gl is

$$gl = \frac{16}{3}\log_2\left(\frac{\text{max\_mdct\_line}^{3/4}}{8191}\right) \qquad (2)$$

which ensures that the maximum MDCT coefficient is quantized to 8191 (the maximum value decodable by an AAC decoder). However, in general audio signals the current audio frame is highly correlated with the previous one. Because of this property, the bit allocation parameters of the current audio frame are similar to those of the previous frame. By using the previous frame's result as the initial estimate of the gl and scf(i) parameters, the number of bit allocation iterations can be reduced [14].

2. Global scale factor (gl) adjustment
Instead of a linear search from the initial to the desired value of this parameter, a binary search is employed. This reduces the number of iterations from N to log N.

3. Calculation of bit_used
One of the reasons why the bit allocation module is time-consuming is the presence of Huffman coding within the inner loop; for the same reason, the relation between the global scale factor (gl) and bit_used is not linear. Every time gl is adjusted, the coefficients need to be requantized and Huffman coded. There are eleven Huffman codebook options for each scale factor band, and there is a grouping option for adjacent scale factor bands that use the same Huffman codebook. Grouping is performed to reduce the amount of side information, but it is not always advantageous to use the same codebook for adjacent scale factor bands. Hence, in choosing the optimum codebook, we have to try the grouping possibilities as well. This is an NP-complete problem, and it is not always feasible to obtain the optimum solution, mostly due to time constraints.

A sub-optimal solution has been suggested in [7][9]: checking the grouping possibilities for adjacent scale factor bands in just one iteration. Another option is a fixed Huffman sectioning, in which three nonzero bands share the same codebook [15].

We adopt both ideas, fixing the lower scale factor bands to use the same codebook and using the sub-optimal solution for the upper bands. The reason is that the lower bands contain fewer spectral lines, so the bit savings gained from using the optimal codebook per band are smaller than the overhead of the side information. Therefore, grouping the first few bands, which contain only 4 spectral lines each, resulted in a better (lower) bit_used.

4. Quantization error calculation
The quantization error is calculated per scale factor band by summing the squared differences between the original spectral values and the dequantized values. The dequantization process uses the following formula:

$$\text{x\_dequantized}(i) = \left(\text{x\_quantized}(i)\right)^{4/3} \cdot 2^{\frac{1}{4}\left(gl - scf(i)\right)}$$

which is also the process performed at the decoder side.

This analysis-by-synthesis process has to be performed every time the distortion control in the outer loop is executed.

To reduce this task, pre-allocation and pre-exclusion of bits can be adopted [16][17]. Extensive experiments show that there are bands in which bits are always allocated and bands which always have zero allocation (the latter occurs mostly in the upper bands due to the high threshold of hearing in this region). For these special bands, iteration is no longer needed and the calculation of the quantization error can be skipped. However, this technique can only be used when we have enough bits at hand. Pre-allocation might result in a bit shortage in a more important band and is hence not advisable for low bit rate coding.

A more general approach to optimizing this task is to approximate the quantization error mathematically. To strongly reduce the number of operations, a uniform quantizer can be assumed when estimating the noise power [18], that is,

$$\frac{\Delta^2}{12}, \qquad \Delta = 2^{\frac{3}{16}\left(gl - scf(i)\right)}$$

This method disregards the compression (x^{3/4}) of the non-uniform quantizer in exchange for simplicity. We adopt a more precise approximation of the quantization noise, which is discussed later in this section in conjunction with the approximation of the global and individual scale factors.

5. Scale factor (scf(i)) adjustment
As mentioned earlier, when bit resources are low, more often than not we have to choose to amplify only the scale factor bands with the highest NMR (Noise to Mask Ratio). This search can be optimized by using a complete binary tree data structure with the property that the value (NMR) of each node is at least as large as the values of its children [19]. In this case, the scale factor adjustment reduces to deleting the top element of the tree, adjusting the scale factor (modifying the NMR accordingly), and reinserting this element into the tree. When the NMR becomes lower than zero, it need not be inserted back into
E. Kurniawati et al.: New Implementation Techniques of an Efficient MPEG Advanced Audio Coder 661
the tree, as no scale factor adjustment needs to be made, so the tree becomes smaller during this process. The advantage of this approach is that each modification works on log N elements, as opposed to N elements in a traditional linear search.

Apart from these optimizations, we still face the problem that all these processes are executed repeatedly until the best solution is found or until the time allowed by the exit criteria expires. A more intuitive way to get a better result is to start with better initial values for the parameters. The best case is then to arrive at the best solution on the first trial. This is the method attempted in [15][20][21][22], especially for obtaining the initial value of the scale factor (scf(i)). This distortion controlling parameter depends on the masking threshold, and we will try to relate these two variables mathematically.

Combining equation (1) with the dequantization formula, we obtain the dequantized value

$$\text{x\_dequantized}(i) = \left(\mathrm{int}\left[\frac{x^{3/4}}{2^{\frac{3}{16}(gl - scf(i))}} + 0.4054\right]\right)^{4/3} 2^{\frac{1}{4}(gl - scf(i))} = \left(\frac{x^{3/4}}{2^{\frac{3}{16}(gl - scf(i))}} + e\right)^{4/3} 2^{\frac{1}{4}(gl - scf(i))}$$

where e is the rounding error introduced by int[·].

... This is the same approach used in [22], with the assumption that the random variables e and x are independent and uniformly distributed. The error is relatively small and this series converges quickly in practice. For simplicity, only the first-order result is employed.

Experimental results show that the quantization noise derived from the above approximation is often lower than that obtained from the traditional analysis-by-synthesis method. This could be due to the truncation of higher-order terms. Underestimating the noise could lead to perceptual artefacts, because noise we thought was already under the masking threshold might end up above it. We adopted a scaling factor within the noise approximation to circumvent this problem, since overestimating this value does not have any effect on the perceptual quality. Figure 10 shows the comparison between the noise and its approximated counterpart.

[Figure 10 plot: noise in dB (0 to 100) against index 1 to 49.]
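The first-order idea can be illustrated with the following sketch. It is our own expansion under the stated assumptions and not necessarily the paper's equation 3. Write a = x^{3/4}/2^{(3/16)(gl − scf(i))}. Expanding (a + e)^{4/3} ≈ a^{4/3} + (4/3)a^{1/3}e gives a per-line error of about (4/3)x^{1/4}2^{(3/16)(gl − scf(i))}e. Assume frac(a + 0.4054) is uniform on [0, 1). Then e = int(a + 0.4054) − a has E[e²] = 0.4054² − 0.4054 + 1/3 ≈ 0.092, so a band's noise can be estimated from Σ√x without quantizing or dequantizing. The `scale` argument stands in for the paper's safety scaling factor, whose value is not given, so 1.0 is only a placeholder. The random test band is hypothetical data.

```python
import math
import random

OFFSET = 0.4054
# E[e^2] for e = int(a + OFFSET) - a when frac(a + OFFSET) is uniform on [0, 1).
MEAN_SQ_E = OFFSET ** 2 - OFFSET + 1 / 3


def quantize(x, gl, scf):
    """Equation (1) for a non-negative magnitude x."""
    return int(x ** 0.75 / 2 ** (3 / 16 * (gl - scf)) + OFFSET)


def dequantize(q, gl, scf):
    """Decoder-side inverse for a non-negative quantized value."""
    return q ** (4 / 3) * 2 ** (0.25 * (gl - scf))


def noise_by_synthesis(band, gl, scf):
    """Analysis by synthesis: quantize, dequantize and sum the squared errors."""
    return sum((x - dequantize(quantize(x, gl, scf), gl, scf)) ** 2 for x in band)


def noise_first_order(band, gl, scf, scale=1.0):
    """First-order estimate: sum over lines of (16/9) x^(1/2) 2^((3/8)(gl - scf)) E[e^2]."""
    per_sqrt_x = (16 / 9) * 2 ** (3 / 8 * (gl - scf)) * MEAN_SQ_E
    return scale * per_sqrt_x * sum(math.sqrt(x) for x in band)


rng = random.Random(3)
band = [rng.uniform(1e4, 1e5) for _ in range(2000)]   # hypothetical large magnitudes
exact = noise_by_synthesis(band, 8, 2)
approx = noise_first_order(band, 8, 2)
print(approx / exact)  # close to 1 when the quantized values are large
```

For lines that quantize to 0 or 1, e is no longer small relative to a, so a first-order estimate of this kind becomes unreliable there.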
... the resulting global gain and the minimum masking threshold. Initial value refers to the initial gl calculated with equation (2).

[Figure 11 plot: values 0 to 80 over indices 1 to 210; series: global scalefactor, initial value, minimum xmin.]
Fig 11. Correlation between gl and minimum masking threshold. Minimum xmin serves as a better initial value and has better correlation.

Instead of using this initial value, we take the previous global gain value and use linear interpolation based on the gradient of the obtained minimum masking threshold. Figure 12 shows the linear regression analysis of the two variables, which have a correlation value of 0.85.

[Figure 12 plot: axis ticks 45 to 75 and 0 to 60; labels: global gain, Predicted gl.]
Fig 12. Linear regression analysis

This method, however, was not applied during the transient parts of the signal (when the short window is used). The inter-frame correlation during transients is extremely low; hence using the previous window's value does not yield a good result. In this case, the xmin value is used as the initial estimate until the coder switches back to the long window.

V. RESULTS AND DISCUSSIONS

We tested the codec to verify the performance of the encoding system in terms of both quality and encoding speed. The comparison is made against a standard implementation, the ISO reference coder. Table I highlights the main differences between the two encoder implementations from an algorithmic point of view.

TABLE I
MAIN DIFFERENCES IN IMPLEMENTATION
 | Traditional impl. | Optimized impl.
1. Transform | MDCT calculation | Derived from PAM's FFT
  • block switching | Perceptual entropy (PE) based | PE and energy based
2. Psychoacoustics Module (PAM) | FFT with Hann window | FFT with Sin/KBD window and π/N freq. shift
  • tonality index | Weighted unpredictability | Average unpredictability
3. Bit Allocation | | 
  • initial sfb | All zero | Estimated with equation 3
  • initial gl | Equation (2) | Previous frame value (except after short block)
  • gl adjustment | Linear adjustment | Interpolated based on xmin gradient
  • scf adjustment | Linear adjustment | Minor fine-tuning
  • noise calculation | Analysis by synthesis | Approximated with equation 3

The encoding speed was evaluated on a PC with a Pentium II 350 MHz processor at two different bit rates for 44.1 kHz audio signals. Tables II and III summarize the results for the critical audio signals.

TABLE II
PROCESSING TIME COMPARISON FOR BIT RATE 64 KBPS
Signal | Number of frames | Original method (seconds) | Optimized method (seconds) | Gain (original / optimized)
Castanet | 301 | 11 | 5 | 2.20
Flute | 804 | 14 | 8 | 1.75
Glockenspiel | 345 | 11 | 6 | 1.83
Pop music | 330 | 13 | 7 | 1.86
Speech | 727 | 14 | 7 | 2.00
Hihat | 109 | 5 | 2 | 2.50

TABLE III
PROCESSING TIME COMPARISON FOR BIT RATE 96 KBPS
Signal | Number of frames | Original method (seconds) | Optimized method (seconds) | Gain (original / optimized)
Castanet | 301 | 10 | 5 | 2.00
Flute | 804 | 10 | 9 | 1.11
Glockenspiel | 345 | 9 | 7 | 1.29
Pop music | 330 | 12 | 7 | 1.71
Speech | 727 | 12 | 8 | 1.50
Hihat | 109 | 5 | 2 | 2.50

The optimized method shows little difference between the two bit rates because in both cases it rarely iterates in the bit allocation module (due to the direct estimation of the scale factor and global scale factor values). For the original method, the higher the bit rate, the faster the rate control loop converges. This is because the bit budget is much
higher. In this experiment, it can be observed that at 96 kbps the encoding time is generally much shorter.

The perceptual quality was tested using two approaches. The first is a subjective listening test involving the six critical signals listed in Table II. These signals are known to be difficult for a perceptual coder to encode because they are prone to perceptual audio artefacts [24]. The second approach uses a quality testing program called OPERA (Objective Perceptual Analyzer), which simulates the human ear. This software is compliant with PEAQ (Perceptual Evaluation of Audio Quality), an ITU-R standard. The results are presented in figure 13 for bit rates of 64 and 96 kbps.

Figure 13 shows MOS differences, with diffscore = 0 for the ...

[Figure 13 plot: y-axis "ODG (Objective Difference Grade) from OPERA", ticks from -0.40 to -4.00, grade labels "Slightly Annoying" and "Annoying"; x-axis: 64 kbps, 96 kbps; series: original implementation, optimized implementation; annotated values ODG = -0.49, ODG = -0.6, ODG = -1.6, ODG = -1.57.]
Fig 13. Subjective quality test and OPERA ODG at two different bit rates

The computational demand of the optimized encoder is shown in figure 14. Comparing it with figure 3 for the initial implementation, the major improvement comes from the quantization module, due to the reduction of the nested loop. The filterbank module also shows improvement because its major calculation has been absorbed by the psychoacoustics module. Overall, the proposed optimized method was able to save half of the computational resources.

[Figure 14 (partial legend): Psychoacoustics (19%), Filterbank (1%), Quantisation (23%).]

... namely the psychoacoustics analysis and the bit allocation.

As a perceptual encoder, AAC relies heavily on the psychoacoustics module for its quality; this module generates the masking threshold curve. This threshold represents the maximum level of noise that will not be perceptible to our ear. The analysis exploits the simultaneous masking properties of our auditory system, which are calculated on the Bark scale. Therefore, ...

Since tones and noise have different masking properties, a precise estimation of this index is important to avoid over- and under-masking. Three methods have been discussed in this paper, and the average unpredictability scheme was selected for implementation, mainly because of its low computational weight and relatively good quality.

... only need to perform one transform in the encoder without degrading the overall quality.

The bit allocation unit took more than half of the processing power due to the presence of the rate-distortion control loop. This nested loop iterates until the optimum global and individual scale factors are found. A better way to calculate the initial values of these parameters is presented in this paper in an
effort to avoid unnecessary iteration. The calculation of the quantization error is improved using an estimator derived from the scale factor values. This saves us from performing the dequantization process in the encoder, as is normally done in the analysis-by-synthesis method to calculate the error.

The perceptual quality of the optimized encoder was evaluated using a subjective listening test and an objective evaluation with a quality testing program called OPERA. Both results show no significant degradation in the optimized coder at bit rates of 64 kbps and 96 kbps, as overlapping confidence intervals were obtained from both listening tests.

The latest effort to further reduce the audio bit rate has resulted in the standardization of High Efficiency AAC (HE-AAC) as part of the MPEG-4 system, promising CD quality at 48 kbps. HE-AAC contains a standard AAC coder to code the low frequency region and a new Spectral Band Replication (SBR) technology to generate the high frequency portion. All the modifications highlighted in this paper can be utilized in the core coder of HE-AAC. Our future research will focus on optimizing the SBR part of this newly defined coding system.

REFERENCES
[1] ISO/IEC 14496-3, “Information Technology – Coding of audio-visual objects, Part 3: Audio”, 1999.
[2] E. Kurniawati, J. Absar, S. George, C.T. Lau, B. Premkumar, “An Investigation Into Different Masking Behaviours Resulting from Estimation of Tonality Index”, 14th International Conference on Digital Signal Processing, July 2002, Santorini, Greece.
[3] J.D. Johnston, “Estimation of Perceptual Entropy Using Noise Masking Criteria”, IEEE CH2561-9/88/0000-2524, 1988.
[4] J.D. Johnston, “Transform Coding of Audio Signals Using Perceptual Noise Criteria”, IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988.
[5] E. Kurniawati, J. Absar, S. George, C.T. Lau, B. Premkumar, “The Significance of Tonality Index and Nonlinear Psychoacoustics Models for Masking Threshold Estimation”, Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, June 2002, Espoo, Finland.
[6] Ivan Dimkovic, Dragorad Milovanovic, Zoran Bojkovic, “Fast Software Implementation of MPEG Audio Encoder”, 14th International Conference on Digital Signal Processing, July 2002, Santorini, Greece.
[7] Toshiyuki Nomura, Yuichiro Takamizawa, “Processor-Efficient Implementation of a High Quality MPEG-2 AAC Encoder”, Audio Engineering Society 110th Convention, 2001, Preprint #5294.
[8] T.H. Tsai, S.W. Huang, L.G. Chen, “Design of a Low Power Psychoacoustic Model Co-Processor for MPEG-2/4 AAC LC Stereo Encoder”, 0-7803-7761-3/03, IEEE.
[9] Yuichiro Takamizawa, Toshiyuki Nomura, Masao Ikekawa, “High-Quality and Processor-Efficient Implementation of an MPEG-2 AAC Encoder”, 0-7803-7041-4/01, IEEE.
[10] A.D. Duenas, R. Perez, B. Rivas, E. Alexandre, A.S. Pena, “A Robust and Efficient Implementation of MPEG-2/4 AAC Natural Audio Coders”, Audio Engineering Society 112th Convention, 2002, Preprint #5556.
[11] Ye Wang, Leonid Yaroslavsky, Miikka Vilermo, Mauri Vaananen, “Some Peculiar Properties of the MDCT”, 0-7803-5747-7/00, 2000.
[12] Anibal J.S. Ferreira, “Perceptual coding using sinusoidal modeling in the MDCT domain”, Audio Engineering Society 112th Convention, 2002, Preprint #5569.
[13] E. Kurniawati, J. Absar, S. George, C.T. Lau, B. Premkumar, “Single Transform Perceptual Audio Encoder”, 14th International Conference on Digital Signal Processing, July 2002, Santorini, Greece.
[14] Kai-Tat Fung, Yui-Lam Chan, Wan-Chi Siu, “A Fast Bit Allocation Algorithm for MPEG Audio Encoder”, Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing, May 2001.
[15] C.M. Liu, W.J. Lee, R.S. Hong, “Bit Allocation for Advanced Audio Coding using Bandwidth Proportional Noise Shaping Criterion”, Proceedings of the 6th International Conference on Digital Audio Effects (DAFX-03).
[16] Hyen-O Oh, Joon-Seok Kim, Chang-Jun Song, Young-Cheol Park, Dae Hee Youn, “Low Power MPEG/Audio Encoders Using Simplified Psychoacoustic Model and Fast Bit Allocation”, 0-7803-6622-0/01, 2001 IEEE.
[17] Hyen-O Oh, Joon-Seok Kim, Chang-Jun Song, Dae-Hee Youn, Il-Whan Cha, “New Implementation Techniques of A Real Time MPEG-2 Audio Encoding System”, 0-7803-5041-3/99, 1999 IEEE.
[18] A.D. Duenas, R. Perez, B. Rivas, E. Alexandre, A.S. Pena, “Realtime Implementation of MPEG-2 and MPEG-4 Natural Audio Coders”, Audio Engineering Society 110th Convention, 2001, Preprint #5302.
[19] Manoj Kumar, Mohammad Zubair, “A High Performance Software Implementation of MPEG Audio Encoder”, ICASSP, Vol. 2, 1996.
[20] C.M. Liu, W.J. Lee, R.S. Hong, “A New Criterion and Associated Bit Allocation Method For Current Audio Coding Standards”, Proceedings of the 5th International Conference on Digital Audio Effects (DAFX-02).
[21] Chi-Min Liu, Chin-Ching Chen, Wen-Chieh Lee, Szu-Wei Lee, “A Fast Bit Allocation Method for MPEG Layer III”, 0-7803-5123-1/99, 1999 IEEE.
[22] C.Y. Lee, Y.C. Fang, H.C. Chuang, C.N. Wang, T.H. Chiang, “A Fast Audio Bit Allocation Technique Based on a Linear R-D Model”, IEEE Transactions on Consumer Electronics, Vol. 48, No. 3, August 2002.
[23] Kelvin H.C. Eng, D.Y. Huang, S.W. Foo, “A New Bit Allocation Method for Low Delay Audio Coding at Low Bit Rates”, Audio Engineering Society 112th Convention, 2002, Preprint #5573.
[24] Markus Erne, “Perceptual Audio Coders, What to listen for”, Audio Engineering Society 111th Convention, 2001.

Evelyn Kurniawati received her Bachelor of Applied Science (Computer Engineering) degree from Nanyang Technological University (NTU), Singapore, in 2000. She is now pursuing her doctoral degree in the School of Computer Engineering, NTU. Her research interests are in digital audio compression, network security and computer animation.

Chiew-Tong Lau received his B.Eng. degree from Lakehead University in 1983, and M.A.Sc. and Ph.D. degrees in Electrical Engineering from the University of British Columbia in 1985 and 1990 respectively. He is currently an Associate Professor and Head of the Division of Computer Communications in the School of Computer Engineering, Nanyang Technological University, Singapore. His main research interests are in wireless communications.

Benjamin Premkumar received his Bachelor of Science degree in Physics and Math from Bangalore University (India) and a Bachelor's degree in Electrical Communication Engineering from the Indian Institute of Science (India). He briefly worked in the Research and Development division of a large communication company in Bangalore before proceeding to the US to earn his M.S. from North Dakota State University (NDSU). His M.S. research was in the area of digital speech processing. He taught as a graduate teaching fellow at NDSU. He then went on to obtain his Ph.D. from the University of Idaho; his Ph.D. thesis was in the area of Synthetic Aperture Radar signal processing, a project funded by NASA. He has held various teaching positions in the US and Singapore since 1991. Currently he is an Associate Professor in the School of Computer Engineering (NTU). His research interests include digital signal processing and its applications in wireless communication, software defined radio and impulse radio. He also works in the areas of multirate signal processing, filter banks, transform techniques, speech coding techniques, number theory, and the wavelet transform and its application to signal analysis.