Level and Time Panning of Phantom Images For Musical Sources
Hyunkook Lee¹ and Francis Rumsey²
¹University of Huddersfield, Huddersfield, HD1 3DH, United Kingdom
²Logophon, Ltd., Witney, Oxon, OX28 1DB, United Kingdom
This study investigates the independent influences of interchannel level difference (ICLD) and interchannel time difference (ICTD) on the panning of 2-channel stereo phantom images for various musical sources. The results indicate that level panning performs robustly regardless of the spectral and temporal characteristics of the source signal, whereas time panning is not suitable for a continuous source with a high fundamental frequency. The differences between the data obtained for different sources are found to be statistically insignificant, and from this a unified set of ICLD and ICTD values for 10°, 20°, and 30° image positions is derived. Linear level and time panning functions for the two separate panning regions of 0°–20° and 21°–30° are further proposed, and their applicability to arbitrary loudspeaker base angles is also considered. These perceptual panning functions are expected to be more accurate than the theoretical sine or tangent law in terms of matching between predicted and actually perceived image positions.
978 J. Audio Eng. Soc., Vol. 61, No. 12, 2013 December
Typically, a time-panned image would suffer from the comb-filtering effect when the signals are summed to mono, and would be less localizable than a level-panned one [14,15]. However, a time-panned image tends to be perceived as more spacious than a level-panned one [15], which is why sound engineers often prefer spaced microphone techniques to coincident ones. The spatial quality of time-panned images could also benefit practical panning-based mixing applications. While it would have been difficult to implement a time panning function in an analog mixing desk, this is no longer the case with today's digital technology.
The results of ICLD and ICTD experiments conducted in the past diverge considerably depending on the spectral and temporal characteristics of the test stimulus used. A difference between low-frequency and high-frequency sounds in ICLD-based panning (level panning) was noted in [2,7,12]: high frequencies generally required less ICLD than low frequencies for a given image position. It is also evident that continuous and transient sounds are localized differently in level panning [3,7]. In the case of time panning, it has been shown that the perceived image position changed with ICTD in a fluctuating pattern [7,16] and that transient sounds were more reliably localized than continuous sounds [7]. These findings were obtained for controlled sound sources such as noise and pure tones. However, to the best of the authors' knowledge, no panning experiment has used different notes of musical sources. Unlike noise or tones, musical sources have complex harmonic and envelope characteristics, and they are the sources most widely used in practical panning applications. It was therefore considered worthwhile to investigate the level and time panning behavior of musical notes with different spectral and temporal characteristics. Furthermore, ICLD and ICTD panning values derived from musical sources were considered more ecologically valid than those derived from controlled sources.
Against this background, a series of listening tests was conducted to investigate the independent influences of ICLD and ICTD on the localization of stereo phantom images using musical notes. It was also of interest to derive level and time panning functions for loudspeaker reproduction from the subjective data obtained in the listening tests. The first section of this paper describes the experimental method. Second, the listening test data obtained for the different panning methods and sources are analyzed statistically. Third, the main findings from the listening tests are discussed. Finally, perceptually motivated level and time panning equations are proposed based on the results of the listening tests.
pannings, a total of five sound sources were chosen for this experiment, comprising:
• Piano: "staccato" note of C2 (f0 = 65 Hz);
• Piano: "staccato" note of C6 (f0 = 1046 Hz);
• Trumpet: "sustain" note of B♭3 (f0 = 228 Hz);
• Trumpet: "sustain" note of B♭5 (f0 = 922 Hz);
• Male speech.
The piano and trumpet were chosen as representatives of transient and continuous sounds, respectively. For each instrument, a low and a high note were used to examine the effect of spectral characteristics. Single notes were used instead of performance extracts in order to control the variables as much as possible. The onset and offset transients of the trumpet sounds were removed by applying a one-second fade-in and fade-out at the beginning and end, so that the effect of the continuous nature could be examined more exclusively. The speech signal was included for its broadband frequency spectrum and its complex temporal characteristics, which combine transient and continuous natures. Since a number of past localization experiments used speech, the speech source was expected to be a useful reference for comparing results.
Ideally, all the sound sources would have been recorded under anechoic conditions, but no anechoic facility was available. Instead, the piano signals were recorded in a small recording booth using a single cardioid condenser microphone (AKG C414 B-ULS) placed about 30 cm above the hammers for the desired notes. The piano was completely covered with thick cloth to reduce room reflections and reverberation as much as possible. The trumpet signals were recorded in an acoustically dry studio using the same type of microphone placed about 1 m from the instrument. The speech signal was an anechoically recorded male voice taken from Bang and Olufsen's Archimedes project CD [17]. The waveforms and long-term-averaged spectra of the stimuli are presented in Fig. 1.
1.2 Subjects
A total of five subjects took part in the test. They were research staff and doctoral students at the Institute of Sound Recording of the University of Surrey. All of them reported normal hearing and were trained and experienced in spatial listening. Because the test required highly critical listening skills, it was decided to employ a relatively small number of critical subjects rather than a large number of less experienced ones, and to repeat the whole test set three times for each subject in order to obtain a sufficient amount of data for the subsequent statistical analysis.
Fig. 1. Waveforms and long-term-averaged spectra of the stimuli used for the experiments
Fig. 2. Control interface for the localization test developed using Max-MSP software
in the right reproduction sector. The tests were designed so that the listener adjusted the ICTD or ICLD using a slider provided in a control interface to match the perceived positions of the phantom images to those of the reference markers. Time delay or level attenuation was applied to the left channel only. In the case of level panning, this introduced modest changes in overall loudness as the ICLD was varied. However, horizontal localization is not known to be influenced by overall loudness level [e.g., 18], so this was not compensated for here. The average playback levels of the stimuli before panning were calibrated to 75 dB(A) at the listening position.
The subjects were allowed to listen to the stimuli repeatedly until they were completely confident about their decisions. They were asked to face the front and not to move their heads while listening, which was monitored by watching the subject through the window between the control room and the listening room. The control interface was developed using Max-MSP software, as shown in Fig. 2. The ICTD could be adjusted from 0 to 5 ms in steps of 0.01 ms. For the ICLD variation,
2 RESULTS
This section presents the results of the listening tests. The ICLD and ICTD data obtained for each target angle are first explored for each sound source. The effect of source for each angle is then analyzed. Since the scales used in the experiments were ordinal, median values and 25th–75th percentile intervals are plotted. For the same reason, non-parametric tests were used for the statistical analysis.
2.1 Level Panning
The results obtained using the level panning method are shown in Fig. 3. The plots suggest that the percentile bars do not overlap between any pair of target angles. To confirm the statistical significance of the difference between the data for each angle, a Wilcoxon test was performed using the SPSS data analysis tool. All the differences were significant at the 0.01 level for all sources.
The results for individual sources lead to the following observations. First, the low and high note piano sources show small differences in median values for all angles, ranging from 0.5 to 1.1 dB. The percentile bars span only around 2 dB at all angles for both sources, except at 30° for high piano. The median
difference between low trumpet and high trumpet for each angle is also small. However, the percentile intervals of low trumpet are slightly larger than those of high trumpet for 10° and 20°. Although the trumpet sources have median values similar to the piano ones, there are some obvious differences in the size of the percentile interval between the trumpet and piano. For example, low trumpet has a substantially wider percentile bar at 30° than low piano, although the results for 10° and 20° are similar. Moreover, high trumpet appears to have wider percentile bars than high piano at all angles. The results for the speech source show patterns similar to those for low piano; the difference between their median values for each angle is only 0.1–0.3 dB. In general, the percentile bars for all sources become larger as the angle increases.
Fig. 3. Results of level panning test: Medians and associated 25% to 75% percentiles
2.2 Time Panning
The median values and 25th–75th percentile intervals for the ICTD data obtained for each source are plotted in Fig. 4. No ICTD result was obtained for the high note trumpet source, since all the subjects found it impossible to judge fixed ICTDs to localize the phantom image. They all commented that the image position appeared to change randomly and rapidly between the loudspeakers as they moved the slider. The Wilcoxon test showed that
the difference between the data for each pair of angles was significant for all sources (p < 0.01). However, compared to the ICLD results, the gap between the percentile intervals for adjacent angles appears to be smaller in general.
Fig. 4. Results of time panning test: Medians and associated 25% to 75% percentiles
The data plots show that the low and high notes of piano produced similar results, with median values in a similar range for each angle. As in the ICLD test, the size of the percentile interval increases as the angle increases. The data ranges for 10° and 20° are around 0.2 ms, whereas that for 30° is 0.5–0.7 ms. The low trumpet and speech results show patterns similar to the piano ones in terms of an increase in percentile interval at a higher angle. However, the median values for the trumpet are greater than those for the piano, especially at 20° and 30°. For the speech source, the median value for 30° is the highest among all sources.
2.3 Analysis of the Effect of Sound Source
The results presented above show the following patterns common to all sources, regardless of the panning method. First, the percentile intervals increase as the target angle increases. Second, the difference in median values between 20° and 30° is bigger than that between 10° and 20°. Finally, the median values for all sources lie within the smallest percentile interval of all for each angle.
In order to statistically examine the effects of sound source on the ICLD and ICTD judgments for each panning angle, the Friedman test was carried out. This non-parametric test was chosen because each subject tested all five sound sources, so the obtained data are interrelated. Table 1 summarizes the results of the test.
Table 1. Friedman test performed on the effect of sound source for the ICLD and ICTD panning methods
         ICLD                   ICTD
Angle    χ²      df   Sig.      χ²      df   Sig.
10°      5.177   4    0.700     3.207   3    0.361
20°      8.492   4    0.075     5.011   3    0.171
30°      0.225   4    0.994     4.333   3    0.228
As the results show, no significant difference was found between sound sources for any angle. It is therefore possible to combine the data for all sound sources and produce unified results. Table 2 shows the overall
Table 2. Overall median values and interquartile ranges (IQRs)
Panning     Angle   Median   IQR
ICLD (dB)   10°     4.1      1.15
            20°     8.4      2.0
            30°     17.1     4.7
ICTD (ms)   10°     0.27     0.11
            20°     0.49     0.29
            30°     1.0      0.58
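The non-parametric tests used in Section 2 can be sketched with SciPy as a stand-in for the SPSS procedures actually used. In this minimal sketch the judgment matrix is randomly generated placeholder data (hypothetical, not the study's data), arranged as 15 rows (five subjects times three repetitions) by five sources, and the "Wilcoxon test" is assumed to be the paired signed-rank variant. The last lines show how a Sig. value in Table 1 follows from χ² and df through the upper tail of the chi-square distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder data, NOT the study's measurements:
# 15 rows (5 subjects x 3 repetitions) by 5 sources, ICLD in dB.
rng = np.random.default_rng(0)
n_rows, n_sources = 15, 5
icld_10 = 4.1 + rng.normal(0.0, 0.6, size=(n_rows, n_sources))
icld_20 = 8.4 + rng.normal(0.0, 1.0, size=(n_rows, n_sources))

# Friedman test on related samples: one array per sound source, rows
# matched by subject/repetition (each subject judged every source).
chi2_stat, p_friedman = stats.friedmanchisquare(*icld_10.T)
print(f"Friedman, 10 deg: chi2 = {chi2_stat:.3f}, df = {n_sources - 1}, "
      f"p = {p_friedman:.3f}")

# Paired Wilcoxon signed-rank test between two target angles, one source.
_, p_wilcoxon = stats.wilcoxon(icld_10[:, 0], icld_20[:, 0])
print(f"Wilcoxon, 10 vs 20 deg (source 0): p = {p_wilcoxon:.2g}")

def friedman_p(chi2_value: float, dof: int) -> float:
    """Upper-tail p-value of the chi-square approximation used by Friedman."""
    return float(stats.chi2.sf(chi2_value, dof))

# Sig. for the ICLD 20-deg row of Table 1 (chi2 = 8.492, df = 4).
print(f"{friedman_p(8.492, 4):.3f}")  # prints 0.075
```

Calling friedman_p(0.225, 4) in the same way gives 0.994, matching the 30° ICLD entry of Table 1.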
3.1 Source Effect in Level Panning
No significant difference was found between low and high notes in the judgment of ICLD for each panning angle. This result can be discussed in light of the findings of Pulkki and Karjalainen [2001]. They investigated the localization accuracy of ITD and ILD cues created by level-panned noise signals in 11 ERB frequency bands, and concluded that the localization of a level-panned phantom image relied more on ITD cues at low frequencies and more on ILD cues at high frequencies. The localization accuracy for the high-frequency ILD cues was reported to be similar to that for the low-frequency ITD cues. Other studies [2,13] suggested that the ITD of the signal envelope is also used for the localization of a level-panned image at high frequencies. Based on these,
3.2 Source Effect in Time Panning
The low note trumpet was time-panned with a certainty similar to that of the speech signal. However, time panning of the high trumpet was found to be impossible due to erratic and random changes in apparent image position as the ICTD was varied. Since the onset and offset transients of the trumpet sources had been removed by fading in and out, this result leads to a discussion of the effect of frequency on the time panning of ongoing sound. As shown in Fig. A.1 in the Appendix, the ITD and ILD caused by ICTD fluctuate periodically between positive and negative values as a function of ICTD. This results from the interaction between the time difference of the two loudspeaker signals arriving at each ear and the wavelength of the signal at a given frequency [14]. It can be observed in the figure that the ITD and ILD for
250 Hz vary steadily at least up to about 1.5 ms, the range within which the ICTDs for the low trumpet were judged. However, the periods of the ITD and ILD fluctuations become shorter as the frequency goes up. This means that at higher frequencies the ITD and ILD would change more rapidly with a small change of ICTD, resulting in more frequent variation in perceived image position over ICTD. Furthermore, in the localization of a real source at high frequencies above 1.5 kHz, ITD cues would become ambiguous and ILD cues would play a dominant role according to the duplex theory [25], although for complex waveforms ITDs can still be detected at high frequencies depending on the signal envelope [26]. With time panning, however, the ILD tends to become smaller at higher frequencies, as can also be seen in Fig. A.1. Since there is no effective ITD-ILD trading for localization at high frequencies [27], the small and fluctuating ILD cues alone would not be able to pan an image to a wide angle.
For the piano staccatos, on the other hand, the listeners were able to judge ICTDs for the high note without difficulty. This seems to be due to the strong transient nature of the piano staccato. Whereas time panning of high-frequency ongoing sound suffers from ambiguous ITD and ILD cues, time panning of transient sound mainly relies on the time difference between the signals that arrive first at the two ears [1,28]. The results also showed that the ICTDs judged for the high piano were not significantly different from those for the low piano. This might be explained by the effect of signal onset duration on localization accuracy. The onset duration of a piano staccato signal is usually observed to become shorter as the note number goes up. For example, the onset duration of the C6 note used in the present study was 9 ms, whereas that of the C2 note was 45 ms. Rakerd and Hartmann [22] investigated the effect of onset duration on localization accuracy in the context of the precedence effect using 500 Hz and 2 kHz tones, and found that the shorter the onset duration, the smaller the localization error. They further found that the 2 kHz tone was localized as accurately as the 500 Hz tone when the onset duration was short. This suggests the dominance of onset duration over frequency in localization. From this, it might be hypothesized that the localization of the high note piano mainly relied on its strong transient nature rather than its fundamental frequency, whereas that of the low note relied more on the frequency component.
3.3 Comparison with Previous Results
In Tables 3 and 4 the unified median ICLD and ICTD data obtained in the present study are compared with the results of some previous studies that used speech sources. The data for de Boer [5], Brittain and Leakey [6], and Leakey [2] were approximated from the original data, since those studies tested different target image positions. Wittek and Theile's [29] values for 10° and 20° were calculated using the image shift factors of 2.2°/dB and 3.9°/0.1 ms, which they proposed based on the average of results found in the literature. They suggest that these factors are valid up to 22.5°, since a linear relationship between ICLD or ICTD and perceived position is generally observed only up to that angle in the literature. The value for 30° was obtained from Wittek and Theile's own listening test [9]. Additionally, the ICLDs predicted by the sine and tangent level panning laws are included in Table 3 for a comparison between theoretical and experimental results. The present ICLD results for 10° and 20° are most similar to those of Brittain and Leakey and of Wittek and Theile. For 30°, Wittek and Theile's result is the closest to the present one. Interestingly, Simonsen's data, which are arguably the most widely quoted in the context of microphone technique, are considerably lower than the others. It is also noticeable that the values from the sine and tangent laws are greater than all the experimental results, which implies a discrepancy between predicted and perceived image positions. With respect to ICTD, the present results again appear most similar to Wittek and Theile's. Simonsen's values for 10° and 20° are the lowest of all, whereas that for 30° is the highest. In general, the discrepancy among the different ICTD results is smaller than that among the ICLD results.
Table 3. Comparison of ICLD values obtained by different authors: de Boer, speech; Brittain and Leakey, speech limited to 5 kHz; Simonsen, speech and maracas; Wittek and Theile, speech/based on the literature; Lee and Rumsey, various musical sources and speech
Image position | de Boer [5] | Brittain and Leakey [6] | Sine law: Clark et al. [12] | Tangent law: Leakey [2] | Simonsen [8] | Wittek and Theile [9,29] | Lee and Rumsey [present]
Table 4. Comparison of ICTD values obtained by different authors: de Boer, speech; Leakey, speech; Simonsen, speech and maracas; Wittek and Theile, speech/based on the literature; Lee and Rumsey, various musical sources and speech
Image position | de Boer [5] | Leakey [2] | Simonsen [8] | Wittek and Theile [9,29] | Lee and Rumsey [present]
3.4 Level and Time Panning Functions
The results show that the unified ICLD and ICTD median values required for 10° and 20° image shifts increased almost linearly. On the other hand, the increase from 20°
to 30° was almost double that in the lower region. This suggests that the directional resolution of ICLD or ICTD becomes lower beyond around 20°. A similar tendency can be seen in the previous results mentioned above and might be explained as follows. Mills [30] found that the smallest angular change of a sound source that could just be detected (the "minimum audible angle" (MAA)) became larger as the source moved away from the front toward the side of the listener. Since an increase in source angle causes increases in ILD and ITD, the MAA result could be interpreted as showing that the just noticeable difference (JND) of interaural cues for perceived source position increases with larger interaural differences. In terms of the current result, therefore, it is hypothesized that the JND of interaural differences resulting from level or time panning increased as the ICLD or ICTD was increased beyond a certain threshold.
It was also shown that the judged ICLD and ICTD became more divergent as the target panning angle increased. One possible explanation for this result is that the subjects might have found it more difficult to locate the image precisely at one position as the ICLD or ICTD was increased. Wendt [7] observed this phenomenon in his stereophonic localization study using pure tones; Blauert [1] describes it as the "blur of summing localization". Imprecise stereophonic localization tends to be related to increased image width [31]. As the ICLD or ICTD increased, the interaural cross-correlation coefficient (IACC) would decrease due to an increase in the resulting ITD at low frequencies. This would result in an increase in perceived image width [32], and thus possibly greater localization blur. The dependency of localization blur (or error) on source angle is originally an attribute of real source localization: the localization blur for a single source is greater at a wider source angle [1]. From the above, it could be generally suggested that the localization precision of a single or phantom source becomes lower as the resulting interaural differences increase.
Wittek and Theile [29] proposed linear shift factors only up to 22.5°, as introduced earlier. Based on the results of the present study, however, it is proposed here to apply two linear panning factors for two separate panning regions: 0°–20° and 21°–30°. Since no intermediate angles between 21° and 30° were tested, the linearity of image shift in this region is not confirmed. In fact, the literature tends to show slightly exponential curves in that region. However, considering that the panning uncertainty becomes greater at a wider angle, a linear approximation is proposed. The proposed level and time panning factors for the two panning regions are shown in Table 5. To derive the linear factors, the original unified ICLD and ICTD data for each angle were adjusted within deviation ranges of 0.15 dB and 0.01 ms, respectively.
Table 5. Panning factors derived from the present results
Panning method   Panning region 0°–20°          Panning region 21°–30°
ICLD             0.425 dB/deg. (2.4°/dB)        0.85 dB/deg. (1.2°/dB)
ICTD             0.025 ms/deg. (4.0°/0.1 ms)    0.05 ms/deg. (2.0°/0.1 ms)
From these panning factors, the following panning equations can be derived:
$$
\mathrm{ICLD}(\alpha) =
\begin{cases}
0.425\,\alpha \ \text{[dB]}, & \alpha \le 20 \\
0.85\,\alpha - 8.5 \ \text{[dB]}, & 20 < \alpha \le 30
\end{cases}
\tag{1}
$$
$$
\mathrm{ICTD}(\alpha) =
\begin{cases}
0.025\,\alpha \ \text{[ms]}, & \alpha \le 20 \\
0.05\,\alpha - 0.5 \ \text{[ms]}, & 20 < \alpha \le 30
\end{cases}
\tag{2}
$$
where α is the target image position in degrees.
It should be noted that the above equations are valid only for the conventional 60° loudspeaker arrangement, since the panning factors were obtained from listening tests using that particular arrangement. However, according to Theile [33,34], the angular displacement of a phantom image caused by a given ICLD or ICTD scales in proportion to the loudspeaker base angle. For example, an ICLD or ICTD required for 20° panning in the conventional 60° loudspeaker arrangement would lead to 30° panning if the base angle is increased to 90°. Based on this, more general equations that take into account half the loudspeaker base angle can be expressed as Eq. (3) and Eq. (4). It should be noted, however, that the range of loudspeaker base angles for which this proportional relationship holds has not been specified explicitly. Therefore, the validity of the proposed equations needs to be verified through listening tests with various loudspeaker arrangements.
$$
\mathrm{ICLD}(\alpha) =
\begin{cases}
0.425\left(\dfrac{30\alpha}{\theta}\right) \ \text{[dB]}, & \alpha \le \dfrac{2\theta}{3} \\[2ex]
0.85\left(\dfrac{30\alpha}{\theta}\right) - 8.5 \ \text{[dB]}, & \dfrac{2\theta}{3} < \alpha \le \theta
\end{cases}
\tag{3}
$$
$$
\mathrm{ICTD}(\alpha) =
\begin{cases}
0.025\left(\dfrac{30\alpha}{\theta}\right) \ \text{[ms]}, & \alpha \le \dfrac{2\theta}{3} \\[2ex]
0.05\left(\dfrac{30\alpha}{\theta}\right) - 0.5 \ \text{[ms]}, & \dfrac{2\theta}{3} < \alpha \le \theta
\end{cases}
\tag{4}
$$
where α is the target image position in degrees, and θ is half the loudspeaker base angle in degrees.
It is considered that these perceptually motivated panning methods could be more effective than the theoretically derived sine or tangent law when it comes to accurately matching predicted and perceived image positions. One example application that could benefit from the proposed methods is the recently standardized "Spatial Audio Object Coding" (SAOC) [35], which requires decoded audio objects to be mapped to specific positions based on a given rendering scenario in post-processing. Other potential applications include extracting directional information for music information retrieval [36], stereo-to-multichannel upmixing based on source separation techniques [37], and adjusting the stereophonic sweet spot to the listener's position [38].
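Eqs. (1) to (4) translate directly into code. The Python sketch below is a minimal illustration under stated assumptions rather than a reference implementation: the names `icld_db`, `ictd_ms` and `pan_mono` are hypothetical; α is a non-negative angle toward the right loudspeaker, because attenuation or delay is applied to the left channel only, as in the listening test; Eq. (4) is taken as the ICTD counterpart of Eq. (3), i.e. Eq. (2) with the angle rescaled by 30/θ; and the time delay is rounded to whole samples.

```python
import numpy as np

def _scaled_angle(alpha: float, theta: float) -> float:
    """Map alpha onto the standard 60-deg setup: 30 * alpha / theta."""
    if theta <= 0.0 or not 0.0 <= alpha <= theta:
        raise ValueError("require theta > 0 and 0 <= alpha <= theta")
    return 30.0 * alpha / theta

def icld_db(alpha: float, theta: float = 30.0) -> float:
    """ICLD in dB for target position alpha in degrees (Eqs. (1), (3))."""
    a = _scaled_angle(alpha, theta)
    return 0.425 * a if a <= 20.0 else 0.85 * a - 8.5

def ictd_ms(alpha: float, theta: float = 30.0) -> float:
    """ICTD in ms for target position alpha in degrees (Eqs. (2), (4))."""
    a = _scaled_angle(alpha, theta)
    return 0.025 * a if a <= 20.0 else 0.05 * a - 0.5

def pan_mono(x, fs, alpha, method="level", theta=30.0):
    """Return an (N, 2) [left, right] array with the image shifted right by alpha.

    Only the left channel is modified, as in the listening test:
    attenuated by the ICLD ("level") or delayed by the ICTD ("time").
    The delay is rounded to whole samples (a simplification), and both
    channels are zero-padded to the same length.
    """
    x = np.asarray(x, dtype=float)
    if method == "level":
        gain = 10.0 ** (-icld_db(alpha, theta) / 20.0)
        return np.stack([gain * x, x], axis=1)
    if method == "time":
        d = int(round(ictd_ms(alpha, theta) * 1e-3 * fs))
        left = np.concatenate([np.zeros(d), x])
        right = np.concatenate([x, np.zeros(d)])
        return np.stack([left, right], axis=1)
    raise ValueError("method must be 'level' or 'time'")

# Usage: panning values for the 60-deg setup, and the 90-deg-base example
# from the text (theta = 45), where 30 deg needs the same ICLD as 20 deg.
for angle in (10, 20, 30):
    print(angle, round(icld_db(angle), 3), round(ictd_ms(angle), 3))
print(round(icld_db(30, theta=45), 3))
```

With θ = 45 (a 90° base angle), a 30° target receives the 8.5 dB that the standard arrangement uses for 20°, which is the proportional scaling attributed to Theile [33,34]. A fractional-delay filter would be a natural refinement of the whole-sample delay.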
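The comparison between theoretical and perceptual level panning in Section 3.3 can be checked numerically. The sketch below assumes the common textbook forms of the two laws for a loudspeaker half-angle θ0, sin φ / sin θ0 = (gR − gL)/(gR + gL) for the sine law and tan φ / tan θ0 = (gR − gL)/(gR + gL) for the tangent law, converts the resulting gain ratio into an ICLD in dB, and prints it next to the unified medians of Table 2. The function name `law_icld_db` is hypothetical.

```python
import math

def law_icld_db(phi: float, law: str = "sine", theta0: float = 30.0) -> float:
    """ICLD in dB predicted by the sine or tangent law.

    phi is the target image angle and theta0 half the loudspeaker base
    angle, both in degrees. With k = (gR - gL) / (gR + gL):
        sine law:    k = sin(phi) / sin(theta0)
        tangent law: k = tan(phi) / tan(theta0)
    so gR / gL = (1 + k) / (1 - k). At phi = theta0, k = 1, the left
    channel is silent and the predicted ICLD is infinite.
    """
    if law == "sine":
        f = math.sin
    elif law == "tangent":
        f = math.tan
    else:
        raise ValueError("law must be 'sine' or 'tangent'")
    k = f(math.radians(phi)) / f(math.radians(theta0))
    if k >= 1.0:
        return math.inf
    return 20.0 * math.log10((1.0 + k) / (1.0 - k))

# Unified median ICLDs from Table 2, in dB.
MEDIANS_DB = {10: 4.1, 20: 8.4, 30: 17.1}

for angle, median in MEDIANS_DB.items():
    sine = law_icld_db(angle, "sine")
    tangent = law_icld_db(angle, "tangent")
    print(f"{angle} deg: sine {sine:.2f} dB, tangent {tangent:.2f} dB, "
          f"perceived median {median} dB")
```

Both laws predict larger ICLDs than the unified medians (roughly 6.3 and 5.5 dB at 10°, 14.5 and 12.9 dB at 20°), and at 30° they call for a fully hard-panned signal, in line with the observation that the theoretical values exceed all the experimental results.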
[13] D. Griesinger, "Stereo and Surround Panning in Practice," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5564.
[14] S. P. Lipshitz, "Stereo Microphone Techniques: Are the Purists Wrong?" J. Audio Eng. Soc., vol. 34, no. 9, pp. 717–743 (1986 Sep.).
[15] H.-K. Lee and F. Rumsey, "Elicitation and Grading of Subjective Attributes of 2-Channel Phantom Images," presented at the 116th Convention of the Audio Engineering Society (2004 May), convention paper 6142.
[16] J. Blauert and W. Cobben, "Some Consideration of Binaural Crosscorrelation Analysis," Acustica, vol. 39, pp. 96–104 (1978).
[17] V. Hansen and G. Munch, "Making Recordings for Simulation Tests in the Archimedes Project," J. Audio Eng. Soc., vol. 39, no. 10, pp. 768–774 (1991 Oct.).
[18] C. S. Myers, "The Influence of Timbre and Loudness on the Localization of Sounds," Proc. R. Soc. Lond. B, vol. 88, pp. 267–284 (1914 Sep.).
[19] F. E. Toole and B. McA. Sayers, "Lateralization Judgments and the Nature of Binaural Acoustic Images," J. Acoust. Soc. Am., vol. 37, pp. 319–324 (1965).
[20] J. L. Flanagan, E. E. David, and B. J. Watson, "Binaural Lateralization of Cophasic and Antiphasic Clicks," J. Acoust. Soc. Am., vol. 36, pp. 2184–2193 (1964).
[21] W. Yost, F. Wightman, and M. Green, "Lateralization of Filtered Clicks," J. Acoust. Soc. Am., vol. 50, pp. 1526–1531 (1971).
[22] B. Rakerd and W. Hartmann, "Localization of Sound in Rooms, III: Onset and Duration Effects," J. Acoust. Soc. Am., vol. 80, pp. 1695–1706 (1986).
[23] J. Tobias and S. Zerlin, "Lateralization Thresholds as a Function of Stimulus Duration," J. Acoust. Soc. Am., vol. 31, pp. 1591–1594 (1959).
[24] W. Hartmann, "Localization of Sounds in Rooms," J. Acoust. Soc. Am., vol. 74, pp. 1380–1391 (1983).
[25] S. S. Stevens and E. B. Newman, "The Localization of Actual Sources of Sound," Am. J. Psych., vol. 48, pp. 297–306 (1936).
[26] G. Henning, "Detectability of Interaural Delay in High-Frequency Complex Waveforms," J. Acoust. Soc. Am., vol. 55, pp. 84–90 (1974).
[27] G. G. Harris, "Binaural Interaction of Impulsive Stimuli and Pure Tones," J. Acoust. Soc. Am., vol. 32, pp. 685–692 (1960).
[28] G. Theile, On Localization in the Superimposed Sound Field, Ph.D. Thesis (Technische Universität, Berlin, Germany, 1980).
[29] H. Wittek and G. Theile, "The Recording Angle—Based on Localization Curves," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5568.
[30] A. W. Mills, "On the Minimum Audible Angle," J. Acoust. Soc. Am., vol. 30, pp. 237–246 (1958).
[31] J. Berg and F. Rumsey, "Validity of Selected Spatial Attributes in the Evaluation of 5-Channel Microphone Techniques," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5593.
[32] T. Hidaka, L. Beranek, and T. Okano, "Interaural Cross-Correlation, Lateral Fraction, and Low- and High-Frequency Sound Levels as Measures of Acoustical Quality in Concert Halls," J. Acoust. Soc. Am., vol. 98, pp. 988–1007 (1995).
[33] G. Theile, "Natural 5.1 Music Recording Based on Psychoacoustic Principles," Proc. of the Audio Engineering Society 19th International Conference, pp. 201–229.
[34] G. Theile, "On the Performance of Two-Channel and Multi-channel Stereophony," presented at the 88th Convention of the Audio Engineering Society (1990 Mar.), convention paper 2887.
[35] J. Herre, H. Purnhagen, J. Koppens, O. Hellmuth, J. Engdegård, J. Hilpert, L. Villemoes, L. Terentiv, C. Falch, A. Hölzer, M. Valero, B. Resch, H. Mundt, and H.-O. Oh, "MPEG Spatial Audio Object Coding – The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes," J. Audio Eng. Soc., vol. 60, no. 9, pp. 655–673 (2012 Sep.).
[36] G. Tzanetakis, L. G. Martins, K. McNally, and R. Jones, "Stereo Panning Information for Music Information Retrieval Tasks," J. Audio Eng. Soc., vol. 58, no. 5, pp. 409–417 (2010 May).
[37] M. Cobos and J. J. Lopez, "Resynthesis of Sound Scenes on Wave-Field Synthesis from Stereo Mixtures Using Sound Source Separation Algorithms," J. Audio Eng. Soc., vol. 57, no. 3, pp. 91–110 (2009 Mar.).
[38] S. Merchel and S. Groth, "Adaptively Adjusting the Stereophonic Sweet Spot to the Listener's Position," J. Audio Eng. Soc., vol. 58, no. 10, pp. 809–817 (2010 Oct.).
[39] J. Braasch, "Modelling of Binaural Hearing," in J. Blauert (Ed.), Communication Acoustics (Springer, Berlin, 2005).
[40] B. Gardner and K. Martin, URL: http://sound.media.mit.edu/resources/KEMAR.html (2000).
[41] D. J. Kistler and F. L. Wightman, "A Model of Head-Related Transfer Functions Based on Principal Components Analysis and Minimum-Phase Reconstruction," J. Acoust. Soc. Am., vol. 91, pp. 1637–1647 (1992).
Fig. A.1. Variations of Interaural Time Difference (upper panel) and Interaural Level Difference (lower panel) as a function of Interchannel
Time Difference for various frequencies
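The periodic behavior shown in Fig. A.1 can be illustrated with a deliberately simplified model. This is not the model behind the figure: the sketch assumes that each ear receives its own loudspeaker with unit gain and no delay, and the opposite loudspeaker with a frequency-independent crosstalk gain (hypothetical value 0.7) and delay τh (hypothetical value 0.25 ms), for a steady sinusoid at frequency f, with the ICTD applied as a delay on the left channel. Each ear signal is then a sum of two phasors, so both the ITD (from the interaural phase difference) and the ILD repeat exactly every 1/f of ICTD, and the fluctuation period shrinks as the frequency rises. The function name `ear_cues` is hypothetical.

```python
import numpy as np

def ear_cues(ictd_s, f_hz, tau_h_s=0.25e-3, g_cross=0.7):
    """Interaural cues for a time-panned steady sinusoid (toy model).

    Assumptions (hypothetical, not the model behind Fig. A.1): each ear
    receives its own loudspeaker with unit gain and no delay, and the
    opposite loudspeaker with a frequency-independent crosstalk gain
    g_cross and delay tau_h_s. The ICTD delays the LEFT channel, as in
    the listening test. Returns (ITD in s, ILD in dB); positive values
    mean the right ear leads / is louder.
    """
    w = 2.0 * np.pi * f_hz
    d = np.asarray(ictd_s, dtype=float)
    # Complex ear transfer for a sinusoid: sum of direct and crosstalk paths.
    h_left = np.exp(-1j * w * d) + g_cross * np.exp(-1j * w * tau_h_s)
    h_right = 1.0 + g_cross * np.exp(-1j * w * (d + tau_h_s))
    # Interaural phase difference wrapped to (-pi, pi], converted to time.
    itd = np.angle(h_right * np.conj(h_left)) / w
    ild = 20.0 * np.log10(np.abs(h_right) / np.abs(h_left))
    return itd, ild

# Sweep the ICTD over the slider range used in the test (0 to 5 ms).
ictd = np.linspace(0.0, 5e-3, 501)
for f in (250.0, 1000.0, 4000.0):
    itd, ild = ear_cues(ictd, f)
    print(f"{f:.0f} Hz: cues repeat every {1e3 / f:.2f} ms of ICTD, "
          f"max |ITD| = {np.max(np.abs(itd)) * 1e3:.3f} ms, "
          f"max |ILD| = {np.max(np.abs(ild)):.1f} dB")
```

Because the crosstalk gain here does not depend on frequency, the toy model reproduces only the shrinking fluctuation period, not the frequency-dependent decrease in ILD discussed in Section 3.2; capturing that would require measured head-related transfer functions.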
THE AUTHORS
Hyunkook Lee was born in Seoul, South Korea, in 1977. He received a BMus degree in music and sound recording (Tonmeister) from the University of Surrey, Guildford, UK, in 2002, and his Ph.D. degree in audio engineering from the Institute of Sound Recording (IoSR) at the same University in 2006. From 2006 to 2010, Dr. Lee was Senior Research Engineer in audio R&D at LG Electronics, South Korea. Since 2010, he has been working as Senior Lecturer in music technology at the University of Huddersfield, Huddersfield, UK. Dr. Lee has also been a freelance recording engineer since 2002. His current research interests include auditory spatial perception, 3-D-audio recording and rendering techniques, and virtual acoustics. He is an active member of the Audio Engineering Society and a fellow of the Higher Education Academy, UK.
Francis Rumsey is an independent technical writer and consultant, based in the UK. Until 2009 he was Professor and Director of Research at the Institute of Sound Recording, University of Surrey, specializing in sound quality, psychoacoustics, and spatial audio. He is currently chair of the AES Technical Council, Consultant Technical Writer and Editor for the AES Journal. Among his musical activities he is organist and choirmaster of St. Mary the Virgin Church in Witney, Oxfordshire.