Level and Time Panning of Phantom Images for Musical Sources

Article  in  Journal of the Audio Engineering Society · December 2013

HYUNKOOK LEE,¹ AES Member, AND FRANCIS RUMSEY,² AES Fellow
([email protected]) ([email protected])

¹University of Huddersfield, Huddersfield, HD1 3DH, United Kingdom
²Logophon, Ltd., Witney, Oxon, OX28 1DB, United Kingdom

This study investigates the independent influences of interchannel level difference (ICLD) and interchannel time difference (ICTD) on the panning of 2-channel stereo phantom images for various musical sources. The results indicate that level panning can perform robustly regardless of the spectral and temporal characteristics of source signals, whereas time panning is not suitable for a continuous source with a high fundamental frequency. Statistical differences between the data obtained for different sources are found to be insignificant, and from this a unified set of ICLD and ICTD values for 10°, 20°, and 30° image positions is derived. Linear level and time panning functions for the two separate panning regions of 0°–20° and 21°–30° are further proposed, and their applicability to arbitrary loudspeaker base angle is also considered. These perceptual panning functions are expected to be more accurate than the theoretical sine or tangent law in terms of matching between predicted and actually perceived image positions.

0 INTRODUCTION

The localization of a stereophonic phantom image is based on the principle of the so-called “summing localization” [1]. In 2-channel loudspeaker reproduction, acoustic crosstalk of loudspeaker signals occurs at each ear of the listener; the signal from the contralateral loudspeaker is “summed” with that from the ipsilateral loudspeaker, with the former being attenuated in level at high frequencies due to head shadowing and delayed in time relative to the latter. If the signals are coherent, the listener will perceive a single phantom image in the median plane. If an interchannel level difference (ICLD) or interchannel time difference (ICTD) is applied to the loudspeaker signals, some combination of interaural level difference (ILD) and interaural time difference (ITD) will be introduced between the ear input signals, and consequently the apparent position of the image will be “panned” from the middle toward the earlier or louder loudspeaker. Research suggests that phantom images panned using ICLD are localized mainly based on ITDs at low frequencies and on ILDs and envelope-based ITDs at high frequencies [2,3]. With regard to ICTD-based panning, the frequency-dependency of interaural cues has not been studied extensively. However, it was shown in [1] that an ICTD produces only an ILD at a low frequency when it is assumed that there is no level difference between the loudspeaker signals arriving at each ear at low frequencies, whereas it leads to both ILD and ITD at a high frequency.

Summing localization is valid only up to a certain threshold of ICTD (e.g., 1 ms as widely accepted), within which a trade-off between ICLD and ICTD is possible. Beyond this threshold, the localization of an auditory image largely relies on the precedence effect [4], where the image is perceived constantly at the earlier loudspeaker up to the echo threshold.

Since 1940 a number of studies have been conducted to investigate the independent influence of ICLD or ICTD on the position of a phantom image perceived between two loudspeakers [2,5,6,7,8,9]. The data from these studies has had many practical applications. For example, Williams [10] analyzed the coverage angles of two-channel near-coincident microphone techniques based on the data obtained by Simonsen [8]. Wittek [11] developed a tool called “Image Assistant” to calculate localization curves for various microphone arrays based on the ICLD and ICTD data derived from the literature. A reliable ICLD or ICTD data set that is obtained from perceptual experiments would also be useful for panning applications where an accurate matching between target and perceived image positions is essential. The conventional sine and tangent level panning laws [2,12] have been claimed to be inaccurate in this type of application. They are based on ITD cues at low frequencies only and tend to result in a greater angular displacement than predicted for broadband sources [13]. With respect to ICTD-based panning (time panning), there has been no global law proposed to date for practical applications.
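As an illustration of the conventional level panning laws mentioned above, the following minimal sketch computes the ICLD that each law predicts for a target image angle. It assumes the commonly quoted forms of the laws, sin(θ)/sin(θ0) = (g1 − g2)/(g1 + g2) and tan(θ)/tan(θ0) = (g1 − g2)/(g1 + g2), where θ0 is half the loudspeaker base angle; these formulas and the function names are assumptions for illustration and are not spelled out in this paper. Under that assumption, the outputs for a 60° base angle agree with the sine and tangent law columns of Table 3 to within rounding.

```python
import math


def _icld_from_ratio(r):
    """Convert r = (g1 - g2) / (g1 + g2) into an ICLD in dB, i.e. 20*log10(g1/g2)."""
    if r >= 1.0:
        # Image at the loudspeaker position: the quieter channel is fully muted.
        return math.inf
    return 20.0 * math.log10((1.0 + r) / (1.0 - r))


def sine_law_icld(angle_deg, half_base_deg=30.0):
    """ICLD (dB) predicted by the sine law (assumed standard form)."""
    if not 0.0 <= angle_deg <= half_base_deg:
        raise ValueError("target angle must lie between 0 and half the base angle")
    r = math.sin(math.radians(angle_deg)) / math.sin(math.radians(half_base_deg))
    return _icld_from_ratio(r)


def tangent_law_icld(angle_deg, half_base_deg=30.0):
    """ICLD (dB) predicted by the tangent law (assumed standard form)."""
    if not 0.0 <= angle_deg <= half_base_deg:
        raise ValueError("target angle must lie between 0 and half the base angle")
    r = math.tan(math.radians(angle_deg)) / math.tan(math.radians(half_base_deg))
    return _icld_from_ratio(r)


if __name__ == "__main__":
    for angle in (10, 20, 30):
        print(angle, sine_law_icld(angle), tangent_law_icld(angle))
```

Both functions return infinity at the loudspeaker position itself, which corresponds to the "Inf." entries for 30° in Table 3.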

978 J. Audio Eng. Soc., Vol. 61, No. 12, 2013 December
PAPERS LEVEL AND TIME PANNING OF PHANTOM IMAGES

Typically, a time-panned image would suffer from the comb-filtering effect when the signals are combined as mono and be less localizable than a level-panned one [14,15]. However, the former tends to be perceived as more spacious than the latter [15], which is why spaced microphone techniques are often preferred to coincident techniques by sound engineers. The spatial quality of time-panned images could also be beneficial for practical panning-based mixing applications. While it would have been practically difficult to implement a time panning functionality in an analog mixing desk, it would not be any longer with today’s digital technology.

The results from the ICLD and ICTD experiments conducted in the past are considerably divergent depending on the spectral and temporal characteristics of the test stimulus used. The difference between low frequency and high frequency sounds in ICLD-based panning (level panning) was noted in [2,7,12], showing that high frequencies generally required less ICLD than low frequencies for a given image position. It is also evident that continuous and transient sounds are localized differently in level panning [3,7]. In case of time panning, it has been shown that the change in image position perceived as a function of ICTD had a fluctuating pattern [7,16] and that transient sounds were more reliably localized than continuous sounds [7]. The above-mentioned findings were obtained for controlled sound sources such as noise and pure tone. However, to the authors’ best knowledge there has been no panning experiment using different notes of musical sources. Musical sources have a complex harmonic and envelope nature as opposed to noise or tone, and are used for practical panning applications most widely. It was therefore considered to be worth investigating level and time panning behaviors of musical notes with different spectral and temporal characteristics. Furthermore, ICLD and ICTD panning values derived using musical sources were considered more ecologically valid than those using controlled sources.

From the above background, a series of listening tests was conducted to investigate the independent influences of ICLD and ICTD on the localization of stereo phantom images using musical notes. It was also of interest to derive level and time panning functions for loudspeaker reproduction based on the subjective data obtained from the listening tests. The first section of this paper describes the experimental method. Second, the listening test data obtained for different panning methods and different sources are analyzed statistically. Third, main findings from the listening tests are discussed. Finally, perceptually motivated level and time panning equations are proposed based on the results of the listening test.

1 EXPERIMENTAL METHOD

1.1 Test Signals

In order to investigate the effects of different spectral and temporal characteristics on the behaviors of time and level pannings, a total of five sound sources were chosen for this experiment, comprising:

• Piano: “staccato” note of C2 (f0 = 65 Hz);
• Piano: “staccato” note of C6 (f0 = 1046 Hz);
• Trumpet: “sustain” note of Bflat3 (f0 = 228 Hz);
• Trumpet: “sustain” note of Bflat5 (f0 = 922 Hz);
• Male speech.

The piano and trumpet were chosen as representatives of transient and continuous sounds. For each source, low and high-pitched notes were used to examine the effect of spectral characteristics. It was decided to use single notes instead of performance extracts to control the variables as much as possible. The onset and offset transients of the trumpet sounds were removed by fading in and out the beginning and ending for one second each in order to investigate the effect of continuous nature more exclusively. The speech signal was included for its broadband frequency spectrum as well as complex temporal characteristics that combine both transient and continuous natures. Since a number of past localization experiments used speech, the speech source used in this test was expected to be a useful reference for comparison of results.

Ideally all the sound sources would have been recorded under an anechoic condition, but this was unavailable. Alternatively, the piano signals were recorded in a small recording booth using a single cardioid condenser microphone (AKG C414 B-ULS) placed about 30 cm over the hammers for the desired notes. The piano was completely covered with thick cloth in order to reduce room reflections and reverberation as much as possible. The trumpet signals were also recorded in an acoustically dry studio using the same type of microphone placed about 1 m away from the instrument. The speech signal used was an anechoically recorded male speech taken from the Bang and Olufsen’s Archimedes project CD [17]. The waveforms and frequency responses of the stimuli used are presented in Fig. 1.

1.2 Subjects

A total of five subjects took part in the test. They were research staff and doctoral students at the Institute of Sound Recording of the University of Surrey. All of them reported normal hearing and were trained and experienced in spatial listening. Due to the nature of the test requiring a highly critical listening skill, it was decided to employ a relatively small number of critical subjects rather than a large number of less experienced ones and repeat the whole test set three times for each subject in order to ensure a sufficient amount of data for later statistical analysis.

1.3 Test Setup and Procedure

The listening tests were conducted in the ITU-R BS.1116-compliant listening room at the University of Surrey, UK. Two Genelec 1032A loudspeakers were arranged in the standard 60° configuration with a distance of 2.4 m between them and from the listener position. Reference markers were placed at +10°, +20°, and +30° positions
in the right reproduction sector. The tests were designed so that the listener adjusted ICTD or ICLD using a slider provided in a control interface to match the perceived positions of phantom images to those of the reference markers. Time delay or level attenuation was applied to the left channel only. In case of level panning, this introduced modest changes in overall loudness as the ICLD was varied. However, horizontal localization is not known to be influenced by overall loudness level [e.g., 18], so this was not compensated here. The average playback levels of the stimuli before panning were calibrated to 75 dB(A) at the listening position.

Fig. 1. Waveforms and long-term-averaged spectra of the stimuli used for the experiments

The subjects were allowed to listen to the stimuli repeatedly until they were completely confident about their decisions. They were asked to face the front and not to move their heads while listening to the sounds, which was monitored by watching the subject through the window between the control room and the listening room. The control interface was developed using MAX-MSP software as shown in Fig. 2. The adjustment range of ICTD was from 0 to 5 ms with the interval of 0.01 ms. For the ICLD variation, a gain coefficient from 1 to 0 was continuously applied to the left channel as the slider was moved from top to bottom, and the judged slider position was transformed into a decibel value with the interval of 0.1 dB. When the subject adjusted ICTD, ICLD was maintained at 0 dB, and vice versa. The order of stimulus as well as that of the angles to be judged was randomized for each subject in order to avoid a psychological order effect.

Fig. 2. Control interface for the localization test developed using Max-MSP software

2 RESULTS

This section presents the results from the listening tests conducted. The ICLD and ICTD data obtained for each target angle are first explored for each sound source. The effect of source for each angle is then analyzed. Since the scales used for the experiments had an ordinal nature, median values and 25th–75th percentile intervals are plotted. For the same reason, the statistical analysis methods used were non-parametric tests.

2.1 Level Panning

The results obtained using the level panning method are shown in Fig. 3. From the plots, it appears that there is no overlap of percentile bars between any pair of target angles. To confirm the statistical significance of the difference between the data for each angle, a Wilcoxon test was performed using the SPSS data analysis tool. The results showed that all the differences were significant at the 0.01 level for all sources.

From the results for individual sources, the following observations were made. First, the low and high note piano sources appear to have small differences in the median values for all angles, which range from 0.5 to 1.1 dB. The percentile bars are in the range of only around 2 dB at all angles for both sources except 30° for high piano. The median
difference between low trumpet and high trumpet for each angle is also small. However, the percentile intervals of low trumpet are slightly larger than those of high trumpet for 10° and 20°. Although the trumpet sources have similar median values to the piano ones, it can be observed that there are some obvious differences in the size of percentile interval between the trumpet and piano. For example, low trumpet has a substantially wider percentile bar for 30° compared to low piano, although the results for 10° and 20° are similar. Moreover, high trumpet appears to have wider percentile bars than high piano at all angles. The results for the speech source have similar patterns to those for low piano. The difference between median values for each angle is in the range of only 0.1–0.3 dB. In general, the percentile bars for all sources become larger as the angle increases.

Fig. 3. Results of level panning test: Medians and associated 25th to 75th percentiles

2.2 Time Panning

The median values and 25th–75th percentile intervals for the ICTD data obtained for each source are plotted in Fig. 4. The ICTD result for the high note trumpet source was not produced since all the subjects found it impossible to judge fixed ICTDs to localize the phantom image. They all commented that the image position appeared to change randomly and rapidly between the loudspeakers as they moved the position of the slider. The Wilcoxon test showed that
the difference between the data for each pair of angles was significant for all sources (p < 0.01). However, compared to the ICLD results, the gap between the percentile intervals for adjacent angles appears to be smaller in general.

Fig. 4. Results of time panning test: Medians and associated 25th to 75th percentiles

It can be seen from the data plots that the low and high notes of piano produced similar results. The median values are plotted in a similar range for each angle. Similarly to the results of the ICLD test, the size of the percentile interval increases as the angle increases. The data ranges for 10° and 20° are around 0.2 ms whereas those for 30° are 0.5–0.7 ms. The low trumpet and speech results show similar patterns to the piano ones in terms of an increase in percentile interval at a higher angle. However, the median values for the trumpet are greater than those for the piano especially at 20° and 30°. For the speech source, the median value for 30° is the highest among those of all sources.

2.3 Analysis of the Effect of Sound Source

From the results presented above, common patterns were observed for all sources regardless of the panning method as follows. First, the percentile intervals increase as the target angle increases. Second, the difference in median values between 20° and 30° is bigger than that between 10° and 20°. Finally, the median values for all sources lie within the smallest percentile interval of all for each angle.

In order to statistically examine the effects of sound source on the ICLD and ICTD judgments for each panning angle, the Friedman test was carried out. This non-parametric test was chosen since each subject tested all five sound sources and therefore the obtained data are interrelated. Table 1 summarizes the results of the test.

Table 1. Friedman test performed on the effect of sound source for the ICTD and ICLD panning methods

         ICLD                    ICTD
Angle    χ²      df   Sig.      χ²      df   Sig.
10°      5.177   4    0.700     3.207   3    0.361
20°      8.492   4    0.075     5.011   3    0.171
30°      0.225   4    0.994     4.333   3    0.228

Table 2. Overall median values and interquartile ranges (IQRs)

Panning      Angle   Median   IQR
ICLD (dB)    10°     4.1      1.15
             20°     8.4      2.0
             30°     17.1     4.7
ICTD (ms)    10°     0.27     0.11
             20°     0.49     0.29
             30°     1.0      0.58

As can be seen from the results, there is no significant difference found between sound sources for any angle. It is therefore possible to combine the data for all sound sources and produce unified results. Table 2 shows the overall
median values and the 25th and 75th percentiles, which are also plotted in Fig. 5. It can be observed again in the unified plots that the percentile interval becomes wider as the angle increases. This effect is greater with the ICTD results. It is also worth noting that the increase of median value is almost constant up to 20° and becomes steep from 20° to 30°.

Fig. 5. Plots of overall median values and 25th to 75th percentiles for the ICLD and ICTD data

3 DISCUSSIONS

3.1 Source Effect in Level Panning

There was no significant difference found between low and high notes in the judgment of ICLD for each panning angle. This result can be discussed in light of the findings of Pulkki and Karjalainen [2001]. They investigated the localization accuracies of ITD and ILD cues created from level-panned noise signals at 11 ERB frequency bands. It was concluded in their study that the localization of a level-panned phantom image relied more on ITD cues at low frequencies and more on ILD cues at high frequencies. The degrees of localization accuracy for the high frequency ILD cues were reported to be similar to those for low frequency ITD cues. In other studies [2,13], it was suggested that the ITD of the signal envelope is also used for the localization of level-panned image at high frequencies. Based on these, it can be regarded that the images of the high note piano (f0 = 1046 Hz) and trumpet (f0 = 922 Hz) were localized using ILD and envelope ITD cues. For the low note piano and trumpet as well as speech, on the other hand, both cues were likely to be used with different weightings depending on the frequency.

Despite the dependency of ITD and ILD cues on frequency, the median ICLD values for high note sources appeared to be similar to those of the low note and speech sources, which have wider bandwidths. This result seems to suggest that frequencies above the fundamental frequency of the high note for each source played a more dominant role than the lower frequencies in the judgment of ICLD. Griesinger [13] reported the dominance of frequencies between 700 Hz and 4 kHz in the determination of the perceived position of amplitude-panned broadband speech source. He explains this from a physiological viewpoint that the perceived position of each frequency band is weighted according to the strength of nerve firings in that band, which is determined by the transfer function of the external and middle ears. It was reported in other studies [19,20] that interaural differences produced at middle frequencies between 500 Hz and 2000 Hz dominated the average image position for broadband noise signals. Based on the above findings, it can be suggested that the result of the current study for each musical source was dominated by frequencies between the fundamental frequency and certain high-middle frequency, which is considered to be worth further investigation.

The results for the trumpet sustain notes without onset transient were not significantly different from those of piano staccato notes. As mentioned above, ICLD in level panning is translated into ITD at low frequencies. Although a steady-state tone is difficult to localize using ITD [21,22], the ongoing part of broadband noise can still be localized accurately using ITD [23,24]. Hartmann [24] proposes that the random amplitude fluctuations of noise signal can be interpreted as a series of small transients, which produces useful ITD cues for localization. The trumpet sources used in the current study also fluctuate in level over time slightly as can be seen in Fig. 1, and this might explain why they were localized similarly to the piano sources.

3.2 Source Effect in Time Panning

The low note trumpet was time-panned with a similar certainty to the speech signal. However, the time panning of the high trumpet was found to be impossible due to erratic and random changes in apparent image position with varied ICTD. Since the trumpet sources had onset and offset transients removed by fade in/out, this result leads to a discussion on the effect of frequency in the time panning of ongoing sound. As shown in Fig. A.1 in the Appendix, ITD and ILD caused by ICTD fluctuate between positive and negative values periodically as a function of ICTD. This is due to the interaction between the time difference between two loudspeaker signals arriving at each ear and the wavelength of the signal at a given frequency [14]. It can be observed in the figure that the ITD and ILD for
250 Hz vary constantly at least up to about 1.5 ms, within which the ICTDs for the low trumpet were judged. However, the periods of ITD and ILD fluctuations become shorter as the frequency goes up. This means that at higher frequencies ITD and ILD would change more rapidly with a small change of ICTD, thus a more frequent variation in perceived image position over ICTD. Furthermore, in the localization of a real source at high frequencies above 1.5 kHz, ITD cues would become ambiguous and ILD cues would play a dominant role according to the duplex theory [25], although for complex waveforms ITDs can still be detected at high frequencies dependent on the signal envelope [26]. With time panning, however, the ILD tends to become smaller at a higher frequency as can be seen also in Fig. A.1. Since there is no effective ITD-ILD trading for localization at high frequencies [27], the small and fluctuating ILD cues alone would not be able to cause an image position to be panned to a wide angle.

For the piano staccatos, on the other hand, it was possible for the listeners to judge ICTDs for the high note without difficulty. This seems to be due to the strong transient nature of the piano staccato. While the time panning of high frequency ongoing sound suffers from ambiguous ITD and ILD cues, that of transient sound mainly relies on the time difference between the signals that arrive first at the two ears [1,28]. The results also showed that the ICTDs judged for the high piano were not significantly different from those for the low piano. This might be explained by the effect of signal onset duration on localization accuracy. It is usually observed that the onset durations for a piano staccato signal become shorter as the note number goes up. For example, the onset duration for the C6 note used in the present study was 9 ms whereas that for the C2 was 45 ms. Rakerd and Hartmann [22] investigated the effect of onset duration on localization accuracy in the context of the precedence effect using 500 Hz and 2 kHz tones, and found that the shorter the onset duration was, the smaller the localization error was. It was further found that the 2 kHz tone was localized as accurately as the 500 Hz when the onset duration was short. This suggests the dominance of onset duration over frequency in localization. From this, it might be hypothesized that the localization of the high note piano mainly relied on the strong transient nature rather than the fundamental frequency, whereas that of the low note relied more on the frequency component.

3.3 Comparison with Previous Results

In Tables 3 and 4 the unified median ICLD and ICTD data obtained from the present study are compared with the results from some of the previous studies that used speech sources. The data for de Boer [5], Brittain and Leakey [6], and Leakey [2] were approximated from the original data since their studies tested different target image positions. Wittek and Theile [29]’s values for 10° and 20° were calculated using the image shift factors of 2.2°/dB and 3.9°/0.1 ms, which they proposed based on the average of results found in the literature. They suggest that these factors are valid up to 22.5° since a linear relationship between ICLD or ICTD and perceived position is generally observed only up to that angle in the literature. The value for 30° was obtained from Wittek and Theile’s own listening test [9]. Additionally, the ICLDs predicted by the sine and tangent level panning law are also included in Table 3 for a comparison between theoretical and experimental results. It can be seen that the present ICLD results for 10° and 20° are most similar to Brittain and Leakey’s and Wittek and Theile’s. For 30°, Wittek and Theile’s result is the closest to the present one. It is interesting to observe that Simonsen’s data, which are arguably most widely quoted in the context of microphone technique, are considerably lower than the others. It is also noticeable that the values from the sine and tangent laws are greater than all the experimental results. This means that there would be a discrepancy between the predicted and perceived image positions. With respect to ICTD, again the present results appear to be most similar to Wittek and Theile’s. Simonsen’s values for 10° and 20° are the lowest of all whereas that for 30° is the highest. In general, the discrepancy among the different ICTD results is smaller than that among the ICLD results.

Table 3. Comparison of ICLD values obtained by different authors: de Boer, speech; Brittain and Leakey, speech limited to 5 kHz; Simonsen, speech and maracas; Wittek, speech/based on the literature; Lee and Rumsey, various musical sources and speech

Image      de Boer   Brittain and   Sine law:           Tangent law:   Simonsen   Wittek and      Lee and Rumsey
position   [5]       Leakey [6]     Clark et al. [12]   Leakey [2]     [8]        Theile [9,29]   [present]
10°        4.5 dB    3.9 dB         6.3 dB              5.5 dB         2.5 dB     4.4 dB          4.1 dB
20°        9.5 dB    7.9 dB         14.5 dB             12.9 dB        5.5 dB     8.8 dB          8.4 dB
30°        N/A       13.5 dB        Inf.                Inf.           15 dB      18 dB           17.1 dB

Table 4. Comparison of ICTD values obtained by different authors: de Boer, speech; Leakey, speech; Simonsen, speech and maracas; Wittek and Theile, speech/based on the literature; Lee and Rumsey, various musical sources and speech

Image position   de Boer [5]   Leakey [2]   Simonsen [8]   Wittek and Theile [9,29]   Lee and Rumsey [present]
10°              0.7 ms        0.25 ms      0.2 ms         0.26 ms                    0.27 ms
20°              1.7 ms        0.55 ms      0.44 ms        0.51 ms                    0.49 ms
30°              N/A           N/A          1.12 ms        1.0 ms                     1.0 ms

3.4 Level and Time Panning Functions

The results show that the unified ICLD and ICTD median values required for 10° and 20° image shifts increased almost linearly. On the other hand, the increase from 20°
984 J. Audio Eng. Soc., Vol. 61, No. 12, 2013 December
PAPERS LEVEL AND TIME PANNING OF PHANTOM IMAGES

to 30° was almost double that in the lower region. This seems to suggest that the directional resolution of ICLD or ICTD becomes lower beyond around 20°. A similar tendency can be seen in the previous results mentioned above and might be explained as follows. Mills [30] found that the smallest angular change of a sound source that could just be detected (the "minimum audible angle" (MAA)) became larger as the source moved away from the front toward the side of the listener. Since an increase in source angle causes increases in ILD and ITD, the MAA result could be interpreted as the just noticeable difference (JND) of interaural cues in perceived source position increasing with larger interaural differences. In terms of the current result, therefore, it is hypothesized that the JND of interaural differences resulting from level or time panning increased as the ICLD or ICTD was increased beyond a certain threshold.

It was also shown that the judged ICLD and ICTD became more divergent as the targeted panning angle increased. One possible explanation for this result is that the subjects might have found it more difficult to locate the image precisely at one position as the ICLD or ICTD was increased. Wendt [7] observed this phenomenon in his stereophonic localization study using pure tones, which is described as the "blur of summing localization" by Blauert [1]. An imprecise stereophonic localization tends to be related to an increased image width [31]. As the ICLD or ICTD increased, the interaural cross-correlation coefficient (IACC) would decrease due to an increase in the resulting ITD at low frequencies. This would result in an increase in perceived image width [32], thus possibly a greater localization blur. The dependency of localization blur (or error) on source angle is originally an attribute of real source localization; the localization blur for a single source is greater at a wider source angle [1]. From the above it could be generally suggested that the localization precision of a single or phantom source becomes lower with an increase in the resulting interaural differences.

Wittek and Theile [29] proposed linear shift factors only up to 22.5° as introduced earlier. However, based on the results of the present study it is proposed here to apply two linear panning factors for two separate panning regions, which are 0°–20° and 21°–30°. Since no intermediate angles between 21° and 30° were tested, the linearity of image shift in this region is not confirmed. In fact, the literature tends to show slightly exponential curves in that region. However, considering that the panning uncertainty becomes greater at a wider angle, a linear approximation is proposed. The proposed level and time panning factors for the two panning regions are shown in Table 5. To derive the linear factors the original unified ICLD and ICTD data for each angle were adjusted within the deviation ranges of 0.15 dB and 0.01 ms, respectively.

Table 5. Panning factors derived from the present results

                   Panning region
Panning method     0°–20°                      21°–30°
ICLD               0.425 dB/deg. (2.4°/dB)     0.85 dB/deg. (1.2°/dB)
ICTD               0.025 ms/deg. (40°/ms)      0.05 ms/deg. (20°/ms)

From these panning factors, the following panning equations can be derived.

\[
\mathrm{ICLD}(\alpha) =
\begin{cases}
0.425\,\alpha \ [\mathrm{dB}], & \alpha \le 20 \\
0.85\,\alpha - 8.5 \ [\mathrm{dB}], & 20 < \alpha \le 30
\end{cases}
\tag{1}
\]

\[
\mathrm{ICTD}(\alpha) =
\begin{cases}
0.025\,\alpha \ [\mathrm{ms}], & \alpha \le 20 \\
0.05\,\alpha - 0.5 \ [\mathrm{ms}], & 20 < \alpha \le 30
\end{cases}
\tag{2}
\]

where α is the target image position in degrees.

It needs to be noted that the above equations are valid only for the conventional 60° loudspeaker arrangement since the panning factors were obtained from listening tests using that particular arrangement. However, according to Theile [33,34], the angular displacement of the phantom image caused by a certain ICLD or ICTD changes constantly in proportion to the loudspeaker base angle. For example, an ICLD or ICTD required for 20° panning in the conventional 60° loudspeaker arrangement would lead to 30° panning if the base angle is increased to 90°. Based on this, more general equations that take into account half the loudspeaker base angle can be expressed using Eq. (3) and Eq. (4). It should be noted, however, that the range of loudspeaker base angle valid for this proportional relationship has not been specified explicitly. Therefore, the validity of the proposed equations needs to be verified through listening tests with various loudspeaker angle arrangements.

\[
\mathrm{ICLD}(\alpha) =
\begin{cases}
0.425\left(\dfrac{30\alpha}{\theta}\right) \ [\mathrm{dB}], & \alpha \le \dfrac{2\theta}{3} \\[2ex]
0.85\left(\dfrac{30\alpha}{\theta}\right) - 8.5 \ [\mathrm{dB}], & \dfrac{2\theta}{3} < \alpha \le \theta
\end{cases}
\tag{3}
\]

\[
\mathrm{ICTD}(\alpha) =
\begin{cases}
0.025\left(\dfrac{30\alpha}{\theta}\right) \ [\mathrm{ms}], & \alpha \le \dfrac{2\theta}{3} \\[2ex]
0.05\left(\dfrac{30\alpha}{\theta}\right) - 0.5 \ [\mathrm{ms}], & \dfrac{2\theta}{3} < \alpha \le \theta
\end{cases}
\tag{4}
\]

where α is the target image position in degrees, and θ is half the loudspeaker base angle in degrees.

It is considered that these perceptually motivated panning methods could be more effective than the theoretically obtained sine or tangent law when it comes to accurately matching predicted and perceived image positions. One example application that can benefit from the proposed methods would be the recently standardized "Spatial Audio Object Coding" (SAOC) [35], which requires decoded audio objects to be mapped to specific positions based on a given rendering scenario in the post-processing. Other potential applications for the proposed methods include: extracting directional information for music information retrieval [36], stereo to multichannel upmixing based on source separation techniques [37], and adjusting the stereophonic sweet spot to the listener's position [38].
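As an illustration only, the piecewise-linear panning functions of Eqs. (1) to (4) can be sketched in Python as follows. The function names and the `half_base_angle_deg` parameter are hypothetical and not part of the study. The time-panning function assumes that Eq. (4) is the ICTD counterpart of Eq. (3), that is, Eq. (2) with α replaced by 30α/θ. Only non-negative target angles up to θ are handled.

```python
def _equivalent_angle(alpha_deg, half_base_angle_deg):
    """Map a target angle to its equivalent in the 60-degree arrangement.

    Assumes the proportional relationship described in the text:
    the same cue value yields an image angle that scales with theta.
    """
    theta = half_base_angle_deg
    if theta <= 0.0:
        raise ValueError("half base angle must be positive")
    if not 0.0 <= alpha_deg <= theta:
        raise ValueError("target angle must lie between 0 and theta")
    return 30.0 * alpha_deg / theta


def icld_db(alpha_deg, half_base_angle_deg=30.0):
    """Interchannel level difference in dB, after Eqs. (1) and (3)."""
    a = _equivalent_angle(alpha_deg, half_base_angle_deg)
    if a <= 20.0:
        return 0.425 * a          # 0 to 20 degree region
    return 0.85 * a - 8.5         # 20 to 30 degree region


def ictd_ms(alpha_deg, half_base_angle_deg=30.0):
    """Interchannel time difference in ms, after Eq. (2) and its
    assumed generalization (the ICTD counterpart of Eq. (3))."""
    a = _equivalent_angle(alpha_deg, half_base_angle_deg)
    if a <= 20.0:
        return 0.025 * a          # 0 to 20 degree region
    return 0.05 * a - 0.5         # 20 to 30 degree region
```

With the default θ = 30° the functions reduce to Eqs. (1) and (2). For a 90° base angle (θ = 45°), `icld_db(30, 45)` gives the same value as `icld_db(20)`, in line with the proportionality example attributed to Theile [33,34].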

LEE AND RUMSEY PAPERS

4 CONCLUSIONS

The localization behaviors of ICLD- and ICTD-based panning (level and time panning) at different target image positions were investigated using musical sources with different spectral and temporal characteristics as well as a wideband speech source. C2 and C6 piano staccato notes represented transient sounds with low and high fundamental frequencies, while Bflat3 and Bflat5 sustained trumpet notes represented continuous sounds. Subjects were asked to judge the ICLDs and ICTDs needed to pan the perceived phantom image to the 10°, 20°, and 30° positions, which were marked between the loudspeakers configured at the standard 60° angle.

There was no significant source effect found for the ICLD results. The fact that the high and low note sounds were panned with similar ICLD values and IQRs seems to support past research suggesting that localization cues at high-middle frequencies around 1 kHz play the most dominant role in the localization of level-panned phantom images. It is suggested that the similarity between the staccato piano and sustained trumpet results in level panning is mainly due to a series of small transients in the trumpet signal producing ITD cues at low frequencies.

With regard to time panning, it was not possible to obtain results for the high note trumpet without onset and offset transients since every subject experienced extreme difficulty in panning the perceived image to the desired positions. It is suggested, based on the interaural cue analysis described in the Appendix, that this problem occurs because ITD and ILD fluctuate between positive and negative values over ICTD at a higher rate at a higher frequency. This result implies that time panning would not be suitable for a highly continuous sound that contains high frequencies only.

For the piano staccato sources, it was possible for the subjects to localize both the low and high notes using time panning. There was no significant source effect. The high note staccato had a more rapid onset than the low note one, and from this it is tentatively hypothesized that the time panning of a high note musical signal relies more on its strong transient nature than on its fundamental frequency, whereas that of a low note relies more on its frequency components. A further investigation is required to confirm this.

Since the obtained ICLD and ICTD results for each target image position were not significantly different for different sources, a unified set of median ICLD and ICTD values was obtained from the entire data. The unified ICLD and ICTD were observed to increase almost linearly from 0° to 20°. The increase in the required value became steeper from 20° to 30° in both level and time panning methods, with the value for 30° being almost double the value for 20°. Based on the assumption that the increase of ICLD or ICTD is linear within the panning uncertainty range between 21° and 30°, two separate linear panning factors for the 0°–20° and the 21°–30° regions were derived for each panning method. From this, global equations for level and time panning for the conventional 60° loudspeaker configuration were produced. Based on the literature suggesting that the image position shift caused by a certain ICLD or ICTD changes constantly in proportion to the loudspeaker base angle, general equations for arbitrary loudspeaker base angles were further proposed. The perceptually motivated level and time panning methods are expected to be more accurate than the theoretical sine or tangent panning law. Future work will include performance evaluations of the proposed methods using various practical sound sources in two-channel loudspeaker arrangements with various base angles. These methods will also be tested objectively using a binaural localization model such as that suggested in [39].

5 ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their insightful comments to improve the quality of this paper. They are also grateful to those who participated in the listening tests.

6 REFERENCES

[1] J. Blauert, Spatial Hearing, rev. ed. (MIT Press, Cambridge, MA, 1997).
[2] D. M. Leakey, "Some Measurements on the Effect of Interchannel Intensity and Time Difference in Two Channel Sound Systems," J. Acoust. Soc. Am., vol. 31, pp. 977–986 (1959 July).
[3] V. Pulkki and M. Karjalainen, "Localization of Amplitude-Panned Virtual Sources, II: Two- and Three-Dimensional Panning," J. Audio Eng. Soc., vol. 49, pp. 753–767 (2001 Sep.).
[4] H. Wallach, E. Newman, and M. Rosenzweig, "The Precedence Effect in Sound Localization," Am. J. Psych., vol. 52, pp. 315–336 (1949 July).
[5] K. De Boer, "Stereophonic Sound Reproduction," Philips Tech. Rev., vol. 5, no. 4, pp. 107–114 (1940).
[6] F. H. Brittain and D. M. Leakey, "Two-Channel Stereophonic Sound Systems," Wireless World, vol. 62, no. 5, pp. 206–210 (1956).
[7] K. Wendt, Das Richtungshören bei der Überlagerung zweier Schallfelder bei Intensitäts- und Laufzeitstereophonie, Dissertation (Technische Hochschule, Aachen, Germany, 1963).
[8] G. Simonsen, Master's Thesis (Technical University of Lyngby, Denmark, 1984).
[9] H. Wittek and G. Theile, "Investigations into Directional Imaging Using L-C-R Stereo Microphones," in Proc. of Tonmeistertagung, pp. 432–454 (Hanover, Germany, 2000).
[10] M. Williams, "Unified Theory of Microphone Systems for Stereophonic Sound Recording," presented at the 82nd Convention of the Audio Engineering Society (1987 Mar.), convention paper 2466.
[11] H. Wittek, URL: www.hauptmikrofon.de. 2013.
[12] H. A. M. Clark, G. F. Dutton, and P. B. Vanderlyn, "The 'Stereosonic' Recording and Reproducing System," J. Audio Eng. Soc., vol. 6, no. 2, pp. 102–115 (1958); reprinted in Stereophonic Techniques (Audio Engineering Society, New York, 1986).


[13] D. Griesinger, "Stereo and Surround Panning in Practice," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5564.
[14] S. P. Lipshitz, "Stereo Microphone Techniques: Are the Purists Wrong?" J. Audio Eng. Soc., vol. 34, no. 9, pp. 717–743 (1986 Sep.).
[15] H-K. Lee and F. Rumsey, "Elicitation and Grading of Subjective Attributes of 2-Channel Phantom Images," presented at the 116th Convention of the Audio Engineering Society (2004 May), convention paper 6142.
[16] J. Blauert and W. Cobben, "Some Consideration of Binaural Crosscorrelation Analysis," Acoustica, vol. 39, pp. 96–104 (1978).
[17] V. Hansen and G. Munch, "Making Recordings for Simulation Tests in the Archimedes Project," J. Audio Eng. Soc., vol. 39, no. 10, pp. 768–774 (1991 Oct.).
[18] C. S. Myers, "The Influence of Timbre and Loudness on the Localization of Sounds," in Proc. R. Soc. Lond. B., vol. 88, pp. 267–284 (1914 Sep.).
[19] F. E. Toole and B. McA. Sayers, "Lateralization Judgments and the Nature of Binaural Acoustic Images," J. Acoust. Soc. Am., vol. 37, pp. 319–324 (1965).
[20] J. L. Flanagan, E. E. David, and B. J. Watson, "Binaural Lateralization of Cophasic and Antiphasic Clicks," J. Acoust. Soc. Am., vol. 36, pp. 2184–2193 (1964).
[21] W. Yost, F. Wightman, and M. Green, "Lateralization of Filtered Clicks," J. Acoust. Soc. Am., vol. 50, pp. 1526–1531 (1971).
[22] B. Rakerd and W. Hartmann, "Localization of Sound in Rooms, III: Onset and Duration Effects," J. Acoust. Soc. Am., vol. 80, pp. 1695–1706 (1986).
[23] J. Tobias and S. Zerlin, "Lateralization Thresholds as a Function of Stimulus Duration," J. Acoust. Soc. Am., vol. 31, pp. 1591–1594 (1959).
[24] W. Hartmann, "Localization of Sounds in Rooms," J. Acoust. Soc. Am., vol. 74, pp. 1380–1391 (1993).
[25] S. S. Stevens and E. B. Newman, "The Localization of Actual Sources of Sound," Am. J. Psych., vol. 48, pp. 297–306 (1936).
[26] G. Henning, "Detectability of Interaural Delay in High-Frequency Complex Waveforms," J. Acoust. Soc. Am., vol. 55, pp. 84–90 (1974).
[27] G. G. Harris, "Binaural Interaction of Impulsive Stimuli and Pure Tones," J. Acoust. Soc. Am., vol. 32, pp. 685–692 (1960).
[28] G. Theile, On Localization in the Superimposed Sound Field, Ph.D. Thesis (Technische Universität, Berlin, Germany, 1980).
[29] H. Wittek and G. Theile, "The Recording Angle – Based on Localization Curves," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5568.
[30] A. W. Mills, "On the Minimum Audible Angle," J. Acoust. Soc. Am., vol. 30, pp. 237–246 (1958).
[31] J. Berg and F. Rumsey, "Validity of Selected Spatial Attributes in the Evaluation of 5-Channel Microphone Techniques," presented at the 112th Convention of the Audio Engineering Society (2002 May), convention paper 5593.
[32] T. Hidaka, L. Beranek, and T. Okano, "Interaural Cross-Correlation, Lateral Fraction, and Low- and High-Frequency Sound Levels as Measures of Acoustical Quality in Concert Halls," J. Acoust. Soc. Am., vol. 98, pp. 988–1007 (1995).
[33] G. Theile, "Natural 5.1 Music Recording Based on Psychoacoustic Principles," Proc. of the Audio Engineering Society 19th International Conference, pp. 201–229.
[34] G. Theile, "On the Performance of Two-Channel and Multi-channel Stereophony," presented at the 88th Convention of the Audio Engineering Society (1990 Mar.), convention paper 2887.
[35] J. Herre, H. Purnhagen, J. Koppens, O. Hellmuth, J. Engdegard, J. Hilper, L. Villemoes, L. Terentiv, C. Falch, A. Hölzer, M. Valero, B. Resch, H. Mundt, and H-O. Oh, "MPEG Spatial Audio Object Coding – The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes," J. Audio Eng. Soc., vol. 60, no. 9, pp. 655–673 (2012 Sep.).
[36] G. Tzanetakis, L. G. Martins, K. McNally, and R. Jones, "Stereo Panning Information for Music Information Retrieval Tasks," J. Audio Eng. Soc., vol. 58, no. 5, pp. 409–417 (2010 May).
[37] M. Cobos and J. J. Lopez, "Resynthesis of Sound Scenes on Wave-Field Synthesis from Stereo Mixtures Using Sound Source Separation Algorithms," J. Audio Eng. Soc., vol. 57, no. 3, pp. 91–110 (2009 Mar.).
[38] M. Sebastian and S. Groth, "Adaptively Adjusting the Stereophonic Sweet Spot to the Listener's Position," J. Audio Eng. Soc., vol. 58, no. 10, pp. 809–817 (2010 Oct.).
[39] J. Braasch, "Modelling of Binaural Hearing," in Communication Acoustics, J. Blauert, Ed. (Springer, Berlin, 2005).
[40] B. Gardner and K. Martin, URL: http://sound.media.mit.edu/resources/KEMAR.html. 2000.
[41] D. J. Kistler and F. L. Wightman, "A Model of Head-Related Transfer Functions Based on Principal Components Analysis and Minimum-Phase Reconstruction," J. Acoust. Soc. Am., vol. 91, pp. 1637–1647 (1992).


APPENDIX

Although the ITD and ILD mechanism of level panning has been studied extensively in the past, that of time panning has hardly been reported in the context of stereophonic reproduction. In order to gain insights into the frequency dependency of localization cues in the time panning of continuous sound, ITDs and ILDs were measured as a function of ICTD using steady-state sinusoids of 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 8 kHz. The measurement was based on the head-related impulse responses (HRIRs) for a 30° loudspeaker position, which were taken from MIT's KEMAR dummy head HRIR database [40]. To simulate loudspeaker listening, phase differences corresponding to ICTDs between 0 and 3 ms were applied in 0.1 ms steps to two identical sinusoids at each of the six frequencies, and the resulting signals were then convolved with the HRIRs. ITD was estimated by computing the cross-correlation between the ongoing parts of the synthesized binaural signals and finding the time at which the maximum correlation occurs, as suggested by Kistler and Wightman [41]. ILD was calculated as the dB ratio between the levels of the ongoing parts of the signals. The measurement results are shown in Fig. A.1, and discussed in Section 3.2 in relation to the results of the time panning experiment.
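The ITD and ILD estimation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `estimate_itd_ild`, the `max_lag` search range, the use of RMS as the "level", and the sign convention (positive ITD and ILD mean the left-ear signal leads and is louder) are all choices made here for the example. The inputs are assumed to be the already extracted ongoing parts of the two ear signals, as equal-length sample sequences.

```python
import math


def estimate_itd_ild(left, right, sample_rate, max_lag):
    """Estimate ITD (seconds) and ILD (dB) from two ear signals.

    ITD: lag of the cross-correlation maximum within +/- max_lag samples.
    ILD: dB ratio of the RMS levels (left relative to right).
    """
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("signals must be non-empty and of equal length")

    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the valid overlap.
        start, stop = max(0, -lag), min(n, n - lag)
        corr = sum(left[i] * right[i + lag] for i in range(start, stop))
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))

    rms_left, rms_right = rms(left), rms(right)
    if rms_left == 0.0 or rms_right == 0.0:
        raise ValueError("ILD is undefined for a silent channel")

    itd_seconds = best_lag / sample_rate
    ild_db = 20.0 * math.log10(rms_left / rms_right)
    return itd_seconds, ild_db
```

For a steady-state sinusoid the cross-correlation is periodic, so `max_lag` should stay below half a period of the test frequency to avoid picking a neighbouring peak. This ambiguity is related to the fluctuation of the cues at high frequencies discussed in the paper.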

Fig. A.1. Variations of Interaural Time Difference (upper panel) and Interaural Level Difference (lower panel) as a function of Interchannel
Time Difference for various frequencies

THE AUTHORS

Hyunkook Lee was born in Seoul, South Korea, in 1977. He received a BMus degree in music and sound recording (Tonmeister) from the University of Surrey, Guildford, UK, in 2002, and his Ph.D. degree in audio engineering from the Institute of Sound Recording (IoSR) at the same University in 2006. From 2006 to 2010, Dr. Lee was Senior Research Engineer in audio R&D at LG Electronics, South Korea. Since 2010, he has been working as Senior Lecturer in music technology at the University of Huddersfield, Huddersfield, UK. Dr. Lee has also been a freelance recording engineer since 2002. His current research interests include auditory spatial perception, 3-D-audio recording and rendering techniques, and virtual acoustics. He is an active member of the Audio Engineering Society and a fellow of the Higher Education Academy, UK.

Francis Rumsey is an independent technical writer and consultant, based in the UK. Until 2009 he was Professor and Director of Research at the Institute of Sound Recording, University of Surrey, specializing in sound quality, psychoacoustics, and spatial audio. He is currently chair of the AES Technical Council, Consultant Technical Writer and Editor for the AES Journal. Among his musical activities he is organist and choirmaster of St. Mary the Virgin Church in Witney, Oxfordshire.
