Sensaura
cancellation
by Alastair Sibbald
www.sensaura.co.uk
[Figure 1: Recording an event with spaced microphones. An impulsive sound source lies directly to the left of the mics, at azimuth angle -90 degrees. When w corresponds to one effective head-width (20 cm), the time-of-arrival difference between the microphones, t(w), from the sound source, is approximately 0.58 ms.]

This configuration would ensure that the inter-aural time delay cues are correctly incorporated into the recording. Imagine that a recording is made of a sound source placed immediately to the left of the microphone configuration, as is shown in Figure 1. When the sound source emits an impulse, it is first recorded by the left-hand microphone before it is recorded by the one on the right. If the microphones were 20 cm apart, then the time-delay would be about 580 µs.

Imagine now that this recording is being replayed on a two-speaker hi-fi system and that the listener is sitting in the ‘correct’ listening position, as is shown in Figure 2. Under these [...] the crosstalk Y from the right speaker. However, the right ear first hears the crosstalk X from the left speaker before the primary recorded impulse Z from the right speaker, thus reducing the time-of-arrival delay from the expected 580 µs to 250 µs. Because the crosstalk sound derives from the same, real sound source, the brain receives a pair of highly correlated L and R sound signals, which it immediately uses to determine where the perceived sound source would be located. The brain, therefore, uses the effective inter-aural time-of-arrival delay of only 250 µs, which corresponds to the actual position of the left-hand loudspeaker at minus 30° azimuth. This position is where the brain (incorrectly) localises the sound source. The correctly timed, primary right-hand signal Z eventually arriving at the right ear is ignored because of the Haas/precedence effect, which is described in the following section.

The transaural crosstalk has, in effect, disabled the time-domain information that was built into the recording. Whereas the position of the recorded sound was actually -90° azimuth, the position of the reproduced sound is perceived to be only -30° azimuth.
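The two delays quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's method: it assumes a speed of sound of 343 m/s, and for the ±30° loudspeaker case it assumes the Woodworth spherical-head approximation with a hypothetical head radius of 8.75 cm. Neither assumption comes from this paper, so the second figure should only be expected to land in the region of the 250 µs quoted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value (not from the paper)


def spaced_mic_delay_us(spacing_m):
    """Arrival-time difference for a source on the axis through both microphones."""
    return spacing_m / SPEED_OF_SOUND * 1e6


def woodworth_itd_us(azimuth_deg, head_radius_m=0.0875):
    """Inter-aural delay from the Woodworth spherical-head model (an assumed model)."""
    theta = math.radians(abs(azimuth_deg))
    return head_radius_m * (theta + math.sin(theta)) / SPEED_OF_SOUND * 1e6


mic_delay = spaced_mic_delay_us(0.20)     # source at -90 degrees, mics 20 cm apart
speaker_delay = woodworth_itd_us(30.0)    # loudspeaker at -30 degrees

print(f"spaced microphones: {mic_delay:.0f} us")
print(f"loudspeaker at 30 degrees: {speaker_delay:.0f} us")
```

The point of the comparison is the ratio: the delay actually delivered to the ears by the left loudspeaker is less than half of the delay that was recorded.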
In summary, the presence of transaural crosstalk restricts the conventional stereo sound image to the confines of the loudspeaker placement.

2 Haas/precedence effect

The Haas (or precedence) effect[3] is the phenomenon that, when presented with several similar pieces of audio information to process at slightly different times, the brain uses the first information to arrive to compute the direction of the sound source. It then attributes the same directional information to subsequent, similar sound packets.

For example, if several loudspeakers were playing music at exactly the same loudness in a room, all the sound would appear to come from the nearest loudspeaker and the others would appear to be silent. The first signals to arrive are used to determine the spatial position of the sound source and the subsequent signals simply make it sound louder. This effect is so strong that the intensity of the second signal could be up to 8 dB greater than the initial signal and the brain would still use the first (but quieter) signal to decide from where the sound originated.

This effect is known also under the names ‘law of the first wave front’, ‘auditory suppression effect’, ‘first-arrival effect’ and ‘threshold of extinction’, and is the basis of sound reinforcement in public address systems.

The brain attributes great relative importance to time information as opposed to intensity information. For example, an early paper of Snow[4] describes experiments on compensating differences in left-right intensity balance using relative L-R time delays. It was reported that a 1 ms time delay would balance as much as 6 dB of intensity imbalance.

The precedence effect is relevant to the 3D-sound user when 3D soundscapes are created using many sources. Any signals deriving from a particular sound source (such as a reverb signal) will fuse with the primary signal if they are presented to the listener within about 15 ms of each other. Beyond this time, they begin to become discernible as separate entities.

Perhaps the precedence effect has evolved as a valuable survival mechanism, enabling the brain to cope with multiple reflections in an enclosed space. Thus, the true source of a sound may be identified without confusion from an array of first-order sound images.

3 Atal & Schroeder transaural crosstalk cancellation

One of the first practical transaural crosstalk cancellation (TCC) systems was described in the 1962 patent of Atal and Schroeder[1], later to be described more explicitly in the 1975 publication of Schroeder[2]. The latter disclosure describes the method for solving the multiple cancellation problem, where the crosstalk cancellation signals themselves need to be accounted for in such a way that they do not create errors which are compounded and propagated further.

[Figure 3: Conventional transaural crosstalk cancellation scheme (Schroeder[2]). Block labels, from top to bottom: L source and R source; cross-feed filters C; summation; serial filters 1/(1-C²) and 1/S; loudspeakers L and R; transmission functions S and A to the listener.]
The block diagram of the latter transaural crosstalk cancellation method, by Schroeder, is shown in Figure 3. The binaural signal sources are at the top of the diagram, passing down through the filtering system to generate loudspeaker-driving signals L and R [...] alternate side of the central axis to be A. It is commonly assumed that loudspeakers for stereo listening will be placed to subtend [...]

4 S and A functions

The amplitude components of typical transmission functions A and S, measured from an artificial head with canal simulator, are shown in Figure 4, using logarithmic frequency (x) and intensity (y) scales.

[Figure 4: A and S functions (amplitude in dB against frequency).]

The important and notable features are as follows.

1. Both A and S functions converge at low frequencies.

2. There is a substantial boost in the mid-range response, typically about 30 dB at 2.9 kHz. This is caused by the combined resonances of both the auditory canal and the concha. It is possible to model these resonances separately. If the artificial head were without the auditory canal element, then the boost would be about 12 to 15 dB and at a higher frequency.

3. The dips and peaks at higher frequencies, above about 5 kHz, are caused by the physical convolutions on the pinna, combined also with harmonic resonances of the concha and canal. Some of the peaks and dips can be rather sharp (where the various effects combine or where a resonant cavity in the concha has a high acoustic Q factor). For example, in the data of Figure 4, at 11.5 kHz, the A and S functions overlap a little, which can cause problems if the data is to be used for filter design.

5 Band-limited transaural crosstalk cancellation

Cooper and Bauck have patented a crosstalk cancellation system[5] which is essentially based on the Atal and Schroeder system, but featuring a pair of high-cut (~10 kHz) filters in the cross-feed system. The theory behind this approach is that, even if a perfect crosstalk cancellation system could be implemented, it would only function properly if the listener’s head was totally immobile in the absolute centre of the sweet spot. The reason for this is that wave cancellation effects are dependent on the coincidence of primary wave peaks and cancellation troughs. When one wave of the cancelling pair is displaced, the cancellation is incomplete. For example, assume a listener’s head was to move sideways such that the left ear was 5 cm closer to the left speaker and 5 cm further away from the right speaker. The unwanted crosstalk signal to the left ear from the right speaker would be shifted by 10 cm with respect to its intended cancellation wave from the left speaker. The cancellation would then be imperfect. In the extreme, when the displacement distance is equal to one-half of a wavelength, then constructive addition occurs, rather than destructive addition, and so the unwanted crosstalk signal can actually be magnified at particular frequencies, rather than destroyed. For the 10 cm example above, the frequency at which this occurs is around 1.7 kHz.

At first glance, this apparent sensitivity of cancellation effect with listener position is slightly disturbing. However, in practice, the creation of effective crosstalk-cancellation is not so critical as one might be led to believe by such calculations. This is because of the natural acoustic properties of the head and ears themselves, and the properties of the 30° HRTFs. In essence, as frequency increases, the head acts more and more effectively as an acoustic baffle, thus suppressing transaural crosstalk naturally. Consequently, there is little crosstalk to cancel at high frequencies and so the basic Atal and Schroeder method can be successfully used in hi-fi type configurations where the speakers are several metres distant from the listener.

There are other factors to be borne in mind. First, we are used to discussing the audio spectrum over the full 20 Hz to 20 kHz range, but most music fundamental frequencies lie below 1 kHz (the highest note a soprano sings is just over 1.1 kHz). Consequently, much of the audio energy we hear lies at the low-frequency end of the spectrum and this is not too demanding for crosstalk cancellation systems. Secondly, wave-cancellation and wave-addition effects occur equally with conventional stereo as they do with 3D systems and yet everyone accepts these effects without question. Indeed, many are not aware of their existence. However, if you play a (monophonic) 2.2 kHz sine wave through both L and R speakers of a stereo system, you will clearly hear some unpleasant effects as you move your head around. (The period of this wave corresponds to roughly twice the inter-aural time-delay, thereby creating significant natural destructive cancellation at both ears in the sweet spot.)

6 Fundamental gain options

Setting aside the intrinsic adjustments which can be made to crosstalk cancellation schemes, such as the use of appropriate A and S functions for particular loudspeaker angles, there is a more fundamental option related to the mode of signal delivery. The binaural signals bearing the 3D-sound cues can exist in one of two differing formats: (a) with a full external-ear function incorporated; and (b) with the external-ear function absent.

In the real world, we hear through the following audio chain:

sounds ⇒ external ear ⇒ auditory canal

...and this is what we consider sounds natural. However, a sound recording made using an artificial head system featuring an outer-ear component (but not an auditory canal element) can be played on circumaural headphones quite satisfactorily. This is because the headphones acoustically dampen and inhibit the outer ear resonance. Hence, the listener is effectively listening through an audio chain as follows:

real sounds ⇒ artificial external ear ⇒ listener's own auditory canal

...and this is equivalent to the real-life situation. In this case, the audio signals must be delivered into the listener’s ears with unity gain, such that the listener is ‘hearing through a single set of ears’.

Option 1: unity-gain system

Now consider what occurs when the sounds are replayed via loudspeakers (and we ignore, for the moment, the transaural crosstalk effects). The listener hears the sound via his own external near-ear function (at 30°), S. This means that the audio chain now becomes:

real sounds ⇒ artificial external ear ⇒ listener's own external ear ⇒ listener's own auditory canal.

...there are effectively two outer-ear functions in the chain! This causes spatial degradation of the 3D effects, but also creates a gross tonal error (12 dB mid-range boost). Accordingly, in order to provide a transaural crosstalk cancellation scheme for binaural signals that specifically contain an outer-ear function, the gain factors into near and far ears must be set to unity and zero respectively.
Option 2: S-gain system

The above description relates to the first type of binaural format, in which a full external-ear function is incorporated. The second option, in which the external-ear function is absent, is also common and requires a different crosstalk cancellation algorithm. Why does this alternative binaural format exist? There are two main reasons.

First, it has long been recognised that when artificial head recordings are replayed via loudspeakers, they are tonally incorrect (for the reasons cited above in section 4, item 2). Thus, several manufacturers have incorporated equalisation circuitry into their artificial heads to compensate. The equalisation parameters are chosen to provide a flat response when the signals are played through loudspeakers by creating a mid-range band-cut, equivalent to an inverse outer-ear function. This is analogous to wearing headphones that dampen and inhibit the outer-ear response. The result is tonally satisfactory and comparable to stereo but, in the absence of transaural crosstalk cancellation, does not create a 3D-sound field for the user.

Secondly, it is often required to synthesise 3D-audio using head-response transfer function (HRTF) filters that emulate the acoustic properties of the head and ears. Many applications, such as computer games, feature playback via loudspeakers such that the user’s own outer-ear function is invoked. It is convenient, therefore, to take advantage of this and create normalised HRTF filter sets in which all the individual HRTFs are divided by the 30° function. This reduces the dynamic range of the filters by 12 dB or so and enables greater accuracy in their design. Also, it is logistically easier to implement systems which mix 3D-synthesised audio with artificial-head material[6]. For example, in the SPU-800 Sensaura Digital Workstation[7], it is more effective and efficient to provide equalisation separately to a plurality of signals and then implement crosstalk cancellation as a final stage in the processing.

In circumstances where the external-ear function is absent in the binaural source material, but it is required that the listening mechanism includes one, then the transaural crosstalk cancellation scheme must deliver the sound into the listener’s ears with a gain of S, rather than unity.

7 Unity-gain transaural crosstalk cancellation

Considering Figure 3 again, it will be appreciated that, if there was no crosstalk across the head, the transmission function from the right source to the right ear (and from the left source to the left ear) would be S. Hence, the goal of providing transmission functions of 1 (unity) from the right source to the right ear and 0 (zero) from the right source to the left ear can be achieved by simply adding a serial (1/S) function (the inverse of the same-side transmission function) between source and loudspeaker. The presence of crosstalk, however, requires a cancellation signal to be provided by the other loudspeaker.

For example, consider the process of transferring the R channel signal into the right ear only. The transfer from R loudspeaker to right ear is via the same-side function, S. The crosstalk from the R loudspeaker will arrive at the left ear with transfer function A. Therefore, we need to deliver a (-A) signal to the left ear from the L speaker in order to cancel it. However, we know that the transfer function from the L speaker to left ear is S and so the overall crosstalk cancellation feed from the R to L channel must be (-A) x (1/S), i.e. (-A/S). This would deliver the crosstalk cancellation signal correctly to the left ear. (For convenience, this crosstalk cancellation feed function (-A/S) will be referred to as C.)

However, this cancellation signal also cross-talks to the other (right) ear and so this secondary crosstalk must also be cancelled (and so on, ad infinitum). Despite this apparently recursive problem, Schroeder showed that accurate crosstalk cancellation can be achieved by taking the basic configuration described above, adding extra correction filters and solving simultaneous equations to derive their values. These are calculated to be 1/(1-C²) and (1/S), as shown in Figure 3.
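One way to see why a single 1/(1-C²) filter can settle an apparently endless chain of cancellations is to treat each round of re-cancellation as a further term of a geometric series, 1 + C² + C⁴ + ..., which converges to 1/(1-C²) whenever |C| < 1 (that is, when the crosstalk is weaker than the direct path). The sketch below checks this at one frequency. The complex values chosen for S and A are hypothetical and purely illustrative, and the series view is offered as an interpretation rather than as the derivation used by Schroeder.

```python
import cmath

# Hypothetical single-frequency values (not measured data): the same-side
# function S and a weaker, phase-lagged alternate-side function A.
S = cmath.rect(1.0, 0.0)
A = cmath.rect(0.6, -0.8)

C = -A / S  # crosstalk cancellation feed, as defined in the text


def recursive_correction(c, n_terms):
    """Sum the first n_terms of 1 + c^2 + c^4 + ... (successive re-cancellations)."""
    return sum(c ** (2 * k) for k in range(n_terms))


closed_form = 1 / (1 - C ** 2)
partial_sum = recursive_correction(C, 60)

# Follow the R input through the scheme of Figure 3 to each ear.
near_ear = closed_form * (1 / S) * S + C * closed_form * (1 / S) * A
far_ear = closed_form * (1 / S) * A + C * closed_form * (1 / S) * S

print(abs(partial_sum - closed_form))  # residual of the truncated series
print(abs(near_ear), abs(far_ear))     # expected to be close to 1 and 0
```

With these values |C| is 0.6, so sixty terms are far more than enough for the truncated series to agree with the closed form.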
In summary, transaural crosstalk cancellation can be achieved by feeding the R source via crosstalk filter C (which is equal to (-A/S)), and adding it to the L channel (and vice versa). The subsequent serial correction filters 1/(1-C²) deal with the multiple cancellation problem and (1/S) corrects for the same side transmission (above). It would appear that the Atal and Schroeder scheme provides a theoretically ideal solution. By inspection of Figure 3, it can be seen that the overall transmission function from the right input (R) to the right ear (r), defined here to be Rr(f), is:

$$R_r(f) = \frac{1}{1-C^2}\,\frac{1}{S}\,S + C\,\frac{1}{1-C^2}\,\frac{1}{S}\,A = 1 \qquad (1)$$

and the overall transmission function from the same (R) input to the left ear (l), defined to be Rl(f), is:

$$R_l(f) = \frac{1}{1-C^2}\,\frac{1}{S}\,A + C\,\frac{1}{1-C^2}\,\frac{1}{S}\,S = 0 \qquad (2)$$

These results satisfy the cancellation requirements.

(It is useful to note that the compounded serial $\frac{1}{1-C^2}$ and $\frac{1}{S}$ terms simplify to $\frac{S}{S^2-A^2}$, and that, at low-frequencies, A and S are almost identical (both in amplitude and phase).)

The Atal and Schroeder configuration has been known to perform well[2]: “the practical experience... has been nothing less than amazing... virtual sound images can be created far off to the sides and even behind the listener”. However, the early results were also reported to be very listener-position dependent: “...if the listener turns his head by more than 10° from the frontal direction ...the realistic illusion disappears, frequently changing into an ‘inside the head’ sensation”.

The author’s experience is that the basic Atal and Schroeder configuration can work quite well for speakers at two metres distance or more. However, in critical listening tests using classical music, there are some minor audible artefacts when moving one’s head sideways and back and forth. However, for PC applications the physical situation is different because the loudspeakers are relatively close to the listener. The importance of this has not been previously recognised. Accordingly, a special transaural crosstalk cancellation method has been devised to provide optimal cancellation at any given loudspeaker position and distance. We call this Sensaura XTC and it is described in a separate technical white paper.

8 S-gain transaural crosstalk cancellation

As shown above, in certain circumstances it is desirable to devise a transaural crosstalk cancellation scheme with built-in spectral equalisation to compensate for the twice through the ears effect (section 6, option 1). The second option relates to circumstances where the external-ear function is absent in the binaural source material which requires that the overall listening chain be designed to incorporate an external-ear function. This is readily achieved by specifying the gain into the listener’s ears to be S, rather than unity, and solving the near-ear and far-ear equations for S and 0, respectively, rather than 1 and 0.

9 Sweet spot

All stereo reproduction systems, including transaural crosstalk cancellation schemes, require that the listener is positioned in the so-called sweet spot area, such that the listener forms an equilateral triangle with respect to the loudspeaker pair. The loudspeakers, therefore, are present at azimuth angles of approximately ±30° from the listener. If the listener moves forwards or backwards from the sweet spot, both loudspeaker-to-ear path lengths change by similar amounts, so the crosstalk cancellation conditions still prevail and the sound image is little affected. If the listener moves sideways away from the sweet spot, however, then the crosstalk cancellation process is no longer so precise because the path lengths from the loudspeakers change in opposite ways (one increases as the other decreases) and so the sound image begins to deteriorate.
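The difference between moving forwards and moving sideways can be illustrated with simple geometry. The sketch below is an idealisation under stated assumptions: the head is reduced to a single point, the loudspeakers sit at ±30° and 2 m (a hypothetical distance), and the speed of sound is taken as 343 m/s. It also ties the sideways case back to the half-wavelength argument of section 5, where a 10 cm relative shift was said to turn cancellation into reinforcement at around 1.7 kHz.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def speaker_positions(distance_m, half_angle_deg=30.0):
    """Left and right loudspeaker coordinates for a listener at the origin."""
    a = math.radians(half_angle_deg)
    x, y = distance_m * math.sin(a), distance_m * math.cos(a)
    return (-x, y), (x, y)


def path_lengths(listener_xy, distance_m=2.0):
    """Distances from a point listener to the left and right loudspeakers."""
    left, right = speaker_positions(distance_m)
    return math.dist(listener_xy, left), math.dist(listener_xy, right)


forward = path_lengths((0.0, 0.10))    # 10 cm towards the loudspeakers
sideways = path_lengths((-0.10, 0.0))  # 10 cm towards the left loudspeaker

forward_mismatch = abs(forward[0] - forward[1])
sideways_mismatch = abs(sideways[0] - sideways[1])

# Frequency at which the sideways mismatch equals half a wavelength.
flip_frequency_hz = SPEED_OF_SOUND / (2.0 * sideways_mismatch)

print(f"forward mismatch:  {forward_mismatch * 100:.2f} cm")
print(f"sideways mismatch: {sideways_mismatch * 100:.2f} cm")
print(f"half-wavelength frequency: {flip_frequency_hz:.0f} Hz")
```

In this idealised layout the forward move leaves the two path lengths equal, whereas the 10 cm sideways move shortens one path and lengthens the other by about 5 cm each.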
The size of the sweet spot depends upon the wavelengths of the sounds being heard. Low-frequency sounds, with wavelengths of several feet or more, create a large sweet spot that is very tolerant of head movement, whereas higher frequency sounds create a smaller sweet spot. However, it is fortunate that, at these higher frequencies (several kHz and above), the head itself acts as an efficient baffle, thus screening higher frequency sounds from the opposite ear such that much smaller amounts of crosstalk cancellation are needed. By inspection of the S and A function in Figure 4, for example, it can be seen that the level of crosstalk (i.e. the separation between the curves) with respect to the primary signal at 500 Hz is about -3.3 dB, whereas for most frequencies above 2.2 kHz, it is below -10 dB. As a consequence, the need for crosstalk cancellation diminishes naturally with increasing frequency.

[Figure 5: Generic Sensaura transaural crosstalk cancellation. Block labels, from top to bottom: L source and R source; cross-feed filters -xA/S; summation; serial filters G; loudspeakers L and R; transmission functions S and xA to the listener.]
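The crosstalk levels quoted in the sweet-spot discussion (about -3.3 dB at 500 Hz and below -10 dB above 2.2 kHz) may be easier to picture as linear amplitude ratios. The short sketch below applies the standard amplitude conversion, ratio = 10^(dB/20); the variable names are illustrative only.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a relative level in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


crosstalk_500hz = db_to_amplitude_ratio(-3.3)       # level quoted at 500 Hz
crosstalk_above_2k2 = db_to_amplitude_ratio(-10.0)  # upper bound above 2.2 kHz

print(f"500 Hz:        crosstalk amplitude = {crosstalk_500hz:.2f} x primary")
print(f"above 2.2 kHz: crosstalk amplitude < {crosstalk_above_2k2:.2f} x primary")
```

On this scale the crosstalk at 500 Hz is roughly two-thirds of the primary signal's amplitude, while above 2.2 kHz it is less than one-third.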
[Figure 6: Sensaura XTC transaural crosstalk cancellation scheme (unity-gain option). Block labels as in Figure 5, with the serial filters G shown explicitly as S/(S² - x²A²).]

[...] required goal of precise crosstalk cancellation, whilst dealing with the multiple cancellation problem correctly, as before, such that there is an appropriate gain factor between the right source and the right ear.

Unity-gain option

For use with binaural format option 1 (such as artificial head recordings) the unity-gain option is required (i.e. Rr(f) = 1).

The overall transmission function from the right input (R) to the right ear (r), Rr(f), is:

$$R_r(f) = G \cdot S + \left(\frac{-xA}{S}\right) G \cdot xA \qquad (3)$$

and this must be equal to 1 for unity gain, as described above. Hence:

$$G \cdot S + \left(\frac{-xA}{S}\right) G \cdot xA = 1$$

and therefore:

$$G = \frac{S}{S^2 - x^2 A^2} \qquad (4)$$

[...] frequencies). This creates an intrinsically stable system, which is another advantage of this particular method.

Figure 6 shows the explicit configuration for the Sensaura unity-gain system.

The overall transmission function from the right input (R) to the left ear (l), Rl(f), and vice versa, can be confirmed to be zero:

$$R_l(f) = G \cdot xA + \left(\frac{-xA}{S}\right) G \cdot S = 0 \qquad (5)$$

S-gain option

For use with binaural format option 2 (such as 3D-sound synthesis using normalised HRTF filters), the S-gain option is required (i.e. Rr(f) = S). Again, this is achieved by solving the near-ear and far-ear transmission paths for S and 0, rather than 1 and 0.

The overall transmission function from the right input (R) to the right ear (r), Rr(f), given in equation (3), must be set equal to S, and hence:

$$G \cdot S + \left(\frac{-xA}{S}\right) G \cdot xA = S$$

and therefore:

$$G = \frac{S^2}{S^2 - x^2 A^2} \qquad (6)$$

It will be noted that this is the product of the previous equation (4) and S, and that the implementation involves simply substituting the solution of (6) for G into Figure 5 to yield an S-gain version of the explicit unity-gain scheme of Figure 6.

The signal processing must be carried out in such a way that the phase relationships are preserved and that an appropriate time-delay is incorporated into each line immediately prior to the summation element. In this way, the cancellation signal arrives in synchronisation with the primary crosstalk signal, thus enabling cancellation to occur. Minimum-phase FIR filters are commonly used for this.
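Equations (3) to (6) can be sanity-checked at a single frequency by substituting numbers. The sketch below is an illustration only: the complex values for S and A and the crosstalk factor x = 0.7 are hypothetical, chosen simply so that the denominator S² - x²A² is non-zero, and it ignores the time-delay and filter-design considerations mentioned above.

```python
# Hypothetical single-frequency values (not measured data).
S = complex(1.0, 0.2)   # same-side transmission function
A = complex(0.5, -0.4)  # alternate-side transmission function
x = 0.7                 # transaural crosstalk factor

cross_feed = -x * A / S  # cross-feed filter shown in Figure 5


def transmission(G):
    """Near-ear and far-ear responses to the R input for a given serial filter G."""
    near = G * S + cross_feed * G * x * A  # equation (3)
    far = G * x * A + cross_feed * G * S   # equation (5)
    return near, far


G_unity = S / (S ** 2 - x ** 2 * A ** 2)       # equation (4)
G_sgain = S ** 2 / (S ** 2 - x ** 2 * A ** 2)  # equation (6)

near_unity, far_unity = transmission(G_unity)
near_sgain, far_sgain = transmission(G_sgain)

print(near_unity, far_unity)  # expected to be close to 1 and 0
print(near_sgain, far_sgain)  # expected to be close to S and 0
```

Because equation (6) is equation (4) multiplied by S, the same `transmission` function serves both options; only the value of G changes.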
Sensaura XTC provides optimal transaural
crosstalk cancellation for users of PC-based
multimedia systems in which the loudspeakers
are relatively close to the listener and might be
at a variety of angles and distances, depending
on the individual user’s set-up configuration
and preferences. This has been achieved by
deriving a mathematical function for the
distance and angle dependency of the
transaural crosstalk factor, x, such that it can
be calculated for any given loudspeaker
distance and angle. This, in turn, can be used
to control the amount of crosstalk cancellation
that is implemented.
11 References
1. Apparent sound source translator.
B S Atal and M R Schroeder
US Patent 3,236,949.
2. Models of hearing.
M R Schroeder
Proc. IEEE, Sep 1975, 63, (9),
pp. 1332-1350.
3. Historical background of the Haas
and/or precedence effect.
M B Gardner
J. Acoust. Soc. Am., 43, (6),
pp.1243-1248 (1968).
4. Effect of arrival time on stereophonic
localization.
W B Snow
J. Acoust. Soc. Am., 26, (6),
pp.1071-1074 (1954).
5. Head diffraction compensated stereo
system with optimal equalisation.
D H Cooper and J L Bauck
US Patent 4,975,954.
6. Sound recording, processing and
reproduction means.
A Sibbald et al
US Patent 5,666,425.
7. Sensaura SPU-800.
D Mellor
Audio Media, November 1996,
pp. 164-171.

For further information, please contact:
Email: [email protected]
WWW: www.sensaura.co.uk
Tel: +44 20 8848 6636