Carlyon 2004

Review
TRENDS in Cognitive Sciences
Vol.8 No.10 October 2004
How the brain separates sounds

Robert P. Carlyon
MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 2EF, UK
In everyday life we often listen to one sound, such as someone's voice, in a background of competing sounds. To do this, we must assign simultaneously occurring frequency components to the correct source, and organize sounds appropriately over time. The physical cues that we exploit to do so are well established; more recent research has focussed on the underlying neural bases, where most progress has been made in the study of a form of sequential organization known as auditory streaming. Listeners' sensitivity to streaming cues can be captured in the responses of neurons in the primary auditory cortex, and in EEG wave components with a short latency (<200 ms). However, streaming can be strongly affected by attention, suggesting that this early processing either receives input from non-auditory areas, or feeds into processes that do.
How do we listen to something interesting, such as a person talking or a melody, when the rest of the world just won't keep quiet? The auditory system's ability to accomplish this feat is all the more impressive because the interfering sounds are often very similar to the ones we want to hear; for example, they too may be voices or melodies. Over the past few decades, auditory scientists have identified which cues listeners can and can't use when performing this 'Auditory Scene Analysis' (ASA) [1]. I will summarize that research only briefly here, as it has been dealt with in more detail by several recent reviews [2,3]. Instead, I will focus on a relatively new line of study, which addresses the neural mechanisms that perform ASA, and how they interact with other aspects of cognitive function. A question that has inspired many of these experiments concerns the role of attention: does ASA occur automatically, providing an attentional system with several pre-segregated sound sources from which to select? Or does attention actually influence the operation of those processes that are needed for sound segregation?
Corresponding author: Robert P. Carlyon.
What cues do listeners use to segregate sounds?
When answering this question, it is useful to draw a distinction between two broad classes of phenomenon. One, known as auditory streaming, concerns the perceptual organization of sounds over time. It is widely exploited by composers of music, and forms the basis of our ability to track one speaker in the presence of interfering speech. Streaming is usually studied using quite simple stimuli, such as the tone sequence shown in Figure 1a (originally due to L.P.A.S. van Noorden, PhD thesis, Eindhoven University of Technology, The Netherlands, 1975). It consists of two tones, with frequencies A and B, presented in a pattern of repeating triplets. When the frequency difference (Δf) between A and B is small, and the sequence is played slowly, listeners hear a galloping rhythm, corresponding to the repeating triplets. However, when Δf is large, or when the sequence is speeded up, the galloping rhythm is lost, and subjects instead hear two concurrent streams of tones, one with frequency A and the other with frequency B. For intermediate separations and speeds, the percept can flip from one organization to the other, with an increased tendency to hear two streams as the sequence progresses. (For demonstrations of these and other relevant phenomena go to http://www.mrc-cbu.cam.ac.uk/hearing.)
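The Figure 1a stimulus can be sketched as an onset schedule rather than as audio. This sketch makes some illustrative assumptions that are not values from the studies cited: B lies a chosen number of semitones above A, the tones are evenly spaced by a stimulus onset asynchrony (SOA), and the '-' in each ABA- triplet is a silent slot one SOA long. Splitting the events by frequency shows why the gallop disappears when the sequence splits. Each stream on its own is isochronous, and the A tones occur at twice the rate of the B tones.

```python
"""Onset schedule for the ABA- 'galloping' sequence of Figure 1a (illustrative sketch)."""


def aba_triplets(freq_a_hz, delta_f_semitones, soa_ms, n_triplets):
    """Return (onset_ms, frequency_hz) pairs for repeating ABA- triplets.

    Assumed for illustration: B is delta_f_semitones (non-zero) above A, and
    each triplet fills four SOA-long slots (A, B, A, silence).
    """
    freq_b_hz = freq_a_hz * 2 ** (delta_f_semitones / 12)
    events = []
    for k in range(n_triplets):
        t0 = 4 * k * soa_ms  # start of the k-th ABA- triplet
        events.append((t0, freq_a_hz))
        events.append((t0 + soa_ms, freq_b_hz))
        events.append((t0 + 2 * soa_ms, freq_a_hz))
    return events


def split_streams(events, freq_a_hz):
    """Separate the onsets into the A stream and the B stream."""
    a = [t for t, f in events if f == freq_a_hz]
    b = [t for t, f in events if f != freq_a_hz]
    return a, b


def intervals(onsets_ms):
    """Inter-onset intervals within one stream."""
    return [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]


if __name__ == "__main__":
    seq = aba_triplets(freq_a_hz=500.0, delta_f_semitones=12, soa_ms=100.0, n_triplets=3)
    a_stream, b_stream = split_streams(seq, 500.0)
    print(intervals(a_stream))  # A tones recur every 2 SOAs
    print(intervals(b_stream))  # B tones recur every 4 SOAs
```

Only the timing is modelled here. Whether listeners actually hear one stream or two depends on Δf, the SOA, and (as discussed below) attention, none of which this schedule predicts.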
[Figure 1: schematic spectrograms (frequency vs. time), panels (a)-(f); panel (a) marks the A-B frequency separation Δf]
Figure 1. Schematic spectrograms of stimuli used to study sound segregation. (a) Galloping-rhythm sequence used to study streaming. (b) Schematic representation of two two-formant vowels spoken by different talkers; each vowel is shown in a different color. The vowels have the same F0 and start and stop synchronously. (c) As (b) but with an onset and offset asynchrony. (d) As (b) but with an F0 difference. (e) A tone glide with a gap, which in panel (f) is filled by a noise burst, giving the illusion of continuity.
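Panel (d) of Figure 1 differs from panel (b) only in F0. For a harmonic sound, F0 equals both the spacing between adjacent harmonics and the reciprocal of the waveform's period, and F0 differences are commonly expressed in semitones (12 per octave), as on the abscissa of Figure 2c. The following small sketch illustrates these relations. The 1000-1500 Hz passband and the F0 values are arbitrary illustrative choices, not stimulus parameters from the cited studies.

```python
import math


def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Harmonic frequencies n * f0_hz (n >= 1) that fall within [low_hz, high_hz]."""
    n_first = max(1, math.ceil(low_hz / f0_hz))
    n_last = math.floor(high_hz / f0_hz)
    return [n * f0_hz for n in range(n_first, n_last + 1)]


def period_ms(f0_hz):
    """Waveform period in ms: the reciprocal of F0."""
    return 1000.0 / f0_hz


def f0_difference_semitones(f0_low_hz, f0_high_hz):
    """F0 difference in semitones (12 semitones = a doubling of F0)."""
    return 12.0 * math.log2(f0_high_hz / f0_low_hz)


if __name__ == "__main__":
    for f0 in (100.0, 200.0):
        print(f0, harmonics_in_band(f0, 1000.0, 1500.0), period_ms(f0))
    print(f0_difference_semitones(100.0, 200.0))
```

In this example both complexes occupy the same band. The 200-Hz complex has half as many components, spaced twice as widely, and a period half as long.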
www.sciencedirect.com 1364-6613/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2004.08.008
[Figure 2 panels: (a) F0 = 100 Hz and (b) F0 = 200 Hz, each plotted as a spectrum (level in dB, arbitrary, vs. frequency) and a waveform (amplitude vs. time in ms); (c) effect of F0 difference on streaming: probability of a two-streams response vs. F0 difference (semitones), for a low- and a high-frequency region]
Figure 2. (a,b) Bandpass-filtered harmonic complexes similar to those used to study auditory streaming [6,7]. The fundamental frequency (F0) corresponds to the spacing between harmonics in the frequency spectra (left panels) and to the reciprocal of the period of the waveform (right panels). Note that the spectral region occupied by the two complexes is the same, despite the difference in F0. Stimulus details vary across experiments, but in all cases the spacing between harmonics was smaller than the bandwidths of the peripheral auditory filters [55] that respond to the complex. This prevented any place-of-excitation cues arising from differences in the positions of the individual harmonics within the complex. (c) Probability of hearing two streams as a function of the F0 difference between two complex tones, such as those in (a) and (b). Circles show results obtained with stimuli filtered as in (a) and (b), in which no place-of-excitation cues were present. Streaming increases with increasing F0 difference, but the effect is slightly smaller than when the complexes are filtered into a lower frequency region, where auditory filters are narrower and place-of-excitation cues were present (squares). Data re-plotted from a subset of those presented by Grimault et al. [6].
As Moore and Gockel [3] noted in a recent review, two sounds that occupy the same frequency region can also be streamed apart, provided that there is a sufficiently large perceptual difference between them [3–7]. For example, several studies have used A and B tones that were harmonic complexes filtered into the same frequency region [6–8]. The A and B complexes had different fundamental frequencies (F0s), causing them to differ in pitch (Figure 2a,b). Even though the stimuli were carefully designed so that the F0 differences (ΔF0) would not cause a difference in which peripheral auditory nerve (AN) fibers were excited, streaming still increased with increasing ΔF0 (Figure 2c).
A perhaps more taxing problem for the auditory system occurs when two sounds overlap in time. Figure 1b shows a schematic spectrogram of two vowels that have the same pitch and that start and stop synchronously. Listeners hear this sound as a single source, and are rather poor at identifying the two component vowels [9,10]. Two cues are overwhelmingly important for segregating these two sounds, at least when they are presented to the same ear, such as when listening to a mono radio [11]. First, in most situations, even when two speakers' voices overlap in time, they do not start and stop in synchrony. Listeners exploit differences in onset (and, to a lesser extent, offset) between two sounds to segregate them (Figure 1c). Second, differences in the pitch of two speakers aid segregation, as illustrated by the wider spacing between the harmonics of one vowel in Figure 1d. Segregation can also be facilitated when the amplitudes of the frequency components in one source fluctuate coherently. Further cues are used when the waveforms reaching the two ears are different, such as when listening to two talkers in a room (see Box 1). The brain is also able to deal intelligently with instances where portions of the signal are briefly masked (Box 2).
Box 1. Two ears are better than one

Imagine you are in the situation shown in Figure Ia, where two people, labeled S1 and S2, are talking to you. If we silence S1 for a moment, there are two main cues that you can use to estimate the location of S2, whose speech will have a higher level, and arrive earlier, at your left ear than at your right ear. The first of these cues, interaural level differences (ILDs), is important for sound segregation: when both speakers are talking, listeners can improve perception of S1's speech by attending to the right ear [38,39]. Interestingly, there is no evidence that listeners use the second cue, termed interaural time differences (ITDs), for segregation of completely simultaneous sounds [2,11,38,40,41]. There is, however, another cue that listeners do use, and its neural basis is well known. When only S1 is talking, the waveforms reaching the two ears are perfectly correlated. The voice of S2 will have energy in some of the same frequency regions as that of S1. So, in each ear, the output of an auditory nerve fiber tuned to a particular frequency will reflect a mixture of the two voices. Because the contribution of S2 will naturally be greater at the left ear, this decorrelates the outputs of the auditory nerve fibers in the two ears. Furthermore, this decorrelation will be greatest in frequency regions where S2's voice has the most energy, and listeners can use this information to work out what that speaker is saying [38,42,43].
The auditory system's ability to exploit decorrelation has been known since the early studies of the Binaural Masking Level Difference (BMLD) [44,45]. Imagine that we present a subject with a noise masker and a tonal signal that are identical at the two ears (Figure Ib). When the signal is inverted at one ear (arrow in Figure Ic), the combined waveform at the two ears becomes less correlated, and this decorrelation aids signal detection. The neural basis of this processing has been tracked down to neurons in the inferior colliculus (IC), and responses of guinea pig IC neurons closely match the pattern of results observed in human BMLD experiments (Figure Id,e) [46]. Indeed, one could argue that this is the most successful instance of auditory scientists identifying the neural basis for a component of concurrent sound segregation.
[Figure I panels: (a) listener with talkers S1 and S2; (b,c) BMLD stimuli; (d,e) D values vs. 500-Hz signal level (dB SPL), with BMLDs of 11 dB and 13 dB]
Figure I. (a) Schematic of a listener with one speaker (S1) positioned in front and one (S2) to the left. (b) and (c) illustrate the binaural masking level difference (BMLD). In (b) the masker and signal are perfectly correlated between the two ears, and the listener cannot hear the signal. In (c) the signal has been inverted at one ear (arrow), so the tone is heard and the subject (who really should get out more) is happy. (d) The response of a guinea pig inferior colliculus neuron, for the case where the masker and signal are in phase (blue curve) and where the signal is inverted in one ear (red curve). The ordinate shows a normalized value, D, defined as the increase in firing rate produced when the 500-Hz signal is added to the noise masker, divided by the standard deviation of the firing rate. The abscissa shows the signal level. The signal is defined as being detected when the absolute value of D exceeds 1.0 (outside the dashed lines). Note that in this case, detection of the signal is caused by an increase in firing rate when the signal is in phase, but by a decrease when it is out of phase. The BMLD is the threshold difference between the two curves, 11 dB here. (e) The response of a neuron for which adding an out-of-phase signal causes an increase in firing rate. Guinea-pig data reproduced with minor modifications from [46], with permission.
Box 2. The continuity illusion

Another trick up the brain's sleeve is invoked when a portion of a sound is briefly masked by noise. If I were to clap my hands loudly during the middle of the word 'meet', you would hear my voice continue behind the clap, rather than interpret the utterance as two words ('me eat'). What makes this phenomenon especially interesting is that even if a gap is introduced into a sound (as shown for a tone glide in Figure 1e), an illusion of continuity can be induced by filling the gap with an inducing sound (Figure 1f). This continuity illusion will occur only when the peripheral excitation produced by the inducer (e.g. noise) is such that it would have at least partially masked the inducee (the tone glide) if the inducee had really been continuous. In addition, there should be no silent gaps between the inducee and inducer that might reveal the fact that the inducee was interrupted [1,47].
Evidence for neural correlates of the continuity illusion is relatively scarce. Sugita measured responses of neurons in the primary auditory cortex of cats to frequency glides [48]. For a minority of neurons, filling the gap with a band of noise caused the units to continue firing during the gap, at a rate higher than occurred during the noise alone. Unfortunately, there was no evidence that this super-additive behavior depended on the noise occupying a frequency region that matched the tone, and so it is doubtful whether it really reflects a neural correlate of the illusion. For example, he reported super-additivity for a tone glide having a gap corresponding to frequencies between 6.3 and 10.2 kHz, and a noise whose lower cut-off was 14.3 kHz. More recently, Micheyl et al. measured an MMN that reflected the presence of an illusory continuity, produced by filling a gap between two tone bursts with noise [49]. Crucially, this trace depended on whether the frequency content of the noise was sufficient to produce the continuity illusion.
Finally, it is worth noting that the continuity illusion can both improve and impair performance in a forced-choice task [50–53]. The fact that it can impair performance suggests that it involves a compulsory re-coding of the sensory input, and that subjects do not have conscious access to a pre-illusion stage of processing.
Neural and cognitive bases of ASA: auditory streaming
Animal experiments
Most neurons in the auditory system, from the AN upwards, are frequency selective. This simple fact suggests that some aspects of streaming might arise from quite basic processes, of a kind that could be observed at any neural site where frequency selectivity is present. Consider the response of a neuron tuned to the A frequency in Figure 1a. When Δf is small, that neuron will also respond to the B tones, and its output will reflect the galloping rhythm in the sequence. As Δf is increased, the neuron will respond only to the A tones. There will be other neurons that respond only to the B tones, but very few will respond to both. This could provide a neural basis for the effects of frequency separation on streaming. Now consider the case where Δf is intermediate, so that some neurons respond strongly to the A tones and weakly to the B tones. As the sequence is speeded up, the tones get closer together in time, and we might expect the short-term adaptation produced by the strong A response to reduce the response to the B tones. This would provide a basis for the effects of repetition rate on streaming. In fact, both of these findings have been observed in the primary auditory cortex (area A1) of awake macaques [12,13].
These effects could have arisen from any part of the auditory pathway from the auditory nerve to A1, and it is tempting to conclude that auditory streaming occurs, perhaps automatically, at a fairly peripheral stage of processing [14]. However, there are at least three good reasons to resist this temptation. First, all electrophysiological studies of streaming, both in animals and humans [12,13,15–18], have used stimuli where the tones in each stream occupy separate frequency regions. Separation of these sounds already occurs at the level of the AN, and we do not yet know whether similar results would occur for streaming between sounds that excite the same population of AN fibers [47]. Such a result would demonstrate the operation of processing that does not simply reflect the responses of AN fibers. Note, however, that if higher-order neurons are tuned to some feature (such as F0), then the adaptation and selectivity of those neurons could contribute to streaming in much the same way as occurs with frequency-tuned neurons and pure tones. Second, it is now known that the responses of neurons in primary auditory cortex can be strongly influenced by attention. Fritz et al. [19] measured the response properties of neurons in region A1 of the ferret primary auditory cortex, and, in some conditions, required the animal to detect a tone at a particular frequency. Neural tuning varied depending on which frequency region the ferret was attending to. Hence, even when correlates of streaming are observed in the responses of A1, these responses might be modified by input from other parts of the brain. Finally, as discussed next, there is evidence from research with human listeners that the streaming process is also affected by attention.
Effects of attention on streaming by humans
The issue of whether streaming is affected by attention raises an interesting challenge: if subjects are not attending to a sound sequence, how can we find out whether streaming has occurred? Here we summarize three approaches to this problem.
EEG
One approach is to measure neural responses to the sound sequences using EEG. A measure termed the mismatch negativity (MMN) has proved particularly useful here. The MMN is observed when a rare 'deviant' sound (or short sequence of sounds) is presented in a sequence of otherwise identical 'standards'. It takes the form of a negative wave, with a latency of 150–200 ms, obtained when the response to the standards is subtracted from that to the deviants. Although the MMN is thought to have multiple generators, its major source has been shown to be in auditory areas along the supra-temporal plane [20–22]. Several researchers have designed standard and deviant sequences such that, if subjects were required to identify the deviants, they could do so only if streaming had occurred. The idea is that, by measuring the MMN to the deviants under different stimulus conditions, one can obtain a correlate of streaming without requiring subjects to respond to the stimuli. For example, Sussman et al.
presented a tone sequence that alternated regularly between a high-frequency and a low-frequency range,
with the tones in each range playing a simple melody [18]. On a minority of trials, they altered the order of tones in the low range, and argued that this should elicit an MMN only when the interfering high-frequency tones were pulled into a separate auditory stream. They observed an MMN when the stimulus onset asynchrony (SOA) between successive tones was 100 ms, at which behavioral experiments suggest that streaming should occur. By contrast, when the SOA was 750 ms, for which streaming should be much weaker, no MMN was observed. They attributed the difference in MMN between the two conditions to differences in streaming, which, as subjects were reading a book at the same time, they described as occurring 'outside the focus of attention'.
There are two general and interrelated limitations to this approach. First, although there is strong evidence that the MMN to deviants in some paradigms can be measured in the absence of attention (for example, when the subject is in a coma [23,24]), more recent evidence shows that the MMN can be affected by attention [25,26]. Second, although subjects are instructed to ignore the sounds and to watch a movie or read a book, this is not particularly demanding, and one can never be sure that they did not sneak a little listen. For this reason, MMN researchers are usually careful to conclude only that the phenomenon under study (i.e. streaming) occurs 'outside the focus of attention', rather than in the complete absence of attention. A useful variation of the technique is to manipulate the amount of attention paid to the sounds and see whether this affects the MMN response. This was in fact adopted by Sussman et al. [17], using a paradigm similar to that in their later experiment described above. They used an SOA of 500 ms, observed an MMN only when subjects were attending to the tones, and concluded that, for this stimulus, attention did indeed modulate streaming.
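The subtraction that defines the MMN can be sketched as follows. The sketch assumes averaged ERPs sampled at a fixed rate, with the first sample at sound onset. The function names, the 100-250 ms search window, and the synthetic waveforms are illustrative assumptions only. Real analyses also involve filtering, baseline correction, and artifact rejection, and these steps are omitted here.

```python
"""Deviant-minus-standard difference wave and its most negative peak (illustrative sketch)."""
import math


def difference_wave(deviant_uv, standard_uv):
    """Subtract the averaged standard ERP from the averaged deviant ERP, sample by sample."""
    if len(deviant_uv) != len(standard_uv):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_uv, standard_uv)]


def most_negative_peak(wave_uv, fs_hz, t0_ms=0.0, window_ms=(100.0, 250.0)):
    """Return (latency_ms, amplitude_uv) of the most negative sample inside window_ms.

    t0_ms is the time of the first sample relative to sound onset.
    """
    best = None
    for i, v in enumerate(wave_uv):
        t_ms = t0_ms + 1000.0 * i / fs_hz
        if window_ms[0] <= t_ms <= window_ms[1] and (best is None or v < best[1]):
            best = (t_ms, v)
    if best is None:
        raise ValueError("no samples fall inside the search window")
    return best


if __name__ == "__main__":
    # Synthetic 1-kHz ERPs (not real data): only the deviant carries a negative dip at 170 ms.
    fs = 1000.0
    standard = [0.0] * 400
    deviant = [-2.0 * math.exp(-(((1000.0 * i / fs) - 170.0) / 20.0) ** 2) for i in range(400)]
    mmn = difference_wave(deviant, standard)
    print(most_negative_peak(mmn, fs))  # -> (170.0, -2.0)
```

The same deviant-minus-standard logic can be reused for any condition contrast. Any claim that the resulting deflection reflects streaming, however, rests on the experimental design, not on the arithmetic.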
Effects of streaming on competing tasks
A second approach was developed by Jones and his colleagues, who measured the effects of streaming on a secondary task, namely serial recall of visually presented letters [27–29]. Recall can be disrupted by sound sequences that subjects are instructed to ignore, provided that the sounds change over time. For example, two tones that alternate in frequency disrupt performance, whereas a single repeated tone does not. Importantly, this 'irrelevant sound effect' appears to be sensitive to streaming: when a sequence splits into two streams, each of which is heard as a repetition of a single sound, its deleterious effect on visual recall is reduced. This shows that some aspects of streaming can be observed for sound sequences that are not the focus of subjects' attention. However, as discussed in the previous section with regard to EEG studies, it is hard to rule out the possibility that subjects were paying some attention to the tones.
Manipulating attention during the build-up of streaming
A further approach, developed in our laboratory, exploited the fact that the initial percept of a tone sequence as a single stream can turn into a two-stream percept after several seconds [30]. The subjects' attention was manipulated during the first half of a sequence, and the effect of this manipulation on the streaming reported during the second half was measured. In a baseline condition, a 20-s sequence of repeating ABA- triplets (Figure 1a) was presented to subjects' left ears, and subjects reported how many streams they heard throughout the sequence. In a second condition, subjects performed a demanding task on a series of noise bursts presented to the right ear during the first 10 s of the sequence, and then switched their attention to the tones in their left ear and started making streaming judgements. The question was whether, when they did this, the streaming would be the same as if they had been attending to the tones all along, or more like that at the very start of an attended sequence. The results supported the latter prediction: when attention had first been diverted to the noise bursts, the build-up of streaming was much less than in the baseline condition. The noise bursts had no effect in a third condition, where subjects were instructed to ignore them.
A subsequent study showed that streaming could be disrupted even by non-auditory competing tasks [31]. At the time, it was concluded that streaming had not built up when attention was diverted. An alternative, proposed by Cusack et al. [32], is that streaming did in fact build up, but that the act of switching attention back to the tones reset the streaming mechanism. Either way, the results show that attention can have a strong influence on auditory streaming, which is inconsistent with accounts based purely on peripheral mechanisms (e.g. [14]). This conclusion can be reconciled with the research on the irrelevant sound effect if one assumes that some streaming can occur without full attention, but that it can be strongly affected when subjects have to perform a demanding competing task.
Non-auditory areas and auditory streaming
There are two reports suggesting that brain areas not devoted to auditory processing nevertheless play a role in auditory streaming. One involved stroke patients with damage to the right hemisphere, who were identified as having difficulty in attending to visual objects to their left (unilateral neglect). Those same subjects showed reduced streaming for sounds presented to their left ears, compared with when the sounds were presented to their right ear or to either ear of controls [30]. The pattern of lesions across patients was heterogeneous, but only one of the four patients tested had a lesion near auditory cortex. In a more recent study, Cusack [33] used fMRI to study neural correlates of streaming in healthy subjects, using sequences similar to those in Figure 1a. He recorded subjects' perceptual judgements while they were being scanned, and showed that activation of three regions in the intraparietal sulcus (IPS) correlated with the perception of two streams. Because streaming varies over time for a given sound sequence, he could show that activation of the IPS for the same sequence changed, depending on whether subjects were hearing one or two streams at a given time. He noted that the same area has been implicated in segregation of visual scenes, and suggested that the IPS might be involved in some supra-modal aspect of perceptual organization. One possibility is that the IPS codes the output of perceptual organization processes in each modality, so this finding does not rule out the idea that much of stream segregation could take place in purely auditory areas.
Neural and cognitive bases of ASA: concurrent sound segregation
In comparison with auditory streaming, less research has investigated the neural mechanisms underlying the segregation of sounds that overlap in time (Figure 1b–d). With the exception of binaural cues (Box 1), studies of neural correlates of simultaneous sound segregation have focussed on the mistuning of a single component from an otherwise harmonic complex. Those studies show that the introduction of mistuning produces a change in the physiological response somewhere in the auditory system [25,34–37]. Of course, any physical change in a stimulus is likely to influence the response of some part of the auditory system, and so the challenge is to show that a given neural change truly reflects the processing of mistuning.
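To make the mistuning manipulation concrete, the following sketch lists the components of a harmonic complex with one partial mistuned. The values used (an F0 of 200 Hz, six harmonics, and a 4% upward shift of the third harmonic) are hypothetical examples, not parameters from the cited studies. When the complex is harmonic, every component is an integer multiple of F0 and all neighbouring gaps equal F0. Mistuning one partial breaks that regularity locally, so it also changes the frequency spacing (and hence any beating) between the mistuned partial and its neighbours.

```python
"""Component frequencies of a harmonic complex with one mistuned partial (illustrative sketch)."""


def complex_components(f0_hz, n_harmonics, mistuned_harmonic=None, mistuning_pct=0.0):
    """Return the frequencies of harmonics 1..n_harmonics of f0_hz.

    If mistuned_harmonic is given, that harmonic is shifted upward by
    mistuning_pct percent of its nominal frequency.
    """
    components = []
    for n in range(1, n_harmonics + 1):
        f = n * f0_hz
        if n == mistuned_harmonic:
            f *= 1.0 + mistuning_pct / 100.0
        components.append(f)
    return components


def adjacent_spacings(components_hz):
    """Frequency gaps between neighbouring components."""
    return [hi - lo for lo, hi in zip(components_hz, components_hz[1:])]


if __name__ == "__main__":
    tuned = complex_components(200.0, 6)
    mistuned = complex_components(200.0, 6, mistuned_harmonic=3, mistuning_pct=4.0)
    print(adjacent_spacings(tuned))     # every gap equals F0
    print(adjacent_spacings(mistuned))  # the gaps on either side of harmonic 3 no longer equal F0
```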
One approach, adopted by Sinex and colleagues [36,37], is to show that a particular type of response occurs in a predictable way for mistuning across a wide range of stimuli, and that it does not occur for stimuli that are not mistuned. They reported that cells in the central nucleus of the chinchilla inferior colliculus (IC) responded with a complex pattern of beating when, and only when, a complex tone contained a mistuned component. This pattern was consistent across IC neurons with different characteristic frequencies (CFs), and could be predicted from a simple model involving coincidence detection between the rectified and lowpass-filtered outputs of multiple AN fibers. However, although these results indicate that the presence of mistuning might be encoded by IC responses, it is less clear how those responses encode which component has been mistuned. Another method is to correlate EEG responses in human listeners with behavioral responses to the same stimuli. Alain and his colleagues [25,35] presented subjects with a sequence of complex tones, which could be either perfectly harmonic or have one partial mistuned. They measured the difference between the traces obtained in response to the mistuned vs. the harmonic tones, and observed an attention-independent component which they
termed the object-related negativity (ORN), with a latency of about 150 ms. It was followed by a positive deflection, with a latency of about 400 ms, which was dependent on subjects attending to the sounds. The size of the ORN increased monotonically with the amount of mistuning, in a manner similar to the increase in the tendency of listeners to report hearing two sound sources. However, mistuning one component does have other effects not directly related to sound segregation, for example by altering the pattern of beating between adjacent harmonics. It would therefore be reassuring to know that other manipulations, which did not result in the percept of two sound sources, did not produce an ORN. The need for some caution is highlighted by the case of an earlier component, with a latency of around 30 ms, which increases monotonically with increases in mistuning, but which also differs between perfectly harmonic complexes [34]. It should also be noted that the range over which increasing mistuning had the greatest effect on this early component differed from that which had the greatest effect on perceptual judgements.
Summary and conclusions
Two tasks facing the listener in a complex auditory world are the grouping of simultaneous frequency components into one or more auditory objects, and the organization of sequential sounds into auditory streams. The study of the neural mechanisms underlying simultaneous sound segregation is very much in its infancy. Several studies have revealed differences in neural responses brought about by mistuning, but it remains a major challenge to show that a particular component has been segregated by a given neural structure. When a component is mistuned from a complex, subjects can hear out that component, which sounds like a pure tone. No-one has yet reported a neural response that resembles the response to a pure tone and that appears when the appropriate harmonic of a complex tone is mistuned (see also Box 3 for other outstanding questions).
Rather more progress has been made in the field of auditory streaming. Single-cell measurements in primary auditory cortex have revealed response properties that mimic the effects of frequency separation and presentation rate of pure tones. I have argued here that, for these stimuli, similar findings could probably be observed in the AN, as a simple result of the frequency selectivity and adaptation present in neurons throughout the auditory system. For other stimuli, however, streaming must occur at least more centrally than the AN, and we now know that streaming can be strongly affected by attention. The multiplicity of factors that can influence streaming makes it unlikely that one can pin down streaming to a specific neural locus. A more fruitful approach might be to identify and characterize those processes, such as selectivity, adaptation, and attention, that influence streaming, and, importantly, how they interact.
Box 3. Questions for future research

Some correlates of streaming have been observed in auditory cortex, but could reflect earlier processes. What is the earliest stage of processing at which the effects of Δf, repetition rate, and the build-up of streaming can be observed?
At what stage in the auditory pathway does attention affect the neural correlates of streaming? Are these effects present only in areas not dedicated to auditory processing?
To date, there is no evidence for an effect of attention on the segregation of simultaneous sounds. Is this because simultaneous segregation is always totally independent of attention, or because we haven't developed the techniques to reveal such effects?
When subjects initially ignore a sequence and then start making streaming judgements, the amount of streaming reported is much less than if they had been attending all along [30,32,54]. Is this because the streaming never built up, or because the act of switching attention to something resets streaming?
Acknowledgements
I thank Ingrid Johnsrude, Rhodri Cusack, and Chris Darwin for helpful comments on a previous version of this article.
References
1 Bregman, A.S. (1990) Auditory Scene Analysis, MIT Press
2 Darwin, C.J. (1997) Auditory grouping. Trends Cogn. Sci. 1, 327–333
3 Moore, B.C.J. and Gockel, H. (2002) Factors influencing sequential stream segregation. Acta Acustica/Acustica 88, 320–333
4 Roberts, B. et al. (2002) Primitive stream segregation of tone sequences without differences in fundamental frequency or passband. J. Acoust. Soc. Am. 112, 2074–2085
5 Cusack, R. and Roberts, B. (2000) Effects of differences in timbre on sequential grouping. Percept. Psychophys. 62, 1112–1120
6 Grimault, N. et al. (2000) Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency. J. Acoust. Soc. Am. 108, 263–271
7 Vliegen, J. and Oxenham, A.J. (1999) Sequential stream segregation in the absence of spectral cues. J. Acoust. Soc. Am. 105, 339–346
8 Vliegen, J. et al. (1999) The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. J. Acoust. Soc. Am. 106, 938–945
9 Assmann, P.F. and Summerfield, Q. (1989) Modeling the perception of concurrent vowels: vowels with the same fundamental frequency. J. Acoust. Soc. Am. 85, 327–338
10 Assmann, P. and Summerfield, Q. (1990) Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. J. Acoust. Soc. Am. 88, 680–697
11 Darwin, C.J. and Carlyon, R.P. (1995) Auditory Grouping. In Hearing (Vol. 6) (Moore, B.C.J. ed.), Hearing, pp. 387–424, Academic Press
12 Fishman, Y.I. et al. (2001) Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear. Res. 151, 167–187
13 Micheyl, C. et al. (2003) The neural basis of stream segregation in the primary auditory cortex. Assoc. Res. Otolaryngol. Abs. 26, 195
14 Beauvois, M.W. and Meddis, R. (1991) A computer model of auditory stream segregation. Q. J. Exp. Psychol. 43A, 517–542
15 Shinozaki, N. et al. (2000) Mismatch negativity (MMN) reveals sound grouping in the human brain. Neuroreport 11, 1597–1601
16 Yabe, H. et al. (2001) Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration. Brain Res. 897, 222–227
17 Sussman, E. et al. (1998) Attention affects the organization of auditory input associated with the mismatch negativity system. Brain Res. 789, 130–138
18 Sussman, E. et al. (1999) An investigation of the auditory streaming effect using event-related potentials. Psychophysiology 36, 22–34
19 Fritz, J. et al. (2003) Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat. Neurosci. 6, 1216–1223
20 Kasai, K. et al. (1999) Multiple generators in the auditory automatic discrimination process in humans. Neuroreport 10, 2267–2271
21 Kropotov, J.D. et al. (1995) Mismatch negativity to auditory stimulus change recorded directly from the human temporal cortex. Psychophysiology 32, 418–422
22 Javitt, D.C. et al. (1994) Detection of stimulus deviance within primate primary auditory cortex: intracortical mechanisms of mismatch negativity (MMN) generation. Brain Res. 667, 192–200
23 Kane, N.M. et al. (2000) Coma outcome prediction using event-related potentials: P-3 and mismatch negativity. Audiol. Neurootol. 5, 186–191
24 Kane, N.M. et al. (1996) Event related potentials: neuropsychological tools for predicting emergence and early outcome from traumatic coma. Intensive Care Med. 22, 39–46
25 Alain, C. and Izenberg, A. (2003) Effects of attentional load on auditory scene analysis. J. Cogn. Neurosci. 15, 1063–1073
26 Sussman, E. et al. (2002) Top-down effects can modify the initially stimulus-driven auditory organization. Brain Res. Cogn. Brain Res. 13, 393–405
27 Jones, D.M. et al. (1999) Organizational factors in selective attention: the interplay of acoustic distinctiveness and auditory streaming in the irrelevant sound effect. J. Exp. Psychol. Learn. Mem. Cogn. 25, 464–473
28 Macken, W.J. et al. (2003) Does auditory streaming require attention? Evidence from attentional selectivity in short-term memory. J. Exp. Psychol. Hum. Percept. Perform. 29, 43–51
29 Jones, D.M. and Macken, W.J. (1995) Organizational factors in the effect of irrelevant speech: the role of spatial location and timing. Mem. Cogn. 23, 192–200
30 Carlyon, R.P. et al. (2001) Effects of attention and unilateral neglect on auditory stream segregation. J. Exp. Psychol. Hum. Percept. Perform. 27, 115–127
31 Carlyon, R.P. et al. (2003) Cross-modal and non-sensory influences on auditory streaming. Perception 32, 1393–1402
32 Cusack, R. et al. (2004) Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J. Exp. Psychol. Hum. Percept. Perform. 30, 643–656
33 Cusack, R. The intraparietal sulcus and perceptual organization. J. Cogn. Neurosci. (in press)
34 Dyson, B.J. and Alain, C. (2004) Representation of concurrent acoustic objects in primary auditory cortex. J. Acoust. Soc. Am. 115, 280–288
35 Alain, C. et al. (2001) Bottom-up and top-down influences on auditory scene analysis: evidence from event-related brain potentials. J. Exp. Psychol. Hum. Percept. Perform. 27, 1072–1089
36 Sinex, D.G. et al. (2003) Responses of auditory nerve fibers to harmonic and mistuned complex tones. Hear. Res. 182, 130–139
37 Sinex, D.G. et al. (2002) Responses of inferior colliculus neurons to harmonic and mistuned complex tones. Hear. Res. 168, 150–162
38 Culling, J.F. and Summerfield, Q. (1995) Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J. Acoust. Soc. Am. 98, 785–797
39 Bronkhorst, A.W. and Plomp, R. (1988) The effect of head-induced interaural time and level differences on speech intelligibility in noise. J. Acoust. Soc. Am. 83, 1508–1516
40 Gockel, H. and Carlyon, R.P. (1998) Effects of ear of entry and perceived location of synchronous and asynchronous components on mistuning detection. J. Acoust. Soc. Am. 104, 3534–3545
41 Licklider, J.C.R. (1948) The influence of interaural phase relations upon the masking of speech by white noise. J. Acoust. Soc. Am. 20, 150–159
42 Akeroyd, M.A. and Summerfield, A.Q. (2000) Integration of monaural and binaural evidence of vowel formants. J. Acoust. Soc. Am. 107, 3394–3406
43 Culling, J.F. and Colburn, H.S. (2000) Binaural sluggishness in the perception of tone sequences and speech in noise. J. Acoust. Soc. Am. 107, 517–527
44 Jeffress, L.A. et al. (1952) The masking of tones by white noise as a function of the interaural phases of both components: I. 500 cycles. J. Acoust. Soc. Am. 24, 523–527
45 Grantham, D.W. (1995) Spatial hearing and related phenomena. In Hearing (Vol. 6) (Moore, B.C.J. ed.), Hearing, pp. 297–346, Academic Press
46 Palmer, A.R. and Shackleton, T.M. (2002) The physiological basis of the binaural masking level difference. Acta Acustica/Acustica 88, 312–319
47 Warren, R.M. (1999) Auditory Perception: A New Analysis and Synthesis, Cambridge University Press
48 Sugita, Y. (1997) Neuronal correlates of auditory induction in the cat cortex. Neuroreport 8, 1155–1159
49 Micheyl, C. et al. (2003) Neurophysiological correlates of a perceptual illusion: a mismatch negativity study. J. Cogn. Neurosci. 15, 747–758
50 Petkov, C.I. et al. (2003) Illusory sound perception in macaque monkeys. J. Neurosci. 23, 9155–9161
51 Kluender, K.R. and Jenison, R.L. (1992) Effects of glide slope, noise intensity, and noise duration on the extrapolation of FM glides through noise. Percept. Psychophys. 51, 231–238
52 Plack, C.J. and White, L.J. (2000) Perceived continuity and pitch perception. J. Acoust. Soc. Am. 108, 1162–1169
53 Carlyon, R.P. et al. Coding of FM and the continuity illusion. In Auditory Signal Processing: Physiology, Psychoacoustics, and Models (Pressnitzer, D. et al., eds), Springer (in press)
54 Carlyon, R.P. et al. (2001) Cross-modal and cognitive influences on the build-up of auditory streaming. Br. J. Audiol. 35, 139–140
55 Patterson, R.D. (1976) Auditory filter shapes derived with noise stimuli. J. Acoust. Soc. Am. 59, 640–654

[Figure: (a) complex tone with F0 = 100 Hz, shown as a spectrum (level in dB, arbitrary units, against frequency) and as a waveform (amplitude against time in ms); (c) effect of F0 difference on streaming (probability of a "2 streams" response).]
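Panel (a) depicts a harmonic complex tone with F0 = 100 Hz, and the concluding text refers to mistuning one harmonic of such a tone. What follows is a minimal, standard-library Python sketch, not code from the article. It builds an equal-amplitude, sine-phase harmonic complex, can shift one component upward by a given percentage, and measures component amplitudes from a single DFT bin. The function names (`harmonic_frequencies`, `complex_tone`, `component_amplitude`), the 16-kHz sampling rate and the stimulus values are illustrative assumptions.

```python
import math

# Illustrative sketch only: names, sampling rate and parameters are assumed,
# not taken from the article.

def harmonic_frequencies(f0, n_harmonics, mistuned=None, mistuning_pct=0.0):
    """Return the frequencies (Hz) of harmonics 1..n_harmonics of f0.

    If `mistuned` (a 1-based harmonic number) is given, that one component
    is shifted upward by `mistuning_pct` percent; all other components stay
    at exact integer multiples of f0.
    """
    freqs = [f0 * k for k in range(1, n_harmonics + 1)]
    if mistuned is not None:
        freqs[mistuned - 1] *= 1.0 + mistuning_pct / 100.0
    return freqs


def complex_tone(freqs, dur_s, fs):
    """Sum of unit-amplitude, sine-phase sinusoids (not normalized)."""
    n = int(round(dur_s * fs))
    return [sum(math.sin(2 * math.pi * f * i / fs) for f in freqs)
            for i in range(n)]


def component_amplitude(x, freq, fs):
    """Amplitude of the component at `freq`, read from a single DFT bin.

    Exact only when `freq` and every component of x complete a whole
    number of cycles within x (e.g. 100-Hz multiples in a 100-ms window).
    """
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)


if __name__ == "__main__":
    fs = 16000                      # sampling rate (Hz), assumed
    tuned = harmonic_frequencies(100.0, 10)
    mist = harmonic_frequencies(100.0, 10, mistuned=3, mistuning_pct=4.0)
    x = complex_tone(tuned, 0.1, fs)
    xm = complex_tone(mist, 0.1, fs)
    period = int(fs / 100)          # samples in one 10-ms period of F0
    print("amplitude at 300 Hz:", round(component_amplitude(x, 300.0, fs), 6))
    print("tuned, max period mismatch:",
          max(abs(x[i + period] - x[i]) for i in range(len(x) - period)))
    print("mistuned, max period mismatch:",
          max(abs(xm[i + period] - xm[i]) for i in range(len(xm) - period)))
```

With every component an exact multiple of 100 Hz, the waveform repeats every 1/F0 = 10 ms (160 samples at 16 kHz). Shifting the third harmonic to 312 Hz leaves the other components untouched but breaks that exact periodicity.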

Box 1. Two ears are better than one

[Figure: panels (b) and (e) plot responses against the level of a 500-Hz signal (dB SPL); panel (e) shows D values; binaural masking level differences (BMLDs) of 11 dB (b) and 13 dB (e) are marked.]
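Box 1's figure reports binaural masking level differences (BMLDs) of 11 dB and 13 dB for a 500-Hz signal. For reference, the conventional definition is written out below. It assumes the standard N0S0 (signal and noise identical at the two ears) and N0Sπ (signal phase-inverted at one ear, noise identical) conditions, which the extracted figure labels do not themselves confirm; L_th denotes the masked threshold in dB SPL.

```latex
\mathrm{BMLD} = L_{\mathrm{th}}(N_0 S_0) - L_{\mathrm{th}}(N_0 S_\pi) \quad \text{(dB)}
```

A positive BMLD means that the signal is detected at a lower level when its interaural phase differs from that of the masker.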

Box 2. The continuity illusion

Box 3. Questions for future research
