Multi-Modal Perception
Lorin Lachs
Most of the time, we perceive the world as a unified bundle of sensations from multiple
sensory modalities. In other words, our perception is multimodal. This module provides an
overview of multimodal perception, including information about its neurobiology and its
psychological effects.
Learning Objectives
• Describe the neuroanatomy of multisensory integration and name some of the regions of
the cortex and midbrain that have been implicated in multisensory processing.
Perception: Unified
Although it has been traditional to study the various senses independently, most of the time,
perception operates in the context of information supplied by multiple sensory modalities
at the same time. For example, imagine if you witnessed a car collision. You could describe
the stimulus generated by this event by considering each of the senses independently; that
is, as a set of unimodal stimuli. Your eyes would be stimulated with patterns of light energy
bouncing off the cars involved. Your ears would be stimulated with patterns of acoustic energy
emanating from the collision. Your nose might even be stimulated by the smell of burning
rubber or gasoline.
Several theoretical problems are raised by multimodal perception. After all, the world is a
“blooming, buzzing confusion” that constantly bombards our perceptual system with
light, sound, heat, pressure, and so forth. To make matters more complicated, these stimuli
come from multiple events spread out over both space and time. To return to our example:
Let’s say the car crash you observed happened on Main Street in your town. Your perception
during the car crash might include a lot of stimulation that was not relevant to the car crash.
For example, you might also overhear the conversation of a nearby couple, see a bird flying
into a tree, or smell the delicious scent of freshly baked bread from a nearby bakery (or all
three!). However, you would most likely not make the mistake of associating any of these
stimuli with the car crash. In fact, we rarely combine the auditory stimuli associated with one
event with the visual stimuli associated with another (although, under some unique
circumstances—such as ventriloquism—we do). How is the brain able to take the information
from separate sensory modalities and match it appropriately, so that stimuli that belong
together stay together, while stimuli that do not belong together get treated separately? In
other words, how does the perceptual system determine which unimodal stimuli must be
integrated, and which must not?
Once unimodal stimuli have been appropriately integrated, we can further ask about the
consequences of this integration: What are the effects of multimodal perception that would
not be present if perceptual processing were only unimodal? Perhaps the most robust finding
in the study of multimodal perception concerns this last question. No matter whether you
are looking at the actions of neurons or the behavior of individuals, it has been found that
responses to multimodal stimuli are typically greater than the sum of the responses to each
modality presented independently. In other words, if you presented the stimulus in one modality at a
time and measured the response to each of these unimodal stimuli, you would find that adding
those responses together would still not equal the response to the multimodal stimulus. This
superadditive effect of multisensory integration indicates that there are consequences
resulting from the integrated processing of multimodal stimuli.
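
To make the arithmetic behind "superadditive" concrete, here is a minimal Python sketch using made-up firing rates; the numbers and the percentage-gain index are illustrative assumptions, not data reported in this module.

```python
# Toy illustration of superadditivity with hypothetical firing rates (spikes per second).
visual_alone = 12.0      # response to the visual stimulus presented by itself
auditory_alone = 9.0     # response to the auditory stimulus presented by itself
audiovisual = 35.0       # response when both stimuli are presented together

linear_sum = visual_alone + auditory_alone          # 21.0
best_unimodal = max(visual_alone, auditory_alone)   # 12.0

# One common way to express multisensory enhancement is the percentage gain
# over the best unimodal response (values here are invented for illustration).
enhancement_pct = 100.0 * (audiovisual - best_unimodal) / best_unimodal

print(f"Sum of unimodal responses: {linear_sum}")
print(f"Multimodal response:       {audiovisual}")
print(f"Superadditive? {audiovisual > linear_sum}")   # True in this toy case
print(f"Enhancement over best unimodal response: {enhancement_pct:.0f}%")
```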
One of the most closely studied multisensory convergence zones is the superior colliculus
(Stein & Meredith, 1993), which receives inputs from many different areas of the brain,
including regions involved in the unimodal processing of visual and auditory stimuli (Edwards,
Ginsburgh, Henkel, & Stein, 1979). Interestingly, the superior colliculus is involved in the
“orienting response,” which is the behavior associated with moving one’s eye gaze toward the
location of a seen or heard stimulus. Given this function for the superior colliculus, it is hardly
surprising that there are multisensory neurons found there (Stein & Stanford, 2008).
The details of the anatomy and function of multisensory neurons help to answer the question
of how the brain integrates stimuli appropriately. In order to understand the details, we need
to discuss a neuron’s receptive field. All over the brain, neurons can be found that respond
only to stimuli presented in a very specific region of the space immediately surrounding the
perceiver. That region is called the neuron’s receptive field. If a stimulus is presented in a
neuron’s receptive field, then that neuron responds by increasing or decreasing its firing rate.
If a stimulus is presented outside of a neuron’s receptive field, then there is no effect on the
neuron’s firing rate. Importantly, when two neurons send their information to a third neuron,
the third neuron’s receptive field is the combination of the receptive fields of the two input
neurons. This is called neural convergence, because the information from multiple neurons
converges on a single neuron. In the case of multisensory neurons, the convergence arrives
from different sensory modalities. Thus, the receptive fields of multisensory neurons are the
combination of the receptive fields of neurons located in different sensory pathways.
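
As a rough illustration of how convergence combines receptive fields, here is a small Python sketch. Modeling each receptive field as a one-dimensional interval of azimuth, and the particular numbers used, are simplifying assumptions for illustration, not anatomy described in the module.

```python
# Toy model: each unimodal neuron responds only to stimuli inside its receptive
# field, modeled here as an interval of azimuth (degrees of space around the perceiver).
VISUAL_RF = (-10.0, 10.0)     # hypothetical visual neuron's receptive field
AUDITORY_RF = (-12.0, 12.0)   # hypothetical auditory neuron's receptive field

def responds(location_deg, receptive_field):
    """A unimodal neuron fires only to stimuli inside its receptive field."""
    low, high = receptive_field
    return low <= location_deg <= high

def multisensory_neuron_responds(location_deg):
    """The convergent neuron inherits the combination of its inputs' fields."""
    return responds(location_deg, VISUAL_RF) or responds(location_deg, AUDITORY_RF)

print(multisensory_neuron_responds(5.0))    # True: inside both input fields
print(multisensory_neuron_responds(11.0))   # True: inside the auditory field only
print(multisensory_neuron_responds(45.0))   # False: outside both fields
```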
Now, it could be the case that the neural convergence that results in multisensory neurons is
set up in a way that ignores the locations of the input neurons’ receptive fields. Amazingly,
however, these crossmodal receptive fields overlap. For example, a multisensory neuron in
the superior colliculus might receive input from two unimodal neurons: one with a visual
receptive field and one with an auditory receptive field. It has been found that the unimodal
receptive fields refer to the same locations in space—that is, the two unimodal neurons
respond to stimuli in the same region of space. Crucially, the overlap in the crossmodal
receptive fields plays a vital role in the integration of crossmodal stimuli. When the
information from the separate modalities is coming from within these overlapping receptive
fields, then it is treated as having come from the same location—and the neuron responds
with a superadditive (enhanced) response. So, part of the information that is used by the brain
to combine multimodal inputs is the location in space from which the stimuli came.
This pattern is common across many multisensory neurons in multiple regions of the brain.
Because of this, researchers have defined the spatial principle of multisensory integration:
Multisensory enhancement is observed when the sources of stimulation are spatially related
to one another. A related phenomenon concerns the timing of crossmodal stimuli.
Enhancement effects are observed in multisensory neurons only when the inputs from
different senses arrive within a short time of one another (e.g., Recanzone, 2003).
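
A small Python sketch can make the spatial and temporal principles concrete. The specific thresholds used here (15 degrees, 100 ms) are arbitrary values chosen for illustration, not figures reported in the module.

```python
# Toy rule: treat two unimodal inputs as one multimodal event (and enhance the
# response) only when they are close together in both space and time.
def integrated(visual_loc_deg, auditory_loc_deg, visual_time_s, auditory_time_s,
               max_separation_deg=15.0, max_lag_s=0.1):
    spatially_close = abs(visual_loc_deg - auditory_loc_deg) <= max_separation_deg
    temporally_close = abs(visual_time_s - auditory_time_s) <= max_lag_s
    return spatially_close and temporally_close

# A flash and a bang from roughly the same spot, 50 ms apart: bound together.
print(integrated(0.0, 5.0, 0.00, 0.05))    # True
# A flash on the left and a sound on the right, half a second apart: kept separate.
print(integrated(-30.0, 30.0, 0.0, 0.5))   # False
```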
Multisensory neurons have also been observed outside of multisensory convergence zones,
in areas of the brain that were once thought to be dedicated to the processing of a single
modality (unimodal cortex). For example, the primary visual cortex was long thought to be
devoted to the processing of exclusively visual information. The primary visual cortex is the
first stop in the cortex for information arriving from the eyes, so it processes very low-level
information like edges. Interestingly, neurons have been found in the primary visual cortex
that receive information from the primary auditory cortex (where sound information from
the auditory pathway is processed) and from the superior temporal sulcus (another multisensory
convergence zone). This is remarkable because it indicates that the
processing of visual information is, from a very early stage, influenced by auditory information.
There are zones in the human brain where sensory information comes together and is integrated, such as the auditory, visual, and motor cortices pictured here. [Image: BruceBlaus, https://goo.gl/UqKBI3, CC BY 3.0, https://goo.gl/b58TcB]
There may be two ways for these multimodal interactions to occur. First, it could be that the
processing of auditory information in relatively late stages of processing feeds back to
influence low-level processing of visual information in unimodal cortex (McDonald, Teder-
Sälejärvi, Russo, & Hillyard, 2003). Alternatively, it may be that areas of unimodal cortex contact
each other directly (Driver & Noesselt, 2008; Macaluso & Driver, 2005), such that multimodal
integration is a fundamental component of all sensory processing.
In fact, the large number of multisensory neurons distributed all around the cortex—in
multisensory convergence areas and in primary cortices—has led some researchers to
propose that a drastic reconceptualization of the brain is necessary (Ghazanfar & Schroeder,
2006). They argue that the cortex should not be considered as being divided into isolated
regions that process only one kind of sensory information. Rather, they propose that these
areas merely prefer to process information from specific modalities, but engage in low-level
multisensory processing whenever it is beneficial to the perceiver (Vasconcelos et al., 2011).
Although neuroscientists tend to study very simple interactions between neurons, the fact
that they’ve found so many crossmodal areas of the cortex seems to hint that the way we
experience the world is fundamentally multimodal. As discussed above, our intuitions about
perception are consistent with this; it does not seem as though our perception of events is
constrained to the perception of each sensory modality independently. Rather, we perceive
a unified world, regardless of the sensory modality through which we perceive it.
It will probably require many more years of research before neuroscientists uncover all the
details of the neural machinery involved in this unified experience. In the meantime,
experimental psychologists have contributed to our understanding of multimodal perception
through investigations of the behavioral effects associated with it. These effects fall into two
broad classes. The first class—multimodal phenomena—concerns the binding of inputs from
multiple sensory modalities and the effects of this binding on perception. The second class
—crossmodal phenomena—concerns the influence of one sensory modality on the
perception of another (Spence, Senkowski, & Röder, 2009).
Multimodal Phenomena
Audiovisual Speech
Multimodal phenomena concern stimuli that generate simultaneous (or nearly simultaneous)
information in more than one sensory modality. As discussed above, speech is a classic
example of this kind of stimulus. When an individual speaks, she generates sound waves that
carry meaningful information. If the perceiver is also looking at the speaker, then that perceiver
also has access to visual patterns that carry meaningful information. Of course, as anyone
who has ever tried to lipread knows, there are limits on how informative visual speech
information is. Even so, the visual speech pattern alone is sufficient for very robust speech
perception. Most people assume that deaf individuals are much better at lipreading than
individuals with normal hearing. It may come as a surprise to learn, however, that some
individuals with normal hearing are also remarkably good at lipreading (sometimes called
“speechreading”). In fact, there is a wide range of speechreading ability in both normal hearing
and deaf populations (Andersson, Lyxell, Rönnberg, & Spens, 2001). However, the reasons for
this wide range of performance are not well understood (Auer & Bernstein, 2007; Bernstein,
2006; Bernstein, Auer, & Tucker, 2001; Mohammed et al., 2005).
[Image: Perceivers use both the auditory and visual components of speech to understand others. Jeremy Keith, https://goo.gl/18sLfg]
In classic work on speech perception in noise, Sumby and Pollack (1954) found that seeing the talker improves the intelligibility of speech presented in auditory noise, and that this improvement followed the Principle of Inverse Effectiveness: The advantage gained by audiovisual presentation was
highest when the auditory-alone condition performance was lowest (i.e., when the noise was
loudest). At these noise levels, the audiovisual advantage was considerable: It was estimated
that allowing the participant to see the speaker was equivalent to turning the volume of the
noise down by over half. Clearly, the audiovisual advantage can have dramatic effects on
behavior.
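
To see how the Principle of Inverse Effectiveness looks in data, here is a minimal Python sketch with made-up accuracy values; these are not Sumby and Pollack's actual numbers, only a toy pattern with the same shape.

```python
# Hypothetical word-recognition accuracy (proportion correct) at three noise levels.
conditions = [
    # (noise level, auditory alone, audiovisual)
    ("low noise",    0.95, 0.97),
    ("medium noise", 0.60, 0.80),
    ("high noise",   0.10, 0.55),
]

for noise, auditory_only, audiovisual in conditions:
    benefit = audiovisual - auditory_only
    print(f"{noise:12s}  auditory alone: {auditory_only:.2f}  "
          f"audiovisual: {audiovisual:.2f}  visual benefit: {benefit:.2f}")

# The visual benefit grows as auditory-alone accuracy falls, which is the
# inverse-effectiveness pattern described above.
```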
Another phenomenon involving audiovisual speech is a very famous illusion called the “McGurk
effect” (named after one of its discoverers). In the classic formulation of the illusion, a movie
is recorded of a speaker saying the syllables “gaga.” Another movie is made of the same
speaker saying the syllables “baba.” Then, the auditory portion of the “baba” movie is dubbed
onto the visual portion of the “gaga” movie. This combined stimulus is presented to
participants, who are asked to report what the speaker in the movie said. McGurk and
MacDonald (1976) reported that 98 percent of their participants reported hearing the syllable
“dada”—which was in neither the visual nor the auditory components of the stimulus. These
results indicate that when visual and auditory information about speech is integrated, it can
have profound effects on perception.
Not all multisensory integration phenomena concern speech, however. One particularly
compelling multisensory illusion involves the integration of tactile and visual information in
the perception of body ownership. In the “rubber hand illusion” (Botvinick & Cohen, 1998),
an observer is situated so that one of his hands is not visible. A fake rubber hand is placed
near the obscured hand, but in a visible location. The experimenter then uses a light paintbrush
to simultaneously stroke the obscured hand and the rubber hand in the same locations. For
example, if the middle finger of the obscured hand is being brushed, then the middle finger
of the rubber hand will also be brushed. This sets up a correspondence between the tactile
sensations (coming from the obscured hand) and the visual sensations (of the rubber hand).
After a short time (around 10 minutes), participants report feeling as though the rubber hand
“belongs” to them; that is, that the rubber hand is a part of their body. This feeling can be so
strong that surprising the participant by hitting the rubber hand with a hammer often leads
to a reflexive withdrawing of the obscured hand—even though it is in no danger at all. It
appears, then, that our awareness of our own bodies may be the result of multisensory
integration.
Crossmodal Phenomena
Crossmodal phenomena are distinguished from multimodal phenomena in that they concern
the influence one sensory modality has on the perception of another.
One striking illusion demonstrates that sounds can alter visual
perception. In the double flash illusion, a participant is asked to stare at a central point on a
computer monitor. On the extreme edge of the participant’s vision, a white circle is briefly
flashed one time. There is also a simultaneous auditory event: either one beep or two beeps
in rapid succession. Remarkably, participants report seeing two visual flashes when the flash
is accompanied by two beeps; the same stimulus is seen as a single flash in the context of a
single beep or no beep (Shams, Kamitani, & Shimojo, 2000). In other words, the number of
heard beeps influences the number of seen flashes!
Another illusion involves the perception of collisions between two circles (called “balls”)
moving toward each other and continuing through each other. Such stimuli can be perceived
as either two balls moving through each other or as a collision between the two balls that
then bounce off each other in opposite directions. Sekuler, Sekuler, and Lau (1997) showed
that the presentation of an auditory stimulus at the time of contact between the two balls
strongly influenced the perception of a collision event. In this case, the perceived sound
influences the interpretation of the ambiguous visual stimulus.
Crossmodal Speech
[Image: Perceivers can become familiar with a person’s speaking style simply by seeing that person speak, even without clues about the actual sound of their voice. Ken Whytock, https://goo.gl/VQJssP, CC BY-NC 2.0, https://goo.gl/tgFydH]
Crossmodal effects are also observed with speech: becoming familiar with a talker by watching (but not hearing) that talker speak can improve later perception of the talker’s auditory speech (Rosenblum, Miller, & Sanchez, 2007). Similarly, it has been shown that when perceivers see a speaking face, they can identify the
(auditory-alone) voice of that speaker, and vice versa (Kamachi, Hill, Lander, & Vatikiotis-
Bateson, 2003; Lachs & Pisoni, 2004a, 2004b, 2004c; Rosenblum, Smith, Nichols, Lee, & Hale,
2006). In other words, the visual form of a speaker engaged in the act of speaking appears to
contain information about what that speaker should sound like. Perhaps more surprisingly,
the auditory form of speech seems to contain information about what the speaker should
look like.
Conclusion
In this module, we have reviewed some of the main evidence and findings concerning the role
of multimodal perception in our experience of the world. It appears that our nervous system
(and the cortex in particular) contains considerable architecture for the processing of
information arriving from multiple senses. Given this neurobiological setup, and the diversity
of behavioral phenomena associated with multimodal stimuli, it is likely that the investigation
of multimodal perception will continue to be a topic of interest in the experimental study of
perception for many years to come.
Discussion Questions
1. The extensive network of multisensory areas and neurons in the cortex implies that much
perceptual processing occurs in the context of multiple inputs. Could the processing of
unimodal information ever be useful? Why or why not?
2. Some researchers have argued that the Principle of Inverse Effectiveness (PoIE) results
from ceiling effects: Multisensory enhancement cannot take place when one modality is
sufficient for processing because in such cases it is not possible for processing to be
enhanced (because performance is already at the “ceiling”). On the other hand, other
researchers claim that the PoIE stems from the perceptual system’s ability to assess the
relative value of stimulus cues, and to use the most reliable sources of information to
construct a representation of the outside world. What do you think? Could these two
possibilities ever be teased apart? What kinds of experiments might one conduct to try to
get at this issue?
3. In the late 17th century, a scientist named William Molyneux asked the famous philosopher
John Locke a question relevant to modern studies of multisensory processing. The question
was this: Imagine a person who has been blind since birth, and who is able, by virtue of
the sense of touch, to identify three dimensional shapes such as spheres or pyramids. Now
imagine that this person suddenly receives the ability to see. Would the person, without
using the sense of touch, be able to identify those same shapes visually? Can modern
research in multimodal perception help answer this question? Why or why not? How do
the studies about crossmodal phenomena inform us about the answer to this question?
Vocabulary
Crossmodal phenomena
Effects that concern the influence of the perception of one sensory modality on the perception
of another.
Crossmodal stimulus
A stimulus with components in multiple sensory modalities that interact with each other.
Integrated
The process by which the perceptual system combines information arising from more than
one modality.
McGurk effect
An effect in which conflicting visual and auditory components of a speech stimulus result in
an illusory percept.
Multimodal
Of or pertaining to multiple sensory modalities.
Multimodal perception
The effects that concurrent stimulation in more than one sensory modality has on the
perception of events and objects in the world.
Multimodal phenomena
Effects that concern the binding of inputs from multiple sensory modalities.
Multisensory enhancement
See “superadditive effect of multisensory integration.”
Receptive field
The portion of the world to which a neuron will respond if an appropriate stimulus is present
there.
Sensory modalities
A type of sense; for example, vision or audition.
Unimodal
Of or pertaining to a single sensory modality.
Unimodal components
The parts of a stimulus relevant to one sensory modality at a time.
Unimodal cortex
A region of the brain devoted to the processing of information from a single sensory modality.
References
Andersson, U., Lyxell, B., Rönnberg, J., & Spens, K.-E. (2001). Cognitive correlates of visual
speech understanding in hearing-impaired individuals. Journal of Deaf Studies and Deaf
Education, 6(2), 103–116. doi: 10.1093/deafed/6.2.103
Auer, E. T., Jr., & Bernstein, L. E. (2007). Enhanced visual speech perception in individuals with
early-onset hearing impairment. Journal of Speech, Language, and Hearing Research, 50
(5), 1157–1165. doi: 10.1044/1092-4388(2007/080)
Bernstein, L. E., Auer, E. T., Jr., & Tucker, P. E. (2001). Enhanced speechreading in deaf adults:
Can short-term training/practice close the gap for hearing adults? Journal of Speech,
Language, and Hearing Research, 44, 5–18.
Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669),
756. doi: 10.1038/35784
Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from functional
neuroimaging studies. Cerebral Cortex, 11, 1110–1123.
Calvert, G. A., Hansen, P. C., Iversen, S. D., & Brammer, M. J. (2001). Detection of audio-visual
integration sites in humans by application of electrophysiological criteria to the BOLD effect.
NeuroImage, 14(2), 427–438. doi: 10.1006/nimg.2001.0812
Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on
‘sensory-specific’ brain regions, neural responses, and judgments. Neuron, 57(1), 11–23.
doi: 10.1016/j.neuron.2007.12.013
Edwards, S. B., Ginsburgh, C. L., Henkel, C. K., & Stein, B. E. (1979). Sources of subcortical
projections to the superior colliculus in the cat. Journal of Comparative Neurology, 184(2),
309–329. doi: 10.1002/cne.901840207
Kamachi, M., Hill, H., Lander, K., & Vatikiotis-Bateson, E. (2003). "Putting the face to the voice":
Matching identity across modality. Current Biology, 13, 1709–1714.
Lachs, L., & Pisoni, D. B. (2004a). Crossmodal source identification in speech perception.
Ecological Psychology, 16(3), 159–187.
Lachs, L., & Pisoni, D. B. (2004b). Crossmodal source information and spoken word recognition.
Journal of Experimental Psychology: Human Perception & Performance, 30(2), 378–396.
Lachs, L., & Pisoni, D. B. (2004c). Specification of crossmodal source information in isolated
kinematic displays of speech. Journal of the Acoustical Society of America, 116(1), 507–518.
Macaluso, E., & Driver, J. (2005). Multisensory spatial interactions: A window onto functional
integration in the human brain. Trends in Neurosciences, 28(5), 264–271. doi: 10.1016/j.
tins.2005.03.008
McDonald, J. J., Teder-Sälejärvi, W. A., Russo, F. D., & Hillyard, S. A. (2003). Neural substrates of
perceptual enhancement by cross-modal spatial attention. Journal of Cognitive
Neuroscience, 15(1), 10–19. doi: 10.1162/089892903321107783
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Mohammed, T., Campbell, R., MacSweeney, M., Milne, E., Hansen, P., & Coleman, M. (2005).
Speechreading skill and visual movement sensitivity are related in deaf speechreaders.
Perception, 34(2), 205–216.
Rosenblum, L. D., Miller, R. M., & Sanchez, K. (2007). Lip-read me now, hear me better later:
Cross-modal transfer of talker-familiarity effects. Psychological Science, 18(5), 392–396.
Rosenblum, L. D., Smith, N. M., Nichols, S. M., Lee, J., & Hale, S. (2006). Hearing a face: Cross-
modal speaker matching using isolated visible speech. Perception & Psychophysics, 68,
84–93.
Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception. Nature,
385(6614), 308. doi: 10.1038/385308a0
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear. Nature,
408(6814), 788. doi: 10.1038/35048669
Soto-Faraco, S., Kingstone, A., & Spence, C. (2003). Multisensory contributions to the
perception of motion. Neuropsychologia, 41(13), 1847–1862. doi: 10.1016/s0028-3932(03)
00185-4
Spence, C., Senkowski, D., & Röder, B. (2009). Crossmodal processing. [Editorial Introductory].
Experimental Brain Research, 198(2-3), 107–111. doi: 10.1007/s00221-009-1973-4
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: The MIT Press.
Stein, B. E., & Stanford, T. R. (2008). Multisensory integration: Current issues from the
perspective of the single neuron. Nature Reviews Neuroscience, 9(4), 255–266. doi:
10.1038/nrn2331
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal
of the Acoustical Society of America, 26, 212–215.
Vasconcelos, N., Pantoja, J., Belchior, H., Caixeta, F. V., Faber, J., Freire, M. A. M., . . . Ribeiro, S.
(2011). Cross-modal responses in the primary visual cortex encode complex objects and
correlate with tactile discrimination. Proceedings of the National Academy of Sciences, 108
(37), 15408–15413. doi: 10.1073/pnas.1102780108
The Diener Education Fund (DEF) is a non-profit organization founded with the mission of re-
inventing higher education to serve the changing needs of students and professors. The initial
focus of the DEF is on making information, especially of the type found in textbooks, widely
available to people of all backgrounds. This mission is embodied in the Noba project.
Noba is an open and free online platform that provides high-quality, flexibly structured
textbooks and educational materials. The goals of Noba are three-fold:
• To provide instructors with a platform to customize educational content to better suit their
curriculum
The Diener Education Fund is co-founded by Drs. Ed and Carol Diener. Ed is the Joseph Smiley
Distinguished Professor of Psychology (Emeritus) at the University of Illinois. Carol Diener is
the former director of the Mental Health Worker and the Juvenile Justice Programs at the
University of Illinois. Both Ed and Carol are award- winning university teachers.
Acknowledgements
The Diener Education Fund would like to acknowledge the following individuals and companies
for their contribution to the Noba Project: The staff of Positive Acorn, including Robert Biswas-
Diener as managing editor and Peter Lindberg as Project Manager; The Other Firm for user
experience design and web development; Sockeye Creative for their work on brand and
identity development; Arthur Mount for illustrations; Chad Hurst for photography; EEI
Communications for manuscript proofreading; Marissa Diener, Shigehiro Oishi, Daniel
Simons, Robert Levine, Lorin Lachs and Thomas Sander for their feedback and suggestions
in the early stages of the project.
Copyright
R. Biswas-Diener & E. Diener (Eds), Noba Textbook Series: Psychology. Champaign, IL: DEF
Publishers. DOI: nobaproject.com
Copyright © 2016 by Diener Education Fund. This material is licensed under the Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy
of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_US.
The Internet addresses listed in the text were accurate at the time of publication. The inclusion
of a Website does not indicate an endorsement by the authors or the Diener Education Fund,
and the Diener Education Fund does not guarantee the accuracy of the information presented
at these sites.
Contact Information:
Noba Project
2100 SE Lake Rd., Suite 5
Milwaukie, OR 97222
www.nobaproject.com
[email protected]
How to cite a Noba chapter using APA Style
Lachs, L. (2013). Multi-modal perception. In R. Biswas-Diener & E. Diener (Eds), Noba textbook
series: Psychology. Champaign, IL: DEF publishers. DOI:nobaproject.com.