Cognition, by Arnold Lewis Glass


4 Mental action: attention and consciousness

People like to look for things. From Where’s Waldo? to countless video
games, searching for visual targets is considered by many people to be
fun. Paleontologists hunt for dinosaur bones, anthropologists hunt for
human bones, and archeologists hunt for the debris of ancient
civilizations. Quarterbacks search for the open receiver and point
guards search for the open shooter. Merchants stock miles of shelves
in giant stores with confidence that customers will be able to find what
they look for. People can routinely detect and respond to task-relevant
targets so quickly that they can safely drive vehicles through crowded
streets much faster than any person can walk or run. The general
name for this ability to control perception so that what is important is
seen or heard is attention.
Attention refers to voluntary actions that are used to control
perception. Seeing and hearing are the consequence of looking and
listening. The control of perception involves either of two tasks. One
task is to select a single target for processing, such as when you listen
to a single conversation during a party. The other task is to divide
attention among several targets, such as when you drive a car down a
crowded street. In this chapter we will see the following.
Selective attention involves three stages: target specification,
search, and target identification. The frontal cortex, where the
target is specified, selects for the target by the inhibition of all
perceptual input that is not target-related.

Because the human information-processing system can plan and
perform only one ad hoc voluntary response at a time, the
bottleneck imposed by serial responding limits the number of
targets that can be identified and responded to. Tasks that
require independent ad hoc responses to multiple targets are
called divided attention tasks. Divided attention inevitably results
in missed targets and slower responses. Multitasking results in
poorer performance than performing the tasks separately.

Target selection is not entirely determined by top-down control
from the prefrontal cortex and parietal cortex. Emotional arousal
may also play a role. Emotional arousal increases the inhibition
of distracters, narrowing perceptual processing to just the
emotionally significant target. In other words, one cannot ignore
something sufficiently threatening.

4.1 About attention


People do not passively see and hear but actively look and listen. The
same neural systems that evolved to control motor movements control
perception as well.

Definitions
Attention begins with action. As mentioned in Chapter 2, the
fundamental unit of action is the same as the fundamental unit of
memory: a voluntary action is made to a target in a context, producing
a result. Similarly, the fundamental unit of attention is the same as the
fundamental unit of action: a voluntary action is made to a target in a
context, producing a result. The only difference between action and
attention is that action implies movement but attention includes both
the initiation of movements and mental actions that do not involve
movement. Listening for the sound of a starting gun, refusing to move
until “Simon says,” and trying to remember the answer to a question
are all voluntary mental actions that do not require motor movement.
The ability to make a voluntary response is the central subject
matter of psychology, and there are many names for the experience of
performing a voluntary action. When a target enters awareness, just as
this sentence is in your awareness, you can do something in response
to it. Saying that you are aware of something, that you are conscious
of something, that you experience something, that you perceive
something, and that you can make a voluntary response to something
are all ways of describing the target of an action (Table 4.1).

Table 4.1 Different ways of describing the target of voluntary action

Target is in…

Awareness

Consciousness

Experience

Perception

While these various words tend to be used in different contexts
and emphasize different aspects of experience, they do not name
different related things; they are all names for the same thing. This can
be proved by a test in which any two of the words are selected and we
try to imagine whether one can be true of something without the other.
For example, you cannot be aware of something without being
conscious of it, so awareness and consciousness must be the same
thing. We must therefore avoid the trap of “explaining” consciousness
as your awareness of something. We cannot describe or explain
something by simply giving it different names. Rather, we can begin to
explain voluntary action by describing its consequences.
Consciousness: the consequence of a voluntary action to a
target
There are important consequences of directing a voluntary response to
a target (Table 4.2). First, you perceive the target, its context, your
action, your intention, and the result of the action. These elements
comprise your experience of that moment in time. Second, a
representation of the episode may be encoded in your brain. The action
is recorded by the habit system, which is procedural learning. A
perceptual representation of the target is encoded in the brain by the
instrumental system, which activates a familiar representation so that
the target is perceived as familiar (Somers and Sheremata, 2013). One
functional consequence is that you are aware of the entire
representation, so that, when you hear the notes of a tune, you hear
the tune rather than just single notes. Moreover, you see an entire face,
rather than just a nose, an eye, or an ear (Chapter 5).
Table 4.2 Differences between targets and distracters

                       Targets   Distracters
Part of present        Yes       No
Procedural learning    Yes       No
Declarative learning   Maybe     No

4.2 Selective attention: target specification, search, and identification
Voluntary action consists of two stages: planning and performance
(Chapter 3). One advantage of having voluntary movements controlled
by a two-stage process is that a planned action can be withheld until
just the right moment for execution. For example, a projectionist plans
the motion to turn on the second projector and then waits until the
second splotch appears. When the plan and execution of an action are
separated in time because the execution is deliberately delayed, we
are aware of the plan and call it an intention. When the plan is
immediately executed, as when you reach for your toothbrush, the
planning stage is too brief to be noticed. However, it is always present.
Planning often begins with the retrieval of an episode. Even when
movies were sent to movie theaters in boxes of fifteen-minute reels, the
audience saw a movie lasting more than an hour without interruption. A
projectionist used two projectors focused on the screen and switched
reels and projectors as needed. The projectionist’s behavior was
directed by an episode in memory describing the end of every reel. The
episode described two successive target splotches in the upper left-
hand corner of a movie screen, the second one coinciding with the end
of the reel, as well as the action of turning on the other projector when
the second splotch appeared. The episode specified the target of the
action – that is, the second splotch. So, the projectionist knew what the
splotch looked like and when it would appear, which is why he or she
was able to rapidly detect it and was prepared to initiate the action of
turning on the other projector upon its appearance. The first stage of
selective attention will therefore be called target specification.
Second, the location of a potential target is identified through the
comparison of perceptual input with the target representation. This will
be called search. The search process often involves eye movements
under the control of the habit system. The search stage ends when the
comparison produces a match identifying the target, which causes a
response indicating that the target has been found within the
instrumental system. As a result of the match, input from non-target
locations is inhibited.

The gate model of the visual search system


Figure 4.1 shows the parts of the brain involved in controlling the
identification of a target. These include some of the same parts of the
brain that control motor action (compare Figure 4.1 with Figure 3.2, in
Chapter 3), as well as the visual system (Chapter 6). When you choose
to look here or there, the prefrontal cortex selects a target by inhibiting
the processing of the visual field around it and moving your eyes
towards it. The selection versus inhibition of visual information from
different locations in the visual field may be conceptualized as the
opening versus closing of gates along the visual pathway for sensory
input from different locations in the visual field.

Figure 4.1 The top panel shows the cortical areas that direct visual target
selection. The bottom panel shows the medial areas that carry out the selection
and the reticular formation, which influences target selection by modulating
arousal.
The visual scanning system uses a two-stage process of target
identification. In the first stage, sensory information from multiple
locations in the visual display is subjected to feature analysis to
determine whether it contains a feature that identifies the target. For
example, suppose that the task is to find a red “O” in a field of green
“X”s and “O”s (Figure 4.2, top). The visual system organizes the
sensory input into several distinct maps. One map contains all the
locations where the input is red and one contains all the locations
where a circle has been detected. In the second stage, the eyes move
among the locations of one of these feature maps. At each location a
more complete description of the input at that location is constructed,
until the target is found. When a feature map contains many fewer
locations than the entire field, selecting possible target locations on the
basis of that feature correspondingly decreases the number of locations
that must be examined to find the target. For example, in the display at
the top of Figure 4.2 there is only a single location with a red feature,
so this location will be searched and the red “O” will be found
immediately.
Figure 4.2 Selective attention to a feature conjunction. A red circle can be
selected effortlessly on the basis of color alone (top), but requires effort to
select on the basis of the conjunction of color and shape (bottom).
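The two-stage process just described can be sketched in a few lines of code. This is only an illustrative toy, with a made-up display representation and function names, not a model from the chapter: the first stage builds a feature map, and the second stage serially examines only the locations in that map.

```python
# Toy two-stage search (illustrative only). A display is a list of
# (color, shape) pairs indexed by location, like the field of green
# "X"s and "O"s with one red "O" at the top of Figure 4.2.
display = [("green", "X"), ("green", "O"), ("red", "O"),
           ("green", "X"), ("green", "O")]

def feature_map(display, attr, value):
    """Stage 1: list every location whose color or shape matches value."""
    index = {"color": 0, "shape": 1}[attr]
    return [loc for loc, item in enumerate(display) if item[index] == value]

def search(display, target):
    """Stage 2: serially examine only the candidate locations from the
    feature map until the full target description matches."""
    candidates = feature_map(display, "color", target[0])
    examined = 0
    for loc in candidates:
        examined += 1
        if display[loc] == target:
            return loc, examined
    return None, examined

# Only one location is red, so a single examination finds the target.
print(search(display, ("red", "O")))  # (2, 1)
```

Because only one location carries the distinctive feature, the search cost is independent of how many green distracters surround the target.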

The brain system for target selection is shown schematically in
Figure 4.3. By opening one gate and closing the others, a particular
input may be selected for further processing. At the top of the figure we
have the prefrontal cortex, parietal cortex, and basal ganglia,
components of the motor system (Chapter 3). Together, the prefrontal
cortex and the posterior parietal cortex specify both the target and the
target-selection feature that determines the locations that must be
searched. They activate the basal ganglia, which, among their many
functions involving the control of action, control eye movements and the
processing of visual information from specific locations. In turn, the
basal ganglia signal the superior colliculus to compute the trajectory of
the eye movement to each location. The basal ganglia also inhibit
thalamic processing of input from all non-target locations and disinhibit
thalamic processing of input from the target location.

Figure 4.3 The basal ganglia control the thalamic gate and hence regulate the
flow of information to the cortex.
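The gate metaphor in Figure 4.3 can be made concrete with a small sketch. The dictionary representation and function below are my own simplification, not a model from the text: inhibited locations are simply blocked, and only the disinhibited target location passes its input through to the cortex.

```python
def gate_input(sensory_input, target_location):
    """Pass through only the target location's signal; input from every
    other location is inhibited (represented here as None)."""
    return {loc: (signal if loc == target_location else None)
            for loc, signal in sensory_input.items()}

# Hypothetical retinal input by location.
retina = {"upper-left": "red O", "center": "green X", "right": "green O"}
print(gate_input(retina, "upper-left"))
# {'upper-left': 'red O', 'center': None, 'right': None}
```

The basal ganglia play the role of the `target_location` argument here: they decide which single gate stays open while all the others close.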

To the center and right of Figure 4.3 we have the thalamus
controlling the flow of sensory input from the retina to the visual
processing areas in the cortex (Desimone and Duncan, 1995; Heilman,
Watson, and Valenstein, 1993; Posner and Petersen, 1990). In the
thalamus is a system of gates that distributes visual sensory input to
the occipital cortex and inferior temporal cortex, and to the parietal
cortex. The visual cortex is where the analysis of the visual input into
simple features and patterns occurs and the inferior temporal cortex is
where these patterns are compared with a representation of the visual
target (Chapter 6). The posterior parietal cortex marks the spatial
location of the visual target and passes this information to the prefrontal
cortex.
At the bottom of Figure 4.3 are the superior colliculus and the
reticular formation. The superior colliculus moves the eyes to fixate on
a visual target (Fischer and Breitmeyer, 1987; Posner and Cohen,
1984). The basal ganglia direct the eyes to particular locations by
inhibiting their movement to other locations. The reticular formation is
part of the general arousal system, which exerts a general level of
control over action through inhibition of the basal ganglia.
Notice that the processing of the input from a location leads rather
than follows the eye movement to it. Complete information about the
target is not obtained until after an eye movement has been made to
fixate on it. Remington (1980) found that the thalamic shift to the new
target occurs within 150 milliseconds of its initiation by the basal
ganglia. Weichselgartner and Sperling (1987) confirmed this estimate in
a behavioral study. They presented continuous streams of characters
at the rate of 10 or 12.5 characters per second at two locations.
Observers watched for the presence of the letter “C” in one location
and then reported the first four digits seen in the adjacent location. In
fact, the probability of reporting a digit depended on how long after the
“C” it appeared. The first digits detected appeared 100 msec after the
“C” (Figure 4.4). Processing of the target is not complete until about
300 msec after the initial processing of the location began.
Weichselgartner and Sperling (1987) found that the probability of
reporting a digit peaked at 300 to 400 msec after the “C” was shown
(Figure 4.4).
Figure 4.4 The probability of a report peaks at 400 milliseconds after the cue to
switch attention (Weichselgartner & Sperling, 1987).

4.3 Visual target detection


When a target is selected through the closing and opening of thalamic
gates to sensory input, the process is called early selection. How the
visual system organizes visual information will be described in Chapter
6. How that organization is used for target selection is described here.

Early selection during target search


The speed of target selection is determined by the perceptual
organization of the visual field containing the target. When the target is
distinctively different from its surroundings, it is found rapidly. Either a
distinctive shape or a distinctive color may mark a target as distinct. For
even three-month-old infants, a single plus sign will pop out from
among six “L”s and a single “L” will pop out from among six plus signs,
as shown in Figure 4.5 (Rovee-Collier, Hankins, and Bhatt, 1992).
When the target has a unique shape or color, it pops out of the visual
field. This is because the visual system organizes the sensory input so
that a visual pathway transmits only the sensory input that contains the
distinctive feature. Hence, when the target pops out, it reduces the
number of locations that are searched to find the target to only the one
(or few) containing its distinctive feature.
Figure 4.5 For even three-month-old infants, a single plus sign will pop out
from among six “L”s and a single “L” will pop out from among six plus signs
(Rovee-Collier, Hankins, and Bhatt, 1992).

How quickly the pop-out occurs depends on how different the
target is from the surrounding distracters, how far it is from the
initial fixation point of the target search (Theeuwes, Kramer, and
Atchley, 1999), and how many adjacent locations contain the target
feature. Marking a target with a distinctive feature initiates its
processing in less than 100 milliseconds (Bay and Wyble, 2014). For
example, in Figure 4.6, the column of “O”s pops out. Figure 4.6
illustrates that adjacent locations that all contain the same feature result
in rapid pop-out for the target they collectively form. Pop-out can also
occur for separated locations with the same distinctive feature. When
two of four briefly presented letters were cued by red bars, two
simultaneous letters were recognized as accurately as one (Bay and
Wyble, 2014).
Figure 4.6 Attention can be directed by perception to the vertical pattern by the
“o”s. Attention can also be directed by memory to the horizontal letter string
that forms the word “broth.”

Memory also plays a role in early selection. The features used to
restrict the locations searched for a target include not only innate
features, such as color and orientation, but also familiar patterns, such as
the shapes of letters. Wang, Cavanagh, and Green (1994) found that
a mirror-image “Z” pops out from a field of “Z”s and a mirror-image “N”
pops out from a field of “N”s, even though the only difference between
the novel target and the familiar texture is the orientation of a single
diagonal line (Figure 4.7).

Figure 4.7 Reaction time does not vary as a function of number of display
elements (set size) when searching for a novel target (mirror-image N) among
familiar distracters (N). Observers responded whether 1- to 6-element displays
contained only identical elements (target absent condition) or contained one
different element (target present condition). The slopes for correct target-
present and -absent regressions are shown in parentheses (Wang et al.,
1994).
Rather than marking a target with a distinctive feature such as a
color or shape that causes it to pop out from the background, its
location can be indicated by an external visual cue (such as an arrow) or a
verbal cue that is not part of the target itself.
move his or her eyes where directed, but this top-down selection is
slower than the automatic bottom-up selection that is an effect of
perceptual organization. Marking a target with a distinctive feature
initiates its processing in less than 100 msec (Bay and Wyble, 2014). In
contrast, marking the location of a target with an external cue does not
initiate its processing in less than 300 msec (Müller and Rabbitt, 1989).

Late visual target selection


When a target does not have a unique feature and is not marked by an
external cue, locations throughout the visual field must be fixated and
the sensory input from each location compared with the target
representation one at a time until it is found. This is a much slower
process, dependent on the number of locations in the visual field
containing targets (Duncan and Humphreys, 1992; Nothdurft, 1993;
Treisman and Gormican, 1988). This is called late selection. For
example, in Figure 4.6, the word “BROTH” in the figure is an example
of late selection. It can be found only by serially reading each letter until
it is found.
Generally, late selection is required whenever the target is not
distinguished by a unique visual feature such as its color or shape. A
target with a unique conjunction of shape and color does not pop out
from a display containing distracters that have either the same color or
the same shape. It can be selected only by either serially examining
each display item of the target color to determine its shape or by
serially examining each display item of the target shape to determine its
color. It therefore takes longer to find a conjunction of a letter and a
color among colored letters (Duncan and Humphreys, 1992; Treisman,
1991; 1992) than to pick out a unique color or a unique letter that pops
out. In the top panel of Figure 4.2, the visual system organizes all the
green “X”s and circles into a single pattern called a texture, and the red
circle into another pattern. The red circle thus pops out, and the time it
takes to detect it is independent of the number of green “X”s and circles
in the display. In the bottom panel of Figure 4.2, the visual system
organizes the visual field into an “X” map, a circle map, a red map, and
a green map. The single red circle is not the sole member of any of the
maps, so it does not pop out from among either the circles or the red
shapes. The red circle can be found only by comparing the
representation of each element in either the circle or red map, one at a
time, with a representation of a red circle in memory. As a result, the
time to find the red circle is proportional to the number of display
elements.
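The contrast between the two panels of Figure 4.2 can be simulated. The display construction and counts below are my own toy assumptions: when no distracter shares the target's color, the red feature map contains only the target, so one comparison suffices regardless of display size; when distracters share its color or shape, the number of serial comparisons grows with the display.

```python
import random

def comparisons_needed(n_distracters, conjunction):
    """Serial comparisons made before a red "O" target is found."""
    if conjunction:
        # Bottom of Figure 4.2: distracters share the target's color
        # or its shape, so the red map fills up with distracters.
        distracters = [random.choice([("red", "X"), ("green", "O")])
                       for _ in range(n_distracters)]
    else:
        # Top of Figure 4.2: all distracters are green, so the red
        # map contains only the target.
        distracters = [("green", random.choice("XO"))
                       for _ in range(n_distracters)]
    display = distracters + [("red", "O")]
    candidates = [item for item in display if item[0] == "red"]
    random.shuffle(candidates)
    return candidates.index(("red", "O")) + 1

for n in (4, 16, 64):
    mean = sum(comparisons_needed(n, True) for _ in range(2000)) / 2000
    print(n, round(mean, 1))          # mean comparisons grow with n
print(comparisons_needed(64, False))  # always 1: the target pops out
```

The linear growth in the conjunction condition mirrors the finding that search time is proportional to the number of display elements.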
Attentional blink for simultaneous targets. Recall that, during
the identification of a target, the processing of information at all non-
target locations is inhibited. When an observer has to make
independent responses simultaneously or near-simultaneously to
targets, the voluntary response to one target inhibits the perception of
other targets, so one target may be missed. The period during which
the ability to detect a target is momentarily reduced by a voluntary
response to another target is called a refractory period. Duncan
(1980) devised a pair of tasks for measuring the refractory period. The
failure to detect one target as the result of detecting another target is
called an attentional blink.
One task for studying the attentional blink is the simultaneous,
brief presentation of two targets in two different locations. Duncan
(1980) presented a cross consisting of four characters. An observer
had to look for a digit target among letter distracters. In one experiment
there were independent probabilities that a digit could appear in either
the vertical or horizontal bar of the cross. Thus, a cross could contain
zero-, one-, or two-digit targets. In the single response condition the
observer had a single key and was told to press it whenever a cross
contained at least one digit. In this condition, even though the observer
had to press a key when he or she saw even a single digit, he or she
was more likely to press it when the vertical and horizontal bars both
contained a digit. Thus, the probability of pressing the key indicated
that, when both the vertical bar and the horizontal bar contained a digit,
both targets were seen. Furthermore, it did not matter whether the two
bars of the cross were presented simultaneously or successively, half a
second apart.
In the dual-response condition the observer had one key for the
horizontal bar and another key for the vertical bar and had to press the
key for the bar where the digit target appeared. Hence, zero, one, or
two independent responses might be made to a display. In this
condition the probability of detecting a digit target in one bar was
negatively correlated with the probability of detecting a digit target in
the other bar. Furthermore, performance was better when the bars
were presented successively than when they were presented
simultaneously. Since the results of the single-response condition
demonstrated that the observer saw both targets when only one
response was required, it must be that in the dual-response condition
making a response to one target interfered with detecting the other.
Detection versus counting. Because counting requires an
independent response to each target counted, Duncan (1980) found
that reporting the number of targets increased the likelihood that one
would be missed. Four characters were presented in a square display.
In one condition either no or one target appeared and the observer had
to press one of two keys to indicate which. In the other condition either
one or two targets appeared and the observer had to press one of two
keys to indicate which. Detection was more accurate in the zero or one
condition, presumably because the detection of one target requires one
response, but the detection of two targets requires two responses,
because the observer must count to two. Moreover, in the one- or two-
target condition, accuracy was much better when each side of the
square was presented successively than when the entire square was
presented simultaneously, presumably because successive
presentation made it possible to make successive responses to each
side of the square. Similarly, Duncan and Nimmo-Smith (1996) found a
decrement for simultaneously presented targets when the target was
selected early on the basis of a variety of features, including color,
brightness, texture, length, location, and motion, generalizing the
finding that more than one response cannot be made at a time
regardless of how the target is defined.
Attentional blink for successive targets. Another task for
studying the attentional blink is the rapid successive presentation of
targets in the same location. Converging evidence that during
the response to one target another target may be missed comes from
target detection in the rapid serial visual presentation (RSVP) task.
Broadbent and Broadbent (1987)
used the RSVP method to present words at the rate of eighty
milliseconds per word (about twelve words per second). The target was
either a word in capital letters or an animal name. When two targets
were presented within half a second of each other, the probability of
reporting the second target was significantly reduced. Furthermore, the
likelihood of missing the second target was greater when the first target
was detected than when it was missed, so the probability of detecting
both targets was very low.
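A minimal simulation makes the refractory-period account concrete. The 80 msec item rate comes from the Broadbent and Broadbent study above; the 500 msec refractory window and everything else in the sketch are my own illustrative assumptions.

```python
def detect_targets(stream, soa_ms=80, refractory_ms=500):
    """stream: booleans, True where the item is a target. A detected
    target triggers a response that suppresses detection of any target
    arriving during the refractory period that follows."""
    detected, blocked_until = [], 0
    for i, is_target in enumerate(stream):
        t = i * soa_ms
        if is_target and t >= blocked_until:
            detected.append(i)
            blocked_until = t + refractory_ms  # the "blink" window
    return detected

# Targets at positions 0, 3, and 8: the second arrives 240 msec after
# the first, falls inside the blink, and is missed.
stream = [True, False, False, True, False, False, False, False, True]
print(detect_targets(stream))  # [0, 8]
```

In this toy model, as in the data, a second target escapes the blink only when it arrives more than about half a second after the first.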
Illusory conjunctions. Another consequence of late selection for
a briefly presented target is an illusory conjunction between a target
feature and a non-target feature. Recall that target detection is a two-
stage process: first a single target feature is detected, and then a
complete representation of the target is identified by comparison with
memory representations. When a sequence of visual patterns occurs
one after the other in the same location at a rate of twenty per second,
there is sufficient time to detect a single target feature but not enough
time to assemble the other features appearing with it into a complete
target representation. Lawrence (1971) flashed successive words at
observers at rates of twenty words a second so that each word was
available for processing for fifty milliseconds before it was replaced by
its successor. All the words but one were in lower case, and the
observer had to report the upper-case word. However, after case
detection, when the processing of information from the target location
was disinhibited, there were representations of several words
presented in the temporal window just before and after the target
upper-case word that were available for identification. Furthermore,
these representations did not include case information (Adams, 1979).
Consequently, illusory conjunctions between the case of one word
and the identity of another word frequently resulted. The error rate was
42 percent for a presentation rate of twenty words per second, and 72
percent of the errors were the reporting of lower-case words that were
also presented. Of these errors, 69 percent of the time a word following
the upper-case word was reported and 31 percent of the time a word
preceding the upper-case word was reported. Lawrence’s task is called
a rapid serial visual presentation (RSVP) task.
Using the RSVP technique, Broadbent and his colleagues
(Gathercole and Broadbent, 1984; McLean, Broadbent, and Broadbent,
1982) demonstrated illusory conjunctions between digits and their
colors, and Intraub (1985) demonstrated illusory conjunctions between
pictures and the kinds of frames around them. In each case a feature
was sometimes perceived as part of a distracter either preceding or
following the target. Using a similar technique, Treisman and Schmidt
(1982) presented colored letters and created illusory conjunctions
between a letter’s shape and color for spatially adjacent colored letters
that were presented simultaneously. For example, if a display contained
a red “B” next to a black “R,” an observer might see a black “B.”
Botella et al. (2011) used the RSVP method to show that an
attentional blink and an illusory conjunction can both occur in the same
refractory period. Participants had to identify two letters in target colors
in a multicolored letter sequence (Figure 4.8, top). Figure 4.8 (middle)
shows the probability of detecting the second target as a function of the
number of intervening letters between the first and second target. The
second target is most often missed when there is one intervening letter
between it and the first target. Botella et al. reasoned that towards the
end of the refractory period an observer would have time to detect and
report some features, and so illusory conjunctions would result. Figure
4.8 (bottom) shows that illusory conjunctions between the target color
and a non-target letter were most often reported towards the end of the
refractory period, when there were four intervening letters between the
first and second target.
Figure 4.8 The task is to detect the letter in the target color in an RSVP of
letters (top). The probability of detecting a second target letter is a function of
the number of letters between the targets (middle). The average position
response is the position of the letter reported as the second target. If the
second target were always reported correctly then this value would be zero.
The probability of an illusory conjunction between a target color and letter peaks
at a lag of 4 (bottom) (Botella et al., 2011).
Practice and visual target search
Recall that a reverse “N” or a reverse “Z” pops out from among the
familiar orientations of the letters. As the result of practice with familiar
shapes such as letters and numbers, maps of familiar shapes are
constructed when the visual field is organized. This potential
organization is realized when an observer repeatedly searches for one
kind of target, such as letters, among another kind of target, such as
numbers (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977).
When unpracticed observers searched for a letter among numbers, the
results were as shown in Figure 4.9. The time to find a letter among
digits was proportional to the number of distracters in the display,
because each display element was compared with memory one at a
time until the target was found. However, as the result of practice,
searching for a letter was associated with the inhibition of numbers,
hence the inhibition of the number texture. After fourteen days of
practicing searching for a letter among digits (during which over 4,000
searches were conducted), the detection of a combination of distinctive
features at a location rapidly resulted in inhibition if they indicated a
number and further processing if they indicated a letter. Furthermore,
multiple locations in the visual field were checked simultaneously for
distinctive features. The time to detect a letter target was therefore no
longer linearly related to the number of display items when the
distracters were numbers (Figure 4.9). So, combinations of orientation
features that defined particular characters as letters or numbers, such
as a horizontal line meeting a top-right to bottom-left diagonal for a “7,”
and a circle intersecting a top-right to bottom-left diagonal for a “Q,” were
used by the visual system to sort them into numbers or letters for
inhibition or further processing.
Figure 4.9 Performance on an item recognition task for one or more items from
a fixed set as a function of display size (Schneider and Shiffrin, 1977; Shiffrin
and Schneider, 1977).

Furthermore, as the result of practice, a set of features rather than
a single feature was compared with each of several locations in the
visual field at the same time. When an unpracticed observer searched
for a “Q” or a “T,” because more than one distinctive feature had to be
checked at each location, it took longer to search for any one of a set of
two or more targets than for only a single one, indicating that each
location had to be serially checked first for one feature and then for the
other. However, after an observer practiced finding any one of a small
set of targets, it took no longer to search for any one of the set than for
a single member. In other words, a practiced observer can find either a
“Q” or “T” as quickly as when looking for just a “Q” or just a “T”
(Neisser, 1963; Schneider and Shiffrin, 1977; Shiffrin and Schneider,
1977).
There was a downside to practice in visual search. When Shiffrin
and Schneider (1977) reversed the task, practicing searching for letters
among digits made it difficult for the students to search for digits among
letters. The now automatic inhibition of digit features and disinhibition of
letter features could not be immediately reversed. The students found
that they could not ignore the familiar letters; the former letter targets
kept intruding on their awareness when they tried to search for the
digits.
Long-term improvement in visual target search. During the first
months of life, features such as color and shape determine pop out
(Bahrick and Lickliter, 2014), making early selection possible.
Subsequently, as the infant practices visual search as it looks for things
in familiar environments as part of daily life, late selection in familiar
environments improves. Performance on visual search tasks improves
all through childhood and peaks somewhere between the ages of
twenty and thirty. Lane (1980) measured the amount of task-relevant
and task-irrelevant information remembered after the completion of a
selective-attention task to measure what was attended to during the
task. For children, the memory of targets and distracters was positively
correlated. This result implies that the children were not selectively
identifying targets and inhibiting distracters but, rather, were
indiscriminately identifying every input they fixated on. In contrast, for
college students, the memory for targets versus distracters was
negatively correlated, implying that target identification inhibited the
processing of distracters. High school and college students
remembered fewer distracters than elementary school children (Druker
and Hagen, 1969; Hagen, Meacham, and Mesibov, 1970; Wagner,
1974).
Inattention blindness. As mentioned above, when searching for a
target, children still see some distracters. However, by college age,
when a target is detected, observers see only a single target, not the
details of the background. Consequently, what observers see is a
consequence of what they are looking for. Experiments using realistic
materials confirmed the narrowness of perception. People counting the
number of passes during a basketball game failed to notice either a
woman with an umbrella or a gorilla strolling across the court in the
middle of play. Simons and Chabris (1999) found that more than a third
of the observers keeping track of the number of bounce versus aerial
passes failed to notice the intruder in plain view. Inattention blindness
does not only afflict naive observers engaged in an unfamiliar task. It
also occurs for expert searchers who have spent years honing their
ability to detect small abnormalities in specific types of images. Twenty-
four radiologists performed a familiar lung nodule detection task. A
gorilla, forty-eight times the size of the average nodule, was inserted in
the last X-ray that was presented. The gorilla was not seen by 83
percent of the radiologists. Eye tracking revealed that the majority of
those who missed the gorilla looked directly at its location (Drew, Võ,
and Wolfe, 2013).

Vigilance
When targets requiring late selection are infrequent, another difficulty
arises. Life includes a variety of late-selection tasks in which the
environment must be monitored for an extended period of time. Such a
task, called a vigilance task, is the assignment of a sentry, lookout, or
watchman. During World War II the vigilance task required of sonar
operators stimulated interest in the problem. A small blip on a sonar
screen could indicate a school of fish, while a larger blip could be a
ship. The operator’s task is to alert the rest of the crew if a larger blip is
seen. When distracters are sufficiently similar to the target, so that they
are rejected only after comparison with the target representation, the
target representation is disrupted after a few minutes and the
probability of detecting the target is reduced. Therefore, when the
similarity among targets and distracters is high, performance
deteriorates over time. The seminal study was reported by Mackworth
(1948). How long a time passes before the decrement in performance
becomes noticeable depends on a host of task variables,
environmental variables, and subject variables (Mackie, 1977; Stroh,
1971) that influence the degree of processing of a target. When targets
are rare and similar distracters are common, a decrement occurs within
ten minutes of the task’s onset (Jerison, 1977; Mackworth, 1964).
Vigilance is important in a variety of services, including security, quality
control, and traffic control. So, eliminating the vigilance decrement has
practical value. One obvious way is to provide breaks in the task.
However, each successive break beyond the first has a smaller effect
on the decline in performance. Ross, Russell, and Helton (2014) found
that, when one-minute breaks were inserted twenty and thirty minutes
into a forty-minute task, only the first break improved performance.
4.4 Auditory target detection
A parallel auditory system for target specification, search, and
identification operates along with the visual system. During auditory
target identification, the prefrontal cortex and parietal cortex jointly
regulate the processing of auditory input in the superior temporal cortex
through the basal ganglia–thalamic pathway by inhibiting the
processing of auditory distracters while an auditory target is being
processed. Knight and his colleagues (Knight, Scabini, and Woods,
1989) found evidence of the inhibition by making a recording of the
electrical activity (an electroencephalogram: EEG) of the auditory cortex (an area of the superior temporal cortex) in both healthy individuals and
individuals with damage to the prefrontal cortex. When clicks were
presented, the electrical activity in the auditory cortex to the clicking
sounds was greater in individuals with prefrontal cortical damage than
in healthy individuals. The increase in electrical activity in the auditory
cortex when the prefrontal cortex was damaged suggested that, in
healthy individuals, the prefrontal cortex normally inhibits its activity.
Thus, when the prefrontal cortex was damaged, activity in the auditory
cortex increased.
To examine the effect of prefrontal activity on target selection,
electrical activity from both the left and right auditory cortex was
recorded in a study in which listeners heard a sequence of clicks in
both ears at the same time. The listener was told either to attend to the
clicks in the right ear or to attend to the clicks in the left ear. As was
mentioned in Chapter 2, and will be considered in more detail in
Chapter 7, the right auditory cortex processes sounds located on the
left, and the left auditory cortex processes sounds located on the right.
Consequently, the right auditory cortex initially processed the clicks in
the left ear, and vice versa. For healthy listeners, the electrical activity
was greater in the auditory cortex that initially received input from the
attended ear than in the auditory cortex receiving input from the
ignored ear. When input to the right ear was the target, the processing
of input to the left ear was inhibited, and vice versa.
For individuals with damage to the right or left prefrontal cortex, the
different levels of EEG activity indicated that, while the right
hemisphere processes auditory input only from the left side of space,
the left hemisphere ultimately controlled auditory input from both the
right and left sides of space. Damage to the right prefrontal cortex
eliminated the difference in EEG activity between the right and left ear
for left ear targets but did not affect the difference for right ear targets.
This result indicates that the right prefrontal cortex controls the
processing of auditory input only from the left side of space. In contrast,
damage to the left prefrontal cortex reduced but did not eliminate the
difference in EEG activity for right ear and left ear targets alike. This
result indicates that the left prefrontal cortex responds to auditory
input from both the left and right sides of space (Knight et al., 1981;
Knight, Scabini, and Woods, 1989). As mentioned above, the prefrontal
cortex does not control auditory selection alone. Similar results were
obtained when the inferior parietal cortex was damaged (Woods,
Knight, and Scabini, 1983).

Early selection for pitch


The auditory system can search for a single target in a specific location
or a single target with a distinctive pitch, so that it pops out, producing
early selection. In a famous experiment, Treisman and Riley (1969)
demonstrated that target detection based on pitch was independent of
its location. They presented one spoken message to each ear of the
listener. In this case, both messages were lists of digits that also
included an occasional letter spoken in a different voice from the digits.
The listeners had to shadow one of the messages, ensuring attention
to that location. In addition, the listeners were told that, when they
heard a letter in either ear, they should stop shadowing at once and tap
their desk with a ruler. The listeners, then, had to selectively attend
both to the digits presented in one ear and, at the same time, to all the
letters presented to either ear, which they could do by attending to the
distinctive pitch of the voice the letters were spoken in. In fact,
Treisman and Riley’s results showed that nearly all the letters
presented to both ears were detected. Therefore, a listener can
selectively attend to sounds distinguished by their pitch even when he
or she is also attending to sounds distinguished by their location. So,
there is early selection for pitch.
Aside from pitch, there is little auditory processing beyond the target location. The identification of a target or targets at one location is
accompanied by inhibition of the sensory input, other than pitch, at
other locations. In a famous experiment, Cherry had listeners wear
earphones while one spoken message was presented to the right ear
and another spoken message was presented to the left ear (Cherry,
1953; Wood and Cowan, 1995). Listeners had to repeat everything they
heard in the right ear. This is called shadowing. Its purpose was to
make certain that listeners were selectively attending to the target message
in the right ear and to provide a measure of when listening shifted to
the left, which produced a shadowing error. The input in the left ear
always began and ended with normal English spoken in a male voice.
However, the middle portion of the left ear’s input either remained the
same or changed to English spoken in a higher-pitched female voice, to
reversed male speech (that is, a segment of tape-recorded speech
played backwards), or to a single tone. People rarely remembered any
word or phrase heard in the left ear, and only a shadowing error for the
speech in the right ear indicated that they had briefly switched to
listening to the left ear message. Furthermore, although all listeners
knew that the ignored input was speech, some listeners were unable to
definitely identify it as English. Listeners who briefly switched to the left ear remembered the reversed speech as having had something queer about it, but other listeners thought it had been normal speech. Only
the change of voice from male to female or the introduction of a tone
was almost always noticed. Moreover, after four minutes of practice
shadowing the right ear, switches to the left ear decreased. Selection
on the basis of location was highly effective, therefore. Everything in
the target message was perceived but very little of the distracter input
was perceived.

Late selection in audition


When an auditory target is not distinguished by its pitch or location, and
so does not pop out from the background sounds, each sound at a
different location in the environment must be serially compared with the
target representation. This is a slower process, dependent on the
number of distinct sounds at different locations and the similarity
among them. Selective listening becomes difficult when the inputs are
similar in pitch and location (Treisman, 1964). Recall that Treisman and
Riley (1969) presented lists of digits that also included an occasional
letter to each ear of a listener. The listener had to shadow the list in one
ear and detect all the letters. When the letters were spoken in a
different voice from the digits, they popped out and were detected.
However, when the letters were spoken in the same voice as the digits,
they no longer popped out, so each spoken character had to be
categorized as a digit or letter by matching it with its representation in
memory. When every character presented in either ear had to be
compared with memory and identified, the comparison process was
overwhelmed, and some targets were missed. Thus, late selection was
not perfect in Treisman and Riley’s (1969) task, as about 75 percent of
the letters presented in the attended ear and only 33 percent presented
in the unattended ear were detected.
Moray et al. (1976) found an attentional blink for briefly presented
tones. A listener had to detect a target tone of a particular pitch or
loudness. The target tone always occurred in one of two locations. The
listener had to press one key if the target was heard in one location and
the other key when the target was heard in the other location. In
addition, when target tones were presented simultaneously in both
locations the listener was required to press both keys. When both tones
occurred simultaneously the probability of detecting a tone in one
location was reduced.
Auditory target identification has the same two stages as visual
target identification: first, a feature unique to the target is detected, and,
second, the entire target is identified. When the identification system is
overwhelmed by too many briefly presented items, so there is more than
one target representation following feature detection, auditory illusory
conjunctions are observed for pitch and location (Cutting, 1976; Efron
and Yund, 1974) just as visual illusory conjunctions are observed for
shape and color. Furthermore, when a click was used as an external
cue for a visual target, an illusory conjunction was found between
auditory and visual features. When a visual target digit was signaled by
an auditory click, distracters that both preceded and followed the target
were erroneously reported, indicating an illusory conjunction between
the click and the visual presentation of the digit (Weichselgartner and
Sperling, 1987).
As was the case for visual late selection, the ability to selectively
listen to an auditory target increases as its familiarity increases
(Poltrock, Lansman, and Hunt, 1982). However, again, the same
practice that makes a specific message, such as a familiar song, easier
to hear in the presence of other sounds also makes it more difficult to
ignore (Johnston, 1978).

4.5 Hypnosis
The basic function of selective attention is to make looking and listening
possible – that is, to interact with the world. However, some individuals
are extremely adept at inhibiting all other processing when responding
to a target. The ability to concentrate on a target can have other
purposes besides following a conversation. A hypnotic trance is not
something that one person, the hypnotist, induces in the other but,
instead, a selectivity of attention that some people are able to
voluntarily attain. People develop no special abilities under hypnosis
that they do not have when they are not hypnotized (Barber, 1969;
Orne, 1959). Rather, hypnotizable individuals are highly susceptible to
suggestion (Kirsch and Braffman, 2001). A hypnotizable individual will
be compliant with a hypnotist’s requests whether or not the individual is
told he or she is undergoing hypnosis (Orne, 1966).
In a test of hypnotic susceptibility, a person is given a set of
suggestions (external cues), such as that his or her arm is growing
heavy, his or her eyelids are glued shut, or there is a fly buzzing about.
Notice that these targets may be the perception of an internal state or a
memory. The more suggestions the person translates into a perceptual
experience, the more hypnotically susceptible the person is said to be.
There are three standardized tests of hypnotic susceptibility. The
Stanford Hypnotic Susceptibility Scale (Weitzenhoffer and Hilgard,
1959; 1962) and the Barber Suggestibility Scale (Barber and Glass,
1962) must be administered to individuals. The Harvard Group Scale of
Hypnotic Susceptibility (Shor and Orne, 1962) may be administered to
groups. Perhaps 15 percent of all people respond to nearly all the test
items and hence are highly susceptible to hypnosis.
What distinguishes the hypnotic state is the kind of perceptual and
cognitive experiences the hypnotized individual is capable of avoiding
(Orne, 1977). A hypnotized individual may not feel pain from an input
that would cause an unhypnotized individual to feel pain (Hilgard and
Hilgard, 1975). In addition, as described in the next chapter, readers
cannot avoid reading words when identifying the color of the ink they
are printed in. However, hypnotized individuals can inhibit reading in
this task (Raz et al., 2002).

4.6 Distributing voluntary actions among tasks
Multiple target detection is a part of everyday life. When someone
drives a car, traffic signals, traffic signs, pedestrians, and other vehicles
are all potential targets. Furthermore, different targets may require
different responses, including speeding up, slowing down, or stopping.
When targets requiring different responses may appear at more than
one location, and more than one target may appear at the same time,
the task presents the challenges of monitoring more than one possible
target location and preparing more than one voluntary response.
Divided attention tasks are challenging for two reasons. First, only one
voluntary action may be made at a time (Chapter 3). Second, as
implied by the description of selective attention above, ad hoc voluntary
action inhibits all task-irrelevant perceptual processing.

Visual dominance: cross-modal inhibition


The inhibition associated with a voluntary response does not occur
across modalities. A response to a visual target does not disrupt an
auditory target, and vice versa. Eijkman and Vendrik (1965) and Moore
and Massaro (1973) found little or no decrement in people’s accuracy
in detecting a simultaneous tone and light pair in comparison with their
accuracy in detecting only a single target. Gescheider, Sager, and
Ruffolo (1975) found a similar result when using tones and brief
vibrations as inputs.
However, even though the simultaneous auditory and visual
targets are both detected, they may not be perceived as having
occurred simultaneously. For example, when a light and a tone are
presented simultaneously, the light is likely to be detected first
(Colavita, 1974; Egeth and Sager, 1977). This phenomenon is called
visual dominance (Posner, Nissen, and Klein, 1976). Tactual targets
dominate over auditory targets in the same way that visual targets do
(Gescheider, Sager, and Ruffolo, 1975). When required to make two
separate voluntary responses to two separate inputs in different
modalities, most people give priority to the visual input. When people
are given instructions stressing that they should attend to the auditory
input, the difference in the time it takes them to detect the visual and
auditory targets is virtually eliminated (Egeth and Sager, 1977). The
misperception of simultaneous visual and auditory targets has actually
been known for centuries. Eighteenth-century astronomers used the
ticks of a metronome to time the transits of stars they were observing. It
was they who first noticed and recorded the fact that simultaneously
occurring visual and auditory targets (a star crossing and a tick) were
not perceived as having occurred simultaneously. This observation was
confirmed by Wilhelm Wundt, one of the first experimental
psychologists, and in 1908 it appeared in a textbook (Titchener, 1908)
as the “law of prior entry.”

Target specification and working memory


Voluntary action can specify and maintain a complicated target
representation. For example, you can follow the direction to make a left
at the third traffic signal. You typically maintain the representation of the
direction through verbal rehearsal – that is, you silently tell yourself
three lights, now two lights, etc. Verbal rehearsal is commonly used to
retain information briefly. Conrad (1964) found that even printed letters
were remembered verbally. However, this was a choice, not a
constraint. Kroll et al. (1970) found that, when verbalization of the
letters was impossible, they were remembered visually. The rehearsal
of a complicated target representation is inaccurately called working
memory. Working memory refers to target specification through the
voluntary action of rehearsal. It is not the name of a special kind of
memory.
Just as the effect of verbal rehearsal on target specification is
inaccurately called verbal working memory, the effect of a repeated
sequence of responses to visual targets is called visual working
memory. Since you can perform only one ad hoc voluntary action at a
time, the voluntary actions of working memory necessarily reduce the
kinds of ad hoc voluntary actions that may be made during target
search and identification. Recall that a response to a target inhibits the
perception of other targets in the same perceptual modality but not the
perception of targets in another perceptual modality. This implies that,
when a target must be specified and maintained through voluntary
action, maintaining the target description will interfere with its detection
but an external cue for the target in a different perceptual modality will
not interfere with its detection. Brooks (1968) confirmed this prediction.
Brooks asked students to form either a visual image (for example,
a block letter H) or an auditory image of a simple sentence. For the H,
the students had to determine which corners were convex as they
navigated around a mental image of the H. For the sentence, they had
to report whether or not each word was a noun. The students
responded either manually, by pointing to either a “Y” or an “N” (for
“Yes” or “No”) on a page for each corner or word, or vocally by saying
“Yes” or “No” for each corner or word. The students responded more
rapidly vocally than manually for the visual imagery task but more
rapidly manually than vocally for the linguistic task. Presumably,
forming the visual image of the H interfered with visual control of the
manual response, and forming the auditory image of the sentence
interfered with the construction of the vocal response. This pattern of
results is called selective interference, because forming an image
selectively interferes with perception or production in the same
modality. For example, as just mentioned, forming a visual image
interferes with the scanning of a visual display.
Cross-modal multitasking
A response to a target in one perceptual modality does not prevent the detection of a target in another modality. Consequently, when two tasks in different modalities do not require two independent but simultaneous responses, some or all participants can, after sufficient practice, perform both tasks together as quickly and accurately as either one alone. Schumacher et al. (2001) performed an
experiment to investigate the conditions under which dual-task
performance was equivalent to single-task performance. They
combined an auditory-verbal task with a visual-manual task. In their first
experiment the auditory-verbal task was to say “One,” “Two,” or “Three”
in response to a low, medium, or high tone, respectively. The visual-
manual task was to respond with the index, middle, or ring finger to the
visual targets “O– –,” “–O–,” and “– –O,” respectively. Half a second after a
warning signal, an auditory, a visual, or both kinds of targets were
presented. Hence, the instant when at least one target would appear
was perfectly predicted by the warning signal.
The results of the experiment are shown in Table 4.3. As can be
seen, when the participants were novices their responses were slower
and less accurate when two targets appeared (a dual task) than when
only one appeared (a single task). However, after practice, the
responses were equally fast whether one or two targets appeared,
though still slightly more accurate for a single target. How were the
practiced participants able to make two responses as rapidly as one?
Notice from the table that the auditory-verbal task took longer to
perform than the visual-manual task. Suppose we assume that the
decision processes for the auditory and visual targets could occur in
parallel, but the planning and performance of the verbal and manual
responses could be performed only sequentially. Practiced participants
made use of the fact that the visual-manual task required a much
shorter decision process than the auditory-verbal task, as indicated by
the shorter response times for it. The practiced participants created a
plan to execute the fast manual response first, and then the slower
verbal response. By the time that the slower decision process for the
verbal response was completed, the manual action (button press) had
already been initiated. The first response therefore did not interfere with
the second.

Table 4.3 The effect of practice on reaction time (and percentage error) for tasks
that have different completion times

Task              Trial type     Novice       Practiced
Auditory-verbal   Dual-task      725 (6.5)    456 (5.4)
                  Single-task    655 (5.3)    447 (3.3)
Visual-manual     Dual-task      352 (2.4)    283 (5.6)
                  Single-task    338 (1.3)    282 (2.7)

Source: Schumacher et al. (2001).

To test this explanation of the equally fast dual- and single-task responses, another experiment was performed in which the preparation
times for the two responses were equalized by making the visual-
manual decision more difficult. Again the auditory-verbal task was to
say “One,” “Two,” or “Three” in response to a low, medium, or high
tone, respectively. This time the visual-manual task was to respond with
the ring, index, little, or middle finger to “O– – –,” “–O– –,” “– –O–,” and
“– – –O,” respectively. Again, a warning signal occurred half a second
before one or both targets were presented.
The results of the experiment are shown in Table 4.4. Notice that
making the visual-manual task more difficult had the effect of slowing
responses in both tasks. This time, practiced responses to two targets
remained slower than responses to one target. These results are
consistent with the results of other studies (Ruthruff, Pashler, and
Klaassen, 2001) indicating that there is an absolute limit to performing
one voluntary action at a time. When a task requires that two voluntary
actions be performed at exactly the same time, one or the other must
give way.

Table 4.4 The effect of practice on reaction time (and percentage error) for tasks
that have similar completion times

Task              Trial type     Novice        Practiced
Auditory-verbal   Dual-task      1178 (6.8)    565 (4.8)
                  Single-task    821 (2.9)     466 (4.1)
Visual-manual     Dual-task      965 (10.2)    522 (5.3)
                  Single-task    778 (6.9)     466 (6.9)

Source: Schumacher et al. (2001).
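The scheduling account developed across the two experiments can be sketched as a simple model: decision processes for the two targets run in parallel, but a single response channel executes planned responses one at a time, so a response begins at the later of its decision finishing and the channel becoming free. The decision and response times below are made up for illustration; they are not Schumacher et al.'s data.

```python
def dual_task_rts(decisions, response_ms=100):
    """decisions: dict mapping task name to its decision time (ms).
    Decisions proceed in parallel; response execution is serial, so each
    response starts at the later of (its own decision finishing) and
    (the previous response finishing). Returns total RT per task."""
    rts = {}
    free_at = 0.0  # time at which the single response channel is next free
    for task, decision_ms in sorted(decisions.items(), key=lambda kv: kv[1]):
        start = max(decision_ms, free_at)
        free_at = start + response_ms
        rts[task] = free_at
    return rts

# Very different decision times (the Table 4.3 regime): the fast manual
# response is finished before the slow verbal decision completes, so
# neither dual-task RT exceeds its single-task RT.
print(dual_task_rts({"visual-manual": 200, "auditory-verbal": 400}))

# Similar decision times (the Table 4.4 regime): the second response
# must queue behind the first, producing a dual-task cost.
print(dual_task_rts({"visual-manual": 380, "auditory-verbal": 400}))
```

In the first call each task's dual-task RT equals its single-task RT (decision plus response time); in the second, the auditory-verbal response is delayed by the still-executing manual response, mirroring the residual dual-task cost in Table 4.4.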

Task switching when responses conflict


It is possible for you to try to perform more than one task with
conflicting responses at the same time by distributing your actions
among all the tasks. However, attempting to do more than one task at a
time and succeeding at doing them all well are two different things.
In a seminal study, Jersild (1927) presented observers with two-
digit numbers to which they had to either add or subtract a number to
determine how quickly a person could switch from addition to
subtraction and back again. Jersild presented his participants with
columns of two-digit numbers. In one condition a participant had to add
six to each number in the column. In a second condition the participant
had to subtract three from each number in the column. In the third
condition the participant had to alternately add six to the first number,
subtract three from the second number, and so on. In order to
determine the time to switch from addition to subtraction and back
again, the average time to complete the first two homogeneous (all
addition and all subtraction) conditions was subtracted from the time for
the alternating condition. The difference was the time it took to
repeatedly switch from addition to subtraction and back again. Jersild
also performed an otherwise identical experiment in which a participant
had to add seventeen or subtract thirteen from each number. The more
difficult computations produced a longer switching time.
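Jersild's subtraction logic can be written out directly: the time attributable to repeatedly switching tasks is the alternating-list time minus the mean of the two homogeneous-list times. The times below are illustrative values, not Jersild's data.

```python
def switching_time(alternating_s, add_only_s, subtract_only_s):
    """Jersild's (1927) subtraction method: switching time is the
    alternating-condition completion time minus the average of the
    two homogeneous (all-addition, all-subtraction) times."""
    return alternating_s - (add_only_s + subtract_only_s) / 2

# Illustrative completion times, in seconds, for one column of numbers.
print(switching_time(alternating_s=60.0, add_only_s=45.0,
                     subtract_only_s=47.0))
```

With these illustrative values the estimated switching time is 14 seconds, the extra time spent repeatedly retrieving the addition and subtraction plans.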
Spector and Biederman (1976) proposed that switching time
reflected the time required to retrieve from memory the plan for
performing the next task. They reasoned that if switching time were the
time to retrieve a representation from memory then a cue for that
representation would reduce retrieval, hence switching, time. They
repeated Jersild’s experiment with an easier addition task. This time
participants had to add three or subtract three from each number. Again,
performance was slower in the alternating condition. In another
alternating condition, “+3” or “–3” was printed next to each number in
the column, indicating which operation had to be performed. These
redundant visual cues reduced switching time, confirming the retrieval
hypothesis. Subsequently, other studies found evidence consistent with
the retrieval hypothesis. Rogers and Monsell (1995) alternated runs of
two or more trials before switching tasks. If switching time represents
the time to retrieve the task representation on a “switch” trial, then
response time should increase only for the first trial in a run of same-
task trials. This is what Rogers and Monsell found.
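The retrieval hypothesis's prediction for runs of trials can be sketched as follows: a switch cost is incurred only on the first trial of a run, when the task differs from the one just performed and its plan must be re-retrieved. The base time and switch cost below are illustrative parameters, not estimates from Rogers and Monsell's data.

```python
def trial_rts(tasks, base_ms=500, switch_cost_ms=200):
    """Retrieval account of task switching: a trial incurs the switch
    cost only when its task differs from the previous trial's task,
    i.e. on the first trial of each run."""
    rts = []
    prev = None
    for task in tasks:
        switched = prev is not None and task != prev
        rts.append(base_ms + (switch_cost_ms if switched else 0))
        prev = task
    return rts

# Runs of two trials per task, as in Rogers and Monsell (1995).
print(trial_rts(["add", "add", "subtract", "subtract", "add", "add"]))
```

The cost appears only on the first trial of each run; the second trial of a run is as fast as a trial in a homogeneous block, which is the pattern Rogers and Monsell observed.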
The study of multitasking in this modern world has taken on new
urgency because electronic devices entice many users to multitask in
situations in which it is unproductive or even dangerous. Carrying on a
conversation on an interesting topic demands enough attention to
interfere with braking to a light that has changed from green to red
while driving (Strayer and Johnston, 2001).
The cellphone is a serious threat to life because of the inattention
blindness it induces for everything other than the conversation on the
cellphone. Conversation impairs not just target detection but target
memory as well. As a consequence, it impairs performance on any task
in which you must keep track of what you are doing or in which you
must remember what has occurred (such as listening in class). Strayer
and Drews (2007) examined the effects of hands-free cellphone
conversations on simulated driving. Even when participants looked
directly at objects in the driving environment, they were less likely to
create a durable memory of those objects if they were conversing on a
cellphone. This pattern was obtained for objects of both high and low
relevance, suggesting that very little semantic analysis of the objects
occurred. In-vehicle conversations did not interfere with driving as
much as cellphone conversations did, because drivers were better able
to synchronize the demands of driving with in-vehicle conversations
than with cellphone conversations.
Even when cellphone use is not life-threatening there remains the
possibility of severe disruption of daily life. People now routinely
multitask between a cellphone activity and some other activity while in
class, at a business meeting, or in a social situation. The results of the
cross-modal multitasking and task-switching studies described above
provide evidence that multitasking between a cellphone and anything
else causes slower, poorer performance on both tasks compared with
performing the tasks one at a time. On the other hand, perhaps the
human brain is sufficiently flexible that infants born with a cellphone in
hand will learn to multitask efficiently. So far, the results are not
encouraging. Ophir, Nass, and Wagner (2009) compared the multitask
performance of those Stanford University students who reported both
extensive multitasking experience and that they were good at it with the
performance of Stanford students who reported being less adept at
multitasking and doing it less often. The experienced multitaskers who
reported being good at it actually did more poorly than the students
who multitasked less. Thus, there is no evidence that practice improves
performance on multitasking, though it may make multitaskers less
aware of how poorly they do it.

4.7 Alerting and arousal


When you are reading this book you are not aware of extraneous
auditory, tactual, temperature, and visual inputs. It is normally quite
satisfactory not to be aware of such inputs, since they are usually not
very important. In fact, these extraneous signals receive so little
attention that we are tempted to believe that they are not processed at
all. However, the complete failure to process such inputs could have
disastrous consequences. For example, if you were not processing the
temperature, and the room suddenly became very hot, you might be
trapped in a fire. If sounds were being totally ignored, then an explosion
would bring no response. It would also be inconvenient if you did not
hear your name when addressed unexpectedly. In order to survive, you
must be able to detect and respond rapidly to unexpected changes in
the environment. Fortunately, a variety of unconditioned responses to
unconditioned stimuli (Chapter 1) serve the function of alerting humans
to important distracters. Reflexes that rapidly withdraw the body from
harmful inputs also provide alerts that direct the perceptual system so
that the input may be perceived and the reflexive response followed by
a voluntary response if necessary. If you unexpectedly prick your finger,
a flexor reflex not only pulls your hand back but also sends an alerting
signal to the brain, so that you will perceive your hand’s location and
the injury.
Startle reflexes to intense stimuli such as loud noises and sharp
changes in brightness generate responses, such as turning the head
and eyes towards the stimulus, that make it a perceptual target, hence
the target of a voluntary response. The various startle reflexes in the
superior and inferior colliculi generally do not occur independently of
each other but are modulated by control pathways extending downward
from the amygdala to the colliculus. These top-down pathways control
the likelihood that a stimulus will evoke a response by inhibiting (or
disinhibiting) the response. As the result of this top-down control, the
various reflexes tend to occur together and form an orienting
response to the intense stimulus or stimuli eliciting them. Orienting
responses have several components, including pupil dilation,
contraction of the blood vessels in the limbs, dilation of the blood
vessels in the head, and changes in the galvanic skin response and the
electrical activity of the brain. As mentioned, some of these responses
make the stimulus the target of further perceptual processing. The
remaining responses prepare the body for whatever action may be
directed towards that target. Recall that a response may be either a
motor response or the release of a hormone. Collectively, the variety of
responses are said to arouse the individual or to increase the
individual’s arousal level. Different stimuli elicit orienting
responses composed of different reflexes, and different orienting responses
are perceived as fear, anger, disgust, hunger, sexual arousal, or
surprise.

Emotional arousal
The emotional arousal system provides a more primitive alternative to
the cognitive system of voluntary action for directing actions to targets
and performing responses to them. Unlike the motor system, which
provides a general system for performing any action in response to any
target, the emotional system generates specific behaviors to specific
targets. The four “F”s of emotion-determined behaviors are fear,
fighting, feeding, and sexual activity. Voluntary action and emotional
arousal sometimes compete and sometimes cooperate in the
production of responses.
The amygdala is the structure in the brain that modulates the
orienting responses elicited by alerting stimuli for fear, anger, and
disgust (LeDoux, 1993). The amygdala is a key structure in the
conditioning of fear to novel stimuli. More generally, the amygdala
modulates the likelihood that a stimulus will elicit an avoidance
response on the basis of the context in which the stimulus occurs. So,
the same scream or grotesque image that may frighten you when alone
in a strange place may only mildly excite you in an amusement park
funhouse or on television. Figure 4.10 shows schematically the
principal brain circuits containing the amygdala. The amygdala is at the
center of a network composed of six principal pathways. Two pathways,
from the thalamus and perceptual processing areas of the cortex, direct
input to the amygdala. Through its anatomical connections, the
amygdala is influenced by simple features, whole objects, the context in
which the objects occur, the semantic properties of objects, images and
memories of objects, and the like. It may be influenced by a present,
imagined, or remembered target. Four pathways from the amygdala, to
the hypothalamus and colliculus, to the basal ganglia, and to the
hippocampus, make it possible for the amygdala to regulate reflexes,
voluntary responses, and memory, respectively. The input and output
pathways may be partitioned into two input-output circuits: a low road,
providing a rapid, reflexive response to a simple stimulus; and a high
road, providing a slower, voluntary response to a perceptual target.
Figure 4.10 The amygdala is at the center of several circuits that generate
fear, anger, and disgust.

Low road. A pathway from the thalamus directly to the amygdala
sends visual input to the amygdala without first being transmitted to the
cortex. In contemporary mammals this pathway functions as an early
warning system, allowing the amygdala to be activated by simple
stimulus features. In response, through the pathway to the
hypothalamus, the amygdala initiates a response appropriate to the
stimulus. The hypothalamus initiates endocrine and autonomic
responses associated with emotional arousal. Pathways to the
forebrain and hypothalamic areas are also involved in the control of
hormones released by the pituitary gland.
High road. In humans the emotional and cognitive memory
systems operate together. Pathways for representations of visual and
auditory targets proceed forwards to the prefrontal cortex and then
downwards to the amygdala. Through the combined effects of the low
road and high road, the emotional arousal influences the voluntary
action directed towards the target. The influence of emotion on action is
first demonstrated by the Kluver–Bucy syndrome. This is a complex set
of behavioral changes brought about by damage to the amygdala in
primates (Kluver and Bucy, 1937; Weiskrantz, 1956). Following such
lesions, animals lose their fear of previously threatening stimuli, attempt
to copulate with members of another species, and attempt to eat a
variety of things that normal primates find unattractive (such as feces
and rocks).
Easterbrook’s hypothesis. A high level of emotional arousal
prepares an individual to do one thing: flee, fight, feed, etc. Easterbrook
(1959) proposed that, as arousal increases, the inhibition of non-target
processing associated with voluntary action increases. As arousal rises
from low to moderate levels, the inhibition of distracters increases, so
performance is less likely to be disrupted by the processing of
distracters. However, as arousal rises further, from moderate to high
levels, the additional inhibition lengthens the refractory period between
voluntary actions. The number of voluntary responses to task-relevant
targets is therefore reduced, and performance is disrupted by an
inability to process all the targets.
The distribution of attention is usually tested with an experimental
paradigm in which a primary task and secondary task are performed
concurrently. If increasing arousal restricts the distribution of attention,
then as arousal increases there should be more processing of the
primary task and less processing of the secondary task, resulting in
better performance on the primary task and poorer performance on the
secondary task. Most of the experiments that have examined the effect
of arousal have used either electric shock or noise to increase it.
Easterbrook (1959) reviewed a large number of studies that found that
an increase in arousal either improved performance on the primary task
or impaired performance on the secondary task. An updated review by
Eysenck (1982) found that, in seven out of ten further studies using
electric shock and eight out of fourteen studies using noise,
performance on the secondary task deteriorated. Moreover, in five of
the fourteen studies in which noise was used to increase arousal,
performance on the primary task improved.
Although the entire point of a fear response is to facilitate escape
behavior by narrowing processing to escape-relevant targets, fear can
sometimes result in a paralyzing narrowing of processing to the fear-
producing target. Consequently, fear can make a dangerous situation
even more dangerous by causing paralysis rather than escape.
Baddeley (1972) reviewed findings on people’s performance in
dangerous environments, including deep-sea divers, soldiers in
combat, army parachutists, and soldiers subjected to extremely
realistic, simulated, life-threatening emergencies (see also Weltman,
Smith, and Egstrom, 1971). The degree of danger proved to be a crucial
variable affecting performance: it explained, for example, the difference
between diver performance in the open sea and in pressure-chamber
simulations. He found that a dangerous situation tends
to produce a high level of arousal – that is, fear – thus reducing target
detection and decreasing task performance. Just at the moment that
correct performance becomes essential, it is impaired. This leads to the
tragedy of an individual in a car stalled on railroad tracks unable to
figure out how to open the door while the train approaches.
Another task relevant to Easterbrook’s (1959) hypothesis is the
vigilance task. Recall that, in a vigilance task, interference from the
distracters degrades the target’s representation. Any factor that
increases arousal delays the onset of the performance decrement by
reducing the number of distracters processed. Arousing factors include
noise, paying monetary rewards (Bergum and Lehr, 1964), and telling
participants that the task is a selection task for a high-paying job
(Nachreiner, 1977).
Social behavior. The amygdala responds to complex, socially
relevant stimuli (LeDoux, 1993). For social creatures, the emotional
system plays an important role in regulating their social behavior.
Camras (1977) observed that the display of distress cues (a sad facial
expression) resulted in the termination of aggression in four- to seven-
year-olds. Blair (1995) suggested that, for humans, the perception of
distress (that is, a sad facial expression, the sight and sound of tears)
makes it hard to stay angry with someone who is upset.
Task difficulty and distraction
The probability of an orienting response depends on more than the
intensity of the distracter. Whether a distracter causes an alert depends
in part on the working memory demands of the task. If an individual is
maintaining the target through verbal rehearsal then other perceptual
processing is inhibited, so there is little chance for a distracter to
stimulate a response and alert him. As a result, a very difficult task,
requiring immediate, serial, voluntary action to specify the target, is also
more difficult to interrupt than an easier one. Kahneman (1973: 14,
emphasis in original) put it this way:

First try to mentally multiply 83 × 27. Having completed this task, imagine
that you are going to be given four numbers, and that your life depends on
your ability to retain them for ten seconds. The numbers are seven, two, five,
nine. Having completed the second task, it may appear believable that, even
to save one’s life, one cannot work as hard in retaining four digits as one
must work to complete a mental multiplication of two-digit numbers.

Mental multiplication requires repeated voluntary rehearsals to maintain
the intermediate products as targets of subsequent action during the
computation. This continuous voluntary activity reduces the likelihood
that a distracter will stimulate a response and cause an alert.
An experiment by Zelniker (1971) demonstrated this principle.
Zelniker used a very distracting input called delayed auditory
feedback (DAF). People spoke into microphones while their own
speech was played back to them through earphones two-thirds of a
second later. DAF is so distracting that people usually stutter and stop
when they try to speak under these conditions. Zelniker subjected
people to DAF both while they were performing an easy task and while
they were performing a difficult task. The easy task was to shadow a
string of three numbers as they heard them; thus, no more than three
numbers ever had to be rehearsed. The difficult task was to repeat one
string of three numbers while at the same time listening for and
remembering the next string of three numbers; six numbers therefore
had to be rehearsed in two three-number sequences. The speech of
the participants was much more filled with stutters and stops when they
were performing the easy task than when they were performing the
difficult one.
Sometimes the entire point of performing a sequence of voluntary
actions is to block painful distracters. In the Lamaze method of natural
childbirth, controlled breathing techniques are used to divert attention
from the pain of labor. The breathing techniques serve other purposes
as well, but pain reduction is the most important one. In using this
method the woman takes over conscious control of what is usually
unconscious activity: breathing. Because awareness is directed to the
task of breathing in a carefully controlled way, attention is less likely to
be diverted to the strong pain signals being generated by the
contractions of labor.

4.8 Neglect
Neglect is a disorder in which an individual does not respond to
sensory input from some locations in space, or even from some parts of
his or her own body. An individual with severe neglect will be
completely unaware of everything on one side of space, including his or
her own body. Look at Figure 4.11. It shows six self-portraits of the
artist Anton Raderscheidt. Portrait (a) was done before he had a stroke
in his right hemisphere. Portrait (b) was the first one done after the
stroke. Notice that in the first portrait after the stroke he completely
neglected to draw the left side of his face. Yet when asked, he
saw nothing odd about it. As he recovered, the neglect lessened in
subsequent portraits (b) through (e), and had disappeared by the time
he painted portrait (f).
Figure 4.11 Self-portraits of the painter Anton Raderscheidt before and after he
suffered a right-hemisphere stroke in October 1967: in 1965 (a); December
1967 (b); January 1968 (c); March 1968 (d); April 1968 (e); June 1968 (f)
(Jung, 1980).

Since many different brain structures contribute to the voluntary
control of action, damage to any of them can cause some form of
neglect, including damage to the parietal cortex, prefrontal cortex,
basal ganglia, or thalamus. Depending on the location
of the cortical damage, two major behavioral manifestations of the
neglect syndrome are sensory neglect and motor neglect (Heilman,
Watson, and Valenstein, 1993). Sensory neglect is the failure to
respond to sensory inputs from a particular location. For example, a
patient may be unaware of his wife standing to his left but be able to
see and hear her when she steps to his right. Motor neglect is the
failure to make responses with a particular portion of the body. For
example, when asked to clap hands, a patient may uselessly lift only
his or her right hand in the air.
Neglect to various portions of space has been observed (Rapcsak,
Cimino, and Heilman, 1988; Shelton, Bowers, and Heilman, 1990), but
usually neglect is observed for the left side of space. Recall that the
different effects of damage to the left versus right prefrontal cortex
indicate that areas in the right hemisphere integrate the representations
of sounds from the left and right sides of space. Similarly, the different
effects of damage to the left versus right hemisphere, especially the
posterior parietal cortex, indicate that the integration of the visuospatial
representations of the left and right sides of space usually takes place
in the right parietal cortex, where there are many neurons that respond
to inputs from both the left and right. The right parietal cortex contains
maps encoding the locations of visual targets across the entire visual
field (Somers and Sheremata, 2013). When only the left hemisphere is
damaged the right parietal cortex is often able to immediately
compensate, because its visual map includes the right side of the visual
field, and so no neglect is observed. However, the left hemisphere
responds only to inputs from the right and the visual map of the left
parietal cortex includes only the right side of space. Hence, when only
the right hemisphere is damaged, the left hemisphere cannot
compensate and neglect is observed.
The neglect syndrome may be subcategorized into three different
levels of severity. In its mildest form, a patient responds to a single
input in any location but when inputs are presented simultaneously to
more than one location the patient is unaware of the input in the
neglected location. This phenomenon is an exaggeration of the normal
non-response. The interval over which a normal individual may miss a
second target is about half a second, and there is not an irreversible
bias to miss targets in a particular location (Chapter 4). In the patient
with neglect, briefly presented targets in the neglected location are
always missed when another, competing target appears in another
location.
In the moderate form of sensory or motor neglect, a patient will
neglect inputs in a particular location even when there is no limitation
on the time available to detect it. For example, if a person with left
sensory neglect is asked to put a line through all the letter “A”s on a
page placed in front of him or her, he or she may cross out only the
“A”s on the right side. However, if the patient’s attention is called to the
“A”s on the left by someone pointing them out, he or she can
momentarily respond appropriately to the neglected side.
The neglect extends to visual imagery. Bisiach et al. (1981) asked
patients with right hemisphere lesions to describe a location familiar to
them: the cathedral square in Milan. The patients first described the
features of the square from the vantage point facing the cathedral from
the opposite side of the square. Then the patients were asked to
describe the square again, this time imagining their vantage point to be
the central entrance to the cathedral looking out onto the square. The
patients were able to correctly report more details on their right for both
perspectives.
The severest form of the neglect disorder is coupled with
anosognosia. The patient not only completely ignores some area of
space but also denies having a deficit. Not only is the patient aware
only of sights and sounds from one side of the environment, but the
patient will wash, shave, and comb one side of his or her face and
head, eat the food off one side of the plate, etc. For example,
Raderscheidt thought the self-portrait shown in Figure 4.11 (b) looked
perfectly normal. Patients with anosognosia cannot be talked out of it. If
you point out to them that they cannot move their left arm, they will
respond that there is nothing wrong with their left arm. If you challenge
them to move it you get an evasive reply, such as “I am not going to do
it just because you told me to.” To provide a powerful alerting stimulus
that would arouse the damaged hemisphere, Bisiach, Rusconi, and
Vallar (1992) poured ice water in the left ear of a neglect patient. Cold
water in the left ear causes the eyes to move to the left. Sure enough,
for about a half-hour the patient became aware that half her body was
paralyzed and that she perceived only part of visual space.
Ramachandran repeated the experiment and wrote a vivid account of it
(Ramachandran and Blakeslee, 1998: 145).
Before he administered the ice water, Ramachandran asked:

Mrs. M., how are you doing?


Fine.

Can you walk?


Sure.

Can you use your left hand?


Yes.

Are they equally strong?


Yes.

After administering the ice water he again asked:

Can you use your arms?


No, my left arm is paralyzed.

Mrs. M., how long have you been paralyzed?


Oh, continuously, all these days.

This was an extraordinary remark, for it implies that even though she had
been denying her paralysis each time I had seen her over these last few
weeks, the memories of her failed attempts had been registering somewhere
in her brain, yet access to them had been blocked.
Twelve hours later a student of mine visited her and asked,
Do you remember Dr. Ramachandran?
Oh, yes, he was that Indian doctor.

What did he ask you?


He asked me if I could use both my arms.

And what did you tell him?


I told him I was fine.

The effect of the cold water was therefore temporary. When the
shock wore off the denial returned. Fortunately, as is dramatically
apparent in the self-portraits shown in Figure 4.11, over a period of
weeks spontaneous recovery from severe neglect is common.
Summary
Mental action to control perceptual processing is the same as motor
action except that it does not involve the movement of muscles. Any ad
hoc voluntary action, whether involving a motor or non-motor response,
is associated with the inhibition of all other processing not associated
with the action. During the control of perceptual processing, the
inhibition of processing not associated with the action is the entire point
of the action. A perceptual target is selected by inhibiting the
processing of distracters.

The control of perception makes use of the same neural system
that controls physical action: the prefrontal cortex, the parietal
cortex, the basal ganglia, and the thalamus. Looking and
listening are the voluntary actions that determine perception.
Perceptual control involves two kinds of tasks:
Perceptual control involves two kinds of tasks:

selective attention, in which a response must be directed to a
single target in the presence of distracters; and

divided attention, in which responses must be directed to
multiple targets that may be present at the same time.

Selective attention involves three stages: target specification,
search, and target identification. The prefrontal cortex, where the
target is specified, selects for the target by inhibiting all
perceptual input that is not target-related.

Early selection of the target occurs when the target is
encoded as a distinct part of the perceptual representation,
and so pops out without a search for it.

Late selection of the target occurs when it must be identified
by comparing the perceptual representation to a specification
of the target in memory.
Because the human information-processing system can plan and
perform only one ad hoc voluntary response at a time, the
bottleneck imposed by serial responding limits the number of
targets that can be identified and responded to in a divided
attention task.

Tasks requiring divided attention inevitably involve missed
targets when more than one occurs simultaneously within
the same perceptual modality.

Cross-modal divided attention is possible as long as only a
single response at a time is required.

Multitasking inevitably results in poorer performance than if
the tasks were performed separately.

Target selection is not entirely determined by top-down control
from the prefrontal cortex and parietal cortex. Bottom-up alerting
and emotional arousal also play a role. Emotional arousal
increases the inhibition of distracters, narrowing perceptual
processing to just the target. In other words, one cannot ignore
something sufficiently threatening.

When both the intentional control of perception and the alerting
system are damaged for a specific part of the perceptual field,
the result is neglect of that area, so nothing in that location is
perceived. Left field neglect is the most common form, because
the left hemisphere processes information only from the right
field and cannot compensate when the right hemisphere is
damaged.

Questions
