Visual Perception From Visual Science
disciplines, each of which provides different pieces of the jigsaw puzzle. This
interdisciplinary field, which I will call vision science, is part of cognitive science. In this
book, I try to convey a sense of the excitement that it is generating among the scientists
who study vision and of the promise that it holds for reaching a new understanding about
how we see.
In this initial chapter, I will set the stage for the rest of the book by providing an
introductory framework for understanding vision in terms of three domains:
1. phenomena of visual perception,
2. the nature of optical information, and
3. the physiology of the visual nervous system.
The view presented in this book is that an understanding of all three domains and the
relations among them is required to explain vision. In the first section of this chapter, we
will consider the nature of visual perception itself from an evolutionary perspective,
asking what it is for. We will define it, talk about some of its most salient properties, and
examine its usefulness in coupling organisms to their environments for survival. Next,
we will consider the nature of optical information, because all vision ultimately rests on
the structure of light reflected into the eyes from surfaces in the environment. Finally, we
will describe the physiology of the part of the nervous system that underlies our ability to
see. The eyes are important, to be sure, but just as crucial are huge portions of the brain,
much of which vision scientists are only beginning to understand. In each domain, the
coverage in this introductory chapter will be rudimentary and incomplete. But it is
important to realize from the very beginning that only by understanding all three domains
and the relations among them can we achieve a full and satisfying scientific explanation
of what it means to see. What we learn here forms the scaffold onto which we can fit the
more detailed presentations in later chapters.
1.1 Visual Perception
Until now, I have been taking for granted that you know what I mean by "visual
perception." I do so in large part because I assume that you are reading the words on this
page using your own eyes and therefore know what visual experiences are like. Before
we go any further, however, we ought to have an explicit definition.
Figure 1.1.2
The eye-camera analogy. The eye is much like
a camera in the nature of its optics: Both form an upside-down
image by admitting light through a variable-sized opening and
focusing it on a two-dimensional surface using a transparent lens.
1.1.1 Defining Visual Perception
In the context of this book, visual perception will be defined as the process of acquiring
knowledge about environmental objects and events by extracting information from the
light they emit or reflect. Several aspects of this definition are worth noting:
1. Visual perception concerns the acquisition of knowledge. This means that vision is
fundamentally a cognitive activity (from the Latin cognoscere, meaning to know or
learn), distinct from purely optical processes such as photographic ones. Certain physical
similarities between cameras and eyes suggest that perception is analogous to taking a
picture, as illustrated in Figure 1.1.2. There are indeed important similarities between
eyes and cameras in terms of optical phenomena, as we will see in Section 1.2, but there
are no similarities whatever in terms of perceptual phenomena. Cameras have no
perceptual capabilities at all; that is, they do not know anything about the scenes they
record. Photographic images merely contain information, whereas sighted people and
animals acquire knowledge about their environments. It is this knowledge that enables
perceivers to act appropriately in a given situation.
2. The knowledge achieved by visual perception concerns objects and events in the
environment. Perception is not merely about an observer's subjective visual experiences,
because we would not count even highly detailed hallucinations or visual images as
visual perception. We will, in fact, be very interested in the nature of people's
subjective experience (particularly in Chapter 13, when we discuss visual awareness in
detail), but it is part of visual perception only when it signifies something about the nature
of external reality.
3. Visual knowledge about the environment is obtained by extracting information. This
aspect of our definition implies a certain "metatheoretical" approach to understanding
visual perception and cognition, one that is based on the concept of information and how
it is processed. We will discuss this information processing approach more fully in
Chapter 2, but for now suffice it to say that it is an approach that allows vision scientists
to talk about how people see in the same terms as they talk about how computers might
be programmed to see. Again, we will have more to say about the prospects for sighted
computers in Chapter 13 when we discuss the problem of visual awareness.
4. The information that is processed in visual perception comes from the light that is
emitted or reflected by objects. Optical information is the foundation of all vision. It
results from the way in which physical surfaces interact with light in the environment.
Because this restructuring of light determines what information about objects is available
for vision in the first place, it is the appropriate starting point for any systematic analysis
of vision (Gibson, 1950). As we will see in Section 1.2, most of the early problems in
understanding vision arise from the difficulty of undoing what happens when light
projects from a three-dimensional world onto the two-dimensional surfaces at the back of
the eyes. The study of what information is contained in these projected images is
therefore an important frontier of research in vision science, one that computational
theorists are constantly exploring to find new sources of information that vision might
employ.
1.1.2 The Evolutionary Utility of Vision
Now that we have considered what visual perception is, we should ask what it is for.
Given its biological importance to a wide variety of animals, the answer must be that
vision evolved to aid in the survival and successful reproduction of organisms. Desirable
objects and situations, such as nourishing food, protective shelter, and desirable mates, must
be sought out and approached. Dangerous objects and situations, such as precipitous
drops, falling objects, and hungry or angry predators, must be avoided or fled from. Thus,
to behave in an evolutionarily adaptive manner, we must somehow get information about
what objects are present in the world around us, where they are located, and what
opportunities they afford us. All of the senses (seeing, hearing, touching, tasting, and
smelling) participate in this endeavor.
There are some creatures for which nonvisual senses play the dominant role, such as
hearing in the navigation of bats, but for Homo sapiens, as well as for many other species,
vision is preeminent. The reason is that vision provides spatially accurate information
from a distance. It gives a perceiver highly reliable information about the locations and
properties of environmental objects while they are safely distant. Hearing and smell
sometimes provide information from even greater distances, but they are seldom as
accurate in identifying and locating objects, at least for humans. Touch and taste provide
the most direct information about certain properties of objects because they operate only
when the objects are actually in contact with our bodies, but they provide no information
at all about objects at a distance.
Evolutionarily speaking, visual perception is useful only if it is reasonably accurate. If the
information in light were insufficient to tell one object from another or to know where
they are in space, vision never would have evolved to the exquisite level it has in
humans. In fact, light is an enormously rich source of environmental information, and
human vision exploits it to a high degree. Indeed, vision is useful precisely because it is
so accurate. By and large, what you see is what you get. When this is true, we have what
is called veridical perception (from the Latin veridicus meaning to say truthfully):
perception that is consistent with the actual state of affairs in the environment. This is
almost always the case with vision, and it is probably why we take vision so completely
for granted. It seems like a perfectly clear window onto reality. But is it really?
In the remainder of this section, I will argue that perception is not a clear window onto
reality, but an actively constructed, meaningful model of the environment that allows
perceivers to predict what will happen in the
future so that they can take appropriate action and thereby increase their chances of
survival. In making this argument, we will touch on several of the most important
phenomena of visual perception, ones to which we will return at various points later in
this book.
1.1.3 Perception as a Constructive Act
The first issue that we must challenge is whether what you see is necessarily what you
get: Is visual perception unerringly veridical? This question is important because the
answer will tell us whether or not vision should be conceived as a "clear window onto
reality."
Adaptation and Aftereffects
One kind of evidence that visual experience is not a clear window onto reality is provided
by the fact that visual perception changes over time as it adapts to particular conditions.
When you first enter a darkened movie theater on a bright afternoon, for instance, you
cannot see much except the images on the screen. After just a few minutes, however, you
can see the people seated near you, and after 20 minutes or so, you can see the whole
theater surprisingly well. This increase in sensitivity to light is called dark adaptation. The
theater walls and distant people were there all along; you just could not see them at first
because your visual system was not sensitive enough.
Another everyday example of dark adaptation arises in gazing at stars. When you leave a
brightly lit room to go outside on a cloudless night, the stars at first may seem
disappointingly dim and few in number. After you have been outside for just a few
minutes, however, they appear considerably brighter and far more numerous. And after
20-30 minutes, you see the heavens awash with thousands of stars that you could not see
at first. The reason is not that the stars emit more light as you continue to gaze at them,
but that your visual system has become more sensitive to the light that they do emit.
Adaptation is a very general phenomenon in visual perception. As we will see in many
later chapters, visual experience becomes less intense¹ as a result of prolonged exposure
to a wide variety of different kinds of stimulation: color, orientation, size, and motion, to
name just a few. These changes in visual experience show that visual perception is not
always a clear window onto reality because we have different visual experiences of the
same physical environment at different stages of adaptation. What changes over time is
our visual system, not the environment. Even so, one could sensibly argue that although
some things may fail to be perceived because of adaptation, whatever is perceived is an
accurate reflection of reality. This modified view can be shown to be incorrect, however,
by another result of prolonged or very intense stimulation: the existence of visual
aftereffects.
When someone takes a picture of you with a flash, you first experience a blinding blaze
of light. This is a veridical perception, but it is followed by a prolonged experience of a
dark spot where you saw the initial flash. This afterimage is superimposed on whatever
else you look at for the next few minutes, altering your subsequent visual experiences so
that you see something that is not there. Clearly, this is not veridical perception because
the afterimage lasts long after the physical flash is gone.
Not all aftereffects make you see things that are not there; others cause you to
misperceive properties of visible objects. Figure 1.1.3 shows an example called an
orientation aftereffect. First, examine the two striped gratings on the right to convince
yourself that they are vertical and identical to each other. Then look at the two tilted
gratings on the left for about a minute by fixating on the bar between them and moving
your gaze back and forth along it. Finally, look at the square between the two gratings on
the right. The top grating now looks tilted to the left, and the bottom one looks tilted to
the right. These errors in perception are further evidence that what you see results from
an interaction between the external world and the present state of your visual nervous
system.
Reality and Illusion
There are many other cases of systematically nonveridical perceptions, usually called
illusions. One particularly striking example with which you may already be familiar is the
moon illusion. You have probably noticed that the moon looks much larger when it is close
to the horizon than it does when it is high in the night sky. Have you ever thought about why?
¹ It may be confusing that during dark adaptation the visual system becomes more sensitive to
light rather than less. This apparent difference from other forms of adaptation can be eliminated
if you realize that during dark adaptation the visual system is, in a sense, becoming less
sensitive to the dark.
Figure 1.1.3
An orientation aftereffect. Run your eyes along
the central bar between the gratings on the left for 30-60
seconds. Then look at the square between the two identical
gratings on the right. The upper grating should now appear tilted to
the left of vertical and the lower grating tilted to the right.
Many people think that it is due to refractive distortions introduced by the atmosphere.
Others suppose that it is due to the shape of the moon's orbit. In fact, the optical size of
the moon is entirely constant throughout its journey across the sky. You can demonstrate
this by taking a series of photographs as the moon rises; the size of its photographic
image will not change in the slightest. It is only our perception of the moon's size that
changes. In this respect, it is indeed an illusion (a nonveridical perception) because its image
in our eyes does not change size any more than it does in the photographs. In Chapter 7,
we will discuss in detail why the moon illusion occurs (Kaufman & Rock, 1962; Rock &
Kaufman, 1962). For right now, the important thing is just to realize that our perception
of the apparent difference in the moon's size at different heights in the night sky is
illusory.
There are many other illusions demonstrating that visual perception is less than entirely
accurate. Some of these are illustrated in Figure 1.1.4. The two arrow shafts in A are
actually equal in length; the horizontal lines in B are actually the same size; the long lines
in C are actually vertical and parallel; the diagonal lines in D are actually collinear; and
the two central circles in E are actually equal in size. In each case, our visual system is
somehow fooled into making perceptual errors about seemingly obvious properties of
simple line drawings. These illusions support the conclusion that perception is indeed
fallible and therefore cannot be considered a clear window onto external reality. The
reality that vision provides must therefore be, at least in part, a construction by the visual
system that results from the way it processes information in light. As we shall see, the
nature of this construction implies certain hidden assumptions, of which we have no
conscious knowledge, and when these assumptions are untrue, illusions result. This topic
will appear frequently in various forms throughout this book, particularly in Chapter 7.
Figure 1.1.4
Visual illusions. Although they do not appear to
be so, the two arrow shafts are the same length in A, the horizontal
lines are identical in B, the long lines are vertical in C, the
diagonal lines are collinear in D, and the middle circles are equal in
size in E.
It is easy to get so carried away by illusions that one starts to think of visual perception as
grossly inaccurate and unreliable. This is a mistake. As we said earlier,
vision is evolutionarily useful to the extent that it is accurate or, rather, as accurate as it
needs to be. Even illusory perceptions are quite accurate in most respects. For instance,
there really are two short horizontal lines and two long oblique ones in Figure 1.1.4B,
none of which touch each other. The only aspect that is inaccurately perceived is the
single illusory property (the relative lengths of the horizontal lines), and the discrepancy
between perception and reality is actually quite modest. Moreover, illusions such as these
are not terribly obvious in everyday life; they occur most frequently in books about
perception.
All things considered, then, it would be erroneous to believe that the relatively minor
errors introduced by vision overshadow its evolutionary usefulness. Moreover, we will
later consider the possibility that the perceptual errors produced by these illusions may
actually be relatively harmless side effects of the same processes that produce veridical
perception under ordinary circumstances (see Chapters 5 and 7). The important point for
the present discussion is that the existence of illusions proves convincingly that
perception is not just a simple registration of objective reality. There is a great deal more
to it than that.
Once the lesson of illusions has been learned, it is easier to see that there is really no good
reason why perception should be a clear window onto reality. The objects that we so
effortlessly perceive are not the direct cause of our perceptions. Rather, perceptions are
caused by the two-dimensional patterns of light that stimulate our eyes. (To demonstrate
the truth of this assertion, just close your eyes. The objects are still present, but they no
longer give rise to visual experiences.) To provide us with information about the three-
dimensional environment, vision must therefore be an interpretive process that somehow
transforms complex, moving, two-dimensional patterns of light at the back of the eyes
into stable perceptions of three-dimensional objects in three-dimensional space. We must
therefore conclude that the objects we perceive are actually interpretations based on the
structure of images rather than direct registrations of physical reality.
Ambiguous Figures
Potent demonstrations of the interpretive nature of vision come from ambiguous figures:
single images that can give rise to two or more distinct perceptions. Several compelling
examples are shown in Figure 1.1.5. The vase/faces figure in part A can be perceived
either as a white vase on a black background (A1) or as two black faces in silhouette
against a white background (A2). The Necker cube in Figure 1.1.5B can be perceived as
a cube in two different orientations relative to the viewer: with the observer looking
down and to the right at the cube (B1) or looking up and to the left (B2). When the
percept "reverses," the interpretation of the depth relations among the lines changes; front
edges become back ones, and back edges become front ones. A somewhat different kind
of ambiguity is illustrated in Figure 1.1.5C. This drawing can be seen either as a duck
facing left (C1) or as a rabbit facing right (C2). The interpretation of lines again shifts
from one percept to the other, but this time the change is from one body part to another:
The duck's bill becomes the rabbit's ears, and a bump on the back of the duck's head
becomes the rabbit's nose.
Figure 1.1.5
Ambiguous figures. Figure A can be seen either
as a white vase against a black background or as a pair of black
faces against a white background. Figure B can be seen as a cube
viewed from above or below. Figure C can be seen as a duck (facing
left) or a rabbit (facing right).
There are two important things to notice about your perception of these ambiguous
figures as you look at them. First, the interpretations are mutually exclusive. That
is, you perceive just one of them at a time: a duck or a rabbit, not both. This is consistent
with the idea that perception involves the construction of an interpretive model because
only one such model can be fit to the sensory data at one time. Second, once you have
seen both interpretations, your perception of the figure becomes multistable: a dynamic
perception in which the two possibilities alternate back and forth as you continue to look. This
suggests that the two models compete with each other in some sense, with the winner
eventually getting "tired out" so that the loser gains the advantage. These phenomena can
be modeled in neural network theories that capture some of the biological properties of
neural circuits, as we will see in Chapter 6.
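The idea that two interpretations compete, with the currently perceived one gradually "tiring out," can be illustrated with a deliberately simplified simulation. The sketch below is not one of the neural network models discussed in Chapter 6; it is a hypothetical toy in which each interpretation receives a fixed amount of support from the image, the perceived interpretation accumulates fatigue, and the suppressed one recovers. The function names, the parameter values, and the switching margin (added only so that the toy does not flicker between percepts on every step) are all assumptions chosen for illustration.

```python
def simulate_reversals(steps=300, support=(1.0, 1.0), fatigue_rate=0.02,
                       recovery=0.95, margin=0.2):
    """Hypothetical toy model of a figure with two interpretations (0 and 1).

    Returns the interpretation that is perceived at each time step.
    """
    fatigue = [0.0, 0.0]
    # Start with whichever interpretation the image supports more strongly.
    current = 0 if support[0] >= support[1] else 1
    percepts = []
    for _ in range(steps):
        other = 1 - current
        strength_current = support[current] - fatigue[current]
        strength_other = support[other] - fatigue[other]
        # Mutual exclusivity: only one interpretation is perceived at a time,
        # and the rival takes over only when it is clearly stronger.
        if strength_other > strength_current + margin:
            current, other = other, current
        percepts.append(current)
        # The perceived interpretation tires; the suppressed one recovers.
        fatigue[current] += fatigue_rate
        fatigue[other] *= recovery
    return percepts


def count_reversals(percepts):
    """Number of times the perceived interpretation changes."""
    return sum(1 for a, b in zip(percepts, percepts[1:]) if a != b)


percepts = simulate_reversals()
print(count_reversals(percepts), "reversals in", len(percepts), "steps")
```

With equal support for both interpretations, as with the Necker cube, the toy produces long runs of one percept punctuated by reversals rather than a blend of the two. Nothing about the stimulus changes during the run; the alternation comes entirely from the internal fatigue terms, which is the sense in which the competing models "tire out."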
1.1.4 Perception as Modeling the Environment
Ambiguous figures demonstrate the constructive nature of perception because they show
that perceivers interpret visual stimulation and that more than one interpretation is
sometimes possible. If perception were completely determined by the light stimulating the
eye, there would be no ambiguous figures because each pattern of stimulation would map
onto a unique percept. This position is obviously incorrect. Something more complex
and creative is occurring in vision, going beyond the information strictly given in the
light that stimulates our eyes (Bruner, 1973).
But how does vision go beyond the optical information, and why? The currently favored
answer is that the observer is constructing a model of what environmental situation might
have produced the observed pattern of sensory stimulation. The important and
challenging idea here is that people's perceptions actually correspond to the models that
their visual systems have constructed rather than (or in addition to) the sensory
stimulation on which the models are based. That is why perceptions can be illusory and
ambiguous despite the nonillusory and unambiguous status of the raw optical images on
which they are based. Sometimes we construct the wrong model, and sometimes we
construct two or more models that are equally plausible, given the available information.
The view that the purpose of the visual system is to construct models of the environment
was initially set forth by the brilliant German scientist Hermann von Helmholtz in the
latter half of the 1800s. He viewed perception as the process of inferring the most likely
environmental situation given the pattern of visual stimulation (Helmholtz, 1867/1925).
This view has been the dominant framework for understanding vision for more than a
century, although it has been extended and elaborated by later theorists, such as Richard
Gregory (1970), David Marr (1982), and Irvin Rock (1983), in ways that we will discuss
throughout this book.
Care must be taken not to misunderstand the notion that visual perception is based on
constructing models. Invoking the concept of models does not imply that perception is
"pure fiction." If it were, it would not fulfill the evolutionary demand for accurate
information about the environment. To satisfy this requirement, perceptual models must
(a) be closely coupled to the information in the projected image of the world and (b)
provide reasonably accurate interpretations of this information. Illusions show that our
models are sometimes inaccurate, and ambiguous figures show that they are sometimes
not unique, but both tend to occur only under unusual conditions, such as those found in the
books and laboratories of vision scientists. Everyday experience tells us that our perceptual
models are usually both accurate and unique. Indeed, if the sensory information is rich
and complex enough, it is nearly impossible to fool the visual system into interpreting the
environment incorrectly (Gibson, 1966).
Visual Completion
Perhaps the clearest and most convincing evidence that visual perception involves the
construction of environmental models comes from the fact that our perceptions include
portions of surfaces that we cannot actually see. Look at the shapes depicted in Figure
1.1.6A. No doubt you perceive a collection of three simple geometrical figures: a square,
a circle, and a long rectangle. Now consider carefully how this description relates to what
is actually present in the image. The circle is partly occluded by the square, so its lower
left portion is absent from the image, and only the ends of the rectangle are directly
visible, the middle being hidden (or occluded) behind the square and circle. Nevertheless,
you perceive the partial circle as complete and the two ends of the rectangle as parts of a
single, continuous object. In case you doubt this, compare this perception with that of
Figure 1.1.6B, in which exactly the same regions are present but not in a configuration
that allows them to be completed.
Figure 1.1.6
Visual completion behind partly occluding
objects. Figure A is perceived as consisting of a square, a circle,
and a rectangle even though the only visible regions are those
shown separated in Figure B.
This perceptual filling in of parts of objects that are hidden from view is called visual
completion. It happens automatically and effortlessly whenever you perceive the
environment. Take a moment to look at your present surroundings and notice how much
of what you "see" is actually based on completion of unseen or partly seen surfaces.
Almost nothing is visible in its entirety, yet almost everything is perceived as whole and
complete.
You may have noticed in considering the incompleteness of the sensory information
about your present environment that visual perception also includes information about
self-occluded surfaces: those surfaces of an object that are entirely hidden from view by
its own visible surfaces. For example, only half of the cube that you perceive so clearly
in Figure 1.1.7A is visible. Your perception somehow manages to include the three
hidden surfaces that are occluded by the three visible ones. You would be more than a
little surprised if you changed your viewpoint by walking to the other side and saw that
the cube now appeared as in Figure 1.1.7B. Indeed, there are infinitely many possible
physical situations that are consistent with Figure 1.1.7A, yet you automatically perceive
just one: a whole cube.
Figure 1.1.7
Visual completion due to self-occlusion. Figure A
is invariably perceived as a solid cube, yet it is physically possible
that its rear side looks like Figure B.
Completion presents an even more compelling case for the model-constructive view of
visual perception than do illusions and ambiguous figures. It shows that what you
perceive actually goes a good deal beyond what is directly available in the light reaching
your eyes. You have very strong expectations about what self-occluded and partly
occluded surfaces are like. These must be constructed from something more than the light
entering your eyes, because the image itself contains no direct stimulation corresponding
to these perceived, but unseen, parts of the world.
Impossible Objects
There is another phenomenon that offers an especially clear demonstration of the
modeling aspect of visual perception. Impossible objects are two-dimensional line
drawings that initially give the clear perception of coherent three-dimensional objects but
are physically impossible. Figure 1.1.8 shows some famous examples. The "blivit" in
Figure 1.1.8A looks sensible enough at first glance, but on closer inspection, it becomes
clear that such an object cannot exist because the three round prongs on the left end do
not match up with the two square ones on the right end. Similarly, the continuous three-
dimensional triangle that we initially perceive in Figure 1.1.8B cannot exist because the
surfaces of the locally interpretable sides do not match up properly (Penrose & Penrose,
1958).
Figure 1.1.8
Impossible objects. Both the objects shown in this
figure initially produce perceptions of coherent three-dimensional
objects, but they are physically impossible. Such demonstrations
support the idea that vision actively constructs environmental
models rather than simply registering what is present.
One of the most interesting things about impossible objects is how clearly they show that
our perceptions are internal constructions of a hypothesized external reality. If visual
perception were merely an infallible reflection of the world, a physically impossible
object simply could not be perceived. It would be as impossible perceptually as it is
physically. Yet people readily perceive such objects when viewing properly constructed
images. This fact suggests that perception must be performing an interpretation of visual
information in terms of the three-dimensional (3-D) objects in the environment that might
have given rise to the images registered by our eyes. Moreover, the kinds of errors that
are evident in perceiving impossible objects seem to indicate that at least some visual
processes work initially at a local level and only later fit the results into a global
framework. The objects in Figure 1.1.8 actually make good sense locally; it is only in
trying to put these local pieces together more globally that the inconsistencies become
evident.
Predicting the Future
Supposing that the visual system does construct hypothetical models of reality rather than
just sticking to information available in sensory stimulation, why might such a system
have evolved? At some level, the answer must be that the models are more useful from
an evolutionary standpoint than the images that gave rise to them, but the reason for this
is not entirely clear. The usefulness of visual completion, for example, would seem to be
that 3-D models representing hidden surfaces contain much more comprehensive
information about the world than purely stimulus-based perceptions. The additional
information in the constructed model is valuable because it helps the perceiving organism
to predict the future. We have already considered one example in our discussion of
Figure 1.1.7. Perceiving a whole three-dimensional cube provides the basis for expecting
what we would see if we were to move so that new surfaces come into view. This is
terribly important for creatures (like us) who are constantly on the move. A stable three-
dimensional model frees us from having to reperceive everything from scratch as we
move about in the world.
A perceptual model of the three-dimensional environment does not need to be modified
much as we move around because the only thing that changes is our viewpoint relative to
a largely stable landscape of objects and surfaces. In fact, the only time the model needs
major modification is when model-based expectations are disconfirmed as unexpected
surfaces come into view. Everyday experience tells us that this does not happen nearly as
often as confirmation of our expectations. Thus, although constructing a three-
dimensional model of the environment may initially seem like a poor evolutionary
strategy, its short-term costs appear to be outweighed by its long-term benefits. It takes
more time and effort to construct the complete model initially, but once it is done, it
requires far less time and effort to maintain it. In the final analysis, the completed model
is a remarkably economical solution to the problem of how to achieve stable and accurate
knowledge of the environment.
The ability to predict the perceptual future is also evolutionarily crucial because we live in
a world that includes moving objects and other mobile creatures. It is useful to know the
current position of a moving object, but it is far more useful to know its direction and
speed so that you can predict its future trajectory. This is particularly important when
something is coming toward you, because you need to decide whether to approach,
sidestep, flee, or ignore it. Without a perceptual model that somehow transcends
momentary stimulus information, vision would not be able to guide our actions
appropriately.
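The point about using an object's direction and speed to anticipate its future trajectory can be made concrete with a small sketch. It assumes the simplest possible predictive model: the object keeps moving at constant velocity. It works in a 2-D frame with the observer at the origin. The `Observation` class, the function names, and the 0.5 m "collision radius" are hypothetical illustrations, not a claim about how the visual system actually computes anything. The sketch extrapolates the object's path, finds its closest approach to the observer, and flags whether that path comes close enough to warrant sidestepping or fleeing.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of a moving object, in the observer's frame.

    The observer sits at the origin; positions are in meters and
    velocities in meters per second (units chosen for illustration).
    """
    x: float
    y: float
    vx: float
    vy: float


def predict_position(obs: Observation, t: float) -> tuple[float, float]:
    """Extrapolate the object's position t seconds ahead at constant velocity."""
    return (obs.x + obs.vx * t, obs.y + obs.vy * t)


def closest_approach(obs: Observation) -> tuple[float, float]:
    """Return (time, distance) of the object's closest approach to the observer.

    Minimizes |p + v t|^2 over t >= 0, giving t* = -(p . v) / |v|^2,
    clamped to zero when the object is already moving away.
    """
    speed_sq = obs.vx ** 2 + obs.vy ** 2
    if speed_sq == 0.0:
        # A stationary object stays where it is.
        return 0.0, math.hypot(obs.x, obs.y)
    t_star = max(0.0, -(obs.x * obs.vx + obs.y * obs.vy) / speed_sq)
    px, py = predict_position(obs, t_star)
    return t_star, math.hypot(px, py)


def on_collision_course(obs: Observation, radius: float = 0.5) -> bool:
    """True if the extrapolated path passes within `radius` meters of the observer."""
    _, distance = closest_approach(obs)
    return distance <= radius


if __name__ == "__main__":
    ball = Observation(x=10.0, y=0.0, vx=-2.0, vy=0.0)      # heading straight at us
    passerby = Observation(x=10.0, y=3.0, vx=-2.0, vy=0.0)  # will pass 3 m to the side
    print(closest_approach(ball), on_collision_course(ball))          # (5.0, 0.0) True
    print(closest_approach(passerby), on_collision_course(passerby))  # (5.0, 3.0) False
```

Both objects have the same current speed. Only the extrapolated trajectory distinguishes the one that calls for action from the one that can be ignored. A model that recorded current position alone could not draw this distinction.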
The view that the purpose of the brain is to compute dynamic, predictive models of the
environment was set forth by British psychologist Kenneth Craik in 1943. He argued
forcefully that organisms that can rapidly extrapolate the present situation into the future
have an evolutionary advantage over otherwise identical organisms that cannot. An
organism that can predict accurately is able to plan future actions, whereas one that
cannot predict can only react once something has happened. There is an important caveat
here, however: The process of extrapolation must work faster than the predicted event to
be useful. Not surprisingly, then, most perceptual predictions are generated very quickly.
Indeed, they are usually generated so quickly that we have no conscious experience of
them unless they are violated. Even then, our conscious experience reflects the violation
rather than the expectation itself.
1.1.5 Perception as Apprehension of Meaning
Our perceptual constructions of the external world go even further than completing
unseen surfaces in a three-dimensional model, however. They include information about
the meaning or functional significance of objects and situations. We perceive an object
not just as having a particular shape and being in a particular location, but as a person, a
dog, a house, or whatever. Being able to classify (or recognize or identify) objects as
members of known categories allows us to respond to them in appropriate ways because
it gives us access to vast amounts of information that we have stored from previous
experiences with similar objects.
Classification
Perhaps the easiest way to appreciate the importance of classification is to imagine
encountering some completely foreign object. You could perceive its physical
characteristics, such as its color, texture, size, shape, and location, but you wouldn't know
what it was or what you should do with it. Is it alive? Can it be eaten? Is it dangerous?
Should you approach it? Should you avoid it? Such questions can seldom be answered
directly from an object's physical characteristics, for they also depend on what kind of
object it is. We embrace loved ones, flee angry dogs, walk around pillars, eat
hamburgers, and sit in chairs. All this is so obvious that it scarcely seems worth
mentioning, but without perceptually classifying things into known categories, it would
be difficult to behave appropriately with the enormous variety of new objects that we
encounter daily. We can simply walk around the pillar because past experience informs
us that such objects do not generally move. But angry dogs can and do!
Classification is useful because objects within the same category share so many properties
and behaviors. Not all chairs are exactly alike, nor are all hamburgers, but one chair is a
lot more like another than it is like any hamburger, and vice versa. Previous experience
with members of a given category therefore allows us to predict with reasonable certainty
what new members of that same class will do. As a consequence, we can deal with most
new objects at the more abstract level of their category, even though we have never seen
that particular object before.
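The idea that category membership lets us predict what a never-before-seen object will do can be illustrated with a toy sketch. It uses one deliberately simple scheme, nearest-prototype matching, swapped in purely for illustration; it is not presented as the account of object identification discussed in later chapters. The feature set (height, width, number of legs), the prototype values, and the stored properties are all invented. A new object is assigned to the category whose prototype it most resembles, and it inherits that category's stored properties without ever having been encountered before.

```python
import math

# Hypothetical category knowledge: each category has a prototype feature
# vector and some properties learned from past encounters. The features
# (height in m, width in m, number of legs) and all values are invented.
PROTOTYPES = {
    "chair":     ((0.9, 0.5, 4.0),   {"edible": False, "moves_on_its_own": False}),
    "hamburger": ((0.08, 0.12, 0.0), {"edible": True,  "moves_on_its_own": False}),
    "dog":       ((0.6, 0.3, 4.0),   {"edible": False, "moves_on_its_own": True}),
    "pillar":    ((3.0, 0.4, 0.0),   {"edible": False, "moves_on_its_own": False}),
}


def classify(features):
    """Assign an object to the category with the nearest prototype (Euclidean distance)."""
    return min(PROTOTYPES, key=lambda name: math.dist(features, PROTOTYPES[name][0]))


def expected_properties(features):
    """Predict a new object's properties from the category it is assigned to."""
    category = classify(features)
    return category, PROTOTYPES[category][1]


if __name__ == "__main__":
    unfamiliar_dog = (0.5, 0.25, 4.0)  # never seen before, but dog-like
    print(expected_properties(unfamiliar_dog))
    # ('dog', {'edible': False, 'moves_on_its_own': True})
```

The prediction that the new animal may move on its own comes entirely from the category, not from anything measured about this particular object. That is the sense in which classification extends what can be known beyond the stimulus itself.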
Classifying objects as members of known categories seems simple, but it is actually quite
an achievement. Consider the wide variety of dogs shown in Figure 1.1.9, for example.
How can we recognize almost immediately that they are all dogs? Do dogs have some
unique set of properties that enable us to perceive them as dogs? If so, what might they
be? These are problems of object identification, one of the most difficult (and as yet
unsolved) puzzles of visual perception. In Chapters 8 and 9, we will consider some current
ideas about how this might happen.
Attention and Consciousness
It is an undeniable fact that the visible environment contains much more information than
anyone can fully perceive. You must therefore be selective in what you attend to, and
what you select will depend a great deal on your needs, goals, plans, and desires.
Although there is certainly an important sense in which a hamburger is always a
hamburger, how you react to one depends a great deal on whether you have just finished
a two-day fast or a seven-course meal. After fasting, your attention would undoubtedly
be drawn immediately to the hamburger; right after a big Thanksgiving dinner, you
would probably ignore it, and if you did not, the sight of it might literally nauseate you.
This example demonstrates that perception is not an entirely stimulus-driven process; that
is, perceptions
are not determined solely by the nature of the optical information present in sensory
stimulation. Our perceptions are also influenced to some extent by cognitive constraints:
higher-level goals, plans, and expectations. It would be strange indeed if this were not so,
since the whole evolutionary purpose of perception, I have argued, is to make contact
between the needs of the organism and the corresponding opportunities available in its
environment. There are countless ways in which such higher-level cognitive constraints
influence your perception, many of which involve the selective process of visual
attention. As the hamburger example suggests, we look at different things in our
surroundings depending on what we are trying to accomplish, and we may perceive them
differently as a result. This point is perhaps so obvious that it goes without saying, but it
is important nevertheless.
Figure 1.1.9
Many kinds of dogs. Visual perception goes beyond the physical description of objects to
classify them into known categories. Despite the substantial physical differences in their
appearance, all these animals are readily perceived as belonging to the category of dogs.
One of the functions of attention is to bring visual information to consciousness. Certain
properties of objects do not seem to be experienced consciously unless they are attended,
yet unattended objects are often processed fully enough outside of consciousness to
attract your attention. Everyday examples abound. You may initially not notice a
stationary object in your visual periphery, but if it suddenly starts moving toward you,
you look in its direction without knowing why, only then becoming consciously aware
of its presence. While driving, you sometimes glance over at the car next to you for no
apparent reason, only to find that its driver has been looking at you. In both cases,
visual processing must have taken place outside of consciousness, directing your
attention to the interesting or important aspects of the environment: the moving object or
the person looking at you. Once the object is attended, you become conscious of its
detailed properties and are able to identify it and discern its meaning in the present
situation.
Attending to an object visually usually means moving your eyes to fixate on it, but
attention and visual fixation are not the same. You are probably familiar with the fact that
you can be looking directly at something without attending to it in the slightest. Your
thoughts may wander to some completely different topic, and once attention returns to
the visual information, you may realize that you had no awareness of what was in your
visual field during the diversion. Conversely, you can attend to an object without fixating
on it. To demonstrate this, hold your hand out in front of you and fixate directly on your
middle finger. Now, without moving your eyes, try attending to each of the other fingers
in turn. It is not terribly easy, because you want to move your eyes at the same time as
you shift your attention, but it clearly can be done.
Many high-level aspects of perception seem to be fully conscious. For example, when
you look around the room trying to find your keys, you are certainly aware of the key-
finding goal that directs your attention to various likely places in the room. Other aspects
of perception are clearly not conscious, even in the same situation, such as knowing what
makes an object "keylike" enough to direct your eyes at it during this visual search. In
general, lower levels of perception do not seem to be accessible to, or modifiable by,
conscious knowledge and expectations, whereas higher levels do.
Not much is yet known about the role of consciousness in perception. Indeed, we know
surprisingly little even about the evolutionary advantage of conscious perception. There
is a general belief that there must be one, but nobody has yet managed to give a good
account of what it is. The basic question is what advantage there might be for a
consciously perceiving organism over one that can perform all the same perceptual tasks
but without having conscious visual experiences. The unconscious automaton can, by
definition, engage in all of the same evolutionarily useful activities (successfully finding
food, shelter, and mates while avoiding cliffs, predators, and falling objects), so it is unclear
on what basis consciousness could be evolutionarily selected.
One possibility is that the problem is ill-posed. Perhaps the automaton actually could not
perform all the tasks that the consciously perceiving organism could. Perhaps
consciousness plays some crucial and as-yet-unspecified role in our perceptual abilities.
We will return to these conjectures in Chapter 13 when we consider what is known about
the relation between consciousness and perception.
1.2 Optical Information
Our definition of visual perception implies that vision depends crucially on the
interaction among three things: light, surfaces that reflect light, and the visual system of
an observer that can detect light. Remove any one of these ingredients, and visual
perception of the environment simply does not occur. It seems reasonable, therefore, to
begin our study of vision by considering some basic facts about each of them. The
present section will describe how light interacts with surfaces to produce the optical
events that are the starting point of vision. The next section will describe the overall
structure of the human visual system that processes information in these optical events.
The remainder of the book discusses in detail how the visual system goes about
extracting relevant information from light to produce useful perceptions of environmental
scenes and events.
I argued in the preceding section that the evolutionary role of visual perception is to
provide an organism with accurate information about its environment. For this to happen,
the light that enters our eyes must somehow carry information about the environment. It
need not carry all the information we ultimately get from looking at things, but it must
carry enough that the rest can be inferred with reasonable accuracy. In this section, we
will consider how light manages to carry information about the world of visible objects
around us.