4 Perception

Perception is defined as the integration of sensations into percepts of objects. Each sensory modality -
seeing, hearing, and so on - has both a sense organ involved in acquiring the raw information from the
environment and a more central system in the brain for transforming this information into organized percepts.

Five functions of perception:
1. Determining which part of the sensory environment to attend to (attention).
2. Localizing, or determining where objects are (localization).
3. Recognizing, or determining what objects are (recognition).
4. Abstracting the critical information from objects (abstraction).
5. Keeping the appearance of objects constant, even though their retinal images are changing (constancy).
Attention is the ability to selectively attend to only a small subset of all of the information in the environment. This simple
ability is believed to involve three separate sets of processes that are anatomically distinct in the brain:
1. One system is responsible for keeping us alert.
2. A second system is responsible for orienting processing resources to task-relevant information.
3. A third system decides whether we want to continue attending to the information or instead switch attention to other
information.
Selective attention:
Selective attention is the process by which we select some stimuli for further processing while ignoring others. In vision, the
primary means of directing our attention are eye movements. Most eye fixations are on the more informative,
i.e., unusual, parts of a scene.

.Listening to a professor's lecture while others around you are whispering to you is an example of selective attention.
Eye movements:
It is evident that the eye is not stationary. Instead, visual scanning takes the form of brief periods during which
the eyes are relatively stationary, called eye fixations, separated by quick jumps of the eye called saccades.
Each fixation lasts approximately 300 milliseconds (about a third of a second) while saccades are very fast (on
the order of 20 milliseconds). It is during the fixation periods that visual information is acquired from the
environment; vision is essentially suppressed during saccades. Generally speaking, the points on which the eyes
fixate are not random, but rather are the areas of the scene that contain the most information. For example, a person
looking at a face makes many fixations on the eyes, nose, and mouth - those features that most efficiently
distinguish one face from another.
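The timing figures above imply a rough budget for visual sampling. The sketch below is only an illustration: it assumes the approximate durations quoted in the text and treats every fixation-saccade cycle as identical, which real scanning is not. The function names are made up for this example.

```python
# Approximate durations taken from the text; real values vary from glance to glance.
FIXATION_MS = 300
SACCADE_MS = 20


def fixations_per_second(fixation_ms=FIXATION_MS, saccade_ms=SACCADE_MS):
    """Number of fixation-plus-saccade cycles that fit into 1000 ms."""
    return 1000 / (fixation_ms + saccade_ms)


def fraction_of_time_fixating(fixation_ms=FIXATION_MS, saccade_ms=SACCADE_MS):
    """Share of each cycle during which visual information is acquired."""
    return fixation_ms / (fixation_ms + saccade_ms)


if __name__ == "__main__":
    print(f"{fixations_per_second():.1f} fixations per second")
    print(f"{fraction_of_time_fixating():.0%} of viewing time spent fixating")
```

Under these assumptions the eye makes roughly three fixations per second, and the brief saccades take up only a small share of viewing time.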
Auditory attention:
The sounds of many voices bombard our ears. However, we can use purely mental means to selectively attend
to the desired message. Some of the cues that we use to do this are the direction the sound is coming from, the
speaker's lip movements, and the particular characteristics of the speaker's voice (pitch and intonation). Even in
the absence of any of these cues, we can (though with difficulty) select one of two messages to follow on the
basis of its meaning. By not attending to - i.e., ignoring - large parts of the environment, we lose the ability to
remember much about those parts of the environment. However, selective attention pares down the amount of
necessary information processing to the point where it is manageable by the brain.

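.Your ability to listen carefully to your best friend talking while ignoring the conversations around you is
an example of selective attention. In the laboratory it is studied with auditory shadowing, in which a listener repeats one message aloud while ignoring another.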
To localize objects, we must first separate them and then organize them into groups. Localization involves
determining an object's position in the up-down and left-right dimensions. This is relatively easy because the
required information is part of our retinal image. Localizing an object also requires that we know its distance
from us. This form of perception, known as depth perception, is not so easy because the required information is not available in the
retinal image. We have a variety of depth cues, both monocular and binocular, that allow us to do this.
Separation of objects:
The Gestalt psychologists emphasized the importance of perceiving whole objects or forms, and proposed a
number of principles to explain how we organize objects. The most elementary form of perceptual organization
is that in a stimulus with two or more distinct regions, we usually see part of it as a figure and the rest as ground
(or background). The regions seen as a figure contain the objects of interest - they appear more solid than the
ground and appear in front of it.
Separation of objects: Figure and ground:
It should be noted that, while vision is the most salient source of figure-ground relations, we can also perceive
figure-ground relations in other senses. For example, we may hear the song of a bird against a background of
outdoor noises, or the melody played by a violin against the harmonies of the rest of the orchestra.

.Our tendency to automatically perceive a form as standing out from its surround is known as figure-ground
Separation of objects: grouping of objects:
We see not only objects against a ground, but we see them in a particular grouping as well. Even simple patterns
of dots fall into groups when we look at them.

.According to the principle of proximity, we perceive items that are close together as forming a group.

.Closure and proximity are Gestalt principles of perceptual organization.

Perceiving distance:
To know where an object is, we must know its distance or depth. The retina is a two-dimensional surface onto
which a three-dimensional world is projected. The retina therefore directly reflects height and width, but depth
information is lost and must somehow be reconstructed on the basis of subtle pieces of information known
collectively as depth cues. Depth cues can be classified as binocular or monocular.
Binocular cues:
Humans have two eyes in the front of their heads, both pointing in the same direction. The two eyes' ability to
jointly infer depth comes about because the eyes are separated in the head, which means that each eye has a
slightly different view of the same scene. Binocular disparity is defined as the difference in the views seen by
each eye. The disparity is largest for objects that are seen at close range and becomes smaller as the object
recedes into the distance. Beyond 3-4 meters, the difference in the views seen by each eye is so small that
binocular disparity loses its effectiveness as a cue for depth.
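The fall-off of disparity with distance can be illustrated with simple geometry. The sketch below is not from the text: it assumes an eye separation of 6.5 cm, considers only points straight ahead, and uses the difference in the angle between the two lines of sight as a stand-in for how different the two eyes' views are. The function names are hypothetical.

```python
import math

# Assumed typical separation between the eyes, in meters (not given in the text).
EYE_SEPARATION_M = 0.065


def vergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2 * math.atan(eye_separation_m / (2 * distance_m)))


def disparity_deg(near_m, far_m, eye_separation_m=EYE_SEPARATION_M):
    """Angular disparity between two points at different distances straight ahead."""
    return (vergence_angle_deg(near_m, eye_separation_m)
            - vergence_angle_deg(far_m, eye_separation_m))


if __name__ == "__main__":
    # The same 10 cm step in depth produces far less disparity at long range.
    for near in (0.5, 1.0, 2.0, 4.0, 8.0):
        far = near + 0.1
        print(f"{near:>4} m vs {far:.1f} m: {disparity_deg(near, far):.4f} deg")
```

With these assumed values, the disparity produced by a fixed depth difference shrinks rapidly as the objects move away, which matches the claim that the cue becomes ineffective beyond a few meters.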

.The left and right eyes receive slightly different images of objects in the environment. This fact is known as
binocular disparity or

.The term binocular disparity refers to the difference in views perceived by each eye.
Monocular cues:
As indicated, the use of binocular cues is limited to objects that are relatively close. What about objects that are
farther away, such as distant clouds, cityscapes, or mountains? For these, we rely on monocular cues:
1. Relative size. If an image contains an array of similar objects that differ in size, the viewer interprets the
smaller objects as being farther away.
2. Interposition. If one object is positioned so that it obstructs the view of the other, the viewer perceives
the overlapping object as being nearer.
3. Relative height. Among similar objects, those that appear closer to the horizon are perceived as being
farther away.
4. Perspective. When parallel lines in a scene appear to converge in the image, they are perceived as
vanishing in the distance.
5. Shading and shadows. Whenever a surface in a scene is blocked from receiving direct light, a shadow is
cast. If that shadow falls on a part of the same object that is.
6. Motion. Have you ever noticed that if you are moving quickly - perhaps on a fast-moving train - nearby
objects seem to move quickly in the opposite direction while more distant objects move more slowly? This difference in
apparent speed (motion parallax) is a cue to distance.
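The motion cue in item 6 can be made concrete with a small calculation. The sketch below is an illustration under simplifying assumptions that are not in the text: the observer moves in a straight line at constant speed, and each object is stationary and directly to the side at the moment it is measured. In that case its apparent angular speed is the observer's speed divided by the object's distance. The function name is hypothetical.

```python
import math


def angular_speed_deg_per_s(observer_speed_m_s, object_distance_m):
    """Apparent angular speed of a stationary object directly to the side
    of a moving observer (speed / distance, converted to degrees per second)."""
    return math.degrees(observer_speed_m_s / object_distance_m)


if __name__ == "__main__":
    train_speed = 30.0  # meters per second, an arbitrary example value
    for label, distance in (("trackside pole", 10.0), ("distant hill", 1000.0)):
        speed = angular_speed_deg_per_s(train_speed, distance)
        print(f"{label} at {distance:.0f} m: {speed:.2f} deg/s")
```

An object one hundred times farther away sweeps across the visual field one hundred times more slowly, so relative image speed carries information about relative distance.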
LOCALIZATION: Perceiving motion:

Stroboscopic motion is produced most simply by flashing a light in darkness and then, a few milliseconds later,
flashing another light near the location of the first light. The light will seem to move from one place to the other
in a way that is indistinguishable from real motion.

.Stroboscopic motion refers to the perception of motion when no object is actually moving.
Real motion:
Real motion is the movement of an object through all intermediate points in space. Some paths of motion on the
retina must be attributed to movements of the eye over a stationary scene (as occurs when we are reading).
Other motion paths must be attributed to moving objects (as when a bird enters our visual field).
Moreover, some objects whose retinal images are stationary must be seen to be moving (as when we
follow the flying bird with our eyes), while some objects whose retinal images are moving must be seen
as stationary (as when the stationary background traces motion across the retina because our eyes are
pursuing a flying bird).
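One simple way to express the bookkeeping in the cases above is to add the motion of the image on the retina to the motion of the eye. The text gives no formula, so the sketch below is an idealized assumption: velocities lie along a single axis, are measured in degrees of visual angle per second, and the eye-movement signal is taken to be exact. The function name and the example values are hypothetical.

```python
def object_velocity(retinal_velocity, eye_velocity):
    """Velocity attributed to the object: image motion on the retina
    plus the motion of the eye (both in deg/s along one axis)."""
    return retinal_velocity + eye_velocity


# (retinal image velocity, eye velocity) for the situations described in the text.
CASES = {
    "reading: eye sweeps over a still page": (-10, 10),
    "bird enters the field, eye still": (15, 0),
    "bird followed with the eyes": (0, 15),
    "background while following the bird": (-15, 15),
}

if __name__ == "__main__":
    for label, (retinal, eye) in CASES.items():
        velocity = object_velocity(retinal, eye)
        verdict = "moving" if velocity != 0 else "stationary"
        print(f"{label}: {verdict} ({velocity} deg/s)")
```

In this toy account the pursued bird is seen as moving even though its retinal image is still, and the background is seen as stationary even though its image sweeps across the retina.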
Some aspects of real motion are coded by specific cells in the visual cortex. These cells respond to some
motions and not to others, and each cell responds best to one direction and speed of motion. There are even cells
that are specifically tuned to detect an object moving toward the head, an ability that is clearly useful for


.The stimulation of specialized motion cells appears to be responsible for our perception of the motion aftereffect.
The perceptual system needs to determine not only where relevant objects are in the scene, but also what they
are. This is the process of recognition. Recognizing an object, in turn, entails several sub-problems. First, we
have to acquire fundamental or primitive features of information from the environment and assemble them
properly. Second, we have to figure out what the objects we're seeing actually are.

.Perceiving a large dark object as a cow fulfills which function of perception? Recognition.

.Characteristics of objects in the visual field such as shape and color are called primitive features.
RECOGNITION: Global to local processing:
One of the most powerful tools used by the perceptual system to solve this and other similar problems is to use
the context (the scene) within which the object is embedded to make inferences about what the object is. That
is, the system can start by carrying out global processing – understanding what the scene is - followed by local
processing – using knowledge about the scene to assist in identifying individual objects. The logic of this
process is articulated by Tom Sanocki (1993). Sanocki notes that our perceptual system acts by using early
(global) information to constrain the interpretation of later information. This provides evidence that the visual
system tends to acquire global information first, followed by local information.

The binding problem: pre-attentive and attentive processes:
Attention is the process by which we select which of the vast amount of incoming information is processed and
eventually perceived consciously. Attention has also been conceptualized as having the role of binding together
different features of an incoming stimulus. An excellent illustration of what we mean by this takes the form of what is
known as an illusory conjunction.
Illusory conjunctions suggest that information from the visual world is pre-attentively encoded along separate
dimensions – in the example, shape and color are encoded separately – and then integrated in a subsequent
attentive processing stage. Illusory conjunctions occur when stimulus duration is sufficient for the primitives to
be obtained, but not sufficient for the longer, attentional gluing stage.
Feature integration theory:
The general idea is that in a first, pre-attentive stage, primitive features such as shape and color are perceived,
while in the second, attentive stage, focused attention is used to properly 'glue' the features together into an
integrated whole. A standard experimental procedure for distinguishing primitive features from ‘glued-together’
features is a visual search task in which the observer’s task is to determine whether some target object is present
in a cluttered display.
The major problem with this theory is that, using visual search and related procedures, scientists have unveiled too many
presumed ‘primitives’ to be realistic, as Di Lollo, Kawahara, Zuvic, and Visser (2001) point out. They go on to describe an
alternative, dynamic control theory.
Dynamic control theory:
According to this theory, 'instead of an early, hard-wired system sensitive to a small number of visual primitives, there is a malleable
system whose components can be quickly reconfigured to perform different tasks at different times, much as the
internal pattern of connectivity in a computer is rearranged dynamically by enabling and disabling myriad gates
under program control'. In other words, the system rearranges itself for different tasks, as opposed to
there being a separate subsystem for each possible task.
Determining what an object is:
Attentive versus pre-attentive processing is concerned with the problem of determining which visual
characteristics belong to the same object. A second problem is that of using the resulting information to
determine what an object actually is.
Here, shape plays a critical role. We can recognize a cup, for example, regardless of whether it is large or small
(a variation in size), brown or white (a variation in color), smooth or bumpy (a variation in texture), or
presented upright or tilted slightly (a variation in orientation). In contrast, our ability to recognize a cup is
strikingly affected by variations in shape; if part of the cup's shape is hidden, we may not recognize it at all.
One piece of evidence for the importance of shape is that we can recognize many objects about as well from
simple line drawings, which preserve only the shapes of the objects, as from detailed color photographs,
which preserve many other attributes of the objects as well.
Visual processing can be divided into earlier and
later stages. In early stages, the perceptual system uses information on the retina, particularly variations in
intensity, to describe the object in terms of primitive components like lines, edges, and angles. The system uses
these components to construct a description of the object. In later stages, the system compares this description to
those of various categories of objects stored in visual memory and selects the best match.
Feature detectors in the cortex:
Single-cell studies were pioneered by David Hubel and Torsten Wiesel (1981), who identified three types of cells in
the visual cortex that can be distinguished by the features to which they respond.
1. Simple cells respond when the eye is exposed to a line stimulus (such as a thin bar or straight edge
between a dark and a light region) at a particular orientation and position within its receptive field.
2. A complex cell also responds to a bar or edge in a particular orientation, but it does not require that the
stimulus be at a particular place within its receptive field. It will respond continuously as the stimulus is
moved across that field.
3. Hypercomplex cells require not only that the stimulus be in a particular orientation, but also that it be of
a particular length. If a stimulus is extended beyond the optimal length, the response will decrease and
may cease entirely.
All of the cells described above are referred to as feature detectors. Because the edges, bars, corners, and angles
to which these detectors respond can be used to approximate many shapes, the feature detectors might be
thought of as the building blocks of shape perception.
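The differences among the three cell types can be summarized in a toy model. The sketch below is a deliberate simplification and not a model of real neurons: a stimulus is reduced to an orientation, a position, and a length in arbitrary units, and the class and function names are hypothetical. It also does not model how a hypercomplex cell responds to bars shorter than the optimal length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    orientation_deg: int  # 0 = horizontal, 90 = vertical
    position: int         # location within the receptive field (arbitrary units)
    length: int           # bar length (arbitrary units)


def simple_cell(bar, pref_orientation, pref_position):
    """Fires only for the preferred orientation at the preferred position."""
    return bar.orientation_deg == pref_orientation and bar.position == pref_position


def complex_cell(bar, pref_orientation):
    """Fires for the preferred orientation anywhere in the receptive field."""
    return bar.orientation_deg == pref_orientation


def hypercomplex_cell(bar, pref_orientation, optimal_length):
    """Graded response in [0, 1] that weakens once the bar exceeds the optimal length."""
    if bar.orientation_deg != pref_orientation:
        return 0.0
    if bar.length <= optimal_length:
        return 1.0
    excess = bar.length - optimal_length
    return max(0.0, 1.0 - excess / optimal_length)


if __name__ == "__main__":
    shifted = Bar(orientation_deg=90, position=3, length=4)
    print(simple_cell(shifted, 90, 2))   # wrong position for this simple cell
    print(complex_cell(shifted, 90))     # position does not matter here
    print(hypercomplex_cell(Bar(90, 2, 6), 90, optimal_length=4))
```

Moving the bar within the field silences the simple cell but not the complex cell, and lengthening it past the optimum reduces the hypercomplex cell's response, which is the pattern the list above describes.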

.Simple cells respond to visual stimuli which are in a particular orientation, in a particular place in the
receptive field.
Later stages of recognition: network models:
Now that we have some idea of how an object's shape is described, we can consider how that description is
matched to shape descriptions stored in memory to find the best match.
Simple networks:
The basic idea is that letters are described in terms of certain features, and that knowledge about what features
go with what letter is contained in a network of connections. Such proposals are referred to as connectionist models.
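A minimal sketch of such a network is shown below. It is an illustration only: the feature inventory, the three letters, and the scoring rule (one point for each matching feature, one point off for each unexpected or missing feature) are assumptions made for this example, not details given in the text.

```python
# Hypothetical feature inventory, chosen only for illustration.
LETTER_FEATURES = {
    "K": {"vertical", "diagonal_up", "diagonal_down"},
    "R": {"vertical", "curve", "diagonal_down"},
    "P": {"vertical", "curve"},
}


def activation(letter, observed):
    """Net input to a letter node from the set of observed features."""
    expected = LETTER_FEATURES[letter]
    excitation = len(observed & expected)  # observed features the letter has
    inhibition = len(observed - expected)  # observed features the letter lacks
    missing = len(expected - observed)     # expected features that were not observed
    return excitation - inhibition - missing


def recognize(observed):
    """Return the letter whose node is most strongly activated."""
    return max(LETTER_FEATURES, key=lambda letter: activation(letter, observed))


if __name__ == "__main__":
    print(recognize({"vertical", "curve"}))
    print(recognize({"vertical", "curve", "diagonal_down"}))
```

The knowledge about which features go with which letter lives entirely in the table of connections; recognition is just a matter of finding the most strongly activated letter node.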
Connectionist models:
These models are appealing in that it is easy to conceive how networks could be realized in the brain with its
array of interconnected neurons. Thus, connectionism offers a bridge between psychological and biological
Recognizing natural objects:
Features of natural objects:
The shape features of natural objects are more complex than lines and curves, and more like simple geometric
forms. These features must be such that they can combine to form the shape of any recognizable object (just as
lines and curves can combine to form any letter). The features of objects must also be such that they can be
determined or constructed from more primitive features, such as lines and curves, because, as noted earlier,
primitive features are the only information available to the system in the early stages of recognition. One
popular though controversial suggestion is that the features of objects include a number of geometric forms,
such as cylinders, cones, blocks, and wedges. These features, referred to as geons (short for 'geometric ions'),
were identified by Biederman (1987), who argues that a set of 36 geons, combined according to a small set of
spatial relations, is sufficient to describe the shapes of all objects that people can possibly recognize.

.When simple two-dimensional features, such as lines and angles are combined, a new object is perceived that
cannot be understood by examining the component parts. The new characteristics are known as emergent

.A suitcase can be described as the combination of a cube and an arc, a pail as a cylinder and an arc, and a
flashlight as two cylinders and a block. These descriptive geometric features are known as geons.
The importance of context:
Bottom-up processes are driven solely by the input – the raw, sensory data – whereas top-down processes are
driven by a person’s knowledge, experience, attention, and expectations.
Top-down processes, in the form of expectations, underlie the powerful effects of context on our perception of
objects and people.

.When you see your professor in the supermarket, you have trouble recognizing him. What best explains this?
You have used bottom-up processing instead of top-down processing.

.Because of top-down processing, you may perceive a red ball on the kitchen table as an apple.
Special processing of socially relevant stimuli: face recognition:
The social importance of faces, combined with inherent recognition difficulties resulting from their similarity to
one another, has apparently led to the development of special recognition processes that are employed for faces
but not for objects.
Prosopagnosia is a syndrome that can arise following brain injury, in which a person is completely unable to
identify faces but retains the ability to recognize objects.

.Prosopagnosia is an inability to recognize faces. It is an example of a breakdown of recognition called

Abstraction is the process of converting the raw sensory information acquired by the sense organs (for example,
patterns of straight and curved lines) into abstract categories that are pre-stored in memory (for example, letters
or words). Abstracted information takes less space and is therefore faster to work with than raw information. A
useful analogy is the contrast between a bitmapped computer image of a face and an abstracted image of the same face
that is made up of preformed structures such as ovals and lines. When saved as files, the original, freehand version
required 30,720 bytes of memory, while the 'abstracted' version required only 902 bytes.
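The size of the saving in this analogy is easy to work out. The sketch below uses only the two file sizes quoted in the text; the function names are hypothetical, and nothing here says how the files themselves were encoded.

```python
RAW_BYTES = 30_720      # bitmapped, freehand version (figure from the text)
ABSTRACTED_BYTES = 902  # version built from ovals and lines (figure from the text)


def compression_ratio(raw_bytes, abstracted_bytes):
    """How many times larger the raw representation is."""
    return raw_bytes / abstracted_bytes


def fraction_saved(raw_bytes, abstracted_bytes):
    """Share of the raw storage that abstraction makes unnecessary."""
    return 1 - abstracted_bytes / raw_bytes


if __name__ == "__main__":
    print(f"ratio: {compression_ratio(RAW_BYTES, ABSTRACTED_BYTES):.1f} to 1")
    print(f"saved: {fraction_saved(RAW_BYTES, ABSTRACTED_BYTES):.1%}")
```

The abstracted version is about 34 times smaller than the bitmap, a saving of roughly 97 percent.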
The function of perceptual constancy is to keep the appearance of objects the same in spite of large variations in the initial
representations of the stimuli received by the sense organs that are engendered by various environmental factors.
Color and brightness constancy entail perceiving the actual color and brightness of a stimulus even when the
actual information arriving at the eye varies in color makeup (because of the color makeup of the ambient
lighting) and in brightness (because of the level of ambient illumination).
Size constancy entails perceiving the actual size of a stimulus even when the actual size of the object’s image
on the retina varies because of the object’s distance. Although visual constancies are the most salient,
constancies exist in all sensory modalities.
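Size constancy can be illustrated with the geometry of the retinal image. The sketch below is one common way of framing the idea (combining the angle an object subtends with its distance) and is offered as an assumption, not as the mechanism the text proposes. The function names and the example person height are hypothetical.

```python
import math


def retinal_angle_deg(object_size_m, distance_m):
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))


def inferred_size_m(angle_deg, distance_m):
    """Object size recovered from its visual angle and its distance."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)


if __name__ == "__main__":
    height = 1.8  # meters, an arbitrary example
    for distance in (2.0, 10.0):
        angle = retinal_angle_deg(height, distance)
        size = inferred_size_m(angle, distance)
        print(f"at {distance:>4} m: image {angle:.1f} deg, inferred size {size:.2f} m")
```

The image of the same object shrinks as it moves away, yet taking distance into account recovers the same physical size each time.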
The neural basis of attention:
Three separate brain systems seem to mediate the psychological act of selecting an object to attend to.
The first system is generally associated with arousal.
The second, or posterior system, selects objects on the basis of location, shape, or color.
The third, or anterior system, is responsible for guiding this process, depending on the goals of the viewer.
The visual cortex operates according to the principle of division of labor. Localization is mediated by a region
near the top of the cortex, and recognition by a region near the bottom of the cortex. Recognition processes are
further subdivided into separate modules such as color, shape, and texture.

