Lecture 5
Aude Oliva
PPA
Definition
• A scene is a view of a real-world environment
that contains multiple surfaces and objects,
organized in a meaningful way.
http://cvcl.mit.edu/SUNSarticles.htm
I. Rapid Visual Scene Recognition
We move our eyes every 300 msec on average.
How do humans recognize natural images in a single glance?
Demonstrations
[Figure: scenes A, B, C and D; systematic scene memory distortion versus the correct answer]
Oliva, A. (2005). Gist of a scene. In Neurobiology of Attention. Eds. L. Itti, G. Rees and J. Tsotsos. Academic Press, Elsevier.
Rapid Scene “Gist” Understanding: Mechanism of Recognition
• Mary Potter (1975, 1976) demonstrated that during rapid serial
visual presentation (100 msec per image), a novel picture is
understood almost instantly, and observers seem to grasp a great deal
of visual information.
• But a delay of a few hundred msec (~300 msec) is required for the
picture to be consolidated in memory.
[Figure: RSVP sequence (Pict 1, interval, Pict 2, interval, Pict 3, interval) followed by an “Old or New?” recognition test]
http://suns.mit.edu/SUnS07Slides/FabreThorpe_SUnS07.pdf
Kirchner & Thorpe (2006): saccadic responses 180 msec after image presentation
http://suns.mit.edu/SUnS07Slides/Thorpe_SUnS07.pdf
Evans & Treisman (2005): An RSVP task
Hypothesis: performance should deteriorate when the distractor scenes
share some of the same features with the targets.
[Figure: % of correct target detection (0-100%) with non-human vs. human distractors]
Evans & Treisman: Results
[Figure: % of correct target detection (0-100%) for animal targets and vehicle targets, each with non-human vs. human distractors]
Feature sets such as parts of the head, body and hair are shared between animals and
humans: this level of “part” information may have helped the recognition of animals in
previous studies.
Scene Representation
Time course of visual information
within a glance
- Definition: what is the “gist”?
- A few observations: getting the gist of a scene
- How does spatial frequency information unfold?
- What is the role of color?
- What are the global properties of a scene?
Hybrid Images: A method to study human image analysis
[Figure: hybrid image combining Albert Einstein and Marilyn Monroe]
Superordinate Classification
Task: binary classification into superordinate categories.
Result: 80% correct classification at a spatial resolution of
8 cycles/image (images of 16 x 16 pixels).
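The 16 x 16 pixel figure follows from the sampling theorem: a pattern of 8 cycles/image needs at least two pixels per cycle. The sketch below is our own minimal illustration on a single image row, not the filtering procedure used in the study; the function names and test signal are hypothetical. It low-passes a signal at a cutoff expressed in cycles/image by zeroing every DFT coefficient above it (a 2-D version would apply the cutoff to the 2-D spectrum).

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real 1-D signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part (the filtered spectrum stays conjugate-symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def lowpass_cycles_per_image(row, cutoff):
    """Zero every component whose frequency exceeds `cutoff` cycles per image."""
    n = len(row)
    spectrum = dft(row)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k hold the same |frequency|
        if freq > cutoff:
            spectrum[k] = 0
    return idft(spectrum)


def min_pixels(cycles_per_image):
    """Nyquist limit: at least two samples are needed per cycle."""
    return 2 * cycles_per_image


N = 64
coarse = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]      # 2 cycles/image
fine = [0.5 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]  # 20 cycles/image
row = [c + f for c, f in zip(coarse, fine)]
filtered = lowpass_cycles_per_image(row, cutoff=8)  # keeps only the coarse component
```

With a cutoff of 8 cycles/image the 20-cycle detail disappears and only the coarse layout survives, which is the kind of information left in the 16 x 16 images.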
Scene Identification: Basic-Level
Task: identify the basic-level category of the scene (scenes from 24 different semantic
categories).
Result: 80% correct classification at a spatial resolution of 8 cycles/image for grey-level
scenes, and of 4 cycles/image for colored scenes.
Oliva, A., & Schyns, P.G. (2000). Colored diagnostic blobs mediate scene recognition. Cognitive Psychology
Edges or Blobs?
• Scenes can be identified at the superordinate and the basic level
with only a coarse spatial layout (resolution of 4-8 cycles/image)
• At such a coarse spatial resolution, local object identity is
not available
• Object identity can be inferred after identifying the scene
• But … natural images are usually characterized by contours, and our
visual system encodes edges.
Torralba & Oliva, 2001
Hybrid images make it possible to study the roles of “blobs” and “edges” concurrently
in fast scene recognition. Which information do we process first?
Schyns & Oliva (1994, 1997), Oliva (1995), Oliva & Schyns (1997)
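A hybrid superimposes the low spatial frequencies (“blobs”) of one scene on the high spatial frequencies (“edges”) of another. The sketch below assumes grayscale images stored as equal-sized lists of lists and uses Gaussian filters; the Gaussian choice, the sigma values and the toy “scenes” are illustrative assumptions, not the parameters of Schyns & Oliva. The low-pass part is a blur of the first scene; the high-pass part is the second scene minus its own blur.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(w * row[min(max(i + j - r, 0), n - 1)] for j, w in enumerate(kernel))
            for i in range(n)]


def gaussian_blur(img, sigma):
    """Separable 2-D Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def hybrid_image(img_low, img_high, sigma_low, sigma_high):
    """Low frequencies of img_low plus high frequencies of img_high."""
    low = gaussian_blur(img_low, sigma_low)
    high_blur = gaussian_blur(img_high, sigma_high)
    return [[lo + (hi - hb) for lo, hi, hb in zip(lr, hr, br)]
            for lr, hr, br in zip(low, img_high, high_blur)]


# Hypothetical toy "scenes": scene_a supplies the blobs, scene_b the edges.
scene_a = [[float((x + y) % 8) for x in range(12)] for y in range(12)]
scene_b = [[float((x * y) % 5) for x in range(12)] for y in range(12)]
hybrid = hybrid_image(scene_a, scene_b, sigma_low=2.0, sigma_high=1.0)
```

Because the low-pass and high-pass parts of the same image sum back to the image, `hybrid_image(img, img, s, s)` returns `img` itself, a handy sanity check.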
Exp 1: Detection Task
Subjects were not aware that the images were hybrids.
[Figure: detection sequence in which an LF + HF hybrid is shown for 30 msec; % correct (0-80%) for targets matching the LF vs. the HF component of the hybrid, at 30 ms and at 120 ms; LSF-hybrid example with the response options “hall” or “city”]
Oliva, A. (2005). Gist of a scene. In Neurobiology of Attention. Eds. L. Itti, G. Rees and J. Tsotsos. Academic Press, Elsevier.
Color Diagnosticity
Man-made categories: no specific colour mode
Natural categories: specific and distinctive colour modes
Hypothesis:
Oliva & Schyns (2000). Colored diagnostic blobs mediate scene recognition. Cognitive Psychology.
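RGB space -> L*a*b*
[Figure: Lab luminance; results (scale 700-860) for natural (Nat) and artificial (Art) scene categories in the normal (Norm), luminance-only (Lum) and abnormal (Abn) color conditions]
• Color helps scene identification, but only when it is a diagnostic feature of the scene category.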
Oliva & Schyns (2000). Colored diagnostic blobs mediate scene recognition. Cognitive Psychology.
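The RGB -> L*a*b* step separates luminance (L*) from two opponent color axes (a*: green-red, b*: blue-yellow), which is what lets luminance-only and color conditions be compared. A minimal sketch of the standard CIELAB conversion, assuming 8-bit sRGB input and a D65 white point; both are assumptions, since the slide does not say which RGB space or white point was used.

```python
def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel (0-255 -> 0-1)."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (assumed D65 white point)."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29  # linear segment near black

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)
    b_star = 200 * (fy - fz)
    return lightness, a_star, b_star
```

For example, `rgb_to_lab(255, 0, 0)` gives roughly L* = 53, a* = 80, b* = 67: a strongly “red” a* value at medium lightness.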
The role of diagnostic color
Oliva & Schyns (2000). Colored diagnostic blobs mediate scene recognition. Cognitive Psychology.
The role of Color & Brain Signals
Diagnostic colors contribute to early stages of scene recognition
[Figure: brain-signal time course (50-225 msec) for normal-color, grayscale and abnormal-color scenes]
Goffaux, V., Jacques, C., Mouraux, A., Oliva, A., Rossion, B., & Schyns, P.G. (2005). Visual Cognition.
Scene Representation
Time course of visual information
within a glance
Some simple features are correlated with scene recognition
Irving Biederman
Forest Before Trees: The Precedence of Global Features in Visual Perception
Navon (1977)
Oliva & Torralba (2001). International Journal of Computer Vision. Torralba & Oliva (2002). PAMI.
Oliva & Torralba (2002). 2nd Workshop on Biologically Motivated Computer Vision.
Part-based approach: e.g. objects
If you knew the identity of all the objects in a scene, recognition would be perfect
– Schemas (Bartlett; Piaget; Rumelhart)
– Scripts (Schank)
– Frames (Minsky)
Holistic approach: global surface properties
A flat frontal surface projects an array of stimuli on the retina whose gradient
(interval between stimuli) is constant
J. J. Gibson
Textural Signatures of Visual Scenes
“Flat longitudinal surface”
A flat longitudinal surface projects an array of stimuli on the retina whose gradient
decreases and nears the center of the retina with increasing distance from the observer
Textural Signatures of Visual Scenes
“Flat slanting surface”
A flat slanting surface projects an array of stimuli on the retina whose gradient
decreases and nears the center of the retina either more or less rapidly than that of
a longitudinal surface.
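The three gradient descriptions can be checked with a pinhole projection. In the sketch below (our own illustration; the camera height, starting distance and marking spacing are hypothetical parameters), evenly spaced markings lie on a plane seen by a camera looking horizontally. A slant of 0° gives a frontal surface, 90° a longitudinal (ground) surface 1.5 units below the eye, and 45° one intermediate slant. Image position 0 is the center of the image.

```python
import math


def projected_positions(slant_deg, n_marks=6, y0=-1.5, z0=2.0, spacing=1.0, focal=1.0):
    """Image heights of evenly spaced markings on a plane with the given slant."""
    slant = math.radians(slant_deg)
    positions = []
    for i in range(n_marks):
        s = i * spacing                  # distance travelled along the surface
        y = y0 + s * math.cos(slant)     # height of the marking
        z = z0 + s * math.sin(slant)     # depth of the marking
        positions.append(focal * y / z)  # pinhole projection onto the image plane
    return positions


def intervals(positions):
    """Gaps between successive projected markings (the texture gradient)."""
    return [b - a for a, b in zip(positions, positions[1:])]


frontal = intervals(projected_positions(0))    # constant gaps
ground = intervals(projected_positions(90))    # gaps shrink quickly with distance
slanted = intervals(projected_positions(45))   # gaps shrink, but more slowly here
```

For the ground plane the projected markings also converge toward position 0, the center of the image, as distance increases. This sketch covers only one intermediate slant and one viewing geometry.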
Textural Signatures of Visual Scenes
“A rounded surface”
When increasing the size of the space, natural environment structures become larger
and smoother.
Torralba & Oliva. (2002). Depth estimation from image structure. IEEE Pattern Analysis and Machine Intelligence
Hints of Globality: Spatial Structure
Forests are “enclosed”
[Figure: example scenes, e.g. a forest and a lake]
[Figure: scene-centered representation versus object-centered representation of a “Street” scene]
Oliva et al. (1999); Oliva & Torralba (2001, 2002, 2006); Torralba & Oliva (2002, 2003); Greene & Oliva (2006, in revision)
Spatial Envelope Representation
Global properties diagnostic of the space the scene subtends provide the basic-level category of the scene.
• Mean depth, openness, perspective, expansion
• Content of the space: naturalness, roughness (from a lack of texture to a high-spatial-frequency isotropic texture)
[Figure: street, highway, skyscraper and city-center scenes arranged by degree of openness and degree of expansion]
Oliva & Torralba (2001). The spatial envelope model
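Roughness is described as ranging from a lack of texture to high-spatial-frequency isotropic texture. A crude one-dimensional proxy, offered as an illustration and not as the spatial envelope model's actual estimator, is the fraction of spectral energy above a cutoff frequency along an image row. The function names, the cutoff and the test rows are assumptions; a real measure would use the 2-D spectrum to capture isotropy.

```python
import cmath
import math


def power_spectrum(row):
    """DFT power of a row after removing its mean (overall brightness)."""
    n = len(row)
    mean = sum(row) / n
    centered = [v - mean for v in row]
    return [abs(sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]


def high_frequency_fraction(row, cutoff):
    """Share of the energy above `cutoff` cycles/image: near 0 = smooth, near 1 = rough."""
    n = len(row)
    power = power_spectrum(row)
    total = sum(power)
    if total == 0:
        return 0.0  # a perfectly uniform row has no texture at all
    high = sum(p for k, p in enumerate(power) if min(k, n - k) > cutoff)
    return high / total


n = 32
smooth = [math.sin(2 * math.pi * t / n) for t in range(n)]  # 1 cycle/image
rough = [1.0 if t % 2 == 0 else -1.0 for t in range(n)]     # finest possible texture
```

On these rows the smooth gradient scores close to 0 and the alternating pattern close to 1, matching the two ends of the roughness axis.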
Spatial Envelope Theory of Scene Recognition
[Figure: object-centered versus scene-centered representation; the scene-centered representation includes properties such as potential for navigation and mean depth]
Greene & Oliva (2008). Recognition of Natural Scenes from Global Properties: Seeing the Forest Without Representing the Trees. Cognitive Psychology
Database
[Figure: example scenes: desert, field, forest, lake]
A scene-centered classifier predicts correct performance.
[Figure: example classifier outputs (“desert”, field) and errors (ocean, river, desert)]
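To make the idea of a scene-centered classifier concrete: each image is described only by a vector of global properties, never by its objects. The sketch below uses a nearest-centroid rule as a stand-in (not necessarily the classifier Greene & Oliva used), and the property ratings are invented for illustration, not data from the study.

```python
import math

# Hypothetical 0-1 ratings per training image:
# (openness, mean_depth, naturalness, roughness). Invented values, not study data.
TRAINING = {
    "desert": [(0.90, 0.90, 1.00, 0.20), (0.85, 0.80, 0.95, 0.25)],
    "field": [(0.80, 0.50, 1.00, 0.40), (0.75, 0.55, 0.95, 0.45)],
    "forest": [(0.10, 0.30, 1.00, 0.90), (0.20, 0.35, 0.95, 0.85)],
    "lake": [(0.70, 0.70, 1.00, 0.30), (0.65, 0.75, 0.95, 0.35)],
}


def centroid(vectors):
    """Mean global-property vector of one category."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def fit(training):
    """One prototype (centroid) per basic-level category."""
    return {label: centroid(vectors) for label, vectors in training.items()}


def classify(vector, centroids):
    """Assign the category whose prototype is closest in global-property space."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


centroids = fit(TRAINING)
prediction = classify((0.88, 0.85, 0.97, 0.20), centroids)  # an open, deep, smooth scene
```

Because categories such as lake and field sit close together in this space, a rule like this also makes structured errors, which is what allows a classifier's confusions to be compared with human ones.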
Scene Classification from “Texture”