INTRODUCTORY
TECHNIQUES
for
3-D COMPUTER
VISION

Emanuele Trucco
Alessandro Verri

nunc agere incipiam tibi, quod vementer ad has res
attinet, esse ea quae rerum simulacra vocamus.
(you shall now see me begin to deal with what is of high importance to the
subject, and to show that there exists what we call images of things.)
Lucretius, De Rerum Natura, 4.24-44
Contents

Foreword

Preface: About this Book

1 Introduction
1.1 What is Computer Vision?
1.2 The Many Faces of Computer Vision
1.2.1 Related Disciplines
1.2.2 Research and Application Areas
1.3 Exploring the Computer Vision World
1.3.1 Conferences, Journals, and Books
1.3.2 Internet
1.3.3 Some Hints on Math Software
1.4 The Road Ahead

2 Digital Snapshots
2.1 Introduction
2.2 Intensity Images
2.2.1 Main Concepts
2.2.2 Basic Optics
2.2.3 Basic Radiometry
2.2.4 Geometric Image Formation
2.3 Acquiring Digital Images
2.3.1 Basic Facts
2.3.2 Spatial Sampling
2.3.3 Acquisition Noise and How to Estimate It
2.4 Camera Parameters
2.4.1 Definitions
2.4.2 Extrinsic Parameters
2.4.3 Intrinsic Parameters
2.4.4 Camera Models Revisited
2.5 Range Data and Range Sensors
2.5.1 Representing Range Images
2.5.2 Range Sensors
2.5.3 Active Triangulation
2.5.4 A Simple Sensor
2.6 Summary
2.7 Further Readings
2.8 Review

3 Dealing with Image Noise
3.1 Image Noise
3.1.1 Gaussian Noise
3.1.2 Impulsive Noise
3.2 Noise Filtering
3.2.1 Smoothing by Averaging
3.2.2 Gaussian Smoothing
3.2.3 Are Our Samples Really Gaussian?
3.2.4 Nonlinear Filtering
3.3 Summary
3.4 Further Readings
3.5 Review

4 Image Features
4.1 What Are Image Features?
4.2 Edge Detection
4.2.1 Basics
4.2.2 The Canny Edge Detector
4.2.3 Other Edge Detectors
4.2.4 Concluding Remarks on Edge Detection
4.3 Point Features: Corners
4.4 Surface Extraction from Range Images
4.4.1 Defining Shape Classes
4.4.2 Estimating Local Shape
4.5 Summary
4.6 Further Readings
4.7 Review

5 More Image Features
5.1 Introduction: Line and Curve Detection
5.2 The Hough Transform
5.2.1 The Hough Transform for Lines
5.2.2 The Hough Transform for Curves
5.2.3 Concluding Remarks on Hough Transforms
5.3 Fitting Ellipses to Image Data
5.3.1 Euclidean Distance Fit
5.3.2 Algebraic Distance Fit
5.3.3 Robust Fitting
5.3.4 Concluding Remarks on Ellipse Fitting
5.4 Deformable Contours
5.4.1 The Energy Functional
5.4.2 The Elements of the Energy Functional
5.4.3 A Greedy Algorithm
5.5 Line Grouping
5.6 Summary
5.7 Further Readings
5.8 Review

6 Camera Calibration
6.1 Introduction
6.2 Direct Parameter Calibration
6.2.1 Basic Equations
6.2.2 Focal Length, Aspect Ratio, and Extrinsic Parameters
6.2.3 Estimating the Image Center
6.3 Camera Parameters from the Projection Matrix
6.3.1 Estimation of the Projection Matrix
6.3.2 Computing Camera Parameters
6.4 Concluding Remarks
6.5 Summary
6.6 Further Readings
6.7 Review

7 Stereopsis
7.1 Introduction
7.1.1 The Two Problems of Stereo
7.1.2 A Simple Stereo System
7.1.3 The Parameters of a Stereo System
7.2 The Correspondence Problem
7.2.1 Basics
7.2.2 Correlation-Based Methods
7.2.3 Feature-Based Methods
7.2.4 Concluding Remarks
7.3 Epipolar Geometry
7.3.1 Notation
7.3.2 Basics
7.3.3 The Essential Matrix, E
7.3.4 The Fundamental Matrix, F
7.3.5 Computing E and F: The Eight-Point Algorithm
7.3.6 Locating the Epipoles from E and F
7.3.7 Rectification
7.4 3-D Reconstruction
7.4.1 Reconstruction by Triangulation
7.4.2 Reconstruction up to a Scale Factor
7.4.3 Reconstruction up to a Projective Transformation
7.5 Summary
7.6 Further Readings
7.7 Review

8 Motion
8.1 Introduction
8.1.1 The Importance of Visual Motion
8.1.2 The Problems of Motion Analysis
8.2 The Motion Field of Rigid Objects
8.2.1 Basics
8.2.2 Special Case 1: Pure Translation
8.2.3 Special Case 2: Moving Plane
8.2.4 Motion Parallax
8.2.5 The Instantaneous Epipole
8.3 The Notion of Optical Flow
8.3.1 The Image Brightness Constancy Equation
8.3.2 The Aperture Problem
8.3.3 The Validity of the Constancy Equation: Optical Flow
8.4 Estimating the Motion Field
8.4.1 Differential Techniques
8.4.2 Feature-Based Techniques
8.5 Using the Motion Field
8.5.1 3-D Motion and Structure from a Sparse Motion Field
8.5.2 3-D Motion and Structure from a Dense Motion Field
8.6 Motion-Based Segmentation
8.7 Summary
8.8 Further Readings
8.9 Review

9 Shape from Single-image Cues
9.1 Introduction
9.2 Shape from Shading
9.2.1 The Reflectance Map
9.2.2 The Fundamental Equation
9.3 Finding Albedo and Illuminant Direction
9.3.1 Some Necessary Assumptions
9.3.2 A Simple Method for Lambertian Surfaces
9.4 A Variational Method for Shape from Shading
9.4.1 The Functional to be Minimized
9.4.2 The Euler-Lagrange Equations
9.4.3 From the Continuous to the Discrete Case
9.4.4 The Algorithm
9.4.5 Enforcing Integrability
9.4.6 Some Necessary Details
9.5 Shape from Texture
9.5.1 What is Texture?
9.5.2 Using Texture to Infer Shape: Fundamentals
9.5.3 Surface Orientation from Statistical Texture
9.5.4 Concluding Remarks
9.6 Summary
9.7 Further Readings
9.8 Review

10 Recognition
10.1 What Does it Mean to Recognize?
10.2 Interpretation Trees
10.2.1 An Example
10.2.2 Wild Cards and Spurious Features
10.2.3 A Feasible Algorithm
10.3 Invariants
10.3.1 Introduction
10.3.2 Definitions
10.3.3 Invariant-Based Recognition Algorithms
10.4 Appearance-Based Identification
10.4.1 Images or Features?
10.4.2 Image Eigenspaces
10.5 Concluding Remarks on Object Identification
10.6 3-D Object Modeling
10.6.1 Feature-Based and Appearance-Based Models
10.6.2 Object Versus Viewer-Centered Representations
10.6.3 Concluding Remarks
10.7 Summary
10.8 Further Readings
10.9 Review

11 Locating Objects in Space
11.1 Introduction
11.2 Matching from Intensity Data
11.2.1 3-D Location from a Perspective Image
11.2.2 3-D Location from a Weak-Perspective Image
11.2.3 Pose from Ellipses
11.2.4 Concluding Remarks
11.3 Matching from Range Data
11.3.1 Estimating Translation First
11.3.2 Estimating Rotation First
11.3.3 Concluding Remarks
11.4 Summary
11.5 Further Readings
11.6 Review

A Appendix
A.1 Experiments: Good Practice Hints
A.2 Numerical Differentiation
A.3 The Sampling Theorem
A.4 Projective Geometry
A.5 Differential Geometry
A.6 Singular Value Decomposition
A.7 Robust Estimators and Model Fitting
A.8 Kalman Filtering
A.9 Three-Dimensional Rotations

Index
Foreword

Until recently, computer vision was regarded as a field of research still in its infancy, not yet mature and stable enough to be considered part of a standard curriculum in computer science. As a consequence, most books on computer vision became obsolete as soon as they were published. No book thus far has ever managed to provide a comprehensive overview of the field, since even the good ones focus on a narrow subarea, typically the author's research endeavor.

With Trucco and Verri, the situation has finally changed. Their book promises to be the first true textbook of computer vision, the first to show that computer vision is now a mature discipline with solid foundations. Among connoisseurs, the authors are well known as careful and critical experts in the field. (I am proud to have figured in the career of one of them: Alessandro Verri worked with me at MIT for a short year, and it was a joy to work with him.)

Over the years I have been asked many times by new graduate students or colleagues what to read in order to learn about computer vision. Until now, my answer was that I could not recommend any single book. As a substitute, I would suggest an ever-changing list of existing books together with a small collection of specific papers. From now on, however, my answer is clear: Introductory Techniques for 3-D Computer Vision is the text to read.

I personally believe that Introductory Techniques for 3-D Computer Vision will be the standard textbook for graduate and undergraduate courses on computer vision in years to come. It is an almost perfect combination of theory and practice. It provides a complete introduction to computer vision, effectively giving the basic background for practitioners and future researchers in the field.

Trucco and Verri have written a textbook that is exemplary in its clarity of exposition and in its intentions. Despite the initial warning ("Fra il dire e il fare c'è di mezzo il mare"¹), the objectives stated in the preface are indeed achieved. The book not only places a correctly balanced emphasis on theory and practice but also provides needed material about typically neglected but important topics such as measurements, calibration, SVD, robust estimation, and numerical differentiation.

¹ Between words and deeds, there lies the sea.

Computer vision is just now maturing from an almost esoteric corner of research to a key discipline in computer science. In the last couple of years, the first billion-dollar computer vision companies have emerged, a phenomenon no doubt facilitated by the irrational exuberance of the stock market. We will undoubtedly see many more commercial applications of computer vision in the near future, ranging from industrial inspection and measurements to security, database search, surveillance, multimedia, and computer interfaces. This is a transition that other fields in engineering, such as signal processing and computer graphics, underwent long ago. Trucco and Verri's timely book is the first to represent the discipline of computer vision in its new, mature state, as the industries and applications of computer vision grow and mature as well. As it reaches adulthood, computer vision is still far from being a solved problem. The most exciting developments, discoveries, and applications lie ahead of us. Though a similar statement can be made about most areas of computer science, it is true for computer vision in a much deeper sense than, say, for databases or graphics. After all, understanding the principles of vision has implications far beyond engineering, since visual perception is one of the key modules of human intelligence. Ultimately, understanding the problem of vision is likely to help us understand the brain. For this reason, I am sure that a long and successful series of new editions will follow this book, with updates most likely to come in the chapters dedicated to object recognition and in new hot topics such as adaptation and learning.

Introductory Techniques for 3-D Computer Vision is much more than a good textbook: It is the first book to mark the coming of age of our own discipline, computer vision.

Tomaso Poggio
Cambridge, MA
Brain Sciences Department and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Preface: About this Book

Here take this book and peruse it well.
Christopher Marlowe, Doctor Faustus

Fra il dire e il fare c'è di mezzo il mare.¹
Italian proverb

What this Book is and is Not

This book is meant to be:

• an applied introduction to the problems and solutions of modern computer vision.
• a practical textbook, teaching how to develop and implement algorithms for representative problems.
• a structured, easy-to-follow textbook, in which each chapter concentrates on a specific problem and solves it building on previous results, and all chapters form a logical progression.
• a collection of selected, well-tested methods (theory and algorithms), aiming to balance difficulty and applicability.
• a starting point to understand and investigate the literature of computer vision, including conferences, journals, and Internet sites.
• a self-teaching tool for research students, academics, and professional scientists.

This book is not meant to be:

• an all-embracing book on computer vision and image processing.
• a book reporting research results that only specialists can appreciate: It is meant for teaching.
• an exhaustive or historical review of methods and algorithms proposed for each problem.

The choice of topics has been guided by our feeling as practitioners. There is no implication whatsoever that what is left out is unimportant. A selection has been imposed by space limits and the intention of explaining both theory and algorithms to the level of detail necessary to make implementation really possible.

¹ Between words and deeds, there lies the sea.
What are the Objectives of this Book?

• To introduce the fundamental problems of computer vision.
• To enable the reader to implement solutions for reasonably complex problems.
• To develop two parallel tracks, showing how fundamental problems are solved using both intensity and range images, the two most popular types of images in today's computer vision community.
• To enable the reader to make sense of the literature of computer vision.
What is the Reader Expected to Know?

This book has been written for people interested in programming solutions to computer vision problems. The best way of reading it is to try out the algorithms on a computer. We assume that the reader is able to translate our pseudocode into computer programs, and therefore that he or she is familiar with a language suitable for numerical computations (for instance, C or Fortran). We also expect that the reader has access to popular numerical libraries like the Numerical Recipes or Meschach, or to high-level languages for developing numerical software, like MATLAB, Mathematica, or Scilab.²

The whole book is non-language-specific. We have endeavored to present all the necessary vision-specific information, so that the reader only needs some competence in a programming language.

Although some of the mathematics may appear complex at first glance, the whole book revolves around basic calculus, linear algebra (including least squares, eigenvectors, and singular value decomposition), and the fundamentals of analytic and projective geometry.
Who can Benefit from this Book?

• Students of university courses on computer vision, typically final-year undergraduates or postgraduates of degrees like Computer Science, Engineering, Mathematics, and Physics. Most of the knowledge required to read this book should be part of their normal background.
• Researchers looking for a modern presentation of computer vision, as well as a collection of practical algorithms covering the main problems of the discipline.
• Teachers and students of professional training courses.
• Industry scientists and academics interested in learning the fundamentals and the practical aspects of computer vision.
² For information on this and the other packages mentioned here, see Chapter 1.
How is this Book Organized?

Each chapter is opened by a summary of its contents, and concluded by a self-check list of review questions, a concise guide to further readings, as well as exercises and suggestions for computer projects.

For each problem analyzed, we give:
1. a problem statement, defining the objective to be achieved.
2. a theoretical treatment of the problem.
3. one or two algorithms in pseudocode.
4. hints on the practical applicability of the algorithms.

A few mathematical concepts are crucial to the understanding of theory and algorithms, but not necessarily known to everybody. To make the book reasonably self-contained, we have included an appendix with several brief sections reviewing background topics. We tried to gear the appendix to the level of detail necessary to understand the discussion of the main text, in the attempt to avoid a mere list of no-value reminders.

We made an effort to keep the tone informal throughout, hopefully without relaxing too much the mathematical rigor.

The graphics have been designed to facilitate quick identification of important material. Problem statements, important definitions, and algorithms are enclosed in frames; hints and comments of practical relevance, including coding suggestions, appear in a different point size and are highlighted by a pointer (☞).

Finally, we have included in Chapter 1 information on the computer vision community, including pointers to Internet vision sites (software, images, and documents) and a list of the main publications, electronic newsletters, and conferences.
Suggestions for Instructors

The material in this text should be enough for two semesters at the senior undergraduate level, assuming three hours per week. Ultimately, this depends on the students' background, the desired level of detail, the choice of topics, and how much time is allocated to project work. Instructors may want to review some of the material in the appendix in the first few lectures of the course.

In case only one semester is available, we suggest two selections of topics:

• Stereo and Motion: Chapters 1 to 6 (image acquisition, noise attenuation, feature extraction, and calibration), then Chapters 7 (stereopsis) and 8 (motion analysis).
• Object Recognition: Chapters 1 to 6, then Chapters 10 (object recognition) and 11 (object location).

Ideally, the students should be assigned projects to implement and test at least some of the algorithms. It is up to the instructor to decide which ones, depending on how the course is structured, what existing software is available to students, and which parts of the book one wants to cover.
part of the book one wants to cover.appealing manne, na log progresion Bam Tre Alesandro Ver
Itseemstousthatthere isa shortage of such textbooks on computer vison, There Depot Gempuingand Dip inmate
ae books surveying large numbers tpis ad eenigues often ge and expensive, teal Engeeig ‘Send norman
Sometimes vage in many places hecase ofthe amount of mater incded: books Herc at Une Ueto
‘ery detailed on theory bt lacking algorithms and practical advice books neat or fearon aD
the specials, eporting advanced resas inspec research or applation areas butt Edinb ss Geno
litle use to tadens andbooks which are nearly completly out of date, Moreover, and i ava ae asia
‘not infrequently in computer vision, the style and contents of research articles makes
it dificult (sometimes close to impossible) to reimplement the algorithms reported.
‘When working on such articles for this book, we have tried to explain the theory in,
what seemed to us a more understandable manner, and to add details necessary for
implementation. Of course, we take full and sole responsiblity for our interpretation.
‘We hope our book fils gap, and satisfies areal demand. Whether or not we have
succeeded is for you, the eader, to decide, and we would be delighted to hear your
comments. Above all, we hope you enjoy reading this book and find it useful,
Acknowledgments

We are indebted to a number of persons who contributed in various ways to the making of this book.

We thank Dave Braunegg, Bob Fisher, Andrea Fusiello, Massimiliano Pontil, Claudio Uras, and Larry Wolff for their precious comments, which allowed us to remove several flaws from preliminary drafts. Thanks also to Massimiliano Aonzo, Adele Lorusso, Jean-Francois Lots, Alessandro Migliorini, Adriano Pascoletti, Piero Parodi, and Maurizio Pilu for their careful proofreading.

Many people kindly contributed various material which has been incorporated in the book; in the hope of mentioning them all, we want to thank Tiziana Aicardi, Bill Austin, Brian Calder, Stuart Clarke, Bob Fisher, Andrea Fusiello, Christian Fruhling, Alois Goller, Dave Lane, Gerald McGunnigle, Stephen McKenna, Alessandro Migliorini, Majid Mirmehdi, David Murray, Francesca Odone, Maurizio Pilu, Costas Plakas, Joseba Tena Ruiz, John Selkirk, Marco Straforini, Manickam Umasuthan, and Andy Wallace.

Thanks to Marco Campani, Marco Cappello, Bruno Caprile, Enrico De Micheli, Andrea Fusiello, Federico Girosi, Francesco Isgrò, Greg Michaelson, Pasquale Ottonello, and Vito Roberto for many useful discussions.

Our thanks to Chris Glennie and Jackie Harbor of Prentice-Hall UK, the former for taking us through the early stages of this adventure, the latter for following up with remarkably light-hearted patience the development of this book, which was peppered…

² This book was written while the author was with the Department of Physics at the University of Genova.
Introduction

"Ready when you are."
Big Trouble in Little China
1.1 What is Computer Vision?

This is the first, inescapable question of this book. Since it is very difficult to produce an uncontroversial definition of such a multifaceted discipline as computer vision, let us ask more precise questions. Which problems are we attempting to tackle? And how do we plan to solve them? Answering these questions will limit and define the scope of this book, and, in doing so, motivate our definition of computer vision.

The Problems of Computer Vision. The target problem of this book is computing properties of the 3-D world from one or more digital images. The properties that interest us are mainly geometric (for instance, shape and position of solid objects) and dynamic (for instance, object velocities). Most of the solutions we present assume that a considerable amount of image processing has already taken place; that is, new images have been computed from the original ones, or some image parts have been identified, to make explicit the information necessary to the target computation.

The Tools of Computer Vision. As the name suggests, computer vision involves computers interpreting images. Therefore, the tools needed by a computer vision system include hardware for acquiring and storing digital images in a computer, processing the images, and communicating results to users or other automated systems. This is a book about the algorithms of computer vision: it contains very little material about hardware, but hopefully enough to realize where digital images come from. This does not mean that algorithms and software are the only important aspects of a vision system. On the contrary, in some applications, one can choose the hardware and can engineer the scene to facilitate the task of the vision system; for instance, by controlling the illumination, using high-resolution cameras, or constraining the pose and location of the objects. In many situations, however, one has little or no control over the scene. For instance, in the case of outdoor surveillance or autonomous navigation in unknown environments, appropriate algorithms are the key to success.

We are now ready to define the scope of computer vision targeted by this book: a set of computational techniques aimed at estimating or making explicit the geometric and dynamic properties of the 3-D world from digital images.
1.2 The Many Faces of Computer Vision

An exhaustive list of all the topics covered by the term "computer vision" is difficult to collate, because the field is vast, multidisciplinary, and in continuous expansion: new, exciting applications appear all the time. So there is more to computer vision than this book can cover, and we complement our definition in the previous section with a quick overview of the main research and application areas, and some related disciplines.

1.2.1 Related Disciplines

Computer vision has been evolving as a multidisciplinary subject for about thirty years. Its contours blend into those of artificial intelligence, robotics, signal processing, pattern recognition, control theory, psychology, neuroscience, and other fields. Two consequences of the rapid growth and young age of the field of computer vision have been that:

• the objectives, tools, and people of the computer vision community overlap those of several other disciplines;
• the definition and scope of computer vision are still matters of discussion, so that all definitions should be taken with a grain of salt.

You are likely to come across terms like image analysis, scene analysis, and image understanding, which in this book we simply regard as synonyms for computer vision. Some other terms, however, denote disciplines closely related but not identical to computer vision. Here are the principal ones:
Image Processing. Image processing is a vast research area. For our purposes, it differs from computer vision in that it concerns image properties and image-to-image transformations, whereas the main target of computer vision is the 3-D world. As most computer vision algorithms require some preliminary image processing, the overlap between the two disciplines is significant. Examples of image processing include enhancement (computing an image of better quality than the original one), compression (devising compact representations for digital images, typically for transmission purposes), restoration (eliminating the effects of known degradations), and feature extraction (locating special image elements like contours or textured areas). A practical way to understand the difference between representative problems of image processing and computer vision is to compare the contents of Chapters 3, 4, and 5 with those of Chapters 6 to 11.
Pattern Recognition. For a long time, pattern recognition has produced techniques for recognizing and classifying objects using digital images. Many methods developed in the past worked well with 2-D objects or 3-D objects presented in constrained poses, but were unsuitable for the general 3-D world. This triggered much of the research which led to today's field of computer vision. This book does not cover classic pattern recognition, although some of its methods creep up here and there. The International Association for Pattern Recognition (IAPR) gathers many researchers and users interested in the field, and maintains a comprehensive WWW site (http://peipa.essex.ac.uk/iapr/).
Photogrammetry. Photogrammetry is concerned with obtaining reliable and accurate measurements from noncontact imaging. This discipline overlaps less with computer vision than image processing and pattern recognition. The main differences are that photogrammetry pursues higher levels of accuracy than computer vision, and not all of computer vision is related to measuring. Taking a look at photogrammetric methods before designing a vision system carrying out measurements is always a good idea. The International Society of Photogrammetry and Remote Sensing is the international organization promoting the advancement of photogrammetry. It maintains a very comprehensive Internet site (http://www.p.igp-ethz.ch/isprs/isprs.html), including archives and activities, and publishes the Journal of Photogrammetry and Remote Sensing.
1.2.2 Research and Application Areas

For the purposes of this section, research areas refer to topics addressed by a significant number of computer vision publications (a visible indicator of research), and application areas refer to domains in which computer vision methods are used, possibly in conjunction with other technologies, to solve real-world problems. The following lists and the accompanying figures should give you the flavor of the variety and scope of computer vision; further applications are illustrated in the book. The lists are meant to be suggestive, not exhaustive; most of the terms that may be unclear now will be explained later in the book.

Examples of Research Areas

Image feature detection
Contour representation
Feature-based segmentation
Range image analysis
Shape modelling and representation
Shape reconstruction from single image cues (shape from X)
Stereo vision
Motion analysis
Color vision
Active and purposive vision
Invariants
Uncalibrated and self-calibrating systems
Object detection
3-D object recognition
3-D object location
High-performance and real-time architectures
Examples of Application Areas

Industrial inspection and quality control
Reverse engineering
Surveillance and security
Face recognition
Gesture recognition
Road monitoring
Autonomous vehicles (land, underwater, and space vehicles)
Hand-eye robotics systems
Space applications
Military applications
Medical image analysis (e.g., MRI, CT, X-rays, and sonar scans)
Image databases
Virtual reality, telepresence, and telerobotics
1.3 Exploring the Computer Vision World

This section provides a short list of pointers into the multifaceted world of computer vision. In all the following lists, items appear in no particular order.

1.3.1 Conferences, Journals, and Books

Conferences. The following international conferences cover the most significant advancements on the topics central to this book. Printed proceedings are available for all conferences, and details appear regularly on the Internet.

International Conference on Computer Vision (ICCV)
International Conference on Computer Vision and Pattern Recognition (CVPR)
European Conference on Computer Vision (ECCV)
Figure 1.1 A prototype of a 3-D inspection cell. The cell includes two types of depth sensors, a laser scanner and a Moiré fringe system (see Chapter 2), which locate the object in space and perform measurements. Notice the turntable for optimal, automatic object positioning.
International Conference on Image Processing (ICIP)
International Conference on Pattern Recognition (ICPR)

Several national conferences and international workshops are organized on an annual or biennial basis. A complete list would be too long, so none of these are mentioned for fairness.
Journals. The following technical journals cover the most significant advancements in the field. They can be found in the libraries of any university hosting research on computer vision or image processing.

International Journal of Computer Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence
Computer Vision and Image Understanding
Machine Vision and its Applications
Image and Vision Computing Journal
Journal of the Optical Society of America A
Pattern Recognition

Figure 1.2 Left: automatic recognition of road bridges in aerial infrared images (courtesy of Majid Mirmehdi, University of Surrey; Crown copyright, reproduced with the permission of the Controller of Her Majesty's Stationery Office). Right: an example of automatic face detection, particularly important for surveillance and security systems. The face regions selected can be subsequently compared with a database of faces for identification (courtesy of Stephen McKenna, Queen Mary and Westfield College, London).

Pattern Recognition Letters
IEEE Transactions on Image Processing
IEEE Transactions on Systems, Man and Cybernetics
IEE Proceedings: Vision, Image and Signal Processing
Biological Cybernetics
Neural Computation
Artificial Intelligence
Books. So many books on computer vision and related fields have been published that it seems futile to produce long lists unless each entry is accompanied by a comment. Since including a complete, commented list here would take too much space, we leave the task of introducing books in specific, technical contexts to the following chapters.
1.3.2 Internet

As the Internet undergoes continuous, ebullient transformation, this information is likely to age faster than the rest of this book, and we can only guarantee that the list below is correct at the time of printing. Further Internet sites, related to specific problems, are given in the relevant chapters of this book.

Figure 1.3 Computer vision and autonomous road navigation: some images from a sequence acquired from a moving car, and the estimated motion field (optical flow, discussed in Chapter 8) computed by a motion analysis program, indicating the relative motion of world and camera.

• The Computer Vision Home Page, http://www.cs.cmu.edu/~cil/vision.html, and the Pilot European Image Processing Archive home page, http://peipa.essex.ac.uk, contain links to test images, demos, archives, research groups, research publications, teaching material, frequently asked questions, and plenty of pointers to other interesting sites.

Figure 1.5 Computer vision and virtual telepresence: the movements of the operator's head are tracked by a vision system (not shown) and copied in real time by the head-eye platform (or stereo head) on the right (courtesy of David W. Murray, University of Oxford).
• The Annotated Computer Vision Bibliography is an excellent, well-organized source of online published papers and reports, as well as announcements of conferences and journals, at http://iris.usc.edu/Vision-Notes/bibliography/contents.html. You can search the contents by keyword, author, journal, conference, paper title, and other ways.
• Very comprehensive bibliographies on image analysis, pattern recognition, and computer vision are produced every year by Azriel Rosenfeld at the University of Maryland, and are available on the Internet.

Figure 1.4 Computer vision is becoming increasingly important for remotely operated and autonomous subsea vehicles (ROV/AUV) like the one shown above, ANGUS, built by the Ocean Systems Laboratory of Heriot-Watt University. As with many ROV/AUVs, ANGUS carries video and sonar sensors (see Chapter 2), used here in the context of automatic search for objects of interest (courtesy of Dave Lane, Heriot-Watt University).

• CVonline is a collection of hypertext summaries of methods and applications of computer vision, recently established by the University of Edinburgh (http://www.dai.ed.ac.uk/CVonline/).
nea)0
Chapter 1 Introduction
Figure16 Anexample of motical application of computer vision: computer-assied diagnoses
from mammographic mages Top: X-ray image ofa feral breast, distized irom a conventional
Xray photowraphy. Bottom: ose-up and automatic identifation of suspect nodules (courtesy
{rsetaet Clarke and Brian Calder, Heriot-Watt University. and Mathew Freedman,
University Media School, Washington DC)
• The Vision List and The Pixel are free electronic bulletins circulating news and requests and hosting technical debates. To subscribe, email [email protected] and [email protected], respectively. Both have ftp and WWW archives of useful material.
1.3.3 Some Hints on Math Software

This section gives pointers to numerical computation packages widely used in computer vision, which we found useful. Notice that this list reflects only our experience; no comparison whatsoever with other packages is implied.

• Numerical Recipes is a book and software package very popular in the vision community. The source code, in C, FORTRAN and Pascal, is published by Cambridge University Press together with the companion book by Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes in C/FORTRAN/Pascal. The book is an excellent introduction to the practicalities of numerical computation. There is also a Numerical Recipes: Example Book illustrating how to call the library routines.
• Meschach is a public-domain numerical library of C routines for linear algebra, developed by David E. Stewart and Zbigniew Leyk of the Australian National University, Canberra. For information and how to obtain a copy, see the Web page at http://www.netlib.no/netlib/c/meschach/readme.
• MATLAB is a software environment for fast prototyping of numerical programs, with its own language, interpreter, libraries (called toolboxes), and visualization tools, commercialized by the U.S. company The MathWorks. It is designed to be easy to use, and runs on several platforms, including UNIX and DOS machines. MATLAB is described in several recent books, and there is a large community of users. Plenty of information on software, books, bulletins, training and so on is available at The MathWorks' WWW site, http://www.mathworks.com/, or contact The MathWorks Inc., 24 Prime Park Way, Natick, MA 01760, USA.
• Mathematica is another software environment for mathematical applications, with a large community of users. The standard reference book is Stephen Wolfram's Mathematica. Plenty of information on software, books and bulletins is available at Wolfram Research's WWW site, http://www.wolfram.com/.
• Scilab is a public-domain scientific software package for numerical computing developed by INRIA (France). It includes linear algebra, control, signal processing, graphics and animation. You can access Scilab from http://www-rocq.inria.fr/scilab/, or contact [email protected].
1.4 The Road Ahead

This book is organized in two logical parts. The first part (Chapters 2 to 5) deals with the image acquisition and processing methods (noise attenuation, feature extraction, line and curve detection) necessary to produce the input data expected by subsequent algorithms. The primary purpose of this first part is not to give an exhaustive treatment of image processing, but to make the book self-contained by suggesting image processing methods commonly found in the software architectures of computer vision systems. The second part (Chapters 6 to 11) deals with the computer vision problems (camera calibration, stereopsis, motion analysis, object recognition, and object location) that we have outlined.

Figure 1.7 The book at a glance: method classes (white boxes), results (grey boxes), their interdependence, and where to find the various topics in this book.

The overall structure of the book is captured by Figure 1.7, which shows the methods presented, their interdependence, the intermediate results, and the target quantities sought. Our path begins with the acquisition of one or more images. Before being fed to the algorithms, the images are preprocessed to attenuate the noise introduced by the acquisition process. The target quantities (3-D structure, location and identity of objects, and system parameters) are shown at the bottom of Figure 1.7. The diagram suggests that in most cases the same information can be computed in more than one way.
One well-known class of methods relies on the identification of special image elements, called image features. Examples of such methods are:

• calibration, which determines the value of internal and external parameters of the vision system;
• stereo analysis, which exploits the difference between two images to compute the structure (shape) of 3-D objects and their location in space;
• recognition, which determines the objects' identity and location;
• feature-based motion analysis, which exploits the finite changes induced in an image sequence by the relative motion of world and camera to estimate 3-D structure and motion; and
• some shape from single image methods, which estimate 3-D structure from the information contained in one image only.

Another class of methods computes the target information from the images directly. Of these, this book includes:

• one shape from single image method, which estimates 3-D structure from the shading of a single image, and
• optical flow methods, a class of motion analysis methods which regards an image sequence as a close approximation of a continuous, time-varying signal.
We are now ready to begin our investigation into the theory and algorithms of computer vision.

Digital Snapshots
Verweile doch! Du bist so schön!¹
Goethe, Faust
This chapter deals with digital images and their relation to the physical world. We learn the principles of image formation, define the two main types of images in this book (intensity and range images), and discuss how to acquire and store them in a computer.
Chapter Overview

Section 2.2 considers the basic optical, radiometric, and geometric principles underlying the formation of intensity images.
Section 2.3 brings the computer into the picture, laying out the special nature of digital images, their acquisition, and some mathematical models of intensity cameras.
Section 2.4 discusses the fundamental mathematical models of intensity cameras and their parameters.
Section 2.5 introduces range images and describes a class of range sensors based on intensity cameras, so that we can use what we learn about intensity imaging.

What You Need to Know to Understand this Chapter

• Sampling theorem (Appendix, section A.3).
• Rotation matrices (Appendix, section A.9).
¹ Stop! You are so beautiful!
2.1 Introduction
This chapter deals with the main ingredients of computer vision: digital images. We concentrate on two types of images frequently used in computer vision:

intensity images, the familiar, photograph-like images encoding light intensities, acquired by television cameras;
range images, encoding shape and distance, acquired by special sensors like sonars or laser scanners.

Intensity images measure the amount of light impinging on a photosensitive device; range images estimate directly the 3-D structure of the viewed scene through a variety of techniques. Throughout the book, we will develop algorithms for both types of images.²

It is important to stress immediately that any digital image, irrespective of its type, is a 2-D array (matrix) of numbers. Figure 2.1 illustrates this fact for the case of intensity images. Depending on the nature of the image, the numbers may represent light intensities, distances, or other physical quantities. This fact has two fundamental consequences:

• The exact relationship of a digital image to the physical world (i.e., its nature of range or intensity image) is determined by the acquisition process, which depends on the sensor used.
• Any information contained in images (e.g., shape, measurements, or object identity) must ultimately be extracted (computed) from 2-D numerical arrays in which it is encoded.

In this chapter, we investigate the origin of the numbers forming a digital image; the rest of the book is devoted to computational techniques that make explicit some of the information contained implicitly in these numbers.

² Though some sections make sense for intensity images only.
2.2 Intensity Images

We start by introducing the main concepts behind intensity image formation.
2.2.1 Main Concepts

In the visual systems of many animals, including man, the process of image formation begins with the light rays coming from the outside world and impinging on the photoreceptors in the retina. A simple look at any ordinary photograph suggests the variety of physical parameters playing a role in image formation. Here is an incomplete list:

Optical parameters of the lens characterize the sensor's optics. They include:
• lens type,
• focal length,
Figure 2.1 Digital images are 2-D arrays of numbers: a 20 × 20 grey-level image of an eye (pixels have been enlarged for display) and the corresponding 2-D array.
• field of view,
• angular apertures.

Photometric parameters appear in models of the light energy reaching the sensor after being reflected from the objects in the scene. They include:
• type, intensity, and direction of illumination,
• reflectance properties of the viewed surfaces,
• effects of the sensor's structure on the amount of light reaching the photoreceptors.

Geometric parameters determine the image position onto which a 3-D point is projected. They include:
• type of projection,
• position and orientation of the camera in space,
• perspective distortions introduced by the imaging process.
Figure 2.2 The basic elements of an imaging device.
All the above plays a role in any intensity imaging device, be it a photographic camera, camcorder, or computer-based system. However, further parameters are needed to characterize digital images and their acquisition systems. These include:

• the physical properties of the photosensitive matrix of the viewing camera,
• the discrete nature of the photoreceptors,
• the quantization of the intensity scale.

We will now review the optical, radiometric, and geometric aspects of image formation.
2.2.2 Basic Optics

We first need to establish a few fundamental notions of optics. As in many natural visual systems, the process of image formation in computer vision begins with the light rays which enter the camera through an angular aperture (or pupil), and hit a screen or image plane (Figure 2.2), the camera's photosensitive device which registers light intensities. Notice that most of these rays are the result of the reflections of the rays emitted by the light sources and hitting object surfaces.
Image Focusing. Any single point of a scene reflects light coming from possibly many directions, so that many rays reflected by the same point may enter the camera. In order to obtain sharp images, all rays coming from a single scene point, P, must converge onto a single point on the image plane, p, the image of P. If this happens, we say that the image of P is in focus; if not, the image is spread over a circle. Focusing all rays from a scene point onto a single image point can be achieved in two ways:

1. Reducing the camera's aperture to a point, called a pinhole. This means that only one ray from any given point can enter the camera, and creates a one-to-one correspondence between visible points, rays, and image points. This results in very sharp, undistorted images of objects at different distances from the camera (see Project 2.1).
2. Introducing an optical system composed of lenses, apertures, and other elements, explicitly designed to make all rays coming from the same 3-D point converge onto a single image point.
An obvious disadvantage of a pinhole aperture is its exposure time; that is, how long the image plane is allowed to receive light. Any photosensitive device (camera film, electronic sensors) needs a minimum amount of light to register a legible image. As a pinhole allows very little light into the camera per time unit, the exposure time necessary to form the image is too long (typically several seconds) to be of practical use.³ Optical systems, instead, can be adjusted to work under a wide range of illumination conditions and exposure times (the exposure time being controlled by a shutter).

☞ Intuitively, an optical system can be regarded as a device that aims at producing the same image obtained by a pinhole aperture, but by means of a much larger aperture and a shorter exposure time. Moreover, an optical system enhances the light gathering power.
Thin Lenses. Standard optical systems are quite sophisticated, but we can learn the basic ideas from the simplest optical system, the thin lens. The optical behavior of a thin lens (Figure 2.3) is characterized by two elements: an axis, called the optical axis, going through the lens center, O, and perpendicular to the plane of the lens; and two special points, F_l and F_r, called left and right focus, placed on the optical axis, on the opposite sides of the lens, and at the same distance from O. This distance, called the focal length of the lens, is usually indicated by f.

By construction, a thin lens deflects all rays parallel to the optical axis and coming from one side onto the focus on the other side, as described by two basic properties.

Thin Lens: Basic Properties
1. Any ray entering the lens parallel to the axis on one side goes through the focus on the other side.
2. Any ray entering the lens from the focus on one side emerges parallel to the axis on the other side.
The Fundamental Equation of Thin Lenses. Our next task is to derive the fundamental equation of thin lenses from the basic properties 1 and 2. Consider a point P, not too far from the optical axis, and let Z + f be the distance of P from the lens along the optical axis (Figure 2.4). By assumption, a thin lens focuses all the rays from P onto the same point, the image point p. Therefore, we can locate p by intersecting only two known rays, and we do not have to worry about tracing the path of any other.

³ The exposure time grows inversely with the area of the aperture, which in turn is proportional to the amount of light that enters the imaging system.
‘sproportoral tthe mont of igh ht entre he nang semChapter2. Digital snapshots
Section 22 Intensity mages 21
Figure 23 Geometic optics ofa thin les (a perpenicalar view to
‘he plane approximating the len)
Note that by applying property 1 to the ray PQ and property 2 to the ray PR, PQ and PR are deflected to intersect at a certain point on the other side of the thin lens. But since the lens focuses all rays coming from P onto the same point, PQ and PR must intersect at p! From Figure 2.4, using the two pairs of similar triangles <PSF_l> and <ROF_l>, and <psF_r> and <QOF_r>, we obtain immediately

Z z = f²,   (2.1)

where z + f is the distance of p from the lens along the optical axis. Setting Ẑ = Z + f and ẑ = z + f, (2.1) reduces to our target equation.

The Fundamental Equation of Thin Lenses

1/Ẑ + 1/ẑ = 1/f   (2.2)

☞ The ray going through the lens center, O, named the principal ray, goes through p undeflected.
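Equation (2.2) is easy to check numerically. The following is our own illustrative sketch (Python, with made-up names and values; it is not part of the book's pseudocode), solving the fundamental equation for the image distance given the object distance and the focal length, both measured from the lens center:

    def image_distance(Z_hat, f):
        # Fundamental equation of thin lenses: 1/Z_hat + 1/z_hat = 1/f.
        # Solve for z_hat, the distance at which the point comes into focus.
        assert Z_hat > f, "the object must lie beyond the focal length"
        return 1.0 / (1.0 / f - 1.0 / Z_hat)

    # Example: a 50 mm lens imaging a point 2 m away.
    print(image_distance(2000.0, 50.0))   # about 51.28 mm behind the lens

Notice that the image distance tends to f as the object recedes: for distant scenes, the image plane can be placed at the focal distance itself, which is the limiting situation described by the pinhole model.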
Field of View. One last observation about optics. Let d be the effective diameter of the lens, identifying the portion of the lens actually reachable by light rays.

Figure 2.4 Imaging by a thin lens. Note that, in general, a real lens has two different focal lengths, because the curvatures of its two surfaces may be different. The situation depicted here is a special case, but it is sufficient for our purposes. See the Further Readings at the end of this chapter for more on optics.

☞ We call d the effective diameter to emphasize the difference between d and the physical diameter of the lens. The aperture may prevent light rays from reaching the peripheral points of the lens, so that d is usually smaller than the physical diameter of the lens.
The effective lens diameter and the focal length determine the field of view of the lens, which is an angular measure of the portion of 3-D space actually seen by the camera. It is customary to define the field of view, w, as half of the angle subtended by the lens diameter as seen from the focus:

tan w = d / (2f).   (2.3)
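As a quick illustration of (2.3), here is a minimal sketch of our own (Python; d and f are assumed to be expressed in the same units):

    import math

    def field_of_view(d, f):
        # Half-angle subtended by the effective lens diameter, eq. (2.3):
        # tan w = d / (2 f); the result is in radians.
        return math.atan(d / (2.0 * f))

    # Example: d = 25 mm, f = 50 mm gives w of about 14 degrees.
    print(math.degrees(field_of_view(25.0, 50.0)))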
This is the minimum amount of optics needed for our purposes. Optical models of real imaging devices are a great deal more complicated than our treatment of thin (and ideal) lenses; problems and phenomena not considered here include spherical aberration (defocusing of nonparaxial rays), chromatic aberration (different defocusing of rays of different colors), and focusing objects at different distances from the camera.⁴
⁴ The fundamental equation of thin lenses implies that scene points at different distances from the lens come into focus at different image distances. The optical systems of real cameras are designed so that all points within a given range of distances are focused onto, or close to, the image plane, and therefore appear in focus. This range is called the depth of field of the camera.
Figure 2.5 Illustration of the basic radiometric concepts.
The Further Readings section at the end of this chapter tells where to find more about optics.
2.2.3 Basic Radiometry

Radiometry is the essential part of image formation concerned with the relation among the amounts of light energy emitted from light sources, reflected from surfaces, and registered by sensors. We shall use radiometric concepts to pursue two objectives:
1. modelling how much of the illuminating light is reflected by object surfaces;
2. modelling how much of the reflected light actually reaches the image plane of the camera.

Definitions. We begin with some definitions, illustrated in Figure 2.5 and summarized as follows:

Image Irradiance and Scene Radiance
The image irradiance is the power of the light, per unit area, at each point p of the image plane.
The scene radiance is the power of the light, per unit area, ideally emitted by each point P of a surface in 3-D space in a given direction.

☞ "Ideally" refers to the fact that the surface in the definition of scene radiance might be the illuminated surface of an object, the radiating surface of a light source, or even a fictitious surface. The term scene radiance denotes the total radiance emitted by a point; sometimes radiance refers to the energy radiated from a surface (emitted or reflected), whereas irradiance refers to the energy incident on a surface.
Surface Reflectance and the Lambertian Model. A model of the way in which a surface reflects incident light is called a surface reflectance model. A well-known one is the Lambertian model, which assumes that each surface point appears equally bright from all viewing directions. This approximates well the behavior of rough, nonspecular surfaces, as well as various materials like matte paint and paper. If we represent the direction and amount of incident light by a vector I, the scene radiance of an ideal Lambertian surface, L, is simply proportional to the dot product between I and n, the unit normal to the surface:

L = ρ Iᵀn,   (2.4)

with ρ > 0 a constant called the surface's albedo, which is typical of the surface's material. We also assume that Iᵀn is positive; that is, the surface faces the light source. This is a necessary condition for the light to reach P; if this condition is not met, the scene radiance should be set equal to 0.

We will use the Lambertian model in several parts of this book; for example, while analyzing image sequences (Chapter 8) and computing shape from shading (Chapter 9). Intuitively, the Lambertian model is based on the exact cancellation of two factors. Neglecting constant terms, the amount of light reaching any surface is always proportional to the cosine of the angle between the illuminant and the surface normal n (that is, the effective area of the surface as seen from the illuminant direction). According to the model, a Lambertian surface reflects light in a given direction d proportionally to the cosine of the angle between d and n. But since the surface's area seen from the direction d is inversely proportional to the same cosine, the two factors cancel out, and the observed brightness is independent of the viewing direction.
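As an illustration, here is a minimal sketch of our own of equation (2.4) (Python with NumPy; names and values are made up), including the convention that the radiance is set to 0 when Iᵀn is negative:

    import numpy as np

    def lambertian_radiance(albedo, I, n):
        # Scene radiance of an ideal Lambertian surface, eq. (2.4):
        # L = albedo * I^T n, set to 0 when the surface faces away
        # from the light source (I^T n < 0).
        return albedo * max(float(np.dot(I, n)), 0.0)

    # Example: light from straight above, unit normal tilted 60 degrees;
    # the radiance is proportional to cos(60 deg) = 0.5, whatever the
    # viewing direction.
    I = np.array([0.0, 0.0, 1.0])
    n = np.array([np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)])
    print(lambertian_radiance(0.8, I, n))   # 0.8 * 0.5 = 0.4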
…followed by isotropic scaling by the factor f/Z̄. Section 2.4.4 shows that this and other camera models can also be derived in a compact matrix notation. Meanwhile, it is time for a summary.

The Perspective Camera Model
In the perspective camera model, the coordinates (x, y) of p, the image of the 3-D point P = [X, Y, Z]ᵀ, are given by

x = f X / Z,  y = f Y / Z.

The Weak-Perspective Camera Model
If the average depth of the scene, Z̄, is much larger than the relative distance between any two scene points along the optical axis, the weak-perspective camera model holds:

x = f X / Z̄,  y = f Y / Z̄.

All equations are written in the camera reference frame.
2.3 Acquiring Digital Images

In this section, we discuss the aspects of image acquisition that are specific to digital images, namely:

• the essential structure of a typical image acquisition system;
• the representation of digital images in a computer;
• practical information on spatial sampling and camera noise.
2.3.1 Basic Facts

How do we acquire a digital image into a computer? A digital image acquisition system consists of three hardware components: a viewing camera, typically a CCD (Charge-Coupled Device) camera, a frame grabber, and a host computer, on which processing takes place (Figure 2.9). (This is a standard configuration, but not the only one possible; for instance, several manufacturers sell so-called smart cameras, which can acquire images and perform a certain amount of image processing on board.)
Figure 2.9 Essential components of a digital image acquisition system: the CCD array produces a video signal, which is digitized by the frame grabber and passed to the host computer.
The input to the camera is, as we know, the incoming light, which enters the camera's lens and hits the image plane. In a CCD camera, the physical image plane is the CCD array, an n × m rectangular grid of photosensors, each sensitive to light intensity. Each photosensor can be regarded as a tiny, rectangular black box which converts light energy into a voltage. The output of the CCD array is usually a continuous electric signal, the video signal, which we can regard as generated by scanning the photosensors in the CCD array in a given order (line by line) and reading out their voltages. The video signal is sent to an electronic device called a frame grabber, where it is digitized into a 2-D, rectangular array of N × M integer values and stored in a memory buffer. At this point, the image can be conveniently represented by an N × M matrix, E, whose entries are called pixels (an acronym for picture elements), with N and M being two fixed integers expressing the image size, in pixels, along each direction. Finally, the matrix E is transferred to the host computer for processing.

For the purposes of the following chapters, the starting point of computer vision is the digitized image, E. Here are the main assumptions we make about E.
Digital Images: Representation
[A igial images represented by a numerical mati, with N rows and M columns,
"i j) denotes the mage value (nage bightnes) tpt! (i,j) Fah row and j-th column),
and encodes the intensty recorded by the photosensosof the CCD array contributing to that
pial
dj) a tegerin te range 0,258)
This last statement about the range of E(i, j) means that the brightness of an image point can be represented by one byte, or 256 grey levels (typically, 0 is black and 255 white). This is an adequate resolution for ordinary, monochromatic (or grey-level) images and is suitable for many vision tasks. Color images require three monochromatic component images (red, green, blue) and therefore three numbers per pixel. Throughout this book, we shall always refer to grey-level images.
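In code, these assumptions map naturally onto a matrix of one-byte integers; the following NumPy fragment is just one possible rendering of the representation above, with illustrative sizes.

```python
import numpy as np

# A grey-level image E: an N x M matrix of one-byte integers in [0, 255].
N, M = 480, 640                           # illustrative image size
E = np.zeros((N, M), dtype=np.uint8)      # all pixels black
E[10, 20] = 255                           # E(i, j): i-th row, j-th column

# A color image needs three monochromatic components (red, green, blue).
E_rgb = np.zeros((N, M, 3), dtype=np.uint8)
```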
If we assume that the chain of sampling and filtering procedures performed by the camera and frame grabber does not distort the video signal, the image stored in the frame buffer is a faithful digitization of the image captured by the CCD array. However, the number of elements along each side of the CCD array is usually different from the dimensions, in pixels, of the frame buffer. Therefore, the position of the same point on the image plane will be different if measured in CCD elements or in image pixels. More precisely, measuring positions from the upper left corner, the relation between the position (x_im, y_im) (in pixels) in the frame buffer image and the position (x_CCD, y_CCD) (in CCD elements) on the CCD array is given by

x_im = (M/m) x_CCD,    y_im = (N/n) y_CCD.

Notice that these ratios of element counts are not the only quantities responsible for the scaling between CCD array and frame buffer image: the physical horizontal and vertical sizes of the CCD elements matter too, and cannot be told apart from the counts by looking at the image alone. This is illustrated in Figure 2.10: the same distortion of a given pattern (Figure 2.10(a)) can be produced by an n × m grid of rectangular CCD elements with aspect ratio n/m (Figure 2.10(b)) and by an n × n grid of square elements of suitable size (Figure 2.10(c)).
Figure 2.10 The same distortion of a given pattern on the CCD array (a) is produced by an n × m grid of rectangular elements of aspect ratio n/m (b) and by an n × n grid of square elements (c).
In summary, it is convenient to assume that the CCD elements are always in one-to-one correspondence with the image pixels, and to introduce effective horizontal and vertical sizes to account for the possibly different scaling along the horizontal and vertical directions. The effective sizes of the CCD elements are our first examples of camera parameters, which are the subject of section 2.4.
2.3.2 Spatial Sampling

The spatial quantization of images originates at the very early stage of the image formation process, as the photoreceptors of a CCD sensor are organized in a rectangular array of photosensitive elements packed closely together. For simplicity, we assume that the distance d between adjacent CCD elements (specified by the camera manufacturer) is the same in the horizontal and vertical directions. We know from the sampling theorem that d determines the highest spatial frequency, ν_N, that can be captured by the system, according to the relation

ν_N = 1/(2d).
How does this characteristic frequency compare with the spatial frequency spectrum of images? A classical result of the diffraction theory of aberrations states that the imaging process can be expressed in terms of a linear, low-pass filtering of the spatial frequencies of the visual signal. (For more information about the diffraction theory of aberrations, see the Further Readings.) In particular, if a is the linear size of the angular aperture of the optics (e.g., the diameter of a circular aperture), λ the wavelength of light, and f the focal length, spatial frequencies larger than

ν_c = a/(λf)

do not contribute to the spatial spectrum of the image (that is, they are filtered out).
In a typical image acquisition system, the Nyquist frequency ν_N is nearly one order of magnitude smaller than the optical cutoff ν_c. Therefore, since the viewed pattern may well contain spatial frequencies larger than ν_N, we expect aliasing. You can convince yourself of the reality of spatial aliasing by taking images of a pattern of equally spaced, thin black lines on a white background (see Exercise 2.6) at increasing distances from the camera. As predicted by the sampling theorem, if n is the number of CCD elements in the horizontal direction, the camera cannot see more than n' vertical lines, with n' somewhat less than n/2 (say, n' ≈ n/3). As long as the number of lines within the field of view remains smaller than n', the lines are correctly imaged and resolved. Once the limit is reached, if the distance of the pattern is increased further, but before blurring effects take over, the number of imaged lines decreases as the distance of the pattern increases!
The main reason why spatial aliasing is often neglected is that the amplitude (that is, the information content) of the high-frequency components of ordinary images is usually, though by no means always, very small.
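To make the comparison concrete, the sketch below evaluates ν_N and ν_c for plausible, made-up values of d, a, λ, and f, and then shows aliasing directly: a sinusoidal grating above the Nyquist frequency, sampled at spacing d, produces exactly the same samples as a lower-frequency one.

```python
import numpy as np

# Illustrative values (assumptions, not from the text): element spacing
# d = 10 um, aperture a = 8 mm, wavelength 550 nm, focal length 16 mm.
d, a, lam, f = 10e-6, 8e-3, 550e-9, 16e-3
nu_N = 1.0 / (2.0 * d)          # Nyquist frequency of the CCD grid
nu_c = a / (lam * f)            # optical cutoff frequency
print(nu_N, nu_c)               # ~5.0e4 vs ~9.1e5 cycles/m: nu_N << nu_c

# Aliasing: a grating at 0.9/d cycles/m, sampled every d, yields the same
# samples (up to sign) as one at 0.1/d cycles/m.
k = np.arange(16)                               # sample indices
high = np.sin(2 * np.pi * (0.9 / d) * k * d)    # above nu_N = 0.5/d
low = np.sin(2 * np.pi * (0.1 / d) * k * d)     # below nu_N
print(np.allclose(high, -low))                  # True: the two alias
```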
2.3.3 Acquisition Noise and How to Estimate It

Let us briefly touch upon the problem of noise introduced by the imaging system, and how it is estimated. The effect of noise is, essentially, that image values are not those expected, as these are corrupted during the various stages of image acquisition. As a consequence, the pixel values of two images of the same scene, taken by the same camera in the same light conditions, are never exactly the same (try it). Such fluctuations will introduce errors in the results of calculations based on pixel values; it is therefore important to estimate the magnitude of the noise.

The main objective of this section is to suggest a simple characterization of image noise, which can be used by the algorithms of the following chapters. Noise attenuation, in particular, is the subject of Chapter 3.
An obvious way to proceed is to regard noisy variations as random variables, and to try to characterize their statistical behavior. To do this, we acquire a sequence of images of the same scene in the same acquisition conditions, and compute the pointwise average of the image brightness over all the images. The same sequence can also be used to estimate the signal-to-noise ratio of the acquisition system, as follows.
Algorithm EST_NOISE

We are given n images of the same scene, E₀, E₁, ..., E_{n−1}, which we assume square (N × N) for simplicity.

For each i, j = 0, ..., N − 1, compute

Ē(i, j) = (1/n) Σ_{k=0}^{n−1} E_k(i, j)

σ(i, j) = sqrt( (1/(n−1)) Σ_{k=0}^{n−1} ( Ē(i, j) − E_k(i, j) )² )

The quantity σ(i, j) is an estimate of the standard deviation of the acquisition noise at each pixel. The average of σ(i, j) over the image is an estimate of the average noise, while max_{i,j} σ(i, j) is an estimate of the worst-case acquisition noise.
Notice that the beat frequency of some fluorescent room lights may skew the results of EST_NOISE.
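A direct NumPy transcription of EST_NOISE might look as follows; the synthetic test sequence is only for illustration.

```python
import numpy as np

def est_noise(images):
    """EST_NOISE: pointwise mean and standard deviation of a sequence of
    images of the same static scene, stacked as an (n, N, N) array."""
    stack = np.asarray(images, dtype=np.float64)
    mean = stack.mean(axis=0)            # Ebar(i, j)
    sigma = stack.std(axis=0, ddof=1)    # divide by n - 1, as in the text
    return mean, sigma

# Synthetic check: a flat scene at grey level 120 plus Gaussian noise.
rng = np.random.default_rng(0)
scene = np.full((128, 128), 120.0)
seq = [scene + rng.normal(0.0, 2.0, scene.shape) for _ in range(10)]
mean, sigma = est_noise(seq)
print(sigma.mean(), sigma.max())         # average and worst-case noise
```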
Figure 2.11 shows the noise estimates for a particular acquisition system. A static camera was pointed at a picture posted on the wall, and a sequence of n = 10 images was acquired. The graphs in Figure 2.11 reproduce the average plus and minus the estimated standard deviation of the image brightness (pixel values) over the entire sequence, along a horizontal scanline (image row). (The signal-to-noise ratio is usually expressed in decibels (dB), defined as 10 times the logarithm in base 10 of the ratio of two powers, in our case those of signal and noise; for example, a signal-to-noise ratio of 100 corresponds to 10 log₁₀ 100 = 20 dB.)
Figure 2.11 Estimated acquisition noise: graphs of the average image brightness plus (solid line) and minus (dotted line) the estimated standard deviation, over a sequence of images of the same scene, along the same horizontal scan line. The image brightness ranges from 73 to 211 grey levels.
Notice that the standard deviation is almost independent of the average brightness, typically less than 2 and never larger than 2.5 grey values; this corresponds to an average signal-to-noise ratio of nearly one hundred.
"Another cause of noise, which i important when a vision system is used for fine
measurements, is that pizel values are not completely independent ofeach other: some
crossalking occurs between adjacent photosensors in each row of the CCD array.
{due to the way the content of each CCD row i read in order to be sent to the frame
buffer. This can be vetied by computing the autocovariance Ceg(, ) of the image of
spatially uniform pater parallel to the image plane and illuminated by diffuse light.
Algorithm AUTO_COVARIANCE

Given an image E of the pattern, for each i, j = 0, ..., N − 1, compute

C_EE(i, j) = (1/N²) Σ_h Σ_k ( E(h, k) − Ē(h, k) ) ( E(h + i, k + j) − Ē(h + i, k + j) )    (2.18)

where Ē is the pointwise average computed by EST_NOISE, and the sums run over all h, k for which the shifted indices stay within the image.
Figure 2.12 Autocovariance of the image of a uniform pattern for a typical image acquisition system, showing cross-talking between adjacent pixels along a row.
The autocovariance should actually be estimated as the average of the autocovariances computed on many images of the same pattern. To minimize the effect of radiometric nonlinearities (see (2.13)), C_EE should be computed on a patch in the central portion of the image.
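A straightforward (if slow) NumPy version of AUTO_COVARIANCE, restricted to a central patch as recommended above, could read as follows; the patch handling is our own choice.

```python
import numpy as np

def auto_covariance(images, patch=16):
    """AUTO_COVARIANCE on a central patch, averaged over a sequence of
    images of a spatially uniform pattern (cf. (2.18))."""
    stack = np.asarray(images, dtype=np.float64)
    mean = stack.mean(axis=0)                     # Ebar from EST_NOISE
    r = (mean.shape[0] - 2 * patch) // 2          # top-left corner of a
    c = (mean.shape[1] - 2 * patch) // 2          # central 2*patch window
    C = np.zeros((patch, patch))
    for img in stack:
        dev = (img - mean)[r:r + 2 * patch, c:c + 2 * patch]
        base = dev[:patch, :patch]
        for i in range(patch):
            for j in range(patch):
                # Mean product of fluctuations at displacement (i, j).
                C[i, j] += np.mean(base * dev[i:i + patch, j:j + patch])
    return C / len(stack)
```

Here C[0, 0] estimates the noise variance, while markedly nonzero values at (0, j) for j > 0 reveal the horizontal cross-talking discussed above.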
Figure 2.12 displays the average of the autocovariances computed on many images acquired by the same acquisition system used to generate Figure 2.11. The autocovariance was computed by means of (2.18) on a patch of 16 × 16 pixels centered in the image. Notice the small but visible covariance along the horizontal direction; consistently with the physical properties of many CCD cameras, this indicates that the grey value of each pixel is not completely independent of that of its neighbors.
2.4 Camera Parameters

We now come back to discussing the geometry of a vision system in greater detail. In particular, we want to characterize the parameters underlying camera models.
2.4.1 Definitions

Computer vision algorithms reconstructing the 3-D structure of a scene, or computing the position of objects in space, need equations linking the coordinates of points in 3-D space with the coordinates of their corresponding image points. These equations are written in the camera reference frame (see (2.14) and section 2.2.4), but it is often assumed that

• the camera reference frame can be located with respect to some other, known reference frame (the world reference frame), and
• the coordinates of the image points in the camera reference frame can be obtained from pixel coordinates, the only ones directly available from the image.

This is equivalent to assuming knowledge of some of the camera's characteristics, known in vision as the camera's extrinsic and intrinsic parameters. Our next task is to understand the exact nature of the intrinsic and extrinsic parameters, and why the equivalence holds.
Definition: Camera Parameters

The extrinsic parameters are the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.

The intrinsic parameters are the parameters necessary to link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame.
In the next two sections, we write the basic equations that allow us to define the extrinsic and intrinsic parameters in practical terms. The problem of estimating the values of these parameters is called camera calibration. We shall solve this problem in Chapter 6, since calibration methods need algorithms which we discuss in Chapters 4 and 5.
2.4.2 Extrinsic Parameters

The camera reference frame has been introduced for the purpose of writing the fundamental equations of the perspective projection (2.14) in a simple form. However, the camera reference frame is often unknown, and a common problem is determining the location and orientation of the camera frame with respect to some known reference frame, using only image information. The extrinsic parameters are defined as any set of geometric parameters that identify uniquely the transformation between the unknown camera reference frame and a known reference frame, named the world reference frame.

A typical choice for describing the transformation between camera and world frame is to use

• a 3-D translation vector, T, describing the relative positions of the origins of the two reference frames, and
• a 3 × 3 rotation matrix, R, an orthogonal matrix (RᵀR = RRᵀ = I) that brings the corresponding axes of the two frames onto each other.

The orthogonality relations reduce the number of degrees of freedom of R to three (see section A.9 in the Appendix).

In an obvious notation (see Figure 2.13), the relation between the coordinates of a point P in world and camera frame, P_w and P_c respectively, is

P_c = R(P_w − T)    (2.19)
Figure 2.13 The relation between camera and world coordinate frames.

with

R = ( r11  r12  r13
      r21  r22  r23
      r31  r32  r33 )
Definition: Extrinsic Parameters

The camera extrinsic parameters are the translation vector, T, and the rotation matrix, R (or, better, its free parameters), which specify the transformation between the camera and the world reference frame.
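In code, (2.19) is a one-liner; the rotation and translation used below are illustrative values only.

```python
import numpy as np

def world_to_camera(R, T, P_w):
    """Apply the extrinsic parameters, (2.19): P_c = R (P_w - T)."""
    return R @ (np.asarray(P_w, dtype=np.float64) - T)

# Example: camera rotated 30 degrees about the world Y axis and placed
# 1 m behind the world origin along Z (made-up numbers).
ang = np.deg2rad(30.0)
R = np.array([[np.cos(ang), 0.0, -np.sin(ang)],
              [0.0,         1.0,  0.0],
              [np.sin(ang), 0.0,  np.cos(ang)]])
T = np.array([0.0, 0.0, -1000.0])
print(world_to_camera(R, T, [100.0, 50.0, 400.0]))
```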
2.4.3 Intrinsic Parameters

The intrinsic parameters can be defined as the set of parameters needed to characterize the optical, geometric, and digital characteristics of the viewing camera. For a pinhole camera, we need three sets of intrinsic parameters, specifying respectively

• the perspective projection, for which the only parameter is the focal length, f;
• the transformation between camera frame coordinates and pixel coordinates;
• the geometric distortion introduced by the optics.
From Camera to Pixel Coordinates. To find the second set of intrinsic parameters, we must link the coordinates (x_im, y_im) of an image point in pixel units with the coordinates (x, y) of the same point in the camera reference frame. The coordinates
(x_im, y_im) can be thought of as coordinates of a new reference frame, sometimes called the image reference frame.

The Transformation between Camera and Image Frame Coordinates

Neglecting any geometric distortions possibly introduced by the optics, and assuming that the CCD array is made of a rectangular grid of photosensitive elements, we have

x = −(x_im − o_x) s_x
y = −(y_im − o_y) s_y    (2.20)

with (o_x, o_y) the coordinates in pixels of the image center (the principal point), and (s_x, s_y) the effective size of the pixel (in millimeters) in the horizontal and vertical directions, respectively.

Therefore, the current set of intrinsic parameters is f, o_x, o_y, s_x, s_y.
The sign change in (2.20) is due to the fact that the horizontal and vertical axes of the image and camera reference frames have opposite orientation.
In several cases, the optics introduces image distortions that become evident at the periphery of the image, or even elsewhere when using optics with large fields of view. Fortunately, these distortions can be modelled rather accurately as simple radial distortions, according to the relations

x = x_d (1 + k1 r² + k2 r⁴)
y = y_d (1 + k1 r² + k2 r⁴)

with (x_d, y_d) the coordinates of the distorted points, and r² = x_d² + y_d². As shown by the equations above, this distortion is a radial displacement of the image points; the displacement is null at the image center, and increases with the distance of the point from the image center. k1 and k2 are further intrinsic parameters. Since they are usually very small, radial distortion is ignored whenever high accuracy is not required in all regions of the image, or when the peripheral pixels can be discarded. If not, since k2 << k1, k2 is often set equal to 0, and k1 is the only intrinsic parameter to be estimated in the radial distortion model.
The magnitude of geometric distortion depends on the quality of the lens used. As a rule of thumb, with optics of average quality and CCD size around 500 × 500, expect distortions of several pixels (say, around 5) in the outer periphery of the image. Under these circumstances, a model with k2 = 0 is still accurate.
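The model above is easy to exercise numerically; in the sketch below, the value of k1 is arbitrary, chosen only to make the displacement visible.

```python
def radial_correction(xd, yd, k1, k2=0.0):
    """Map distorted coordinates (x_d, y_d), measured from the image
    center, to corrected ones via x = x_d (1 + k1 r^2 + k2 r^4)."""
    r2 = xd * xd + yd * yd
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xd * factor, yd * factor

print(radial_correction(0.0, 0.0, k1=1e-7))      # null at the image center
print(radial_correction(200.0, 150.0, k1=1e-7))  # grows with the radius
```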
It is now time for a summary.
Intrinsic Parameters

The camera intrinsic parameters are defined as the focal length, f; the location of the image center in pixel coordinates, (o_x, o_y); the effective pixel sizes in the horizontal and vertical directions, (s_x, s_y); and, if required, the radial distortion coefficient, k1.
2.4.4 Camera Models Revisited

We are now fully equipped to write relations linking directly the pixel coordinates of an image point with the world coordinates of the corresponding 3-D point, without explicit reference to the camera reference frame needed by (2.14).
Linear Version of the Perspective Projection Equations. Plugging (2.19) and (2.20) into (2.14), we obtain

−(x_im − o_x) s_x = f ( R₁ᵀ(P_w − T) ) / ( R₃ᵀ(P_w − T) )
−(y_im − o_y) s_y = f ( R₂ᵀ(P_w − T) ) / ( R₃ᵀ(P_w − T) )    (2.21)

where R_i, i = 1, 2, 3, is the 3-D vector formed by the i-th row of the matrix R. Indeed, (2.21) relates the 3-D coordinates of a point in the world frame to the image coordinates of the corresponding image point, via the camera extrinsic and intrinsic parameters.
"© Notice that, ve tothe particular form of (2.21), not all the itis parameters are
independent. In particular, the focal length could be absorbed into the effective sizes of
the OCD elements
"Neglecting radial distortion, we can rewrite (2.21) asa simple matrix product. To
this purpose, we define two matrices, My and Mion a8
fis 0 oy
(3 -t 3)
oY
ny mans “RET
seon(t sR),
nym ny RIT
M,
and
so that the 33 matrix Miy, depends only on the intrinsic parameters, while the 3x 4
‘matric M,,, ony on he extrinsic parameters. we now adéa“I” asa fourth coordinate of
Po (thatis express Pin homogeneous coordinates) and form the product Mie Mex Pay
‘we obtain linear matrix equation describing perspective projections.
The Linear Matrix Equation of Perspective Projections

[x₁, x₂, x₃]ᵀ = M_int M_ext [X_w, Y_w, Z_w, 1]ᵀ

What is interesting about the vector [x₁, x₂, x₃]ᵀ is that the ratios x₁/x₃ and x₂/x₃ are nothing but the image coordinates:

x₁/x₃ = x_im
x₂/x₃ = y_im
Moreover, we have separated nicely the two steps of the world-image projection:

• M_ext performs the transformation between the world and the camera reference frame;
• M_int performs the transformation between the camera reference frame and the image reference frame.
"© In more formal terms, the elation between a 3-D pint an its perspective projection on
the image plane cn be seen a alinea transformation fom the projective space, the space
of vectors [Xa Fy Za 1] tothe projective plane, the space of econ [2,22 3) This
tsansformatin is tind up oan arbrary sal factor and so thatthe matix ba only
1 Independent erties (ee review questions) Thisfact willbe discussed in Chapter 6
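The whole chain is compact in code. The sketch below uses illustrative parameter values (identity rotation, camera 1 m from the scene along Z); it is a direct transcription of the matrices above, not a calibrated system.

```python
import numpy as np

def projection_matrix(f, sx, sy, ox, oy, R, T):
    """Build M = M_int @ M_ext from intrinsic and extrinsic parameters."""
    M_int = np.array([[-f / sx, 0.0,     ox],
                      [0.0,     -f / sy, oy],
                      [0.0,     0.0,     1.0]])
    M_ext = np.hstack((R, (-R @ T).reshape(3, 1)))   # 3 x 4
    return M_int @ M_ext

def project(M, P_w):
    """Perspective projection of a world point, in homogeneous coordinates."""
    x1, x2, x3 = M @ np.append(P_w, 1.0)
    return np.array([x1 / x3, x2 / x3])              # (x_im, y_im)

R = np.eye(3)
T = np.array([0.0, 0.0, -1000.0])
M = projection_matrix(f=16.0, sx=0.01, sy=0.01, ox=256.0, oy=256.0, R=R, T=T)
print(project(M, np.array([50.0, 30.0, 0.0])))       # pixel coordinates
```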
The Perspective Camera Model. Various camera models, including the perspective and weak-perspective ones, can be derived by setting appropriate constraints on the matrix M = M_int M_ext. Assuming, for simplicity, o_x = o_y = 0 and s_x = s_y = 1, M can then be rewritten as

M = ( −f r11   −f r12   −f r13    f R₁ᵀT
      −f r21   −f r22   −f r23    f R₂ᵀT
       r31      r32      r33     −R₃ᵀT )

When unconstrained, M describes the full-perspective camera model and is called the projection matrix.
The Weak-Perspective Camera Model. To derive the form of M for the weak-perspective camera model, we observe that the image p of a point P is given by

p = ( f / R₃ᵀ(P − T) ) [ R₁ᵀ(P − T), R₂ᵀ(P − T) ]ᵀ    (2.22)

But R₃ᵀ(P − T) is simply the distance of P from the projection center along the optical axis; therefore, the basic constraint for the weak-perspective approximation can be written as

|R₃ᵀ(P_i − P̄)| << R₃ᵀ(P̄ − T)    (2.23)

where P₁ and P₂ are any two points in 3-D space, and P̄ the centroid of P₁ and P₂. Using (2.23), (2.22) can be written for each point P_i as

p_i ≈ ( f / R₃ᵀ(P̄ − T) ) [ R₁ᵀ(P_i − T), R₂ᵀ(P_i − T) ]ᵀ

Therefore, the projection matrix M becomes

M = ( −f r11   −f r12   −f r13    f R₁ᵀT
      −f r21   −f r22   −f r23    f R₂ᵀT
       0        0        0       R₃ᵀ(P̄ − T) )
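A quick numerical check of the approximation, with made-up coordinates and, for brevity, R = I and T = 0:

```python
import numpy as np

# Two points whose depth relief (10 mm) is much smaller than the average
# depth Zbar (1000 mm), so the weak-perspective constraint (2.23) holds.
f = 16.0
P = np.array([[40.0, 20.0, 995.0],
              [-30.0, 10.0, 1005.0]])
z_bar = P[:, 2].mean()                    # average depth Zbar = 1000.0

persp = f * P[:, :2] / P[:, 2:3]          # perspective: x = f X / Z
weak = f * P[:, :2] / z_bar               # weak perspective: x = f X / Zbar
print(np.abs(persp - weak).max())         # ~3e-3: the models nearly agree
```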
The Affine Camera Model. Another interesting camera model, widely used in the literature for its simplicity, is the so-called affine model, a mathematical generalization of the weak-perspective model. In the affine model, the first three entries in the last row of the matrix M are equal to zero; all other entries are unconstrained. The affine model does not appear to correspond to any physical camera, but leads to simple equations and has appealing geometric properties: the affine projection does not preserve angles, but does preserve parallelism. The main difference with the weak-perspective model is that, in the affine model, only the ratio of distances measured along parallel directions is preserved. We now move on to consider range images.
2.5 Range Data and Range Sensors

In many applications, one wants to use vision to measure distances; for example, to steer vehicles away from obstacles, estimate the shape of surfaces, or inspect manufactured objects. A single intensity image proves of limited use, as pixel values are related to surface geometry only indirectly, that is, through the optical and geometrical properties of the surfaces as well as the illumination conditions. All these are usually complex to model and often unknown. As we shall see in Chapter 9, reconstructing 3-D shape from a single intensity image is difficult and often inaccurate. Can we acquire images encoding shape directly? Yes: this is exactly what range sensors do.

Definition: Range Images

Range images are a special class of digital images. Each pixel of a range image expresses the distance between a known reference frame and a visible point in the scene. Therefore, a range image reproduces the 3-D structure of a scene, and is best thought of as a sampled surface.
Figure 2.14 Range views of a mechanical component displayed as intensity image (left: the lighter, the closer), cosine shaded (middle), and 3-D surface (right). Courtesy of R. B. Fisher, Department of Artificial Intelligence, University of Edinburgh.
2.5.1 Representing Range Images

Range images can be represented in two basic forms. One is a list of 3-D coordinates in a given reference frame, called the xyz form or cloud of points, for which no specific order is required. The other is a matrix of depth values of points along the directions of the x, y image axes, called the r_ij form, which makes spatial information explicit. Notice that xyz data can be more difficult to process than r_ij data, as no spatial order is assumed. Range images are also referred to as depth images, depth maps, xyz maps, surface profiles, and 2.5-D images.

Obviously, r_ij data can always be visualized as a normal intensity image; the term "range image" refers indeed to the r_ij form, which we will assume in the following unless otherwise specified. One can also display a range image as cosine shaded, whereby the grey level of each pixel is proportional to the norm of the gradient of the range surface. Figure 2.14 illustrates the three main methods of displaying range images.

Unlike intensity images, range images provide direct estimates of the geometric properties (shape) of object surfaces.
2.5.2 Range Sensors

An optical range sensor is a device using optical phenomena to acquire range images. We concentrate on optical sensors, as we are concerned with vision. Range sensors may measure depth at one point only, or the distance and shape of surface profiles, or of full surfaces. It is useful to distinguish between active and passive range sensors.

Definition: Active and Passive Range Sensors

Active range sensors project energy (e.g., a pattern of light, sonar pulses) on the scene and detect its position to perform the measurement, or exploit the effect of controlled changes of some sensor parameters (e.g., focus).

Passive range sensors rely only on intensity images to reconstruct depth (e.g., stereopsis, discussed in Chapter 7).
Passive range sensors are the subject of Chapters 7, 8, and 9, and are not discussed further here. Active range sensors exploit a variety of physical principles; examples are radars and sonars, Moiré interferometry, focusing, and triangulation. Here, we sketch the first three, and treat the last in greater detail.
Radars and Sonars. The basic principle of these sensors is to emit a short electromagnetic or acoustic wave (a pulse), and detect the return (the echo) reflected from surrounding surfaces. Distance is obtained as a function of the time taken by the wave to hit a surface and come back, called the time of flight, which is measured directly. By sweeping such a sensor across the scene, a full range image can be acquired. Different principles are used in imaging laser radars; for instance, such sensors can emit an amplitude-modulated laser beam and measure the phase difference between the transmitted and received signals.

Moiré Interferometry. A Moiré interference pattern is created when two gratings with regularly spaced patterns (e.g., lines) are superimposed on each other. Moiré sensors project such gratings onto surfaces, and measure the phase differences of the observed interference pattern. Distance is a function of such phase differences. Notice that such sensors can recover absolute distance only if the distance of one reference point is known; otherwise, only relative distances between scene points are obtained (which may be adequate for inspection).

Active Focusing/Defocusing. These methods infer range from two or more images of the same scene, acquired under varying focus settings. For instance, shape-from-focus sensors vary the focus of a motorized lens continuously and measure the amount of blur for each focus value. Once the best-focused image has been determined, a model linking focus values and distance yields the distance. In shape-from-defocus, the blur-focus model is fitted to two images only in order to estimate distance.
In the following sections, we concentrate on triangulation-based range sensors. The main reason for this choice is that they are based on intensity cameras, so we can exploit everything we know about intensity imaging. Moreover, such sensors can give accurate and dense 3-D coordinate maps, are easy to understand and build (as long as limited accuracy is acceptable), and are commonly found in applications.
2.5.3 Active Triangulation

We start by discussing the basic principle of active triangulation. Then, we discuss a simple sensor, and how to evaluate its performance. As we do not know yet how to calibrate intensity cameras, nor how to detect image features, you will be able to implement the algorithms in this section only after reading Chapters 4 and 5.

The basic geometry for an active triangulation system is shown in Figure 2.15. A light projector is placed at a distance b (called the baseline) from the center of projection of
Figure 2.15 The basic geometry of active, optical triangulation (planar XZ view): camera, light projector, and plane of light. The Y and y axes are perpendicular to the plane of the figure.
a pinhole camera. The center of projection is the origin of the reference frame XYZ, in which all the sensor's measurements are expressed. The Z axis and the camera's optical axis coincide. The y and Y, and x and X axes are respectively parallel, but point in opposite directions. Let f be the focal length. The projector emits a plane of light perpendicular to the plane XZ and forming a controlled angle, θ, with the XY plane. The Y axis is parallel to the plane of light and perpendicular to the page, so that only the profile of the plane of light is shown. The intersection of the plane of light with the scene surfaces is a planar curve called the stripe, which is observed by the camera. In this setup, the coordinates of a stripe point P = [X, Y, Z]ᵀ are given by

[X, Y, Z]ᵀ = ( b / (f cot θ − x) ) [x, y, f]ᵀ    (2.24)

The focal length and the other intrinsic parameters of the camera can be calibrated with the procedures to be used for intensity cameras (Chapter 6).
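A direct transcription of (2.24), with illustrative values for f, b, and θ:

```python
import numpy as np

def triangulate(x, y, f, b, theta):
    """Recover the stripe point [X, Y, Z] from its image (x, y) via (2.24)."""
    scale = b / (f / np.tan(theta) - x)     # b / (f cot(theta) - x)
    return scale * np.array([x, y, f])

# Image coordinates and focal length in mm on the image plane; baseline
# 300 mm; plane of light at 45 degrees (all numbers made up).
print(triangulate(x=2.0, y=1.0, f=16.0, b=300.0, theta=np.deg2rad(45.0)))
```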
Applying this equation to all the visible stripe points, we obtain the 3-D profile of the surface points under the stripe (a cross-section of the surface). We can acquire multiple, adjacent profiles by advancing the object under the stripe, or by sweeping the stripe across the object, repeating the computation for each relative position of stripe and object. The sequence of all profiles is a full range image of the scene.

Notice that, in Figure 2.15, the center of projection is drawn in front of the image plane, not behind it as in the real camera. This does not alter the essence of the model. (Why?)
In order to measure [x, y]ᵀ, we must identify the stripe points in the image. To facilitate this task, we try to make the stripe stand out in the image. We can do this by projecting laser light, which makes the stripe brighter than the rest of the image; or we can project a black line onto a matte white or light grey object, so that the only really dark image points are the stripe's. Both solutions are popular, but both have drawbacks: in the former case, concavities on shiny surfaces may create reflections that confuse the stripe detection; in the latter, the stripe location may be confused by shadows, marks, and dark patches. In both cases, no range data can be obtained where the stripe is invisible to the camera because of occlusions. Sensors based on laser light are called 3-D laser scanners, and are found very frequently in applications. A real sensor, modelled closely after the basic geometry in Figure 2.15, is shown in Figure 2.16.
To limit occlusions, one often uses two or more cameras, so that the stripe is nearly always visible from at least one camera.
2.5.4 A Simple Sensor

In order to use (2.24), we must calibrate f, b, and θ. Although it is not difficult to devise a complete calibration procedure based on the projection equations and the geometry of Figure 2.15, we present here a simple and efficient method, called direct calibration, which does not require any equations at all. Altogether, we shall describe a small but complete range sensor: how to calibrate it, and how to use it for measuring range profiles of 3-D objects. The algorithms require knowledge of some simple image processing operations, which you will be able to implement after going through the next three chapters.
The direct calibration procedure builds a lookup table (LUT) linking image and 3-D coordinates. Notice that this is possible because a one-to-one correspondence exists between image and 3-D coordinates, thanks to the fact that the stripe points are constrained to lie in the plane of light. The LUT is built by measuring the image coordinates of a grid of known 3-D points, and recording both image and world coordinates for each point; the depth values of all other visible points are obtained by interpolation. The procedure uses a few rectangular blocks of known height, Δ (Figure 2.17). One block (call it G) must have a number (say, n) of parallel, rectangular grooves. We assume the image size (in pixels) is x_max × y_max.
Algorithm RANGE_CAL

Set up the system and the reference frame as in Figure 2.17. With no object in the scene, the vertical stripe falls on the Z = 0 (background) plane, and is imaged near one side of the image.

1. Place block G under the stripe, with the grooves perpendicular to the stripe plane. Ensure that the stripe appears parallel to the image rows (or columns).
2. Acquire an image of the stripe falling on G. Find the coordinates of the stripe points falling on G's higher surface (i.e., outside the grooves) by scanning the image columns.
3. Compute the coordinates [x, y]ᵀ of the centers of the stripe segments on G's top surface. Enter each image point [x, y]ᵀ and its corresponding scene point [X, Z]ᵀ (known) into a table T.
4. Put another block under G, raising G's top surface by Δ. Ensure that the conditions of step 1 still apply. Be careful not to move the reference frame.
5. Repeat steps 2, 3, and 4 until G's top surface is imaged near x = 0.
6. Convert T into a 2-D lookup table L, indexed by image coordinates [x, y]ᵀ, with x between 0 and x_max − 1 and y between 0 and y_max − 1, and returning the corresponding [X, Z]ᵀ. To assign values to the pixels not measured directly, interpolate linearly using the four nearest neighbors.

The output is a LUT linking the coordinates of image points to the coordinates of scene points.
Figure 2.16 A real 3-D triangulation system, developed at Heriot-Watt University by A. M. Wallace and coworkers. Notice the laser source (top left), which generates the laser beam; the optical components forming the plane of laser light (top middle and left); the cameras; and the motorized platform (bottom middle), which supports the object and sweeps it through the stationary plane of light.
Figure 2.17 Setup for direct calibration of a simple profile range sensor.

And here is how to use L to acquire a range profile:
Algorithm RANGE_ACQ

The input is the LUT, L, built by RANGE_CAL.

1. Place an object under the stripe and acquire an image of the stripe falling on the object.
2. Compute the image coordinates [x, y]ᵀ of the stripe points by scanning each image column.
3. Index L using the image coordinates [x, y]ᵀ of the stripe points, to obtain the range points [X, Z]ᵀ.

The output is the set of 3-D coordinates corresponding to the stripe points imaged.
Notice that the depths computed by such a sensor grow from the background plane (Z = 0), not from the camera.

When a new block is added to the calibration scene, the stripe should move by at least one or two pixels; if not, the calibration will not discriminate between Z levels. Be sure to use the same code for peak location in RANGE_ACQ and RANGE_CAL! The sparser the calibration grid, the less accurate the range values obtained by interpolation in L.
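Step 6 of RANGE_CAL and step 3 of RANGE_ACQ are straightforward to prototype. The sketch below fills the LUT by inverse-distance weighting of the four nearest calibrated points, one concrete reading of the "four nearest neighbors" interpolation; the data structures are our own choices, not the book's.

```python
import numpy as np

def build_lut(table, x_max, y_max):
    """RANGE_CAL, step 6: turn a list of ((x, y), (X, Z)) calibration pairs
    into a dense LUT, interpolating from the four nearest measured points."""
    pts = np.array([p for p, _ in table], dtype=np.float64)
    vals = np.array([v for _, v in table], dtype=np.float64)
    lut = np.zeros((x_max, y_max, 2))
    for x in range(x_max):
        for y in range(y_max):
            dist = np.hypot(pts[:, 0] - x, pts[:, 1] - y)
            near = np.argsort(dist)[:4]           # four nearest neighbors
            w = 1.0 / (dist[near] + 1e-9)         # inverse-distance weights
            lut[x, y] = (w[:, None] * vals[near]).sum(axis=0) / w.sum()
    return lut

def range_acq(lut, stripe_points):
    """RANGE_ACQ, step 3: index the LUT with image coordinates."""
    return [lut[int(round(x)), int(round(y))] for x, y in stripe_points]
```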
When is a range sensor better than another for a given application? The following list of parameters is a basis for characterizing and comparing range sensors. Most parameters apply to non-triangulation sensors too.
Basic Parameters of Range Sensors

Workspace: the volume of space in which range data can be collected.

Standoff distance: the approximate distance between the sensor and the workspace.

Depth of field: the depth of the workspace (along Z).

Accuracy: statistical variations of repeated measurements of a known true value (ground truth). Accuracy specifications should include at least the mean absolute error, the RMS error, and the maximum absolute error over N measures of the same object, for a suitably large N.

Resolution (or precision): the smallest change in range that the sensor can measure or represent.

Speed: the number of range points measured per second.

Size and weight: important in some applications (e.g., only small sensors can be fitted on a robot arm).
It is often difficult to know the actual accuracy of a sensor without carrying out your own measurements. Accuracy figures are sometimes reported without specifying which error they refer to (e.g., RMS, absolute mean, maximum), and often omitting the experimental conditions and the optical properties of the surfaces used.
2.6 Summary

After working through this chapter, you should be able to:

• explain how digital images are formed, represented, and acquired;
• estimate experimentally the noise introduced in an image by an acquisition system;
• explain the concepts of intrinsic and extrinsic parameters, the most common models of intensity cameras, and their applicability;
• design (but not yet implement) an algorithm for calibrating and using a complete range sensor based on direct calibration.
2.7 Further Readings

It is hard to find more on the contents of this chapter in just one book; as a result, if you want to know more, you must be willing to do some bibliographic search. A readable account of basic optics can be found in Feynman's Lectures on Physics [4]. A classic on the subject, and beyond, is Born and Wolf [3], which also covers topics like image formation and spatial frequency filtering (though it is not always simple to go through). Our derivation of (2.13) is based on Horn and Sjoberg [6]; Horn [5] gives an extensive treatment of surface reflectance models. Of the many very good textbooks on signal theory, our favorite is Oppenheim, Willsky and Young [11]. The discussion of camera models via the projection matrix is based on the appendix of Mundy and Zisserman's book Geometric Invariance in Computer Vision [9].

Our discussion of range sensors is largely based on Besl [1], which is a very good introduction to the principles, types, and evaluation of range sensors. A recent,
detailed review of commercial laser scanners can be found in [14]. Two laser-based, active triangulation range sensors are described in [12, 13]; the latter is based on direct calibration, the former uses a geometric camera model. References [8] and [2] are examples of triangulation sensors projecting patterns of lines generated using incoherent light (as opposed to laser light) onto the scene. Krotkov [7] and Nayar and Nakagawa [10] are good introductions to focus-based ranging.
2.8 Review

Questions

2.1 How does an image change if the focal length is varied?

2.2 Give an intuitive explanation of the reason why a pinhole camera has an infinite depth of field.

2.3 Use the definition of F-number to explain geometrically why this quantity measures the fraction of the light entering the camera which reaches the image plane.

2.4 Explain why the beat frequency of fluorescent room light (e.g., 60 Hz) can skew the results of EST_NOISE.

2.5 Intensity thresholding is probably the simplest way to locate interesting objects in an image (a problem called image segmentation). The idea is that only the pixels whose value is above a threshold belong to interesting objects. Comment on the shortcomings of this technique, particularly in terms of the relation between scene radiance and image irradiance. Assuming that scene and illumination can be controlled, what would you do to guarantee successful segmentation by thresholding?

2.6 The projection matrix M is a 3 × 4 matrix defined up to an arbitrary scale factor. This leaves only 11 of the 12 entries of M independent. On the other hand, we have seen that the matrix can be written in terms of 10 parameters (4 intrinsic and 6 extrinsic independent parameters). Can you guess the independent intrinsic parameter that has been left out? If you cannot guess now, you will have to wait for Chapter 6.

2.7 Explain the problem of camera calibration, and why calibration is necessary at all.

2.8 Explain why the length in millimeters of an image segment with endpoints [x₁, y₁]ᵀ and [x₂, y₂]ᵀ is not simply √((x₂ − x₁)² + (y₂ − y₁)²). What does this formula miss?

2.9 Explain the difference between a range and an intensity image. Could range images be acquired using intensity cameras only (i.e., no laser light or the like)?

2.10 Explain the reason for the word "shaded" in "cosine shaded rendering of a range image." What assumptions on the illumination does a cosine shaded image imply? How is the surface gradient linked to shading?

2.11 What is the reason for step 1 in RANGE_CAL?
2.12 Consider a triangulation sensor which scans a whole surface profile by translating an object through a plane of laser light. Now imagine the surface is scanned by making the laser light sweep across the object. In both cases the camera is stationary. What parts of the triangulation algorithm change? Why?

2.13 The performance of a range sensor based on (2.24) depends on the values of f, b, and θ. How would you define and determine "optimal" values of f, b, and θ for such a sensor?
Exercises

2.1 Show that (2.1) and (2.2) are equivalent.

2.2 Devise an experiment that checks the prediction of (2.13) on your own system. Hint: Use a spatially uniform object (like a flat sheet of matte grey paper) illuminated by perfectly diffuse light. Use optics with a wide field of view. Repeat the experiment, averaging the acquired images over time. What difference does this averaging step make?

2.3 Show that, in the pinhole camera model, three collinear points in 3-D space are imaged into three collinear points on the image plane.

2.4 Use the perspective projection equations to explain why, in a picture of a face taken frontally and from a very small distance, the nose appears much larger than the rest of the face. Can this effect be reduced by acting on the focal length?

2.5 Estimate the noise of your acquisition system using procedures EST_NOISE and AUTO_COVARIANCE.

2.6 Use the equations of section 2.3.2 to estimate the spatial aliasing of your acquisition system, and devise a procedure to estimate, roughly, the number of CCD elements of your camera.

2.7 Write a program which displays a range image as a normal image (grey levels encode distance) or as a cosine shaded image.

2.8 Derive (2.24) from the geometry shown in Figure 2.15. Hint: Use the law of sines and the pinhole projection equations. Why have we chosen to position the reference frame as in Figure 2.15?

2.9 We can predict the sensitivity of measurements obtained through (2.24) by taking partial derivatives with respect to the formula's parameters. Compare such predictions with respect to b and f.
Projects

2.1 You can build your own pinhole camera, and join the adepts of pinhole photography. Pierce a hole about 5 mm in diameter on one side of an old tin box, 10 to 20 cm in depth. Spray the inside of the box and lid with black paint. Pierce a pinhole in a piece of thick aluminium foil (e.g., the one used for milk tops), and fix the foil to the hole in the box with black tape. In a dark room, fix a piece of black and white photographic film on the inside of the box, opposite the pinhole, and seal the box with black tape. The nearer the pinhole to the film, the wider the field of view. Cover the pinhole with