
Front-End Vision and Multi-Scale Image Analysis

Computational Imaging and Vision

Managing Editor
MAX A. VIERGEVER
Utrecht University, Utrecht, The Netherlands

Editorial Board
GUNILLA BORGEFORS, Centre for Image Analysis, SLU, Uppsala, Sweden
THOMAS S. HUANG, University of Illinois, Urbana, USA
SABURO TSUJI, Wakayama University, Wakayama, Japan

Volume 27
Front-End Vision and
Multi-Scale Image Analysis
Multi-Scale Computer Vision Theory and
Applications, written in Mathematica

by
Bart M. ter Haar Romeny
Eindhoven University of Technology,
Department of Biomedical Engineering,
Biomedical Imaging and Informatics,
Eindhoven, The Netherlands

Springer
Prof. Dr. Bart M. ter Haar Romeny
Eindhoven University of Technology
Fac. Biomedical Engineering
Dept. Image Analysis & Interpretation
5600 MB Eindhoven
Netherlands

This book is available in Mathematica, a mathematical programming language, at the following website:
http://bmia.bmt.tue.nl/Education/Courses/FEV/book/index.html

ISBN 978-1-4020-1503-8 (HB) e-ISBN 978-1-4020-8840-7 (e-book)


ISBN 978-1-4020-1507-6 (PB)

Library of Congress Control Number: 2008929603

Reprinted in 2008

© 2003 Springer Science + Business Media B.V.


No part of this work may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, microfilming, recording
or otherwise, without written permission from the Publisher, with the exception
of any material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
Table of contents
Front-End Vision and Multi-Scale Image Analysis ................................ xiii
The purpose of this book .............................................................................. xiii
Scale-space theory is biologically motivated computer vision .......... xiv
This book has been written in Mathematica ........................................... xvi
Acknowledgements ........................................................................................ xviii
1. Apertures and the notion of scale ........................................................... 1
1.1 Observations and the size of apertures ............................................. 1
1.2 Mathematics, physics, and vision ........................................................ 2
1.3 We blur by looking ................................................................................... 5
1.4 A critical view on observations ............................................................. 9
1.5 Summary of this chapter ........................................................................ 12
2. Foundations of scale-space ....................................................................... 13
2.1 Constraints for an uncommitted front-end ........................................ 13
2.2 Axioms of a visual front-end ................................................................. 15
2.2.1 Dimensional analysis ........................................................... 15
2.2.2 The cooking of a turkey ...................................................... 16
2.2.3 Reynold's number ................................................................. 18
2.2.4 Rowing: more oarsmen, higher speed? ........................ 19
2.3 Axiomatic derivation of the Gaussian kernel ................................... 21
2.4 Scale-space from causality ................................................................... 23
2.5 Scale-space from entropy maximization ........................................... 25
2.6 Derivatives of sampled, observed data ............................................. 27
2.7 Scale-space stack .................................................................................... 31
2.8 Sampling the scale-axis ......................................................................... 32
2.9 Summary of this chapter ........................................................................ 35
3. The Gaussian kernel ..................................................................................... 37
3.1 The Gaussian kernel ............................................................................... 37
3.2 Normalization ............................................................................................ 38
3.3 Cascade property, selfsimilarity ........................................................... 39
3.4 The scale parameter ............................................................................... 40
3.5 Relation to generalized functions ........................................................ 40
3.6 Separability ................................................................................................ 43
3.7 Relation to binomial coefficients .......................................................... 43
3.8 The Fourier transform of the Gaussian kernel ................................ 44
3.9 Central limit theorem ............................................................................... 46
3.10 Anisotropy ................................................................................................ 48
3.11 The diffusion equation .......................................................................... 49
3.12 Summary of this chapter ..................................................................... 50

4. Gaussian derivatives ..................................................................................... 53
4.1 Introduction ................................................................................................ 53
4.2 Shape and algebraic structure ............................................................. 53
4.3 Gaussian derivatives in the Fourier domain .................................... 57
4.4 Zero crossings of Gaussian derivative functions ........................... 59
4.5 The correlation between Gaussian derivatives ............................... 60
4.6 Discrete Gaussian kernels .................................................................... 64
4.7 Other families of kernels ........................................................................ 65
4.8 Higher dimensions and separability ................................................... 67
4.9 Summary of this chapter ........................................................................ 69
5. Multi-scale derivatives: implementations ............................................ 71
5.1 Implementation in the spatial domain ................................................ 71
5.2 Separable implementation ..................................................................... 73
5.3 Some examples ........................................................................................ 74
5.4 N-dim Gaussian derivative operator implementation .................... 78
5.5 Implementation in the Fourier domain ............................................... 79
5.6 Boundaries ................................................................................................. 83
5.7 Advanced topic: speed concerns in Mathematica ......................... 85
5.8 Summary of this chapter ........................................................................ 89
6. Differential structure of images ................................................................ 91
6.1 The differential structure of images .................................................... 91
6.2 Isophotes and flowlines .......................................................................... 92
6.3 Coordinate systems and transformations ......................................... 96
6.4 Directional derivatives ............................................................................. 102
6.5 First order gauge coordinates .............................................................. 103
6.6 Gauge coordinate invariants: examples ........................................... 108
6.6.1 Ridge detection ..................................................................... 108
6.6.2 Isophote and flowline curvature in gauge coord ......... 110
6.6.3 Affine invariant corner detection ...................................... 113
6.7 A curvature illusion .................................................................................. 115
6.8 Second order structure ........................................................................... 117
6.8.1 The Hessian matrix and principal curvatures .............. 119
6.8.2 The shape index .................................................................... 120
6.8.3 Principal directions ............................................................... 122
6.8.4 Gaussian and mean curvature ......................................... 123
6.8.5 Minimal and zero Gaussian curvature surfaces .......... 126
6.9 Third order image structure: T-junction detection .......................... 127
6.10 Fourth order image structure: junction detection ......................... 131
6.11 Scale invariance and natural coordinates ...................................... 132
6.12 Irreducible invariants ............................................................................. 134
Intermezzo: Tensor notation ........................................................ 135
6.13 Summary of this chapter ..................................................................... 136

7. Natural limits on observations .................................................................. 137
7.1 Limits on differentiation: scale, accuracy and order ...................... 137
7.2 Summary of this chapter ........................................................................ 141
8. Differentiation and regularization ............................................................ 143
8.1 Regularization ........................................................................................... 143
8.2 Regular tempered distributions and test functions ........................ 144
8.3 An example of regularization ................................................................ 147
8.4 Relation regularization ↔ Gaussian scale-space .......................... 148
8.5 Summary of this chapter ........................................................................ 152
9. The front-end visual system - the retina ............................................... 153
9.1 Introduction ................................................................................................ 153
9.2 Studies of vision ....................................................................................... 154
9.3 The eye ....................................................................................................... 156
9.4 The retina ................................................................................................... 157
9.5 Retinal receptive fields ........................................................................... 160
9.6 Sensitivity profile measurement of a receptive field ...................... 162
9.7 Summary of this chapter ........................................................................ 165
10. A scale-space model for the retinal sampling ................................. 167
10.1 The size and spatial distribution of receptive fields .................... 167
10.2 A scale-space model for the retinal receptive fields ................... 172
10.3 Summary of this chapter ..................................................................... 177
11. The front-end visual system - LGN and cortex ................................ 179
11.1 The thalamus .......................................................................................... 179
11.2 The lateral geniculate nucleus (LGN) ............................................. 181
11.3 Corticofugal connections to the LGN .............................................. 183
11.4 The primary visual cortex .................................................................... 185
11.4.1 Simple cells .......................................................................... 187
11.4.2 Complex cells ...................................................................... 188
11.4.3 Directional selectivity ......................................................... 189
11.5 Intermezzo: Measurement of neural activity in the brain .......... 191
Electro-Encephalography (EEG) ................................................ 191
Magneto-Encephalography (MEG) ............................................ 192
Functional MRI (fMRI) .................................................................... 193
Optical imaging with voltage sensitive dyes ............................ 194
Positron Emission Tomography (PET) ...................................... 194
11.6 Summary of this chapter ..................................................................... 195
12. The front-end visual system - cortical columns ............................. 197
12.1 Hypercolumns and orientation structure ........................................ 197
12.2 Stabilized retinal images ..................................................................... 200
12.3 The concept of local sign .................................................................... 202
12.4 Gaussian derivatives and Eigen-images ........................................ 204
12.5 Plasticity and self-organization ......................................................... 208
12.6 Higher cortical visual areas ................................................................ 210
12.7 Summary of this chapter ..................................................................... 211
12.8 Vision dictionary ..................................................................................... 211
12.8.1 Further reading on the web ............................................. 212
13. Deep structure I. watershed segmentation ....................................... 215
13.1 Multi-scale measurements .................................................................. 215
13.2 Scale selection ....................................................................................... 216
13.3 Normalized feature detection ............................................................. 218
13.4 Automatic scale selection .................................................................... 219
13.4.1 γ-Normalized scale selection .......................................... 220
13.4.2 Is this really deep structure? ........................................... 220
13.5 Edge focusing ......................................................................................... 221
13.5.1 Simplification followed by focusing ............................... 221
13.5.2 Linking in 1D ........................................................................ 222
13.6 Follicle detection in 3D ultrasound ................................................... 225
13.6.1 Fitting spherical harmonics to 3D points ..................... 229
13.7 Multi-scale segmentation ..................................................................... 231
13.7.1 Dissimilarity measure in scale-space ........................... 231
13.7.2 Watershed segmentation ................................................. 232
13.7.3 Linking of regions ............................................................... 234
13.7.4 The multi-scale watershed segmentation ................... 237
13.8 Deep structure and nonlinear diffusion ........................................... 239
13.8.1 Non-linear diffusion watershed segmentation ........... 239
14. Deep structure II. catastrophe theory .................................................. 241
14.1 Catastrophes and singularities .......................................................... 241
14.2 Evolution of image singularities in scale-space ........................... 242
14.3 Catastrophe theory basics .................................................................. 243
14.3.1 Functions .............................................................................. 243
14.3.2 Characterization of points ................................................ 243
14.3.3 Structural equivalence ...................................................... 244
14.3.4 Local characterization of functions ............................... 244
14.3.5 Thom's theorem .................................................................. 245
14.3.6 Generic property ................................................................. 246
14.3.7 Dimensionality ..................................................................... 246
14.3.8 Illustration of the concepts .............................................. 247
14.4 Catastrophe theory in scale-space .................................................. 250
14.4.1 Generic events for differential operators .................... 251
14.4.2 Generic events for other differential operators ......... 254
14.4.3 Annihilations and creations ............................................. 255
14.5 Summary of this chapter ..................................................................... 256

15. Deep structure III. topological numbers ............................................. 257


15.1 Topological numbers ............................................................................ 257
15.1.1 Topological numbers in scale-space ............................ 258
15.1.2 Topological number for a signal .................................... 259
15.1.3 Topological number for an image ................................. 259
15.1.4 The winding number on 2D images ............................. 260
15.2 Topological numbers and catastrophes ......................................... 263
15.3 The deep structure toolbox ................................................................. 265
15.3.1 Detection of singularities .................................................. 265
15.3.2 Linking of singularities ...................................................... 265
15.3.3 Linking of contours ............................................................. 268
15.3.4 Detection of catastrophes ................................................ 268
15.3.5 General discrete geometry approach .......................... 269
15.4 From deep structure to global structure ......................................... 271
15.4.1 Image representations ...................................................... 271
15.4.2 Hierarchical pre-segmentation ....................................... 272
15.4.3 Perceptual grouping .......................................................... 273
15.4.4 Matching and registration ................................................ 274
15.4.5 Image databases ................................................................ 274
15.4.6 Image understanding ........................................................ 275
15.5 Summary of this chapter ..................................................................... 275
16. Deblurring Gaussian blur ......................................................................... 277
16.1 Deblurring ................................................................................................ 277
16.2 Deblurring with a scale-space approach ........................................ 277
16.3 Less accurate representation, noise and holes ........................... 281
16.4 Summary of this chapter ..................................................................... 284
17. Multi-scale optic flow ................................................................................. 285
17.1 Introduction .............................................................................................. 285
17.2 Motion detection with pairs of receptive fields .............................. 286
17.3 Image deformation by a discrete vectorfield ................................. 289
17.4 The optic flow constraint equation .................................................... 290
17.5 Scalar and density images .................................................................. 292
17.6 Derivation of multi-scale optic flow constraint equation ............ 292
17.6.1 Scalar images, normal flow ............................................. 296
17.6.2 Density images, normal flow .......................................... 301
17.7 Testing the optic flow constraint equations ................................... 303
17.8 Cleaning up the vector field ................................................................ 305
17.9 Scale selection ....................................................................................... 307
17.10 Discussion ............................................................................................. 309
17.11 Summary of this chapter ................................................................... 310

18. Color differential structure ....................................................................... 311
18.1 Introduction .............................................................................................. 311
18.2 Color image formation and color invariants ................................... 311
18.3 Koenderink's Gaussian derivative color model ............................ 314
18.4 Implementation ....................................................................................... 320
18.5 Combination with spatial constraints ............................................... 325
18.6 Summary of this chapter ..................................................................... 327
19. Steerable kernels .......................................................................................... 329
19.1 Introduction .............................................................................................. 329
19.2 Multi-scale orientation .......................................................................... 330
19.3 Orientation analysis with Gaussian derivatives ............................ 331
19.4 Steering with self-similar functions ................................................... 332
19.5 Steering with Cartesian partial derivatives .................................... 336
19.6 Detection of stellate tumors ................................................................ 338
19.7 Classical papers and student tasks ................................................. 342
19.8 Summary of this chapter ..................................................................... 343
20. Scale-time .................................................................................................... 345
20.1 Introduction .............................................................................................. 345
20.2 Analysis of prerecorded time-sequences ....................................... 346
20.3 Causal time-scale is logarithmic ....................................................... 349
20.4 Other derivations of logarithmic scale-time ................................... 351
20.5 Real-time receptive fields .................................................................... 353
20.6 A scale-space model for time-causal receptive fields ................ 354
20.7 Conclusion ............................................................................................... 359
20.8 Summary of this chapter ..................................................................... 360
21. Geometry-driven diffusion ........................................................................ 361
21.1 Adaptive Smoothing and Image Evolution ..................................... 361
21.2 Nonlinear Diffusion Equations ........................................................... 362
21.3 The Perona & Malik Equation ............................................................ 364
21.4 Scale-space implementation of the P&M equation ..................... 366
21.5 The P&M equation is ill-posed ........................................................... 370
21.6 Von Neumann stability of numerical PDE's ................................... 372
21.7 Stability of Gaussian linear diffusion ............................................... 373
21.8 A practical example of numerical stability ...................................... 376
21.9 Euclidean shortening flow ................................................................... 378
21.10 Grayscale invariance .......................................................................... 379
21.11 Numerical examples shortening flow ............................................ 379
21.12 Curve Evolution .................................................................................... 382
21.13 Duality between PDE- and curve evolution ................................. 383
21.14 Mathematical Morphology ................................................................. 386
21.15 Mathematical morphology on grayvalued images .................... 389
21.16 Mathematical morphology versus scale-space .......................... 390
21.17 Summary of this chapter ................................................................... 390

22. Epilog ................................................................................................................. 393

A. Introduction to Mathematica ..................................................................... 395
A.1 Quick overview of using Mathematica .............................................. 395
A.2 Quick overview of the most useful commands ............................... 397
A.3 Pure functions ........................................................................................... 401
A.4 Pattern matching ...................................................................................... 401
A.5 Some special plot forms ........................................................................ 404
A.6 A faster way to read binary 3D data .................................................. 405
A.7 What often goes wrong ......................................................................... 407
A.8 Suggested reading .................................................................................. 410
A.9 Web resources ......................................................................................... 412
B. The concept of convolution ....................................................................... 413
B.1 Convolution ................................................................................................ 413
B.2 Convolution is a product in the Fourier domain ............................. 416
C. Installing the book and packages ........................................................... 419
C.1 Content ....................................................................................................... 419
C.2 Installation for all systems .................................................................... 420
C.3 Viewing the book in the Help Browser .............................................. 420
C.4 Sources of additional applications ..................................................... 421
D. First Start with Mathematica: Tips & Tricks ........................................ 423
1. Evaluation ..................................................................................................... 423
2. Images ........................................................................................................... 423
3. Programming ............................................................................................... 424
4. 3D ................................................................................................................... 424
References .............................................................................................................. 425
Index .......................................................................................................................... 455
Front-End Vision and Multi-Scale
Image Analysis
Bart M. ter Haar Romeny, PhD

Department of Biomedical Engineering


Eindhoven University of Technology
The Netherlands

Tell me, and I will forget. Show me, and I will remember.
Involve me, and I will understand.
Old Chinese proverb

The purpose of this book

Scale is not an important parameter in computer vision research. It is an essential parameter.


It is an immediate consequence of the process of observation, of measurements. This book is
about scale, and its fundamental notion in computer vision, as well as human vision.

Scale-space theory is the theory of apertures, through which we and machines observe the
world. The apertures come in an astounding variety. They can be exploited to model the first
stages of human vision, and they appear in all aspects of computer vision, such as the
extraction of features, the measurement of optic flow and stereo disparity, to do orientation
analysis, segmentation, image enhancement etc. They have an essential role in the
fundamental processes of differentiation and regularization.

Scale-space theory is named after the space that is formed by looking at an image at many
different scales simultaneously.

When stacked, we get one dimension extra, i.e. the scale dimension. The scale-space is the
space of the spatial and scale dimensions, see figure 1.

This book is a tutorial course. The level of the book is undergraduate and first level graduate.
Its main purpose is to be used as a coursebook in computer vision and front-end vision entry
courses. It may also be useful as an entry point for research in biological vision.

Although there are excellent texts appearing on the notion of scale space, most of them are
not easy reading for people just entering this field or lacking a solid mathematical
background. This book is intended partly to fill this gap, to act as an entry point for the

growing literature on scale-space theory. Throughout the book we will work steadily through
the necessary mathematics.

The book discusses the many classical papers published over the last two decades, when
scale-space theory became mature. The different approaches and findings are put into
context. First, linear scale-space theory is derived from first principles, giving it a sound
mathematical basis.

The notion that a multi-scale approach is a natural consequence of the process of observation
is interwoven in every chapter of this book. E.g. Horn and Schunck’s famous optic flow
equation gets a new meaning when we ’look at the data’. The concept of a point and local
point operators like the derivative operator diffuse into versions with a Gaussian extent,
making the process of differentiation well posed. It immediately makes large and mature
fields like differential geometry, invariant theory, tensor analysis and singularity theory
available for analysis on discrete data, such as images.

We develop ready-to-use applications of differential invariants of second, third and fourth


order. The relation between accuracy, differential order and scale of the operator is
developed, and an example of very high order derivatives is worked out in the analytical
deblurring of Gaussian blur.

Practical examples are also developed in the chapters on multi-scale optic flow and multi-
scale differential structure of color images. Again, the physics of the observation process
forces the analytical solution to be multi-scale. Several examples of ways to come to proper
scale-selection are treated underway.

Figure 1. A scale-space of a sagittal MR image. The image is blurred with a Gaussian kernel
with variable width σ, which is the third dimension in this image.

Scale-space theory is biologically motivated computer vision


We consider it very important to have many cross-links between multi-scale computer vision
theory, and findings in the human (mammalian) visual system. We hope the reader will
appreciate the mutual cross-fertilization between these fields. For that reason we elaborate
the current state of the art in neurophysiological and psychophysical findings of the first
stages of the visual system.

The chapters on time-scale and multi-scale orientation analysis are directly inspired by
findings from biological vision. The grouping of local properties into meaningful larger
subgroups (perceptual grouping) is treated both on the level of establishing neighborhood
relationships through all measured properties of the points, and through

the study of the deep structure of images, where topology comes in as a mathematical toolkit.
The natural hierarchical ordering is exploited in a practical application, where we discuss
multi-scale watershed segmentation.

This book is meant to be a practical and interactive book. It is written as a series of


notebooks in Mathematica 4, a modern computer algebra language/system.

For every technique discussed the complete code is presented. The reader can run and adapt
all experiments himself, and learn by example and prototyping. The most effective way to
master the content is to go through the notebooks on a computer running Mathematica and
play with variations.

This book is a tribute to Jan Koenderink, professor at Utrecht University in the Netherlands,
chairman of the Physics Department ’Physics of Man’. He can be considered the ’godfather’
of modern scale-space theory. A brilliant physicist, combining broad knowledge on human
visual perception with deep insight in the mathematics and physics of the problems.

Figure 2. Prof. dr. hon.dr. Jan Koenderink.

This book is just a humble introduction to his monumental oeuvre and the offspin of it. Many
papers he wrote together with his wife, Ans van Doorn. They published on virtually every
aspect of front-end vision and computer vision with a strong perceptually based inspiration,
and the physical modeling of it.

This book is written for both the computer vision scientist with an interest in multi-scale
approaches to image processing, and for the neuroscientist with an appeal for mathematical
modeling of the early stages of the visual system. One of the purposes of this book is to
bridge the gap between both worlds. To accommodate a wide readership, both from physics
and biology, sometimes mathematical rigor is lacking (but can be found in the indicated
references) in favor of clarity of the exposition.

Figure 3. Attendants of the first international Scale-Space conference, Summer 1997 in


Utrecht, the Netherlands, chaired by the author (standing fourth from right).

Figure 4. Attendants of the second international Scale-Space conference, Summer 1999 in


Corfu, Greece, chaired by Mads Nielsen, PhD (IT-Univ. Copenhagen, foreground fifth from
right). See for the conference series: www.scalespace.org.

This book has been written in Mathematica


This book is written as a series of Mathematica notebooks. Mathematica is a high level
interactive mathematical programming language, developed and marketed by Stephen
Wolfram (www.wolfram.com). Notebooks are interactive scientific documents, containing
both the text and the code.

Mathematica consists of two separate programs: the kernel (the computing engine) and a
front-end which handles all information for and from the user. The structure of ’cells’ in the
front-end enables the efficient mix of explaining text, computer code and graphics in an
intuitive way. The reasons to write this book in Mathematica are plentiful:

- We can now do mathematical prototyping with computer vision principles/techniques on


images. The integration of both symbolic and fast numerical capabilities, and the powerful
pattern matching techniques make up for a new and efficient approach to apply and teach
computer vision, more suitable for human mathematical reasoning. For, computer vision is
mathematics on images. It is now easy to do rapid prototyping.

- Reading a scientific method now has something extra: the code of every method discussed
is available, ready for testing, with modifications and applications to the reader’s images. The

gap between the appreciation of a method in a theoretical paper and one’s own working
software that applies the method is now closing. David Donoho writes about his WaveLab
package: "An article about computational science in a scientific publication is not the
scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the
complete software development environment and the complete set of instructions which
generated the figures." (http://www-stat.stanford.edu/~wavelab/).

- Mathematica stays close to the traditional notation. The mathematical notation of e.g.
symbols, operators and rules is virtually identical to traditional mathematical notation.

- The notebooks are WYSIWYG. Notebooks can easily be saved as LaTeX or
HTML/MathML documents. Notebooks are portable ASCII documents; they appear virtually
identical on a wide range of computer platforms.

- All programs become compact. In this book no example exceeds 20 lines of code. There are
no For, While or Do loops. Most commands are Listable, i.e. they operate on any member
of the operand list. The language contains hardly any abbreviations, and is so intuitive that
the language may be mastered while reading the book. In the appendix a list of tutorial
books on Mathematica is given, and a summary of the command structure of the most
popular commands used in this book.

- Wolfram Research Inc. indicates that over 2 million licenses have been sold. This book may
serve as a (WWW-based) starting set of exchangeable Mathematica computer vision routines.

- Mathematica is complete. Over 2500 highly optimized mathematical routines are on board,
which relieves the computer vision programmer from searching for routines in Numerical
Recipes, IMSL etc. It has graphical capabilities for 1D to 4D (animations). It is now
integrated with Java (JLink), which is available anywhere and ideally suited for further
development of the GUI and real-time manipulation with the data. Mathematica code can be
compiled for further speed increase.

The parallel version of Mathematica now enables the easy distribution over a range of
kernels on different computers on a network.

- Last but not least: the present version of Mathematica is fast. From release 4 it has reached
a point where, from being an ideal rapid prototyping tool, it is now turning into an all-round
prototyping and application tool. The run-time of most experiments described in this book is
within fractions of a second to tens of seconds on a typical 1.7 GHz 256 MB Pentium IV
system under Windows.

It is platform independent, and is available on any type of computer.



Acknowledgements
The contents of this book is based on work of: Jan J. Koenderink, Luc M. J. Florack, Tony
Lindeberg, Wiro J. Niessen, Alfons H. Salden, Mads Nielsen, Jon Sporring, Ole F. Olsen,
Jan-Mark Geuzebroek, Erik Dam, Peter Johansen, Avan Suinesiaputra, Hans J. Blom, Bram
van Ginneken, Stiliyan Kalitzin, Joes Staal, Robert Maas, Max Viergever, Antonio López
Peña, Nico Karssemeijer, Stephen Pizer, Dave Eberley, Jean-Philippe Thirion, Stephen
Zucker, Wim van de Grind, Ans Koenderink-van Doorn, Alex Frangi, Arjan Kuijper, Remco
Duits, Bram Platel, Frans Kanters, Taizo Iijima, Ted Adelson, Pietro Perona, Jitendra Malik,
David Mumford, Luis Alvarez, Markus van Almsick, William Freeman, Izumi Ohzawa,
Ralph Freeman, Russell DeValois, David Hubel, Torsten Wiesel, Semir Zeki, Erik Kandel,
Amiram Grinvald, Brian Wandell, Robert Rodieck and many more, as well as many of the 3D
Computer Vision / Image Sciences Institute master's students.
The careful proofreading and remarks by Michel Bister were very helpful. Markus van
Almsick gave many Mathematica hints, insights and valuable help with editing. I thank my
wife Hetty, for the patience and loving support during the endeavor to write this book.
This book originated from course notes of the annual graduate course on "Multiscale Image
Analysis and Front-End Vision" at the Image Sciences Institute of Utrecht University
1996-2000 and at the Faculty of Biomedical Engineering at Eindhoven University of
Technology in the Netherlands since 2001. This course has also been given at a
summer school at CIMAT in Guanajuato, Mexico, at IMPA in Rio de Janeiro, Brazil, at the
Multimedia University in Cyberjaya, Malaysia and as a tutorial at several international
conferences (VBC 98, CVPR 1999, MICCAI 2001). The author was kindly hosted for two
months at the new IT University at Copenhagen, Denmark in the summer of 2000 to get a lot
of the writing done. The highly competent and kind atmosphere of the collaborating labs
there considerably contributed to the pleasure it was to write and to program this interactive
book. I learned a lot by doing multi-scale computer vision, and I hope the reader will do the
same.

Eindhoven - Utrecht - Copenhagen, February 2003

Email: [email protected]
1. Apertures and the notion of scale
Nothing that is seen is perceived at once in its entirety.
Euclid (~300 B.C.), Theorem I

1.1 Observations and the size of apertures


Observations are always done by integrating some physical property with a measurement
device. Integration can be done over a spatial area, over an amount of time, over wavelengths
etc. depending on the task of the physical measurement. For example, we can integrate the
emitted or reflected light intensity of an object with a CCD (charge-coupled device) detector
element in a digital camera, or a grain in the photographic emulsion in a film, or a
photoreceptor in our eye. These ’devices’ have a sensitive area, where the light is collected.
This is the aperture for this measurement. Today’s digital cameras have several million
’pixels’ (picture elements), very small squares where the incoming light is integrated and
transformed into an electrical signal. The size of such pixels/apertures determines the
maximal sharpness of the resulting picture.

An example of integration over time is sampling of a temporal signal, for example with an
analog-digital converter (ADC). The integration time needed to measure a finite signal is the
size of the temporal aperture. We always need a finite integration area or a finite integration
time in order to measure a signal. It would be nice to have infinitely small or infinitely fast
detectors, but then the integrated signal is zero, making it useless.

Looking with our visual system is making measurements. When we look at something, we
have a range of possibilities to do so. We can look with our eyes, the most obvious choice.

We can zoom in with a microscope when things are too small for the unaided eye, or with a
telescope when things are just very big. The smallest distance we can see with the naked eye
is about 0.5 second of arc, which is about the distance between two neighboring cones in the
center of our visual field. And, of course, the largest object we can see fills the whole retina.

It seems that for the eye (and any other measurement device) the range of possibilities to
observe certain sizes of objects is bounded on two sides: there is a minimal size, about the
size of the smallest aperture, and there is a maximal size, about the size of the whole detector
array.

Spatial resolution is defined as the diameter of the local integration area. It is the size of the
field of view divided by the number of samples taken over it. The spatial resolution of a
Computer Tomography (CT) scanner is about 0.5 mm, which is calculated from the
measurement of 512 samples over a field of view with a diameter of 25 cm.

The temporal resolution of a modern CT scanner is about 0.5 second, which is 2 images per
second.
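The resolution arithmetic is worth making explicit; a quick check of the numbers quoted above, as a sketch in Python (values taken from the text, not from any scanner specification):

```python
# Spatial resolution = size of the field of view / number of samples.
field_of_view_mm = 250.0          # 25 cm diameter, CT example from the text
n_samples = 512
spatial_resolution_mm = field_of_view_mm / n_samples
print(round(spatial_resolution_mm, 2))   # -> 0.49, i.e. about 0.5 mm

# Temporal resolution: 2 images per second gives 0.5 s per image.
images_per_second = 2
temporal_resolution_s = 1.0 / images_per_second
print(temporal_resolution_s)             # -> 0.5
```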

It seems that we are always trying to measure with the highest possible sharpness, or highest
resolution. Reasons to accept lower resolution range from costs, computational efficiency,
storage and transmission requirements, to the radiation dose to a patient etc. We can always
reduce the resolution by taking together some pixels into one, but we cannot make a coarse
image into a sharp one without the introduction of extra knowledge.

The resulting measurement of course strongly depends on the size of the measurement
aperture. We need to develop strict criteria that determine objectively what aperture size to
apply. Even for a fixed aperture the results may vary, for example when we measure the
same object at different distances (see figure 1.1).

<< FrontEndVision`FEV`;


Show[GraphicsArray[
 {{Import["cloud1.gif"]}, {Import["cloud2.gif"]}}], ImageSize -> 400];

Figure 1.1 A cloud observed at different scales, simulated by the blurring of a random set of
points, the 'drops'. Adapted from [Koenderink1992a].

1.2 Mathematics, physics, and vision


In mathematics objects are allowed to have no size. We are familiar with the notion of
points, that really shrink to zero extent, and lines of zero width. No metrical units (like
meters, seconds, amperes) are involved in mathematics, as in physics.

Neighborhoods, as necessary in the definition of differential operators, are taken in the
limit to zero, so for such operators we can really speak of local operators. We recall the
definition of the derivative of f(x): lim_{h -> 0} (f(x + h) - f(x)) / h, where the limit makes
the operation confined to a mathematical point.
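Numerically, the limit can only be approached, never reached; a small sketch (in Python, not the book's Mathematica) showing the forward difference of f(x) = x² at x = 1 tending to the true derivative f'(1) = 2 as h shrinks:

```python
def f(x):
    return x * x

x0 = 1.0
for h in [1.0, 0.1, 0.01, 0.001]:
    forward_difference = (f(x0 + h) - f(x0)) / h
    print(h, forward_difference)
# The differences 3.0, 2.1, 2.01, 2.001 approach f'(1) = 2,
# but a physical aperture can never take h all the way to zero.
```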

In physics however this is impossible. We saw before that objects live on a bounded range of
scales. When we measure an object, or look at it, we use an instrument to do this observation

(our eye, a camera) and it is the range that this instrument can see that we call the scale
range. The scale range is bounded on two sides:
- the smallest scale the instrument can see is the inner scale. This is the smallest sampling
element, such as a CCD element in a digital camera, or a rod or cone on our retina;

- the largest scale the instrument can see is the outer scale. This is the field of view. The
dimension is expressed as the ratio between the outer scale and the inner scale, or how often
the inner scale fits into the outer scale. Of course the bounds apply both to the detector and
the measurement: an image can have a 2D dimension of 256 x 256 pixels.

Dimensional units are essential in physics: we express any measurement in dimensional
units, like: 12 meters, 14.7 seconds, 0.02 candela/m^2 etc. When we measure (observe,
sample) a physical property, we need to choose the 'stepsize' with which we should
investigate the measurement. We scrutinize a microscope image in microns, a global satellite
image in kilometers. In measurements there is no such thing as a physical 'point': the smallest
'point' we have is the physical sample, which is defined as the integrated weighted
measurement over the detector area (which we call the aperture), where the area is always finite.

How large should the sampling element be? It depends on the task at hand in what scale
range we should measure: "Do we like to see the leaves or the tree"? The range of scales
applies not only to the objects in the image, but also to the scale of the features. In chapter 5
we discuss in detail many such features, and how they can be constructed. We give just one
example here: in figure 1.2 we see a hierarchy in the range of scales, illustrated here for a
specific feature (the gradient).

im = Import["Utrecht256.gif"][[1, 1]];

Block[{$DisplayFunction = Identity},
 p1 = ListDensityPlot[im];
 p2 = ListDensityPlot[Sqrt[gD[im, 1, 0, #]^2 + gD[im, 0, 1, #]^2]] & /@ {1, 2, 4}];

Show[GraphicsArray[Prepend[p2, p1]], ImageSize -> 500];

Figure 1.2 Picture of the city of Utrecht. The right three pictures show the gradient: the
strength of borders, at a scale of 1, 2 resp. 4 pixels. At the finest scale we can see the
contours of almost every stone, at the coarsest scale we see the most important edges, in
terms of outlines of the larger structures. We see a hierarchy of structures at different scales.
The Mathematica code and the gradient will be explained in detail in later chapters.
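For readers who want to reproduce the idea of Figure 1.2 outside Mathematica, a rough Python/scipy equivalent of the gradient-at-scale computation (scipy's Gaussian derivative filter stands in for the book's gD operator, and a random test image stands in for the Utrecht photograph):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
im = rng.random((64, 64))            # stand-in for "Utrecht256.gif"

def gradient_magnitude(im, sigma):
    """Gaussian first derivatives in x and y, combined into the gradient strength."""
    gx = gaussian_filter(im, sigma, order=[0, 1])   # derivative along columns (x)
    gy = gaussian_filter(im, sigma, order=[1, 0])   # derivative along rows (y)
    return np.sqrt(gx ** 2 + gy ** 2)

# The hierarchy of scales from the figure: sigma = 1, 2 and 4 pixels.
grads = [gradient_magnitude(im, s) for s in (1, 2, 4)]
# Coarser apertures suppress fine structure, so the mean edge strength drops:
print([round(float(g.mean()), 4) for g in grads])
```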

To expand the range of, say, our eye we have a wide armamentarium of instruments
available, like scanning electron microscopes and the Hubble telescope. The scale range known to

humankind spans about 50 decades, as is beautifully illustrated in the book (and movie)
"Powers of Ten" [Morrison1985].

Show[Import["Powersof10sel.gif"], ImageSize -> 500];

Figure 1.3 Selection of pictures from the journey through scale from the book
[Morrison1985], where each page zooms in a factor of ten. Starting at a cosmic scale, with
clusters of galaxies, we zoom in to the solar system, the earth (see the selection above), to a
picnicking couple in a park in Chicago. Here we reach the 'human' (anthropometric) scales
which are so familiar to us. We then travel further into cellular and molecular structures in the
hand, ending up in the quark structure of the nuclear particles. For the movie see:
http://www.micro.magnet.fsu.edu/primer/java/scienceopticsu/powersof10/index.html.

In vision we have a system evolved to make visual observations of the outside world. The
front-end of the (human) visual system is defined as the very first few layers of the visual
system. Here a special representation of the incoming data is set up where subsequent
processing layers can start from. At this stage there is no memory involved or cognitive
process.

Later we will define the term ’front-end’ in a more precise way. We mean the retina, lateral
geniculate nucleus (LGN, a small nucleus in the thalamus in our mid-brain), and the primary
visual cortex in the back of our head. In the chapter on human vision we fully elaborate on
the visual pathway.

The front-end sampling apparatus (the receptors in the retina) is designed just to extract
multi-scale information. As we will see, it does so by applying sampling apertures, at a wide
range of sizes simultaneously.

There is no sampling by individual rods and cones, but by well-structured assemblies of rods
and cones, the so-called 'receptive fields'.

In chapters 6 - 9 we will study the neuroanatomy of the human front-end visual system in
more detail. The concept of a receptive field was introduced in the visual sciences by

Hartline [Hartline1940] in 1940, who studied single fibre recordings in the horseshoe crab
(Limulus polyphemus).

Psychophysically (psychophysics is the art of measuring the performance of our perceptual


abilities through perceptual tasks) it has been shown that when viewing sinusoidal gratings
of different spatial frequency the threshold modulation depth is constant (within 5%) over
more than two decades.

This indicates that the visual system is indeed equipped with a large range of sampling
apertures. Also, there is abundant electro-physiological evidence that the receptive fields
come in a wide range of sizes. In the optic nerve leaving each eye one optic-nerve-fibre
comes from one receptive field, not from an individual rod or cone.

In a human eye there are about 150 million receptors and one million optic nerve fibres. So a
typical receptive field consists of an average of 150 receptors. Receptive fields form the
elementary ’multi-scale apertures’ on the retina. In the chapter on human vision we will study
this neuroanatomy in more detail.

1.3 We blur by looking


Using a larger aperture reduces the resolution. Sometimes we exploit the blurring that is the
result of applying a larger aperture. A classical example is dithering, where the eye blurs the
little dots printed by a laser printer into a multilevel greyscale picture, dependent on the
density of the dots (see figure 1.4).

It nicely illustrates that we can make quite a few different observations of the same object (in
this case the universe), with measurement devices having different inner and outer scales. An
atlas, of course, is the canonical example.

Show[GraphicsArray[{Import["Floyd0.gif"], Import["Floydl.gif"]}],
ImageSize -> 330];

Figure 1.4 Dithering is the representation of grayvalues through sparse printing of black dots
on paper. In this way a tonal image can be produced with a laserprinter, which is only able to
print minuscule identical small high-contrast dots. Left: the image as we observe it, with
grayscales and no dithering. Right: Floyd-Steinberg dithering with random dot placements.
[From http://sevilleta.unm.edu/~bmilne/khoros/html-dip/c3/s7/front-page.html].
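The error-diffusion principle behind Figure 1.4 is simple to sketch; a minimal, unoptimized Floyd-Steinberg pass in Python (not the code used to produce the figure):

```python
import numpy as np

def floyd_steinberg(im):
    """Binarize a grayscale image (values in [0, 1]), diffusing each pixel's
    quantization error to its not-yet-visited neighbors."""
    im = im.astype(float).copy()
    h, w = im.shape
    for y in range(h):
        for x in range(w):
            old = im[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            im[y, x] = new
            err = old - new
            if x + 1 < w:
                im[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    im[y + 1, x - 1] += err * 3 / 16
                im[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    im[y + 1, x + 1] += err * 1 / 16
    return im

# A flat mid-gray patch dithers into black and white dots whose local
# density matches the gray level, just as the eye 'blurs' it back:
out = floyd_steinberg(np.full((32, 32), 0.5))
print(round(float(out.mean()), 2))   # mean stays close to 0.5
```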

A priori we have to decide on how large we should take the inner scale. The front-end vision
system has no knowledge whatsoever of what it is measuring, and should be open-minded
with respect to the size of the measurement aperture to apply.

Show[Import["wales-colordither.gif"], ImageSize -> 400];

Figure 1.5 An example of color-dithering in image compression. Left: the original image, 26
KByte. Middle: color dithering, effective spreading of a smaller number of color pixels so that
the blurring of our perception blends the colors to the same color as in the original. Filesize
16 Kbyte. Right: enlargement of a detail showing the dithering. From http://www.digital-
foundry.com/gif/workshop/dithering.shtml.

As we will see in the next section, the visual front-end measures at a multitude of aperture
sizes simultaneously. The reason for this is found in the world around us: objects come at all
sizes, and at this stage they are all equally important for the front-end.

Show[Import["Edlef Romeny - cherry trees.jpg"], ImageSize-> 280];

Figure 1.6 In art often perceptual clues are used, like only coarse scale representation of
image structures, and dithering. Painting by Edlef ter Haar Romeny [TerHaarRomeny2002b].

im = Import["mona lisa face.gif"][[1, 1]];

imr1 = Table[Plus @@ Plus @@ Take[im, {y, y + 9}, {x, x + 9}],
  {y, 1, 300, 10}, {x, 1, 200, 10}];
imr2 = Table[Plus @@ Plus @@ Take[im, {y, y + 14}, {x, x + 9}],
  {y, 1, 290, 15}, {x, 1, 200, 10}];
DisplayTogetherArray[ListDensityPlot /@ {im,
   Join @@ Table[MapThread[Join, Table[imr2 imr1[[y, x]], {x, 1, 20}]],
    {y, 1, 30}]}, ImageSize -> 250];

Figure 1.7 Image mosaic of the Mona Lisa. Image resolution 200x300 pixels. The image is
subsampled to 20x30 samples, whose mean intensity modulates a mosaic of the
subsampled images.
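The subsampling step of Figure 1.7 — mean intensity over 10×10 pixel tiles, turning 200×300 pixels into 20×30 samples — can be written in a few lines; a Python sketch with a random array standing in for the Mona Lisa image:

```python
import numpy as np

rng = np.random.default_rng(1)
im = rng.random((300, 200))          # 300 rows x 200 columns, as in the example

# Reshape into 10x10 tiles and average each tile: 30x20 block means.
block_means = im.reshape(30, 10, 20, 10).mean(axis=(1, 3))
print(block_means.shape)             # (30, 20): 30 rows of 20 samples

# Averaging over equal-size blocks preserves the overall mean intensity:
print(abs(float(block_means.mean() - im.mean())) < 1e-12)
```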

And that, in a natural way, leads us to the notion of multi-scale observation, and multi-scale
representation of information, which is intrinsically coupled to the fact that we can observe
in so many ways. The size of the aperture of the measurement will become an extra
continuous measurement dimension, as is space, time, color etc. We use it as a free
parameter: in first instance we don’t give it a value, it can take any value.

Task 1.1 Experiment with dithering with circular disks of proper size in each
pixel. Calculate the area the disk occupies. Some example code to get started:
Show[Graphics[Table[Disk[{x, y}, .3 + im[[y, x]] / 2048], {y, 1, 128},
 {x, 1, 128}]], AspectRatio -> Automatic];

Task 1.2 Experiment with dithering with randomly placed small dots in each
pixel.

Mosaics, known since Roman times, employ this multiresolution perceptive effect. There is
also artistic interest in replacing a pixel by a complete image (see e.g. figure 1.7). When
random images with appropriate average intensity and color (and often intensity gradient) are
chosen the technique is called an image mosaic.

Task 1.3 One can play with other graphical elements, e.g. text (BasicBlock :>
(Text["FEV", #1, #2] &)) etc. Note that despite the structure in the dithering
elements, we still perceive the large scale structure unchanged in depth.

It turns out that there is a very specific reason to not only look at the highest resolution. As
we will see in this book, a new world opens when we consider a measurement of the outside
world at all these sizes simultaneously, at a whole range of sharpnesses. So, not only the
smallest possible pixel element in our camera, but a camera with very small ones, somewhat

larger ones, still larger ones and so on. It turns out that our visual system takes this approach.
The stack of images taken at a range of resolutions is called a scale-space.

Another interesting application of dithering is in the generation of random dot stereograms


(RDS), see figure 1.8.

Options[RDSPlot] = {BasicBlock :> (Rectangle[#1 - #2, #1 + #2] &)};

RDSPlot[expr_, {x_, xmin_, xmax_}, {y_, ymin_, ymax_}, opts___] :=
 Block[{pts = 120, periods = 6, zrange = {-1, 1}, density = .4, depth = 1,
   basicblock = BasicBlock /. {opts} /. Options[RDSPlot], guides = True,
   strip, xpts, ypts, dx, dy, xval, yval, zmin, zmax, exprnorm},
  {zmin, zmax} = zrange; {xpts, ypts} = If[Length[pts] == 2, pts, {pts, pts}];
  dy = (ymax - ymin) / ypts; dx = (xmax - xmin) / xpts; strip = Floor[xpts / periods] dx;
  exprnorm = (.25 depth (xmax - xmin) / (periods (zmax - zmin))) *
    (Max[zmin, Min[zmax, expr]] - (zmax + zmin) / 2);
  Graphics[{RDSArray[basicblock, {dx, dy} / 2, Flatten[Table[If[Random[] < density,
        Thread[{rdsimages[exprnorm /. y -> yval, {x, xval, xmax, strip}], yval}], {}],
       {yval, ymin + .5 dy, ymax, dy}, {xval, xmin + .5 dx,
        rdsimage[exprnorm /. y -> yval, {x, xmin, strip}], dx}], 2]],
    If[guides, makeguides[{.5 xmax + .5 xmin, 1.1 ymin - .1 ymax}, .5 strip], {}]},
   Sequence @@ Select[{opts}, ! MemberQ[First /@ Options[RDSPlot], First[#]] &]]];

rdsimage[expr_, {x_, xval_, dx_}] := xval + dx - N[expr /. x -> xval + dx / 2];

rdsimages[expr_, {x_, xval_, xmax_, dx_}] := If[xval <= xmax,
   Prepend[rdsimages[expr, {x, rdsimage[expr, {x, xval, dx}], xmax, dx}], xval], {}];

makeguides[pos_, size_] := Apply[Rectangle,
   Map[pos + size # &, {{{-1.1, -.1}, {-.9, .1}}, {{.9, -.1}, {1.1, .1}}}, {2}], 1];

Unprotect[Display]; Display[channel_, graphics_ ? (! FreeQ[#, RDSArray] &)] := (Display[
    channel, graphics /. (RDSArray[basicblock_, dims_, pts_] :>
       (basicblock[#, dims] & /@ pts))]; graphics); Protect[Display];

Show[RDSPlot[-Sqrt[2] x Exp[-(x^2 + y^2) / 2], {x, -3, 3}, {y, -3, 3}],
 ImageSize -> 400];

Figure 1.8 Random dot stereogram (of the first derivative with respect to x of the Gaussian
function, a function which we will encounter frequently in this book). The dots are replaced by
a random draw from the letters A-Z.
Code by Bar-Natan [Bar-Natan1991, www.ma.huji.ac.il/~drorbn/].
See also www.ccc.nottingham.ac.uk/~etzpc/sirds.html. Look with both eyes to a point behind
the image, so the dots under the figure blend together. You will then see the function in
depth.

See also the peculiar paintings of the Italian painter Giuseppe Arcimboldo (1527-1593). See
www.illumin.co.uk/svank/biog/arcim/arcidx.html.

Show[Import["Vertumnus.jpg"], ImageSize -> 170];

Figure 1.9 Vertumnus (Rudolph II) by Giuseppe Arcimboldo (ca. 1590). Painting in the
Skoklosters Slott, Stockholm, Sweden.

1.4 A critical view on observations

Let us take a close look at the process of observation. We note the following:

• Any physical observation is done through an aperture. By necessity this aperture has to be
finite. If it were zero size no photon would come through. We can modify the
aperture considerably by using instruments, but never make it of zero width. This leads to
the fundamental statement: We cannot measure at infinite resolution. We can only
perceive a 'blurred' version of the mathematical abstraction (infinite resolution) of the
outside world.

• In a first 'virginal' measurement like on the retina we like to carry out observations that
are uncommitted. With uncommitted we mean: not biased in any way, and with no model
or any a priori knowledge involved. Later we will fully incorporate the notion of a model,
but in this first stage of observation we know nothing.

An example: when we know we want to observe vertical structures such as stems of trees, it
might be advantageous to take a vertically elongated aperture. But in this early stage we
cannot allow such special apertures.

At this stage the system needs to be general. We will exploit this notion of being uncommitted in the sequel of this chapter in the establishment of linear scale-space theory.

It turns out that we can express this 'uncommitment' in axioms from which a physical theory can be derived. Extensions of the theory, like nonlinear scale-space theories, follow in a natural way through relaxing these axioms.

• Being uncommitted is a natural requirement for the first stage, but not for further stages, where extracted information, knowledge of model and/or task etc. come in. An example: the introduction of feedback enables a multi-scale analysis where the aperture can be made adaptive to properties measured from the data (such as the strength of certain derivatives of the data). This is the field of geometry-driven diffusion, a nonlinear scale-space theory. This will be discussed in more detail after the treatment of linear scale-space theory.

Show[Import["DottedPR.gif"], ImageSize -> 380];


Figure 1.10 At different resolutions we see different information. The meaningful information in this image is at a larger scale than the dots of which it is made. Look at the image from about 2 meters. Source: dr. Bob Duin, Pattern Recognition Group, Delft University, the Netherlands.

• A single, constant-size aperture function may be sufficient in a controlled physical application. An example is a picture taken with a camera or a medical tomographic scanner, with the purpose to replicate the pixels on a screen, paper or film without the need for cognitive tasks like recognition. Note that most man-made devices have a single aperture size. If we need images at multiple resolutions we simply blur the images after the measurement.

• The human visual system measures at multiple resolutions simultaneously, thus effectively adding scale or resolution as a measurement dimension. It measures a scale-space L(x, y; σ), a function of space (x, y) and scale σ, where L denotes the measured parameter (in this case luminance) and σ the size of the aperture. In a most general observation no a priori size is set; we just don't know what aperture size to take. So, in some way, control is needed: we could apply a whole range of aperture sizes if we have no preference or clue what size to take.

• When we observe noisy images we should realize that noise is always part of the observation. The term 'noisy image' already implies that we have some idea of an image with structure 'corrupted with noise'. In a measurement, noise can only be separated from the observation if we have a model of the structures in the image, a model of the noise, or a model of both. Very often this is not considered explicitly.

im = Table[If[11 < x < 30 && 11 < y < 30, 1, 0] + 2 Random[], {x, 40}, {y, 40}];
ListDensityPlot[im, FrameTicks -> False, ImageSize -> 120];

Figure 1.11 A square with additive uniform pixel-uncorrelated noise. Jagged or straight
contours? 'We think it is' or 'it looks like' a square embedded in the noise. Without a model
one really cannot tell.

• When it is given that objects are human-made structures like buildings, or otherwise part of computer vision's 'blocks world', we may assume straight or smoothly curved contours, but often this is not known.

• Things often go wrong when we change the resolution of an image, for example by creating larger pixels.

• If the apertures (the pixels) are square, as they usually are, we start to see blocky tessellation artefacts. In his famous paper "The structure of images" Koenderink coined this spurious resolution [Koenderink1984a]: the emergence of details that were not there before, and should not be there. The sharp boundaries and right angles are artefacts of the representation; they certainly are not in the outside world data. Somehow we have created structure in such a process. Nearest neighbour interpolation (the name for pixel replication) is the fastest of all interpolation methods, but the worst. As a general rule, we want the structure only to decrease with increasing aperture.

Show[Import["Einsteinblocky.gif"], ImageSize -> 120];

Figure 1.12 Spurious resolution due to square apertures. Detail of a famous face: Einstein. Much unintended 'spurious' information has been added to this picture due to the sampling process. Intuitively we take countermeasures against such artefacts by squeezing our eyes and looking through our eyelashes to blur the image, or by looking from a greater distance.
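Spurious resolution under pixel replication can be made concrete with a small sketch (pure Python; the `jumps` structure measure is our own crude stand-in, not a measure from the book): a perfectly smooth intensity ramp, zoomed by nearest-neighbour replication, acquires plateaus and jumps — gradient values that were not in the data — while linear interpolation adds none.

```python
def nearest_neighbour_zoom(pixels, factor):
    """Pixel replication: every sample is copied 'factor' times."""
    return [v for v in pixels for _ in range(factor)]

def linear_zoom(pixels, factor):
    """Linear interpolation between neighbouring samples."""
    out = []
    for a, b in zip(pixels, pixels[1:]):
        out.extend(a + (b - a) * i / factor for i in range(factor))
    out.append(pixels[-1])
    return out

def jumps(signal):
    """Number of distinct gradient values: a crude measure of 'structure'."""
    return len({round(b - a, 6) for a, b in zip(signal, signal[1:])})

ramp = [0.0, 1.0, 2.0, 3.0]                # one constant gradient: minimal structure
blocky = nearest_neighbour_zoom(ramp, 4)   # plateaus plus jumps: spurious structure
smooth = linear_zoom(ramp, 4)              # still a single gradient value
```

Replication turns the single gradient of the ramp into an alternation of flat plateaus and unit steps: structure has increased, violating the rule that it should only decrease.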

• In the construction of fonts and graphics anti-aliasing is well known: one obtains a much better perceptual delineation of the contour if the filling of the pixel is equivalent to the physical integration of the intensity over the area of the detector. See figure 1.13 for a font example.

Show[Import["anti_alias.gif"], ImageSize -> 250];

Figure 1.13 Anti-aliasing is the partial volume effect at the boundaries of contours. When
making physically realistic test images for computer vision applications it is essential to take
this sampling effect into account.
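The 'partial volume' integration over the detector area can be imitated by supersampling: estimate each pixel's gray value as the fraction of its area covered by the shape. This is a sketch under our own conventions (unit square pixels, an arbitrary slanted half-plane as the 'ink' region), not code from the book:

```python
def coverage(px, py, inside, n=16):
    """Fraction of the unit pixel [px,px+1) x [py,py+1) covered by a shape,
    estimated by supersampling the detector area on an n x n grid."""
    hits = sum(1 for i in range(n) for j in range(n)
               if inside(px + (i + 0.5) / n, py + (j + 0.5) / n))
    return hits / (n * n)

# A hypothetical slanted edge: everything with x < 0.3 y + 2.2 is 'ink'.
edge = lambda x, y: x < 0.3 * y + 2.2

row = [coverage(px, 0, edge) for px in range(5)]
# Interior pixels give 1.0, exterior pixels 0.0, and the boundary pixel an
# intermediate gray value equal to its covered area: the anti-aliased contour.
```

Physically realistic test images for computer vision are rendered exactly this way: every boundary pixel carries the integrated, not the point-sampled, intensity.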

1.5 Summary of this chapter


Observations are necessarily done through a finite aperture. Making this aperture infinitesimally small is not a physical reality. The size of the aperture determines a hierarchy of structures, which occur naturally in (natural) images. With the help of instruments (telescopes, microscopes) we are able to see a scale range of roughly 10^50. The visual system exploits a wide range of such observation apertures in the front-end simultaneously, in order to capture the information at all scales. Dithering is a method where the blending/blurring through an observation with a finite aperture is exploited to create grayscale and color nuances with a much smaller palette of colors.

Observed noise is part of the observation. There is no way to separate the noise from the data
if a model of the data, a model of the noise or a model of both is absent. Without a model
noise is considered input which also contains structural geometric information, like edges,
corners, etc. at all scales.

The aperture cannot take just any form. An example of a wrong aperture is the square pixel, so often used when zooming in on images. Such a representation gives rise to edges that were never present in the original image. This artificial extra information is called 'spurious resolution'. In the next chapter we derive from first principles the best and unique kernel for an uncommitted observation.
2. Foundations of scale-space
"There are many paths to the top of the mountain,
but the view is always the same" - Chinese proverb.

2.1 Constraints for an uncommitted front-end

To compute any type of representation from the image data, information must be extracted using certain operators interacting with the data. Basic questions then are: which operators to apply? Where to apply them? What should they look like? How large should they be?

Suppose such an operator is the derivative operator. This is a difference operator, comparing two neighboring values at a small distance from each other. In mathematics this distance can indeed become infinitesimally small by taking the limit of the separation distance to zero, but in physics the sampling distance is the smallest distance possible. Therefore we may foresee serious problems when we deal with notions such as mathematical differentiation on discrete data (especially of high order), and sub-pixel accuracy.
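The warning can be made quantitative: each finite-difference differentiation multiplies measurement noise by roughly 1/h, with h the sampling distance. A pure-Python illustration (the setup — sin(x) sampled with 1% Gaussian noise — is our own, not the book's):

```python
import math, random

random.seed(0)
n = 256
h = 2 * math.pi / n                      # the sampling distance
xs = [i * h for i in range(n)]
noisy = [math.sin(x) + random.gauss(0, 0.01) for x in xs]   # 1% noise

def central_diff(f, h):
    """First-order central differences on a periodic signal."""
    n = len(f)
    return [(f[(i + 1) % n] - f[(i - 1) % n]) / (2 * h) for i in range(n)]

d1 = central_diff(noisy, h)              # estimate of cos(x)
d2 = central_diff(d1, h)                 # estimate of -sin(x)

def rms_err(est, true_fn):
    return math.sqrt(sum((e - true_fn(x)) ** 2 for e, x in zip(est, xs)) / len(est))

err1 = rms_err(d1, math.cos)
err2 = rms_err(d2, lambda x: -math.sin(x))
# err2 >> err1 >> 0.01: every differentiation amplifies the 1% input noise
# by roughly a factor 1/h, so higher orders quickly drown in noise.
```

This is exactly why the operator's aperture (and not a naive difference quotient) will carry the differentiation in scale-space theory.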

From this moment on we consider the aperture function as an operator: we will search for constraints to pin down the exact specification of this operator. We will find an important result: for an uncommitted front-end there is a unique solution for the operator, the Gaussian kernel g(x; σ) = 1/(√(2π) σ) e^(-x²/(2σ²)), with σ the width of the kernel. It is the same bell-shaped kernel we know from probability theory as the probability density function of the normal distribution, where σ is the standard deviation of the distribution.

Interestingly, there have been many derivations of the front-end kernel, all leading to the
unique Gaussian kernel.

This approach was pioneered by Iijima (figure 2.2) in Japan in the sixties [Iijima1962], but went unnoticed for decades because the work was in Japanese and therefore inaccessible to Western researchers.

Independently, Koenderink in the Netherlands developed in the early eighties a rather complete multi-scale theory [Koenderink1984a], including the derivation of the Gaussian kernel and the linear diffusion equation.

<< FrontEndVision`FEV`;

σ = 1; Plot[1/(Sqrt[2 Pi] σ) Exp[-x^2/(2 σ^2)], {x, -4, 4}, ImageSize -> 200];

Figure 2.1 The Gaussian kernel with unit standard deviation in 1D.

Koenderink was the first to point out the important relation to the receptive field families in the visual system, as we will discuss in forthcoming chapters. Koenderink's work turned out to be monumental for the development of scale-space theory. Lindeberg pioneered the field with a tutorial book [Lindeberg1994a]. The papers by Weickert, Ishikawa and Imiya (who together discovered this Japanese connection) present a very nice review of these early developments [Weickert1997a, Weickert1999a].

Show[Import["Iijima.gif"], ImageSize -> 150];

Fig. 2.2 Prof. Taizo Iijima, emeritus professor of Tokyo Technical University, Japan, was the first to publish the axiomatic derivation of 'the fundamental equation of figure'.

We will select and discuss two fundamentally different example approaches to come to the
Gaussian kernel in this book:

1. An axiomatic approach based on dimensional analysis and the notion of having ’no
preferences’ (section 2.2);
2. An approach based on the maximization of local entropy in the observation (section 2.5);

2.2 Axioms of a visual front-end

The line of reasoning presented here is due to Florack et al. [Florack1992a]. The
requirements can be stated as axioms, or postulates for an uncommitted visual front-end. In
essence it is the mathematical formulation for being uncommitted: "we know nothing", or
"we have no preference whatsoever".

• linearity: we do not allow any nonlinearities at this stage, because they involve knowledge of some kind. So: no knowledge, no model, no memory;

• spatial shift invariance: no preferred location. Any location should be measured in the same fashion, with the same aperture function;

• isotropy: no preferred orientation. Structures with a particular orientation, like vertical trees or a horizontal horizon, should have no preference; any orientation is just as likely. This necessitates an aperture function with a circular integration area.

• scale invariance: no preferred size, or scale of the aperture. Any size of structure, object, texture etc. to be measured is at this stage just as likely. We have no reason to look only through the finest of apertures. The visual world consists of structures at any size, and they should be measured at any size.

In order to use these constraints in a theory that sets up the reasoning to come to the aperture
profile formula, we need to introduce the concept of dimensional analysis.

2.2.1 Dimensional analysis

Every physical unit has a physical dimension.

It is this that mostly discriminates physics from mathematics. It was Baron Jean-Baptiste
Fourier who already in 1822 established the concept of dimensional analysis [Fourier1955].
This is indeed the same mathematician so famous for his Fourier transformation.

Show[Import["Fourier.jpg"], ImageSize -> 140];

Figure 2.3 Jean-Baptiste Fourier, 1768-1830.



Fourier described the concept of dimensional analysis in his memorable work entitled "Théorie analytique de la chaleur" [Fourier1955] as follows: "It should be noted that each physical quantity, known or unknown, possesses a dimension proper to itself and that the terms in an equation cannot be compared one with another unless they possess the same dimensional exponent".

When a physicist inspects a new formula, he invariably first checks whether the dimensions are correct. It is, for example, impossible to add meters to meters/second. One of the most fundamental laws in physics is that physical laws should be rock solid, independent of the chosen description, anywhere and anytime. This is the law of scale invariance, which indicates that we have full freedom of reparametrization:

[Law of Scale Invariance] Physical laws must be independent of the choice of fundamental parameters.

’Scale invariance’ here refers to the notion of scale with respect to dimensional units
(remember the microns, kilometers or milliseconds as the aperture size of the measurement
instrument).

In essence the law of scale invariance states that the left and right part of the equation of a
physical equation should have the same dimensional units, and they should describe the same
process, whether expressed in Cartesian or polar coordinates.

Core in dimensional analysis is that when the dimensions in a complex physical system are considered, only a limited number of dimensionless combinations can be made: the basis or null-space of the system. It is an elegant and powerful tool to find out basic relations between physical entities, or even to solve a problem. It is often a method of first choice when no other information is available. It is often quite remarkable how much one can deduce by just using this technique. We will use dimensional analysis to establish the expression defining the basic linear isotropic scale-space kernel. First some examples which illustrate the idea.

2.2.2 The cooking of a turkey

Show[Import["Turkey.gif"], ImageSize -> 150];

This example is taken from the delightful paper by Geoffrey West [West1988]. When
cooking a turkey, or a goose, there is the problem of knowing how long to cook the bird in
the oven, given the considerable variation that can exist in its weight and size.

Many (inferior) cookbooks simply specify something like '20 minutes per pound', implying a linear relation. There are superior cookbooks, however, such as the 'Better Homes and Gardens Cookbook', that recognize the nonlinear nature of this relationship. In figure 2.4 we have adapted the graph from this cookbook, showing a log-log plot of the cooking time as a function of the weight of the turkey. The slope of the linear relation is about 0.6. It turns out that we can predict this relation just by dimensional analysis.

data = {{5, 3}, {7, 3.8}, {10, 4.5}, {14, 5}, {20, 6}, {24, 7}};
LogLogListPlot[data, PlotJoined -> False,
  Ticks -> {{5, 10, 15, 20}, {3, 5, 7}}, Frame -> True,
  FrameLabel -> {"Cooking time (hrs)", "Weight (lb)"},
  PlotStyle -> PointSize[0.02], ImageSize -> 220];


Figure 2.4 Turkey cooking time as a (nonlinear) function of turkey weight. The slope of the log-log plot is about 0.6. (Based on a table in Better Homes and Gardens Cookbook, Des Moines: Meredith Corp., 1962, p. 272).

Let us list the physical quantities that are involved:

- the temperature distribution inside the turkey T, in degrees Kelvin,
- the oven temperature T0 (both measured relative to the outside air temperature), in degrees Kelvin,
- the bird density ρ, in kg/m³,
- the diffusion coefficient of the turkey κ from Fourier's diffusion equation for T: ∂T/∂t = κ ΔT, where Δ is the Laplacian operator ∂²/∂x² + ∂²/∂y² + ∂²/∂z², in m²/sec,
- the weight of the turkey W, in kg,
- and the time t, in seconds.

In general, for the dimensional quantities in this problem, there will be a relationship of the form T = f(T0, W, t, ρ, κ). We can make a matrix of the units and their dimensions:

m = {{0, 0, 0, 0, -3, 2},
     {0, 0, 0, 1, 0, -1}, {0, 0, 1, 0, 1, 0}, {1, 1, 0, 0, 0, 0}};
TableForm[m, TableHeadings -> {{"length", "time", "mass", "degree"},
    {"T0", "T", "W", "t", "ρ", "κ"}}]

         T0   T   W   t   ρ   κ
length    0   0   0   0  -3   2
time      0   0   0   1   0  -1
mass      0   0   1   0   1   0
degree    1   1   0   0   0   0

A matrix can be described with a basis spanned by basis vectors, whose linear combinations satisfy the matrix equation m.x == 0. The command NullSpace gives us the list of basis vectors:

NullSpace[m]

{{0, 0, -2, 3, 2, 3}, {-1, 1, 0, 0, 0, 0}}

The (famous) Pi-theorem in dimensional analysis by Buckingham (see http://www.treasure-troves.com/physics/BuckinghamsPiTheorem.html) states that one can make as many independent dimensionless combinations of the physical variables in the system under study as there are basis vectors of the nullspace of the dimension matrix m. These are determined by the nullspace.

So, for the turkey problem we can only construct two independent dimensionless quantities (just fill in the exponents given by the basis vectors): ρ² t³ κ³ / W² and T / T0.

So, from the nullspace vector {-1, 1, 0, 0, 0, 0} we found T / T0, and from {0, 0, -2, 3, 2, 3} we found ρ² t³ κ³ / W². Because both these quantities are dimensionless, one must be expressible in the other, giving the relationship T / T0 = f(ρ² t³ κ³ / W²). Note that since the lefthand side is dimensionless, the arbitrary function f must be a dimensionless function of a dimensionless variable. This equation does not depend on the choice of units, since dimensionless units remain invariant to changes in scale: the scale invariance.

The graph in the cookbook can now be understood: when geometrically similar birds are considered, cooked to the same temperature distribution at the same oven temperature, there will be the following scaling law: ρ² t³ κ³ / W² = constant. If the birds have the same physical characteristics, which means the same ρ and κ, we find that t³ ∝ W², so the cooking time t is proportional to W^(2/3), which nicely explains the slope.
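The nullspace bookkeeping of the Pi-theorem is easy to reproduce outside Mathematica. The sketch below (pure Python with exact rational arithmetic; the helper `nullspace` is our own, not a library routine) computes a nullspace basis of the turkey dimension matrix. Since a nullspace basis is not unique, rather than comparing with Mathematica's output verbatim we check that the nullspace is two-dimensional and that the book's two exponent vectors are annihilated by the matrix.

```python
from fractions import Fraction

def nullspace(m):
    """Nullspace basis of an integer matrix via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in m]
    ncols = len(rows[0])
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]       # normalize pivot row
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:                # eliminate the column
                rows[i] = [a - rows[i][c] * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for pr, pc in enumerate(pivots):
            v[pc] = -rows[pr][fc]
        basis.append(v)
    return basis

# Turkey dimension matrix: columns T0, T, W, t, rho, kappa;
# rows length, time, mass, degree.
m = [[0, 0, 0, 0, -3, 2],
     [0, 0, 0, 1, 0, -1],
     [0, 0, 1, 0, 1, 0],
     [1, 1, 0, 0, 0, 0]]
basis = nullspace(m)   # two independent dimensionless groups
```

Each basis vector is a list of exponents over {T0, T, W, t, ρ, κ}; up to scaling they reproduce T/T0 and ρ² t³ κ³ / W².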

2.2.3 Reynolds number

From [Olver1993] we take the example of the Reynolds number. We study the motion of an object in some fluid.

As physical parameters we have the resistance D of the object (in kg/(m·s²)), the fluid density ρ (in kg/m³), the velocity relative to the fluid v (in m/s), the object diameter d (in m) and the fluid viscosity μ (in kg/(m·s)). The dimension matrix then becomes:

m = {{-3, 1, 1, -1, -1}, {0, -1, 0, -1, -2}, {1, 0, 0, 1, 1}};
TableForm[m,
  TableHeadings -> {{"meter", "second", "kg"}, {"ρ", "v", "d", "μ", "D"}}]

         ρ    v   d   μ   D
meter   -3    1   1  -1  -1
second   0   -1   0  -1  -2
kg       1    0   0   1   1

We calculate the nullspace:

NullSpace[m]

{{-1, -2, 0, 0, 1}, {-1, -1, -1, 1, 0}}



From the nullspace we easily find the famous Reynolds number: R = ρ v d / μ. The other dimensionless entity, D / (ρ v²), is the friction factor.

2.2.4 Rowing: more oarsmen, higher speed?


Show[Import["Rowerswanted.gif"], ImageSize -> 210];

Another illuminating example is the problem of the number of oarsmen in a competition rowing boat: do 8 oarsmen need less time to row a certain distance, say 2000 meter, than a single skiffer, despite the fact that the water displacement is so much bigger? Let's study the physics again: we first find the relation for the drag force F on a ship with length l moving with velocity v through a viscous fluid with viscosity μ and density ρ.

The final term to take into account in this physical setup is the gravity g. Again we can make a dimensional matrix for the six variables involved:

m = {{1, 1, 1, -1, -3, 1}, {-2, 0, -1, -1, 0, -2}, {1, 0, 0, 1, 1, 0}};
TableForm[m, TableHeadings ->
   {{"meter", "second", "kg"}, {"F", "l", "v", "μ", "ρ", "g"}}]

         F    l    v   μ    ρ   g
meter    1    1    1  -1   -3   1
second  -2    0   -1  -1    0  -2
kg       1    0    0   1    1   0

Figure 2.5 Dimensional matrix for the physics of drag of an object through water. F is the drag force, l resp. v are the length resp. the velocity of the ship, μ is the viscosity of the water.

and study the nullspace:

NullSpace[m]

{{0, 1, -2, 0, 0, 1}, {-1, 2, 2, 0, 1, 0}, {-1, 1, 1, 1, 0, 0}}

rowdata1 = {{1, 6.88}, {2, 6.80}, {4, 6.60}, {8, 5.80}};
rowdata2 = {{1, 7.01}, {2, 6.85}, {4, 6.50}, {8, 5.85}};
rowdata3 = {{1, 7.04}, {2, 6.85}, {4, 6.40}, {8, 5.95}};
rowdata4 = {{1, 7.10}, {2, 6.95}, {4, 6.50}, {8, 5.90}};

MultipleListPlot[rowdata1, rowdata2,
  rowdata3, rowdata4, Ticks -> {{1, 2, 4, 8}, Automatic},
  AxesLabel -> {"# of\noarsmen", "Time for\n2000 m (min)"},
  PlotJoined -> True, PlotRange -> {Automatic, {5, 8}}];


Figure 2.5. The results of best US regatta rowing (2000 m) of Summer 2000 for different
numbers of oarsmen. The slope of the graph is about -1/9. Source: http://rowingresults.com/

The dimensionless units are: v² / (l g) (Froude's number), F / (ρ v² l²) (the pressure coefficient) and F / (l v μ) (the Poiseuille coefficient). So we have F / (ρ v² l²) = f(v² / (l g), F / (l v μ)), or F ≈ ρ v² l² f, where f is a dimensionless number. The power E produced by the n oarsmen together to overcome the drag force F is given by F v. Thus E = F v = ρ v³ l² f ∝ n, because E is directly proportional to n.

The weight W of a ship is proportional to the volume of displaced water (Archimedes' law), so W ∝ l³. This implies that, at a given speed, F / W ∝ 1 / l, which means that larger ships have advantages, because, for similar bodies, the ratio F / W decreases as the size of the ship increases. We know ρ = 1 for water and W ∝ l³ (Archimedes again), and W ∝ n in good approximation, so we find v³ ∝ n^(1/3), and thus v ∝ n^(1/9). So eight oarsmen indeed go faster, though only a little, than fewer oarsmen.

There are several nice references to the technique of dimensional analysis [West1988, Pankhurst1964a, Olver1993], often with quite amusing examples, some of which were presented in this section.

Archimedes ’ Number FroudeNumber Monin - ObukhovLength


BinghamNumber Grashof Number Nusselt Number
Biot Number Internal Froude Number P4clet Number
Boussinesq Number Mach Number Prandtl Number
CriticalRayleighNumber Magnetic Reynolds Number RayleighNumber
Ekman Number Mason Number Richardson Number
Fresnel Number Moment of Inertia Ratio TimescaleNumber

Figure 2.6. A list of famous dimensional numbers. From Eric Weisstein's World of Physics.
URL: http://sciencew•r•d•w••fram•c•m/physics/t•pics/UnitsandDimensi•na•Ana•ysis•htm••

Many scaling laws exist. In biology scaling laws have become a powerful technique to find
surprising relations (see for a delightful easy-to-read overview the book by McMahon and
Bonner [McMahon1983] and the classical book by Thompson [Thompson1942]).

2.3 Axiomatic derivation of the Gaussian kernel


The dimensional analysis discussed in the previous section will now be used to derive the
Gaussian kernel as the unique solution for the aperture function of the uncommitted visual
front-end.

We do the reasoning in the Fourier domain, as this turns out to be easier and leads to smaller
equations. We give the theory for 2D. We will see that expansion to other dimensionalities is
straightforward. We use scripted symbols for variables in the Fourier domain. We consider
’looking through an aperture’. The matrix m and the nullspace become:

m = {{1, -1, -2, -2}, {0, 0, 1, 1}};
TableForm[m,
  TableHeadings -> {{"meter", "candela"}, {"σ", "ω", "L0", "L"}}]

          σ    ω   L0   L
meter     1   -1   -2  -2
candela   0    0    1   1

Figure 2.8 Dimensional matrix for the physics of observation through an aperture.

NullSpace[m]

{{0, 0, -1, 1}, {1, 1, 0, 0}}

where σ is the size of the aperture, ω the spatial coordinate (frequency in the Fourier domain), ℒ0 the luminance of the outside world (in candela per square meter: cd/m²), and ℒ the luminance as processed in our system. The two dimensionless units ℒ/ℒ0 and σω can be expressed into each other: ℒ/ℒ0 = 𝒢(σω), where 𝒢 is the kernel (filter, aperture) function in the Fourier domain to be found (the Gaussian kernel we are after). We now plug in our constraints, one by one.

No preference for location, together with the prerequisite of linearity, leads to the recognition of the process as a convolution. The aperture function is shifted over the whole image domain, with no preference for location: any location is measured (or filtered, observed) with the same aperture function (kernel, template, filter, receptive field: all the same thing). This is written for the spatial domain as:

L(x, y) = L0(x, y) ⊗ G(x, y) ≡ ∫_{-∞}^{∞} ∫_{-∞}^{∞} L0(u, v) G(x - u, y - v) du dv

In the Fourier domain, a convolution of functions translates into a regular product of the Fourier transforms of the functions: ℒ(ωx, ωy) = ℒ0(ωx, ωy) · 𝒢(ωx, ωy).

The axiom of isotropy translates into the fact that we now only have to consider the length ‖ω‖ of our spatial frequency vector ω = {ωx, ωy}: ω = ‖ω‖ = √(ωx² + ωy²). This is a scalar.

The axiom of scale invariance is the core of the reasoning: when we observe (or blur) an observed image again, we get an image which is blurred with the same but wider kernel: 𝒢(ω σ1) 𝒢(ω σ2) = 𝒢(ω σ1 + ω σ2). Only the exponential function is a general solution of this equation: 𝒢(ω σ) = exp((α ω σ)^p), where α and p are some arbitrary constants. We must raise the argument here to some power p because we are dealing with the dimensionless parameter ω σ. In general, we don't know α or p, so we apply the following constraint: isotropy.

The dimensions are independent, thus separable: ω σ = (ω1 σ) ê1 + (ω2 σ) ê2 + ⋯, where the êi are the basis unit coordinate vectors. Recall that the vector ω = ωx êx + ωy êy + ωz êz in the Fourier domain is the set of spatial frequencies in the spatial dimensions. The magnitude of ‖ω σ‖ is calculated by means of Pythagoras from the projections along the êi, so we add the squares: p = 2. We further demand the solution to be real, so α² must be real. We notice that when we open the aperture fully, we blur everything out, so lim_{σ→∞} 𝒢(ω σ) → 0. This means that α² must be negative. We choose α² = -1/2. As we will see, this (arbitrary) choice gives us a concise notation of the diffusion equation. So we get: 𝒢(ω, σ) = exp(-½ σ² ω²). We go to the spatial domain with the inverse Fourier transform:

Clear[σ];
g[x_, σ_] = Simplify[InverseFourierTransform[Exp[-(σ^2 ω^2)/2], ω, x], σ > 0]

E^(-(x^2/(2 σ^2)))/σ

The last notion to use is that we want a normalized kernel. The integrated area under the curve must be unity:

Simplify[Integrate[E^(-(x^2/(2 σ^2)))/σ, {x, -∞, ∞}], σ > 0]

Sqrt[2 π]

We divide by this factor, so we finally find for the kernel: G(x, σ) = 1/(√(2π) σ) exp(-x²/(2σ²)). This is the Gaussian kernel, which is the Green's function of the linear, isotropic diffusion equation ∂²L/∂x² + ∂²L/∂y² = Lxx + Lyy = ∂L/∂t, where t = σ²/2 is the variance.

Note that the 'derivative to scale' in the diffusion equation (as it is typically called) is the derivative to t = σ²/2, which also immediately follows from a consideration of the dimensionality of the equation. The variance t has the dimensional unit of m². The original image is the boundary condition of the diffusion: it starts 'diffusing' from there. Green's functions are named in honor of the English mathematician and physicist George Green (1793-1841).

So from the prerequisites ’we know nothing’, the axioms from which we started, we have
found the Gaussian kernel as the unique kernel fulfilling these constraints. This is an
important result, one of the cornerstones in scale-space theory. There have been more ways
in which the kernel could be derived as the unique kernel. Weickert [Weickert1997a] gives a
systematic and thorough overview of the historical schemes that have been published to
derive the Gaussian.

2.4 Scale-space from causality


Koenderink presented in his famous paper "The structure of images" [Koenderink1984a] the
elegant and concise derivation of the linear diffusion equation as the generating partial
differential equation for the construction of a scale-space.

The arguments were taken from the physics of causality: when we increase the scale and blur
the image further, we have the situation that the final blurred image is completely caused by
the image we started from.

The previous level of scale is the cause of events at the next level. We first discuss the
situation in 1D.

Clear[f]; f[x_] := Sin[x] + Sin[3 x]; gr = Plot[f[x], {x, -3, 3}, Epilog ->
    (Arrow[{x, f[x]}, {x, f[x] + Sign[f''[x]] .5}] /. Solve[f'[x] == 0, x]),
   AxesLabel -> {"x", "intensity"}, ImageSize -> 200];


Figure 2.9 Under causal blurring signals can only change in the direction of less structure.
Generation of new structure is impossible, so the signal must always be closed to above
(seen from both sides of the signal). The arrows indicate the direction the intensity moves
under blurring.

A higher level of scale always contains less structure. It is physically impossible that new structure is generated. This is one of the most essential properties of a scale-space. We will encounter this property again when we consider nonlinear scale-spaces in chapter 19.
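This non-creation of structure is easy to observe numerically: count the local extrema of a noisy signal after Gaussian blurring at a series of scales. A pure-Python sketch (the test signal and the scale values are our own choices; in 1D, Gaussian blurring never increases the number of extrema):

```python
import math, random

def gaussian_blur(signal, sigma):
    """Convolve with a sampled Gaussian kernel (borders clamped)."""
    r = int(math.ceil(4 * sigma))
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    k = [v / s for v in k]
    n = len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)]
                for j, w in enumerate(k)) for i in range(n)]

def count_extrema(signal):
    """Number of strict local maxima and minima."""
    return sum(1 for a, b, c in zip(signal, signal[1:], signal[2:])
               if (b - a) * (b - c) > 0)

random.seed(1)
noisy = [math.sin(0.1 * i) + random.uniform(-0.5, 0.5) for i in range(200)]
counts = [count_extrema(gaussian_blur(noisy, s)) for s in (0.001, 1, 2, 4, 8)]
# counts decreases with scale: blurring destroys extrema, never creates them.
```

At the smallest scale the noise dominates the extrema count; at the largest scale only the extrema of the underlying sine survive.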

The direction of the arrows in figure 2.9 is determined by whether the extremum is a maximum or a minimum. In a maximum the intensity is bound to decrease, in a minimum the intensity is bound to increase. The second order derivative determines the curvature of the signal, and its sign determines whether the function is locally concave (in a maximum) or convex (in a minimum). We have the following conditions:

maximum: ∂²u/∂x² < 0, ∂u/∂t < 0, intensity always decreasing;
minimum: ∂²u/∂x² > 0, ∂u/∂t > 0, intensity always increasing.

These conditions can be summarized by (∂u/∂t)(∂²u/∂x²) > 0.



The most important property to include next is the requirement of linearity: the second order derivative to space ∂²u/∂x² is linearly related to the first order derivative to scale ∂u/∂t, so: ∂²u/∂x² = α ∂u/∂t. We may resample any scale axis in such a way that α = 1, so we get ∂²u/∂x² = ∂u/∂t. This is the 1D linear isotropic diffusion equation, an important result. The Green's function of the linear diffusion equation is the Gaussian kernel 1/(√(2π) σ) e^(-x²/(2σ²)), which means that any function upon which the diffusion is applied is convolved with this Gaussian kernel.
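The same result can be approached numerically: integrating ∂u/∂t = ∂²u/∂x² from a delta-spike initial image reproduces the Gaussian Green's function with σ = √(2t). A sketch with an explicit Euler scheme (the grid, time step and stopping time are our own choices; stability requires dt ≤ h²/2):

```python
import math

h = 0.1                 # spatial step
dt = 0.4 * h * h        # explicit Euler: stable for dt <= h^2 / 2
n = 201                 # grid for x in [-10, 10]

u = [0.0] * n           # initial condition: unit-mass delta spike at x = 0
u[n // 2] = 1.0 / h

t = 0.0
while t < 0.5:          # diffuse until t = 0.5, i.e. sigma = sqrt(2 t) = 1
    lap = [(u[i - 1] - 2 * u[i] + u[i + 1]) / (h * h) for i in range(1, n - 1)]
    for i in range(1, n - 1):
        u[i] += dt * lap[i - 1]
    t += dt

sigma = math.sqrt(2 * t)
def green(x):           # the Gaussian Green's function at this scale
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

err = max(abs(u[i] - green((i - n // 2) * h)) for i in range(n))
mass = sum(u) * h       # diffusion conserves the integral of the image
```

The maximal deviation between the numerically diffused spike and the Gaussian is on the order of the discretization error, and the total 'image intensity' is conserved, as the diffusion equation demands.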

We can check (with t = σ²/2; the double == means an equation, a test of equality, not an assignment):

Clear[x, t];
D[1/Sqrt[4 Pi t] Exp[-x^2/(4 t)], {x, 2}] ==
   D[1/Sqrt[4 Pi t] Exp[-x^2/(4 t)], t] // Simplify

True

Also any spatial derivative of the Gaussian kernel is a solution. We test this for the first order derivative:

Clear[x, t];
D[D[1/Sqrt[4 Pi t] Exp[-x^2/(4 t)], x], {x, 2}] ==
   D[D[1/Sqrt[4 Pi t] Exp[-x^2/(4 t)], x], t] // Simplify

True

• Task 2.1 Show that this holds true for any order of derivative, including mixed derivatives for 2- or higher-dimensional Gaussians.

In 2D and higher dimensions the reasoning is the same. Again we demand the function to be
closed to the top. No new structure can emerge.

The requirement for the sign of the second order derivative is now replaced by a
requirement on the sign of the rotation invariant Laplacian, ∂²L/∂x² + ∂²L/∂y².

The reasoning leads to ∂²u/∂x² + ∂²u/∂y² = ∂u/∂t, the 2D linear isotropic diffusion
equation, or Δu = ∂u/∂t in any dimension (the Laplacian operator ∂²/∂x² + ∂²/∂y² is often
indicated as Δ).
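The 2D statement can also be probed numerically; a hedged sketch in Python with scipy.ndimage, blurring a noise image at diffusion time t = σ²/2 and comparing the scale derivative (by central differences) with the discrete Laplacian (the image and step sizes are arbitrary illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

rng = np.random.default_rng(1)
im = rng.standard_normal((64, 64))

def L(t):
    # blurred image at diffusion time t, i.e. sigma = sqrt(2 t)
    return gaussian_filter(im, np.sqrt(2 * t), mode='wrap')

t, dt = 8.0, 0.05
dLdt = (L(t + dt) - L(t - dt)) / (2 * dt)   # dL/dt by central differences
lap = laplace(L(t), mode='wrap')            # discrete Laplacian of L at time t

assert np.allclose(dLdt, lap, atol=1e-3)
```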

In the following chapters we will study the Gaussian kernel and the Gaussian derivatives in
detail. First, in the next section, we present an alternative and particularly attractive
approach to derive the scale-space kernel, starting from the maximization of entropy.

2.5 Scale-space from entropy maximization


An alternative way to derive the Gaussian kernel as the scale-space kernel of an
uncommitted observation is based on the notion that the 'uncommittedness' is expressed in a
statistical way using the entropy of the observed signal. The reasoning is due to Mads
Nielsen, IT-University Copenhagen [Nielsen1995, Nielsen1997a]:

First of all, we want to do a measurement. We have a device which has some integration area
over which the measurement is done. As we have seen before, the area (or length or volume)
of this detector should have a finite width. It cannot be brought down to zero size, because
then nothing would be measured anymore.

The measurement should be done at all locations in the same way, with either a series of
identical detectors, or the same detector measuring at all places. In mathematical language
this is stating that the measurement should be invariant for translation.

We also want the measurement to be linear in the signal to be measured, for example the
intensity. This means that when we measure a signal twice as strong, also the output of the
measurement should be doubled, and when we measure two signals, the measurement of the
sum of the signals should be equal to the sum of the individual measurements. In
mathematical language again this is called invariance for translation along the intensity axis.

These requirements lead automatically to the formulation that the observation must be a
convolution: h(x) = ∫_{-∞}^{∞} L(α) g(x − α) dα.

L(x) is the observed variable, in this example the luminance, g(x) is our aperture, and h(x)
is the result of our measurement.

The aperture function g(x) should be a unity filter. Such a filter is called a normalized filter.
Normalization means that the integral over its weighting profile should be unity:
∫_{-∞}^{∞} g(x) dx = 1. The filter should not multiply the data with something other than 1.

The mean of the filter g(x) should be at the location where we measure (say at x₀), so the
expected value (or first moment) should be x₀: ∫_{-∞}^{∞} x g(x) dx = x₀. Because we may
take any point for x₀, we may as well take the point x₀ = 0 for our further calculations,
which makes life somewhat easier.

The size of the aperture is a very essential element. We want to be free in the choice of this
size, so at least we want to find a family of filters where this size is a free parameter. We can
then monitor the world at all these sizes by 'looking through' the complete set of kernels
simultaneously. We call this 'size' σ. It has the dimension of length, and is the yardstick of
our measurement. We call it the inner scale. Every physical measurement has an inner scale.
It can be µm, milliseconds, light-years, anything, but for every dimension we need a
yardstick. Here σ is our yardstick. We can express distances in a measurement in the
"number of σ's that we stepped around".

If we weight distances quadratically with our kernel, we separate the dimensions: two
orthogonal vectors fulfill (a + b)² = a² + b². Distances (or lengths) add up quadratically by
Pythagoras' law. We call the weighted metric σ²: ∫_{-∞}^{∞} x² g(x) dx = σ².

The last equation we add to the set that will lead to the final formula of the kernel comes
from incorporating the request to be as uncommitted as possible. We want no filter
that has a preference for something, such as vertical structures, or squares or circles.
Actually, we want, in statistical terms, the 'orderlessness' or disorder of the measurement to
be as large as possible; there should be no ordering, ranking, structuring or whatsoever.
Physically, this is expressed through the entropy, a measure for disorder. The entropy of very
regular data is low; we want maximal entropy. The formula for the entropy of our filter is:
H = −∫_{-∞}^{∞} g(x) ln g(x) dx, where ln(x) is the natural logarithm.
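That the Gaussian indeed wins this entropy contest can be probed numerically; a sketch in Python comparing the Gaussian with a uniform filter of the same variance (the grid and σ = 1 are illustrative choices, not from the book):

```python
import numpy as np

xs = np.linspace(-30, 30, 200001)
dx = xs[1] - xs[0]

def entropy(p):
    # H = -integral p ln p, skipping the zero tails
    m = p > 0
    return -np.sum(p[m] * np.log(p[m])) * dx

gauss = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)                       # sigma = 1
uni = np.where(np.abs(xs) <= np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)   # variance 1 too

assert entropy(gauss) > entropy(uni)
# the Gaussian value agrees with the closed form (1/2) ln(2 pi e)
assert abs(entropy(gauss) - 0.5 * np.log(2 * np.pi * np.e)) < 1e-3
```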

We look for the g(x) for which the entropy is maximal, given the constraints that we derived
before:

∫_{-∞}^{∞} g(x) dx = 1,  ∫_{-∞}^{∞} x g(x) dx = 0  and  ∫_{-∞}^{∞} x² g(x) dx = σ².
When we want to find a maximum under a set of given constraints, we apply a standard
mathematical technique named the method of Euler-Lagrange equations (see for an intuitive
explanation of this method Petrou and Bosdogianni [Petrou1999a, page 258]).

This is a technique from the calculus of variations. We first make the Euler-Lagrange
equation, or Lagrangian, by adding to the entropy term the constraints above, each
multiplied with an unknown constant λ, which we are going to determine. The Lagrangian E
becomes:

E = −∫_{-∞}^{∞} g(x) ln g(x) dx + λ1 ∫_{-∞}^{∞} g(x) dx + λ2 ∫_{-∞}^{∞} x g(x) dx + λ3 ∫_{-∞}^{∞} x² g(x) dx

The condition to be extremal for a certain g(x) is given by the vanishing of the first variation
(corresponding to the first derivative, but in this case with respect to a function) to g(x):
δE/δg = 0. This gives us: −1 + λ1 + x λ2 + x² λ3 − ln g(x) = 0, from which we can easily
solve for g(x): g(x) = e^(−1 + λ1 + x λ2 + x² λ3). So g(x) is beginning to get some shape: it
is an exponential function with constant, linear and quadratic terms of x in the exponent. Let
us solve for the λ's:

g[x_] := E^(-1 + λ1 + x λ2 + x^2 λ3);

From the equation we see that at least λ3 must be negative, otherwise the function explodes,
which is physically unrealistic. We then need the explicit expressions for our constraints, so
we make the following set of constraint equations, simplified with the condition λ3 < 0:

eqn1 = Simplify[Integrate[g[x], {x, -Infinity, Infinity}] == 1, λ3 < 0]

E^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[Pi]/Sqrt[-λ3] == 1

eqn2 = Simplify[Integrate[x g[x], {x, -Infinity, Infinity}] == 0, λ3 < 0]

(E^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[Pi] λ2)/(2 (-λ3)^(3/2)) == 0

eqn3 = Simplify[Integrate[x^2 g[x], {x, -Infinity, Infinity}] == σ^2, λ3 < 0]

(E^(-1 + λ1 - λ2^2/(4 λ3)) Sqrt[Pi] (λ2^2 - 2 λ3))/(4 (-λ3)^(5/2)) == σ^2

Now we can solve for all three λ's:

solution = Solve[{eqn1, eqn2, eqn3}, {λ1, λ2, λ3}]

{{λ1 → 1 - (1/2) Log[2 π σ^2], λ2 → 0, λ3 → -1/(2 σ^2)}}

g[x_, σ_] = Simplify[E^(-1 + λ1 + x λ2 + x^2 λ3) /. Flatten[solution], σ > 0]

E^(-x^2/(2 σ^2))/(Sqrt[2 π] σ)

which is the Gaussian function. A beautiful result. Again, we have found the Gaussian as the
unique solution to the set of constraints, which in principle are a formal statement of the
uncommittedness of the observation.
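The three constraints are easy to verify numerically for the resulting kernel; a quick sketch in Python with NumPy (the grid and σ are arbitrary illustrative choices):

```python
import numpy as np

sigma = 1.7
xs = np.linspace(-12 * sigma, 12 * sigma, 20001)
dx = xs[1] - xs[0]
g = np.exp(-xs**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

assert abs(np.sum(g) * dx - 1.0) < 1e-6               # normalization
assert abs(np.sum(xs * g) * dx) < 1e-9                # zero mean
assert abs(np.sum(xs**2 * g) * dx - sigma**2) < 1e-4  # variance sigma^2
```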

2.6 Derivatives of sampled, observed data


All partial derivatives of the Gaussian kernel are solutions of the diffusion equation too.

So the first important result is that we have found the Gaussian kernel and all of its partial
derivatives as the unique set of kernels for a front-end visual system that satisfies the
constraints: no preference for location, scale and orientation, and linearity. We have found a
one-parameter family of kernels, where the scale σ is the free parameter.

Here are the plots of some members of the Gaussian derivative family:

g := 1/(2 Pi σ^2) Exp[-(x^2 + y^2)/(2 σ^2)]; σ = 1;

Block[{$DisplayFunction = Identity},
  graphs = Plot3D[Evaluate[#], {x, -3.5, 3.5}, {y, -3.5, 3.5}] & /@
    {g, D[g, x], D[g, x, y], D[g, x, x] + D[g, y, y]}];

Show[GraphicsArray[graphs], ImageSize -> 400];

Figure 2.10 Upper left: the Gaussian kernel G(x, y; σ) as the zeroth order point operator;
upper right: ∂G/∂x; lower left: ∂²G/∂x∂y; lower right: the Laplacian
∂²G/∂x² + ∂²G/∂y² of the Gaussian kernel.

Because of their importance, we will discuss properties of the Gaussian kernel and its
derivatives in detail in the next chapters. In chapters 6 and 7 we will see how sensitivity
profiles of cells in the retina closely resemble the Laplacian of the Gaussian, and in the
primary visual cortex they closely resemble Gaussian derivatives, as was first noticed by
Young [Young1985, Young1986, Young1986b, Young1987a] and Koenderink
[Koenderink1984a].

The derivative of the observed data L₀(x, y) ⊗ G(x, y; σ) (the convolution is the
observation) is given by ∂/∂x {L₀(x, y) ⊗ G(x, y; σ)}, which can be rewritten as
L₀(x, y) ⊗ ∂/∂x G(x, y; σ). Note that we cannot apply the chain rule of differentiation here:
the operator between L₀(x, y) and G(x, y; σ) is a convolution, not a product. The
commutation (exchange) of the convolution operator and the differential operator is possible
because of their linearity. It is best appreciated when we consider the equation
∂/∂x {L₀(x, y) ⊗ G(x, y; σ)} in the Fourier domain. We need the following two rules:

- The Fourier transform of the derivative of a function is equal to −iω times the Fourier
  transform of the function, where i = √−1, and
- convolution in the spatial domain is a product in the Fourier domain:

Clear[f]; FourierTransform[f[x], x, ω]

FourierTransform[f[x], x, ω]

FourierTransform[D[f[x], x], x, ω]

-I ω FourierTransform[f[x], x, ω]

0
So we get (L denotes the Fourier transform of L): ~-x {L0(x, y)| y; o-)}
-iog{L.(~} = L.{-icoG} ~-~ Lo(x, y)| ~x G(x, y; o-)

The commutation of the convolution and the derivative operators, which is easily shown in
the Fourier domain. From this we can see the following important results:

9 Differentiation and observation are done in a single step: convolution with a Gaussian
derivative kernel.

9 Differentiation is now done by integration, namely by the convolution integral.



This is a key result in scale-space theory. We can now apply differentiation (even to high
order) to sampled data like images.

We just convolve the image with the appropriate Gaussian derivative kernel. But where do
we need the derivatives, and where do we need higher order derivatives?

An important area of application is the exploitation of geometric information from images.


The most basic example is the first order derivative, which gives us edges.

Edges are defined as a sudden change of intensity L when we walk over the image, and this
is exactly what a derivative captures: ∂L/∂x.

Derivatives abound in the detection of differential features (features expressed as some
(polynomial) expression in image derivatives). They also show up in the detection of
motion, of stereo disparity to find depth, the detection of structure in color images,
segmentation, image enhancement and denoising, and many other application areas, as we
will see in the rest of the book.

Some more implications of the theory so far:

• The Gaussian kernel is the physical analogue of a mathematical point, and the Gaussian
derivative kernels are the physical analogues of the mathematical differential operators.
Equivalence is reached in the limit when the scale of the Gaussian goes to zero:
lim_{σ→0} G(x; σ) = δ(x), where δ(x) is the Dirac delta function, and

lim_{σ→0} {f(x) ⊗ ∂G(x; σ)/∂x} = lim_{σ→0} ∫_{-∞}^{∞} f(α) ∂/∂x G(x − α; σ) dα =
∫_{-∞}^{∞} f(α) ∂/∂x δ(x − α) dα = ∂f(x)/∂x.

Integrate[f[α] D[DiracDelta[α - x], x], {α, -Infinity, Infinity}]

f'[x]

• There is an intrinsic and unavoidable relation between differentiation and blurring. By its
definition, any differentiation on discrete (observed) data blurs the data somewhat, with
the amount of the scale of the differential operator. There is no way out; this increase of
the inner scale is a physical necessity. We can only try to minimize the effect by choosing
small scales for the differentiation operator. However, this minimal scale is subject to
constraints (as is the maximal scale). In chapter 7 we develop the fundamental relation
between the scale of the operator, its differential order and the required amount of
accuracy.

The Mathematica function gD[im, nx, ny, σ] implements a convolution with a Gaussian
derivative on the image im, with order of differentiation nx with respect to x resp. ny with
respect to y. Figure 2.11 shows the derivatives to x and y of a simple test image of a square:

im = Table[If[80 < x < 170 && 80 < y < 170, 1, 0], {y, 1, 256}, {x, 1, 256}];
Block[{$DisplayFunction = Identity},
  imx = gD[im, 1, 0, 1]; imy = gD[im, 0, 1, 1];
  grad = Sqrt[imx^2 + imy^2];
  p1 = ListDensityPlot /@ {im, imx, imy, grad}];
Show[GraphicsArray[p1], ImageSize -> 400];

Figure 2.11 The first order derivative of an image gives edges. Left: original test image
L(x, y), resolution 256². Second: the derivative with respect to x: ∂L/∂x at scale σ = 1 pixel.
Note the positive and negative edges. Third: the derivative with respect to y: ∂L/∂y at scale
σ = 1 pixel. Right: the gradient magnitude √((∂L/∂x)² + (∂L/∂y)²) at a scale of σ = 1 pixel,
which gives all edges.
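A comparable operator can be sketched outside Mathematica; the helper below (a hypothetical stand-in, not the book's gD) uses scipy.ndimage.gaussian_filter, which accepts a derivative order per image axis:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gD(im, nx, ny, sigma):
    """Gaussian derivative: order nx in x, ny in y, at scale sigma."""
    # axis 0 of a 2D array runs over y (rows), axis 1 over x (columns)
    return gaussian_filter(im.astype(float), sigma, order=(ny, nx))

im = np.zeros((256, 256))
im[80:170, 80:170] = 1.0            # the square test image

imx = gD(im, 1, 0, 1.0)             # x-derivative: vertical edges
imy = gD(im, 0, 1, 1.0)             # y-derivative: horizontal edges
grad = np.sqrt(imx**2 + imy**2)     # gradient magnitude: all edges
```

For a unit step edge the x-derivative response is of the order of the Gaussian's peak value 1/(σ√(2π)) ≈ 0.40 at σ = 1, and it vanishes away from the edges.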

• The Gaussian kernel is the unique kernel that generates no spurious resolution. It is the
blown-up physical point operator; the Gaussian derivatives are the blown-up physical
multi-scale derivative operators.

Show[Import["blown-up ddx.jpg"], ImageSize -> 300];

Figure 2.12 Convolution with a Gaussian derivative is the blown-up version of convolution
with the Dirac delta function. Taking the limit of the scale to zero ('letting the air out') leads
to the 'regular' mathematical formulation.

• Because convolving is an integration, the Gaussian kernel has by definition a strong
regularizing effect. It was shown by Schwartz [Schwartz1951] that differentiation of
distributions of data ('wild' data, such as discontinuous or sampled data) has to be
accomplished by convolution with a smooth testfunction. This smooth testfunction is our
Gaussian kernel here. So we recognize that the process of observation is the regularizer,
and there is no need to smooth the data first. Actually, one should never change the input
data, but only make modifications to the process of observation where one has access: the
filter through which the measurement is done. The visual system does the same: it
employs filters at many sizes and shapes, as we will see in the chapter on human vision.

• Recently some interesting papers have shown the complete equivalence of Gaussian scale-
space regularization with a number of other methods for regularization, such as splines,
thin plate splines, graduated convexity etc. [Scherzer2000a, Nielsen1996b,
Nielsen1997b]. In chapter 10 we will discuss the aspects of differentiation of discrete data
(it is 'ill-posed') and the property of regularization in detail.

• The set of Gaussian derivative kernels (including the zeroth order derivative: the Gaussian
kernel itself) forms a complete set of derivatives. This set is sometimes referred to as the
N-jet.

Now the basic toolkit is there to do differential geometry, tensor analysis, invariant theory,
topology and apply many more mathematical tools on our discrete data. This will be the
topic of much of the rest of this book.

2.7 Scale-space stack


A scale-space is a stack of 2D images, where scale is the third dimension. One can make a
scale-space of any measurement, so one can measure an intensity scale-space, a gradient
magnitude scale-space, a Laplacian scale-space etc.

im = Import["mr64.gif"][[1, 1]];

Block[{$DisplayFunction = Identity, xres, yres, max},
  {yres, xres} = Dimensions[im]; max = Max[im];
  gr = Graphics3D[ListPlot3D[Table[0, {yres}, {xres}],
     Map[GrayLevel, im/max, {2}], Mesh -> False, Boxed -> False]];
  gb = Table[blur = gD[im, 0, 0, i]; Graphics3D[ListPlot3D[
      Table[i 10, {yres}, {xres}], Map[GrayLevel, blur/max, {2}],
      Mesh -> False, Boxed -> False]], {i, 1, 6}]];
Show[{gr, gb}, BoxRatios -> {1, 1, 1}, ViewPoint -> {1.190, -3.209, 1.234},
  DisplayFunction -> $DisplayFunction, Boxed -> True, ImageSize -> 240];

Figure 2.13 A scale-space of a 2D MRI sagittal slice, dimensions 64², for a range of scales
σ = 1, 2, 3, 4, 5, and 6 pixels.

We found a family of kernels, with the scale σ as a free parameter. When we don't know
what scale to apply in an uncommitted measurement, we just take them all. It is like
sampling at spatial locations: we put CCD elements all over our receptor's sensitive area. We
will see that the visual system does just that: it has groups of rods and cones in the retina
(termed receptive fields) of a wide range of circular diameters, effectively sampling at many
different scales.

We will see in the chapters on the ’deep structure’ of images (i.e. the structure along the scale
axis), that in the scale-space the hierarchical, topological structure of images is embedded.
See chapters 13-15.

One can make scale-spaces of any dimension. A scale-space stack of 3D images, such as 3D
datasets from medical tomographic scanners, is a 4D space (x, y, z; σ) and is termed a
hyperstack [Vincken1990].

And here are two scale-spaces of a real image: a scale-space of the intensity (no derivatives,
only blurred) and a scale-space of the Laplacian (the Laplacian is the sum of the second
order derivatives of the image, ∂²L/∂x² + ∂²L/∂y²):

im = Import["mr128.gif"][[1, 1]];

DisplayTogetherArray[
  {Table[ListDensityPlot[gD[im, 0, 0, E^r]], {r, 0, 2.1, .3}],
   Table[ListDensityPlot[gD[im, 2, 0, E^r] + gD[im, 0, 2, E^r]],
     {r, 0, 2.1, .3}]}, ImageSize -> 390];

Figure 2.14 A scale-space is a stack of images at a range of scales. Top row: Gaussian blur
scale-space of a sagittal Magnetic Resonance image, resolution 128², exponential scale
range from σ = e⁰ to σ = e^2.1. Bottom row: Laplacian scale-space of the same image, same
scale range.
The function gD[im, nx, ny, σ] will be explained later (chapters 4 and 5). It convolves the
image with a Gaussian derivative.
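A similar stack takes a few lines in Python; a sketch with scipy.ndimage, using the same exponential scale sampling σ = e^r (a noise image stands in for the MR slice):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
im = rng.standard_normal((128, 128))

taus = np.arange(0.0, 2.2, 0.3)    # r = 0, 0.3, ..., 2.1
stack = np.stack([gaussian_filter(im, np.exp(r), mode='wrap') for r in taus])

# structure can only disappear: the variance drops monotonically with scale
v = stack.var(axis=(1, 2))
assert np.all(np.diff(v) < 0)
```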

2.8 Sampling the scale-axis


From the trip through scale in the "Powers of 10" series we made steps of a factor 10 each
time we took a new picture. This is an exponential stepping through scale, and we know this
as experimental fact. We step in 'orders of magnitude'. The scale parameter σ gives a logical
length parameter for the level of resolution.

If we consider how to parametrize scale σ with a dimensionless parameter τ, then we realize
that scale-invariance (or self-similarity) must imply that dσ/dτ is proportional to σ.

In other words, the change that we see when we step along the scale axis is proportional to
the level of resolution at hand. Without loss of generality we may take dσ/dτ = σ with
σ|_{τ=0} = ε. We call the dimensionless parameter τ the natural scale parameter: σ = ε e^τ,
where τ can be any number, even negative. Note that the artificial singularity due to the
problematic value of σ = 0 is now no longer present.

There is a difference between 'zooming' and 'blurring':

Zooming is a reparametrization of the spatial axis, x → a x, so we get a larger or smaller
image by just setting the samples closer together or farther apart. There is no information
gained or lost. Blurring is doing an observation with a larger aperture: the image is blurred.
Now information is lost, and this is exactly what is a requirement for a scale-space:
reduction of information. Because we have a larger σ over the same image domain, we can
effectively perform a sampling rate reduction [Vincken1990].

How much information is lost when we increase scale? Florack [Florack1994b] introduced
the following reasoning:

The number of (equidistant) samples on a given domain, given a fixed amount of overlap
between neighboring apertures, at scale level σ relative to the number of samples at another
scale level σ₀ is given by N(σ) = N(σ₀) (σ₀/σ)^D, where D is the dimension.
Or, in terms of the natural scale parameter τ with σ = ε e^τ:

N(σ) = N(σ₀) (ε e^{τ₀} / (ε e^τ))^D = N(σ₀) e^{D(τ₀ − τ)}

which is the solution of the differential equation dN/dτ + D N = 0. At the highest scale, we
have just a single wide aperture left and we have achieved total blurring; the image domain
has become a single point. Notice that the sampling rate reduction depends on the dimension D.
When we consider natural, generic images, we expect the information in the images to exist
on all scales. We could think of a 'density of local generic features' such as intensity maxima,
minima, saddle points, corners etc. as relatively homogeneously distributed over the images
over all scales when we consider enough images. This 'feature density' N_F(τ) might then be
related to the number of samples N(τ), so dN_F/dτ + D N_F = 0. In chapter 20 we will see that
the number of extrema and saddle points in a scale-space of generic images indeed decreases
with a slope d ln N_F / dτ ≈ −2 for 2D images and a slope of −1 for 1D signals.

The factor ε in the equation for natural scale appears for dimensional reasons: it is the scale
for τ = 0, and is a property of our imaging device; it is the pixel size, CCD element size, the
sampling width etc.: the inner scale of the measurement.
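Florack's sample-count reduction is easy to tabulate; a small Python sketch with illustrative numbers (ε, the step size and N(σ₀) are arbitrary choices):

```python
import math

eps, D = 1.0, 2            # inner scale and dimension
dtau = 0.5                 # step along the natural scale axis
taus = [k * dtau for k in range(6)]
sigmas = [eps * math.exp(tau) for tau in taus]

N0 = 256**2                # samples at the innermost scale
N = [N0 * (sigmas[0] / s)**D for s in sigmas]

# each tau-step reduces the sample count by the same factor e^(-D dtau)
ratios = [N[i + 1] / N[i] for i in range(len(N) - 1)]
assert all(abs(r - math.exp(-D * dtau)) < 1e-12 for r in ratios)
```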

Block[{$DisplayFunction = Identity},
  p1 = Graphics[
    Table[Circle[{x, y}, .6], {x, 1, 10}, {y, 1, 10}], AspectRatio -> 1];
  p2 = Graphics[Table[Circle[{x, y}, 1.2], {x, 1, 10, 2}, {y, 1, 10, 2}],
    AspectRatio -> 1];
  p3 = Graphics3D[Table[{EdgeForm[], TranslateShape[Sphere[.6, 10, 10],
       {x, y, z}]}, {x, 1, 6}, {y, 1, 6}, {z, 1, 6}], Boxed -> False];
  p4 = Graphics3D[Table[{EdgeForm[], TranslateShape[
       Sphere[1.2, 10, 10], {x, y, z}]}, {x, 1, 6, 2},
      {y, 1, 6, 2}, {z, 1, 6, 2}], Boxed -> False]];
Show[GraphicsArray[{p1, p2, p3, p4}], ImageSize -> 400];

Figure 2.15 The number of samples on a 2D domain, given a fixed amount of overlap,
decreases with (σ₀/σ)² (left two figures), and on a 3D domain with (σ₀/σ)³ (right two
figures). So the number of samples decreases as a function of scale with a slope of −D,
where D is the dimension (see text). The sampling rate reduction is dependent on the
dimensionality of the measurement.

For positive τ we go to larger scale, for negative τ we go to smaller scale. In the expression
for the natural scale the singularity at σ = 0 is effectively removed.

The exponential stepping over the scale axis is also evident in the Hausdorff dimension, the
number of boxes counted in a quadtree of a binary image (see also [Pedersen2000] and
chapter 15, section 15.1.4).

Of course, there is no information within the inner scale, so here problems are to be expected
when we try to extract information at sub-pixel scale. Only by taking into account a context
of voxels through a proper model can we go to the subpixel domain.

This is an important notion: any observation at a single point is an independent
measurement, and we can do a lot of measurements there.

In the next few chapters we will derive many features related to the measurement of
derivatives at our pixel. It turns out that we can make lots of specific polynomial
combinations, like edge strength, ’cornerness’ etc. but they all describe information in that
point. It is a ’keyhole observation’. The important ’perceptual grouping’ of neighboring points
into meaningful sets is accomplished by specifying constraints, like models. In this book we
first derive many local (differential) features.

In the second part we go a little further in the cascade of visual processing steps, and
investigate local neighborhood relations through comparison of local properties like
orientation, strength of derivative measurements etc. We also explore the deep structure of
images (a term first coined by Koenderink), by which we mean the relations over scale. In

the deep structure we may expect the hierarchical, structuring, more topological information:
what is ’embedded in’ what, what is ’surrounded by’ what, what is ’part of’ what etc. This
takes us to a next level of description in images, which is currently receiving a lot of
attention.


2.9 Summary of this chapter


Scale-space theory was discovered independently by Iijima in Japan in the early sixties, and
by Koenderink in Europe in the early seventies.

Because we have specific physical constraints for the early vision front-end kernel, we are
able to set up a 'first principle' framework from which the exact sensitivity function of the
measurement aperture can be derived. There exist many such derivations for an uncommitted
kernel, all leading to the same unique result: the Gaussian kernel. We discussed three
approaches: the first started with the assumptions of linearity, isotropy, homogeneity and
scale-invariance.

With the help of the Pi-theorem from dimensional analysis one is able to derive the Gaussian
by plugging in the constraints one by one.

The second derivation started from causality: it is impossible that maxima increase and
minima decrease with increasing scale; every blurred version is the causal consequence of
the image it was blurred from. This means that the extrema must be closed from above. This
leads to a constraint on the sign of the second derivative, from which the diffusion equation
emerges.

The third derivation started from the maximization of the entropy of the very first
measurement. Through the use of Lagrange multipliers, where the constraints are used one
by one, one can again derive the Gaussian kernel as the unique kernel for the front-end.

A crucial result is that differentiation of discrete data is done by the convolution with the
derivative of the observation kernel, in other words: by an integration. Differentiation is now
possible on discrete data by means of convolution with a finite kernel. In chapter 14 we
discuss this important mathematical notion, which is known as regularization.

This means that differentiation can never be done without blurring the data somewhat. We
find as a complete family of front-end kernels the family of all partial derivatives of the
Gaussian kernel. The zeroth order derivative is just the Gaussian blur kernel itself.

Scale is parametrized in an exponential fashion (we consider 'orders of magnitude' when
scaling). The exponent in this parametrization is called the natural scale parameter.

Rotz[t_] = {{Cos[t], Sin[t], 0}, {-Sin[t], Cos[t], 0}, {0, 0, 1}};
Roty[t_] = {{Cos[t], 0, -Sin[t]}, {0, 1, 0}, {Sin[t], 0, Cos[t]}};
Rot3D[φ_, θ_] = Roty[θ].Rotz[φ];
SphericalCoordinates[{x_, y_, z_}] =
  {Sqrt[x^2 + y^2 + z^2], ArcTan[z, Sqrt[x^2 + y^2]], ArcTan[x, y]};
NextBranches[α_, ε_, m_][Branch[r1_List, r0_List, th_]] :=
  Module[{r, θ, φ}, {r, θ, φ} = SphericalCoordinates[r1 - r0];
    {Branch[ε (r1 - r0) + r1, r1, ε th], Sequence @@ Table[
       Branch[r1 + ε r {Sin[α] Cos[ψ], Sin[α] Sin[ψ], Cos[α]}.Rot3D[φ, θ],
         r1, ε th], {ψ, 0, 2 Pi, 2 Pi/m}]} // N];
NextBranches[α_, ε_, m_][w_List] := Map[NextBranches[α, ε, m], w];
Tree2D[α_, ε_, m_, r_List, th_, n_] :=
  NestList[NextBranches[α, ε, m], Branch[r, {0, 0, 0}, 1], n] /.
    Branch[r1_, r0_, t_] :>
      {RGBColor[0, 0.6 (1 - t) + 0.4, 0], Thickness[th t], Line[{r1, r0}]}
Show[Graphics3D[Tree2D[α, ε, m, r, th0, n] /.
    {α -> Pi/8, ε -> 0.6, m -> 5, r -> {0.01, 0, 1}, n -> 4, th0 -> 0.03}],
  PlotRange -> {{-1, 1}, {-1, 1}, {0, 2.5}},
  ViewPoint -> {3.369, -0.040, 0.312}, ImageSize -> 200];

Figure 2.16 Fractals are famous examples of self-similar functions. This self-similar
fractal shows a tree in three dimensions. Parameters: α = branch angle; ε = scale factor;
m = number of branches from previous branch; n = deepness.
Source: Renan Cabrera, www.mathsource.com.
3. The Gaussian kernel

Of all things, man is the measure.
Protagoras the Sophist (480-411 B.C.)

3.1 The Gaussian kernel


The Gaussian (better: Gaußian) kernel is named after Carl Friedrich Gauß (1777-1855), a
brilliant German mathematician. This chapter discusses many of the attractive and special
properties of the Gaussian kernel.

<< FrontEndVision`FEV`; Show[Import["Gauss10DM.gif"], ImageSize -> 280];

Figure 3.1 The Gaussian kernel is apparent on every German banknote of DM 10,- where it
is depicted next to its famous inventor when he was 55 years old. The new Euro replaces
these banknotes. See also: http://scienceworld.wolfram.com/biography/Gauss.html.

The Gaussian kernel is defined in 1D, 2D and N-D respectively as

G_1D(x; σ) = 1/(√(2π) σ) e^(−x²/(2σ²)),  G_2D(x, y; σ) = 1/(2πσ²) e^(−(x²+y²)/(2σ²)),
G_ND(x⃗; σ) = 1/((√(2π) σ)^N) e^(−|x⃗|²/(2σ²))

The σ determines the width of the Gaussian kernel. In statistics, when we consider the
Gaussian probability density function it is called the standard deviation, and its square,
σ², the variance. In the rest of this book, when we consider the Gaussian as an aperture
function of some observation, we will refer to σ as the inner scale or shortly scale.

In the whole of this book the scale can only take positive values, σ > 0. In the process of
observation σ can never become zero. For, this would imply making an observation through
an infinitesimally small aperture, which is impossible. The factor of 2 in the exponent is a
matter of convention, because we then have a 'cleaner' formula for the diffusion equation, as
we will see later on. The semicolon between the spatial and scale parameters is
conventionally put there to make the difference between these parameters explicit.

The scale-dimension is not just another spatial dimension, as we will thoroughly discuss in
the remainder of this book.

The half width at half maximum (x = σ √(2 ln 2)) is often used to approximate σ, but it is
somewhat larger:

Unprotect[gauss];
gauss[x_, σ_] := 1/(σ Sqrt[2 π]) Exp[-x^2/(2 σ^2)];
Solve[gauss[x, σ]/gauss[0, σ] == 1/2, x]

{{x → -σ Sqrt[2 Log[2]]}, {x → σ Sqrt[2 Log[2]]}}

% // N

{{x → -1.17741 σ}, {x → 1.17741 σ}}
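The same relation can be confirmed in a couple of lines of plain Python (σ = 2.5 is an arbitrary choice):

```python
from math import sqrt, log, exp

sigma = 2.5
hwhm = sigma * sqrt(2 * log(2))   # half width at half maximum

assert abs(exp(-hwhm**2 / (2 * sigma**2)) - 0.5) < 1e-12   # kernel drops to half
assert abs(hwhm / sigma - 1.17741) < 1e-5                  # the 1.17741 sigma above
```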

3.2 Normalization

The term 1/(√(2π) σ) in front of the one-dimensional Gaussian kernel is the normalization
constant. It comes from the fact that the integral over the exponential function is not unity:
∫_{-∞}^{∞} e^(−x²/(2σ²)) dx = √(2π) σ. With the normalization constant this Gaussian
kernel is a normalized kernel, i.e. its integral over its full domain is unity for every σ.

This means that increasing the σ of the kernel reduces the amplitude substantially. Let us
look at the graphs of the normalized kernels for σ = 0.3, σ = 1 and σ = 2 plotted on the
same axes:

Unprotect[gauss]; gauss[x_, σ_] := 1/(σ Sqrt[2 π]) Exp[-x^2/(2 σ^2)];
Block[{$DisplayFunction = Identity}, {p1, p2, p3} =
   Plot[gauss[x, #], {x, -5, 5}, PlotRange -> {0, 1.4}] & /@ {.3, 1, 2}];
Show[GraphicsArray[{p1, p2, p3}], ImageSize -> 400];

Figure 3.2 The Gaussian function at scales σ = 0.3, σ = 1 and σ = 2. The kernel is
normalized, so the total area under the curve is always unity.

The normalization ensures that the average grey level of the image remains the same when
we blur the image with this kernel. This is known as average grey level invariance.

3.3 Cascade property, selfsimilarity


The shape of the kernel remains the same, irrespective of the σ. When we convolve two Gaussian kernels we get a new wider Gaussian with a variance σ² which is the sum of the variances of the constituting Gaussians: g_new(x; σ1² + σ2²) = g_1(x; σ1²) ⊗ g_2(x; σ2²).

σ =.; Simplify[Integrate[gauss[α, σ1] gauss[α - x, σ2], {α, -∞, ∞}],
   {σ1 > 0, σ2 > 0}]

E^(-(x^2/(2 (σ1^2 + σ2^2))))/(Sqrt[2 π] Sqrt[σ1^2 + σ2^2])
This phenomenon, i.e. that a new function emerges that is similar to the constituting
functions, is called self-similarity.

The Gaussian is a self-similar function. Convolution with a Gaussian is a linear operation, so a convolution with a Gaussian kernel followed by a convolution with again a Gaussian kernel is equivalent to convolution with the broader kernel. Note that the squares of σ add, not the σ's themselves. Of course we can concatenate as many blurring steps as we want to create a larger blurring step. In analogy to a cascade of waterfalls spanning the same height as the total waterfall, this phenomenon is also known as the cascade smoothing property.
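The cascade property can be verified numerically as well. A plain-Python sketch (ours, not from the book; the σ values 1 and 2 are arbitrary) convolving two sampled Gaussians and comparing the result at x = 0 with the broader kernel of variance σ1² + σ2²:

```python
import math

def gauss(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

s1, s2 = 1.0, 2.0
dx = 0.05
xs = [i * dx for i in range(-400, 401)]   # [-20, 20], wide enough for both kernels
k1 = [gauss(x, s1) for x in xs]
k2 = [gauss(x, s2) for x in xs]

# discrete convolution (k1 * k2), evaluated at x = 0
center = len(xs) // 2
conv0 = sum(k1[j] * k2[center - (j - center)] for j in range(len(xs))) * dx

# the broader Gaussian: the variance is the sum of the variances
direct0 = gauss(0, math.sqrt(s1**2 + s2**2))
print(conv0, direct0)  # agree closely
```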
Famous examples of self-similar functions are fractals. This shows the famous Mandelbrot
fractal:

cMandelbrot = Compile[{{c, _Complex}}, -Length[
      FixedPointList[#^2 + c &, c, 50, SameTest -> (Abs[#2] > 2.0 &)]]];

ListDensityPlot[-Table[cMandelbrot[a + b I], {b, -1.1, 1.1, 0.0114},
   {a, -2.0, 0.5, 0.0142}], Mesh -> False, AspectRatio -> Automatic,
   Frame -> False, ColorFunction -> Hue, ImageSize -> 170];

Figure 3.3 The Mandelbrot fractal is a famous example of a self-similar function. Source:
www.mathforum.org. See also mathworld.wolfram.com/MandelbrotSet.html.

3.4 The scale parameter


In order to avoid the summing of squares, one often uses the following parametrization: 2 σ² → t, so the Gaussian kernel gets a particularly short form. In N dimensions: G_ND(x⃗, t) = 1/(π t)^(N/2) e^(-x²/t).

It is this t that emerges in the diffusion equation ∂L/∂t = ∂²L/∂x² + ∂²L/∂y² + ∂²L/∂z². It is often referred to as 'scale' (like in: differentiation to scale, ∂L/∂t), but a better name is variance.

To make the self-similarity of the Gaussian kernel explicit, we can introduce a new dimensionless spatial parameter, x̃ = x/(σ√2). We say that we have reparametrized the x-axis. Now the Gaussian kernel becomes: g_n(x̃; σ) = 1/(σ√(2π)) e^(-x̃²), or g_n(x̃; t) = 1/√(π t) e^(-x̃²). In other words: if we walk along the spatial axis in footsteps expressed in scale-units (σ's), all kernels are of equal size or 'width' (but due to the normalization constraint not necessarily of the same amplitude). We now have a 'natural' size of footstep to walk over the spatial coordinate: a unit step in x is now σ√2, so in more blurred images we make bigger steps. We call this basic Gaussian kernel the natural Gaussian kernel g_n(x̃; σ). The new coordinate x̃ = x/(σ√2) is called the natural coordinate. It eliminates the scale factor σ from the spatial coordinates, i.e. it makes the Gaussian kernels similar, despite their different inner scales. We will encounter natural coordinates many times hereafter.

The spatial extent of the Gaussian kernel ranges from -∞ to +∞, but in practice it has negligible values for x larger than a few (say 5) σ. The numerical value at x = 5σ, and the area under the curve from x = 5σ to infinity (recall that the total area is 1):

gauss[5, 1] // N
Integrate[gauss[x, 1], {x, 5, Infinity}] // N

1.48672 × 10^-6

2.86652 × 10^-7
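In Python the same two numbers follow from the standard-library error function (a cross-check of ours, not the book's code); the tail area of the normalized kernel beyond x is erfc(x/(σ√2))/2:

```python
import math

def gauss(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

value_at_5sigma = gauss(5, 1)
tail_area = 0.5 * math.erfc(5 / math.sqrt(2))   # area from 5 sigma to infinity

print(value_at_5sigma)  # ~1.48672e-6
print(tail_area)        # ~2.86652e-7
```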

The larger we make the standard deviation σ, the more the image gets blurred. In the limit to infinity, the image becomes homogeneous in intensity. The final intensity is the average intensity of the image. This is true for an image with infinite extent, which in practice will never occur, of course. The boundary has to be taken into account. Actually, one can make many choices what to do at the boundary; it is a matter of consensus. Boundaries are discussed in detail in chapter 5, where practical issues of computer implementation are discussed.

3.5 Relation to generalized functions


The Gaussian kernel is the physical equivalent of the mathematical point. It is not strictly local, like the mathematical point, but semi-local. It has a Gaussian weighted extent, indicated by its inner scale σ.

Because scale-space theory revolves around the Gaussian function and its derivatives as a physical differential operator (explained in more detail in the next chapter), we will focus here on some mathematical notions that are directly related, i.e. the notions underlying the sampling of values from functions and their derivatives at selected points. The mathematical functions involved are the generalized functions, i.e. the Dirac delta function, the Heaviside function and the error function. In the next section we study these functions in detail.

When we take the limit as the inner scale goes down to zero (remember that σ can only take positive values for a physically realistic system), we get the mathematical delta function, or Dirac delta function, δ(x). This function is everywhere zero except in x = 0, where it has infinite amplitude and zero width; its area is unity.

lim_{σ↓0} ( 1/(σ √(2π)) e^(-x²/(2σ²)) ) = δ(x)

δ(x) is called the sampling function in mathematics, because the Dirac delta function adequately samples just one point out of a function when integrated. It is assumed that f(x) is continuous at x = a:

Integrate[DiracDelta[x - a] f[x], {x, -∞, ∞}]

f[a]

The sampling property of derivatives of the Dirac delta function is shown below:

Integrate[D[DiracDelta[x], {x, 2}] f[x], {x, -∞, ∞}]

f''[0]

The delta function was originally proposed by the eccentric Victorian mathematician Oliver Heaviside (1850-1925, see also [Pickover1998]). The story goes that mathematicians called this function a "monstrosity", but it did work! Around 1950 physicist Paul Dirac (1902-1984) gave it new light. Mathematician Laurent Schwartz (1915-2002) put it on a firm footing in 1951 with his famous "theory of distributions" (we discuss this theory in chapter 8). And today it is called "the Dirac delta function".

The integral of the Gaussian kernel from -∞ to x is a famous function as well. It is the error function, or cumulative Gaussian function, and is defined as:

σ =.; err[x_, σ_] = Integrate[1/(σ Sqrt[2 Pi]) Exp[-y^2/(2 σ^2)], {y, 0, x}]

1/2 Erf[x/(Sqrt[2] σ)]

The y in the integral above is just a dummy integration variable, and is integrated out. The Mathematica error function is Erf[x]. In our integral of the Gaussian function we need to do the reparametrization x → x/(σ√2). Again we recognize the natural coordinates. The factor 1/2 is due to the fact that integration starts halfway, in x = 0.

σ = 1.; Plot[1/2 Erf[x/(σ Sqrt[2])], {x, -4, 4}, AspectRatio -> .3,
   AxesLabel -> {"x", "Erf[x]"}, ImageSize -> 200];

Figure 3.4 The error function Erf[x] is the cumulative Gaussian function.

When the inner scale σ of the error function goes to zero, we get in the limiting case the so-called Heaviside function or unit step function. The derivative of the Heaviside function is the Dirac delta function, just as the derivative of the error function is the Gaussian kernel.

σ = .1; Plot[1/2 Erf[x/(σ Sqrt[2])], {x, -4, 4}, AspectRatio -> .3,
   AxesLabel -> {"x", "Erf[x]"}, ImageSize -> 270];
Figure 3.5 For decreasing σ the error function begins to look like a step function. The error function is the Gaussian blurred step-edge.

Plot[UnitStep[x], {x, -4, 4}, DisplayFunction -> $DisplayFunction,
   AspectRatio -> .3, AxesLabel -> {"x", "Heaviside[x], UnitStep[x]"},
   PlotStyle -> Thickness[.015], ImageSize -> 270];

Figure 3.6 The Heaviside function is the generalized unit step function. It is the limiting case of the error function for lim σ → 0.

The derivative of the Heaviside step function is the Dirac delta function again:

D[UnitStep[x], x]
DiracDelta[x]

3.6 Separability

The Gaussian kernel for dimensions higher than one, say N, can be described as a regular product of N one-dimensional kernels. Example: g_2D(x, y; σ1² + σ2²) = g_1D(x; σ1²) g_1D(y; σ2²) where the space in between is the product operator. The regular product also explains the exponent N in the normalization constant for N-dimensional Gaussian kernels in (0). Because higher dimensional Gaussian kernels are regular products of one-dimensional Gaussians, they are called separable. We will use this property of separability quite often.

DisplayTogetherArray[{Plot[gauss[x, σ = 1], {x, -3, 3}],
   Plot3D[gauss[x, σ = 1] gauss[y, σ = 1], {x, -3, 3}, {y, -3, 3}]},
   ImageSize -> 440];

Figure 3.7 A product of Gaussian functions gives a higher dimensional Gaussian function.
This is a consequence of the separability.

An important application is the speed improvement when implementing numerical separable convolution. In chapter 5 we explain in detail how the convolution with a 2D (or better: N-dimensional) Gaussian kernel can be replaced by a cascade of 1D convolutions, making the process much more efficient because convolution with the 1D kernels requires far fewer multiplications.
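The equivalence of the separable and the direct implementation can be sketched in plain Python (our illustration, with an arbitrary small test image, a kernel truncated at 4σ, and a zero-padded boundary):

```python
import math

def gauss(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

sigma, r = 1.0, 4                  # truncate the kernel at radius 4 sigma
k1d = [gauss(i, sigma) for i in range(-r, r + 1)]

img = [[(x * y) % 7 for x in range(12)] for y in range(12)]   # small test image
h, w = len(img), len(img[0])

def conv2d_direct(img):
    # direct 2D convolution with the outer-product kernel (zero padding)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        s += k1d[dy + r] * k1d[dx + r] * img[y + dy][x + dx]
            out[y][x] = s
    return out

def conv2d_separable(img):
    # two 1D passes: first along the rows, then along the columns
    tmp = [[sum(k1d[dx + r] * img[y][x + dx]
                for dx in range(-r, r + 1) if 0 <= x + dx < w)
            for x in range(w)] for y in range(h)]
    return [[sum(k1d[dy + r] * tmp[y + dy][x]
                 for dy in range(-r, r + 1) if 0 <= y + dy < h)
             for x in range(w)] for y in range(h)]

a, b = conv2d_direct(img), conv2d_separable(img)
maxdiff = max(abs(a[y][x] - b[y][x]) for y in range(h) for x in range(w))
print(maxdiff)  # ~0: both implementations give the same result
```

The direct version needs (2r+1)² multiplications per pixel, the separable one only 2(2r+1), which is where the speed-up comes from.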

3.7 Relation to binomial coefficients


Another place where the Gaussian function emerges is in expansions of powers of
polynomials. Here is an example:

Expand[(x + y)^30]

x^30 + 30 x^29 y + 435 x^28 y^2 + 4060 x^27 y^3 + 27405 x^26 y^4 + 142506 x^25 y^5 +
  593775 x^24 y^6 + 2035800 x^23 y^7 + 5852925 x^22 y^8 + 14307150 x^21 y^9 +
  30045015 x^20 y^10 + 54627300 x^19 y^11 + 86493225 x^18 y^12 + 119759850 x^17 y^13 +
  145422675 x^16 y^14 + 155117520 x^15 y^15 + 145422675 x^14 y^16 +
  119759850 x^13 y^17 + 86493225 x^12 y^18 + 54627300 x^11 y^19 + 30045015 x^10 y^20 +
  14307150 x^9 y^21 + 5852925 x^8 y^22 + 2035800 x^7 y^23 + 593775 x^6 y^24 +
  142506 x^5 y^25 + 27405 x^4 y^26 + 4060 x^3 y^27 + 435 x^2 y^28 + 30 x y^29 + y^30

The coefficients of this expansion are the binomial coefficients Binomial[n, m] ('n over m'):

ListPlot[Table[Binomial[30, n], {n, 1, 30}],
   PlotStyle -> {PointSize[.015]}, AspectRatio -> .3];

Figure 3.8 Binomial coefficients approximate a Gaussian distribution for increasing order.

And here in two dimensions:

BarChart3D[Table[Binomial[30, n] Binomial[30, m], {n, 1, 30}, {m, 1, 30}],
   ImageSize -> 180];

Figure 3.9 Binomial coefficients approximate a Gaussian distribution for increasing order.
Here in 2 dimensions we see separability again.
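A quick numerical illustration in plain Python (ours, not the book's): the normalized binomial distribution Binomial[n, k]/2^n has mean n/2 and variance n/4, and for n = 30 it is already close to the Gaussian with σ = √n / 2:

```python
import math

def gauss(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n = 30
sigma = math.sqrt(n) / 2                        # binomial variance is n/4
binom = [math.comb(n, k) / 2**n for k in range(n + 1)]
approx = [gauss(k - n / 2, sigma) for k in range(n + 1)]

maxerr = max(abs(b - a) for b, a in zip(binom, approx))
print(maxerr)   # small compared to the peak value ~0.145
```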

3.8 The Fourier transform of the Gaussian kernel


We will regularly do our calculations in the Fourier domain, as this often turns out to be analytically convenient or computationally efficient. The basis functions of the Fourier transform ℱ are the sinusoidal functions e^(i ω x). The definitions for the Fourier transform and its inverse are:

the Fourier transform: F(ω) = ℱ{f(x)} = 1/√(2π) ∫_{-∞}^{∞} f(x) e^(i ω x) dx

the inverse Fourier transform: ℱ^(-1){F(ω)} = 1/√(2π) ∫_{-∞}^{∞} F(ω) e^(-i ω x) dω

σ =.; Fgauss[ω_, σ_] = Simplify[
   1/Sqrt[2 Pi] Integrate[1/(σ Sqrt[2 Pi]) Exp[-x^2/(2 σ^2)] Exp[I ω x],
      {x, -∞, ∞}], {σ > 0, Im[σ] == 0}]

E^(-(σ^2 ω^2)/2)/Sqrt[2 π]

The Fourier transform is a standard Mathematica command:

Simplify[FourierTransform[gauss[x, σ], x, ω], σ > 0]

E^(-(σ^2 ω^2)/2)/Sqrt[2 π]

Note that different communities (mathematicians, computer scientists, engineers) have different definitions for the Fourier transform. From the Mathematica help function:

With the setting FourierParameters -> {a, b} the Fourier transform computed by FourierTransform is Sqrt[Abs[b]/(2 π)^(1 - a)] ∫_{-∞}^{∞} f(t) e^(i b ω t) dt. Some common choices for {a, b} are {0, 1} (default), {-1, 1} (data analysis), {1, -1} (signal processing).

In this book we consistently use the default definition.

So the Fourier transform of the Gaussian function is again a Gaussian function, but now of the frequency ω. The Gaussian function is the only function with this property. Note that the scale σ now appears as a multiplication with the frequency. We recognize a well-known fact: a smaller kernel in the spatial domain gives a wider kernel in the Fourier domain, and vice versa. Here we plot 3 Gaussian kernels with their Fourier transform beneath each plot:

Block[{$DisplayFunction = Identity},
   p1 = Table[Plot[gauss[x, σ], {x, -10, 10}, PlotRange -> All,
      PlotLabel -> "gauss[x," <> ToString[σ] <> "]"], {σ, 1, 3}];
   p2 = Table[Plot[Fgauss[ω, σ], {ω, -3, 3}, PlotRange -> All,
      PlotLabel -> "Fgauss[ω," <> ToString[σ] <> "]"], {σ, 1, 3}]];
Show[GraphicsArray[{p1, p2}], ImageSize -> 400];

Figure 3.10 Top row: Gaussian function at scales σ = 1, σ = 2 and σ = 3. Bottom row: Fourier transform of the Gaussian function above it. Note that for a wider Gaussian its Fourier transform gets narrower and vice versa, a well-known phenomenon of the Fourier transform. Also note by checking the amplitudes that the kernel is normalized in the spatial domain only.

There are many names for the Fourier transform ℱg(ω; σ) of g(x; σ): when the kernel g(x; σ) is considered to be the point spread function, ℱg(ω; σ) is referred to as the modulation transfer function. When the kernel g(x; σ) is considered to be a signal, ℱg(ω; σ) is referred to as the spectrum. When applied to a signal, it operates as a lowpass filter. Let us

plot the spectra of a series of such filters (with a logarithmic increase in scale) on double
logarithmic paper:

scales = N[Table[Exp[t/3], {t, 0, 8}]]

spectra = LogLinearPlot[Fgauss[ω, #], {ω, .01, 10},
      DisplayFunction -> Identity] & /@ scales;
Show[spectra, DisplayFunction -> $DisplayFunction, AspectRatio -> .4,
   PlotRange -> All, AxesLabel -> {"ω", "Amplitude"}, ImageSize -> 300];

{1., 1.39561, 1.94773, 2.71828, 3.79367, 5.29449, 7.38906, 10.3123, 14.3919}

Figure 3.11 Fourier spectra of the Gaussian kernel for an exponential range of scales σ = 1 (most right graph) to σ = 14.39 (most left graph). The frequency ω is on a logarithmic scale. The Gaussian kernels are seen to act as low-pass filters.

Due to this behaviour the role of receptive fields as lowpass filters has long persisted. But the
retina does not measure a Fourier transform of the incoming image, as we will discuss in the
chapters on the visual system (chapters 9-12).

3.9 Central limit theorem


We see in the paragraph above the relation with the central limit theorem: any repetitive
operator goes in the limit to a Gaussian function. Later, when we study the discrete
implementation of the Gaussian kernel and discrete sampled data, we will see the relation
between interpolation schemes and the binomial coefficients. We study a repeated
convolution of two blockfunctions with each other:

f[x_] := UnitStep[1/2 + x] + UnitStep[1/2 - x] - 1;
g[x_] := UnitStep[1/2 + x] + UnitStep[1/2 - x] - 1;

Plot[f[x], {x, -3, 3}, ImageSize -> 140];

Figure 3.12 The analytical blockfunction is a combination of two Heaviside unitstep functions.

We calculate analytically the convolution integral:



h1 = Integrate[f[x] g[x - x1], {x, -∞, ∞}]

1/2 (-1 + 2 UnitStep[1 - x1] - 2 x1 UnitStep[1 - x1] - 2 x1 UnitStep[x1]) +
   1/2 (-1 + 2 x1 UnitStep[-x1] + 2 UnitStep[1 + x1] + 2 x1 UnitStep[1 + x1])

Plot[h1, {x1, -3, 3}, PlotRange -> All, ImageSize -> 150];

Figure 3.13 A single convolution of a blockfunction with the same blockfunction gives a triangle function.

The next convolution is this function convolved with the block function again:

h2 = Integrate[(h1 /. x1 -> x) g[x - x1], {x, -∞, ∞}]

1/2 ((3/2 + x1)^2 UnitStep[3/2 + x1] - 3 (1/2 + x1)^2 UnitStep[1/2 + x1] +
     3 (-1/2 + x1)^2 UnitStep[-1/2 + x1] - (-3/2 + x1)^2 UnitStep[-3/2 + x1])

We see that we get a result that begins to look more like a Gaussian:

Plot[{h2, gauss[x1, .5]}, {x1, -3, 3}, PlotRange -> All,
   PlotStyle -> {Dashing[{}], Dashing[{0.02, 0.02}]}, ImageSize -> 150];

Figure 3.14 Two convolutions of a blockfunction with the same blockfunction give a function that rapidly begins to look like a Gaussian function. A Gaussian kernel with σ = 0.5 is drawn (dotted) for comparison.

The real Gaussian is reached when we apply an infinite number of these convolutions with the same function. It is remarkable that this result applies for the infinite repetition of any convolution kernel. This is the central limit theorem.

Task 3.1 Show the central limit theorem in practice for a number of other arbitrary kernels.

3.10 Anisotropy

PlotGradientField[-gauss[x, 1] gauss[y, 1],
   {x, -3, 3}, {y, -3, 3}, PlotPoints -> 20, ImageSize -> 140];

Figure 3.15 The slope of an isotropic Gaussian function is indicated here by arrows. They are circularly symmetric, i.e. in all directions the same, from which the name isotropic derives. The arrows are in the direction of the normal of the intensity landscape, and are called gradient vectors.

The Gaussian kernel as specified above is isotropic, which means that the behaviour of the function is in any direction the same. For 2D this means the Gaussian function is circular, for 3D it looks like a fuzzy sphere.

It is of no use to speak of isotropy in 1-D. When the standard deviations in the different dimensions are not equal, we call the Gaussian function anisotropic. An example is the pointspreadfunction of an astigmatic eye, where differences in curvature of the cornea/lens in different directions occur. This shows an anisotropic Gaussian with anisotropy ratio of 2 (σx/σy = 2):

Unprotect[gauss];
gauss[x_, y_, σx_, σy_] :=
   1/(2 π σx σy) Exp[-(x^2/(2 σx^2) + y^2/(2 σy^2))];

σx = 2; σy = 1; Block[{$DisplayFunction = Identity},
   p1 = DensityPlot[gauss[x, y, σx, σy],
      {x, -10, 10}, {y, -10, 10}, PlotPoints -> 50];
   p2 = Plot3D[gauss[x, y, σx, σy], {x, -10, 10},
      {y, -10, 10}, Shading -> True];
   p3 = ContourPlot[gauss[x, y, σx, σy], {x, -5, 5}, {y, -10, 10}]];
Show[GraphicsArray[{p1, p2, p3}], ImageSize -> 400];

Figure 3.16 An anisotropic Gaussian kernel with anisotropy ratio σx/σy = 2 in three appearances. Left: DensityPlot, middle: Plot3D, right: ContourPlot.

3.11 The diffusion equation


The Gaussian function is the solution of several differential equations. It is the solution of dy/dx = y (μ - x)/σ², because dy/y = (μ - x)/σ² dx, from which we find by integration ln(y/y₀) = -(μ - x)²/(2 σ²) and thus y = y₀ e^(-(x - μ)²/(2 σ²)).

It is the solution of the linear diffusion equation, ∂L/∂t = ∂²L/∂x² + ∂²L/∂y² = Δ L.

This is a partial differential equation, stating that the first derivative of the (luminance) function L(x, y) to the parameter t (time, or variance) is equal to the sum of the second order spatial derivatives. The right hand side is also known as the Laplacian (indicated by Δ for any dimension; we call Δ the Laplacian operator), or the trace of the Hessian matrix of second order derivatives:

hessian2D = {{Lxx, Lxy},
             {Lxy, Lyy}}; Tr[hessian2D]

Lxx + Lyy

hessian3D = {{Lxx, Lxy, Lxz},
             {Lyx, Lyy, Lyz},
             {Lzx, Lzy, Lzz}}; Tr[hessian3D]

Lxx + Lyy + Lzz

The diffusion equation ∂u/∂t = Δu is one of the most famous differential equations in physics. It is often referred to as the heat equation. It belongs in the row of other famous equations like the Laplace equation Δu = 0, the wave equation ∂²u/∂t² = Δu and the Schrödinger equation ∂u/∂t = i Δu.

The diffusion equation ∂u/∂t = Δu is a linear equation. It consists of just linearly combined derivative terms, no nonlinear exponents or functions of derivatives.

The diffused entity is the intensity in the images. The role of time is taken by the variance t = 2 σ². The intensity is diffused over time (in our case over scale) in all directions in the same way (this is called isotropic). E.g. in 3D one can think of the example of the intensity of an inkdrop in water, diffusing in all directions.

The diffusion equation can be derived from physical principles: the luminance can be
considered a flow, that is pushed away from a certain location by a force equal to the
gradient. The divergence of this gradient gives how much the total entity (luminance in our
case) diminishes with time.

<< Calculus`VectorAnalysis`
SetCoordinates[Cartesian[x, y, z]];

Div[Grad[L[x, y, z]]]

L^(0,0,2)[x, y, z] + L^(0,2,0)[x, y, z] + L^(2,0,0)[x, y, z]

A very important feature of the diffusion process is that it satisfies a maximum principle [Hummel1987b]: the amplitude of local maxima is always decreasing when we go to coarser scale, and vice versa, the amplitude of local minima is always increasing for coarser scale. This argument was the principal reasoning in the derivation of the diffusion equation as the generating equation for scale-space by Koenderink [Koenderink1984a].
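The maximum principle is easy to observe in a discrete simulation. A plain-Python sketch (our illustration; explicit Euler steps L ← L + τ L_xx with τ = 0.25 on an arbitrary 1D signal, end values held fixed):

```python
# Explicit Euler diffusion step: L <- L + tau * L_xx, stable for tau <= 0.5.
# Each new interior value is a convex combination of its neighbours, so
# local maxima can only decrease and local minima can only increase.
L = [0.0, 0.2, 1.0, 0.3, 0.0, -0.5, 0.1, 0.0]   # arbitrary test signal
tau = 0.25
maxima, minima = [max(L)], [min(L)]
for _ in range(20):
    Lxx = [L[i - 1] - 2 * L[i] + L[i + 1] for i in range(1, len(L) - 1)]
    L = [L[0]] + [L[i] + tau * Lxx[i - 1] for i in range(1, len(L) - 1)] + [L[-1]]
    maxima.append(max(L))
    minima.append(min(L))

print(maxima[0], maxima[-1])   # the global maximum has decreased
print(minima[0], minima[-1])   # the global minimum has increased
```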

3.12 Summary of this chapter


The normalized Gaussian kernel has an area under the curve of unity, i.e. as a filter it does not multiply the operand with an accidental multiplication factor. Two Gaussian functions can be cascaded, i.e. applied consecutively, to give a Gaussian convolution result which is equivalent to a kernel with the variance equal to the sum of the variances of the constituting Gaussian kernels. The spatial parameter normalized over scale is called the dimensionless 'natural coordinate'.

The Gaussian kernel is the 'blurred version' of the Dirac delta function; the cumulative Gaussian function is the error function, which is the 'blurred version' of the Heaviside stepfunction. The Dirac and Heaviside functions are examples of generalized functions.

The Gaussian kernel appears as the limiting case of the Pascal triangle of binomial coefficients in an expanded polynomial of high order. This is a special case of the central limit theorem. The central limit theorem states that any finite kernel, when repeatedly convolved with itself, leads to the Gaussian kernel.

Anisotropy of a Gaussian kernel means that the scales, or standard deviations, are different
for the different dimensions. When they are the same in all directions, the kernel is called
isotropic.

The Fourier transform of a Gaussian kernel acts as a low-pass filter for frequencies. The cut-off frequency depends on the scale of the Gaussian kernel. The Fourier transform has the same Gaussian shape. The Gaussian kernel is the only kernel for which the Fourier transform has the same shape.

The diffusion equation describes the flow of some quantity (intensity, temperature) over space under the force of a gradient. It is a second order parabolic differential equation. The linear, isotropic diffusion equation is the generating equation for a scale-space. In chapter 21 we will encounter a wealth of nonlinear diffusion equations.
4. Gaussian derivatives
A difference which makes no difference is not a difference.
Mr. Spock (stardate 2822.3)

4.1 Introduction
We will encounter the Gaussian derivative function at many places throughout this book.
The Gaussian derivative function has many interesting properties. We will discuss them in
one dimension first. We study its shape and algebraic structure, its Fourier transform, and its
close relation to other functions like the Hermite functions, the Gabor functions and the
generalized functions. In two and more dimensions additional properties are involved like
orientation (directional derivatives) and anisotropy.

4.2 Shape and algebraic structure


When we take derivatives to x (spatial derivatives) of the Gaussian function repetitively, we see a pattern emerging of a polynomial of increasing order, multiplied with the original (normalized) Gaussian function again. Here we show a table of the derivatives from order 0 (i.e. no differentiation) to 4.

<< FrontEndVision`FEV`;

Unprotect[gauss]; gauss[x_, σ_] := 1/(σ Sqrt[2 Pi]) Exp[-x^2/(2 σ^2)];

Table[Factor[Evaluate[D[gauss[x, σ], {x, n}]]], {n, 0, 4}]

{E^(-(x^2/(2 σ^2)))/(Sqrt[2 π] σ), -((E^(-(x^2/(2 σ^2))) x)/(Sqrt[2 π] σ^3)),
  (E^(-(x^2/(2 σ^2))) (x - σ) (x + σ))/(Sqrt[2 π] σ^5),
  -((E^(-(x^2/(2 σ^2))) x (x^2 - 3 σ^2))/(Sqrt[2 π] σ^7)),
  (E^(-(x^2/(2 σ^2))) (x^4 - 6 x^2 σ^2 + 3 σ^4))/(Sqrt[2 π] σ^9)}

The function Factor takes polynomial factors apart.

The function gauss[x, σ] is part of the standard set of functions (in FEV.m) that comes with this book, and is protected. To modify it, it must be Unprotected.

The zeroth order derivative is indeed the Gaussian function itself. The even order (including the zeroth order) derivative functions are even functions (i.e. symmetric around zero) and the odd order derivatives are odd functions (antisymmetric around zero). This is how the graphs of the Gaussian derivative functions look, from order 0 up to order 7 (note the marked increase in amplitude for higher order of differentiation):

GraphicsArray[
   Partition[Table[Plot[Evaluate[D[gauss[x, 1], {x, n}]], {x, -5, 5},
      PlotLabel -> StringJoin["Order=", ToString[n]],
      DisplayFunction -> Identity], {n, 0, 7}], 4], ImageSize -> 500] // Show;
Figure 4.1 Plots of the 1D Gaussian derivative function for order 0 to 7.

The Gaussian function itself is a common element of all higher order derivatives. We extract the polynomials by dividing by the Gaussian function:

Table[Evaluate[D[gauss[x, σ], {x, n}]], {n, 0, 4}] / gauss[x, σ] // Simplify

{1, -(x/σ^2), (x^2 - σ^2)/σ^4, -((x^3 - 3 x σ^2)/σ^6), (x^4 - 6 x^2 σ^2 + 3 σ^4)/σ^8}

These polynomials have the same order as the derivative they are related to. Note that the highest order of x is the same as the order of differentiation, and that we have a plus sign for the highest order of x for even orders of differentiation, and a minus sign for the odd orders.

These polynomials are the Hermite polynomials, called after Charles Hermite, a brilliant
French mathematician (see figure 4.2).

Show[Import["Charles Hermite.jpg"], ImageSize-> 150];

Figure 4.2 Charles Hermite (1822-1901).



They emerge from the following definition: ∂ⁿe^(-x²)/∂xⁿ = (-1)ⁿ Hₙ(x) e^(-x²). The function Hₙ(x) is the Hermite polynomial, where n is called the order of the polynomial. When we make the substitution x → x/(σ√2), we get the following relation between the Gaussian function G(x, σ) and its derivatives: ∂ⁿG(x, σ)/∂xⁿ = (-1)ⁿ (1/(σ√2))ⁿ Hₙ(x/(σ√2)) G(x, σ).
In Mathematica the function Hₙ is given by the function HermiteH[n, x]. Here are the Hermite polynomials from zeroth to seventh order:

Table[HermiteH[n, x], {n, 0, 7}] // TableForm

1
2 x
-2 + 4 x^2
-12 x + 8 x^3
12 - 48 x^2 + 16 x^4
120 x - 160 x^3 + 32 x^5
-120 + 720 x^2 - 480 x^4 + 64 x^6
-1680 x + 3360 x^3 - 1344 x^5 + 128 x^7
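Outside Mathematica, the same polynomials follow from the standard recurrence H_{n+1}(x) = 2 x Hₙ(x) − 2 n H_{n−1}(x), starting from H₀ = 1 and H₁ = 2x. A plain-Python sketch (ours), spot-checked against the table above:

```python
def hermite(n, x):
    # Hermite polynomial via the recurrence H_{n+1} = 2 x H_n - 2 n H_{n-1}
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2.0 * x * h1 - 2.0 * k * h0
    return h1

# spot-check against the table, e.g. H_3(x) = -12 x + 8 x^3 at x = 2
print(hermite(3, 2.0))   # 40.0
print(hermite(0, 5.0))   # 1.0
```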

The inner scale σ is introduced in the equation by substituting x → x/(σ√2). As a consequence, with each differentiation we get a new factor -1/(σ√2). So now we are able to calculate the 1-D Gaussian derivative functions gd[x, n, σ] directly with the Hermite polynomials, again incorporating the normalization factor 1/(σ√(2π)):

Clear[σ];
gd[x_, n_, σ_] := (-1/(σ Sqrt[2]))^n HermiteH[n, x/(σ Sqrt[2])] *
   1/(σ Sqrt[2 Pi]) Exp[-x^2/(2 σ^2)];
Check:

Simplify[gd[x, 4, a], a > 0]


x2
e-~ (x 4 - 6 x 2 c;2 + 3 ~4 )
2~-~ a9

Slmplify[D[ I x2
,,xp[--g.j], o,, 4~], o>O]
o" 2 " ~

x2
e ~ ( x 4 - 6 x 2 ~2 + 3 or4)
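The same check can be done numerically: the Hermite-based gd must agree with a finite-difference derivative of the sampled Gaussian. A plain-Python sketch (ours; the test point x = 0.7, σ = 1.3 is arbitrary):

```python
import math

def gauss(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def hermite(n, x):
    # recurrence H_{n+1} = 2 x H_n - 2 n H_{n-1}
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2.0 * x * h1 - 2.0 * k * h0
    return h1

def gd(x, n, sigma):
    # n-th Gaussian derivative via the Hermite polynomial, as in the text
    s2 = sigma * math.sqrt(2)
    return (-1 / s2)**n * hermite(n, x / s2) * gauss(x, sigma)

# cross-check gd(x, 1, sigma) against a central difference of the Gaussian
x, sigma, h = 0.7, 1.3, 1e-5
fd = (gauss(x + h, sigma) - gauss(x - h, sigma)) / (2 * h)
print(gd(x, 1, sigma), fd)   # agree to ~1e-10
```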

The amplitude of the Hermite polynomials explodes for large x, but the Gaussian envelope suppresses any polynomial function. No matter how high the polynomial order, the exponential function always wins. We can see this graphically when we look at e.g. the 7th order Gaussian derivative without (i.e. the Hermite function, figure left) and with its Gaussian weight function (figure right). Note the vertical scales:

f[x_] := (-1/Sqrt[2])^7 HermiteH[7, x/Sqrt[2]];
DisplayTogetherArray[{Plot[f[x], {x, -5, 5}],
   p2 = Plot[f[x] Exp[-x^2/2], {x, -5, 5}]}, ImageSize -> 400];

Figure 4.3 Left: the 7th order Hermite polynomial. Right: idem, with a Gaussian envelope (weighting function). This is the 7th order Gaussian derivative kernel.

Due to the limited extent of the Gaussian window function, the amplitude of the Gaussian derivative function can be negligible at the location of the larger zeros. We plot an example, showing the 20th order derivative and its Gaussian envelope function:

n = 20; σ = 1; DisplayTogether[{FilledPlot[gd[x, n, σ], {x, -5, 5}],
   Plot[gd[0, n, σ]/gauss[0, σ] gauss[x, σ], {x, -5, 5}]}, ImageSize -> 200];

Figure 4.4 The outer zero-crossings of the 20th order Gaussian derivative vanish into negligible amplitudes. Note also that the amplitude of the Gaussian derivative function is not bounded by the Gaussian window. The Gabor kernels, as we will discuss later in section 4.7, are bounded by the Gaussian window.

How fast the Gaussian function goes to zero can be seen from its values at x = 3σ, x = 4σ and x = 5σ, relative to its peak value:

Table[gauss[σ, 1], {σ, 3, 5}] / gauss[0, 1] // N

{0.011109, 0.000335463, 3.72665 × 10^-6}
and in the limit:


57 4.2 Shape and algebraic structure

Limit[gd[x, 7, 1], x -> Infinity]

The Hermite polynomials belong to the family of orthogonal functions on the infinite interval (-∞, ∞) with the weight function e^(-x²): ∫_{-∞}^{∞} e^(-x²) Hₙ(x) Hₘ(x) dx = 2ⁿ n! √π δₙₘ, where δₙₘ is the Kronecker delta, or delta tensor: δₙₘ = 1 for n = m, and δₙₘ = 0 for n ≠ m.

Table[Integrate[Exp[-x^2] HermiteH[k, x] HermiteH[m, x], {x, -∞, ∞}],
   {k, 0, 3}, {m, 0, 3}] // MatrixForm

( √π    0      0      0
  0     2 √π   0      0
  0     0      8 √π   0
  0     0      0      48 √π )
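The orthogonality relation can be cross-checked by simple numerical quadrature. A plain-Python sketch (ours), truncating the integral to [-10, 10], which is harmless given how fast the weight function decays:

```python
import math

def hermite(n, x):
    # recurrence H_{n+1} = 2 x H_n - 2 n H_{n-1}
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2.0 * x * h1 - 2.0 * k * h0
    return h1

def inner(n, m, dx=0.001, cutoff=10.0):
    # Riemann sum for Integral exp(-x^2) H_n(x) H_m(x) dx over [-cutoff, cutoff]
    steps = int(cutoff / dx)
    return sum(math.exp(-x * x) * hermite(n, x) * hermite(m, x)
               for x in (i * dx for i in range(-steps, steps + 1))) * dx

i12 = inner(1, 2)   # ~0: orthogonal
i22 = inner(2, 2)   # ~ 2^2 * 2! * sqrt(pi) = 8 sqrt(pi)
print(i12, i22, 8 * math.sqrt(math.pi))
```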

The Gaussian derivative functions, with their weight function e^(-x²/(2σ²)), are not orthogonal.

Other families of orthogonal polynomials are e.g. Legendre, Chebyshev, Laguerre, and Jacobi polynomials. Other orthogonal families of functions are e.g. Bessel functions and the spherical harmonic functions. The area under the Gaussian derivative functions is not unity, e.g. for the first derivative:

SetOptions[Integrate, GenerateConditions -> False];

Integrate[gd[x, 1, σ], {x, 0, ∞}]

-(1/(Sqrt[2 π] σ))

4.3 Gaussian derivatives in the Fourier domain

The Fourier transform of the derivative of a function is (-i ω) times the Fourier transform of the function. For each differentiation, a new factor (-i ω) is added. So the Fourier transforms of the Gaussian function and its first and second order derivatives are:

σ =.; Simplify[FourierTransform[
   {gauss[x, σ], D[gauss[x, σ], x], D[gauss[x, σ], {x, 2}]}, x, ω], σ > 0]

{E^(-(σ^2 ω^2)/2)/Sqrt[2 π], -((I E^(-(σ^2 ω^2)/2) ω)/Sqrt[2 π]),
  -((E^(-(σ^2 ω^2)/2) ω^2)/Sqrt[2 π])}

In general: ℱ{∂ⁿG(x,σ)/∂xⁿ} = (-iω)ⁿ ℱ{G(x,σ)}.
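This property can be cross-checked numerically. The following is a NumPy sketch (illustrative, not part of the book's Mathematica code); note that NumPy's FFT uses the kernel e^(-iωx) in the forward transform, so differentiation there corresponds to multiplication by (+iω), while the Mathematica default used in this book yields (-iω):

```python
import numpy as np

# Spectral differentiation of a sampled Gaussian: multiply the FFT by
# (i omega) and transform back; compare with the analytic first derivative.
N, L = 1024, 40.0                        # samples, domain length
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
sigma = 1.0
g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

omega = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
g1_spectral = np.real(np.fft.ifft(1j * omega * np.fft.fft(g)))
g1_analytic = -x / sigma**2 * g          # first order Gaussian derivative

print(np.max(np.abs(g1_spectral - g1_analytic)))   # near machine precision
```

The agreement is essentially exact because the Gaussian is fully resolved on this grid and decays to negligible values at the domain boundary.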

Gaussian derivative kernels also act as bandpass filters. The maximum is at ω = √n/σ:

n =. ; σ =. ; Solve[Evaluate[D[ω^n Exp[-(σ² ω²)/2], ω]] == 0, ω]

{{ω → 0}, {ω → -√n/σ}, {ω → √n/σ}}


The normalized power spectra show that a higher order of differentiation means a higher center
frequency for the bandpass filter. The bandwidth remains virtually the same.

(-x~)" E x p [ ’ - ~ ~,2
o-- 1, p l _- , a b i e [ , ' l o t [ , " , , [ <
~
...... ,o , , pL[ _ ___~
~ ] ] ]’

{~, O, 6}, D i s p l a y F u n e t i o n - > Identity], {n, i, 12}] ;


S h o w [p1, D i s p l a y F u n c t i o n - > $ D i s p l a y F u n c t i o n , P l o t R a n g e - > All,
A x e s L a h e l -> { "~", ....}, I m a g e S i z e -> 4001 ;


Figure 4.5 Normalized power spectra for Gaussian derivative filters of order 1 to 12; the lowest
order is the left-most graph, σ = 1. Gaussian derivative kernels act like bandpass filters.
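The peak position ω = √n/σ can also be verified numerically. A small NumPy sketch (illustrative, not the book's code):

```python
import numpy as np

# The amplitude spectrum of the n-th order Gaussian derivative is
# |omega|^n exp(-sigma^2 omega^2 / 2); locate its maximum on a fine grid
# and compare with Sqrt[n]/sigma.
sigma = 1.0
omega = np.linspace(1e-3, 10, 100001)
for n in range(1, 13):
    spectrum = omega**n * np.exp(-(sigma * omega)**2 / 2)
    peak = omega[np.argmax(spectrum)]
    assert abs(peak - np.sqrt(n) / sigma) < 1e-3
print("peak frequencies agree with Sqrt[n]/sigma")
```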

Task 4.1 Show with partial integration and the definitions from section 3.10 that
the Fourier transform of the derivative of a function is (-iω) times the Fourier
transform of the function.

Task 4.2 Note that there are several definitions of the signs occurring in the
Fourier transform (see the Help function in Mathematica under Fourier). Show
that with the other definitions it is possible to arrive at the result that the Fourier
transform of the derivative of a function is (iω) times the Fourier transform of the
function. In this book we stick to the default definition.

4.4 Zero crossings of Gaussian derivative functions

gd[x_, n_, σ_] := (-1/(σ Sqrt[2]))^n HermiteH[n, x/(σ Sqrt[2])] gauss[x, σ];

nmax = 20; σ = 1;
Show[Graphics[Flatten[Table[{PointSize[0.015], Point[{n, x}]} /.
      Solve[HermiteH[n, x] == 0, x], {n, 1, nmax}]]],
  AxesLabel -> {"Order", "Zeros of\nHermiteH"}, Axes -> True,
  ImageSize -> 350];

Figure 4.6 Zero-crossings of the Gaussian derivative functions up to 20th order. Each dot is a
zero-crossing.

How wide is a Gaussian derivative? This may seem an irrelevant question, because the
Gaussian envelope often completely determines its behaviour. However, the number of zero-
crossings is equal to the order of differentiation, because the Gaussian weighting function is
a positive definite function.

It is of interest to study the behaviour of the zero-crossings. They move further apart with
higher order. We can define the 'width' of a Gaussian derivative function as the distance
between the outermost zero-crossings. The zero-crossings of the Hermite polynomials
determine the zero-crossings of the Gaussian derivatives. In figure 4.6 all zeros of the first 20
Hermite functions are shown as a function of the order. Note that the zeros of the second
derivative are just one standard deviation from the origin:

σ =. ; Simplify[Solve[D[gauss[x, σ], {x, 2}] == 0, x], σ > 0]

{{x → -σ}, {x → σ}}

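These zero-crossings can be checked numerically with NumPy's Hermite class. This is an illustrative sketch (not the book's code), using the fact that the zeros of the n-th Gaussian derivative sit at σ√2 times the roots of Hₙ:

```python
import numpy as np
from numpy.polynomial.hermite import Hermite

# The n-th Gaussian derivative has exactly n real zero-crossings, located
# at sigma*sqrt(2) times the roots of the physicists' Hermite polynomial.
sigma = 1.5
for n in range(1, 8):
    c = np.zeros(n + 1); c[n] = 1          # coefficient vector selecting H_n
    zeros = np.real(Hermite(c).roots()) * sigma * np.sqrt(2)
    assert len(zeros) == n                  # n zero-crossings
second = np.real(Hermite([0, 0, 1]).roots()) * sigma * np.sqrt(2)
print(np.sort(second))                      # the second derivative: +/- sigma
```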


An exact analytic solution for the largest zero is not known. The formula of Zernike (1931)
specifies a range, and Szegő (1939) gives a better estimate:

Block[{$DisplayFunction = Identity},
  p1 = Plot[2 Sqrt[n + 1 - 3.05 Sqrt[n + 1]], {n, 5, 50}];
  (* Zernike upper limit *)
  p2 = Plot[2 Sqrt[n + 1 - 1.15 Sqrt[n + 1]], {n, 1, 50}];
  (* Zernike lower limit *)
  p3 = Plot[2 Sqrt[n + .5] - 2.338098 / (2 (n + .5))^(1/6),
    {n, 1, 50}, PlotStyle -> Dashing[{.01, .02}]]];
Show[{p1, p2, p3}, AxesLabel -> {"Order",
    "Width of Gaussian\nderivative (in σ)"}, ImageSize -> 260];
Figure 4.7 Estimates for the width of Gaussian derivative functions up to 50th order. Width is
defined as the distance between the outermost zero-crossings. Top and bottom graph:
estimated range by Zernike (1931); dashed graph: estimate by Szegő (1939).

For very high orders of differentiation the number of zero-crossings of course increases, but
the mutual distance between the zeros also becomes more equal. In the limiting case of
infinite order the Gaussian derivative function becomes a sinusoidal function:

lim_{n→∞} ∂ⁿG/∂xⁿ (x, σ) ∝ sin(x √n / σ)


4.5 The correlation between Gaussian derivatives

Higher order Gaussian derivative kernels tend to become more and more similar. This makes
them not very suitable as a basis. But before we investigate their role in a possible basis, let
us investigate their similarity.

In fact we can express exactly how much they resemble each other as a function of the
difference in differential order, by calculating the correlation between them. We derive the
correlation below, and will appreciate the nice mathematical properties of the Gaussian
function. Because the higher dimensional Gaussians are just the product of 1D Gaussian
functions, it suffices to study the 1D case.

Compare e.g. the 20th and 24th derivative functions:



Block[{$DisplayFunction = Identity},
  g1 = Plot[gd[x, 20, 2], {x, -7, 7}, PlotLabel -> "Order 20"];
  g2 = Plot[gd[x, 24, 2], {x, -7, 7}, PlotLabel -> "Order 24"]];
Show[GraphicsArray[{g1, g2}, ImageSize -> 400]];

Figure 4.8 Gaussian derivative functions start to look more and more alike for higher order.
Here the graphs are shown for the 20th and 24th order of differentiation.

The correlation coefficient between two functions is defined as the integral of the product of
the functions over the full domain (in this case -∞ to +∞).

Because we want the coefficient to be unity for complete correlation (when the functions are
identical up to an amplitude scaling factor) we divide the coefficient by the so-called
autocorrelation coefficients, i.e. the correlation of the functions with themselves.

We then get as definition for the correlation coefficient r between two Gaussian derivatives
of order n and m:

r_{n,m} = ∫_{-∞}^{∞} g^(n)(x) g^(m)(x) dx / √( ∫_{-∞}^{∞} [g^(n)(x)]² dx · ∫_{-∞}^{∞} [g^(m)(x)]² dx )

with g^(n)(x) ≡ ∂ⁿg(x)/∂xⁿ. The Gaussian kernel g(x) itself is an even function, and, as we have
seen before, g^(n)(x) is an even function for n even, and an odd function for n odd. The
correlation between an even function and an odd function is zero. This is the case when n
and m are not both even or both odd, i.e. when (n - m) is odd. We now can see already
two important results:
r_{n,m} = 0 for (n - m) odd;
r_{n,m} = 1 for n = m.

The remaining case is when (n - m) is even. We take n > m. Let us first look at the
numerator, ∫_{-∞}^{∞} g^(n)(x) g^(m)(x) dx. The standard approach to tackle high exponents of
functions in integrals is the reduction of these exponents by partial integration:

∫_{-∞}^{∞} g^(n)(x) g^(m)(x) dx = g^(n-1)(x) g^(m)(x) |_{-∞}^{∞} - ∫_{-∞}^{∞} g^(n-1)(x) g^(m+1)(x) dx = (-1)^k ∫_{-∞}^{∞} g^(n-k)(x) g^(m+k)(x) dx

when we do the partial integration k times. The 'stick expression' g^(n-1)(x) g^(m)(x) |_{-∞}^{∞} is zero
because any Gaussian derivative function goes to zero for large x. We can choose k such that
the exponents in the integral are equal (so we end up with the square of a Gaussian derivative
function). So we make (n - k) = (m + k), i.e. k = (n - m)/2. Because we study the case that
(n - m) is even, k is an integer number. We then get:

(-1)^k ∫_{-∞}^{∞} g^(n-k)(x) g^(m+k)(x) dx = (-1)^((n-m)/2) ∫_{-∞}^{∞} g^((n+m)/2)(x) g^((n+m)/2)(x) dx

The total energy of a function in the spatial domain is the integral of the square of the
function over its full extent. The famous theorem of Parseval states that the total energy of a
function in the spatial domain is equal to the total energy of the function in the Fourier
domain, i.e. expressed as the integral of the square of the Fourier transform over its full
extent. Therefore

(-1)^((n-m)/2) ∫_{-∞}^{∞} [g^((n+m)/2)(x)]² dx = (-1)^((n-m)/2) ∫_{-∞}^{∞} |(-iω)^((n+m)/2) ĝ(ω)|² dω = (-1)^((n-m)/2) (1/2π) ∫_{-∞}^{∞} ω^(n+m) e^(-σ²ω²) dω

We now substitute ω' = σω, and get finally: (-1)^((n-m)/2) (1/(2π σ^(n+m+1))) ∫_{-∞}^{∞} ω'^(n+m) e^(-ω'²) dω'.

This integral can be looked up in a table of integrals, but why not let Mathematica do the job
(we first clear n and m):

Clear[n, m]; Integrate[x^(m+n) Exp[-x²], {x, -∞, ∞}]

1/2 (1 + (-1)^(m+n)) Gamma[(1 + m + n)/2]

The function Gamma is the Euler gamma function. In our case Re[m+n] > -1, so we get for
our correlation coefficient for (n - m) even (the powers of σ in numerator and denominator
cancel):

r_{n,m} = (-1)^((n-m)/2) (Γ((n+m+1)/2)/σ^(n+m+1)) / √( (Γ((2n+1)/2)/σ^(2n+1)) (Γ((2m+1)/2)/σ^(2m+1)) ) = (-1)^((n-m)/2) Γ((n+m+1)/2) / √( Γ((2n+1)/2) Γ((2m+1)/2) )

Let’s first have a look at this function for a range of values for n and m (0-15):

r[n_, m_] := (-1)^((n - m)/2) Gamma[(m + n + 1)/2] /
    Sqrt[Gamma[(2 n + 1)/2] Gamma[(2 m + 1)/2]];

ListPlot3D[Table[Abs[r[n, m]], {n, 0, 15}, {m, 0, 15}],
  Axes -> True, AxesLabel -> {"n", "m", "Abs\nr[n,m]"},
  ViewPoint -> {-2.348, -1.540, 1.281}, ImageSize -> 220];

Figure 4.9 The magnitude of the correlation coefficient of Gaussian derivative functions for
0 ≤ n ≤ 15 and 0 ≤ m ≤ 15. The origin is in front.

Here is the function tabulated:

Table[NumberForm[r[n, m] // N, 3], {n, 0, 4}, {m, 0, 4}] // MatrixForm

1.             0. - 0.798 i   -0.577         0. + 0.412 i   0.293
0. + 0.798 i   1.             0. - 0.921 i   -0.775         0. + 0.623 i
-0.577         0. + 0.921 i   1.             0. - 0.952 i   -0.845
0. - 0.412 i   -0.775         0. + 0.952 i   1.             0. - 0.965 i
0.293          0. - 0.623 i   -0.845         0. + 0.965 i   1.

The correlation is unity when n = m, as expected; it is negative when n - m = 2, positive
when n - m = 4, and complex otherwise. Indeed we see that when n - m = 2 the functions
are even but of opposite sign:

Block[{$DisplayFunction = Identity},
  p1 = Plot[gd[x, 20, 2], {x, -5, 5}, PlotLabel -> "Order 20"];
  p2 = Plot[gd[x, 22, 2], {x, -5, 5}, PlotLabel -> "Order 22"]];
Show[GraphicsArray[{p1, p2}, ImageSize -> 450]];

Figure 4.10 Gaussian derivative functions differing two orders are of opposite polarity.

and when n - m = 1 they have a phase-shift, leading to a complex correlation coefficient:

Block[{$DisplayFunction = Identity},
  p1 = Plot[gd[x, 20, 2], {x, -5, 5}, PlotLabel -> "Order 20"];
  p2 = Plot[gd[x, 21, 2], {x, -5, 5}, PlotLabel -> "Order 21"]];
Show[GraphicsArray[{p1, p2}, ImageSize -> 450]];
Figure 4.11 Gaussian derivative functions differing one order display a phase shift.

Of course, this is easily understood if we realize the factor (-iω) in the Fourier domain, and
that -i = e^(-iπ/2). We plot the behaviour of the correlation coefficient of two close orders for
large n. The asymptotic behaviour towards unity for increasing order is clear.

Plot[-r[n, n + 2], {n, 1, 20}, DisplayFunction -> $DisplayFunction,
  AspectRatio -> .4, PlotRange -> {.8, 1.01},
  AxesLabel -> {"Order", "Correlation\ncoefficient"}];

Figure 4.12 The correlation coefficient between a Gaussian derivative function and its
neighbour two orders up quite quickly tends to unity for high differential order.
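The closed-form coefficient can be cross-checked by brute force. The following NumPy sketch (illustrative, σ = 1, not the book's code) integrates the products of Gaussian derivatives numerically and compares them with the formula derived above:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import gamma, sqrt

# r(n,m) = (-1)^((n-m)/2) Gamma((n+m+1)/2)
#          / sqrt(Gamma((2n+1)/2) Gamma((2m+1)/2))   for n - m even
def gd(x, n):
    # n-th derivative of the unit Gaussian via the Hermite polynomial H_n
    c = np.zeros(n + 1); c[n] = 1
    return ((-1 / sqrt(2))**n * hermval(x / sqrt(2), c)
            * np.exp(-x**2 / 2) / sqrt(2 * np.pi))

x = np.linspace(-30, 30, 200001)
dx = x[1] - x[0]

def corr(n, m):
    num = np.sum(gd(x, n) * gd(x, m)) * dx
    den = sqrt(np.sum(gd(x, n)**2) * dx * np.sum(gd(x, m)**2) * dx)
    return num / den

for n, m in [(0, 2), (1, 3), (2, 4), (0, 4)]:
    r = ((-1)**((n - m) // 2) * gamma((n + m + 1) / 2)
         / sqrt(gamma((2 * n + 1) / 2) * gamma((2 * m + 1) / 2)))
    assert abs(corr(n, m) - r) < 1e-6
print("numerical correlations match the closed form")
```

For example corr(0, 2) evaluates to -1/√3 ≈ -0.577, as in the table above.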

4.6 Discrete Gaussian kernels

σ = 2; Plot[{1/(σ Sqrt[2 π]) Exp[-x²/(2 σ²)],
    1/(σ Sqrt[2 π]) BesselI[x, σ²]/BesselI[0, σ²]},
  {x, 0, 8}, PlotStyle -> {RGBColor[0, 0, 0], Dashing[{0.02, 0.02}]},
  PlotLegend -> {"Gauss", "Bessel"}, LegendPosition -> {1, 0},
  LegendLabel -> "σ = 2", PlotRange -> All, ImageSize -> 400];


Figure 4.13 The graphs of the Gaussian kernel and the modified Bessel function of the first
kind are very alike.

Lindeberg [Lindeberg1990] derived the optimal kernel for the case when the Gaussian kernel
is discretized, and came up with the 'modified Bessel function of the first kind'. In
Mathematica this function is available as BesselI. This function is almost equal to the
Gaussian kernel for σ > 1, as we see in figure 4.13. Note that the Bessel function has to be
normalized by its value at x = 0. For larger σ the kernels rapidly become very similar.
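The same comparison can be made numerically. A small SciPy sketch (illustrative), using the exponentially scaled Bessel function ive, which computes e^(-t) Iₙ(t) directly:

```python
import numpy as np
from scipy.special import ive

# Lindeberg's discrete analogue of the Gaussian is T(n, t) = exp(-t) I_n(t)
# with t = sigma^2; compare it with the sampled continuous Gaussian.
sigma = 2.0
n = np.arange(-8, 9)
discrete = ive(n, sigma**2)
sampled = np.exp(-n**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
print(np.max(np.abs(discrete - sampled)))   # about 0.008 at sigma = 2
```

The maximum deviation occurs at the center tap and shrinks quickly as σ grows.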

4.7 Other families of kernels


The first principles that we applied to derive the Gaussian kernel in chapter 2 essentially
stated "we know nothing" (at this stage of the observation). Of course, we can relax these
principles, and introduce some knowledge. When we want to derive a set of apertures tuned
to a specific spatial frequency ω in the image, we add this physical quantity to the matrix of
the dimensionality analysis:

m = {{1, -1, -2, -2, -1}, {0, 0, 1, 1, 0}};

TableForm[m,
  TableHeadings -> {{"meter", "candela"}, {"σ", "ω", "L0", "L", "k"}}]

          σ    ω    L0   L    k
meter     1    -1   -2   -2   -1
candela   0    0    1    1    0

The nullspace is now:

NullSpace[m]

{{1, 0, 0, 0, 1}, {0, 0, -1, 1, 0}, {1, 1, 0, 0, 0}}

Following exactly the same line of reasoning, we end up from this new set of constraints
with a new family of kernels, the Gabor family of receptive fields, which are given by a
sinusoidal function (at the specified spatial frequency) under a Gaussian window.

In the Fourier domain the Gabor kernel is a Gaussian window shifted to the tuning frequency
k, which translates into the spatial domain as a sinusoid under a Gaussian window:

gabor[x_, σ_] := Sin[x] 1/(σ Sqrt[2 π]) Exp[-x²/(2 σ²)]

The Gabor function model of cortical receptive fields was first proposed by Marcelja in 1980
[Marcelja1980]. However, the functions themselves are often credited to Gabor [Gabor1946],
who supported their use in communications.

Gabor functions are defined as a sine function under a Gaussian window with scale σ.
The phase φ of the sinusoidal function determines its detailed behaviour, e.g. for φ = π/2 we
get an even function. Gabor functions can look very much like Gaussian derivatives, but there
are essential differences:

- Gabor functions have an infinite number of zero-crossings on their domain.

- The amplitudes of the sinusoidal function never exceed the Gaussian envelope.

gabor[x_, φ_, σ_] := Sin[x + φ] gauss[x, σ];

Block[{$DisplayFunction = Identity, p, pg},
  p = Plot[gabor[x, #, 10], {x, -30, 30}, PlotRange -> {-.04, .04}] &;
  pg = Plot[gauss[x, 10], {x, -30, 30}, PlotRange -> {-.04, .04}];
  p13 = Show[p[0], pg]; p23 = Show[p[π/2], pg]];

Show[GraphicsArray[{p13, p23}], ImageSize -> 450];


Figure 4.14 Gabor functions are sinusoidal functions with a Gaussian envelope. Left: Sin[x]
G[x,10]; right: Sin[x+π/2] G[x,10].

Gabor functions can be made to look very similar by an appropriate choice of parameters:

σ = 1; gd[x_, σ_] = D[1/(σ Sqrt[2 π]) Exp[-x²/(2 σ²)], x];

Plot[{-1.2 gabor[x, 1.2], gd[x, 1]}, {x, -4, 4},
  PlotStyle -> {Dashing[{0.02, 0.02}], RGBColor[0, 0, 0]},
  PlotLegend -> {"Gabor", "Gauss"},
  LegendPosition -> {1.2, -0.3}, ImageSize -> 320];


Figure 4.15 Gabor functions can be made very similar to Gaussian derivative kernels. In a
practical application there is then no difference in result. Dotted graph: Gaussian first
derivative kernel. Continuous graph: minus the Gabor kernel with the same σ as the
Gaussian kernel. Note the necessity of the sign change due to the polarity of the sinusoidal
function.

If we relax one or more of the first principles (leave one or more out, or add other axioms),
we get other families of kernels. E.g. when we add the constraint that the kernel should be
tuned to a specific spatial frequency, we get the family of Gabor kernels [Florack1992a,
Florack1997a]. It was recently shown by Duits et al. [Duits2002a], extending the work of
Pauwels [Pauwels1995], that giving up the constraint of separability gives a new family of
interesting Poisson scale-space kernels, defined by the solution of the Dirichlet problem
∂L/∂s = -(-Δ)^α L. For α = 1 we find the Gaussian scale-space, for α = 1/2 we get the Poisson
scale-space. In this book we limit ourselves to the Gaussian kernel.

We conclude this section with the realization that the front-end visual system at the retinal
level must be uncommitted; no feedback from higher levels is involved, so the Gaussian
kernel seems a good candidate to start observing with at this level. At higher levels this
constraint is released.

The extensive feedback loops from the primary visual cortex to the LGN may give rise to
'geometry-driven diffusion' [TerHaarRomeny1994f], nonlinear scale-space theory, where the
early differential geometric measurements, through e.g. the simple cells, may modify the
kernels at the LGN level. Nonlinear scale-space theory will be treated in chapter 21.

Task 4.2 When we have noise in the signal to be differentiated, we have two
counterbalancing effects when we change differential order and scale: for higher
order the noise is amplified (the factor (-iω)ⁿ in the Fourier transform
representation) and for larger scales the noise is averaged out. Give an explicit
formula in our Mathematica framework for the propagation of noise when filtered
with Gaussian derivatives. Start with the easiest case, i.e. pixel-uncorrelated
(white) noise, and continue with correlated noise. See for a treatment of this
subject the work by Blom et al. [Blom1993a].

Task 4.3 Give an explicit formula in our Mathematica framework for the
propagation of noise when filtered with a compound function of Gaussian
derivatives, e.g. by the Laplacian ∂²G/∂x² + ∂²G/∂y². See for a treatment of this
subject the work by Blom et al. [Blom1993a].

4.8 Higher dimensions and separability


Gaussian derivative kernels of higher dimensions are simply made by multiplication; here
again we exploit the separability of the Gaussian. The function
gd2D[x, y, n, m, σx, σy] is an example of a Gaussian partial derivative function in 2D,
first order derivative to x, second order derivative to y, at scale 2 (equal for x and y):

gd2D[x_, y_, n_, m_, σx_, σy_] := gd[x, n, σx] gd[y, m, σy];
Plot3D[gd2D[x, y, 1, 2, 2, 2], {x, -7, 7},
  {y, -7, 7}, AxesLabel -> {x, y, ""}, PlotPoints -> 40,
  PlotRange -> All, Boxed -> False, Axes -> True, ImageSize -> 190];

Figure 4.16 Plot of ∂³G(x,y)/∂x∂y². The two-dimensional Gaussian derivative function can be
constructed as the product of two one-dimensional Gaussian derivative functions, and so for
higher dimensions, due to the separability of the Gaussian kernel for higher dimensions.

The ratio σy/σx is called the anisotropy ratio. When it is unity, we have an isotropic kernel,
which diffuses in the x and y direction by the same amount. The Greek word 'isos' (ἴσος)
means 'equal', the Greek word 'tropos' (τρόπος) means 'direction' (the Greek word 'topos'
(τόπος) means 'location, place').

In 3D the iso-intensity surfaces of the Gaussian kernels are shown (and can be interactively
manipulated) with the command MVContourPlot3D from the OpenGL viewer 'MathGL3D'
by J. P. Kuska (phong.informatik.uni-leipzig.de/~kuska/mathgl3dv3):

<< MathGL3d`OpenGLViewer`;

MVClear[]; σ = 1;
pl = Table[MVContourPlot3D[Evaluate[D[Exp[-(x² + y² + z²)/(2 σ²)], {x, n}]],
    {x, -6, 6}, {y, -4, 4}, {z, -3, 0}, Contours -> Range[-.6, .6, .1],
    PlotPoints -> 60, BoxRatios -> {2, 2, 1},
    DisplayFunction -> Identity], {n, 1, 3}];
Show[GraphicsArray[pl], ImageSize -> 400];

Figure 4.17 Iso-intensity surfaces for Gaussian derivative kernels in 3D. Left: ∂G/∂x;
middle: ∂²G/∂x²; right: ∂³G/∂x³.

The sum of two of the three second order derivatives is called the 'hotdog' detector:



MVClear[]; σ = 1;
pl = MVContourPlot3D[
   Evaluate[D[Exp[-(x² + y² + z²)/(2 σ²)], {x, 2}] +
     D[Exp[-(x² + y² + z²)/(2 σ²)], {z, 2}]], {x, -6, 6},
   {y, -4, 4}, {z, -3, 0}, Contours -> Range[-.6, .6, .1],
   PlotPoints -> 60, BoxRatios -> {2, 2, 1}, ImageSize -> 150];

Figure 4.18 Iso-intensity surface for the Gaussian derivative ∂²G/∂x² + ∂²G/∂z² in 3D.

4.9 Summary of this chapter


The Gaussian derivatives are characterized by the product of a polynomial function, the
Hermite polynomial, and a Gaussian kernel. The order of the Hermite polynomial is the same
as the differential order of the Gaussian derivative. Many interesting recursive relations exist
for Hermite polynomials, making them very suitable for analytical treatment. The shape of
the Gaussian derivative kernel can be very similar to specific Gabor kernels. One essential
difference is the number of zero-crossings: this is always infinite for Gabor kernels, while the
number of zero-crossings of Gaussian derivatives is equal to the differential order. The
envelope of the Gaussian derivative amplitude is not the Gaussian function, as is the case for
Gabor kernels.

The even orders are symmetric kernels, the odd orders are antisymmetric kernels. The
normalized zeroth order kernel has unit area by definition; the Gaussian derivative kernels of
the normalized kernel do not have unit area.

Gaussian derivatives are not orthogonal kernels. They become more and more correlated for
higher order, if odd or even specimens are compared. The limiting case for infinite order
leads to a sinusoidal (for the odd orders) or cosinusoidal (for the even orders) function with a
Gaussian envelope, i.e. a Gabor function.

In the vision chapters we will encounter the Gaussian derivative functions as suitable and
likely candidates for the receptive fields in the primary visual cortex.
5. Multi-scale derivatives:
implementations

Three people were at work on a construction site. All were doing the same job, but when each was asked what the
job was, the answers varied. "Breaking rocks," the first replied. "Earning my living," the second said. "Helping to
build a cathedral," said the third.
- Peter Schultz

In order to get a good feeling for the interactive use of Mathematica, we discuss in this
section three implementations of convolution with a Gaussian derivative kernel (in 2D) in
detail:
1. implementation in the spatial domain with a 2D kernel;
2. through two sequential 1D kernel convolutions (exploiting the separability property);
3. implementation in the Fourier domain.
Just blurring is done through convolution with the zero order Gaussian derivative, i.e. the
Gaussian kernel itself.

5.1 Implementation in the spatial domain


Mathematica 4 has a fast implementation of a convolution: ListConvolve[kernel,
list] forms the convolution of the kernel kernel with list. This function is N-
dimensional, and is internally optimized for speed. It can take any Mathematica expression,
but its greatest speed is for Real (floating point) numbers. We first define the 1D Gaussian
function gauss[x, σ]:

<< FrontEndVision`FEV`;

Unprotect[gauss];

gauss[x_, σ_ /; σ > 0] := 1/(σ Sqrt[2 π]) Exp[-x²/(2 σ²)];

We explain in detail what happens here:

The function gauss[x_, σ_] is defined for the variables x_ and σ_. The underscore _
means that x_ is a Pattern with the name x; it can be anything. This is one of the most
powerful features in Mathematica: it allows pattern matching. In the appendix a number of
examples are given. The variable σ_ has the condition (indicated with /;) that σ should be
positive. If this condition is not met, the function will not be evaluated. The function is
defined with delayed assignment (:= instead of = for direct assignment). In this way it will
be evaluated only when it is called. The semicolon is the separator between statements, and
in general prevents output to the screen, a handy feature when working on images.

The function gDc[im, nx, ny, σ] implements the convolution with a 2D Gaussian derivative
kernel in the spatial domain. This function is much faster, as it exploits the internal
function ListConvolve, and applies Gaussian derivative kernels with a width truncated to
± 4 standard deviations, which of course can freely be changed.

gDc[im_, nx_, ny_, σ_ /; σ > 0] := Module[{x, y, kernel},
  kernel = N[Table[Evaluate[
      D[gauss[x, σ] gauss[y, σ], {x, nx}, {y, ny}]],
    {y, -4 σ, 4 σ}, {x, -4 σ, 4 σ}]];
  ListConvolve[kernel, im, Ceiling[Dimensions[kernel]/2]]];

Module[{vars}, ...] is a construct to make a block of code where the vars are
shielded from the global variable environment. The derivative of the function gauss[] is
taken with D[f, {x, nx}, {y, ny}], where nx is the number of differentiations to x and
ny the number of differentiations to y. The variable kernel is a List, generated by the
Table command, which tabulates the function over the range ± 4σ for both x
and y. The derivative function must be evaluated with Evaluate[] before it can be
tabulated. The function N[] makes the result a numerical value, a Real number.

ListConvolve is an optimized internal Mathematica command that cyclically convolves
the kernel kernel with the image im. The Dimensions[] of the kernel are a List
containing the x- and y-dimension of the square kernel matrix. Finally, the upwards rounded
(Ceiling) list of dimensions is used by ListConvolve to fix that the kernel starts at the
first element of im and returns an output image with the same dimension as the input image.

im = Table[If[x² + y² < 7000, 100, 0], {x, -128, 127}, {y, -128, 127}];
Block[{$DisplayFunction = Identity},
  p1 = ListDensityPlot[#] & /@ {im, gDc[im, 1, 0, 1]}];
Show[GraphicsArray[p1], ImageSize -> 350];

Figure 5.1 The derivative to x (right) at scale σ = 1 pixel on a 256² image of a circle (left).

The wider the kernel, the more points we include in the calculation of the convolution, and
the larger the computational burden. When the kernel becomes wider than half of the domain
of the image, it becomes more efficient to apply the Fourier implementation discussed
below. This trade-off has been worked out in detail by Florack [Florack2000a].

5.2 Separable implementation


The fastest implementation exploits the separability of the Gaussian kernel, and this
implementation is mainly used in the sequel:

Options[gD] = {kernelSampleRange -> {-6, 6}};

gD[im_List, nx_, ny_, σ_, opts___?OptionQ] :=
  Module[{x, y, kpleft, kpright, kx, ky, mid, tmp},
    {kpleft, kpright} = kernelSampleRange /. {opts} /. Options[gD];
    kx = N[Table[Evaluate[D[gauss[x, σ], {x, nx}]],
        {x, kpleft σ, kpright σ}]];
    ky = If[nx == ny, kx, N[Table[Evaluate[D[gauss[y, σ], {y, ny}]],
         {y, kpleft σ, kpright σ}]]];
    mid = Ceiling[Length[#1]/2] &;
    tmp = Transpose[ListConvolve[{kx}, im, {{1, mid[kx]}, {1, mid[kx]}}]];
    Transpose[ListConvolve[{ky}, tmp, {{1, mid[ky]}, {1, mid[ky]}}]]];

The function gD[im, nx, ny, σ, options] implements first a convolution per row,
then transposes the matrix of the image, and does the convolution on the rows again, thereby
effectively convolving the columns of the original image. A second Transpose returns the
image to its original orientation. This is the default implementation of multi-scale
Gaussian derivatives and will be used throughout this book.
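The same row-then-column scheme translates directly to other environments. Below is a NumPy/SciPy sketch (illustrative; the helper names gd1 and gD are assumptions, not the book's code) that samples the 1D Gaussian derivative kernels via the Hermite polynomials and convolves rows and columns separately, with cyclic boundary conditions like ListConvolve:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from scipy.ndimage import convolve1d

def gd1(n, sigma):
    # sampled 1D Gaussian derivative kernel, truncated at +/- 6 sigma
    r = int(np.ceil(6 * sigma))
    x = np.arange(-r, r + 1)
    c = np.zeros(n + 1); c[n] = 1
    herm = hermval(x / (sigma * np.sqrt(2)), c)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return (-1 / (sigma * np.sqrt(2)))**n * herm * g

def gD(im, nx, ny, sigma):
    # separable implementation: convolve rows, then columns (cyclic)
    tmp = convolve1d(im, gd1(nx, sigma), axis=1, mode='wrap')
    return convolve1d(tmp, gd1(ny, sigma), axis=0, mode='wrap')

im = np.fromfunction(
    lambda y, x: ((x - 128.0)**2 + (y - 128.0)**2 < 7000) * 100.0, (256, 256))
imx = gD(im, 0, 1, 2.0)
print(imx.shape)        # (256, 256)
```

The two 1D convolutions cost O(k) per pixel each instead of O(k²) for the full 2D kernel, which is the point of the separable scheme.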

im = Table[If[x² + y² < 7000, 100, 0], {x, -128, 127}, {y, -128, 127}];
Timing[imx = gD[im, 0, 1, 2]][[1]]

0.031 Second

Block[{$DisplayFunction = Identity},
  pl = ListDensityPlot[#] & /@ {im, imx}];
Show[GraphicsArray[pl], ImageSize -> 260];

Figure 5.2 The derivative to y (right) at scale σ = 2 pixels on a 256² image of a circle (left).

Task 5.1 Write a Mathematica function of the separable Gaussian derivative
kernel implementation for 3D. Test the functionality on a 3D test image, e.g. a
sphere.

5.3 Some examples


Convolving an image containing a single point (a delta function) with the Gaussian derivative
kernels gives the kernels themselves, i.e. the pointspread function. E.g. here is the well
known series of all Cartesian partial Gaussian derivatives up to 5th order:

spike = Table[0., {128}, {128}]; spike[[64, 64]] = 1.;

Block[{$DisplayFunction = Identity},
  array = Table[Table[ListDensityPlot[gD[spike, m - n, n, 20],
      PlotLabel -> "∂x=" <> ToString[m - n] <> ", ∂y=" <> ToString[n]],
    {n, 0, m}], {m, 0, 5}]];
Show[GraphicsArray[array], ImageSize -> 330];

Figure 5.3 Gaussian partial derivative kernels up to 5 th order.

$DisplayFunction is the internal variable that determines how things should be
displayed. Its normal state (by default it has the value Display[$Display, #1] &) is to send
PostScript to the output cell. Its value is temporarily set to Identity, which means: no
output. This is necessary to calculate but not display the plots.

We read an image with Import and only use the first element [[1, 1]] of the returned
structure, as this contains the pixel data.

im = Import["mr128.gif"][[1, 1]];

We start with plain blurring at a scale of σ = 3 pixels and show the result as a 2D image and a
3D height plot:

DisplayTogetherArray[
  {ListDensityPlot[gD[im, 0, 0, 3]], ListPlot3D[gD[im, 0, 0, 3],
    Mesh -> False, BoxRatios -> {1, 1, 1}]}, ImageSize -> 500];

Figure 5.4 Left: a blurred MR image, resolution 128², σ_blur = 3 pixels. Right: the intensity
surface as a height surface shows the blurring of the edges.

A movie of a (in this example) logarithmically sampled intensity scale-space is made with
the Table command. Close the group of cells with images by double-clicking the group
bracket. Double-clicking one of the resulting images starts the animation. Controls are on the
bottom window bar.

ss = Table[ListDensityPlot[gDf[im, 0, 0, E^τ], ImageSize -> 150],
  {τ, 0, 2.5, .25}];

Figure 5.5 Animation of a blurring sequence, with exponential scale parametrization. Double-
click the image to start the animation (only in the electronic version). Controls appear at the
lower window bar.

This animation is only available in the electronic version. Here are the images:

Show[GraphicsArray[Partition[ss, 5]], ImageSize -> 450];

Figure 5.6 Frames of the animation of a blurring sequence above.

The sequence can be saved as an animated GIF movie (e.g. for use in webpages) with:

Export [ "c : \ \scalespace. gif", ss, "GIF" ] ;

The gradient of an image is defined as √(Lx² + Ly²). Here it is computed at a scale
σ = 0.5 pixels for a 256² CT image of chronic cocaine abuse (EuroRAD teaching file
case #1472, www.eurorad.org):

i m = Import["eocaine septum.gif"] [[1, 1]] ;


DisplayTogetherArray [ {ListDensityPlot [im] ,

grad=ListDensityPlot[4gD[im,
I, 0, .5] 2 +gD[im, 0, I, .5] 2 ]},
ImageSize -> 370] ;

Figure 5.7 The gradient at a small scale, σ = 0.5 pixels. Due to the letters R and L in the
image, with steep gradients, the gradient image is not properly scaled in intensity. Note the
completely missing septum in this patient (from www.eurorad.org, EuroRAD authors: D. De
Vuyst, A.M. De Schepper, P.M. Parizel, 2002).

To change the window/level (contrast/brightness) settings one can change the displayed
range of intensity values:

Show[grad, PlotRange -> {0, 20},
  DisplayFunction -> $DisplayFunction, ImageSize -> 150];

Figure 5.8 The gradient at a small scale, σ = 0.5 pixels, now with an intensity window of 0
(black) to 30 (white).

We can also transform the image into its histogram-equalized version, by substituting its
gray values by the values given by its cumulative lookup table:

Unprotect[heq];
heq[im_List] := Module[{min, max, freq, cf, lcf, maxcf, lut, int},
   min = Min[im]; max = Max[im];
   freq = BinCounts[Flatten[im], {min, max, (max - min)/256}];
   cf = FoldList[Plus, First[freq], Drop[freq, 1]];
   maxcf = Max[cf]; lcf = Length[cf];
   lut = Table[N[{(i - 1)/lcf, cf[[i]]/maxcf}], {i, 1, lcf}];
   lut[[lcf]] = {1., 1.};
   int = Interpolation[lut]; max int[(im - min)/(max - min)]];

ListDensityPlot[
  heq[Sqrt[gD[im, 1, 0, .5]^2 + gD[im, 0, 1, .5]^2]], ImageSize -> 150];

Figure 5.9 Histogram equalization of the gradient image of figure 5.7. Many radiologists
consider this too much enhancement. 'Clipped' adaptive histogram equalization admits
different levels of enhancement tuning [Pizer1987].

The cumulative lookup table is applied as the intensity transform. Small contrasts have been
stretched to larger contrasts, and vice versa. We next compare the histogram of the gradient
image with the histogram of the histogram-equalized gradient image. The total histogram of
this image is indeed reasonably flat now.
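The mapping heq performs can be traced on a tiny example. The following is a minimal sketch (plain Mathematica, data and names illustrative only): replace each grayvalue by its normalized cumulative frequency, which flattens the histogram.

```mathematica
(* Sketch: histogram equalization of a small intensity list via its
   cumulative histogram; data and names are illustrative only. *)
data = {0, 0, 0, 1, 1, 2, 4, 8};                 (* skewed intensities *)
freq = BinCounts[data, {0, 9, 1}];               (* histogram, bins 0..8 *)
cdf = FoldList[Plus, First[freq], Rest[freq]];   (* cumulative counts *)
cdf[[# + 1]]/N[Last[cdf]] & /@ data
(* → {0.375, 0.375, 0.375, 0.625, 0.625, 0.75, 0.875, 1.} *)
```

The equalized values are spread over (0, 1], which is exactly why the total histogram above becomes flat.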

grad = Sqrt[gD[im, 1, 0, .5]^2 + gD[im, 0, 1, .5]^2]; DisplayTogetherArray[
  Histogram[Flatten[#]] & /@ {grad, heq[grad]}, ImageSize -> 380];


Figure 5.10 Left: Histogram of the gradient image of figure 5.7. Right: Histogram of the
histogram-equalized gradient image. Note the equalizing or marked stretching of the
histogram.

To conclude this introduction to multi-scale derivatives, let us look at some edges detected at
different scales. It is clear from the examples below that the larger scale edges denote the
more ’important’ edges, describing the coarser, hierarchically higher structure:

im = Import["Utrecht256.gif"][[1, 1]];
DisplayTogetherArray[
  ListDensityPlot[Sqrt[gD[im, 1, 0, #]^2 + gD[im, 0, 1, #]^2]] & /@ {.5, 2, 5},
  ImageSize -> 400];

Figure 5.11 Gradient edges detected at different scales (σ = 0.5, 2, 5 pixels resp.). The
coarser edges (right) indicate hierarchically more 'important' edges.

Other sources of different scales for edges are shadows and diffuse boundaries [Elder1996].

5.4 N-dim Gaussian derivative operator implementation


One of the powerful capabilities of Mathematica as a programming language is the relative
ease to write numerical functions on N-dimensional data. In scale-space theory often high
dimensional data occur: 3D and 3D-time medical images, such as 3D cardiovascular time
sequences, orientation bundles (see chapter 16 where an extra dimension emerges from the
inclusion of orientation as the output of measurements by oriented filters), high dimensional
feature spaces for texture analysis, etc. Here is the separable implementation for N-
dimensions:

Unprotect[gDn]; gDn[im_, orderlist_, σlist_, opts___?OptionQ] :=
  Module[{gaussd, dim = Length[Dimensions[im]], out = N[im], l, r, gder, x,
    kernel, cnt, mid, lc, tl, td}, td = Dimensions /@ {orderlist, σlist};
   tl = Length /@ td; {l, r} = kernelSampleRange /. {opts} /. Options[gD];
   gaussd = D[1/(#2 Sqrt[2 π]) Exp[-(x^2/(2 #2^2))], {x, #1}] &;
   gder = Table[N[gaussd[#1, #2]], {x, Floor[l #2], Ceiling[r #2]}] &;
   kernel = RotateRight[MapThread[gder, {orderlist, σlist}]];
   mid = (Ceiling[Length[#1]/2] &) /@ kernel;
   cnt = Append[Table[1, {dim - 1}], mid[[#1]]] &;
   lc = Transpose[
      ListConvolve[Nest[List, kernel[[#2]], dim - 1], #1, {cnt[#2], cnt[#2]}],
      RotateRight[Range[dim]]] &; Do[out = lc[out, i], {i, dim}]; out]

The function makes use of the possibility to Nest functions to large depth, and the
universality of the ListConvolve function. The function is fast. Note the specification of
orders and scales as lists, and note the specific, Mathematica-intrinsic ordering with the
fastest running variable last: {z, y, x}.

Example: gDn[im, {0, 2, 1}, {2, 2, 2}] calculates ∂³L/(∂y²∂x) of the input image im at an
isotropic scale of σz = σy = σx = 2 pixels.
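The separability that gDn exploits can be checked directly: since the N-dimensional Gaussian is the (outer) product of 1D Gaussians, two successive 1D convolutions must equal one full 2D convolution. A minimal sketch with plain ListConvolve (cyclic boundaries, names illustrative):

```mathematica
(* Sketch: separable = full 2D convolution for an outer-product kernel. *)
σ = 2.; g1 = Table[Exp[-x^2/(2 σ^2)], {x, -6, 6}]; g1 = g1/Plus @@ g1;
g2 = Outer[Times, g1, g1];                (* 2D kernel = outer product *)
im = Table[Random[], {32}, {32}];
full = ListConvolve[g2, im, {1, 1}];      (* one 2D cyclic convolution *)
sep = ListConvolve[{g1}, Transpose[
     ListConvolve[{g1}, Transpose[im], {1, 1}]], {1, 1}];
Max[Abs[full - sep]]                      (* of the order 10^-16 *)
```

The two results agree to machine precision; the separable route needs far fewer multiplications per pixel.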

Here is the time it takes to calculate the first order derivative in 3 directions at scales of 1
pixel of a 128³ random array (more than 2 million pixels, 1.7 GHz, 512 MB, Windows XP):

im = Table[Random[], {128}, {128}, {128}];

Timing[gDn[im, {1, 1, 1}, {1, 1, 1}]] // First

5.094 Second

This gives help on how to call the function:

? gDn

gDn[im,{...,ny,nx},{...,σy,σx},options] calculates the Gaussian
derivative of an N-dimensional image by approximated spatial
convolution. It is optimized for speed by 1D convolutions per
dimension. The image is considered cyclic in each direction.
Note the order of the dimensions in the parameter lists.
im = N-dimensional input image [List]
nx = order of differentiation to x [Integer, nx ≥ 0]
σx = scale in x-dimension [in pixels, σ > 0]
options = <optional> kernelSampleRange: range of kernel
sampled in multiples of σ. Default: kernelSampleRange -> {-6, 6}

Example: gDn[im,{0,0,1},{2,2,2}] calculates the x-
derivative of a 3D image at an isotropic scale of σz = σy = σx = 2.

5.5 Implementation in the Fourier domain


The spatial convolutions are not exact. The Gaussian kernel is truncated. In this section we
discuss the implementation of the convolution operation in the Fourier domain.

In appendix B we have seen that a convolution of two functions in the spatial domain is a
multiplication of the Fourier transforms of the functions in the Fourier domain; we take the
inverse Fourier transform to come back to the spatial domain. We recall the processing
scheme (e.g. with 1D functions):

f(x) = h(x) ⊗ g(x)
 ↕      ↕      ↕
F(ω) = H(ω) G(ω)

The ↕ indicates the Fourier transform in the downwards direction and the inverse Fourier
transform in the upwards direction. f(x) is the convolved function, h(x) the input function,
and g(x) the convolution kernel.
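This scheme is easily verified numerically. Note that Mathematica's Fourier is normalized by 1/√n, so the product of the two transforms must be multiplied by √n to match the cyclic spatial convolution; this is the same factor √(xres yres) used in the 2D implementation. A small 1D sketch (values illustrative):

```mathematica
(* Sketch: cyclic convolution computed via the Fourier domain. *)
h = {1., 2., 3., 4.}; g = {0., 1., 0., 0.};
spatial = ListConvolve[g, h, {1, 1}];   (* cyclic convolution of h and g *)
fourier = Chop[Sqrt[Length[h]] InverseFourier[Fourier[h] Fourier[g]]];
Max[Abs[spatial - fourier]]             (* ≈ 0 *)
```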

The function gDf[im, nx, ny, σ] implements the convolution of the 2D image with the
Gaussian derivative for 2D discrete data in the Fourier domain. This is an exact function,
with no approximations other than the finite periodic window in both the x- and y-direction.
We explicitly give the code of the functions here, so you see how it is implemented; the
reader may make modifications as required. All information on (always capitalized) internal
functions is available in the Mathematica Help Browser (highlight + F1), as well as on the
'The Mathematica Book' internet pages of Wolfram Inc.

Variables: im = 2D image (as a List structure)
  nx, ny = order of differentiation to x resp. y
  σ = scale of the Gaussian derivative kernel, in pixels

The underscore in e.g. im_ means Blank[im] and stands for anything (a single element)
which we name im. im_List means that im is tested if it is a List. If not, the function
gDf will not be evaluated.

Unprotect[gDf]; Remove[gDf];

gDf[im_List, nx_, ny_, σ_] :=
  Module[{xres, yres, gdkernel},
   {yres, xres} = Dimensions[im];
   gdkernel =
    N[Table[Evaluate[
       D[1/(2 π σ^2) Exp[-(x^2 + y^2)/(2 σ^2)], {x, nx}, {y, ny}]],
      {y, -(yres - 1)/2, (yres - 1)/2}, {x, -(xres - 1)/2, (xres - 1)/2}]];
   Chop[N[Sqrt[xres yres] InverseFourier[Fourier[im]
       Fourier[RotateLeft[gdkernel, {yres/2, xres/2}]]]]]];

A Module[{vars}, ...code...] is a scope construct, where the vars are private
variables. The last line determines what is returned. The assignment with := is a delayed
assignment, i.e. the function is only evaluated when called. The dimensions of the input
image are extracted (note the order!) and the Gaussian kernel is differentiated with the
function D[gauss, {x, nx}, {y, ny}] and symmetrically tabulated over the x- and y-
dimensions to get a kernel image with the same dimensions as the input image.

We now have a 2D List with the kernel in the center. We shift gdkernel with
RotateLeft over half the dimensions in the x- and y-direction in order to put the kernel's
center at the origin {0,0}. We could equally have shifted in this symmetric case with
RotateRight. We then take the Fourier transform of both the image and the kernel,
multiply them (indicated by a space) and take the InverseFourier transform.

Because we have a finite Fourier transform, we normalize over the domain through the factor
√(xres yres). The function N[] makes all output numerical, and the function Chop[]
removes everything that is smaller than 10⁻¹⁰, to remove very small round-off errors.

im = Import["mr256.gif"][[1, 1]]; imx = gDf[im, 1, 0, 1];

ListDensityPlot[imx, ImageSize -> 240];

Figure 5.12 First order Gaussian derivative with respect to x at scale σ = 1 pixel, calculated
through the Fourier domain. Resolution 2562 pixels.

The Mathematica function Fourier is highly optimized for any size of the data, and uses
sophisticated bases when the number of pixels is not a power of 2.

This function is somewhat slower than the spatial implementation, but is exact. Here is a
vertical edge with a lot of additive uniform noise. The edge detection at very small scale only
reveals the 'edges of the noise'. Only at the larger scales do we discern the true edge, i.e.
when the scale of the operator applied is at 'the scale of the edge'.

im5 = Table[If[x > 128, 1, 0] + 13 Random[], {y, 256}, {x, 256}];

DisplayTogetherArray[
  Prepend[ListDensityPlot[Sqrt[gDf[im5, 1, 0, #]^2 + gDf[im5, 0, 1, #]^2]] & /@
    {2, 6, 12}, ListDensityPlot[im5]], ImageSize -> 500];

Figure 5.13 Detection of a very low contrast step-edge in noise. Left: original image, the step-
edge is barely visible. At small scales (second image, σ = 2 pixels) the edge is not detected.
We see the edges of the noise itself, cluttering the edge of the step-edge. Only at large scale
(right, σ = 12 pixels) is the edge clearly found. At this scale the large scale structure of the
edge emerges from the small scale structure of the noise.

Task 5.2 The Fourier implementation takes the Fourier transform of the image
and the Fourier transform of a calculated kernel. This seems a waste of
calculating time, as we know the analytical expression for the Fourier transform
of the Gaussian kernel. Write a new Mathematica function that takes this into
account, and check if there is a real speed increase.

Task 5.3 The spatial implementation has different speed for different size
kernels. With increasing kernel size the number of operations increases
substantially. How?

Task 5.4 Compare for what kernel size the choice of implementation is
computationally more effective: Fourier or spatial domain implementation. See
also [Florack 2000a].

There are two concerns we discuss next: what to do at the boundaries? And: the function is
slow, so how to speed it up?

5.6 Boundaries
DisplayTogetherArray[
  Show /@ Import /@ {"Magritte painting boundary.gif", "Magritte.jpg"},
  ImageSize -> 340];

Figure 5.14 It is important to consider what happens at the boundaries of images. It matters
what we model outside our image. Painting by René Magritte (right: self-portrait, 1898-1967).

At the boundary of the image artefacts may appear when we do convolutions with (by
nature) extended kernels. Here is an example: two linear intensity ramps give a constant
output when we calculate the first derivative with respect to x, but we see strong edge
responses at both the left- and right-hand side, for the Fourier implementation as well as for
the spatial implementation:

im = Table[If[y > 64, x - 64, 64 - x], {y, 128}, {x, 128}];

DisplayTogetherArray[ListDensityPlot[#] & /@
  {im, gDf[im, 1, 0, 3], gD[im, 1, 0, 3]}, ImageSize -> 400];

Figure 5.15 Boundary effects due to the periodicity of the Fourier domain.

This is due to the fact that in both the Fourier domain and the spatial domain implementation
of the convolution function the image is regarded as repetitive. A Fourier transform is a
cyclic function, i.e. F(ω) = F(ω + n 2π). In 2D: F(ωx, ωy) = F(ωx + nx 2π, ωy + ny 2π).
The boundary effects in the image above are due to the strong edge created by the
neighboring pixels at both ends. One can regard the domain of the image as a window cut-
out from an infinite tiling of the plane with 2D functions. Figure 5.16 shows a tiling with 20
images, each 64² pixels:

im = Import["mr64.gif"][[1, 1]];

tiles = Join @@ Table[MapThread[Join, Table[im, {5}]], {4}];
ListDensityPlot[tiles, ImageSize -> 280];

Figure 5.16 A section from the infinite tiling of images when we consider a cyclic operation.
The Mathematica function MapThread maps the function Join on the rows of the horizontal
row of 5 images to concatenate them into long rows; the function Join is then applied (with
Apply or @@) on a table of 4 such resulting long rows to concatenate them into a long
vertical image.

Clear[a, b, c, d, e, f, h];
MapThread[h, {{a, b, c}, {d, e, f}}]

{h[a, d], h[b, e], h[c, f]}

Apply[h, {{a, b, c}, {d, e, f}}]

h[{a, b, c}, {d, e, f}]

h @@ {{a, b, c}, {d, e, f}}

h[{a, b, c}, {d, e, f}]

It is important to realize that there is no way around making a choice at the boundaries.
Convolution is an operation with an extended kernel, so at boundaries there is always a
choice to be made. The most common decision is repetitive tiling of the domain to infinity,
but other choices are just as valid. One could extend the image with zeros, or mirror the
neighboring image at all sides in order to minimize the edge artefacts. In all cases
information is put at places where there was no original observation. This is no problem, as
long as we carefully describe how our choice has been made. Here is an example of mirrored
tiling:

im = Import["mr128.gif"][[1, 1]]; Off[General::spell];

imv = Reverse[im]; imh = Reverse /@ im; imhv = Reverse /@ Reverse[im];

mirrored = Join @@ (MapThread[Join, #] & /@ {{imh, im, imh}, {imhv, imv, imhv}});
ListDensityPlot[mirrored, ImageSize -> 270];

Figure 5.17 A section from the infinite tiling of images when we consider a mirroring
operation. Note the rather complex mirroring and concatenation routines for these 2D images.
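Extension with zeros, one of the alternatives mentioned above, is directly available through the optional padding argument of ListConvolve. A small sketch on a 1D ramp (kernel and names illustrative); the boundary response depends on the choice made:

```mathematica
(* Sketch: cyclic versus zero padding for a derivative-of-Gaussian filter. *)
σ = 2.; ker = Table[-x Exp[-x^2/(2 σ^2)]/(σ^3 Sqrt[2 π]), {x, -6, 6}];
ramp = N[Range[32]];
cyclic = ListConvolve[ker, ramp, {1, 1}];    (* default: repetitive tiling *)
zeros = ListConvolve[ker, ramp, {1, 1}, 0];  (* extension with zeros *)
{Last[cyclic], Last[zeros]}                  (* different boundary artefacts *)
```

In the interior both results agree; only near the first and last samples do the two boundary models diverge.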

Task 5.5 Rachid Deriche [Deriche 1992] describes a fast recursive implementation of the
Gaussian kernel and its derivatives. Make a Mathematica routine for recursive
implementation.

Task 5.6 A small truncated kernel size involves less computations, and is thus
faster. Blurring with a large kernel can also be accomplished by a concatenation
of small kernels, e.g. a blurring step with σ = 3 pixels followed by a blurring step
with σ = 4 pixels gives the same result as a single blurring step with σ = 5 pixels
(σ1² + σ2² = σnew²). What is faster, a large kernel, or a cascade series of
smaller kernels? Where is the trade-off?
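The semi-group property underlying Task 5.6 can be checked at once with gD: blurring with σ = 3 and then σ = 4 should equal one blur with σ = 5, since 3² + 4² = 5². A hedged sketch (assumes gD from the FrontEndVision`FEV` context is loaded; the residual stems from kernel truncation and the cyclic boundary):

```mathematica
(* Sketch: cascade property of Gaussian blurring, σ1² + σ2² = σnew². *)
(* Assumes gD from the FrontEndVision`FEV` context is available. *)
im = Table[Random[], {64}, {64}];
cascade = gD[gD[im, 0, 0, 3], 0, 0, 4];
single = gD[im, 0, 0, 5];
Max[Abs[cascade - single]]   (* small; truncation and boundary effects *)
```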

5.7 Advanced topic: speed concerns in Mathematica


This section can be skipped at first reading.

Mathematica is an interpreter, working with symbolic elements, and arbitrary precision. For
this reason, care must be taken that computation times do not explode for large datasets.
When proper measures are taken, Mathematica can be fast, close to compiled C++ code. In

this section we discuss some examples of increasing the speed of the operations on larger
datasets.

It pays off to work numerically, and to compile a function when it is a repetitive operation of
simple functions. Mathematica’s internal commands are optimized for speed, so the gain here
will be less. We discuss the issue with the example of the generation of a discrete Gaussian
derivative kernel. The timings given are for a 1.7 GHz 512 MB PC and Mathematica 4.1
under Windows XP.

First of all, exact calculations are slow. Most internal Mathematica functions can work both
with symbolic and numerical data. These internal functions are fully optimized with respect
to speed and memory resources for numerical input. Here is a simple example:

σ = 1; m = Table[Exp[-(x^2 + y^2)/(2 σ^2)], {x, -4, 4}, {y, -4, 4}];

Timing[Eigenvalues[m]]
Timing[Eigenvalues[N[m]]]
Timing[Chop[Eigenvalues[N[m]]]]

{3.156 Second, {0, 0, 0, 0, 0, 0, 0, 0, 1 + 2/E + 2/E^4 + 2/E^9 + 2/E^16}}

{0. Second, {1.77264, 5.59373×10^-17, -4.27232×10^-17, -1.82978×10^-18,
  3.22688×10^-22, -6.5072×10^-24, -7.47864×10^-34, 1.05492×10^-38, 0.}}

{0. Second, {1.77264, 0, 0, 0, 0, 0, 0, 0, 0}}

In the sequel we will develop a very fast implementation for the convolution of a 2D image
with a Gaussian derivative in the Fourier domain (see section 4.3). Most of the time is spent
in the creation of the 2D Gaussian kernel, e.g. for 256²:

{xres, yres} = {256, 256}; σ = 3;

Timing[
  kernel = Table[1/(2 π σ^2) Exp[-(x^2 + y^2)/(2 σ^2)],
     {y, -(yres - 1)/2, (yres - 1)/2},
     {x, -(xres - 1)/2, (xres - 1)/2}]][[1]]

4.859 Second

Mathematica keeps values as long as possible in an exact representation. Here is the
pixelvalue at (30, 47):

kernel[[30, 47]]

1/(18 E^(32689/36) π)

An additional disadvantage is that the Fourier transform on such symbolic expressions also
takes a long time:

Timing[fft = Fourier[kernel]] // First

8. Second
5. Multi-scale derivatives." implementations 87

It doesn't make much difference when we enter the data as Real values (to be done with the
insertion of a decimal point in a number, or through the function N):

{xres, yres} = {256., 256.}; σ = 3.; pi = N[π];

Timing[
  gdkernel = Table[1/(2 π σ^2) Exp[-(x^2 + y^2)/(2 σ^2)],
     {y, -(yres - 1)/2, (yres - 1)/2},
     {x, -(xres - 1)/2, (xres - 1)/2}]][[1]]

5.797 Second

The output is now a number, not a symbolic expression:

gdkernel[[30, 48]]

6.379323933059 × 10^-393

But still, we have no gain in speed. This is because the internal representation is still in
'arbitrary precision' mode. The smallest and largest number that can be represented as a
machine Real is:

$MinMachineNumber
$MaxMachineNumber

2.22507 × 10^-308

1.79769 × 10^308

We have smaller values in our pixels! As soon as Mathematica encounters a number smaller
or larger than the dynamic range for machine Real numbers, it turns into arbitrary precision
mode, which is slow. A good improvement in speed is therefore gained through restricting the
output to be in this dynamic range. In our example the argument of the exponential function
Exp should be constrained:

{xres, yres} = {256., 256.}; σ = 3.; pi = N[π];

Timing[gdkernel = Table[
     1/(2 pi σ^2) Exp[If[-(x^2 + y^2)/(2 σ^2) < -100, -100, -(x^2 + y^2)/(2 σ^2)]],
     {y, -(yres - 1)/2, (yres - 1)/2},
     {x, -(xres - 1)/2, (xres - 1)/2}]] // First

2.594 Second

Most of the internal commands of Mathematica do a very good job on real numbers.
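The silent switch out of machine precision can be made visible directly; a small sketch (the exact threshold is $MinMachineNumber):

```mathematica
(* Sketch: underflow below $MinMachineNumber leaves machine precision. *)
MachineNumberQ /@ {Exp[-100.], Exp[-1000.]}   (* → {True, False} *)
```

Every pixel that underflows in this way drags the whole computation into the slow arbitrary-precision path, which is why the If clamp above pays off.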

A further substantial improvement in speed can be obtained by compilation of the code into
fast internal assembler code with the function Compile[{args}, ...code...,
{decl}]. This generates a pure function, that can be called with the arguments {args}.
This function generates optimized code based on an idealized register machine. It assumes
approximate real or integer numbers, or matrices of these. The arguments in the argument list
need to have the proper assignment (_Real, _Integer, _Complex or True/False).

The assignment _Real is default and may be omitted, so {x, _Real} is equivalent to
{x}. An example to calculate the factorial of a sum of two real numbers:

gammasum = Compile[{{x, _Real}, {y, _Real}}, (x + y)!]

CompiledFunction[{x, y}, (x + y)!, -CompiledCode-]

gammasum[3, 5]

40320.

We now check if the compiled code of our Gaussian kernel gives a speed improvement:

gdkernel = Compile[{xres, yres, σ}, xresh = (xres - 1)/2; yresh = (yres - 1)/2;
    p = Table[
      1/(2 π σ^2) Exp[If[-(x^2 + y^2)/(2 σ^2) < -100, -100, -(x^2 + y^2)/(2 σ^2)]],
      {y, -yresh, yresh}, {x, -xresh, xresh}], {{x, _Real},
     {y, _Real}, {xresh, _Real}, {yresh, _Real}, {p, _Real, 2}}];

Timing[gdkernel[256, 256, 3]] // First

2.532 Second

In version 4.2 of Mathematica we see no improvement when running the example above; the
kernel has been optimized for these calculations. In earlier versions you will encounter some
60% improvement with the strategy above. See the Help browser (shift-F1) for speed
examples of the Compile function. We now add the symbolic operation of taking derivatives
of the kernel. We force direct generation of the polynomials in the Gaussian derivatives with
the Hermite polynomials, generated with HermiteH. The symbolic functions are first
evaluated through the use of the function Evaluate, then compiled code is made:

gdkernel = Compile[{xres, yres, σ, {nx, _Integer}, {ny, _Integer}},
    xresh = (xres - 1)/2; yresh = (yres - 1)/2;
    p = Table[Evaluate[
       1/(2 π σ^2) Exp[If[-(x^2 + y^2)/(2 σ^2) < -100, -100, -(x^2 + y^2)/(2 σ^2)]] *
        (-1/(σ Sqrt[2]))^(nx + ny) HermiteH[nx, x/(σ Sqrt[2])] HermiteH[ny, y/(σ Sqrt[2])]],
      {y, -yresh, yresh}, {x, -xresh, xresh}], {{x, _Real},
     {y, _Real}, {xresh, _Real}, {yresh, _Real}, {p, _Real, 2}}];

Timing[gdkernel[256, 256, 3, 10, 10]] // First

4.25 Second

Larger kernels are now no problem anymore, e.g. for 512²:


Timing[t = gdkernel[512, 512, 3, 2, 1]] // First

7.593 Second

We adopt this function for our final implementation. Because the output is a matrix of real
numbers, the Fourier transform is also very fast. This is the time needed for the Fast Fourier
Transform on the 512² kernel just generated:

Timing[Fourier[t]] // First

0.172 Second

To complete this section we present the final implementation, available throughout the book
in the context FEV`, which is loaded by default in each chapter.

In the compiled function also complex arrays emerge, such as the result of Fourier[] and
InverseFourier[]. The compiler is told by the declarations at the end that anything
with the name Fourier or InverseFourier working on something (_) should be
stored in a complex array with tensor rank 2, i.e. a 2D array. Study the rest of the details of
the implementation yourself:

Unprotect[gDf]; gDf[im_, nx_, ny_, σ_] :=
  Module[{}, {yres, xres} = Dimensions[im]; gf[im, nx, ny, σ, xres, yres]];

gf =
  Compile[{{im, _Real, 2}, {nx, _Integer}, {ny, _Integer}, σ, xres, yres},
   Module[{x, y}, xresh = (xres - 1)/2; yresh = (yres - 1)/2;
    p = RotateLeft[Table[Evaluate[
        1/(2 π σ^2) Exp[If[-(x^2 + y^2)/(2 σ^2) < -100, -100, -(x^2 + y^2)/(2 σ^2)]] *
         (-1/(σ Sqrt[2]))^(nx + ny) HermiteH[nx, x/(σ Sqrt[2])] HermiteH[ny, y/(σ Sqrt[2])]],
       {y, -yresh, yresh}, {x, -xresh, xresh}], {yres/2, xres/2}];
    Sqrt[xres yres] Chop[Re[InverseFourier[Fourier[im] Fourier[p]]]]],
   {{x, _Real}, {y, _Real}, {xresh, _Real}, {yresh, _Real}, {p, _Real, 2},
    {Fourier[_], _Complex, 2}, {InverseFourier[_], _Complex, 2}}];

5.8 Summary of this chapter

Mathematica is fast when:
- it can use its internal kernel routines as much as possible. They have been optimized for
speed and memory use;
- it can calculate on numerical data. Use the function N[...] to convert infinite precision
representations like Sin[3/7] to numerical data;
- it is working in the representation range of machine real numbers. Otherwise it enters the
arbitrary precision mode again;
- the function is compiled with the function Compile[...].
6. Differential structure of images

"If I had more time, I would have written you a shorter letter", Pascal (1623-1662)

6.1 The differential structure of images


In this chapter we will study the differential structure of discrete images in detail. This is the
structure described by the local multi-scale derivatives of the image. We start with the
development of a toolkit for the definitions of heightlines, local coordinate systems and
independence of our choice of coordinates.

<< FrontEndVision`FEV`; Off[General::spell];

Show[Import["Spiral CT abdomen.jpg"], ImageSize -> 170];

Figure 6.1 An example of a need for segmentation: 3D rendering of a spiral CT acquisition of


the abdomen of a patient with Leriche's syndrome (EuroRAD case #745, authors R. Brillo, A.
Napoli, S. Vagnarelli, M. Vendola, M. Benedetti Valentini, 2000, www.eurorad.org).

We will use the tools of differential geometry, a field designed for the structural description
of space and the lines, curves, surfaces etc. (a collection known as manifolds) that live there.

We develop strategies for the generation of formulas for the detection of particular features,
that detect special, semantically circumscribed, local meaningful structures (or properties) in
the image. Examples are edges, corners, T-junctions, monkey-saddles and many more. We
develop operational detectors in Mathematica for all features described.

One can discriminate local and multi-local methods in image analysis. We specifically
discuss here local methods, at a particular local neighborhood (pixel). In later chapters we
look at multi-local methods, and enter the realm of how to connect local features, both by
studying similarity in properties with neighboring pixels (’perceptual grouping’), relations

over scale (’deep structure’) and relations given by a particular model. We will discuss the use
of the local features developed in this chapter in 'geometric reasoning'.

Why do we need to go into detail about local image derivatives? Combinations of derivatives
into expressions give nice feature detectors in images. It is well known that
√((∂L/∂x)² + (∂L/∂y)²) is a good edge detector, and
(∂L/∂x)² ∂²L/∂y² − 2 (∂L/∂x)(∂L/∂y) ∂²L/∂x∂y + (∂L/∂y)² ∂²L/∂x²
is a good corner detector. But how do we come to such formulas? We can make an infinite
number of such expressions. What constraints can/should we impose to come to a reasonably
small set of basis descriptors? Is there such a basis? It turns out there is, and in this chapter
we will derive a formal complete set of such descriptive elements.
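As a preview, both detectors are immediately computable with the multi-scale derivatives of chapter 5. A hedged sketch (assumes gD from the FrontEndVision`FEV` context; the function name cornerness is ours):

```mathematica
(* Sketch: the corner detector Lx² Lyy - 2 Lx Ly Lxy + Ly² Lxx at scale σ. *)
(* Assumes gD[im, nx, ny, σ] from the FrontEndVision`FEV` context. *)
cornerness[im_, σ_] := Module[{lx, ly},
   lx = gD[im, 1, 0, σ]; ly = gD[im, 0, 1, σ];
   lx^2 gD[im, 0, 2, σ] - 2 lx ly gD[im, 1, 1, σ] + ly^2 gD[im, 2, 0, σ]];
im = Import["mr128.gif"][[1, 1]];
ListDensityPlot[cornerness[im, 2], ImageSize -> 170];
```

The chapters below show where such expressions come from; here they merely illustrate that every formula in this chapter is directly operational.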

A very important constraint in the development of tools for the description of image structure
is to be independent of the choice of coordinates. We will discuss coordinate
transformations, like translations, rotations, zooming, in order to find a way to detect features
invariant to such coordinate transformations. In fact, we will discuss three ’languages’ in
which it is easy to develop a general strategy to come up with quite complex image structure
detectors:

gauge coordinates, Cartesian tensors, and algebraic polynomial invariants. All these methods
have firm roots in mathematics, specifically differential geometry, and form an ideal
substrate for the true understanding of image structure.

We denote the function that describes our landscape (the image) with L(x, y) throughout this
book, where L is the physical property measured in the image. Examples of L are luminance,
T1 or T2 relaxation time (for MRI images), linear X-ray absorption coefficient (for CT
images), depth (for range images) etc. In fact, it can be any scalar value. The coordinates x, y
are discrete in our case, and denote the locations of the pixel. If the image is 3-dimensional,
e.g. a stack of images from an MRI or CT scanner, we write L(x, y, z). A scale-space of
images, observed at a range of scales σ, is written as L(x, y; σ). We write a semicolon as
separator to highlight the fact that σ is not just another spatial variable. If images are a
function of time as well, we write e.g. L(x, y, z; t) where t is the time parameter. In chapter
17 we will develop scale-space theory for images sampled over time. In chapter 15 we study
the extra dimension of color in images and derive differential features in color-space, and in
chapter 13 we derive methods for the extraction of motion, a vectorial property with a
magnitude and a direction. We first focus on static, spatial images.

6.2 Isophotes and flowlines


Lines in the image connecting points of equal intensity are called isophotes. They are the
heightlines of the intensity landscape when we consider the intensity as ’height’. Isophotes in
2D images are curves, and in 3D surfaces, connecting points with equal luminance.

(Greek: isos (ἴσος) = equal, photos (φωτός) = light): L(x, y) = constant or


L(x, y, z) = constant. This definition however is for a continuous function. But the scale-
space paradigm solves this: in discrete images isophotes exist because these are observed

images, and thus continuous (which means: infinitely differentiable, or C∞). Lines of
constant value in 2D are Contours in Mathematica, which can be plotted with
ContourPlot. Figure 6.2 illustrates this for a blurred version of a 2D image.

im = Import["mr128.gif"][[1, 1]];
Block[{$DisplayFunction = Identity, dp, cp},
  dp = ListDensityPlot[gD[im, 0, 0, #]] & /@ {1, 2, 3};
  cp = ListContourPlot[gD[im, 0, 0, #],
      ContourStyle -> List /@ Hue /@ (.1 Range[10])] & /@ {1, 2, 3};
  pa = MapThread[Show, {dp, cp}]]; Show[GraphicsArray[pa],
  ImageSize -> 400];

Figure 6.2 Isophotes of an image at various blurring scales: from left to right: σ = 1, σ = 2
and σ = 3 pixels. Image resolution 128². Ten isophotes are plotted in each image,
equidistant over the available intensity range. Each is shown in a different color,
superimposed over the grayvalues. Notice that the isophotes get more 'rounded' when we
blur the image. When we consider the intensity distribution of a 2D image as a landscape,
where the height is given by the intensity, isophotes are the heightlines.

Isophotes are important elements of an image. In principle, all isophotes together contain the
same information as the image itself. The famous and often surprisingly good working
segmentation method by thresholding and separating the image in pixels lying within or
without the isophote at the threshold luminance is an example of an important application of
isophotes. Isophotes have the following properties:

• isophotes are closed curves. Most (but not all, see below) isophotes in 2D images are a so-
called Jordan curve: a non-self-intersecting planar curve topologically equivalent to a circle;
• isophotes can intersect themselves. These are the critical isophotes. These always go
through a saddlepoint;
• isophotes do not intersect other isophotes;
• any planar curve is completely described by its curvature, and so are isophotes. We will
define and derive the expression for isophote curvature in the next section;
• isophote shape is independent of grayscale transformations, such as changing the contrast
or brightness of an image.
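The thresholding segmentation mentioned above is the simplest application of an isophote: all pixels with intensity above the threshold lie inside the isophote at that level. A minimal sketch (the threshold value is illustrative):

```mathematica
(* Sketch: segmentation by thresholding = interior of the isophote at t. *)
im = Import["mr128.gif"][[1, 1]];
t = 70;                                  (* illustrative threshold level *)
ListDensityPlot[UnitStep[im - t], ImageSize -> 170];
```

UnitStep is Listable, so the whole binary mask is produced in one expression.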

A special class of isophotes is formed by those isophotes that go through a singularity in the
intensity landscape, thus through a minimum, maximum or saddle point. At these places the
intensity landscape is horizontal: the local spatial derivatives are all zero. Only at saddle
points do isophotes intersect themselves, and just above and below this intersection the neighboring

isophotes have different topology: they have split from one curve into two, or merged from
two curves into one.

blob[x_, y_, μx_, μy_, σ_] :=
  Exp[-((x - μx)^2/(2 σ^2)) - (y - μy)^2/(2 σ^2)];
blobs[x_, y_] :=
  blob[x, y, 10, 10, 4] + .7 blob[x, y, 15, 20, 4] + 0.5 blob[x, y, 22, 8, 4];
Block[{$DisplayFunction = Identity}, p1 = Plot3D[blobs[x, y] - .00008,
    {x, 0, 30}, {y, 0, 30}, PlotPoints -> 30, Mesh -> False, Shading -> True];
  c = ContourPlot[blobs[x, y], {x, 0, 30}, {y, 0, 30},
    PlotPoints -> 30, ContourShading -> False];
  c3d = Graphics3D[Graphics[c][[1]] /.
     Line[pts_] :> (val = Apply[blobs, First[pts]];
       Line[Map[Append[#, val] &, pts]])]];
Show[p1, c3d, ViewPoint -> {1.393, 2.502, 1.114}, ImageSize -> 250];

Figure 6.3 Isophote on a 2D 'landscape' image of 3 Gaussian blobs, depicted as heightlines.


The height is determined by the intensity. The height plot is depicted slightly lower (-0.0002)
in order to show the full extent of the isophotes.

At a minimum or maximum the isophote has shrunk to a point, and going to higher or lower intensity gives rise to the creation or disappearance of isophotes. This is best illustrated with an example of an image where only three Gaussian 'blobs' are present (see figure 6.3). The saddle points are in between the blobs. Isophotes through saddles and extrema are called critical isophotes.

We show the dynamic event of a 'split' and a 'merge' of an isophote by the behaviour of a two-parameter family of curves, the Cassinian ovals: (x^2 + y^2 + a^2)^2 - b^2 - 4 a^2 x^2 = 0.

Famous members of the Cassini family are the circle (cassini[x,y,a=0,b]) and the lemniscate of Bernoulli (cassini[x,y,a=b,b]). The limaçon function, a generalization of the cardioid function, shows how we can get self-intersection where the new loop is formed within the isophote's inner domain. Here are the plots:

cassini[x_, y_, a_, b_] := (x^2 + y^2 + a^2)^2 - b^2 - 4 a^2 x^2;

DisplayTogetherArray[{
   ImplicitPlot[cassini[x, y, #, 4] == 0, {x, -5, 5}] & /@ {1.99, 2., 2.01},
   ParametricPlot[(2 Cos[t] + #) {Cos[t], Sin[t]}, {t, 0, 2 π}] & /@
    {3, 2., 1}}, ImageSize -> 400];

Figure 6.4 Top row: Split and merge of an isophote just under, at and above a saddle point in the image, simulated with a Cassini curve. Bottom row: Self-intersection with an inner loop, simulated with the limaçon function. Examples taken from the wonderful book by Alfred Gray [Gray1993].
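The split/merge behaviour can be checked with a quick numeric sanity test (a Python sketch, not from the book, assuming the b² normalization of the Cassinian ovals as given here): on the y axis (x = 0) the curve satisfies (y² + a²)² = b², i.e. y² = b − a², so a single connected oval crossing the y axis exists only while a² ≤ b. At b = 4 the oval pinches into the self-intersecting critical curve exactly at a = 2 and splits into two ovals for larger a, which matches the three plotted parameter values.

```python
import numpy as np

def cassini(x, y, a, b):
    # The same two-parameter family as above (the book's normalization).
    return (x**2 + y**2 + a**2)**2 - b**2 - 4 * a**2 * x**2

# y-axis crossing exists only while a^2 <= b; beyond that the curve
# no longer meets x = 0: it has split into two separate ovals.
b = 4
for a, crosses_y_axis in [(1.99, True), (2.0, True), (2.01, False)]:
    assert (b - a**2 >= 0) == crosses_y_axis
    if b - a**2 >= 0:
        yc = np.sqrt(b - a**2)          # the crossing point on the y axis
        assert abs(cassini(0.0, yc, a, b)) < 1e-12
```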

Isophotes in 3D are surfaces. Here is an example of the plotting of 4 isophote surfaces of a


discrete dataset. We use the versatile OpenGL viewer MathGL3d developed by Jens-Peer
Kuska: http://phong.informatik.uni-leipzig.de/~kuska/mathgl3dv3/

Get["MathGL3d`OpenGLViewer`"];
isos = Compile[{}, 10^3 Table[Exp[-x^2/18 - y^2/18 - z^2/18],
    {z, -10, 10}, {y, -10, 10}, {x, -10, 10}]];
MVListContourPlot3D[isos[], Contours -> {.1, 1, 10, 100}, ImageSize -> 150];

Figure 6.5 Isophotes in 3D are surfaces. Shown are the isophotes connecting all voxels with
the values 0.1, 1,10 and 100 in the discrete dataset of two neighboring 3D Gaussian blobs.

The calculations with the native command ListContourPlot3D take much longer.

Flowlines are the lines everywhere perpendicular to the isophotes. E.g. for a Gaussian blob the isophotes are circles, and the flowlines are radiating lines from the center. Flowlines are the integral curves of the gradient: the small gradient vectors in each point are integrated into a smooth long curve. In 2D, the flowlines and the isophotes together form a mesh or grid on the intensity surface.
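The integration just described can be sketched with a simple Euler scheme (Python/NumPy, not from the book; the blob parameters, starting point and step size are arbitrary choices): each step moves a small, fixed distance along the local gradient direction, and the accumulated path is a flowline.

```python
import numpy as np

def grad(p, sigma=4.0):
    # Analytic gradient of a Gaussian blob centered at the origin.
    L = np.exp(-(p @ p) / (2 * sigma**2))
    return -p / sigma**2 * L

# Euler integration of the gradient field, uphill, with unit-speed steps.
p = np.array([6.0, 2.0])
path = [p.copy()]
for _ in range(2000):
    g = grad(p)
    n = np.linalg.norm(g)
    if n < 1e-12:
        break
    p = p + 0.1 * g / n
    path.append(p.copy())

# For a single circular blob the flowline is a straight radial line
# running into the maximum: the path stays collinear with the starting
# direction and ends up at the extremum (the singular point).
d0 = path[0] / np.linalg.norm(path[0])
cr = d0[0] * path[20][1] - d0[1] * path[20][0]
assert abs(cr) < 1e-9
assert np.linalg.norm(p) < 0.2
```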

Figure 6.6 shows such a grid of the isophotes and flowlines of a 2D Gaussian blob (we have
left out the singularity).

DisplayTogether[
  ShadowPlot3D[-gauss[x, 5] gauss[y, 5], {y, -15, 15}, {x, -15, 15}],
  CartesianMap[Exp, {-π, π}, {-π, π}],
  ImageSize -> 200, AspectRatio -> 1];

Figure 6.6 Isophotes and flowlines on the slope of a Gaussian blob. The circles are the
isophotes, the flowlines are everywhere perpendicular to them. Inset: The height and
intensity map of the Gaussian blob.

Just as in principle all isophotes together completely describe the intensity surface, so does
the set of all flowlines. Flowlines are the dual of isophotes, isophotes are the dual of
flowlines. One set can be calculated from the other. Just as the isophotes have a singularity at
minima and maxima in the image, so have flowlines a singularity in direction in such points.

6.3 Coordinate systems and transformations


We will now apply the complete family of well behaving differential operators developed in the first chapter for the detection of local differential structure in images. The set of derivatives taken at a particular location is a language from which we can make a description of a local feature. We can make assemblies of the derivatives to any order, in any combination. Local structure is the local shape of the intensity landscape, like how sloped or curved it is, if there are saddlepoints, etc. The first order derivative gives us the slope, the second order is related to how curved the landscape is, etc.

In mathematical terms the image derivatives show up in the so-called Taylor expansion of
our image function.

The Taylor expansion describes the function 'a little further up': if we move a little distance (δx, δy) away from the pixel where we stand, the Taylor expansion - or Taylor series - is given by (we take the expansion in the origin (0, 0) for notational convenience):

L(δx, δy) = L(0, 0) + (∂L/∂x δx + ∂L/∂y δy) +
  1/2! (∂²L/∂x² δx² + 2 ∂²L/∂x∂y δx δy + ∂²L/∂y² δy²) +
  1/3! (∂³L/∂x³ δx³ + 3 ∂³L/∂x²∂y δx² δy + 3 ∂³L/∂x∂y² δx δy² + ∂³L/∂y³ δy³) + O(δx⁴, δy⁴)

We see all the partial derivatives appearing. The spatial derivatives are taken at the location (0, 0), e.g. ∂²L/∂x² |(0,0). The first-order, second-order and third-order terms are grouped in brackets. Such groups of all terms of a specific order together are called 'binary forms'. The list goes to infinity, so we have to cut it off somewhere. The above series is an approximation to the third order, and the final expression O(δx⁴, δy⁴) indicates that there is more: a rest term of order 4 and higher in δx and δy. Mathematica has the command Series to make a Taylor expansion. Here is the Taylor series for L(x, y) for δx to second order and then expanded to second order in δy:

Series[L[δx, δy], {δx, 0, 2}, {δy, 0, 2}]

(L[0, 0] + L^(0,1)[0, 0] δy + 1/2 L^(0,2)[0, 0] δy² + O[δy]³) +

(L^(1,0)[0, 0] + L^(1,1)[0, 0] δy + 1/2 L^(1,2)[0, 0] δy² + O[δy]³) δx +

(1/2 L^(2,0)[0, 0] + 1/2 L^(2,1)[0, 0] δy + 1/4 L^(2,2)[0, 0] δy² + O[δy]³) δx² + O[δx]³

This expansion says essentially that we get a good approximation of the intensity landscape a little bit (δx, δy) further away from the origin (0, 0), when we first climb up over δx and δy with a slope given by the first derivative, the tangent. Then we come close, but not exactly. We get a somewhat better approximation when we also include the second order derivative, indicating how curved our landscape is locally. Etc. Taking into account more and more higher order terms gives us a better approximation, and finally with the infinite series we have an exact description.
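The claim that each further order improves the approximation can be checked numerically. Below is a sketch (Python, not from the book) with the derivatives of a hand-picked test function written out analytically; the displacement and the expansion point are arbitrary choices.

```python
import numpy as np

# Taylor approximation of L(x, y) = sin(x) cos(y) around (x0, y0);
# the derivatives of this test function are written out analytically.
x0, y0 = 0.3, 0.2
L = lambda x, y: np.sin(x) * np.cos(y)
Lx, Ly = np.cos(x0) * np.cos(y0), -np.sin(x0) * np.sin(y0)
Lxx = -np.sin(x0) * np.cos(y0)
Lxy = -np.cos(x0) * np.sin(y0)
Lyy = -np.sin(x0) * np.cos(y0)

dx, dy = 0.05, -0.04
exact = L(x0 + dx, y0 + dy)
order1 = L(x0, y0) + Lx * dx + Ly * dy
order2 = order1 + 0.5 * (Lxx * dx**2 + 2 * Lxy * dx * dy + Lyy * dy**2)

# Each extra order of the expansion brings us closer to the true value.
err1, err2 = abs(exact - order1), abs(exact - order2)
assert err2 < err1 < 1e-2
```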
Our most important constraint for a good local image descriptor comes from the requirement that we want to be independent of our choice of coordinates. The coordinate system used the most is the Cartesian coordinate system (invented by and named after Descartes, a brilliant French mathematician from the 17th century): this is our familiar orthogonal (x, y) or (x, y, z) coordinate system.

But it should not matter if we describe our local image structure in another coordinate system
like a polar, cylindrical or rotated or translated version of our Cartesian coordinate system.
Because the Cartesian system is the easiest to understand, we will deal only with changes in
this coordinate system. The frame of the coordinate system is formed by the unit vectors
pointing in the respective dimensions. What changes could occur to a coordinate system? Of
course any modification is possible. We will focus on the change of orientation (rotation of
the axes frame), translation (x and/or y shift of the axes frame), and zoom (multiplication of
the length of the units along the axes with some factor).

The shear transformation (where the axes are no longer orthogonal) will not be discussed
here; we limit ourselves to changes of the coordinates where they remain orthogonal.

DisplayTogetherArray[
  Show[Graphics[{Arrow[{0, 0}, #] & /@ {{1, 0}, {0, 1}},
     Red, PointSize[.04], Point[{.4, .6}]}],
   Frame -> True, Axes -> True, AspectRatio -> 1],
  Show[Graphics3D[{arrow3D[{0, 0, 0}, #, True] & /@ {{1, 0, 0}, {0, 1, 0},
       {0, 0, 1}}, Red, PointSize[.04], Point[{.4, .6, .7}]}],
   Boxed -> True, BoxRatios -> {1, 1, 1}, Axes -> True], ImageSize -> 250];


Figure 6.7 Use of graphics primitives in Mathematica: the coordinate unit vectors in 2D and
3D.

We call all the possible instantiations of a transformation the transformation group. So all
rotations form the rotational group, the group of translations is formed by all translations. We
now consider the transformation of the frame vectors.

Mathematically, the operation of a transformation is described by a matrix, the transformation matrix. E.g. rotation of a vector over an angle φ is described by the rotation matrix in 2D:

RotationMatrix2D[φ] // MatrixForm

( Cos[φ]   Sin[φ]
 -Sin[φ]   Cos[φ] )

The angle φ is defined as clockwise for the positive direction. In 3D it gets a little more complicated, as we have three angles to rotate over (these are called the 'Euler' angles):

RotationMatrix3D[ψ, θ, φ]

{{Cos[φ] Cos[ψ] - Cos[θ] Sin[φ] Sin[ψ],
  Cos[θ] Cos[ψ] Sin[φ] + Cos[φ] Sin[ψ], Sin[θ] Sin[φ]},
 {-Cos[ψ] Sin[φ] - Cos[θ] Cos[φ] Sin[ψ],
  Cos[θ] Cos[φ] Cos[ψ] - Sin[φ] Sin[ψ], Cos[φ] Sin[θ]},
 {Sin[θ] Sin[ψ], -Cos[ψ] Sin[θ], Cos[θ]}}

In general a transformation is described by a set of equations:

x'1 = f1(x1, x2, ..., xn)
  ⋮
x'n = fn(x1, x2, ..., xn)

When we transform a space, the volume often changes, and the density of the material inside is distributed over a different volume. To study the change of a small volume we need to consider ∂x'/∂x, which is the matrix of first order partial derivatives.

We have

J = ∂x'/∂x = ( ∂x'1/∂x1 ... ∂x'1/∂xn
                  ⋮       ⋱      ⋮
               ∂x'n/∂x1 ... ∂x'n/∂xn )

This matrix is called the Jacobian matrix, named after Carl Jacobi (1804-1851), a Prussian mathematician. The Jacobian can be computed in Mathematica with

jacobianmatrix[functions_List, variables_List] :=
  Outer[D, functions, variables]

If we consider the change of the infinitesimally small volume
dx'1 dx'2 ... dx'n = |∂x'/∂x| dx1 dx2 ... dxn, we see that the determinant of the Jacobian matrix (also called the Jacobian) is the factor which corrects for the change in volume. When the Jacobian is unity, we call the transformation a special transformation.

The transformation in matrix notation is expressed as x' = A x, where x' is the transformed vector, x is the input vector, and

A = ( a11 ... a1n
       ⋮   ⋱   ⋮
      an1 ... ann )

is the transformation matrix. When the coefficients of A are constant, we have a linear transformation, often called an affine transformation. In Mathematica (note the Dot (.) product between the matrix and the vector):

Clear[x, y]; A = {{a11, a12}, {a21, a22}}; x = {x1, x2};
x' = A.x
Det[jacobianmatrix[A.x, x]]

{a11 x1 + a12 x2, a21 x1 + a22 x2}

-a12 a21 + a11 a22
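The same can be illustrated numerically. The sketch below (Python/NumPy, not from the book; the helper name `jacobian` is our own) recovers the Jacobian matrix of a rotation map by central differences and confirms that its determinant is 1, i.e. a rotation is a special transformation that preserves volume.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    # Matrix of first order partial derivatives, by central differences.
    x = np.asarray(x, dtype=float)
    n = len(x)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

# A rotation over phi is a linear map; its Jacobian is the rotation
# matrix itself, and the determinant ("the Jacobian") equals 1.
phi = 0.7
R = np.array([[np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])
J = jacobian(lambda p: R @ p, [1.0, 2.0])
assert abs(np.linalg.det(J) - 1.0) < 1e-8
```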

Task 6.1 Show that the Jacobians of the transformation matrices RotationMatrix2D[φ] and RotationMatrix3D[ψ, θ, φ] are unity.

A rotation matrix that rotates over zero degrees is the identity matrix, or the symmetric tensor or δ-operator:

δ = RotationMatrix2D[0]; δ // MatrixForm

(1 0
 0 1)

and the matrix that rotates over 90 degrees (π/2 radians) is called the antisymmetric tensor, the ε-operator or the Levi-Civita tensor:

ε = RotationMatrix2D[π/2]; ε // MatrixForm

( 0 1
 -1 0)

Let us study an example of a rotation: a unit vector under 45° is rotated over 110° clockwise:

v = {1/Sqrt[2], 1/Sqrt[2]}; v' = RotationMatrix2D[110 π/180].v // N

{0.422618, -0.906308}

Show[Graphics[{Arrow[{0, 0}, #] & /@ {v, v'}, Text["v", {.8, .8}],
   Text["v'", {.55, -.8}]}], PlotRange -> {{-1, 1}, {-1, 1}},
  Frame -> True, Axes -> True, AspectRatio -> 1, ImageSize -> 100];


Figure 6.8 The vector v' is the result of the action of the rotation matrix operator on the vector v.

What we want is invariance under the transformations of translation and rotation. A function is said to be invariant under a group of transformations if the transformation has no effect on the value of the function. The only geometrical entities that make physical sense are invariants. In the words of Hermann Weyl: "any invariant has a specific meaning", and as such they are widely studied in computer vision theories.

An example: the derivative with respect to x is not invariant to rotation; if we rotate the coordinate system, or the image, we get in general a completely different value for the derivative at that point. The same applies to the derivative with respect to y. However, the combination √((∂L/∂x)² + (∂L/∂y)²) is invariant, as can be seen from the following. We denote derivatives with a lower index: Lx ≡ ∂L/∂x. The length of the gradient vector (Lx, Ly) is the scalar

√({Lx, Ly}.{Lx, Ly})

√(Lx² + Ly²)

We used here again the Dot (.) product of vectors. When we now rotate each vector (Lx, Ly) with the rotation matrix over an arbitrary angle φ, we get

√((RotationMatrix2D[φ].{Lx, Ly}).(RotationMatrix2D[φ].{Lx, Ly}))

√((Ly Cos[φ] - Lx Sin[φ])² + (Lx Cos[φ] + Ly Sin[φ])²)

Simplify[%]

√(Lx² + Ly²)

Invariance is proved for this case. Invariants are so important that the lower-order ones have a name. E.g. the scalar √((∂L/∂x)² + (∂L/∂y)²) is called the gradient magnitude, and the vector operator ∇ = {∂/∂x, ∂/∂y} is called the nabla operator. So ∇L is the gradient of L, and ∇.(∇L) = ∂²L/∂x² + ∂²L/∂y² is called the Laplacian. Note that the gradient of the gradient,

∇(∇L) = ( ∂²L/∂x²    ∂²L/∂x∂y
          ∂²L/∂x∂y   ∂²L/∂y²  ),

is the matrix of second order derivatives, or the Hessian matrix (this is not an invariant).
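These invariance claims are easy to verify numerically. The sketch below (Python, finite differences on an arbitrary test function; not from the book) shows that a bare derivative Lx changes under a rotation of the coordinate frame, while the gradient magnitude and the Laplacian do not.

```python
import numpy as np

def derivs(f, x, y, h=1e-4):
    # First and second order derivatives at (x, y) by central differences.
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fx, fy, fxx, fyy

L = lambda x, y: np.sin(1.3 * x) * np.cos(0.7 * y) + 0.2 * x * y

# The same landscape described in a coordinate frame rotated over phi:
phi = 0.5
M = lambda u, v: L(np.cos(phi) * u - np.sin(phi) * v,
                   np.sin(phi) * u + np.cos(phi) * v)

# The point (x, y) has coordinates (u, v) in the rotated frame.
x, y = 0.4, -0.3
u = np.cos(phi) * x + np.sin(phi) * y
v = -np.sin(phi) * x + np.cos(phi) * y

Lx, Ly, Lxx, Lyy = derivs(L, x, y)
Mu, Mv, Muu, Mvv = derivs(M, u, v)

assert abs(Lx - Mu) > 1e-3                               # a bare derivative changes
assert abs(np.hypot(Lx, Ly) - np.hypot(Mu, Mv)) < 1e-6   # gradient magnitude
assert abs((Lxx + Lyy) - (Muu + Mvv)) < 1e-4             # Laplacian
```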

Task 6.2 Show that the Laplacian is an invariant under rotation, in 2D and 3D.

In the sequel, we will only consider orthonormal transformations. These are also called Euclidean transformations. Orthonormal transformations are special orthogonal transformations (the Jacobian is unity). With orthogonal transformations the orthogonality of the coordinate frame is preserved. An orthonormal transformation preserves lengths of vectors and angles between vectors, i.e. it preserves a symmetric inner product ⟨x, y⟩. When T is the orthogonal transformation, this means that ⟨x, y⟩ = ⟨Tx, Ty⟩.

The transformation matrix of an orthogonal transformation is an orthogonal matrix. Orthogonal matrices have the nice property that they are always invertible, as the inverse of an orthogonal matrix is equal to its transpose: A⁻¹ = Aᵀ. A matrix m can be tested to see if it is orthogonal using

OrthogonalQ[m_List?MatrixQ] :=
  (Transpose[m].m == IdentityMatrix[Length[m]]);
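An equivalent test in Python/NumPy (a sketch, not from the book) uses the same criterion, and also checks the transpose-is-inverse property:

```python
import numpy as np

def orthogonal_q(m, tol=1e-10):
    # True when m^T . m equals the identity matrix.
    m = np.asarray(m, dtype=float)
    return np.allclose(m.T @ m, np.eye(m.shape[1]), atol=tol)

phi = 0.3
R = np.array([[np.cos(phi), np.sin(phi)],
              [-np.sin(phi), np.cos(phi)]])

assert orthogonal_q(R)                       # a rotation is orthogonal
assert not orthogonal_q([[1.0, 1.0],
                         [0.0, 1.0]])        # a shear is not
assert np.allclose(np.linalg.inv(R), R.T)    # A^-1 equals A^T
```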

Of course, there are many groups of transformations that can be considered, such as projective transformations (projecting a 3D world onto a 2D surface). In biomedical imaging mostly orthogonal transformations are encountered, and these will be the emphasis of the rest of this chapter.

Notice that with invariance we mean invariance for the transformation (e.g. rotation) of the
coordinate system, not of the image. The value of the local invariant properties is also the
same when we rotate the image. There is however an important difference between image
rotation, and coordinate rotation. We specifically mean here the local independence of
rotation, for that particular point. See also figure 6.9. If we study the rotation of the whole
image, we apply the same rotation to every pixel.

Here, we want in every point a description which is independent of the rotation of the local coordinates, so we may as well rotate our coordinates in every pixel differently. Invariance for rotation in this way means something different than a rotation of the image. There would be no way otherwise to recognize rotated images from non-rotated ones!

Show[Import["Thatcher illusion.jpg"], ImageSize -> 330];

Figure 6.9 The "Thatcher illusion", created by P. Thompson [Thompson1980], shows that local rotations of image patches are radically different from the local coordinate rotation invariance, and that we are not used to (i.e. have no associative set in our memory for) sights that we seldom see: faces upside down. Rotate the images 180 degrees to see the effect.

In particular, we will see that specific scalar combinations of local derivatives give
descriptions of local image structure invariant under a Euclidean transformation.

6.4 Directional derivatives

The directed first order nabla operator is given in 2D by v.∇, where v is a unit vector pointing in the specific direction. v.∇ is called the directional derivative. Let us consider some examples. We calculate the directional derivative for v = {-½√2, -½√2} and v = {√3/2, 1/2}:

im = Import["mip147.gif"][[1, 1]];

northeast[im_, σ_] := {-Sqrt[2]/2, -Sqrt[2]/2}.{gD[im, 1, 0, σ], gD[im, 0, 1, σ]};
southsouthwest[im_, σ_] :=
  {Sqrt[3]/2, 1/2}.{gD[im, 1, 0, σ], gD[im, 0, 1, σ]};
DisplayTogetherArray[ListDensityPlot /@
  {im, northeast[im, 1], southsouthwest[im, 1]}, ImageSize -> 300];

Figure 6.10 Directional derivatives. Image from the Eurorad database (www.eurorad.org),
case 147.
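In Python, an analogue of the book's Gaussian derivative operator gD[im, nx, ny, σ] can be sketched with SciPy's gaussian_filter, whose `order` parameter selects the derivative order per axis (assuming SciPy is available; the function names below are our own, not the book's). On a ramp rising in x with slope 1 the directional derivative simply equals the x component of the direction vector.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gD(im, nx, ny, sigma):
    # Gaussian derivative of order nx in x and ny in y at scale sigma
    # (array axis 0 is y, axis 1 is x).
    return gaussian_filter(np.asarray(im, dtype=float), sigma, order=(ny, nx))

def directional(im, v, sigma):
    # Directional derivative v . (Lx, Ly) at scale sigma.
    return v[0] * gD(im, 1, 0, sigma) + v[1] * gD(im, 0, 1, sigma)

# A ramp rising in x: derivative 1 along x, 0 along y, cos(angle) in between.
y, x = np.mgrid[0:64, 0:64].astype(float)
d = directional(x, (np.sqrt(3) / 2, 1 / 2), sigma=2)
dy = directional(x, (0.0, 1.0), sigma=2)
assert abs(d[32, 32] - np.sqrt(3) / 2) < 5e-3
assert abs(dy[32, 32]) < 1e-8
```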

6.5 First order gauge coordinates


We introduce the notion of intrinsic geometry: we would like to have every point described in such a way that if we have the same structure, or local landscape form, no matter the rotation, the description is always the same. This can be accomplished by setting up in each point a dedicated coordinate frame which is determined by some special local directions, given by the local landscape itself.

Consider yourself an ant on a surface: you can only see the direct vicinity, so the world looks locally very simple. We now fix in each point separately our local coordinate frame in such a way that one frame vector points in the direction of maximal change of the intensity, and the other perpendicular to it (90 degrees clockwise). The direction of maximal change of intensity is just the gradient vector w = (∂L/∂x, ∂L/∂y). The perpendicular direction is

v = ( 0  1
     -1  0 ).w = (∂L/∂y, -∂L/∂x).

We can check: if we are on a slope going up in the y-direction only (the 'Southern' slope of a hill), we have as gradient {0, ∂L/∂y}, because in the x-direction the slope is horizontal.

ContourPlot[x^2 + y^2, {y, 2, 4.5},
  {x, 2, 4.5}, Contours -> Range[2, 100, 4], Epilog ->
   {PointSize[.02], Point[{3, 3}], Arrow[{3, 3}, {3 + .5 Sqrt[2], 3 - .5 Sqrt[2]}],
    Arrow[{3, 3}, {3 + .5 Sqrt[2], 3 + .5 Sqrt[2]}], Text["v", {3.8, 2.2}],
    Text["w", {3.8, 3.8}]}, Frame -> False, ImageSize -> 100]

Figure 6.11 Local first order gauge coordinates {v, w}. The unit vector v is everywhere tangential to the isophote (line of constant intensity), the unit vector w is everywhere perpendicular to the isophote and points in the direction of the gradient vector.

We have now fixed locally the direction for our new intrinsic local coordinate frame (v, w). This set of local directions is called a gauge, the new frame forms the gauge coordinates, and fixing the frame vectors with respect to the gradient direction w is called: fixing the gauge. Because we discuss first order derivatives here, we call this a first order gauge. We can also derive a second order gauge from second order local differential structure, as we will see later.

We want to take derivatives with respect to the gauge coordinates.

As they are fixed to the object, no matter any rotation or translation, we have the following
very useful result:

any derivative expressed in gauge coordinates is an orthogonal invariant. E.g. it is clear that ∂L/∂w is the derivative in the gradient direction, and this is just the gradient itself, an invariant. And ∂L/∂v ≡ 0, as there is no change in the luminance as we move tangentially along the isophote, and we have chosen this direction by definition.

From the derivatives with respect to the gauge coordinates, we always need to go to Cartesian coordinates in order to calculate the invariant properties on a computer. The transformation from the (v, w) frame to the Cartesian (x, y) frame is done by implementing the definition of the directional derivatives. Important is that first a directional partial derivative (to whatever order) is calculated with respect to a frozen gradient direction. We call this direction (Lx, Ly). Then the formula is calculated which expresses the gauge derivative in this direction, and finally the frozen direction is filled in from the calculated gradient.

In Mathematica: the frame vectors w and v are defined as

w = 1/Sqrt[Lx^2 + Ly^2] {Lx, Ly}; v = {{0, 1}, {-1, 0}}.w;

The directional differential operators v.∇ = ∂/∂v and w.∇ = ∂/∂w are defined as:

v.{∂x #, ∂y #} &;
w.{∂x #, ∂y #} &;

The notation (... #) & is a 'pure function' on the argument #, e.g. (#^2 + #^5) & gives the sum of the second and fifth power of some argument #, and D[#, x] & (or equivalently (∂x #) &) takes the derivative of the variable # with respect to x (look in the Help browser under Function for more examples). So the construct of a pure function is the construct for an operator. This pure function can be applied to an argument by the familiar square brackets, e.g.

(#^2 + #^5) &[zz]

zz² + zz⁵

Higher order derivatives are constructed through nesting multiple first order derivatives, as many as needed. The total transformation routine is now:

Clear[f, L, Lx, Ly]; Unprotect[gauge2D];

gauge2D[f_, nv_ /; nv ≥ 0, nw_ /; nw ≥ 0] :=
  Module[{Lx, Ly, v, w}, w = {Lx, Ly}/Sqrt[Lx^2 + Ly^2];
   v = {{0, 1}, {-1, 0}}.w;
   Simplify[
    Nest[(v.{D[#1, x], D[#1, y]} &), Nest[(w.{D[#1, x], D[#1, y]} &),
      f, nw], nv] /. {Lx -> D[f, x], Ly -> D[f, y]}]];

where f is a symbolic function of x and y, and nw and nv are the orders of differentiation with respect to w resp. v. Here is an example of its output: the gradient ∂L/∂w:

Lw = gauge2D[L[x, y], 0, 1]

√(L^(0,1)[x, y]² + L^(1,0)[x, y]²)

Using pattern matching with the function shortnotation we get more readable output:

Lw = gauge2D[L[x, y], 0, 1] // shortnotation

√(Lx² + Ly²)

Lww = gauge2D[L[x, y], 0, 2] // shortnotation

(Lx² Lxx + 2 Lx Lxy Ly + Ly² Lyy) / (Lx² + Ly²)

Lv = gauge2D[L[x, y], 1, 0] // shortnotation

0

As expected, because it is exactly what we put into the definition of ∂L/∂v: it is the differentiation in the direction perpendicular to the gradient, so along the tangential direction of the isophote, and in this direction there is no change of the intensity function. But

Lvv = gauge2D[L[x, y], 2, 0] // shortnotation

(-2 Lx Lxy Ly + Lxx Ly² + Lx² Lyy) / (Lx² + Ly²)

is not zero, because it is constructed by first applying the directional derivative twice, and
then fixing the gauge.

This calculates the Laplacian in gauge coordinates, Lvv + Lww (what do you expect?):

gauge2D[L[x, y], 0, 2] + gauge2D[L[x, y], 2, 0] // shortnotation

Lxx + Lyy

Task 6.3 Show and explain that in the definition of the function gauge2D we cannot define w = {∂x L, ∂y L}. We need to have the direction of the gauge fixed while computing the compound formula. Why?

The next figure shows the {v, w} gauge frame in every pixel of a simple 30² image with 3 blobs:

blob[x_, y_, μx_, μy_, σ_] := 1/(2 π σ^2) Exp[-((x - μx)^2 + (y - μy)^2)/(2 σ^2)];
blobs[x_, y_] :=
  blob[x, y, 10, 10, 4] + .7 blob[x, y, 15, 20, 4] + 0.8 blob[x, y, 22, 8, 4];
im = Table[blobs[x, y], {y, 30}, {x, 30}];
Block[{$DisplayFunction = Identity, gradient, norm, frame},
  norm = (#/Sqrt[#.#]) &;
  σ = 1; gradient = Map[norm,
    Transpose[{gD[im, 1, 0, σ], gD[im, 0, 1, σ]}, {3, 2, 1}], {2}];
  frame = Graphics[{White, Arrow[#2 - .5, #2 - .5 + #1], Red,
      Arrow[#2 - .5, #2 - .5 + {#1[[2]], -#1[[1]]}]}] &;
  ar = MapIndexed[frame, gradient/2, {2}];
  lp = ListDensityPlot[gD[im, 0, 0, σ]]];

Show[{lp, ar}, Frame -> True, ImageSize -> 410];

Figure 6.12 The gauge frame {w, v} given for every pixel in a 30² image of three Gaussian blobs. The gradient direction w, calculated at a scale of σ = 1 pixel, is indicated in white, and points always to higher intensity. These vectors are (defined as) everywhere perpendicular to the isophotes and tangential to the flowlines, and they always point to extrema and saddle points. The v frame vectors (in red) are rotated π/2 radians clockwise; they encircle the extrema, (defined as) tangential to the isophotes. (The boundary effects, most notably on the right, are due to the cyclic interpretation of the gradient calculation, which causes the image to be interpreted as infinitely repeated in all directions: the gradient direction changes over π; no artefact, but now well understood.)

The gauge coordinates are not defined at 'horizontal points' in the intensity landscape, i.e. locations where Lx² + Ly² = 0, as is clear from the definition of the gauge coordinates. This occurs in saddle points and extrema (minima and maxima) of the intensity landscape, where both Lx = 0 and Ly = 0. In practice however this is not a problem: we have a finite number of such points, typically just a few, and we know from Morse theory that we can get rid of such a singularity by an infinitesimally small local change in the intensity landscape.

Due to the fixing of the gauge by removing the degree of freedom for rotation (that is why Lv ≡ 0), we have an important result: every derivative to v and w is an orthogonal invariant.

In other words: it is an invariant property where translation and/or rotation of the coordinate
frame is irrelevant. This also means that polynomial combinations of these gauge derivative
terms are invariant. We now have the toolkit to make invariants expressed in gauge
derivatives to any order.

Here are a few other differential invariants of the image, which are now easily constructed:

gauge2D[L[x, y], 4, 0] // shortnotation

(-4 Lx³ Lxyyy Ly + 6 Lx² Lxxyy Ly² - 4 Lx Lxxxy Ly³ + Lxxxx Ly⁴ + Lx⁴ Lyyyy) / (Lx² + Ly²)²

gauge2D[L[x, y], 2, 1] // shortnotation

(Lx Ly² Lxxx + (Ly³ - 2 Lx² Ly) Lxxy + (Lx³ - 2 Lx Ly²) Lxyy + Lx² Ly Lyyy) / (Lx² + Ly²)^(3/2)

In conclusion of this section, we have found a complete family of differential invariants, that
are invariant for rotation and translation of the coordinate frame. They are called differential
invariants, because they consist of polynomials with as coefficients partial derivatives of the
image. In the next section we discuss some important members of this family. Only the
lowest order invariants have a name, the higher orders become more and more exotic.

The final step is the operational implementation of the gauge derivative operators for discrete images. This is simply done by applying pattern matching:
- first calculate the symbolic expression
- then replace any derivative with respect to x and y by the numerical derivative gD[im, nx, ny, σ]
- and then insert the pixeldata in the resulting polynomial function;
as follows:

Unprotect[gauge2DN];
gauge2DN[im_, nv_, nw_, σ_ /; σ > 0] :=
  Module[{im0}, gauge2D[L[x, y], nv, nw] /.
     Derivative[nx_, ny_][L_][x_, y_] -> gD[im0, nx, ny, σ] /. im0 -> im];

This writes our numerical code automatically. Here is the implementation for Lvv. If the image is not defined, we get the formula returned:

Clear[im, σ]; gauge2DN[im, 2, 0, 2]

(gD[im, 0, 2, 2] gD[im, 1, 0, 2]² -
  2 gD[im, 0, 1, 2] gD[im, 1, 0, 2] gD[im, 1, 1, 2] +
  gD[im, 0, 1, 2]² gD[im, 2, 0, 2]) / (gD[im, 0, 1, 2]² + gD[im, 1, 0, 2]²)

If the image is available, the invariant property is calculated in each pixel:

im = Import["thorax02.gif"][[1, 1]];

DisplayTogetherArray[
  ListDensityPlot /@ {im, -gauge2DN[im, 0, 1, 1], -gauge2DN[im, 2, 0, 4]},
  ImageSize -> 400];

Figure 6.13 The gradient Lw (middle) and Lvv, the second order directional derivative in the
direction tangential to the isophote (right) for a 2562 X-thorax image at a small scale of 0.5
pixels. Note the shadow of the coins in the pocket of his shirt in the lower right.
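The numeric gauge derivatives can also be sketched directly in Python/SciPy by translating the Cartesian formulas for Lw and Lvv derived above (a sketch, not the book's implementation; the small eps in the denominator is our own guard for the singular points where the gradient vanishes). Since both expressions are rotation invariants, a good sanity check is that they commute with the exact 90-degree rotations a pixel grid admits.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gD(im, nx, ny, sigma):
    # Gaussian derivative at scale sigma (array axis 0 is y, axis 1 is x).
    return gaussian_filter(np.asarray(im, dtype=float), sigma, order=(ny, nx))

def Lw(im, sigma):
    # First order gauge derivative: the gradient magnitude.
    return np.sqrt(gD(im, 1, 0, sigma)**2 + gD(im, 0, 1, sigma)**2)

def Lvv(im, sigma, eps=1e-12):
    # Second order derivative along the isophote tangent direction v.
    Lx, Ly = gD(im, 1, 0, sigma), gD(im, 0, 1, sigma)
    Lxx, Lxy, Lyy = gD(im, 2, 0, sigma), gD(im, 1, 1, sigma), gD(im, 0, 2, sigma)
    return (Lxx * Ly**2 - 2 * Lx * Lxy * Ly + Lx**2 * Lyy) / (Lx**2 + Ly**2 + eps)

# Rotating the image, computing the feature, and rotating back gives the
# same result as computing the feature directly: a rotation invariant.
rng = np.random.default_rng(0)
im = gaussian_filter(rng.standard_normal((64, 64)), 2)
for feature in (Lw, Lvv):
    assert np.allclose(np.rot90(feature(np.rot90(im, -1), 3)), feature(im, 3))
```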

6.6 Gauge coordinate invariants: examples


6.6.1 Ridge detection

Lvv is a good ridge detector, since at ridges the curvature of isophotes is large (see figure 6.13).

f[x_, y_] := (Sin[x] + 1/3 Sin[3 x]) (1 + .1 y);
DisplayTogetherArray[{Plot3D[f[x, y], {x, 0, 3 π}, {y, 0, 3 π}],
   ContourPlot[f[x, y], {x, 0, 3 π}, {y, 0, 3 π}, PlotPoints -> 50]},
  ImageSize -> 370];

Figure 6.14 Isophotes are much more curved at the top of ridges and valleys than along the slopes. Left: a slightly sloping artificial intensity landscape with two ridges and a valley; at right the contours as isophotes.

Let us test this on an X-ray image of fingers and calculate Lvv at scale σ = 3.

im = Import["hands.gif"][[1, 1]]; Lvv = gauge2DN[im, 2, 0, 3];

DisplayTogetherArray[ListDensityPlot /@ {im, Lvv}, ImageSize -> 450];

Figure 6.15 The invariant feature Lvv is a ridge detector. Here applied on an X-ray of two hands at σ = 3 pixels. Image resolution: 361 x 239 pixels.

Task 6.4 Study the ridges Lvv of the fingers at different scales, and note the scale-dependent interpretation.

Noise has structure too. Here are the ridges of uniform white noise:

im = Table[Random[], {128}, {256}];

ListDensityPlot[gauge2DN[im, 2, 0, 4]];

Figure 6.16 The invariant feature Lvv detects the ridges in white noise here, σ = 4 pixels, image resolution: 256 x 128 pixels.

Task 6.5 Study in the same way the gradient of white noise at a range of scales. Do you see the similarity with a brain surface at larger scales?

We will encounter the second order gauge derivative Lvv in chapter 19 in the 'fundamental' equation of Alvarez et al. [Alvarez1992a, Alvarez1993], a nonlinear (geometry driven) diffusion equation: ∂L/∂t = Lvv.

This equation is used to evolve the image in a way that locally adapts the amount of blurring
to differential invariant structure in the image in order to do e.g. edge-preserving smoothing.
We discuss this in detail in chapter 21.

Detection of ridges is an active topic in multi-scale feature detection [Koenderink1993a, Maintz1996a, Eberly1993, Eberly1994, Eberly1994a, Eberly1994b, Damon1999, Lindeberg1998b, López1999], as it focuses on the dual of boundaries.

6.6.2 Isophote and flowline curvature in gauge coordinates

The derivation of the formula for isophote curvature is particularly easy when we express the problem in gauge coordinates. Isophote curvature κ is defined as the change w″ = ∂²w/∂v² of the tangent vector w′ = ∂w/∂v in the gradient-gauge coordinate system. The definition of an isophote is: L(v, w) = Constant, with w = w(v). So, in Mathematica we implicitly differentiate the equality (==) to v:

L[v, w[v]] == Constant;
v =.; w =.; D[L[v, w[v]] == Constant, v]

w′[v] L^(0,1)[v, w[v]] + L^(1,0)[v, w[v]] == 0

We know that Lv ≡ 0 by definition of the gauge coordinates, so w' = 0, and the curvature
κ = w'' is found by differentiating the isophote equation again and solving for w'':

κ = w''[v] /. Solve[D[L[v, w[v]] == Constant, {v, 2}] /. w'[v] -> 0, w''[v]]

{-L^(2,0)[v, w[v]] / L^(0,1)[v, w[v]]}
So κ = -Lvv/Lw. In Cartesian coordinates we recognize the well-known formula:

im =.; κ = -gauge2D[L[x, y], 2, 0] / gauge2D[L[x, y], 0, 1] // shortnotation

(2 Lx Lxy Ly - Lxx Ly² - Lx² Lyy) / (Lx² + Ly²)^(3/2)
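This Cartesian form is easy to check numerically. The following sketch is in Python rather than the book's Mathematica, and uses plain central finite differences instead of Gaussian derivative operators; the test landscape L(x, y) = √(x² + y²) is an arbitrary choice whose isophotes are circles of radius r, so the curvature magnitude should be 1/r:

```python
import math

def L(x, y):
    # Synthetic intensity landscape: the isophote through (x, y) is a
    # circle of radius r = sqrt(x^2 + y^2), so |kappa| should be 1/r.
    return math.hypot(x, y)

def isophote_curvature(f, x, y, h=1e-3):
    # Central finite differences for the first and second derivatives.
    Lx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    Ly  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    Lxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    Lyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    Lxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    # kappa = -Lvv/Lw in Cartesian form:
    return (2 * Lx * Lxy * Ly - Lxx * Ly**2 - Lx**2 * Lyy) \
           / (Lx**2 + Ly**2)**1.5

# At (3, 4) the isophote is a circle of radius 5, so kappa is -1/5
# (negative with this sign convention, since the gradient points outward).
print(isophote_curvature(L, 3.0, 4.0))
```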

Here is an example of the isophote curvature at a range of scales for a sagittal MR image:

im = Import["mr256.gif"][[1, 1]];

κplot = ListDensityPlot[-gauge2DN[im, 2, 0, #] / gauge2DN[im, 0, 1, #],
    PlotRange -> {-5, 5}] &;

DisplayTogetherArray[
  {ListDensityPlot[im], κplot[1], κplot[2], κplot[3]}, ImageSize -> 470];

Figure 6.17 The isophote curvature κ is a rotationally and translationally invariant feature. It
takes high values at extrema. Image resolution: 256² pixels.

The extreme low and high values are due to the singularities that occur at
intensity extrema, where the gradient Lw = 0.

Task 6.6 Why is infinite isophote curvature not encountered in any pixel?
There are many maxima and minima in the image.

López et al. [López2000b] defined a robust multi-scale version of a local curvature measure,
which they termed level set extrinsic curvature, based on the divergence of the gradient field,
integrated over a path (with a certain area: the scale) around the point of interest.
The perception of curvature is influenced by its context, as is clear from Tolansky's
curvature illusion (see figure 6.18).

Show[
  Graphics[{Thickness[.01], Circle[{0, 0}, 10, {0, π}], Circle[{0, -4},
     10, {π/4, 3 π/4}], Circle[{0, -8}, 10, {3 π/8, 5 π/8}]}],
  AspectRatio -> Automatic, ImageSize -> 260];

Figure 6.18 Tolansky's curvature illusion. The three circle segments have the same curvature
1/10.

We remember the flowlines as the integral curves of the gradient. In figure 6.6 they were
depicted together with their duals, the isophotes. In that particular case, for such circular
objects flowlines are straight lines with curvature zero. In figure 6.6 the isophote curvature at
the top of the blob goes to infinity and is left out for that reason.

Task 6.7 Prove, with the methodology sketched above, that the flowline
curvature expressed in first order gauge coordinates is: μ = -Lvw/Lw.

The third (and last) member of the set of second order derivatives in gauge coordinates is
Lww. This is the derivative of the gradient in the gradient direction. So when we want to find
the maximum of the gradient, we can inspect the zeros of Lww.
Historically, much attention has been paid to the zerocrossings of the Laplacian due to the
groundbreaking work of Marr and Hildreth. As a rotationally isotropic filter with a close
analogy to the retinal receptive fields, its zerocrossings were often interpreted as the maxima
of a rotationally invariant edge detector. The zerocrossings are however displaced on curved
edges.

Note that with the compact expression for isophote curvature κ = -Lvv/Lw we can establish a
relation between the Laplacian and the second order derivative in the gradient direction we
want to investigate for zerocrossings: Lww. From the expression of the Laplacian in gauge
coordinates ΔL = Lvv + Lww = Lww - κLw we see immediately that there is a deviation term
κLw which is directly proportional to the isophote curvature κ. Only on a straight edge with
local isophote curvature zero is the Laplacian numerically equal to Lww. Without gauge
coordinates this is much harder to prove: it took Clark two full pages in PAMI to show this
[Clark1989]!
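The gauge identity ΔL = Lvv + Lww is purely algebraic in the Cartesian derivatives, so it can be verified directly on finite-difference estimates. The following is an illustrative Python sketch, not the book's code; the smooth test function f is an arbitrary choice:

```python
import math

def f(x, y):
    # An arbitrary smooth test image with non-vanishing gradient.
    return math.sin(x) * math.exp(0.3 * y) + x * y

# Central finite differences for the derivatives at one point.
x, y, h = 0.7, -0.4, 1e-3
Lx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
Ly  = (f(x, y + h) - f(x, y - h)) / (2 * h)
Lxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
Lyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
Lxy = (f(x + h, y + h) - f(x + h, y - h)
       - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)

# Cartesian forms of the two second order gauge derivatives:
Lw2 = Lx**2 + Ly**2
Lvv = (Ly**2 * Lxx - 2 * Lx * Ly * Lxy + Lx**2 * Lyy) / Lw2
Lww = (Lx**2 * Lxx + 2 * Lx * Ly * Lxy + Ly**2 * Lyy) / Lw2

# Lvv + Lww reproduces the Laplacian Lxx + Lyy exactly:
print(Lvv + Lww - (Lxx + Lyy))   # zero up to floating point rounding
```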

im = Import["thorax02.gif"][[1, 1]];

Block[{$DisplayFunction = Identity},
  p1 = ListDensityPlot[im];
  p2 = ListContourPlot[gauge2DN[im, 0, 2, 4], Contours -> {0}];
  p3 = ListContourPlot[gD[im, 2, 0, 4] + gD[im, 0, 2, 4], Contours -> {0}]];

DisplayTogetherArray[{Show[{p1, p2}], Show[{p1, p3}]}, ImageSize -> 380];

Figure 6.19 Contours of Lww = 0 (left) and ΔL = 0 (right) superimposed on the X-thorax
image for σ = 4 pixels.

The term -Lww/Lw is not a curvature, but can be interpreted as a density of isophotes.

Notice that the isophote curvature κ = -Lvv/Lw and flowline curvature μ = -Lvw/Lw have equal
dimensionality for the intensity in both numerator and denominator. This leads to the
desirable property that these curvatures do not change when we e.g. manipulate the contrast
or brightness of an image. In general, these curvatures are said to be invariant under
monotonic intensity transformations. In section 6.7 we elaborate on this special case of
invariance.
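This invariance is easy to check numerically: applying a monotonic intensity transformation such as exp to an image should leave the isophote curvature unchanged. A Python sketch (illustration only; the test function and evaluation point are arbitrary choices, and finite differences replace the book's Gaussian derivatives):

```python
import math

def kappa(f, x, y, h=1e-3):
    # Isophote curvature kappa = -Lvv/Lw via central finite differences.
    Lx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    Ly  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    Lxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    Lyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    Lxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return (2 * Lx * Lxy * Ly - Lxx * Ly**2 - Lx**2 * Lyy) \
           / (Lx**2 + Ly**2)**1.5

f = lambda x, y: math.sin(x) + 0.5 * math.cos(y) + 0.2 * x * y
g = lambda x, y: math.exp(f(x, y))    # monotonic intensity transformation

k_f = kappa(f, 0.6, 0.9)
k_g = kappa(g, 0.6, 0.9)
print(k_f, k_g)   # the two curvatures agree up to discretization error
```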

6.6.3 Affine invariant corner detection

Corners are defined as locations with high isophote curvature and high intensity gradient. An
elegant reasoning for an affine invariant corner detector was proposed by Blom
[Blom1991a], then a PhD student of Koenderink. We reproduce it here using Mathematica.
Blom proposed to take the product of isophote curvature κ = -Lvv/Lw and the gradient Lw
raised to some (to be determined) power n:

Θ = κ Lw^n = -(Lvv/Lw) Lw^n = -Lvv Lw^(n-1)

An obvious advantage is invariance under a transformation that changes the opening angle of
the corner. Such a transformation is the affine transformation. An affine transformation is a
linear transformation of the coordinate axes:

(x', y')ᵀ = 1/(ad - bc) (a b; c d) (x, y)ᵀ + (e, f)ᵀ.

We omit the translation term (e, f)ᵀ and study the affine transformation proper. The term
ad - bc is the determinant of the transformation matrix, and is called the Jacobian. Its
purpose is to adjust the amplitude when the area changes.

A good example of the effect of an affine transformation is to study the projection of a


square from a large distance. Rotation over a vertical axis shortens the x-axis. Changing both
axes introduces a shear, where the angles between the sides change. The following example
illustrates this by an affine transformation of a square:

square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 0}};
affine = {{1, .5}, {0, .5}}; afsquare = affine.# & /@ square

DisplayTogetherArray[Graphics[Line[#], AspectRatio -> 1] & /@
   {square, afsquare}, ImageSize -> 200];

Figure 6.20 Affine transformation of a square, with the transformation matrix
{{1, .5}, {0, .5}} mapped on each point.

The derivatives transform as (∂x', ∂y')ᵀ = 1/(ad - bc) (a b; c d) (∂x, ∂y)ᵀ. We put the affine
transformation A = (a b; c d) into the definition of affinely transformed gauge coordinates:

Clear[a, b, c, d];

gauge2Daffine[f_, nv_, nw_] := Module[{Lx, Ly, v, w, A = {{a, b}, {c, d}}},
  w = {Lx, Ly} / Sqrt[Lx^2 + Ly^2]; v = {{0, 1}, {-1, 0}}.w;
  Nest[v.(1/Det[A] A.{∂x #, ∂y #}) &,
    Nest[w.(1/Det[A] A.{∂x #, ∂y #}) &, f, nw], nv] /.
   {Lx -> (a Lx + b Ly)/Det[A], Ly -> (c Lx + d Ly)/Det[A]} /.
   {Lx -> ∂x f, Ly -> ∂y f}];

The equation for the affinely distorted coordinates -Lṽṽ Lw̃^(n-1) now becomes:

-gauge2Daffine[L[x, y], 2, 0] gauge2Daffine[L[x, y], 0, 1]^(n-1) //
  Simplify // shortnotation

((a Lx + b Ly)² + (c Lx + d Ly)²)^((n-3)/2) ((bc - ad)²)^((1-n)/2)
  (2 Lx Lxy Ly - Lxx Ly² - Lx² Lyy)

Very interesting: when n = 3 and for an affine transformation with unit Jacobian
(ad - bc = 1, a so-called special transformation) we are independent of the parameters a, b,
c and d! This is the affine invariance condition.

So the expression Θ = -Lvv Lw² = 2 Lx Lxy Ly - Lxx Ly² - Lx² Lyy is an affine
invariant corner detector. This feature has the nice property that it is not singular at locations
where the gradient vanishes, and through its affine invariance it detects corners at all
'opening angles'.
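The invariance can also be checked numerically. In the Python sketch below (an illustration, not the book's code), the cubic test image f, the evaluation point, and the matrix entries a, b, c, d (chosen with ad - bc = 1) are all arbitrary: Θ computed on the warped image g(x, y) = f(ax + by, cx + dy) should equal Θ of f at the mapped point.

```python
def theta(f, x, y, h=1e-3):
    # Theta = -Lvv Lw^2 = 2 Lx Lxy Ly - Lxx Ly^2 - Lx^2 Lyy,
    # evaluated with central finite differences.
    Lx  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    Ly  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    Lxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    Lyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    Lxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return 2 * Lx * Lxy * Ly - Lxx * Ly**2 - Lx**2 * Lyy

f = lambda x, y: x**3 - 2 * x * y**2 + 0.5 * y**3 + x * y
a, b, c, d = 2.0, 1.0, 1.0, 1.0        # special affine: a*d - b*c = 1
g = lambda x, y: f(a * x + b * y, c * x + d * y)   # warped image

x0, y0 = 0.3, 0.7
t_warped   = theta(g, x0, y0)
t_original = theta(f, a * x0 + b * y0, c * x0 + d * y0)
print(t_warped, t_original)   # equal up to discretization error
```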

We show corner detection at two scales on the ’Utrecht’ image:

im = SubMatrix[Import["Utrecht256.gif"][[1, 1]], {1, 128}, {128, 128}];

corner1 = gauge2DN[im, 2, 0, 1] gauge2DN[im, 0, 1, 1]^2;
corner3 = gauge2DN[im, 2, 0, 3] gauge2DN[im, 0, 1, 3]^2;

DisplayTogetherArray[
  ListDensityPlot /@ {im, corner1, corner3}, ImageSize -> 500];

Figure 6.21 Corner detection with the Lvv Lw² operator. Left: original image, dimensions
128². Middle: corner detection at σ = 1 pixel; right: corner detection at σ = 3 pixels. Isophote
curvature is signed, so note the positive (convex, light) and negative (concave, dark) corners.

Task 6.8 Show why the compound spike response, where a rotationally
invariant operator is applied on a spike image (discrete delta function), leads to
a rotationally symmetric response. An example is given below:

spike = Table[0, {128}, {128}]; spike[[64, 64]] = 100;

gradient = gauge2DN[spike, 0, 1, 15];
cornerness = -gauge2DN[spike, 2, 0, 15] gauge2DN[spike, 0, 1, 15]^2;
DisplayTogetherArray[
  ListDensityPlot /@ {spike, gradient, cornerness}, ImageSize -> 400];

Figure 6.22 Convolution of a spike (delta function) image with a kernel gives the kernel itself
as result. Left: spike image, middle: response to the gradient kernel assembly, right:
response to the cornerness kernel assembly. Scale σ = 15 pixels, resolution image 128².

6.7 A curvature illusion


A particular visual illusion shows the influence of the multi-scale perception of a local
property, like curvature. In figure 6.23 the lines appear curved, though they are really straight.

star = Graphics[Table[
    Line[{{Cos[φ], Sin[φ]}, {-Cos[φ], -Sin[φ]}}], {φ, 0, π, π/20}]];
lines = Graphics[{Thickness[.015], DarkViolet,
    Line[{{-1, .1}, {1, .1}}], Line[{{-1, -.1}, {1, -.1}}]}];
Show[{star, lines}, PlotRange -> {{-.4, .4}, {-.2, .2}},
  AspectRatio -> Automatic, ImageSize -> 300];

Figure 6.23 The straight lines appear curved due to the surrounding pattern.

When we calculate the isophote curvature κ = -Lvv/Lw for this figure at a coarse scale, we see
that the curvature is not constant along the horizontal lines, but changes when moving from
the center. Figure 6.24 shows the curvature and the profile along the center of the horizontal
line.

curvill = Show[{star, lines}, PlotRange -> {{-.4, .4}, {-.2, .2}},
   AspectRatio -> Automatic, ImageSize -> 432, DisplayFunction -> Identity];
Export["curvillusion-star.jpg", curvill];
im1 = Import["curvillusion-star.jpg"][[1, 1]] /. {a_, b_, c_} -> (a + b + c)/3;
DeleteFile["curvillusion-star.jpg"];

DisplayTogetherArray[
  {ListDensityPlot[κ1 = -gauge2DN[im1, 2, 0, 20] / gauge2DN[im1, 0, 1, 20],
    PlotRange -> {-.1, .1},
    Epilog -> {Red, Line[{{110, 161}, {320, 161}}]}],
   ListPlot[Take[κ1, {161, 161}, {110, 320}] // Flatten,
    AspectRatio -> .4, AxesLabel -> {"", "κ"}]}, ImageSize -> 450];

Figure 6.24 Left: isophote curvature κ at a scale of σ = 20 pixels for the pattern in figure
6.23, dimensions image 216 x 432 pixels. Right: profile of the curvature along the central portion
of the top horizontal line (to avoid boundary effects only the central portion is shown,
indicated by the red line in the left figure).

6.8 Second order structure


The second order structure of the intensity landscape is rich. To describe and to represent it,
we will develop a precise mathematical formulation in order to do a proper analysis.

Let us first develop some intuitive notions by visual inspection. In figure 6.25 a blurred
version of an X-thorax image is depicted as a height plot. We see hills and dales, saddle
points, ridges, maxima and minima. Clearly curvature plays an important role.

The second order structure of the intensity landscape L(x, y; σ) in a point (x0, y0; σ) is
described by the second order term in the local Taylor expansion around the point (x0, y0).
Without any loss of generality we take (x0, y0) = (0, 0):

s = Series[L[x, y], {x, 0, 2}, {y, 0, 2}] // Normal // shortnotation

L[0, 0] + x Lx + 1/2 x² Lxx + y (Ly + x Lxy + 1/2 x² Lxxy) +
  1/4 y² (x² Lxxyy + 2 (x Lxyy + Lyy))

The second order term is 1/2 Lxx x² + Lxy x y + 1/2 Lyy y². The second order derivatives are
the coefficients in the quadratic polynomial that describes the second order landscape.

im = Import["thorax02.gif"][[1, 1]];

DisplayTogetherArray[ListDensityPlot[im],
  ListPlot3D[-gD[im, 0, 0, 2], Mesh -> False], ImageSize -> 320];

Figure 6.25 Left: an X-thorax image (resolution 256²) and its 'intensity landscape' at σ = 2
pixels (right).

We investigate the role of the coefficients in this second order polynomial. In the graph
below we vary all three coefficients. In the three groups of 9 plots the value of the mixed
coefficient Lxy has been varied (values -1, 0 and 1). In each group the 'pure' order terms Lxx
and Lyy are varied (values -1, 0 and +1). In the middle group we see concave, convex,
cylindrical and saddle shapes.

Show[GraphicsArray[
   Table[GraphicsArray[Table[Plot3D[Lxx/2 x^2 + Lxy x y + Lyy/2 y^2, {x, -3, 3},
       {y, -3, 3}, PlotRange -> {-18, 18}, AspectRatio -> 1,
       DisplayFunction -> Identity, Boxed -> True, Mesh -> False],
      {Lxx, -1, 1}, {Lyy, -1, 1}], Frame -> True], {Lxy, -1, 1}]], ImageSize -> 480];

Figure 6.26 Plots of Lxx/2 x² + Lxy x y + Lyy/2 y². Left: Lxy = -1. Middle: Lxy = 0. Right:
Lxy = 1. In each frame: upper row: Lxx = 1, middle row: Lxx = 0, lower row: Lxx = -1; left
column: Lyy = -1, middle column: Lyy = 0, right column: Lyy = 1.

When three variables are at stake, and a visual impression may give valuable insight, one can
exploit the trichromacy of our vision. We employ the invariant second order derivatives
Lvv, Lvw and Lww. This shows the triple {Lvv, Lvw, Lww} as RGBColor[Lvv, Lvw, Lww]
color directive settings in each pixel. The color coefficients for this function need to be
scaled between 0 and 255.

im = Import["thorax02.gif"]; σ = 5; impix = im[[1, 1]]; imcolor = im;

min = Min[color = Transpose[{gauge2DN[impix, 2, 0, σ],
      gauge2DN[impix, 1, 1, σ], gauge2DN[impix, 0, 2, σ]}, {3, 1, 2}]];
max = Max[color - min]; imcolor[[1, 1]] = N[(color - min)/max 255];
imcolor[[1, 4]] = ColorFunction -> RGBColor;
DisplayTogetherArray[Show /@ {im, imcolor}, ImageSize -> 400];

Figure 6.27 Left: An X-thorax image (resolution 2562 ) and a mapping of the triple of invariant
second order derivatives {Lvv, Lvw, Lww} on the RGB coordinates in each pixel.

6.8.1 The Hessian matrix and principal curvatures

At any point on the surface we can step into an infinite number of directions away from the
point, and in each direction we can define a curvature. So in each point an infinite number of
curvatures can be defined. It turns out that the curvatures in opposite directions are always the
same. Secondly, when we smoothly change direction, there are two (opposite) directions
where the curvature is maximal, and there are two (opposite) directions where the curvature
is minimal. These directions are perpendicular to each other, and the extremal curvatures are
called the principal curvatures.

The Hessian matrix is the gradient of the gradient vector field. The coefficients form the
second order structure matrix, or the Hessian matrix, also known as the shape operator
[Gray1993]. The Hessian matrix is a square, symmetric matrix:

hessian2D = {{∂x,x L[x, y], ∂x,y L[x, y]}, {∂x,y L[x, y], ∂y,y L[x, y]}};

The Hessian matrix is square and symmetric, so we can bring it into diagonal form by
calculating the eigenvalues of the matrix and putting these on the diagonal elements:

DiagonalMatrix[Eigenvalues[hessian2D]] // shortnotation

{{1/2 (Lxx + Lyy - √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²)), 0},
 {0, 1/2 (Lxx + Lyy + √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²))}}
These special values are the principal curvatures at that point of the surface. In the diagonal
form the Hessian matrix is rotated in such a way that the curvatures are maximal and
minimal. The principal curvature directions are given by the eigenvectors of the Hessian
matrix, found by solving the characteristic equation |H - κ I| = 0 for κ, where |...|
denotes the determinant, and I is the identity matrix (all diagonal elements are 1, rest zeros).

κ =.; Solve[Det[hessian2D - κ IdentityMatrix[2]] == 0, κ] // shortnotation

{{κ -> 1/2 (Lxx + Lyy - √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²))},
 {κ -> 1/2 (Lxx + Lyy + √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²))}}

The command to calculate eigenvalues is built into Mathematica:

{κ1, κ2} = Eigenvalues[hessian2D] // FullSimplify;
{κ1, κ2} // shortnotation

{1/2 (Lxx + Lyy - √(4 Lxy² + (Lxx - Lyy)²)),
 1/2 (Lxx + Lyy + √(4 Lxy² + (Lxx - Lyy)²))}

The two principal curvatures are equal when 4 Lxy² + (Lyy - Lxx)² is zero. This happens in
so-called umbilical points. In umbilical points the principal directions are undefined. The
surface is locally spherical. The term 4 Lxy² + (Lyy - Lxx)² can be interpreted as 'deviation
from sphericalness'.
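The closed-form eigenvalues can be coded directly. A Python sketch (an illustration with arbitrary example values for Lxx, Lxy, Lyy) that also checks the two principal curvatures against the trace and determinant of the Hessian:

```python
import math

def principal_curvatures(Lxx, Lxy, Lyy):
    # Eigenvalues of the symmetric Hessian {{Lxx, Lxy}, {Lxy, Lyy}}:
    # 1/2 (Lxx + Lyy -+ sqrt(4 Lxy^2 + (Lxx - Lyy)^2)).
    s = math.sqrt(4 * Lxy**2 + (Lxx - Lyy)**2)
    return 0.5 * (Lxx + Lyy - s), 0.5 * (Lxx + Lyy + s)

k1, k2 = principal_curvatures(2.0, 1.0, -1.0)   # arbitrary example values
print(k1 + k2)   # sum reproduces the trace Lxx + Lyy (here 1)
print(k1 * k2)   # product reproduces the determinant Lxx Lyy - Lxy^2 (here -3)
```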

6.8.2 The shape index

When the principal curvatures κ1 and κ2 are considered coordinates in a 2D 'shape graph',
we see that all different second order shapes are represented. Each shape is a point on this
graph. The following list gives some possibilities:

When both curvatures are zero we have the flat shape.
When both curvatures are positive, we have concave shapes.
When both curvatures are negative, we have convex shapes.
When both curvatures have the same sign and magnitude: spherical shapes.
When the curvatures have opposite sign: saddle shapes.
When one curvature is zero: cylindrical shapes.

Koenderink proposed to call the angle where the shape vector points to the shape index. It
is defined as:

shapeindex ≡ (2/π) arctan((κ1 + κ2)/(κ1 - κ2)), κ1 ≤ κ2.

The expression for (κ1 + κ2)/(κ1 - κ2) can be markedly cleaned up:

Simplify[(κ1 + κ2)/(κ1 - κ2)] // shortnotation

-(Lxx + Lyy) / √(4 Lxy² + (Lxx - Lyy)²)

so we get for the shape index:

shapeindex ≡ (2/π) arctan(-(Lxx + Lyy) / √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²)).

The shape index runs from -1 (cup) via the shapes trough, rut, and saddle rut to zero, the
saddle (here the shape index is undefined), and then goes via saddle ridge, ridge, and dome to
the value of +1, the cap.
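In Cartesian form the shape index is readily evaluated. Here is a Python sketch (illustrative only) that reproduces the characteristic values for a cap, cup, saddle and ridge; atan2 is used so that the spherical case, where the denominator vanishes, is handled as a limit:

```python
import math

def shape_index(Lxx, Lxy, Lyy):
    # (2/pi) times the arctan of -(Lxx + Lyy) over
    # sqrt(4 Lxy^2 + (Lxx - Lyy)^2), via atan2 for the degenerate case.
    return (2 / math.pi) * math.atan2(-(Lxx + Lyy),
                                      math.sqrt(4 * Lxy**2 + (Lxx - Lyy)**2))

print(shape_index(-1, 0, -1))   # +1  (cap)
print(shape_index( 1, 0,  1))   # -1  (cup)
print(shape_index( 1, 0, -1))   #  0  (saddle)
print(shape_index(-1, 0,  0))   # +1/2 (ridge)
```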

The length of the vector defines how curved a shape is, which gives Koenderink's definition
of curvedness:

curvedness ≡ 1/2 √(κ1² + κ2²)

1/2 Sqrt[κ1^2 + κ2^2] // Simplify // shortnotation

1/2 √(Lxx² + 2 Lxy² + Lyy²)

shapes =
  Table[GraphicsArray[Table[Plot3D[κ1 x^2 + κ2 y^2, {x, -3, 3}, {y, -3, 3},
      PlotRange -> {-18, 18}, PlotLabel ->
       "κ1=" <> ToString[κ1] <> ", κ2=" <> ToString[κ2], AspectRatio -> 1,
      DisplayFunction -> Identity, Boxed -> True, Mesh -> False],
     {κ2, 1, -1, -1}, {κ1, -1, 1}]]];

Show[
  GraphicsArray[{Graphics[{Arrow[{0, 0}, {.7, .5}], Red, PointSize[.02],
      Point[{.7, .5}]}, PlotRange -> {{-1, 1}, {-1, 1}},
     Frame -> True, Axes -> True, AxesLabel -> {"κ1", "κ2"},
     AspectRatio -> 1], shapes}], ImageSize -> 450];

Figure 6.28 Left: coordinate space of the shape index. Horizontal axis: maximal principal
curvature κ1, vertical axis: minimal principal curvature κ2. The angle of the position vector
determines the shape, the length the curvedness. Right: same as the middle set of figure 6.26.

Here is the shape index calculated and plotted for every pixel of our familiar MR image at a
range of scales:

im = Import["mr128.gif"][[1, 1]];

shapeindex[im_, σ_] := 2/π ArcTan[(-gD[im, 2, 0, σ] - gD[im, 0, 2, σ]) /
    Sqrt[gD[im, 2, 0, σ]^2 + 4 gD[im, 1, 1, σ]^2 -
      2 gD[im, 2, 0, σ] gD[im, 0, 2, σ] + gD[im, 0, 2, σ]^2]];
DisplayTogetherArray[ListDensityPlot[shapeindex[im, #]] & /@ Range[5],
  ImageSize -> 400];

Figure 6.29 Shape index of the sagittal MR image at σ = 1, 2, 3, 4 and 5 pixels.

curvedness[im_, σ_] :=
  1/2 Sqrt[gD[im, 2, 0, σ]^2 + 2 gD[im, 1, 1, σ]^2 + gD[im, 0, 2, σ]^2];
DisplayTogetherArray[ListDensityPlot[curvedness[im, #]] & /@ Range[4],
  ImageSize -> 400];

Figure 6.30 Curvedness of the sagittal MR image at σ = 1, 2, 3 and 4 pixels.

6.8.3 Principal directions

The principal curvature directions are given by the Eigenvectors of the Hessian matrix:

{vκ1, vκ2} = Eigenvectors[hessian2D]; {vκ1, vκ2} // shortnotation

{{-(-Lxx + Lyy + √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²)) / (2 Lxy), 1},
 {(Lxx - Lyy + √(Lxx² + 4 Lxy² - 2 Lxx Lyy + Lyy²)) / (2 Lxy), 1}}

The eigenvectors are perpendicular to each other; their inner product is zero:

vκ1.vκ2 // Simplify

0

The local principal direction vectors locally form a frame. We inspect how the orientations of
such frames are distributed in an image. We orient the frame in such a way that the largest
eigenvalue (maximal principal curvature) gives one direction; the minimal principal curvature
direction is rotated π/2 clockwise.

plotprincipalcurvatureframes[im_, σ_] :=
  Module[{hessian, frame, frames},
   hessian = {{gD[im, 2, 0, σ], gD[im, 1, 1, σ]},
      {gD[im, 1, 1, σ], gD[im, 0, 2, σ]}};
   frame = {Green, Arrow[#2 - .5, #2 - .5 + First[#1]],
      Red, Arrow[#2 - .5, #2 - .5 + Last[#1]]} &;
   frames = MapIndexed[frame, .5 Map[Eigenvectors,
       Transpose[hessian, {4, 3, 2, 1}], {2}], {2}];
   plot = ListDensityPlot[gD[im, 0, 0, σ], Epilog -> frames]];
im = Import["mr32.gif"][[1, 1]];

plotprincipalcurvatureframes[im, 1];

Figure 6.31 Frames of the normalized principal curvature directions at a scale of 1 pixel.
Image resolution 32² pixels. Green: maximal principal curvature direction; red: minimal
principal curvature direction.

The principal curvatures have been employed by Niessen, ter Haar Romeny and López in
studies of the 2D and 3D structure of trabecular bone [TerHaarRomeny1996f, Niessen1997b,
López2000a]. The local structure was defined as flat when the two principal curvatures of the
iso-intensity surface in 3D were both small, as rod-like if one of the curvatures was small
and the other high, giving a local cylindrical shape, and sphere-like if the two principal
curvatures were both high. See also Task 19.8.

6.8.4 Gaussian and mean curvature

The Gaussian curvature 𝒦 is defined as the product of the two principal curvatures:
𝒦 = κ1 κ2.

𝒦 = κ1 κ2 // Simplify // shortnotation

-Lxy² + Lxx Lyy

The Gaussian curvature is equal to the determinant of the Hessian matrix:

Det[hessian2D] // shortnotation

-Lxy² + Lxx Lyy

The sign of the Gaussian curvature determines if we are in a concave / convex area (positive
Gaussian curvature) or in a saddle-like area (negative Gaussian curvature). This shows
saddle-like areas as dark patches:

im = Import["mr256.gif"][[1, 1]];

σ = 5; 𝒦 = -gD[im, 1, 1, σ]^2 + gD[im, 2, 0, σ] gD[im, 0, 2, σ];
DisplayTogetherArray[Append[ListDensityPlot /@ {𝒦, Sign[𝒦]},
   ListContourPlot[𝒦, Contours -> {0}]], ImageSize -> 390];

Figure 6.32 Left: Gaussian curvature 𝒦 for a 256² sagittal MR image at a scale of 5 pixels.
Middle: sign of 𝒦. Right: zerocrossings of 𝒦.

The locations where the Gaussian curvature is zero are characterized by the fact that at least
one of the principal curvatures is zero. The collection of locations where the Gaussian
curvature is zero is known as the parabolic lines. It was shown by Koenderink that these
lines play an important role in reflection and shape-from-shading.

The mean curvature ℋ is defined as the arithmetic mean of the principal curvatures:
ℋ = (κ1 + κ2)/2.

The mean curvature is related to the trace of the Hessian matrix:

ℋ = (κ1 + κ2)/2 // Simplify // shortnotation

1/2 (Lxx + Lyy)

Tr[hessian2D] // shortnotation

Lxx + Lyy

The relation between the mean curvature ℋ and the Gaussian curvature 𝒦 is given by
κ² - 2 ℋ κ + 𝒦 = 0, which has solutions:

𝒦 =.; ℋ =.; Solve[κ^2 - 2 ℋ κ + 𝒦 == 0, κ]

{{κ -> ℋ - √(ℋ² - 𝒦)}, {κ -> ℋ + √(ℋ² - 𝒦)}}

The mean curvature ℋ and the Gaussian curvature 𝒦 are well defined in umbilical points.
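This quadratic relation gives a convenient way to recover the principal curvatures from ℋ and 𝒦. A small Python sketch (with arbitrary example values for the second order derivatives):

```python
import math

# Arbitrary example second order derivatives:
Lxx, Lxy, Lyy = 2.0, 1.0, -1.0
H = 0.5 * (Lxx + Lyy)            # mean curvature: half the trace
K = Lxx * Lyy - Lxy**2           # Gaussian curvature: the determinant

# The principal curvatures are the two roots of k^2 - 2 H k + K = 0:
disc = math.sqrt(H * H - K)
k1, k2 = H - disc, H + disc
print(k1 * k2, K)      # product of the roots equals the Gaussian curvature
print(k1 + k2, 2 * H)  # sum of the roots equals twice the mean curvature
```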

The directional derivative of the principal curvature in the direction of the principal direction
is called the extremality [Monga1995].

Because there are two principal curvatures, there are two extremalities, vκ1.∇κ1 and vκ2.∇κ2:

<< Calculus`VectorAnalysis`;

e1 = vκ1.Take[Grad[{κ1, 0, 0}, Cartesian[x, y, z]], 2] // FullSimplify;
e1 // shortnotation

e2 = vκ2.Take[Grad[{κ2, 0, 0}, Cartesian[x, y, z]], 2] // FullSimplify;
e2 // shortnotation

(the outputs are lengthy third order expressions in the derivatives Lx up to Lyyy and are
not reproduced here)

The lines defined by the zerocrossings of each of these two extremalities are called the
extremal lines [Thirion1995a, Thirion1996]. There are 4 types of these lines:
- lines of maximum largest principal curvature (these are called crest lines);
- lines of minimum largest principal curvature;
- lines of maximum smallest principal curvature;
- lines of minimum smallest principal curvature.

The product of the two extremalities is called the Gaussian extremality 𝒢 = e1 e2, a true
local invariant [Thirion1996]. The boundaries of the regions where the Gaussian extremality
changes sign are the extremal lines.

e1 e2 // Simplify // shortnotation

-(Lxy² (Lxxx² + 2 Lxxx Lxyy - 3 (Lxxy² + Lxyy²)) +
   Lxxx Lxyy (Lxx - Lyy)² + 2 Lxxx Lxxy Lxy (-Lxx + Lyy) +
   (Lxx² Lxxy - 2 Lxy Lxyy Lyy + 2 Lxx (Lxy Lxyy - Lxxy Lyy) +
      Lxxy (2 Lxy² + Lyy²)) Lyyy + Lxy² Lyyy²) /
  (Lxx² - 2 Lxx Lyy + Lyy² + 4 Lxy²)

DisplayTogetherArray[
  ListDensityPlot[Sign[e1 e2 /. Derivative[nx_, ny_][L][x_, y_] ->
        gD[im0, nx, ny, #] /. im0 -> im]] & /@ {2, 6, 10}, ImageSize -> 400];

Figure 6.33 Gaussian extremality 𝒢 = e1 e2 for a 256² sagittal MR image at a scale of 2
pixels (left), 6 pixels (middle) and 10 pixels (right).

The mesh that these lines form on an iso-intensity surface in 3D is called the extremal mesh.
It has been applied for 3D image registration, by extracting the lines with a dedicated
’marching lines’ algorithm [Thirion 1996].

Show[Import["extremal mesh - Thirion.jpg"], ImageSize -> 250];

Figure 6.34 Extremal mesh on a 3D skull from a 3D CT dataset. The extremal lines are found
with the marching lines algorithm. From [Thirion1993].

6.8.5 Minimal surfaces and zero Gaussian curvature surfaces

Surfaces that have everywhere mean curvature zero are called minimal surfaces. There are
many beautiful examples of such surfaces (see e.g. the Scientific Graphics Project,
http://www.msri.org/publications/sgp/SGP/indexc.html). Soap bubbles are famous and much
studied examples of minimal surfaces.

From the wonderful interactive book by Alfred Gray [Gray1993] (written in Mathematica)
we plot two members of the family of zero Gaussian curvature manifolds that can be
constructed by a moving straight line through 3D space:

heltocat[t_][u_, v_] := Cos[t] {Sinh[v] Sin[u], -Sinh[v] Cos[u], u} +
   Sin[t] {Cosh[v] Cos[u], Cosh[v] Sin[u], v};
moebiusstrip[u_, v_] := {Cos[u] + v Cos[u/2] Cos[u],
   Sin[u] + v Cos[u/2] Sin[u], v Sin[u/2]};
DisplayTogetherArray[{ParametricPlot3D[
    Evaluate[heltocat[0][u, v]], {u, -π, π}, {v, -π, π},
    PlotPoints -> 30, Axes -> None, BoxRatios -> {1, 1, 1},
    PlotRange -> {{-13, 13}, {-13, 13}, {-π, π}}],
   ParametricPlot3D[moebiusstrip[u, v] // Evaluate,
    {u, 0, 2 Pi}, {v, -.3, .3}, PlotPoints -> {30, 4},
    Axes -> None]}, ImageSize -> 390];

Figure 6.35 Surfaces with zero Gaussian curvature. Left the helicoid, a member of the
heltocat family. Right the Moebius strip. Both surfaces can be constructed by a moving
straight line. From [Gray1993].

Task 6.9 Which surfaces have constant mean curvature? And which surfaces
have constant Gaussian curvature?

Task 6.10 If I walk with my principal coordinate frame over an egg, something
goes wrong when I walk through an umbilical point. What?

6.9 Third order image structure: T-junction detection


An example of third order geometric reasoning in images is the detection of T-junctions
[TerHaarRomeny1991a, TerHaarRomeny1993b]. T-junctions in the intensity landscape of
natural images occur typically at occlusion points. Occlusion points are those points where a
contour ends or emerges because there is another object in front of the contour. See for an
artistic example the famous painting ’the blank cheque’ by Magritte.

Show[Import["blank cheque.jpg"], ImageSize -> 210];

Figure 6.36 The painting 'the blank cheque' by the famous Belgian surrealist painter René
Magritte (1898 - 1967). Source: Paleta (www.paletaworld.org).

In this section we develop a third order detector for "T-junction-likeness". In the figure
below the circles indicate a few particular T-junctions:

blocks = Import["blocks.gif"][[1, 1]];

ListDensityPlot[blocks,
  Epilog -> (circles = {Circle[{221, 178}, 13], Circle[{157, 169}, 13],
      Circle[{90, 155}, 13], Circle[{148, 56}, 13],
      Circle[{194, 77}, 13], Circle[{253, 84}, 13]}), ImageSize -> 300];

Figure 6.37 T-junctions often emerge at occlusion boundaries. The foreground edge is most
likely to be the straight edge of the "T", with the occluded edge at some angle to it. The
circles indicate some T-junctions in the image.

When we zoom in on the T-junction of an observed image and inspect locally the isophote
structure at a T-junction, we see that at a T-junction the derivative of the isophote curvature κ
in the direction perpendicular to the isophotes is high. In the figure below the isophote
landscape of a blurred T-junction illustrates the direction of maximum change of κ:

im = Table[If[y < 64, 0, 1] + If[y < x && y > 63, 2, 1], {y, 128}, {x, 128}];
DisplayTogetherArray[ListDensityPlot[im],
  ListContourPlot[gD[im, 0, 0, 7], Contours -> 15,
   PlotRange -> {-0.3, 2.8}], ImageSize -> 280];

Figure 6.38 The isophote structure (right) of a simple idealized and observed (blurred)
T-junction (left) shows that isophotes strongly bend at T-junctions when we walk through the
intensity landscape.

When we study the curvature of the isophotes in the middle of the image, at the location of
the T-junction, we see the isophote 'sweep' from highly curved to almost straight for
decreasing intensity. So the geometric reasoning is that "the isophote curvature changes a lot
when we traverse the image in the w direction". It seems to make sense to study ∂κ/∂w.

We recall that the isophote curvature κ is defined as κ = -Lvv/Lw:

κ = -gauge2D[L[x, y], 2, 0] / gauge2D[L[x, y], 0, 1];
κ // Simplify // shortnotation

(2 Lx Lxy Ly - Lxx Ly² - Lx² Lyy) / (Lx² + Ly²)^(3/2)

The derivative of the isophote curvature in the direction of the gradient, ∂κ/∂w, is quite a
complex third order expression. The formula is derived by calculating the directional
derivative of the curvature in the direction of the normalized gradient. We define the gradient
(or nabla: ∇) operator with a pure function:

grad = {∂x #, ∂y #} &;

dκdw = (grad[L[x, y]] / Sqrt[grad[L[x, y]].grad[L[x, y]]]).grad[κ];
dκdw // Simplify // shortnotation

1/(Lx² + Ly²)³
 (Lxxy Ly⁵ + Lx⁴ (-2 Lxy² + Lx Lxyy - Lxx Lyy) -
  Ly⁴ (2 Lxy² - Lx (Lxxx - 2 Lxyy) + Lxx Lyy) +
  Lx² Ly² (-3 Lxx² + 8 Lxy² + Lx (Lxxx - Lxyy) + 4 Lxx Lyy - 3 Lyy²) +
  Lx³ Ly (6 Lxy (Lxx - Lyy) + Lx (-2 Lxxy + Lyyy)) +
  Lx Ly³ (6 Lxy (-Lxx + Lyy) + Lx (-Lxxy + Lyyy)))

To avoid singularities at vanishing gradients through the division by (Lx² + Ly²)³ = Lw⁶ we
use as our T-junction detector τ = (∂κ/∂w) Lw⁶:

tjunction = dκdw (grad[L[x, y]].grad[L[x, y]])^3;

tjunction // shortnotation

Lx⁵ Lxyy + Lx⁴ (-2 Lxy² + Lxxy Ly - Lxx Lyy) +
  Lx³ Ly (6 Lxx Lxy + Lxxx Ly - Lxyy Ly - 6 Lxy Lyy) +
  Lx Ly³ (-6 Lxx Lxy + Lxxx Ly - 2 Lxyy Ly + 6 Lxy Lyy) -
  Ly⁴ (2 Lxy² + 2 Lxxy Ly + Lxx Lyy - Ly Lyyy) +
  Lx² Ly² (-3 Lxx² + 8 Lxy² - Lxxy Ly + 4 Lxx Lyy - 3 Lyy² + Ly Lyyy)

Finally, we apply the T-junction detector on our blocks at a rather fine scale of σ = 2 (we
plot -tjunction to invert the contrast):

σ = 2; ListDensityPlot[
  -tjunction /. Derivative[nx_, ny_][L][x, y] -> gD[im0, nx, ny, σ] /.
    im0 -> blocks, Epilog -> circles, ImageSize -> 230];

Figure 6.39 Detection of T-junctions in the image of the blocks with the detector
τ = ∂κ/∂w Lw^6. The same circles have been drawn as in figure 6.32.

Compare the detected points with the circles in the input image. Note that in medical
tomographic images (CT, MR, PET, SPECT, US) there is no occlusion present. One can
however use third order properties in any geometric reasoning scheme, as the ’change of a
second order property’.
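The book's implementation above is in Mathematica. As an illustration only, here is a numerical sketch of the same idea in Python (assuming NumPy and SciPy are available; note that the analytic third order polynomial is replaced by a numerical gradient of κ, which is an approximation, not the book's method):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tjunction_response(im, sigma=2.0):
    """Approximate T-junction detector tau = (dkappa/dw) Lw^6."""
    # Gaussian derivatives; order=(ny, nx) differentiates rows (y) and cols (x)
    d = lambda ny, nx: gaussian_filter(im.astype(float), sigma, order=(ny, nx))
    Lx, Ly = d(0, 1), d(1, 0)
    Lxx, Lxy, Lyy = d(0, 2), d(1, 1), d(2, 0)
    Lw2 = Lx**2 + Ly**2
    # isophote curvature kappa = -Lvv/Lw
    kappa = (2*Lx*Ly*Lxy - Lxx*Ly**2 - Lyy*Lx**2) / (Lw2**1.5 + 1e-8)
    ky, kx = np.gradient(kappa)           # numerical gradient of kappa
    # dkappa/dw = grad(kappa).grad(L)/|grad L|, so tau = (grad kappa . grad L) Lw^5
    return (kx*Lx + ky*Ly) * Lw2**2.5

# toy T-junction: a horizontal edge occluding a vertical one, meeting at (32, 32)
im = np.zeros((64, 64))
im[:32, :] = 100.0
im[32:, :32] = 50.0
resp = np.abs(tjunction_response(im, sigma=2.0))
print(np.unravel_index(np.argmax(resp), resp.shape))
```

On this toy image the strongest response appears near the junction point, while the straight edges, where the isophotes are straight, give almost no response.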

Task 6.11 Investigate whether the expression for the T-junction detector τ = ∂κ/∂w Lw^6
is affine invariant.

Task 6.12 Another definition of a T-junction detector might be the magnitude of
the gradient of the curvature, τ = Sqrt[(∂κ/∂v)^2 + (∂κ/∂w)^2] Lw^6, or the derivative of the
curvature in the v direction, ∂κ/∂v Lw^6. Study and explain the differences.

6.10 Fourth order image structure: junction detection


As a final, fourth order example, we give a detection problem in images at high order of
differentiation, solved with tools from algebraic theory. Even at orders of differentiation as
high as 4, invariant features can be constructed and calculated for discrete images through the
biologically inspired scaled derivative operators. Our example is to find in a checkerboard
the crossings where 4 edges meet. We take an algebraic approach, which is taken from
Salden et al. [Salden1999a].

When we study the fourth order local image structure, we consider the fourth order
polynomial terms from the local Taylor expansion:

pol4 = Lxxxx x^4 + 4 Lxxxy x^3 y + 6 Lxxyy x^2 y^2 + 4 Lxyyy x y^3 + Lyyyy y^4;

The fundamental theorem of algebra states that a polynomial is fully described by its roots: e.g.
a x^2 + b x + c = a (x - x1)(x - x2). It was shown more than a century ago by Hilbert
[Hilbert1890] that the 'coincidenceness' of the roots, or how well all roots coincide, is a
particular invariant condition. From algebraic theory it is known that this 'coincidenceness' is
given by the discriminant, defined below (see also [Salden1999a]):

Discriminant[p_, x_] :=
  With[{m = Exponent[p, x]},
   Cancel[(-1)^(1/2 m (m - 1)) Resultant[p, D[p, x], x] / Coefficient[p, x, m]]]

The resultant of two polynomials a and b, both with leading coefficient one, is the product of
all the differences ai - bj between the roots of the polynomials. The resultant is always a
number or a polynomial. The discriminant of a polynomial is the product of the squares of all
the differences of the roots taken in pairs. We can express our function in the two variables
{x, y} as a function in the single variable x by the substitution y -> 1. Some examples:

Discriminant[Lxx x^2 + 2 Lxy x y + Lyy y^2, x] /. {y -> 1}

-4 (-Lxy^2 + Lxx Lyy)

The discriminant of second order image structure is just the determinant of the Hessian
matrix, i.e. the Gaussian curvature. Here is our fourth order discriminant:

Discriminant[pol4, x] /. {y -> 1} // Simplify

256 (-27 Lxxxy^4 Lyyyy^2 + Lxxxy^3 (-64 Lxyyy^3 + 108 Lxxyy Lxyyy Lyyyy) -
   12 Lxxxx Lxxxy Lxyyy (-9 Lxxyy Lxyyy^2 + 15 Lxxyy^2 Lyyyy + Lxxxx Lyyyy^2) -
   6 Lxxxy^2 (-6 Lxxyy^2 Lxyyy^2 + 9 Lxxyy^3 Lyyyy +
      Lxxxx Lxyyy^2 Lyyyy - 9 Lxxxx Lxxyy Lyyyy^2) +
   Lxxxx (-54 Lxxyy^3 Lxyyy^2 + 81 Lxxyy^4 Lyyyy + 54 Lxxxx Lxxyy Lxyyy^2 Lyyyy -
      18 Lxxxx Lxxyy^2 Lyyyy^2 + Lxxxx (-27 Lxyyy^4 + Lxxxx Lyyyy^3)))

It looks like an impossibly complicated polynomial in fourth order derivative images, and it
is. Through the use of Gaussian derivative kernels each separate term can easily be
calculated. We replace (with the operator /.) all the partial derivatives by scaled Gaussian
derivatives:

discr4[im_, σ_] := Discriminant[pol4, x] /.
  {y -> 1, Lxxxx -> gD[im, 4, 0, σ], Lxxxy -> gD[im, 3, 1, σ],
   Lxxyy -> gD[im, 2, 2, σ], Lxyyy -> gD[im, 1, 3, σ], Lyyyy -> gD[im, 0, 4, σ]}
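As a sanity check of the discriminant definition above, the same computation can be done outside the book's Mathematica session (a sketch assuming the SymPy library, with its built-in discriminant):

```python
import sympy as sp

x, a, b, c = sp.symbols('x a b c')
Lxx, Lxy, Lyy = sp.symbols('Lxx Lxy Lyy')

# the classic quadratic discriminant b^2 - 4 a c
quad = sp.discriminant(a*x**2 + b*x + c, x)
assert sp.simplify(quad - (b**2 - 4*a*c)) == 0

# second order image structure with y -> 1: equals -4 (Lxx Lyy - Lxy^2),
# i.e. minus four times the determinant of the Hessian
hess = sp.discriminant(Lxx*x**2 + 2*Lxy*x + Lyy, x)
assert sp.expand(hess + 4*(Lxx*Lyy - Lxy**2)) == 0
print(quad, hess)
```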

Let us apply this high order function on an image of a checkerboard, and we add noise with
twice the maximum image intensity to show its robustness, despite the high order derivatives
(see figure 6.40).

Note that we have a highly symmetric situation here: the four edges that come together at the
checkerboard vertex cut the angle in four. The symmetry can be seen in the complex
expression for discr4: only pure partial derivatives of fourth order occur. For a less
symmetric situation we need a detector which incorporates in its expression also the lower
order partial derivatives. For details see [Salden1999a].

t1 = Table[If[(x > 50 && y > 50) || (x <= 50 && y <= 50), 0, 100] + 200 Random[],
   {x, 100}, {y, 100}];
t2 = Table[If[(x + y - 100 > 0 && y - x < 0) || (x + y - 100 <= 0 && y - x >= 0),
    0, 100] + 200 Random[], {x, 100}, {y, 100}];

noisycheck = Transpose[Join[t1, t2]];

DisplayTogetherArray[ListDensityPlot /@
  {noisycheck, discr4[noisycheck, 5]}, ImageSize -> 400];

Figure 6.40 Left: A noisy checkerboard detail at two orientations. Right: the output of the 4th
order discriminant. The detection clearly is rotation invariant, robust to noise, and there is no
detection at corners (e.g. center of the image).

6.11 Scale invariance and natural coordinates


The intensity of images and invariant features decreases fast at larger scale. This is due to the
non-scale-invariant use of the differential operators. For, if we consider the transformation
x̃ = x/σ, then x̃ is dimensionless. At every scale, distances are now measured with a distance
yardstick which is scaled relative to the scale itself. This is the scale-invariance.

The dimensionless coordinate is termed the natural coordinate. This implies that the
derivative operator in natural coordinates carries a scaling factor: ∂^n/∂x̃^n = σ^n ∂^n/∂x^n.

Here we generate a scale-space of the intensity gradient. To study the absolute intensities, we
plot every image with the same intensity plot range {0, 40}:

im = Import["mr128.gif"][[1, 1]]; Block[{$DisplayFunction = Identity},
  p1 = Table[grad = Sqrt[gD[im, 1, 0, σ]^2 + gD[im, 0, 1, σ]^2];
     ListDensityPlot[#, PlotRange -> {0, 40}] & /@ {grad, σ grad},
    {σ, 1, 5}]]; Show[GraphicsArray[Transpose[p1]], ImageSize -> 450];

Figure 6.41 The gradient of a 128² image plotted at 5 scales, for σ = 1, 2, 3, 4 and 5 pixels
respectively. All images (in both rows) are plotted at a fixed intensity range {0, 40}. Top row
shows the regular gradient, clearly showing the decrease in intensity for larger blurring.
Bottom row: the gradient in natural coordinates (multiplied by σ). The intensity dynamic
range is now kept more or less constant.

Clearly the gradient magnitude expressed in natural coordinates keeps its average output
range. For a Laplacian scale-space stack in natural coordinates we need to multiply the
Laplacian by σ²: ∂²/∂x̃² + ∂²/∂ỹ² = σ² (∂²/∂x² + ∂²/∂y²), and so on for higher order derivative
operators in natural coordinates.

Block[{$DisplayFunction = Identity},
  p1 = Table[lapl = gD[im, 2, 0, σ] + gD[im, 0, 2, σ];
     ListDensityPlot[#, PlotRange -> {-90, 60}] & /@ {lapl, σ^2 lapl},
    {σ, 1, 5}]]; Show[GraphicsArray[Transpose[p1]], ImageSize -> 450];

Figure 6.42 The Laplacian of a 128² image plotted at 5 scales, for σ = 1, 2, 3, 4 and 5 pixels
respectively. Top row: Laplacian in regular coordinates. Bottom row: Laplacian in natural
coordinates. Top and bottom rows at fixed intensity range of {-90, 60}.
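The scale-invariance of the natural-coordinate derivatives can also be checked numerically. Below is a small sketch in Python with SciPy (a stand-in for the book's Mathematica code): on a unit step edge, the peak of the first order Gaussian derivative is 1/(σ√(2π)), so the σ-normalized response σ ∂L/∂x stays close to the scale-independent constant 1/√(2π) ≈ 0.399:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

step = np.zeros(256)
step[128:] = 1.0                      # unit step edge

for sigma in (2, 4, 8, 16):
    grad = np.abs(gaussian_filter1d(step, sigma, order=1))
    # regular derivative peak decays as 1/(sigma sqrt(2 pi));
    # the natural-coordinate response sigma*grad stays ~0.399
    print(sigma, grad.max(), sigma * grad.max())
```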

6.12 Irreducible invariants


Invariant differential features are independent of changes under specific groups of coordinate
transformations. Note that transformations of the coordinates are the basic physical notion
here: the particular choice of coordinates is just a means to describe the world, and the real
situation should be independent of this choice. This is often misunderstood, e.g. when
rotation invariance is interpreted as meaning that all results are the same when the image itself is
rotated. Rotation invariance is a local property, and as such a coordinate rotation and an
image rotation are only the same when we consider a single point in the image.

For medical imaging the most important groups are the orthogonal transformations, such as
translations, rotations, mirroring and scaling, and the affine transformations, such as shear.
There are numerous other groups of transformations, but it is beyond the scope of this book
to treat these. The differential invariants are the natural building blocks to express local
differential structure.

It has been shown by Hilbert [Hilbert1893] that any invariant of finite order can be
expressed as a polynomial function of a set of irreducible invariants. This is an important
result. For e.g. scalar images these invariants form the fundamental set of image primitives in
which all local intrinsic properties can be described. In other words: any invariant can be
expressed as a polynomial combination of the irreducible invariants.

Typically, and fortunately, there are only a small number of irreducible invariants for low
order. E.g. for 2D images up to second order there are only 5 of such irreducibles. We have
already encountered one mechanism to find the irreducible set: gauge coordinates. We found
the following set:

Zeroth order: L
First order: Lw
Second order: Lvv, Lvw, Lww
Third order: Lvvv, Lvvw, Lvww, Lwww
etc.

Each of these irreducible invariants cannot be expressed in the others. Any invariant property
to some finite order can be expressed as a combination of these irreducibles. E.g. isophote
curvature, a second order local invariant feature, is expressed as κ = -Lvv/Lw.
Note that the first derivative to v is missing. But Lv ≡ 0 is just the gauge condition! There is
always that one degree of freedom to rotate the coordinate system in such a way that the
tangential derivative vanishes. This gives a way to count the number of irreducible
invariants for a given order: it is equal to the number of partial derivative coefficients in the
local Taylor expansion, minus 1 for the gauge condition. E.g. up to 4th order we have the
fourth order partial derivatives Lvvvv, Lvvvw, Lvvww, Lvwww, and Lwwww, so in total we have
1 + 1 + 3 + 4 + 5 = 14 irreducible invariants up to 4th order.
These irreducibles form a basis for the differential invariant structure. The set of 5
irreducible grayvalue invariants in 2D images has been exploited to classify local image
structure by Schmidt et al. [Schmidt1996a, Schmidt1996b] for statistical object recognition.
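The counting rule above can be made explicit in a two-line sketch (illustrative only; the closed form is not from the book): in 2D there are k + 1 distinct partial derivatives of order k, and one degree of freedom is spent on the gauge condition.

```python
def n_irreducible_2d(order):
    # order k contributes k + 1 distinct derivatives in 2D;
    # subtract 1 for the gauge condition Lv = 0
    return sum(k + 1 for k in range(order + 1)) - 1

print([n_irreducible_2d(N) for N in (1, 2, 3, 4)])  # [2, 5, 9, 14]
```

Up to second order this gives the 5 irreducibles listed above, up to fourth order the 14 counted in the text.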

This assigns the three RGB channels of a color image to the irreducible invariants
{L, Lw, Lvv + Lww} of a scalar grayvalue image for σ = 2 pixels:

im = Import["mr256.gif"]; px = im[[1, 1]]; σ = 2;
r = gD[px, 0, 0, σ]; g = Sqrt[gD[px, 1, 0, σ]^2 + gD[px, 0, 1, σ]^2];
b = gD[px, 2, 0, σ] + gD[px, 0, 2, σ];
g = g 255 / Max[g]; b = b 255 / Max[b]; imtr = Transpose[{r, g, b}, {3, 1, 2}];
im[[1, 1]] = imtr; im[[1, 4]] = ColorFunction -> RGBColor; Show[im, ImageSize -> 150];

Figure 6.43 RGB color coding with the triplet of differential invariants {L, Lw, Lvv + Lww}.

Intermezzo: Tensor notation

There are many ways to set up an irreducible basis, but it is beyond the scope of this
introductory book to go in detail here. We just give one example of another often used
scheme to generate irreducible invariants: tensor notation (see for details e.g.
[Florack1993a]). Here tensor indices denote partial derivatives and run over the dimensions:
Li denotes the gradient vector {Lx, Ly}, and Lij denotes the second order derivative matrix
(the Hessian).

When indices come in pairs, summation over the dimensions is implied (the so-called
Einstein summation convention, or contraction): Lii = Σ_{i=1}^{D} Lii = Lxx + Lyy, etc. So we get:

Zeroth order: L
First order: Li Li (= Lx Lx + Ly Ly, the gradient magnitude squared)
Second order: Lii (= Lxx + Lyy, the Laplacian)
              Lij Lji (= Lxx^2 + 2 Lxy^2 + Lyy^2, the 'deviation from flatness')
              Li Lij Lj (= Lx^2 Lxx + 2 Lx Ly Lxy + Ly^2 Lyy, 'curvature')
etc.
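These contractions translate directly into code. A sketch in Python (NumPy's einsum implements the Einstein summation convention, and scipy.ndimage.gaussian_filter supplies the scaled derivatives; both are assumptions outside the book's Mathematica code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
im = rng.random((64, 64))
sigma = 2.0
d = lambda ny, nx: gaussian_filter(im, sigma, order=(ny, nx))

Li = np.stack([d(0, 1), d(1, 0)])            # L_i = (Lx, Ly)
Lij = np.array([[d(0, 2), d(1, 1)],
                [d(1, 1), d(2, 0)]])         # Hessian L_ij

LiLi = np.einsum('i...,i...->...', Li, Li)           # Lx^2 + Ly^2
Lii = np.einsum('ii...->...', Lij)                   # Laplacian (trace)
LijLji = np.einsum('ij...,ji...->...', Lij, Lij)     # deviation from flatness
LiLijLj = np.einsum('i...,ij...,j...->...', Li, Lij, Li)

# contraction checks against the written-out forms
assert np.allclose(Lii, d(0, 2) + d(2, 0))
assert np.allclose(LijLji, d(0, 2)**2 + 2*d(1, 1)**2 + d(2, 0)**2)
```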

Some statements by famous physicists:


- "Gauge invariance is a classic case of a good idea which was discovered before its time."
(K. Moriyasu, An Elementary Primer for Gauge Theory, World Scientific, 1984).
- "The name ’gauge’ comes from the ordinary English word meaning ’measure’. The history
of the use of this name for a class of field theories is very roundabout, and has little to do

with their physical significance as we now understand it." (S. Weinberg, "The Forces of
Nature", Am. Scientist, 65, 1977).
- "As far as I see, all a priori statements in physics have their origin in symmetry." (H. Weyl,
Symmetry, 1952).

6.13 Summary of this chapter


Invariant differential feature detectors are special (mostly polynomial) combinations of
image derivatives, which exhibit invariance under some chosen group of transformations.
We only discussed invariance under translations and rotations, the most common groups,
especially for medical images. The derivatives are easily calculated from the image through
the multi-scale Gaussian derivative kernels.

The notion of invariance is crucial for geometric relevance. Non-invariant properties have no
value in general feature detection tasks. A convenient paradigm to calculate features
invariant under Euclidean coordinate transformations is the notion of gauge coordinates. For
first order in 2D they are defined as a local frame with one unit vector w pointing in the
direction of the gradient, and the other perpendicular unit vector v pointing in the direction
tangential to the isophote. Any combination of derivatives with respect to v and w is
invariant under Euclidean transformations. We discussed the second order examples of
isophote and flowline curvature and cornerness, and the third order example of T-junction
detection in this framework.

Mathematica offers a particularly attractive framework, in that it combines the analytical
calculation of features under the Euclidean invariance condition with a final replacement of
the analytical derivatives by numerical Gaussian derivatives. In this way even high order
(up to order 4) examples could be discussed and calculated.
7. Natural limits on observations
He who asks a question is a fool for five minutes;
he who does not ask a question remains a fool forever.
Chinese Proverb

7.1 Limits on differentiation: scale, accuracy and order


For a given order of differentiation we find that there is a limiting scale-size below which the
results are no longer exact. E.g. when we study the derivative of a ramp with slope 1, we
expect the outcome to be exactly 1. Let us look at the observed derivative at the center of the
image for a range of scales (0.4 ≤ σ ≤ 1.2 in steps of 0.1):

<< FrontEndVision`FEV`; im = Table[x, {y, 64}, {x, 64}];

b = Table[{σ, gDf[im, 1, 0, σ][[32, 32]]}, {σ, .4, 1.2, .1}];
ListPlot[b, PlotJoined -> True,
  PlotRange -> All, AxesLabel -> {"σ", "∂xL"},
  PlotStyle -> Thickness[.01], ImageSize -> 250];

Figure 7.1 The measured derivative value of the function y = x is no longer correct for
decreasing scale. For scales σ < 0.6 pixels a marked deviation occurs.

The value of the derivative starts to deviate for scales smaller than, say, σ = 0.6. Intuitively,
we understand that something must go wrong when we decrease the size of the kernel in the
spatial domain: it becomes increasingly difficult to fit the Gaussian derivative function with
its zerocrossings. We recall from chapter 4 that the number of zerocrossings of a Gaussian
derivative kernel is equal to the order of differentiation.
There is a fundamental relation between the order of differentiation, the scale of the operator
and the required accuracy [TerHaarRomeny1994b]. We will now derive this relation.
The Fourier transform of a Gaussian kernel is again a Gaussian:
The Fourier transform of a Gaussian kernel is again a Gaussian:

Unprotect[gauss]; gauss[x_, σ_] := 1/(σ Sqrt[2 π]) Exp[-(x^2/(2 σ^2))];

fftgauss[ω_, σ_] = FourierTransform[gauss[x, σ], x, ω]

E^(-(σ^2 ω^2)/2) / Sqrt[2 π]

The Fourier transform of the derivative of a function is -i ω times the Fourier transform of
the function:

FourierTransform[∂x gauss[x, σ], x, ω] / FourierTransform[gauss[x, σ], x, ω]

-I ω

The Fourier transform of the n-th derivative of a function is (-i ω)^n times the Fourier
transform of the function. Note that there are several definitions for the signs (see the
Mathematica Help browser for FourierTransform).
A smaller kernel in the spatial domain gives rise to a wider kernel in the Fourier domain, as
shown below for a range of widths of first order derivative Gaussian kernels (in 1D):

DisplayTogetherArray[
  {Plot3D[fftgauss[ω, σ], {ω, -π, π}, {σ, .4, 2}, PlotPoints -> 30,
    AxesLabel -> {"ω", "σ", "fft"}, Axes -> True, Boxed -> True],
   Plot3D[gauss[x, σ], {x, -π, π}, {σ, .4, 2}, PlotPoints -> 30, AxesLabel ->
     {"x", "σ", "gauss"}, Axes -> True, Boxed -> True]}, ImageSize -> 490];

Figure 7.2 Left: The Fourier transform of the Gaussian kernel is defined for -π ≤ ω ≤ π. The
function repeats forever along the frequency axis over this domain. For decreasing scale σ
in the spatial domain the Fourier transform gets wider in the spatial frequency domain. At
some value of σ a significant leakage (aliasing) occurs. Right: The spatial Gaussian kernel
as a function of scale.

We plot the Fourier spectrum of a kernel, and see that the function has signal energy outside
its proper domain [-π, π] for which the spectrum is defined:

FilledPlot[{If[-π < ω < π, fftgauss[ω, .5], 0], fftgauss[ω, .5]},
  {ω, -2 π, 2 π}, Fills -> {{{1, Axis}, GrayLevel[.5]}},
  Ticks -> {{-π, π}, Automatic},
  AxesLabel -> {"ω", "g(ω,σ=.5)"}, ImageSize -> 350];

Figure 7.3 The definition of the leakage is the (unshaded) area under the curve outside the
definition domain, relative to the total area under the curve. Here the definition is given for
the 1D Gaussian kernel. Due to the separability of Gaussian kernels this definition is easily
extended to higher dimensions.

The error is defined as the amount of the energy (the square) of the kernel that is 'leaking',
relative to the total area under the curve (note the integration ranges):

error[n_, σ_] = 100 Integrate[(I ω)^(2 n) fftgauss[ω, σ]^2, {ω, π, ∞}] /
    Integrate[(I ω)^(2 n) fftgauss[ω, σ]^2, {ω, 0, ∞}]

(100 ((1 + 2 n) Gamma[1/2 + n] - 2 Gamma[3/2 + n] + (1 + 2 n) Gamma[1/2 + n, π^2 σ^2])) /
  ((1 + 2 n) Gamma[1/2 + n])

We plot this Gamma function for scales between σ = 0.2 and 2 and order of differentiation
from 0 to 10, and we insert the 5% error line in it (we have to lower the plot somewhat (-6%)
to make the line visible):

Block[{$DisplayFunction = Identity},
  p1 = Plot3D[error[n, σ] - 6, {σ, .2, 2}, {n, 0, 10}, PlotRange -> All,
    AxesLabel -> {"σ", "n", "error %"}, Boxed -> True, Axes -> True];
  p2 = ContourPlot[error[n, σ], {σ, .2, 2}, {n, 0, 10},
    ContourShading -> False, Contours -> {5}];
  c3d = Graphics3D[Graphics[p2][[1]] /.
     Line[pts_] :> {Thickness[.01], val = Apply[error, First[pts]];
        Line[Map[Append[#, val] &, pts]]}]];

Show[p1, c3d, ImageSize -> 310];

Figure 7.4 Relation between scale σ, order of differentiation n, and accepted error (in %) for
a convolution with a Gaussian derivative function, implemented through the Fourier domain.

ContourPlot[error[n, σ], {n, 0, 10}, {σ, .1, 2},
  ContourShading -> False, Contours -> {1, 5, 10}, FrameLabel ->
   {"Order of differentiation", "Scale in pixels"}, ImageSize -> 275];


Figure 7.5 Relation between scale σ and the order of differentiation n for three fixed
accuracies for a convolution with a Gaussian derivative function, implemented through the
Fourier domain: upper graph: 1%, middle graph: 5%, lower graph: 10% accepted error.

The lesson from this section is that we should never make the scale of the operator, the
Gaussian kernel, too small. The lower limit is indicated in the graph above. A similar
reasoning can be set up for the outer scale, when the aliasing occurs in the spatial domain.

We summarize with a table of the minimum σ, given accuracies of 1, 5 and 10% respectively,
and differentiation up to fifth order:

TableForm[Table[
   Prepend[(σ /. FindRoot[error[n, σ] == #, {σ, .6}]) & /@ {1, 5, 10}, n],
   {n, 1, 5}], TableHeadings ->
   {None, {"Order", "σ @ 1%", "σ @ 5%", "σ @ 10%"}}]

Order   σ @ 1%     σ @ 5%     σ @ 10%
1       0.758115   0.629205   0.56276
2       0.874231   0.748891   0.684046
3       0.967455   0.844186   0.78025
4       1.04767    0.925811   0.862486
5       1.11919    0.998376   0.935501
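Using Γ(3/2 + n) = (1/2 + n) Γ(1/2 + n), the error expression above reduces to the regularized upper incomplete Gamma function, error(n, σ) = 100 Γ(n + 1/2, π²σ²) / Γ(n + 1/2). A sketch in Python (assuming SciPy's gammaincc and brentq; this is a cross-check, not the book's code) reproduces the table:

```python
import numpy as np
from scipy.special import gammaincc
from scipy.optimize import brentq

def error_pct(n, sigma):
    # 100 * Gamma(n + 1/2, (pi sigma)^2) / Gamma(n + 1/2)
    return 100.0 * gammaincc(n + 0.5, (np.pi * sigma) ** 2)

# minimal sigma for a given accepted error, per differentiation order
for n in range(1, 6):
    sigmas = [brentq(lambda s: error_pct(n, s) - e, 0.1, 3.0)
              for e in (1, 5, 10)]
    print(n, [round(s, 4) for s in sigmas])
```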

Task 7.1 This chapter discusses the fundamental limit which occurs by too
much 'broadening' of the Gaussian kernel in the Fourier domain for small scales
(the 'inner scale limit'). Such a broadening also occurs in the spatial domain,
when we make the scale too large. A similar fundamental limit can be
established for the 'outer scale limit'.
Find the relation between scale, differential order and accuracy for the outer
scale.

The reasoning in this chapter is based on the implementation of a convolution in the Fourier
domain. The same reasoning holds however when other choices are made for the
implementation. In each case, a decision about the periodicity or extension of the image
values outside the domain (see the discussion in chapter 5), determines the fundamental limit
discussed here.

7.2 Summary of this chapter


There is a limit to the order of differentiation for a given scale of the operator and required
accuracy. The limit is due to the Gaussian derivative kernel no longer 'fitting' within its
Gaussian envelope, known as aliasing. We derived the analytic expression for this error.

As a rule of thumb, for derivatives up to 4th order, the scale should be not less than one pixel.
8. Differentiation and regularization
8.1 Regularization
Regularization is the technique to make data behave well when an operator is applied to
them. Such data could e.g. be functions that are impossible or difficult to differentiate, or
discrete data where a derivative seems to be not defined at all. In scale-space theory, we
realize that we do physics. This implies that when we consider a system, a small variation of
the input data should lead to a small change in the output data.
Differentiation is a notoriously ill-behaved operation. Here are some examples of non-
differentiable functions:

<< FrontEndVision`FEV`;
Block[{$DisplayFunction = Identity},
  p1 = Plot[Exp[-Abs[x]], {x, -2, 2}, PlotStyle -> Thickness[.01]];
  p2 = Plot[UnitStep[x - 1], {x, -2, 5}, PlotStyle -> Thickness[.01]];
  p3 = Plot[Floor[4 Sin[x]], {x, 0, 4 π}, PlotStyle -> Thickness[.01]];
  p4 = ListPlot[Table[Sin[4 π / i], {i, 2, 40}], PlotStyle -> PointSize[.02]]];
Show[GraphicsArray[{{p1, p2}, {p3, p4}}], ImageSize -> 300];

Figure 8.1 Some functions that can not be differentiated.

In mathematical terms it is said that the operation of differentiation is ill-posed, the opposite
of well-posed. Jacques Hadamard (1865-1963) [Hadamard1902] stated the conditions for
well-posedness:

• The solution must exist;
• The solution must be uniquely determined;
• The solution must depend continuously on the initial or boundary data.

The first two requirements state that there is one and only one solution. The third
requirement assures that if the initial or boundary data change slightly, this should also have
only a limited impact on the solution. In other words, the solution should be stable.

Regularization is a hot topic. Many techniques have been developed to regularize data, each
based on a constraint on how one wishes the data to behave without sacrificing too much.
Well known and abundantly applied examples are:

- smoothing the data: convolution with some extended kernel, like a 'running average filter'
(e.g. {1/3, 1/3, 1/3}) or the Gaussian;
- interpolation, by a polynomial (multidimensional) function;
- energy minimization, of a cost function under constraints [Mumford1985a,
Mumford1989a, Mumford1994a];
- fitting a function to the data (the best known examples are splines, i.e. polynomials fitting
a curve or a surface up to some order [DeBoor1978]; the cubic splines are named so because
they fit to third order, x^3);
- graduated convexity [Blake1987];
- deformable templates ('snakes') [McInerney1996];
- thin plates or thin plate splines [Bookstein1989] (see also
mathworld.wolfram.com/ThinPlateSpline.html);
- Tikhonov regularization, discussed in detail in the next section.

However, smoothing before the differentiation does not solve the ill-posedness problem. The
crucial difference between the approaches above and scale-space theory is that the first
methods change the data, your most valuable source of information, before the operation
(e.g. differentiation) is applied. The derivative is taken of the regularized data.

When we recall the importance of doing a measurement uncommitted, we surely should not
modify our data in any way. We need a regularization of the operator, not the operand.
Actually, the only control we have when we do a measurement is in our measurement
device. There we can change the size, location, orientation, sensitivity profiles etc. of our
filtering kernels. That is something completely different from the methods described above.
It is one of the cornerstones in scale-space theory that the only control allowed is in the
filters. As such, scale-space theory can be considered the ’theory of apertures’.

8.2 Regular tempered distributions and test functions


The formal mathematical method to solve the problems of ill-posed differentiation was given
by Laurent Schwartz [Schwartz1951] (see figure 8.2) as was noted by Florack
[Florack1994a]. The following is adapted from [Niessen1997a]. A regular tempered
distribution associated with an image is defined by the action of a smooth test function on the
image. Smooth is here used in the mathematical sense, i.e. infinitely differentiable, or C^∞.

The class of smooth test functions φ (also called the Schwartz space S(R^D)) is large. It
comprises all smooth functions that decrease sufficiently fast to zero at the boundaries. They
are mathematically defined as the functions φ that are C^∞ and whose derivatives of whatever
order go to zero faster than any polynomial grows. Mathematically stated:

φ ∈ S(R^D) ⟺ φ ∈ C^∞(R^D) ∧ sup ‖ x^m ∂_{i1 ... in} φ ‖ < ∞

for all m and n. As we consider any dimension here, m and n are multi-indices.

Let us give an example of such a function. The Gaussian kernel has the required properties,
and belongs to the class of smooth test functions. E.g. it goes to zero faster than a 13th
order polynomial grows:

Limit[x^13 Exp[-x^2], x -> ∞]

0

Gaussian derivatives are also smooth test functions. Here is an example for the third order
derivative:

Limit[x^13 D[Exp[-x^2], {x, 3}], x -> ∞]

0

The reason that the function e^(-x^2) suppresses any polynomial function is that a series
expansion leads to polynomial terms of any desired order:

Series[Exp[-x^2], {x, 0, 15}]

1 - x^2 + x^4/2 - x^6/6 + x^8/24 - x^10/120 + x^12/720 - x^14/5040 + O[x]^16

Task 8.1 Find a number of functions that fulfill the criteria for being a member of
the class of smooth test functions, i.e. a member of the Schwartz space.

A regular tempered distribution TL associated with the image L(x) is defined as:

TL = ∫_{-∞}^{∞} L(x) φ(x) dx

The test function 'samples' the image, and returns a scalar value. The derivative of a regular
tempered distribution is defined as:

∂_{i1 ... in} TL = (-1)^n ∫_{-∞}^{∞} L(x) ∂_{i1 ... in} φ(x) dx

Thus the image is now 'sampled' with the derivative of the test function. This is the key result
of Schwartz' work. It is now possible to take a derivative of all the nasty functions we gave as
examples above. We can now also take derivatives of our discrete images. But we still need
to find the test function φ. Florack [Florack1994b] found the solution in demanding that the
derivative should be a new observable, i.e. that the particular test function can be interpreted
as a linear filter.
The choice for the filter is then determined by physical considerations, and we did so in
chapter 2 where we derived the Gaussian kernel and all its partial derivatives as the causal
non-committed kernels for an observation.

We saw before that the Gaussian kernel and its derivatives are part of the Schwartz space.

Show[Import["Laurent Schwartz.jpg"], ImageSize -> 150];

Figure 8.2 Laurent Schwartz (1915 - ). Schwartz spent the year 1944-45 lecturing at the
Faculty of Science at Grenoble before moving to Nancy where he became a professor at the
Faculty of Science. It was during this period of his career that he produced his famous work
on the theory of distributions. Harald Bohr presented a Fields Medal to Schwartz at the
International Congress in Harvard on 30 August 1950 for his work on the theory of
distributions. Schwartz has received a long list of prizes, medals and honours in addition to
the Fields Medal. He received prizes from the Paris Academy of Sciences in 1955, 1964 and
1972. In 1972 he was elected a member of the Academy. He has been awarded honorary
doctorates from many universities including Humboldt (1960), Brussels (1962), Lund (1981),
Tel-Aviv (1981), Montreal (1985) and Athens (1993).

So we can now define a well-posed derivative of an image L(x) in the proper 'Schwartz way':

∂_{i1 ... in} L(x) = (-1)^n ∫_{-∞}^{∞} L(y) ∂_{i1 ... in} φ(y; x) dy

We have no preference for a particular point where we want this 'sampling' to be done, so we
have linear shift invariance: φ(y; x) = φ(y - x). We now get the result that taking the
derivative of an image is equivalent to the convolution of the image with the derivative of
the test function:

∂_{i1 ... in} L(x) = ∫_{-∞}^{∞} L(y) ∂_{i1 ... in} φ(y - x) dy

The set of test functions is here the Gaussian kernel and all its partial derivatives. We also
see now the relation with receptive fields: they are the Schwartz test functions for the visual
system. They take care of making the differentiation regularized, well-posed. Here is a
comparison list:

Mathematics ⟺ Smooth test function
Computer vision ⟺ Kernel, filter
Biological vision ⟺ Receptive field
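This transfer of differentiation into the filter is easy to demonstrate numerically. A sketch in Python with SciPy (an illustrative stand-in, not the book's code): naive finite differencing of noisy discrete data is unstable, while 'sampling' the same data with the derivative of the Gaussian test function is well-posed:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
x = np.linspace(0, 2*np.pi, 512)
dx = x[1] - x[0]
signal = np.sin(x) + 0.05 * rng.standard_normal(x.size)

# ill-posed: finite differences amplify the noise enormously
fd = np.gradient(signal, dx)

# well-posed: differentiation moved into the Gaussian test function
gd = gaussian_filter1d(signal, sigma=5, order=1) / dx

true = np.cos(x)
inner = slice(32, -32)                # ignore boundary effects
err_fd = np.abs(fd - true)[inner].max()
err_gd = np.abs(gd - true)[inner].max()
print(err_fd, err_gd)                 # the regularized error is far smaller
```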

Of course, if we relax or modify the constraints of chapter 2, we might get other kernels
(such as the Gabor kernels, if we confine our measurement to just a single spatial
frequency). As long as they are part of the Schwartz space we get well-posed derivatives.

The key point in the reasoning here is that there is no attempt to smooth the data and then
take the derivative of the smoothed result: the differentiation is transferred to the filter,
and is performed on the smooth test function. See for a full formal treatment of
Schwartz theory for images the papers by Florack [Florack1992a, Florack1994a,
Florack1996b, Florack1997a].

The theory of distributions is a considerable broadening of the differential and integral
calculus. Heaviside and Dirac had generalized the calculus with specific applications in
mind. These, and other similar methods of formal calculation, were not, however, built on an
abstract and rigorous mathematical foundation. Schwartz's development of the theory of
distributions put methods of this type onto a sound basis, and greatly extended their range of
application, providing powerful tools for applications in numerous areas.

8.3 An example of regularization


The classical example of the regularization of differentiation by the Gaussian derivative is
a signal with a high-frequency disturbance ε cos(ω x). Here ε is a small number, and ω a
very high frequency.

We compare the mathematical derivative with the convolution with the Gaussian derivative.
First we calculate the mathematical derivative:

D[L[x] + ε Cos[ω x], x]

-ε ω Sin[ω x] + L'[x]

For large ω the disturbance becomes very large. The disturbance can be made arbitrarily
small, provided that the derivative of the signal is computed at a sufficiently coarse scale σ
in scale-space:

gx[x_, σ_] := D[1/(σ Sqrt[2 π]) Exp[-x^2/(2 σ^2)], x];

Simplify[Integrate[ε Cos[ω (x - α)] gx[α, σ], {α, -∞, ∞}], {ω > 0, σ > 0}]

-ε ω E^(-σ^2 ω^2/2) Sin[ω x]
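This suppression factor $e^{-\sigma^2\omega^2/2}$ is easy to verify numerically; a sketch in Python/SciPy (our own check, with arbitrarily chosen values for ε, ω, σ and the evaluation point x0):

```python
import numpy as np
from scipy.integrate import quad

eps, omega, sigma, x0 = 1.0, 4.0, 0.5, 0.3

def gx(a):
    # First derivative of the normalized Gaussian kernel of scale sigma.
    return -a / (sigma**3 * np.sqrt(2 * np.pi)) * np.exp(-a**2 / (2 * sigma**2))

# Convolution of the disturbance eps*cos(omega x) with gx, evaluated at x0:
val, _ = quad(lambda a: eps * np.cos(omega * (x0 - a)) * gx(a), -np.inf, np.inf)

# The closed form found above: -eps omega exp(-sigma^2 omega^2/2) sin(omega x)
pred = -eps * omega * np.exp(-sigma**2 * omega**2 / 2) * np.sin(omega * x0)
print(val, pred)
```

With these values the derivative of the disturbance is attenuated by the factor $e^{-2} \approx 0.135$ relative to the unregularized amplitude εω.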

8.4 Relation regularization ⇔ Gaussian scale-space


When data are regularized by one of the methods above that 'smooth' the data, choices have
to be made as how to fill in the 'space' in between the data points that are not given by the
original data. In particular, one has to make a choice for the order of the spline, the order of
the fitting polynomial function, the 'stiffness' of the physical model etc. This is in essence the
same choice as the scale to apply in scale-space theory. In fact, it is becoming clear that there
are striking analogies between scale-space regularization and other means of regularization.

An essential result in scale-space theory was shown by Mads Nielsen. He proved that the
well known and much applied method of regularization as proposed by Tikhonov and
Arsenin [Tikhonov and Arsenin 1977] (often called 'Tikhonov regularization') is essentially
equivalent to convolution with a Gaussian kernel [Nielsen1996b, Nielsen1997a,
Nielsen1997b]. Tikhonov and Arsenin approached the regularization of functions with a
formal mathematical method from variational calculus, the method of the Euler-Lagrange
equations.

This method studies a functional, i.e. a function of functions, and tries to find the minimum
of that functional given a set of constraints. Their proposed formulation was the following:
make a functional $E(g) = \int_{-\infty}^{\infty} (f - g)^2 \, dx$ and minimize this functional
for g. Both f and g are functions of x. The function g must become the regularized version of
f, and the problem is to find a function g such that it deviates as little as possible from f.
The difference with f is taken in the 2-norm, $(f - g)^2$, and we look for the g for which this
squared difference is minimal, given a constraint.

This constraint is the following: we also want the first derivative of g with respect to x
(written $g_x$) to behave well, i.e. we require that when we integrate the square of $g_x$
over its total domain we get a finite result. Mathematically you sometimes see such a
requirement announced as the functions being 'mapped into a Sobolev space', which is the
space of square integrable functions.

The method of the Euler-Lagrange equations specifies the construction of an equation for the
functional to be minimized, where the constraints are added with a set of constant factors
$\lambda_i$, one for each constraint, the so-called Lagrange multipliers. In our case:
$E(g) = \int_{-\infty}^{\infty} (f - g)^2 + \lambda_1 g_x^2 \, dx$. The functional E(g) is
called the Lagrangian and is to be minimized with respect to g, i.e. we require
$\frac{dE}{dg} = 0$.

In the Fourier domain the calculations become a lot more compact. $\hat{f}$ and $\hat{g}$
denote the Fourier transforms of f and g respectively. The famous theorem by Parseval states
that the integral of the square of a function equals the integral of the square of its Fourier
transform. Secondly, we need the result that the Fourier transform of the derivative of a
function is the Fourier transform of that function multiplied with the factor $-i\omega$:
$\mathcal{F}\big(\frac{d}{dx} g(x)\big) = -i\omega \, \mathcal{F}(g(x))$, where
$\mathcal{F}$ denotes the Fourier transform.

For the square of such a derivative we get the factor $\omega^2$, because the squared modulus
of a complex number z is z multiplied with its conjugate (i.e. $i \to -i$), denoted as $z^*$,
so $|z|^2 = z z^*$, which gives the factor $(-i\omega)(-i\omega)^* = \omega^2$. So finally,
because the Fourier transform is a linear operation, we get for the Lagrangian $\hat{E}$:

co(-

dL’(~) ~ 2 1
d~: - 2 ( f - - ~ ) +2A10)2~=0, so(f-g,)+h.10)2~=0 ~= l+X]-o; f"

The regularized function ~, is (in the Fourier domain, only taking into account a constraint on
the first derivative) seen to be the product of two functions, 1 and jz, which product is a
convolution in the spatial domain.
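The derivative-to-multiplication property used in this derivation is quickly checked numerically. Note that the sign of the factor $i\omega$ depends on the Fourier convention: NumPy's forward FFT uses $e^{-i\omega x}$, which gives $+i\omega$, while the text's convention gives $-i\omega$; the factor $\omega^2$ for the squared modulus is the same either way. A sketch (our own test function):

```python
import numpy as np

# Verify F{dg/dx} = i*omega*F{g} for NumPy's FFT convention, on a smooth,
# rapidly decaying function sampled on a periodic grid.
n, length = 1024, 20.0
dx = length / n
x = (np.arange(n) - n // 2) * dx
g = np.exp(-x**2)            # test function
dg = -2 * x * g              # its analytical derivative

omega = 2 * np.pi * np.fft.fftfreq(n, d=dx)
lhs = np.fft.fft(dg)
rhs = 1j * omega * np.fft.fft(g)
print(np.max(np.abs(lhs - rhs)))   # ~ 0 up to round-off
```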

The first result is that this first order regularization can be implemented with a spatial
filtering operation. The filter $\frac{1}{1 + \lambda_1 \omega^2}$ in the spatial domain looks like this:

g1[x_] = InverseFourierTransform[1/(1 + λ1 ω^2), ω, x] // Simplify

Sqrt[π/2] E^(-Abs[x]/Sqrt[λ1]) / Sqrt[λ1]

λ1 = 1; Plot[g1[x], {x, -3, 3}, ImageSize -> 150];

Figure 8.3 Filter function proposed by Castan et al. [Castan1990].

It is precisely the function proposed by Castan et al. [Castan1990]. The derivative of this
filter is not well defined at the origin, as we can clearly see.
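The frequency-domain solution $\hat{g} = \hat{f}/(1 + \lambda_1 \omega^2)$ can be applied directly to discrete data with the FFT; a minimal sketch in Python/NumPy (our own toy signal; the unit-area spatial form $e^{-|x|/\sqrt{\lambda_1}}/(2\sqrt{\lambda_1})$ used below corresponds to the non-unitary Fourier convention):

```python
import numpy as np

# First-order Tikhonov regularization of a noisy signal, solved in the
# Fourier domain: g_hat = f_hat / (1 + lam * omega^2).
n, length, lam = 2048, 40.0, 0.5
dx = length / n
x = (np.arange(n) - n // 2) * dx
rng = np.random.default_rng(1)
f = np.sign(np.sin(x)) + 0.2 * rng.normal(size=n)   # noisy square wave

omega = 2 * np.pi * np.fft.fftfreq(n, d=dx)
g = np.real(np.fft.ifft(np.fft.fft(f) / (1 + lam * omega**2)))

# Equivalent spatial filter (unit area): exp(-|x|/sqrt(lam)) / (2 sqrt(lam))
h = np.exp(-np.abs(x) / np.sqrt(lam)) / (2 * np.sqrt(lam))
print(np.sum(h) * dx)                            # integrates to ~1
print(np.std(np.diff(g)), np.std(np.diff(f)))    # g is much smoother than f
```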

This is a first result for the inclusion of the constraint on the first order derivative. However,
we like our function $\hat{g}$ to be regularized with all derivatives behaving nicely, i.e. square
integrable. When we add the constraint for the second derivative, we get two Lagrange
multipliers, $\lambda_1$ and $\lambda_2$:

$\hat{E}(\hat{g}) = \int_{-\infty}^{\infty} (\hat{f} - \hat{g})^2 + \lambda_1 \hat{g}_x^2 + \lambda_2 \hat{g}_{xx}^2 \, d\omega = \int_{-\infty}^{\infty} (\hat{f} - \hat{g})^2 + \lambda_1 \omega^2 \hat{g}^2 + \lambda_2 \omega^4 \hat{g}^2 \, d\omega$

and we find in a similar way for $\hat{g}$:

$\frac{d\hat{E}(\hat{g})}{d\hat{g}} = -2(\hat{f} - \hat{g}) + 2\lambda_1 \omega^2 \hat{g} + 2\lambda_2 \omega^4 \hat{g} = 0 \;\Rightarrow\; \hat{g} = \frac{1}{1 + \lambda_1 \omega^2 + \lambda_2 \omega^4} \, \hat{f}$

This is a regularization involving well-behaved derivatives of the filtered $\hat{f}$ to second order.
This filter was proposed by Deriche [Deriche1987], who made a one-parameter family of
this filter by setting a relation between the λ's: $\lambda_1 = \sqrt{2 \lambda_2}$. The dimensions of $\lambda_1$ and $\lambda_2$
are correctly treated by this choice. When we look at the Taylor series expansion of the
Gaussian kernel in the Fourier domain, we see that his choice is just the truncated Gaussian
to second order:

Simplify[FourierTransform[1/(σ Sqrt[2 π]) Exp[-x^2/(2 σ^2)], x, ω], σ > 0]

E^(-σ^2 ω^2/2)/Sqrt[2 π]

Series[E^(σ^2 ω^2/2), {ω, 0, 10}]

$1 + \frac{\sigma^2 \omega^2}{2} + \frac{\sigma^4 \omega^4}{8} + \frac{\sigma^6 \omega^6}{48} + \frac{\sigma^8 \omega^8}{384} + \frac{\sigma^{10} \omega^{10}}{3840} + O[\omega]^{11}$
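This term-by-term matching can be verified symbolically; a sketch in Python/SymPy (not the book's Mathematica):

```python
import sympy as sp

s, w = sp.symbols('sigma omega', positive=True)

# Taylor expansion of 1/h_hat = exp(sigma^2 omega^2 / 2) up to omega^4:
ser = sp.series(sp.exp(s**2 * w**2 / 2), w, 0, 6).removeO()
print(sp.expand(ser))   # 1 + sigma**2*omega**2/2 + sigma**4*omega**4/8

# Matching 1 + l1 w^2 + l2 w^4 gives l1 = sigma^2/2 and l2 = sigma^4/8,
# which is exactly the one-parameter relation l1 = sqrt(2*l2):
l1, l2 = s**2 / 2, s**4 / 8
print(sp.simplify(l1 - sp.sqrt(2 * l2)))   # -> 0
```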

Here is how Deriche's filter looks (see figure 8.4):

λ1 =.; λ2 =.; g2[x_] =
 InverseFourierTransform[1/(1 + Sqrt[2 λ2] ω^2 + λ2 ω^4), ω, x] // FullSimplify

$\frac{\sqrt{\pi}}{\lambda_2^{1/4}} \; e^{-\sin(3\pi/8)\,|x|/\lambda_2^{1/4}} \, \cos\!\Big(\cos(3\pi/8)\,\frac{|x|}{\lambda_2^{1/4}} - \frac{3\pi}{8}\Big)$

and the graph, for $\lambda_2 = 1$:

λ2 = 1; Plot[g2[x], {x, -4, 4}, ImageSize -> 150];

Figure 8.4 Filter function proposed by Deriche [Deriche1987].

From the series expansion of the Gaussian, and the induction from the lower order
regularizations, we may develop the suspicion that by adding the constraints for all
derivatives to behave well, we get the infinite series

$\hat{g} = \frac{1}{1 + \lambda_1 \omega^2 + \lambda_2 \omega^4 + \cdots + \lambda_n \omega^{2n}} \, \hat{f} = \hat{h} \hat{f}$ for $n \to \infty$.

Nielsen showed that the filter $\hat{h}$ is indeed the Gaussian kernel. The reasoning goes as
follows. We have an infinite number of unknowns here, the $\lambda_n$'s, so we need to come up with
an additional constraint that gives us just as many equations, so we can solve this system of
equations uniquely. We have just as many terms $\omega^{2n}$, so we look for a constraint on
them. It is found in the requirement that we want scale invariance for the filters, i.e. we want
two filters h(s) and h(t) to cascade: $h(s \oplus t) = h(s) \otimes h(t)$, where $\otimes$ is the convolution
operator. The parameters s and t are the scales of the filters. The operator $\oplus$ stands for the
summation at any norm, which is a compact writing for the definition
$(a \oplus b)^p = a^p + b^p$. It turns out that for p = 2 we have the regular addition of the
variances of the scales, as we have seen several times before, due to the requirement of
separability and Euclidean metric.
Implementing the cascading requirement for the first order:

$\frac{1}{1 + \lambda_1(s \oplus t)\,\omega^2} = \frac{1}{1 + \lambda_1(s)\,\omega^2} \cdot \frac{1}{1 + \lambda_1(t)\,\omega^2}$, giving

$1 + \lambda_1(s \oplus t)\,\omega^2 = 1 + \lambda_1(s)\,\omega^2 + \lambda_1(t)\,\omega^2 + \lambda_1(t)\,\lambda_1(s)\,\omega^4$ and

$\lambda_1(s \oplus t) = \lambda_1(s) + \lambda_1(t) + \lambda_1(t)\,\lambda_1(s)\,\omega^2$.

We equate the coefficients of powers of ω on both sides, so for $\omega^0$ we find
$\lambda_1(s \oplus t) = \lambda_1(s) + \lambda_1(t)$, which means that $\lambda_1$ must be a linear function of scale, $\lambda_1 = \alpha\, s$.
Now for the second order:

1 1 1
1 -I-21 (s~)g) (02 q’~-2 (s~)t) (04 -- 1+~1 (s) 0)2 q-~-2(s) 0)4 ~ 1+~-1(t) o~2 +MB(t) 094
giving
h 1 (s ~) t) 0) 2 + h 2 (s ~ t) 0)4 : ~-1 (s) 0) 2 + ~-2 (s) 604 + h 1 (t) 0)2 +
h I (t) h i (s) 0)4 + h i (t) h2 (s) 0)6 + h2 (t) 0)4 + h2 (t) h 1 (s) 0)6 + h2 (t) -~2 (s) 0) 8

and equating the coefficients for o94 on both sides:

h2(s @ t) = hi (t)hi (s) + h2(t) + h2(t) from which dimension we see that h 2 must be quadratic
0:2 ~2 1 .
in scale, a n d , ~ 2 - z - ~ h12

This reasoning can be extended to higher orders, and the result is that we get the following
series:

$\lambda_1 = \alpha\, s$, $\lambda_2 = \frac{1}{2!}\,\alpha^2 s^2$, $\lambda_3 = \frac{1}{3!}\,\alpha^3 s^3$, $\lambda_4 = \frac{1}{4!}\,\alpha^4 s^4$ etc., i.e. $\lambda_n = \frac{\alpha^n s^n}{n!}$.

We recall that the series expansion of the Gaussian function $e^{\sigma^2\omega^2/2}$ is, using

Series[E^(σ^2 ω^2/2), {ω, 0, 10}]

$1 + \frac{\sigma^2\omega^2}{2} + \frac{\sigma^4\omega^4}{2\cdot 4} + \frac{\sigma^6\omega^6}{2\cdot 4\cdot 6} + \frac{\sigma^8\omega^8}{2\cdot 4\cdot 6\cdot 8} + \frac{\sigma^{10}\omega^{10}}{2\cdot 4\cdot 6\cdot 8\cdot 10} + O[\omega]^{11}$

σ = 1.; Plot[Evaluate[InverseFourierTransform[E^(-σ^2 ω^2/2), ω, x]],
 {x, -4, 4}, ImageSize -> 300];

Figure 8.5 The Gaussian kernel (σ = 1).

When we take the arbitrary constant α = 1, we get as the optimal regularization filter, where
all derivatives are required to behave well, precisely the Gaussian kernel! This important
result is due to Nielsen [Nielsen1996b, Nielsen1997a, Nielsen1997b]. It has recently been
proved by Radmoser, Scherzer and Weickert for a number of other regularization methods
that they can be expressed as a Gaussian scale-space regularization [Radmoser1999a,
Radmoser2000a].

Task 8.2 Prove the equations for the coefficients $\lambda_n$ in the section above by
induction.

8.5 Summary of this chapter

Many functions cannot be differentiated. A sampled image is such a function. The solution,
due to Schwartz, is to regularize the data by convolving them with a smooth test function.
Taking the derivative of this 'observed' function is then equivalent to convolving with the
derivative of the test function. This is just what the receptive fields of the front-end visual
system do: regularization and differentiation. It is one of the key results of scale-space theory.

A well-known variational form of regularization is given by the so-called Tikhonov
regularization: a functional is minimized in $L_2$ sense with the constraint of well-behaved
derivatives. It is shown in this chapter, with a reasoning due to Nielsen, that Tikhonov
regularization with inclusion of the proper behaviour of all derivatives is essentially
equivalent to Gaussian blurring.
9. The front-end visual system - the retina
We all share the same biology, regardless of ideology. (Sting)

9.1 Introduction
The visual system is our most important sense. It is estimated that about one quarter of all
nerve cells we have in our central nervous system (CNS) is related to vision in some way.
The task of the system is not to form an image of the outside world into the brain, but to help
us to survive in this world. Therefore it is necessary to perform a substantial analysis of the
2D image as a projection of the 3D world. For this reason much more is measured than
just the spatial intensity distribution. We will see that the front-end visual system measures
simultaneously at multiple resolutions, it measures directly (in the scale-space model)
derivatives of the image in all directions at least up to fourth order, it measures temporal
changes of intensity, the motion and disparity parameters, and the color differential structure.
As a consequence, the layout of the receptors on the retina is strikingly different from the 2D
pixel arrays in our conventional digital cameras.

There are some excellent books giving a good introduction to the basics of the human visual
system. A few deserve special mention here:

1. "Eye, Brain and Vision" by David Hubel [Hubel1988a], a Scientific American Library
book by this Nobel laureate about his pioneering life's work. This delightful book is from
1988, and many new findings have been added since then, but as an introduction it is very
nice reading.
2. "The Visual System" by Semor Zeki [Zeki1993], with more historical descriptions of the
discoveries. Zeki, as a physician, also incorporates many clinical findings in the quest.

3. "Principles of neural science" by Eric Kandel, James Schwartz and Thomas Jessell (4 th
edition [Kandel2000]). An excellent and widely used textbook on neurons, perception and
vision.
4. "The First Steps in Seeing" by Robert Rodieck [Rodieck1998] gives a detailed account of
the neurophysiology of the visual front-end (particularly the retina, but also the lateral
geniculate nucleus and primary visual cortex).
5. "Foundations of Vision", by Brian Wandell [Wandell1995] gives attention to both
biological vision as models for processing.
6. "Neuro-Vision Systems" by Madan Gupta and George Knopf [Gupta1993a]. This is a
convenient selection of reprints of the classical papers on vision and vision modeling. There
are quite a few tutorial chapters added.

These books focus on the introduction into the visual system, and all cover the essential
properties and details. They have excellent illustrations, and read easily for computer
scientists and non-biologically educated people. They all contain pointers to the original
scientific papers.

In the next few chapters we give a summary of the properties, subsystems and possible
models for the front-end visual system as is relevant in the context of scale-space theory. We
discussed in the previous chapters the multi-scale notion of spatial and spatiotemporal image
structure, these chapters are about how a biological system may deal with measurement and
analysis of this structure.

The most important properties in the context of this book are the following:

1. The human visual system is a multi-scale sampling device of the outer world. It exploits
this strategy by the creation of so-called receptive fields (RF’s) on the retina: groups of
receptors assembled in such a way that they form a set of apertures of widely varying size.
They together measure a scale-space of every image. The hierarchical structure of the input
image is contained in this multi-scale stack of images measured at a range of scales. We call
this the deep structure.

2. The human visual system does ensemble measurements: for every (perceivable) aspect of
the stimulus it has a dedicated set of detectors (receptive fields or receptive field pairs).
Together they span the full measurement range of the parameter, i.e. for every location, order
of spatial and temporal differentiation of the stimulus, for every orientation, for every
velocity in every direction, for every disparity, etc. Amazing, but there seems to be no lack
of hardware ('wetware') in the visual system, as we shall see.

3. The visual system is considered layered: its first stages measure the geometrical structure
by multi-scale partial derivatives in space and time, subsequent layers perform an
analysis of the contextual structure, by perceptual grouping and hierarchical topological
analysis, and the highest stages do the cognitive, highly associative tasks. This rough division
in processing layers is also known as front-end, intermediate and high level visual processing.

We will now look in more detail at the neuro- and electrophysiological (and
psychophysical) findings that corroborate this model.

9.2 Studies of vision


The visual system is a signal processing system with spectacular performance. First of all,
the mere quantity of information processed in the visual system is enormous. But most
importantly, the system is capable of extracting structural information from the visual
picture with astonishing capabilities and with real-time speed. The visual system is one of
the best-studied systems in the brain.

We are however still a long way from understanding its intricate details. In this chapter we
focus on the first stages of vision, where data from neuro-physiology, anatomy and
computational models give rise to a reasonable insight in the processing that happens here.

After the measurement through the eye (the looking) our seeing is done in the visual cortex, a
number of specialized cortical areas in the back of our brain. It has been estimated that 25%
of all our roughly 10¹⁰ brain cells are in some way involved with the visual process.

A study of the visual system can be done from many different viewpoints:
Eye physics: the study of the physical limitations due to the optics and retinal structure.

Psychophysics: how well does the visual system perform? What are the limits of
perception? Can it be tricked?
Neurophysiology and -anatomy: a study of the system’s wetware organization, e.g. by
measurement of the electrical activity of single or small groups of cells, and by mapping
neural pathways and connections.
Functional and optical imaging: measure the functional activity of arrays or large
clusters of cells (in chapter 8 these methods are described in Intermezzo 8.5).
Computational models: the field of computer vision mimicking the neural substrate to
understand and predict its behavior, and to inspire artificial vision algorithms for computer
vision tasks. Despite many efforts and high mathematical sophistication, today we still see a
rather limited performance of computer vision algorithms in general.

Traditionally, vision is coarsely divided into three levels: front-end, intermediate and high
level vision. The visual front-end has traditionally been defined as the measurement and first
geometric analysis stage, where associative memory and recognition do not yet play a role. It
is becoming clear, however, that the visual front-end receives many inputs from higher levels,
making it a front-end in a context. The outputs of the front-end go to all further stages,
the intermediate and high levels.
In the front-end the first processing is done for shape, motion, disparity and color analysis (in
more or less separate parallel channels). The intermediate level is concerned with perceptual
grouping, more complex shape, depth and motion analysis and first associations with stored
information. The high level stages are concerned with cognition, recognition and conscious
perception.

This chapter focuses on front-end vision, as this is by far the best understood, and its
principles may guide progress in the research of higher levels. High-level vision, where the
cognitive processes take place, is a huge research area, and the most difficult one. It is the
domain of many scientific disciplines, the cognitive sciences.

The visual system turns out to be extremely well organized. The retinal grid of receptors
maps perfectly to the next layers in the brain in a retinotopic fashion: neighborhood relations
are preserved.

Two cells, next to each other in the retina, map to cells next to each other in higher stages,
i.e. the lateral geniculate nucleus (this structure in the brain on which most of the optic nerve

projects is explained in detail in chapter 7) and the primary visual cortex. We recognize a
cascade of steps, many (or all?) of the stages equipped with extensive feedback to earlier
stages. The mapping is not one-to-one: there is neighborhood convergence (many nearby
cells to one) and neighborhood divergence (one cell to many nearby).

Many models can be proposed for the assumed working of the visual front-end in its visual
processing, and extensive literature can be found. A successful recent realization is that the
front-end can be regarded as a multi-scale geometry engine [Koenderink1984a,
Koenderink1990c, Koenderink1992d]. What does this mean? Multi-scale, or multi-resolution,
is the property of the visual system to measure and process the information at many
simultaneous levels of resolution. It is a direct consequence of the physics paradigm that the
world consists of objects and structure at many sizes, and they should be measured with
apertures at these sizes, i.e. at these different resolutions.

Our retina is not sending down the measurements of single rods and cones, but of groups of
rods or cones. Such a (typically circular) group is called a receptive field. They come in a
large range of sizes (minutes of arc to many degrees), and measure the world consequently
from sharp to very blurred. For very fine details we employ the smallest receptive fields, for
the larger structures we employ the larger receptive fields. Moreover: it is an extra dimension
of measurement: we sample not only the spatial and temporal axes, but also along the scale
axis.

The notion of geometry engine is reflected in the important contemporary model that the
front-end visual system extracts derivatives in space and time to high order from the visual
input pattern on the retina.

It turns out that in the visual front-end we have separate parallel channels for shape, motion,
color and disparity processing. In our visual system we generate an extremely redundant set
of measurements: for every possible value of a parameter we seem to have a specialized
receptive field.

E.g. for every velocity, for every direction, for all sizes, necessary differential order and all
orientations. We will study this framework in greater detail in the next sections, while
following the visual pathway, i.e. the neuronal path of the information from the retina into
the brain.

9.3 The eye


The eyes are the two moveable stereo cameras that measure the light distribution reflected and
emitted from the objects in the world around us. The image is formed upside down on the
retina, which is the layer of light-sensitive receptors in the back of our eye. The refractive
power of a lens is defined as 1/f, where f is the focal distance of the lens, and is
expressed in diopters (m⁻¹).

The total refractive power of the eye's optical system is about 60 diopters. This is due to both
the cornea (43 diopters) and the lens (17 diopters), and can be varied over about 8 diopters
(accommodation of the lens).

<< FrontEndVision`FEV`;
Show[Import["binocular projection.jpg"], ImageSize -> 300];

Figure 9.1 Where the visual fields of both eyes overlap we are equipped for stereo vision.
The left and right visual field are each treated in a different hemisphere. The optic nerves
split underway in the chiasma (from [Kandel 2000]).

The eye has a diameter of about 17 mm. The processing of information starts already in the
retina, as this really is extended brain tissue: similar neurotransmitters are found in the retina
as in the cortical brain tissues, and we recognize the same strictly layered structure in the
retina as we will meet later in the visual cortex.

9.4 The retina


The retina consists coarsely of three layers of cells (figure 9.3). The light-sensitive receptors,
i.e. the rods and the cones, are located in the back of the eye behind the other layers. The
reason for this is the close neighborhood to the nursing vessel bed in the back of the eye, the
layer of pigmented cells or choroid.

The rhodopsin molecules in the receptors that are bleached by the light can in this way be
easily replenished. The middle layer contains horizontal, bipolar and amacrine cells (figure
9.2). The front layer contains the about one million ganglion cells, whose axons together
form the optic nerve, the output of the retina.

The rods and cones are packed tightly together in a more or less hexagonal array. We have
many more rods than cones. It is estimated that we have about 110,000,000 to 125,000,000
rods in our retina, and about 6,400,000 cones [Osterberg1935]. The diameter of a rod in the
periphery is about 2.5 μm, or about 0.5 minute of visual angle. In the fovea, the central area
of the retina, the receptors become smaller, about half this size (figure 9.2).

Show[Import["rods and cones mosaic.jpg"], ImageSize -> 300];

Figure 9.2 Hexagonal packing of the cones in the fovea (left) and rods/cones in the periphery
(right) in the human retina. Scale bar = 10 μm. From [Curcio et al. 1990].

Rods are only used at very dim light levels. This is rod vision, or scotopic vision. In normal
daylight conditions they are in a completely saturated state, and play no role in
perception. Rods come in a single type, so in dim light we see no colors.

Show[Import["retina layers.jpg"], ImageSize -> 240];

Figure 9.3 The cell layers of the retina. Light is coming from below (indeed, it must pass all
the transparent cell layers!). The receptors at the top of the image touch the pigmented cells
in the highly vascularized choroid layer, from which the bleached rhodopsin is
replenished. The bipolar cells connect the receptors with the ganglion cells. The horizontal
cells enable lateral interaction. The function of the many types of amacrine cells is unclear.
They may have a role in motion processing (e.g. as a time-delay cell between time-coupled
receptive fields). The ganglion cells at the bottom form the output and are the only cells in
the retina that generate action potentials. The collective axons of all ganglion cells (about
one million) form the match-thick optic nerve. From [Hubel1988a].

The optimal sensitivity is in the green-yellow. Cones are used at normal light levels, i.e.
photopic vision. Cones come in three types, for long (red), medium (green) and short (blue)
wavelength sensitivity. Therefore these types are called L, M and S cones.

Figure 9.4 shows an electron-microscopic cross-section of a single rod. On top the hundreds
of disks can be seen, which contain the light-sensitive rhodopsin molecules in their
membranes. Even a single photon is capable of evoking a measurable chemical reaction. We
need about 3-4 photons to see light consciously. The sensitivity range of the retina is
impressive: the ratio between the dimmest and the brightest light we can see (without
damage) is 10¹⁴! The caught photon changes the structure of the rhodopsin molecule, after
which a whole chain of events generates a very small voltage change at the base of the cell,
where it is transmitted to the next layer of cells, the horizontal and bipolar cells. A rod or
cone does not generate an action potential, just a small so-called receptor potential of a few
millivolts. Much research has been done on this process; it is beyond the scope of this
chapter to go into more detail here. A good tutorial overview can be found in [Kandel et al.
2000, 4th edition].

Show[Import["rod disks.jpg"], ImageSize -> 400];

Figure 9.4 Electron-microscopic cross-section of a rod. At the right in the image are the disks
with the membrane vesicles (V), which contain the visual pigment, i.e. the rhodopsin
molecules. Ci = cilium, the small connecting tube between the top (outer segment) and
bottom part (inner segment) of the cell. Mi = mitochondrion. Bl = basal body. Scale bar: 400 μm.
From the beautifully illustrated book [Kessel & Kardon 1979].

The receptors are not evenly distributed over the retina: in the fovea (about 1.5 mm or 5.2
degrees in diameter; one degree of visual angle is equal to 288 μm on the retina), we find a
rod-free area of roughly 250-750 μm.

The number of cones in the fovea is approximately 200,000 at a density of 17,500
cones/degree². The rod-free area is about 1°, thus there are about 17,500 cones in the central
rod-free fovea. The density distribution of rods and cones as a function of eccentricity is
given in figure 9.5.

Show[
 Import["retinal receptor distribution according to Osterberg.jpg"],
 ImageSize -> 220];

Figure 9.5 Density of retinal rods and cones as a function of eccentricity. The central area of
the fovea is rod-free. From [Osterberg 1935].

9.5 Retinal receptive fields


The story of the receptive fields begins in the early fifties, when Torsten Wiesel and David
Hubel (see figure 9.6) in the lab of Stephen Kuffler began to map the responses of individual
ganglion cells in the retina.

ims = Import /@ {"David Hubel.jpg",
   "Torsten Wiesel.jpg", "Stamp Nobelpris1981.jpg"};
Show[GraphicsArray[ims], ImageSize -> 380];

Figure 9.6 David Hubel (left) and Torsten Wiesel (middle). They received the Nobel Prize in
Medicine for their pioneering work in front-end vision neurophysiology in 1981 (right, stamp
Sweden 1984).

With a very small electrode (an extruded hollow glass tube filled with electrolyte, or a tiny
tip of a platinum-iridium electrode, tip size 0.5-1 μm) one can record the action potentials
of individual ganglion cells. In the dark these cells typically 'fire' at a low spontaneous rate.
Surprisingly, when the retina was illuminated as a whole (with a 'Ganzfeld'), the firing
rates of ganglion cells did not change, no matter the level of illumination.

The picture changed when Hubel and Wiesel used small stimuli: only a tiny, well-placed
spot of illumination could increase or decrease the firing rate of the ganglion cell, revealing
that the sensitivity is organized in receptive field patterns (see figure 9.7).

This important discovery started a first understanding of the retinal function.



Show[Import["retinal RF response.jpg"], ImageSize -> 400];

Figure 9.7 Retinal receptive field behavior on light stimulation. First row: in the dark the
ganglion cell fires at the spontaneous firing rate. Third row: with homogeneous light
stimulation the average firing rate is the same as in the dark. Second row: central spot
stimulation. Bottom row: peripheral annular stimulation of the surround. Left: on-center/off-
surround receptive field; right: off-center/on-surround receptive field. From [Hubel1988].

The area that needed to be illuminated to change the firing rate of a ganglion cell turned out
to be roughly circular, contained about 30-50 receptors and had a sensitivity pattern which
was excitatory (frequency-increasing) in the center and inhibitory (frequency-decreasing) in
the surround in 50% of the cells. This sensitivity pattern was called on-center center-
surround. The group of receptors and the single ganglion cell they project onto, as well as the
intermediate cells in between, is called a receptive field. The retinal receptive fields are all of
the center-surround type.

For the other 50% of the ganglion cells the situation was reversed: off-center center-surround
receptive fields. Receptive fields were found at many sizes, from a few minutes of arc to
many degrees in diameter. One degree of visual angle corresponds to about 288 μm distance
on the retina, as noted in the previous section.
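A standard computational caricature of such an on-center/off-surround receptive field is a balanced difference of Gaussians (DoG). The sketch below (our own illustration in Python/NumPy; the sizes are arbitrary) reproduces the two key observations: no net response to full-field ('Ganzfeld') illumination, and a positive response to a centered spot:

```python
import numpy as np

# Center-surround receptive field as a balanced difference of Gaussians:
# a narrow excitatory center minus a broad inhibitory surround.
x = np.linspace(-10, 10, 401)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2
dx = x[1] - x[0]

def gauss2d(sigma):
    return np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)

rf = gauss2d(1.0) - gauss2d(3.0)         # on-center / off-surround

uniform = np.ones_like(rf)                # homogeneous 'Ganzfeld' stimulus
spot = (r2 < 1.5**2).astype(float)        # small light spot on the center

resp_uniform = np.sum(rf * uniform) * dx**2   # ~ 0: firing rate unchanged
resp_spot = np.sum(rf * spot) * dx**2         # > 0: firing rate increases
print(resp_uniform, resp_spot)
```

Because center and surround each integrate to one, the field is insensitive to the overall illumination level, exactly as observed in the recordings.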

It turns out that the receptive field structure is a general feature in the human senses when a
spatial distribution of a signal is to be measured. The organ of Corti on the basilar membrane
in the inner ear displays receptive fields, and we find them on the skin, where tactile
receptive fields are formed of Pacini pressure-sensitive receptors (see figure 9.8).

It is interesting to note that the output of the central nervous system (CNS) exploits a similar
strategy of 'blurring' the output of force development in a muscle. Typically, a motor neuron
drives several hundreds of muscle fibres by sprouting of its axon terminals. The roughly
circular and widely overlapping set of muscle fibres and its driving motor neuron together
are called a 'motor unit'.

Show[Import["Tactile RFs.jpg"], ImageSize-> 380];

Figure 9.8 Tactile receptive fields of Pacini pressure-sensitive receptors found in the skin
have a center surround structure, strongly overlap and come at a wide range of sizes. From
[Kandel et al. 2000].

Hubel and Wiesel expanded these experiments, and started to measure the receptive fields of
many other cells in the visual pathway, such as in the Lateral Geniculate Nucleus (LGN: a
nucleus in the midbrain, discussed in detail in the next chapter), the primary visual cortex
(V1), and higher.

9.6 Sensitivity profile measurement of a receptive field


DeAngelis, Ozahwa and Freeman at UC Berkeley (among others) have been able to carefully
measure the receptive field sensitivity profiles of visual pathway cells in cats and monkeys
(figure 9.7, see also the demonstrations on Freeman's lab website:
http://neurovision.berkeley.edu/). They recorded the firing of the cell by inserting a very thin
needle into the cell’s soma (Greek: soma = body), and recorded the increase or decrease in
firing rate dependent on the stimulation of its receptive field with a tiny light stimulus. They
first found the general coarse location of the receptive field by manually searching the retina
with a small light stimulus, and then mapped quantitatively the area belonging to it. Small
light and dark flashes were randomly presented on a gray background in fast sequence (see
figure 9.9).

Show[GraphicsArray[
  p1 = Table[Graphics[{GrayLevel[.5], Rectangle[{0, 0}, {100, 100}],
      GrayLevel[Random[Integer, {0, 1}]], Rectangle[
       rnd = {Random[Integer, {1, 92}], Random[Integer, {1, 92}]},
       rnd + {7, 7}]}, AspectRatio -> 1], {4}, {8}]], ImageSize -> 300];

Figure 9.9 Reverse correlation stimulus. Small dot stimuli with positive or negative contrast
are presented sequentially. The images shown are row by row the individual frames of the
presented sequence.

For every recorded action potential it was checked what the location of the stimulus had
been just before, and the bin of that location was incremented. By dividing the time after the stimulus
in discrete time slots, for every slot a mapping could be made, thus giving a spatio-temporal
mapping, i.e. a series of maps of the RF over time (see also chapter 20). This mapping
technique is known as ’reverse correlation’ [de Boer1968, MacLean1994, Ringach1997] (see
figure 9.10).

The advantages of the reverse correlation are plentiful. The stimulus is a white noise
stimulus at low intensity, thus operating in the linear domain of the receptive field. The
correlation between the stimulus and the response is calculated fast. Many stimuli can be
given in a short period of time. The number of stimuli is larger than the number of responses,
so reversing the time axis in the analysis pays off.
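The reverse correlation idea can be sketched in a few lines of code. This is a toy simulation in Python rather than the book's Mathematica; the receptive field values, the latency, the threshold and the noise level are all made-up illustration numbers. A simulated linear cell is probed with random light/dark dot stimuli, and the spike-triggered average of the stimuli recovers the sign structure of its receptive field:

```python
import random

random.seed(0)

# Hypothetical 1-D "receptive field": excitatory center, inhibitory surround.
true_rf = [-0.5, 1.0, 2.0, 1.0, -0.5, -1.0, -0.5]
n = len(true_rf)

# White-noise stimulus: at each time step one position is flashed
# bright (+1) or dark (-1), as in figure 9.9.
steps = 20000
stimuli = [(random.randrange(n), random.choice([-1, 1])) for _ in range(steps)]

# Linear response with a fixed latency of one time step; the cell "fires"
# whenever the driven rate plus noise exceeds a threshold.
latency, threshold = 1, 0.8
spikes = []
for t in range(latency, steps):
    pos, contrast = stimuli[t - latency]
    drive = contrast * true_rf[pos]
    if drive + random.gauss(0, 0.3) > threshold:
        spikes.append(t)

# Reverse correlation: for every spike, look back 'latency' steps and
# accumulate the stimulus that caused it (a spike-triggered average).
sta = [0.0] * n
for t in spikes:
    pos, contrast = stimuli[t - latency]
    sta[pos] += contrast
sta = [s / len(spikes) for s in sta]

# The spike-triggered average recovers the sign structure of the RF:
# positive in the excitatory center, negative in the inhibitory surround.
print([round(s, 2) for s in sta])
```

The same loop, with the time axis divided into bins instead of a single fixed latency, yields the spatio-temporal maps (PSTHs) of figure 9.10.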

An alternative way is to stimulate with white noise with a so-called maximum-length
sequence. This has the advantage that numerous locations are stimulated at once, and one has
a better chance of mapping also the weaker outer borders of the receptive field profile. For a
more detailed discussion, see [DeValois2000], [DeAngelis1995a].

A web tutorial from Ralph Freeman's lab is available at
neurovision.berkeley.edu/Demonstrations/VSOC/teaching/AA_RFtutorial.html.
Recently, registrations from two neighbouring neurons could be recorded [DeAngelis1999a].

Task 9.1 With the smallest foveal cone having a diameter of 2.5 μm, what is the
diameter of the smallest dot you can see from a distance of 10 meters?

Task 9.2 Find literature with actual measurements of the sensitivity profiles of
the receptive fields in the retina, and cortical cells.

Show[Import["reverse correlation.gif"], ImageSize -> 250];

Figure 9.10 Diagram of the reverse correlation stimulus method. Top: Sequence of stimulus
images as a function of time, time running to the right. During the stimulus presentation
action potentials (the spike train) are recorded with a needle electrode from a single cell in
LGN or cortex. For every spike recorded the causing stimulus is found (reversed in time).
The time after the stimulus is divided into time bins (slots). If a spike occurs e.g. 25 ms after
a certain stimulus, it is recorded at the appropriate position in the 2D post stimulus-time
histogram (PSTH) at that particular time instance after the stimulus (25 ms in this case). This
25ms-PSTH is depicted vertically in the space-time cube in the middle, and in the lower left
of the figure. It shows the spatial sensitivity profile of the receptive field of the cell. The set of
PSTHs for the different time slots form the frames of the temporal behaviour of the cell's
receptive field sensitivity profile. They can be presented as a movie, showing the dynamic
change of the sensitivity profile of the receptive field. From [DeAngelis1995a].

In the retina only center-surround RF profiles are found, which are static. They are found at a
wide range of sizes (from a few minutes of arc to tens of degrees). See an example profile in
figure 9.11. This wide range of sizes is the retinal mechanism to sample at a range of
resolutions simultaneously at the very first stage of the measurement. This is the multi-scale
sampling strategy.

Show[Import["RF center surround cell.jpg"], ImageSize -> 150];

Figure 9.11 Center-surround sensitivity profile on the retina of a receptive field of a cell
recorded in the lateral geniculate nucleus (LGN) in the thalamus (a midbrain structure). Cells
in the LGN have the same center-surround structure as retinal ganglion cells. Tiny light
measurement probe stimulation, measurement is a spatial post-stimulus-time histogram at
31 x 31 locations. This is an on-center RF: the cell increased firing when stimulated in the
central area (green), and decreased firing when stimulated in the surround area (red). Field
of view: 3 degrees of arc. From [DeAngelis et al. 1995].

9.7 Summary of this chapter


The eye optics project the visual world upside down on the retina, which is a receptor layer
with about 150 million receptors. The retina is brain tissue, and already performs a
substantial processing in its 4 main types of layered cells (receptors, horizontal, amacrine
and ganglion cells). The receptors come in two types, about 120 million rods for scotopic
(dim light) black and white vision, and about 30 million cones for photopic (bright light)
color vision. There are three types of color cones: sensitive to light of long wavelengths
(red), medium wavelengths (green) and short wavelengths (blue). The horizontal cells collect
the receptor output over a roughly circular area, the bipolar cells transport the signals to the
collecting large ganglion cells.

The axons of the about 1 million ganglion cells form together the match-thick optic nerve,
the output of the retina. The optic nerve directly projects to the thalamus in the midbrain,
ganglion cells looking to the right visual field to the left brain, and vice versa. The amacrine
cells, which come in a rich variety, are assumed to play a role in motion detection.

The contribution of a set of receptors to the firing of a single ganglion cell is spatially
organized in a receptive field. A positive sensitivity in the receptive field means that a small
illumination here increases the firing frequency of the ganglion cell, and vice versa.
Receptive fields in the retina are 50% on-center/surround and 50% off-center/surround: a
circular sensitivity profile at a wide range of sizes, reflecting the scale-space multi-scale
sampling at the retinal level. Receptive fields can be accurately recorded with single cell
electrophysiological methods. A classical technique is the method of RF mapping by reverse
correlation.
10. A scale-space model for the retinal sampling
10.1 The size and spatial distribution of receptive fields
Why do we have all these different sizes? Smaller receptive fields are useful for a sharp high-
resolution measurement, while the larger receptive fields measure a blurred picture of the
world. We denote the size of the receptive field its scale. We seem to sample the incoming
image with our retina at many scales simultaneously.

In regular man-made cameras we don't encounter this situation: we only measure at the
highest possible resolution, the basic pixel of the grid. If we want a lower resolution, e.g. for
computational efficiency, we blur the sharp image with a computer afterwards. There must
be an important reason to have this multi-scale capability at the retina. The different
resolutions are each sampled as an independent new measurement, and they somehow need
to be available simultaneously. The retina measures a whole stack of images: a scale-~pace
(x-y-o-, recall figure 2.12). We will study this in detail in the chapter on the deep structure of
images.

The measurement at the different scales gives an interesting model for the retinal receptive
field distribution. The debate on why we have such an inhomogeneous receptor and receptive
field distribution is old and still not settled definitively (see e.g. [Williams1991]). In this chapter
we present a model from a scale-space perspective.

We already have seen that the density of receptors is not homogeneous. We may assume here
the principle of scale-invariance, expressed in words as: 'all scales should be dealt with in
identical fashion, there is no preference whatsoever for a particular scale'.

This means that the processing capability for each scale is equal, and that we may
expect the same processing capacity for each scale, i.e. the same amount of wetware. This
boils down to an equal number of receptive fields for each scale. The densest stacking in 2D
of equal circular areas is a hexagonal array. Because the eyes can move, it is most logical to
put the hexagonal arrays of receptive fields all in the center of the retina, superimposed on
each other so we get a lot of overlap.

The hexagonal tiling of the smallest scale receptive fields forms the fovea. The slightly
larger scale receptive field array has just as many receptive fields, but naturally these cover a
slightly larger area. And so on, till we encounter the largest scale, of which the receptive
fields just cover the whole available retinal area.

Figure 10.1 shows the stack model for receptive fields in the retina, as originally proposed by
Lindeberg and Florack [Lindeberg1992]. It is based on the earlier ground-breaking
'sunflower' model by Koenderink [Koenderink1984d, Koenderink1988d].

<< FrontEndVision`FEV`;

Block[{$DisplayFunction = Identity, r = 2 Cos[π/6]},
  circles = Flatten[(# Append[Table[{Cos[t] + r Cos[i], Sin[t] + r Sin[i], 1},
         {i, 0, 2 π - .1, π/3}], {Cos[t], Sin[t], 1}]) & /@
      (Exp[Range[.1, 2, .3]] - .5), 1];
  par = ParametricPlot3D[Evaluate[circles], {t, 0, 2 π}];
  shadowplot = Shadow[par, XShadow -> False,
    YShadow -> False, BoxRatios -> {1, 1, 1}, Boxed -> False,
    ViewPoint -> {0.646, -3.424, 1.057}, ZShadowPosition -> -.5];
  oe = Exp[1.9] - .5; ob = Exp[.1] - .5;
  lines = Graphics3D[
    {Line[{{0, 0, oe}, {0, 0, 0}}], Line[{{0, 0, 0}, {oe (r + 1), 0, oe}}],
     Line[{{0, 0, 0}, {-oe (r + 1), 0, oe}}]}];
  $TextStyle = {FontFamily -> "Helvetica", FontSize -> 11};
  text = Graphics3D[Text["fovea", {6, 0, ob}]];
]; Show[{shadowplot, lines, text}, ImageSize -> 230];

Figure 10.1 Stack model for the retinal receptive fields. Every scale is treated equally, so we
assume equal hardware, i.e. numbers of receptive fields in a hexagonal sampling grid for
that scale. The receptive field arrays are all centered in the middle of the retina, and are
superimposed. One receptor contributes to all receptive fields that overlap with its position.
The smallest array (the lowest set in the stack) forms the fovea; the largest array just covers
the whole retina. From [Lindeberg and Florack 1992].

The model states that we do a simultaneous sampling of the image at all scales. We have
scale-invariance, i.e. there is no preference for a particular scale. So what we measure, and
send to the brain, is not a single image, but a stack of images, a scale-space. The very first
stage of the visual front-end is a multi-scale sampling device. Figure 10.2 shows an example
of such a stack of images.

This model is in good accordance with recent measurements of receptive field sizes. One
measure for this size is the extent of the dendritic tree of the retinal ganglion cells. Dendrites
are the tree branch-like structures on the ganglion nerve cell body, collecting information
from neighbouring cells that have synaptic connections on them. These dendrites can be
measured accurately with sophisticated single cell dyeing techniques, after which the dendritic
tree can be measured under the microscope (figure 10.3). There are different types of
ganglion cells in the retina. The two most prominent families are the midget and the parasol
ganglion cells. The midget cells are smaller, and project to the two layers with small cell
bodies (the parvo-cellular layers; Latin: parvus = small) in the Lateral Geniculate Nucleus
(LGN). The parasol cells are larger, and project to the magno-cellular layer in the LGN
(Latin: magnus = large).

im = Import["lena64.gif"][[1, 1]];
{yres, xres} = Dimensions[im]; max = Max[im];
Block[{$DisplayFunction = Identity},
  bottom = Graphics3D[ListPlot3D[Table[0, {yres}, {xres}],
     Map[GrayLevel, im/max, {2}], Mesh -> False, Boxed -> False]];
  stack = Table[blur = gD[im, 0, 0, i]; Graphics3D[ListPlot3D[
       Table[10 i, {yres}, {xres}], Map[GrayLevel, blur/max, {2}],
       Mesh -> False, Boxed -> False]], {i, 1, 6}]];
scalespace = Show[{bottom, stack}, BoxRatios -> {1, 1, 1.3},
  ViewPoint -> {1.081, -3.236, 0.930}, Boxed -> True,
  DisplayFunction -> $DisplayFunction, ImageSize -> 250];

Figure 10.2 The retina does not measure a single image, but a series of images at different
scales, here depicted as a scale-space stack of the famous 'Lena' image. The scale is the
vertical dimension. The structural relations over scale in this stack are referred to as the
'deep structure' of an image.

As we shall see, the midget ganglion cells are involved in the measurement of shape and the
parasol ganglion cells are involved in the measurement of motion.

Show[
Import["midget and parasol dendritic trees.jpg"], ImageSize-> 250];

Figure 10.3 Dendritic tree size of retinal midget and parasol ganglion cells. The dendritic tree
size is believed to be a representative measure for the receptive field size. Note the
substantial range of sizes. Midget cells project to parvo-cellular LGN cells, parasol cells
project to magno-cellular LGN cells. From [Rodieck 1998].

The model predicts that the RF diameter of both parasol and midget cell dendritic tree should
increase linearly with eccentricity, which (in the macaque, a short-tailed Asian monkey often
used for research) is indeed found. When the measured area is plotted on the retina where
they have been measured, we clearly see the increase of size with eccentricity (see figure
10.4).

Show[Import["RF eccentricity graph.jpg"], ImageSize -> 300];

Figure 10.4 Largest diameter of the dendritic tree area in macaque retina as a function of
eccentricity for midget and parasol ganglion cells. The size increases linearly with larger
eccentricity, in accordance with the model. The upper cloud depicts the larger parasol
ganglion cells, the lower cloud the smaller midget cells. From [Rodieck1998].

Note that there is a lack of small receptive fields at large eccentricities, as expected. We lack
visual acuity at higher eccentricity due to a lack of small receptive fields. Acuity is so low,
that e.g. at 60 degrees eccentricity it is impossible to judge the number of fingers another
person is sticking up at 1 m distance!

Another observation is the lack of large receptive fields in the center, which the model
predicts. This may be due to the fact that in the fovea it is increasingly difficult to inject the
ganglion cell bodies with dye, due to the small size. The receptor density in the fovea is so
high that in that area the ganglion cell bodies are displaced to slightly more outward areas,
introducing another bias.

DisplayTogetherArray[
 Show /@ Import /@ {"midget cell rfs.jpg", "parasol cell rfs.jpg"},
 ImageSize -> 500];

Figure 10.5 Retinal locations of different sizes of dendritic trees of midget (left) and parasol
(right) retinal ganglion cells in the macaque. From [Rodieck 1998].

Very few large receptive fields populate the retina, making it statistically unlikely to hit them
in the dye injection process.

Note also that the dendritic areas substantially overlap, another prediction by the model.
Koenderink [Koenderink1984d] was the first to propose this retinal stack model, or
'sunflower model'.

Clearly, the distribution of dendritic tree sizes as a function of eccentricity (figure 10.4)
shows a linear relation with eccentricity. From psychophysics it is well known that many
important parameters change linearly with eccentricity. Among these parameters are
decreasing visual acuity, decreasing velocity discrimination thresholds, and decreasing stereo
acuity thresholds.

The scatter plot (figure 10.4) shows a linear upper and a linear lower bound to the
distribution of cells. This can be understood by considering the receptive field sizes at a
particular eccentricity: the smallest receptive field size found there belongs to the cells whose
tiling area just reaches the location at hand. All other receptive fields of this size, and all
smaller ones, lie at smaller eccentricity; only larger receptive fields are found when we go to
greater eccentricity. The largest receptive field size at our specific location is bounded by the
fixed number of receptive fields per scale in the tiling grid that fits on the outer bounds of the
retina.
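This argument can be made concrete in a small sketch (Python, with assumed parameter values; the book's own code is Mathematica). Give every scale channel the same number N of receptive fields; the radius a channel can tile then grows linearly with its RF size σ, so the smallest RF available at a given eccentricity grows linearly with eccentricity, reproducing the lower bound of figure 10.4:

```python
import math

# Stack-model sketch (assumed parameters, not measured values): K scale
# channels with exponentially spaced RF sizes, each channel tiling a
# foveally centered disc with the same number of receptive fields N.
N = 1000            # RFs per scale channel (equal "wetware" per scale)
K = 12              # number of scale channels
sigmas = [0.02 * 1.4 ** k for k in range(K)]   # RF sizes in mm (assumed)

# With N fixed, the disc a channel can tile has an area ~ N * sigma^2,
# so its coverage radius grows linearly with sigma.
coverage_radius = [math.sqrt(N) * s for s in sigmas]

def smallest_rf_at(ecc):
    """Smallest RF size whose channel still covers eccentricity ecc (mm)."""
    for s, r in zip(sigmas, coverage_radius):
        if r >= ecc:
            return s
    return None  # beyond the largest channel

# The smallest available RF (hence the acuity limit) grows linearly with
# eccentricity: doubling the eccentricity doubles the minimal RF size.
for ecc in [1.0, 2.0, 4.0, 8.0]:
    print(ecc, smallest_rf_at(ecc))
```

The upper bound of the scatter plot corresponds to the largest channel in `sigmas`; beyond its coverage radius the sketch returns `None`, the analogue of the retinal rim.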

Show[Import["on-off ganglion locations.gif"], ImageSize-> 330];

Figure 10.6 The distance between on- and off center receptive fields is significantly smaller
than the distance between on-center or off-center cells proper. This implies that at a single
retinal position the visual system measures both signs simultaneously. From [Dowling1987].
See also [Wässle1991].

10.2 A scale-space model for the retinal receptive fields


A good model for the sensitivity profile of a center-surround RF turns out to be the Laplacian
of the Gaussian kernel (figure 10.7). This is the well-known 'Mexican hat' function, given by
∂²G/∂x² + ∂²G/∂y², where G is the Gaussian kernel given by
G(x, y; σ) = 1/(2πσ²) e^(−(x²+y²)/(2σ²)).

Actually, it is a striking finding that we don't look with our rods and cones proper, but with
receptive fields composed of (up to hundreds of) rods and cones in a center-surround
structure. In scale-space terminology we 'observe the Laplacian of the world'. Figure 7.8
shows what is actually transmitted from the retina into the brain.
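As a numerical illustration (a Python sketch, not the book's code), the Laplacian of Gaussian can be sampled directly from its closed form ΔG = ((r² − 2σ²)/σ⁴) G; negated it gives an on-center profile with an excitatory center, an inhibitory surround, and zero net sensitivity to uniform illumination:

```python
import math

def log_rf(sigma, half=None):
    """Sampled negated Laplacian-of-Gaussian: a model on-center RF."""
    if half is None:
        half = int(5 * sigma)        # truncate at 5 sigma
    rf = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            g = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            # Laplacian of the Gaussian: ((r^2 - 2 sigma^2) / sigma^4) G
            row.append(-(r2 - 2 * sigma ** 2) / sigma ** 4 * g)
        rf.append(row)
    return rf

rf = log_rf(sigma=3.0)
c = len(rf) // 2
print(rf[c][c] > 0)                    # excitatory center
print(rf[c][c + 9] < 0)                # inhibitory surround at r = 3 sigma
print(abs(sum(map(sum, rf))) < 1e-3)   # zero DC: blind to uniform light
```

The zero-DC property is exactly why a retina that "observes the Laplacian of the world" transmits no signal for a homogeneously lit field, consistent with the ganglion behavior of figure 9.7.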

lapl =.; lapl[x_, y_, σ_] :=
  D[gauss[x, σ] gauss[y, σ], {x, 2}] + D[gauss[x, σ] gauss[y, σ], {y, 2}];
Block[{$DisplayFunction = Identity}, a = 11;
  SetOptions[Plot3D, PlotPoints -> 50, Shading -> True, Mesh -> False];
  {p1, p2} = Plot3D[Evaluate[-lapl[x, y, #]], {x, -a, a}, {y, -a, a}] & /@ {1, 3};
  {p3, p4} = Plot3D[Evaluate[lapl[x, y, #]], {x, -a, a}, {y, -a, a}] & /@ {1, 3};
  {p5, p6} =
   DensityPlot[Evaluate[-lapl[x, y, #]], {x, -a, a}, {y, -a, a}] & /@ {1, 3};
  {p7, p8} = DensityPlot[Evaluate[lapl[x, y, #]], {x, -a, a}, {y, -a, a}] & /@
    {1, 3}];
Show[GraphicsArray[{{p1, p2, p3, p4}, {p5, p6, p7, p8}}], ImageSize -> 480];

Figure 10.7 The Laplacian of Gaussian function as a mathematical model for the retinal
center-surround receptive field sensitivity profile. Top row: small scale and large scale 'on'-
center-surround (left) and 'off'-center-surround (right) receptive fields. Bottom row: The sensitivity
profiles plotted as density functions. This is the model for the measurement depicted in the
previous chapter in figure 9.11.

Task 10.1 If we do not seem to measure the zeroth and first order spatial
derivatives on the retina, how do we then perceive a linear gradient in intensity?

We have known for decades that the center-surround receptive fields cause specific
'illusions'. Two examples are given below. When we observe an image with a stepwise
intensity ramp, we notice brighter regions at the side of the step closest to the darker region,
and darker regions at the side of the step closest to the brighter region. When we take the
Laplacian of the image, it becomes clear that the up- and downswing of the intensity can be
nicely explained by taking the second order derivative (see figure 10.9).

im = Import["Utrecht256.gif"][[1, 1]];
Block[{$DisplayFunction = Identity},
  {p1, p2} = Table[ShadowPlot3D[# laplaceG[x, y, σ], {y, -15, 15}, {x, -15, 15},
       PlotLabel -> "σ = " <> ToString[σ], ShadowPosition -> -#], {σ, 2, 6}] & /@
     {1, -1};
  {p3, p4} = Table[ListDensityPlot[# (gD[im, 2, 0, σ] + gD[im, 0, 2, σ])],
      {σ, 2, 6}] & /@ {1, -1}];
Show[GraphicsArray[{p1, p3, p2, p4}], ImageSize -> 360];

Figure 10.8 What the on-center-surround (top row) and off-center-surround (third row)
receptive fields in the retina and LGN 'see' at a range of scales of the RFs.
Scales: σ = 2 to 6 pixels.

steps = Table[Ceiling[x/30], {y, 100}, {x, 300}];
lapl = -(gD[steps, 2, 0, 3] + gD[steps, 0, 2, 3]);
DisplayTogetherArray[
  {ListDensityPlot[#, AspectRatio -> Automatic, Epilog ->
       {Hue[1], Line[{{0, 50}, {300, 50}}]}] & /@ {steps, lapl},
   ListPlot[#[[50]], AspectRatio -> .2, PlotJoined -> True,
      Axes -> False] & /@ {steps, lapl}}, ImageSize -> 400,
  Frame -> True, Epilog -> {Text["brighter\tdarker", {0.25, 0.25}],
    Arrow[{.18, .28}, {.21, .35}], Arrow[{.32, .28}, {.28, .35}]}];

Figure 10.9 Left: linear intensity step function, resolution 300x100. Human observers see
brighter intensities near an edge opposite to the darker region (a phenomenon known as the
Craik-O'Brien-Cornsweet illusion, see www.sloan.salk.edu/~thomas/coce.html for the
colored version of this illusion). Right: step function convolved with a Laplacian center-
surround receptive field.

The 2D case of this phenomenon is the illusion of the grey squares in the Hermann grid (see
figure 10.10):

hermanngrid = Table[If[(Mod[x, 45] < 32) && (Mod[y, 45] < 32), -1, 1],
    {y, 300}, {x, 300}];
{lapl1, lapl2} =
  (gD[hermanngrid, 2, 0, #] + gD[hermanngrid, 0, 2, #]) & /@ {4, .5};
DisplayTogetherArray[ListDensityPlot /@ {hermanngrid, -lapl1, lapl2},
  ImageSize -> 350];

Figure 10.10 Left: the famous Hermann grid. Resolution 300². When we fixate on a crossing
of the white lines, we notice grey spots at all other crossings. They do not appear at the
crossing where we fixate. Middle: the grid convolved ('observed') with a Laplacian off-center
receptive field of scale σ = 4. The grey spots are a result of the Laplacian filtering at a coarse
scale. Right: idem for a Laplacian center-surround filter of small scale, σ = 0.5 pixels. At the
fovea (the fixation point) we have small scales of the receptive fields, and do not observe the
grey spots. For many more illusions, see Al Seckel's illusions webpage
http://www.illusionworks.com.

It is unknown why we have this center-surround receptive field structure at the retinal level.
The traditional textbook explanation stresses the notion of lateral inhibition, the
enhancement of structure relative to its neighboring structures.

Another often used model is the representation of the receptive field sensitivity function of
retinal ganglion cells as a difference of Gaussians function. It is unclear however (other than
heuristically) how the two widths of the Gaussians should be chosen.

Plot[gauss[x, 2] -gauss[x, 25], {x, -40, 40}, ImageSize-> 150];


Figure 10.11 Model of the on-center receptive field function as a difference of Gaussians.
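How close a difference of Gaussians comes to the Laplacian model can be checked numerically. The sketch below is Python (not the book's Mathematica), and the surround/center width ratio of 1.6 is the classical Marr-Hildreth heuristic, an assumption here rather than a value from this book:

```python
import math

def gauss(x, s):
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def mexican_hat(x, s):
    """Negated 1-D second Gaussian derivative, scaled to unit peak."""
    return (1 - x * x / (s * s)) * math.exp(-x * x / (2 * s * s))

# DoG with a surround/center width ratio of 1.6 (Marr-Hildreth heuristic).
s_c, s_s = 2.0, 3.2
xs = [i / 10 for i in range(-150, 151)]
dog = [gauss(x, s_c) - gauss(x, s_s) for x in xs]

# Both Gaussians integrate to one, so the DoG, like the Laplacian model,
# has zero DC response: it is blind to uniform illumination.
print(abs(sum(dog)) * 0.1)

# Shape comparison against a Mexican hat at the geometric-mean scale.
peak = max(dog)
s_log = math.sqrt(s_c * s_s)
err = max(abs(d / peak - mexican_hat(x, s_log)) for x, d in zip(xs, dog))
print(round(err, 2))   # the unit-peak profiles differ by a few percent
```

With this width ratio the two unit-peak profiles are nearly indistinguishable, which is why the DoG and the Laplacian model are often used interchangeably even though the choice of the two widths remains heuristic.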

The diffusion equation ∂L/∂t = ΔL may be interpreted (in a sloppy sense) as δL = δt ΔL, so
integration of both sides of the equation over all scales gives a robust measurement in the
sense that the exclusion of some scales due to damage to some ganglion cells or axons in the
optic nerve may be less noticeable for the subsequent layers.

Another interpretation of the diffusion equation may be the conjecture that on the retina we
actually sample ∂L/∂σ, the change in luminance when we locally change our aperture
somewhat.
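The 'sloppy' integration argument can be verified exactly on a toy signal (a Python sketch with an assumed 1-D step edge, not the book's code): evolving the diffusion equation with explicit Euler steps L_(k+1) = L_k + δt ΔL_k, the per-scale Laplacian responses telescope to the total luminance change between the finest and the coarsest scale:

```python
# Toy 1-D verification of the scale-integration remark.
def laplacian(sig):
    """Discrete 1-D Laplacian with replicated borders."""
    n = len(sig)
    return [sig[min(i + 1, n - 1)] - 2 * sig[i] + sig[max(i - 1, 0)]
            for i in range(n)]

L = [0.0] * 20 + [1.0] * 20           # a step edge
L0 = L[:]
dt, steps = 0.2, 50                    # dt <= 0.5 keeps the scheme stable
accumulated = [0.0] * len(L)           # integral of (Laplacian) dt over scale
for _ in range(steps):
    lap = laplacian(L)
    L = [v + dt * d for v, d in zip(L, lap)]
    accumulated = [a + dt * d for a, d in zip(accumulated, lap)]

# Telescoping: the summed Laplacian responses equal the total change in
# luminance between the finest (L0) and the coarsest (L) scale.
residual = max(abs(a - (lc - l0)) for a, l0, lc in zip(accumulated, L0, L))
print(residual)   # zero up to float rounding
```

Losing one term of the sum (one scale channel) removes only a small slice of the total, which is the robustness argued for above.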

It is an intriguing observation that the multi-scale sampling of the outside world by the visual
system takes place at the retinal level. All scales are separately and probably independently
sampled from the incoming intensity distribution. In multi-scale computer vision applications
the different scale representations are generated afterwards. The fundamental reason to
sample at this very first retinal level is to observe the world at all scales simultaneously.

In chapters 13-15 we will discuss the resulting scale-space proper, i.e. the deep structure of
the scale-space.

im = Import["T1.pgm"][[1, 1]];
lapl = (gD[im, 2, 0, #] + gD[im, 0, 2, #]) &;
stack = lapl /@ Table[E^τ, {τ, 0, 3, .15}];
DisplayTogetherArray[{ListDensityPlot /@ Take[stack, 7],
   ListDensityPlot /@ Take[stack, {8, 14}],
   ListDensityPlot /@ Take[stack, -7]}, ImageSize -> 490];

Figure 10.12 Stack of on-center receptive field function outputs as an exponential function
of scales.

10.3 Summary of this chapter


The retina is a multi-scale sampling device. A scale-space inspired model for the retinal
sampling at this very first level considers the retina as a stack of superimposed retinas each at
a different scale. As a consequence of scale invariance, each scale is likely to be treated
equally, and be equipped with the same processing capacity in the front-end. This leads to
the model that each retina at a particular scale consists of the same number of receptive fields
that tile the space, which may explain the linear decrease of acuity with eccentricity.

The reasons why we do a center-surround sampling on the retina are not yet clear. The
sampling of the Laplacian at a range of scales may be necessary for efficient analysis in a
proper multi-scale 'deep structure' setting, a strategy much of which still needs to be
discovered.

DisplayTogetherArray[
  ListDensityPlot /@ {im, Plus @@ stack, lapl[1.5]}, ImageSize -> 400];

Figure 10.13 Left: original image, resolution 217x181. Middle: Sum of the Laplacians of the
original image at 16 scales (exponentially sampled between 1 and 20 pixels, see fig. 10.12).
Right: The Laplacian of the original image at a scale of 1.5 pixels.
11. The front-end visual system - LGN and cortex
What we see depends mostly on what we look for. (Kevin Eikenberry, 1998)

11.1 The thalamus


From the retina, the optic nerve runs into the central brain area and makes a first
monosynaptic connection in the Lateral Geniculate Nucleus, a specialized area of the
thalamus (see figure 11.1 and 11.2).

<< FrontEndVision`FEV`;
Show[Import["optic pathway bottom view.jpg"], ImageSize -> 360];

Figure 11.1 The visual pathway. The left visual field is processed in the right half of the brain,
and vice versa. The optic nerve splits half its fibers in the optic chiasm on its way to the LGN,
from where an extensive bundle projects to the primary visual cortex V1. From [Zeki 1993].

The thalamus is an essential structure of the midbrain. Here, among others, all incoming
perceptual information comes together, not only visual, but also tactile, auditory and balance.
It is one of the very few brain structures that cannot be removed surgically without lethal
consequences.

The thalamus structure shows a precise somatotopic (Greek: soma = σώμα = body; topos =
τόπος = location) mapping: it is divided in volume parts, each part representing a specific part
of the body (see [Sherman and Koch 1990]).

Show[GraphicsArray[{Import["thalamus coronal cut.jpg"],
    Import["thalamus location.jpg"]}],
  GraphicsSpacing -> -.15, ImageSize -> 500];

Figure 11.2 Location of the left and right thalamus with the LGN in the brain. Left: coronal
slice through the middle of the brain. The thalamus is a very early structure in terms of brain
development. Right: spatial relation of the thalamus and the cerebellum. From [Kandel et al.
2000].

Show[Import["thalamus with LGN.jpg"], ImageSize -> 450];

Figure 11.3 Subdivisions of the thalamus, each serving a different part of the body. Note the
relatively small size of the LGN at the lower right bottom. From [Kandel et al. 2000]. See also
biology.about.com/science/biology/library/organs/brain/blthalamusimages.htm.

11.2 The lateral geniculate nucleus (LGN)


On the lower dorsal part of each thalamus (figure 11.3) we find a small nucleus, the Lateral
Geniculate Nucleus (LGN). The left and right LGN have the size of a small peanut, and
consist of 6 layers (figure 11.4).
The top four layers have small cell bodies: the parvo-cellular layers (Latin: parvus = small).
The bottom two layers have larger cell bodies: the magno-cellular layers (Latin: magnus =
big). Each layer is monocular, i.e. it receives ganglion axons from a single eye by
monosynaptic contact.

Show[Import["LGN layers.jpg"], ImageSize-> 210];

Figure 11.4 The 6 layers of the left Lateral Geniculate Nucleus (LGN). The top four parvo-
cellular layers have relatively small cell bodies; the bottom two magno-cellular layers have
relatively large cell bodies. The central line connects the cells receiving projections from the
fovea. Cells more to the right receive projections from the nasal visual field, to the left from
the temporal visual field. Note that the first and third layers are somewhat shorter on the right
side: this is due to the blocking of the nasal visual field by the nose. Size of the image: 3 x 4
mm. From [Hubel 1988a].

The order interchanges from top to bottom for the parvo-cellular layers: left, right, left, right,
and then changes for the magno-cellular layers: right, left. The magno-cellular layers are
involved in mediating motion information; the parvo-cellular layers convey shape and color
information. It is so far not known what is separated in the two pairs of parvo-cellular layers
or in the pair of magno-cellular layers. Some animals have a total of four or eight layers in
the LGN, and some patients have been found with as many as eight layers.

The mapping from retina to the LGN is very precise. Each layer is a retinotopic (Greek:
topos = location) map of the retina. The axons of the cells of the LGN project the visual
signal further to the ipsi-lateral (on the same side) primary visual cortex through a wide
bundle of projections, the optic radiations.
11. The front-end visual system - LGN and cortex 182

What does an LGN cell see? In other words, what do receptive fields of the LGN cells see?

Electrophysiological recordings by DeAngelis, Ohzawa and Freeman [DeAngelis1995a]
showed that the receptive field sensitivity profiles of LGN cells are the same as those of
retinal ganglion cells: circular center-surround receptive fields, with on-center and off-center
in equal numbers, and at the same range of scales.

However, the receptive fields are not constant in time, or stationary. With the earlier
mentioned technique of reverse correlation DeAngelis et al. were able to measure the
receptive field sensitivity profile at different times after the stimulus. The polarity turns out
to change over time: e.g. first the cell behaves as an on-center cell, several tens of
milliseconds later it changes into an off-center cell.

A good model describing this spatio-temporal behavior is the product of a Laplacian of
Gaussian for the spatial component, multiplied with a first order derivative of a Gaussian for
the temporal domain. In formula: (∂²G/∂x² + ∂²G/∂y²) ∂G/∂t. So such a cell is an operator: it takes the
first order temporal derivative.
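As an illustration, the separable model can be sampled numerically and the polarity flip verified. This is a sketch, not code from the book (the book works in Mathematica; here a NumPy version, with arbitrary illustrative grid sizes and scales σ and τ):

```python
import numpy as np

def gauss(u, s):
    # 1-D Gaussian, sampled on grid u at scale s
    return np.exp(-u**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def lgn_kernel(x, y, t, sigma=1.0, tau=1.0):
    # Spatial part: Laplacian of Gaussian, (d^2/dx^2 + d^2/dy^2) G(x, y; sigma)
    X, Y = np.meshgrid(x, y)
    r2 = X**2 + Y**2
    log_xy = ((r2 / sigma**4 - 2 / sigma**2)
              * np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2))
    # Temporal part: first order Gaussian derivative dG/dt(t; tau)
    gt = -t / tau**2 * gauss(t, tau)
    # Separable product: kernel[i, j, k] = LoG(x_j, y_i) * Gt(t_k)
    return log_xy[:, :, None] * gt[None, None, :]

x = y = np.linspace(-4, 4, 33)
t = np.linspace(-3, 3, 25)
k = lgn_kernel(x, y, t)
# The centre of the profile flips polarity between early and late times,
# as in the reverse-correlation measurements:
print(np.sign(k[16, 16, 5]) != np.sign(k[16, 16, 19]))  # True
```

The sign of the kernel centre differs before and after t = 0, matching the observed on-center to off-center transition.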

Show[Import["Spatiotemporal LGN.jpg"], ImageSize -> 180];

Figure 11.5 Spatio-temporal behavior of a receptive field sensitivity profile of an LGN cell.
Delays after the stimulus shown 15-280 ms, in increments of 15 ms. Each figure is the
sensitivity response profile at a later time (indicated in the lower left corner in ms) after the
stimulus. Read row-wise, first image upper left. Green is positive sensitivity (cell increases
firing when stimulated here), red is otherwise. From [DeAngelis et al. 1995a]. See also their
informative web-page: http://totoro.berkeley.edu.

As in the retina, 50% of the center-surround cells are on-center, 50% are off-center. This may
indicate that the foreground and the background are just as important (see figure 11.6).

DisplayTogetherArray[
 Show /@ Import /@ {"utensils1.jpg", "utensils2.jpg"}, ImageSize -> 400];

Figure 11.6 Foreground or background? The same image is rotated 180 degrees. From
[Seckel2000].

11.3 Corticofugal connections to the LGN


It is well known that the main projection area after the LGN for the primary visual pathway
is the primary visual cortex in Brodmann's area 17 (see figure 11.8). The fibers connecting
the LGN with the cortex form a wide and conspicuous bundle, the optic radiation (see figure
11.10). The topological structure is kept in this projection, so again a map of the visual field is
projected to the visual cortex. A striking recent finding is that 75% of the number of fibers in
this bundle are corticofugal ('from the cortex away') and project from the cortex to the LGN!
The arrow in figure 11.7 shows this.

Show[Import["visual pathway projections with arrow.gif"], ImageSize -> 260];

Figure 11.7 75% of the fibers in the optic radiation project in a retrograde (backwards)
fashion, i.e. from cortex to LGN. This reciprocal feedback is often omitted in classical
textbooks. Adapted from Churchland and Sejnowski [Churchland1992a].

This is an ideal mechanism for feedback control to the early stage of the thalamus. We
discuss two possible mechanisms:
1. Geometry-driven diffusion;
2. Long-range interactions for perceptual grouping.

A striking finding is that most of the input to LGN cells comes from the primary cortex. This
is strong feedback from the primary cortex to the LGN.

It turns out that by far the majority of the input to LGN cells (nearly 50%) is from higher
cortical levels such as V1, and only about 15-20% is from retinal input (reviewed in
[Guillery1969a, Guillery1969b, Guillery1971])!

It is not known what exact purpose these feedback loops have and how these retrograde (i.e.
backwards running) corticofugal (i.e. fleeing from the cortex) projections are mapped. It is
generally accepted that the LGN has a gating / relay function [Sherman1993a,
Sherman1996a].

One possible model is the possibility to adapt the receptive field profile in the LGN with
local geometric information from the cortex, leading e.g. to edge-preserving smoothing:
when we want to apply small scale receptive fields at edges, to see them at high resolution,
and to apply large scale receptive fields at homogeneous areas to exploit the noise reduction
at coarser scales, the model states that the edginess measure extracted with the simple cells in
the cortex may tune the receptive field size in the LGN. At edges we may reduce the LGN
observation scale strongly in this way. See also [Mumford1991a, Mumford1992a, Wilson
and Keil 1999].

In physical terminology we may say that we have introduced local changes in the
conductivity in the (intensity) diffusion. Compare our scale-space intensity diffusion
framework with locally modulated heat diffusion: at edges we have placed heat isolators, so
at those points we have reduced or blocked the diffusion process. In mathematical terms we
may say that the diffusion is locally modulated by the first order derivative information.

Of course we may modulate with any order differential geometric information that we need
in modeling this geometry-driven, adaptive filtering process. We also may modulate the size
of the LGN receptive field, or its shape. Making a receptive field much more elongated along
an edge than across an edge, we can smooth along the edge more than we smooth across the
edge, thus effectively reducing the local noise without compromising the edge strength. In a
similar fashion we make the receptive field e.g. banana-shaped by modulating its curvature
so it follows even better the edge locally, etc.

This has opened a large field in mathematics, in the sense that we can make up new,
nonlinear diffusion equations.

This direction in computer vision research is known as PDE (partial differential equation)-
based computer vision. Many nonlinear diffusion schemes have been proposed so far, as
well as many elegant mathematical solutions to solve these PDE's. We will study an
elementary set of these PDE's in chapter 21 on nonlinear, geometry-driven diffusion.
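As a minimal preview of such a PDE, the following sketch implements one explicit step of edge-modulated (Perona-Malik style) diffusion in 1-D. It is an illustration in NumPy, not the book's chapter 21 implementation; the conductivity function and the parameter k below are one common choice:

```python
import numpy as np

def diffuse_step(L, k=0.1, dt=0.2):
    # One explicit Euler step of dL/dt = d/dx( c(Lx) Lx ) on a 1-D signal,
    # with conductivity c = exp(-(Lx/k)^2): diffusion is essentially blocked
    # where the gradient is large (edges) and free where the signal is flat.
    Lx = np.diff(L)              # gradient at half-grid points
    c = np.exp(-(Lx / k)**2)     # low conductivity at strong edges
    flux = c * Lx
    out = L.copy()
    out[1:-1] += dt * np.diff(flux)  # divergence of the flux
    return out

# A noisy step edge: the flat parts smooth out while the edge survives.
rng = np.random.default_rng(0)
L = np.where(np.arange(100) < 50, 0.0, 1.0) + 0.01 * rng.standard_normal(100)
for _ in range(50):
    L = diffuse_step(L)
print(L[49] < 0.3 < 0.7 < L[50])  # True: the edge is preserved
```

At the edge the gradient is large, the conductivity drops to almost zero, and the step survives the smoothing; in the flat regions the conductivity is close to one and the noise diffuses away, exactly the behaviour described above.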

An intriguing possibility is the exploitation of the filterbank of oriented filters we encounter


in the visual cortex (see next chapter). The possibility of combining the output of differently
oriented filters into a nonlinear perceptual grouping task is discussed in chapter 19.

Show[Import["A1-02-brodmann.gif"], ImageSize -> 400];

Figure 11.8 In 1909 Brodmann published a mapping of the brain in which he identified
functional areas with numbers. The primary visual cortex is Brodmann's area 17. From these
maps it is appreciated that the largest part of the back part of the brain is involved in vision.
From [Garey1987].

11.4 The primary visual cortex


The primary visual cortex is the next main station for the visual signal. It is a folded region
in the back of our head, in the calcarine sulcus in area 17. In the visual cortex we find the
first visual areas, denoted by V1, V2, V3, V4 etc.

The visual cortex (like any other part of the cortex) is extremely well organized: it consists of
6 layers, in a retinotopic, highly regular columnar structure. The layers are numbered 1
(superficial) to 6 (deep). The LGN output arrives in the middle layers 4a and 4b, while the
output leaves primarily from the top and bottom layers.
The mapping from the retina to the cortical surface is a log-polar mapping (see for a
geometrical first principles derivation of this transformation [Schwartz1994, Florack2000d]).

Show[Import["V1 cortical cross section.jpg"], ImageSize -> 320];

Figure 11.9 Horizontal slice through the visual cortex of a macaque monkey. Slice stained for
cell bodies (gray matter). Note the layered structure and the quite distinct boundaries
between the visual areas (right of b and left of c). (a) V1, center of the visual field. (b) V1,
more peripheral viewing direction. (c) Axons between the cortical surfaces, making up the
gross connection bundles, i.e. the white matter. From [Hubel1988a].

If the retinal polar coordinates are (ρ, θ), which are related to the Cartesian coordinates by
x = ρ cos(θ) and y = ρ sin(θ), the polar coordinates on the cortex are described by (ρc, θc),
with ρc = ln(ρ/ρ0) and θc = q θ. Here ρ0 is the size of the fovea, and 1/q is the minimal
angular resolution of the log-polar layout. The fovea maps to an area on the cortex which is
about the same size as the mapping from the peripheral fields.

Task 11.1 Generate from an arbitrary input image the image as it is mapped
from the retina onto the cortical surface according to the log-polar mapping.
Hint: Use the Mathematica function ListInterpolation[im] and resample
the input image according to the new transformed coordinates, i.e. fine in the
center, coarse at the periphery.
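A possible sketch of such a log-polar resampling, in NumPy rather than the Mathematica ListInterpolation route the hint suggests (nearest-neighbour sampling here; the fovea size ρ0 and the output grid sizes are free parameter choices):

```python
import numpy as np

def log_polar(im, rho0=2.0, n_rho=64, n_theta=64):
    # Resample a square image around its centre: rows index rho_c = ln(rho/rho0),
    # columns index theta_c = q*theta -- fine sampling near the centre (fovea),
    # exponentially coarser towards the periphery.
    h, w = im.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    rho_max = min(cy, cx)
    rho_c = np.linspace(0, np.log(rho_max / rho0), n_rho)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho0 * np.exp(rho_c), theta, indexing='ij')
    xs = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, w - 1)
    ys = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, h - 1)
    return im[ys, xs]

# Test image whose value equals the distance to the centre, so each output
# row should be roughly constant:
im = np.fromfunction(lambda i, j: np.hypot(i - 63.5, j - 63.5), (128, 128))
lp = log_polar(im)
print(lp.shape)  # (64, 64)
```

In the output, rows near the top sample the foveal region densely; the radius grows exponentially with the row index, as in the ρc = ln(ρ/ρ0) mapping above.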

Show[Import["Calcarine fissure.jpg"], ImageSize-> 350];

Figure 11.10 LGN pathway. From the LGN fibers project to the primary visual cortex in the
calcarine sulcus in the back of the head. From [Kandel et al. 2000].

The cortical columns form a repetitive structure of little areas, about 1 x 1 mm, which can be
considered the visual 'pixels'. Each column contains all processing filters for local
geometrical analysis of that pixel. Hubel and Wiesel [Hubel1962] were the first to record the
RF profiles of V1 cells. They found a wide variety of responses, and classified them broadly
as simple cells, complex cells and hypercomplex (end-stopped) cells.

11.4.1 Simple cells

The receptive field sensitivity profiles of simple cells have a remarkable resemblance to
Gaussian derivative kernels, as was first noted by Koenderink [Koenderink1984a]. He
proposed the Gaussian derivative family as a taxonomy (structured name giving) for the
simple cells.

point = Table[0, {128}, {128}]; point[[64, 64]] = 1000;
Block[{$DisplayFunction = Identity},
 p1 = Table[ListContourPlot[gD[point, n, m, 15], ContourShading -> True],
   {n, 1, 2}, {m, 0, 1}]];
Show[GraphicsArray[p1], GraphicsSpacing -> 0, ImageSize -> 160];

Figure 11.11 Gaussian derivative model for receptive field profiles of cortical simple cells.
Upper left: ∂G/∂x; upper right: ∂²G/∂x∂y; lower left: ∂²G/∂x²; lower right: ∂³G/∂x²∂y.

Daugman proposed the use of Gabor filters in the modeling of the receptive fields of simple
cells in the visual cortex of some mammals. In the early 1980's a number of researchers
suggested Gaussian modulated sinusoids (Gabor filters) as models of the receptive fields of
simple cells in visual cortex [Marcelja1980, Daugman1980, Watson1982a, Watson1987a,
Pribram1991]. A good discussion on the use of certain models for fitting the measured
receptive field profiles is given by [Wallis1994].
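For illustration, a minimal Gabor kernel can be sampled as follows (a sketch, not one of the fitted models from the literature above; the scale σ, spatial frequency and orientation are arbitrary illustrative parameters, and NumPy is used in place of the book's Mathematica):

```python
import numpy as np

def gabor(size=31, sigma=4.0, freq=0.1, theta=0.0, phase=0.0):
    # Gaussian-modulated sinusoid: cos(2*pi*freq*x' + phase) * G(x, y; sigma),
    # where x' is the coordinate along the preferred orientation theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return np.cos(2 * np.pi * freq * xr + phase) * envelope

g_even = gabor()                 # even (cosine) phase: symmetric about the centre
g_odd = gabor(phase=-np.pi / 2)  # odd (sine) phase: antisymmetric, resembling a
                                 # first order Gaussian derivative across theta
print(g_even.shape)  # (31, 31)
```

The even-phase kernel resembles a second order Gaussian derivative and the odd-phase kernel a first order one, which is why Gabor and Gaussian-derivative models fit the measured profiles about equally well at low orders.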

Recall figure 2.11 for some derivatives of the Gaussian kernel. DeAngelis, Ohzawa and
Freeman have measured such profiles in single cortical cell recordings in cat and monkey
[DeAngelis1993a, DeAngelis1995a, DeAngelis1999a].

Figure 11.12 shows a measured receptive field profile of a cell that can be modeled by a
second order (first order with respect to x and first order with respect to t) derivative.
Receptive fields of simple cells can now be measured with high accuracy (see also Jones and
Palmer [Jones1987]).

Show[
 Import["RF simple cell XT separable series.jpg"], ImageSize -> 400];

Figure 11.12 Receptive field profile of a V1 simple cell in monkey as a function of time after
the stimulus. Times in 25 ms increments as indicated in the lower left corner, 0-275 ms. Field
of view 2 degrees. From [DeAngelis et al. 1995a].

S h o w [ { I m p o r t [ " S l m p l e cell vl XT.gif"], G r a p h i c s [


{Text [ "x-~" , {5. , -5. } ] , Text ["it", {-8. , 8} ] } ] }, I m a g e S i z e -> 120] ;

Figure 11.13 Sensitivity profile of the central row of the sensitivity profile in the images of
figure 11.12 as a function of time (vertical axis, from bottom to top). This cell can be modeled
as the first (Gaussian) derivative with respect to space and the first (Gaussian) derivative
with respect to time, at an almost horizontal spatial orientation. From [DeAngelis et al.
1995a].

As with the LGN receptive fields, all the cortical simple cells exhibited a dynamic behaviour.
The receptive field sensitivity profile is not constant over time, but the profile is modulated.

In the case of the cell depicted in figure 11.12 this dynamic behaviour can be modeled by a
first order Gaussian derivative with respect to time. Figure 11.13 shows the response as a
function of both space and time.

11.4.2 Complex cells

The receptive field of a complex cell is not as clear as that of a simple cell. They show a
marked temporal response, just as the simple cells, but they lack a clear spatial structure in
the receptive field map. They appear to be a next level of abstraction in terms of image
feature complexity.

Block[{$DisplayFunction = Identity}, p1 =
  Table[Show[Import["complex " <> ToString[i] <> ".gif"]], {i, 1, 30}]];
Show[GraphicsArray[Partition[p1, 6]], ImageSize -> 260];

Figure 11.14 Complex cell receptive field sensitivity profile as a function of time. Only bright
stimulus responses are shown. Responses to dark stimuli are nearly identical in spatial and
temporal profiles. Bright bar response. XY domain size: 8 x 8 degs. Time domain: 0 - 150
msec in 5 msec steps. Orientation: 90 degrees. Bar size: 1.5 x 0.4 degrees. Cell
bk326r21.02r. Data from Ohzawa 1995, available at
neurovision.berkeley.edu/DemonstrationsNSOC/teaching/RF/Complex.html.

One speculative option is that they may be modeled as processing some (polynomial?)
function of the neighboring derivative cells, and thus be involved in complex differential
features (see also [Alonso1998a]).

As Ohzawa states: "Complex cell receptive fields are not that interesting when measured
with just one stimulus, but they reveal very interesting internal structure when studied with
two or more stimuli simultaneously" (http://neurovision.berkeley.edu/).

11.4.3 Directional selectivity

Many cells exhibit some form of strong directional sensitivity for motion. Small bars of
stimulus light are moved across the receptive field area of a rabbit directionally selective
cortical simple cell from different directions (see figure 11.15).

When the bar is moved in the direction of optimal directional response, a vigorous spiketrain
discharge occurs. When the bar is moved in a more and more deviating direction, the
response diminishes. When the bar is moved in a direction perpendicular to the optimal
response direction, no response is measured. The response curve as a function of orientation
is called the orientation tuning curve of the cell.

Show[Import["directional cell response.jpg"], ImageSize -> 210];

Figure 11.15 Directional response of a cortical simple cell. This behaviour may be explained
by a receptive field that changes polarity over time, as in figure 11.12, or by a Reichardt-type
motion sensitive cell (see chapters 11 and 17). From [Hubel1988].

The cortex is extremely well organized. Hubel and Wiesel pioneered the field of the
discovery of the organizational structure of the visual system.

Show[Import["hypercolumn model.jpg"], ImageSize -> 340];

Figure 11.16 Left: projection of the LGN fibers to the cortical hypercolumns in the primary
visual cortex V1. A hypercolumn seems to contain the full functionality hardware for a small
visual space angle in the visual field, i.e. the RF's of the cells in a hypercolumn are
represented for both eyes, at any orientation, at any scale, at any velocity, at any direction,
and any disparity. In between the simple and complex cells there are small 'blobs', which
contain cells for color processing. From [Kandel et al. 2000].

One of their first discoveries was that the left and right eye mapping originates in a kind of
competitive fashion when, in the embryonal phase, the fibers from the eye and LGN start
projecting to the cortical area. They injected a monkey's eye with a radioactive tracer, and
waited a sufficiently long time that the tracer was markedly present, by backwards diffusion
through the visual axons, in the cortical cells. They then sliced the cortex and mapped the
presence of the radioactive tracer with an autoradiogram (exposure of the slice to a
photographic film). Putting together the slices to a cortical map, they found the pattern of
figure 11.17.

Show[Import["ocular dominance columns.gif"], ImageSize -> 200];

Figure 11.17 Ocular dominance bands of monkey striate cortex, measured by voltage
sensitive dyes. One eye was closed, the other was visually stimulated. The white bands
show the activity of the stimulated eye, the dark bands indicate inactivity. The bands are on
average 0.3 mm wide. Electrode recordings (dots) along a track tangential to the cortical
surface in layer 4 revealed that the single neuron response was consistent with the optical
recording. From [Blasdel1986].

11.5 Intermezzo:
Measurement of neural activity in the brain

Single cell recordings have for decades been the method of choice to record neural
activity in the brain. The knowledge of the behaviour of a single cell however does
not give information about structures of activity, encompassing hundreds and thousands of
interconnected cells. We have now a number of methods capable of recording from many
cells simultaneously, where the mapping of the activity is at a fair spatial and temporal
resolution (for an overview see [Papanicolaou2000a]). The concise overview below is
necessarily short, and primarily meant as a pointer.

Electro-Encephalography (EEG)
Electro-encephalography is the recording of the electrical activity of the brain by an array of
superficial (on the scalp) or invasive (on the cortical surface) electrodes. Noninvasive EEG
recordings are unfortunately heavily influenced by the inhomogeneities of the brain and
scalp.

For localization this technique is less suitable due to the poor spatial resolution. Invasive
EEG studies are the current gold-standard for localizations, but they come at high cost, and
the results are often non-conclusive.

Magneto-Encephalography (MEG)
The joint activity of many conducting fibers leads to the generation of tiny magnetic fields
(in the order of femtoTesla = 10⁻¹⁵ T). The high frequency maintained over a so-called
Josephson junction, a superconducting semiconductor junction, is influenced by minute
magnetic fields, forming a very sensitive magnetic field detector. Systems have now been
built with dozens of such junctions close to the skull of a subject to measure a local map of
the magnetic fields (see figure 11.18).

Show[Import["KNAW MEG system.jpg"], ImageSize -> 420];

Figure 11.18 The 150-channel MEG system at the KNAW-MEG Institute in Amsterdam, the
Netherlands (www.azvu.nl/meg/).

DisplayTogetherArray[
 Show /@ Import /@ {"MEG-1.gif", "MEG-2.gif"}, ImageSize -> 400];

Figure 11.19 Epileptic foci (in red) calculated from magneto-encephalographic measurements
superimposed on an MRI image. From: www.4dneuroimaging.com.

From these measurements the inducing electrical currents can be estimated, which is a
difficult inverse problem. Though the spatial resolution is still poor (in the order of
centimeters), the temporal resolution is excellent.

The calculated locations of the current sources (and sinks) are mostly indicated on an
anatomical image such as MRI or CT (see figure 11.19).

Functional MRI (fMRI)

Most findings about cortical cell properties, mappings and connections have been found by
electrophysiological methods in experimental animals: recordings from a single cell, or at
most a few cells. Now, functional magnetic resonance imaging fMRI is able, non-invasively,
to measure the small differences in blood oxygenation level when there is more uptake in
capillary vessels near active neurons (BOLD fMRI: blood oxygen level dependence). fMRI
starts to shed some light on the gross functional cortical activity, even in human subjects and
patients, but the resolution (typically 1-3 mm in plane, 2-5 mm slice thickness) is still far
from sufficient to understand the functionality at cellular level.

Functional MRI is now the method of choice for mapping the activity of cortical areas in
humans. Knowledge of the functionality of certain brain areas is especially crucial in the
preparation of complex brain surgery. Recently high resolution fMRI has been developed by
Logothetis et al. using a high field strength magnet (4.7 Tesla) and implanted radiofrequency
coils in monkeys [Logothetis 1999].

DisplayTogetherArray[
 Import /@ {"fMRI 01 Max Planck.jpg", "fMRI 02 Max Planck.jpg"},
 ImageSize -> 370];

Figure 11.20 Functional magnetic resonance of the monkey brain under visual stimulation,
Blood Oxygen Level Dependence (BOLD) technique, field strength 4.7 Tesla. Left: Clearly a
marked activity is measured in the primary visual cortex. Right: different cut-away views from
the brain of the anesthetized monkey. Note the activation of the LGN areas in the thalamus.
Measurements done at the Max Planck Institute, Tübingen, Germany, by Nikos Logothetis,
Heinz Guggenberger, Shimon Peled and J. Pauls [Logothetis1999]. Images taken from
http://www.mpg.de/pri99/pri19_99.htm.

Optical imaging with voltage sensitive dyes


Recently, a powerful technique has been developed for the recording of groups of neurons
with high spatial and temporal resolution. A voltage-sensitive dye is brought in contact with the
cortical surface of an animal (cat or monkey) [Blasdel1986, Ts'o et al. 1990]. The dye
changes its fluorescence under small electrical changes from the neural discharge with very
high spatial and temporal resolution. The cortical surface is observed with a microscope
through a glass window glued on the skull. For the first time we can now functionally map
large fields of (superficial) neurons. For images, movies and a detailed description of the
technique see: http://www.weizmann.ac.il/brain/images/ImageGallery.html. Some examples
of the cortical activity maps are shown in the next chapter.

Positron Emission Tomography (PET)


With Positron Emission Tomography imaging the patient is injected with a special radioactive
isotope that emits positrons, which quickly annihilate with electrons after their emission.
A pair of photons is created which escape in opposite directions, stimulating a pair of
detectors in the ring of detectors around the patient. The line at which the annihilation took
place is detected with a coincidence circuit checking all detectors. The isotopes are often
short-lived, and are often created in a special nearby cyclotron. The method is particularly
powerful in labeling specific target substances, and is used in brain mapping, oncology,
neurology and cardiology. Figure 11.21 shows an example of brain activity with visual
stimulation.

Show[Import["PET brainfunction 02.gif"], ImageSize -> 240];

Figure 11.21 Example of a transversal PET image of the brain after visual stimulation. From
www.crump.ucla.edu/Ipp/. Note the marked activity in the primary visual cortex area.

11.6 Summary of this chapter


The thalamus in the midbrain is the first synaptic station of the optic nerve. It acts as an
essential relay and distribution center for all sensorial information. The lateral geniculate
nucleus is located at the lower part, and consists of 4 parvocellular layers receiving input
from the small retinal midget ganglion cells, and 2 layers with magnocellular cells, receiving
input from the larger retinal parasol cells. The parvocellular layers are involved with shape,
the magnocellular cells are involved in motion detection.

No preference for a certain scale induces the notion of treating each scale with the same
processing capacity, or number of receptive fields. This leads to a scale-space model of
retinal RF stacking: the set of smallest receptive fields form the fovea, the larger receptive
fields each form a similar hexagonal tiling. With the same number of receptive fields they
occupy a larger area. The superposition of all receptive field sets creates a model of retinal
RF distribution which is in good accordance with the linear decrease with eccentricity of
acuity, motion detection, and quite a few other psychophysical measures.
The receptive field sensitivity profile of LGN cells exhibits the same spatial pattern of on-
and off-center surround cells found for the retinal ganglion cells. However, they also show a
marked dynamic behaviour, which can be modeled as a modulation of the spatial sensitivity
pattern over time by a Gaussian derivative. The scale-space model for such behaviour is that
such a cell takes a temporal derivative. This will be further explored in chapters 17 and 20.

In summary: the receptive fields of retinal, LGN and cortical cells are sensitivity maps of
retinal stimulation. The receptive fields of:
- ganglion cells have a 50% on-center and 50% off-center center-surround structure at a wide
range of scales;
- LGN cells have a 50% on-center and 50% off-center center-surround structure at a wide
range of scales; they also exhibit dynamic behaviour, which can be modeled as temporal
Gaussian derivatives;
- simple cells in V1 have a structure well modeled by spatial Gaussian derivatives at a wide
range of scales, differential order and orientation;
- complex cells in V1 do not have a clear structure; their modeling is not clear.

The thalamic structures (as the LGN) receive massive reciprocal input from the cortical areas
they project to. This input is much larger than the direct retinal input. The functionality of
this feedback is not yet understood. Possible mechanisms where this feedback may play a
role are geometry-driven diffusion, perceptual grouping, and attentional mechanisms with
information from higher centers. This feedback is one of the primary targets for computer
vision modelers to understand, as it may give a clue to bridge the gap between local and
global image analysis.
12. The front-end visual system - cortical columns

"Better keep yourself clean and bright;
you are the window through which you must see the world"
-George Bernard Shaw

12.1 Hypercolumns and orientation structure


Hubel and Wiesel were the first to find the regularity of the orientation sensitivity tuning.
They recorded a regular change of the orientation sensitivity of receptive fields when the
electrode followed a track tangential to the cortex surface (see figure 12.1).

<< FrontEndVision`FEV`;
Show[
Import["orientation tangential track 01.jpg"], ImageSize-> 320];


Figure 12.1 A tangential electrode tracking along the surface of the cortex displays a neat
ordering of orientation selectivity of the cortical receptive fields. Horizontally the electrode
track distance is displayed, vertically the angle of the prominent orientation sensitivity of the
recorded cell. From [Hubel1982].

A hypercolumn is a functional unit of cortical structure. It is the hardware that processes a
single 'pixel' in the visual field for both eyes. There are thousands of identical hypercolumns
tiling the cortical surface.

The vertical structure in this small patch of cortical surface does not show much variation in
orientation sensitivity of the cells, hence the name 'columns'.

Show[GraphicsArray[
 Import /@ {"cortical columns model.jpg", "orientation column.jpg"}],
 ImageSize -> 400];

Figure 12.2 Left: Cortical columns are found at all places on the cortex. It is a fundamental
organizational and structural element of topological organization. Right: A visual column in
V1 contains all orientations (in a pinwheel-like structure). From [Ts'o et al. 1990].

They contain cells of all sizes, orientations, differential order, velocity magnitude and
direction, disparity and color for both left and right eye. It is a highly redundant filterbank
representation. The left and right eye dominance bands form somewhat irregular bands over
the cortical surface (figure 11.16).

From the voltage sensitive dye methods we now know that the fine structure of the
orientation sensitivity is organized in a pinwheel fashion [Bonhoeffer and Grinvald 1993]
(see figure 12.3), i.e. the spokes connect cells firing at the same orientation. In these
measurements the monkey is presented with a multitude of sequential lines at particular
orientations.

Show[GraphicsArray[Import /@ {"iso-orientation contours.gif",
  "iso-orientation contours zoomed.gif"}], ImageSize -> 400];

Figure 12.3 Left: Voltage sensitive dye measurements of orientation sensitivity on a small
patch of V1 in the macaque monkey. Size of the cortical patch: 9x12 mm. Right: enlarged
section of the rectangular area in the left figure. Shaded and unshaded areas denote the left
and right eye respectively. Colored lines connect cells with equal orientation sensitivity. They
appear in a pinwheel fashion with the spokes in general perpendicular to the column
boundary. From [Ts'o et al. 1990].

Show[GraphicsArray[Import /@ {"iso-orientation contours bw.gif",
  "orientation columns model.gif"}], ImageSize -> 320];

Figure 12.4 Left: The orientation columns are arranged in a highly regular columnar
structure. Arrow 1 and 4: the singularity in the pinwheel iso-orientation contours is located in
the center. Arrow 2 and 3: Border between left and right eye columns. Note that the iso-
orientation contours are in majority perpendicular to the boundaries of the columns. Right:
Model to explain the measured results. The iso-orientation lines mostly reach the ocular
dominance boundary at a right angle. From [Blasdel and Salama 1986].

It is not known how the different scales (sizes of receptive fields) and the differential orders
are located in the hypercolumns. The distance from the singularity in the pinwheel and the
depth in the hypercolumn are possible mapping candidates.

Injection of a pyramidal cell in layer 2 and 3 in a monkey with a dye (horseradish
peroxidase) reveals that such cells make connections to cells in neighboring columns (see
figure 12.5). The clusters of connections occur at intervals that are consistent with cortical
column distances.

Show[Import["orientation coupling.gif"], ImageSize -> 220];

Figure 12.5 Cells of a hypercolumn at a particular orientation have facilitating connections
some distance away. These connections are precisely to cells in neighboring hypercolumns with the
same orientation sensitivity, thus enabling a strong perceptual grouping on orientation, which
is essential for the perception of lines, contours and curves. From [McGuire1991], see also
[Kandel2000].
12.1 Hypercolumns and orientation structure

Show[Import["orientation coupling model.gif"] , ImageSize -> 240] ;


Figure 12.6 Model for the excitatory orientation coupling between neighboring columns with
similar orientation. From [Kandel et al. 2000].

This may be particularly important for close-range perceptual grouping [Elder1998,


Dubuc2001a]. It has been shown that the projections are only with those neighboring cells
that have the same functional specificity (a vertical line in this case), see figure 12.6 and
[Kandel2000, pp. 542] and [Gardner1999]. See also Zucker [Zucker2001a].

12.2 Stabilized retinal images


DensityPlot[-E^(-x^2 - y^2), {x, -2, 2}, {y, -2, 2}, PlotRange -> {-2, 0},
 Epilog -> {White, Point[{0, 0}]}, ImageSize -> 300];

Figure 12.7 Stimulus to experience the disappearance of perception (also called visual
fading) when the image is stabilized on the retina. Fixate a long time on the small central
white dot. After some time the grey blob completely disappears. From [Cornsweet1970].

Worth mentioning in the context of the biomimicking of vision into a mathematical
framework is the amazing fact that vision totally disappears in a few seconds (!) when the
image is stabilized on the retina [Ditchburn1952, Gerrits1966a].

One can fix the retinal image by optically projecting the image through a contact lens
attached to the eye carrying a fiber bundle, or by carefully monitoring the eye movements
and displacing the stimulus image to counteract them. One can appreciate this
phenomenon with the stimulus depicted in figure 12.7.

Task 12.1 Repeat this experiment with a foreground and background of different
colors. What fills in, the foreground or the background?

Task 12.2 Why is a gradual slope of the boundary of the blob stimulus required?

The immediate implication of visual fading with a stabilized retinal image is that we do not
perceive homogeneous areas, like a white sheet of paper. We perceive the boundaries, and
fill in the white of the paper and the background by measuring what happens on both sides of
the intensity contour.

We continuously make very small eye movements, possibly in order to keep seeing. It has
been suggested that eye movements play an important role in the perceptual grouping of
'coterminous' (non-accidental) edges [Binford1981].

Show[Import["micromovements.jpg"], ImageSize -> 300];

/ "\

Figure 12.8 Microsaccades are made very precisely synchronously by both eyes, and are
much larger than single receptor diameters.
From McCourt, www.psychology.psych.ndsu.nodak.edu.

These small eye movements are substantially larger than single receptors, and are made
synchronously by both eyes (figure 12.8). There is also small drift.

Figure 12.9 shows some stimulus patterns that continuously seem to shimmer. This is due to
your involuntary eye movements (drift, tremor and micro-saccades).

Bursts in cortical neurons tend to occur after microsaccades. The investigation of the relation
between cortical responses and eye movements (in particular microsaccades) is an active
research area, where one investigates the problem of why we have a stable perception despite
these eye movements.

nφ = 60; nr = 30; Show[Graphics[
   {Table[{Black, Polygon[{{0, 0}, {Cos[φ], Sin[φ]},
        {Cos[φ + π/nφ], Sin[φ + π/nφ]}}]}, {φ, 0, 2 π, 2 π/nφ}],
    Table[{Black, Disk[{2.2, 0}, r], White, Disk[{2.2, 0}, r - 1/(2 nr)]},
     {r, 1, 1/nr, -1/nr}]}],
  AspectRatio -> Automatic, ImageSize -> 480];

Figure 12.9 Stimulus pattern, in which a shimmering effect is seen due to our small
involuntary eye movements. Note: when viewing on screen, adjust the magnification of the
display for best resolution, and adjust or regenerate the pattern so fresh PostScript is
generated. nφ is the number of segment pairs, nr is the number of ring pairs.

12.3 The concept of local sign


How does a cell in the LGN and cortex know from what retinal position the incoming signals
arrive? How is the map formed?

The fibers that arrive in the LGN from both retinas all look the same: covered with a white
myelin sheath. This is different in our modern electronic equipment, where we carefully label
each wire, e.g. with colors, in order to know exactly which wire it is and what signal it is
carrying. Somehow the brain manages to find out how the wiring, formed in the embryonic
stage, is laid out.

This philosophical problem was first studied by Lotze [Lotze1884], a German philosopher.
He coined the German term Lokalzeichen, which means 'local sign'. Jan Koenderink
followed up on this first thinking, and wrote a beautiful paper which put the Lokalzeichen in
a computer vision perspective [Koenderink1984d]. The main line of reasoning is the
following: two cells can determine whether they have a neighborhood relation by checking
whether their signals are correlated.

Neighboring geometrical properties have to correspond, such as intensity, contours with the
same orientation, etc.

A similar situation occurs when we solve a jigsaw puzzle (see figure 12.10). We find the
location of a piece through its relations to its neighbors. In differential geometric
terminology: there needs to be similar differential structure between two neighboring pieces.
In zeroth order, the same intensity makes it highly likely that two pieces are connected. So
does a similar gradient with the same slope and direction, the same curvature, etc. The same
of course applies to all descriptive features: color, texture, etc. The N-jet has to be
interrelated between neighboring cortical hypercolumns at many scales.
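Koenderink's correlation argument can be made concrete with a toy simulation (an illustrative sketch, not from the book; all names and parameters are assumptions): cells on a circular 1D "retina" respond to many spatially smooth stimuli, and from the response correlations alone each cell's spatial neighbors can be recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_stimuli, smooth = 40, 2000, 3.0

# Smooth random stimuli on a circular 1D "retina": white noise blurred
# with a circular Gaussian, so nearby positions carry correlated values.
x = np.arange(n_cells)
diff = np.abs(x[:, None] - x[None, :])
d = np.minimum(diff, n_cells - diff)          # circular distance between cells
kernel = np.exp(-d**2 / (2 * smooth**2))
stimuli = rng.standard_normal((n_stimuli, n_cells)) @ kernel  # one stimulus per row

# Each "cell" simply reports the stimulus value at its (unknown) position.
# From the response correlation matrix alone, the most correlated other
# cell should be a direct spatial neighbour.
corr = np.corrcoef(stimuli.T)
np.fill_diagonal(corr, -np.inf)
best = corr.argmax(axis=1)                    # most correlated partner of each cell
is_neighbour = (d[x, best] == 1)
print(is_neighbour.mean())                    # fraction of cells whose best match is adjacent
```

With smooth stimuli the correlation falls off monotonically with circular distance, so virtually every cell pairs up with a true neighbor: the spatial layout is recoverable from the signals alone, which is the essence of the local sign argument.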

So during the formation of the visual receptive field structures the system seems to solve a
huge jigsaw puzzle. Only by looking are the cells stimulated, and only then can the system
accomplish its task. When we are born, the neurons are very redundantly wired. A cell is
connected to too many of its neighbors.

Only those synapses that are necessary remain during the formation process, because these
are the ones that are actually used.

A frequently used synapse grows, a hardly or not used synapse degenerates and will never
come back.

Show[Import["jigsaw.gif"], ImageSize-> 450];

Figure 12.10 We solve a jigsaw puzzle by finding corresponding geometrical properties


between pieces that have to be close together, in order to form a pair. In this example (from
http://jigzone.com/) the pieces have not been rotated or occluded. In our scale-space model
of the front-end visual system the pieces are blobs weighted with multi-scale Gaussian
derivative functions.

Receptive fields substantially overlap, and they should in order to create a correlation
between neighboring fibers. However, they overlap because we have a multi-scale sampling
structure. At a single scale, our model presumes a tight hexagonal tiling of the plane. There
is a deep notion here of the correlation between different scales, and the sampling at a single
location by receptive fields of different scale.

The reconstruction of the receptive field structure when the overlap relations are known is a
classical topological problem.

The methodology is beyond the scope of this introductory book. It has been solved in 1D, on
a circular retina (to avoid boundary effects), by Koenderink, Blom and Toet
[Koenderink1984c, Koenderink1984d, Toet1987], but for higher dimensions it is still
unsolved.

Note: It is interesting that social networks, which can be seen as overlapping
'receptive fields', have recently been discovered to be 'searchable'. The psychologist
Milgram distributed small postal packages among arbitrary people in Nebraska, requesting
them to send these to someone in Boston.

Because they did not know this person, they were asked to send the package to someone
whom they expected to be more likely to know him. To Milgram's surprise it took on average
only 6 steps for the packages to reach their target. This has recently been mathematically
modeled by Watts et al. [Watts2002].

Show[Import["owl.gif"], ImageSize-> 250];

Figure 12.11 The solution of the jigsaw puzzle of figure 12.10.

12.4 Gaussian derivatives and Eigen-images


It has been shown that the so-called Eigen-images of a large series of small image patches
have great similarity to partial Gaussian derivative functions [Olshausen1996,
Olshausen1997]. The resulting images are also often modeled as Gabor patches and
wavelets. In this section we will explain the notion of Eigen-images and study this statistical
technique with Mathematica.

We read the many patches as small square subimages of δ×δ = 12x12 pixels, non-overlapping,
at 17 horizontal and 17 vertical positions, leading to a series of 289 patches. Figure 12.12
(next page) shows the location of the patches. These 289 images form the input set.

im = Import["forest06.gif"][[1, 1]]; δ = 12;

ListDensityPlot[im, Epilog ->
   {Gray, Table[Line[{{i, j}, {i + δ, j}, {i + δ, j + δ}, {i, j + δ}, {i, j}}],
     {j, 2, 256, 15}, {i, 2, 256, 15}]}, ImageSize -> 170];

Figure 12.12 Location of the 289 small 12x12 pixel patches taken from a 256x256 image of a
forest scene.

The small 12x12 images are sampled with SubMatrix:

set = Table[SubMatrix[im, {j, i}, {δ, δ}],
   {j, 2, 256, 15}, {i, 2, 256, 15}]; Dimensions[set]

{17, 17, 12, 12}

and converted into a matrix m with 289 rows of length 144. We multiply each small image
with a Gaussian weighting function to simulate the process of observation, and subtract the
global mean:

σ = 4; g = Table[Exp[-(x^2 + y^2)/(2 σ^2)], {x, -5.5, 5.5}, {y, -5.5, 5.5}];
set2 = Map[g # &, set, {2}];
m = Flatten[Map[Flatten, set2, {2}], 1]; mean = (Plus @@ #)/Length[#] &;
m = N[m - mean[Flatten[m]]]; Dimensions[m]

{289, 144}

We calculate mᵀm, a 144x144 matrix, with the Dot product, and check that it is a square matrix:

Dimensions[mTm = N[Transpose[m].m]]

{144, 144}

The calculation of the 144 Eigenvalues of a 144x144 matrix is fast in Mathematica. It is
essential to force the calculations to be done numerically with the function N[]. Because
mᵀm is a symmetric matrix, built from two 289x144 size matrices, we have 144 (nonzero)
Eigenvalues:

Short[Timing[evs = eigenvalues = Eigenvalues[mTm]], 5]

{0.046 Second,
 {3.08213×10^7, 9.62621×10^6, 4.30075×10^6, 2.83323×10^6, 1.42025×10^6,
  1.385×10^6, 1.20726×10^6, 890446., 811958., <<126>>, 670.255, 644.039,
  613.394, 503.244, 442.515, 366.791, 284.952, 250.393, 235.25}}

We calculate the Eigenvectors of the matrix mᵀm and construct the first Eigen-images by
partitioning each resulting 144x1 vector into 12 rows. All Eigenvectors are normalized to
unit length.

eigenvectors = Eigenvectors[mTm];
eigenimages = Table[Partition[eigenvectors[[i]], δ], {i, 1, 8}];
DisplayTogetherArray[ListDensityPlot /@ eigenimages, ImageSize -> 460];
DisplayTogetherArray[ListPlot3D /@ eigenimages, ImageSize -> 460];

Figure 12.13 The first 8 Eigen-images of the 289 patches from figure 12.12.
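For readers without Mathematica, the same pipeline can be sketched in NumPy. Since the forest image is not included here, a synthetic structured image stands in for it (an assumption; the numbers will differ from the book's), but the steps are the same: cut δ×δ patches on a grid, apply the Gaussian window, subtract the global mean, and take the eigenvectors of mᵀm.

```python
import numpy as np

rng = np.random.default_rng(1)
size, delta, step = 256, 12, 15

# Synthetic stand-in for the forest image: smooth, structured intensities.
yy, xx = np.mgrid[0:size, 0:size]
im = (np.sin(xx / 7.0) + 0.5 * np.sin((xx + 2 * yy) / 11.0)
      + 0.1 * rng.standard_normal((size, size)))

# Cut delta x delta patches on a grid, window each with a Gaussian (sigma = 4,
# as in the Mathematica code above), and subtract the global mean.
r = np.arange(delta) - (delta - 1) / 2.0
g = np.exp(-(r[:, None]**2 + r[None, :]**2) / (2 * 4.0**2))
patches = np.array([(im[j:j + delta, i:i + delta] * g).ravel()
                    for j in range(2, size - delta, step)
                    for i in range(2, size - delta, step)])
m = patches - patches.mean()

# Eigen-images: eigenvectors of the 144 x 144 matrix m^T m,
# reshaped back into delta x delta patches.
w, v = np.linalg.eigh(m.T @ m)      # symmetric matrix, ascending eigenvalues
order = np.argsort(w)[::-1]
eigenimages = v[:, order[:8]].T.reshape(8, delta, delta)
print(w[order][:4])                 # dominant eigenvalues
```

With a structured input the leading eigen-images come out as smooth, oriented patterns, while the trailing ones look noise-like; the eigenvalue spectrum decays steeply, mirroring the Mathematica result above.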

Note the resemblance of the first Eigen-image to the zeroth order Gaussian blob, of the
second and third Eigen-images to the first order Gaussian derivatives ∂G/∂x and ∂G/∂y, and
of the 4th, 5th and 6th Eigen-images to the second order Gaussian derivatives ∂²G under
120 degree different angles.
We will derive in chapter 19 that a second order Gaussian derivative in any direction can be
constructed from 3 second order partial derivative kernels, each rotated over 120 degrees
(this is the steerability of Gaussian kernels: they form a basis).
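This steerability claim can be checked numerically without waiting for chapter 19. Three basis responses taken 60° apart (equivalent to 120° apart, since the second order directional derivative has a period of 180°) reproduce the response at any angle θ via the standard steerable-filter interpolation weights k_j(θ) = (1 + 2 cos 2(θ - θ_j))/3 (after Freeman and Adelson; the weight formula is an assumption here, anticipating the later derivation).

```python
import numpy as np

def second_directional(Lxx, Lxy, Lyy, theta):
    """Second order directional derivative response at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return c * c * Lxx + 2 * c * s * Lxy + s * s * Lyy

# An arbitrary local second order jet (Lxx, Lxy, Lyy):
Lxx, Lxy, Lyy = 1.3, -0.7, 0.4
basis_angles = np.array([0.0, np.pi / 3, 2 * np.pi / 3])  # 60 deg apart (= 120 deg mod 180)
basis = second_directional(Lxx, Lxy, Lyy, basis_angles)

for theta in np.linspace(0, np.pi, 7):
    k = (1 + 2 * np.cos(2 * (theta - basis_angles))) / 3  # steering weights, sum to 1
    steered = k @ basis
    direct = second_directional(Lxx, Lxy, Lyy, theta)
    assert np.isclose(steered, direct)
print("steering identity holds")
```

The identity holds for any jet because the directional response is a pure second harmonic in θ, and three equally spaced samples fix a second harmonic exactly: the three rotated kernels indeed form a basis.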

The Eigen-images reflect the basis functions in which the spatial structure of the images can
be expressed. The natural basis for spatial image structure are the spatial derivatives
emerging in the local Taylor expansion. When there is no coherent structure, such as in white
noise, we get Eigen-images that reflect just noise. Here are the Eigen-images for white noise
(we take the same 289 12x12 patches again):

noise = Table[Random[], {256}, {256}]; δ = 12;

set = Table[SubMatrix[noise, {j, i}, {δ, δ}], {j, 3, 256, 15}, {i, 3, 256, 15}];
m = Flatten[Map[Flatten, set, {2}], 1];
m = N[m - mean[Flatten[m]]]; mTm = N[Transpose[m].m];
{eigenvaluesn, eigenvectorsn} = Eigensystem[mTm];
eigenimagesn = Table[Partition[eigenvectorsn[[i]], δ], {i, 1, 8}];

DisplayTogetherArray[ListDensityPlot /@ eigenimagesn, ImageSize -> 460];

Figure 12.14 The first 8 Eigen-images of 289 patches of 12x12 pixels of white noise. Note that
none of the Eigen-images contains any structure.

Note that the distribution of the Eigenvalues for noise is much different from that of a
structured image. The values are much smaller, and the first ones are markedly less
pronounced. Here we plot both distributions:

DisplayTogether[
 LogListPlot[evs, PlotJoined -> True, PlotRange -> {.1, Automatic}],
 LogListPlot[eigenvaluesn, PlotJoined -> True], ImageSize -> 250];

Figure 12.15 Nonzero Eigen-values for a structured image (upper) and white noise (lower).
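The qualitative difference between the two curves in figure 12.15 is easy to reproduce with synthetic data (an illustrative sketch with assumed signals, not the book's images): patches from a smooth structured signal concentrate their variance in a few dominant eigenvalues, while white-noise patches spread it almost uniformly.

```python
import numpy as np

rng = np.random.default_rng(2)

def patch_spectrum(im, delta=12, step=15):
    """Descending eigenvalues of m^T m for mean-subtracted patches."""
    n = im.shape[0]
    m = np.array([im[j:j + delta, i:i + delta].ravel()
                  for j in range(0, n - delta, step)
                  for i in range(0, n - delta, step)])
    m = m - m.mean()
    return np.sort(np.linalg.eigvalsh(m.T @ m))[::-1]

yy, xx = np.mgrid[0:256, 0:256]
structured = np.sin(xx / 9.0) + np.cos((xx + yy) / 13.0)  # low-rank, coherent structure
noise = rng.standard_normal((256, 256))

ev_s = patch_spectrum(structured)
ev_n = patch_spectrum(noise)
# Fraction of total variance in the top components:
print(ev_s[:3] / ev_s.sum(), ev_n[:3] / ev_n.sum())
```

For the structured signal the first few eigenvalues carry most of the variance (steep decay), while for noise the leading eigenvalue is barely above the bulk, which is why the lower curve in figure 12.15 is nearly flat.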

When we extract 49 x 49 = 2401 small images of 12 x 12 pixels, one every 5 pixels so that
they slightly overlap, we get better statistics.

A striking result is obtained when the image contains primarily vertical structures, like trees.
We then obtain Eigenpatches resembling the horizontal high order Gaussian derivatives /
Gabor patches (see figure 12.16).

im = Import["forest02.gif"][[1, 1]]; δ = 12;

set = Table[SubMatrix[im, {j, i}, {δ, δ}], {j, 2, 246, 5}, {i, 2, 246, 5}];
δδ = (δ - 1)/2; σ = δδ; g = Table[N[Exp[-(x^2 + y^2)/(2 σ^2)]],
   {x, -δδ, δδ}, {y, -δδ, δδ}]; set2 = Map[g # &, set, {2}];
m = Flatten[Map[Flatten, set2, {2}], 1]; mean = (Plus @@ #)/Length[#] &;
m = N[m - mean[Flatten[m]]]; mTm = N[Transpose[m].m];
eigenvectors = Eigenvectors[mTm];
eigenimages = Table[Partition[eigenvectors[[i]], δ], {i, 1, 25}];

Block[{$DisplayFunction = Identity}, p1 = ListDensityPlot[im]; p2 =
   Show[GraphicsArray[Partition[ListDensityPlot /@ eigenimages, 5]]];]
Show[GraphicsArray[{p1, p2}, ImageSize -> 420]];

Figure 12.16 Eigen-images for 2401 slightly overlapping patches of 12x12 pixels from the
forest image shown on the left. Due to the larger number of patches we get better statistics.
Note that the first Eigenpatches resemble the high order horizontal derivatives.

Task 12.3 Show the Eigen-images for a range of patch sizes, from 5x5 to
15x15 pixels. How can the result be interpreted?

Task 12.4 The visual cortex receives input from center-surround receptive fields,
thus (scale-space model for front-end vision) from the Laplacian of the input
image. Show the Eigen-images for the Laplacian of the input image at several
scales. Interpret the results, especially with respect to the first Eigen-image.

Task 12.5 Find the (color) Eigen-images for patches taken from a RGB color
image.

12.5 Plasticity and self-organization


It can be speculated that the coherent structure in the collective set of first retinal images that
we perceive after birth creates the internal structure of the observation mechanism in our
front-end visual system. Numerous deprivation studies have shown the importance of the
early visual stimulation for visual development.
The closure of one eye during the first 4 months after birth, due to some illness or accident,
prevents the creation of proper stereopsis (depth vision). It was shown histologically in early
deprivation experiments by Hubel and Wiesel [Hubel1988a] that the involved cells in the
LGN seriously degenerated in a monkey that had a single eye blindfolded during the first
months after birth. Deprivation later in life showed markedly fewer such degeneration effects.

When we arrive on this world, we are redundantly wired. Each neuron seems to be connected
to almost all its neighboring neurons. Connections that are hardly used deteriorate and
disintegrate, connections that are frequently used are growing in strength (and size).

The same principle is found in the output of the neural circuitry, in the activation of muscles.
Muscle cells in neonates are innervated by multiple nerve endings but finally a single ending
remains (synapse elimination), exactly in the middle where the minimum of shear forces is
felt.

Here the self-organizing parameter may be the shear-stress. The nasal half of the retina
projects to a different half of the visual cortex than the temporal half.

The receptive fields on the vertical meridian of the retina have to communicate with each
other across the two brain halves. The connection is through an abundant array of fibers
connecting the two cortical areas: the corpus callosum, a bundle easily recognized on
sagittal MR slices.
How do the millions of fibers know where their corresponding contralateral receptive field
projection locations are? The answer is that they don't have to know this. The connections in
the redundant and superfluous wiring after birth that are not stimulated by visual scenes on
the receptive fields bordering the vertical meridian degenerate, leaving the right
connections in due time (a few months).

Task 12.6 In this section we took many patches from a single image. Show that
similar Eigen-images emerge when we take patches from different images.

Task 12.7 Show the Eigen-images for a variety of other images, e.g. natural
scenes, faces, depth maps, ultrasound speckle images. Is there any difference
between tomographic slice images (where no occlusion is present) and real world
3D → 2D projection images?

This statistical approach to the emergence of a representation of operators in the early visual
system is receiving much attention today, see e.g. [Rao2001]. Keywords are principal
component analysis (PCA), partial least squares (PLS), canonical correlation analysis
(CCA), independent component analysis (ICA), multiple linear regression (MLR) and sparse
code learning (SCL). It is beyond the scope of this book to elaborate further on this topic.

Task 12.8 What determines the orientation angle, the sign and the amplitude of
the emerging Eigen-images?

Task 12.9 This formalism is now easily extended to images as scale-spaces.
When a small stack of Gaussian scale-space images is brought into a column
vector format, the same apparatus applies. Show the Eigen-images for such
scale-spaces, and set up conjectures about the self-emerging operators when
the neural system is presented with large series of scale-space images as input.

12.6 Higher cortical visual areas


Show[Import["cortical functional areas.jpg"], ImageSize -> 430];

Figure 12.17 Functional diagram of the visual pathways. From [Kandel2000].

From V1, projections go to the higher visual areas in the cortex, such as V2, V3, V4 and
MT (the medio-temporal area).
It is beyond the scope of this chapter on front-end vision to discuss all layers in detail. Figure
12.17 summarizes the principal connections. It is clear from this diagram that the visual
system has dedicated pathways through the multiple visual areas. They are related to the
separate functional properties that are measured: shape, color, motion, and disparity. The
current view is that there are two major pathways: a ventral pathway to the inferior temporal
cortex and a dorsal pathway to the posterior parietal cortex. The projections in each pathway
are hierarchical; there are strong projections between the subsequent levels as well as strong
feedback connections.

The type of visual processing changes systematically from one level to the next (see the
excellent overview by Robert Wurtz and Eric Kandel [Kandel2000, chapter 28]).

12.7 Summary of this chapter


The visual cortex contains a retinotopic map of the retina. The visual cortex consists of 6
structural layers, which can be discriminated by cell type and function. The input from the
LGN arrives in the middle layer 4; the output to higher centres comes from the layers above
and below. The visual cortex is a hierarchical cascade of functional areas: V1, V2, V3, V4 and
up. In V1 a precise arrangement of hypercolumns is shown by both single electrode studies
and voltage sensitive dye functional mapping methods.

Each hypercolumn contains the cellular machinery for complete binocular analysis of a
'pixel' in the overlapping visual field of both eyes. The cells sensitive for the same
orientation are located on 'spokes' of a pinwheel-like arrangement in the hypercolumn.

The cortex has to infer spatial arrangements of incoming fibers from the mutual relations
between the signals they carry. The concept of 'local sign' compares this inference with the
solving of a jigsaw puzzle from the N-jet and orientation information of receptive fields in
neighboring hypercolumns.

The structure of spatially and temporally coded receptive field sensitivity profiles is largely
constructed after birth by seeing.

The plasticity of these formations has been shown by deprivation experiments in
experimental animals, and can be modeled by self-organizing neural networks. An analysis
of the Eigen-images for a large number of patches taken from an image shows great
similarity of the basis images with the Gaussian derivative receptive field models. The first
layers of the visual front-end (up to V2) are well mapped and studied. This is much less the
case for the higher cortical areas. A general strategy seems to be the hierarchical mapping
onto subsequent layers with substantial divergent feedback connections.

12.8 Vision dictionary


Below is a short explanation of the vision related terms used in this book. For more complete
listings see e.g. the Visionary webpages by Liden
(http://cns-web.bu.edu/pub/laliden/WWW/Visionary/Visionary.html) and the glossary by Fulton
(http://www.4colorvision.com/pdf/glossary.pdf).

area 17 - the visual cortex, in Brodmann's classification of cortical areas


afferent direction for nerve signals: from the receptor to the brain
amacrine cell - retinal cell, likely involved in motion processing
axon output fiber of a neuron
bipolar cell cell in the retina with two possible synaptic projection polarities
blob group of color processing cells in the center of a hypercolumn
coronal vertical 3D slice direction, plane through eye centers and cheek
corticofugal from the cortex away
caudal on the side of the tail
cranial on the side of the top (of the head)
dendrite treelike branch on a cell body for reception of synaptic input

depolarization - decreasing the inner voltage of the cell body


dorsal on the side of the back
efferent direction for nerve fibers: from the brain to effector (muscle, gland)
excitatory positive input increasing firing rate
fovea the central portion of the retina with highest acuity
ganglion cell cell type in the retina, the output cells
hypercolumn functional unit of the cortex (typ. 1x1x2 mm)
hyperpolarization - increasing the inner voltage of the cell body
inhibitory negative input, decreasing firing rate
lateral located at the side
macaque short-tailed asian monkey, often used in experiments
magno-cellular - with large cell bodies (Latin: magnus = big)
midget cells small retinal ganglion cells, involved in shape extraction
myelin insulating layer of a neuron's axon, white colored
nasal located at the side of the nose
nucleus a localized small structure in the brain
occipital located in the back (of the head)
optic chiasm crossing of the optic nerve
orientation column - column of cortical cells containing all local orientations
parietal on the side
parvo-cellular - with small cell bodies (Latin: parvus = small)
parasol cell large retinal ganglion cells involved in motion
PSTH post-stimulus-time-histogram
psychophysics - measurement of human performance in perceptual tasks
Pacini receptor - onion-shaped pressure sensitive receptor in the skin (after Filippo
Pacini (1812-1883), Italian anatomist)
receptive field - 2D spatial light sensitivity area on the retina with respect to the cell's
firing rate
retinotopic forming a spatial, neighbor-preserving mapping with the retina
rhodopsin the light-sensitive protein in the rods and cones
sagittal vertical 3D slice direction (Latin: sagitta = arrow)
soma cell body (Greek: soma = body)
somatotopic forming a spatial map with a surface somewhere on the body
striate striped (Latin: stria = furrow)
striate cortex area 17. Area 17 is striped, due to a marked stripe of white matter in
layer 4 of myelinated axons
synapse tiny pedicle where one neuron passes information to the next
temporal located at the temple of the head (Latin: tempus = temple)
thalamus deep brain structure in the midbrain, receiving all perceptual
information
transversal horizontal 3D slice position
ventral on the side of the belly

12.8.1 Further reading on the web:

Some suggested webpages for further exploration:



Journal: Perception
www.perceptionweb.com/perabout.html
Space-time receptive fields in the visual system (Ohzawa, Berkeley):
neurovision.berkeley.edu/Demonstrations/VSOC/teaching/AA_RFtutorial.html
Voltage sensitive dye research at Weizmann Institute of Science, Israel:
www.weizmann.ac.il/brain/grinvald/index.html
Optical recording literature (compilation by Steve M. Potter):
www.its.caltech.edu/~pinelab/optical.html
LGN research (Dwayne Godwin):
www.wfubmc.edu/bgsm/nba/faculty/godwin/godwin.html
Center for Computational Vision and Control (early-vision models and biomedical image
analysis): cvc.yale.edu/
Magneto-encephalography (MEG): www.4dneuroimaging.com/
Positron Emission Tomography (PET): www.crump.ucla.edu/lpp/
Functional Magnetic Resonance Imaging (fMRI): www.functionalmri.org/,
www.spectroscopynow.com/Spy/mri/
Medical images and illustrations: www.mic.ki.se/Medimages.html
13. Deep structure I. Watershed segmentation
Erik Dam and Bart M. ter Haar Romeny

"Study the family as a family, i.e. define deep structure, the relation between
structural features of different derived images" [Koenderink1984a].

13.1 Multi-scale measurements


The previous chapters have presented the notion of scale - any observation is, implicitly or
explicitly, defined in terms of the area of support for the observation. This allows different
observations at the same location that focus on different structures. A classical illustration of
this concept is the observation of a tree. At fine scale the structures of the bark and the leaves
are apparent. In order to see the shapes of the leaves and the twigs a higher scale is required;
an even higher scale is appropriate for studying the branches, and finally the stem and the
crown are best described at a very coarse scale.

A comprehensive description of the tree requires observations at all scales ranging from the
cellular level to the scale corresponding to the height of the tree. However, in order to give a
full description of the tree, subsequent inspection at all the relevant scales is not sufficient.
Even though it is possible to measure the size of the crown and the number of leaves at
specific fixed scales, inspection of the connections between the structures requires
simultaneous inspection at all the intermediary scales. If we want to count the number of
leaves positioned on one of the tree's major branches, it is necessary to link the localization
of the leaves at fine scale through the twigs (possibly at an even finer scale) and the thin
branches, finally reaching the desired branch at a coarser scale.

The key point is that not only do we need to connect observations at different localizations -
we also need to link observations at different scales.
In the words of Koenderink, we must study the family of scale-space images as a family, and
define the 'deep' structure. 'Deep' refers to the extra dimension of scale in a scale-space, like
the sea has a surface and depth.

As demonstrated in the previous chapters, differential operators allow detection of different


features at a given, fixed scale - this is the superficial structure. The scale can be adaptive,
and different in every pixel. This brings up the notion of scale-adaptive systems like the
geometry-driven diffusion equations (in the Perona & Malik equations the scale of the
operator is adapted to the length of the gradient), and of scale selection.

This chapter will give examples of how linking of observations through the entire scale-
space offers additional information about the observed features.
The examples cover mechanisms for automatic choice of the appropriate observation scale,
localization of the proper location for features, and a robust segmentation method that takes
advantage of the deep structure.
The chapter is concluded with some thoughts on how the deep structure can be used to
extract the significant hierarchical information about the image structures and thereby give a
potentially comprehensive and compact description of the image.

13.2 Scale selection


A first step towards exploitation of the deep structure is the automatic selection of the
appropriate scale for an observation [Lindeberg1998a, Lindeberg1998b]. Differential
operators are applied to the detection, localization and characterization of a large number of
image features such as edges, blobs, corners, etc. The responses from these operators depend
on the scale at which the operators are applied - therefore it is essential to choose the proper
scale for the observation. Furthermore, the responses from these differential operators are
often used to characterize the properties of the feature - for instance the strength of an edge
or the size of a blob. Since the responses from the operators depend on the scale it is not
trivial to compare the strength of edges detected at different scales.

As an example, we inspect the standard blob detector defined by the spatial maxima of the
Laplacian, ∇²L = Lxx + Lyy = Lww + Lvv, where L is the luminance. The Laplacian can be
considered a measure of the blob strength in a point - thereby the maxima define the blobs.
In order to detect both light and dark blobs we use the absolute value of the Laplacian:

<< FrontEndVision`FEV`;

blobness[im_, σ_] := Abs[gD[im, 2, 0, σ] + gD[im, 0, 2, σ]];
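A rough SciPy counterpart of blobness can be written with scipy.ndimage.gaussian_laplace, which computes the Laplacian of the Gaussian-smoothed image. The sketch below (an illustration, not the FEV package) also includes the scale-normalized variant σ²|∇²L| from Lindeberg's scale-selection work cited above, which makes responses comparable across scales: on a Gaussian test blob the raw response simply decreases with σ, while the normalized response peaks when σ matches the blob's own inner scale.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def blobness(im, sigma):
    """|Laplacian| of the Gaussian-blurred image, mirroring the text."""
    return np.abs(gaussian_laplace(im, sigma))

def blobness_normalized(im, sigma):
    """Scale-normalized blob strength: comparable across scales."""
    return sigma**2 * blobness(im, sigma)

# A Gaussian test blob of inner scale 8 px on a 128 x 128 grid:
yy, xx = np.mgrid[0:128, 0:128] - 64.0
blob = np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))

for s in (4.0, 8.0, 16.0):
    print(s, blobness(blob, s)[64, 64], blobness_normalized(blob, s)[64, 64])
```

Analytically the center response is 2 t0 / (t0 + σ²)² for a blob of variance t0, which decreases monotonically in σ; multiplying by σ² gives a maximum exactly at σ² = t0, i.e. at the blob's own scale, which is the idea behind automatic scale selection discussed above.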

This is illustrated for an image with a blob and a square, where the blobs are detected at two
different scales. In order to pick the blobs from these images, we first define a function that
allows extraction of the positions and values of the n largest maxima (with respect to the
2d axis neighbors) from an image of arbitrary dimension:

nMaxima[im_, n_] := Module[{l, d = Depth[im] - 1},
  p = Times @@ Table[(Sign[im - Map[RotateLeft, im, {i}]] + 1)
       (Sign[im - Map[RotateRight, im, {i}]] + 1), {i, 0, d - 1}];
  l = Length[Position[p, 4^d]];
  Take[Reverse[Union[{Extract[im, #], #} & /@ Position[p, 4^d]]],
   If[n < l, n, l]]]

With the Sign function we first find maxima in the columns, then in the rows, and we
multiply (boolean 'and') the results to find the image maxima. The Sign function is -1, 0 or 1
for a negative, zero or positive value of its argument, so at a strict maximum each of the four
factors equals 2, and we look for the product value 2^4 = 16 (4^d in d dimensions). We then
Extract from im the intensity values at every Position where p is 4^d, Union the result
(return sorted distinct entries), Reverse this list so the largest intensity comes
first, and Take the first n values, returning the intensity value and Position of each maximum.
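For readers who want to experiment outside Mathematica, the neighbor-comparison idea behind nMaxima can be sketched in Python/NumPy. This is a parallel sketch, not the book's code: np.roll plays the role of RotateLeft/RotateRight, so borders are cyclic here as well, and the name n_maxima is an illustrative choice.

```python
import numpy as np

def n_maxima(im, n):
    """Return the n largest strict local maxima of an array of arbitrary
    dimension as (value, index) pairs. A pixel is a strict maximum when
    it exceeds both neighbors along every axis (cyclic borders)."""
    p = np.ones(im.shape, dtype=int)
    for axis in range(im.ndim):
        left = np.roll(im, 1, axis=axis)    # neighbor on one side
        right = np.roll(im, -1, axis=axis)  # neighbor on the other side
        # each factor is 2 only where the pixel beats that neighbor
        p *= (np.sign(im - left) + 1).astype(int)
        p *= (np.sign(im - right) + 1).astype(int)
    idx = np.argwhere(p == 4 ** im.ndim)    # all 2*ndim factors equal 2
    vals = [im[tuple(i)] for i in idx]
    order = np.argsort(vals)[::-1][:n]
    return [(vals[k], tuple(idx[k])) for k in order]
```

For a 2D array this looks for the product value 4² = 16, exactly as the Mathematica version does.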

The two blobs with the largest feature detector responses are extracted from an example image at
two different scales (σ = 7 and σ = 11 pixels), see figure 13.1. The blobs are illustrated by
red dots and circles (at the detected blob location, radius proportional to the blob strength):

im = 1000 Import["squareblob.gif"][[1, 1]];


Block[{$DisplayFunction = Identity}, p1 = ListDensityPlot[-im];
 (m = nMaxima[imb = blobness[im, #], 2]; Print["σ=", #, ", pos =",
      Part[#, 2] & /@ m, "\n resp = ", Part[#, 1] & /@ m];) & /@ {7, 11};
 p2 = ListDensityPlot[blobness[im, #], Epilog ->
      {PointSize[.03], Hue[1], Point[Reverse[Part[#, 2]]] & /@ m,
       Circle[Reverse[Part[#, 2]], Part[#, 1]] & /@ m}] & /@ {7, 11}];
Show[GraphicsArray[Prepend[p2, p1]], ImageSize -> 300];

σ=7, pos = {{50, 15}, {20, 42}}
 resp = {11.6499, 10.6793}

σ=11, pos = {{23, 42}, {50, 15}}
 resp = {5.91633, 2.03575}

Figure 13.1 Detection of the two best blobs from a test image at two scales. At low scale
(σ = 7 px) the square is the most blob-like feature - the circle is too large to be properly
detected at this scale. At high scale (σ = 11 px) the circle is detected as the most blob-
like shape, but the response is far less than the response for the square at low scale since
the responses generally decrease with increasing scale. Note that the center coordinates
displayed are (x, y) where the origin is the lower left corner. The test image is 64x64
pixels with values 0 and 1.

At both scales, the two largest blobs are detected approximately at the centers of the square
and the blob. The response from the blob detector is largest for the square at the low scale.
This is not surprising since the scale simply is closer to the size of the square than the blob.
At the high scale the response is significantly higher for the blob than the square. However,
the response for the blob at high scale is still not higher than the response for the square at
low scale. Therefore, if we were to choose the single most blob-like feature in the image, we
would select the square.

The problem is a well-known property of the gaussian derivatives. In general, the amplitude
of a spatial gaussian derivative decreases with increasing scale [Lindeberg1994a].
Therefore, the response from the circle is weaker simply because it is best detected at a
higher scale, since it is larger than the square.
13 Deep structure I. watershed segmentation

13.3 Normalized feature detection


The classical solution to these problems is to express the differential operators in terms of
normalized derivatives [Lindeberg1993b, Lindeberg1994a]. The basic idea, as we recall from
chapter 3, is to measure the spatial coordinates relative to the scale - instead of using length x
we use the dimensionless coordinate ξ = x/σ, which is normalized with respect to scale
[Florack1994b]. The normalized derivative operator for the function f with respect to the
spatial coordinate x is then

∂f/∂ξ = ∂f/∂(x/σ) = σ ∂f/∂x.

The standard feature detectors consisting of combinations of differential operators can now
be reformulated as normalized detectors. Some of the most common detectors are presented
in [Lindeberg1996b].
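The effect of this normalization is easy to check numerically. The Python sketch below (a stand-in for the book's gD operator, using a plainly sampled first-order Gaussian derivative kernel; names are choices of the sketch) convolves a unit step with the kernel at two scales: the raw edge response decays roughly as 1/σ, while the σ-normalized response stays close to the scale-invariant value 1/√(2π):

```python
import numpy as np

def gauss_deriv_response(signal, sigma):
    """Convolve a 1D signal with a sampled first-order Gaussian
    derivative kernel (truncated at 4 sigma); returns the response
    at every position."""
    x = np.arange(-4 * sigma, 4 * sigma + 1)
    gauss = np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    kernel = -x / sigma**2 * gauss          # d/dx of the Gaussian
    return np.convolve(signal, kernel, mode='same')

step = np.concatenate([np.zeros(60), np.ones(60)])
raw = [np.max(np.abs(gauss_deriv_response(step, s))) for s in (2.0, 4.0)]
normalized = [s * r for s, r in zip((2.0, 4.0), raw)]
# raw response roughly halves from sigma = 2 to sigma = 4, while the
# sigma-normalized responses are both near 1/sqrt(2 pi) ~ 0.3989
```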

Analogously, since the Laplacian includes second order derivatives, the normalized blob
detector should instead be σ² ∇²L = σ²(Lxx + Lyy) = σ²(Lww + Lvv), leading to the
normalized feature strength measure function:

nblobness[im_, σ_] := Abs[σ^2 (gD[im, 2, 0, σ] + gD[im, 0, 2, σ])];

Reassuringly, the normalized detector points out the blob as being the most blob-like feature
in the image:

Block[{$DisplayFunction = Identity}, p1 = ListDensityPlot[-im];
 (m = nMaxima[imb = nblobness[im, #], 2]; Print["σ=", #, ", pos =",
      Part[#, 2] & /@ m, "\n resp = ", Part[#, 1] & /@ m];) & /@ {7, 11};
 p2 = ListDensityPlot[nblobness[im, #], Epilog -> {PointSize[.03],
       Hue[1], Point[Reverse[Part[#, 2]]] & /@ m, Circle[
         Reverse[Part[#, 2]], Part[#, 1]/100] & /@ m}] & /@ {7, 11}];
σ=7, pos = {{50, 15}, {20, 42}}
 resp = {570.843, 523.284}

σ=11, pos = {{23, 42}, {50, 15}}
 resp = {715.876, 246.325}

Show[GraphicsArray[Prepend[p2, pl]], ImageSize -> 300];

Figure 13.2 Detection of the two best blobs from a test image at two scales with
normalized blob detector. The blob gets the highest response (at high scale). The square
gets a lower response (at low scale). Compare with figure 13.1.

13.4 Automatic scale selection


The example above depends on choosing appropriate scales for the illustrations. However,
since the normalized feature detector allows comparison of detector responses across scale,
this can be done automatically. For a given scale, features are detected in an image as the
spatial local maxima of the feature detector. When the same features are detected for all levels
in a scale-space, the features are detected on a number of consecutive scales. The scale at
which a feature is best observed is the scale where the normalized feature detector has the
strongest response. This means that the points must not only be local maxima for the feature
detector in the spatial directions but also in the scale direction.

Since the function nMaxima was previously defined for an arbitrary dimension we can use
that for detection of maxima in scale-space as well. Superimposed on the example image are
the detected blobs illustrated with a circle with radius proportional to the detection scale:

im = Import["blobs.gif"][[1, 1]];
blobSS = Table[nblobness[im, Exp[τ]], {τ, 1.7, 2.7, .1}];
maxs = nMaxima[blobSS, 4]; blobSizeFactor = 1.5;
Table[Print["Blob strength: ", maxs[[i, 1]],
   ", (x,y): ", Reverse[maxs[[i, 2, {2, 3}]]]], {i, 4}];

Blob strength: 0.72431, (x,y): {57, 107}

Blob strength: 0.714045, (x,y): {94, 31}

Blob strength: 0.689219, (x,y): {27, 56}

Blob strength: 0.439206, (x,y): {97, 101}

ListDensityPlot[im, Epilog -> {Hue[1], Thickness[.01], Dashing[{0.03, 0.02}],
   Table[Circle[{maxs[[i, 2, 3]], maxs[[i, 2, 2]]}, blobSizeFactor
       Exp[(maxs[[i, 2, 1]] - 1) .1 + 1.7]], {i, 4}]}, ImageSize -> 130];

Figure 13.3 Detection of the four most blob-like features in a test image with different
shapes. The blobs are superimposed as circles with a radius proportional to the scale the
blobs were detected at. The output of the feature response reveals that the circles have
somewhat higher feature strength (between 0.71 and 0.72) than the square (at 0.68). The
scale-space is generated in 11 exponential levels from scale σ = e^1.7 to σ = e^2.7. The
test image is 128 by 128 with values 0 and 1.

Inspection of the output reveals that three blobs have been detected with similar
normalized responses (between 0.69 and 0.72) and one with a significantly lower response
(0.43), and that the three blobs with high responses actually correspond to the symmetric
blob-like objects in the image. The low-response object is the ellipse.

It is also apparent that the detection scales and the sizes of the blobs are approximately
proportional. With a model of an ideal blob, this can be formalized through analysis of the
response from the blob detector. This is done for a gaussian bell-shaped model of a blob in
[Lindeberg1994a] revealing that the width of the blob and the detection scale are in fact
proportional.
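This proportionality can be checked with a few lines outside the book's code. Blurring a Gaussian blob of width t0 with an aperture σ yields a Gaussian of variance t0² + σ², whose Laplacian at the center is -2/(t0² + σ²) times its peak value; the σ²-normalized response is therefore proportional to σ²/(t0² + σ²)². The Python sketch below (illustrative names) locates its maximum over scale and recovers σ = t0:

```python
import numpy as np

def normalized_laplacian_response(sigma, t0):
    """Magnitude of the sigma^2-normalized Laplacian at the center of a
    Gaussian blob of width t0, up to a constant factor."""
    return sigma**2 / (t0**2 + sigma**2) ** 2

t0 = 3.0                                  # blob width (standard deviation)
scales = np.linspace(0.5, 10.0, 2000)     # candidate detection scales
best = scales[np.argmax(normalized_laplacian_response(scales, t0))]
# up to the grid resolution, best equals t0: detection scale = blob width
```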

13.4.1 γ-Normalized scale selection

Often, the normalized derivative operator has an added parameter γ which allows the
operator to be tuned to the specific feature detectors. Instead of writing the normalized n-th
order derivative as L_{i1..in}-norm = σ^n L_{i1..in} (an often used notation, slightly different from the
one used above), the normalized derivative has the extra parameter γ:

L_{i1..in}-norm = σ^{nγ} L_{i1..in}

Intuitively, Lindeberg determines the free parameter γ based on an analysis of the dimension
of the specific feature. For instance, for a blob γ is twice the γ for an edge, since the edge is
essentially only a 1-dimensional feature where the blob is 2-dimensional. For a more
thorough treatment of γ-normalized derivatives see [Lindeberg1994a].

Another approach for determining the γ parameter is to use the local fractal dimension
(defined in terms of the Hausdorff dimension) as a descriptor of the local complexity and
thereby estimate the γ-parameter [Pedersen2000].

13.4.2 Is this really deep structure?

Conceptually, we follow the singularity points for the feature detector through scale-space
and locate the scale where the normalized feature strength is maximal. This is the appropriate
scale for detecting the feature and for extracting information about the feature. However, we
have more information: the nice continuous behaviour across scales allows us to locate the
optimal scale explicitly as the singularities for a local differential operator output in scale-
space.

This approach allows us, e.g. for the Laplacian operator, to select the most blob-like features
in the image and establish their approximate size. In the examples we have not given any
explicit method for determining whether or not to consider these features for actual blobs -
we have simply selected the n best. For a specific application it would be appropriate to
determine a threshold for the feature strength of the blob detector - or to establish a more
elaborate scheme.

In a sense, the approach ignores the deep structure: we don't explicitly follow the singularity
points through scale-space.

However, when establishing the location of the blob, we rely heavily on the deep structure.
The implicit assumption is that the singularity points form strings that are approximately
vertical in scale-space, meaning that the location of the blob is equivalent with its location at
the detection scale.

Blobs are "nice" - they move relatively little as the image is blurred. This makes the
assumption reasonable for practical applications. Other features are not so stable across scale.
For these we would have to track the singularity string down in scale from the detection scale
in order to establish a more precise localization of the feature. The following sections
investigate this approach.

Task 13.1 For a Gaussian bell-shaped blob the width of the blob and the
detection scale obtained with the normalized blob detector are proportional
[Lindeberg1994a]. Analyze whether this is the case for a circular step-edge blob
like the ones in the illustrations. Determine whether the proportionality factor
above is appropriately assigned (blobSizeFactor = 1.5).

13.5 Edge focusing


When an image is blurred through convolution with Gaussians of increasing aperture, the
noise in the image is gradually removed. However, the salient features are blurred as well.
Obviously, small scale structures vanish at high scale, but also the structure that remains is
affected by the blurring. When the aperture is increased the local features are influenced by a
large neighborhood in the image.

This process changes the appearance of objects. As we saw with the rectangles in the
previous section, the shapes are simplified towards rounded shapes. The blurring also
dislocates the objects and their edges.

To a large extent we avoided dislocation in the previous section because the objects in the
test images had nice spacing in between - thereby the "interaction" between the objects
during blurring was insignificant. For more complicated images this effect will be more
pronounced.

This section investigates how to take advantage of the deep structure in order to link the
simplified large scale structure to the fine scale origins. Specifically, we illustrate how the
edges can be tracked down through scale in order to establish the precise location of the edge.

13.5.1 Simplification followed by focusing


In chapter 1 we saw a first example that a structure, like an edge, could emerge from the
noise when we increased the scale of the operator. Both the scale of the structure and the
scale of the operator had to be larger than the fine grain structure of the noise. We repeat the
experiment for the detection of the gradient magnitude of a circle in additive uncorrelated
(white) noise.

noise = Table[2550 Random[], {256}, {256}];
noisyDisk = Import["blackdisk256.gif"][[1, 1]] + noise;
DisplayTogetherArray[Prepend[ListDensityPlot[
      Sqrt[gD[noisyDisk, 1, 0, E^#]^2 + gD[noisyDisk, 0, 1, E^#]^2]] & /@
    {0, 1, 2, 3}, ListDensityPlot[noisyDisk]], ImageSize -> 480];

Figure 13.4 Left: A circular disk embedded in noise. S/N = 0.1. Image intensity range
[0,255], noise intensity range [0,2550]. Right: Image gradient at σ = 1, 2.7, 7.4, and 20
pixels. Image resolution 256². Only at large scales does the contour of the disk emerge.

At scale σ = e² (≈ 7.4 pixels) we see the contour emerging from the noise. It is difficult to
find the locations of the edges at fine scale filtering.

It was suggested by Fredrik Bergholm [Bergholm1987] to use the large scale representation
of edges as a guide for the localization of edges at a finer scale. A gradient scale-space is
constructed and edges are located at a coarse scale where they are clearly visible, or can
easily be extracted with some thresholding (or e.g. a zero-crossing detection of the derivative
of this gradient image in the gradient direction, Lww). Edges at finer scales are then found by
searching a small neighborhood around the coarse-scale pixel - the scale-space is traversed
downwards until the finest scale is reached. The spatial location of the end of the trace is
considered the location of the edge that 'lived' until the coarse scale where the search started.
This procedure is named edge focusing.

13.5.2 Linking in 1D

The phenomenon is studied in somewhat more detail - for the sake of simplicity in 1D.

noisystep = Table[5.5 Random[] + 2 Tanh[2 (i - 0.7)] +
     Tanh[4 (i - 1.0)] + 1.4 Tanh[(i - 1.4)], {i, -2, 4, 0.011}];
ListPlot[noisystep, PlotRange -> All, PlotJoined -> True,
  ImageSize -> 230];

Figure 13.5 A noisy 1D discrete edge signal. Length 546 samples.



We define an edge as a point with maximal absolute incline. These are the zero-crossings of
the second order derivative. We are only interested in the location of the edge, so we don't
need to consider normalizing the operators involved.

Having generated a noisy 1D edge above, we inspect the sign of the second order derivative
in order to locate the edge. If there is a change in sign between the 'pixel' just left of each
'pixel' of the second derivative, we have an edge location. The graph of the sign-changes of a
signal as a function of scale is denoted the signature of the signal.

We first calculate a scale-space of the second derivative on an exponential range of scales,
and take the Sign. By mapping the 'difference with your neighbor' function
(RotateRight[#] - #) & on each scale level, we get the following result (the function
gDf1D is a version of the Gaussian derivative function gD implemented in the Fourier
domain for 1D signals):

scalespaceLxx = Table[gDf1D[noisystep, 2, E^τ], {τ, 0, 4.5, .018}];
signature = (RotateRight[#] - #) & /@ Sign[scalespaceLxx];
ListDensityPlot[signature, AspectRatio -> .5, ImageSize -> 500];

Figure 13.6 The signature function of the noisy edge. Exponential scale-space, scale
range σ = Exp[τ] for 0 ≤ τ ≤ 4.5, 251 scale levels. Edge focusing exploits the signature function
by tracking the most prominent features down to finer scales. The negative (black) edge
trace at the border of the signal is a consequence of the cyclic representation of the
signal in the implementation.

Positive upgoing edges are white, negative edges are black in the signature plot. We notice
the clear emergence of one edge, surviving much longer over scale than the other edges.

The notion of longevity can be viewed as a measure of importance for singularities
[Witkin83]. The semantic notions of prominence and conspicuity now get a clear meaning
in scale-space theory.

In a scale-space we see the emergence of the hierarchy of structures, in this example we see
edges emerging. A second thing to notice is the arched structure of the smaller edges over
scale. Positive and negative edges come together and annihilate.

These arches show interesting behaviour: sometimes three edges come close together, then
two of them annihilate, and one edge continues. We see here a first example of the behaviour
of singularity points over scale, in this example the behaviour of extrema for the gradient. In
the next sections we will study this behaviour in much more detail: this is the analysis of the
behaviour of structures over scale, the deep structure.

edgefocus[signature_, startlevel_, dir_] := Module[{a, b, c, out},
  out = 0. signature; a = Position[signature[[startlevel]], dir];
  Do[b = Position[signature[[i]], dir];
   c = Select[b, (Position[a, # - 1] ≠ {} ||
        Position[a, #] ≠ {} || Position[a, # + 1] ≠ {}) &];
   out[[i]] = ReplacePart[out[[i]], -1, c]; b = c; a = b,
   {i, startlevel - 1, 1, -1}]; out]

focused1 = edgefocus[signature, #, 2] & /@ {170, 280};
focused2 = edgefocus[signature, #, -2] & /@ {170, 280};
showarray[{focused1, focused2}, Frame -> True, FrameTicks -> False];

Figure 13.7 Edge focusing for the signature function of our noisy step edge. Top row: two
different start levels for the search downwards for negative edges. Bottom row: idem for
positive edges. In both cases a sharp edge position is reached at the bottom level, i.e. at
the original image. Compare with figure 13.6.

The edge focusing itself is implemented above for 1D signals. From a signature a copy is
made with zeros, and a start level is chosen from which to start the search downwards. At
the first level the positions are found (a) of the edge direction (dir = -2 or +2). From the
level below (b) those edges are selected (c) that have a position that is -1, 0 or +1 different
from the position in the level above. The entries in the copy of the signature are then replaced
at the found positions with -1's, so they plot as black lines.

Even though the linking scheme is quite simple and heuristic, it reveals the potential.

The simplified large scale representation of the image is used to single out the prominent
edge. The deep structure of the singularity strings is then used to link the detected edges
down to the fine scale where the edges can be precisely located.
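The linking heuristic itself is language-independent. Here is a compact Python sketch of the same coarse-to-fine scheme (the list-of-positions input format and names are assumptions of this sketch): given, per scale level, the positions of the chosen sign change, an edge at a finer level is kept only if it lies within one pixel of an edge kept at the level above.

```python
def edge_focus(edge_positions, start_level):
    """edge_positions[l] holds the edge positions at scale level l
    (level 0 = finest). Track from start_level down to level 0, keeping
    a fine-scale edge only if it is at most one pixel away from an edge
    kept at the level above."""
    kept = set(edge_positions[start_level])
    for level in range(start_level - 1, -1, -1):
        kept = {p for p in edge_positions[level]
                if {p - 1, p, p + 1} & kept}   # within +/-1 of the level above
    return sorted(kept)
```

Starting the descent at a coarse level where only the prominent edge survives discards the spurious fine-scale noise edges automatically.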

Task 13.2 In 2D the zero-crossings of the second derivative in the gradient


direction (Lww) are curves, and form non-intersecting surfaces in scale-space.
Develop a routine in Mathematica to compute and display such surfaces. Show
the internal structure by plotting cut-away views.

13.6 Follicle detection in 3D ultrasound


Edge focusing is particularly advantageous in noisy data, such as diagnostic ultrasound. We
discuss the application of detecting and analyzing follicles in 3D ultrasound
[TerHaarRomeny1999a]. Part of the implementation is due to Kalitzin.

DisplayTogetherArray[Show /@
   Import /@ {"3dus probe.jpg", "3dus-slice75.gif"}, ImageSize -> 300];

Figure 13.8 Left: 3D ultrasound probe (Kretz Ultrasound) sweeps a 2D sector across in a few
seconds, effectively acquiring a pyramidal data volume. From this a rectangular Cartesian
equidistantly sampled dataset is interpolated. Right: slice through the left female ovary,
showing the follicles as dark hypo-echoic (i.e. with low echo signal) spherical volumes. Size
of the image 212x164 pixels, approx. 4x3 cm.

Knowledge about the status of the female reproductive system is important for fertility
problems and age-related family planning. The volume of these fertility requests in our
emancipated society is steadily increasing.
The number of female egg cells (follicles) in both ovaries decreases roughly linearly
from 10^6 at birth to none at the start of the menopause. The detection, counting, shape
analysis and growth response to hormonal stimulation of follicles is an important diagnostic
procedure for ovarian aging.

This procedure is however labour-intensive and error prone, making a computer aided
analysis system a welcome addition. Intravaginal 3D ultrasound imaging of the follicles in
the ovary is the modality of choice.

zdim = ydim = xdim = 128; noise = Compile[{zdim, ydim, xdim},
   n = Table[Random[], {zdim}, {ydim}, {xdim}], {{n, _Real, 3}}];
follicle = Compile[{z0, y0, x0, r, zdim, ydim, xdim},
   f = Table[If[(x - x0)^2 + (y - y0)^2 + (z - z0)^2 < r^2, 0., 1.],
     {z, 1, zdim}, {y, 1, ydim}, {x, 1, xdim}],
   {{f, _Real, 3}, {x, _Real}, {y, _Real}, {z, _Real}}];
testset = follicle[60, 65, 50, 10, zdim, ydim, xdim] +
   follicle[35, 25, 55, 7, zdim, ydim, xdim] +
   follicle[35, 85, 95, 5, zdim, ydim, xdim] + noise[zdim, ydim, xdim];
DisplayTogetherArray[Table[ListDensityPlot[testset[[i]],
    PlotRange -> {2, 4}, Frame -> True, FrameTicks -> False,
    FrameStyle -> Red, PlotLabel -> "slice " <> ToString[i]],
   {i, 1, 90, 11}], ImageSize -> 500];

Figure 13.9 Some slices from the artificial 3D ultrasound noisy testset with 3 follicles.

testsetblurred = gDn[testset, {0, 0, 0}, {3, 3, 3}];

We use the function n M a x i m a (defined in section 13.2) to find the n largest maxima in the
testset. The minus sign is to find the minima.

nMaxima[im_, n_] := Module[{l, d = Depth[im] - 1},
  p = Times @@ Table[(Sign[im - Map[RotateLeft, im, {i}]] + 1)
      (Sign[im - Map[RotateRight, im, {i}]] + 1), {i, 0, d - 1}];
  l = Length[Position[p, 4^d]];
  Take[Reverse[Union[{Extract[im, #], #} & /@ Position[p, 4^d]]],
   If[n < l, n, l]]]

detected = nMaxima[-testsetblurred, 3]

{{-2.50496, {60, 65, 50}},
 {-2.66311, {35, 25, 55}}, {-2.93738, {35, 85, 95}}}

Indeed, the right minima positions are found. An alternative method (described in
[TerHaarRomeny1999a]) is the use of 3D winding numbers. Winding numbers are explained
in chapter 15.

We next check if they are surrounded by a sphere (an example of model-based
segmentation), by tracking linear rays of 30 pixels long, starting in the detected minimum
position, and over 7 polar (or colatitudinal, 0 ≤ θ ≤ π) and 5 azimuthal (or longitudinal,
0 ≤ φ < 2π) angles. We sample the 3D ultrasound intensities equidistantly along these rays
using cubic 3D polynomial interpolation (third order is the default interpolation order; linear
interpolation is acquired by adding the option InterpolationOrder -> 1):

interpolation = ListInterpolation[testset];
θstep = π/8; φstep = 2 π/5;
rays[z_, y_, x_] := Module[{φ, θ, r}, Table[N[
    interpolation[z + r Cos[φ] Cos[θ], y + r Sin[φ] Cos[θ], x + r Sin[θ]]],
   {θ, -Pi/2 + θstep, Pi/2 - θstep, θstep},
   {φ, 0, 2 Pi - φstep, φstep}, {r, 1, 30}]];
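The same ray casting can be sketched in plain Python/NumPy; for brevity this version uses trilinear instead of cubic interpolation, and the (z, y, x) ordering and function names are choices of the sketch, not the book's code:

```python
import numpy as np

def trilinear(vol, z, y, x):
    """Trilinear interpolation of a 3D volume at a real-valued point
    (a simple stand-in for cubic polynomial interpolation)."""
    z0, y0, x0 = int(np.floor(z)), int(np.floor(y)), int(np.floor(x))
    dz, dy, dx = z - z0, y - y0, x - x0
    v = 0.0
    for cz in (0, 1):
        for cy in (0, 1):
            for cx in (0, 1):
                w = ((dz if cz else 1 - dz) *
                     (dy if cy else 1 - dy) *
                     (dx if cx else 1 - dx))
                v += w * vol[z0 + cz, y0 + cy, x0 + cx]
    return v

def sample_ray(vol, center, theta, phi, length):
    """Sample the volume at unit steps along a ray from `center` in the
    spherical direction (theta, phi), mirroring the rays[] convention."""
    z0, y0, x0 = center
    return [trilinear(vol,
                      z0 + r * np.cos(phi) * np.cos(theta),
                      y0 + r * np.sin(phi) * np.cos(theta),
                      x0 + r * np.sin(theta))
            for r in range(1, length + 1)]
```

With θ = 0 and φ = 0 the ray runs along the first axis, so sampling a volume whose voxel value equals its z index from center (5, 5, 5) returns 6, 7, 8, and so on.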

The tracking rays are first visualized, to be sure we have the proper sampling in 3D. Lines of
30 pixels long are drawn in the 7 polar and the 5 azimuthal directions for each minimum
found, the end is indicated with a blue dot:

star[z_, y_, x_] := Module[{φ, θ, r},
  Graphics3D[Table[{Wheat, Line[{{z, y, x}, {z + r Cos[φ] Cos[θ],
        y + r Sin[φ] Cos[θ], x + r Sin[θ]}}], Blue, AbsolutePointSize[3],
     Point[{z + r Cos[φ] Cos[θ], y + r Sin[φ] Cos[θ], x + r Sin[θ]}]},
    {θ, -Pi/2 + θstep, Pi/2 - θstep, θstep},
    {φ, 0, 2 Pi - φstep, φstep}, {r, 30, 30}], AspectRatio -> 1]];
Show[Apply[star, Last /@ detected, 2],
 PlotRange -> {{1, xdim}, {1, ydim}, {1, zdim}}, ImageSize -> 270];

Figure 13.10 From each detected minimum rays are cast, along which the 3D US intensity is
sampled. The blue dots mark the rays' endpoints.

Let us investigate the interpolated sampled intensity profiles along the radiating rays from
the minimum point {60,65,50}:

profiles = rays[60, 65, 50];
ListDensityPlot[#, PlotRange -> {2, 4},
   Frame -> True, ImageSize -> 200] & /@ profiles;

Figure 13.11 Sets of 5 3D ultrasound intensity profiles for each of the 7 polar directions. The
origin, i.e. the start point of the ray in the detected minimum, is to the left. Ray length is 30
pixels.

For each ray the position of the follicle boundary is found by edge focusing (function defined
in section 13.5.2) for the largest edge along the ray:

edgefocus[signature_, startlevel_, dir_] := Module[{a, b, c, out},
  out = 0. signature; a = Position[signature[[startlevel]], dir];
  Do[b = Position[signature[[i]], dir];
   c = Select[b, (Position[a, # - 1] ≠ {} ||
        Position[a, #] ≠ {} || Position[a, # + 1] ≠ {}) &];
   out[[i]] = ReplacePart[out[[i]], -1, c]; b = c; a = b,
   {i, startlevel - 1, 1, -1}]; out]

findedgelocation[track_] := Module[{scalespaceLxx},
  scalespaceLxx = Table[gDf1D[track, 2, E^τ], {τ, 0, 2, .06}];
  signature = (RotateRight[#] - #) & /@ Sign[scalespaceLxx];
  Extract[Position[First[edgefocus[signature, 15, 2]], -1], {1, 1}]
  ];

This finds them all:

outr = Map[findedgelocation, profiles, {2}]

{{10, 11, 11, 10, 10}, {10, 10, 10, 10, 9},
 {10, 10, 11, 11, 10}, {10, 10, 10, 10, 10},
 {10, 10, 10, 10, 10}, {11, 10, 10, 9, 10}, {10, 10, 3, 11, 10}}

The proper 3D coordinates of the edge on the ray are found by converting the polar
coordinates to the Cartesian coordinates. We put the center of the follicle in the position of
the found minimum, and check the result visually.

Clear[f]; f[r_, {nθ_, nφ_}] := N[{z + r Cos[nφ φstep] Cos[-π/2 + nθ θstep],
     y + r Sin[nφ φstep] Cos[-π/2 + nθ θstep], x + r Sin[-π/2 + nθ θstep]}];
{z, y, x} = {60, 65, 50}; positions = MapIndexed[f, outr, {2}];
Show[Graphics3D[{Wheat, Map[Line[{{60, 65, 50}, #}] &,
    Flatten[positions, 1]], Red, AbsolutePointSize[5],
   Map[Point, positions, {2}]}], ImageSize -> 215];

Figure 13.12 Follicle boundary points (red points) as detected by the edge focusing algorithm
along each ray.

13.6.1 Fitting spherical harmonics to 3D points

A convenient representation for 3D point clouds on a detected surface are the spherical
harmonics. These orthogonal polynomials Y_l^m(θ, φ) are the angular portion of the solution in
spherical coordinates to Laplace's equation ∇²ψ = Δψ = 0 (see
mathworld.wolfram.com/SphericalHarmonic.html).

The set of spherical harmonics up to second order is given by:

fitset =
 Flatten[Table[SphericalHarmonicY[l, m, θ, φ], {l, 0, 2}, {m, -l, l, 1}]]

{1/(2 Sqrt[π]), 1/2 E^(-I φ) Sqrt[3/(2 π)] Sin[θ], 1/2 Sqrt[3/π] Cos[θ],
 -(1/2) E^(I φ) Sqrt[3/(2 π)] Sin[θ], 1/4 E^(-2 I φ) Sqrt[15/(2 π)] Sin[θ]^2,
 1/2 E^(-I φ) Sqrt[15/(2 π)] Cos[θ] Sin[θ], 1/4 Sqrt[5/π] (-1 + 3 Cos[θ]^2),
 -(1/2) E^(I φ) Sqrt[15/(2 π)] Cos[θ] Sin[θ], 1/4 E^(2 I φ) Sqrt[15/(2 π)] Sin[θ]^2}

Mathematica's function Fit does a least squares approximation with any set of functions. We
plug in the points and the set of fit functions:

points = Flatten[positions, 1];
rθφfun[θ_, φ_] = Chop[Fit[points, fitset, {θ, φ}] // ExpToTrig, 10^-8];
rθφfun[θ, φ]

58.8381 + 1.96853 Cos[θ] - 12.0257 Cos[θ]^2 +
 4.31706 Cos[φ] Sin[θ] - 4.34484 Cos[θ] Cos[φ] Sin[θ] +
 6.409 Cos[2 φ] Sin[θ]^2 - 1.69899 Sin[θ] Sin[φ] +
 4.29894 Cos[θ] Sin[θ] Sin[φ] + 8.01166 Sin[θ]^2 Sin[2 φ]
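What Fit does here is ordinary linear least squares. A Python sketch with np.linalg.lstsq shows the mechanics; the basis below is a hypothetical real-valued angular basis up to first order only, a stand-in for the complex SphericalHarmonicY set, and all names are choices of the sketch:

```python
import numpy as np

def fit_basis(points, basis):
    """Least-squares fit of r(theta, phi) = sum_k c_k f_k(theta, phi)
    to a list of (theta, phi, r) samples; returns the coefficients c_k."""
    thetas = np.array([pt[0] for pt in points])
    phis = np.array([pt[1] for pt in points])
    rs = np.array([pt[2] for pt in points])
    A = np.column_stack([f(thetas, phis) for f in basis])  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, rs, rcond=None)
    return coeffs

# hypothetical real angular basis (constant plus first order terms)
basis = [lambda t, p: np.ones_like(t),
         lambda t, p: np.cos(t),
         lambda t, p: np.sin(t) * np.cos(p),
         lambda t, p: np.sin(t) * np.sin(p)]
```

Fitting samples drawn from r = 10 + 2 cos θ recovers the coefficients {10, 2, 0, 0}.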

This shows the detected follicle as a 3D volume surface:

ParametricPlot3D[rθφfun[θ, φ] {Cos[θ], Sin[θ] Cos[φ], Sin[θ] Sin[φ]},
 {θ, 0, π}, {φ, 0, 2 π}, AspectRatio -> Automatic, ImageSize -> 150];

Figure 13.13 Spherical harmonic (second order) surface parametrization of one of the follicles
in the noisy 3D US test dataset.

Because we now have an analytical expression for the volume, we can easily let
Mathematica calculate the volume:

Clear[θ, φ]; volume = Integrate[rθφfun[θ, φ], {θ, 0, π}, {φ, 0, 2 π}]

1042.73

Task 13.3 Fit the spherical harmonics to 4th order.

Task 13.4 A high order of spherical harmonics functions as fit functions does not give us
the correct result, due to overfitting. What do we mean by this?

13.7 Multi-scale segmentation


The trade-off between simplification and detail is classical. Gaussian blurring allows the
significant features to emerge from the noise - the price is general dislocation and blurring of
the objects of interest. As the previous sections indicate, there is no need for such a black and
white perception of blurring. The features can be detected at the appropriate scale where they
are best distinguished from their surroundings - and then linking down through scale allows a
conceptual deblurring, so that the fine scale shape and features can be inspected.

This section presents a segmentation method that takes advantage of the deep structure in
order to facilitate this simplification followed by extraction of the fine scale shape. The
method investigates the deep structure of watershed regions. The result is a partitioning of
the image at all scales simultaneously. The regions from this multi-scale partitioning can
then be used as building blocks in an interactive segmentation application.

The multi-scale watershed segmentation method presented here is due to Ole Fogh Olsen
[Olsen1997]. A similar approach has been presented in [Gauch1999]. In the following, the
method is presented step by step - at the end the whole method is collected into a single
Mathematica function.

13.7.1 Dissimilarity measure in scale-space

The borders between the regions should be located where there is a large contrast in the
image. The measure of the contrast - or the dissimilarity measure - can be defined according
to the specific application. For gray-scale images, a natural and simple definition of the
dissimilarity measure is the gradient magnitude squared. Below, an example image and the
corresponding dissimilarity image are displayed at a few scales.

dissimilarity[im_, σ_] := gD[im, 1, 0, σ]^2 + gD[im, 0, 1, σ]^2;

noisyShapes =
  Import["blobs.gif"][[1, 1]] + Table[Random[], {128}, {128}];
disScaleSpace = Table[dissimilarity[noisyShapes, E^τ], {τ, 1, 2.2, .3}];

Show[GraphicsArray[Flatten[{
    ListDensityPlot[noisyShapes, DisplayFunction -> Identity],
    (ListDensityPlot[disScaleSpace[[#]], DisplayFunction ->
         Identity]) & /@ {1, 2, 3, 4}}]], ImageSize -> 400];

Figure 13.14 Left: The 128 by 128 test image displaying some simple shapes with S/N
ratio 1. Right: Gradient squared dissimilarity images for four scale levels:
σ = 1.6, 2.6, 4.1, and 6.6 pixels.

13.7.2 Watershed segmentation

Imagine rain pouring down a hilly landscape. After hitting the ground, the individual drops
run downhill and gather in pools. Every time a drop hits a certain spot the drop will run into
the same pool. This implicitly partitions the landscape into regions of support for each pool.

A part of the landscape that leads water to a specific pool belongs to the catchment basin for
this pool. The borders between the catchment basins are the watersheds. These geographic
concepts were introduced in mathematics in [Cayley1859, Maxwell1870]. Aside: according
to Encyclopædia Britannica, the term watershed is actually a drainage basin. Furthermore,
"the term has also been used synonymously with drainage divide, but this use is
discouraged". However, the computer vision community is traditionally not discouraged.

We use this principle to partition the dissimilarity image into regions. In order to calculate
the catchment basins we first determine the direction of steepest descent for each pixel in a
dissimilarity image. This is done by checking the difference with each of the four neighbors.

The functions operate on images that have been Flatten'ed, so that the neighbors are
located at offsets 1, -1, xDim, and -xDim.

eheekDireetion[dislm_, bestDireetion_, offset_] := Module[


{disNeighbor, eheekNeighbor,
neighborDif, bestDif, bestOffset, disVal, disNeighborVal},
disNeighbor = RotateLeft[disIm, offset];
cheekNeighber[disVal_, disNeighborVal_, tbestOffset_, bestDif_}] := (
neighborD~Lf = disNeighborVal - disval;
If[neighberDif < bestDif, {offset, neighborDif}, {bestOffset, bestDif}]);
Thread[eheckNeighbor[disIm, disNeighbor, bestDirection]]];

bestDirection[disIm_, xDim_, yDim_] := Module[
  {bestDir},
  bestDir = Table[{0, 0}, {xDim yDim}];
  bestDir = checkDirection[disIm, bestDir, 1];
  bestDir = checkDirection[disIm, bestDir, -1];
  bestDir = checkDirection[disIm, bestDir, xDim];
  bestDir = checkDirection[disIm, bestDir, -xDim];
  bestDir];

From a given pixel, the path given by the local direction of steepest descent is to be
followed until a local minimum is reached. In order to prepare for this we assign a unique
label to each local minimum. The local minima are the points with no direction of steepest
descent. The result is a "label" image with numbers at the local minima and zeros elsewhere.

labelMinima[bestDir_, xyDim_] := Module[
  {labels, minima, basinLabels},
  labels = Table[0, {xyDim}];
  minima = Flatten[Position[bestDir, {0, 0}]];
  basinLabels = Range[1, Length[minima]];
  labels[[minima]] = basinLabels;
  labels];

The remaining pixels are then to be labelled with a region number. From a given pixel the
path defined by the directions of steepest descent is followed until a local minimum is
reached. The descent path is kept in a stack. When a minimum is reached, every pixel in the
path stack can be labelled with the region number of the minimum (stop reading until this is
trivial). Furthermore, it is not necessary to continue the path to a minimum if a pixel that has
already been labelled is encountered on the way. The label of this pixel is equivalent to the
label of the minimum that would eventually be reached.

descentToMinima[bestDir_, minLabels_, xDim_, yDim_] := Module[
  {xyDim, labels, j, stack, stackCount, pixel, basinLabel, i},
  xyDim = xDim yDim; labels = minLabels;
  For[j = 1, j <= xyDim, j++, If[labels[[j]] == 0,
    stack = Table[0, {2 Max[xDim, yDim]}]; stackCount = 1; pixel = j;
    While[labels[[pixel]] == 0, stack[[stackCount]] = pixel;
      pixel += bestDir[[pixel, 1]]; If[pixel < 1, pixel += xyDim];
      If[pixel > xyDim, pixel -= xyDim];
      stackCount++]; basinLabel = labels[[pixel]];
    For[i = 1, i < stackCount,
      i++, labels[[stack[[i]]]] = basinLabel]]];
  labels];

The catchment basins can now be constructed through the use of the functions above in the
following manner. Since the images are Flatten'ed in the calculations, the final image
with basin labels is Partition'ed.

makeBasins[disIm_] := Module[
  {xDim, yDim, bestDir, labels},
  {xDim, yDim} = Dimensions[disIm];
  bestDir = bestDirection[Flatten[disIm], xDim, yDim];
  labels = labelMinima[bestDir, xDim yDim];
  labels = descentToMinima[bestDir, labels, xDim, yDim];
  Partition[labels, xDim]];

The borders between the regions are more appropriate for illustrations than the region labels.
The function below finds the pixels where the neighboring labels differ and marks
these as borders:

makeBorders[labelImage_] := Module[
  {left, down, border},
  left = labelImage - RotateLeft[labelImage];
  down = labelImage - Map[RotateLeft, labelImage];
  border[l_, d_] := If[l == 0 && d == 0, 0, 1];
  SetAttributes[border, Listable];
  border[left, down]];

The basic watershed segmentation functions are illustrated with the dissimilarity scale-space
of the example image.

basinScaleSpace = Table[makeBasins[disScaleSpace[[i]]], {i, 1, 4}];
Show[GraphicsArray[Flatten[{
    ListDensityPlot[noisyShapes, DisplayFunction -> Identity],
    (ListDensityPlot[makeBorders[basinScaleSpace[[#]]],
        DisplayFunction -> Identity]) & /@
      {1, 2, 3, 4}}]], ImageSize -> 400];

Figure 13.15 Left: The 128 by 128 test image. Right: Watershed catchment basins at four
scale levels. Each object from the original image can be captured as a single region from
the catchment basins. However, the blurring affects the shapes of the objects. Compare
with figure 13.17.

We can see from the example above that for each object in the original image, there is
indeed a corresponding catchment basin at some scale level. However, the blurring has
caused the captured shapes to be rounded as the edges are dislocated. In order to avoid this
effect we must "deblur" the shapes by linking down through scale.

13.7.3 Linking of regions

As scale is increased, the number of catchment basins decreases - the catchment basins
gradually merge into larger basins. Each catchment basin corresponds to a local minimum
of the dissimilarity measure. These minima form singularity strings like the ones shown in
the previous section on edge focusing. Therefore we could track these minima by a linking
process similar to the focusing in that section. As we saw in that section, this linking of
singularity points through scale-space is non-trivial. Therefore, we will here pursue another
approach where linking is based on regions instead of points.

The conceptually simple method is to link a region at a given scale to the region at the next
scale with the maximal spatial overlap.
As the blurring increases, the borders of the regions move slightly. However, the central part
of the region remains within the same area. The linking scheme is illustrated in figure 13.16.

The merging of the catchment basins defines a hierarchy of regions. Each region at a given
scale is linked to exactly one region at the next, higher scale level.

The linking is implemented by the function below. The parameters are the labelled basins for
two adjacent scale levels. The labels must be numbered consecutively from 1.

The function is somewhat complicated. First the two basin images are combined into a single
list sortedOverlap where each element is a number that signifies a "vote" - a fine scale
region x has one pixel that overlaps with a coarse scale region y. The number is constructed
such that all votes from one fine scale region are consecutive when the list is sorted. From
these votes the coarse scale region with most votes is extracted for each fine scale region.

Show[GraphicsArray[{Import["simple_link.jpg"],
    Import["merge_link.jpg"]}], ImageSize -> 400];

Figure 13.16 Linking of regions across scale. As scale is increased, the regions become
more rounded and the borders move slightly. Left: The shape of the region simplifies but
the main part of the area overlaps. Right: Two regions merge into one. Both regions link
to the combined region since this is the region with maximal overlap.

The extraction is done through a linear pass of the sortedOverlap list.

The number of votes for the current coarse scale region from the current fine scale region is
counted during the pass. Whenever there is a change in coarse scale, it is recorded whether
the previous region got more votes than the one with most votes so far.

When there is a change in fine scale, the coarse scale region with most votes is the desired
region to be linked to - this is recorded in linkMap. The pass over sortedOverlap is
done with a Fold using the function next for each element in the list. The
function returns a list of labels. For each fine scale label number, the list contains the coarse
scale label with maximal overlap.

The linking functions (next page) are very "imperative" in programming style and not very
Mathematica-like. Indeed, they can be written much more elegantly. The function below has the
exact same functionality and is much shorter. However, there is one caveat. The function
below has a runtime proportional to the number of regions - the functions above have runtimes
proportional to the size of the image.

This effectively means that the short version below is faster for linking at large scales where
there are few regions. Unfortunately, it is very much slower at low scales where there are many
regions. This is an example of how, in Mathematica, short and elegant is not
always preferable.
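The maximal-overlap rule itself is independent of the implementation language. As an illustration only (a NumPy sketch, not code from the book; the function name link_basins is ours), each fine-scale label can be linked by building a 2D histogram of (fine, coarse) label pairs and taking the row-wise argmax:

```python
import numpy as np

def link_basins(fine, coarse):
    """Link each fine-scale label to the coarse-scale label with
    which it shares the largest number of pixels."""
    fine = np.asarray(fine).ravel()
    coarse = np.asarray(coarse).ravel()
    n_fine, n_coarse = fine.max(), coarse.max()
    # 2D histogram of (fine, coarse) label pairs: each pixel casts one "vote"
    votes = np.zeros((n_fine + 1, n_coarse + 1), dtype=int)
    np.add.at(votes, (fine, coarse), 1)
    # per fine label (rows 1..n_fine), the winning coarse label is the row argmax
    return votes[1:].argmax(axis=1)

fine = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 3, 3, 3]])
coarse = np.array([[1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [2, 2, 2, 2]])
print(link_basins(fine, coarse))  # prints [1 1 2]: fine regions 1, 2 -> 1 and 3 -> 2
```

Like the long Mathematica version, this runs in time proportional to the size of the image plus the number of label pairs.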

The linking process results in a mapping between the region labels at a low scale and the region
labels at the next scale. This linking tree can be used for a "region focusing" process. A
given region at high scale can be substituted by the regions at a lower scale that link to it.
This is done recursively through the scale-space of catchment basins.

linkBasins[fineBasins_, coarseBasins_] := Module[
  {maxFine, maxCoarse, overlapIdx, sortedOverlap,
   linkMap, lastLimit, bestCount, bestCoarse, curCount, curCoarse},
  maxFine = Max[fineBasins]; maxCoarse = Max[coarseBasins];
  overlapIdx = (fineBasins - 1) maxCoarse + coarseBasins;
  sortedOverlap = Sort[Flatten[overlapIdx]];
  linkMap = Table[0, {maxFine}];
  next[{fineLimit_, bestCount_, bestCoarse_, curCount_, curCoarse_}, idx_] :=
   Module[
    {coarse, fine}, coarse = Mod[idx - 1, maxCoarse] + 1;
    If[idx > fineLimit,
     (* The series for one fine scale region is done - register previous *)
     fine = Quotient[fineLimit, maxCoarse];
     If[curCount > bestCount,
      linkMap[[fine]] = curCoarse, linkMap[[fine]] = bestCoarse];
     {fineLimit + maxCoarse, 0, 0, 1, coarse},
     (* Else:
        Next coarse scale region for same fine scale region encountered *)
     If[coarse == curCoarse,
      (* Coarse scale is the same so increment count *)
      {fineLimit, bestCount, bestCoarse, curCount + 1, coarse},
      (* Coarse scale new so register previous and start count from 1 *)
      If[curCount >= bestCount,
       {fineLimit, curCount, curCoarse, 1, coarse},
       {fineLimit, bestCount, bestCoarse, 1, coarse}]]]];
  {lastLimit, bestCount, bestCoarse, curCount, curCoarse} =
   Fold[next, {maxCoarse, 0, 0, 0, 0}, sortedOverlap];
  (* The last coarse scale region has not been registered -
     check if it is best *)
  If[curCount > bestCount, linkMap[[maxFine]] = curCoarse,
   linkMap[[maxFine]] = bestCoarse]; linkMap];

linkBasinsShort[fineBasins_, coarseBasins_] := Module[
  {linkVotes, uniquePairs, pairCount,
   bestLink, possibleLinks, possibleCounts, bestPosition},
  linkVotes = Thread[({#1, #2} &)[Flatten[fineBasins], Flatten[coarseBasins]]];
  uniquePairs = Union[linkVotes];
  pairCount = Map[Count[linkVotes, #] &, uniquePairs];
  bestLink[fine_] := (
    possibleLinks = Position[uniquePairs, _?(#[[1]] == fine &), 1, Heads -> False];
    possibleCounts = Extract[pairCount, possibleLinks];
    bestPosition = First[Position[possibleCounts, Max[possibleCounts]]];
    uniquePairs[[Extract[possibleLinks, bestPosition]]][[1, 2]]);
  Map[bestLink, Table[i, {i, Max[fineBasins]}]]]

Below, we generate a scale-space of region labels, where the catchment basins are linked
down to the lowest scale level - the localization scale. This is done from fine to coarse scale,
where the linking map provided by the previous function is used to map the catchment basins
into localized basins.

localizedScaleSpace = Table[0 noisyShapes, {i, 1, 4}];
localizedScaleSpace[[1]] = basinScaleSpace[[1]];
For[level = 2, level <= 4, level++,
  linkMap =
    linkBasins[basinScaleSpace[[level - 1]], basinScaleSpace[[level]]];
  localizedScaleSpace[[level]] =
    linkMap[[#]] & /@ localizedScaleSpace[[level - 1]]]
Show[GraphicsArray[Flatten[{
    ListDensityPlot[noisyShapes, DisplayFunction -> Identity],
    (ListDensityPlot[makeBorders[localizedScaleSpace[[#]]],
        DisplayFunction -> Identity]) & /@
      {1, 2, 3, 4}}]], ImageSize -> 500];

Figure 13.17 Left: The 128x128 test image. Right: The localized gradient watershed
regions (to be compared with figure 13.15). The middle three images show the
intermediate results. The larger structures are segmented well. In particular the rectangle
has been segmented with sharp corners.

The regions at the highest scale now clearly correspond to the objects in the original image.
Notice how regions are merged into larger regions as scale increases - but the locations of the
remaining borders stay fixed due to the linking.
The obvious improvement from figure 13.15 to figure 13.17 is due to the linking - or, in other
words, to the deep structure.

13.7.4 The multi-scale watershed segmentation

The bits and pieces that produce the localized catchment basins can be put together in a
module. The complete function takes an image and the desired list of scales.

The scale levels are a parameter since the dimensions of the image, the noise level, and the
size of the objects of interest all affect the appropriate localization scale, the number of scale
levels, and the scale sampling spacing. The function is named generateBuildingBlocks in
order to emphasize the nature of the output.

The method does not provide a complete segmentation, where the image has been partitioned
into background and a number of objects.

Instead the method provides building blocks of varying sizes that can be used for an
interactive segmentation process. The user then selects the appropriate regions at the proper
scales that allow construction of the desired objects.

generateBuildingBlocks[image_, scales_] := Module[
  {borderScaleSpace, fineBasins, coarseBasins,
   localizedBasins, linkMap, dissimImage, level},
  borderScaleSpace = Table[0 image, {Length[scales]}];
  coarseBasins = makeBasins[dissimilarity[image, scales[[1]]]];
  localizedBasins = coarseBasins;
  borderScaleSpace[[1]] = makeBorders[localizedBasins];
  For[level = 2, level <= Length[scales], level++,
   fineBasins = coarseBasins;
   dissimImage = dissimilarity[image, scales[[level]]];
   coarseBasins = makeBasins[dissimImage];
   linkMap = linkBasins[fineBasins, coarseBasins];
   localizedBasins = Map[Function[linkMap[[#]]], localizedBasins];
   borderScaleSpace[[level]] = makeBorders[localizedBasins]];
  borderScaleSpace];

The test image resulting in figure 13.17 is certainly not the most difficult segmentation task.
The objects are quite easy to distinguish from the background. The example below with an
MR brain scan is more realistic - and more challenging.

scales = Table[1.4^t, {t, 1, 8}]; mr = Import["mr128.gif"][[1, 1]];


borderScaleSpace = generateBuildingBlocks[mr, scales];

Show[GraphicsArray[
    {Flatten[{ListDensityPlot[mr, DisplayFunction -> Identity],
        ListDensityPlot[borderScaleSpace[[#]], DisplayFunction ->
              Identity] & /@ {2, 4, 6, 8}}]}], ImageSize -> 500];

Figure 13.18 Left: A 128x128 MR brain scan. Right: The localized gradient watershed
regions for selected scales from a scale-space with 8 levels from σ = 1.4 to σ = 14.8
pixels. At low scale the finer details can be selected. At coarse scale the larger
anatomical structures, such as the brain, are available.

As the example illustrates, a number of building blocks must be selected interactively in
order to form most anatomical objects in the figure. It is not really surprising that the method
does not provide a perfect partitioning of the brain anatomy. The multi-scale watershed
segmentation method is completely un-committed. Therefore it cannot be expected to
perform perfectly for a highly specialized task.

The strength of the method is the generality. The method itself is un-committed towards any
special task and can be used for n-dimensional data.

The task specific knowledge is provided by a user in an interactive program. This approach
will generally not be competitive with highly specialized approaches for specific
segmentation tasks. However, for many segmentation tasks where no specialized methods
exist, the multi-scale watershed segmentation method is relevant. The method has been
implemented with promising results for clinical use in medical imaging [Dam2000c].

The segmentation method can be optimized in a number of ways (see [Olsen1997] for
details). A possible way of specializing the method towards specific tasks is to design
suitable dissimilarity measures. Another approach is to specialize the underlying diffusion
scheme - this is briefly discussed in the next section.

13.8 Deep structure and nonlinear diffusion


Linear Gaussian diffusion has a number of appealing theoretical properties. However, as we
will see in chapter 21, non-linear diffusion is superior for several specific applications.
Among these are edge detection [Perona1990], edge enhancement, and fingerprint
enhancement [Weickert1998a]. Generally, linear Gaussian diffusion is inferior in
applications where elongated structures are to be preserved during the diffusion.

The deep structure is defined by the diffusion. Non-linear diffusion allows adaptation of the
local diffusion to the local geometry of the image. Thereby the deep structure can be
specialized towards specific applications. It is therefore natural to investigate whether some
of the non-linear diffusion schemes allow superior performance compared to linear diffusion
in deep structure applications.

This section will not give a comprehensive treatment of the effect of the diffusion scheme on
the deep structure. However, we do provide the following appetizer.

13.8.1 Non-linear diffusion for multi-scale watershed segmentation

Diffusion blurs across the borders of the catchment basins and ensures that the watersheds
are blurred away. The catchment basins merge into building blocks of increasing size as
scale increases. However, the linear diffusion scheme treats all image regions with the same
amount of blurring. This causes roundish shapes to be favoured as building blocks.

Non-linear diffusion schemes allow specification of which image features to "protect" during
the diffusion. For instance, the classical Perona-Malik diffusion scheme allows specification
of an edge threshold defined in terms of the gradient (see chapter 21). In areas where the
gradient is above this threshold, the blurring is diminished in order to preserve the edge. For
edge detection this allows simplification of the image while the desired edges are preserved.

In the multi-scale watershed segmentation method this is also interesting. The goal is to
provide the user with building blocks that capture the desired image objects in just a few
building blocks. For elongated objects this is problematic with linear diffusion. At low scale,
elongated objects will be split into a number of short pieces. In order for these pieces to
merge together into a single region, a scale proportional to the length of the object is
needed. However, at such a large scale, the blurring across the object is substantial. The
borders of the elongated structures are likely to be blurred away due to influence from the
surrounding structures.

Figure 13.19 The effect of applying non-linear diffusion for multi-scale watershed
segmentation. Top and bottom rows are the effects of linear and a non-linear scheme
called GAN, respectively. A brain MR scan is segmented using the multi-scale watershed
segmentation method. Left images: originals. The next images show the ground truth
segmentation of the white matter tissue. The colored images are the status of the
segmentation. Blue areas are correctly segmented, green areas are within the ground
truth but not segmented, and red areas are segmented but not part of the ground truth.
The images to the right illustrate the watershed segmentation building blocks that have
been used for the segmentation. On average, with the GAN scheme in the segmentation
method, the user is required to supply less than half the number of user interactions
compared to linear Gaussian diffusion. Visually, the building blocks resulting from GAN
diffusion correspond much better with the shape of the ground truth.
GAN (Generalized Anisotropic Non-linear diffusion) is a scheme that has several
important diffusion schemes as special cases. Here, the parameters make the scheme
similar to the Perona-Malik scheme. The illustration is from [Damon1995, Damon1997].

Non-linear diffusion minimizes the diffusion across the edges: edges of elongated structures
can survive to higher scales and thereby enable merging of regions inside the objects. For the
multi-scale watershed segmentation the performance has been evaluated for a number of
diffusion schemes (among these mean curvature motion and Weickert's anisotropic
nonlinear diffusion schemes). The results reveal that, compared to linear diffusion, a scheme
similar to the Perona-Malik scheme allows the desired objects to be captured using less than
half as many actions [Dam2000b]. This is illustrated in figure 13.19. Similar results have been
established for the Hyperstack segmentation method [Niessen1997d].
14. Deep structure II.
catastrophe theory
Erik Dam and Bart M. ter Haar Romeny

14.1 Catastrophes and singularities


The previous chapter illustrates a number of approaches that explore the deep structure.
However, there are a number of caveats. The edge focusing technique implicitly assumes that
the edges for the signal can be located at the adjacent lower scale level in a small
neighborhood around the location at the current scale. As mentioned, no formal scheme for
defining the size and shape of the neighborhood is presented. Furthermore, this method
ignores the problems encountered when edge points merge or split with increasing scale.
Analogously, the multi-scale watershed segmentation depends on the behaviour of the
dissimilarity measure singularities (the notion of dissimilarity is defined in chapter 13,
section 6.1). Even though the linking of the watershed catchment basins is quite robust due
to the matching of regions (opposed to tracking of points as in the edge focusing paradigm),
the linking in the presence of merges and splits of regions is not explicitly established above.
Without knowledge of how the dissimilarity singularities can behave in scale-space, we can
only hope that the method will work on other images than the ones used for the illustrations.
The finding of explicit schemes for linking in the neighborhood of changes in the singularity
structures requires explicit knowledge of the changes. A change in the singularity structure is
denoted a catastrophe. Catastrophe theory (or, with a broader term, singularity theory) is the
theory that analyses and describes these changes. Catastrophe theory allows prediction of
which changes in the singularity structure can be expected. Thereby the schemes that involve
the singularities can be designed to detect and handle these events as special cases.

Actually, this analysis was done for the multi-scale watershed segmentation method in
[Olsen1997].

The field of catastrophe theory is vast and rather complicated. The focus of this introduction
is to give a condensed treatment of the central definitions, with an intuitive understanding
of the effects we often observe in 'deep scale-space'.

14.2 Evolution of image singularities in scale-space


An important property of linear scale-space is the overall simplifying effect of increasing
scale.
In general, this implies that the number of appearances of a given image feature decreases as
scale increases. In particular, this is the qualitative behavior for the image singularities -
blurred versions of an image will in general contain fewer singularities than the original one.
As mentioned in chapter 2 section 2.8, this notion is formalized by Florack [Florack1992a],
which leads to a prediction on the number of singularities in n-dimensional signals/images.
Specifically, the number of singularities can be expected to decrease with a slope of -1 for
1D signals and -2 for 2D images (in general: -n for n-D signals; see chapter 1 for the
reasoning to derive this).
More precisely, when the scale levels are generated with the usual exponential sampling
$\sigma = \epsilon\, e^{\tau}$, the logarithm of the number of singularities decreases with these
slopes as a function of the scale parameter $\tau$.
This is illustrated for the "blobs" image from the scale selection section in the previous
chapter. For simplicity we count the number of maxima instead of the number of
singularities (then we can use the function countMaxima). Following the argument of Florack,
the relative decrease in the number of maxima is equivalent to the decrease in the number of
singularities.

<< FrontEndVision`FEV`;

countMaxima[im_] := Module[{p, d = Depth[im] - 1},
  p = Times @@ Table[(Sign[im - Map[RotateLeft, im, {i}]] + 1)
      (Sign[im - Map[RotateRight, im, {i}]] + 1), {i, 0, d - 1}] / 4^d;
  Count[Flatten[p], 1]];
noisyblobs =
  Import["blobs.gif"][[1, 1]] + 10 Table[Random[], {128}, {128}];

levels = 15; step = 0.15;

data = Table[{step t, countMaxima[nb[t] = gDf[noisyblobs, 0, 0, E^(step t)]]},
  {t, levels}];
DisplayTogetherArray[
  Append[ListDensityPlot /@ {nb[1], nb[8], nb[15]}, LogListPlot[data,
    PlotJoined -> True, AxesLabel -> {"t", "N"}]], ImageSize -> 400];
Print["Slope = ", Coefficient[Fit[Log[data], {1, t}, t], t]];

Figure 14.1 The evolution of the number of singularities for a set of noisy 2D blobs.
Images blurred with σ = e^0.15 = 1.16, σ = 3.32 and σ = 9.49 pixels. The observed slope
is close to the predicted value of -2.

The blob image from figure 13.3 is used with noise added (S/N ratio = 1/10). First we define
the scale levels used (such that $\sigma = \epsilon\, e^{\text{step}\; t}$, where the step ensures
sufficiently small scale steps, and $\epsilon = 1$). We display the lowest, middle and highest
levels of the selected range of 15 scales and plot the logarithm of the number of maxima as a
function of step t. The slope is calculated with a linear least squares Fit.
The overall effect of blurring signals and images is simplification. This is exemplified by the
decrease of the number of maxima above. However, this general notion reveals nothing about
how the singularities disappear. More specifically, it gives no insight into the local structure
of the signals and images at the specific point in scale-space where a singularity is annihilated.
Furthermore, we get no information about whether singularities are created as well. In order
to investigate these matters we need a bit of mathematics - introduced in the following
through some central concepts from Catastrophe Theory.

• Task 14.1. Check the expected decrease in the number of singularities for a 1D
signal as it was done for a 2D image above. Use the 1D noisystep signal.
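A sketch of this task outside Mathematica can look as follows (illustrative Python/NumPy, not the book's code; the noisy step, the kernel truncation, and the scale range are ad-hoc choices). It blurs a noisy 1D step at exponentially sampled scales, counts the local maxima, and fits the slope of log N versus τ, which should come out near the predicted -1:

```python
import numpy as np

def gauss_blur_1d(signal, sigma):
    """Gaussian blur by convolution with a truncated kernel."""
    r = int(4 * sigma + 1)
    x = np.arange(-r, r + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode='same')

def count_maxima(signal):
    """Count strict local maxima (larger than both neighbors)."""
    s = signal
    return int(np.sum((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])))

# a noisy 1D step: unit step plus unit-variance Gaussian noise
rng = np.random.default_rng(0)
noisystep = np.where(np.arange(4096) < 2048, 0.0, 1.0) + rng.normal(0.0, 1.0, 4096)

taus = 0.15 * np.arange(1, 16)        # tau = step * t, so sigma = e^tau
counts = [count_maxima(gauss_blur_1d(noisystep, np.exp(tau))) for tau in taus]
slope = np.polyfit(taus, np.log(counts), 1)[0]
print(round(slope, 2))                # close to the predicted -1
```

The boundary handling of mode='same' introduces small edge effects, which do not change the observed slope appreciably for a 4096-sample signal.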

14.3 Catastrophe theory basics


The field of catastrophe theory is quite extensive and complicated. This introduction focuses
on giving an intuitive understanding of the concepts most related to computer vision and
image processing. Therefore, the presentation is also somewhat less strict than possible. For
a comprehensive introduction see [Gilmore1981].

The singularities are central feature points of an image - or, more generally, of a function. The
singularities alone offer a good qualitative description of the structure of a function. When a
function undergoes an evolution it is therefore central to capture where the set of
singularities changes, in order to analyze the evolution. These points are denoted catastrophes
since this is where the qualitative structure changes.
In order to describe these events properly, a few definitions are needed. They are presented
quite briefly - the concepts are then illustrated by a number of examples.

14.3.1 Functions

For a smooth (= infinitely differentiable, indicated with $C^\infty$) function $f$, the parameters are
divided into $n$ state and $m$ control parameters: $f(x_1, \ldots, x_n, c_1, \ldots, c_m) \in C^\infty(\mathbb{R}^{n+m}, \mathbb{R})$. For
the intuitive understanding in the context of scale-space, think of the state parameters as
spatial coordinates, and think of a single control parameter, namely scale.

14.3.2 Characterization of points

For a smooth function $f(x_1, \ldots, x_n, c_1, \ldots, c_m) \in C^\infty(\mathbb{R}^{n+m}, \mathbb{R})$ a given point $p \in \mathbb{R}^{n+m}$ is
either:

Regular: $\exists\, l \in [1..n]$ such that $\frac{\partial f}{\partial x_l} \neq 0$

Morse singularity: $\frac{\partial f}{\partial x_i} = 0$ and $\mathrm{Det}\!\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right) \neq 0$

Catastrophe: $\frac{\partial f}{\partial x_i} = 0$ and $\mathrm{Det}\!\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right) = 0$

In the equations above, tensor notation is used for the spatial parameters with subscripts i and
j (but not for l).

Note that a singular point is only required to be a singularity with respect to the state (or
spatial) parameters. Singularities and catastrophes are found at locations where the gradient
is zero, so at horizontal locations in the image intensity landscape. A catastrophe differs from
a singularity in that the second order structure has a degenerate Hessian matrix, i.e. the
determinant of the Hessian matrix vanishes; the Hessian matrix is thus singular at
catastrophes.
Morse singularities are non-degenerate singularities in the state (or spatial) parameters.
When the determinant of the Hessian matrix for the state parameters is vanishing the
singularity becomes degenerate.
A non-degenerate singularity is stable. This means that a slight perturbation of the function
will not change the local qualitative structure of the function - there will still be a singularity
of the same type near the original, with a value close to the original.
Degenerate singularities are not stable - a slight perturbation can cause a change in the local
structure of the function. Such a perturbation could be caused by a slight change in the
control parameters. Specifically, in scale-space the singularity structure changes at
degenerate singularities when the scale is changed. These are the catastrophe points where
creations or annihilations of singularities occur.
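These definitions can be made concrete on the simplest example, the fold $f(x, c) = x^3 + c\, x$, with one state parameter $x$ and one control parameter $c$. The small Python sketch below (illustrative, not from the book; the function name classify is ours) labels a point by exactly the criteria in the table above:

```python
def classify(x, c, eps=1e-9):
    """Classify a point of f(x, c) = x**3 + c*x by its spatial derivatives."""
    fx = 3 * x**2 + c      # df/dx
    fxx = 6 * x            # d2f/dx2, the 1x1 "Hessian"
    if abs(fx) > eps:
        return "regular"
    if abs(fxx) > eps:
        return "Morse singularity"
    return "catastrophe"       # gradient zero AND degenerate Hessian

# For c = -3 the singularities x = +/-1 are Morse; at c = 0 they have
# merged at x = 0, where the Hessian degenerates: a catastrophe point.
print(classify(1.0, -3.0))   # Morse singularity
print(classify(0.5, -3.0))   # regular
print(classify(0.0, 0.0))    # catastrophe
```

Increasing the control parameter c from negative to zero thus annihilates the pair of Morse singularities at the catastrophe point, the behaviour observed in the scale-space experiments above.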

14.3.3 Structural equivalence

Two functions are locally structurally equivalent at a point if a diffeomorphism (a smooth
invertible function with smooth inverse) exists such that a change of the coordinate system
for one function with this diffeomorphism will make the functions equal in a neighborhood
around the point.
Two functions are globally structurally equivalent if they are locally structurally equivalent
at all points.

We will not formalize this definition in mathematical notation. The key point is that these
definitions imply that two functions are structurally equivalent if their singularity structures
are corresponding - the topological ordering and the types of the singularities are equivalent.

Slight perturbations of a function will in general leave it structurally equivalent with itself.
The singularities move a bit and change value, but the topological structure remains the
same. However, in the presence of catastrophe points, a slight perturbation will change the
singularity structure and the function no longer remains structurally equivalent to itself.

14.3.4 Local characterization of functions

Analogous to the characterization of points into three classes, the local structure of a function
is characterized by the following theorems. Here, f is a given smooth function
f(x_1, ..., x_n, c_1, ..., c_m) ∈ C^∞(R^(n+m), R).
245 14.3 Catastrophe theory basics

Implicit Function Theorem:


At a given regular point the function f is locally structurally equivalent with the function g,
where

g(x_1, ..., x_n, c_1, ..., c_m) = x_1

In other words, the Implicit Function Theorem states that at a regular point the function is
locally equivalent with its tangent plane.

The Morse Lemma:


At a given Morse singularity point the function f is locally structurally equivalent with the
function g, where

g(x_1, ..., x_n, c_1, ..., c_m) = Σ_{i=1}^{n} Σ_{j=1}^{n} (1/2) ∂²f/(∂x_i ∂x_j) x_i x_j

The Morse Lemma states that at Morse singularity points the local structure is defined by the
second order terms.

The Splitting Lemma:


At a given catastrophe point for the function f the eigenvalues of the Hessian matrix can be
ordered by absolute value with the first d being zero - corresponding to the degree of
degeneracy. The function f is then locally structurally equivalent with the function g, where

g(x_1, ..., x_n, c_1, ..., c_m) = g_NM(x_1, ..., x_d) + Σ_{i=d+1}^{n} Σ_{j=d+1}^{n} (1/2) ∂²f/(∂x_i ∂x_j) x_i x_j

The Splitting Lemma states that the function can be split into two parts: a non-Morse part
g_NM and a Morse part. Accordingly, the parameters are split into "bad" and "good" parameters.
The bad parameters correspond to the degenerate directions of the Hessian matrix. However,
the Splitting Lemma does not characterize the local structure of the non-Morse part of the
function. This is done by a theorem by the French mathematician René Thom (1923-2002).
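The splitting into "bad" and "good" directions can be read off from the Hessian eigenvalues at the catastrophe point. A small Python/sympy sketch (our own illustration, not the book's Mathematica code): for f = x³ + y² one eigenvalue vanishes, so the degree of degeneracy is d = 1 and only x is a "bad" parameter, while y² is the Morse part:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y**2                      # non-Morse in x, Morse in y

H0 = sp.hessian(f, (x, y)).subs({x: 0, y: 0})
eigs = H0.eigenvals()                # dict {eigenvalue: multiplicity}
d = sum(m for lam, m in eigs.items() if lam == 0)
print(d)                             # 1: one "bad" direction (x); the Morse part is y^2
```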

14.3.5 Thom's theorem

Let f be a given smooth function f(x_1, ..., x_n, c_1, ..., c_m) ∈ C^∞(R^(n+m), R). At a catastrophe
point the eigenvalues of the Hessian matrix can be ordered by absolute value with the first d
being zero. The function is then locally structurally equivalent with a function g, where

g(x_1, ..., x_n, c_1, ..., c_m) = CatGerm(d) + Perturb(d, m) + Σ_{i=d+1}^{n} Σ_{j=d+1}^{n} (1/2) ∂²f/(∂x_i ∂x_j) x_i x_j

If m ≤ 5 then CatGerm(d) is one of the catastrophe germs and Perturb(d, m) is the corresponding
perturbation listed in the table below:

Name   Nickname            m  d  CatGerm(d)    Perturb(d, m)

A2     Fold                1  1  x^3           c1 x
A±3    Cusp                2  1  ±x^4          c1 x + c2 x^2
A4     Swallowtail         3  1  x^5           c1 x + c2 x^2 + c3 x^3
A±5    Butterfly           4  1  ±x^6          c1 x + c2 x^2 + c3 x^3 + c4 x^4
A6     -                   5  1  x^7           c1 x + c2 x^2 + c3 x^3 + c4 x^4 + c5 x^5
D-4    Elliptic umbilic    3  2  x^2 y - y^3   c1 x + c2 y + c3 y^2
D+4    Hyperbolic umbilic  3  2  x^2 y + y^3   c1 x + c2 y + c3 y^2
D5     Parabolic umbilic   4  2  x^2 y + y^4   c1 x + c2 y + c3 x^2 + c4 y^2
D-6    -                   5  2  x^2 y - y^5   c1 x + c2 y + c3 x^2 + c4 y^2 + c5 y^3
D+6    -                   5  2  x^2 y + y^5   c1 x + c2 y + c3 x^2 + c4 y^2 + c5 y^3
E±6    -                   5  2  x^3 ± y^4     c1 x + c2 y + c3 x y + c4 y^2 + c5 x y^2

Figure 14.2 Table of elementary catastrophes for m ≤ 5. The names in the first column
were originally proposed by Thom [Thom1975]. The nicknames come from their visual
appearance (see MathWorld [http://mathworld.wolfram.com/Catastrophe.html] with
interactive plots, Gray [Gray1993], Bruce & Giblin [Bruce1984] and Sanns
[Sanns2000]). The c1 to c5 factors are also called the control factors.

The catastrophe germ is the local structure of the function for the specific set of control
parameters at the catastrophe point.

The perturbation terms determine how the function behaves when the control parameters
vary (in a neighborhood around the catastrophe point).
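The role of the perturbation terms can be made concrete numerically. For the fold germ x³ with perturbation c x, the singularities are the real zeros of 3x² + c; a small Python helper (our own illustration, not the book's code) shows how they appear and disappear as the control parameter c varies:

```python
import numpy as np

def fold_singularities(c):
    """Real critical points of the perturbed fold germ x^3 + c*x (zeros of 3x^2 + c)."""
    roots = np.roots([3, 0, c])
    return sorted(r.real for r in roots if abs(r.imag) < 1e-12)

print(fold_singularities(-3))   # two singularities, near x = -1 and x = +1
print(fold_singularities(0))    # the degenerate singularity at x = 0 (double root)
print(fold_singularities(3))    # no real singularities: []
```

Crossing c = 0 from negative to positive annihilates the minimum/maximum pair - exactly the fold event of figure 14.4 below.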

14.3.6 Generic property

A property for a system is generic if an open, dense subset of the system possesses the
property. In probabilistic terms, a property is generic if it is possessed with probability one.

In this context, the term generic is used to characterize which catastrophes are generic for
images or for differential operators on images - these are the so-called generic events.

14.3.7 Dimensionality

Analysis of the dimension of the involved spaces can often determine whether a property is
generic. As an example, let's look at the set of singularities in an n-dimensional image. A
singularity point is determined by all n first order partial derivatives equaling zero. This
means that we have n conditions in an n-dimensional space. Under the assumption that these
conditions are independent, the space where the conditions are met is an n - n = 0
dimensional space. This means that the set of singularities contains only isolated points - or
more precisely, a singularity point is generically isolated. We know that for a 2D image the
singularities are the maxima, minima and saddle points of the intensity landscape, which are
easily recognized as isolated points.
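This counting argument is easy to check on a concrete example. For a sample smooth 2D function, solving the two conditions f_x = f_y = 0 symbolically (here with Python's sympy rather than the book's Mathematica) indeed yields a finite set of isolated points:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2          # a sample smooth 2D intensity landscape

# n = 2 conditions (f_x = 0 and f_y = 0) in an n = 2 dimensional space:
# generically a 0-dimensional solution set, i.e. isolated points.
singularities = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(singularities)           # two isolated singularities: (-1, 0) and (1, 0)
```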

What about catastrophes? The potential catastrophe point is required to be a singularity (n
conditions) and then the Hessian is required to have a degenerate direction (an extra
condition). This means that n + 1 conditions are to be met, resulting in an n - (n + 1) = -1
dimensional space, which is empty. Or in other words: generically, a given image contains no
catastrophe points.

In scale-space we have an extra parameter - the scale.

This means that in scale-space the set of singularities is generically a 1-dimensional space
(the 'path' of the singularity over scale) and that catastrophes do generically occur in isolated
points.

This very short and informal treatment of the dimensionalities of the singularity and the
catastrophe sets is only meant as an appetizer. In order to present the above considerations
properly mathematically, the image and the set of conditions should be represented as
manifolds in jet-space, where the independence of the conditions can be investigated through
the concept of transversality. However, this is far beyond the scope of this introduction - for a
richly illustrated interactive tutorial in Mathematica see [Sanns2000], for a more
comprehensive treatment see [Gilmore1981, Florack1994b, Olsen2000, Dam2000,
Bruce1984].

14.3.8 Illustration of the concepts

The example below shows a function with two state parameters x and y and one control
parameter a. The local structure differs for three choices of the control parameter a: to the
left the function has a saddle and a local maximum, in the middle a single degenerate
singularity, and to the right no singularities at all. In general, the function has two
singularities for negative values of a, one singularity for a = 0, and no singularities for
positive a.

f[x_, y_, a_] := x^3 - 24 y^2 + a x;
DisplayTogetherArray[
 Plot3D[f[x, y, #], {x, -30, 30}, {y, -30, 30}] & /@ {-300, 0, 300},
 ImageSize -> 480];

Figure 14.3 The evolution of the Fold catastrophe as a function of the control parameter
a. As the control parameter changes, the structure of the function changes. For negative
values of the control parameter, the function has a local maximum and a saddle point.
For positive values of the control parameter, the function has no singularities in the
neighborhood of the observed point. The catastrophe occurs for control parameter equal
to zero.

Since the singularity structure is different for positive and negative values of a, there must be
a catastrophe point for a = 0.

And reassuringly, for a = 0 the singularity at (x, y) = (0, 0) is a catastrophe point as well.
This is easily verified mathematically: for a = 0 the determinant of the Hessian is
6 x · (-48) = -288 x, which is zero at the singularity point (x, y) = (0, 0).
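The same check can be scripted; a sympy sketch (ours, not the book's Mathematica) confirming the Hessian determinant for f = x³ - 24 y² + a x:

```python
import sympy as sp

x, y, a = sp.symbols('x y a')
f = x**3 - 24*y**2 + a*x

detH = sp.hessian(f, (x, y)).det()   # f_xx * f_yy - f_xy^2 = 6x * (-48)
print(sp.simplify(detH))             # -288*x
print(detH.subs(x, 0))               # 0: degenerate at the singularity (0, 0)
```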

The Splitting Lemma states that we can split the state parameters into "good" and "bad"
parameters (locally around the catastrophe point (x, y, a ) = (0, 0, 0) where the function
changes singularity structure for increasing a).

Actually, it need not be the original state parameters that are split - the degenerate direction
need not be aligned with the original axes. However, in this case the "bad" parameter is x.
As stated by the Splitting Lemma, the function can be split into a part with the bad
parameters, and a part with the good parameters in the shape of a sum of second order
monomials (a monomial is a polynomial consisting of a product of powers of variables, e.g.,
x, x y^3, x^4 y z^2, etc.). The non-Morse part of the function can be recognized from the list of
catastrophe germs in Thom's theorem. It is known as the fold catastrophe. It should be noted
that - since there is only one control parameter - this is the only generic catastrophe.

fold[x_, a_] := x^3 + a x;
DisplayTogetherArray[
 Plot[fold[x, #], {x, -30, 30}, PlotLabel -> "a = " <> ToString[#],
    AxesLabel -> {"x", ""}, Ticks -> None] & /@ {-300, 0, 300},
 ImageSize -> 440];


Figure 14.4 The "bad" parameter x from the previous example. The original function can
be split into the Morse and the non-Morse part. The non-Morse part is structurally
equivalent to the fold catastrophe.

The canonical fold catastrophe is further analysed below. We derive the singularity and
catastrophe sets for the function fold(x, a):

Solve[∂x fold[x, a] == 0, {a}]

Solve[{∂x fold[x, a] == 0, ∂x,x fold[x, a] == 0}, {a, x}]

{{a → -3 x^2}}

{{a → 0, x → 0}}

The above description of the singularity set is used to plot the fingerprint or the signature of
the function - the position of the singularities against the value of the control parameter:

Plot[-3 x^2, {x, -1, 1}, ImageSize -> 150];

Figure 14.5 The fold catastrophe. The catastrophe at a=0 is where the two singularity
strings meet and annihilate. A point where several singularity strings meet is also
denoted a bifurcation. This is the fingerprint of the fold catastrophe. Compare with the
catastrophes in the fingerprint in figure 14.11.

The cusp catastrophe from Thom's theorem is also commonly encountered. The canonical
germ and perturbation that give rise to the cusp catastrophe is x^4 + c1 x + c2 x^2. Since there
are two control parameters, this is slightly more complicated than the fold catastrophe.

Task 14.2 Illustrate the fingerprint for the cusp catastrophe (like the illustration in
figure 14.5 for the fold catastrophe).
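A numerical way to see the two stable states of the cusp (a Python sketch, not the book's Mathematica solution): count the real critical points of the cusp germ x⁴ + c1 x + c2 x², i.e. the real zeros of its derivative 4x³ + 2 c2 x + c1. Crossing the bifurcation set in the (c1, c2) control plane changes the count from three to one:

```python
import numpy as np

def n_singularities(c1, c2):
    """Number of real critical points of the cusp germ x^4 + c1*x + c2*x^2."""
    roots = np.roots([4, 0, 2 * c2, c1])       # zeros of 4x^3 + 2*c2*x + c1
    return int(np.sum(np.abs(roots.imag) < 1e-9))

print(n_singularities(0, -8))   # 3: the {minimum, maximum, minimum} triple
print(n_singularities(0, 8))    # 1: a single minimum
```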

Clear[cusp]; cusp[x_, c1_, c2_] := x^4 + c1 x + c2 x^2;

Show[GraphicsArray[
  Table[Plot[cusp[x, c1, c2],
    {x, -10, 10}, DisplayFunction -> Identity, Axes -> None],
   {c2, 0, -30, -15}, {c1, -80, 80, 40}]], ImageSize -> 440];


Figure 14.6 Various perturbed shapes for the cusp catastrophe germ. It has two stable
states: one with a single minimum and one with a minimum, maximum, minimum triple (or
the same states with maximum and minimum switched). Conceptually, one control
parameter allows transition between the states by "tilting" the two minima until one
minimum is annihilated with the central maximum (or the reverse process) - these events are
fold catastrophes. The other control parameter allows transition between the states by
letting the two minima approach each other until they are merged together with the
maximum into one single minimum. The cusp catastrophe point is where both control
parameters are zero.

Here are a few other illustrations and some plot commands to study them (see also the
package ImplicitPlot3D.m in MathSource: www.mathsource.com):

<< FrontEndVision`ImplicitPlot3D`;
DisplayTogetherArray[{
  cusp = ContourPlot[x^3 - y^2, {x, -.5, 4},
     {y, -6, 6}, Contours -> {0}, PlotLabel -> "Cusp"],
  cusp3D = ContourPlot3D[4 x^3 + 2 u x + v, {u, -2.5, 2}, {v, -2, 2},
     {x, -1, 1}, PlotPoints -> 6, PlotLabel -> "Cusp3D",
     ViewPoint -> {-4.000, 1.647, 2.524}],
  swallowtail = ParametricPlot3D[{u v^2 + 3 v^4, -2 u v - 4 v^3, u},
     {u, -2, 2}, {v, -.8, .8}, BoxRatios -> {1, 1, 1},
     PlotLabel -> "Swallowtail", ViewPoint -> {4.000, -1.390, -3.520}]},
 ImageSize -> 420, GraphicsSpacing -> {0, 0}];

Figure 14.7 The cusp (2D and 3D) and the swallowtail catastrophe. From the wonderful
booklet by [Sanns2000]. See also:
MathWorld [http://mathworld.wolfram.com/Catastrophe.html].

14.4 Catastrophe theory in scale-space


As stated earlier, the natural application of catastrophe theory to scale-space theory is
to view the spatial parameters as state parameters and the scale as the single control
parameter. The differentiability of the scale-space ensures that the functions are
smooth. However, it is slightly more complicated than that. In scale-space theory, there is a
fixed connection between the spatial parameters and the scale parameter given by the
diffusion equation: L_t = L_xx + L_yy. This gives a severe restriction compared to the general
space of functions with one control parameter.

Thereby, the canonical catastrophe germs listed in Thom's theorem might not apply to scale-
space images. Fortunately, the work of James Damon (UNC) reveals that we can in fact still
apply similar results to Thom's theorem for all practical purposes [Damon1995, Damon1997].

Another caveat in scale-space singularity analysis is the image itself. Images from natural scenes
behave nicely, but artificial test images often possess nasty properties. Two typical examples
are symmetry and areas with constant intensity. Across a symmetry axis singularities appear
in pairs. This means that the expected catastrophes appear in variants where two (or more)
symmetric catastrophes occur simultaneously at the same place (an example of this is shown
in the next section). This apparently causes non-generic catastrophes to appear. Actually, it is
not the catastrophes that are non-generic - symmetry in images is non-generic.

Areas with constant intensity in artificial test images can cause unexpected results. In theory
this should cause no problems, since the areas no longer have constant intensity at any given
scale σ > 0. In practice, implementations with blurring kernels of limited size will however
leave areas with constant intensity (gradually smaller with increasing scale). A simple
consequence is apparent areas of singularities - as opposed to the expected isolated points.

14.4.1 Generic events for differential operators

Thom's theorem states that the only generic catastrophe with only one control parameter is
the fold.
At first glance, we would therefore not expect to encounter any other catastrophes in scale-
space, where we only have scale as control parameter. However, most applications, like edge
detection, examine singularities of differential operators and not singularities of the raw
image. Depending on the differential operator this induces other catastrophes as well.

An example of higher order singularities induced by a differential operator is illustrated in
the following. The operator is the square of the first derivative of the original image (the
gradient squared). In order to understand why this simple operator can induce other generic
catastrophes we look at the Taylor series expansion of a one-dimensional function f (around
0 for simplicity) and the derivative of this expansion squared.

Clear[f]; Series[f[x], {x, 0, 4}]

f[0] + f'[0] x + 1/2 f''[0] x^2 + 1/6 f^(3)[0] x^3 + 1/24 f^(4)[0] x^4 + O[x]^5

When we look for singularities the zeroth order term is not interesting. Intuitively we can use
the spatial parameter x as a free parameter that allows us to find points p where fx(p) is zero.
Therefore we find singularities in generic signals (and images). When we have a control
parameter, we can turn this extra "knob" until we find points where fxx(p) is zero as well.
Then we find catastrophe points where the first and second order structures are vanishing -
the canonical fold catastrophe germ. But since we have no more knobs to turn, we cannot get
rid of the higher order structure and therefore higher order catastrophes are non-generic.

D[Series[f[x], {x, 0, 5}], x]^2

f'[0]^2 + 2 f'[0] f''[0] x + (f''[0]^2 + f'[0] f^(3)[0]) x^2 +

 (f''[0] f^(3)[0] + 1/3 f'[0] f^(4)[0]) x^3 +

 (1/4 f^(3)[0]^2 + 1/3 f''[0] f^(4)[0] + 1/12 f'[0] f^(5)[0]) x^4 + O[x]^5

The situation is somewhat different for the derivative signal squared. Again, we can use the
two free parameters (x and the control parameter) to find points p where fx(p) = fxx(p) = 0.
The remaining part of the derivative of the signal squared is then 1/4 fxxx(p)^2 x^4 + O(x^5). We
see that the third order structure completely disappears and we therefore generically have the
cusp catastrophe present in the derivative of the signal squared. It is also worth noticing that
this implies that for each fold in the original signal, there is a cusp in the derivative of the
signal squared.
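This cancellation of the third order term can be verified symbolically. A sympy sketch (ours, not part of the book's code), assuming f'(0) = f''(0) = 0 at the fold point and writing f3, f4 for the remaining Taylor coefficients f'''(0), f''''(0):

```python
import sympy as sp

x, f3, f4 = sp.symbols('x f3 f4')   # f3, f4 stand for f'''(0), f''''(0)

# At a fold point of f we may take f'(0) = f''(0) = 0, so locally:
f = sp.Rational(1, 6) * f3 * x**3 + sp.Rational(1, 24) * f4 * x**4
g = sp.expand(sp.diff(f, x)**2)     # the derivative of the signal, squared

print(g.coeff(x, 3))                # 0: the third order structure disappears
print(g.coeff(x, 4))                # f3**2/4: quartic leading term -> cusp germ
```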

We first demonstrate these findings in scale-space by looking at a simple signal composed of
a few sines and cosines. First the definition of the signal and the derivative squared:

from = -100; to = 400; resolution = 25;
f[t_] := Sin[t / resolution] +
   Cos[1.2 t / resolution + 0.9] + Sin[1.3 t / resolution + 0.6];
DisplayTogetherArray[Plot[#, {t, from, to}] & /@ {f[t], f'[t]^2},
  ImageSize -> 470];


Figure 14.8 A simple signal and the corresponding derivative squared.

We investigate how this signal evolves in scale-space by convolving with a Gaussian using
an exact Fourier domain implementation:

gausskernel[x_, σ_] := 1 / Sqrt[2 π σ^2] Exp[-(x^2 / (2 σ^2))];

signal[σ_, x_] = Simplify[
   InverseFourierTransform[FourierTransform[gausskernel[x, σ], x, ω]
      FourierTransform[f[x], x, ω], ω, x], σ > 0];
dsignalsquared[σ_, x_] = (∂x signal[σ, x])^2 // Chop;
DisplayTogetherArray[
  {Plot[signal[#, x], {x, from, to}, Ticks -> {Automatic, None},
      PlotLabel -> "σ = " <> ToString[#]] & /@ {1, 25, 40, 50},
   Plot[dsignalsquared[#, x], {x, from, to},
      Ticks -> {Automatic, None}] & /@ {1, 25, 40, 50}}, ImageSize -> 500];


Figure 14.9 Top row: the original signal at scales σ = 1, 25, 40, 50. At the center of the
signal a maximum and a minimum melt together as scale increases and are annihilated
in a fold catastrophe. Bottom row: the derivative squared at the same scales. A
{minimum, maximum, minimum}-triple is annihilated into a single minimum in a cusp
catastrophe located where the fold is in the original signal.

In order to see that these nice derivations actually hold for real discrete images as well, we
inspect a random signal. First the signal and the derivative squared are constructed.

random = Table[Random[], {t, from, to}];

scaledrandom[σ_] := gDf1D[random, 0, σ];
drandomsquared[σ_] := gDf1D[random, 1, σ]^2;

We then illustrate the evolution as scale increases:

view[func_] :=
  ListPlot[func[#], PlotJoined -> True, Ticks -> {Automatic, None},
     PlotLabel -> "σ = " <> ToString[#]] & /@ {1, 25, 40, 50};
DisplayTogetherArray[{view[scaledrandom], view[drandomsquared]},
  ImageSize -> 500];


Figure 14.10 The scale-space evolution for a random signal and the derivative signal
squared. The signals are displayed at scales σ = 1, 25, 40, 50 as in figure 14.9.

Finally we display the fingerprints for the random signal and the derivative of the signal
squared. The fingerprints are slightly cluttered at low scale due to the inherent randomness of
the signal but the fold/cusp pairs are obvious.

fingerprint[signal_, maxscale_] := Module[{scsig, scLx, sig},

  scsig = Table[signal[E^(t Log[maxscale] / 200)], {t, 50, 200}];
  scLx = scsig - Map[RotateLeft, scsig];
  sig = Map[(RotateLeft[#] - #) &, Sign[scLx]];
  ListDensityPlot[sig, ImageSize -> 440]];
fingerprint[scaledrandom, 60];

Figure 14.11 Fingerprints for the random signal and the derivative signal squared. For
each fold catastrophe in the signal there is a corresponding cusp catastrophe in the
derivative signal squared.
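The same fingerprint construction can be sketched outside Mathematica, e.g. with numpy/scipy (a rough Python equivalent of the idea, not the book's implementation): blur the signal with a first-order Gaussian derivative kernel over a range of scales and mark the sign changes of the derivative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
signal = rng.random(500)

def fingerprint(sig, scales):
    """One row per scale; entries are +/-2 where the derivative changes sign (extrema)."""
    rows = []
    for s in scales:
        dsig = gaussian_filter1d(sig, s, order=1)   # first-order Gaussian derivative
        rows.append(np.diff(np.sign(dsig)))         # -2 at maxima, +2 at minima
    return np.array(rows)

fp = fingerprint(signal, np.geomspace(1.0, 60.0, 64))
# with increasing scale extrema annihilate; creations are non-generic in 1D
print((fp[0] != 0).sum(), "extrema at the finest scale vs", (fp[-1] != 0).sum(), "at the coarsest")
```

Plotting `fp` as an image (rows = scales) reproduces the fold fingerprints of figure 14.11.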

Thom's theorem states that the only generic catastrophe for a function with only one control
parameter is the fold. However, as we have seen in the previous section, when we construct
differential expressions from the original generic function we can induce higher order
catastrophes as well.

A simple example is the derivative squared where the cusp catastrophe is generic.

fingerprint[drandomsquared, 60];

Other differential expressions can induce catastrophes of even higher order. Therefore, it is
necessary to analyze each differential expression individually in order to reveal the generic
events for the singularities as scale is increased.

14.4.2 Generic events for other differential operators

Other corner measures are studied in [Sporring1998a].

Among the investigated differential operators is the gradient magnitude, used above as
dissimilarity measure for the watershed segmentation. The generic events for this operator
are the fold and the cusp - for both, annihilations as well as creations are generic.

Show[Import["nongeneric_isocat.jpg"], ImageSize -> 230];

Figure 14.12 The singularities for the isophote curvature located on a specific isophote.
The red/blue curves are the maxima/minima. These singularities can be perceived as
corners. The singularity strings are followed up through scale-space, revealing non-
generic catastrophes. Both the blue ring and the catastrophe at the top, where eight
singularities are annihilated, display highly non-generic behavior. This is due to the non-
generic test image - a perfect square. Illustration from [Dam1999].

This is derived in [Olsen1997]. Note, however, that creations are only generic in 2D and
higher dimensions. Creations in 1D can never occur.

The generic events for the isophote curvature are studied in [Dam1999]. Again, the generic
events are the annihilation and creation fold and cusp catastrophes. From this work we also
have an ensemble of apparently non-generic catastrophes due to symmetry in the test images.
An example is displayed in figure 14.12.

14.4.3 Annihilations and creations

In traditional catastrophe theory there is no preferred orientation for the control parameters.
When the singularity structure for a function changes, new singularities are as likely to appear
as old ones are to disappear. Annihilations and creations are simply reverse events between
different states for the local structure of a function. This is not the case for linear scale-space
functions where scale is the control parameter. We only study the evolution of the functions
for increasing scale. As mentioned earlier, increasing the scale results in a general
simplification of the image functions. This implies that singularities are annihilated much
more often than they are created.

As an example, see the fingerprint for the random signal in figure 14.11. The figure shows
fold annihilations - but no creations at all. This is in fact no coincidence. For a 1D signal
(with no special properties or symmetries), creations are non-generic in scale-space. For
images of dimension 2 (or higher) creations are generic. The famous example from the
literature, where this was first discovered by Lifshitz and Pizer [Lifshitz1990], is the 'dumb-
bell example' (see figure 14.13).

db = Table[Chop[Exp[-((x - π)^2 / 2)] Exp[-(y^2 / (2 (Sin[x] + 1.1)^2))]],
   {y, -4, 4, .2}, {x, 0, 2 π, .04}];
DisplayTogetherArray[
 ListPlot3D[#, ViewPoint -> {1.489, -2.605, 1.968}, Mesh -> False] & /@
  {db, gD[db, 0, 0, 3]}, ImageSize -> 300];

Figure 14.13 The dumb-bell image (left) consists of two blobs with a narrow bridge in
between with a maximum. Blurring (right, σ = 3) has much more effect on the bridge, so
two new maxima and a saddle are created under blurring. Illustration from [Lifshitz1990].

In general, for catastrophes from images or differential operators on images, annihilations are
much more common than creations [Kuijper2002a, Florack2000b, Kuijper2002b].
Furthermore, the scale-range where singularities can be observed resulting from a creation in
scale-space is generally relatively short.

Another interesting aspect of singularities resulting from creations is their "validity". In a
sense, the singularities that arise from creations have no originating structure in the original
image. The feature corresponding to such a singularity cannot be linked down to its "cause"
in the image. Therefore, the singularities arising from creations are discarded in certain
applications. An example of this is actually the linking scheme in the multi-scale watershed
segmentation. Since the regions are linked from fine to coarse scale, regions resulting from
creations never enter the linking tree. They are simply ignored on purpose.

14.5 Summary of this chapter


Catastrophe theory is the theoretical foundation that allows us to analyze the evolution of the
singularities in scale-space. The theory allows prediction of the behavior of differential
operators in scale-space. Ideally, any deep structure application should therefore include a
comprehensive analysis of the generic behavior.

However, catastrophe theory is quite complicated. Therefore a strict theoretical approach has
little appeal for the general image analysis community. Fortunately, a number of central
results can be applied without more than a certain intuitive understanding of catastrophe
theory. The purpose of this section was to provide a first step towards such a basic intuitive
understanding.
15. Deep structure III. topological numbers
Bart M. ter Haar Romeny and Erik Dam

In the previous chapters we detected and followed the singularity strings through scale-space
in an ad hoc manner. In the scale selection section, the detection of maxima was done by
simply looking for pixels with values larger than those of their neighbors. In the edge focusing section,
minima and maxima were detected (and distinguished) for a 1D signal by looking at sign
changes for the derivative signal. Furthermore these extrema were tracked down through
scale-space by simply looking for extrema in a close neighborhood in successive scale levels
below. Finally, in the multi-scale segmentation section, the dissimilarity minima were
represented indirectly by the catchment basins, and the linking across scale was done
robustly by matching regions instead of points.
These approaches seem somewhat heuristic. However, they can be expressed in a more
formal manner with solid theoretical foundations. Furthermore, the implementations can be
refined in order to make them more robust. The approach presented in this section is
therefore not to be considered superior to the previous. It does, however, have a number of
properties that are appealing both in theory and implementation. The concept presented in
the following can be studied in more detail in [Kalitzin1996a, Kalitzin1997b, Staal1999a].

15.1 Topological numbers


The topological number defined below is a generalization of the winding number. The
topological number gives a characterization of the local structure of a function in a point
by investigating the immediate neighborhood of the point (divergence theorem, see
http://mathworld.wolfram.com/DivergenceTheorem.html).

Let L(x) ∈ C^∞(R^n, R) be a differentiable scalar-valued function. For a given regular point p (a
point with non-vanishing gradient) we define

Φ(p) = ε^(i1 i2 ... in) L_i1 dL_i2 ∧ ... ∧ dL_in / (L_j(p) L_j(p))^(n/2)

Here ε is the permutation tensor defined by the following equations: ε^(1 2 ... n) = 1 and
ε^(i1 i2 ... ik ... il ... in) = -ε^(i1 i2 ... il ... ik ... in). This is simply a sign change depending on the number of
permutations.

For a closed, oriented, n-1 dimensional hypersurface S with no singular points on the surface
we define:

ν_S = ∮_S Φ(p)

For a point p0, the topological number ν(p0) is defined as the topological number ν_S for a
hypersurface S surrounding p0 closely. By the informal notion "closely" we specifically
mean that the region bounded by the hypersurface must contain no singularities, other than
possibly p0, for the function L. In order for this to be well-defined, it is required that the
singular points of the function L are isolated.

Characterization of points:
For a regular point the topological number is zero. For a singularity point, the topological
number offers a characterization of that point (examples are given below).

Invariance:
The topological number is invariant under homotopic deformations of the n-dimensional
space in which the hypersurface is embedded - provided that no singular points cross the
hypersurface in the deformation. This makes it a non-perturbative and topological quantity.

Additivity :
For a region constructed as the union of a number of disjoint subregions, the topological
number for the hypersurface bounding this region is the sum of the topological numbers for
the subregions.

Conservation:
The invariance property implies the following conservation property. For a family of images
L(x, σ) ∈ C^∞(R^(n+1), R), depending smoothly on the deformation parameter σ, the topological
number ν_S for a given hypersurface S is constant for all σ provided that no singular points
cross the hypersurface during deformation. This property is central for the tracking of
singularities and the analysis of catastrophe points in scale-space.

15.1.1 Topological numbers in scale-space

The linear scale-space representation for n-dimensional image functions has three central
properties that make the concept of topological numbers applicable:
• Image functions are infinitely differentiable, with continuous derivatives in scale-space for
σ > 0.
• Singularities are generically isolated in a scale-space image for a given scale.
• The deformations defined by Gaussian blurring are smooth.

Thereby image functions (and their derivatives) qualify for the conditions stated above. The
topological number is well-defined for scale-space image functions. When analyzing the
deep structure of images in scale-space, the invariance and conservation properties for the
topological number mentioned above are central. Below, we will see this with respect to
tracking of singularities and the analysis of catastrophe points in scale-space.

15.1.2 Topological number for a signal

The above mathematical definition of the topological number simplifies considerably for 1D
signals in scale-space. For a scale-space signal L(x, σ) ∈ C^∞(R^(1+1), R), the topological
number for a point p is simply ν(p) = Sign(L_x(p+)) - Sign(L_x(p-)), where p- and p+ are
points close to p such that p- < p < p+.

Here "close" has the same meaning as defined above. For regular points, the topological
number is zero. For local maxima and minima the topological numbers are -2 and 2,
respectively.

In the section on edge focusing, the illustration of the signature of the noisystep signal was
produced in an ad hoc manner. Inspection of the code reveals that the illustration displays
exactly the -2 and 2 values for the sign differences for the derivative of the signal.
Reassuringly, the sensible ad hoc approach proves to be a simple special case of a more
general principle.
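For a sampled signal this is a one-liner; a Python sketch (ours, not the book's code) using finite differences for L_x:

```python
import numpy as np

def topological_number_1d(L, i):
    """nu(p) = Sign(L_x(p+)) - Sign(L_x(p-)) for sample index i of a 1D signal."""
    Lx = np.gradient(L)
    return int(np.sign(Lx[i + 1]) - np.sign(Lx[i - 1]))

x = np.linspace(-3, 3, 601)
L = x**2                                  # minimum at index 300 (x = 0)
print(topological_number_1d(L, 300))      # 2  (minimum)
print(topological_number_1d(-L, 300))     # -2 (maximum)
print(topological_number_1d(L, 450))      # 0  (regular point)
```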

15.1.3 Topological number for an image

For 2D images the expressions simplify quite a bit as well. The integrand of the
topological number simplifies to the following expression:

Φ(p) = (L_1 dL_2 - L_2 dL_1) / (L_1^2 + L_2^2)

Using complex numbers, this can also be written as:

Φ(p) = Im((L_1 - i L_2) d(L_1 + i L_2)) / (L_1^2 + L_2^2)

For a discrete image, this is simply the angle between the gradient vectors of two
neighboring pixels. When this expression is integrated (or summed in the discrete setting), the
topological number is therefore a count of the number of times the gradient vector turns
around its origin as a contour surrounding a specific point is traversed. This is known as the
winding number.

The winding number assumes values k·2π for some integer k. In order to understand this
intuitively, picture the gradient direction vector rotating as a moving test point encircles
a given image point.

For regular points the winding number is zero. For local extrema the winding number is +2π.
For saddle points the winding number is -2π. Monkey saddles can be characterized by the
winding number as well - we will not look into that here.
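This rotating-gradient picture can be checked numerically. The following Python sketch (not from the book) samples the analytic gradients of the three example functions of figure 15.1 along a circle and sums the unwrapped angle increments:

```python
import math

def winding_number(grad, center, radius=1.0, samples=360):
    """Total rotation (in radians) of the gradient vector while a test
    point traverses a circle around `center`; a multiple of 2*pi."""
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        gx, gy = grad(center[0] + radius * math.cos(t),
                      center[1] + radius * math.sin(t))
        ang = math.atan2(gy, gx)
        if prev is not None:
            d = ang - prev
            d -= 2 * math.pi * round(d / (2 * math.pi))  # unwrap step to (-pi, pi]
            total += d
        prev = ang
    return total

extremum = lambda x, y: (2 * x, 2 * y)        # gradient of x^2 + y^2
regular  = lambda x, y: (2 * (x + 2), 2 * y)  # gradient of (x+2)^2 + y^2
saddle   = lambda x, y: (2 * x, -2 * y)       # gradient of x^2 - y^2

turns = [round(winding_number(g, (0.0, 0.0)) / (2 * math.pi))
         for g in (extremum, regular, saddle)]
print(turns)  # [1, 0, -1]: +2*pi, 0 and -2*pi respectively
```

The three results reproduce the +2π, 0 and −2π of the figure below.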

<< FrontEndVision`FEV`;


φ = 2 π/3; DisplayTogetherArray[
  Show[{PlotVectorField[Evaluate[{∂x #, ∂y #}], {x, -2, 2}, {y, -2, 2}],
     Graphics[{Hue[.4], Circle[{0, 0}, 1], Hue[0], Circle[{x, y}, .1],
        Evaluate[Arrow[{x - ∂x #/3, y - ∂y #/3}, {x + ∂x #/3, y + ∂y #/3}]]} /.
       {x -> Cos[φ], y -> Sin[φ]}]}] & /@
   {x^2 + y^2, (x + 2)^2 + y^2, x^2 - y^2}, ImageSize -> 400];


Figure 15.1 Path of the rotating gradient vector with the gradient vectorfield for three
different functions. Left: f(x, y) = x² + y²; the gradient vector rotates once forward →
enclosed is an extremum (here a minimum), the winding number is 2π. Middle:
f(x, y) = (x + 2)² + y²; the gradient vector does not make any rotation when the path is
traversed → enclosed is a regular point, the winding number is zero. Right:
f(x, y) = x² − y²; the gradient vector rotates once backwards → enclosed is a saddle
point, the winding number is -2π.

Task 15.1 Show that the winding numbers add for the singularities within a closed path,
e.g. when 2 maxima (each winding number +2π) and a saddle point (winding number -2π)
are traversed, the total rotation is 2π + 2π − 2π = 2π.

Task 15.2 Show that the winding number for a monkey saddle f(x, y) = x³ − 3 x y² is -4π
and illustrate this with an animation. A monkey saddle is so called because it accommodates
the two legs and the tail of the monkey.

Task 15.3 Calculate the winding number of higher order monkey saddles, given by

f(x, y) = Re[(x − i y)ⁿ], where n ∈ ℕ, and illustrate this with an animation.
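As a numerical cross-check of these winding numbers (a Python sketch, not the book's Mathematica; the finite-difference step h is an arbitrary choice), the gradient of f(x, y) = Re[(x − i y)ⁿ] rotates −(n − 1) full turns around the origin, i.e. −4π for the monkey saddle n = 3:

```python
import math

def f(x, y, n):
    # the family of task 15.3; n = 3 gives the monkey saddle x^3 - 3 x y^2
    return ((x - 1j * y) ** n).real

def winding_number(x0, y0, n, radius=1.0, samples=720, h=1e-6):
    """Rotation of the finite-difference gradient along a circle, in units of 2*pi."""
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        x, y = x0 + radius * math.cos(t), y0 + radius * math.sin(t)
        gx = (f(x + h, y, n) - f(x - h, y, n)) / (2 * h)
        gy = (f(x, y + h, n) - f(x, y - h, n)) / (2 * h)
        ang = math.atan2(gy, gx)
        if prev is not None:
            d = ang - prev
            d -= 2 * math.pi * round(d / (2 * math.pi))  # unwrap to (-pi, pi]
            total += d
        prev = ang
    return total / (2 * math.pi)

print(round(winding_number(0, 0, 3)))  # monkey saddle: -2 turns, i.e. -4*pi
print(round(winding_number(0, 0, 4)))  # fourth order:  -3 turns, i.e. -6*pi
```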

15.1.4 The winding number on 2D images

First, we develop the routine for winding number detection in 2D images. The routine
windingnumber2D[im, τ] returns a list of two elements {extrema, saddles}, i.e.
the positions of the extrema and the positions of the saddles, both as triples of coordinates in
scale-space.

A triple is {σ, y, x}. The routine pursues the complex notation of the integrand φ(p) above.
This is calculated in φ for the eight neighbors of each pixel and summed in wn. Each
extremum and saddle is located twice - therefore the results are "pruned" through pattern
matching on wn.
For ease of calculations the components of the gradient are represented as the real and

imaginary components of a complex number. The argument Arg of the ratio of two such
numbers is then the angle between the gradient vectors.

windingnumber2D[im_, τ_] := Module[
  {grad, v, wn, extrema, saddles}, grad = gDf[im, 1, 0, E^τ] + I gDf[im, 0, 1, E^τ];
  v = RotateLeft[grad, #] & /@ {{0, 0}, {-1, 0}, {-1, -1}, {0, -1}};
  wn = 1/(2 π) Plus @@ Arg[v/RotateLeft[v]] // Round;
  {extrema, saddles} = (Insert[#, E^τ, 1] & /@ Position[wn, #]) & /@ {1, -1}];

We calculate the winding numbers on a sagittal MR image (σ = 2.7 pixels) and plot the
extrema and saddle points as white pixels on the image:

im = Import["mr128.gif"][[1, 1]];

{extrema, saddles} = windingnumber2D[im, τ = 1.];
imbl = gD[im, 0, 0, Exp[1]];
white[x_] := Max[imbl]; disp[pos_] := Module[{},
  positions = pos /. {σ_, y_, x_} -> {y, x};
  ListDensityPlot[MapAt[white, imbl, positions]]];
DisplayTogetherArray[disp /@ {extrema, saddles}, ImageSize -> 265];

Figure 15.2 The local extrema (left) and saddle points (right) for the MR brain image at
scale σ = 2.7 (τ = 1) superimposed as white dots on the original image blurred to σ = 2.7.

Where the extrema are typically in the center of regions, the saddle points are located near
the borders of regions. Here are the critical (self-intersecting) isophotes through the saddle
points #66, #68, #69 and #72:

saddlephote[nr_] := Block[{$DisplayFunction = Identity, saddle},
  saddle = positions[[nr]]; p1 = ListDensityPlot[imbl,
    Epilog -> {Yellow, PointSize[.03], Point[Reverse[saddle]]}];
  p2 = ListContourPlot[imbl, ContourStyle -> Wheat,
    Contours -> {Extract[imbl, saddle]}];
  Show[p1, p2, DisplayFunction -> $DisplayFunction]];
DisplayTogetherArray[saddlephote /@ {66, 68, 69, 72}, ImageSize -> 400];

Figure 15.3 The isophote curves defined by the values at some saddle points (yellow
dots) for the MR brain scan of figure 15.2 at scale σ = 2.7. Note that all these critical
isophotes intersect themselves in saddle points.

criticalisophotes[im_, τ_] :=
  Module[{}, {extrema, saddles} = windingnumber2D[im, τ];
   imblurred = gD[im, 0, 0, E^τ];
   saddlepositions = saddles /. {σ_, y_, x_} -> {y, x};
   ListContourPlot[imblurred, ContourShading -> True,
    Contours -> Extract[imblurred, saddlepositions]]];
DisplayTogetherArray[criticalisophotes[im, #] & /@ {1.5, 2, 2.5},
  ImageSize -> 300];

Figure 15.4 The isophote curves defined by the values at the saddle points for the MR
brain scan at scales σ = 4.5, σ = 7.4 and σ = 12.2 pixels (τ = 1.5, 2, 2.5). Note that these
critical isophotes intersect themselves in saddle points.

In the previous chapter on catastrophe theory we studied the rate of the decrease of image
maxima. Using the winding number approach we can now detect extrema as well as saddle
points. For the sake of comparison, we inspect the rate of decrease of the number of
singularity points for the MR brain image and a white noise image:

wnImage = Table[windingnumber2D[im, τ], {τ, 0, 1, .1}];

{nrExtremaImage, nrSaddlesImage} =
  Map[Length, wnImage, {2}] // Transpose;
wnNoise = Table[windingnumber2D[Table[Random[], {128}, {128}], τ],
   {τ, 0, 1, .1}]; {nrExtremaNoise, nrSaddlesNoise} =
  Map[Length, wnNoise, {2}] // Transpose;
disp[nrSingularities_, text_] := Module[{},
  slopeText = "Slope for " <> text <> ": " <>
    ToString[Coefficient[Fit[10 Log[nrSingularities], {1, x}, x], x]];
  LogListPlot[nrSingularities, AxesLabel -> {"τ*10", ""},
   PlotLabel -> slopeText, DisplayFunction -> Identity]];
Show[GraphicsArray[{disp[nrExtremaImage + nrSaddlesImage, "MR image"],
    disp[nrExtremaNoise + nrSaddlesNoise, "white noise"]}],
  ImageSize -> 345];

Slope for MR image: -1.66555    Slope for white noise: -1.91549

Figure 15.5 The decrease of singularities (the sum of the extrema and saddles) in a 128²
MR image (left) and an image with white noise (right). The singularities are detected
using 2D winding numbers. Especially the curve for white noise has a slope close to the
theoretically predicted -2.

We see that for the MR image and the white noise the observed slope (also known as the
Hausdorff dimension) is close to the theoretical slope of -2. The MR image has a more
deviating slope, possibly due to the lesser degree of self-similarity of the MR image at
different scales (see also [Pedersen2000]).
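The slopes in figure 15.5 are straight-line fits of log N against τ. A stdlib Python sketch of such a fit (not from the book), applied to synthetic counts that follow the theoretical decay N(τ) ∝ e^(−2τ) — the data are made up; the real counts come from the winding number computation above:

```python
import math

def slope(ys, xs):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# synthetic singularity counts obeying the theoretical exponent -2
taus = [0.1 * k for k in range(11)]
counts = [3000.0 * math.exp(-2.0 * t) for t in taus]
fitted = slope([math.log(c) for c in counts], taus)
print(fitted)  # -> -2.0 (up to floating point)
```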

We can increase the radius of the path around the singular point to one pixel (8 pixels on our
track):

windingnumber2D8[im_, τ_] :=
 Module[{σ, grad, v, φ, a, i,
   wn, shead, stail, ehead, etail, extrema, saddles},
  σ = Exp[τ]; grad = gD[im, 1, 0, σ] + I gD[im, 0, 1, σ];
  v = wn = Table[0, {8}];
  v[[1]] = RotateLeft /@ grad; v[[2]] = RotateRight[v[[1]]];
  v[[3]] = RotateRight[grad]; v[[4]] = RotateRight /@ v[[3]];
  v[[5]] = RotateRight /@ grad; v[[6]] = RotateLeft[v[[5]]];
  v[[7]] = RotateLeft[grad]; v[[8]] = RotateLeft /@ v[[7]];
  φ = Table[If[i == 8, a = 1, a = i + 1]; Arg[v[[i]]/v[[a]]], {i, 1, 8}];
  wn = Round[1/(2 π) Plus @@ φ];
  wn = wn //. {{shead___, -1, -1, stail___} :> {shead, 0, -1, stail},
     {ehead___, 1, 1, etail___} :> {ehead, 0, 1, etail}};
  wn = Transpose[Transpose[wn] //.
     {{shead___, -1, -1, stail___} :> {shead, 0, -1, stail},
      {ehead___, 1, 1, etail___} :> {ehead, 0, 1, etail}}];
  extrema = Insert[#, σ, 1] & /@ Position[wn, 1];
  saddles = Insert[#, σ, 1] & /@ Position[wn, -1]; {extrema, saddles}];

Sander and Zucker [Sander1992] employed winding numbers with respect to the vectorfield
corresponding to the principal directions (of the principal curvatures) to detect umbilical
points (as singularities in this second order vectorfield).

15.2 Topological numbers and catastrophes


The conservation property of the topological number offers a way to analyze catastrophe
points. Specifically, the conservation property (and the additivity property) states that the
sum of the topological numbers of the participating singularities is constant across a catastrophe.

{min, max, step} = {-30, 30, .5}; {τmin, τmax, τstep} = {1, 3.5, 0.3};
im = Table[x^3 - 24 y^2 - 300 x, {x, min, max, step}, {y, min, max, step}];
DisplayTogetherArray[
  {ListPlot3D[im, ViewPoint -> {3.147, -0.630, 1.071}, Mesh -> False],
   ListDensityPlot[Transpose[im]]}, ImageSize -> 400];

Figure 15.6 A function with a local maximum and a saddle point. The function is displayed
both as a height map and a density plot.

We illustrate this by the fold catastrophe presented in the catastrophe theory section. The
family of functions x³ − 24 y² + a x contains a function with a fold catastrophe for a = 0.

Here we let linear diffusion produce the same catastrophe on a discrete version of the
function x³ − 24 y² − 300 x.

The extrema and the saddle points are calculated with the winding number method:

wnSingularities = Table[windingnumber2D8[im, τ], {τ, 1, 3.5, .3}];

We display the extrema points, saddle points, and the original image via the 3D graphics
primitive Point. This means that image points, extrema, and saddles are converted into lists
of triples {x, y, scale}. For the image the scale level is set to 0. The following code pieces
are quite simple - however they are somewhat cluttered by the many conversions between
scale-space coordinates and matrix indices.

{minL, maxL} = {Max[im], Min[im]};

imagepoints = Flatten[Table[{GrayLevel[(x^3 - 24 y^2 - 300 x - minL)/(maxL - minL)],
     Point[{Round[(x - min)/step] + 1, Round[(y - min)/step] + 1, 0}]},
    {x, min, max, step}, {y, min, max, step}], 1];

index[coord_] := coord /. {σ_, x_, y_} :> {x, y, Round[(Log[σ] - 1)/.3] + 1};
allExtrema =
  Flatten[wnSingularities /. {extrema_, saddles_} :> extrema, 1];
allSaddles =
  Flatten[wnSingularities /. {extrema_, saddles_} :> saddles, 1];

Show[Graphics3D[
   {imagepoints, RGBColor[1, 0, 0], Point[#] & /@ index[allExtrema],
    RGBColor[0, 0, 1], Point[#] & /@ index[allSaddles]}],
  ImageSize -> 400, ViewPoint -> {0.058, -1.840, 0.801}];

Figure 15.7 The singularity strings are displayed above the original image with scale level
as the vertical axis. The red dots are extrema and the blue dots are saddles. All
singularities but the two strings in the middle of the image are due to the cyclic
representation of the image in the winding number computation. The red string in the
center is the local maximum and the blue string is the saddle - see figure 15.6 for
comparison.

The fold catastrophe in the center of the figure obeys the conservation principle of the
topological number. Below the catastrophe point there are a maximum (winding number
+2π) and a saddle (winding number -2π) - the sum of the winding numbers is 0.

Above the catastrophe there are no singularities (within the center region), which means that
the winding number is 0. The winding number for the center region is thus preserved across
the catastrophe.
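This conservation can be checked numerically for the family x³ − 24 y² + a·x used above, which has a maximum and a saddle for a < 0 and no singularities for a > 0. A Python sketch (not from the book; the circle radii are arbitrary choices) computes winding numbers around the individual singularities and around the whole center region, before (a = −300) and after (a = +300) the annihilation:

```python
import math

def winding(grad, cx, cy, r, samples=720):
    """Rotation of the gradient vector along a circle, in units of 2*pi."""
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        gx, gy = grad(cx + r * math.cos(t), cy + r * math.sin(t))
        ang = math.atan2(gy, gx)
        if prev is not None:
            d = ang - prev
            d -= 2 * math.pi * round(d / (2 * math.pi))  # unwrap to (-pi, pi]
            total += d
        prev = ang
    return round(total / (2 * math.pi))

def grad(a):
    # gradient of f(x, y) = x^3 - 24 y^2 + a x; critical points at (±10, 0) for a = -300
    return lambda x, y: (3 * x * x + a, -48 * y)

print(winding(grad(-300), -10, 0, r=2))  # maximum at (-10, 0): +1 (i.e. +2*pi)
print(winding(grad(-300), 10, 0, r=2))   # saddle at (10, 0):  -1 (i.e. -2*pi)
print(winding(grad(-300), 0, 0, r=20))   # both enclosed: 1 + (-1) = 0
print(winding(grad(300), 0, 0, r=20))    # after the annihilation (a > 0): 0
```

The enclosing winding number is 0 both before and after the catastrophe, as the conservation principle demands.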

15.3 The deep structure toolbox


The previous sections in this chapter have presented and applied a number of techniques for
analyzing and utilizing the deep structure. The purpose of this section is to provide a
summary of the open problems in deep structure analysis from a pragmatic point of view.

The desired goal is to be able to present a complete toolbox consisting of simple,
intuitive, robust and theoretically well-founded methods that handle the common challenges
concerning deep structure analysis. Such a toolbox would make deep structure approaches
more accessible for computer vision and image analysis researchers and programmers
outside the scale-space community.

15.3.1 Detection of singularities

A number of methods have been illustrated in the previous sections. The methods locate
singularities for a given n-dimensional image as well as for a given scale-space representation
of an image. The methods allow characterization of the singularities as maxima, minima
or saddle points.

This is the best equipped compartment in the deep structure toolbox.

15.3.2 Linking of singularities

The previous sections have shown a number of ways to detect and analyze singularities. A
central aspect of exploring the singularities is the linking across scale.

The linking method for singularities in the edge focusing section is quite heuristic. A
singularity is linked with the first singularity of the same type encountered at the adjacent
scale level in a fixed search space around the location at the current level.

Especially in the presence of catastrophes, where the singularities will have a large
horizontal movement between two scale levels, this approach is not as robust as desired.

The first step towards a more healthy approach is to adapt the search region to the local
geometry.

As well as detecting the location of the singularities in scale-space, we can detect their local
drift velocity [Lindeberg1992b, Lindeberg1994a]. This could be used to give an estimate for
a sensible location of the search region.

For a non-degenerate singularity at the point x₀ the singularity will drift according to
∂xᵢ/∂t = −Lᵢⱼ⁻¹ Lⱼₜ. This describes how the coordinates of the singularity change for
increasing scale (where t = σ²). The expression is written in tensor notation, where Lᵢⱼ⁻¹
is the inverse of the Hessian matrix and Lⱼₜ, by the diffusion equation, is half the Laplacian
of the gradient.
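The drift formula can be checked against a closed-form scale-space. A small Python sketch (not in the book; the test function and scale are hand-picked): the Gaussian blur of f(x, y) = x³ − 300x − 24y² at scale t = σ² is L(x, y; t) = x³ + 3xt − 300x − 24y² − 24t (it satisfies ∂L/∂t = ½ΔL), its extremum branch is x(t) = √(100 − t), and the tensor formula reproduces the analytic drift dx/dt = −1/(2√(100 − t)):

```python
import math

# extremum branch of L(x, y; t) = x^3 + 3 x t - 300 x - 24 y^2 - 24 t,
# from L_x = 3 x^2 + 3 t - 300 = 0 and L_y = -48 y = 0
t = 50.0
x_ext = math.sqrt(100.0 - t)

# drift dx_i/dt = -L_ij^{-1} L_jt, with L_jt = (1/2) d_j(Laplacian L)
Lxx, Lxy, Lyy = 6.0 * x_ext, 0.0, -48.0   # Hessian of L at the extremum
gLapx, gLapy = 6.0, 0.0                   # gradient of the Laplacian 6x - 48
det = Lxx * Lyy - Lxy * Lxy
drift_x = -0.5 * ( Lyy * gLapx - Lxy * gLapy) / det   # inverse Hessian, row 1
drift_y = -0.5 * (-Lxy * gLapx + Lxx * gLapy) / det   # inverse Hessian, row 2

analytic = -1.0 / (2.0 * math.sqrt(100.0 - t))  # d/dt of sqrt(100 - t)
print(drift_x, analytic)  # the two agree
print(drift_y)            # no drift in y
```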

As an illustration, we equip the singularities from the singularity strings in figure 15.7 with
drift velocity vectors. The differential operators are derived in Mathematica symbolically
and then substituted with the corresponding discrete image data from the previous example.

Clear[L, x, y]; hessian = {{∂x,x L[x, y], ∂x,y L[x, y]},
    {∂x,y L[x, y], ∂y,y L[x, y]}};

grad = {∂x L[x, y], ∂y L[x, y]};

laplaceGrad = ∂x,x grad + ∂y,y grad;

singDrift = -(1/2) Inverse[hessian].laplaceGrad;

driftField[image_, σ_] :=
  singDrift /. Derivative[dx_, dy_][L_][x, y] :> gD[image, dx, dy, σ];

This is the expression for the drift velocity of the singularity points:

singDrift // shortnotation

{(Lxxy Lxy − Lxxx Lyy − Lxyy Lyy + Lxy Lyyy)/(−2 Lxy² + 2 Lxx Lyy),
 (Lxy (Lxxx + Lxyy) − Lxx (Lxxy + Lyyy))/(−2 Lxy² + 2 Lxx Lyy)}

For each detected singularity we calculate a drift vector. This vector is represented by a
Line graphics object from the singularity point in scale-space coordinates. The drift
velocity vector is a first order estimate of the change in spatial coordinates for a change in
scale level. Close to a fold catastrophe, the singularity string is approximately horizontal.
The estimated drift for a change in scale level is therefore extremely high for the singularity
points detected close to the catastrophe point.

Therefore we crop the vectors - otherwise they would point far out of the view volume.
The extrema and saddle points are picked from the wnSingularities variable calculated
in the previous section. The calculation is done for all singularities from a scale level
simultaneously such that the drift velocity field need only be computed once per scale level.

vectors[im_, τIdx_, extrema_, saddles_] := Module[{},

  σthis = Exp[τmin + τIdx τstep];
  σnext = Exp[τmin + τIdx τstep + τstep];
  drift = driftField[im, σthis] (σnext^2 - σthis^2);
  crop[coordinate_] := Max[Min[coordinate, 1 + (max - min)/step], 1];
  replaceRule =
   {σ_, x_, y_} :> Line[{{x, y, τIdx}, {crop[x + drift[[2, x, y]]],
        crop[y + drift[[1, x, y]]], τIdx + 1}}];
  {Replace[extrema, replaceRule, 1],
   Replace[saddles, replaceRule, 1]}];

We limit the illustration to the two center singularity strings that meet in the fold catastrophe
point. The singularity strings resulting from the cyclic borders representation are unnecessary
clutter (and their drift velocities are somewhat erratic as well due to implementation details).
The selection of the center singularities is done in a simple ad hoc manner:

width = (max - min)/step;

centerOnly[{σ_, x_, y_}] :=
  Abs[y - width/2] < 0.2 width && Abs[x - width/2] < 0.2 width;
centerSing[singList_] := Select[singList, centerOnly];

We compute the drift velocity vectors and attach them to the singularities in the illustration:

driftLines = vectors[im, #, centerSing[wnSingularities[[#, 1]]],
     centerSing[wnSingularities[[#, 2]]]] & /@
   Range[1, Floor[(τmax - τmin)/τstep] + 1];
allExtremaLines = driftLines /.
   {extremaLines_, saddlesLines_} :> extremaLines;
allSaddlesLines = driftLines /.
   {extremaLines_, saddlesLines_} :> saddlesLines;
g3D = Graphics3D[{imagepoints, RGBColor[1, 0, 0],
    Point[#] & /@ index[centerSing[allExtrema]], allExtremaLines,
    RGBColor[0, 0, 1], Point[#] & /@ index[centerSing[allSaddles]],
    allSaddlesLines}, ImageSize -> {440, Automatic}];
Show[g3D, ViewPoint -> {0.045, -3.383, -0.009}, AspectRatio -> 0.3];
Show[g3D, ViewPoint -> {-0.700, -3.034, 1.324}];

Figure 15.8 The singularity strings from figure 15.7 with drift velocity vectors added. The
figure displays the same scale-space from two different angles. The drift velocities of the
singularities could be applied as a first step towards more robust linking across scale. At
the catastrophe point, the singularity string is horizontal in scale-space. The drift velocity
vectors are therefore extremely long for the singularity points just below the catastrophe.
In the illustration they are cropped to fit inside the view.

The drift velocity vectors do indicate that the search region for a linking method could be
refined extensively and thereby make the method much more robust.

However, as the figure suggests, the drift velocity is not ideal for handling the linking in the
presence of catastrophes. Indeed the term is undefined for degenerate singularities. Therefore
more elaborate schemes are necessary to handle these cases.

An appealing approach is to exchange the drift velocity vector for a true tangent vector of
the singularity string curve. This simply implies treating the singularity string as an ordinary
curve in an ordinary n + 1 dimensional space. We would then be able to track the curve
equally well horizontally as vertically in scale-space. However, this is somewhat
complicated and computationally expensive, since we then need to sample the scale levels
arbitrarily depending on the evolution of the string.

Furthermore, there is still the desire for explicit linking through higher order catastrophe
points with more complicated bifurcations than the fold. All in all, linking of singularities is
non-trivial and no simple, general, and robust toolbox method is readily available. However,
for specific applications where only simple catastrophes occur (i.e. fold catastrophes), an
approach where the search region is adapted to the differential structure of the singularity
position could be extremely effective.

15.3.3 Linking of contours

In section 13.5 the reader was challenged to extend the edge focusing technique from 1D to
2D. Instead of linking points across scale the task is to link contours. This is indeed a quite
challenging task.

We will propose no general method. However, it would appear that the approach in section
13.6 on multi-scale segmentation is the reasonable solution. Linking of regions with maximal
overlap is conceptually far simpler than any contour linking scheme.

However, like linking of singularity points, the catastrophe points pose special problems. For
the multi-scale watershed segmentation method, derivation and analysis of the generic
catastrophes reveal that the maximal overlap linking scheme is indeed sensible in the
presence of catastrophes. This is not necessarily true for regions defined in another manner.

15.3.4 Detection of catastrophes

Catastrophe theory reveals the catastrophes that can be expected for a given differential
operator - at least in principle. The introduction to catastrophe theory given in chapter 14 is
not detailed enough to allow the reader to derive the generic events for other differential
expressions. See [Olsen2000] for a more thorough treatment with applications in image
analysis.

Even though we know the generic catastrophes, detecting them in images is not trivial.
Mathematically it is quite simple. For a given image function the catastrophes are located
where Lᵢ = 0 and Det(Lᵢⱼ) = 0 (written in tensor notation). We know that the singularities
are curves in scale-space and the zero-points of the determinant of the Hessian form a
surface. The catastrophes are located where the singularity curves intersect the surface.

These differential expressions can easily be constructed using Gaussian derivatives.


However, we have already seen that forming singularity strings is non-trivial - particularly in
the presence of catastrophe points. Furthermore, the zero-crossings for scale-spaces
corresponding to these expressions will generally not coincide with the pixel grid. This
makes it non-trivial to locate the intersections of the zero-crossings.

In [Kuijper1999] the catastrophes are detected and their locations are determined with sub-pixel
precision. Here the image is locally modelled at each pixel by a third order model. For each
pixel the model then points towards a specific location for a catastrophe. For a small
window, the estimated catastrophe locations are then averaged.

In the vicinity of catastrophes, the estimates are well-aligned and the averaged estimate has
a small standard deviation.

Intuitively, these search windows are then moved across scale-space and catastrophes are
detected where the standard deviation for the catastrophe location is locally minimal.
However, this method is not quite as simple and robust as desired. The method depends on an
arbitrary search window, and the estimated catastrophe must be located within the search
window. Possibly, this makes it particularly sensitive at higher order catastrophes.

15.3.5 General discrete geometry approach

The deep structure toolbox should ideally contain a simple unifying approach. The simple
approach sketched below handles many of the problems regarding linking and detection that
arise from the discretization of data in image analysis applications.

From analysis we define conditions in terms of equations and inequalities. Then we combine
these to get the desired solutions. We simply need to implement this approach for discrete
data. Equations and inequalities would be defined in terms of differential operators. These
are naturally discretized by Gaussian derivatives as described in previous chapters. Zero-
crossings in the discrete data correspond to equations or to separators for inequalities. These
zero-crossings must be determined with sub-pixel precision.

A recipe for handling these issues could be:


a) Express the problem as a set of equations and inequalities using combinations of
differential operators.
b) Transform each equation or inequality such that the solution is expressed in terms of a
zero-crossing.
c) Calculate a discrete "image" volume for each equation and inequality. The involved
differential operators correspond to a discrete volume for a combination of Gaussian
derivatives. These can be computed using the implementations from this book. The zero-
crossings are then computed for each discrete volume. This results in a new discrete volume.
For each voxel it is established whether the values at the vertices give rise to zero-crossings
within the voxel. In this case the zero-crossing surface patch is calculated through
interpolation of the vertex values, and possibly the values at the neighboring voxels' vertices
for a higher order interpolation.

The order of interpolation determines the order of continuity across the voxel borders.
These surface patches for the individual voxels represent the hypersurface that corresponds
to the zero-crossing of the differential operator. The equations are directly represented by
this data volume. The inequalities must additionally represent which parts of the voxels are
"within" and which are "outside" the set that solves the inequality.

d) Combine the equations and inequalities through intersections of the discrete
representations of the zero-crossing hypersurfaces. This can be done voxel-wise.
e) If the problem was originally formulated in terms of equations the result is a discrete
volume that represents a hypersurface. If the problem formulation includes inequalities the
resulting data volume possibly contains a hypersurface or a set enclosed by a hypersurface.

A simple example could be detection of maxima strings in scale-space for a 2D image:


a+b) Maxima are singularities with negative second order structure. There is an inequality
for each of the principal curvature directions.
c) The zero-crossings as discrete volumes arising from these equations and inequalities are
calculated. For each spatial derivative equation we get a surface in scale-space. For each
principal curvature direction the scale-space volume is split in two, where the division is a
surface in scale-space.
d) The two surfaces corresponding to zero-crossings of the spatial derivatives are intersected,
resulting in curve segments. This is done voxel-wise. The curve segment for a voxel will
intersect the voxel in two points located on the faces of the voxel. These curve segments are
intersected with the surfaces enclosing the sets determined by the signs of the second order
structure. For each line segment that crosses such a surface, only the part within the
corresponding set is kept. Now, some curve segments will intersect the faces of their voxel
zero times or one time only. The other end of the line segment will be inside the voxel.
e) The result is a data volume where the singularity strings are explicitly represented as
continuous line segments. Notice that the strings are directly available - no linking is
necessary.

The thoughts above are not to be considered a new unifying approach. They merely state
that such a method would be nice to have in the deep structure toolbox.

The approach sketched above has a major drawback - it is quite cumbersome to implement.
However, it is geometrically simple and topologically consistent. The generality enables
applications beyond detection of singularities and catastrophes.

Points, lines and surfaces are located correctly within the precision of a pixel. The precision
inside a pixel is determined by the interpolation. Simple linear interpolation will do for a
start - more sophisticated interpolations with a theoretically well-founded image model will
provide better precision.
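The sub-pixel localization of zero-crossings by interpolation can be illustrated in 1D. A Python sketch (not from the book): within a cell showing a sign change, the crossing is placed at the linearly interpolated root; for x² − 2 sampled on integers this yields 4/3 ≈ 1.33 as an estimate of √2 ≈ 1.41, and finer sampling or a higher-order image model sharpens the estimate:

```python
def zero_crossings(samples):
    """Locate zero-crossings of a sampled function with sub-sample
    precision by linear interpolation between adjacent samples."""
    crossings = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a == 0.0:
            crossings.append(float(i))
        elif a * b < 0:                        # sign change inside this cell
            crossings.append(i + a / (a - b))  # linearly interpolated root
    return crossings

samples = [x * x - 2.0 for x in range(4)]  # f(x) = x^2 - 2 at x = 0, 1, 2, 3
crossings = zero_crossings(samples)
print(crossings)  # [1.3333...], the linear estimate of sqrt(2)
```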

Such a geometry toolkit, in combination with the implementations of the Gaussian


derivatives from this book, would certainly make the investigation of deep structure simpler.

Task 15.4 Implement a scheme for linking singularity strings that takes
advantage of the drift velocity vector for determining a sensible location and
shape for the search region. Possibly refine the drift velocity vector to handle
singularity strings that are horizontal in scale-space (at catastrophe points). See
[Kuijper1999] for inspiration.

Task 15.5 Implement the deep structure geometry toolbox sketched above and
extend it with functions for visualizing the data. And please send a copy to the
authors of this chapter.

15.4 From deep structure to global structure


The previous sections have introduced a number of methods for analyzing the deep structure
of images. For a specific application, such an analysis could result in a comprehensive
description of the images in terms of singularity strings and catastrophe points for a number
of differential operators.
Such an image description can be considered a simplified and possibly sparse representation
of the original image where certain features have been selected as the salient ones. However,
it is not only a simplification - it is also a specification. Where the original image contains
implicit information, this image description contains explicit information about these features.

An example of such a description is the primal sketch [Lindeberg1994a]. Here analysis of the
blobs in scale-space provides the description of the image.

The purpose of this final section of this chapter is to outline a number of applications that
could take advantage of such an image description.

15.4.1 Image representations

In [Johansen1986] it was shown that the toppoints (the fold catastrophe points) provide a
complete description of a 1D signal. Even though reconstruction of the original image from
this description is complicated, it does give an indication of the descriptive power
of the deep structure elements.

The purpose of an image description based on the singularity strings and catastrophe points
in scale-space for a set of differential operators will in general not be to offer a complete
description from which the original image can be exactly reconstructed. The focus is rather
to accomplish a simplified description that captures the qualitative properties of the image.
The desired qualitative properties are defined by the application task of the image
representation.

The field of information theory is central in the selection of the relevant differential
operators. The set should be large enough to represent the image with sufficient precision.
However, in order to provide a good simplification, the set should be minimized. The
applications all rely on a powerful simplification. A way to formalize this is the minimum
description length (MDL) principle [Barron1998, Rissanen1978], a statistical inference
principle. It says that among various possible stochastic models (or model classes) for a data
sequence, one should select the model that yields the shortest code, taking into account also
the bits needed to describe the model (model class) that has been used for the encoding.

Therefore it is central to establish the minimal set of differential descriptors that allows
representation of the information-bearing part of the image.

This could be defined in terms of mathematics or from a more "soft" approach where an
image representation is measured in terms of the ability to reconstruct an image that
corresponds well with the human perception of the original image. An obvious application
for such an image representation is image compression.

Below, a number of other application areas are listed that can take advantage of image
representations based on deep structure analysis.

15.4.2 Hierarchical pre-segmentation

Kuijper et al. [Kuijper2001a] discovered that the scale-space surface through a so-called
scale-space saddle, defined by the simultaneous vanishing of the gradients in the spatial and
the scale direction (thus also of the Laplacian), leads to a remarkable pre-segmentation of the
image without prior knowledge. In 2D images the saddle points are the points where the
characteristic isophotes cross (the only isophotes that intersect themselves go through saddle
points). They form in this way natural 'separatrices' of image structure. This works in a
similar fashion in the deep structure: the scale-space saddle points are the only points where
the 'hulls' of iso-intensity surfaces in scale-space touch each other. The following shows the
surface through the saddle of a scale-space germ (for details see [Kuijper2002b] and
[Florack2000b]), using the interactive OpenGL 3D viewer for Mathematica (see
phong.informatik.uni-leipzig.de/~kuska/mathgl3dv3/):

<< MathGL3d`OpenGLViewer`;

a =.; f = x^3 + 6 x t + a (2 t + y^2);

The saddle occurs for {x, y, t} = {-a/3, 0, -a^2/18}, with intensity -a^3/27:

sol = Solve[{D[f, x] == 0, D[f, y] == 0, D[f, t] == 0}, {x, y, t}] // Flatten

{x -> -(a/3), y -> 0, t -> -(a^2/18)}

saddleintensity = f /. sol

-(a^3/27)

MVClear[]; a = .5;
MVContourPlot3D[f, {x, -4, 4}, {y, -4, 4}, {t, -3, 3},
  Contours -> {N[saddleintensity]}, PlotPoints -> 50, ImageSize -> 150];

Figure 15.9 Iso-intensity contours for a scale-space saddle. In a scale-space saddle the
gradient and Laplacian are zero. They form the only points in deep structure where
isophote surfaces touch.
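
The saddle conditions can be cross-checked outside Mathematica as well; a small pure-Python sketch, with the derivatives of the germ written out by hand:

```python
def f(x, y, t, a):
    """The scale-space germ f = x^3 + 6 x t + a (2 t + y^2)."""
    return x**3 + 6*x*t + a*(2*t + y**2)

def derivatives(x, y, t, a):
    fx = 3*x**2 + 6*t      # df/dx
    fy = 2*a*y             # df/dy
    ft = 6*x + 2*a         # df/dt, which indeed equals the Laplacian fxx + fyy
    return fx, fy, ft

a = 1.0
x, y, t = -a/3, 0.0, -a**2/18          # the scale-space saddle
fx, fy, ft = derivatives(x, y, t, a)
print(fx, fy, ft, f(x, y, t, a))       # gradient vanishes; intensity is -a^3/27
```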

15.4.3 Perceptual grouping

Points and regions in an image provide local contributions to the overall information of the
image. The human visual system does an excellent job of combining these local inputs into a
comprehensive scene description [Elder1998]. The task of combining local information
indicators into a consistent non-local description is called perceptual grouping.

An example is the task of grouping the individual pixels into regions that correspond to the
objects in the image. Another example is the task of connecting local edge segments into a
complete edge contour.

The deep structure provides a tool for transforming the local features into non-local entities.
The signature of a signal gives a simple illustration of this (see figure 13.6). Whenever two
singularity strings meet in a fold catastrophe point, a non-local entity is formed: an edge
pair. Each positive edge is associated with a negative edge. The two singularity strings
define a region enclosed by the edge pair in the original image. The scale at which the edge
pair is annihilated gives an upper bound for the scale of the structures within the enclosed
region. An edge is essentially a local feature. However, when the deep structure of the edge
is studied it offers non-local information.
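
This edge-pair behaviour is easy to observe numerically. A minimal Python/NumPy sketch (a toy two-bump signal of our own choosing, not the book's implementation) counts the extrema of a 1D signal under increasing Gaussian blur; at coarse scale the extremum pair belonging to the smaller bump has annihilated:

```python
import numpy as np

def gaussian_blur(signal, sigma):
    """Blur with a (truncated, normalized) sampled Gaussian kernel."""
    x = np.arange(-3 * sigma, 3 * sigma + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return np.convolve(signal, g / g.sum(), mode='same')

def count_extrema(signal):
    d = np.diff(signal)
    d = d[d != 0]                        # ignore exactly flat steps
    return int(np.sum(d[:-1] * d[1:] < 0))

x = np.arange(256, dtype=float)
signal = np.exp(-(x - 100)**2 / 50.0) + 0.6 * np.exp(-(x - 140)**2 / 30.0)

counts = [count_extrema(gaussian_blur(signal, s)) for s in (1.0, 8.0, 32.0)]
print(counts)   # two maxima and the minimum in between merge into one blob
```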

Another example from this chapter is the grouping of the watershed catchment basins
defined by the deep structure of the gradient magnitude.

Perceptual grouping from deep structure has been pursued by a number of researchers. The
blob inclusion hierarchy of the primal sketch defines multi-scale grouping [Lindeberg1994a].
Grouping and detection of elongated structures can also be achieved through analysis of the
deep structure. An example of this is [Staal1999a].

15.4.4 Matching and registration

Matching is a difficult problem in image analysis. Traditional methods are based on
templates and deformable models, where the correlation between an object prototype and
subwindows of the image is explored. The key point is that the matching is done between an
image and a prototype object image. These methods are inherently fragile to changes in the
lighting conditions and the viewing positions.

A deep structure based matching algorithm would instead do the matching between the deep
structure representations of the images. Since these representations are qualitative
simplifications, matching is theoretically easier. The algorithmic basis is possibly matching
of the deep structure singularity strings, or even simpler, matching of the catastrophe points.
This matching should probably be done in a coarse-to-fine order, where the necessary global
and local deformations needed in order to achieve a match could be recorded. The matching
would then result in not only a measure of the "distance" between the images but also a
description of the transformations needed to map one image into the other. See also recent
results with shape-context operators [Belongie2002].

Algorithms for matching deep structure descriptions could be inspired by the research
performed within the field of graph theory [Shokoufandeh2002, Dickinson1999]. Interesting
and promising work has been done with so-called shocks, singularities in curve evolutions
[Siddiqi1999a], whose concepts are ready to be transferred to scale-space singularities and
catastrophes.

A number of applications have used approaches similar to the outline above. In
[Bretzner1999a] the location and scale-range of features in scale-space is used as the
qualitative model for matching in a tracking algorithm. This approach gives an algorithm
which is quite robust with respect to change of view.

Registration is a task quite similar to matching. In registration the match is known
beforehand. The desired result is a mapping that links points (or feature points) between the
two images. As described above, this mapping can also be achieved from the deep structure
description matching. A coarse-to-fine matching on structural information is applied in
[Massey1999], illustrated by registration of coastlines. The registration is robust with respect
to translation, rotation and scale differences between the images.

15.4.5 Image databases

In medical imaging the rise of the PACS (Picture Archiving and Communication System)
has created large image databases. This creates a natural need for algorithms that allow
searching in these databases. In principle, a matching algorithm allows searching. The
obvious problem is that explicit matching against every single image in a large database
containing tens of thousands of images would not be practically feasible.

Regular databases have indexes which allow search for specific keywords. Image databases
need the same concept where the search is defined in terms of a combination of keywords
and image data. This is by no means a trivial problem.

Indexing implies a sorting of the descriptions. An image description could be sorted based on
the coarse to fine hierarchy of the deep structure description and on the differential features
used in the description. However, the specific sorting is not obvious.

A further complication is that the searching will often not seek matching of complete images.
For many applications the image data used in the search is only a small part of the desired
image.

Indexing based on the coarse to fine approach would possibly allow the search algorithm to
enter the search space directly at the proper scale corresponding to the size of the desired
image object. The problem can then be stated as the matching of a subgraph within a larger
graph.

15.4.6 Image understanding

The listing of applications for a deep structure based image description is by no means
exhaustive. It is only intended as a short appetizer.

The wide variety of basic image analysis problems that can be addressed suggests that future
research within deep structure based image descriptions could have a profound impact on
image analysis in the years to come. Both as a theoretical basis for image understanding and
as the foundation for real world applications.

However, a lot of work needs to be done. A broad impact of deep structure based methods
requires:
- general methods for establishing the optimal differential descriptors;
- clarification of which deep structure descriptors, in terms of properties of singularities and
catastrophes, should be used in the image representation;
- development of algorithms for matching, indexing and searching these deep structure
descriptions;
- simple, intuitive formulations and applications of the theoretical achievements, in order to
make them available to the general computer vision and image analysis community.

15.5 Summary of this chapter


Interesting results from vectorfield analysis can be applied to the analysis of singularities in
scale-space. We discussed the homotopy number, which is also called the winding number for
2D and 3D images. A vector (e.g. the gradient) makes no net rotation when a closed contour
path is followed around a regular point, but it rotates once around a maximum or minimum,
and rotates once backwards around a saddle point. The number of rotations (over 2π) is
defined as the winding number. For 3D images this is equivalent to the number of complete
space angles (over 4π).
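
The winding number itself is straightforward to compute numerically. A Python sketch (hypothetical test functions with analytically written-out gradients, not the book's code) integrates the angle increments of the gradient along a closed circle:

```python
import math

def winding_number(grad, cx, cy, radius=1.0, n=360):
    """Sum the wrapped angle increments of the gradient along a circle
    around (cx, cy) and divide by 2*pi."""
    total, prev = 0.0, None
    for k in range(n + 1):
        t = 2 * math.pi * k / n
        gx, gy = grad(cx + radius * math.cos(t), cy + radius * math.sin(t))
        ang = math.atan2(gy, gx)
        if prev is not None:
            d = ang - prev
            while d <= -math.pi:
                d += 2 * math.pi
            while d > math.pi:
                d -= 2 * math.pi
            total += d
        prev = ang
    return round(total / (2 * math.pi))

minimum = lambda x, y: (2 * x, 2 * y)    # gradient of L = x^2 + y^2
saddle  = lambda x, y: (2 * x, -2 * y)   # gradient of L = x^2 - y^2
print(winding_number(minimum, 0, 0), winding_number(saddle, 0, 0))  # 1 -1
```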

The singularities' winding numbers add up when contained together within the closed path.
The vector of their velocity through scale-space can be calculated.

The number of singularities decreases exponentially with scale, with the coefficient of the
log-log slope (Hausdorff dimension) roughly equivalent to minus the spatial dimension.

Catastrophe theory is a mature field describing the events that can occur while evolving the
image. Extrema always annihilate with saddle points; in dimensions higher than one,
creations can occur generically. A new field is emerging, where the topological, hierarchical
image structure can be exploited. The study of scale-space singularities and catastrophes is
the beginning hereof. The development of a 'deep structure toolbox' is an emerging and
promising field in fundamental multi-scale computer vision research.

Such a hierarchical representation is the natural input for the application of graph theory for
the study of topological relations. The toppoint hierarchical tree might be investigated for
atlas-supported segmentation, object search in images and image content based search and
retrieval (e.g. x-y-σ triples as input for active shape models), as well as logical filtering,
i.e. the removal of a subtree to remove a structural element from an image (e.g. trainee
radiologists learn at an early stage in X-thorax reading: "think away the ribs").

Not only the detection of scale-space toppoints is important, the reconstruction from the
toppoints into an image again is just as important. Recently it has been shown that such a
reconstruction is feasible [Nielsen2001a].
16. Deblurring Gaussian blur
16.1 Deblurring
To discuss an application where really high order Gaussian derivatives are applied, we study
the deblurring of Gaussian blur by inverting the action of the diffusion equation, as originally
described by Florack et al. [Florack1994b, TerHaarRomeny1994a].

Gaussian degradation, as blurring with a Gaussian kernel is also coined, occurs in a large
number of situations. E.g. the point-spread-function of the human lens has a close to
Gaussian shape (for a 3 mm pupil its standard deviation is about 2 minutes of arc); the
atmospheric turbulence blurs astronomical images in a Gaussian fashion; and the thickness
profile of thin slices made by modern spiral-CT scanners is about Gaussian, leading to
Gaussian blur in a multiplanar reconstructed image such as in the sagittal or coronal plane.
Surely, deblurring is of immediate importance for image restoration.

Due to the central limit theorem, which states that a concatenation of any type of
transformation tends to a Gaussian shape when the number of sequential transformations
goes to infinity, many physical processes involving sequential local degradations show a
close-to-Gaussian blurring.
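
This can be illustrated numerically: a small Python/NumPy sketch (an illustration, not from the book) convolves a box kernel with itself repeatedly and watches the excess kurtosis, which is zero for a Gaussian, approach zero:

```python
import numpy as np

def excess_kurtosis(p):
    """Excess kurtosis of a discrete probability mass function p on 0..len(p)-1."""
    x = np.arange(len(p), dtype=float)
    mean = np.sum(x * p)
    var = np.sum((x - mean)**2 * p)
    m4 = np.sum((x - mean)**4 * p)
    return m4 / var**2 - 3.0

box = np.ones(5) / 5.0          # a 5-point uniform (box) kernel
p = box.copy()
kurtoses = []
for _ in range(4):
    kurtoses.append(excess_kurtosis(p))
    p = np.convolve(p, box)     # one more blurring step
print([round(k, 3) for k in kurtoses])   # climbs toward 0, the Gaussian value
```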

There is an analytical solution for the inversion of Gaussian blur, but the reconstruction can
never be exact. Many practical solutions have been proposed, involving a variety of
enhancing filters (e.g. high-pass or Wiener) and Fourier methods. Analytical methods have
been proposed by Kimia, Hummel and Zucker [Kimia1986, Kimia1993, Hummel1987] as
well as Réti [Réti1995a]. They replaced the Gaussian blur kernel by a highly structured
Toeplitz matrix and deblurred the image by the analytical inverse of this matrix. Martens
deblurred images with polynomial transforms [Martens1990].

16.2 Deblurring with a scale-space approach


If we consider the stack of images in the scale-space, we see the images gradually blur when
we increase the scale. Indeed, the diffusion equation ∂L/∂t = ∂²L/∂x² + ∂²L/∂y² tells us that
the change ∂L in L when we increase the scale t with a small increment ∂t is equal to the
local value of the Laplacian ∂²L/∂x² + ∂²L/∂y². From the early chapters we remember that a
scale-space is infinitely differentiable due to the regularization properties of the observation
process.

A natural step is to look at what happens if we go to negative scales. Due to the continuity we
are allowed to construct a Taylor expansion of the scale-space in any direction, including the
negative scale direction. We create a Taylor series expansion of our scale-space L(x, y, t)
with Mathematica's command Series, e.g. to third order around the point t = 0:
<< FrontEndVision`FEV`;

L =.; Series[L[x, y, t], {t, 0, 3}]

L[x, y, 0] + L^(0,0,1)[x, y, 0] t + 1/2 L^(0,0,2)[x, y, 0] t^2 +
  1/6 L^(0,0,3)[x, y, 0] t^3 + O[t]^4

The derivatives with respect to t are recognized as e.g. L^(0,0,1). It is not possible to directly
calculate the derivatives to t. But here the diffusion equation rescues us. We can replace the
derivative of the image to scale with the Laplacian of the image, and that can be computed by
application of the Gaussian derivatives on the image. Higher order derivatives to t have to be
replaced by the repeated Laplacian operator. E.g. the second order derivative to t has to be
replaced by the Laplacian of the Laplacian. To shorten our notation, we define Δ to be the
Laplacian operator:

Δ = (D[#, x, x] + D[#, y, y]) &;

Here the construct of a 'pure function' in Mathematica is used: e.g. (#^3)& is a function
without a name that raises its argument to the third power. The repeated Laplacian operator is
made with the function Nest:

Nest[f, x, 3]

f[f[f[x]]]

We now look for each occurrence of a derivative to t. This is the term L^(0,0,n_)[x, y, 0],
in full form Derivative[0, 0, n_][L][x, y, 0], where n_ is anything, named n, the order of
differentiation to t (the underscore _ or Blank[] is the Mathematica representation for any
single expression). With Mathematica's powerful technique of pattern matching (/. is the
ReplaceAll operator) we replace each occurrence of such a term by an n-times-nested
Laplacian operator as follows:

expr = Normal[Series[L[x, y, t], {t, 0, 3}]] /.
    Derivative[0, 0, n_][L][x, y, 0] :> Nest[Δ, L[x, y, 0], n]

L[x, y, 0] + t (L^(0,2,0)[x, y, 0] + L^(2,0,0)[x, y, 0]) +
  1/2 t^2 (L^(0,4,0)[x, y, 0] + 2 L^(2,2,0)[x, y, 0] + L^(4,0,0)[x, y, 0]) +
  1/6 t^3 (L^(0,6,0)[x, y, 0] + 3 L^(2,4,0)[x, y, 0] + 3 L^(4,2,0)[x, y, 0] + L^(6,0,0)[x, y, 0])

To make the formulas better readable we apply the function shortnotation (defined in
chapter 6, section 5), which replaces the formal notation of the derivatives by a short form
expressed in a (luminance) function L with appropriate subscripts through pattern matching:

expr // shortnotation

L[x, y, 0] + t (Lxx + Lyy) + 1/2 t^2 (Lxxxx + 2 Lxxyy + Lyyyy) +
  1/6 t^3 (Lxxxxxx + 3 Lxxxxyy + 3 Lxxyyyy + Lyyyyyy)
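
The bookkeeping of the repeated Laplacian can be cross-checked with a few lines of Python (derivative multi-indices as dictionary keys; an independent sketch, not the book's code), reproducing the binomial coefficients 1, 3, 3, 1 of the third repeated Laplacian:

```python
def laplacian(terms):
    """Apply d2/dx2 + d2/dy2 to a sum of derivative terms {(nx, ny): coeff}."""
    out = {}
    for (nx, ny), c in terms.items():
        out[(nx + 2, ny)] = out.get((nx + 2, ny), 0) + c
        out[(nx, ny + 2)] = out.get((nx, ny + 2), 0) + c
    return out

terms = {(0, 0): 1}          # start from L itself
for _ in range(3):           # the third repeated Laplacian
    terms = laplacian(terms)
print(terms)   # contains (6,0):1, (4,2):3, (2,4):3, (0,6):1
```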

High orders of spatial derivatives appear. The highest order in this example is 6, because we
applied the Laplacian operator 3 times, and the Laplacian itself is a second order operator.
With Mathematica we now have the machinery to make Taylor expansions to any order, e.g.
to 5:

expr = Normal[Series[L[x, y, t], {t, 0, 5}]] /.
    Derivative[0, 0, n_][L][x, y, 0] :> Nest[Δ, L[x, y, 0], n];
expr // shortnotation

L[x, y, 0] + t (Lxx + Lyy) + 1/2 t^2 (Lxxxx + 2 Lxxyy + Lyyyy) +
  1/6 t^3 (Lxxxxxx + 3 Lxxxxyy + 3 Lxxyyyy + Lyyyyyy) +
  1/24 t^4 (Lxxxxxxxx + 4 Lxxxxxxyy + 6 Lxxxxyyyy + 4 Lxxyyyyyy + Lyyyyyyyy) +
  1/120 t^5 (Lxxxxxxxxxx + 5 Lxxxxxxxxyy + 10 Lxxxxxxyyyy + 10 Lxxxxyyyyyy +
     5 Lxxyyyyyyyy + Lyyyyyyyyyy)

No matter how high the order of differentiation, the derivatives can be calculated using the
multi-scale Gaussian derivatives. So, as a final step, we replace by pattern matching (/.) the
spatial derivatives in the formula above by Gaussian derivatives (HoldForm assures we see
just the formula for gD[], of which the evaluation is 'held'; ReleaseHold removes the hold):

corr =
  expr /. Derivative[n_, m_, 0][L][x, y, 0] :> HoldForm[gD[im, n, m, 1]]

t (gD[im, 0, 2, 1] + gD[im, 2, 0, 1]) +
  1/2 t^2 (gD[im, 0, 4, 1] + 2 gD[im, 2, 2, 1] + gD[im, 4, 0, 1]) +
  1/6 t^3 (gD[im, 0, 6, 1] + 3 gD[im, 2, 4, 1] + 3 gD[im, 4, 2, 1] + gD[im, 6, 0, 1]) +
  1/24 t^4 (gD[im, 0, 8, 1] + 4 gD[im, 2, 6, 1] + 6 gD[im, 4, 4, 1] +
     4 gD[im, 6, 2, 1] + gD[im, 8, 0, 1]) +
  1/120 t^5 (gD[im, 0, 10, 1] + 5 gD[im, 2, 8, 1] + 10 gD[im, 4, 6, 1] +
     10 gD[im, 6, 4, 1] + 5 gD[im, 8, 2, 1] + gD[im, 10, 0, 1]) + L[x, y, 0]

Because we deblur, we take for t a negative value, determined by the estimated amount of
blurring σ_est we expect we have to deblur. However, applying Gaussian derivatives on our
image increases the inner scale with the scale of the applied operator, i.e. it necessarily blurs
the image a little. So, if we calculate our repeated Laplacians at, say, scale σ_operator = 4, we
need to deblur the effect of both blurrings. Expressed in t, the total deblurring 'distance'
amounts to t_deblur = (σ_est² + σ_operator²)/2.

We assemble our commands in a single deblurring command which calculates the amount of
correction to be added to an image to deblur it:

deblur[im_, σest_, order_, σ_] := Module[{expr},
  Δ = (D[#, x, x] + D[#, y, y]) &;
  expr = Normal[Series[L[x, y, t], {t, 0, order}]] /.
      Derivative[0, 0, n_][L][x_, y_, t_] :> Nest[Δ, L[x, y, t], n] /.
      t -> -((σest^2 + σ^2) / 2);
  Drop[expr, 1] /. Derivative[n_, m_, 0][L][x, y, t_] :> HoldForm[gD[im, n, m, σ]]]

and test it, e.g. for first order:

im =.; deblur[im, 2, 1, σ]

1/2 (-4 - σ^2) (gD[im, 0, 2, σ] + gD[im, 2, 0, σ])
280 16.2 Deblurring with a scale-space approach

It is a well-known fact in image processing that subtraction of the Laplacian (times some
constant depending on the blur) sharpens the image. We see here that this is nothing but
the first order result of our deblurring approach using scale-space theory. For higher
order deblurring the formulas get more complicated and higher derivatives are involved:

deblur[im, 2, 3, σ]

1/2 (-4 - σ^2) (gD[im, 0, 2, σ] + gD[im, 2, 0, σ]) +
  1/8 (-4 - σ^2)^2 (gD[im, 0, 4, σ] + 2 gD[im, 2, 2, σ] + gD[im, 4, 0, σ]) +
  1/48 (-4 - σ^2)^3 (gD[im, 0, 6, σ] + 3 gD[im, 2, 4, σ] + 3 gD[im, 4, 2, σ] + gD[im, 6, 0, σ])
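
The first-order rule, subtract t times the Laplacian, can also be verified numerically outside Mathematica. A Python/NumPy sketch (a periodic 1D test signal and parameters of our own choosing) applies exact Gaussian blur in the Fourier domain and then the first-order correction:

```python
import numpy as np

n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
signal = np.sin(2 * np.pi * 3 * x)

sigma = 0.02                 # blur scale, in x units
t = sigma**2 / 2             # corresponding diffusion 'time'

# Gaussian blur applied exactly in the Fourier domain (periodic signal)
omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
blurred = np.real(np.fft.ifft(np.fft.fft(signal) * np.exp(-0.5 * (sigma * omega)**2)))

# first-order step toward negative scale: subtract t times the Laplacian
dx = 1.0 / n
laplacian = (np.roll(blurred, -1) - 2 * blurred + np.roll(blurred, 1)) / dx**2
deblurred = blurred - t * laplacian

err_blur = float(np.max(np.abs(blurred - signal)))
err_deb = float(np.max(np.abs(deblurred - signal)))
print(err_blur, err_deb)     # the corrected signal is much closer to the original
```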

We generate a test image blurred with σ = 2 pixels:

im = Import["mr128.gif"][[1, 1]]; DisplayTogetherArray[
  ListDensityPlot /@ {im, blur = gDf[im, 0, 0, 2]}, ImageSize -> 360];

Figure 16.1 Input image for deblurring, blurred at σ = 2 pixels, resolution 128².

We try a deblurring for orders 4, 8, 16 and 32 (figure 16.2): a good result. Compare with
figure 16.1. Mathematica is reasonably fast: the deblurring to 32nd order involved
derivatives up to order 64 (!), in a polynomial containing 560 calls to the gD derivative
function.

The 4 calculations together take somewhat more than one minute for a 128² image on a 1.7
GHz 512 MB Pentium 4 under Windows 2000 (the 32nd order case took 50 seconds). The
following counts the occurrences of gD in the 32nd order deblur polynomial, i.e. how many
actual convolutions were needed:

dummy =.; Length[Position[deblur[dummy, 2, 32, 4], gD]]

560

Remove[p];
p[i_] := ListDensityPlot[blur + ReleaseHold[deblur[blur, 2, i, 4]],
   PlotLabel -> "order = " <> ToString[i]];
DisplayTogetherArray[{{p[4], p[8]}, {p[16], p[32]}}, ImageSize -> 450];

Figure 16.2 Deblurring of a blurred image (128² pixels, σ_blur = 2 pixels) with
different orders of approximation. The 32nd order (bottom right) result comes close to the
original (figure 16.1, left).

16.3 Less accurate representation, noise and holes


The method is reasonably robust to the accuracy or representation of the data. Of course, it is
essential to retain as much information as possible during the blurring process. Close to
precise representation (as high precision real floating point numbers) was the case in the
above example. When we store the image to disk as a typical unsigned byte per pixel
representation, we throw away much information. We can study the effect of such round-off
by rounding each pixelvalue of the blurred image (making them integers), and do the same
deblurring again:

roundedblur = Round[blur]; Block[{$DisplayFunction = Identity},
  p = Table[corr = deblur[roundedblur, 2, 2^i, 4] // ReleaseHold;
    ListDensityPlot[roundedblur + corr,
     PlotLabel -> "order = " <> ToString[2^i]], {i, 2, 5}]];
Show[GraphicsArray[Partition[p, 2]], ImageSize -> 450];

Figure 16.3 Deblurring results when the blurred image is stored as integers (intensity range
of this particular image is [2-186]). Note that only the deblur results are shown. The
deblurring order is indicated with each result.

Clearly the deblurring now fails for the very high orders, but the results are still good up to
16th order.

Noise is a disaster. When we add Gaussian distributed noise with zero mean and a standard
deviation of 5 intensity units, we get the following results:

<< Statistics`ContinuousDistributions`;
noisyblur = blur + Table[Random[NormalDistribution[0, 5]], {128}, {128}];
Block[{$DisplayFunction = Identity},
  p1 = ListDensityPlot[noisyblur, PlotLabel -> "noisyblur"];
  p2 = Table[corr = deblur[noisyblur, 2, 2^i, 4] // ReleaseHold;
    ListDensityPlot[noisyblur + corr,
     PlotLabel -> "order = " <> ToString[2^i]], {i, 2, 4}]];
Show[GraphicsArray[Prepend[p2, p1]], ImageSize -> 470];

Figure 16.4 Deblurring results when the blurred image is disturbed by Gaussian additive
noise (mean = 0, σ_intensity = 5). The deblurring order is indicated with each result.

And to conclude, we study the effect of 25 random pixels being 'blanked out', i.e. set to zero:

coords = Table[Random[Integer, {1, 128}], {50}, {2}];
holesblur = ReplacePart[blur, 0, coords];
Block[{$DisplayFunction = Identity},
  p1 = ListDensityPlot[holesblur];
  p2 = Table[corr = deblur[holesblur, 2, 2^i, 4] // ReleaseHold;
    ListDensityPlot[holesblur + corr,
     PlotLabel -> "order = " <> ToString[2^i]], {i, 2, 4}]];
Show[GraphicsArray[Prepend[p2, p1]], ImageSize -> 470];

Figure 16.5 Deblurring results when the blurred image is disturbed by setting a random
selection of 25 pixels to zero. The deblurring order is indicated with each result. Note that up
to order 8 the 'blanked' points reconstruct well. At order 16 an overshoot occurs.

Task 16.1 Experiment with deblurring images that are blurred with another kernel than the
Gaussian.

Task 16.2 Experiment with blurred images from an external source, e.g. find
unsharp speed ticket camera images on the internet, digitize your unsharp
home pictures, etc.

Task 16.3 Motion blur may be simulated with anisotropic Gaussian blur, i.e. where the σ is
rather different for the x and y direction. It may also be at any angle (see also chapter 19
where we discuss Gaussian kernels at arbitrary directions). Make such a blurred test image,
and come up with a deblurring scheme for it.

Task 16.4 In chapter 21 we discuss nonlinear diffusion equations. After having studied this
chapter, it is interesting to consider how these nonlinear diffusion equations might be
applied in the framework presented in this chapter, and what the type of degradation is.

16.4 Summary of this chapter


The regularization property of the Gaussian kernel makes the scale-space continuous, which
means infinitely differentiable, in both the spatial and the scale domain. It was proposed by
Florack to expand the scale-space of a blurred image into the negative scale direction by
means of a Taylor expansion. The high order derivatives to scale in this expansion can be
expressed in spatial Laplacians of the image, due to the constraint of the isotropic diffusion
equation. Mathematica turns out to be an efficient tool to do the analytic calculations of the
high order Taylor expansion polynomial, in which the derivatives can be replaced by scaled
Gaussian derivatives. We showed some examples to really high order.

Deblurring is unstable, and can only be carried out analytically when no data is lost, for
example through finite intensity representation (8 bit), noise or other pixel errors. The
message of this chapter is that taking very high order derivatives is feasible, and that
computer algebra is a suitable means for implementing these calculations; deblurring from
Gaussian blur serves as the example.
17. Multi-scale optic flow
Bart ter Haar Romeny, Luc Florack, Avan Suinesiaputra

17.1 Introduction
In this chapter we focus on the quantitative extraction of small differences in an image
sequence caused by motion, and in an image pair caused by differences in depth. We would
like to extract the local motion parameters as a small local shift over time or space. We call
the resulting vectorfield the optic flow of the image sequence, a spatio-temporal feature, and
for the stereo pair we call the resulting vectorfield the disparity map. As the application of
the method described in this chapter is virtually the same for stereo disparity extraction, we
will focus in the treatment on spatio-temporal optic flow.

<< FrontEndVision`FEV`;
PlotVectorField3D[{y, -x, Sin[z]}, {x, -1, 1},
  {y, -1, 1}, {z, 0, 2}, VectorHeads -> True, ImageSize -> 270];

Figure 17.1 An example of a 3D (optic) flowfield. Such flowfields are used in the study of 3D
motion of e.g. the heart from 3D-time magnetic resonance imaging (MRI) sequences.

We need to measure a displacement of something over some distance, in some amount of
time, in some direction, in order to find the vector starting at each point in the image that
indicates where this point is moving to. Of course, we need to take into consideration that we
are doing a physical measurement, so we need to apply the scale-space paradigm, and
secondly we need to consider how constant our 'structure' is when it moves.

Many approaches have been proposed to solve the problem of finding the optic flow field of
an image sequence. Four major classes of optic flow computation techniques can be
discriminated (see Beauchemin and Barron [Beauchemin1995] for a good overview):
- gradient-based (or differential) methods;
- phase-based (or frequency domain) methods;
- correlation-based (or area) methods;
- feature-point (or sparse data) tracking methods.

In this chapter we compute the optic flow as a dense optic flow field with a multi-scale
differential method. The method, originally proposed by Florack and Nielsen [Florack1998a],
is known as the Multi-scale Optic Flow Constraint Equation (MOFCE). This is a scale-space
version of the well known computer vision implementation of the optic flow constraint
equation, as originally proposed by Horn and Schunck [Horn1981]. This scale-space
variation, as usual, consists of the introduction of the aperture of the observation into the
process. The application to stereo has been described by Maas et al. [Maas1995a, Maas1996a].
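
For orientation, the classical single-scale optic flow constraint Ix u + Iy v + It = 0 can already be solved in least squares over a window. The Python/NumPy sketch below (a Lucas-Kanade style estimate on a synthetic blob, an illustration, not the MOFCE itself) recovers a known sub-pixel shift:

```python
import numpy as np

n, s = 64, 6.0
yy, xx = np.mgrid[0:n, 0:n].astype(float)
blob = lambda cx, cy: np.exp(-((xx - cx)**2 + (yy - cy)**2) / (2 * s**2))

frame1 = blob(30.0, 32.0)
frame2 = blob(30.5, 32.0)        # true flow (u, v) = (0.5, 0)

# spatial derivatives by central differences, temporal by a frame difference
Ix = (np.roll(frame1, -1, axis=1) - np.roll(frame1, 1, axis=1)) / 2.0
Iy = (np.roll(frame1, -1, axis=0) - np.roll(frame1, 1, axis=0)) / 2.0
It = frame2 - frame1

# least-squares solution of Ix*u + Iy*v + It = 0, window = the whole image
A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
u, v = np.linalg.solve(A, b)
print(round(u, 3), round(v, 3))  # close to (0.5, 0)
```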

Of course, difficulties arise when structure emerges or disappears, such as with occlusion,
cloud formation etc. Then knowledge is needed about the processes and objects involved. In
this chapter we focus on the scale-space approach to the local measurement of optic flow, as
we may expect the visual front-end to do.

17.2 Motion detection with pairs of receptive fields


As a biologically motivated start, we begin with discussing some neurophysiological findings
in the visual system with respect to motion detection. A popular model, based on
physiological data involving spatiotemporal receptive fields, is the Reichardt detector, which
models temporally and spatially coupled pairs of receptive fields.

pt = zero = Table[50, {64}, {64}]; pt[[32, 32]] = 100;
rf = gD[pt, 2, 0, 10] + gD[pt, 0, 2, 10]; rf = rf - Min[rf];
Block[{$DisplayFunction = Identity, xres, yres, max},
  {yres, xres} = Dimensions[rf]; max = Max[rf];
  rfleft = Graphics3D[
    ListPlot3D[zero, Map[GrayLevel, rf / max, {2}], Mesh -> False]];
  rfright = TranslateShape[rfleft, {75, 0, 0}];
  cube = Graphics3D[Cuboid[{59, 22, 10}, {79, 42, 30}]];
  sphere =
    TranslateShape[Graphics3D[Sphere[13, 25, 20]], {107, 32, 20}];
  rightarrow = Graphics3D[{Thickness[.01],
     arrow3D[{107, 32, 20}, {30, 0, 0}, True]}];
  lines = Graphics3D[{Thickness[.01],
     Line[{{32, 32, 50}, {32, 32, 20}, {107, 32, 20}, {107, 32, 50}}]}];];

Show[{rfleft, rfright, cube, sphere, lines, rightarrow},
  ViewPoint -> {0.4, -5, .8},
  DisplayFunction -> $DisplayFunction, Boxed -> False];

Figure 17.2 A simple model for a retinal Reichardt detector for the detection of motion. Two
center-surround receptive fields are separated by a center-to-center span of length d. These
receptive fields both project to the (spherical) output ganglion cell, the right one directly, the
left one through an intermediate cell (depicted as a cube) which incorporates a small temporal
delay τ. The ganglion cell has the highest chance of firing an action potential when the
velocity of the object is v = d/τ.

When motion needs to be detected directly, the displacement over space in a given amount of
time has to be measured. One of the strategies of the front-end visual system seems to be to
design a detector for every occurrence of a stimulus parameter. For the detection of motion a
famous proposal was done by prof. Werner Reichardt in the late fifties during his studies on
fly motion detection. He proposed a simple but very effective correlation type model
consisting of a velocity and directionally tuned pair of receptive fields. Two center-surround
receptive fields in a single eye of the same size, separated by a distance (span) d, project to
the same ganglion cell. The first cell projects through an intermediate cell that introduces a
small temporal delay τ, the second receptive field projects directly to the ganglion. The input
synapses on the ganglion cell create excitatory post-synaptic potentials (EPSPs), small
increases of the intracellular voltage with a short duration (some ms). EPSPs that arrive
simultaneously superimpose, and this summed potential gives a higher chance to reach the
intracellular threshold voltage that leads to an action potential than a single EPSP alone. A
ganglion cell here (as any neuron) thus acts as a temporal coincidence detector. If an object
moving with velocity v passes over the two receptive fields such that the inputs to the
ganglion cell arrive simultaneously, we get optimal detection if v = d/τ. We call v the tuning
velocity of the receptive field pair. Figure 17.2 shows the model.
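
The tuning rule v = d/τ is easy to demonstrate with a minimal correlation model. A Python/NumPy sketch of a Reichardt-type detector (stimulus and parameters are hypothetical) shows the coincidence response peaking at the tuning velocity:

```python
import numpy as np

d, tau, width = 5.0, 2.0, 1.0     # receptor span, delay, stimulus bump width
dt = 0.1
times = np.arange(0.0, 40.0, dt)
tau_steps = int(round(tau / dt))
x_left = 20.0                     # position of the left receptor

def response(v):
    """Delay-and-correlate response to a Gaussian bump moving at velocity v."""
    bump = lambda pos: np.exp(-(pos - v * times)**2 / (2 * width**2))
    r_left, r_right = bump(x_left), bump(x_left + d)
    delayed = np.roll(r_left, tau_steps)
    delayed[:tau_steps] = 0.0                      # causal delay of the left channel
    return float(np.sum(delayed * r_right) * dt)   # coincidence at the ganglion cell

velocities = [1.0, 2.5, 4.0]
responses = [response(v) for v in velocities]
print([round(r, 3) for r in responses])   # maximal at the tuning velocity d/tau = 2.5
```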

The cell pair is tuned to its characteristic velocity and its characteristic direction only. This
means we need an abundance of such pairs to have tuning for all possible velocities and
directions. There are strong indications that we indeed have enormous amounts of such
receptor pairs, tuned for a wide perceptual range of velocities and directions. These motion
sensitive cells are coined the on-off direction selective ganglion cells by Rodieck
[Rodieck1998, pp. 319].

The motion selective ganglion cells are the parasol ganglion cells. They are the larger type of
ganglion cells, and project primarily to the magnocellular layers of the LGN and to the
superior colliculus (having a role in the control of gaze stabilization by eye movements). The
motion selective ganglion cells have a bistratified dendritic field that stretches out into both
the on- and off-sublayers of the inner synaptic layer in the retina [Amthor1989]. It is not (yet)
clear what the physiological mechanism of the delay cell might be. A likely candidate is the
amacrine cell. This cell is located at the proper location (layer) in the retina, often very close
to parasol cells, it has only connections to parasol ganglion cells (see figure 17.3) and it comes
in 30 to 40 varieties (to accommodate the range of delays?). In the rabbit the on-off
direction selective cells are aligned with the eye muscles.

Show[Import["parasol amacrine connections macaque.gif"],
 Frame -> True, FrameTicks -> False, ImageSize -> 250];

Figure 17.3 Amacrine cell processes coupled to a parasol ganglion cell. The processes were
labeled after injection of the ganglion cell. From [Dacey1992], adapted from [Rodieck1998, pp.
263].

Varieties of Reichardt detectors have been proposed. E.g. one can add a delay cell between the
two receptive fields in the opposite direction (in figure 17.2 from the right to the left
receptive field). This results in a bidirectionally responsive motion detector and is called a
generalized Reichardt detector. For the work of Werner Reichardt see the bibliography in
http://www.kyb.tuebingen.mpg.de/re/biblio.html, and the overview papers in
[Reichardt1988b]. See also the extensive work by van de Grind, Koenderink and van Doorn
[vandeGrind1999] and by Sperling and coworkers [Sperling1998] on physiological motion
detection models.

The elegance and simplicity of the Reichardt model is appealing. Many hardware
implementations exist, e.g. as CMOS chips or analog VLSI computing devices. For an
interesting overview see: http://www.klab.caltech.edu/~timmer/tel196/motion_chips.html.

This set of specialized pairs of receptive fields forms a separate channel in the visual pathway
for motion coding. It is again an example of the functional separation seen in the visual front-
end. Another example of paired receptive fields is a disparity pair, where two receptive fields,
one in the left eye and one at or about at the corresponding position in the right eye, form a
pair for the detection of depth and the extraction of the differential structure of depth. This
implies that the theory discussed in this chapter is also applicable to the multi-scale extraction
of stereo disparity and its derivatives (like slant and depth curvature). Again, we find the same
ensemble tuning: for all disparities a dedicated set seems to be available.

17.3 Image deformation by a discrete vectorfield

As an intermezzo, we discuss the use of vector fields in the calculation of image deformation.
For such a warping, the pixels are displaced according to a vectorfield, and the intensity of the
pixels at their new location has to be calculated. This is the inverse process of the extraction of
optic flow. With the understanding of warping we can easily make e.g. test images for optic
flow routines, and generate warped images for image matching. The easiest method is to start
with considering the warped image, and to measure the image intensity at the location where
the pixel came from. This is often between the original pixels, so the image has to be
resampled. Mathematica has the function Interpolation and for discrete data
ListInterpolation, which interpolates a function (of any dimension) to some order
(default is cubic spline interpolation: to third order). This function comes in handy for any
resampling.

Here is the implementation of a warping routine for 2D images given a 2D vectorfield:

Unprotect[deform2D];

deform2D[im_List, vecf_List] :=
  Module[{xdim, ydim, m, imp, newx, newy},
   If[Append[Dimensions[im], 2] != Dimensions[vecf], Break[]];
   {ydim, xdim} = Dimensions[im]; m = Table[0, {ydim}, {xdim}];
   imp = ListInterpolation[im];
   Do[newx = x - vecf[[y, x, 2]]; newy = y - vecf[[y, x, 1]];
    If[1 <= newx <= xdim && 1 <= newy <= ydim,
     m[[y, x]] = imp[newy, newx], m[[y, x]] = 0],
    {y, 1, ydim}, {x, 1, xdim}]; m];

? deform2D

deform2D[im, vecf] deforms a 2D image im according to a
specified discrete vectorfield vecf with the same dimensions.
The image is interpolated to third order with ListInterpolation.
im = 2D input image {ydim,xdim}
vecf = 2D discrete vectorfield {ydim,xdim,2}
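For readers without Mathematica at hand, the same backward-warping idea can be transcribed roughly as follows (a sketch of my own: bilinear instead of cubic spline interpolation, and the function names are not from the book):

```python
def bilinear(im, y, x):
    """Sample image im (a list of rows) at the real-valued position (y, x)."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, len(im) - 1), min(x0 + 1, len(im[0]) - 1)
    fy, fx = y - y0, x - x0
    top = im[y0][x0] * (1 - fx) + im[y0][x1] * fx
    bot = im[y1][x0] * (1 - fx) + im[y1][x1] * fx
    return top * (1 - fy) + bot * fy


def deform2d(im, vecf):
    """Backward warp: each output pixel looks up where it came from."""
    ydim, xdim = len(im), len(im[0])
    out = [[0.0] * xdim for _ in range(ydim)]
    for y in range(ydim):
        for x in range(xdim):
            dy, dx = vecf[y][x]           # displacement (dy, dx) at this pixel
            sy, sx = y - dy, x - dx       # source position in the input image
            if 0 <= sy <= ydim - 1 and 0 <= sx <= xdim - 1:
                out[y][x] = bilinear(im, sy, sx)
    return out
```

As in deform2D above, each output pixel is sampled at its own position minus the displacement vector, and out-of-range source positions yield 0.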

We read a 256² image of a retinal fundus recording and plot the image, the vectorfield and the
resulting warped image below:

im = Import["fundus256.gif"][[1, 1]]; {ydim, xdim} = Dimensions[im];

vecf = Table[-{Sin[2 π x / xdim], Cos[2 π y / ydim]} // N,
   {y, 1, ydim}, {x, 1, xdim}];
DisplayTogetherArray[{ListDensityPlot[im],
   PlotVectorField[-{Sin[2 π x / xdim], Cos[2 π y / ydim]},
    {x, 1, xdim}, {y, 1, ydim}],
   ListDensityPlot[deform2D[im, 10 vecf]]}, ImageSize -> 480];

Figure 17.4 Left: Image of a retina fundus registered with a scanning laser ophthalmoscope
(courtesy T. Berendschot, University Medical Center Utrecht, the Netherlands). Image
resolution 256². Middle: the vectors of a given warping vectorfield. Right: warped image. The
new image intensities are sampled from the (cubic spline) interpolated image at a position
dislocated over the distance of the vector. To enhance the effect, the lengths of the vectors in
the vectorfield are multiplied by 4.

Task 17.1 Modify the warping routine so you get a function for:
- image rotation over an arbitrary angle;
- zooming in and zooming out (scaling of the image);
- affine transformation, like viewing the sheet from an oblique angle; this is often
called 'perspective horizontal' or 'perspective vertical', or combinations thereof;
- spiral deformation;
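As a starting point for the rotation part of the task (a formulation of my own, not the book's solution): a rotation by angle θ about the image centre corresponds to the displacement field v̄(x̄) = R_θ(x̄ − c̄) − (x̄ − c̄), which can be fed directly into a warping routine such as deform2D.

```python
import math


def rotation_field(xdim, ydim, theta):
    """Displacement field (dy, dx) per pixel that rotates the image by
    theta around its centre; usable as the vectorfield of a backward warp."""
    cx, cy = (xdim - 1) / 2.0, (ydim - 1) / 2.0
    field = []
    for y in range(ydim):
        row = []
        for x in range(xdim):
            rx, ry = x - cx, y - cy
            # rotated position minus original position = displacement
            dx = (rx * math.cos(theta) - ry * math.sin(theta)) - rx
            dy = (rx * math.sin(theta) + ry * math.cos(theta)) - ry
            row.append((dy, dx))
        field.append(row)
    return field
```

Zooming works the same way with v̄(x̄) = s (x̄ − c̄) − (x̄ − c̄) for a scale factor s, and a spiral deformation combines the rotational and radial parts.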

17.4 The optic flow constraint equation


When we consider structure in an image that moves with time to a new position, we need to
define what we mean with 'structure'. A likely candidate is the local luminance. The classical
approach to the optic flow equation was proposed by Horn [Horn1981] in his famous optic
flow constraint equation (OFCE) for the case of a scalar image sequence, where the total
derivative of the luminance distribution with respect to time is supposed to vanish:
dL(x, y, t)/dt = 0.

Intermezzo: Total derivatives


From the Mathematica Help: When you find the derivative of some expression f with respect
to x, you are effectively finding out how fast f changes as you vary x. Often f will depend
not only on x, but also on other variables, say y and z. The results that you get then depend on
how you assume that y and z vary as you change x. There are two common cases. Either y
and z are assumed to stay fixed when x changes, or they are allowed to vary with x. In a
standard partial derivative ∂f/∂x all variables other than x are assumed fixed. On the other hand,
in the total derivative df/dx all variables are allowed to change with x. In Mathematica,
D[f,x] gives a partial derivative, with all other variables assumed independent of x.
Dt[f,x] gives a total derivative, in which all variables are assumed to depend on x.

Dt[L[x, y, t], t] == 0

L^(0,0,1)[x, y, t] + Dt[y, t] L^(0,1,0)[x, y, t] + Dt[x, t] L^(1,0,0)[x, y, t] == 0

The derivatives dx/dt and dy/dt denote the velocity in the x-direction v^x and the velocity in the y-
direction v^y respectively. We write upper indices for the dimensional component and lower
indices for derivatives with respect to the dimensional component. We get:
∂L/∂t + v^x ∂L/∂x + v^y ∂L/∂y = 0. This OFCE has been the basis for numerous computer vision studies
into optic flow [see also Koenderink1986c, Koenderink1987d, Koenderink1992f (second
order flow), and see Barron1994a for a comparison of different techniques].

There is, however, a difficulty when we start looking at the flow (the scale-space paradigm of
observation): because we use physical apertures of finite size, the observed image is a blurred
version, and the isophote landscape changes with the aperture. When the object comes closer
or moves further away, the same aperture covers different areas of the object. The isophote
landscape changes, as we can see in the following example where we move away from the
image. The isophotes L = 50 of an image observed at scale σ = 1 are compared with the same
isophotes observed at scale σ = 2 at a distance twice as far:

im = Import["mrf4.gif"][[1, 1]]; {im1, im2} = gD[im, 0, 0, #] & /@ {1, 2};

DisplayTogetherArray[{
   Show[{ListDensityPlot[im1],
     ListContourPlot[im1, Contours -> {50}, ContourStyle -> White]}],
   Show[{ListDensityPlot[im2, PlotRegion -> {{0.25, 0.75}, {0.25, 0.75}}],
     ListContourPlot[im2, Contours -> {50}, ContourStyle -> White,
      PlotRegion -> {{0.25, 0.75}, {0.25, 0.75}}]}]}, ImageSize -> 265];

Figure 17.5 The isophote landscape of an image changes drastically when we change our
aperture size. This happens when we move away from or towards the scene with the same
camera. Left: observation of an image with σ = 1 pix, isophotes L=50 are indicated. Right:
same observation at a distance twice as far away. The isophotes L=50 have now changed.

Actually, any observation changes the isophote landscape, so there is no way around including
the notion of observation in the derivation of the multi-scale optic flow constraint
equation. The classic optic flow constraint equation only holds for the mathematical (σ = 0)
case.

Another 'problem' is the fact that a change of an isophote can only be detected in the direction
normal to the isophote. It is impossible to detect any change in the direction tangential to the
isophote. Recall the gauge coordinates of chapter 6, and it becomes clear that this
phenomenon is called the gauge-condition. The aperture problem (finding a solution for the
two unknowns v^x and v^y from only one equation) is actually an aperture property
[Florack1998a]. It limits the outcome of any isophote-related optic flow study to the normal
component of the flow. In section 17.6 we derive the optic flow with the normal constraint.

We assume that there is conservation of topological detail, i.e. the optic flow field is
differentiable. This means that there are no discontinuities in the optic flowfield, which occur
e.g. at occlusion boundaries where structure is emerging or disappearing. This constraint is
expressed in the temporal derivative of the flowfield: dx̄/dt = v̄ = (v^x, v^y).

17.5 Scalar and density images


Two physically different types of images should be considered when studying optic flow.
When a pixelset is deformed by a motion field, the new pixel can either keep its old value
(scalar flow), or its value can be corrected for the area (volume) change of the pixel
undergoing the deformation (density flow). The next illustration shows the difference:

px1 = {{2, 2}, {2, 4}, {4, 4}, {4, 2}}; px3 = # + {10, 0} & /@ px1;

px2 = {{5, .5}, {5, 4.5}, {9, 5}, {9, 1}}; px4 = # + {10, 0} & /@ px2;
Show[Graphics[{GrayLevel[.8],
   Polygon /@ {px1, px2, px3}, Blue, MapThread[Arrow, {px1, px2}],
   Text["scalar flow", {4, 3}], GrayLevel[.4], Polygon[px4], Red,
   MapThread[Arrow, {px3, px4}], Text["density flow", {14, 3}]}],
 AspectRatio -> Automatic, ImageSize -> 400];

Figure 17.6 Left: when a pixel or voxel deforms to a new size due to some flow, the pixel/voxel
intensity remains the same with scalar flow. Right: with density flow the intensity changes with
the inverse of the area (volume) change of the pixel (voxel).

Examples of scalar images are: range (depth) images, CT images, T1 and T2 MRI images;
examples of density images are: proton density MRI images, CCD camera images, light
microscope images etc.
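The distinction can be stated as one rule (a toy formulation of my own, not from the book): under a deformation that multiplies a pixel's area by a factor s, a scalar pixel keeps its value, while a density pixel's value is divided by s so that value × area is conserved.

```python
def warp_value(value, area_scale, kind):
    """Pixel value after the deformation multiplies its area by area_scale.

    kind: 'scalar' keeps the value unchanged; 'density' conserves the
    integral value * area, so the value is divided by the area change.
    """
    if kind == "scalar":
        return value
    if kind == "density":
        return value / area_scale
    raise ValueError(kind)
```

E.g. a density pixel of value 10 whose area is quadrupled drops to 2.5, so that 2.5 × 4 still integrates to the original 10; the scalar pixel stays at 10.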

17.6 Derivation of the multi-scale optic flow constraint equation
The following derivation is due to Florack and Nielsen [Florack1994d, Florack1998a,
Nielsen1998a, Florack2000c] and was further implemented and refined by Niessen et al.
[Niessen1995a, Niessen1996c, Niessen1996d, Niessen1996e] for spatio-temporal optic flow,
and Maas et al. [Maas1995a, Maas1996a] for stereo disparity.

We derive the equation for 2D. We observe the luminance distribution with the spatio-
temporal Gaussian kernel g(x, y, t; σ, τ). The spatial scales are σx and σy, the temporal scale
is τ.

Clear[σx, σy, τ];
g[x_, y_, t_] := 1/(σx Sqrt[2 π]) 1/(σy Sqrt[2 π]) 1/(τ Sqrt[2 π]) *
   E^(-(x^2/(2 σx^2)) - y^2/(2 σy^2) - t^2/(2 τ^2));

The observation F is the convolution of the dynamic image sequence L(x, y, t) with the
Gaussian kernel in the spatial and temporal domain:
F(x, y, t; σ, τ) = L ⊗ g = ∫∫∫ L(x', y', t') g(x - x', y - y', t - t'; σ, τ) dx' dy' dt',
with all three integrals running from -∞ to ∞.

In order to follow constant intensities over time, we need the Lie derivative of the observation
with respect to the vectorfield of the motion to be zero. Lie derivatives capture variations of
space-time quantities along the integral flow of some vectorfield. In every point the direction
of the differential operator is specified by the direction of the local vector of the vectorfield.
To take a Lie derivative one therefore has to know this vectorfield. In the following we only
consider the first order Lie derivative of an image, which will give us a linear model of optic
flow. This however is no restriction, and should not be confused with the spatiotemporal
differential order we are interested in.

The Lie derivative (denoted with the symbol ℒ) of a function F(x̄) with respect to a
vectorfield v̄ is defined as ℒ_v̄ F(x̄). The optic flow constraint equation (OFCE) states that the
luminance does not change when we take the derivative along the vectorfield of the motion:

ℒ_v̄ F(x̄) ≡ 0

The Lie derivative of F with respect to the vectorfield v̄ for scalar images is equivalent to the
directional derivative of F in the direction of v̄: ℒ_v̄ F(x̄) = ∂_v̄ F(x̄) = ∇F·v̄, where ∇ is the
nabla operator (∂/∂x, ∂/∂y) in 2D, and (∂/∂x, ∂/∂y, ∂/∂t) in 3D.

We can derive this as follows. In the scalar intensity function F(x̄) we study a small excursion
in the direction of the velocity v̄ by substituting x̄ → x̄ + v̄ t. We get
F(x̄ + v̄ t) = F(x̄) + ∇F·v̄ t + O(t²). The Lie derivative is by definition:
ℒ_v̄ F(x̄) = lim_{t→0} (F(x̄ + v̄ t) - F(x̄))/t = ∇F·v̄ = Σ_{i=1}^n (∂F/∂x^i) v^i. This is the rate of change of the scalar
function F when moving in the direction v̄.
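This identity is easy to verify numerically. The sketch below (my own test function and finite-difference setup, for illustration only) compares the excursion quotient along v̄ with the explicit dot product ∇F·v̄:

```python
import math


def F(x, y):
    """An arbitrary smooth scalar test function."""
    return math.exp(-0.1 * x * x) * math.sin(y)


def directional_derivative(x, y, vx, vy, h=1e-5):
    """(F(x̄ + v̄ h) - F(x̄ - v̄ h)) / (2 h): a small excursion along the flow."""
    return (F(x + vx * h, y + vy * h) - F(x - vx * h, y - vy * h)) / (2 * h)


def grad_dot_v(x, y, vx, vy, h=1e-5):
    """∇F·v̄ via central finite differences of the two partial derivatives."""
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return Fx * vx + Fy * vy
```

The two quantities agree to within finite-difference accuracy at any point and for any direction v̄, which is exactly the scalar Lie derivative statement above.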

However, for density images ρ the Lie derivative with respect to a vector field is equivalent to
the divergence of the density function together with the vectorfield: ℒ_v̄ ρ = ∇·(ρ v̄).
The density function ρ is real valued and non-zero, so we may write:
ℒ_v̄ ρ = ρ Div v̄ + v̄·∇ρ = 0.

The derivation of the expression for the Lie derivative for density image flow is as follows. In the
density function ρ(x̄) we study a small excursion in the direction of the velocity v̄ by
substituting x̄ → x̄ + v̄ t. We get ρ(x̄ + v̄ t). Because we have a density flow, we need to
consider the 'dilution' of the intensity when a small volume element ρ(x̄) dx̄ changes in
volume during the motion. The notation dx̄ denotes the infinitesimal n-dimensional volume
element dx¹ dx² ... dxⁿ. This 'dilution' is taken into account by the determinant J of the
Jacobian matrix (the determinant is called the Jacobian) J = det(∂ȳ/∂x̄) of the transformation
ȳ = x̄ + v̄ t:
ρ(x̄) dx̄ → ρ(x̄ + v̄ t) det(∂ȳ/∂x̄) dx̄. In Mathematica the Jacobian matrix is conveniently
calculated with Outer:

Outer[D, {a[x, y, z], b[x, y, z], c[x, y, z]}, {x, y, z}] // MatrixForm //
 shortnotation

( a_x[x, y, z]  a_y[x, y, z]  a_z[x, y, z]
  b_x[x, y, z]  b_y[x, y, z]  b_z[x, y, z]
  c_x[x, y, z]  c_y[x, y, z]  c_z[x, y, z] )
We expand the expression ρ(x̄ + v̄ t) det(∂ȳ/∂x̄) dx̄. We first study the behaviour of the
Jacobian matrix (∂ȳ/∂x̄) for small transformations, i.e. small values of t. The diagonal terms
behave as J_ii = 1 + (∂v^i/∂x^i) t + O(t²) and the off-diagonal terms behave as J_ij |_{i≠j} = (∂v^i/∂x^j) t + O(t²).
Combined: (∂ȳ/∂x̄) = I + t B + O(t²) ≈ I + t B, where I is the identity matrix. For small t the
Jacobian matrix is thus polynomial, and we may write for the determinant:
J = det(I + t B) ≈ det I + t trace B + O(t²) = 1 + t ∇·v̄ + O(t²).
Combining the expansions for ρ(x̄ + v̄ t) and det(∂ȳ/∂x̄) we get:
ρ(x̄ + v̄ t) det(∂ȳ/∂x̄) dx̄ = {ρ(x̄) + ∇ρ·v̄ t + O(t²)} {1 + t ∇·v̄ + O(t²)} dx̄ =
{ρ + ρ t ∇·v̄ + ∇ρ·v̄ t + O(t²)} dx̄ ≡ {ρ + ℒ_v̄ ρ(x̄) t + O(t²)} dx̄, from which we derive:
ℒ_v̄ ρ(x̄) = ρ ∇·v̄ + v̄·∇ρ.
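The first order determinant expansion det(I + t B) ≈ 1 + t trace B used above is easy to check numerically in the 2 × 2 case (the matrix B and step size t below are arbitrary choices of mine):

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def identity_plus(t, B):
    """The matrix I + t B for a 2x2 matrix B."""
    return [[1 + t * B[0][0], t * B[0][1]],
            [t * B[1][0], 1 + t * B[1][1]]]


B = [[0.3, -1.2], [0.7, 0.5]]
t = 1e-4
exact = det2(identity_plus(t, B))
first_order = 1 + t * (B[0][0] + B[1][1])   # 1 + t trace(B)
```

The discrepancy between `exact` and `first_order` is of order t², as the expansion predicts, while the first order term itself is of order t.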

Just as we have seen in the first chapters, the observation of F is the convolution F ⊗ g, where
g is the Gaussian aperture and F is the spatiotemporal image distribution. Again, the
differential (Lie) operator may be moved to the Gaussian kernel: ℒ_v̄ (F ⊗ g) ≡ F ⊗ (ℒ_v̄ g).

Task 17.2 Explain what differential operators emerge when the vectorfield
consists of unity vectors {0,1} at every point, resp. {1,0}.

For scalar images the optic flow constraint equation under the convolution is written as:

∫ (∇F·v̄) g dx̄ = 0, from which we get by partial integration: -∫ F ∇·(g v̄) dx̄ = 0, or

-∫∫∫ L(x - x', y - y', t - t') Div(g(x', y', t') v̄(x', y', t')) dx' dy' dt' = 0, where Div is the
divergence operator (∇·) and g the Gaussian kernel.

For density images the optic flow constraint equation under the convolution is written as:
∫ ∇·(ρ v̄) g dx̄ = 0, from which we get by partial integration: -∫ ρ (∇g·v̄) dx̄ = 0, or
-∫∫∫ ρ(x - x', y - y', t - t') Grad g(x', y', t')·v̄(x', y', t') dx' dy' dt' = 0, where Grad is the
gradient operator (∇).

The motion vectorfield v̄ is the unknown in this equation that we would like to establish from a series
of observations of the image. In the following we will derive a set of equations, in which we
assume some approximated vectorfield for the unknown flow, and find just as many equations
as we have unknowns in our approximation of the flowfield. The solution of this set of
equations in each pixel then gives us the approximated flowfield.

We approximate the unknown vector field to some order m. We define the optic flow
vectorfield with 3 components {u[x,y,t], v[x,y,t], w[x,y,t]}, where u is the x-
component, v the y-component and w the t-component of the vectorfield. Here is the
approximation to order m = 1:

Clear[u, v, w];
vectorfield[x_, y_, t_, m_] :=
  {Normal[Series[u[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]],
   Normal[Series[v[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]],
   Normal[Series[w[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]]} /.
  {Derivative[a_, b_, c_][u_][0, 0, 0] /; (a + b + c > m) -> 0};
vectorfield[x, y, t, 1]

{u[0, 0, 0] + t u^(0,0,1)[0, 0, 0] + y u^(0,1,0)[0, 0, 0] + x u^(1,0,0)[0, 0, 0],
 v[0, 0, 0] + t v^(0,0,1)[0, 0, 0] + y v^(0,1,0)[0, 0, 0] + x v^(1,0,0)[0, 0, 0],
 w[0, 0, 0] + t w^(0,0,1)[0, 0, 0] + y w^(0,1,0)[0, 0, 0] + x w^(1,0,0)[0, 0, 0]}

Note that the Series command gives us more terms because it nests the expansion. With a
conditional (/; is shorthand for Condition) replace statement (/.) we set all terms
with order higher than m = 1 to zero. So we keep only terms in which the total order is one.
The expression becomes more readable when we write the derivatives as subscripted variables
with the function short:

short[expr_] := Module[{nx, ny, nt, u}, DisplayForm[expr /.
       Derivative[nx_, ny_, nt_][L_][x_, y_, t_] -> Subscript[L,
         StringJoin[Table["x", {nx}], Table["y", {ny}], Table["t", {nt}]]] /.
      u_[0, 0, 0] -> u]] /.
   Hold[gDn[im, {nt_, ny_, nx_}, {τ, σy, σx}]] -> Subscript["L",
     StringJoin[Table["x", {nx}], Table["y", {ny}], Table["t", {nt}]]]

vectorfield[x, y, t, 1] // short

{u + t u_t + x u_x + y u_y, v + t v_t + x v_x + y v_y, w + t w_t + x w_x + y w_y}

In this example of a first order vectorfield we encounter an equation where 8 unknown
components of the vectorfield should be solved: u, u_x, u_y, u_t, v, v_x, v_y and v_t. With only one
equation this is of course impossible. However, because the Lie derivative of the image
vanishes identically, so do all the partial derivatives of it. It can be shown that it is allowed
to take up to M-th order derivatives provided the flow vector is approximated to M-th
polynomial order as well. So we may add the equations for the vanishing Lie derivatives with
respect to x, y and t, because we are studying a first order vectorfield. This gives us three
extra equations. The remaining 4 have to come from external information. Important external
information can for example be found when constraints on the flow are known, such as the
constraint that only normal flow can be extracted.

The normal constraint in 2D is expressed as v̄·t̄ = (u, v) · {{0, 1}, {-1, 0}} · ∇L = 0, or
-v L_x + u L_y = 0, where L_x and L_y are constant. This is our fifth equation in the set of
optic flow constraint equations to be solved for the 8 unknowns. We add the derivatives of the
normal constraint equation with respect to x, y and t, and we have the set of 8 equations complete,
which can then be solved for each pixel in the temporal sequence to give the normal flow
components and their first order derivatives.

Other external conditions that give additional constraint equations are for example:
- when we know the flow is just a translation in the x-direction:
u = const, v = 0; u_x = u_y = u_t = 0;
- when we know the flow is radial, as when we zoom in onto a scene or fly towards a vanishing point:
-v x + u y = 0;
- when we know the camera rotates around the optical axis: v y + u x = 0;
- the smoothness constraint (e.g. Lucas and Kanade [Lucas1981]), which uses a weighted least-
squares solution to the optic flow problem by minimizing ∫_Ω W(x̄)² (u L_x + v L_y + L_t)² dx dy.
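The least-squares step behind the Lucas and Kanade constraint can be sketched via the 2 × 2 normal equations (a minimal formulation of my own, with the weight W ≡ 1 and hand-made derivative samples rather than measured ones):

```python
def lucas_kanade(samples):
    """samples: (Lx, Ly, Lt) derivative triples over a window.

    Minimizes sum (u Lx + v Ly + Lt)^2 over (u, v) by solving the
    2x2 normal equations with Cramer's rule.
    """
    sxx = sum(lx * lx for lx, ly, lt in samples)
    sxy = sum(lx * ly for lx, ly, lt in samples)
    syy = sum(ly * ly for lx, ly, lt in samples)
    sxt = sum(lx * lt for lx, ly, lt in samples)
    syt = sum(ly * lt for lx, ly, lt in samples)
    det = sxx * syy - sxy * sxy   # assumed non-singular (textured window)
    u = (-sxt * syy + sxy * syt) / det
    v = (-syt * sxx + sxy * sxt) / det
    return u, v
```

Feeding it triples that are consistent with a single true velocity recovers that velocity exactly; with noisy derivatives it returns the least-squares compromise over the window.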

We end the chapter with a section on how an appropriate scale can be selected, and numerical
examples showing the effectiveness of the method. We start with the derivation and
implementation of the optic flow constraint equation for scalar images.

17.6.1 Scalar images, normal flow.

Mathematica has all the machinery on board to analytically calculate the Lie derivatives, and
subsequently replace the spatio-temporal image derivatives by discrete convolution with the
appropriate Gaussian derivatives. We load the package Calculus`VectorAnalysis`
with the definitions of the nabla operator and its actions on scalar- and vectorfields
(gradient and divergence operators respectively). We set the coordinates to the spatiotemporal
Euclidean space (x, y, t):

<< Calculus`VectorAnalysis`;
SetCoordinates[Cartesian[x, y, t]];

For scalar images the Lie derivative of the observed spatiotemporal image L is defined as
ℒ_v̄ L(g) ≡ L(ℒ_v̄^T g), where ℒ_v̄^T g(x̄) = -∇·(g v̄), so we get

ℒ_v̄ L(x, y, t; σ, τ) =
∫∫∫ L(x', y', t') ℒ_v̄^T (g(x - x', y - y', t - t'; σ, τ) v̄(x - x', y - y', t - t')) dx' dy' dt' =
∫∫∫ L(x', y', t') (-div(g(x - x', y - y', t - t'; σ, τ) v̄(x - x', y - y', t - t'))) dx' dy' dt' = 0

The triple integral represents the process of 3D (2D + time) convolution. We first calculate the
(polynomial) expression for the Lie differential operator -Div[g[x,y,t] v[x,y,t]]
expressed in Gaussian derivatives. By replacing the Gaussian derivatives with our familiar
convolution operator gD we implement the convolution on the input image L[x,y,t].

m = 1;
scalarflow = Expand[Simplify[
    {Div[g[x, y, t] vectorfield[x, y, t, m]],
     Div[D[g[x, y, t], x] vectorfield[x, y, t, m]],
     Div[D[g[x, y, t], y] vectorfield[x, y, t, m]],
     Div[D[g[x, y, t], t] vectorfield[x, y, t, m]]} / g[x, y, t]]];
short[scalarflow]

{u_x + v_y + w_t - (x u)/σx² - (t x u_t)/σx² - (x² u_x)/σx² - (x y u_y)/σx²
  - (y v)/σy² - (t y v_t)/σy² - (x y v_x)/σy² - (y² v_y)/σy²
  - (t w)/τ² - (t² w_t)/τ² - (t x w_x)/τ² - (t y w_y)/τ²,
 <<3 analogous but longer polynomial expressions for the differentiated kernels g_x, g_y and g_t>>}
In the statements above we derive the expression for the scalarflow -div(g(x, y, t) v̄(x, y, t))
and for the first order derivatives of the flow (-div(g_x(x, y, t) v̄(x, y, t)),
-div(g_y(x, y, t) v̄(x, y, t)) and -div(g_t(x, y, t) v̄(x, y, t))). We divide by the always positive
function g(x, y, t) in order to get the coefficients that occur in front of g(x, y, t), and expand
the expressions to get all polynomial factors as separate terms. We study the 4 equations of the
result in short notation:

We see that the 8 components of the unknown vectorfield (u, u_x, u_y, u_t, v, v_x, v_y and v_t)
show up, but with difficult to handle terms involving x, x², y, y², t, t x, t y, etc. The way
to cope with these terms was suggested by Florack and Nielsen: Gaussian derivatives are
equivalent to the Gaussian kernel multiplied with a Hermite polynomial of that order (see
chapter 4 on Gaussian derivatives). So we are able to convert the set of terms above into
proper Gaussian derivatives. These Gaussian derivatives convolve with the input image to get
scaled derivatives of the image, which can all be measured. We define the Hermite function

hermite[x_, n_, σ_] := (-1/(σ Sqrt[2]))^n HermiteH[n, x/(σ Sqrt[2])];

and construct the spatiotemporal (x, y, t) Hermite polynomial of order n in x, m in y and k in
t (due to separability this reduces to a product):

polynomial[n_, m_, k_] :=
  Simplify[hermite[x, n, σx] hermite[y, m, σy] hermite[t, k, τ],
   {σx > 0, σy > 0, τ > 0}]

An example:

polynomial[1, 1, 1]

-((t x y)/(σx² σy² τ²))
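The underlying relation, ∂ⁿg/∂xⁿ = (-1/(σ√2))ⁿ Hₙ(x/(σ√2)) g(x; σ), can be verified numerically. Below is a Python transcription of my own (with Hₙ built from the standard three-term recurrence), checked against finite differences of the Gaussian:

```python
import math


def gauss(x, sigma):
    """Normalized 1D Gaussian kernel."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def hermite_poly(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def gauss_derivative(x, n, sigma):
    """n-th derivative of the Gaussian via its Hermite polynomial prefactor."""
    s = sigma * math.sqrt(2.0)
    return (-1.0 / s) ** n * hermite_poly(n, x / s) * gauss(x, sigma)
```

For n = 1 this reduces to -x/σ² times the Gaussian, and for n = 2 to (x²/σ⁴ - 1/σ²) times the Gaussian, which are exactly the prefactors that appear in the list of terms below.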

We create a list of all Hermite polynomials having at least a first order expansion in x, y or t.
We start with an empty list and append a term only when the sum of the orders is nonzero and
less than or equal to 3. We get 19 combinations:

terms = {};
Do[If[0 < n + m + k <= 3, terms = Expand[Append[terms, polynomial[n, m, k]]]],
 {m, 0, 3}, {n, 0, 3}, {k, 0, 3}];
Length[terms]
terms

19

{-(t/τ²), t²/τ⁴ - 1/τ², -(t³/τ⁶) + (3 t)/τ⁴, -(x/σx²), (t x)/(σx² τ²),
 -((t² x)/(σx² τ⁴)) + x/(σx² τ²), x²/σx⁴ - 1/σx², -((t x²)/(σx⁴ τ²)) + t/(σx² τ²),
 -(x³/σx⁶) + (3 x)/σx⁴, -(y/σy²), (t y)/(σy² τ²),
 -((t² y)/(σy² τ⁴)) + y/(σy² τ²), (x y)/(σx² σy²), -((t x y)/(σx² σy² τ²)),
 -((x² y)/(σx⁴ σy²)) + y/(σx² σy²), y²/σy⁴ - 1/σy²,
 -((t y²)/(σy⁴ τ²)) + t/(σy² τ²), -((x y²)/(σx² σy⁴)) + x/(σx² σy²),
 -(y³/σy⁶) + (3 y)/σy⁴}

The equivalence relations between these 19 coefficients (as prefactors for the Gaussian kernel)
and the corresponding Gaussian derivative functions can be found by solving 19 simultaneous
equations. In the following we explain step by step how to build these equations using the pattern
matching capability of Mathematica. The same machinery can then easily be
applied to optic flow equations of other orders or higher approximation orders of the velocity
flowfield.

We define a set of temporary variables order[a, b, c] capturing the orders of the exponents
of x, y and t as follows:

Clear[a, b, c]; exponents = Transpose[Exponent[terms, #] & /@ {x, y, t}] /.
   {a_, b_, c_} -> order[a, b, c]

{order[0, 0, 1], order[0, 0, 2], order[0, 0, 3],
 order[1, 0, 0], order[1, 0, 1], order[1, 0, 2], order[2, 0, 0],
 order[2, 0, 1], order[3, 0, 0], order[0, 1, 0], order[0, 1, 1],
 order[0, 1, 2], order[1, 1, 0], order[1, 1, 1], order[2, 1, 0],
 order[0, 2, 0], order[0, 2, 1], order[1, 2, 0], order[0, 3, 0]}

which belong to the 'pure' polynomial terms

vars = Exponent[terms, #] & /@ {x, y, t} /. {a_, b_, c_} -> x^a y^b t^c

{t, t², t³, x, t x, t² x, x², t x², x³,
 y, t y, t² y, x y, t x y, x² y, y², t y², x y², y³}

We assign these pure polynomial terms to a set of 19 new variables k[i] using MapThread.
The order of the set of replacement rules must be reversed, in order to replace the higher order
terms first in the step to follow. For example, in this way x y² is replaced before y². Otherwise,
x would be replaced individually, which would lead to wrong results. The result is a set of
assignment rules:

rules = MapThread[Rule, {vars, Table[k[i], {i, Length[vars]}]}] // Reverse

{y³ → k[19], x y² → k[18], t y² → k[17], y² → k[16], x² y → k[15], t x y → k[14],
 x y → k[13], t² y → k[12], t y → k[11], y → k[10], x³ → k[9], t x² → k[8],
 x² → k[7], t² x → k[6], t x → k[5], x → k[4], t³ → k[3], t² → k[2], t → k[1]}

The set of rules is applied to our initial set of terms:

kterms = terms /. rules

{-(k[1]/τ²), -(1/τ²) + k[2]/τ⁴, (3 k[1])/τ⁴ - k[3]/τ⁶, -(k[4]/σx²),
 k[5]/(σx² τ²), k[4]/(σx² τ²) - k[6]/(σx² τ⁴),
 -(1/σx²) + k[7]/σx⁴, k[1]/(σx² τ²) - k[8]/(σx⁴ τ²),
 (3 k[4])/σx⁴ - k[9]/σx⁶, -(k[10]/σy²), k[11]/(σy² τ²),
 k[10]/(σy² τ²) - k[12]/(σy² τ⁴), k[13]/(σx² σy²), -(k[14]/(σx² σy² τ²)),
 k[10]/(σx² σy²) - k[15]/(σx⁴ σy²), -(1/σy²) + k[16]/σy⁴,
 k[1]/(σy² τ²) - k[17]/(σy⁴ τ²), k[4]/(σx² σy²) - k[18]/(σx² σy⁴),
 (3 k[10])/σy⁴ - k[19]/σy⁶}

and converted into a set of 19 equations (for brevity we show only the first and last
equations, with Short):

set = MapThread[Equal, {kterms, exponents}]; Short[set, 6]

{-(k[1]/τ²) == order[0, 0, 1], -(1/τ²) + k[2]/τ⁴ == order[0, 0, 2], <<15>>,
 k[4]/(σx² σy²) - k[18]/(σx² σy⁴) == order[1, 2, 0],
 (3 k[10])/σy⁴ - k[19]/σy⁶ == order[0, 3, 0]}

Mathematica must solve this recursively, injecting at each equation one more rule at a time,
giving a set of rules for k[i]:

rk = {}; Do[rk = Flatten[Append[rk, Solve[Take[set, i], k[i]] /. rk]],
  {i, 1, Length[terms]}];
rk

{k[1] → -τ² order[0, 0, 1], k[2] → τ² (1 + τ² order[0, 0, 2]),
 k[3] → -τ² (3 τ² order[0, 0, 1] + τ⁴ order[0, 0, 3]), k[4] → -σx² order[1, 0, 0],
 k[5] → σx² τ² order[1, 0, 1], k[6] → -τ² (σx² order[1, 0, 0] + σx² τ² order[1, 0, 2]),
 k[7] → σx² (1 + σx² order[2, 0, 0]),
 k[8] → -σx² (τ² order[0, 0, 1] + σx² τ² order[2, 0, 1]),
 k[9] → -σx² (3 σx² order[1, 0, 0] + σx⁴ order[3, 0, 0]), k[10] → -σy² order[0, 1, 0],
 k[11] → σy² τ² order[0, 1, 1], k[12] → -τ² (σy² order[0, 1, 0] + σy² τ² order[0, 1, 2]),
 k[13] → σx² σy² order[1, 1, 0], k[14] → -σx² σy² τ² order[1, 1, 1],
 k[15] → -σx² (σy² order[0, 1, 0] + σx² σy² order[2, 1, 0]),
 k[16] → σy² (1 + σy² order[0, 2, 0]),
 k[17] → -σy² (τ² order[0, 0, 1] + σy² τ² order[0, 2, 1]),
 k[18] → -σy² (σx² order[1, 0, 0] + σx² σy² order[1, 2, 0]),
 k[19] → -σy² (3 σy² order[0, 1, 0] + σy⁴ order[0, 3, 0])}

Now we can inject the solutions for k[i] into the rules that convert the pure exponential terms:

ruler = rules /. rk; Short[ruler, 6]

{y³ → -σy² (3 σy² order[0, 1, 0] + σy⁴ order[0, 3, 0]),
 x y² → -σy² (σx² order[1, 0, 0] + σx² σy² order[1, 2, 0]),
 <<15>>, t² → τ² (1 + τ² order[0, 0, 2]), t → -τ² order[0, 0, 1]}

Partial integration gives a minus sign for odd total numbers of differentiation. We can now plug in the Gaussian derivative operators to actually calculate the derivatives from the spatiotemporal sequence. We replace the terms order by Hold[gDn[im, {nt, ny, nx}, {τ, σy, σx}]]. The Hold function prevents immediate calculation. We need ReplaceRepeated (//.) because sometimes more replacements have to be done in more passes.

scalartmpflow = (Expand[scalarflow //. ruler]) /.
    order[a_, b_, c_] -> (-1)^(a + b + c) Hold[gDn[im, {c, b, a}, {τ, σy, σx}]];
Short[scalartmpflow]

{-w Lt - u Lx - v Ly - τ² Lxt ut - σx² Lxx ux - σy² Lxy uy - τ² Lyt vt -
  σx² Lxy vx - σy² Lyy vy - τ² Ltt wt - σx² Lxt wx - σy² Lyt wy,
 w Lxt + u Lxx + v Lxy + τ² Lxxt ut + Lx ux + σx² Lxxx ux + σy² Lxxy uy + τ² Lxyt vt +
  σx² Lxxy vx + Ly vx + σy² Lxyy vy + τ² Lxtt wt + Lt wx + σx² Lxxt wx + σy² Lxyt wy,
 u Lxy + w Lyt + v Lyy + τ² Lxyt ut + σx² Lxxy ux + Lx uy + σy² Lxyy uy + τ² Lyyt vt +
  σx² Lxyy vx + Ly vy + σy² Lyyy vy + τ² Lytt wt + σx² Lxyt wx + Lt wy + σy² Lyyt wy,
 w Ltt + u Lxt + v Lyt + Lx ut + τ² Lxtt ut + σx² Lxxt ux + σy² Lxyt uy + Ly vt +
  τ² Lytt vt + σx² Lxyt vx + σy² Lyyt vy + Lt wt + τ² Lttt wt + σx² Lxtt wx + σy² Lytt wy}

Note that in the approximated flowfield we introduced derivatives of u, v, and w with respect
to x, y and t. Here w is the component of the velocity field in the temporal direction. The
velocity component in this direction emerges when structure disappears or emerges, such as at
occlusion boundaries of objects in the image. However, at this time we will require that there is no such disappearance or emergence of structure. This constraint translates to limiting the temporal component w of the flowfield to the zeroth order, which we set to unity, while all derivatives of w vanish:

scalardataflow = Expand[scalartmpflow /. {w[0, 0, 0] -> 1,
     Derivative[a_, b_, c_][w][0, 0, 0] -> 0}]; Short[scalardataflow]

{-Lt - u Lx - v Ly - τ² Lxt ut - σx² Lxx ux - σy² Lxy uy - τ² Lyt vt -
  σx² Lxy vx - σy² Lyy vy,
 Lxt + u Lxx + v Lxy + τ² Lxxt ut + Lx ux +
  σx² Lxxx ux + σy² Lxxy uy + τ² Lxyt vt + σx² Lxxy vx + Ly vx + σy² Lxyy vy,
 u Lxy + Lyt + v Lyy + τ² Lxyt ut + σx² Lxxy ux + Lx uy + σy² Lxyy uy + τ² Lyyt vt +
  σx² Lxyy vx + Ly vy + σy² Lyyy vy,
 Ltt + u Lxt + v Lyt + Lx ut + τ² Lxtt ut +
  σx² Lxxt ux + σy² Lxyt uy + Ly vt + τ² Lytt vt + σx² Lxyt vx + σy² Lyyt vy}

In matrix notation: A p = b, where A is given by the coefficients in the 4 x 8 matrix above, p = {u, v, ut, vt, ux, vx, uy, vy} and b = {-Lt, -Lxt, -Lyt, -Ltt}.
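Once four normal-flow equations are joined in, every pixel carries a square 8 x 8 linear system. As an illustration of how such per-pixel systems can be solved in bulk (a numpy sketch with random stand-in data, not the book's OFCE package):

```python
import numpy as np

rng = np.random.default_rng(0)
npix = 100                              # number of pixels
A = rng.normal(size=(npix, 8, 8))       # stand-in for the per-pixel coefficient matrices
b = rng.normal(size=(npix, 8))          # stand-in for the right-hand sides

p = np.linalg.solve(A, b)               # batched solve: p[i] solves A[i] @ p[i] = b[i]
residual = np.einsum('nij,nj->ni', A, p) - b
print(float(np.abs(residual).max()))    # should be at machine-precision level
```

The batched `np.linalg.solve` call treats the leading axis as a stack of independent systems, which is exactly the per-pixel situation here.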

Note that this set of equations has become quite complex. In order to estimate a flowfield approximated to first order we need to extract spatiotemporal derivatives to third order, and temporal derivatives to second order. This of course has implications for the requirements on the datasets from which the field is to be extracted: the more images in the temporal sequence, the better. In the limiting case of just 2 images, as in a stereo pair, we have to approximate the first order temporal derivative by the mere difference, and put the second order temporal derivative to zero.

We acquired four equations with the eight unknowns u, ux, uy, ut, v, vx, vy and vt. The four additional equations required in order to be able to solve the eight unknowns in each pixel can only be formulated by incorporating external physical constraints on the flow. We choose in this example the constraint of normal flow, which leads to an extra four equations. The normal flow is easily derived from the regular flow by the substitution {u, v, 1} → {-v, u, 0}. It can also be expressed as -v Lx + u Ly = 0. This equation expresses the fact that the tangential component of the velocity field vanishes. We first replace u by a temporary variable vtmp, then replace the derivatives of u by the derivatives of vtmp, then we replace v by u and the derivatives of v by the derivatives of u, and finally replace vtmp back by v.
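The cyclic replacement through a temporary variable prevents clobbering u before v has been rewritten. A minimal sympy analogue of the same trick (names are illustrative), applied to the data term u Lx + v Ly:

```python
import sympy as sp

u, v, vtmp, Lx, Ly = sp.symbols('u v vtmp Lx Ly')
expr = u * Lx + v * Ly                      # a term from the flow equations

# substitute u -> -vtmp, then v -> u, then vtmp -> v  (i.e. {u, v} -> {-v, u})
rotated = expr.subs(u, -vtmp).subs(v, u).subs(vtmp, v)
assert sp.simplify(rotated - (-v * Lx + u * Ly)) == 0
print(rotated)
```

Substituting u directly by -v and then v by u would rewrite the freshly introduced -v again; the temporary symbol keeps the two substitutions separate.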

scalarnormalflow = scalartmpflow /.
    {u[0, 0, 0] -> -vtmp[0, 0, 0], Derivative[a_, b_, c_][u][0, 0, 0] ->
      -Derivative[a, b, c][vtmp][0, 0, 0], v[0, 0, 0] -> u[0, 0, 0],
     Derivative[a_, b_, c_][v][0, 0, 0] -> Derivative[a, b, c][u][0, 0, 0],
     w[0, 0, 0] -> 0, Derivative[a_, b_, c_][w][0, 0, 0] -> 0} /. vtmp -> v;
Short[scalarnormalflow]

{v Lx - u Ly - τ² Lyt ut - σx² Lxy ux - σy² Lyy uy + τ² Lxt vt + σx² Lxx vx + σy² Lxy vy,
 -v Lxx + u Lxy + τ² Lxyt ut + σx² Lxxy ux + Ly ux +
  σy² Lxyy uy - τ² Lxxt vt - Lx vx - σx² Lxxx vx - σy² Lxxy vy,
 -v Lxy + u Lyy + τ² Lyyt ut + σx² Lxyy ux + Ly uy + σy² Lyyy uy - τ² Lxyt vt -
  σx² Lxxy vx - Lx vy - σy² Lxyy vy,
 -v Lxt + u Lyt + Ly ut + τ² Lytt ut +
  σx² Lxyt ux + σy² Lyyt uy - Lx vt - τ² Lxtt vt - σx² Lxxt vx - σy² Lxyt vy}

In matrix notation: N p = 0. The total set of eight equations is the concatenation of the two sets, forming eight equations with eight unknowns:

scalarflowequations = Join[scalardataflow, scalarnormalflow];

17.6.2 Density images, normal flow.

For density images the Lie derivative is 𝓛v g(x) = -∇g · v = 0. We can now give the full derivation of the 8 constraint equations to be solved in each pixel as a single routine, for the same conditions as above: approximation of the vector field to first order, no creation of new structure, and the normal flow constraint:

<< Calculus`VectorAnalysis`;
SetCoordinates[Cartesian[x, y, t]];

densityflowequations[order_] := Module[
  {g, densityflow0, densityflow, hermite, polynomial, terms, exponents,
   vars, rules, kterms, ruler, rk, densitytmpflow, densitydataflow,
   densitynormalflow, vtmp},
  im =.; Clear[σx, σy, τ];
  g[x_, y_, t_] :=
   E^(-(x^2/(2 σx^2)) - y^2/(2 σy^2) - t^2/(2 τ^2))/(σx σy τ (2 π)^(3/2));
  Clear[u, v, w]; m = order;
  vectorfield[x, y, t, m] =
   {Normal[Series[u[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]],
    Normal[Series[v[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]],
    Normal[Series[w[x, y, t], {x, 0, m}, {y, 0, m}, {t, 0, m}]]} /.
   {Derivative[a_, b_, c_][u_][0, 0, 0] /; (a + b + c > m) -> 0};
  densityflow0 = Expand[Simplify[
     (-Grad[g[x, y, t]].vectorfield[x, y, t, m])/g[x, y, t]]];
  densityflow = If[m == 0, {densityflow0},
    Expand[Simplify[{-Grad[g[x, y, t]].vectorfield[x, y, t, m],
        -Grad[D[g[x, y, t], x]].vectorfield[x, y, t, m],
        -Grad[D[g[x, y, t], y]].vectorfield[x, y, t, m],
        -Grad[D[g[x, y, t], t]].vectorfield[x, y, t, m]}/g[x, y, t]]]];
  hermite[x_, n_, a_] := HermiteH[n, x/(a Sqrt[2])]/(-a Sqrt[2])^n;
  polynomial[n_, m_, k_] := Simplify[
    hermite[x, n, σx] hermite[y, m, σy] hermite[t, k, τ],
    {σx > 0, σy > 0, τ > 0}];
  terms = {}; Do[If[0 < n + m + k <= 3,
     terms = Expand[Append[terms, polynomial[n, m, k]]]],
    {m, 0, 3}, {n, 0, 3}, {k, 0, 3}];
  Clear[a, b, c];
  exponents = Transpose[Exponent[terms, #] & /@ {x, y, t}] /.
    {a_, b_, c_} -> order[a, b, c];
  vars = Exponent[terms, #] & /@ {x, y, t} /. {a_, b_, c_} -> x^a y^b t^c;
  rules = MapThread[Rule, {vars, Table[k[i], {i, Length[vars]}]}] // Reverse;
  kterms = terms /. rules; set = MapThread[Equal, {kterms, exponents}];
  rk = {}; Do[rk = Flatten[Append[rk, Solve[Take[set, i], k[i]] /. rk]],
    {i, 1, Length[terms]}];
  ruler = rules /. rk;
  densitytmpflow = (Expand[densityflow //. ruler]) /.
    order[a_, b_, c_] -> (-1)^(a + b + c) Hold[gDn[im, {c, b, a}, {τ, σy, σx}]];
  densitydataflow = Expand[densitytmpflow /.
     {w[0, 0, 0] -> 1, Derivative[a_, b_, c_][w][0, 0, 0] -> 0}];
  densitynormalflow = densitytmpflow /. {u[0, 0, 0] -> -vtmp[0, 0, 0],
      Derivative[a_, b_, c_][u][0, 0, 0] -> -Derivative[a, b, c][vtmp][0, 0, 0],
      v[0, 0, 0] -> u[0, 0, 0],
      Derivative[a_, b_, c_][v][0, 0, 0] -> Derivative[a, b, c][u][0, 0, 0],
      w[0, 0, 0] -> 0, Derivative[a_, b_, c_][w][0, 0, 0] -> 0} /. vtmp -> v;
  Join[densitydataflow, densitynormalflow]]

The density flow equations for zeroth order lead to the classical Horn and Schunck optic flow constraint equations [Horn1981]:

densityflowequations[0] // Short

{Lt + u Lx + v Ly, -v Lx + u Ly}
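At zeroth order this pair has a closed-form solution: -v Lx + u Ly = 0 forces the flow along the gradient, and substituting (u, v) = α (Lx, Ly) into the first equation gives α = -Lt/(Lx² + Ly²). A small numpy sketch of that normal-flow solution (illustrative, not the book's implementation):

```python
import numpy as np

def normal_flow(Lx, Ly, Lt, eps=1e-12):
    """Solve Lt + u*Lx + v*Ly = 0 and -v*Lx + u*Ly = 0 per pixel."""
    g2 = Lx**2 + Ly**2 + eps      # squared gradient magnitude (eps avoids 0/0)
    return -Lt * Lx / g2, -Lt * Ly / g2

Lx, Ly, Lt = np.array([3.0]), np.array([4.0]), np.array([5.0])
u, v = normal_flow(Lx, Ly, Lt)
print(u[0], v[0])                 # approximately (-0.6, -0.8)
```

Both constraint equations are satisfied by construction, up to the small regularization eps.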

For a first order approximation of the unknown flow field we get many more derivatives, up to third order, and eight equations:

densityflowequations[1] // Short

{Lt + u Lx + v Ly + τ² Lxt ut + ux + σx² Lxx ux + σy² Lxy uy + τ² Lyt vt +
  σx² Lxy vx + vy + σy² Lyy vy,
 -Lxt - u Lxx - v Lxy - τ² Lxxt ut - 2 Lx ux -
  σx² Lxxx ux - σy² Lxxy uy - τ² Lxyt vt - σx² Lxxy vx - Ly vx - Lx vy - σy² Lxyy vy,
 -u Lxy - Lyt - v Lyy - τ² Lxyt ut - σx² Lxxy ux - Ly ux - Lx uy -
  σy² Lxyy uy - τ² Lyyt vt - σx² Lxyy vx - 2 Ly vy - σy² Lyyy vy,
 -Ltt - u Lxt - v Lyt - Lx ut - τ² Lxtt ut - Lt ux - σx² Lxxt ux -
  σy² Lxyt uy - Ly vt - τ² Lytt vt - σx² Lxyt vx - Lt vy - σy² Lyyt vy,
 -v Lx + u Ly + τ² Lyt ut + σx² Lxy ux + uy + σy² Lyy uy - τ² Lxt vt - vx -
  σx² Lxx vx - σy² Lxy vy,
 v Lxx - u Lxy - τ² Lxyt ut - σx² Lxxy ux - Ly ux -
  Lx uy - σy² Lxyy uy + τ² Lxxt vt + 2 Lx vx + σx² Lxxx vx + σy² Lxxy vy,
 v Lxy - u Lyy - τ² Lyyt ut - σx² Lxyy ux - 2 Ly uy - σy² Lyyy uy +
  τ² Lxyt vt + σx² Lxxy vx + Ly vx + Lx vy + σy² Lxyy vy,
 v Lxt - u Lyt - Ly ut - τ² Lytt ut - σx² Lxyt ux - Lt uy - σy² Lyyt uy +
  Lx vt + τ² Lxtt vt + Lt vx + σx² Lxxt vx + σy² Lxyt vy}

All eight equations in this set have to vanish, leading to a simultaneous set of eight equations
with eight unknowns. In the next section we test the procedure on a spatio-temporal sequence
of known deformation.

Task 17.3 Compare the above derived equations with the results of Otte and Nagel [Otte1994], by taking the limit of σx → 0, σy → 0 and τ → 0. For a detailed description of this comparison see [Florack1998a].

17.7 Testing the optic flow constraint equations


In order to test the multi-scale optic flow constraint equations, we have developed a convenient package OFCE.m, which defines the commands to calculate the first order flowfield, and to display the result as a vectorfield.

The package, written by A. Suinesiaputra, is a good example of how to make complex functions available in notebooks, and is read by:

<< FrontEndVision'OFCE';

As a test stimulus for scalar flow, we create a movie of deforming vessels, which we take
from the fundus image of figure 17.4. We take a 64x64 section around the fovea, where we
have vessels in different directions. We use the same vectorfield for warping as in figure 17.4.

im = Take[in = Import["fundus256.gif"][[1, 1]],
   {128, 128 + 64}, {180, 180 + 64}];
{ydim, xdim} = Dimensions[im];
DisplayTogetherArray[
  {ListDensityPlot[in, Epilog -> {RGBColor[1, 1, 1], Line[{{180, 128},
       {180 + 64, 128}, {180 + 64, 128 + 64}, {180, 128 + 64}, {180, 128}}]}],
   ListDensityPlot[im]}, ImageSize -> 300];

Figure 17.7 For the scalar optic flow test image we select a 64x64 pixel subregion around the
fovea of the fundus image.

This generates the warping sequence:

vecf = Table[
   N[-{Sin[(π x)/xdim], Cos[(y π)/ydim]}], {y, 1, ydim}, {x, 1, xdim}];
stim = Table[deform2D[im, i vecf], {i, 0, 4, 1/5}];

We then calculate the first order optic flow vectorfield for scalar images, with spatial scale σ = 2.0 pixels and temporal scale τ = 1.0 pixel. The function FirstOFCE[] is defined in the package OFCE.m.

opticflowScalar =
  FirstOFCE[stim, 2.0, 1.0, FlowType -> scalar, Singularities -> ZeroVector];

Calculate first order optic flow at (σ,τ)=(2.,1.)

create scalar OFCE matrix .....

create OFCE result vector .....

....solving ....

The stimulus and the vectorfield plots are calculated and displayed, where we omit the first
and last two images of the series, because of boundary effects:

stimulusplot = ListDensityPlot[#, PlotRange -> {Min[stim], Max[stim]},
     DisplayFunction -> Identity] & /@ stim;
vectorfieldplot =
  VectorPlot[#, HeadScaleFactor -> 2, Sampling -> {3, 3}, AspectRatio -> 1,
     PlotRange -> {{1, 64}, {1, 64}}, ColorFunction -> Automatic,
     DisplayFunction -> Identity] & /@ Take[20 opticflowScalar, {4, 14}];

Show[#, PlotRange -> {{1, 64}, {1, 64}},
    DisplayFunction -> $DisplayFunction, ImageSize -> 220] & /@
  Transpose[{Take[stimulusplot, {4, 14}], vectorfieldplot}];

Figure 17.8 The resulting vectorfield for the scalar optic flow test sequence of a warped
fundus image. The vectors have been plotted 20 times their actual length, to show the
direction and magnitude better. Note that most velocities and their directions are found
correctly, and that the normal constraint is the main reason for some (well understood)
deviation.

17.8 Cleaning up the vector field

The option Averaging in the function VectorPlot denotes the size of a small neighborhood over which the velocity vectors are uniformly averaged. This value must be a nonnegative odd integer. When the Averaging window size is 5 we get a much smoother vectorfield:

vplot3 = VectorPlot[20 #, Sampling -> {2, 2}, HeadScaleFactor -> 2,
     AspectRatio -> 1, Averaging -> 5, ColorFunction -> Automatic,
     DisplayFunction -> Identity] & /@ Take[opticflowScalar, {3, 14}];
Show[#, PlotRange -> {{0, 65}, {0, 65}}, Background -> RGBColor[0, 0, 0],
    DisplayFunction -> $DisplayFunction, ImageSize -> 200] & /@
  Transpose[{Take[stimulusplot, {3, 14}], vplot3}];

Figure 17.9 The vectorfield for the scalar optic flow test sequence of figure 17.8 with vector
averaging over a 5 x 5 region. Note the much smoother result.
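The uniform averaging performed by the Averaging option is an ordinary box filter applied to each velocity component. A numpy sketch of such an averaging step, outside the book's Mathematica setting (the field is assumed stored as an (ny, nx, 2) array):

```python
import numpy as np

def average_field(field, k=5):
    """Uniformly average each component of a vector field over a k x k window."""
    assert k % 2 == 1, "window size must be a nonnegative odd integer"
    pad = k // 2
    padded = np.pad(field, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.zeros_like(field, dtype=float)
    for dy in range(k):                 # accumulate all k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + field.shape[0], dx:dx + field.shape[1]]
    return out / (k * k)

field = np.ones((8, 8, 2))
print(average_field(field, 5).max())    # a constant field is unchanged: 1.0
```

Edge padding keeps the window average well defined near the image border; other border policies would change the result only in a `pad`-wide rim.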

Task 17.4 Show that with a 25 x 25 averaging area the original smooth vector field is recovered. Why is this so?

Task 17.5 Show the result with a 25 x 25 averaging area for a much faster varying vector field deformation.

However, averaging the vector field may introduce deviations from the true normal motion. A better way is to weight the surround of a vector with a Gaussian function, and multiply with a penalty function given by Exp[-m/Kn], where Kn is a normalized matrix condition number and m is just a constant. We set m = 0.1. The value of Kn = Kmin/Kmax, where Kmax and Kmin are the largest and smallest eigenvalue of the matrix, so Kn will be in the range of (0..1].
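A numpy sketch of this penalty weighting (the eigenvalue-based Kn and the constant m follow the description above; names are illustrative):

```python
import numpy as np

def condition_penalty(mat, m=0.1):
    """Weight Exp[-m/Kn] with Kn = |lambda|_min / |lambda|_max of the matrix."""
    lam = np.abs(np.linalg.eigvals(mat))
    kn = lam.min() / lam.max()          # normalized condition number, in (0..1]
    return np.exp(-m / kn)

well = np.eye(2)                        # Kn = 1    -> weight exp(-0.1), close to 1
ill = np.diag([1.0, 1e-3])              # Kn = 1e-3 -> weight exp(-100), negligible
print(condition_penalty(well), condition_penalty(ill))
```

Near-singular matrices thus contribute almost nothing to the weighted surround, while well-conditioned pixels keep nearly full weight.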

opticflowScalar = FirstOFCE[stim, 2.0, 1.0,
   FlowType -> scalar, Singularities -> ZeroVector, Smoothing -> True];

Calculate first order optic flow at (σ,τ)=(2.,1.)

create scalar OFCE matrix .....

create OFCE result vector .....

.... solving ....

integration of scale space ....

vplot1 =
  VectorPlot[20 #, Sampling -> {2, 2}, HeadScaleFactor -> 2, AspectRatio -> 1,
     ColorFunction -> Automatic, DisplayFunction -> Identity] & /@
   Take[opticflowScalar, {4, 14}];

Show[#, PlotRange -> {{0, 65}, {0, 65}}, Background -> RGBColor[0, 0, 0],
    DisplayFunction -> $DisplayFunction, ImageSize -> 200] & /@
  Transpose[{Take[stimulusplot, {4, 14}], vplot1}];

Figure 17.10 The vectorfield for the scalar optic flow test sequence of figure 17.8 with a
weighted penalty based on the condition number of the matrix.

17.9 Scale selection


One of the major problems in making an effective extraction of the optic flow vectors is the appropriate choice of scale. This is a general and fundamental issue in scale-space theory. Selection of scale must be treated at two levels:

First of all it is always present in the task we perform on the image. Do we want to segment
the leaves or the tree? A careful specification of the image analysis task should make the
choice of scale explicit. Secondly, we can look for automatic scale selection, by optimizing
some criterion over a range of scales. For example, in feature detection one might look for the
maximized output amplitude of some feature detector for each location (see Lindeberg for a
detailed discussion).

For our optic flow we have a different criterion: instead of the generation of a (sophisticated) filter output, we locally solve a set of equations. When the matrix of coefficients in the matrix equation becomes singular, we know that the solution cannot be found. In other words, the further we are off from singularity, the better we can solve the set of equations. This means that we need some measure of how far we are from singularity. From linear algebra we know that the condition number of a matrix is a proper choice. There are a number of possibilities [from Ruskeepää1999, section 17.3.4]:

normL1[x_] := Max[(Plus @@ Abs[#]) & /@ Transpose[x]];
normLinf[x_] := Max[(Plus @@ Abs[#]) & /@ x];
normL2[x_] := Max[SingularValues[N[x]][[2]]];

The L1-norm (normL1) is the largest of the absolute column sums, and the L∞-norm (normLinf) is the largest of the absolute row sums. The L2-norm (normL2) is the maximum of the singular values.

SingularValues[N[x]] gives the singular value decomposition, and the second component of this decomposition contains the singular values. This norm can be written as

normL2b[x_] := Sqrt[Max[Abs[Eigenvalues[Conjugate[Transpose[x]].x]]]];

because the norm is also the square root of the largest absolute eigenvalue of the matrix Conjugate[Transpose[x]].x. In fact, the singular values are simply the square roots of the eigenvalues of this matrix.
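These norms have direct counterparts in standard numerical libraries; a numpy sketch (not book code), including a check that the singular values are the square roots of the eigenvalues of AᴴA:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

norm_l1 = np.abs(A).sum(axis=0).max()               # largest absolute column sum
norm_linf = np.abs(A).sum(axis=1).max()             # largest absolute row sum
norm_l2 = np.linalg.svd(A, compute_uv=False).max()  # largest singular value

assert norm_l1 == np.linalg.norm(A, 1)
assert norm_linf == np.linalg.norm(A, np.inf)

# singular values are the square roots of the eigenvalues of A^H . A
ev = np.linalg.eigvalsh(A.conj().T @ A)
assert np.isclose(norm_l2, np.sqrt(ev.max()))
print(norm_l1, norm_linf, norm_l2)
```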

We use the Frobenius norm defined as:

FrobeniusNorm[mat_] := Module[{a, A = DeleteCases[
     Re[Sqrt[Eigenvalues[Transpose[Conjugate[mat]].mat]]], 0.]},
   a = Plus @@ A^2; Check[If[a < 10^-50 || a^-1 === Indeterminate ||
       a^-1 === ComplexInfinity, $MaxMachineNumber, Plus @@ A^-2], mat]];

We choose here for the maximum norm of the coefficient matrix over scale. Does this norm display a maximum over scale anyway? We calculate the norm for 22 spatial scales, ranging from σ = 0.8 to σ = 5 pixels in steps of 0.2, for the eighth frame in the sequence (f = 8).

matL2 = Table[Lx = gDn[stim, {0, 0, 1}, {1, σ, σ}];
    Ly = gDn[stim, {0, 1, 0}, {1, σ, σ}];
    mat[f_] := {{Lx[[f]], Ly[[f]]}, {Ly[[f]], -Lx[[f]]}};
    Map[FrobeniusNorm, Transpose[mat[8], {3, 4, 1, 2}], {2}],
   {σ, .8, 5, .2}]; Dimensions[matL2]

{22, 65, 65}

Indeed quite a few pixel locations show a maximum, when we plot the logarithm of the norm
as a function of scale index for all pixels:

DisplayTogether[Table[
   LogListPlot[Transpose[matL2, {3, 2, 1}][[i, j]], PlotJoined -> True,
    PlotRange -> {.01, 500}, AxesLabel -> {"scale index", "norm"}],
   {i, 1, 64, 10}, {j, 1, 64, 10}], ImageSize -> 290];

Figure 17.11 Logarithmic plot of the L2 norm in every pixel as a function of spatial scale
(range: o- = 0.8 to 5.0 pixels) for the eighth frame in the retinal fundus sequence.

The plot of the scale index as a function of location in the frame is indicative of what happens: a small scale is best at locations with image structure, such as the location of the main vessels. In more homogeneous areas a larger scale gives a better condition number.

ShowLegend[DisplayTogetherArray[
   {ListDensityPlot[stim[[8]]], ListDensityPlot[-Log[matL2[[8]]]]},
   DisplayFunction -> Identity], {GrayLevel[1 - #] &, 10, "σ=3.0",
   "σ=0.8", LegendPosition -> {1.1, -.4}}, ImageSize -> 400];

Figure 17.12 Left: eighth frame of the image sequence. Right: scale index for the maximum
singular value of the coefficient matrix of the optic flow equations, for the scale range o- = 0.8
to 3 pixels. Note the smaller scales for the locations rich with image structure, such as the
vessels, and the use of larger scales for the more homogeneous areas.
17. Multi-scale optic flow 309

Task 17.6 Investigate the behaviour of other definitions of condition numbers.

Task 17.7 Investigate the influence of the temporal scale.

Task 17.8 Investigate the influence of both the spatial scales and the temporal scale simultaneously.

Task 17.9 Implement scale selection by selecting in each pixel the scale with the largest Frobenius norm.

17.10 Discussion
Many authors have proposed a multi-scale approach to the determination of optic flow. Weber and Malik [Weber1995b] used a filtered differential method. They applied a set of filters of various spectral contents, and also got a set of equations which can be solved to get a unique solution for the optic flow. Fleet and Jepson [Fleet1990] extract image component velocities from local phase information. Weickert recently introduced an approach for discontinuity preserving flow determination [Weickert1998d], and a method using a variational (energy minimizing) approach [Weickert2001a].

So far, we have extracted the derivatives in the temporal direction in the same way as in the
spatial direction(s), through convolution with a Gaussian derivative kernel with an appropriate
temporal scale. For a pre-recorded temporal sequence this is fine. When the Gaussian kernel
extends over the border of the image, the image can be periodically extended, as we did for
the spatial case. In a real-time situation however, the only information available is the past,
and we cannot use the half of the Gaussian kernel that extends 'into the future'. This problem
has been elegantly solved by Jan Koenderink, who proposed to resample the temporal axis
with a logarithmic reparametrization, based on arguments of temporal causality. This is
discussed in detail in chapter 20.

Of course, a local method where a measurement is done through an aperture faces the aperture problem (for a nice set of demos of the aperture 'problem' see www.illusionworks.com). A common extra constraint is the 'smoothness constraint', where the local vector is considered in its relation to its direct vicinity, and should not change much over a small distance. A natural step in the context of multi-scale optic flow extraction is the use of deep structure scale-space theory: the singularity points in the deep structure have no aperture problem; they form a set of natural, intrinsic and hierarchically ordered multi-scale image points.

17.11 Summary of this chapter


Two main features stand out in the treatment of optic flow in this chapter. The classical Horn & Schunck optic flow constraint equation is brought 'under the aperture', as it should be for observed variables. The application of scaled derivatives brings the approach under the scale-space paradigm.

Secondly, the notion of Lie-derivatives of the image intensity time series with respect to some unknown vectorfield enables us to form a set of equations from which the unknown parameters of the flow field can be calculated. The aperture problem is the consequence of the fact that in typical cases we have more unknowns than equations. However, when additional physical constraints are given, such as the constraint of normal or expanding flow, the solution can be found exactly. The theory enables the solution for different orders of the vector field approximation.

Earlier tests of the method have shown that this physics based scale-space approach
outperforms all other methods. The main reason must be the full inclusion of the physical
model of the observed flow.
18. Color differential structure

Jan-Mark Geusebroek, Bart M. ter Haar Romeny, Jan J. Koenderink,
Rein van den Boomgaard, Peter Van Osta

18.1 Introduction
Color is an important extra dimension. Information extracted from color is useful for almost
any computer vision task, like segmentation, surface characterization, etc. The field of color
science is huge [Wyszecki2000], and many theories exist. It is far beyond the scope of this
book to cover even a fraction of the many different approaches. We will focus on a single
recent theory, based on the color sensitive receptive fields in the front-end visual system. We
are especially interested in the extraction of multi-scale differential structure in the spatial and
the color domain of color images. This scale-space approach was recently introduced by
Geusebroek et al. [Geusebroek1999a, Geusebroek2000a], based on the pioneering work of
Koenderink's Gaussian derivative color model [Koenderink1998a]. This chapter presents the
theory and a practical implementation of the extraction of color differential structure.

18.2 Color image formation and color invariants


What is color invariant structure? To understand that notion, we first have to study the process
of color image formation.

<< FrontEndVision`FEV`;
p1 = Plot[Sin[.06 λ] + Sin[.1 λ] + 5, {λ, 350, 700}, PlotRange -> {2, 8},
   ImageSize -> 275, Ticks -> {Automatic, None}, AspectRatio -> .4];
p2 = DensityPlot[λ, {λ, 350, 700}, {y, 0, 2}, AspectRatio -> .1,
   ColorFunction -> (Hue[-0.001964 # + 1.375] &),
   ColorFunctionScaling -> False, Frame -> False, ImageSize -> 275];

Figure 18.1 An arbitrary spectrum (distribution of the energy over the wavelengths) reflected
from an object. For different colored objects we have different spectra. Horizontal scale: nm,
vertical scale: energy (W/m2).

Figure 18.1 shows an arbitrary spectrum that may fall onto the eye (or camera). The spectrum
as reflected by an object is the result of light falling onto the object, of which part of the
spectral energy is reflected towards the observer. Hence, the light spectrum falling onto the
eye results from interaction between a light source, the object, and the observer. Color may be
regarded as the measurement of spectral energy, and will be handled in the next section. Here,
we only consider the interaction between light source and material.

Before we see an object as having a particular color, the object needs to be illuminated. After all, in darkness objects are simply black. The emission spectra l(λ) of common light sources are close to Planck's formula [Wyszecki1999]

h = 6.626176 10^-34; c = 2.99792458 10^8; k = 1.38066 10^-23;

l[λ_, T_] := 8 π h c λ^-5 (E^(h c/(λ k T)) - 1)^-1

where h is Planck's constant, k Boltzmann's constant, and c the velocity of light in vacuum. The color temperature of the emitted light is given by T, and typically ranges from 2,500K (warm red light) to 10,000K (cold blue light). Note that the terms "warm" and "cold" are given by artists, and refer to the sensation caused by the light. Representative white light is, by convention, chosen to be at a temperature of 6500K. However, in practice, all light sources between 2,500K and 10,000K can be found. Planck's equation is adequate for incandescent light and halogen. The spectrum of daylight is slightly different, and is represented by a correlated color temperature. Daylight is close enough to the Planckian spectrum to be characterized by an equivalent parameter.
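A Python transcription of this Planck formula (a sketch; wavelength in meters, temperature in Kelvin, constants as in the Mathematica cell above), with Wien's displacement law as a sanity check on the peak of the 6500K spectrum:

```python
import numpy as np

h = 6.626176e-34    # Planck's constant [J s]
c = 2.99792458e8    # speed of light [m/s]
k = 1.38066e-23     # Boltzmann's constant [J/K]

def planck(lam, T):
    """Blackbody emission 8 pi h c / lam^5 / (exp(hc/(lam k T)) - 1)."""
    return 8 * np.pi * h * c * lam**-5 / np.expm1(h * c / (lam * k * T))

lam = np.linspace(100e-9, 3000e-9, 20000)
peak = lam[np.argmax(planck(lam, 6500.0))]
print(peak * 1e9)   # peak wavelength in nm; Wien predicts ~2.898e-3/6500 ~ 446 nm
```

`np.expm1` is used for the denominator to stay accurate when the exponent is small (long wavelengths, high temperatures).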

The part of the spectrum reflected by a surface depends on the surface spectral reflection function. The spectral reflectance is a material property, characterized by a function c(λ). For planar, matte surfaces, the spectrum reflected by the material e(λ) is simply the multiplication between the spectrum falling onto the surface l(λ) and the surface spectral reflectance function c(λ): e(λ) = c(λ) l(λ). For example, figure 18.2 shows the emission spectrum of three light sources, resp. of 2500K, 4500K and 10000K.

Plot[{l[λ 10^-9, 2500], l[λ 10^-9, 4500], 1/50 l[λ 10^-9, 10000]},
  {λ, 0, 2000}, PlotRange -> All, ImageSize -> 200];


Fig 18.2 Emission spectrum of a black body light source with a temperature of resp. 2500K, 4500K and 10,000K. The emission at 10,000K is much larger than at the two lower temperatures, and for that reason is plotted at 1/50 of its amplitude.

Now that we have the spectral reflectance function, we can examine how the reflected
spectrum would look with a different light source. In figure 18.3 the reflected spectrum for a
2500K, 4500K and a 10,000K radiator is demonstrated.

Block[{$DisplayFunction = Identity},
  plots = Plot[l[λ 10^-9, #] (Sin[.06 λ] + Sin[.1 λ] + 5),
      {λ, 350, 700}, PlotRange -> All, ImageSize -> 300,
      Ticks -> {Automatic, None}] & /@ {2500, 4500, 10000}];
Show[GraphicsArray[plots], ImageSize -> 500];


Fig 18.3 Object reflectance function for the observed spectrum shown in figure 18.1 for a resp. 2500K, 4500K and 10,000K light source. As opposed to the reflected spectrum (fig. 18.1), this object reflectance function is a material property, independent of the illumination source.

At this point it is meaningful to introduce spatial extent, hence to describe the spatio-spectral energy distribution e(x, y, λ) that falls onto the retina. Further, for three-dimensional objects the amount of light falling onto the object's surface depends on the energy flux, thus on the local geometry. Hence shading (and shadow) may be introduced as being a wavelength independent multiplication factor m(x, y) in the range [0...1]: e(x, y, λ) = c(x, y, λ) l(λ) m(x, y).
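A numpy sketch of this matte image-formation model, building e(x, y, λ) = c(x, y, λ) l(λ) m(x, y) from toy components (all names and spectra are illustrative):

```python
import numpy as np

ny, nx, nlam = 16, 16, 50
lam = np.linspace(350e-9, 700e-9, nlam)               # wavelengths [m]
l = np.exp(-((lam - 550e-9) / 100e-9) ** 2)           # toy illuminant spectrum l(lambda)
c = np.random.default_rng(2).random((ny, nx, nlam))   # surface reflectance c in [0,1]
m = np.linspace(0.2, 1.0, nx)[None, :, None]          # shading factor m(x,y) in [0..1]

e = c * l[None, None, :] * m                          # spatio-spectral energy e(x,y,lambda)
print(e.shape)
```

Broadcasting applies the position-independent illuminant over all pixels and the wavelength-independent shading over all bands, mirroring the factorization in the formula.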

Note that the illumination l(λ) is independent of position. Hence the equation describes spectral image formation of matte objects, illuminated by a single light source. For shiny surfaces the image formation equation has to be extended with an additive term describing the Fresnel reflected light, see [Geusebroek2000b] for more details.

The structure of the spatio-spectral energy distribution is due to the three functions c(.), l(.),
and m(.). By making some general assumptions, these quantities may be derived from the
measured image. Estimation of the object reflectance function c(.) boils down to deriving
material properties, the "true" color invariant which does not depend on illumination
conditions. Estimation of the light source l(.) is well known as the color constancy problem.
Determining m(.) is in fact estimating the shadows and shading in the image, and is closely
related to the shape-from-shading problem.

For the extraction of color invariant properties from the spatio-spectral energy distribution we
search for algebraic or differential expressions of e(.), which are independent of l(.) and m(.).
Hence the goal is to solve for differential expressions of e(.) which results in a function of c(.)
only.

To proceed, note that the geometrical term m is only a function of spatial position. Differentiation with respect to λ, and normalization, reduces the problem to only two functions: e(x, λ) = c(x, λ) l(λ) ⟹ (1/e(x, λ)) ∂e(x, λ)/∂λ = lλ/l + cλ/c (indices indicate differentiation). After additional differentiation to the spatial variable x or y, the first term vanishes, since l(.) only depends on λ:

e[x_, λ_] = c[x, λ] l[λ];

0 == ∂x (∂λ e[x, λ] / e[x, λ])

The left-hand side, after applying the chain rule,

∂x (∂λ e[x, y, λ] / e[x, y, λ]) // shortnotation

(e[x, y, λ] exλ[x, y, λ] - ex[x, y, λ] eλ[x, y, λ]) / e[x, y, λ]²

is completely expressed in spatial and spectral derivatives of the observable spatio-spectral


energy distribution.
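The invariance argument above can be checked symbolically outside Mathematica as well. Below is a small sketch in Python/sympy (my own re-implementation of the reasoning, not code from the book): the spatial derivative of the normalized spectral derivative of e = c·l·m indeed depends on the reflectance c(.) alone.

```python
import sympy as sp

x, lam = sp.symbols('x lam')
c = sp.Function('c')(x, lam)   # material reflectance: position and wavelength
l = sp.Function('l')(lam)      # illumination: wavelength only
m = sp.Function('m')(x)        # shading/shadow: position only

e = c * l * m                  # spatio-spectral image formation, matte surfaces

# spatial derivative of the normalized spectral derivative
expr = sp.simplify(sp.diff(sp.diff(e, lam) / e, x))

# l(.) and m(.) have dropped out: only c and its derivatives remain
expected = (c * sp.diff(c, x, lam) - sp.diff(c, x) * sp.diff(c, lam)) / c**2
print(sp.simplify(expr - expected))  # 0
```

The printed difference is identically zero, confirming that ∂x(eλ/e) = (c cxλ − cx cλ)/c², free of illumination and shading.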

As an example in this chapter we develop the differential properties of the invariant color-
edge detector ℰ = (1/e) ∂e/∂λ, where the measured spectral intensity e = e(x, y, λ). Spatial
derivatives of ℰ, like ∂ℰ/∂x, contain derivatives to the spatial as well as to the wavelength
dimension due to the chain rule. In the next section we will see that the zeroth, first and
second order derivative-to-λ kernels are acquired from the transformed RGB space of the
image directly. The derivatives to the spatial coordinates are acquired in the conventional way,
i.e. convolution with a spatial Gaussian derivative kernel.

18.3 Koenderink's Gaussian derivative color model


We have seen in the previous chapters how spatial structure can be extracted from the data in
the environment by measuring the set of (scaled) derivatives to some order. For the spatial
domain this has led to the family of Gaussian derivative kernels, sampling the spatial intensity
distribution. These derivatives naturally occur in a local Taylor expansion of the signal.

Koenderink proposed in 1986 to take a similar approach to the sampling of the color
dimension, i.e. the spectral information contained in the color. If we construct the Taylor
expansion of the spatio-spectral energy distribution e(x, y, λ) of the measured light with
respect to wavelength, in the fixed spatial point (x0, y0) and around a central wavelength λ0,
we get (to second order):

Series[e[x0, y0, λ], {λ, λ0, 2}]

e[x0, y0, λ0] + e^(0,0,1)[x0, y0, λ0] (λ - λ0) +
  (1/2) e^(0,0,2)[x0, y0, λ0] (λ - λ0)² + O[λ - λ0]³

We recall from chapter 1 that a physical measurement with an aperture is mathematically
described by a convolution. So for a measurement of the luminance L with aperture function
G(x, σ) in the (here in the example 1D) spatial domain we get:
L(x; σ) = ∫_{-∞}^{∞} L(x - α) G(α, σ) dα, where α is the dummy spatial shift parameter running over
all possible values.

For the temporal domain we get L(t; σ) = ∫_{-∞}^{∞} L(t - β) G(β, σ) dβ, where β is the dummy
temporal shift parameter running over all possible values in time (in chapter 20 we will take a
close look at this temporal convolution). Based on this analogy, we might expect a
measurement along the color dimension to look like: L(λ; σ) = ∫_{-∞}^{∞} L(λ - γ) G(γ, σ) dγ, where
λ is the wavelength and γ is the dummy wavelength shift parameter.

The front-end visual system has implemented the shifted spatial kernels with a grid on the
retina with receptive fields, so the shifting is implemented by the simultaneous measurement
of all the neighboring receptive fields. The temporal kernels are implemented as time-varying
LGN and cortical receptive fields (explained in detail in chapters 11 and 20). However, a wide
range of receptive fields whose sensitivity shifts over the wavelength axis would require many
different photo-sensitive dyes (rhodopsins) in the receptors, each with a different, shifted
color sensitivity.

The visual system may have opted for a cheaper solution: the convolution is calculated at just
a single position on the wavelength axis, around λ0 = 520 nm, with a standard deviation of
the Gaussian kernel of about σλ = 55 nm. The integration is done over the range of
wavelengths covered by the rhodopsins, i.e. from about 350 nm (blue) to 700 nm (red).
The values for λ0 and σλ are determined from the best fit of a Gaussian to the spectral
sensitivity as measured psychophysically in humans, i.e. the Hering model.

So we get for the spectral intensity e(x, λ0; σλ) = ∫_{λmin}^{λmax} e(x, λ) G(λ; λ0, σλ) dλ. This is a
'static' convolution operation. It is not a convolution in the familiar sense, because we don't
shift over the whole wavelength axis. We just do a single measurement with a Gaussian
aperture over the wavelength axis at the position λ0. Similarly, the derivatives to λ,

∂e/∂λ (x, λ0; σλ) = σλ ∫_{λmin}^{λmax} e(x, λ) ∂G(λ; λ0, σλ)/∂λ dλ
∂²e/∂λ² (x, λ0; σλ) = σλ² ∫_{λmin}^{λmax} e(x, λ) ∂²G(λ; λ0, σλ)/∂λ² dλ

describe the first and second order spectral derivative respectively. The factors σλ and σλ²
are included for the normalization, i.e. to make the Gaussian spectral kernels dimensionless.

Here are the graphs of the 'static' normalized Gaussian spectral kernels to second order as a
function of wavelength:

gaussλ[λ_, σ_] = D[gauss[λ, σ], λ];
gaussλλ[λ_, σ_] = D[gauss[λ, σ], {λ, 2}];
λ0 = 520; σλ = 55;
Plot[{gauss[λ - λ0, σλ], σλ gaussλ[λ - λ0, σλ],
   σλ^2 gaussλλ[λ - λ0, σλ]}, {λ, λ0 - 3 σλ, λ0 + 3 σλ},
  PlotRange -> All, AxesLabel -> {"λ (nm)", ""}, ImageSize -> 250];

Figure 18.4 The zeroth, first and second derivative of the Gaussian function with respect to
wavelength, as models for the wavelength sensitivity of the color receptive fields in human color
vision. After [Koenderink1986]. The central wavelength is 520 nm, the standard deviation 55
nm.
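The normalization by σλ and σλ² can be checked numerically; here is a small numpy sketch (my own re-implementation of the plotted kernels, not book code), using the analytic forms of the Gaussian and its first two derivatives:

```python
import numpy as np

lam0, sigma = 520.0, 55.0                      # Koenderink's fitted values (nm)
lam = np.linspace(lam0 - 6 * sigma, lam0 + 6 * sigma, 24001)
u = (lam - lam0) / sigma

g0 = np.exp(-u**2 / 2) / (sigma * np.sqrt(2 * np.pi))  # Gaussian aperture G
g1 = -u * g0                                           # sigma   * dG/dlambda
g2 = (u**2 - 1) * g0                                   # sigma^2 * d2G/dlambda2

# the dimensionless kernels integrate to 1, 0 and 0 respectively
dlam = lam[1] - lam[0]
ints = [k.sum() * dlam for k in (g0, g1, g2)]
print(ints)
```

The zeroth order kernel integrates to unity, and both derivative kernels integrate to zero, as required for a well-normalized derivative-of-Gaussian family.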

We recall from the human vision chapter that the color sensitive receptive fields come in the
combinations red-green and yellow-blue center-surround receptive fields. The subtraction of
yellow and blue in these receptive fields is well modeled by the first order derivative to λ, the
subtraction of red and green by the second order derivative to λ. Alternatively, one can say
that the zeroth order receptive field measures the luminance, the first order the 'blue-yellowness',
and the second order the 'red-greenness'. The sensitivity of the three types of cones
(L-, M- and S-cones: long, medium and short wavelength sensitivity) is given in figure 18.5.

Show[Import["ConesColors.gif"], ImageSize -> 200];

Figure 18.5 The relative spectral sensitivity of the three types of cones in the human retina.
From [Geusebroek et al. 1999a].

The Gaussian model approximates the Hering basis [Hering1964a] for human color vision when
λ0 ≈ 520 nm and σλ = 55 nm (see figure 18.6). It also fits very well the CIE 1964 XYZ
basis, which is a famous coordinate system for colors, much used in technical applications
involving color.

Note: the wavelength axis is a half axis. It is known that for a half axis (such as one with
positive-only values) a logarithmic parametrization is the natural way to 'step along' the axis.
E.g. the scale axis is logarithmically sampled in scale-space (remember the 'orders of
magnitude'), the intensity is logarithmically transformed in the photoreceptors, and, as we will
see in chapter 20, the time axis can only be measured causally when we sample it
logarithmically. We might conjecture here a better fit to the Hering model with a logarithmic
wavelength axis.

Show[Import["HeringColormodel.gif"], ImageSize -> 300];

Figure 18.6 The Hering basis for the spectral sensitivity of human color receptive fields. From
[Hering1964a].

The Gaussian color model needs the first three components of the Taylor expansion of the
Gaussian-weighted spectral energy distribution at λ0 and scale σλ. An RGB camera measures
the red, green and blue components of the incoming light, but this is not what we need for the
Gaussian color model. We need a method to extract the Taylor expansion terms from the RGB
values. The figure below shows the RGB color space. The black color is diagonally opposite
the white color in the cube.

sc = .2; n = 6; Clear[gr];
Show[gr = Graphics3D[Table[{RGBColor[x/n, y/n, z/n],
     Cuboid[{x - sc, y - sc, z - sc}, {x + sc, y + sc, z + sc}]},
    {z, 1., n}, {y, 1., n}, {x, 1., n}], Lighting -> False],
  ViewPoint -> {2.441, 1.600, 1.172}, ImageSize -> 200];

Figure 18.7 RGB color space.

LiveGraphics3D is a Java class written by Martin Kraus to rotate and manipulate
Mathematica Graphics3D objects in real time in a browser. The package is available at
wwwvis.informatik.uni-stuttgart.de/~kraus/LiveGraphics3D/. With the command
liveFormWrite we write the above graphic to a temporary file.

liveFormWrite[$TopDirectory <>
  "\\AddOns\\Applications\\FrontEndVision\\LiveGraphics3D\\data.m", gr]

Open show.html to start your browser and play with the 3D structure.

This plots all pixels of a color image as points (with their RGB color) in RGB space:

Block[{$DisplayFunction = Identity, im, data},
  im = Import["hybiscus2.jpg"]; data = Flatten[im[[1, 1]], 1];
  p1 = Show[im, ImageSize -> 150];
  p2 = Show[Graphics3D[{RGBColor @@ #/255, Point[#]} & /@ data],
    Axes -> True, AxesLabel -> {"R", "G", "B"}]];
Show[GraphicsArray[{p1, p2}], ImageSize -> 470];

Figure 18.8 Left: color input image. Right: distribution of the pixels in the RGB space. Note the
wide range of green colors (in the left cluster) and red colors (in the right cluster) due to the
many shadows and lighting conditions.

An RGB camera approximates the CIE 1964 XYZ basis for colorimetry by the following
linear transformation matrix:

rgb2xyz = {{ 0.621, 0.113, 0.194},
           { 0.297, 0.563, 0.049},
           {-0.009, 0.027, 1.105}};

Geusebroek et al. [Geusebroek2000a] give the best linear transform from the XYZ values to
the Gaussian color model:

xyz2e = {{-0.019,  0.048,  0.011},
         { 0.019,  0.,    -0.016},
         { 0.047, -0.052,  0.}};

The resulting transform from the measured RGB input image to the sampling 'à la human
vision' is the dot product of the transforms:

colorRF = xyz2e.rgb2xyz; colorRF // MatrixForm

( 0.002358   0.025174   0.010821
  0.011943   0.001715  -0.013994
  0.013743  -0.023965   0.00657  )
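The product of the two matrices can be verified numerically with a few lines of numpy (a verification sketch of my own, independent of the FEV package):

```python
import numpy as np

# RGB -> CIE 1964 XYZ (camera approximation, from the text)
rgb2xyz = np.array([[ 0.621,  0.113, 0.194],
                    [ 0.297,  0.563, 0.049],
                    [-0.009,  0.027, 1.105]])

# XYZ -> Gaussian color model [Geusebroek2000a]
xyz2e = np.array([[-0.019,  0.048,  0.011],
                  [ 0.019,  0.000, -0.016],
                  [ 0.047, -0.052,  0.000]])

# combined RGB -> (e, e_lambda, e_lambdalambda) transform
colorRF = xyz2e @ rgb2xyz
print(np.round(colorRF, 6))
```

The rounded result reproduces the MatrixForm output shown above.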

Indeed, when we study this transform and plot the rows, we see the likeness with the
Gaussian derivative sensitivities (see figure 18.6).

Block[{$DisplayFunction = Identity},
  p1 = Table[ListPlot[colorRF[[i]], PlotJoined -> False,
     PlotStyle -> PointSize[0.05], Ticks -> None], {i, 3}]];
Show[GraphicsArray[p1], ImageSize -> 200];

Figure 18.9 The transformations of the input RGB values to the Gaussian color model
coarsely resemble the Gaussian derivatives to λ. Left: zeroth order. Middle: first order. Right:
second order. The three center wavelengths roughly correspond with 400, 475 and 575 nm of
the Gaussian derivative functions of figure 18.4.

The set of spatio-spectral Gaussian derivative cortical simple cells thus looks like (from
[Geusebroek et al. 2000a]):

Show[Import["ColorRFs.gif"], ImageSize -> 250];

Figure 18.10 The Gaussian color model for cortical receptive fields. Left: zeroth order to
wavelength, measuring the luminance and the spatial derivative structure. Middle: first order
derivative to wavelength, yellow/blue - spatial derivatives. Right: second order derivative to
wavelength, red/green - spatial derivatives.

The Gaussian color model is an approximation, but has the attractive property of fitting very
well into Gaussian scale-space theory. The notion of image structure is extended to the
wavelength domain in a very natural and coherent way. The similarity with human differential-
color receptive fields is more than a coincidence.

Now we have all the tools to come to an actual implementation. The RGB values of the input
image are transformed into Gaussian color model space, and plugged into the spatio-spectral
formula for the color invariant feature.

Next to the derivatives to wavelength we need spatial derivatives, which are computed in the
regular way with spatial Gaussian derivative operators. The full machinery of e.g. gauge
coordinates and invariance under specific groups of transformations is also applicable here.
The next section details the implementation.

18.4 Implementation
We start with the import of an RGB color image (see figure 18.11). The color pixels are RGB
triples in the very first element of the imported object:

image = Import["colortoys2.jpg"];
im = image[[1, 1]];
im[[1]] // Short
{{67, 104, 148}, {67, 104, 148}, {66, 103, 147}, {66, 103, 147},
{66, 103, 147}, {66, 103, 147}, <<167>>, {119, 40, 87}, {119, 40, 87},
{120, 41, 88}, {119, 40, 87}, {119, 40, 87}, {120, 41, 88}}

Dimensions[im]

{228, 179, 3}

The RGB triples are converted into measurements through the color receptive fields in the
retina with the transformation matrix c o l o r R F defined above:

colorRF = xyz2e.rgb2xyz; colorRF // MatrixForm

( 0.002358   0.025174   0.010821
  0.011943   0.001715  -0.013994
  0.013743  -0.023965   0.00657  )

To transform every RGB triple we map the transformation over our input image as a pure
function at the second list level:

observedimage = Map[Dot[colorRF, #] &, im, {2}];

The three 'layers' of this observed image represent resp. e, eλ and eλλ. We 'slice' the
dataset smartly by a reordering Transpose, whereby the e-, eλ- and eλλ-values each form a
plane:

obs = Transpose[observedimage, {2, 3, 1}];

Dimensions[obs]

{3, 228, 179}
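In numpy terms, the Map at level {2} followed by the reordering Transpose amounts to a matrix product over the last axis plus an axis permutation. A hedged sketch, with synthetic data standing in for the imported image:

```python
import numpy as np

rng = np.random.default_rng(1)
im = rng.integers(0, 256, size=(228, 179, 3)).astype(float)  # stand-in RGB image

colorRF = np.array([[0.002358,  0.025174,  0.010821],
                    [0.011943,  0.001715, -0.013994],
                    [0.013743, -0.023965,  0.00657]])

# apply the 3x3 transform to every RGB triple (Map[Dot[colorRF, #] &, im, {2}])
observed = im @ colorRF.T                 # shape (228, 179, 3)

# slice into the e, e_lambda, e_lambdalambda planes (Transpose[..., {2, 3, 1}])
obs = np.transpose(observed, (2, 0, 1))   # shape (3, 228, 179)
print(obs.shape)
```

Each plane `obs[k]` is then a scalar image ready for spatial Gaussian derivative filtering.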

Let us inspect what the color receptive fields see:



DisplayTogetherArray[
  Prepend[ListDensityPlot /@ obs, Show[image]], ImageSize -> 470];

Figure 18.11 The input image (left) and the observed images with the color differential
receptive fields. Image resolution 228x179 pixels.

We now develop the differential properties of the invariant color-edge detector ℰ = (1/e) ∂e/∂λ,
where the spectral intensity e = e(x, y, λ). The derivatives to the spatial and spectral
coordinates are easily found with the chain rule. Here are the explicit forms:

ℰ := D[e[x, y, λ], λ] / e[x, y, λ]; ∂x ℰ

-(e^(0,0,1)[x, y, λ] e^(1,0,0)[x, y, λ]) / e[x, y, λ]² + e^(1,0,1)[x, y, λ] / e[x, y, λ]

To make this more readable, we apply pattern matching to make the expression shorter (for
the code of shortnotation see FEV.nb):

∂x ℰ /. e[x, y, λ] -> e // shortnotation

(e exλ[x, y, λ] - ex[x, y, λ] eλ[x, y, λ]) / e²

∂y ℰ /. e[x, y, λ] -> e // shortnotation

(e eyλ[x, y, λ] - ey[x, y, λ] eλ[x, y, λ]) / e²

∂λ ℰ /. e[x, y, λ] -> e // shortnotation

(-eλ[x, y, λ]² + e eλλ[x, y, λ]) / e²

∂x,λ ℰ /. e[x, y, λ] -> e // shortnotation

(1/e³) (e² exλλ[x, y, λ] - 2 e exλ[x, y, λ] eλ[x, y, λ] +
   ex[x, y, λ] (2 eλ[x, y, λ]² - e eλλ[x, y, λ]))

∂y,λ ℰ /. e[x, y, λ] -> e // shortnotation

(1/e³) (e² eyλλ[x, y, λ] - 2 e eyλ[x, y, λ] eλ[x, y, λ] +
   ey[x, y, λ] (2 eλ[x, y, λ]² - e eλλ[x, y, λ]))
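These closed forms are easy to verify independently. A sympy sketch (my own check, not book code) for the mixed derivative ∂x,λ ℰ:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
e = sp.Function('e')(x, y, lam)

E = sp.diff(e, lam) / e          # the color invariant (1/e) de/dlambda

# claimed closed form of d^2 E / (dx dlambda)
rhs = (e**2 * sp.diff(e, x, lam, lam)
       - 2 * e * sp.diff(e, x, lam) * sp.diff(e, lam)
       + sp.diff(e, x) * (2 * sp.diff(e, lam)**2 - e * sp.diff(e, lam, lam))) / e**3

print(sp.simplify(sp.diff(E, x, lam) - rhs))  # 0
```

The difference simplifies to zero, so the shortnotation output above is consistent with direct differentiation.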

The gradient magnitude (detecting yellow-blue transitions) becomes:



𝒢 = Simplify[Sqrt[(∂x ℰ)² + (∂y ℰ)²]] // shortnotation

Sqrt[(1/e[x, y, λ]⁴) ((e[x, y, λ] exλ[x, y, λ] - ex[x, y, λ] eλ[x, y, λ])² +
    (e[x, y, λ] eyλ[x, y, λ] - ey[x, y, λ] eλ[x, y, λ])²)]

The second spectral order gradient (detecting purple-green transitions) becomes:

𝒞 = Simplify[Sqrt[(∂x,λ ℰ)² + (∂y,λ ℰ)²]] // shortnotation

Sqrt[(1/e[x, y, λ]⁶)
  ((e[x, y, λ]² exλλ[x, y, λ] + 2 ex[x, y, λ] eλ[x, y, λ]² - e[x, y, λ]
       (2 exλ[x, y, λ] eλ[x, y, λ] + ex[x, y, λ] eλλ[x, y, λ]))² +
   (e[x, y, λ]² eyλλ[x, y, λ] + 2 ey[x, y, λ] eλ[x, y, λ]² -
     e[x, y, λ] (2 eyλ[x, y, λ] eλ[x, y, λ] + ey[x, y, λ] eλλ[x, y, λ]))²)]

Finally, the total edge strength 𝒩 (for all color edges) in the spatio-spectral domain becomes:

𝒩 = Simplify[Sqrt[(∂x ℰ)² + (∂y ℰ)² + (∂x,λ ℰ)² + (∂y,λ ℰ)²]]; 𝒩 // shortnotation

Sqrt[(1/e[x, y, λ]⁶)
  (e[x, y, λ]² (e[x, y, λ] exλ[x, y, λ] - ex[x, y, λ] eλ[x, y, λ])² +
   e[x, y, λ]² (e[x, y, λ] eyλ[x, y, λ] - ey[x, y, λ] eλ[x, y, λ])² +
   (e[x, y, λ]² exλλ[x, y, λ] + 2 ex[x, y, λ] eλ[x, y, λ]² -
     e[x, y, λ] (2 exλ[x, y, λ] eλ[x, y, λ] + ex[x, y, λ] eλλ[x, y, λ]))² +
   (e[x, y, λ]² eyλλ[x, y, λ] + 2 ey[x, y, λ] eλ[x, y, λ]² -
     e[x, y, λ] (2 eyλ[x, y, λ] eλ[x, y, λ] + ey[x, y, λ] eλλ[x, y, λ]))²)]

As an example, we implement this last expression for discrete images.

As we did in the development of the multi-scale Gaussian derivative operators, we replace
each occurrence of a derivative to λ with the respective plane in the observed image rf (made by
the color receptive fields). Note that we use rf[[nλ + 1]] because the zeroth list element is
the Head of the list. We recall the internal representation of a derivative in Mathematica:

FullForm[e^(1,0,2)[x, y, λ]]

Derivative[1, 0, 2][e][x, y, λ]

We will look for such patterns and replace them with another pattern. We do this pattern
matching with the command /. (ReplaceAll). We call the observed image at this stage
rf, without any assignment to data, so we can do all calculations symbolically first:

Clear[rf0, rf1, rf2, σ];
rf = {rf0, rf1, rf2};

𝒩 = 𝒩 /. {Derivative[nx_, ny_, nλ_][e][x, y, λ] :>
      Derivative[nx, ny][rf[[nλ + 1]]][x, y],
    e[x, y, λ] :> rf[[1]]} // Simplify

Sqrt[(1/rf0⁶) (rf0² (rf1[x, y] rf0^(0,1)[x, y] - rf0 rf1^(0,1)[x, y])² +
    (2 rf1[x, y]² rf0^(0,1)[x, y] - 2 rf0 rf1[x, y] rf1^(0,1)[x, y] +
      rf0 (-rf2[x, y] rf0^(0,1)[x, y] + rf0 rf2^(0,1)[x, y]))² +
    rf0² (rf1[x, y] rf0^(1,0)[x, y] - rf0 rf1^(1,0)[x, y])² +
    (2 rf1[x, y]² rf0^(1,0)[x, y] - 2 rf0 rf1[x, y] rf1^(1,0)[x, y] +
      rf0 (-rf2[x, y] rf0^(1,0)[x, y] + rf0 rf2^(1,0)[x, y]))²)]

Note that we do a delayed rule assignment here (:> instead of ->) because we want to evaluate
the right hand side only after the rule is applied. We finally replace the spatial derivatives with
our familiar spatial Gaussian derivative convolution gD at scale σ:

𝒩 = 𝒩 /. {Derivative[nx_, ny_][rf_][x, y] :> gD[rf, nx, ny, σ],
   rf1[x, y] :> rf1, rf2[x, y] :> rf2}

Sqrt[(1/rf0⁶) (rf0² (rf1 gD[rf0, 0, 1, σ] - rf0 gD[rf1, 0, 1, σ])² +
    rf0² (rf1 gD[rf0, 1, 0, σ] - rf0 gD[rf1, 1, 0, σ])² +
    (2 rf1² gD[rf0, 0, 1, σ] - 2 rf0 rf1 gD[rf1, 0, 1, σ] +
      rf0 (-rf2 gD[rf0, 0, 1, σ] + rf0 gD[rf2, 0, 1, σ]))² +
    (2 rf1² gD[rf0, 1, 0, σ] - 2 rf0 rf1 gD[rf1, 1, 0, σ] +
      rf0 (-rf2 gD[rf0, 1, 0, σ] + rf0 gD[rf2, 1, 0, σ]))²)]

This expression can now safely be calculated on the discrete data:

{rf0, rf1, rf2} = obs; σ = 1.;
ListDensityPlot[-𝒩, PlotRange -> {-.4, 0}, ImageSize -> 220];

Figure 18.12 The color-invariant 𝒩 calculated for our input image at spatial scale σ = 1 pixel.
Primarily the total color edges are found, with little edge detection at intensity edges. Image
resolution 228x179 pixels.

The following routines are implemented as standard FEV functions:

ℰ[im, σ]      ℰ = (1/e) ∂e/∂λ                                        color invariant
ℰλ[im, σ]     ∂ℰ/∂λ                                                  first wavelength derivative of ℰ
ℰλλ[im, σ]    ∂²ℰ/∂λ²                                                second wavelength derivative of ℰ
g𝒢[im, σ]     Sqrt[(∂ℰ/∂x)² + (∂ℰ/∂y)²]                              yellow-blue edges
g𝒞[im, σ]     Sqrt[(∂²ℰ/∂x∂λ)² + (∂²ℰ/∂y∂λ)²]                        red-green edges
g𝒩[im, σ]     Sqrt[(∂ℰ/∂x)² + (∂ℰ/∂y)² + (∂²ℰ/∂x∂λ)² + (∂²ℰ/∂y∂λ)²]  total color edge strength

The prefix g stands for the multi-scale Gaussian derivative implementation. For the code, see
appendix D.
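For readers outside Mathematica, the total-edge-strength pipeline can be sketched in plain numpy. The function names, the separable-convolution helper, and the use of a synthetic image are all my own; this is a sketch of the scheme above, not the FEV implementation, and it assumes e > 0 everywhere (true for positive RGB input, since the first row of colorRF is positive):

```python
import numpy as np

def _gauss_kernel(order, sigma):
    # 1D Gaussian (order 0) or first-order Gaussian derivative (order 1) kernel
    t = np.arange(-int(4 * sigma), int(4 * sigma) + 1, dtype=float)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    return g if order == 0 else -t / sigma**2 * g

def gD(im, nx, ny, sigma):
    # separable Gaussian (derivative) filtering; x runs over columns, y over rows
    out = np.apply_along_axis(np.convolve, 1, im, _gauss_kernel(nx, sigma), mode='same')
    return np.apply_along_axis(np.convolve, 0, out, _gauss_kernel(ny, sigma), mode='same')

def gN(rgb, sigma, colorRF):
    # total spatio-spectral color edge strength (sketch of FEV's gN)
    e, el, ell = np.transpose(rgb @ colorRF.T, (2, 0, 1))
    ex, ey = gD(e, 1, 0, sigma), gD(e, 0, 1, sigma)
    elx, ely = gD(el, 1, 0, sigma), gD(el, 0, 1, sigma)
    ellx, elly = gD(ell, 1, 0, sigma), gD(ell, 0, 1, sigma)
    gx = (e * elx - ex * el) / e**2                          # d/dx (e_lambda / e)
    gy = (e * ely - ey * el) / e**2
    cx = (e**2 * ellx - 2*e*elx*el + ex*(2*el**2 - e*ell)) / e**3
    cy = (e**2 * elly - 2*e*ely*el + ey*(2*el**2 - e*ell)) / e**3
    return np.sqrt(gx**2 + gy**2 + cx**2 + cy**2)

# demo on a synthetic positive RGB image
colorRF = np.array([[0.002358,  0.025174,  0.010821],
                    [0.011943,  0.001715, -0.013994],
                    [0.013743, -0.023965,  0.00657]])
rgb = np.random.default_rng(0).uniform(50, 200, (64, 64, 3))
print(gN(rgb, 1.0, colorRF).shape)  # (64, 64)
```

The expressions for gx, gy, cx, cy mirror the substituted shortnotation output: wavelength derivatives come from the color planes, spatial derivatives from Gaussian derivative filtering.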

Here is an example of finding blue-yellow edges:

image = Import["terre indigo.jpg"]; im = rasterpixels[image];
DisplayTogetherArray[
  {Show[image], ListDensityPlot[-g𝒢[im, 1], PlotRange -> {-.15, -.08}]},
  ImageSize -> 510];

Figure 18.13 The yellow-blue edge detector 𝒢 = Sqrt[(∂ℰ/∂x)² + (∂ℰ/∂y)²] calculated at a spatial scale
σ = 1 pixel. Image resolution 288x352. Note the absence of black-white intensity edges in the
day and time indication. Example due to J. Sporring.

This example shows the color-selectivity of the spatio-color differential invariants:



image = Import["colortoys.jpg"]; im = rasterpixels[image];

DisplayTogetherArray[{Show[image],
   ListDensityPlot[-g𝒢[im, σ = 1.], PlotRange -> {-.2, -.06}],
   ListDensityPlot[-g𝒞[im, σ = 1.], PlotRange -> {-1.2, 0}]},
  ImageSize -> 510];

Figure 18.14 Detection of the yellow-blue edge detector 𝒢 = Sqrt[(∂ℰ/∂x)² + (∂ℰ/∂y)²] (middle) and the
red-green edge detector 𝒞 = Sqrt[(∂²ℰ/∂x∂λ)² + (∂²ℰ/∂y∂λ)²] (right) at a scale of σ = 1 pixel.

As a histological example, spatio-color differential structure can accentuate staining patterns.


The next example shows the advantage of a specific red-green detection in the localization of
the edges of a stained nucleus in Paramecium caudatum, a common freshwater inhabitant with
ciliary locomotion.

im = rasterpixels[image = Import["Paramecium caudatum.jpg"]];

DisplayTogetherArray[
  {Show[image], ListDensityPlot[-g𝒢[im, σ = 3.], PlotRange -> {-.012, 0}],
   ListDensityPlot[-g𝒞[im, σ = 3.], PlotRange -> {-.15, 0}]},
  ImageSize -> 510];

Figure 18.15 The color-invariant 𝒢 (middle) and 𝒞 (right) calculated for a histological image
of Paramecium caudatum (left). The red-green color edges found by 𝒞 form a
good delineation of the cell's nucleus.

18.5 Combination with spatial constraints


Interesting combinations can be made when we combine the color differential operators with
the spatial differential operators. E.g. when we want to detect specific blobs with a specific
size and color, we can apply feature detectors that are best matching the shape to be found.
We end the chapter with two examples: color detection of blobs in a regular pattern, and
locating stained nuclei in a histological preparation.

Blobs are detected by calculating those locations (pixels) where the Gaussian curvature lgc =
Lxx Lyy - Lxy² on the black-and-white version (imbw) of the image is greater than zero. This
indicates a convex 'hilltop'. Pixels on the boundaries of the 'hilltop' are detected by requiring
the second order directional derivative in the direction of the gradient, Lww, to be positive.
Interestingly, by using these invariant shape detectors we are largely independent of image
intensity. For the color scheme we rely on ℰ and its first and second order derivatives to λ. The
Mathematica code gives some examples of specific color detection:

imbw = colorToBW[im = rasterpixels[image = Import["colorblobs.gif"]]];

lww = gauge2DN[imbw, 0, 2, 2];
lgc = gD[imbw, 2, 0, 2] gD[imbw, 0, 2, 2] - gD[imbw, 1, 1, 2]^2;
e1 = ℰλ[im, 2]; e11 = ℰλλ[im, 4];
t1 = Map[If[# > 0, 1, 0] &, lww, {2}];
t2 = Map[If[# > 0, 1, 0] &, lgc, {2}];
t3 = Map[If[# > 0, 1, 0] &, e1, {2}];
t4 = Map[If[# > 0.001, 1, 0] &, e11, {2}];
t5 = Map[If[# > 0, 1, 0] &, e1 + e11, {2}];

Block[{$DisplayFunction = Identity},
  p1 = Show[image];
  blue = ListDensityPlot[(1 - t1) t2 (1 - t5) (1 - (1 - t5) t3)];
  yellow = ListDensityPlot[(1 - t1) t2 (1 - t4) t5];
  redandmagenta = ListDensityPlot[(1 - t1) t2 t4]];
Show[
  GraphicsArray[{p1, blue, yellow, redandmagenta}], ImageSize -> 420];

Figure 18.16 Detection of color blobs with spatial and color differential operators. Blobs are
spatially described as locations with positive Gaussian curvature and positive second order
directional derivative in the gradient direction of the image intensity; color is a Boolean
combination of the color differential features. Example due to P. van Osta.
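The spatial half of this detector (positive Gaussian curvature, positive Lww on the flanks) is easy to prototype. The sketch below is my own illustration of the criterion: it uses a synthetic Gaussian 'hilltop' and plain np.gradient in place of the Gaussian derivative operators gD.

```python
import numpy as np

# synthetic bright blob: a smooth convex hilltop
n = 65
yy, xx = np.mgrid[0:n, 0:n] - n // 2
L = np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))

Ly, Lx = np.gradient(L)          # first order derivatives (rows = y, cols = x)
Lxy, Lxx = np.gradient(Lx)
Lyy, Lyx = np.gradient(Ly)

lgc = Lxx * Lyy - Lxy**2         # Gaussian curvature numerator: positive on the blob
eps = 1e-12                      # guard against zero gradient
Lww = (Lx**2 * Lxx + 2 * Lx * Ly * Lxy + Ly**2 * Lyy) / (Lx**2 + Ly**2 + eps)

print(lgc[n // 2, n // 2] > 0)        # convex hilltop center: positive Gaussian curvature
print(Lww[n // 2, n // 2 + 20] > 0)   # flank beyond the inflection: Lww > 0
```

Both conditions hold where expected: the blob center has positive Gaussian curvature, and on the flank beyond the inflection point the second order derivative in the gradient direction is positive.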

• Task 18.1 Find the total detection scheme for all individual colors in the example
above, and for combinations of two and three colors.

A last example detects stained carbohydrate deposits in a histological application.

imbw = colorToBW[im = rasterpixels[image = Import["pas.jpg"]]];

lww = gauge2DN[imbw, 0, 2, 4];
lgc = gD[imbw, 2, 0, 4] gD[imbw, 0, 2, 4] - gD[imbw, 1, 1, 4]^2;
e1 = ℰλ[im, 4]; e11 = ℰλλ[im, 4]; t1 = Map[If[# > 0, 1, 0] &, lww, {2}];
t2 = Map[If[# > 0.2, 1, 0] &, lgc, {2}];
t6 = Map[If[# > 0, 1, 0] &, e11 - e1, {2}];

DisplayTogetherArray[
  {Show[image], ListDensityPlot[t1 t2 t6]}, ImageSize -> 500];

Figure 18.17 Detection of carbohydrate stacking in cells that are specifically stained for
carbohydrates with periodic acid Schiff (P.A.S.). The carbohydrate deposits are in magenta,
cell nuclei in blue. The blob-like areas are detected with positive Gaussian curvature and
positive Lww; the magenta with a Boolean combination of the color invariant ℰ and its
derivatives to λ. Example due to P. van Osta. Image taken from
http://www.bris.ac.uk/Depts/PathAndMicro/CPL/pas.html

18.6 Summary of this chapter


The scale-space model for color differential structure, as developed by Koenderink, states that
the wavelength differential structure is measured at a single scale (σλ = 55 nm) for zeroth
order (intensity), first order (blue-yellow) and second order (red-green). There is a good
analogy with the color receptive field structures found in the mammalian visual cortex.

The chapter shows an actual implementation of this color differential structure. The
wavelength derivatives can be extracted from the transformed RGB triples in the data.

In a similar way as for the spatial differential structure analysis of gray valued data, invariants
can be defined, with combined color and spatial coordinate transformation invariance. A
number of these color-spatial differential invariants are discussed and implemented on
examples.
19. Steerable kernels
19.1 Introduction
Clearly, orientation of structure is a multi-scale concept. On a small scale, the local
orientation of structural elements, such as edges and ridges, may be different from the
orientations of the same elements at a larger scale. Figure 19.1 illustrates this.

<< FrontEndVision`FEV`
im = Import["fabric.gif"][[1, 1]];
DisplayTogetherArray[Prepend[
   (ListDensityPlot[gauge2DN[im0, 2, 0, #] /. im0 -> im]) & /@ {1.5, 5},
   ListDensityPlot[im]], ImageSize -> 350];

Figure 19.1 The multi-scale nature of local orientation. Left: original, basket texture. Middle:
at a small scale (σ = 1.5 pixels), orientation is governed by the fibers, as is shown by the
'ridgeness' Lvv. Right: at a larger scale (σ = 5 pixels), another orientation can be seen by
calculating the Lvv ridgeness (bright areas, more or less horizontal in the picture).

Orientation plays an important role as parameter in establishing similarity relations between


neighboring points. As such, it is an essential ingredient of methods for perceptual grouping.
E.g. the grouping of edge pixels into a set that defines them as belonging to the same
contour could be done using similarity in orientation of the edges, i.e. of their respective
gradient vectors. In chapter 12 we have seen that the visual system is particularly well
equipped to take measurements of differential properties at a continuum of orientations:
the typical spoke-wheel structure of orientation columns that make up each hypercolumn (the
wetware for the analysis of a single binocular visual field 'pixel') in the primary visual
cortex, where receptive fields are found at all orientations.

In this chapter, we will study in detail the orientation properties of Gaussian derivative filters,
and a concept named 'steerability'. We come to a proper definition of a directional derivative
for all orders of differentiation, and show how this can best be computed in a Cartesian
framework. We then study, as an application of orientation analysis, the detection of stellate
tumors in mammography (example taken from [Karssemeijer1995a, Karssemeijer1996a]),
which is based on a global analysis of orientation. This is an example of computer-aided
diagnosis (CAD).

Other examples of context dependent perception of curvature are some of the well known
obscured parallel lines illusions, shown in figure 19.2.

square[x_, y_, s_] :=
  Line[{{x, y}, {x + s, y}, {x + s, y + s}, {x, y + s}, {x, y}}];
DisplayTogetherArray[
  {Show[Graphics[{Thickness[.01], Line[{{#, 0}, {0, #}}], If[EvenQ[#],
        Table[Line[{{# - n, n - 1}, {# - n, n + 1}}], {n, .1, 10, .5}],
        Table[Line[{{# - n - 1, n}, {# - n + 1, n}}], {n, .1, 10, .5}]]} & /@
      Range[1, 20, 3]], PlotRange -> {{0, 10}, {0, 10}}],
   Show[Graphics[{Table[Line[{{-Cos[φ], -Sin[φ]}, {Cos[φ], Sin[φ]}}],
       {φ, 0, π, π/16}], square[-.25, .25, .5], square[-.25, -.75, .5],
      Circle[{-.5, 0}, .25], Circle[{.5, 0}, .25]}],
    PlotRange -> {{-1, 1}, {-1, 1}}, AspectRatio -> Automatic]},
  ImageSize -> 400];

Figure 19.2 Obscured parallel lines illusions. Left: we perceive the lines as not parallel; in
reality they are parallel. Right: the squares seem parallelograms, and the circles seem to have an
egg shape. The context around the contours of the figures determines the perceived
orientation.

19.2 Multi-scale orientation


We define the orientation of a vector relative to a coordinate frame as the set of angles
between the vector and the frame vectors of the coordinate frame. Because we deal with
orthogonal coordinates in this book, we need a single angle for a 2D vector, and two angles
for a 3D vector.

Formal definition of angulation (2D), tilt and spin (3D). Figures.

The direction of a vector is the absolute value of the orientation. E.g. the direction of the x-
axis is horizontal, and an identical direction is created when we rotate the x-axis over π
radians (180 degrees).

In chapter 6, section 5, we studied the orientation of first order differential structure: the
gradient vector field, and in chapter 6, section 7.3 we encountered the orientations of the
principal curvatures, as the orientations of the eigenvectors of the second order Hessian
matrix.

19.3 Orientation analysis with Gaussian derivatives


In order to build the machinery for the extraction of orientation information, we need to fully
understand the behavior of the Gaussian derivative operator at different orientations, and
how to control this behavior. We will limit the analysis to 2D. The zeroth order Gaussian is
isotropic by definition, and has no orientation. The action of this operator on an image is
rotationally invariant. All non-zero partial derivatives are equipped with a direction.

The first order Gaussian derivative kernel ∂G/∂x is shown in figure 19.3, and we define the
orientation of this kernel to be zero, i.e. the angle φ of its axis along which the differentiation
is done is zero radians with the x-axis. The orientation φ of the Gaussian derivative kernel to
y, ∂G/∂y, is π/2, pointing upwards along the positive y-axis. So increments in φ are positive
in a counterclockwise fashion.

The first order Gaussian derivative kernel in another orientation can readily be made from its
basic constituents: it is well known that a kernel with orientation φ can be constructed as
Cos(φ) ∂G/∂x + Sin(φ) ∂G/∂y.

Figure 19.3 illustrates this by the convolution of the kernel with a Dirac delta function, i.e. a
single spike, which gives us the figure of the kernel, as we have seen in chapter 2:

im = Table[0, {128}, {128}]; im[[64, 64]] = 100;

φ = π/6; Block[{$DisplayFunction = Identity},
  imx = gD[im, 1, 0, 15]; imy = gD[im, 0, 1, 15];
  imφ = Cos[φ] gD[im, 1, 0, 15] + Sin[φ] gD[im, 0, 1, 15];
  p1 = ListDensityPlot[#] & /@ {imx, imy, imφ}];
Show[GraphicsArray[p1]];

Figure 19.3 A first order Gaussian derivative at any orientation can be constructed with the
partial derivatives ∂G/∂x (left) and ∂G/∂y (middle) as a basis. Right: Cos(φ) ∂G/∂x + Sin(φ) ∂G/∂y
for φ = π/6.
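The steering identity for the first order kernel can also be checked numerically outside Mathematica. The following Python sketch (not from the book; the function names are mine) samples the analytic first order Gaussian derivatives and verifies that Cos(φ) ∂G/∂x + Sin(φ) ∂G/∂y equals the first derivative taken along the axis at angle φ:

```python
import math

# First order Gaussian derivatives, written out analytically:
# Gx = -x/sigma^2 * G, Gy = -y/sigma^2 * G.
def gauss(x, y, s):
    return math.exp(-(x * x + y * y) / (2 * s * s)) / (2 * math.pi * s * s)

def gx(x, y, s):
    return -x / (s * s) * gauss(x, y, s)

def gy(x, y, s):
    return -y / (s * s) * gauss(x, y, s)

def g_steered(x, y, s, phi):
    """First derivative along the axis at angle phi: because the Gaussian is
    isotropic, this is Gx evaluated in the rotated coordinate frame."""
    u = x * math.cos(phi) + y * math.sin(phi)   # coordinate along phi
    v = -x * math.sin(phi) + y * math.cos(phi)  # coordinate across phi
    return -u / (s * s) * gauss(u, v, s)        # gauss(u, v) == gauss(x, y)

phi, s = math.pi / 6, 1.5
for (x, y) in [(0.3, -1.2), (1.0, 0.7), (-0.5, 0.4)]:
    lhs = g_steered(x, y, s, phi)
    rhs = math.cos(phi) * gx(x, y, s) + math.sin(phi) * gy(x, y, s)
    assert abs(lhs - rhs) < 1e-12
```

The check passes for any angle φ, which is exactly the two-basis steerability of the first order kernel.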

A class of filters where a filter of any orientation can be constructed from a linear
combination of other functions is called a steerable filter [Freeman1991a]. The rotational
components form a basis. A basis will contain more elements when we go to higher order.
The question is now: what are the basis functions? How many basis functions do we need for
the construction of a rotated Gaussian derivative of a particular order? We will discuss two
important classes of basis functions (see figure 19.6):

a. basis functions that are rotated copies of the Gaussian derivative itself;
b. basis functions taken from the set of all partial derivatives in the Cartesian framework;

In particular, we will derive later in the text general formulas for e.g. the rotation of
∂³G(x,y)/∂x³ over 30 degrees clockwise (-π/6 radians). In the two bases the results are the
same (∂³G(x,y)/∂x³ |45° denotes the 45° rotated version of the third order derivative kernel):

∂³G(x,y)/∂x³ |-30° = √3/4 ∂³G(x,y)/∂x³ |0° + (-3+√3)/(4√2) ∂³G(x,y)/∂x³ |45°
  + 1/4 ∂³G(x,y)/∂x³ |90° - (3+√3)/(4√2) ∂³G(x,y)/∂x³ |135°

∂³G(x,y)/∂x³ |-30° = -1/8 ∂³G(x,y)/∂y³ + 3√3/8 ∂³G(x,y)/∂x∂y² - 9/8 ∂³G(x,y)/∂x²∂y + 3√3/8 ∂³G(x,y)/∂x³
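Both expansions can be verified numerically. The Python sketch below is my own cross-check, not the book's code: it evaluates the third order Gaussian derivatives analytically (for σ = 1) and confirms that the two linear combinations coincide with the directly rotated kernel:

```python
import math

SQ2, SQ3 = math.sqrt(2), math.sqrt(3)

def gauss(x, y):
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

# Third order derivatives of the (sigma = 1) Gaussian, written out by hand.
def gxxx(x, y): return (3 * x - x ** 3) * gauss(x, y)
def gxxy(x, y): return (y - x * x * y) * gauss(x, y)
def gxyy(x, y): return (x - x * y * y) * gauss(x, y)
def gyyy(x, y): return (3 * y - y ** 3) * gauss(x, y)

def gxxx_rot(x, y, th):
    """Gxxx rotated over angle th: evaluate in the rotated coordinate frame
    (allowed because the Gaussian itself is isotropic)."""
    u = x * math.cos(th) + y * math.sin(th)
    v = -x * math.sin(th) + y * math.cos(th)
    return gxxx(u, v)

th = -math.pi / 6
for (x, y) in [(0.4, 0.9), (-1.1, 0.3), (0.7, -0.6)]:
    direct = gxxx_rot(x, y, th)
    # basis a: rotated copies of Gxxx at 0, 45, 90, 135 degrees
    selfsim = (SQ3 / 4 * gxxx_rot(x, y, 0)
               + (SQ3 - 3) / (4 * SQ2) * gxxx_rot(x, y, math.pi / 4)
               + 1 / 4 * gxxx_rot(x, y, math.pi / 2)
               - (3 + SQ3) / (4 * SQ2) * gxxx_rot(x, y, 3 * math.pi / 4))
    # basis b: Cartesian partial derivatives
    cart = (3 * SQ3 / 8 * gxxx(x, y) - 9 / 8 * gxxy(x, y)
            + 3 * SQ3 / 8 * gxyy(x, y) - 1 / 8 * gyyy(x, y))
    assert abs(direct - selfsim) < 1e-12 and abs(direct - cart) < 1e-12
```

The agreement is exact (up to floating point rounding), as steerability is an identity, not an approximation.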

The first class of basis functions may have interesting similarities with the control of
directional derivatives in the visual system, because of its self-similarity, symmetry
and a reduced set of receptive field sensitivity profiles; the second is more apt for computer
implementation. The derivation for the first class will be given in section 19.4, the second
class will be derived in section 19.5. The two basis sets are different. Section 19.6 gives an
illustrating example of the use of multi-scale steerable filters.

19.4 Steering with self-similar functions


We need to know what is the necessary condition for a function to be steerable. We look at
the first order Gaussian derivative as a first example. We express the kernel in polar
coordinates {x, y} = {r Cos(φ), r Sin(φ)} where r is the distance to the origin, and φ the angle
with the positive x-axis. With the function TrigToExp we express the result in complex
exponentials e^(iφ):

Clear[φ]; {TrigToExp[Cos[φ] + I Sin[φ]], Exp[I φ] // ExpToTrig}

{e^(iφ), Cos[φ] + i Sin[φ]}

We transform our Gaussian kernel and some of its partial derivatives, and display the result,
after division by the Gaussian kernel, in MatrixForm for notational clarity:

g := 1/(2 π σ^2) Exp[-(x^2 + y^2)/(2 σ^2)]; σ =.;
t = {{g, ∂x g}, {∂y g, ∂x,y g}} /. {x → r Cos[φ], y → r Sin[φ]} // TrigToExp // Simplify;
t // MatrixForm

( e^(-r²/(2σ²))/(2 π σ²)                               -(e^(-r²/(2σ²)) e^(-iφ) (1 + e^(2iφ)) r)/(4 π σ⁴)   )
( (i e^(-r²/(2σ²)) e^(-iφ) (-1 + e^(2iφ)) r)/(4 π σ⁴)   (i e^(-r²/(2σ²)) e^(-2iφ) (1 - e^(4iφ)) r²)/(8 π σ⁶) )

To see the structure more clearly, we divide the Gaussian itself out and collect the complex
exponentials:

Collect[t / t[[1, 1]], Exp[_]] // MatrixForm

( 1                                             -(e^(-iφ) r)/(2σ²) - (e^(iφ) r)/(2σ²)         )
( -(i e^(-iφ) r)/(2σ²) + (i e^(iφ) r)/(2σ²)     (i e^(-2iφ) r²)/(4σ⁴) - (i e^(2iφ) r²)/(4σ⁴)  )

The resulting function is separable in a radial part a(r) and an angular part e^(inφ) where n is
equal to the differential order: f(r, φ) = Σ_{n=-M}^{M} a_n(r) e^(inφ). A function with these properties
steers when it can be written as a linear combination of rotated versions of itself.

The so-called steerability constraint is f^θ(x, y) = Σ_{j=1}^{M} k_j(θ) f^{θ_j}(x, y), where M is the
number of filters, f^θ(x, y) is the rotated kernel, and the f^{θ_j}(x, y) are the rotated basis
kernels. The weighting factors k_j(θ) are to be determined, as follows. When we fill in the
polar representation of the kernel in the steerability constraint, we get

a_n(r) e^(inθ) = Σ_{j=1}^{M} k_j(θ) a_n(r) e^(inθ_j) .   (1)

We can divide by a_n(r), and when a_n(r) = 0 for some n we just remove this constraint from
the set of equations. The equations are equal for -n and n, so we have to consider only
angular frequencies in the range 0 ≤ n ≤ M. Because n is equal to the order of
differentiation, we have here as first result that the number of necessary basis filters for
steerability is n + 1.

To steer the first order Gaussian derivative, we needed two basis functions; for steering a
second order derivative kernel we need three basis functions. Equation (1) is a set of
equations from which we can solve the weighting constants k_j(θ). Written more explicitly,
they look like

   ( 1       )   ( 1          1          ...  1          ) ( k_1(θ) )
   ( e^(iθ)  ) = ( e^(iθ_1)   e^(iθ_2)   ...  e^(iθ_M)   ) ( k_2(θ) )
   ( ...     )   ( ...        ...             ...        ) ( ...    )
   ( e^(inθ) )   ( e^(inθ_1)  e^(inθ_2)  ...  e^(inθ_M)  ) ( k_M(θ) )

The 1's in the top row are for e^(i·0·θ) = 1. Let us solve this set of equations.

We need to choose the orientations θ_j of the basis functions such that the columns in the
matrix above are linearly independent. For symmetry reasons we choose them equally
spaced over the space of directions, i.e. over the range 0 to π. So, for the first order Gaussian
derivative the basis orientations are 0 and π/2, for the second order they are 0, π/3 and
2π/3, etc. Starting from an arbitrary angle φ, we get {φ, φ + π/2} and
{φ, φ + π/3, φ + 2π/3} etc.

n = 2; θn = Table[φ + i π/(n + 1), {i, 0, n}]

{φ, π/3 + φ, 2π/3 + φ}

19. Steerable kernels 334


Because the complex exponentials are complex-valued, we have to solve the equations
separately for the real and the imaginary part.

n = 2; kj = Array[k, n + 1];
{ComplexExpand[Re[Exp[I n θ]] == Re[kj.Exp[I n θn]]],
 ComplexExpand[Im[Exp[I n θ]] == Im[kj.Exp[I n θn]]]}

{Cos[2 θ] == Cos[2 φ] k[1] + Cos[2 (π/3 + φ)] k[2] + Cos[2 (2π/3 + φ)] k[3],
 Sin[2 θ] == Sin[2 φ] k[1] + Sin[2 (π/3 + φ)] k[2] + Sin[2 (2π/3 + φ)] k[3]}

For the even orders o f differentiation we need to solve the equations specified by the top row
and the even rows (if n is even, we start in row 0), for the odd orders we need the odd rows
(if n is odd we start in row 1). The equation set is easily solved with Mathematica:

angularweights[n_] := Module[{k, i, ni}, Clear[θ, φ, k];
  θn = Table[φ + i π/(n + 1), {i, 0, n}]; kj = Array[k, n + 1];
  sol = Solve[Flatten[
      Table[{ComplexExpand[Re[Exp[I ni θ]] == Re[kj.Exp[I ni θn]]],
         ComplexExpand[Im[Exp[I ni θ]] == Im[kj.Exp[I ni θn]]]},
       {ni, If[EvenQ[n], 0, 1], n, 2}]
      ], kj]; Flatten[kj /. sol // Simplify // TrigReduce]]

For the first order we find for the angular weights k_j(θ):

angularweights[1]

{Cos[θ - φ], Sin[θ - φ]}

Indeed, a correct result, found from these equations:

n = 1; Clear[θ, φ, k]; θn = Table[φ + i π/(n + 1), {i, 0, n}]; kj = Array[k, n + 1];
Table[{
   ComplexExpand[Re[Exp[I ni θ]] == Re[kj.Exp[I ni θn]]],
   ComplexExpand[Im[Exp[I ni θ]] == Im[kj.Exp[I ni θn]]]},
  {ni, If[EvenQ[n], 0, 1], n, 2}]

{{Cos[θ] == Cos[φ] k[1] - k[2] Sin[φ], Sin[θ] == Cos[φ] k[2] + k[1] Sin[φ]}}
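This 2×2 system can also be solved by hand, e.g. with Cramer's rule. A small Python sketch (hypothetical helper weights_order1, not from the book) reproduces the closed form {Cos[θ - φ], Sin[θ - φ]} found above:

```python
import cmath
import math

# Steerability system for n = 1: e^{i theta} = k1 e^{i theta1} + k2 e^{i theta2},
# split into real and imaginary parts and solved as a 2x2 real system (Cramer).
def weights_order1(theta, phi):
    t1, t2 = phi, phi + math.pi / 2          # equally spaced basis orientations
    e = cmath.exp(1j * theta)
    e1, e2 = cmath.exp(1j * t1), cmath.exp(1j * t2)
    det = e1.real * e2.imag - e2.real * e1.imag
    k1 = (e.real * e2.imag - e2.real * e.imag) / det
    k2 = (e1.real * e.imag - e.real * e1.imag) / det
    return k1, k2

theta, phi = 1.1, 0.4
k1, k2 = weights_order1(theta, phi)
# matches the closed form found above: {Cos[theta - phi], Sin[theta - phi]}
assert abs(k1 - math.cos(theta - phi)) < 1e-12
assert abs(k2 - math.sin(theta - phi)) < 1e-12
```

The determinant equals sin(θ_2 - θ_1) = 1, which is why the equally spaced choice of basis orientations keeps the system well-conditioned.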

For the second and third order Gaussian derivatives things get more complicated:

angularweights[2]

{1/3 (1 + 2 Cos[2 θ - 2 φ]), 1/3 (1 - Cos[2 θ - 2 φ] + √3 Sin[2 θ - 2 φ]),
 1/3 (1 - Cos[2 θ - 2 φ] - √3 Sin[2 θ - 2 φ])}

angularweights[3]

{1/2 (Cos[3 θ - 3 φ] + Cos[θ - φ]),
 1/4 (-√2 Cos[3 θ - 3 φ] + √2 Cos[θ - φ] + √2 Sin[3 θ - 3 φ] + √2 Sin[θ - φ]),
 1/2 (-Sin[3 θ - 3 φ] + Sin[θ - φ]),
 1/4 (√2 Cos[3 θ - 3 φ] - √2 Cos[θ - φ] + √2 Sin[3 θ - 3 φ] + √2 Sin[θ - φ])}

We have found the general result for steerability of Gaussian derivative kernels.
A Gaussian derivative kernel can be steered, i.e. made in any orientation, by a linearly
weighted sum of rotated versions of itself, the basis functions. There are n + 1 functions
required, equally spaced over an angle range of 0 to π.

For n = 1 the basis includes θ = 0° and 90°;
For n = 2 the basis includes θ = 0°, 60° and 120°;
For n = 3 the basis includes θ = 0°, 45°, 90° and 135°;
For n = 4 the basis includes θ = 0°, 36°, 72°, 108° and 144°; etc.
In other words: for each value of n we have a different set of basis functions.

ang2 = angularweights[2]; ang3 = angularweights[3];

φ = 0; Block[{$DisplayFunction = Identity},
  p1 = PolarPlot[#, {θ, 0, 2 π},
      PlotRange → {{-1, 1}, {-1, 1}}, Frame → True] & /@ ang2;
  p2 = PolarPlot[#, {θ, 0, 2 π}, PlotRange → {{-1, 1}, {-1, 1}},
      Frame → True] & /@ ang3];
Show[GraphicsArray[{p1, p2}], ImageSize → 500];

Figure 19.4 Top row: polar plot of the three coefficients to construct the rotated second order
Gaussian derivative kernel. Bottom row: polar plots of the 4 coefficients for the third order
rotated derivative operator. A polar plot f(θ) is the radial plot of the radius f versus the angle
θ.

The weights are found from a set of linear equations originating from the steerability
constraint. So the third order derivative Gxxx|φ under an arbitrary angle φ with the x-axis
can be constructed from four basis kernels, each 45 degrees rotated, i.e.

Gxxx|φ = 1/2 (Cos[3 φ] + Cos[φ]) Gxxx|0°
  + 1/4 (-√2 Cos[3 φ] + √2 Cos[φ] + √2 Sin[3 φ] + √2 Sin[φ]) Gxxx|45°
  + 1/2 (-Sin[3 φ] + Sin[φ]) Gxxx|90°
  + 1/4 (√2 Cos[3 φ] - √2 Cos[φ] + √2 Sin[3 φ] + √2 Sin[φ]) Gxxx|135°

When we fill in φ = -π/6 we get the formula from the beginning of this chapter. It is
instructive to look at the polar plots of these coefficients. This is the plot of the function as a
function of the angle, with the amplitude of the function as distance to the origin. We quickly
see then how the angular weights are distributed over the orientations, and we recognize the
orientations of the basis functions, see figure 19.4.

19.5 Steering with Cartesian partial derivatives


When we just rotate our coordinates, we get a particularly convenient representation for
computer implementations. We use the same strategy as developed for the gauge coordinates
in chapter 6. Instead of adapting the directions of our local frame to the orientation of
the gradient vectorfield, we now choose a fixed rotation of our frame vectors.

We use the same formulas, and notice that the orientation of the {Lx, Ly} unit vector frame is
given by {Sin(φ), Cos(φ)} where φ is our required angle of rotation:

Unprotect[gDφ];
gDφ[im_, nv_, nw_, σ_, φ_] :=
  Module[{Lx, Ly, v, w, im0}, v = {-Ly, Lx}; w = {Lx, Ly};
   Simplify[Nest[(v.{∂x #, ∂y #}) &, Nest[(w.{∂x #, ∂y #}) &, L[x, y], nw],
      nv] /. {Lx → Sin[φ], Ly → -Cos[φ]}]]

We denote gDφ[im_, nv_, nw_, σ_, φ_] our rotated Gaussian derivative operator, and
define this function in the appropriate way by repeated action of the differential operators
(with Nest) and thereafter replacing the fixed direction {Lx, Ly} with the particular choice
of {Sin(φ), -Cos(φ)}. A final step is the replacement of any appearance of a derivative into
our regular (and now familiar) multi-scale Gaussian derivative operator gD.

Some examples: here is the third derivative Gxxx rotated over -π/6 (30 degrees
clockwise), the example of the beginning of this chapter, both in explicit formula and plotted
at a scale of σ = 15 pixels (128² image):

im =.; gDφ[im, 3, 0, 15, -π/6] // shortnotation

And a 4th order example:



gDφ[im, 3, 1, 15, π/5] // shortnotation

(output: a linear combination of the fourth order Cartesian partial derivatives Lxxxx, Lxxxy, Lxxyy, Lxyyy, Lyyyy with trigonometric coefficients)

Task 19.1 Show that Lxxxy, when rotated over π/2, gives Lxyyy. Explain this.

The numerical version for discrete images is gDφN:

Unprotect[gDφN];
gDφN[im_, nv_, nw_, σ_, φ_] := gDφ[im, nv, nw, σ, φ] /.
    Derivative[n_, m_][L][x, y] → gD[im0, n, m, σ] /. im0 -> im

im = Table[0, {128}, {128}]; im[[64, 64]] = 100;
DisplayTogetherArray[ListDensityPlot /@
   {gD[im, 3, 0, 15], gDφN[im, 3, 0, 15, -π/6]}, ImageSize -> 380];

Figure 19.5 Left: the third order Gaussian derivative kernel to x. Right: the same kernel
rotated 30 degrees clockwise, calculated from the expression in partial Cartesian derivatives
above.

This is the proper multi-scale directional derivative. Any angle can now readily be
constructed, no more need for 'only in 8 directions'.
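The Cartesian weights follow directly from the binomial expansion of the directional derivative (Cos(φ) ∂/∂x + Sin(φ) ∂/∂y)^n. A short Python sketch (illustration only; cartesian_weights is a hypothetical helper, not from the book) reproduces the third order coefficients for φ = -π/6 given earlier:

```python
import math

def cartesian_weights(n, phi):
    """Coefficients of the n-th derivative along angle phi in terms of the
    Cartesian partials d^n G / dx^(n-k) dy^k, from the binomial expansion of
    (cos(phi) d/dx + sin(phi) d/dy)^n."""
    return [math.comb(n, k) * math.cos(phi) ** (n - k) * math.sin(phi) ** k
            for k in range(n + 1)]

w = cartesian_weights(3, -math.pi / 6)
# expected: [3*sqrt(3)/8, -9/8, 3*sqrt(3)/8, -1/8] for Gxxx, Gxxy, Gxyy, Gyyy
expected = [3 * math.sqrt(3) / 8, -9 / 8, 3 * math.sqrt(3) / 8, -1 / 8]
assert all(abs(a - b) < 1e-12 for a, b in zip(w, expected))
```

This makes explicit why the Cartesian basis needs exactly n + 1 kernels for an n-th order directional derivative.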

To summarize this section we show the figures of the two different bases discussed. The first
basis is a good starting point for models of oriented simple cells in the primary visual cortex.
In chapter 9 we have seen that the orientation columns in the mammalian visual cortex
contain the simple cells in a strikingly regular arrangement of ordered orientation. All
orientations are present to analyze the local visual field in the hypercolumns, the
arrangement in a pinwheel fashion. This representation is the topic of the last section of this
chapter. The second basis, made up of Cartesian separable functions, is the basis of choice
for computer implementations as these functions are just our familiar Gaussian derivative
convolution kernels (gD). As we now have all tools for making the oriented kernels, we
show both multi-scale steering bases for Gaussian derivative kernels below for the third
order derivative.

im = Table[0, {128}, {128}]; im[[64, 64]] = 100;

Block[{$DisplayFunction = Identity},
  n = 3; θn = Append[Table[i π/(n + 1), {i, 0, n}], -π/6];
  p1 = ListDensityPlot[gDφN[im, 3, 0, 15, #]] & /@ θn;
  p2 = Apply[ListDensityPlot[gD[im, #1, #2, 15]] &,
    {{3, 0}, {1, 2}, {2, 1}, {0, 3}}, 2];
  t1 = Graphics[Text[StyleBox["~", FontFamily -> "Courier New",
       FontSize -> 24, FontWeight -> "Plain"], {0, 0}]];
  tt1 = Insert[p1, t1, 5]]; tt2 = {p2, t1, p1[[5]]} // Flatten;
Show[GraphicsArray[{tt1, tt2}], ImageSize -> 350];

Figure 19.6 Two different sets of basis functions can be used for the construction of a
steered Gaussian derivative kernel. Upper row: the basis set is formed from rotated versions
of the kernel itself; bottom row: the basis set is formed from Cartesian x, y-separable
Gaussian derivative kernels. The weights are calculated in the section above. Kernel as in
figure 19.5.

19.6 Detection of stellate tumors

Computer-aided diagnosis (CAD) is the branch of computer vision in medical imaging


concerned with the assistance of the computer in the diagnostic process. This is a rapidly
growing field. There are two major reasons for its growth: an increasing array of
methodologies outperforms the human diagnosis in specificity, and the sheer volume of
medical diagnostic data necessitates support. The system is always used as a 'second opinion'
system; the final responsibility for the diagnosis always lies with the medical specialist.

Recently, a number of commercial products have acquired FDA approval to put the system
on the market and into clinical use. See e.g. the web pages of R2 Technology
(www.r2tech.com), Deus Technologies (www.deustech.com) and Fujifilm
(www.fujifilm.com). It is expected that more products soon will follow.

CAD up till now is mainly applied in fields where large scale screening of patients is
performed. Two classical areas are screening for microcalcifications and stellate (stella =
Latin: star; stellate = star-shaped) mammographic tumors (also called spiculated lesions),
and screening for X-thorax deviations (tuberculosis screening, lung cancer etc.). We discuss
the detection of stellate tumors in mammography. The following procedure has first been
presented by Karssemeijer [Karssemeijer1995a, Karssemeijer1996a]. This is now the basis of
one of the methods employed by R2 Technology.

A mammogram is a 2D X-ray photograph of the female breast, taken at high resolution


(typically 20002 or higher) and with a low X-ray tube voltage (typically 28-35 kiloVolt), in

order to enhance the contrast between the soft tissue structures. The structure of the tissue is
highly tubular: many channels are present, all converging in a tree-like structure to the
nipple, so when a tumor expands, it is likely its outgrowth follows the channels. This gives
often rise to a particular stellate pattern seen around the shadow of the lesion.

The geometric reasoning: when we investigate a pixel for the presence of a stellate structure,
we actually have to investigate its immediate surrounding for the presence of lines oriented
towards the pixel. Because we study a large group of surrounding pixels, we can approach a
good statistical mean. The local orientation is detected by convolution of the second order
Gaussian derivative, a 'classical bar-detector'. The oriented kernel is a function of Lxx, Lxy
and Lyy :

Clear[σ, φ]; gDφ[im, 2, 0, σ, φ] // shortnotation

Cos[φ]^2 Lxx + Sin[2 φ] Lxy + Sin[φ]^2 Lyy

For each pixel in the neighborhood of the pixel for which we inspect a stellated surround, we
calculate the three 'basis' derivatives Lxx, Lxy and Lyy. We create a kernel for the same
area where each pixel indicates under which angle this pixel is located with respect to the
central pixel. In this kernel we calculate again a triplet, now of the trigonometric coefficients
of the second order oriented Gaussian derivative.

We use the speed of ListCorrelate to multiply all triplets in the kernel with the triplets
of basis derivatives in the same area in the mammogram. The resulting area is somewhat
smaller than the original image. It is not useful to take information into account from a
periodic or mirrored boundary. The input image is mammo, σ is the scale of the differential
operator and size is the size of the search region:

stellatedetection[mammo_, σ_, size_] :=
  Module[{derivs, kernel}, derivs = Transpose[{-gD[mammo, 2, 0, σ],
      gD[mammo, 1, 1, σ], gD[mammo, 0, 2, σ]}, {3, 1, 2}];
   kernel = Table[With[{φ = ArcTan[x, y]}, {Cos[φ]^2, Sin[2 φ], Sin[φ]^2}],
     {x, -size - 0.5, size + 0.5}, {y, -size - 0.5, size + 0.5}];
   Flatten /@ ListCorrelate[kernel, derivs]];
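The accumulation step of this detector can be sketched in plain Python as well. This is an illustrative re-implementation, not the book's method: stellate_response is a hypothetical helper, and the derivative images Lxx, Lxy, Lyy are assumed to be precomputed (e.g. by Gaussian derivative convolution at scale σ):

```python
import math

def stellate_response(Lxx, Lxy, Lyy, size):
    """For each pixel, sum the second order derivative steered towards the
    pixel over a (2*size+1)^2 neighborhood: cos(phi)^2 Lxx + sin(2 phi) Lxy
    + sin(phi)^2 Lyy, with phi the angle of the neighbor w.r.t. the center.
    Border pixels without a full neighborhood are left at zero."""
    h, w = len(Lxx), len(Lxx[0])
    out = [[0.0] * w for _ in range(h)]
    for yc in range(size, h - size):
        for xc in range(size, w - size):
            acc = 0.0
            for dy in range(-size, size + 1):
                for dx in range(-size, size + 1):
                    if dx == 0 and dy == 0:
                        continue
                    phi = math.atan2(dy, dx)
                    y, x = yc + dy, xc + dx
                    acc += (math.cos(phi) ** 2 * Lxx[y][x]
                            + math.sin(2 * phi) * Lxy[y][x]
                            + math.sin(phi) ** 2 * Lyy[y][x])
            out[yc][xc] = acc
    return out
```

A center of a stellate pattern collects a large response because every surrounding line segment points towards it, so the steered second derivative responds for all neighbor angles simultaneously.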

We start with an artificial mammogram of 200x200 pixels, containing two stellate patterns
with diameters of 30 and 20 pixels resp. (see figure 19.8, left), and uniformly distributed
noise:

insert[mammo_, stellate_, x_, y_] :=
  Module[{ydim, xdim, x1, y1}, {ydim, xdim} = Dimensions[stellate];
   locsm = Flatten[Table[{y1, x1},
       {y1, y, y + ydim - 1}, {x1, x, x + xdim - 1}], 1];
   locss = Flatten[Table[{y1, x1}, {y1, 1, ydim}, {x1, 1, xdim}], 1];
   ReplacePart[mammo, stellate, locsm, locss]];

stellate[diam_] := Table[
   If[x == y || x == Round[diam/2] || y == Round[diam/2] || x == diam - y, 1, 0],
   {x, diam}, {y, diam}];
mammo = Table[0, {y, 200}, {x, 200}];

noise = Table[Random[], {200}, {200}];

mammo = insert[mammo, stellate[30], 50, 70];
mammo = insert[mammo, stellate[20], 120, 140] + noise;

We extract the n largest maxima with the following function, which was first defined in
chapter 13:

nMaxima[im_, n_] := Module[{p, d = Depth[im] - 1},
   p = Times @@ Table[(Sign[im - Map[RotateLeft, im, {i}]] + 1)
         (Sign[im - Map[RotateRight, im, {i}]] + 1), {i, 0, d - 1}] / 4^d;
   maxs = Take[Reverse[Union[{10 Extract[im, #], Reverse[#]} & /@
         Position[p, 1]]], n];
   Apply[{10 #1 / maxs[[1, 1]], #2} &, maxs, {1}]];
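A simpler, non-cyclic variant of this maxima extraction is easy to write in Python (hypothetical helper n_maxima, illustration only; it compares each interior pixel to its 8 neighbors, whereas the Mathematica version above compares along each axis with cyclic boundaries):

```python
def n_maxima(im, n):
    """Return the n largest strict local maxima of a 2D list as
    (value, (row, col)), comparing each interior pixel to its 8 neighbors."""
    h, w = len(im), len(im[0])
    found = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = im[y][x]
            if all(v > im[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)):
                found.append((v, (y, x)))
    return sorted(found, reverse=True)[:n]

im = [[0, 0, 0, 0, 0],
      [0, 5, 0, 0, 0],
      [0, 0, 0, 9, 0],
      [0, 0, 0, 0, 0]]
print(n_maxima(im, 2))  # -> [(9, (2, 3)), (5, (1, 1))]
```

Sorting the candidates by value and keeping the n strongest mirrors the Take[Reverse[Union[...]], n] step above.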

The resulting detection is indicated with circles. The radii of the circles indicate the
likelihood of being the center of a stellate region. The radii are normalized to the highest
likelihood with a radius of 10 pixels.

{ydim, xdim} = Dimensions[mammo]; size = 20;

n = 2;
smammo = Take[mammo, {1 + size, ydim - size - 1}, {1 + size, xdim - size - 1}];
DisplayTogetherArray[{ListDensityPlot[smammo],
   ListDensityPlot[p1 = stellatedetection[mammo, 4, 20], Epilog →
     {Red, Circle @@@ Reverse /@ nMaxima[p1, n]}]}, ImageSize → 450];

Figure 19.7 Computer-aided diagnosis in mammography: detection of two artificial stellate


tumors in a noisy mammogram. Left: the input artificial mammogram (size 200x200 pixels)
with 2 stellate regions. Right: the detected cumulative response of an oriented second order
Gaussian derivative kernel, integrated over an area of 20x20 pixels.

Task 19.2 Experiment with different values for the following parameters:
- the scale σ of the differential operator;
- the signal to noise ratio of the input image;
- the area of the search around each pixel;
- the distance between two stellate lesions;

- the size of the stellate lesion;


- the weight over the search area as a function of distance to the central pixel.

Now for a digital mammogram:

mammo = Import["mammogram04.jpg"][[1, 1]];

{ydim, xdim} = Dimensions[mammo];
detected = stellatedetection[mammo, σ = 3, size = 50]; n = 3;
smammo = Take[mammo, {size + 10, -size - 11}, {size + 10, -size - 11}];
sdetected = Take[detected, {10, -10}, {10, -10}];
circles = {Red, Circle @@@ Reverse /@ nMaxima[sdetected, n]};
DisplayTogetherArray[ListDensityPlot[#, Epilog → circles] & /@
   {smammo, sdetected}, ImageSize → 480];

Figure 19.8 Left: Selection from a digital mammogram (size 403x291 pixels). Right: the
detected cumulative response of an oriented second order Gaussian derivative kernel at
σ = 3 pixels, integrated over an area of 50x50 pixels. The number of maxima to be reported
is 3.

Task 19.3 Experiment with other line detection kernels, such as a strongly
elliptical oriented Gaussian kernel of zeroth order.

Task 19.4 Find images with stellated tumors on the internet. See e.g.
marathon.csee.usf.edu/Mammography/Database.html.

Note that this section only introduces the notion of using orientation sensitive responses in a
geometric reasoning scheme, which might be part of a computer-aided diagnosis procedure.
The method described above should never be used in any diagnostic judgement.

19.7 Classical papers and student tasks


The paper by Freeman and Adelson [Freeman1991a] formed the basis of this section. The
references therein give a good overview of the literature on steerable filters. Other instructive
(and historical) papers are by Jan Koenderink in his classical paper on receptive field models
[Koenderink1988b], by Pietro Perona [Perona1991a, Perona1992, Perona1995], by
Wolfgang Beil [Beil1994], by Per-Erik Danielsson [Danielsson1990] and Eduard Simoncelli
and coworkers [Simoncelli1995, Simoncelli1996a, Farid1997a]. The construction of rotated
partial derivatives can also be set up with Lie group theory, finding the Lie 'infinitesimal
generators'. The papers by Michaelis and Sommer [Michaelis1995a, Michaelis1995b] and by
Teo and Hel-Or [Teo1998] give a fine introduction, many examples and references of this
powerful technique. One of the early theoretical studies of orientation tuning in the context
of the front-end visual system is by Daugman [Daugman1983, Daugman1985]. See for
an invertible orientation 'bundle' the paper by Kalitzin [Kalitzin1997a, Kalitzin1998a]. Multi-
scale orientation analysis, based on models inspired by the 'spokewheel' structure as
observed in the columns in the visual primary cortex, is a promising terrain for perceptual
grouping research.

Task 19.5 Find the expressions for the weighting functions for a mixed Gaussian
derivative kernel, e.g. ∂³G/∂x²∂y, for both bases.

Task 19.6 Pentland [Pentland1990] has suggested that shape-from-shading


analysis can be performed by a linear filtering operation in many situations, e.g.
when the reflectance function is approximately linear. Freeman and Adelson
[Freeman1991a] give suggestions how to implement this with steerable
functions. Make a Mathematica scale-space implementation for linear shape
from shading.

Task 19.7 Make a Mathematica implementation, based on the results developed
in this chapter, for 3D steerable filters. See again Freeman and Adelson
[Freeman1991a] for theoretical support.

Task 19.8 Trabecular bone (the sponge-like interior of most of our bones) has an
intricate multi-scale orientation structure. Extract from 2D and 3D datasets the
local orientation structure at multiple scales, and come up with sensible
definitions for the local structure of the bone. For inspiration, see
[TerHaarRomeny1996f, Niessen1997b, Lopez2000a].

19.8 Summary of this chapter


The local vectorfield specified by the gradient and its clockwise rotated perpendicular vector
form the first order orientation structure. This vectorfield is also specified by the
Eigenvectors of the local structure matrix.
The local vectorfield specified by the unit vectors of the principal curvature vectors in each
point form the second order orientation structure. This vectorfield is also specified by the
Eigenvectors of the local Hessian matrix.
Gaussian derivative kernels are steerable kernels. They can be constructed in any direction
(as directional derivative operators) in two ways: as a polynomial expressed in rotated
versions of the Gaussian derivative kernel itself, or as a polynomial combination of Cartesian
partial derivatives.
The geometric reasoning for the detection of structures can now be expanded to the inclusion
of responses to oriented structures, such as lines. An example is given for the detection of
stellate tumors in mammography.
20. Scale-time
"The dilemma is complete" -Jan Koenderink, [Koenderink1988a]
"It is later then you think" -Chinese proverb
"You are young and life is long and there is time to kill today" -Pink Floyd

20.1 Introduction
In the time domain we encounter sampled data just as in the spatial domain. E.g. a movie is a
series of frames, samples taken at regular intervals. In the spatial domain we needed an
integration over a spatial area to catch the information. Likewise, we need to have an
aperture in time integrating for some time to perform the measurement. This is the
integration time. Systems with a short resp. long integration time are said to have a fast resp.
slow response. Because of the necessity of this integration time, which needs to have a finite
duration (temporal width), a scale-space construct is a physical necessity again.

<< FrontEndVision`FEV`;
Show[Import["DaVinci watch 512x540.jpg"], ImageSize -> 160];

Figure 20.1 This watch, the Da Vinci Rattrapante (IWC Schaffhausen, Switzerland), named
after Leonardo da Vinci who made many inventions in the area of clocks, indicates time at
many different time scales, i.e. seconds, minutes, hours, weekdays, day of the month,
months, years and lunar cycle.

Furthermore, time and space are incommensurable dimensions (measurements along these
dimensions have different units), so we need a scale-space for space and a scale-space for
time.

Time measurements can essentially be processed in two ways: as pre-recorded frames or


instances, or real-time. Temporal measurements stored for later replay or analysis, on
whatever medium, fall in the first category. Humans perform continuously a temporal
analysis with their senses, they measure real-time and are part of the second category. The
scale-space treatment of these two categories will turn out to be essentially different.

Prerecorded sequences can be analyzed in a manner completely analogous with the spatial
treatment of scaled operators, we just interchange space with time. The notion of temporal
scale σt then naturally emerges, which is the temporal resolution: a device property when
we look at the recorded data (it is the inner scale of the data), and a free temporal scale
parameter when we do the multi-scale analysis.

In the real-time measurement and analysis of temporal data we have a serious problem: the
time axis is only a half axis: the past. There is a sharp and unavoidable boundary on the time
axis: the present moment. This means that we can no longer apply our standard Gaussian
kernels, because they have an (in theory) infinite extent in both directions. There is no way to
include the future in our kernel, it would be a strong violation of causality.

But there may be a way out when we derive from first principles a new kernel that fulfils the
constraint of causality: a kernel defined on a logarithmically remapped time axis. From this
new causal kernel we might again derive the temporal and spatio-temporal family of scaled
derivative operators. Jan Koenderink [Koenderink1988a] has presented the reasoning to
derive the theory, and we will discuss it in detail below.

We will now treat both the pre-recorded and real-time situation in more detail, and give the
operational details of the scale-space operators. The best source for reference is Jan
Koenderink's original contribution on scale-time [Koenderink1988a].

There have appeared some fine papers discussing the real-time causal scale-space in detail by
Luc Florack [Florack1997a, chapter 4.3] and Lindeberg, Fagerström and Bretzner
[Lindeberg1996b, Lindeberg1997a, Bretzner1996a, Bretzner1997a]. Lindeberg also
discusses the automatic selection of temporal scale [Lindeberg1997b].

20.2 Analysis of prerecorded time-sequences


Prerecorded temporal sample-sequences can be treated just as spatial sample-sequences. The
causality constraint is satisfied because we include the whole available axis in our analysis.
Boundary effects at both sides of the finite series in a practical situation are dealt with in a
proper manner: we choose a way to extend the data (recall the discussion in chapter 5). This
choice is essentially arbitrary, and the results can be predicted from the choice taken. The
most common choice, also the choice taken throughout this book, is the cyclic representation:
an infinite repetition of the data in all dimensions. Other feasible choices are mirroring,
extending with zeros, extending with the mean etc.

The Gaussian derivative kernels with respect to time form the complete family of the
temporal differential operators. They take the derivatives with respect to time, just as we
have seen for the spatial derivative kernels. The temporal scale σt is a free parameter, and
the set of results of the operator for a range of scales is called a temporal scale-space.

When we combine the class of spatial differential operators with the class of temporal
differential operators, we get the complete family of spatio-temporal operators. They take
simultaneously derivatives to space and time, and can be constructed to any order through
the construct of the familiar convolution with the Gaussian kernel.

For each dimension we can define a scale: the temporal scale σt, the spatial scales
{σx, σy, σz}. Typically, the spatial scales are identical (isotropy).

The Gaussian kernels can extend both into the past and the future in this case, because we
know both.

The full signal is available, the complete recording duration is the available time axis, in
analogy with the spatial domain of e.g. images. So the kernels extend over both the positive
(the 'future') and the negative time axis (the 'past'). There is no real notion of future or past
here, such as in the real-time case. The future resp. past here merely refers to data measured
later resp. earlier than the datapoint (moment) we are currently analyzing, but which we have
already in memory. Here are the graphs of some temporal derivative kernels of low order:

Block[{$DisplayFunction = Identity},
  pl = Table[Plot[Evaluate[D[gauss[t, σ = 1], {t, n}]],
      {t, -4, 4}, AxesLabel -> {"time", ""},
      PlotLabel -> "order: " <> ToString[n]], {n, 0, 3}]];
Show[GraphicsArray[pl], ImageSize -> 500];


Figure 20.2 Proper temporal scale-space kernels for the temporal domain for prerecorded
sequences. From left to right: Gaussian temporal derivatives for order 0 to 3. The 0th order
kernel is the appropriate sampling kernel without differentiation, the temporal 'point-operator'.

Mixed partial spatio-temporal operators are spatial Gaussian derivative kernels concatenated
with temporal Gaussian derivative kernels. This concatenation is a multiplication due to the
separability of the dimensions involved.

spike = Table[0, {128}, {128}]; spike[[64, 64]] = 10^8;


gx := gD[spike, 1, 0, 20]; lapl := gD[spike, 2, 0, 20] + gD[spike, 0, 2, 20];
gxt = Table[Evaluate[gx * D[gauss[t, 1.], t]], {t, -4, 4}]; maxgxt = Max[gxt];
laplt = Table[Evaluate[lapl * D[gauss[t, 1.], t]], {t, -4, 4}];
maxlaplt = Max[laplt];
Block[{$DisplayFunction = Identity},
 p1 = ListDensityPlot[#, PlotRange -> {-maxgxt, maxgxt}] & /@ gxt;
 p2 = ListDensityPlot[#, PlotRange -> {-maxlaplt, maxlaplt}] & /@ laplt];

Show[GraphicsArray[{p1, p2}], ImageSize -> 400];


Plot[Evaluate[D[gauss[t, 1.], t]], {t, -4, 4},
 AxesLabel -> {"time →", ""}, AspectRatio -> .1, ImageSize -> 430];

Figure 20.3 Spatio-temporal Gaussian derivative kernels. Top row: ∂²G/∂x∂t, the first order
derivative in x and t; second row: ∂/∂t (∂²/∂x² + ∂²/∂y²) G, the first order time derivative of the
spatial Laplacian operator. Bottom plot: first order Gaussian temporal derivative operator,
showing the temporal modulation of the spatial kernels. In the second half of the time domain
the spatial kernels are reversed in polarity. The horizontal axis is the time axis.

So the appearance of a spatio-temporal operator is that of a spatial operator changing over time,
with a speed indicated ('tuned') by the temporal scale parameter. We illustrate this with an
example in 2D-t. The mathematical expression for the mixed partial derivative operator, first
order in the x-direction and time, is ∂²/∂x∂t. This translates in scale-space theory into a
convolution with a concatenation of the Gaussian derivative kernels ∂G/∂t and ∂G/∂x, leading to
a convolution operator ∂²G/∂x∂t due to the separability of the kernels.

The following commands generate the sequence as an animation (electronic version only).
Doubleclick the image to start the animation. Controls appear in the bottom of the notebook
window.

spike = Table[0, {64}, {64}]; spike[[32, 32]] = 10^8;


gx := gD[spike, 1, 0, 10];
gxt = Table[Evaluate[gx * D[gauss[t, 1.], t]], {t, -4, 4, .5}];
maxgxt = Max[gxt];
p1 = ListDensityPlot[#, PlotRange -> {-maxgxt, maxgxt}, ImageSize -> 100] & /@
  gxt;

Figure 20.4 Animated sequence of the spatio-temporal Gaussian derivative kernel ∂²G/∂x∂t. In
the second half of the time domain the spatial kernels are reversed in polarity. Similar
sequence as the top row in figure 20.3.

Note that these temporal derivatives are point operators: they merely measure the change of
the parameter (e.g. luminance) over time at the operational point, so per pixel. This is
essentially different from the detection of motion, the subject of chapter 17, where relations
between neighboring pixels are established to measure the optic flow.

20.3 Causal time-scale is logarithmic


For real-time systems the situation is completely different. We noted in the introduction that
we can only deal with the past, i.e. we only have the half time-axis. This is incompatible with
the infinite extent of the Gaussian kernel to both sides.

In Koenderink's words: "Because the diffusion spreads influences with infinite speed any
blurring will immediately spread into the remote future thereby violating the principle of
temporal causality. It is clear that the scale-space method can only lead to acceptable results
over the complete axis, but never over a mere semi-axis. On the other hand the diffusion
equation is the unique solution that respects causality in the resolution domain. Thus there
can be no hope of finding an alternative. The dilemma is complete" ([Koenderink1988a]).

The solution, proposed by Koenderink, is to remap (reparametrize) the half t-axis into a full
axis. The question is then how this should be done. We follow here Koenderink's original
reasoning to come to the mapping function, and to derive the Gaussian derivative kernels on
the new time axis.

We call the remapping s(t). We define t0 as the present moment, which can never be reached,
for as soon as we try to measure it, it is already further in time. It is our reference point, our
only point in time absolutely defined, our fiducial moment. Every real-time measurement is
relative to this point in time. Then s should be a function of μ = t0 - t, so s(μ) = s(t0 - t).
We choose the parameter μ to be dimensionless, and μ = 0 for the present moment, and
μ = ∞ for the infinite past. So we get s(μ) = s((t0 - t)/τ). The parameter τ is some time
constant and is essentially arbitrary. It is the scale of our measurement, and we should be
able to give it any value, so we want the diffusion to be scale-invariant on the μ-domain.

We also want shift invariance on this time axis, and the application of different clocks, so we
require that a transformation t' = a t + b leaves s(t) invariant. μ is invariant if we change
clocks.

Plot[0, {t, -10, 0},
 Ticks -> {{{-6, "μ2"}, {-2, "μ1"}}, None}, PlotRange -> {-1, 5},
 AspectRatio -> .1, Epilog -> {Text["← time", {-10, .5}],
   Text["0 (= present)", {.6, .5}]}, ImageSize -> 440];


Figure 20.5 The time-axis has only the negative half. The right end is the present moment.
The moments in the past μ1 and μ2 are observed with a resolution that is proportional with
their 'past time', i.e. μ.

On our new time-axis s(t) the diffusion should be a normal, causal diffusion. On every point
of the s-axis we have the same amount of diffusion, i.e. the diffusion is homogeneous on the
s-domain.
The 'inner scale' or resolution of our measurement has to become smaller and smaller when
we want to approach the present moment. But even if we use femtosecond measuring

devices, we will never catch the present moment. On the other side of the s-axis, a long time
ago, we don't want that high resolution. An event some centuries ago is placed with a
resolution of say a year, and the moment that the dinosaurs disappeared from earth, say some
65 million years ago, is referred to with an accuracy of a million years or so.

This intuitive reasoning is an expression of the requirement that we want our time-resolution
τ on the s-axis to be proportional to μ, i.e. τ ∝ μ or τ/μ = constant. So for small μ we have a
small resolution, for large μ a large one.

t0 = 1; τ = 1;
inset = LogLinearPlot[0, {x, 0, 1}, GridLines -> Automatic, Frame -> True,
  FrameTicks -> False, DisplayFunction -> Identity, AspectRatio -> .25];
Plot[-Log[(t0 - t)/τ], {t, -1, t0}, PlotRange -> {-1.5, 4},
 AxesLabel -> {"t", "s"},
 Epilog -> {Rectangle[{-1, -Log[t0 + 1]}, {.75, -Log[t0 - .75]}, inset],
   Dashing[{.01, .01}], Line[{{1, -1.5}, {1, 4}}]}, ImageSize -> 300];


Figure 20.6 The logarithmic mapping of the horizontal t-time half-axis onto the vertical s-time
full axis. The present moment t0 (at t = 1 in this example, indicated by the vertical dashed
line) can never be reached. The s-axis is now a full axis, and fully available for diffusion. The
rectangular inset box with gridlines shows a typical working area for a real-time system. The
response time delimits the area at the right, the lifetime (history) at the left. Figure adapted
from [Florack1997a].

On the s-axis we should have the possibility for normal, causal diffusion. This means that the
'magnification' |ds/dμ| should be proportional to 1/μ. Then the s-axis is 'stretched' for every μ
in such a way that the scale (or 'diffusion length' as Koenderink calls it) in the s-domain is a
constant relative diffusion length in the μ-domain.

Uniform sampling in the s-domain gives a graded resolution history in the t- or μ-domain.

In formula: |ds/dμ| = α/μ, or |μ ds/dμ| = α. From this partial differential equation we derive
that the mapping s(μ) must be logarithmic:

DSolve[∂μ s[μ] == α/μ, s[μ], μ]

{{s[μ] → C[1] + α Log[μ]}}

So our mapping for s is now: s = α ln((t0 - t)/τ) + constant. The constant is an arbitrary
translation, for which we defined s to be invariant, so we choose this constant to be zero. We
choose the arbitrary scaling parameter α to be unity, so we get

s = ln((t0 - t)/τ).

This is a fundamental result. For a causal interpretation of the time axis we need to sample
time in a logarithmic fashion. It means that the present moment is mapped to infinity, which
conforms to our notion that we can never reach it. We can now freely diffuse on the s-axis,
as we have a well defined scale at all moments on our transformed time axis. This mapping
also conforms to our notion of resolution of events in the past: our memory seems to do the
same weighting of the aperture over the past as the transformation we introduced above.
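To make the effect of the mapping concrete, the following sketch (Python, with arbitrarily chosen sampling parameters, not part of the book's code) shows that uniform sampling on the s-axis corresponds to sampling the past at instants whose spacing grows in proportion to how far in the past they lie:

```python
import numpy as np

t0, tau = 0.0, 1.0

s = np.linspace(-2.0, 3.0, 6)    # uniform samples on the remapped s-axis
t = t0 - tau * np.exp(s)         # invert s = ln((t0 - t)/tau) back to t

mu = t0 - t                      # 'past time' of each sample
spacing = np.diff(mu)            # gaps between successive sampling instants

# The gap between neighbours grows proportionally with mu itself:
# spacing / mu is the same constant (e - 1 for unit s-spacing) everywhere,
# i.e. the past is sampled geometrically, not uniformly.
print(spacing / mu[:-1])
```

This is exactly the graded resolution history described above: fine near the present, coarse in the remote past.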

In the s-domain we can now run the diffusion equation without violation of temporal
causality. The diffusion equation is now ∂²L/∂s² = ∂L/∂vs, where vs = ½ σs² is the temporal
variance. We recall from the diffusion equation in the spatial domain that the Laplacian (of
the luminance in our case) along the diffusion axis is equal to the rate of change (of the
luminance) with the variance of the scale. The check of dimensions on both sides of the
equation is always a good help. The temporal blurring kernels (the scale-space measurement
apertures) are now given by

K(s, s'; σs) = 1/(√(2π) σs) e^(-(s-s')²/(2σs²)), or K(s, s'; vs) = 1/(2√(π vs)) e^(-(s-s')²/(4vs))
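As a consistency check on the remapping (a numerical sketch in Python, not from the book), one can verify that the kernel expressed in the original time variable integrates to unity once the Jacobian |ds/dt| = 1/(t0 - t) of the logarithmic transform is included; the substitution s = ln((t0 - t)/τ) turns the integral into that of an ordinary normal density on the full s-axis:

```python
import numpy as np

t0, tau, sigma_s = 0.0, 1.0, 0.7

# Integrate over the past time x = t0 - t on a log-spaced grid
x = np.logspace(-6, np.log10(200.0), 200_001)
kernel = np.exp(-np.log(x / tau) ** 2 / (2 * sigma_s ** 2)) \
         / (np.sqrt(2 * np.pi) * sigma_s)
integrand = kernel / x          # kernel times the Jacobian |ds/dt| = 1/(t0 - t)

# Trapezoid rule; the result is 1 up to discretization error
total = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))
print(total)
```

The log-spaced grid mirrors the geometric sampling of the past that the s-axis induces.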

20.4 Other derivations of logarithmic scale-time


Florack [Florack1997a] came to the same result from a different perspective, from abstract
mathematics. He used a method from group theory. Essentially, a group is a mathematical
set of operations, which (in popular words) all do the same operation, but with a different
parameter. Examples of a transformation group are the group of rotations, the group of
additions, the group of translations etc. A group is formally defined as a set of similar
transformations, with a member that does the unity operation (projects on itself, i.e. does
nothing, e.g. rotation over zero degrees, an enlargement of 1, a translation of zero etc.), it
must have an inverse (e.g. rotation clockwise, but also anti-clockwise) and one must be able
to concatenate its members (e.g. a total rotation which consists of two separate rotations after
each other).

Florack studied the group properties of whole and half axes of real numbers. The group of
summations is a group on the whole axis, which includes the positive and the negative
numbers. This group however is not a group on the half axis. For we might be able to do a
summation which has a result outside the allowed domain. The group of multiplications
however is a group on the positive half axis.

Two numbers multiplied from the half axis give a result on the same half axis. If we could
make all sums into multiplications, we would have an operation that makes it a group again.
The formal transformation from sums into multiplications is the logarithmic function:
e^(a+b) = e^a · e^b and its inverse ln(a · b) = ln(a) + ln(b). The zero element is addition of zero,

or multiplication with one. So the result is the same logarithmic function as the function of
choice for the causal parametrization of the half axis.

Lindeberg and Fagerström [Lindeberg1996b] derived the causal temporal differential


operator from the non-creation of local extrema (zero-crossings) with increasing scale.
Jaynes' method of 'transformation groups' to construct prior probabilities has a strong
similarity to the reasoning in this chapter. For details see [Jaynes1968].

Interestingly, we encounter more often a logarithmic parametrization of a half axis when the
physics of observations is involved:

- Light intensities are only defined for positive values, and form a half axis. It is well known
e.g. that the eye performs a logarithmic transformation on the intensity measured on the retina.
- Sound intensities are measured in decibels (dB), i.e. on a logarithmic scale.
- Scale is only defined for positive values, and forms a half axis (scale-space). The natural
scale step τ on the scale-axis in scale-space is the logarithm of the diffusion scale σ:
τ = ln(σ) - ln(σ0) (recall chapter 1).

Another example of the causal logarithmic mapping of the time axis is the striking Law of
Benford, which says that in a physical measurement that involves a duration, the occurrence
of the first digit has a logarithmic distribution. I.e. a "1" as the first digit occurs roughly 6.5
times more often than a "9"! This law follows immediately from the assumption that random
intervals are uniformly distributed on the logarithmic s-axis [Florack1997a, pp. 112].
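The connection is easy to verify numerically. In the sketch below (Python, illustrative, not from the book), intervals whose logarithms are uniformly distributed produce first digits with the Benford frequencies P(d) = log10(1 + 1/d), under which a leading "1" is log10(2)/log10(10/9) ≈ 6.6 times as likely as a leading "9":

```python
import numpy as np

rng = np.random.default_rng(0)

# Durations whose logarithms are uniformly distributed over six decades
durations = 10.0 ** rng.uniform(0.0, 6.0, 500_000)

# Extract the first significant digit of every duration
mantissa = durations / 10.0 ** np.floor(np.log10(durations))
first_digit = mantissa.astype(int)

observed = np.bincount(first_digit, minlength=10)[1:10] / len(durations)
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))

print(np.round(observed, 3))
print(np.round(benford, 3))        # 0.301, 0.176, ..., 0.046
print(observed[0] / observed[8])   # a "1" roughly 6.5 times as likely as a "9"
```

The same experiment applied to the isotope decay times of Task 20.1 should reproduce these frequencies.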

Task 20.1 The decay times of a large set of radioactive isotopes is available on
internet: isotopes.lbl.gov and ie.lbl.gov/tori.html. Show that the first digits of
these decay times indeed show Benford's Law, with a much more pronounced
occurrence of "1"s. [Buck1993].

Our perception of time also seems to be non-linear. We have more trouble accurately
remembering events that took place a long time ago, as in our childhood, than events that just
recently happened. Why has last year passed by so quickly, and did a year at high school
seem to last so much longer? Our perception is that time seems to go quicker when we get
older. We have an increasingly longer reference to the past with which we can compare, and
our notion is that time seemed to go slower a longer time ago. These observations suggest
some logarithmic scaling: stretched out at the low end and compressed at the high end. It is
known in psychological literature that our perceived time 'units' may be related to our age:
10% of age for a 5-year old is half a year, for a 50-year old it is 5 years. A half year seems to
pass just as quickly for the 5-year old as 5 years for the 50-year old. This also leads to a
logarithmic notion of time. See also [Gibbon1981] and [Pöppel1978a].

20.5 Real-time receptive fields


We now have all the information to study the shape of the causal temporal derivative
operators. The kernel in the transformed s-domain was given above. The kernel in the
original temporal domain t becomes

K(t, t0; τ) = 1/(√(2π) τ) e^(-ln²((t0-t)/τ)/(2τ²))

In figure 20.7 we see that the Gaussian kernel and its temporal derivatives are skewed, due to
the logarithmic time axis remapping. It is clear that the present moment t0 can never be
reached.

Clear[gt]; Block[{$DisplayFunction = Identity},
 pl = Table[τ = .2; t0 = 0; Plot[Evaluate[gt[n] = D[
       1/(Sqrt[2 π] τ) Exp[-Log[(t0 - t)/τ]^2/(2 τ^2)], {t, n}]],
    {t, -.4, 0}, PlotRange -> All,
    PlotLabel -> "temporal order = " <> ToString[n],
    Epilog -> Text["time →", {-.45, 0}]], {n, 0, 2}]];
Show[GraphicsArray[pl], ImageSize -> 400];


Figure 20.7 Left, middle, right: the zeroth, first and second Gaussian temporal derivative
operator in causal time. The timescale in each plot runs from the past on the left, to the
present on the right. The temporal scale τ = 200 ms, the right boundary is the present, t0 = 0.
Note the pronounced skewness of the kernels.

The zerocrossing of the first order derivative (and thus the peak of the zeroth order kernel) is
just at t = -τ. The extrema of the first derivative kernel are found when we set the second
derivative of the time kernel to zero. Here are both solutions:

Clear[t, τ, timekernel]; t0 = 0; Off[InverseFunction::ifun, Solve::ifun];

timekernel[t_] = 1/(Sqrt[2 π] τ) Exp[-Log[(t0 - t)/τ]^2/(2 τ^2)];
zerot = Solve[∂t timekernel[t] == 0, t]
zerott = Solve[∂t,t timekernel[t] == 0, t]

{{t → -τ}}

{{t → -τ E^((-τ² - τ Sqrt[4 + τ²])/2)}, {t → -τ E^((-τ² + τ Sqrt[4 + τ²])/2)}}
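The symbolic result for the mode can also be cross-checked numerically. The sketch below (Python, with arbitrary illustrative parameters) evaluates the zeroth order causal kernel on a dense grid and confirms that its peak lies one time constant in the past, at t = -τ:

```python
import numpy as np

tau, t0 = 0.2, 0.0

t = np.linspace(-2.0, -1e-6, 2_000_001)   # the past half-axis, densely sampled
kernel = np.exp(-np.log((t0 - t) / tau) ** 2 / (2 * tau ** 2))

t_peak = t[np.argmax(kernel)]
print(t_peak)    # ≈ -0.2 = -tau
```

The normalization constant is irrelevant here; only the location of the maximum matters.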

It is now easy to combine the spatial and temporal kernels. Time and space are separable,
incommensurable dimensions, so we may apply the operators in any order: first a spatial
kernel and then a temporal kernel is the same as first a temporal and then a spatial kernel.
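This interchangeability is a direct consequence of separability, and can be illustrated numerically (a Python sketch using cyclic FFT convolutions with arbitrary kernel widths, not from the book):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(64, 64))   # toy 2D-t block: axis 0 = space, axis 1 = time

def conv_axis(a, kernel, axis):
    """Cyclic convolution of a 1D kernel along one axis of a 2D array."""
    k = np.fft.fft(kernel, a.shape[axis])
    spec = np.fft.fft(a, axis=axis)
    spec *= k if axis == 1 else k[:, None]
    return np.real(np.fft.ifft(spec, axis=axis))

x = np.arange(-32, 32)
g_space = np.exp(-x**2 / (2 * 3.0**2)); g_space /= g_space.sum()
g_time  = np.exp(-x**2 / (2 * 5.0**2)); g_time  /= g_time.sum()

space_then_time = conv_axis(conv_axis(data, g_space, 0), g_time, 1)
time_then_space = conv_axis(conv_axis(data, g_time, 1), g_space, 0)

print(np.max(np.abs(space_then_time - time_then_space)))  # essentially zero
```

Both orders yield the same result to machine precision, which is why the concatenated kernel can be written as a single product kernel.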

The most logical choice is the simultaneous action. When we make a physical realization of
such an operator, we have to plot it in the spatial and time domain. An example is given
below for a 2D first order spatial derivative, and first order temporal derivative. We get a 2D-
time sequence:

spike = Table[0, {128}, {128}]; spike[[64, 64]] = 10^8;

τ = .3; gt = ∂t timekernel[-t]; gx = gD[spike, 1, 0, 20];
max = Max[gx ∂t timekernel[t] /. First[zerott]];
p1 = Table[ListDensityPlot[gt gx, PlotRange -> {-max, max},
   DisplayFunction -> Identity], {t, .1, .6, .05}];
Show[GraphicsArray[p1], ImageSize -> 500];

Figure 20.8 Eleven frames of a time series of a 2D-time spatio-temporal time-causal


Gaussian derivative kernel, which takes the first order derivative to time and the first order to
space in the x-dimension. The present moment is on the left. Note the inversion that takes
place in the spatial pattern during the time course.

This has been measured in cortical simple cells by single cell recordings (see figure 20.9).
See for the methodology to map receptive field sensitivity profiles chap. 9 and figure 11.12.

Show[GraphicsArray[Partition[
   Import /@ ("separablext " <> ToString[#] <> ".gif" & /@ Range[60]),
   10]], ImageSize -> 500];

Figure 20.9 Time sequence of a cortical simple cell. Frames row-wise, upper left frame is
first. Time domain: 0 - 300 msec in 5 msec steps. Data of the cell: on-center X cell, non-
lagged; X-Y domain size: 3 x 3 degs; Bar size: 0.5 x 0.5 degs; Eye: left; Total time for
receptive field measurement: 19 min. Number of repetitions: 50; Duration of stimuli: 26.3 ms.
From: neurovision.berkeley.edu/Demonstrations/VSOC/teaching/RF/LGN.html.

20.6 A scale-space model for time-causal receptive fields


The temporal behaviour of cortical receptive fields has been measured extensively. The
method most often used is the method of 'reverse correlation' (explained in section 9.6).

Figure 20.9 shows a recording of a cortical simple cell from Freeman's lab in Berkeley, which
clearly shows the spatial similarity to the first Gaussian derivative kernel, and the modulated
amplitude over time. The spatio-temporal relations become more clear when they are plotted
in the spatio-temporal domain. Figure 20.10 shows a plot where the horizontal axis displays
the spatial modulation, the vertical axis displays the temporal modulation.

Show[Import["Simple cell V1 XT.gif"],
 Frame -> True, FrameLabel -> {"space →", "time →"},
 FrameTicks -> None, ImageSize -> 100];

Figure 20.10 Spatio-temporal response of a cortical simple cell. From [DeAngelis1995a]. The
polarity of the spatial response reverses sign during the time course of the response of the
cell. Horizontal axis is space (6 degs), vertical axis is time (0-300 msec). Time zero is at the
bottom. Cell data: X-Y domain size: 6 x 6 degs; X-Y grid: 20 x 20 points; time domain: 0 -
300 msec in 5 msec steps; orientation: 15 degs; bar size: 2.5 x 0.5 degs; duration of
individual stimuli: 52.8 msec (4 frames). From [neurovision.berkeley.edu]. Copyright Izumi
Ohzawa, UC Berkeley.

Here is another cell measured by Dario Ringach and coworkers [Ringach1997]. Data from
manuelita.psych.ucla.edu/~dario/research.htm:

Show[GraphicsArray[Partition[
   Import /@ ("Ringach V1 " <> ToString[#] <> ".gif" & /@ Range[12]),
   6]], ImageSize -> 500];

Figure 20.11 Timescale 35 ms (first frame) to 90 ms (last frame) in 5 ms intervals. Upper left
frame is the start of the sequence.

Recent more precise measurements of the spatio-temporal properties of macaque monkey


and cat LGN and cortical receptive fields give support for the scale-time theory for causal
time sampling. De Valois, Cottaris, Mahon, Elfar and Wilson [DeValois2000] applied the
method of reverse correlation and multiple receptive field mapping stimuli (m-sequence,

maximum length white noise stimuli) to map numerous receptive fields with high spatial and
temporal resolution. Fig 20.12 shows some resulting receptive field maps:

Show[GraphicsArray[Import /@
    ("Valois RF " <> ToString[#] <> ".jpg" & /@ {a, b, c, d, e, f})],
 ImageSize -> 500];

Figure 20.12 Examples of spatio-temporal receptive field maps of a sample of V1 simple cells
of macaque monkey. Vertical axis in each plot: time axis from 0 ms (bottom) to 200 ms (top).
Horizontal axis per plot: space (in degrees), a: 0-0.9, b: 0-1.2, c,d: 0-0.6, e: 0-1.9, f: 0-1.6
degrees. Note the clearly skewed sensitivity profiles in the time direction. Every 'island' has
opposite polarity to its neighboring 'island' in each plot. Due to black and white reproduction
the sign of the response could not be reproduced. The scale-space models for the plots are
respectively: a: ∂²L/∂x∂t, b: ∂²L/∂x∂t, c: ∂L/∂t, d: ∂L/∂t, e: ∂³L/∂x²∂t, f: ∂²L/∂x∂t. Adapted from
[DeValois2000].

If we plot the predicted sensitivity profiles according to Gaussian scale-space theory we get
remarkably similar results. In figure 20.13 the space-time plots are shown for zeroth to second
spatial and temporal differential order. Note the skewness in the temporal direction.

Important support for especially the Gaussian scale-time derivative model comes from
another observation by De Valois et al. [DeValois2000]. They state: 'Note that the response
time course of these two non-directional cell populations are approximately 90 degrees
shifted in phase relative to each other, that is, they are on average close to temporal
quadrature.

That is, the population of biphasic cells is shifting over from one phase to the reverse at the
time that the monophasic cell population reaches its peak response' (a quadrature filter can be
defined, independently of the dimensionality of the signal space, as a filter that is zero over
one half of the Fourier space. In the spatial domain, the filter is complex: an even real part
and an odd imaginary part).

Clear[gt, gs, n]; τ = 0.3; t0 = 0; σ = 2;

gt[n_] = D[1/(Sqrt[2 π] τ) Exp[-Log[(t0 - t)/τ]^2/(2 τ^2)], {t, n}];

gs[n_] = D[1/(Sqrt[2 π] σ) Exp[-x^2/(2 σ^2)], {x, n}];

Block[{$DisplayFunction = Identity},
 p = Table[ContourPlot[Evaluate[gt[i] gs[j]],
    {x, -15, 15}, {t, .01, .8}, PlotPoints -> 30,
    ContourShading -> True, FrameLabel -> {"space", "time"},
    PlotLabel -> "nspace=" <> ToString[j] <> ",ntime=" <> ToString[i]],
   {j, 0, 2}, {i, 0, 2}]];
Show[GraphicsArray[p], ImageSize -> 400];

Figure 20.13 Model for time-causal spatio-temporal receptive field sensitivity profiles from
Gaussian scale-space theory. All combinations from zeroth to second order partial derivative
operators with respect to space and time are shown. Vertical axis: time. Horizontal axis:
space.

This fits very well with the model: the receptive fields are the zeroth and first order temporal
Gaussian derivative, and for these functions the zerocrossing of the first order derivative
coincides with the maximum of the zeroth order.

Show[GraphicsArray[Import /@ {"Valois time to peak01.jpg",
    "Valois time to peak02.jpg"}], ImageSize -> 360];


Figure 20.14 Distribution of time-to-peak responses, calculated from the spatio-temporal


receptive field mappings of the population of monophasic cells, and of both the initial and
later reversed phase of the biphasic cells. Note that the time of peak response for the
monophasic cells almost exactly coincides with the time of polarity reversal of the biphasic
cells, in perfect agreement with the Gaussian temporal derivative model. Horizontal axis:
time-to-peak response time, in ms. Vertical axis: number of cells. Right figure: Time-to-peak
response of a random sample of just 5 cells, showing that even a small sample set has the
properties of the larger collection. Adapted from [DeValois2000].

De Valois et al. define the biphasic index to be the ratio of the amplitude of the 2nd temporal
peak and the amplitude of the first peak of the temporal response function. The maximum
and minimum value of the first order temporal derivative of the kernel (∂t temprf) are
where the second order temporal derivative (∂t,t temprf) is zero. We simplify the
expression knowing that τ > 0, and get two values, one for the minimum amplitude and one
for the maximum amplitude:

Clear[τ, t0]; temprf = 1/(Sqrt[2 π] τ) Exp[-Log[t/t0]^2/(2 τ^2)];
peaks = Simplify[∂t temprf /. Solve[∂t,t temprf == 0, t], τ > 0]

{((τ + Sqrt[4 + τ²]) E^((τ² + τ Sqrt[4 + τ²])/2 - (τ + Sqrt[4 + τ²])²/8))/(2 Sqrt[2 π] t0 τ²),
 ((τ - Sqrt[4 + τ²]) E^((τ² - τ Sqrt[4 + τ²])/2 - (τ - Sqrt[4 + τ²])²/8))/(2 Sqrt[2 π] t0 τ²)}

The biphasic index is no longer a function of t0, and only depends on τ:

biphasicindex = peaks[[2]] / peaks[[1]] // Simplify
So we may expect a range of values for the biphasic index. De Valois et al. found values for
the biphasic index having a distinct peak between 0.4 and 0.8, which corresponds well with a
temporal scale range of 100 - 500 milliseconds (see graph in figure 20.15).
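The biphasic index can also be cross-checked without any symbolic algebra. The sketch below (Python; a purely numerical differentiation of the causal kernel, with illustrative grid parameters, not from the book) finds the two extrema of the first temporal derivative and takes the ratio of their magnitudes:

```python
import numpy as np

def biphasic_index(tau):
    """Ratio of the smaller to the larger extremal amplitude of the first
    temporal derivative of the causal (log-remapped) Gaussian kernel."""
    t = np.linspace(-5.0, -1e-6, 2_000_000)
    kernel = np.exp(-np.log(-t / tau) ** 2 / (2 * tau ** 2))
    d1 = np.gradient(kernel, t)
    a, b = abs(d1.max()), abs(d1.min())
    return min(a, b) / max(a, b)

# The index decreases with the temporal scale; it is ≈ 0.67 at tau = 0.2
for tau in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(tau, round(biphasic_index(tau), 3))
```

For τ roughly between 0.1 and 0.5 the numerically obtained index falls in the 0.4 - 0.8 band reported by De Valois et al.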

Plot[biphasicindex, {τ, .1, .5}, ImageSize -> 140,
 AxesLabel -> {"timescale\nτ", "Biphasic\nindex"}];
Figure 20.15 Predicted biphasic index according to the Gaussian time-causal temporal
differential operator model. The biphasic index is defined [DeValois2000] as the ratio
between the amplitude magnitudes of the first and second peak in the response in the time
domain.

Task 20.2 Give a measure for the skewness of the time-causal Gaussian kernel.

20.7 Conclusion
The causal time-scale, multi-scale temporal differential operator model from Gaussian scale-
space theory has not yet been tested against the wealth of currently available receptive field
measurement data.

It may be an interesting experiment, to test the quantitative similarity, and to find the
statistics of the applied spatial and temporal scales, as well as the distribution of the
differential order.

The Gaussian scale-space model is especially attractive because of its robust physical
underpinning by the principle of temporal causality, leading to the natural notion of the
logarithmic mapping of the time axis in a real-time measurement.

The distributions of the locations of the different scales and the differential orders have not
yet been mapped on the detailed cortical orientation column with the pinwheel structure.
Orientation has been clearly charted due to spectacular developments in optical dye high
resolution recording techniques in awake animals. Many interesting questions come up: Is
the scale of the operator mapped along the spokes of the pinwheel? Is the central singularity
in the repetitive pinwheel structure the largest scale? Is differential order coded in depth in
the columns?

These are all new questions arising from a new model. The answer to these questions can be
expected within a reasonable time, given the fast developments, both in high resolution
recording techniques, and the increase in resolution of non-invasive mapping techniques such
as high-field functional magnetic resonance imaging (fMRI) [Logothetis1999].

20.8 Summary of this chapter


When a time sequence of data is available in stored form, we can apply the regular
symmetric Gaussian derivative kernels as causal multi-scale differential operators for
temporal analysis, in complete analogy with the spatial case. When the measurement and
analysis is real-time, we need a reparametrization of the time axis in a logarithmic fashion.
The resulting kernels are skewed towards the past. The present can never be reached; the new
logarithmic axis guarantees full causality. The derivation is performed from the first principle
of a scale of observation on the new time axis that is proportional to how long ago the event
happened. This seems to fit well with the intuitive perception of time by humans.

Recent physiological measurements of LGN cell receptive fields and cortical V1 simple cell
receptive fields reveal that the biological system seems to employ the temporal and
spatiotemporal differential operators. Especially striking is the skewness in the temporal
domain, giving strong support for the working of the biological cells as time-causal temporal
differential operators.
21. Geometry-driven diffusion
To teach is to learn twice. (Joseph Joubert, 1754-1824)

21.1 Adaptive Smoothing and Image Evolution


So far we calculated edges and other differential invariants at a range of scales. The task
determined whether to select a fine or a coarse scale. The advantage of selecting a larger
scale was the improved reduction of noise, and the appearance of more prominent structure,
but the price to pay for this is reduced localization accuracy. Linear, isotropic diffusion
cannot preserve the position of the differential invariant features over scale.
A solution is to make the diffusion, i.e. the amount of blurring, locally adaptive to the
structure of the image. E.g. in order to preserve edges while reducing the noise by area-
averaging such as blurring, one may try to prevent blurring at the location of the edges, but
do a good noise-reducing job at pixels (voxels) in a homogeneous area, i.e. where there are
no edges. E.g. the well-known Sobel and Prewitt edge operators do an averaging in the
direction perpendicular to their differential operation (as opposed to the Roberts operator).
This adaptive filtering process is possible by three classes of (all nonlinear) mathematical
approaches, which are in essence equivalent:

1. Nonlinear partial differential equations (PDE's), i.e. nonlinear diffusion equations which
evolve the luminance function as some function of a flow. This general approach is known as
the 'nonlinear PDE approach';
2. Curve evolution of the isophotes (curves in 2D, surfaces in 3D) in the image. This is
known as the 'curve evolution approach'.
3. Variational methods that minimize some energy functional on the image. This is known as
the 'energy minimization approach' or 'variational approach'.

The word 'nonlinear' implies the inclusion of a nonlinearity in the algorithm.


This can be done in an infinite variety, and it takes geometric reasoning to come up with the
right nonlinearity for the task.

This explains the commonly used term 'geometry-driven diffusion' for this field. Otherwise
stated, this gives us a tool to include knowledge in the way we want to modify the images by
some evolutionary process. We can include knowledge about a preferred direction of
diffusion, or that we like the diffusion to be reduced at edges or at points of high curvature in
order to preserve edges and corners, etc. As we study the evolution of images over time, we
also call this field evolutionary computing of image structure, or the application of
evolutionary operations.
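To make this concrete before the formal treatment, here is a minimal numerical sketch (in Python, not from the book) of one classic nonlinearity of this kind: a Perona-Malik-type conductivity c(g) = 1/(1 + (g/k)²) that shuts the diffusion down across strong edges while smoothing homogeneous regions. The parameters k, dt and the step count are arbitrary illustrative choices:

```python
import numpy as np

def nonlinear_diffusion(image, k=0.2, dt=0.2, steps=20):
    """Explicit scheme for dL/dt = div(c(|grad L|) grad L) with
    Perona-Malik-type conductivity c(g) = 1/(1 + (g/k)^2)."""
    L = image.astype(float).copy()
    c = lambda g: 1.0 / (1.0 + (g / k) ** 2)
    for _ in range(steps):
        # Differences to the four neighbours (cyclic borders via np.roll)
        n = np.roll(L, -1, 0) - L; s = np.roll(L, 1, 0) - L
        e = np.roll(L, -1, 1) - L; w = np.roll(L, 1, 1) - L
        L += dt * (c(np.abs(n)) * n + c(np.abs(s)) * s
                   + c(np.abs(e)) * e + c(np.abs(w)) * w)
    return L

# A noisy step edge: the flow smooths the flat regions, while the small
# conductivity across the step keeps the edge location and contrast.
rng = np.random.default_rng(2)
step = np.zeros((64, 64)); step[:, 32:] = 1.0
noisy = step + 0.05 * rng.normal(size=step.shape)
smooth = nonlinear_diffusion(noisy)

print(np.std(noisy - step), "->", np.std(smooth - step))  # noise is reduced
```

With c ≡ 1 the same scheme reduces to ordinary linear diffusion, which would smear the edge; the gradient-dependent conductivity is exactly the kind of geometric knowledge this chapter is about.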

The applicability of these mathematically challenging and nonrigid (but 'only requiring a lot
of clever well-understood reasoning') approaches to image processing sparked the attention
of both pure and applied mathematicians. In each of the general approaches mentioned below
362 21.1 Adaptive Smoothing and Image Evolution

a wealth of methods have been proposed. From 1993-1996 a consortium of laboratories in the US and Europe (sponsored by the US National Science Foundation and the European Community) was active in this field. The result of their work is a rather complete recording of this field at the time [ter Haar Romeny 1994f].

This chapter is an introduction to this rapidly expanding field. Excellent reviews have appeared for each of the above general approaches. A good starting point for further introduction into nonlinear diffusion equations are the review papers by Weickert [Weickert 1997d, 1998a, 1999b]. Curve evolution was given much momentum by the important discovery by Osher and Sethian that the process of curve evolution became much more stable and flexible when the curve was considered the zero level set of some characteristic function. The nonlinear evolution is done on the characteristic function.

Many, if not most, aspects of this level set approach are treated in the book by Sethian [Sethian 1999]. The variational approaches were pioneered by Mumford and Shah [Mumford and Shah 1985a]. A detailed overview of these methods is given in the book by Morel and Solimini [Morel and Solimini 1995].
In the early visual pathway we see an abundance of feedback. A striking finding is that the
majority of fibers (roughly 75% !) in the optic radiation (the fibers between the LGN and the
primary visual cortex) are projecting in a retrograde (backwards) fashion, from cortex to
LGN [Sherman and Koch 1990, Sherman 1996a], recall chapter 11.

These cortico-thalamic projections may well tune the receptive fields with the differential
geometric information extracted with the receptive fields in the visual cortex.

It is long known that the receptive field sensitivity functions of LGN receptive fields are not
static. They may be temporally modulated as temporal differentiating operators, they may
also be some function of the incoming image under control of the cortical input.
Unfortunately little is still known about the exact wiring. It is clear that any LGN cell
receives input from the cortical cell it projects to [Mumford 1991a], but the cortico-fugal
(Latin: fugare = to escape) connections seem to project all over the LGN [Logothetis 1999],
making long-range interactions possible.

Nonlinear diffusion techniques have become an important and extensive branch in computer
vision theory. It is impossible to explain all approaches, theories and techniques in this
chapter. This chapter is a starting point for interactive study and a basic understanding of the
mathematics involved, such as the underlying geometric reasoning leading to the particular
PDE and numerical stability of approximations.

21.2 Nonlinear Diffusion Equations


The introduction of a conductivity coefficient (c) in the diffusion equation makes it possible
to make the diffusion adaptive to local image structure:

∂L/∂s = ∇ · (c ∇L)

where the function c = c(L, ∂L/∂x, ∂²L/∂x², ...) is a function of local image differential structure, i.e. depends on local partial derivatives. The general structure of a diffusion PDE is shown in the formula above: the change of luminance with increasing scale is a divergence (∇·) of some flow (c ∇L). We also call c ∇L the flux function. With c = 1 we have normal linear,
isotropic diffusion: the divergence of the gradient is the Laplacian. The most famous case,
introduced by Perona and Malik in their seminal paper [Perona and Malik 1990] which
sparked the field, is where c is a decreasing monotonic function of the gradient magnitude,
c = c(|∇L|). The diffusion is reduced at the location of edges because c is small at strong
edges, and vice versa.

The term conductivity is understood when we consider the analogue of diffusion of heat in a metal plate. The local flow of heat can be influenced by the insertion of local heat isolators in the plate, which act as barriers for the heat flow, leading to a non-isotropic diffusion of heat
and thus to a non-isotropic temperature distribution over the plate over time. The
conductivity c is different for every pixel location, and is a function of the local differential
structure. We will discuss several possible classical functions proposed in the literature.

The nonlinear diffusion paradigm enables geometric reasoning: we may put knowledge into the task of the evolution of the image. Examples of such reasoning statements are:
- 'reduce the diffusion at locations where edges (or other local features such as corners, T-
junctions, etc.) occur', or
- 'adapt the diffusion so it is maximized along edges and minimized across edges', or
- 'enhance the diffusion in the direction of ridges and reduce the diffusion perpendicular to
them' etc.

The naming of nonlinear diffusion equations is sometimes not consistent in the literature. We list some names and their meaning:
- linear diffusion, equivalent to isotropic diffusion: the diffusion is the same in all directions,
for all dimensions; the conductivity function c is a constant;
- geometry-driven diffusion, the most general naming for the use of geometric reasoning; it
includes (invariant) local differential geometric properties in the diffusion equations, curve
evolution schemes and variational, energy minimizing expressions;
- variable conductance diffusion, the 'Perona and Malik' type of gradient magnitude
controlled diffusion;
- inhomogeneous diffusion: the diffusion is different for different locations in the image; this
is the most general naming;
- anisotropic diffusion: the diffusion is different for different directions; the Perona and
Malik nonlinear diffusion scheme is not an anisotropic diffusion (despite the title of their
original paper); it says nothing about the direction, only about the magnitude of the
diffusion; it is an example of inhomogeneous diffusion, or variable conductance diffusion;

- coherence enhancing diffusion: a particular case of anisotropic diffusion where the


direction of the diffusion is governed by the direction of local image structure, for example
the eigenvectors of the structure matrix or structure tensor (the outer product of the gradient
with itself, explained in chapter 6), or the local ridgeness.
- tensor driven diffusion: the diffusion coefficient is a tensor, not a scalar. Mathematically, this is the most general form. The tensor operates as a Euclidean operator on the gradient flow vector, and can modify its magnitude and direction accordingly. The tensor can be calculated on a different scale than the gradient.

21.3 The Perona & Malik Equation

Perona and Malik [Perona and Malik 1990] proposed to make c a function of the gradient magnitude in order to reduce the diffusion at the location of edges, with two possible choices for c: c1 = Exp[-(∇L)²/k²], and c2 = 1/(1 + (∇L)²/k²). Note that these expressions are equal up to a first order approximation:

<< FrontEndVision`FEV`;

c1 = Series[Exp[-(∇L)^2/k^2], {∇L, 0, 4}]

c2 = Series[1/(1 + (∇L)^2/k^2), {∇L, 0, 4}]

1 - (∇L)²/k² + (∇L)⁴/(2 k⁴) + O[∇L]⁵

1 - (∇L)²/k² + (∇L)⁴/k⁴ + O[∇L]⁵

im = Import["mr256.gif"][[1, 1]]; σ = 2;

DisplayTogetherArray[
 ListDensityPlot[Exp[-(gD[im, 1, 0, σ]^2 + gD[im, 0, 1, σ]^2)/#^2],
   PlotLabel -> "k = " <> ToString[#]] & /@ {5, 10, 20}, ImageSize -> 400];

Figure 21.1 The conductivity coefficient c1 in the Perona & Malik equation as a function of the parameter k. Gradient scale: σ = 2 pixels, image resolution 256². For higher k, only larger gradients are taken into account.

The geometric reasoning here is to let intra-region smoothing occur preferentially over inter-region smoothing. In this particular choice the conductivity function c(∇L) is small for large edge-strength and vice versa, i.e. the image is diffused most where the gradient is smallest. Figure 21.1 shows c1 = Exp[-|∇L|²/k²] for a sagittal MR image for σ∇L = 2 and three values for k.

We see clearly the reduced conductivity at the edges (darker in the pictures), and we see that we can control the relative influence of the effect with the free parameter k. This parameter has the dimension of the gradient (meter⁻¹), as we want the exponent to be dimensionless.

In the rest of the examples we take the first choice for the conductivity coefficient: c1. So, the Perona and Malik (P&M) equation becomes

∂L/∂s = ∇ · (Exp[-|∇L|²/k²] ∇L)    (2)

Expanding the differential operators for the right hand side, we get in 1D:

∂x (Exp[-(∂x L[x])^2/k^2] ∂x L[x]) // Simplify

(e^(-L'[x]²/k²) (k² - 2 L'[x]²) L''[x]) / k²
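The closed form just derived can be verified numerically. Below is a small pure-Python check (an illustration, not part of the book's Mathematica code): for the test signal L(x) = sin x, a central-difference derivative of the flux c1(Lx) Lx is compared with the expanded right-hand side e^(-Lx²/k²)(k² - 2 Lx²) Lxx / k²; the signal, the value k = 2 and the step h are arbitrary choices.

```python
import math

def flux(x, k):
    # flux c1(Lx) * Lx for the test signal L(x) = sin(x), so Lx = cos(x)
    Lx = math.cos(x)
    return math.exp(-Lx**2 / k**2) * Lx

def expanded_rhs(x, k):
    # e^(-Lx^2/k^2) (k^2 - 2 Lx^2) Lxx / k^2, with Lxx = -sin(x)
    Lx, Lxx = math.cos(x), -math.sin(x)
    return math.exp(-Lx**2 / k**2) * (k**2 - 2 * Lx**2) * Lxx / k**2

def dflux_dx(x, k, h=1e-5):
    # central-difference derivative of the flux, to compare against the closed form
    return (flux(x + h, k) - flux(x - h, k)) / (2 * h)

k = 2.0
for x in [0.3, 1.0, 2.2]:
    assert abs(dflux_dx(x, k) - expanded_rhs(x, k)) < 1e-8
```

The agreement to ~1e-8 is limited only by the O(h²) error of the central difference.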

and in 2D:

k =.;

PM = ∂x (E^(-((∂x L[x, y])^2 + (∂y L[x, y])^2)/k^2) ∂x L[x, y]) +
     ∂y (E^(-((∂x L[x, y])^2 + (∂y L[x, y])^2)/k^2) ∂y L[x, y]) // FullSimplify;
PM // shortnotation

(e^(-(Lx² + Ly²)/k²) ((k² - 2 Lx²) Lxx - 4 Lx Lxy Ly + (k² - 2 Ly²) Lyy)) / k²

We recognize the strongly nonlinear character of this equation. Unfortunately, there is no


analytical solution for this PDE. So we have to rely on numerical methods to approximate the solution. Fortunately there are many efficient and stable numerical schemes for the time-evolution of an image governed by this type of divergence-of-a-flow PDE. We will discuss some of them in the course of this chapter.

The most straightforward numerical approximation of ∂L/∂s = ∇ · (c ∇L) is the forward-Euler approximation δL = δs (∇ · c ∇L), where δL is the increment in L and δs is the (typically small) stepsize in scale: the evolution stepsize. Through iteration (calculation of many small consecutive increments) we can calculate the image at the required level of evolution, i.e. at the required level of adaptive blurring.

The images in between form a scale-space again, or alternatively, a series of images over evolution time. For the limit k → ∞, we get the linear diffusion equation again:

Limit[PM, k -> ∞] // shortnotation

Lxx + Lyy
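The forward-Euler evolution δL = δs (∇·c∇L) described above can be sketched in a few lines of plain Python. This is a hedged 1D illustration with simple nearest-neighbour differences (not the book's gD-based Mathematica scheme of the next section); the test signal and the values of k and δs are arbitrary choices.

```python
import math, random

def pm_step(L, ds, k):
    # one forward-Euler step of dL/ds = d/dx( c1(Lx) Lx ), c1 = exp(-Lx^2/k^2),
    # with nearest-neighbour differences and reflecting boundaries
    n = len(L)
    def c(g):  # conductivity as a function of the local gradient
        return math.exp(-g * g / (k * k))
    out = L[:]
    for i in range(n):
        gr = L[min(i + 1, n - 1)] - L[i]   # right gradient
        gl = L[i] - L[max(i - 1, 0)]       # left gradient
        out[i] = L[i] + ds * (c(gr) * gr - c(gl) * gl)
    return out

random.seed(0)
step = [0.0] * 32 + [100.0] * 32                       # ideal edge
noisy = [v + random.uniform(-10, 10) for v in step]    # additive uniform noise

evolved = noisy
for _ in range(50):
    evolved = pm_step(evolved, ds=0.2, k=40.0)

def spread(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# noise in the flat part goes down, the edge height is roughly preserved
assert spread(evolved[4:28]) < spread(noisy[4:28])
assert evolved[40] - evolved[20] > 80.0
```

The flat parts are smoothed while the large step survives, because c1 is nearly zero at the strong edge gradient.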

21.4 Scale-space implementation of the P&M equation


To work on discrete images, we replace (with the Replace operator, short notation /.) every occurrence of a spatial derivative in the right-hand side of the P&M equation (pmc1 resp. pmc2) with the scaled Gaussian derivative operator gD:

For c1:

Clear[im, σ, k]; c1 = E^(-((∂x L[x, y])^2 + (∂y L[x, y])^2)/k^2);
pmc1[im_, σ_, k_] = ∂x (c1 ∂x L[x, y]) + ∂y (c1 ∂y L[x, y]) /.
   Derivative[n_, m_][L][x_, y_] -> gD[im, n, m, σ] // Simplify

k⁻² e^(-(gD[im, 0, 1, σ]² + gD[im, 1, 0, σ]²)/k²) ((k² - 2 gD[im, 0, 1, σ]²) gD[im, 0, 2, σ] -
   4 gD[im, 0, 1, σ] gD[im, 1, 0, σ] gD[im, 1, 1, σ] +
   (k² - 2 gD[im, 1, 0, σ]²) gD[im, 2, 0, σ])

For c2:

Clear[im, σ, k]; c2 = 1/(1 + ((∂x L[x, y])^2 + (∂y L[x, y])^2)/k^2);
pmc2[im_, σ_, k_] = ∂x (c2 ∂x L[x, y]) + ∂y (c2 ∂y L[x, y]) /.
   Derivative[n_, m_][L][x_, y_] -> gD[im, n, m, σ] // Simplify

(k² (gD[im, 0, 2, σ] (k² - gD[im, 0, 1, σ]² + gD[im, 1, 0, σ]²) -
   4 gD[im, 0, 1, σ] gD[im, 1, 0, σ] gD[im, 1, 1, σ] + gD[im, 0, 1, σ]²
   gD[im, 2, 0, σ] + (k² - gD[im, 1, 0, σ]²) gD[im, 2, 0, σ])) /
 (k² + gD[im, 0, 1, σ]² + gD[im, 1, 0, σ]²)²

We calculate the variable conductance diffusion first on a simple small (64×64) noisy test image of a black disk (minimum: 0, maximum: 255):

imdisk = Table[If[(x - 32)^2 + (y - 32)^2 < 300, 0, 255], {y, 64}, {x, 64}];
noise = Table[100 Random[], {64}, {64}]; im = imdisk + noise;
ListDensityPlot[im, ImageSize -> 120];

Figure 21.2 Simple 64×64 test image of a black disk on a white background (intensities 0 resp. 255) with additive uniform noise (amplitude = 100).

A rule for the choice of k is difficult to give. It depends on the choice of which edges have to be enhanced, and which have to be canceled. The histogram of gradient values (at σ = 1) may give some clue to how much 'relative edge strength' is present in the image:

Histogram[Flatten[grad = Sqrt[gD[im, 1, 0, 1]^2 + gD[im, 0, 1, 1]^2]],
  HistogramCategories -> Range[0, 200, 5], ImageSize -> 280];

Figure 21.3 Histogram of the (scalar) magnitude of the gradient values at σ = 1 of the disk image of figure 21.2.

In section 21.5 we derive that k determines the 'turnover' point of edge reduction versus enhancement. A forward-Euler approximation scheme now becomes particularly simple:

peronamalikc1[im_, δs_, σ_, k_, niter_] := Module[{evolved = im},
  Do[evolved += δs pmc1[evolved, σ, k], {niter}]; evolved];

where im is the input image, δs is the time step, σ is the scale of the differential operator, k is the conductivity control parameter and niter is the number of iterations. Here is an example of its performance:

line = {Red, Line[{{0, 32}, {64, 32}}]}; DisplayTogetherArray[
  {ListDensityPlot[#, Epilog -> line] & /@
    {im, imp = peronamalikc1[im, .1, .7, 25, 20]},
   ListPlot /@ {im[[32]], imp[[32]]}}, ImageSize -> 380];

Figure 21.4 Top left: input image; top right: same image after variable conductance diffusion, with operator scale σ = .8 pixels, timestep δs = 0.1, k = 100, nr. of iterations = 10. Bottom row: intensity profile of middle row of pixels for both images. The edge steepness is well preserved, while the noise is substantially reduced. A little overshoot is seen at the edges.

We can define a signal-to-noise ratio (SNR) for this particular image by taking two square (16×16) areas, one in the middle of the black disk and one in the lower left corner in the background. The signal is defined as the difference of the means, the noise as the sum of the variances of the intensity values in the areas:

<< Statistics`DescriptiveStatistics`;
snr[im_] := Module[{m1, m2},
  m1 = SubMatrix[im, {24, 24}, {16, 16}] // Flatten;
  m2 = SubMatrix[im, {3, 3}, {16, 16}] // Flatten;
  (Mean[m2] - Mean[m1]) / (Variance[m1] + Variance[m2])];

ListDensityPlot[im, Epilog -> {Hue[1], Thickness[.01],
   Line[{{3, 3}, {3, 19}, {19, 19}, {19, 3}, {3, 3}}],
   Line[{{24, 24}, {24, 40}, {40, 40}, {40, 24}, {24, 24}}]},
  ImageSize -> 150];

Figure 21.5 Areas through which the signal-to-noise ratio (SNR) is defined.
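The same SNR measure can be sketched in plain Python on a list-of-rows image (an illustration with the same region corners as above, not the book's code; the noise seed is an arbitrary choice):

```python
import random

def mean(vals):
    return sum(vals) / len(vals)

def variance(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def snr(im, inner, outer, size=16):
    # im: list of rows; inner/outer: top-left corners of the two square areas
    m1 = [im[y][x] for y in range(inner[1], inner[1] + size)
                   for x in range(inner[0], inner[0] + size)]
    m2 = [im[y][x] for y in range(outer[1], outer[1] + size)
                   for x in range(outer[0], outer[0] + size)]
    return (mean(m2) - mean(m1)) / (variance(m1) + variance(m2))

# black disk on white background with additive uniform noise, as in figure 21.2
random.seed(1)
im = [[(0.0 if (x - 32) ** 2 + (y - 32) ** 2 < 300 else 255.0)
       + random.uniform(0, 100) for x in range(64)] for y in range(64)]
value = snr(im, inner=(24, 24), outer=(3, 3))
assert 0.05 < value < 0.5   # signal ~255, noise variances ~2x833
```

With uniform(0, 100) noise each area has variance near 100²/12 ≈ 833, so the initial SNR is roughly 255/1667 ≈ 0.15, in line with the starting point of figure 21.6.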

Clearly, the signal-to-noise ratio increases substantially during the evolution until t = niter δs = 1:

evolved = im; out = {}; σ = .8; δs = .1; k = 100; niter = 10;
Do[evolved += δs pmc1[evolved, σ, k];
  out = Append[out, snr[evolved]], {niter}];
ListPlot[out, PlotJoined -> True, AxesLabel ->
  {"evolution\ntime\n(in iterations)", "SNR"}, ImageSize -> 250];


Figure 21.6 The signal-to-noise ratio (SNR) increases substantially with increasing evolution
time.
21. Geometry-driven diffusion 369

But this cannot continue, of course, for physical reasons. When we continue the evolution until t = 20 (in units of iterations), we see that the gain is lost again:

evolved = im; out = {}; σ = .8; δs = .1; k = 100; niter = 20;
Do[evolved += δs pmc1[evolved, σ, k];
  out = Append[out, snr[evolved]], {niter}];
ListPlot[out, PlotJoined -> True, AxesLabel ->
  {"evolution\ntime\n(in iterations)", "SNR"}, ImageSize -> 190];


Figure 21.7 There is a maximum in the signal-to-noise ratio (SNR) for variable conductance
diffusion with increasing evolution time.

• Task 21.1 Play with this maximum in SNR as a function of timestep, σ and k.

• Task 21.2 Show that this maximum in SNR occurs for t = 4 σ - 2.

• Task 21.3 Investigate the Perona & Malik equation and the SNR for 3D.

A good empirical value for k is the 80% or 90% percentile of the cumulative frequency distribution of the gradient magnitude values in the image. Of course, this is a 'rule of thumb'; the best choice may depend on the image and the task at hand and may need some experimentation. The function kpercentile[im, σ, perc, nbins] calculates the percentile value kperc for the image im, gradient scale σ, percentage perc ([0-1]) in the number of bins nbins:

Needs["Statistics`DataManipulation`"];
kpercentile[im_, σ_, perc_, nbins_] := Module[{max, cummax, grad2, counts, cumcounts},
  grad2 = Sqrt[gD[im, 1, 0, σ]^2 + gD[im, 0, 1, σ]^2]; max = Max[grad2];
  counts = BinCounts[Flatten[grad2], {0, max, max/nbins}];
  cumcounts = Rest[FoldList[Plus, 0, counts]];
  cummax = Max[cumcounts];
  Length[Select[cumcounts, (# < perc cummax) &]] max/nbins];
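The idea behind kpercentile — find the gradient magnitude below which a given fraction of the pixels lies — can be illustrated with a plain sorted-list percentile in Python (a simplification of the binned-histogram version above; the toy gradient list is an arbitrary choice):

```python
def k_percentile(grads, perc):
    # return the value below which a fraction `perc` of the gradient
    # magnitudes lies (plain sorted-list percentile, no histogram binning)
    ordered = sorted(grads)
    idx = min(int(perc * len(ordered)), len(ordered) - 1)
    return ordered[idx]

grads = [0.5 * i for i in range(200)]   # toy gradient magnitudes 0 .. 99.5
k90 = k_percentile(grads, 0.9)

# by construction, 90% of the values lie below the returned threshold
below = sum(1 for g in grads if g < k90)
assert abs(below / len(grads) - 0.9) < 0.01
```

For these evenly spaced toy values the 90% threshold is 90.0; with k chosen this way, the 90% weakest edges are smoothed and the 10% strongest are enhanced.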

Here is the result for an image of man-made structures. We first find an estimate for k at σ = 1:

im = Import["Utrecht256.gif"][[1, 1]];

k90 = kpercentile[im, 1, .9, 100]

77.1295

We deliberately choose a smaller k = 25 to get more blurring overall:

δs = .02; σ = 1; k = 25; evolved = im;
Do[evolved += δs pmc1[evolved, σ, k], {60}];
Show[GraphicsArray[ListDensityPlot[#, DisplayFunction -> Identity] & /@
   {im, evolved}], ImageSize -> 400];

Figure 21.8 Left: original image, resolution 256×256. Right: Variable conductance diffusion with k = 25, σ = 1, δs = 0.02 and 60 iterations. Note the good preservation of the gradient steepness at the edges and the reduction of the brick and rooftile texture.

21.5 The P&M equation is ill-posed

It is instructive to study the P&M equation in somewhat more detail. Let us look at how the diffusion process depends on the gradient strength, so we consider (in 1D for simplicity) c = c(Lx). The P&M equation is then Ls = ∂/∂x (c(Lx) Lx) = (c + c' Lx) Lxx, where c' = ∂c/∂Lx. Suppose that the flow (or flux function) c Lx is decreasing with respect to Lx at some point x0; then we have ∂(c Lx)/∂Lx = c + c' Lx = -a with a > 0. So the nonlinear diffusion equation reads in this situation Ls + a Lxx = 0, from which we get Ls = -a Lxx. Locally we have an inverse heat equation, which is well known to be ill-posed. This heat equation locally blurs or deblurs, dependent on the condition of c. We study this condition in 1D. For c1 and c2 we find:

Clear[k, Lx, c1, c2]; c1[Lx_] := Exp[-Lx^2/k^2];
c2[Lx_] := 1/(1 + Lx^2/k^2); ∂Lx (c1[Lx] Lx) // Simplify

(e^(-Lx²/k²) (k² - 2 Lx²)) / k²

∂Lx (c2[Lx] Lx) // Simplify

(k⁴ - k² Lx²) / (k² + Lx²)²

The function c1 Lx decreases for Lx > k/√2 and c2 Lx decreases for Lx > k. This implies that with k we can adjust the turnover point in the gradient strength, below which we have blurring, and above which we have deblurring. This is a favorable property: small edges disappear through blurring during the evolution, while the stronger edges are not only kept but even made steeper through the deblurring. Here are the graphs of the flux c Lx and of ∂(c Lx)/∂Lx for both c's with k = 2:

k = 2; DisplayTogetherArray[
  {{Plot[c1[Lx] Lx, {Lx, 0, 5}, AxesLabel -> {"Lx", "flow c1 Lx"}],
    FilledPlot[Evaluate[∂Lx (c1[Lx] Lx)], {Lx, 0, 5},
     AxesLabel -> {"Lx", "∂Lx (c1 Lx)"},
     epilog = Epilog -> {Text["blurring", {1.5, .8}], Arrow[{1.5, .7}, {1.1, .5}],
       Text["deblurring", {3, .4}], Arrow[{3, .3}, {2.8, .1}]}]},
   {Plot[c2[Lx] Lx, {Lx, 0, 5}, AxesLabel -> {"Lx", "flow c2 Lx"}],
    FilledPlot[Evaluate[∂Lx (c2[Lx] Lx)], {Lx, 0, 5},
     AxesLabel -> {"Lx", "∂Lx (c2 Lx)"}, epilog]}}, ImageSize -> 310];

Figure 21.9 The value of k determines the turnover point of the direction of diffusion. Top row: flow function and its derivative with respect to the gradient for conductivity function c1. Bottom row: idem for c2. For both c's: k = 2. For positive slope of the flux as a function of the gradient we get local blurring, for negative slope we get local deblurring.

We saw before that a reasonable value for k is the 90% percentile, i.e. all edges with strength below this value will be smoothed out, and all edges stronger than this value will be enhanced.

The original formulation by Perona and Malik employed nearest neighbor differences in 4 directions to calculate the local gradient strength. This introduces artefacts because there is a bias for direction. We now understand that the Gaussian derivative kernel is the appropriate regularized differential operator, which does not introduce a bias for direction. This was introduced first by Catté, Lions, Morel and Coll [Catté et al. 1992].

21.6 Von Neumann stability of numerical PDE's


When we approximate a partial differential equation with finite differences in the forward Euler scheme, we want to make large steps in evolution time (or scale) to reach the final evolution in the fastest way, with as few iterations as possible. How large steps are we allowed to make? In other words, can we find a criterion for which the equation remains stable? A famous answer to the question of stability was derived by Von Neumann [Ames 1977a], and is called the Von Neumann stability criterion. We explain the concept with a simple 1D evolution equation, the 1D linear diffusion equation: ∂L/∂t = ∂²L/∂x².

This equation can be approximated with finite differences as (L_j^(n+1) - L_j^n)/Δt = (L_(j+1)^n - 2 L_j^n + L_(j-1)^n)/Δx², where we use a forward derivative for the time derivative in the left-hand side of the equation and centered differences for the second order spatial derivative in the right-hand side. The upper index n denotes the moment in time, the lower index j denotes the spatial pixel position. We define R = Δt/Δx², so we rewrite L_j^(n+1) - L_j^n - R (L_(j+1)^n - 2 L_j^n + L_(j-1)^n) = 0, in Mathematica (we define the finite difference function f[j, n]):

Clear[L, f];
f[j_, n_] := L[j, n + 1] - L[j, n] - R (L[j + 1, n] - 2 L[j, n] + L[j - 1, n])

Let the solution L_j^n of our PDE be a generalized exponential function, with k a general (spatial) wavenumber:

Clear[ξ, j, n, k, Δx];
L[j_, n_] := ξ^n Exp[I j k Δx]

When we insert this solution in our discretized PDE, we get

f[j, n]

-e^(i j k Δx) ξ^n + e^(i j k Δx) ξ^(1+n) - R (e^(i (-1+j) k Δx) ξ^n - 2 e^(i j k Δx) ξ^n + e^(i (1+j) k Δx) ξ^n)

We want the increment function f(j, n) to be maximal on the domain j, so we get the condition ∂f(j, n)/∂j = 0. The Mathematica function ExpToTrig rewrites complex exponentials into sinusoidal functions, using Euler's formula e^(iφ) = cos(φ) + i sin(φ):

ExpToTrig[E^(I φ)]

Cos[φ] + I Sin[φ]

The maximum criterion ∂f(j, n)/∂j = 0 can be solved for ξ:

Solve[ExpToTrig[∂j f[j, n]] == 0, ξ] // Simplify

{{ξ → 1 - 2 R + 2 R Cos[k Δx]}}

The amplitude ξ^n of the solution ξ^n e^(i j k Δx) should not explode for large n, so in order to get a stable solution we need the criterion |ξ| ≤ 1. This means, because Cos(k Δx) - 1 is always non-positive, that

R = Δt/Δx² ≤ 1/2    (3)

This is the Von Neumann criterion. For the 2D case we can prove in a similar way (assuming Δx = Δy): R_2D = Δt/Δx² ≤ 1/4.

• Task 21.4 Show that the Von Neumann criterion for N-dimensional isotropic diffusion equals Δt/Δx² ≤ 1/(2N).

This is an essential result. When we take a too large step size for Δt in order to reach the final time faster, we may find that the result gets unstable. We show an example in section 21.8. The Von Neumann criterion gives us the fastest way we can get to the iterative result. It is safe to stay well under the maximum value, so as not to compromise stability close to the criterion.

The pixelstep Δx is mostly unity, so the maximum evolution stepsize should be Δt ≤ 1/4 (pixel²). This is indeed a strong limitation, making many iteration steps necessary. Gaussian derivative kernels improve this situation considerably, as we will see in the next section.
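The sharpness of the criterion is easy to demonstrate numerically. The Python sketch below (an illustration, not the book's code; grid size and step count are arbitrary choices) evolves the worst-case highest-frequency mode (+1, -1, +1, ...), for which the amplification factor is ξ = 1 - 4R, just below and just above R = 1/2:

```python
def heat_steps(R, n_steps=200, n=64):
    # explicit Euler for dL/dt = Lxx on a periodic grid, started on the
    # highest-frequency mode; returns the final maximum amplitude
    L = [(-1.0) ** i for i in range(n)]
    for _ in range(n_steps):
        L = [L[i] + R * (L[(i + 1) % n] - 2 * L[i] + L[(i - 1) % n])
             for i in range(n)]
    return max(abs(v) for v in L)

# just below the critical ratio R = 1/2 the worst mode is damped;
# just above it, the amplitude explodes (|1 - 4R| > 1)
assert heat_steps(0.45) < 1.0
assert heat_steps(0.55) > 1e6
```

At R = 0.45 the mode decays as 0.8 per step; at R = 0.55 it grows as 1.2 per step, which after 200 steps is an amplification of about 10^16.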

21.7 Stability of Gaussian linear diffusion


The numerical stability criterion for Gaussian derivative implementations was derived by Niessen et al. [Niessen et al. 1997]. The difference is that we now do not take the nearest neighbor differences for an approximation of the derivatives, but a regular convolution with the Gaussian aperture function, as scale-space theory prescribes. We do the analysis in 1D first, then in 2D.

We start again with a general possible solution for the luminance function L(x, j, n), where x is the spatial coordinate, j is the discrete spatial grid position, and n is the discrete moment in evolution time of the PDE.

Clear[g, gxx, s, ξ, j, k, n, Δx];
L[x_, j_, n_] := ξ^n Exp[I j x];

We define the Gaussian kernel and the Laplacian (in 1D it is just the second order spatial derivative):

g[x_] := 1/(2 Sqrt[π s]) Exp[-x^2/(4 s)];
gxx[x_] = Simplify[∂x,x g[x], s > 0]

(e^(-x²/(4 s)) (-2 s + x²)) / (8 Sqrt[π] s^(5/2))

We recall that the convolution of a function f with a kernel g is defined as f ⊗ g = ∫_{-∞}^{∞} f(y) g(y - x) dy. The dummy variable y shifts the kernel over the full domain of f, and the result f ⊗ g is a function of x. So for discrete location j at timestep n we get for the blurred intensity:

convolved[x_] = Simplify[∫_{-∞}^{∞} L[y, j, n] gxx[y - x] dy, s > 0]

-e^(-j² s + i j x) j² ξ^n

If we compare this with the original intensity function, we find a multiplication factor -e^(-j² s) j²:

factor = convolved[x] / L[x, j, n] // Simplify

-e^(-j² s) j²

s = 1; Plot[factor, {j, 0, 3},
  AxesLabel -> {"j (time)", "factor"}, ImageSize -> 230];

Figure 21.10 There is a clear minimum of the multiplication factor -e^(-j² s) j² as a function of j. With this value the largest steps in the evolution can be made, giving the fastest way to the result.

We are looking for the largest absolute value of the factor, because then we take the largest evolution steps. Because the factor is negative everywhere, we need to find the minimum of the factor with respect to j, i.e. ∂factor/∂j = 0:

Clear[s]; solution = Solve[∂j factor == 0, j]

{{j → -1/Sqrt[s]}, {j → 1/Sqrt[s]}}

Only positive j makes sense, so we take j = 1/Sqrt[s]. We then find for the maximum size of the timestep factor -1/(e s):

factor /. j -> 1/Sqrt[s]

-1/(e s)

So we find for the Gaussian derivative implementation ξ = 1 - Δt/(e s), so |ξ| ≤ 1 implies Δt/(e s) ≤ 2, thus Δt ≤ 2 e s. Introducing this in the time-space ratio R = Δt/Δx² we finally get the limiting stepsize for a stable solution under Gaussian blurring:

R = Δt/Δx² ≤ 2 e s = e σ²    (4)
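The gain over the nearest-neighbour bound can be made explicit with a trivial Python check of this bound for Δx = 1 (the sample values of σ are arbitrary):

```python
import math

def critical_step(sigma):
    # Delta t <= 2 e s with s = sigma^2 / 2, i.e. Delta t <= e sigma^2
    s = sigma ** 2 / 2
    return 2 * math.e * s

# sigma = 2 gives the Delta t <= 10.87 quoted in the text
assert abs(critical_step(2.0) - 10.873) < 1e-2
# already at sigma = 1 the bound exceeds the nearest-neighbour limit of 1/2
assert critical_step(1.0) > 0.5
```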
Note that this enables substantially larger stepsizes than in the nearest neighbor case. E.g. when σ = 2, we get (for Δx = 1) a timestep of Δt ≤ 10.87. We give the reasoning for the 2D case for completeness. We define a 2D image L(x, y, j, k, n) on our discrete grid with (x, y) the spatial coordinates, (j, k) the spatial integer grid positions, and n the discrete evolution time:

Clear[x, y, j, k, n]; L[x_, y_, j_, k_, n_] := ξ^n E^(I j x) E^(I k y);
g = 1/(4 π s) Exp[-(x^2 + y^2)/(4 s)];
laplacian[x_, y_] = Simplify[∂x,x g + ∂y,y g, s > 0]

(e^(-(x² + y²)/(4 s)) (-4 s + x² + y²)) / (16 π s³)

First we integrate with respect to the shift α in x, then to the shift β in y (we may do this due to the separability of the Gaussian kernel, and thus of the Laplacian function):

convolution1 = Simplify[∫_{-∞}^{∞} L[α, β, j, k, n] laplacian[α - x, β - y] dα, s > 0]

(e^(-j² s + i j x + i k β - (y - β)²/(4 s)) ((y - β)² - 2 s - 4 j² s²) ξ^n) / (8 Sqrt[π] s^(5/2))

convolution = Simplify[∫_{-∞}^{∞} convolution1 dβ, s > 0]

-e^(-j² s - k² s + i j x + i k y) (j² + k²) ξ^n

The factor relative to the original luminance function is now:

factor = convolution / L[x, y, j, k, n] // Simplify

-e^(-(j² + k²) s) (j² + k²)

This function has a minimum at {∂factor/∂j = 0, ∂factor/∂k = 0}:

solution = Solve[{∂j factor == 0, ∂k factor == 0}, s]

{{s → 1/(j² + k²)}}

From the graph of the factor we appreciate that the solution space is on a circle in the spatial (j, k) domain:

s = 1; Plot3D[factor, {j, 0, 3}, {k, 0, 3}, Axes -> True,
  AxesLabel -> {"j", "k", "factor"}, ImageSize -> 320]; Clear[s];

Figure 21.11 There is a clear circular line of minima of the multiplication factor -e^(-(j² + k²) s) (j² + k²) as a function of j and k. With these values the largest steps in the evolution can be made, giving the fastest way to the result.

So j² + k² = 1/s. As in the 1D case, we find for the minimum factor -1/(e s):

factor /. j² + k² -> 1/s

-1/(e s)

So in 2D we find for the Gaussian derivative implementation the same result:

R = Δt/(Δx Δy) ≤ 2 e s = e σ²    (5)

21.8 A practical example of numerical stability


The following example is from [Niessen 1997a]. The stability criterion for Gaussian implementation is Δs ≤ 2 e s = e σ² (because s = ½ σ²). For the Gaussian blurring of an image with σ = 0.8 pixels for the Laplacian operator, we get Δs ≤ e 0.8² = 1.74. Let us study the effect of taking a range of evolution steps from somewhat smaller to somewhat larger than the critical evolution step Δs = 1.74. As test image we take a white circle on a black background. We blur this image to σ = Sqrt[128] pixels (which is to s = 64 pixel²) in two ways: a) with normal Gaussian convolution and b) with the numerical implementation of the diffusion equation and Gaussian derivative calculation of the Laplacian.

disk = Table[If[(x - 64)^2 + (y - 64)^2 < 1000, 1, 0], {y, 128}, {x, 128}];
diskblurred = gD[disk, 0, 0, Sqrt[128]];

DisplayTogetherArray[{ListDensityPlot[disk],
   ListDensityPlot[diskblurred]}, ImageSize -> 280];

Figure 21.12 Test image and its Gaussian blurred version at σ = Sqrt[128] pixels. Resolution image 128 × 128.

This function implements the numerical approximation of the linear, isotropic blurring of the image im:

num[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange/nrsteps; imt = im;
   Do[imt += δs (gD[imt, 2, 0, σ] + gD[imt, 0, 2, σ]), {nrsteps}]; imt];

To show the critical effect of the Von Neumann stability criterion, we evolve the image to a scale s = 64 and plot the numerical result for a narrow range of timestep sizes (timestep = 64/nsteps) around the critical value 1.74:

DisplayTogetherArray[
  Table[ListDensityPlot[num[disk, nsteps, .8, 64],
    PlotLabel -> "timestep = " <> ToString[N[64/nsteps]]],
   {nsteps, 35, 39}], ImageSize -> 400];

Figure 21.13 Study of the influence of time stepsize on the accuracy of the result in the numerical approximation of the linear diffusion equation. Image resolution 128², evolution to s = 64 pixel², scale Laplacian operator σ = 0.8 pixel. The critical timestep Δs = e σ² = 2.718 × 0.8² = 1.74.

• Task 21.5 Why are the artefacts at left/right, top/bottom?

It is clear from the results that timesteps larger than the critical timestep (two left cases) lead
to erroneous results. It is safe to take the timestep at least 10% smaller than the critical
timestep. Note the amazingly sharp transition around the critical timestep.
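The five timesteps of figure 21.13 can be checked against the critical value e σ² directly (a small Python sketch of the arithmetic):

```python
import math

sigma = 0.8
critical = math.e * sigma ** 2          # e * 0.64, approximately 1.74
steps = {n: 64 / n for n in range(35, 40)}   # timestep = 64 / nsteps

unstable = [n for n, dt in steps.items() if dt > critical]
stable = [n for n, dt in steps.items() if dt <= critical]

# only the two largest timesteps (fewest iterations) exceed the critical value,
# matching the two erroneous panels on the left of figure 21.13
assert unstable == [35, 36]
assert stable == [37, 38, 39]
```

Indeed 64/36 ≈ 1.78 is still above the bound while 64/37 ≈ 1.73 is just below it, which explains the sharp transition between neighbouring panels.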

21.9 Euclidean shortening flow


Alvarez, Guichard, Lions and Morel [Alvarez 1992a] realized that the P&M variable conductance diffusion was complicated by the choice of the parameter k.

They reasoned that the principal influence on the local conductivity should be to direct the flow in the direction of the gradient only: we want a lot of diffusion along the edges, but virtually no diffusion across the edges. This led to the proposal to make the intensity flow a function of the unit gradient vector ∇L/|∇L| = (cos(φ), sin(φ)), so we get the (nonlinear) diffusion equation

∂L/∂s = |∇L| ∇ · (∇L/|∇L|)    (6)

In Mathematica:

Clear[L]; << Calculus`VectorAnalysis`;

SetCoordinates[Cartesian[x, y, z]];
∇L = Grad[L[x, y, 0]]

{L(1,0,0)[x, y, 0], L(0,1,0)[x, y, 0], 0}

Sqrt[∇L.∇L] Div[∇L / Sqrt[∇L.∇L]] // Simplify // shortnotation

(-2 Lx Lxy Ly + Lxx Ly^2 + Lx^2 Lyy) / (Lx^2 + Ly^2)

This is exactly κ |∇L|, i.e. the isophote curvature κ (see chapter 6) times the gradient
magnitude |∇L|. From the discussion of the gauge coordinates we recall that this is equal to
the second order derivative in the direction tangential to the isophote: κ |∇L| = Lvv.

The nonlinear diffusion according to this paradigm becomes particularly concise:

∂L/∂s = Lvv   (7)

Because the Laplacian ΔL = Lxx + Lyy = Lvv + Lww, we get ∂L/∂s = ΔL − Lww. We see that
we have corrected the normal diffusion with a term proportional to the second order
derivative in the gradient direction (in gauge coordinates: Lww). This subtractive term
cancels the diffusion in the direction of the gradient. This also gives us a recipe for 3D:
∂L/∂s = ΔL − Lww, where ΔL = Luu + Lvv + Lww = Lxx + Lyy + Lzz is the 3D Laplacian and u
and v are the gauge directions tangential to the isophote surface.
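The decomposition ΔL = Lvv + Lww is easy to verify numerically. A small Python check (the test function sin(x)·eʸ and the evaluation point are my own choices, not from the book) uses the Cartesian expressions for Lvv and Lww:

```python
import math

# analytic derivatives of the test function L(x, y) = sin(x) * exp(y)
def derivs(x, y):
    e = math.exp(y)
    return (math.cos(x) * e,   # Lx
            math.sin(x) * e,   # Ly
            -math.sin(x) * e,  # Lxx
            math.cos(x) * e,   # Lxy
            math.sin(x) * e)   # Lyy

Lx, Ly, Lxx, Lxy, Lyy = derivs(0.7, -0.3)
g2 = Lx**2 + Ly**2                                     # |grad L|^2
Lvv = (Ly**2 * Lxx - 2*Lx*Ly*Lxy + Lx**2 * Lyy) / g2   # tangential to the isophote
Lww = (Lx**2 * Lxx + 2*Lx*Ly*Lxy + Ly**2 * Lyy) / g2   # along the gradient
laplacian = Lxx + Lyy
residual = abs(Lvv - (laplacian - Lww))
print(residual)   # zero up to rounding
```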

The enthusiasm of Alvarez et al. about this equation led them to the name ’fundamental
equation of image processing’ (!). This PDE is better known as the Euclidean shortening flow
equation, for reasons we will discuss in the next section.
There are a number of differences between this equation and the Perona & Malik equation:
- the flow (or flux) is independent of the magnitude of the gradient;
- there is no extra free parameter, like the edge-strength turnover parameter k;
- in the P&M equation the diffusion decreases when the gradient is large, resulting in
contrast dependent smoothing;
- this equation is grayscale invariant.

21.10 Grayscale invariance


A function is grayscale invariant if the function does not change value when the grayscale
function L is modified by a monotonically increasing or decreasing function f(L), f′ ≠ 0.
Grayscale invariance is an attractive property: image analysis is the same when we change
e.g. the contrast or brightness on a monitor, or if we put on sunglasses. The gradient is
dependent on f:

Clear[f, L]; u = f[L[x, y, s]];

gradu = Sqrt[(∂x u)^2 + (∂y u)^2]; gradu // shortnotation

√((Lx[x, y, s]^2 + Ly[x, y, s]^2) f'[L[x, y, s]]^2)

The Euclidean shortening flow equation ∂u/∂t = |∇u| ∇·(∇u/|∇u|) is independent of f:

FullSimplify[∂s u - gradu (∂x (∂x u / gradu) + ∂y (∂y u / gradu)) == 0,
   f'[L[x, y, s]] ≠ 0] // shortnotation

(2 Lx[x, y, s] Lxy[x, y, s] Ly[x, y, s] +
    Ly[x, y, s]^2 (-Lxx[x, y, s] + Ls[x, y, s]) +
    Lx[x, y, s]^2 (-Lyy[x, y, s] + Ls[x, y, s])) /
   (Lx[x, y, s]^2 + Ly[x, y, s]^2) == 0

The function f does not show up anymore: the equation is grayscale invariant.
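The cancellation can also be checked numerically. In the Python sketch below (the derivative values and the transform f(L) = L³ + 2L are my own choices, not from the book), the second order derivative tangential to the isophote of u = f(L) equals f′(L) · Lvv, because the f″ contributions cancel; hence the flow ∂L/∂s = Lvv commutes with any such grayvalue transform:

```python
# arbitrary first and second order derivatives of L at one point (my own test values)
Lx, Ly, Lxx, Lxy, Lyy, L0 = 0.4, -1.1, 0.7, 0.25, -0.6, 0.8

# grayvalue transform u = f(L) with f(L) = L^3 + 2 L, so f'(L) = 3 L^2 + 2 > 0
fp  = 3 * L0**2 + 2     # f'(L)
fpp = 6 * L0            # f''(L)

# chain rule derivatives of u = f(L)
ux, uy = fp * Lx, fp * Ly
uxx = fpp * Lx**2 + fp * Lxx
uxy = fpp * Lx * Ly + fp * Lxy
uyy = fpp * Ly**2 + fp * Lyy

def vv(px, py, pxx, pxy, pyy):
    # second order derivative tangential to the isophote (Lvv in gauge coordinates)
    return (py * py * pxx - 2 * px * py * pxy + px * px * pyy) / (px * px + py * py)

Lvv = vv(Lx, Ly, Lxx, Lxy, Lyy)
uvv = vv(ux, uy, uxx, uxy, uyy)
residual = abs(uvv - fp * Lvv)
print(residual)   # the f'' terms cancel: zero up to rounding
```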

21.11 Numerical examples shortening flow


Let us study some numerical examples of this PDE on a simple image. We first define the
numerical iterative approximation to the nonlinear PDE:

euclideanshortening[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs (gD[imt, 0, 2, σ] gD[imt, 1, 0, σ]^2 -
         2 gD[imt, 0, 1, σ] gD[imt, 1, 0, σ] gD[imt, 1, 1, σ] +
         gD[imt, 0, 1, σ]^2 gD[imt, 2, 0, σ]) /
      (gD[imt, 0, 1, σ]^2 + gD[imt, 1, 0, σ]^2), {nrsteps}]; imt];

In gauge coordinates the expression becomes more compact:

euclideanshortening[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs gauge2DN[im0, 0, 2, σ] /. im0 -> imt, {nrsteps}]; imt];

We study the disk image again, with additive uniform uncorrelated noise:

disk = Table[
    If[(x - 64)^2 + (y - 64)^2 < 1000, 1, 0] + Random[], {y, 128}, {x, 128}];
diskblurred = gD[disk, 0, 0, Sqrt[128]];
DisplayTogetherArray[{ListDensityPlot[disk],
   ListDensityPlot[diskblurred]}, ImageSize -> 230];

Figure 21.14 Noisy disk image. Image resolution 128², disk = 1, background = 0, noise
amplitude [0, 1]. Right: image blurred with σ = √128 ≈ 11.3, i.e. s = 64, with linear Gaussian
diffusion. By blurring the image the noise is gone, but the edge is gone too.

The critical timestep for this numerical scheme is again Δs ≤ e σ². We check this with the
same settings as above for the linear diffusion example:

Block[{$DisplayFunction = Identity},
  pl = Table[ListDensityPlot[euclideanshortening[disk, nsteps, .8, 64],
     PlotLabel -> "timestep = " <> ToString[N[64 / nsteps]]],
    {nsteps, 32, 42}]];
Show[GraphicsArray[Partition[pl, 5]], ImageSize -> 355];

Figure 21.15 Euclidean shortening flow on the noisy disk image for a range of timestep sizes
around the critical timestep Δs = e 0.8² = 1.74. Clearly artefacts emerge for timestep sizes
larger than the critical timestep.

And here is the evolution itself (we choose Δs = 1.6 pixel² as timestep) for s = 1 to 57 in 8
steps, with the scale of the flux operator σ = 0.8:

Block[{$DisplayFunction = Identity}, pl = Table[nsteps = Ceiling[s / 1.6];
   ListDensityPlot[euclideanshortening[disk, nsteps, .8, s],
    PlotLabel -> "time = " <> ToString[s]], {s, 1, 57, 8}]];
Show[GraphicsArray[Partition[pl, 4]], ImageSize -> 510];

Figure 21.16 Euclidean shortening flow on the noisy disk image as a function of evolution
time/scale (in pixel²). Noise is significantly reduced, the edge is preserved.

The noise gradually disappears in this nonlinear scale-space evolution, while the edge
strength is well preserved. Because the flux term, expressed in Gaussian derivatives, is
rotation invariant, the edges are well preserved irrespective of their direction: this is edge-
preserving smoothing.

DisplayTogetherArray[
  ListPlot[disk[[64]], PlotJoined -> True, PlotLabel -> "original"],
  ListPlot[euclideanshortening[disk, 32, .8, 32][[64]],
   PlotJoined -> True, PlotLabel -> "time = 31"], ImageSize -> 350];


Figure 21.17 Plot of the middle horizontal row of pixel values for the original noisy disk image
(left) and the image at time t = 31 with Euclidean shortening flow. Note the overshoot at the
edges, and the good noise removal while keeping a steep edge.

This is an example for an ultrasound image with its particular speckle pattern:

us = Import["us.gif"][[1, 1]]; Block[{$DisplayFunction = Identity, s = 9},
  p1 = ListDensityPlot[us, PlotLabel -> "Original"];
  p2 = ListDensityPlot[use = euclideanshortening[us, 6, .8, s],
    PlotLabel -> "scale = " <> ToString[s]]];
Show[GraphicsArray[{p1, p2}], ImageSize -> 500];

Figure 21.18 Euclidean shortening flow on a 260 x 345 pixel ultrasound image (source:
www.atl.com).

Many improvements and acceleration schemes have been proposed recently. It is beyond the
scope of this book to discuss these developments in detail. Good references to study are:
Kacur and Mikula [Kacur 2001], Weickert et al. [Weickert 1997e, Weickert 1998b].

21.12 Curve Evolution


We can consider an image as a collection of sampled luminance points. Alternatively, we can
consider an image as a set of isophotes (or level sets). They too describe the image
completely. When we consider the evolution of this set of isophotes under the control of a
nonlinear diffusion equation, we consider the evolution of curves. This has become an
important branch of nonlinear evolutionary image processing. Mathematically a lot was
known about the evolution of curves and surfaces, among others from the studies of the
propagation with time of the firefronts (iso-temperature surfaces) in flames and combustion
[Sethian 1982].

Consider a parameterized closed curve without self-intersections C(p): S¹ → ℝ², and let
C = C(p) be a regular parametric representation of the curve C with ∂C/∂p ≠ 0. The evolution
of the curve makes sure that the curve deforms. These deformations are functions of local
geometry, so in general we get a motion of each point on the curve. This deforming motion
can be decomposed in a component along the inward unit normal vector N, and a component
along the tangential unit vector T. When the parameter t measures evolution time, we get

∂C/∂t = α(p, t) T + β(p, t) N.

But an infinitesimal displacement along the tangential unit vector has no effect. We could as
well see the curve as a rope and shift the rope while keeping the same form of the curve
itself: only a displacement of points perpendicular to the curve will change its shape. This
was first proved by Gage [Gage 1983], who showed that the previous equation is equivalent
to ∂C/∂t = β′(p, t) N with C(p, 0) = C₀(p).

We consider planar curves, which are totally determined by their curvature, as we recall from
chapter 6. So the first order expansion for β′ in terms of the curvature is ∂C/∂t = (β₀ + β₁ κ) N.

This describes a deformation characterized by a constant motion term β₀ N and a motion due
to curvature β₁ κ N, both in the direction of the inward normal.

So we can discriminate three interesting cases:

- β₀ = 0, β₁ ≠ 0: pure curvature motion. This flow is Alvarez's fundamental equation
discussed before, and is known as the Euclidean shortening flow. The reason that it is called
this way we will discuss in the next section. This flow evolves concave regions to convex
regions [Gage and Hamilton 1986], and shrinks convex regions to circular points [Gage 1983,
Gage 1984]. Curvature motion has a smoothing effect on curves.
- β₀ ≠ 0, β₁ = 0: constant motion or normal motion flow. The isophotes move in the
direction of their normal with velocity β₀. This flow is intimately related to the dilation and
erosion operators with a disk as structuring element from mathematical morphology. We will
discuss mathematical morphological operators in the next section in more detail. Constant
motion has a sharpening effect on curves.
- β₀ ≠ 0, β₁ ≠ 0: both curvature and constant motion. Both deformations are in a sense each
other's opposite and could be appropriately weighted, prescribed by the task, to come to a
satisfactory shape deformation. This framework was developed by Kimia et al. as the
'reaction-diffusion' (sharpening resp. smoothing) framework [Kimia 1992, Kimia 1996a].
Note that β₀ and β₁ do not have the same dimensionality, so we need to work with natural
coordinates (x̃ = x/σ, see chapter 6). This 2-dimensional scale-space with parameters β₀ and
β₁ is also known as the entropy scale-space.

21.13 Duality between PDE- and curve evolution formulation


It was proven by Osher and Sethian [Osher and Sethian 1988, Sethian 1990a] that for all
points on a level set (defined by L(x) = a, where a is a constant), for all a and where ∇L ≠ 0,
and with the evolution of the intensities of the image (the generation of a scale-space)
specified by the evolution equation ∂L/∂t = ∇·F(∂iL, ∂i∂jL, …), the level sets of L evolve
governed by the following curve evolution equation: ∂C/∂t = −(∇·F(∂iL, ∂i∂jL, …)/|∇L|) N.

Here F is the (arbitrary) flow vector expressed in any partial spatial derivative of the image,
and N is the inward unit normal.

In general, a point on a curve can move in any direction. This direction can be decomposed
as a component in the tangential (to the curve) direction T, and a component in the normal
direction N. So the most general curve evolution equation is ∂C/∂t = α T + β N. However, a
motion tangential to the curve is unnoticeable; the exact position of the curve (isophote in
our situation) is irrelevant. Compare the situation with a rubber band running over two
rotating wheels: two photographs taken at different moments show the band in different
positions, but this cannot be seen on the pictures of the structureless band. This irrelevance
of the component α T was shown mathematically by Gage [Gage 1983].

So the level sets evolve according to ∂C/∂t = β N.

This is an important formulation, as we can now relate both schemes directly together. It
implies that when we consider a curve evolution scheme ∂C/∂t = g(κ, ∂κ/∂p, ∂²κ/∂p², …) N of a
curve C(x(p), y(p)), where p is the curve arclength parameterizing the curve and κ the
isophote curvature, we know that the luminance L evolves according to
∂L/∂t = Lw g(−Lvv/Lw, …). We recall from chapter 6 that κ = −Lvv/Lw. The PDE for the evolution
of the luminance for the entropy scale-space becomes now: ∂L/∂t = β₀ Lw + β₁ Lvv, belonging
to the class of Hamilton-Jacobi equations.
Olver, Sapiro and Tannenbaum [Olver 1994d] give a very useful generalization of the
Euclidean shortening flow equation. They show that when the evolution of the curve C is
expressed in the second order derivative with respect to the intrinsic curve arclength r, i.e.
∂C/∂t = ∂²C/∂r², then this equation defines a flow which is invariant to any Lie group action (an
action which can be expressed in infinitesimal form, like translations, rotations, scalings) for
which the arclength is invariant. They show that these equations locally behave as the
geometric heat equation ∂C/∂t = (1/g²) ΔC, where g is the group-invariant metric g = ∂r/∂x. If r
is the (usual) Euclidean arclength we get the Euclidean shortening flow discussed above.

Name of flow               | Luminance evolution                        | Curve evolution                   | Timestep N.N. | Timestep Gauss. der.
Linear                     | ∂L/∂t = ΔL                                 | ∂C/∂t = −(ΔL/|∇L|) N              | (Δx)²/(2D)    | e σ²
Variable conductance       | ∂L/∂t = ∇·(c ∇L)                           | ∂C/∂t = −(∇·(c ∇L)/|∇L|) N        | (Δx)²/(2D)    | e σ²
Normal or constant motion  | ∂L/∂t = c Lw                               | ∂C/∂t = c N                       |               |
Euclidean shortening       | ∂L/∂t = Lvv                                | ∂C/∂t = κ N                       | (Δx)²/2       | e σ²
Affine shortening          | ∂L/∂t = Lvv^(1/3) Lw^(2/3)                 | ∂C/∂t = κ^(1/3) N                 |               |
Affine shortening modified | ∂L/∂t = Lvv^(1/3) Lw^(2/3) / (1 + k Lw²)   | ∂C/∂t = (κ^(1/3)/(1 + k Lw²)) N   |               |
Entropy                    | ∂L/∂t = β₀ Lw + β₁ Lvv                     | ∂C/∂t = (β₀ + β₁ κ) N             | (Δx)²/2       | e σ²

Figure 21.19 Table of some popular nonlinear diffusion equations with their name, PDE
formula for the luminance evolution, the PDE formula for (isophote) curve evolution, the
maximum allowed timestep for nearest neighbor implementations (N.N.), and the maximum
allowed timestep for Gaussian derivative implementation.

When rₐ is the affine arclength (invariant under affine transformations) they derive the curve
evolution equation ∂C/∂t = ∂²C/∂rₐ² = κ^(1/3) N, or equivalently ∂L/∂t = Lvv^(1/3) Lw^(2/3).

Table 21.19 summarizes the properties of a number of important nonlinear diffusion schemes
[adapted from Niessen 1997d]:

Here are the Mathematica forward-Euler implementations of constant-motion (normal) flow,
affine shortening flow, and a modified version of affine shortening flow (proposed by
Niessen [Niessen 1997c]) to slow down the evolution at high gradients so it maintains
corners better:

constantmotion[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs Sqrt[gD[imt, 1, 0, σ]^2 + gD[imt, 0, 1, σ]^2], {nrsteps}]; imt];

affineshortening[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs ((gD[imt, 0, 2, σ] gD[imt, 1, 0, σ]^2 -
          2 gD[imt, 0, 1, σ] gD[imt, 1, 0, σ] gD[imt, 1, 1, σ] +
          gD[imt, 0, 1, σ]^2 gD[imt, 2, 0, σ]) /
        (gD[imt, 0, 1, σ]^2 + gD[imt, 1, 0, σ]^2))^(1/3) *
      (gD[imt, 0, 1, σ]^2 + gD[imt, 1, 0, σ]^2)^(1/3), {nrsteps}]; imt];

The modified affine shortening flow applies a different scale σgrad for the modifying
gradient, weighted by a factor k:

affineshorteningmodified[im_, nrsteps_, σ_, σgrad_, k_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs ((gD[imt, 0, 2, σ] gD[imt, 1, 0, σ]^2 -
           2 gD[imt, 0, 1, σ] gD[imt, 1, 0, σ] gD[imt, 1, 1, σ] +
           gD[imt, 0, 1, σ]^2 gD[imt, 2, 0, σ]) /
         (gD[imt, 0, 1, σ]^2 + gD[imt, 1, 0, σ]^2))^(1/3) *
      (gD[imt, 0, 1, σ]^2 + gD[imt, 1, 0, σ]^2)^(1/3) /
      (1 + k (gD[imt, 0, 1, σgrad]^2 + gD[imt, 1, 0, σgrad]^2)), {nrsteps}]; imt];

We use a test image of a square (intensities between 0 and 100) with added Gaussian noise
(mean = 0, σ = 50).

<< Statistics`ContinuousDistributions`;
noisysquare = Table[If[35 < x < 93 && 30 < y < 93, 0, 100] +
     Random[NormalDistribution[0, 50]], {y, 128}, {x, 128}];
DisplayTogetherArray[ListDensityPlot /@ {noisysquare,
    euclideanshortening[noisysquare, 32, 1, 32]}, ImageSize -> 315];

Figure 21.20 Euclidean shortening flow is a curvature-driven flow; in curve evolution
terminology: curves are moved in the gradient direction with a speed proportional to the
curvature. Clearly the corners with κ ≫ 0 are smoothed, while the straight (κ = 0) edges are
preserved.

Why the name 'shortening flow'?

Sapiro and Tannenbaum [Sapiro 1993e] proved that the length of the curve under a
shortening flow indeed shrinks. They showed for the metric of the curve, defined as
g(p, t) = |∂C/∂p|, where p is an arbitrary parametrization of the curve C, that the evolution of
the metric is equal to ∂g/∂t = −κ² g.

The total length of the curve L = ∫₀^2π g(r, t) dr evolves as
∂L/∂t = ∂/∂t ∫₀^2π g(r, t) dr = −∫₀^2π κ² g(r, t) dr = −∫₀^L κ² dv, from which we see that the length
is always decreasing with time.
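The length decrease is easy to observe on a discrete curve. The Python sketch below (my own discretization, not from the book: each vertex moves toward the midpoint of its neighbors, the discrete second derivative along the curve, which locally approximates ∂C/∂t = κ N) shrinks a densely sampled square; its perimeter strictly decreases and the corners round off:

```python
import math

def shrink(points, dt, steps):
    # move every vertex toward the midpoint of its two neighbors (closed curve)
    n = len(points)
    for _ in range(steps):
        points = [(x + dt * ((points[(k - 1) % n][0] + points[(k + 1) % n][0]) / 2 - x),
                   y + dt * ((points[(k - 1) % n][1] + points[(k + 1) % n][1]) / 2 - y))
                  for k, (x, y) in enumerate(points)]
    return points

def perimeter(points):
    n = len(points)
    return sum(math.dist(points[k], points[(k + 1) % n]) for k in range(n))

# sample a 2x2 square with 10 points per side (curvature concentrated at the corners)
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]
curve = []
for k in range(4):
    (ax, ay), (bx, by) = corners[k], corners[(k + 1) % 4]
    curve += [(ax + t / 10 * (bx - ax), ay + t / 10 * (by - ay)) for t in range(10)]

l0, l1 = perimeter(curve), perimeter(shrink(curve, 0.2, 50))
print(l0, l1)   # the flow shortens the curve
```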

21.14 Mathematical Morphology


Mathematical morphology is one of the oldest image processing and analysis techniques.
The original idea is the application of a logical area-operator (called structuring element) on
areas of the image in the same way as convolution filters. The following example illustrates
this. We define a simple binary image Ibin and a structuring element structel:

Ibin = PadRight[Table[1, {3}, {3}], {7, 7}, 0, 2];
structel = Table[1, {3}, {3}]; MatrixForm /@ {Ibin, structel}

Ibin:
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 1 1 0 0
0 0 1 1 1 0 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

structel:
1 1 1
1 1 1
1 1 1

The structuring element is shifted over the image exactly as in a 2D convolution. The logical
operation BitOr applied on all elements that are in both the structuring element and the
underlying image patch (this is detected with BitAnd) gives a dilation of the object in the
image:

ListConvolve[structel, Ibin, {2, 2}, 0, BitAnd, BitOr] // MatrixForm

0 0 0 0 0 0 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 0 0 0 0 0 0

Reversing the operations gives an erosion:



ListCorrelate[1 - structel, Ibin, {2, 2}, 1, BitOr, BitAnd] // MatrixForm

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
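The BitAnd/BitOr construction can be mirrored in a few lines of Python (a sketch of the same logic, my own code, not from the book): dilation is an OR over the on-pixels of the structuring element with zero padding, erosion an AND with one padding. On the 7×7 example the 3×3 block dilates to a 5×5 block and erodes to its single center pixel:

```python
def binary_morph(im, se, dilate=True):
    # slide the structuring element over the image (zero padding for dilation,
    # one padding for erosion) and combine the covered pixels with OR / AND
    h, w = len(im), len(im[0])
    sh, sw = len(se), len(se[0])
    oy, ox = sh // 2, sw // 2
    pad = 0 if dilate else 1
    def p(i, j):
        return im[i][j] if 0 <= i < h and 0 <= j < w else pad
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [p(i + a - oy, j + b - ox)
                    for a in range(sh) for b in range(sw) if se[a][b]]
            out[i][j] = int(any(vals)) if dilate else int(all(vals))
    return out

ibin = [[1 if 2 <= i <= 4 and 2 <= j <= 4 else 0 for j in range(7)] for i in range(7)]
se = [[1] * 3 for _ in range(3)]
dil = binary_morph(ibin, se, dilate=True)    # 3x3 block grows to a 5x5 block
ero = binary_morph(ibin, se, dilate=False)   # 3x3 block shrinks to the center pixel
print(sum(map(sum, dil)), sum(map(sum, ero)))
```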

Show[Import["erosions-dilations.gif"], ImageSize -> 350];

Figure 21.21 Erosion and dilation of a curve with a ball rolled over it at the outer and inner
borders. The larger structuring element smoothes the curve more. The size of the ball is the
scale of the smoothing process. From van den Boomgaard [van den Boomgaard 1993a].

The relation with normal flow now becomes clear: the motion of the contour of the image is
governed by the structuring element in exactly the same way as the level set is moved in the
direction of the normal.

This is only true for an isotropic convex (i.e. round) structuring element. One also says that
the gradient magnitude |∇L| is the infinitesimal generator for the normal motion evolution
equation. Figure 21.21 shows the concept.

We define the following functions in Mathematica (for application on binary images only):

Unprotect[binarydilate, binaryerode];
binarydilate[im_, structel_] := ListConvolve[structel,
   im, Ceiling[Dimensions[structel] / 2], 0, BitAnd, BitOr];
binaryerode[im_, structel_] := ListCorrelate[1 - structel,
   im, Ceiling[Dimensions[structel] / 2], 1, BitOr, BitAnd];

The sequence of erosion followed by dilation removes small structures selectively from the
image:

textim = Import["Text.gif"][[1, 1]];

im = textim; Block[{$DisplayFunction = Identity},
  p0 = ListDensityPlot[im];
  p1 = Table[ListDensityPlot[im = binaryerode[im, structel]], {i, 3}];
  p2 = Table[ListDensityPlot[im = binarydilate[im, structel]], {i, 3}]];

Show[GraphicsArray[{p0, p2[[3]]}], ImageSize -> 340];

Figure 21.22 Left: original binary image. Right: result after three erosions followed by three
dilations with a square 3x3 structuring element. The smaller structures have been eroded
fully and did not return upon subsequent restoration of the contour by dilation.

Here are the intermediate steps:

Show[GraphicsArray[{p1, p2}], ImageSize -> 330];

Figure 21.23 Top row: three consecutive erosions of the text image. Bottom row: Three
consecutive dilations of the eroded image.

Subtracting the result of the operation of erosion (of the original image) from the result of the
operation of dilation (of the original image) gives us the result of the morphological gradient
operator:

ListDensityPlot[binarydilate[textim, structel] -
  binaryerode[textim, structel], ImageSize -> 180];

Figure 21.24 The subtraction of an eroded and dilated image gives the morphological
gradient of the image. The scale of the operation is hidden in the size of the structuring
element.

21.15 Mathematical morphology on grayvalued images


The classical way to change the binary operators from mathematical morphology into
operators for gray-valued images is to replace the binary operators by maximum/minimum
operators. Mathematical morphology is a mature and well documented branch of computer
vision. We will limit ourselves here to the Mathematica implementation of the grayvalue
erosion and grayvalue dilation operators. Dilation employs the Max operator, erosion the
Min operator.

Unprotect[dilate, erode]; dilate[im_, el_] :=
  ListConvolve[el, im, Ceiling[Dimensions[el] / 2], Min[im], Plus, Max];
erode[im_, el_] := ListCorrelate[-el, im,
  Ceiling[Dimensions[el] / 2], Max[im], Plus, Min];
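The same max-plus/min-minus logic in a Python sketch (my own code, mirroring the Plus/Max and Plus/Min combinations above; it clips the window at the borders instead of using the Min[im]/Max[im] padding): dilation takes the maximum of im + el over the window, erosion the minimum of im − el. With a flat (all-zero) element these reduce to plain local max and min filters:

```python
def gray_morph(im, el, dilate=True):
    # grayvalue dilation: max over (im + el); erosion: min over (im - el)
    h, w = len(im), len(im[0])
    sh, sw = len(el), len(el[0])
    oy, ox = sh // 2, sw // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = []
            for a in range(sh):
                for b in range(sw):
                    y, x = i + a - oy, j + b - ox
                    if 0 <= y < h and 0 <= x < w:
                        vals.append(im[y][x] + el[a][b] if dilate
                                    else im[y][x] - el[a][b])
            out[i][j] = max(vals) if dilate else min(vals)
    return out

im = [[0, 0, 0, 0, 0],
      [0, 0, 5, 0, 0],
      [0, 0, 0, 0, 0]]
flat = [[0] * 3 for _ in range(3)]   # flat 3x3 element -> local max / min filter
d = gray_morph(im, flat, True)       # the bright pixel spreads to its 3x3 neighborhood
e = gray_morph(im, flat, False)      # the isolated bright pixel is eroded away
print(sum(map(sum, d)), sum(map(sum, e)))
```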

Here is an example for a 2562 image:

i m m r = I m p o r t [ " m r 2 5 6 . g i f " ] [[1, 1]] ; e l = Table[l, {3}, {3}] ;


DisplayTogetherArray [
{ L i s t D e n s i t y P l o t [immr] , L i s t D e n s i t y P l o t [imd = d i l a t e [immr, el] ] ,
L i s t D e n s i t y P l o t [ i m e = e r o d e l i m m r , el]]}, I m a g e S i z e - > 5 1 0 ] ;

Figure 21.25 Left: original image. Middle: grayvalue dilated, right: grayvalue eroded version.
Structuring element: a 3×3 matrix of ones.

And here is the morphological gradient for this image and structuring element:

ListDensityPlot[ime - imd, I m a g e S i z e -> 450];

Figure 21.26 Grayvalue morphological gradient. Structuring element as in fig. 21.25.



21.16 Mathematical morphology versus scale-space


It was shown by van den Boomgaard and Dorst that a parabolic structuring element leads to
Gaussian blurring [van den Boomgaard 1997]. This establishes an elegant equivalence
between mathematical morphology and Gaussian scale-space.

Florack, Maas and Niessen [Florack 1999a] related mathematical morphology and Gaussian
scale-space by showing that both theories are cases of a more general formulation.

It can be shown that dilation or erosion with a ball is mathematically equivalent to constant
motion flow, where the isophotes are considered as curves and they are moved in the gradient
(or opposite) direction. Here is the Mathematica code and some examples:

constantmotion[im_, nrsteps_, σ_, evolutionrange_] :=
  Module[{δs, imt}, δs = evolutionrange / nrsteps; imt = im;
   Do[imt += δs Sqrt[gD[imt, 1, 0, σ]^2 + gD[imt, 0, 1, σ]^2], {nrsteps}]; imt];

DisplayTogetherArray[
ListDensityPlot[constantmotion[textim, 10, 1, 10]],
ListDensityPlot[constantmotion[textim, 10, 1, -10]], ImageSize -> 440];

Figure 21.27 Grayvalue morphological dilation and erosion are equivalent to non-linear
diffusion with constant motion flow (with the PDE ∂L/∂t = ± |∇L|). The expansion of the
contour by the morphological structuring element is equivalent to the curve motion of the
isophote in the gradient direction. Left: constant motion dilation. Right: constant motion
erosion.

21.17 Summary of this chapter


The diffusion can be made locally adaptive to image structure. Three mathematical
approaches are discussed:

1. PDE-based nonlinear diffusion, where the luminance function evolves as the divergence of
some flow. The nonlinear PDE’s involve local image derivatives, and cannot be solved
analytically;
2. Evolution of the isophotes as an example of curve-evolution;
3. Variational methods, minimizing an energy functional defined on the image.

Adaptive smoothing requires geometric reasoning to define the influence on the diffusivity
coefficient. The simplest equation is the equation proposed by Perona & Malik, where the
variable conduction is a function of the local edge strength. Strong gradient magnitudes
prevent the blurring locally, the effect is edge preserving smoothing. The strong feedback
connections seen from V1 to LGN may be involved (among many other possible
mechanisms) in a locally adaptive scheme for image enhancement.

The numerical implementation of the nonlinear PDE's is exemplified with an iterative
forward-Euler scheme. This is the simplest scheme to start with, but it is unstable for
stepsizes larger than the Von Neumann criterion allows. We derive this criterion again and
extend it to scaled (regularized) differential operators. The stepsize can then be made
substantially larger.

The Perona & Malik equation leads to deblurring (enhancing edges) for edges larger than the
turnover point k, and blurs smaller edges. This is one of the reasons why the performance of
this PDE is so appreciated.

There is a strong analogy between curve evolution and PDE based schemes. They can be
related directly to one another.

In Euclidean shortening flow the diffusion is limited to the direction perpendicular to the
gradient. The divergence of the flow in the equation is equal to the second order gauge
derivative Lvv with respect to v, the direction tangential to the isophote. Normal motion
flow is equivalent to mathematical morphological erosion or dilation with a ball. The
dilation and erosion operators are shown to be convolution operators with boolean
operations on the operands.
22. Epilog
Computer vision is a huge field, and this book could only touch upon a small section of it.
First of all, the emphasis has been on the introduction of the notion of observing the physical
phenomena, which makes the incorporation of scale unavoidable. Secondly, scale-space
theory nicely starts from an axiomatic basis, and incorporates the full mathematical toolbox.
It has become a mature branch in modern computer vision research.

A third major notion is the regularization property of multi-scale operators, in particular


spatio-temporal differential operators. This enables the use of powerful differential geometric
methods on the discrete data of computer vision, for the many fields where differential
structure is part of the analysis, such as shape, texture, motion etc.

A next major emphasis has been the inspiration from the workings of the human visual
system. Here too there is still much to be learned, especially in the layers following the
visual front-end, where perceptual grouping and recognition are performed. The intricate
feedback loops and the local orientation column linking should be among the first to be
studied in detail. Modern neuro-imaging technology is about to give many clues.

The theory covered in this book, focusing on bio-mimicking front-end vision, is primarily
applied to the local analysis of image structure. The extraction of global, intelligently
connected structure is a widely explored area, where model-based and statistical methods
prevail to arrive at good perceptual grouping of local image structure. The study of the multi-
scale relations in the deep structure in scale-space, and the use of hierarchical, more
topological multi-scale methods, has only just started.

Finally, computer vision is solidly based on mathematics, in any applicable field. Image
processing has become a science.

This book showed the use of Mathematica, as a powerful combination of a complete high-
level mathematical programming language, from which through pattern matching the
numerical implementation can be automatically generated, enabling rapid prototyping for
virtually all the concepts discussed in this book. Many of the functions are intrinsically n-
dimensional.

The role of and need for robust computer vision techniques is ever increasing. In diagnostic
radiology, computer-aided diagnosis will see great successes in the next decade, and the
availability of huge image databanks will stimulate the study of image guided retrieval and
self-organization of analysis systems.

Much has been left untreated; the main reason is that the field is so huge. A possible sequel
of this book might include multi-scale methods for shape from shading, texture analysis
(locally orderless images), 3D differential geometry, wavelet based analysis, nonlinear and
statistical methods, and deep structure analysis.

This book is meant to be an interactive tutorial in multi-scale image analysis, describing the
basic functionality. It is my sincere wish that this book has invited you to actively explore
this fascinating area.

Eindhoven, Summer 2002.

Figure 22.1 Artist's impression of a 'deep structure scale-space' forest, where the geodesic
paths of the extrema and saddlepoints are shown as brown branches, and the points of
annihilation of extrema and saddlepoints are depicted as green balls. Scale runs vertical.
The mist is a partly transparent surface where the determinant of the Hessian vanishes.
Artist: ir. Frans Kanters, University of Technology Eindhoven, Department of Biomedical
Engineering, Biomedical Image Analysis Group, 2003.
A. Introduction to Mathematica
After finding the next-to-last bug, clean up your debugging stuff. The last bug in any
piece of code is invariably found by the first user of the code and never by the programmer.
Roman Maeder, Programming in Mathematica, p. 43.

Mathematica is a fully integrated environment for technical computing. It has been
developed by Prof. Stephen Wolfram and is now being developed and distributed by
Wolfram Research Inc. Mathematica comes with an excellent on-board help facility, which
includes the full text of the handbook (over 1400 pages).

Mathematica used to be slow and memory-intensive. This might be the reason why so many
computer vision labs have not considered applying it to images. It is a pleasant 'discovery'
that Mathematica is now fast and efficient with memory.

As a concise quick reference guide, here are the most important things to know when you
want to get started with Mathematica:

A.1 Quick overview of using Mathematica


Mathematica consists essentially of two programs that communicate with each other: the
kernel for the calculations and the front-end for the graphical user-interface (input and
presentation of data). The front-end is an advanced text processor. It creates notebooks,
portable documents that consist of cells, indicated by the brackets to the right of the cell.
The front-end and kernel together form an interactive interpreter.

Cells come in many styles, like 'title', 'text', 'input' etc. The front-end takes care of automatic
grouping of the cells (this, as any function, can be turned off).

By double-clicking on the group bracket a group of cells can be closed or opened, for quick
overview of the document.

Input style cells are sent to the kernel and executed by pressing 'shift-return'. Commands can
extend over several lines in an input type cell, separated by returns. Cells can be edited, and
run again. Mathematica remembers activities in the order of processing, not by the location
of the cell in the notebook.

The format menu item contains all commands for formatting text. The style sheets set the
definitions for appearances of elements for each cell type. We initialize every chapter with
the following commands. The first sets paths and options for often used functions, the second
contains the functions predefined for the book:

<< FrontEndVision`FEV`;
A. Introduction to Mathematica 396

Mathematica can do both symbolic and numerical calculations, with arbitrary precision.
N[π, 100] gives the numerical value of π in 100 significant digits, D[Log[x], x] gives
the first derivative of the natural logarithm of x.

N[π, 100]

3.1415926535897932384626433832795028841971693993751058209749445923078\
16406286208998628034825342117068

D[Log[x], x]

1/x
Use of brackets in Mathematica:


{ }      List      Example: {x, y, z}
[ ]      Function  Example: HermiteH[n, x]
( )      Combine   Example: (x + 3)^2
(*...*)  Comment   Example: (* This function ... *)

The internal structure of every expression is a H e a d in front of a list of operands. Check the
internal representation with F u l l F o r m :

FullForm[{1, 2, a b, c^2, {p, q}}]

List[1, 2, Times[a, b], Power[c, 2], List[p, q]]

Mathematica is strongly list oriented. Lists can be nested in any order. Every expression is a
list, as are our image data. Most commands are optimized for list operations.

Notation:
- Multiplication is indicated with a space or *.
- A semicolon ; at the end of a command means "print no output". Useful when a lot of
textual output is expected. The semicolon ; is also the regular statement separator between expressions.
- Enter Greek letters by typing the escape key before and after the name, e.g. ESC p ESC turns into π. Any
symbol can be entered through 'palettes' (see the File menu on the title bar of your
Mathematica session).
- Enter a superscript with control-^, a subscript with control-_, a division line with
control-/, and a square-root sign with control-2.
- Mathematica's internal variables and functions all begin with a capitalized letter; your own
defined variables should always begin with a lower case letter for clear distinction: Pi,
Plot3D[], myfunction[], {x, y, z}.
- Often we use the 'postfix' form for the application of a function: 2^100 // N is
equivalent to N[2^100].

2^100 // N

1.26765 × 10^30
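The same computation can thus be written in several equivalent forms. A small illustration (standard Mathematica, nothing book-specific):

N[Pi]               (* standard prefix form: 3.14159 *)
Pi // N             (* postfix form, same result *)
N @ Pi              (* prefix operator form, same result *)
Sqrt /@ {1, 4, 9}   (* Map in operator form: {1, 2, 3} *)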

Some front-end tips:


- The menu item Format - Show Toolbar gives a handy toolbar below the title bar of your
window.

- Keeping the alt-key pressed while dragging the mouse gives smooth window scrolling.
- H e l p is available on any command.

The full 1400 page manual is online under the Help menu item.

There is a very useful ’getting started’ section, and a ’tour of Mathematica’. Shortcut for help:
Highlight the text and press F1.
- Command completion is done with control-k, the list of arguments of a function with shift-
control-k.
- The notebook can be executed completely with the Kernel menu commands.
- All output can be deleted, which may save disk space considerably.
- A series of Graphics output can be animated by double-clicking one of the figures. The
bottom bar of the window shows steering controls for the animation.
- The Input menu contains many interesting features, such as 3D viewpoint selection, color
selection, hyperlinks, sound, tables and matrices, automatic numbering objects, etc., and is
worth studying.

A.2 Quick overview of the most useful commands


The commands below occur often in this book. Full explanation and many examples are
given in the Help browser of Mathematica. For the sake of the readers that do not have
Mathematica running, available or at hand while reading this book, some short examples are
given of the actions of the commands.

Plot commands come with many options. See the available options in the Help browser or
with e.g. Options[Plot3D].

Plot, Plot3D, ContourPlot, ListPlot, ListDensityPlot

Mathematica shows every plot it creates immediately. The output of the plot commands is
controlled by the option DisplayFunction. It specifies the function to apply for displaying
graphics (or sound).

To prevent intermediate results, e.g. while preparing a series of plots to be shown with
GraphicsArray, a useful construct is to create a scoping construct with
Block[{vars}, ...]. Within a block, all variables vars are hidden from the main
global context. E.g. in the following example the setting for $DisplayFunction is
temporarily set in the block context to Identity, which means: no output.

Block[{$DisplayFunction = Identity},
  p1 = Plot[Sin[x], {x, 0, 6 π}];
  p2 = Plot3D[Sin[x] Cos[y], {x, 0, 2 π}, {y, 0, 2 π}]];
Show[GraphicsArray[{p1, p2}]];

Mathematica is List oriented. This is a short nested List:

ma = {{a, b, c}, {d, e, f}, {g, h, i}}

{{a, b, c}, {d, e, f}, {g, h, i}}

FullForm gives the internal representation, i.e. a Head with a series of operands:

FullForm[ma]

List[List[a, b, c], List[d, e, f], List[g, h, i]]

Listable is an attribute of many functions. It means that they perform their action on all
elements of a list:

ma + 1

{{1 + a, 1 + b, 1 + c}, {1 + d, 1 + e, 1 + f}, {1 + g, 1 + h, 1 + i}}

ma^2

{{a^2, b^2, c^2}, {d^2, e^2, f^2}, {g^2, h^2, i^2}}

Clear[x] or f[x] =. clears f[x]. Remove[f] completely removes the symbol f.

f =.; f^3

f^3

Nest applies a function multiple times.

f =.; Nest[f, x, 3]

f[f[f[x]]]
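Nest is also handy for numerical fixed-point iterations. As a sketch (our own example, not from the book), five Newton steps for the square root of 2, starting from 1.0:

Nest[(# + 2/#)/2 &, 1.0, 5]

1.41421

NestList shows the intermediate iterates as well:

NestList[(# + 2/#)/2 &, 1.0, 3]

{1., 1.5, 1.41667, 1.41422}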

With repeated operations (using Nest) a wide variety of self-similar structures can be
generated. From Stephen Wolfram's new book [Wolfram2002] the gasket fractal:

Nest[SubsuperscriptBox[#, #, #] &, "0", 5] // DisplayForm


(The output, shown in figure A.1, is a Sierpinski-gasket-like pattern of nested sub- and superscripted zeros.)
Figure A.1 The gasket fractal is created by repeated action, which is implemented with the
function Nest. See also mathforum.org/advanced/robertd/typefrac.html.

Map maps a function on the elements of a list:

Map[f, ma]

{f[{a, b, c}], f[{d, e, f}], f[{g, h, i}]}

You can specify the level in the list where the function should be mapped:

Map[f, ma, 2]

{f[{f[a], f[b], f[c]}], f[{f[d], f[e], f[f]}], f[{f[g], f[h], f[i]}]}

Map[f, ma, {2}]

{{f[a], f[b], f[c]}, {f[d], f[e], f[f]}, {f[g], f[h], f[i]}}

A p p l y replaces the head of an expression with a new head. This sums the columns:

Apply[Plus, ma]

{a+d+g, b+e+h, c+f+i}

This sums the rows, i.e. Plus is applied at level 1 in the List:

Apply[Plus, ma, {1}]

{a + b + c, d + e + f, g + h + i}

Apply can even replace at level 0, i.e. the head itself. This sums all elements of a matrix:

Apply[Plus, ma, {0, 1}]

a+b+c+d+e+f+g+h+i

Some often used commands have short notations:

Apply       @@
Map         /@
ReplaceAll  /.
Condition   /;
Postfix     //

Times @@ ma /. {c -> c^2}

{a d g, b e h, c^2 f i}

In the following lines, the first statement rotates (cyclic) the elements one position to the
right, while the second rotates one position downward (rowshift down).

RotateRight/@ ma

{{c, a, b}, {f, d, e}, {i, g, h}}

RotateRight[ma]

{{g, h, i}, {a, b, c}, {d, e, f}}

Plus @@ ma

{a + d + g, b + e + h, c + f + i}

ma // MatrixForm

a  b  c
d  e  f
g  h  i

To execute a function on every dimension of a multidimensional array (e.g. in 2D the columns
and the rows), use the function Map[f, data, level] with indication of the level of
mapping. Here is an example for an operation on a 3D array, for the z, y and x direction:

Clear[a, b, c, d, e, f, g, h]; m2 = {{{a, b}, {c, d}}, {{e, f}, {g, h}}};
Map[RotateRight, m2, {0}]
Map[RotateRight, m2, {1}]
Map[RotateRight, m2, {2}]

{{{e, f}, {g, h}}, {{a, b}, {c, d}}}

{{{c, d}, {a, b}}, {{g, h}, {e, f}}}

{{{b, a}, {d, c}}, {{f, e}, {h, g}}}

Table generates lists of any dimension, e.g. vectors, matrices, tensors:

rt = Table[Random[], {i, 1, 3}, {j, 1, 4}]; rt // MatrixForm

0.419066   0.685606   0.696535   0.470892
0.898048   0.0157444  0.0700894  0.44903
0.0873124  0.818922   0.833205   0.0487981
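The iterators of Table can also be used inside the table body. A small sketch (standard Mathematica) that builds an identity matrix by hand and checks it against the built-in function:

Table[If[i == j, 1, 0], {i, 3}, {j, 3}]

{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}

Table[If[i == j, 1, 0], {i, 3}, {j, 3}] == IdentityMatrix[3]

True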

A.3 Pure functions


An operator is a 'pure function', e.g. (#^2)& is a function without a name, where some
operation on the operand # is performed, in this case squaring the variable. So (#^2)&
means: 'square the argument'. Multiple variables are indicated with #1, #2 etc.

(#^2) &[f]

f^2

p = (Sin[#1 + #2] + Sqrt[#2]) &;

p[f, g]

Sqrt[g] + Sin[f + g]

We use a pure function frequently when we have to apply a function repeatedly to different
arguments:

f = windingnumber[#] &;
f /@ {1, 4, 6}

{windingnumber[1], windingnumber[4], windingnumber[6]}

f[4]

windingnumber[4]

or when we want to plot something for a particular range of scales (the pure function is now
'ListDensityPlot the output of gD at some scale' for the scales 2, 3 and 6):

im = Import["mr256.gif"][[1, 1]];

DisplayTogetherArray[
  ListDensityPlot[gD[im, 1, 0, #]] & /@ {2, 3, 6}, ImageSize -> 410];

Figure A.2 Use of the function Map (/@).

A.4 Pattern matching


To show the usefulness of pure functions and the technique of pattern matching we work out
an example in some more detail: We do manipulations on words of a complete English
dictionary, consisting of 118617 English words.

We find palindromes (words that are the same when written in reverse order) and word
length statistics. The example is taken from one of the Wolfram Mathematica tutorials, see
library.wolfram.com.

We read the data with ReadList and check its size with Dimensions. Note that we put a
; (semicolon) at the end of the next cell. This prevents the full text of the dictionary from
being printed as output to the screen, which is in this case a very useful feature!

data = ReadList["dictionary118617.txt", String];

Dimensions[data]

{118617}

These are the first 20 elements:

Take[data, 20]

{aardvark, aardvarks, aaronic, abaca, abaci, aback, abacterial, abacus,


abacuses, abaft, abalienate, abalienated, abalienating, abalienation,
abalone, abalones, abandon, abandoned, abandonedly, abandonee}

We use the commands for counting the number of letters in a string, and to reverse a string:

StringLength["FrontEndVision"]

14

StringReverse["FrontEndVision"]

noisiVdnEtnorF

The following command selects those elements which are equal to their reverse, and are longer
than 2 letters. && denotes the logical And. Note the use of the pure function:

Select[data, (# == StringReverse[#] && StringLength[#] > 2) &]

{adinida, aha, ama, ana, anna, bib, bob, boob, civic, dad, deed, deified,
deled, did, dud, eke, ene, ere, ese, esse, eve, ewe, eye, gag, gig, hah,
huh, kayak, kook, level, madam, malayalam, minim, mom, mum, nisin,
non, noon, nun, pap, peep, pep, pip, poop, pop, pup, radar, redder,
refer, reviver, rotator, rotor, sagas, sees, sexes, shahs, sis,
solos, sos, stats, succus, suus, tat, tenet, tit, tnt, toot, tot, wow}

To calculate the length of each word, we map StringLength on data:

wordLengths = Map[StringLength, data];


Max[wordLengths]

45

Count[list, pattern] gives the number of elements in list that match pattern.

Count[wordLengths, 10]

14888

t = Table[Count[wordLengths, i], {i, 1, Max[wordLengths]}]

{0, 93, 754, 3027, 6110, 10083, 14424, 16624, 16551, 14888, 12008,
 8873, 6113, 3820, 2323, 1235, 707, 413, 245, 135, 84, 50, 23,
 16, 9, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2}

ListPlot[t, PlotStyle -> PointSize[0.02],
  AxesLabel -> {"wordlength", "occurrence"},
  PlotJoined -> False, ImageSize -> 230];

Figure A.3 Histogram of word lengths in a large English dictionary.

Pattern matching is one of the most powerful techniques in Mathematica. There are three
symbols for a pattern: _ denotes anything which is a single element, __ denotes anything
which is one or more elements, ___ denotes anything which is zero, one or more elements.
x_ denotes a pattern which is known under the name x. Replacement (/.) is done by Rule
(->). The following statement replaces every occurrence of a by Sqrt[a]:

f = . ; m a / . {a -> %/-a'].

{{~, b, c}, {d, e, f}, {g, h, i}}

This returns the positions in the dictionary where words are found of more than 23 letters:

Position[data, x_/; StringLength[x] > 23]

{{5028}, {17833}, {33114}, {33134}, {35841}, {49683}, {50319},
 {57204}, {57205}, {60016}, {62133}, {62598}, {62599}, {63552},
 {63723}, {63724}, {63725}, {67140}, {67141}, {70656}, {74099},
 {74101}, {74103}, {74166}, {79229}, {79273}, {79274}, {79500},
 {83241}, {104039}, {105810}, {106774}, {113411}, {114511}}

This returns the first pair of two consecutive 13-letter words in the dictionary:


data /. {a___,
   b_ /; StringLength[b] == 13,
   c_ /; StringLength[c] == 13, d___} -> {b, c}

{abstentionism, abstentionist}
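The same technique of named patterns with conditions works on any list, not only on the dictionary. A minimal sketch on a list of numbers, returning the first pair of consecutive integers:

{3, 5, 6, 9} /. {___, x_, y_ /; y == x + 1, ___} -> {x, y}

{5, 6}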

A.5 Some special plot forms


ParametricPlot3D[{fx, fy, fz}, {t, tmin, tmax}, {u, umin, umax}] creates a
surface, rather than a curve.

The surface is formed from a collection o f quadrilaterals. The corners o f the quadrilaterals
have coordinates corresponding to the values of the fi when t and u take on values in a
regular grid.

r[u_] := -1 + E^(u/(6 Pi)); x = 2 r[u] Cos[u] Cos[v/2]^2;
y = 2 r[u] Cos[v/2]^2 Sin[u]; z = -r[2 u] + r[u] Sin[v];

shell = ParametricPlot3D[{x, y, z}, {u, 0, 6 Pi}, {v, 0, 2 Pi},
  PlotPoints -> {100, 40}, PlotRange -> All, Axes -> False, Boxed -> False,
  ViewPoint -> {2.581, 1.657, 0.713}, ImageSize -> 200];

Figure A.4 An example of ParametricPlot3D for the plotting of more complicated 3D


manifolds.

Clear[x, y, z]; << FrontEndVision`ImplicitPlot3D`;

torus = ImplicitPlot3D[z^2 == 1 - (2 - Sqrt[x^2 + y^2])^2,
  {x, -3, 3}, {y, -3, 3}, {z, -1, 1}, PlotPoints -> {15, 15, 10},
  Passes -> Automatic, ImageSize -> 200];

Figure A.5 An example of ImplicitPlot3D for the plotting of more complicated 3D


manifolds.

A.6 A faster way to read binary 3D data


Mathematica has a set of utilities to read and write binary data from and to files. Examples of
such commands are ReadBinary[...] and WriteBinary[...]. They are available
in the package Utilities`BinaryFiles` (see the help browser). These commands
however are slow.
A faster way is to use an external C program to read the data, and to communicate with this
program with MathLink. The program binary.exe (for Windows) is an executable C program
that contains all commands of the package Utilities`BinaryFiles` in a fast version. This
executable is available from MathSource at the URL:
www.mathsource.com/Content/Enhancements/MathLink/0206-783.
Here also versions for other platforms are available. It is beyond the scope of this book to
explain MathLink, but in the help browser and at the MathSource repository good manuals
are available. The package is installed by Install:

Install["binary.exe"];

The taskbar in Windows at the bottom of the screen should now display the active program
binary.exe with which we now will communicate.

Let us read a file in raw bytes data format with a 3D MRI dataset. It is given that the set
mri02.bin contains 166 slices with 168 rows (y-dimension) and 146 columns
(x-dimension). Each pixel is stored as an unsigned byte. To read this file we first need to open
it:

channel = OpenReadBinary[
  $FEVDirectory <> "Images\mri02.bin", FilenameConversion -> Identity];

It is fastest to read the binary 3D image stack slice by slice. We first define space to store the
image, then we read 166 slices as bytes. Each slice is partitioned into 146 rows. At the end
we close the file.

im = Table[{}, {166}];
Table[
  im[[i]] = Partition[ReadListBinary[channel, Byte, 168 146], 146];,
  {i, 1, 166}];
Close[channel];

We check for the dimensions of our 3D image, about 4 million pixels:

Dimensions[im]

{166, 168, 146}

By calculating the Transpose of the 3D image, we can interchange the coordinates. The
second argument is the new ordering of the coordinates. In this way it is easy to plot the
other perpendicular planes. Let us look at the 85th image of the original stack, and the 85th
image of two transposed forms respectively, as shown in the statement below. As we see
from the left figure, the original slices were acquired in the coronal plane. The transposed
images show us the sagittal plane (middle, Latin ’sagitta’ = arrow) and the transversal plane
(right). This method of transposing the data so we get other perpendicular planes is called
multiplanar reformatting.

DisplayTogetherArray[
  ListDensityPlot /@ {im[[85]], Transpose[im, {3, 2, 1}][[85]],
    Transpose[im, {3, 1, 2}][[85]]}, ImageSize -> 500];

Figure A.6 Multiplanar reformatting is the visualization of perpendicular images in a 3D


dataset by transposing the 3D dataset. The left image is one of the original acquisitions in
the coronal plane. The middle image shows the 85th image in the sagittal plane. It is formed
by showing all rows of pixels perpendicular to the pixels in the 85th column in the left image.
The right image shows the 85th image in the transversal plane. It is formed by showing all
rows of pixels perpendicular to the pixels in the 85th row in the left image.

Multiplanar reformatting is of course only of high quality if the voxels are isotropic. Note in
this example that slight differences in overall intensities in the original 3D MRI acquisition
show up as vertical lines.
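To see concretely what the permutation argument of Transpose does, here is a small symbolic sketch (independent of the MRI data). With {3, 2, 1} the first and third indices are interchanged, i.e. the result at position [[k, j, i]] equals the original at [[i, j, k]]:

m2 = {{{a, b}, {c, d}}, {{e, f}, {g, h}}};
Transpose[m2, {3, 2, 1}]

{{{a, e}, {c, g}}, {{b, f}, {d, h}}}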

By manipulation of the pointer StreamPosition we can read an arbitrary slice from the 3D
dataset. This pointer points at the position just preceding the next bytes to read. After
opening the file, the streamposition is set to zero, which is at the beginning of the data. By

setting the stream position pointer 84 images further (84*168*146 locations further from
zero), the next statement reads the 85th image only:

channel = OpenReadBinary[
  $FEVDirectory <> "Images\mri02.bin", FilenameConversion -> Identity];
SetStreamPosition[channel, 84 * 168 * 146];
im85 = Partition[ReadListBinary[channel, Byte, 168 146], 146];
ListDensityPlot[im85, ImageSize -> 150];

Figure A.7 Direct read of a single slice from a 3D dataset is best done by manipulation of the
stream position pointer.

Close[channel];

A.7 What often goes wrong


In this section we give a random set of traps in which you may easily fall if not warned:

A7.1 Repeated definition

When a function from a package is called before the package is actually read into the kernel,
Mathematica adds the name to its global list as soon as it appears:

Remove[Histogram]

Histogram

Histogram

The function does not work, because the package Graphics`Graphics` has not been
read.

When subsequently the package is read, Mathematica complains that it may overwrite a
previous definition, and does not redefine the function Histogram. The natural way out is
to Remove the first definition, and to read the package again.

<< Graphics`Graphics`;
Histogram::shdw :
Symbol Histogram appears in multiple contexts {Graphics`Graphics`, Global`};
definitions in context Graphics`Graphics` may
shadow or be shadowed by other definitions.

Remove[Histogram];
<< Graphics`Graphics`;

Now the definition is fine:

? Histogram
Histogram[{x1, x2, ...}] generates a bar graph representing a histogram of the
univariate data {x1, x2, ...}. The width of each bar is proportional
to the width of the interval defining the respective category, and
the area of the bar is proportional to the frequency with which the
data fall in that category. Histogram range and categories may be
specified using the options HistogramRange and HistogramCategories.
Histogram[{f1, f2, ...}, FrequencyData -> True] generates a histogram
of the univariate frequency data {f1, f2, ...}, where fi is the
frequency with which the original data fall in category i. More...

A7.2 Endless numerical output

Prevent accidental output to the notebook if it is not necessary and very long, e.g. when an
image is calculated.

Output is not printed when the statement is concluded with a semicolon. The first
statement generates about a million random numbers to the screen, which will take a very
long time to generate and prevent you from continuing (luckily, we made the cell
unevaluatable). The second statement with the semicolon is fine.

m = Table[Random[], {1000}, {1000}]

m = Table[Random[], {1000}, {1000}];

Use Alt-. to abort an unwanted evaluation.

A7.3 For speed: make data numerical when possible

Be careful with symbolic computations on larger datasets. You may only be interested in the
numerical result. Compare the examples below:

mm1 = Table[Sin[x y], {y, 1, 8}, {x, 1, 8}];

Timing[symbolicInverse = Inverse[mm1];]

{36.343 Second, Null}

Even for this small matrix, each symbolic term is huge, and very impractical to handle. The
numerical result is very fast:

Timing[numericalInverse = Inverse[N[mm1]];]

{0. Second, Null}

We look at the result:

Short[numericalInverse, 4]

{{0.150096, -0.00900765, 0.16249, -0.295079,
  -0.194507, -0.0870855, 0.139714, 0.336583}, <<6>>, {<<1>>}}

Another example: the numerical eigenvalues of a matrix with 10,000 elements are computed
fast:

mm2 = Table[Random[], {100}, {100}];

Timing[Eigenvalues[mm2];]

{0.032 Second, Null}

And use functional programming and internal functions as much as possible.

Timing[array = Range[10^7];]

{0.031 Second, Null}

Timing[array = Table[i, {i, 1, 10^7}];]

{5.906 Second, Null}

A7.4 No more loops and indexing

E.g. to multiply each two elements of an array, from head to tail:

k1 = m = Table[Random[], {i, 1, 10^6}];

Timing[For[i = 1, i <= 10^6, k1[[i]] = m[[i]] m[[10^6 - i + 1]], i++]] // First

18.719 Second

The same result is obtained much faster if we use functional programming with native
Mathematica functions. They are optimized for speed, and programming becomes much
more elegant.

Timing[k = m Reverse[m];] // First

0.406 Second
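The same lesson applies to accumulations: an explicit Do loop that sums the squares of the elements is much slower than the built-in dot product, which computes the identical number. A sketch (the timings are machine-dependent):

m = Table[Random[], {10^6}];
Timing[s = 0.; Do[s += m[[i]]^2, {i, 10^6}]; s] // First
Timing[m.m] // First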

A7.5 Copy and paste in InputForm

There are 4 format types for cells in Mathematica: InputForm, OutputForm,
StandardForm and TraditionalForm. See the help browser for a description of these
types. Be alert when pasting a TraditionalForm cell as an input cell; Mathematica may
not be able to interpret this unequivocally. When you attempt this, Mathematica will issue a
warning, and you see the wiggly line in the cell bracket.

A.8 Suggested reading


A number of excellent books are available on Mathematica. A few of the best are listed here:

[Blachman1999] N. Blachman. Mathematica: A practical approach. Prentice Hall, 2nd


edition, 1999. ISBN 0-13-259201-0.
This complete tutorial is the easiest, quickest way for professionals to learn Mathematica, the
world’s leading mathematical problem-solving software. The book introduces the basics of
Mathematica and shows readers how to get around in the program. It walks readers through
all of Mathematica's practical, built-in numerical functions, and covers symbolic
capabilities, plotting, visualization, and analysis.

[Ruskeepää1999] H. Ruskeepää. Mathematica Navigator: Graphics and methods of applied


mathematics. Academic press, London. 1999. ISBN 0-12-603640-3 (paperback+CD-ROM).

Mathematica Navigator gives you a general introduction to the use of Mathematica, with
emphasis on graphics, methods of applied mathematics, and programming.

The book serves both as a tutorial and as a handbook. No previous experience with
Mathematica is assumed, but the book contains also advanced material and material not
easily found elsewhere. Valuable for both beginners and experienced users, it is a great
source of examples of code.

From the author:


- I would like first to ask, what is the general nature of your book Mathematica Navigator?
- Before answering, I would like to ask you, whether you know what is the difference
between an applied mathematician and a pure mathematician?
- Hm..., I seem to remember having heard some differences, but no, I don't remember any at
this moment. Tell me.
- An applied mathematician has a solution for every problem while a pure mathematician has
a problem for every solution.
- Yes, indeed. That is a very describing difference. But how does this maxim relate with my
question about the nature of your book?
- I am an applied mathematician, and so I took the task of solving every problem, namely in
using Mathematica.
- Not a very modest goal...
- Frankly, I took the task to write as useful a guide as possible so that you would have a more
easy way to the wonderful world of Mathematica.
- Do you start with the basics?
- Yes, and then the book goes carefully through the main material of Mathematica.
- What are the main areas of Mathematica?
- Graphics, symbolic calculation, numerical calculation, and programming.
- And how far does your book go?
- The book contains some advanced topics and material not easily found elsewhere, such as
stereographic figures, graphics for four-dimensional functions, graphics of real-life data,
fractal images, constrained nonlinear optimization, boundary value problems, nonlinear

difference equations, bifurcation diagrams, partial differential equations, probability,


simulating stochastic processes, statistics.

And for many subjects we also write our own programs, to practice programming.
- Do you emphasize symbolic or numerical methods?

- Both are important. For a given problem, we usually first try symbolic methods, and if they
fail, then we resort to numerical methods. Thus, for each topic, the book presents first
symbolic methods and then numerical methods. The book gives numerical methods a special
emphasis.
- Have you excluded some topics?
- Topics of a "pure" nature such as number theory, finite fields, quaternions, or graph theory
are not considered. Commands for manipulating strings, boxes, and notebooks are covered
only briefly. MathLink is left out (MathLink is a part of Mathematica enabling interaction
between Mathematica and external programs).
- Do you like to say something about the writing of the book?
- The writing was simply exciting. It is one of the most interesting epochs of my life thus far.
By writing the book I learned a lot about Mathematica and obtained a comprehensive view
of it. And the more I learned about Mathematica, the more I admired it.
- What are the fine aspects of Mathematica?
- It is consistent, reliable, and comprehensive. In addition, Mathematica has very powerful
commands, produces excellent graphics, and has a wonderful interface. However, it certainly
takes some time to get used to Mathematica, but that time is interesting and rewarding, and
then you have a powerful tool at your disposal.
- Thank you very much for this interview.
- Thank you.

[Wolfram1999] S. Wolfram. The Mathematica book. Fourth edition, Wolfram Media /


Cambridge University Press, 1999. 1470 pages. ISBN 0-52-164314-7.

The definitive reference guide for Mathematica, written by the author of Mathematica,
Stephen Wolfram. Not a tutorial, but a handbook. The full text of this book is available in the
indexed help-browser of Mathematica, as well as a searchable document on the web:
documents.wolfram.com/v4/index3.html.

[Mäder1996a] R. Mäder, Programming in Mathematica, 3rd ed. Addison-Wesley Pub., 1996.

This revised and expanded edition of the standard reference on programming in Mathematica
addresses all the new features in the latest versions 3 and 4 of Mathematica.

The support for developing larger applications has been improved, and the book now
discusses the software engineering issues related to writing and using larger programs in
Mathematica. As before, Roman Mäder, one of the original authors of the Mathematica
system, explains how to take advantage of its powerful built-in programming language. It
includes many new programming techniques which will be indispensable for anyone
interested in high level Mathematica programming.

[Mäder1996b] R. Mäder, The Mathematica programmer II. Academic Press, 1996. 296
pages. ISBN 0-12-464992-0 (paperback). This book, which includes a CD-ROM, is a second
volume to follow The Mathematica Programmer (now out of print) and includes many new
programming techniques which will be indispensable for anyone interested in high level
Mathematica programming.

A.9 Web resources

Webpages are very dynamic, and it is impossible to give a complete overview here. Some
stable pointers to a wealth of information and support are:

www.wolfram.com: The official homepage of Wolfram Inc., the maker and distributor of
Mathematica. Here many links are available for support, add-on packages, books, the
complete Mathematica book on-line, WebMathematica, GridMathematica, etc.

www.mathsource.com: MathSource is a vast electronic library of Mathematica materials,


including immediately accessible Mathematica programs, documents, journals (Mathematica
in Education, the Mathematica Journal) and many, many examples. Established in 1990,
MathSource offers a convenient way for Mathematica developers and users to share their
work with others in the Mathematica community. In MathSource you can either browse the
archive or search by author, title, keyword, or item number.

There are many introductions to Mathematica and pages with helpful links. Here are some
examples:

Tour of Mathematica (also available in the helpbrowser of Mathematica)


www.verbeia.com/mathematica/tips/tip_index.html: Ted's Tips and Tricks
www.mathematica.ch/ (in German)
phong.informatik.uni-leipzig.de/~kuska/mview3d.html (MathGL3d, an OpenGL translator
for Graphics3D structures)
www.unca.edu/~mcmcclur/mathematicaGraphics/: Mathematica graphics examples
www.wolfram.com/products/applications/parallel/: Parallel computing toolkit
www.wolfram.com/solutions/mathlink/jlink/: Java toolkit
www.math.wright.edu/Calculus/Lab/Download/: Calculus teaching material
forums.wolfram.com/mathgroup/: MathGroup newsgroup archive
mathforum.org/math.topics.html: MathForum by Drexel University
integrals.wolfram.com/: The Integrator
B. The concept of convolution
B.1 Convolution

<< FrontEndVision`FEV`;
Show[Import["retinapsf.gif"], ImageSize -> 150];


Figure B.1 Experimental measurements of light that has been reflected from a human eye
looking at a fine line ('line spread function'). The reflected light has been blurred by double
passage through the optics of the eye. The amount of unsharpness is a function of the pupil
diameter: with wider pupil opening the line got more blurred. Source: Campbell and Gubisch,
1966. Taken from [Wandell1995].

We take the example of the not-exactly-sharp projection of images on the human retina. Lens
errors, cornea distortions, the retinal nerve tissue: they all contribute to the ’smearing effect’.

In fact, every point in the stimulus gives rise to the same blurring effect. Figure B.1 shows
some actual measurements at various pupil diameters. Suppose that the profile of the blur
function is a Gaussian kernel. Then every point of the stimulus will become a Gaussian
function when projected on the retina. We study the 1D case, and call the blur function
gauss[x, σ]. The blur function is also called the filter or the kernel, or the operator (sometimes
we encounter the name stencil or template).

gauss[x_, σ_] := 1 / Sqrt[2 π σ^2] Exp[-(x^2 / (2 σ^2))];
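For readers who want to experiment outside Mathematica, the same kernel is easy to reproduce in plain Python. This is our own illustration, not part of the book's code; the function name gauss and the grid used for the area check are arbitrary choices:

```python
import math

def gauss(x, sigma):
    """1-D normalized Gaussian kernel, mirroring gauss[x, sigma] above."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi * sigma * sigma)

# The kernel has unit area; check with a Riemann sum on a fine grid.
dx = 0.001
area = sum(gauss(-6.0 + i * dx, 1.0) for i in range(12001)) * dx
print(round(area, 4))  # -> 1.0
```

The unit area guarantees that convolving a constant signal with this kernel leaves it unchanged.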

We define the input signal as f[x], and the output signal as h[x]. We can think of the
input signal as a series of points next to each other, and each of these input points gives rise
to a blurred output point. We plot the input point function fpnt[x] (a very narrow function
around the point x=0) at positions x=0 and x=2:

fpnt[x_] = If[-0.05 < x < 0.05, 1, 0];

Block[{$DisplayFunction = Identity},
  p1 = Plot[fpnt[x], {x, -4, 4}]; p2 = Plot[fpnt[x - 2], {x, -4, 4}]];
Show[GraphicsArray[{p1, p2}], ImageSize -> 300];

Figure B.2 The sampling function, at two arbitrary positions.

As an input signal we take an example function consisting of a sum of sine functions of
different frequencies. We can approximate this function as a set of points very close to each
other. The amplitude of each point is modulated with the input function, so we plot the
product f[a] fpnt[x - a].

f[x_] = Sin[x] + 0.5 Sin[5 x];

Block[{$DisplayFunction = Identity},
  p1 = Plot[f[x], {x, -4, 4}];
  p2 = Show[Table[Plot[f[a] fpnt[x - a],
      {x, -4, 4}, PlotPoints -> 160], {a, -4, 4, .1}]]];
Show[GraphicsArray[{p1, p2}], ImageSize -> 350];

Figure B.3 Left: The input signal f(x), right: the sampled representation.

Every point is blurred on its own, so we replace every point function fpnt[a] by its blurred
version g[a]. Every point is widened substantially, and the final step is adding all these
responses together.

Block[{$DisplayFunction = Identity},
  p1 = Show[Table[Plot[f[a] gauss[x - a, 1],
      {x, -4, 4}, PlotPoints -> 160], {a, -4, 4, .1}]];
  h[x_] = Sum[f[a] gauss[x - a, 1], {a, -6, 6, .1}];
  p2 = Plot[h[x], {x, -4, 4}]];

Show[GraphicsArray[{p1, p2}], ImageSize-> 350];

Figure B.4 Left: every sample gives rise to a kernel, at the respective position with the
amplitude of the sample. Right: the summed response is the convolution.

In the limit of making the point functions smaller and smaller we get a summation described
by an integral: the convolution integral. So we may write for the output h[x]:

h[x] = Integrate[f[a] g[x - a, σ], {a, -Infinity, Infinity}]

It describes exactly what the output h[x] looks like when an input signal f[x] is processed
by a system g[x]. We say that the signal f[x] is filtered with a filter or kernel g[x]. We
also call g[x] the point spread function of the system. Note that the convolution integral
integrates from -∞ to +∞, indicating that we integrate over the whole length of the signal. The
parameter a is the so-called shift variable (sometimes the convolution integral is called the
shift integral).
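On sampled signals the shift-integral becomes a shift-sum. The following plain-Python sketch (our own illustration, not the book's Mathematica code) spells the sum out:

```python
def convolve(f, g):
    """Discrete convolution h[n] = sum_a f[a] * g[n - a], full output length."""
    h = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(h)):
        for a in range(len(f)):
            if 0 <= n - a < len(g):
                h[n] += f[a] * g[n - a]
    return h

# Convolving a unit impulse with a kernel reproduces the kernel at the impulse position:
out = convolve([0, 1, 0], [1.0, 2.0, 1.0])
print(out)  # -> [0.0, 1.0, 2.0, 1.0, 0.0]
```

The impulse test makes the 'point spread' interpretation concrete: each input sample spreads a scaled copy of the kernel into the output.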

Actually, the sum above is an approximation. The exact output is given by the convolution
integral. Let us plot the integral, and draw your own conclusions. We have approximated
the solution quite well:

h[x_] := Integrate[f[a] gauss[x - a, 1], {a, -Infinity, Infinity}];
p5 = Plot[Evaluate[h[x]], {x, -4, 4}, ImageSize -> 160];

Figure B.5 The convolution as an exact integral of the analytical input function and kernel.

The output signal looks different from the input signal, and this is exactly what we wanted:
we have filtered the high frequencies out of the signal. The frequency Sin[5 x] is virtually
eliminated: the figure above is almost the low frequency Sin[x] function that we put in
the input. The low frequency component is, however, some 40% attenuated by the filtering
(check the amplitudes).

A short form for writing the convolution integral is the symbol ⊗, the convolution operator.
The function h is the convolution of the function f with g:

h[x] = f[x] ⊗ g[x]

The convolution operator is a linear operator:

(h1[x] + h2[x]) ⊗ g[x] = h1[x] ⊗ g[x] + h2[x] ⊗ g[x]

(a f[x]) ⊗ g[x] = a (f[x] ⊗ g[x])
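These two linearity properties are easy to verify numerically. The sketch below (plain Python, our own example values, not from the book) checks both identities for a discrete convolution:

```python
def convolve(f, g):
    # discrete convolution h[n] = sum_a f[a] * g[n - a]
    h = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(h)):
        for a in range(len(f)):
            if 0 <= n - a < len(g):
                h[n] += f[a] * g[n - a]
    return h

h1, h2, g = [1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [0.5, 0.5]
# Additivity: (h1 + h2) ⊗ g == h1 ⊗ g + h2 ⊗ g
additive_lhs = convolve([x + y for x, y in zip(h1, h2)], g)
additive_rhs = [x + y for x, y in zip(convolve(h1, g), convolve(h2, g))]
# Homogeneity: (a h1) ⊗ g == a (h1 ⊗ g)
scaling_lhs = convolve([3.0 * x for x in h1], g)
scaling_rhs = [3.0 * x for x in convolve(h1, g)]
print(additive_lhs == additive_rhs, scaling_lhs == scaling_rhs)  # -> True True
```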

In 2-D we get:

h [ x _ , Y_I :=
f~176 f [ a , ~] g [ x - a , y-~] dadll
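The 2-D case discretizes in the same way, now with a double shift-sum. A small plain-Python sketch (our own illustration; the tiny impulse image and box kernel are arbitrary test data):

```python
def convolve2d(f, g):
    """2-D discrete convolution h[x][y] = sum_{a,b} f[a][b] * g[x - a][y - b]."""
    fx, fy, gx, gy = len(f), len(f[0]), len(g), len(g[0])
    h = [[0.0] * (fy + gy - 1) for _ in range(fx + gx - 1)]
    for x in range(len(h)):
        for y in range(len(h[0])):
            for a in range(fx):
                for b in range(fy):
                    if 0 <= x - a < gx and 0 <= y - b < gy:
                        h[x][y] += f[a][b] * g[x - a][y - b]
    return h

impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # a single bright pixel
box = [[1.0, 1.0], [1.0, 1.0]]               # a 2x2 box kernel
h2d = convolve2d(impulse, box)
# The single pixel is smeared into a copy of the kernel at the pixel's position.
```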

B.2 Convolution is a product in the Fourier domain

It is often not easy to calculate the convolution integral directly; it is much easier to calculate
it in the Fourier domain. This section explains how this works. We start from the convolution
of the function h(x) with the kernel g(x), where all integrals below run from -∞ to +∞:

f(x) = h(x) ⊗ g(x) = ∫ h(y) g(x - y) dy

The Fourier transform F(ω) of f(x) is then by definition

F(ω) = ∫ f(x) e^(-iωx) dx = ∫ ∫ h(y) g(x - y) dy e^(-iωx) dx

We reorder the terms in the integral and make the substitution x - y = τ, so we have
x = τ + y and e^(-iωx) = e^(-iωτ) e^(-iωy):

∫ h(y) ∫ e^(-iωx) g(x - y) dx dy = ∫ h(y) ∫ e^(-iω(τ + y)) g(τ) d(τ + y) dy

When we integrate over τ, we keep y constant, so d(τ + y) is equal to dτ. So we get:

∫ h(y) ∫ e^(-iωτ) e^(-iωy) g(τ) dτ dy

Because e^(-iωy) is constant in the integration over τ, we may bring it as a constant outside the
inner integral:

∫ h(y) e^(-iωy) dy · ∫ e^(-iωτ) g(τ) dτ = H(ω) · G(ω)

where H(ω) is the Fourier transform of the function h(x) and G(ω) is the Fourier transform
of the kernel g(x). So the Fourier transform of a convolution is the product of the Fourier
transform of the filter with the Fourier transform of the signal. To put it otherwise:

f(x) = h(x) ⊗ g(x)
  ↓       ↓        ↓
F(ω) = H(ω) · G(ω)
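This theorem is easy to check numerically. The sketch below uses a naive unnormalized DFT in plain Python (our own code, not the book's); note that Mathematica's Fourier uses a symmetric 1/√N normalization instead, which is why a compensating factor Sqrt[256] appears in the Mathematica example further on:

```python
import cmath

def dft(x):
    """Naive unnormalized discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_convolve(f, g):
    """Circular convolution, the discrete analogue of the shift-integral."""
    N = len(f)
    return [sum(f[a] * g[(n - a) % N] for a in range(N)) for n in range(N)]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.5, 0.0, 0.0]
lhs = dft(circular_convolve(f, g))             # transform of the convolution
rhs = [F * G for F, G in zip(dft(f), dft(g))]  # product of the transforms
err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(err < 1e-9)  # -> True
```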

We can calculate a convolution h ⊗ g by calculating the Fourier transforms H(ω) and G(ω),
multiplying them to get F(ω), and taking the inverse Fourier transform of F(ω) to get f(x).
Often this is a much faster method than calculating the convolution integral, because the
routines that calculate the Fourier transform are so fast. We look at an example where we
filter noise from the data. We simulate a signal of a noisy ECG recording by some sine
functions:

data = Table[
  N[Sin[2 Pi 2 t / 256] + Sin[2 Pi 6 t / 256] + 0.75 (Random[] - 1/2)], {t, 256}];

ListPlot[data, AspectRatio -> .2, PlotJoined -> True,
  ImageSize -> 400, AxesLabel -> {"time", "Ampl"}];

Figure B.6 A discrete input signal with additive uncorrelated uniform noise.

We make a filter with a Gaussian shape and shift it to the origin:

σ = 5.; kernel = Table[gauss[x, σ], {x, -128, 127}];
kernel = RotateLeft[kernel, 128];
ListPlot[kernel, PlotRange -> {{0, 256}, All}, PlotJoined -> True,
  AspectRatio -> .2, AxesLabel -> {"Freq", "Ampl"}, ImageSize -> 400];

Figure B.7 The Gaussian kernel in the spatial domain, shifted to the origin. The right half of
the frequency axis (i.e. 129-256) is often called the negative frequency axis.

This is a low-pass filter. We now filter (convolve) in the Fourier domain (note that a space
denotes multiplication in Mathematica):

conv = Sqrt[256] InverseFourier[Fourier[data] Fourier[kernel]];

ListPlot[conv, AspectRatio -> .2, PlotJoined -> True,
  AxesLabel -> {"time", "Ampl"}, ImageSize -> 400];



Figure B.8 The filtered result.

As expected, the noise did not pass the filter.
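The same low-pass experiment can be reproduced without Mathematica. The plain-Python sketch below (our own code; the frequencies, noise level and σ mirror the example above) smooths a noisy test signal by circular convolution with a normalized Gaussian and confirms that the fluctuations are reduced:

```python
import math, random

random.seed(0)
N = 256
# Noisy test signal, analogous to the simulated ECG recording above:
data = [math.sin(2 * math.pi * 2 * t / N) + math.sin(2 * math.pi * 6 * t / N)
        + 0.75 * (random.random() - 0.5) for t in range(N)]

# Normalized Gaussian kernel (sigma = 5), wrapped to the origin as RotateLeft does:
sigma = 5.0
kernel = [math.exp(-((x if x <= N // 2 else x - N) ** 2) / (2.0 * sigma ** 2))
          for x in range(N)]
total = sum(kernel)
kernel = [k / total for k in kernel]

# Circular convolution: the spatial-domain counterpart of the Fourier filtering above.
smooth = [sum(data[a] * kernel[(n - a) % N] for a in range(N)) for n in range(N)]

def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

print(variance(smooth) < variance(data))  # -> True: the noise did not pass the filter
```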


C. Installing the book and packages
C.1 Content
The CD-ROM with the book "Front-End Vision and Multiscale Image Analysis" contains the
following:

- the ZIP archive file FrontEndVision.zip with all the necessary files with pathnames.
This file contains:
- the directory FrontEndVision/Documentation/English with the collection of
Mathematica notebooks of the total book. All chapters, appendices, the table of contents and
the reference list are separate notebooks. This is the source code of the book.
- the notebook FrontEndVision/FEV.nb and package FrontEndVision/FEV.m.
This notebook contains the initializations and the image analysis functions used throughout
the book. The package is loaded as the first command in every notebook of this book that
contains executable code. Do not edit this package, as it is automatically generated from
FEV.nb upon saving.
- the directory FrontEndVision/images with the images, figures and other data used in
the book.
- the directory FrontEndVision/Documentation/English/pdf with the PDF
versions of all notebooks.
- the directory FrontEndVision/LiveGraphics3D with the source code and an
example of the interactive 3D Java viewer LiveGraphics3D.
- the style notebook
SystemFiles/FrontEnd/StyleSheets/FrontEndVision.nb, which contains
the style directives (layout and makeup) of the notebooks on your screen and printer.
- the style notebook
SystemFiles/FrontEnd/StyleSheets/PackageNotebook.nb, which contains
directives for notebooks to generate a package automatically (as FEV.m).
- the file FrontEndVision/binary.exe, which is a compiled C++ executable to read
binary data fast. This is now integrated in Mathematica; it may be useful for older versions.

When the kernel of Mathematica is not available, the notebooks can be read with the free
stand-alone notebook reader MathReader. This is the front-end program to read and view
the Mathematica notebooks of the book. The latest version of MathReader can be
downloaded for free from www.wolfram.com/products/mathreader. Versions are supplied for
Windows, Macintosh, Linux and Unix.
The notebooks of this book can only be run on Mathematica 4.0 and later.

C.2 Installation for all systems


For: Windows 95/98/2000/NT/XP, Macintosh, Unix, Linux.
Run the command $TopDirectory. This returns the directory to which the file
FrontEndVision.zip should be copied and where it should be uncompressed:

$TopDirectory
C:\Program Files\Wolfram Research\Mathematica\4.2

All files will be automatically installed in the right locations. E.g. for Windows users, at the
end you should have the following directory where you find your notebooks:
C:\Program Files\Wolfram Research\Mathematica\4.1\AddOns\Applications\
FrontEndVision\Documentation\English

C.3 Viewing the book in the Help Browser


The book can best be interactively viewed in the Help Browser of Mathematica:

1. Run the menu command "Rebuild Help Index" in the Help menu in the top menubar in
Mathematica.

The file
$TopDirectory\AddOns\Applications\FrontEndVision\Documentation\English\BrowserCateg
ories.m contains the instructions to generate the appropriate menus in the Help Browser.

2. Open the Help Browser and click on the button "AddOns". In the left menu panel the book
is now available.

The notebooks viewed with the Help Browser can all be executed as normal notebooks.
However, the Help Browser files are read-only. A particularly useful feature is to copy and
paste the commands in the book through the Help Browser as directly usable code into your
own notebooks.

The book has an extensive reference list. The last chapter is the extensive alphabetical index.
The Help Browser enables quick hypertext jumping to the appropriate chapter and cell where
the selected keyword appears. The same feature holds for the table-of-contents notebook.

Some suggestions:

- Increase the magnification of the Help Browser for all notebooks to e.g. 150% by setting
the magnification globally for this notebook. This is accomplished, after selecting the Help
Browser as the current notebook, with the Option Inspector, available in the Format menu.
Set 'Show option values for:' to global, search for the keyword 'magnification' and set the
value to your liking.

C.4 Sources of additional applications


Get the latest versions of:
- the OpenGL 3D viewer MathGL3d:
phong.informatik.uni-leipzig.de/~kuska/mathgl3dv3/;
- the Java interactive 3D animation viewer LiveGraphics3D:
wwwvis.informatik.uni-stuttgart.de/~kraus/LiveGraphics3D/.
D. First Start with Mathematica: Tips & Tricks
This notebook is only available in the electronic version.

1. Evaluation

1.1 Startup

1.2 Cells

1.3 Prevent unwanted (large) output to the screen

1.4 Interrupt, stop and start over

1.5 Manual shortcuts

1.6 The help browser

1.7 Packages

1.8 Setting the Path to find files

1.9 Write in mathematical notation

1.10 The size of notebooks

1.11 Checking for proper print layout

1.12 Suggestions

2. Images

2.1 Read an image

2.2 Take a submatrix, a subimage

2.3 Sampling points from an image

2.4 Draw a contour on the image

2.5 Generate an animation



2.6 Automatic numbering objects

2.7 Check the internal structure of a cell

2.8 Resizing Graphics without regenerating it

2.9 Display a group of images

2.10 Text on graphics

3. Programming
3.1 Fast summation

3.2 Random numbers and noise images

3.3 Gaussian noise

3.4 Interpolation

3.5 Pure functions

3.6 Use internal functions for speed

3.7 Good practice

4. 3D

4.1 Changing the ViewPoint

4.2 Interactive 3D display

4.3 LiveGraphics3D functions


References
[1] P. Abbott (ed.), "Tricks of the trade", The Mathematica Journal, Wolfram Media Inc., vol. 7, no. 2, 105-127,
1998.
[2] M. Abramowitz, "Handbook of mathematical functions". Dover Publications, 1971.
[3] S. T. Acton, A. C. Bovik, and M. M. Crawford, "Anisotropic diffusion pyramids for image segmentation", in
Proc. first Intern. Conf. on Image Processing, pp. 478-482, IEEE, 1994.
[4] S. T. Acton, "Diffusion-based edge detectors", in: Handbook of image and video processing, A. Bovik ed.,
Academic Press, San Diego, 2000.
[5] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion", Journal of the
Optical Society of America-A, vol. 2, no. 2, pp. 284-299, 1985. Also appeared as MIT-MediaLab-TR148,
September 1990.
[6] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion", Journal of the
Optical Society of America-A, vol. 2, no. 2, pp. 284-299, 1985. Also appeared as MIT-MediaLab-TR148.
September 1990.
[7] J. Aggarwall and N. Nandhakumar, "On the computation of motion from sequences of images", IEEE Tr.
PAMI, vol. 76, no. 8, pp. 917-935, 1988. A review.
[8] A. Almansa and T. Lindeberg, "Enhancement of fingerprint images using shape-adaptated scale-space
operators", IEEE Tr. on Image Processing. In: J. Sporring, M. Nielsen, L. Florack, and P. Johansen (eds.)
Gaussian Scale-Space Theory: Proc. PhD School on Scale-Space Theory , Copenhagen, Denmark, May 1996,
Kluwer Academic Publishers, 1997.
[9] J. M. Alonso and L. M. Martinez, "Functional connectivity between simple cells and complex cells in cat
striate cortex", Nature Neuroscience, vol. 1, pp. 395-403, 1998.
[10] L. Alvarez, P. L. Lions, and J. M. Morel, "Image selective smoothing and edge detection by nonlinear
diffusion. II", SIAM Journal on Numerical Analysis, vol. 29, pp. 845-866, June 1992.
[11] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel, "Axioms and fundamental equations of image
processing", Archives for Rational Mechanics, vol. 123, pp. 199-257, September 1993.
[12] L. Alvarez and J. M. Morel, "Formalization and computational aspect of image analysis", Acta Numerica,
1994.
[13] L. Alvarez and L. Mazorra, "Signal and image restoration using shock filters and anisotropic diffusion",
SIAM Journal on Numerical Analysis, vol. 31, pp. 590-605, January 1994.
[14] L. Alvarez and J. M. Morel, "Morphological approach to multiscale analysis", in Geometry-Driven Diffusion
in Computer Vision (B. M. ter Haar Romeny, ed.), Computational Imaging and Vision, pp. 229-254, Kluwer
Academic Publishers B.V., 1994.
[15] L. Alvarez, "Images and PDE's", in Proc. of 12th Intern. Conf. on Analysis and Optimization of Systems (M.-
O. Berger, R. Deriche, I. Herlin, J. Jaffré, and J.-M. Morel, eds.), vol. 219 of Lecture Notes in Control and
Information Sciences, pp. 3-14, Springer, London, 1996.
[16] W. F. Ames, "Numerical methods for partial differential equations". New York, San Francisco: Academic
Press, 1977.
[17] A. A. Amini, T. E. Weymouth, and R. C. Jain, "Using dynamic programming for solving variational
problems in vision", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 12, no. 9, pp. 855-867, 1990.
[18] F. R. Amthor, E. S. Takahashi and C. W. Oyster, "Morphologies of rabbit ganglion cells with concentric
receptive fields", Journal of Comparative Neurology, vol. 280, pp. 72-96, 1989.
[19] S. Angenent, "On the formation of singularities in the curve shortening flow", Journal of Differential
Geometry, vol. 33, pp. 601-633, 1991.
[20] V. I. Arnold, "Singularity theory". Cambridge: Cambridge University Press, 1981.
[21] H. Asada and M. Brady, "The curvature primal sketch", IEEE Tr. on Pattern Analysis and Machine
Intelligence, vol. 8, no. 1, pp. 2-14, 1986.
[22] J. J. Atick and A. N. Redlich, "Mathematical model of the simple cells in the visual cortex", Biological
Cybernetics, vol. 63, pp. 99-109, 1990.
[23] N. Ayache, "Medical computer vision, virtual reality and robotics", Image and Vision Computing, vol. 13,
pp. 295-313, May 1995.
[24] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda, "Uniqueness of the Gaussian kernel for scale-space
filtering", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 8, no. 1, pp. 26-33, 1986.

[25] S. Back, H. Neumann, and H. S. Stiehl, "On segmenting computed tomograms", in Proceedings of the 3rd
Intern. Symposium CAR ’89 (H. U. Lemke, M. L. Rhodes, C. C. Jaffee, and R. Felix, eds.), Berlin, Springer-
Verlag, 1989.
[26] S. Back, H. Neumann, and H. S. Stiehl, "On scale-space edge detection in computed tomograms", in
Proceedings of the 1 lth DAGM-Symposium, Hamburg (H. Burkhardt, K.-H. Hoehne, and B. Neumann, eds.),
Berlin, Springer-Verlag, 1989.
[27] R. Balart, "Matrix reformulation of the Gabor transform", Optical Engineering, vol. 31, pp. 1235-1242, June
1992.
[28] P. Baldi and W. Heiligenberg, "How sensory maps could enhance resolution through ordered arrangements
of broadly tuned receivers", Biological Cybernetics, vol. 59, pp. 313-318, 1988.
[29] C. Ballester and M. Gonzalez, "Affine invariant multiscale segmentation by variational methods", in Eighth
Workshop on Image and Multidimensional Image Processing, (Cannes), pp. 220-221, IEEE, September 8-10
1993.
[30] D. Bar-Natan, "Random dot stereograms", The Mathematica Journal, vol. 1, no. 3, pp. 69-75, 1991.
[31] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, "Performance of optical flow techniques", Intern. Journal of
Computer Vision, vol. 12, no. 1, pp. 43-77, 1994.
[32] R. Bauer and B. M. Dow, "Local and global properties of striate cortical organization: an advanced model",
Biological Cybernetics, vol. 64, pp. 477-483, 1991.
[33] S. Beauchemin and J. Barton, "The computation of optic flow", ACM Computing Surveys, vol. 27, no. 3, pp.
433-467, 1995.
[34] A. Bebernes and D. Eberly, Mathematical Problems from Combustion Theory. Springer-Verlag, 1989.
[35] J. V. Beck, K. D. Cole, A. Haji-Sheikh, and B. Litkouhi, Heat Conduction using Green’s Functions. London:
Hemispere Publishing Corporation, 1992.
[36] W. Beil, "Steerable filters and invariance theory", Pattern Recognition Letters, vol. 16, no. 11, pp. 453-460,
1994.
[37] B. Bell and L. F. Pan, "Contour tracking and corner detection in a logic programming environment", IEEE
Tr. on Pattern Analysis and Machine Intelligence, vol. 12, no. 9, pp. 913-917, 1990.
[38] S. Belongie, J. Malik and J. Puzicha, "Shape matching and object recognition using shape contexts", IEEE
Tr. on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.
[39] B. M. Bennett, D. D. Hoffman, and C. Prakash, Observer Mechanics. A Formal Theory of Perception.
London: Academic Press, 1989.ISBN 0-12-0888635-9.
[40] F. Bergholm, "Edge focusing", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 9, pp. 726-741,
November 1987.
[41] G. L. Bilbro, W. E. Snyder, S. J. Gamier, , and J. W. Gault, "Mean field annealing: A formalism for
constructing GNC-like algorithms", IEEE Trans. Neural Networks, vol. 3, January 1992.
[42] T. O. Binford, "Inferring surfaces from images," Artificial Intelligence, vol. 17, pp. 205-244, 1981.
[43] M. Bister, J. Cornelis, and A. Rosenfeld, "A critical view of pyramid segmentation algorithms", Pattern
Recognition Letters, vol. 11, pp. 605-617, 1990.
[44] E. Björkman, J. C. Zagal, T. Lindeberg, P. E. Roland, "Evaluation of design options for scale-space primal
sketch analysis of brain activation images", HBM’2000, Intern. Conf. on Functional Mapping of the Human
Brain, San Antonio, Texas, 2000.
[45] N. Blachman, "Mathematica: A Practical Approach". 2nd edition, Mathematica 3.0, Prentice Hall, 631 pp.,
ISBN 0132592010, 1999.
[46] A. Blake and A. Zisserman, "Visual Reconstruction". Cambridge, Mass.: MIT Press, 1987.
[47] H. Blakemore, C. Blakemore and H. Barlow, Images and Understanding: Thoughts about Images, Ideas
about Understanding, Cambridge University Press, April 1989.
[48] C. Blakemore (Ed.), Vision: Coding and Efficiency, Cambridge University Press, January 1990.
[49] W. Blaschke and K. Reidemeister, Differential Geometry, vol. 1-2. Springer-Verlag, 1923.
[50] G. G. Blasdel and G. Salama, "Voltage-sensitive dyes reveal a modular organization in monkey striate
cortex", Nature, vol. 321, pp. 579-585, 1986.
[51] J. Blom, "Modellen voor de funktionele ordening van (1-dim.) zintuig-systemen", Tech. Rep. V-raft-38-85,
Department of Medical and Physiological Physics, University of Utrecht, Princetonplein 5, 3584 CC Utrecht,
Netherlands, 1985.
[52] J. Blom, "Affine Invariant Corner Detection", in: PhD Thesis, Utrecht University, NL-Utrecht, 1991.
[53] J. Blom, "Topological and Geometrical Aspects of Image Structure". PhD thesis, Utrecht University, 1992.
[54] J. Blom, B. M. ter Haar Romeny, and J. J. Koenderink, "Affine invariant corner detection", tech. rep., 3D
Computer Vison Research Group, Utrecht University NL, 1992.
[55] J. Blom, J. J. Koenderink, B. M. ter Haar Romeny, and A. M. L. Kappers, Topological image-structure for a
discrete image on a hexagonal lattice with finite intensity sampling", J. of Vis. Comm. and Im. Repr., 1992.

[56] J. Blom, B. M. ter Ham" Romeny, A. Bel, and J. J. Koenderink, "Spatial derivatives and the propagation of
noise in Gaussian scale-space", J. of Vis. Comm. and Image Repr., vol. 4, pp. 1-13, March 1993.
[57] J. A. Bloom and T. R. Reed, "A Gaussian Derivative-Based Transform", IEEE Tr. on Image Processing, vol.
5, no. 3, 1996.
[58] D. Blostein and N. Ahuja, "Representation and three-dimensional interpretation of image texture: An
integrated approach", in Proc. 1st Int. Conf. on Computer Vision, (London), pp. 444-449, IEEE Computer Society
Press, 1987.
[59] H. Blum, "Biological shape and visual science", J. Theor. Biology, vol. 38, pp. 205-287, 1973.
[60] G. Bluman and S. Kumei, "A remarkable nonlinear diffusion equation", Journal of Mathematical Physics,
vol. 21, pp. 1019-1023, May 1980.
[61] T. Bonhoeffer and A. Grinvald, "The layout of iso-orientation domains in area 18 of cat visual cortex:
optical imaging reveals a pinwheel-like organization", J. Neurosci., vol. 13, pp. 4157-4180, 1993.
[62] F. L. Bookstein, "Principal warps: thin-plate splines and the decomposition of deformations", IEEE trans.
Pattern Analysis and Machine Intelligence, Vol. 11, No. 6, pp. 567-585, 1989.
[63] G. Borgefors, "Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm", IEEE Trans.
Pattern Anal. Machine Intell., vol. 10, no. 6, 1988.
[64] A. C. Bovik, M. Clark, and W. S. Geisler, "Multichannel texture analysis using localized spatial filters",
IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 55-73, 1990.
[65 ] A. Bovik, "Handbook of image and video processing", Academic Press, 2000.
[66] B. B. Boycott and H. Wässle, "The morphological types of ganglion cells of the domestic cat's retina",
Journal of Physiology, vol. 240, pp. 379-419, 1974.
[67] C. B. Boyer, A History of Mathematics. Brooklyn,New York: Princeton University Press, January 1968.
First Princeton Paperback Printing 1985.
[68] K. A. Brakke, "The motion of a surface by its mean curvature", tech. rep., Princeton University Press,
Princeton NY, 1978.
[69] A. Brauer, "Über die Nullstellen der Hermiteschen Polynome", Mathematische Annalen, vol. 107, pp. 87-89,
1933.
[70] L. Bretzner and T. Lindeberg, "Feature tracking with automatic selection of spatial scales", In Linde, Sparr
(Eds.): Proc. Swedish Symposium on Image Analysis, SSAB’96 Lund, Sweden, pp. 24-28, March 1996. Extended
version in: Computer Vision and Image Understanding. vol. 71, pp. 385--392, Sept. 1998.
[71] L. Bretzner and T. Lindeberg, "On the handling of spatial and temporal scales in feature tracking", Proc.
First Intern. Conf. on Scale-Space Theory in Computer Vision, Utrecht, Netherlands, B.M. ter Haar Romeny ed.,
Springer-Verlag Lecture Notes in Computer Science, volume 1252. July 2-4, 1997.
[72] L. Bretzner and T. Lindeberg, "Qualitative multi-scale feature hierarchies for object tracking", in M. Nielsen,
P. Johansen, O. F. Olsen and J. Weickert (Eds) Proc. 2nd Intern. Conf. on Scale-Space Theory in Computer
Vision, Corfu, Greece, September 1999. Springer Lecture Notes in Computer Science, vol 1682, pp. 117--128.
Extended version in J. of Visual Communication and Image Representation, 11, 115-129, 2000.
[73] R. W. Brockett and P. Maragos, "Evolution equations for continuous-scale morphology", in Proc. Intern.
Conf. on Acoustics, Speech and Signal Processing, pp. 125-128, IEEE, 1992.
[74] J. W. Bruce and P. J. Giblin, Curves and Singularities.Cambridge: Cambridge University Press, 1984.
[75] V. Bruce, P. R. Green, and M. A. Georgeson, Visual Perception. Hove, East Sussex, UK: Psychology Press,
1996.
[76] A. M. Bruckstein and A. N. Netravali, "On differential invariants of planar curves and recognizing partially
occluded planar shapes", in Proc. of Visual Form Workshop, (Capri), Plenum Press, May 1990.
[77] A. M. Bruckstein, R. J. Holt, A. N. Netravali, and T. J. Richardson, "Invariant signatures for planar shape
recognition under partial occlusion", in Proceedings of the 11th IAPR international conference on pattern
recognition, 1992. Long version in Computer Vision, Graphics and Image Processing: Image Understanding, vol.
58, nr. 1, pp. 49-65, 1993.
[78] K. Brunnström, J.-O. Eklundh, and T. Lindeberg, "On scale and resolution in active analysis of local image
structure", Image and Vision Computing, vol. 8, no. 4, pp. 289-296, 1990.
[79] K. Brunnström, T. Lindeberg, and J.-O. Eklundh, "Active detection and classification of junctions by
foveation with a head-eye system guided by the scale-space primal sketch", in Proc. second European Conf. on
Computer Vision (G. Sandini, ed.), vol. 588 of Lecture Notes in Computer Science, (Santa Margherita Ligure,
Italy), pp. 701-709, Springer-Verlag, May 1992.
[80] B. Buck, A. C. Merchant, and S. M. Perez, "An illustration of Benford’s first digit law using alpha decay half
lives". European Journal of Physics, vol. 14, pp. 59-63, 1993.
[81] C. A. Burbeck and S. M. Pizer, "Object representation by cores: indentifying and representing primitive
spatial regions", Tech. Rep. TR94-048b, University of North Carolina at Chapel Hill, 1994.
[82] P. J. Burt, T. H. Hong, and A. Rosenfeld, "Segmentation and estimation of image region properties through
cooperative hierarchical computation", IEEE Tr. on Systems, Man, and Cybernetics, vol. 11, no. 12, pp. 802-825,
1981.

[83] P. J. Burt, "Fast filter transforms for image processing", Computer Vision, Graphics, and Image Processing,
vol. 16, pp. 20-51, 1981.
[84] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans.
Communications, vol. 9, no. 4, pp. 532-540, 1983.
[85] P. J. Burt, "Multiresolution image processing and analysis", chapter in: The pyramid as a structure for
efficient computation, pp. 6-35. Berlin: Springer Verlag, 1984. A. Rosenfeld, Ed.
[86] P. Buser and M. Imbert, "Vision". London, England: The MIT Press, 1994.
[87] J. Canny, "A computational approach to edge detection", IEEE Tr. on Pattern Analysis and Machine
Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[88] V. Cantoni and S. Levialdi, eds., Pyramidal Systems for Computer Vision. Berlin: Springer-Verlag, 1986.
[89] E. Cartan, "La théorie des groupes finis et continus et la géométrie différentielle traitées par la méthode du
repère mobile". Gauthiers-Villars, 1937.
[90] E. Cartan, "Les problèmes d'équivalence", in Oeuvres Complètes, vol. 2, pp. 1311-1334, Paris: Gauthiers-
Villars, 1952.
[91] E. Cartan, "Leçons sur la géométrie des espaces de Riemann". Paris: Gauthier-Villars, 2 ed., 1963.
[92] D. Casasent and D. Psaltis, "Position, rotation, and scale invariant optical correlation", Applied Optics, vol.
15, no. 7, pp. 1795-1799, 1976.
[93] V. Caselles, F. Catté, T. Coll, and F. Dibos, "A geometric model for active contours in image processing",
Numerische Mathematik, vol. 66, pp. 1-31, 1993.
[94] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic snakes", tech. rep., Department of Mathematics,
University of Illes Balears, Palma de Mallorca, Spain, 1994.
[95] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours", in Proc. Fifth Intern. Conf. on Computer
Vision (E. Grimson, S. Shafer, A. Blake, and K. Sugihara, eds.), pp. 694-699, 1995.
[96] S. Castan, J. Zhao and J. Shen, "Optimal filter for edge detection methods and results", in Proc. First Eur.
Conf. on Computer Vision, pp. 13-17, 1990.
[97] F. Catté, P. L. Lions, J. M. Morel, and T. Coll, "Image selective smoothing and edge detection by nonlinear
diffusion", SIAM Journal on Numerical Analysis, vol. 29, pp. 182-193, February 1992.
[98] F. Catté, F. Guichard, and G. Koepfler, "A morphological approach to mean curvature motion", Tech. Rep.
9310, CEREMADE, Université Paris Dauphine, 1993.
[99] F. Catté, "Convergence of iterated affine and morphological filters by nonlinear semigroup theory", in Proc.
of 12th Intern. Conf. on Analysis and Optimization of Systems (M. O. Berger, R. Deriche, I. Herlin, J. Jaffré, and
J. M. Morel, eds.), vol. 219 of Lecture Notes in Control and Information Sciences, pp. 125-133, Springer,
London, 1996.
[100] A. Cayley, "On contour and slope lines", The London, Edinghburgh and Dublin Philosophical Magazine
and J. of Science, vol. 18, no. 120, pp. 264-268, 1859.
[101] A. Chehikian and J. L. Crowley, "Fast computation of optimal semi-octave pyramids", in Proc. 7th Scand.
Conf. on Image Analysis, (Aalborg, Denmark), pp. 18-27, August 1991.
[102] J. S. Chen and G. Medioni, "Detection, localization and estimation of edges", IEEE Tr. on Pattern Analysis
and Machine Intelligence, vol. 11, pp. 191-198, 1989.
[103] M. Chen and P. Yan, "A multiscale approach based on morphological filtering", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 11, no. 7, pp. 694-700, 1989.
[104] W. Chen, T. Kato et al., "LGN activation during visual imagery tasks shown by fMRI", Proc. 2nd Intern.
Conf. on Functional Mapping of the Human Brain, 1996.
[105] F. H. Cheng and W. H. Hsu, "Parallel algorithm for corner finding on digital curves", Pattern Recognition
Letters, vol. 8, pp. 47-53, 1988.
[106] P. S. Churchland and T. J. Sejnowski, "The Computational Brain". MIT Press, 1992.
[107] J. Clark, "Singularity theory and phantom edges in scale-space", IEEE Tr. on Pattern Analysis and Machine
Intelligence, vol. 10, no. 5, 1988.
[108] J. J. Clark, "Authenticating edges produced by zero-crossing algorithms", IEEE Tr. on Pattern Analysis and
Machine Intelligence, vol. 11, pp. 43-57, 1989.
[ 109] A. Clebsch, "Theorie der Bin~iren Algebraischen Formen". Leipzig: Verlag von Teubner, 1872.
[110] J. Coggins and A. Jain, "A spatial filtering approach to texture analysis", Pattern Recognition Letters, vol.
3, pp. 195-203, 1985.
[111] M. A. Cohen and S. Grossberg, "Neural dynamics of brightness perception: Features, boundaries, diffusion,
and resonance", Perception and Psychophysics, vol. 36, no. 5, pp. 428-456, 1984.
[112] I. Cohen, L. D. Cohen, and N. Ayache, "Using deformable surfaces to segment 3D images and infer
differential structures", Computer Vision, Graphics, and Image Processing, vol. 56, pp. 242-263, 1992.
[113] T. Cohignac, F. Eve, F. Guichard, and J. M. Morel, "Numerical analysis of the fundamental equation of
image processing", Tech. Rep. 9254, CEREMADE, Université Paris Dauphine, 1992.
429 References

[114] T. Cohignac, F. Guichard, and J. M. Morel, "Multiscale analysis of shapes, images and textures", in Eighth
Workshop on Image and Multidimensional Image Processing, (Cannes), pp. 142-143, IEEE, September 8-10
1993.
[115] T. Cohignac, F. Eve, F. Guichard, and C. Lopez, "Affine morphological scale-space: Numerical analysis of
its fundamental equation", tech. rep., CEREMADE, Université Paris Dauphine, 1993.
[116] R. Cormack and R. Fox, "The computation of retinal disparity", Perception and Psychophysics, vol. 37, no.
2, pp. 176-178, 1985.
[117] T. N. Cornsweet, "Visual perception", Academic Press, New York, 1970.
[118] G. H. Cottet, "Diffusion approximation on neural networks and applications for image processing", in Proc.
Sixth European Conf. on Mathematics in Industry (F. Hodnett, ed.), (Stuttgart), pp. 3-9, Teubner, 1992.
[119] J. Crank, "Mathematics of diffusion". London: Oxford University Press, 1956.
[120] J. L. Crowley, "A representation for shape based on peaks and ridges in the Difference of Low-Pass
Transform", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 6, no. 2, pp. 156-170, 1984.
[121] J. L. Crowley and R. M. Stern, "Fast computation of the difference of low-pass transform", IEEE Tr. on
Pattern Analysis and Machine Intelligence, vol. 6, pp. 212-222, 1984.
[122] J. L. Crowley and A. C. Sanderson, "Multiple resolution representation and probabilistic matching of 2-D
gray-scale shape", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 113-121, 1987.
[123] C. A. Curcio, K. R. Sloan, R. E. Kalina, A. E. Hendrickson, "Human photoreceptor topography". Journal of
Comparative Neurology, vol. 292, pp. 497-523, 1990.
[124] D. M. Dacey and S. Brace, "A coupled network for parasol but not for midget ganglion cells in the primate
retina", Visual Neuroscience, vol. 9, pp. 279-290, 1992.
[125] G. Dalmaso, J. M. Morel, and S. Solimini, "A variational method in image segmentation: Existence and
approximation results", Acta Math., vol. 168, pp. 89-151, 1992.
[126] E. Dam and M. Lillholm, "Generic Events for the Isophote Curvature", Graduate project, University of
Copenhagen, http://www.it-c.dk/people/erikdam, March 1999.
[127] E. B. Dam and M. Nielsen, "Non-linear diffusion for interactive multi-scale watershed segmentation", in
Proceedings for MICCAI 2000, Lecture Notes in Computer Science, volume 1935, October 2000.
[128] E. Dam and M. Nielsen, "Non-Linear Diffusion for Interactive Multi-scale Watershed Segmentation", in
Proceedings for MICCAI 2000, Pittsburgh, Lecture Notes in Computer Science, vol. 1935, pp. 216-225, 2000.
[129] E. Dam, P. Johansen, O.F. Olsen, A. Thomsen, T. Darvann, A.B. Dobrzeniecki, N.V. Hermann, N. Kitai, S.
Kreiborg, P. Larsen and M. Nielsen: "Interactive Multi-Scale Segmentation in Clinical Use", CompuRAD, ECR
2000, Vienna, 2000.
[130] J. Damon, "Local Morse theory for solutions to the heat equation and Gaussian blurring", J. of Differential
Equations, vol. 115, no. 2, pp. 368-401, January 1995.
[131] J. Damon, Chapter "Local Morse Theory for Solutions to the Heat Equation and Gaussian Blurring" in
"Gaussian Scale-Space Theory" (Edited by J. Sporring, M. Nielsen, L. Florack, and P. Johansen), Kluwer
Academic Publishers, 1997.
[132] J. Damon, "Ridges and cores for two-dimensional images", Jour. Math. Imag. and Vision, vol. 10, pp.
163-174, 1999.
[133] P.-E. Danielsson and O. Seger, "Rotation invariance in gradient and higher order derivative detectors",
Computer Vision, Graphics, and Image Processing, vol. 49, pp. 198-221, 1990.
[134] J. G. Daugman, "Uncertainty Relation for Resolution in Space, Spatial Frequency, and Orientation
Optimized by Two-Dimensional Visual Cortical Filters", J. Opt. Soc. Am., Vol. 2, 1985.
[135] J. G. Daugman, "Two-dimensional spectral analysis of cortical receptive fields profile", Vision Research,
vol. 20, pp. 847-856, 1980.
[136] J. G. Daugman, "Six formal properties of anisotropic visual filters: structural principles and
frequency/orientation selectivity", IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 882-887, 1983.
[137] J. G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by
two-dimensional visual cortical filters", Journal of the Optical Society of America-A, vol. 2, pp. 1160-1169, 1985.
[138] J. G. Daugman, "Pattern and motion vision without Laplacian zero crossings", Journal of the Optical
Society of America-A, vol. 5, no. 7, pp. 1142-1148, 1988.
[139] J. G. Daugman, "Complete discrete 2-D Gabor transforms by neural networks for image analysis and
compression", IEEE Tr. Acoust., Speech, Signal Processing, vol. 36, no. 4, pp. 1169-1179, 1988.
[140] L. S. Davis, "A survey on edge detection techniques", Comp. Graph. and Image Proc., vol. 4, no. 3, pp.
248-270, 1975.
[141] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman, "Depth is encoded in the visual cortex by a specialized
receptive field structure.", Nature, vol. 352, pp. 156-159, 1991.
[142] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman, "Spatiotemporal organization of simple-cell receptive
fields in the cat's striate cortex", Journal of Neurophysiology, vol. 69, no. 4, pp. 1091-1135, 1993.
[143] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman, "Receptive field dynamics in the central visual pathways",
Trends Neurosci., vol. 18, pp. 451-458, 1995.
[144] G. C. DeAngelis, G. M. Ghose, I. Ohzawa, and R. D. Freeman, "Functional Micro-
Organization of Primary Visual Cortex: Receptive Field Analysis of Nearby Neurons", The Journal of
Neuroscience, vol. 19, no. 10, pp. 4046-4064, 1999.
[145] E. De Boer and P. Kuyper, "Triggered correlation", IEEE Trans. Biomedical Engineering, vol. 15, pp.
169-179, 1968.
[146] C. de Boor, "A practical guide to splines". Springer Verlag, New York, 1978.
[147] E. De Micheli, B. Caprile, P. Ottonello, and V. Torre, "Localization and noise in edge detection", IEEE Tr.
on Pattern Analysis and Machine Intelligence, vol. 10, no. 11, pp. 1106-1117, 1989.
[148] R. Deriche, "Using Canny's criteria to derive an optimal edge detector recursively implemented". Intern. J.
of Computer Vision, Vol. 1, pp. 167-187, 1987.
[149] R. Deriche, "Fast algorithms for low-level vision", IEEE Tr. on Pattern Analysis and Machine Intelligence,
vol. 12, no. 1, pp. 78-87, 1990.
[150] R. Deriche, "Recursively implementing the Gaussian and its derivatives", Proc. Second Intern. Conf. on
Image Processing, pp. 263-267, Singapore, 1992.
[151] R. L. De Valois, N. P. Cottaris, L. E. Mahon, S. D. Elfar, J. A. Wilson, "Spatial and temporal receptive
fields of geniculate and cortical cells and directional selectivity", Vision Research, vol. 40, pp. 3685-3702, 2000.
[152] F. Devernay and O. Faugeras, "Computing differential properties of 3-D shapes from stereoscopic images
without 3-D models", in Proceedings of the IEEE Computer Society Conf. on Computer Vision and Pattern
Recognition, (Seattle, Washington), pp. 208-213, June 1994.
[153] S. Dickinson, "Object representation and recognition", in: E. Lepore and Z. Pylyshyn (eds.), "What is
Cognitive Science?", Basil Blackwell publishers, pp. 172-207, 1999.
[154] R. W. Ditchburn and B. L. Ginsborg, "Vision with a stabilized retinal image", Nature, vol. 170, pp. 36-37,
1952.
[155] B. Doolittle and E. Maclay, "The forest has eyes". The Greenwich Workshop Press, 1998.
[156] A. J. van Doorn, J. J. Koenderink, and M. A. Bouman, "The influence of the retinal inhomogeneity on the
perception of spatial patterns", Kybernetik, vol. 10, pp. 223-230, 1972.
[157] L. Dorst and R. van den Boomgaard, "Morphological signal processing and the slope transform", Signal
Processing, vol. 38, pp. 79-98, 1994.
[158] L. Dorst and R. van den Boomgaard, "Orientation-based representations for mathematical morphology", in
Shape, Structure and Pattern Recognition (D. Dori and A. Bruckstein, eds.), (Nahariya, Israel), pp. 13-22, October
1993.
[159] B. Dubuc and S. W. Zucker, "Complexity, confusion, and perceptual grouping. Part I and II", Int. J. of
Computer Vision, 42(1/2): 55-115, 2001; reprinted in J. Math. Imaging and Vision, 15(1/2): 55-115, 2001.
[160] R. Duits, L. M. J. Florack, B. M. ter Haar Romeny, and J. de Graaf, "On the axioms of scale-space theory."
Proc. Fourth IASTED International Conference on Signal and Image Processing (SIP 2002), August 12-14, 2002,
Kauai, Hawaii, USA.
[161] D. Eberly, D. Fritsch, and C. Kurak, "Filtering with a normalized Laplacian of a Gaussian filter", in
Proceedings of the SPIE Intern. Symposium, Mathematical Methods in Medical Imaging, (San Diego, CA), 1992.
[162] D. Eberly, R. Gardner, B. Morse, S. Pizer, and C. Scharlach, "Ridges for image analysis." Journal of
Mathematical Imaging and Vision, July 1993.
[163] D. Eberly, Geometric Analysis of Ridges in N-Dimensional Images. PhD thesis, University of North
Carolina at Chapel Hill, Computer Science Department, 1994.
[164] D. Eberly, "A differential geometric approach to anisotropic diffusion", in Geometry-Driven Diffusion in
Computer Vision (B. M. ter Haar Romeny, ed.), pp. 371-392, Dordrecht: Kluwer Academic Publishers, 1994.
[165] D. H. Eberly, Geometric Methods for Analysis of Ridges in N-Dimensional Images. PhD thesis, The
University of North Carolina, Chapel Hill, North Carolina, January 1994. Department of Computer Vision.
[166] J. Elder and S. W. Zucker, "Local scale control for edge detection and blur estimation", in Lecture Notes in
Computer Science, pp. 57-69, New York, 1996. Proc. 4th European Conf. on Computer Vision, Springer
Verlag.
[167] J. Elder and S. W. Zucker, "Evidence for boundary-specific grouping in human vision", Vision
Research, 38(1): 143-152, 1998.
[168] H. Farid and E. P. Simoncelli, "Optimally rotation-equivariant directional derivative kernels", 7th Int'l
Conf. on Computer Analysis of Images and Patterns, Kiel, Germany. September 10-12, 1997.
[169] O. Faugeras, "On the motion of 3D curves and its relationship to optical flow", in Proc. ECCV'90 (O.
Faugeras, ed.), (Antibes, France), pp. 107-117, Springer-Verlag, April 1990.
[170] O. Faugeras, "Cartan's moving frame method and its applications to the geometry and evolution of curves
in the Euclidean, affine and projective planes", Tech. Rep. TR-2053, INRIA, 1993.
[171] O. Faugeras, "Computer vision research at INRIA", Intern. Journal of Computer Vision, vol. 10, no. 2, pp.
91-99, 1993.
[172] O. Faugeras, "Three-Dimensional Computer Vision". MIT Press, 1994.
[173] O. Faugeras and R. Keriven, "Affine curvature from affine scale-space", in Proc. Fifth Intern. Conf. on
Computer Vision, 1995.
[174] D. Ferster, K. D. Miller, "Neural Mechanisms of Orientation Selectivity in the Visual Cortex", Annual
Reviews of Neuroscience, Vol. 23, pp. 441-471, 2000.
[175] M. Fidrich and J.-P. Thirion, "Multiscale representation and analysis of features from medical images", in
Intern. Conf. on Computer Vision, Virtual Reality and Robotics in Medicine (N. Ayache, ed.), vol. 905 of LNCS,
Nice, pp. 358-364, April 1995.
[176] M. Fidrich and J.-P. Thirion, "Multiscale extraction of features from medical images", in Intern. Conf. on
Computer Analysis of Images and Patterns (V. Hlavac and R. Šára, eds.), vol. 970 of LNCS, (Prague), pp.
637-642, September 1995.
[177] B. Fischer, "Overlap of receptive field centers and representation of the visual field in the cat's optic tract",
Vision Research, vol. 13, pp. 2113-2120, 1973.
[178] D. J. Fleet and A. D. Jepson, "Hierarchical construction of orientation and velocity selective filters", PAMI,
vol. 11, pp. 315-325, March 1989.
[179] D. J. Fleet and A. D. Jepson, "Computation of component image velocity from local phase information",
Intern. Journal of Computer Vision, vol. 5, no. 1, pp. 77-104, 1990.
[180] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Scale and the differential
structure of images", Image and Vision Computing, vol. 10, pp. 376-388, 1992.
[181] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "General intensity
transformations and second order invariants", in Theory and Applications of Image Analysis (P. Johansen and S.
Olsen, eds.), vol. 2 of Series in Machine Perception and Artificial Intelligence, pp. 22-29, Singapore: World
Scientific, 1992.
[182] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Cartesian differential
invariants in scale-space", Journal of Mathematical Imaging and Vision, vol. 3, pp. 327-348, November 1993.
[183] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Images: Regular
tempered distributions", in Proceedings of the NATO Advanced Research Workshop Shape in Picture -
Mathematical description of shape in greylevel images (Y.-L. O, A. Toet, H. J. A. M. Heijmans, D. H. Foster, and
P. Meer, eds.), vol. 126 of NATO ASI Series F, pp. 651-660, Springer Verlag, Berlin, 1994.
[184] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Linear scale-space",
Journal of Mathematical Imaging and Vision, vol. 4, no. 4, pp. 325-351, 1994.
[185] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "General intensity
transformations and differential invariants", Journal of Mathematical Imaging and Vision, vol. 4, pp. 171-187,
May 1994.
[186] L. M. J. Florack and M. Nielsen, "The intrinsic structure of the optic flow field", ERCIM Technical Report
07/94-R033, 1994.
[187] L. M. J. Florack, A. H. Salden, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Nonlinear
scale-space", Image and Vision Computing, vol. 13, pp. 279-294, May 1995.
[188] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "The Gaussian scale-
space paradigm and the multiscale local jet", Intern. Journal of Computer Vision, vol. 18, pp. 61-75, April 1996.
[189] L. M. J. Florack, "Data, models and images", in IEEE Intern. Conf. on Image Processing ICIP'96, P.
Delogne, ed., Lausanne, CH, pp. 469-472, September 16-19 1996.
[190] L. M. J. Florack, "The concept of a functional integral - a potentially interesting method for image
processing", Tech. Rep. 96/7, Institute of Datalogy, University of Copenhagen, 1996.
[191] L. M. J. Florack, "Image structure", Kluwer Academic Publishers, Dordrecht, the Netherlands, 1997.
[192] L. M. J. Florack, "The intrinsic structure of optic flow incorporating measurement duality", Intern. Journal
of Computer Vision, vol. 27, no. 3, pp. 263-286, 1998.
[193] L. M. J. Florack, R. Maas, and W. J. Niessen, "Pseudo-linear scale space theory", International Journal of
Computer Vision, vol. 31, no. 2/3, pp. 247-259, 1999.
[194] L. M. J. Florack, "A spatio-frequency trade-off scale for scale-space filtering", IEEE Tr. on Pattern Anal.
and Mach. Intell. PAMI, vol. 22, no. 9, pp. 1050-1055, 2000.
[195] L. M. J. Florack and A. Kuijper, "The topological structure of scale-space images", Journal of
Mathematical Imaging and Vision, Vol. 12, No. 1, pp. 65-79, 2000.
[196] L. M. J. Florack, "Motion extraction - an approach based on duality and gauge theory," in R. Klette, H. S.
Stiehl, M. A. Viergever, and K. L. Vincken, eds., Performance Characterization in Computer Vision, vol. 17 of
Computational Imaging and Vision Series. Kluwer Academic Publishers, pp. 69-80, 2000.
[197] L. M. J. Florack, "A geometric model for cortical magnification," S.-W. Lee, H. H. Bülthoff, and T.
Poggio, eds., Biologically Motivated Computer Vision, vol. 1811 of Lecture Notes in Computer Science. Berlin:
Springer-Verlag, pp. 574-583, May 2000.
[198] L. M. J. Florack, "Scale-space theories for scalar and vector images," in M. Kerckhove, ed., Scale-Space
and Morphology in Computer Vision: Proceedings of the Third International Conference, Scale-Space 2001,
Vancouver, Canada, vol. 2106 of Lecture Notes in Computer Science. Berlin: Springer-Verlag, pp. 193-204, July
2001.
[199] L. M. J. Florack, "Non-linear scale-spaces isomorphic to the linear case with applications to scalar, vector
and multispectral images," Journal of Mathematical Imaging and Vision, vol. 15, pp. 39-53, July/October 2001.
[200] M. A. Förstner and E. Gülch, "A fast operator for detection and precise location of distinct points, corners
and centers of circular features", in Proc. Intercommission Workshop of the Int. Soc. for Photogrammetry and
Remote Sensing, (Interlaken, Switzerland), 1987.
[201] J. B. Fourier, "The Analytical Theory of Heat". New York: Dover Publications, Inc., 1955. Replication of
the English translation that first appeared in 1878 with previous corrigenda incorporated into the text, by
Alexander Freeman, M. A. Original work: "Théorie Analytique de la Chaleur", Paris, 1822.
[202] R. D. Freeman and I. Ohzawa, "On the neurophysiological organization of binocular vision", Vision
Research, vol. 30, no. 11, pp. 1661-1676, 1990.
[203] W. T. Freeman and E. H. Adelson, "Steerable filters for early vision, image analysis and wavelet
decomposition", in Proc. 3rd Int. Conf. on Computer Vision, (Osaka, Japan), IEEE Computer Society Press,
December 1990.
[204] W. T. Freeman and E. H. Adelson, "The design and use of steerable filters", IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 13, pp. 891-906, September 1991.
[205] W. T. Freeman and M. Roth, "Orientation histograms for hand gesture recognition", in Proc. of the IEEE
Int. Workshop on Automatic Face and Gesture Recognition, 1995.
[206] D. Fritsch, Registration of Radiotherapy Images Using Multiscale Medial Descriptions of Image Structure.
PhD thesis, The University of North Carolina at Chapel Hill, Department of Biomedical Engineering, 1993.
[207] J. Fröhlich and J. Weickert, "Image processing using a wavelet algorithm for nonlinear diffusion", Tech.
Rep. 104, Laboratory of Technomathematics, Univ. of Kaiserslautern, Germany, March 1994.
[208] R. E. Frye and R. S. Ledley, "Derivative of Gaussian functions as receptive field models for disparity
sensitive neurons of the visual cortex", Proceedings of the 1996 Fifteenth Southern Biomedical Engineering
Conf., pp. 270-273, 1996.
[209] D. Gabor, "Theory of communications", Journal IEEE London, vol. 93, pp. 429-457, 1946.
[210] M. Gage, "An isoperimetric inequality with applications to curve shortening", Duke Mathematical Journal,
vol. 50, pp. 1225-1229, 1983.
[211] M. Gage, "Curve shortening makes convex curves circular", Invent. Math., vol. 76, pp. 357-364, 1984.
[212] M. Gage and R. S. Hamilton, "The heat equation shrinking convex plane curves", Journal of Differential
Geometry, vol. 23, pp. 69-96, 1986.
[213] M. Gage, "On an area-preserving evolution equation for plane curves", Contemporary Mathematics, vol.
51, pp. 51-62, 1986.
[214] E. B. Gamble and T. Poggio, "Visual integration and detection of discontinuities: The key role of intensity
edges", tech. rep., MIT A.I. Lab, 1987. A.I. Memo No. 970.
[215] J. Gårding, "Shape from texture for smooth curved surfaces in perspective projection", Journal of
Mathematical Imaging and Vision, vol. 2, pp. 329-352, 1992.
[216] J. Gårding and T. Lindeberg, "Direct computation of shape cues by multi-scale retinotopic processing."
submitted, 1993.
[217] J. Gårding and T. Lindeberg, "Direct estimation of local surface shape in a fixating binocular vision
system", in Proc. 3rd European Conf. on Computer Vision (J.-O. Eklundh, ed.), vol. 800 of Lecture Notes in
Computer Science, (Stockholm, Sweden), pp. 365-376, Springer-Verlag, May 1994.
[218] J. Gårding and T. Lindeberg, "Direct computation of shape cues based on scale-adapted spatial derivative
operators", Intern. Journal of Computer Vision, vol. 17, no. 2, pp. 163-192, 1996.
[219] J. L. Gardner, A. Anzai, I. Ohzawa, and R. D. Freeman, "Linear and nonlinear contributions to orientation
tuning of simple cells in the cat's striate cortex". Visual Neuroscience 16:1115-1121, 1999.
[220] L. J. Garey (Ed.): "Brodmann's localisation in the cerebral cortex", Smith-Gordon & Co, 1992.
[221] J. M. Gauch, "Image Segmentation and Analysis via Multiscale Gradient Watershed Hierarchies", IEEE
Transactions on Image Processing, 8(1):69-79, January 1999.
[222] D. Geiger and A. Yuille, "A common framework for image segmentation", Intern. Journal of Computer
Vision, vol. 6, no. 3, pp. 227-243, 1991.
[223] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of
images", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, 1984.
[224] R. Geraets, A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Affine scale-space for discrete
pointsets", in Proc. Soc. for Neural Networks (C. Gielen, ed.), Nijmegen, the Netherlands, SNN, 1995.
[225] R. Geraets, A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Object recognition by affine
evolution of measured interest points", in Proc. Computing Science in the Netherlands, Utrecht, the Netherlands,
pp. 86-97, SION, 1995.
[226] G. Gerig, O. Kübler, R. Kikinis, and F. A. Jolesz, "Nonlinear anisotropic filtering of MRI data", Journal of
Mathematical Imaging and Vision, vol. 11, pp. 221-232, June 1992.
[227] G. Gerig, G. Székely, and T. Koller, "Line-finding in 2-D and 3-D by multi-valued non-linear diffusion of
feature maps", in DAGM Symposium, Informatik aktuell (S. J. Poeppl and H. Handels, eds.), pp. 289-296,
Springer-Verlag, 1993. Mustererkennung 1993, 15.
[228] G. Gerig, G. Székely, G. Israel, and M. Berger, "Detection and characterization of unsharp blobs by curve
evolution", in Proc. Information Processing in Medical Imaging (IPMI'95) (Y. B. et al., ed.), Series on
Computational Imaging and Vision, pp. 165-176, Kluwer Academic Publishers, June 1995.
[229] H. J. M. Gerrits, B. de Haan, and A. J. H. Vendrik, "Experiments with retinal stabilized images, relations
between the observations and neural data", Vision Research, vol. 6, pp. 427-440, 1966.
[230] J. M. Geusebroek, A. Dev, R. van den Boomgaard, A. W. M. Smeulders, F. Cornelissen and H. Geerts,
"Color Invariant edge detection". In: Scale-Space theories in Computer Vision, Lecture Notes in Computer
Science, vol. 1252, pp. 459-464, Springer-Verlag, 1999.
[23l] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders and A. Dev. "Color and scale: the spatial
structure of color images". In: Eur. Conf. on Computer Vision 2000, Lecture Notes in Computer Science, Vol.
1842, Springer, pp. 331-341, June 26 - July 1, 2000.
[232] J. M. Geusebroek, A. W. M. Smeulders and R. van den Boomgaard, "Measurement of Color Invariants".
Proc. CVPR, vol. 1, pp. 50-57, June 13-15, 2000.
[233] J. M. Geusebroek, D. Koelma, A. W. M. Smeulders and Th. Gevers. "Image Retrieval and Segmentation
based on Color Invariants". Proc. CVPR, June 13-15, 2000.
[234] T. Gevers and A. W. Smeulders. "Color based object recognition". Patt. Recogn. 32, 453-464, 1999.
[235] G. M. Ghose, I. Ohzawa, and R. D. Freeman, "Receptive field maps of correlated discharge between pairs
of neurons in the cat's visual cortex", Journal of Neurophysiology, vol. 71, no. 1, pp. 330-346, 1994.
[236] J. Gibbon and R. M. Church, "Time-left: linear versus logarithmic subjective time". Journal of
Experimental Psychology: Animal Behavior Processes, vol. 7, pp. 87-108, 1981.
[237] J. J. Gibson, The Perception of the Visual World. Boston: Houghton-Mifflin, 1950.
[238] B. Gillam and B. Rogers, "Orientation disparity, deformation, and stereoscopic slant perception",
Perception, vol. 20, pp. 441-448, 1991.
[239] R. Gilmore, Catastrophe theory for scientists and engineers. New York: Wiley-Interscience, 1981.
[240] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore: The Johns Hopkins University Press,
1989. Second Edition.
[241] A. K. Goodchild, K. K. Ghosh, and P. R. Martin, "Comparison of photoreceptor spatial density and ganglion
cell morphology in the retina of human, macaque monkey, cat, and the marmoset Callithrix jacchus", Journal of
Comparative Neurology, vol. 366, pp. 55-75, 1996.
[242] C. de Graaf, S. M. Pizer, A. Toet, J. J. Koenderink, and P. P. van Rijk, "Pyramid segmentation of medical
3D images", in Proc. of the 1984 Int. Joint Alpine Symposium, pp. 71-77, IEEE, 1984.
[243] N. Graham, "The visual system does a crude Fourier analysis of patterns", in SIAM-AMS Proceedings (S.
Grossberg, ed.), vol. 13, (Hillsdale, New Jersey), pp. 1-16, American Mathematical Society, Lawrence Erlbaum
Associates, 1981.
[244] A. Gray, "Modern differential geometry of curves and surfaces". CRC Press Inc., Boca Raton, 1993, second
edition 1997.
[245] M. Grayson, "The heat equation shrinks embedded plane curves to round points", Journal of Differential
Geometry, vol. 26, pp. 285-314, 1987.
[246] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. H. Anderson, "Overcomplete
steerable pyramid filters and rotation invariance", in Proc. IEEE Computer Soc. Conf. on Computer Vision and
Pattern Recognition, CVPR'94, pp. 222-228, IEEE, 1994.
[247] W. E. L. Grimson, "From images to surfaces". Cambridge MA: MIT Press, 1981.
[248] S. Grossberg and D. Todorovic, "Neural dynamics of 1-D and 2-D brightness perception: A unified model
of classical and recent phenomena", Perception and Psychophysics, vol. 43, pp. 241-277, 1988.
[249] F. Guichard, "Multiscale analysis of movies", in Eighth Workshop on Image and Multidimensional Image
Processing, (Cannes), pp. 236-237, IEEE, September 8-10 1993.
[250] A. Guiducci, "Corner characterization by differential geometry techniques", Pattern Recognition Letters,
vol. 8, pp. 311-318, 1988.
[251] R. W. Guillery, "A quantitative study of synaptic interconnections in the dorsal lateral geniculate nucleus
in the cat", Zeitschrift für Zellforschung, vol. 96, pp. 39-48, 1969.
[252] R. W. Guillery, "The organization of synaptic interconnections in the laminae of the dorsal lateral
geniculate nucleus in the cat", Zeitschrift für Zellforschung, vol. 96, pp. 1-38, 1969.
[253] R. W. Guillery, "Patterns of synaptic interconnections in the dorsal lateral geniculate nucleus of cat and
monkey: A brief review", Vision Research (Suppl.), vol. 3, pp. 211-227, 1971.
[254] M. M. Gupta and G. K. Knopf, eds., "Neuro-vision systems, principles and applications". A selected reprint
volume, IEEE Press, New York, 1993.
[255] B. Gurevich, "Foundations of the theory of algebraic invariants". Groningen: P. Noordhoff, 1979.
[256] W. Hackbusch, "Multi-grid methods and applications". New York: Springer-Verlag, 1985.
[257] J. S. Hadamard, "Sur les problèmes aux dérivées partielles et leur signification physique." Bull. Univ.
Princeton, vol. 13, pp. 49-62, 1902.
[258] T. Hall, "Carl Friedrich Gauss, a biography". M.I.T. Press, Cambridge, 1970.
[259] R. M. Haralick, "Zero-crossing of second directional derivative edge operator", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 6, pp. 58-68, 1984.
[260] H. K. Hartline, "The receptive fields of optic nerve fibers", American Journal of Physiology, vol. 130, pp.
690-699, 1940.
[261] M. Hashimoto and J. Sklansky, "Multiple order derivatives for detecting local image characteristics",
Computer Vision, Graphics, and Image Processing, vol. 39, pp. 28-55, 1987.
[262] D. J. Heeger, "Optical flow using spatiotemporal filters", Intern. Journal of Computer Vision, vol. 1, pp.
279-302, 1988.
[263] H. J. A. M. Heijmans, "Mathematical morphology: a geometrical approach in image processing", Nieuw
Archief voor Wiskunde, vol. 10, pp. 237-276, November 1992.
[264] F. Heitger, L. Rosenthaler, R. von der Heydt, E. Peterhans, and O. Kübler, "Simulation of neural contour
mechanisms: from simple to end-stopped cells", Vision Research, vol. 32, no. 5, pp. 963-981, 1992.
[265] E. Hering. "Outlines of a theory of the light sense". Harvard University Press, Cambridge, 1964.
[266] D. Hilbert, "Über die Theorie der algebraischen Formen", Math. Annalen, vol. 36, pp. 473-534, 1890.
[267] D. Hilbert, "Über die vollen Invariantensystemen", Math. Annalen, vol. 42, pp. 313-373, 1893.
[268] E. Hildreth, "The detection of intensity changes by computer and biological visual systems", Computer
Vision, Graphics, and Image Processing, vol. 22, pp. 1-27, 1983.
[269] E. Hildreth, "The measurement of visual motion". Cambridge, Mass.: M. I. T. Press, 1983.
[270] E. C. Hildreth, "Computations underlying the measurement of visual motion", Artificial Intelligence, vol.
23, pp. 309-354, 1984.
[271] B. K. P. Horn and B. Schunck, "Determining optical flow", Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[272] B. K. P. Horn, "Robot Vision". Cambridge MA: MIT Press, 1986.
[273] B. Horn and M. Brooks, eds., "Shape from shading". Cambridge, Mass.: M. I. T. Press, 1989.
[274] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction, and functional architecture in the
cat's visual cortex", Journal of Physiology, vol. 160, pp. 106-154, 1962.
[275] D. H. Hubel and T. N. Wiesel, "Brain mechanisms of vision", Scientific American, vol. 241, pp. 45-53,
1979.
[276] D. H. Hubel, "Eye, brain and vision", vol. 22 of Scientific American Library. New York: Scientific
American Press, 1988.
[277] R. A. Hummel, B. B. Kimia, and S. W. Zucker, "Gaussian blur and the heat equation: Forward and inverse
solutions", in Proc. CVPR, pp. 668-671, 1985.
[278] R. A. Hummel and D. Lowe, "Computing Gaussian blur", in Proc. ICPR 1986, pp. 910-912, 1986.
[279] R. A. Hummel, B. B. Kimia, and S. W. Zucker, "Deblurring Gaussian blur", Computer Vision, Graphics,
and Image Processing, vol. 38, pp. 66-80, 1987.
[280] R. A. Hummel, "The scale-space formulation of pyramid data structures", in Parallel Computer Vision (L.
Uhr, ed.), pp. 187-223, Academic Press, New York, 1987.
[281] R. A. Hummel, "Representations based on zero crossings in scale-space", in Proc. IEEE Computer Vision
and Pattern Recognition Conf., pp. 204-209, June 1986. Also in: "Readings in Computer Vision: Issues,
Problems, Principles and Paradigms", M. Fischler and O. Firschein (eds.), Morgan Kaufmann, 1987.
[282] R. A. Hummel and R. Moniot, "Reconstructions from zero-crossings in scale-space", IEEE Trans.
Acoustics, Speech, and Signal Processing, vol. 37, no. 12, pp. 2111-2130, 1989.
[283] T. Iijima, "Basic theory on the normalization of a pattern", Bulletin of Electrical Laboratory, vol. 26, pp.
368-388, 1962. In Japanese.
[284] T. Iijima, "Basic equation of figure and observational transformation", Tr. of the Institute of Electronics and
Communication Engineers of Japan, vol. 54-C, no. 9, pp. 37-38, 1971. English Abstracts.
[285] T. Iijima, "Basic theory on the construction of figure space", Tr. of the Institute of Electronics and
Communication Engineers of Japan, vol. 54-C, no. 8, pp. 35-36, 1971. English Abstracts.
[286] T. Iijima, "A suppression kernel of a figure and its mathematical characteristics", Tr. of the Institute of
Electronics and Communication Engineers of Japan, vol. 54-C, no. 9, pp. 30-31, 1971. English Abstracts.
[287] T. Iijima, "Basic theory on normalization of figure", Tr. of the Institute of Electronics and Communication
Engineers of Japan, vol. 54-C, no. 11, pp. 24-25, 1971. English Abstracts.
[288] T. Iijima, "A system of fundamental functions in an abstract figure space", Tr. of the Institute of Electronics
and Communication Engineers of Japan, vol. 54-C, no. 11, pp. 26-27, 1971. English Abstracts.
[289] T. Iijima, "Basic theory on feature extraction of figures", Tr. of the Institute of Electronics and
Communication Engineers of Japan, vol. 54-C, no. 12, pp. 35-36, 1971. English Abstracts.
[290] T. Iijima, "Basic theory on the structural recognition of a figure", Tr. of the Institute of Electronics and
Communication Engineers of Japan, vol. 55-D, no. 8, pp. 28-29, 1972. English Abstracts.
435 References

[291] T. Iijima, "Theoretical studies on the figure identification by pattern matching", Tr. of the Institute of
Electronics and Communication Engineers of Japan, vol. 54-D, no. 8, pp. 29-30, 1972. English Abstracts.
[292] T. Jackway, "Morphological scale-space", in Proc. 11th Int. Conf. Pattern Recognition (ICPR 11), vol. C,
(The Hague), pp. 252-255, Aug. 30 - Sept. 3 1992.
[293] T. Jackway and M. Deriche, "Scale-space properties of multiscale morphological dilation-erosion", IEEE
Tr. on Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 38-51, 1996.
[294] P. T. Jackway and M. Deriche, "Scale-space properties of the multiscale morphological dilation-erosion",
IEEE PAMI, vol. 18, no. 1, pp. 38-51, 1996.
[295] M. Jägersand, "Saliency maps and attention selection in scale and spatial coordinates: An information
theoretic approach", in Proc. Fifth Intern. Conf. on Computer Vision (E. Grimson, S. Shafer, A. Blake, and K.
Sugihara, eds.), (MIT Cambridge, MA), pp. 195-202, IEEE, June 20-23 1995.
[296] E. T. Jaynes, "Prior probabilities", IEEE Tr. on Systems Science and Cybernetics, vol. SSC-4, pp. 227-241, 1968.
[297] A. D. Jepson and D. J. Fleet, "Scale-space singularities", in Proc. first European Conf. on Computer Vision
(O. Faugeras, ed.), (Berlin), pp. 50-56, Springer-Verlag, 1990. Lecture Notes in Computer Science.
[298] P. Johansen, S. Skelboe, K. Grue, and J. D. Andersen, "Representing signals by their top points in scale-
space", in Proceedings of the 8-th Intern. Conf. on Pattern Recognition, (Paris), pp. 215-217, October 27-31 1986.
[299] P. Johansen, "On the classification of toppoints in scale-space", Journal of Mathematical Imaging and Vision,
vol. 4, no. 1, pp. 57-68, 1994.
[300] J.-J. Jolion and A. Rosenfeld, "A pyramid framework for early vision". Dordrecht, Netherlands: Kluwer
Academic Publishers, 1994.
[301] J. P. Jones and L. A. Palmer, "The two-dimensional spatial structure of simple receptive fields in cat striate
cortex", Journal of Neurophysiology, vol. 58, pp. 1187-1211, 1987.
[302] D. G. Jones and J. Malik, "A computational framework for determining stereo correspondence from a set of
linear spatial filters", Image and Vision Computing, vol. 10, no. 10, pp. 699-708, 1992.
[303] D. B. Judd and G. Wyszecki. "Color in business, science and industry". Wiley, New York, NY, 1975.
[304] J. Kacur, K. Mikula, "Slow and fast diffusion effects in image processing", Computing and Visualization in
Science, Springer, Berlin, vol. 3, no. 4, 2001.
[305] S. N. Kalitzin, B. M. ter Haar Romeny, A. H. Salden, P. F. M. Nacken, and M. A. Viergever, "Topological
numbers and singularities in scalar images, scale-space evolution properties", Journal of Mathematical Imaging
and Vision, vol. 7, 1996.
[306] S. N. Kalitzin, B. M. ter Haar Romeny, and M. A. Viergever, "Invertible orientation bundles on 2D scalar
images", in Proc. First Intern. Conf. on Scale-Space Theory in Computer Vision, Lecture Notes in Computer
Science, (Utrecht, the Netherlands), pp. 77-88, Springer Verlag, July 1997.
[307] S. N. Kalitzin, B. M. ter Haar Romeny, and M. A. Viergever, "On topological deep-structure
segmentation", in Proc. Intern. Conf. on Image Processing, (Santa Barbara, California), pp. 863-866, October
26-29 1997.
[308] S. N. Kalitzin, "Topological numbers and singularities in scalar images, scale-space evolution properties",
in Gaussian Scale-Space Theory (Jon Sporring, Mads Nielsen, Luck Horack and Peter Johansen, Eds.), pp.
181-189, Kluwer Academic Publishers, 1997.
[309] S. N. Kalitzin, B. M. ter Haar Romeny, and M. A. Viergever, "Invertible apertured orientation filters in
image analysis". Intern. J. of Computer Vision, vol. 31, no. 2/3, pp. 145-158, 1998.
[310] K. Kanatani, "Group-theoretical methods in image understanding", vol. 20 of Series in Information
Sciences. Springer-Verlag, 1990.
[311] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, "Principles of Neural Science". McGraw-Hill Companies,
New York, fourth edition, 2000.
[312] D. Kapur and J. L. Mundy, "Geometric Reasoning". MIT Press, 1995.
[313] N. Karssemeijer, "Detection of stellate distortions in mammograms using scale-space operators", in Proc.
Information Processing in Medical Imaging, pp. 335-346, 1995.
[314] N. Karssemeijer and G. te Brake, "Detection of stellate distortions in mammograms", IEEE Tr. Medical
Imaging, vol. 15, no. 5, pp. 611-619, 1996.
[315] M. Kass and A. Witkin, "Analyzing oriented patterns", Computer Vision, Graphics, And Image Processing,
vol. 37, pp. 362-385, 1985.
[316] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models", Intern. Journal of Computer
Vision, vol. 1, no. 4, pp. 321-331, 1988.
[317] T. P. Kaushal, "Towards visually convincing image segmentation", Image and Vision Computing, vol. 10,
pp. 617-624, November 1992.
[318] R. G. Kessel and R. H. Kardon, "Tissues and organs, a text atlas of scanning electron microscopy", W. H.
Freeman and Company, San Francisco and Oxford, 1979.
[319] S. Kichenassamy, "Nonlinear diffusions and hyperbolic smoothing for edge enhancement", in Proc. of 12th
Intern. Conf. on Analysis and Optimization of Systems (M. O. Berger, R. Deriche, I. Herlin, J. Jaffré, and J. M.
Morel, eds.), vol. 219 of Lecture Notes in Control and Information Sciences, pp. 119-124, Springer, London,
1996.
[320] B. B. Kimia, "Deblurring Gaussian blur, continuous and discrete approaches", Master’s thesis, McGill
University, Electrical Eng. Dept., Montreal, Canada, 1986.
[321] B. B. Kimia, A. Tannenbaum, and S. W. Zucker, "Towards a computational theory of shape, an overview",
in Proc. first European Conf. on Computer Vision, vol. 427 of Lecture Notes in Computer Science, (New York),
pp. 402-407, Springer-Verlag, 1990.
[322] B. B. Kimia, "Entropy scale-space", in Proc. of Visual Form Workshop, (Capri, Italy), Plenum Press, May
1991.
[323] B. B. Kimia, A. Tannenbaum, and S. W. Zucker, "On the evolution of curves via a function of curvature I:
the classical case", Journal of Mathematical Analysis and Applications, vol. 163, pp. 438-458, 1992.
[324] B. B. Kimia and S. W. Zucker, "Analytic inverse of discrete Gaussian blur", Optical Engineering, vol. 32,
no. 1, pp. 166-176, 1993.
[325] B. B. Kimia and K. Siddiqi, "Geometric heat equation and nonlinear diffusion of shapes and images",
Computer Vision and Image Understanding, vol. 64, pp. 305-322, 1996.
[326] L. Kitchen and A. Rosenfeld, "Gray-level corner detection", Pattern Recognition Letters, vol. 1, pp. 95-102,
1982.
[327] F. Klein, "Erlanger Programm", Math. Annalen, vol. 43, pp. 63-100, 1893.
[328] C. B. Knudsen and H. I. Christensen, "On methods for efficient pyramid generation", in Proc. 7th Scand.
Conf. on Image Analysis, (Aalborg, Denmark), pp. 28-39, August 1991.
[329] H. Kobayashi, J. L. White, and A. A. Abidi, "An active resistor network for Gaussian filtering of
images", IEEE Journal of Solid-State Circuits, vol. 26, pp. 738-748, May 1991.
[330] J. J. Koenderink and A. J. van Doorn, "Geometry of binocular vision and a model for stereopsis",
Biological Cybernetics, vol. 21, pp. 29-35, 1976.
[331] J. J. Koenderink and A. J. van Doorn, "Visual detection of spatial contrast; influence of location in the
visual field, target extent and illuminance level", Biological Cybernetics, vol. 30, pp. 157-167, 1978.
[332] J. J. Koenderink and A. J. van Doorn, "The structure of two-dimensional scalar fields with applications to
vision", Biological Cybernetics, vol. 33, pp. 151-158, 1979.
[333] J. J. Koenderink and A. J. van Doorn, "The internal representation of solid shape with respect to vision",
Biological Cybernetics, vol. 32, pp. 211-216, 1979.
[334] J. J. Koenderink and A. J. van Doorn, "Photometric invariants related to solid shape", Optica Acta, vol. 27,
pp. 981-996, 1980.
[335] J. J. Koenderink and A. J. van Doorn, "A description of the structure of visual images in terms of an
ordered hierarchy of light and dark blobs", in Second Int. Visual Psychophysics and Medical Imaging Conf.,
1981. IEEE Cat. No. 81 CH 1676-6.
[336] J. J. Koenderink, "The structure of images", Biological Cybernetics, vol. 50, pp. 363-370, 1984.
[337] J. J. Koenderink, "Simultaneous order in nervous nets from a functional standpoint", Biological
Cybernetics, vol. 50, pp. 35-41, 1984.
[338] J. J. Koenderink, "Geometrical structures determined by the functional order in nervous nets", Biological
Cybernetics, vol. 50, pp. 43-50, 1984.
[339] J. J. Koenderink, "The concept of local sign", in Limits in Perception (A. J. van Doorn, W. A. van de Grind,
and J. J. Koenderink, eds.), pp. 495-547, Utrecht: VNU Science Press, 1984.
[340] J. J. Koenderink, A. J. van Doorn, and W. A. van de Grind, "Spatial and temporal parameters of motion
detection in the peripheral visual field", Journal of the Optical Society of America-A, vol. 2, pp. 252-259,
February 1985.
[341] J. J. Koenderink, "The structure of the visual field", in The Physics of Structure Formation, Theory and
Simulation (W. Guettinger and G. Dangelmayr, eds.), Springer-Verlag, 1986. Proceedings of an Intern.
Symposium, Tuebingen, Fed. Rep. of Germany, October 27-November 2.
[342] J. J. Koenderink and A. J. van Doorn, "Dynamic shape", Biological Cybernetics, vol. 53, pp. 383-396, 1986.
[343] J. J. Koenderink, "Optic flow", Vision Research, vol. 26, pp. 161-180, 1986.
[344] J. J. Koenderink, "Image structure", in Mathematics and Computer Science in Medical Imaging (M. A.
Viergever and A. Todd-Pokropek, eds.), (Berlin), Springer-Verlag, 1986. Proceedings of the NATO Advanced
Study Institute of Mathematics and Computer Science in Medical Imaging, held in Il Ciocco, Italy, September 21
- October 4.
[345] J. J. Koenderink and A. J. van Doorn, "Representation of local geometry in the visual system", Biological
Cybernetics, vol. 55, pp. 367-375, 1987.
[346] J. J. Koenderink, "Design principles for a front-end visual system", in Neural Computers (R. Eckmiller and
C. v. d. Malsburg, eds.), Springer-Verlag, 1987. Proceedings of the NATO Advanced Research Workshop on
Neural Computers, held in Neuss, Fed. Rep. of Germany, September 28-October 2.
[347] J. J. Koenderink, "An internal representation for solid shape based on the topological properties of the
apparent contour", in Image Understanding (J. Richards and S. Ullman, eds.), pp. 257-285, Norwood, New
Jersey: Alex Publishing Corporation, 1987.
[348] J. J. Koenderink and A. J. van Doorn, "Facts on optic flow", Biological Cybernetics, vol. 56, pp. 247-254,
1987.
[349] J. J. Koenderink, "Image structure", in Mathematics and Computer Science in Medical Imaging
(Viergever/Todd-Pokropek, ed.), pp. 67-104, Springer-Verlag, 1988. NATO ASI F39.
[350] J. J. Koenderink: "Scale-Time", Biological Cybernetics, vol. 58, pp. 159-162, 1988.
[351] J. J. Koenderink and A. J. van Doorn, "Operational significance of receptive field assemblies", Biological
Cybernetics, vol. 58, pp. 163-171, 1988.
[352] J. J. Koenderink and W. Richards, "Two-dimensional curvature operators", Journal of the Optical Society
of America-A, vol. 5, no. 7, pp. 1136-1141, 1988.
[353] J. J. Koenderink and A. J. van Doorn, "The basic geometry of a vision system", pp. 481-485. Kluwer
Academic Publishers, 1988. Trappl, R. (Ed.).
[354] J. J. Koenderink, "Design for a sensorium", pp. 185-207. D-6940 Weinheim, Federal Republic of Germany:
VCH Verlagsgesellschaft mbH, 1988. Editors: von Seelen, Werner and Shaw, Gordon and Leinhos, Ulrich M.
[355] J. J. Koenderink, "A hitherto unnoticed singularity of scale-space", IEEE Tr. on Pattern Analysis and
Machine Intelligence, vol. 11, no. 11, pp. 1222-1224, 1989.
[356] J. J. Koenderink, "Solid Shape". Cambridge, Mass.: MIT Press, 1990.
[357] J. J. Koenderink and A. J. van Doorn, "Receptive field families", Biological Cybernetics, vol. 63, pp.
291-298, 1990.
[358] J. J. Koenderink, "The brain a geometry engine", Psychological Research, vol. 52, pp. 122-127, 1990.
[359] J. J. Koenderink, "Perception and control of self-motion", In: Some theoretical aspects of optic flow, pp.
53-68. Hillsdale, New Jersey: Lawrence Erlbaum Associates, Inc., 1990. R. Warren, Ed.
[360] J. J. Koenderink and A. J. van Doorn, "Advances in neural computers", chapter in: Receptive field
taxonomy. Elsevier, 1990. R. Eckmiller, Ed.
[361] J. J. Koenderink, "Mapping formal structures on networks", pp. 93-98. North-Holland: Elsevier Science
Publishers B. V., 1991. Eds.: Kohonen, T. and Mäkisara, K. and Simula, O. and Kangas, J.
[362] J. J. Koenderink and A. J. van Doorn, "Affine structure from motion", Journal of the Optical Society of
America-A, vol. 8, no. 2, pp. 377-385, 1991.
[363] J. J. Koenderink, "Local image structure", in Proc. Scand. Conf. on Image Analysis, (Aalborg, DK), pp.
1-7, August 1991.
[364] J. J. Koenderink and A. J. van Doorn, "Receptive field assembly pattern specificity", J. of Vis. Comm. and
Im. Repr., vol. 3, no. 1, pp. 1-12, 1991.
[365] J. J. Koenderink and A. J. van Doorn, "Generic neighborhood operators", IEEE Tr. on Pattern Analysis and
Machine Intelligence, vol. 14, pp. 597-605, June 1992.
[366] J. J. Koenderink, "Local image structure", in Theory and Applications of Image Analysis (P. Johansen and
S. Olsen, eds.), vol. 2 of Series in Machine Perception and Artificial Intelligence, pp. 15-21, Singapore: World
Scientific, 1992.
[367] J. J. Koenderink, Fundamentals of Bicentric Perspective, vol. 653 of Lecture Notes in Computer Science,
pp. 233-251. Heidelberg Berlin: Springer Verlag, 1992. Bensoussan A. and Verjus J. P. (Eds.).
[368] J. J. Koenderink, A. Kappers, and A. van Doorn, "Local operations: The embodiment of geometry", in
Artificial and Biological Vision Systems (G. A. Orban and H. H. Nagel, eds.), ESPRIT: Basic Research Series,
pp. 1-23, DG XIII Commision of the European Communities, 1992.
[369] J. J. Koenderink, "Iseikonic invariants in bicentric perspective", Tech. Rep. UBI-T-92.MF-058, Utrecht
Biophysics Research Institute, Division: Medical and Physiological Physics, Buys Ballot Laboratory, Utrecht
University, 1992.
[370] J. J. Koenderink and A. J. van Doorn, "Second order optic flow", Journal of the Optical Society of America-
A, vol. 8, no. 2, pp. 530-538, 1992.
[371] J. J. Koenderink and A. J. van Doorn, "Surface shape and curvature scales", Image & Vision Computing,
vol. 10, pp. 557-565, 1992.
[372] J. J. Koenderink and A. J. van Doorn, "Local features of smooth shapes: Ridges and courses", in Proc. SPIE
Geometric Methods in Computer Vision II, vol. 2031, (San Diego, CA), pp. 2-13, Proceedings SPIE, July, 12-13
1993.
[373] J. J. Koenderink and A. J. van Doorn, "Two-plus-one dimensional differential geometry", Pattern
Recognition Letters, vol. 21, pp. 439-443, May 1994.
[374] J. J. Koenderink and A. J. van Doorn. "Illuminance texture due to surface mesostructure". J. Opt. Soc. Am.
A, vol. 13, no. 3, pp. 452-463, 1996.
[375] J. J. Koenderink and A. J. van Doorn. "Metamerism in complete sets of image operators". In: Advances in
Image Understanding, Bowyer K., Ahuja N. (eds.), IEEE Computer Society Press, Los Alamitos, California,
pp. 113-129, 1996.
[376] J. J. Koenderink, A. J. van Doorn, C. Christou and J. S. Lappin, "Shape constancy in pictorial relief".
Perception, vol. 25, pp. 155-164, 1996.
[377] J. J. Koenderink, A. J. van Doorn, C. Christou and J. S. Lappin, "Perturbation study of shading in
pictures". Perception, vol. 25, pp. 1009-1026, 1996.
[378] J. J. Koenderink, A. J. van Doorn and A. M. L. Kappers, "Pictorial surface attitude and local depth
comparisons". Perception & Psychophysics, vol. 58, no. 2, pp. 163-173, 1996.
[379] J. J. Koenderink, A. J. van Doorn and M. Stavridi, "Bidirectional reflection distribution function expressed
in terms of surface scattering modes". In: Proc. Europ. Conf. on Computer Vision - ECCV ’96, B. Buxton, R.
Cipolla (eds.), Springer Verlag, Berlin, pp. 28-39, 1996.
[380] J. J. Koenderink, A. M. L. Kappers, J. T. Todd, J. F. Norman and F. Phillips, "Surface range and attitude
probing in stereoscopically presented dynamic scenes". Journal of Experimental Psychology: Human Perception
and Performance, vol. 22, pp. 869-878, 1996.
[381] J. J. Koenderink, "Scale in perspective", in Gaussian Scale-Space, Kluwer Academic Press, 1997. Sporring,
J. et al. (eds.).
[382] J. J. Koenderink and A. J. van Doorn, "The generic bilinear calibration-estimation problem". Intern. Journal
of Computer Vision, vol. 23, no. 3, pp. 217-234, 1997.
[383] J. J. Koenderink, A. J. van Doorn, A. M. L. Kappers and J. Todd, "The visual contour in depth". Perception
& Psychophysics, vol. 59, no. 6, pp. 828-838, 1997.
[384] J. J. Koenderink, A. M. L. Kappers, F. E. Pollick and M. Kawato, "Correspondence in pictorial space".
Perception & Psychophysics, vol. 59, no. 6, pp. 813-827, 1997.
[385] J. J. Koenderink, "Receptive field calculus". In: Progress in Neural Networks, Vol. 4: Machine Vision,
Omidvar O.M., Mohan R. (eds.), Ablex Publishing Corporation, Greenwich, Connecticut, pp. 1-28, 1997.
[386] J. J. Koenderink, "Pictorial relief". In: Advances in visual form analysis, Arcelli C., Cordella L.P., Sanniti
di Baja G. (eds.), World Scientific, Singapore, pp. 308-317, 1997.
[387] J. J. Koenderink and A. J. van Doorn, "Image structure". In: Mustererkennung 1997, Paulus E., Wahl F.M.
(eds.), Springer-Verlag, Berlin, pp. 3-35, 1997.
[388] J. J. Koenderink and A. J. van Doorn, "Local image operators and iconic structure". In: Algebraic frames
for the perception-action cycle, Sommer G., Koenderink J.J. (eds.), Springer-Verlag, Berlin, pp. 66-93, 1997.
[389] J. J. Koenderink, "Color Space". Utrecht University, the Netherlands, 1998.
[390] J. J. Koenderink and A. J. van Doorn, "The structure of locally orderless images". Intern. Journal of Computer
Vision, vol. 31, no. 2/3, pp. 159-168, 1999.
[391] J. J. Koenderink and A. J. van Doorn, "Blur and disorder". In: Scale-space theories in computer vision, M.
Nielsen, P. Johansen, O. F. Olsen and J. Weickert (eds.), Lecture Notes in Computer Science, Vol. 1682,
Springer, Berlin, pp. 1-9, 1999.
[392] J. J. Koenderink and A. J. van Doorn, "The structure of colorimetry". In: Algebraic frames for the
perception-action cycle, Sommer G., Zeevi Y.Y. (eds.), Springer, Berlin, pp. 69-77, 2000.
[393] G. Koepfler, C. Lopez, and J. M. Morel, "A multiscale algorithm for image segmentation by variational
method", SIAM Journal on Numerical Analysis, 1994.
[394] A. S. E. Koster, K. L. Vincken, C. N. De Graaf, O. C. Zander, and M. A. Viergever, "Heuristic linking
models in multiscale image segmentation", Computer Vision and Image Understanding, vol. 65, no. 3, pp.
382-402, 1997.
[395] A. Kuijper, L. M. J. Florack, "Calculations on critical points under Gaussian blurring", in Proc. 2nd Intern.
Conf. on Scale-Space Theory in Computer Vision (Corfu, Greece), Lecture Notes in Computer Science, vol. 1682,
pp. 318-329, 1999.
[396] A. Kuijper, L. M. J. Florack, "Hierarchical pre-segmentation without prior knowledge", in Proc. 8th IEEE
Intern. Conf. on Computer Vision (Vancouver CA), pp. 487-493, 2001.
[397] A. Kuijper, L. M. J. Florack, "Understanding and modeling the evolution of critical points under Gaussian
blurring", in Proc. 7th Eur. Conf. on Computer Vision (Copenhagen DK), Lecture Notes in Computer Science, vol.
2350, pp. 143-157, 2002.
[398] A. Kuijper, L. M. J. Florack, "The relevance of non-generic events in scale-space models", in Proc. 7th Eur.
Conf. on Computer Vision (Copenhagen DK), Lecture Notes in Computer Science, vol. 2350, pp. 190-204, 2002.
[399] A. Kuijper, L. M. J. Florack, "Scale-space hierarchy", J. of Math. Imaging and Vision, 2002.
[400] A. Kuijper, "The deep structure of Gaussian scale-space images", PhD thesis, Utrecht University, 2002.
[401] I. Laptev, H. Mayer, T. Lindeberg, W. Eckstein, C. Steger, A. Baumgartner, "Automatic extraction of roads
from aerial images based on scale-space and snakes", Machine Vision and Applications, 2000.
[402] T. S. Lee, "Image representation using 2D Gabor wavelets", IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 18, pp. 959-971, 1996.
[403] R. Lenz, "Group theoretical methods in image processing", vol. 413 of Lecture Notes in Computer Science,
Goos, G. and Hartmanis, J. (Eds.), Berlin: Springer Verlag, 1990.
[404] W. R. Levick, "Sampling of information space by retinal ganglion cells", in Visual Neuroscience (J. D.
Pettigrew, K. J. Sanderson, and W. R. Levick, eds.), ch. 3, pp. 33-43, Cambridge University Press, 1986.
[405] Z. Li and J. J. Atick, "Towards a theory of striate cortex", Neural Computation, vol. 6, pp. 127-146, 1994.
[406] X. Li and T. Chen, "Optimal L1 approximation of the Gaussian kernel with application to scale-space
construction", IEEE Tr. Pattern Anal. and Machine Intell., vol. 17, no. 10, pp. 1015-1019, 1995.
[407] L. M. Lifshitz and S. M. Pizer, "A multiresolution hierarchical approach to image segmentation based on
intensity extrema", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 12, no. 6, pp. 529-541, 1990.
[408] T. Lindeberg, "Scale-space for discrete signals", IEEE Tr. on Pattern Analysis and Machine Intelligence,
vol. 12, no. 3, pp. 234-245, 1990.
[409] T. Lindeberg and J. O. Eklundh, "Scale detection and region extraction from a scale-space primal sketch",
in Proc. 3rd Int. Conf. on Computer Vision, (Osaka, Japan), pp. 416-426, December 1990.
[410] T. Lindeberg, "Discrete scale-space theory and the scale-space primal sketch". PhD thesis, Department of
Numerical Analysis and Computing Science, Royal Institute of Technology, S-100
44 Stockholm, Sweden, May 1991.
[411] T. Lindeberg and J. O. Eklundh, "On the computation of a scale-space primal sketch", Journal of Visual
Comm. and Image Rep., vol. 2, pp. 55-78, 1991.
[412] T. Lindeberg, "Scale-space behaviour of local extrema and blobs", Journal of Mathematical Imaging and
Vision, vol. 1, pp. 65-99, March 1992.
[413] T. Lindeberg, "On the behaviour in scale-space of local extrema and blobs", in Theory and Applications of
Image Analysis (P. Johansen and S. Olsen, eds.), vol. 2 of Series in Machine Perception and Artificial
Intelligence, pp. 38-47, Singapore: World Scientific, 1992.
[414] T. Lindeberg, "Effective scale: A natural unit for measuring scale-space lifetime", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 15, October 1993.
[415] T. Lindeberg and L. M. J. Florack, "On the decrease of resolution as a function of eccentricity for a foveal
vision system", Tech. Rep. TRITA-NA-P9229, Dept. of Numerical Analysis and Computing Science, Royal
Institute of Technology, November 1992.
[416] T. Lindeberg and J. O. Eklundh, "The scale-space primal sketch: Construction and experiments", Image and
Vision Computing, vol. 10, pp. 3-18, January 1992.
[417] T. Lindeberg and J. Gårding, "Shape from texture from a multi-scale perspective", in Proceedings of the
fourth ICCV, H. H. Nagel et al., eds., Berlin, Germany, pp. 683-691, IEEE Computer Society Press, 1993.
[418] T. Lindeberg, "Detecting salient blob-like image structures and their scales with a scale-space primal sketch
- a method for focus-of-attention", Intern. Journal of Computer Vision, vol. 11, no. 3, pp. 283-318, 1993.
[419] T. Lindeberg, "Discrete derivative approximations with scale-space properties: A basis for low-level feature
extraction", Journal of Mathematical Imaging and Vision, vol. 3, no. 4, pp. 349-376, 1993.
[420] T. Lindeberg, "Feature detection with automatic scale selection", Tech. Rep. ISRN KTH/NA/P-96/18-SE,
KTH - NADA, May 1996. Earlier version presented in Proc. 8th Scandinavian Conf. on Image Analysis, Tromso,
Norway, pp 857-866, 1993.
[421] T. Lindeberg, "Scale-Space Theory in Computer Vision". The Kluwer Intern. Series in Engineering and
Computer Science, Dordrecht, the Netherlands: Kluwer Academic Publishers, 1994.
[422] T. Lindeberg, "Scale-space theory: A basic tool for analysing structures at different scales", J. of Applied
Statistics, 21(2), Supplement on Advances in Applied Statistics: Statistics and Images: 2, pp. 224-270, 1994.
[423] T. Lindeberg and J. Gårding, "Shape-adapted smoothing in estimation of 3-D depth cues from affine
distortions of local 2-D structure", in Proc. 3rd European Conf. on Computer Vision (J.-O. Eklundh, ed.), vol. 800
of Lecture Notes in Computer Science, (Stockholm, Sweden), pp. 389-400, Springer-Verlag, May 1994.
[424] T. Lindeberg, "On scale selection for differential operators", Proc. 8th Scandinavian Conf. Image Analysis
(K. H. K. A. Hogdra, B. Braathen, ed.), (Tromso, Norway), pp. 857-866, Norwegian Society for Image
Processing and Pattern Recognition, May 1994.
[425] T. Lindeberg, "Scale-space behaviour and invariance properties of differential singularities", in Proc. of the
NATO Advanced Research Workshop Shape in Picture -- Mathematical Description of Shape in Greylevel
Images (Y.-L. O, A. Toet, H. J. A. M. Heijmans, D. H. Foster, and P. Meer, eds.), vol. 126 of NATO ASI Series
F, pp. 591-600, Springer Verlag, Berlin, 1994.
[426] T. Lindeberg and B. M. ter Haar Romeny, "Linear scale-space: I. Basic theory. II. Early visual operations",
in Geometry-Driven Diffusion in Computer Vision (B. M. ter Haar Romeny, ed.), Computational Imaging and
Vision, pp. 1-38, 39-72, Dordrecht, the Netherlands: Kluwer Academic Publishers, 1994.
[427] T. Lindeberg, "Junction detection with automatic selection of detection scales and localization scales", in
Proc. First Intern. Conf. on Image Processing, vol. I, Austin, TX, pp. 924-928, IEEE CS, November 1994.
[428] T. Lindeberg and L. M. J. Florack, "Foveal scale-space and linear increase of receptive field size as a
function of eccentricity", Tech. Rep. ISRN KTH/NA/P-94/27-SE, Dept. of Numerical Analysis and Computing
Science, Royal Institute of Technology, August 1994.
[429] T. Lindeberg, "Scale-space for N-dimensional discrete signals", in Proc. of the NATO Advanced Research
Workshop Shape in Picture - Mathematical Description of Shape in Greylevel Images (Y.-L. O, A. Toet, H. J. A.
M. Heijmans, D. H. Foster, and P. Meer, eds.), vol. 126 of NATO ASI Series F, pp. 571-590, Springer Verlag,
Berlin, 1994.
[430] T. Lindeberg, "Scale-space theory: a basic tool for analyzing structures at different scales", Journal of
Applied Statistics, vol. 21, no. 2, pp. 223-261, 1994. Special issue on Statistics and Images.
[431] T. Lindeberg, "Direct estimation of affine deformations of brightness patterns using visual front-end
operators with automatic scale selection", in Proc. 5th Intern. Conf. on Computer Vision (E. Grimson, ed.),
(Cambridge, MA), pp. 134-141, IEEE Computer Society Press, June 1995.
[432] T. Lindeberg, "A scale selection principle for estimating image deformations", Tech. Rep. ISRN
KTH/NA/P-96/16-SE, KTH - NADA, May 1996. Shortened version in Proc. 5th Int. Conf. on Computer Vision,
Boston, Massachusetts, pp. 134-141, 1995.
[433] T. Lindeberg and D. Fagerström, "Scale-space with causal time direction", Proc. 4th Europ. Conf. on
Computer Vision, Cambridge, UK, (B. Buxton and R. Cipolla, Eds). April 14-18, vol. 1064 of Lecture Notes in
Computer Science, pp. 229-240, Springer-Verlag, Berlin, 1996.
[434] T. Lindeberg, "Linear spatio-temporal scale-space", Proc. First Intern. Conf. on Scale-Space Theory in
Computer Vision, B.M. ter Haar Romeny ed., Utrecht, Netherlands, Springer-Verlag Lecture Notes in Computer
Science, volume 1252, July 2-4, 1997.
[435] T. Lindeberg, "On automatic selection of temporal scales in time-causal scale-space", in Proc. AFPAC’97:
Algebraic Frames for the Perception-Action Cycle, G. Sommer and J. J. Koenderink, eds., vol. 1315 of Lecture
Notes in Computer Science, Kiel, Germany, pp. 94-113, Springer Verlag, Berlin, Sept. 1997.
[436] T. Lindeberg and J. Gårding, "Shape-adapted smoothing in estimation of 3-D depth cues from affine
distortions of local 2-D brightness structure", Image and Vision Computing, vol. 15, pp. 415-434, 1997.
[437] T. Lindeberg and M.-X. Li, "Segmentation and classification of edges using minimum description length
approximation and complementary junction cues", Computer Vision and Image Understanding, vol. 67, no. 1, pp.
88-98, 1997.
[438] T. Lindeberg, "Feature detection with automatic scale selection", Intern. Journal of Computer Vision, vol.
30, no. 2, pp. 77-116, 1998.
[439] T. Lindeberg, "Edge detection and ridge detection with automatic scale selection", Intern. Journal of
Computer Vision, vol. 30, no. 2, pp. 117-154, 1998.
[440] T. Lindeberg, "A scale selection principle for estimating image deformations", Image and Vision
Computing, vol. 16, no. 14, pp. 961-977, 1998.
[441] T. Lindeberg, P. Lidberg and P. E. Roland, "Analysis of brain activation patterns using a 3-D scale-space
primal sketch", Human Brain Mapping, vol. 7, no. 3, pp. 166-194, 1999.
[442] M. Lindenbaum, M. Fischer, and A. Bruckstein, "On Gabor’s contribution to image enhancement", Pattern
Recognition, vol. 27, no. 1, pp. 1-8, 1994.
[443] A. Liu, S. M. Pizer, D. Eberly, B. Morse, J. Rosenman, and V. Carrasco, "Volume registration using the 3D
core", in Proc. SPIE Medical Imaging VIII, (Newport Beach, CA), February 1994.
[444] J. Llacer, B. M. ter Haar Romeny, L. M. J. Florack, and M. A. Viergever, "The use of geometric prior
information in Bayesian tomographic image reconstruction: a preliminary report", in Proc. SPIE Conf. on
Mathematical Methods in Medical Imaging, vol. 1768, San Diego, pp. 82-96, 23-24 July 1992.
[445] J. Llacer, B. M. ter Haar Romeny, L. M. J. Florack, and M. A. Viergever, "The representation of medical
images by visual response functions", IEEE Engineering in Medicine and Biology, vol. 3, no. 93, pp. 40-47, 1993.
[446] N. K. Logothetis, H. Guggenberger, S. Peled, and J. Pauls, "Functional imaging of the monkey brain".
Nature Neuroscience, volume 2, no 6, pp. 555-562, 1999.
[447] M. Loog, J. J. Duistermaat, and L. M. J. Florack, "On the behavior of spatial critical points under Gaussian
blurring. A folklore theorem and scale-space constraints," in M. Kerckhove, ed., Scale-Space and Morphology in
Computer Vision: Proceedings of the Third International Conference, Scale-Space 2001, Vancouver, Canada, vol.
2106 of Lecture Notes in Computer Science. Berlin: Springer-Verlag, pp. 183-192, 2001.
[448] H. Lotze, "Mikrokosmos". Leipzig: Hirzel, 1884.
[449] A. M. López, F. Lumbreras, J. Serrat and J. J. Villanueva, "Evaluation of methods for ridge and valley
detection", IEEE Tr. on Pattern Analysis and Machine Intelligence (PAMI), vol. 21, pp. 327-335, 1999.
[450] A. M. López, F. Lumbreras, J. Serrat and J. J. Villanueva, "New improvements in the multiscale analysis of
trabecular bone patterns", Frontiers in Artificial Intelligence and Applications, Volume: Pattern Recognition and
Applications, IOS Press - Ohmsha, pp. 251-260, 2000.
[451] A. M. López, F. Lumbreras, J. Serrat and J. J. Villanueva, "Multilocal creaseness based on the level set
extrinsic curvature", Computer Vision and Image Understanding (CVIU), vol. 77, pp. 111-144, 2000.
[452] K.-C. Low and J. M. Coggins, "Multiscale vector fields for image pattern recognition", tech. rep., Univ.
North Carolina, Dept. of Comp. Science, 1989.
[453] D. G. Lowe, "Organization of smooth image curves at multiple scales", Intern. Journal of Computer Vision,
vol. 3, pp. 119-130, 1989.
[454] Y. Lu and R. C. Jain, "Behaviour of edges in scale-space", IEEE Tr. on Pattern Analysis and Machine
Intelligence, vol. 11, no. 4, pp. 337-357, 1989.
[455] B. Lucas and T. Kanade, "An iterative image-registration technique with an application to stereo vision",
Proc. IJCAI, pp. 674-679, Vancouver, Ca, 1981.
[456] R. Maas, M. Nielsen, W. J. Niessen, B. M. ter Haar Romeny, and M. A. Viergever, "A scale-space
approach to binocular stereo". In: Abstracts of the ASCI Imaging Workshop 1995 (L. J. van Vliet and I. T.
Young, eds.), (Venray, The Netherlands), p. 34, ASCI, 25-27 October 1995.
[457] R. Maas, M. Nielsen, W. J. Niessen, B. M. ter Haar Romeny, L. M. J. Florack, and M. A. Viergever, "Local
disparity measurements using scalable operators", in Proc. DIKU PhD Summerschool on Gaussian Scale-Space
Theory (P. Johansen, ed.), no. 96/19 in Tech. Rep., (Copenhagen, Denmark), pp. 80-87, DIKU, 10-13 May 1996.
Also in Proc. 5th Danish Conf. on Pattern Recognition and Image Analysis (P. Johansen, ed.), no. 96/22 in Tech.
Rep., (Copenhagen, Denmark), pp. 99-106, 26-27 August 1996.
[458] J. McLean, S. Raab, L. A. Palmer, "Contribution of linear mechanisms to the specification of local
motion by simple cells in areas 17 and 18 of the cat", Visual Neuroscience, vol. 11, pp. 271-294, 1994.
[459] R. Maeder, Programming in Mathematica. Addison-Wesley Pub. Co., 3rd edition, 1996.
[460] R. Maeder, The Mathematica Programmer II. Academic Press, 1996.
[461] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Extraction of invariant ridgelike features for
CT and MR brain image matching", in Proc. Int. Conf. on Volume Image Processing (M. A. Viergever, ed.),
(Utrecht), pp. 129-132, SCVR, 1993.
[462] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Evaluation of ridge seeking operators for
multimodality medical image matching", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 18, no. 4,
pp. 353-365, 1996.
[463] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Comparison of feature-based matching of CT
and MR brain images", in CVRMed (N. Ayache, ed.), vol. 905 of Lecture Notes in Computer Science, (Berlin),
pp. 219-228, Springer-Verlag, 1995.
[464] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Comparison of edge-based and ridge-based
registration of CT and MR brain images", Medical Image Analysis, vol. 1, no. 2, pp. 151-161, 1996.
[465] J. B. A. Maintz, F. J. Beckman, W. de Bruin, P. A. van den Elsen, P. P. van Rijk, and M. A. Viergever,
"Automatic registration and intensity scaling of SPECT brain images", Journal of Nuclear Medicine, vol. 37, no.
5, supplement, p. 213P, 1996. Abstract.
[466] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Registration of SPECT and MR brain images
using a fuzzy surface", in Medical Imaging ’96 - Image processing (M. H. Loew and K. M. Hanson, eds.), vol.
2710, (Bellingham, WA, USA), pp. 821-829, SPIE, 1996.
[467] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever, "Registration of 3D medical images using
simple morphological tools", in IPMI ’97 (J. Duncan and G. Gindi, eds.), vol. 1230 of Lecture Notes in Computer
Science, pp. 204-217, 1997.
[468] J. Malik and P. Perona, "A computational model of texture segmentation", in Proc. CVPR 1989, pp.
326-332, 1989.
[469] J. Malik and P. Perona, "Preattentive texture discrimination with early vision mechanisms", Journal of the
Optical Society of America, vol. 7, pp. 923-932, May 1990.
[470] J. Malik and R. Rosenholtz, "A differential method for computing local shape-from-texture for planar and
curved surfaces", in Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition, pp. 267-273,
1993.
[471] R. Malladi, J. Sethian, and B. Vemuri, "Shape modeling with front propagation: a level set approach", IEEE
Tr. on Pattern Analysis and Machine Intelligence, vol. 17, no. 2, pp. 158-174, 1995.
[472] R. Malladi and J. A. Sethian, "Level sets and fast marching methods in image processing and computer
vision", in Proc. third Intern. Conf. on Image Processing (P. Delogne, ed.), pp. 489-492, IEEE, 1996.
[473] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation", IEEE Tr. on
Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-694, 1989.
[474] S. G. Mallat and S. Zhong, "Characterization of signals from multi-scale edges", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 14, pp. 710-723, 1992.
[475] Z. Z. Manseur and D. C. Wilson, "Decomposition methods for convolution operators", Computer Vision,
Graphics, and Image Processing, vol. 53, pp. 428-434, September 1991.
[476] S. Marcelja, "Mathematical Description of the Responses of Simple Cortical Cells", J. Opt. Soc. Am., Vol.
70, 1980.
[477] D. Marr, "Vision". W. H. Freeman and Co., 1982.
[478] D. C. Marr and E. C. Hildreth, "Theory of edge detection", Proc. Roy. Soc. London B, vol. 207, pp.
187-217, 1980.
[479] J. B. Martens, "Deblurring digital images by means of polynomial transforms", Computer Vision, Graphics,
and Image Processing, vol. 50, pp. 157-176, 1990.
[480] C. Mason and E. R. Kandel, "Central visual pathways", in Principles of Neural Science, pp. 420-434,
Prentice-Hall Intern. Inc., 1991. E. R. Kandel, J. H. Schwartz, and T. M. Jessell (eds.).
[481] S. Massey and G.A. Jones, "Decomposition and Hierarchy: Efficient Structural Matching of Large Multi-
scale Representations", in Proceedings of the Second International Conference on Scale-Space Theories in
Computer Vision, Lecture Notes in Computer Science, vol. 1682, 1999.
[482] J. C. Maxwell, "On hills and dales", The London, Edinburgh and Dublin Philosophical Magazine and J.
of Science, vol. 40, no. 269, pp. 421-425, 1870. Reprinted in W. D. Niven, The Scientific Papers of James Clerk
Maxwell, Vol. II, Dover Publications, New York, 1956.
[483] B. A. McGuire, C. D. Gilbert, P. K. Rivlin, T. N. Wiesel, "Targets of horizontal connections in macaque
primary visual cortex", J. Comp. Neurol., vol. 305, pp. 370-392, 1991.
[484] T. McInerney and D. Terzopoulos, "Deformable models in medical image analysis: a survey", Medical
Image Analysis, vol. 1, no. 2, pp. 91-108, 1996.
[485] J. McLean and L. A. Palmer, "Contribution of linear spatiotemporal receptive field structure to velocity
selectivity of simple cells in area 17 of cat.", Vision Research, vol. 29, pp. 675-679, 1989.
[486] T. A. McMahon and J. T. Bonner, "On size and life", Scientific American Books, Inc., W. H. Freeman and
Company, New York, 1983.
[487] M. Michaelis, Low level image processing using steerable filters. PhD thesis, Technische Fakultät der
Christian-Albrechts-Universität Kiel, Germany, Dec. 1995.
[488] M. Michaelis, G. Sommer, "A Lie group approach to steerable filters". Pattern Recognition Letters, vol. 16,
no. 11, pp. 1165-1174, 1995.
[489] J. Milnor, "Morse theory", vol. 51 of Annals of Mathematics Studies. Princeton University Press, 1963.
[490] M. A. Mahowald and C. Mead, "The silicon retina", Scientific American, pp. 40-45, May 1991.
[491] C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravitation. San Francisco: Freeman, 1973.
[492] F. Mokhtarian and A. Mackworth, "Scale-based description of planar curves and two-dimensional shapes",
IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 8, pp. 34-43, 1986.
[493] F. Mokhtarian, "Multi-scale description of space curves and three-dimensional objects", in Proc. IEEE
CVPR, (Ann Arbor, Michigan), 1988.
[494] F. Mokhtarian, "The renormalized curvature scale-space and the evolution properties of planar curves", in
Proc. IEEE CVPR, (Ann Arbor, Michigan), pp. 318-326, 1988.
[495] F. Mokhtarian, "Evolution properties of space curves", in Proc. IEEE CVPR, (Tarpon Springs, Florida), pp.
100-105, 1988.
[496] F. Mokhtarian, "Fingerprint theorems for curvature and torsion zero-crossing", in Proc. IEEE CVPR, (San
Diego, California), pp. 269-275, 1989.
[497] F. Mokhtarian and A. Mackworth, "A theory of multi-scale, curvature-based shape representation for
planar curves", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 789-805, 1992.
[498] F. Mokhtarian, "Multi-scale torsion-based shape representations for space curves", in Proc. IEEE CVPR,
(New York City, NY), 1993.
[499] O. Monga, N. Ayache, and P. T. Sander, "From voxel to intrinsic surface features", Image and Vision
Computing, vol. 10, pp. 403-417, July/August 1992.
[500] O. Monga and S. Benayoun, "Using partial derivatives of 3D images to extract typical surface features",
Computer Vision and Image Understanding, vol. 61, no. 2, pp. 171-189, 1995.
[501] J. M. Morel and S. Solimini, "Segmentation of images by variational methods: A constructive approach",
Rev. Matematica de la Universidad Complutense, vol. 1, no. 3, pp. 169-182, 1988.
[502] J. M. Morel and S. Solimini, Variational Methods in Image Segmentation. No. 14 in Progress in Non-linear
Differential Equations and their Applications, Basel, Switzerland: Birkhäuser Verlag, 1995. ISBN 3-7643-3720-6.
[503] P. Morrison, "Powers of ten: about the relative size of things in the universe". W. H. Freeman and Company,
1985. See also
[504] B. S. Morse, S. M. Pizer, and A. Liu, "Multiscale medial analysis of medical images", in Information
Processing in Medical Imaging (IPMI 14) (H. Barrett and A. Gmitro, eds.), (Berlin), Springer-Verlag, 1993.
[505] B. S. Morse, S. M. Pizer, D. T. Puff, and C. Gu, "Zoom-invariant vision of figural shape: Effects on cores
of image disturbances", Tech. Rep. TR96-005, University of North Carolina, Dept. of Computer Science, 1996.
[506] D. Mumford and J. Shah, "Boundary detection by minimizing functionals", in Proc. IEEE Conf. on
Computer Vision and Pattern Recognition, (San Francisco), 1985.
[507] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions and associated
variational problems", Communications on Pure and Applied Mathematics, vol. XLII, pp. 577-685, July 1989.
[508] D. Mumford, "On the computational architecture of the neocortex. I. The role of the thalamo-cortical loop",
Biological Cybernetics, vol. 65, pp. 135-145, 1991.
[509] D. Mumford, "On the computational architecture of the neocortex. II. The role of cortico-cortical loops",
Biological Cybernetics, vol. 66, pp. 241-251, 1992.
[510] D. Mumford, "Bayesian rationale for the variational formulation", in Geometry-Driven Diffusion in
Computer Vision (B. M. ter Haar Romeny, ed.), Computational Imaging and Vision, pp. 135-146, Kluwer
Academic Publishers B.V., 1994.
[511] J. L. Mundy and A. Zisserman, eds., Geometric Invariance in Computer Vision. Cambridge, Massachusetts:
MIT Press, 1992.
[512] H. Neumann, H. Ottenberg, and H. S. Stiehl, "Accuracy of regularized differential operators for
discontinuity localization in 1-D and 2-D intensity functions", Tech. Rep. FBI-HH-M-186/90, Universität
Hamburg, Fachbereich Informatik, 1990.
[513] H. Neumann and H. S. Stiehl, "A competitive/cooperative (artificial neural) network approach to the
extraction of n-th order junctions", in Proceedings of the 11th DAGM-Symposium, Hamburg (H. Burkhardt,
K.-H. Hoehne, and B. Neumann, eds.), (Berlin), Springer-Verlag, 1989.
[514] M. Nielsen, "Isotropic regularization", in Proc. British Machine Vision Conf., pp. 135-144, 1993.
[515] M. Nielsen, "From paradigm to algorithms in computer vision". PhD thesis, Datalogisk Institut,
Copenhagen University, Denmark, Dept. of Computer Science, Universitetsparken 1, DK-2100 Copenhagen Ø,
Denmark, 1995. ISSN 0107-8283.
[516] M. Nielsen and R. Deriche, "Binocular dense depth reconstruction using isotropy constraint", in Proc. 9th
Scand. Conf. on Image Analysis, (Uppsala, Sweden), pp. 49-56, 1995.
[517] M. Nielsen, L. M. J. Florack, and R. Deriche, "Regularization, scale-space, and edge detection filters", in
Proc. Fourth European Conf. on Computer Vision, (Cambridge, UK), April 14-18 1996.
[518] M. Nielsen, R. Maas, W. Niessen, L. Florack, and B. M. ter Haar Romeny, "Local disparity structure by
scale-space operators", Tech. Rep. 96-17, DIKU Computer Science Department, Copenhagen University, 1996.
[519] M. Nielsen, "Scale-Space Generators and Functionals". In J. Sporring, M. Nielsen, L. Florack, and P.
Johansen (eds.) Gaussian Scale-Space Theory, pp. 99-114, Kluwer Academic Publishers, 1997.
[520] M. Nielsen, L. M. J. Florack, and R. Deriche, "Regularization, scale space, and edge detection filters", J.
Mathematical Imaging and Vision, vol. 7, pp. 291-307, October 1997.
[521] M. Nielsen and O. F. Olsen, "The structure of the optic flow field", Proc. ECCV, Lecture Notes in
Computer Science, vol. 1407, pp. 271-287, 1998.
[522] M. Nielsen, P. Johansen, O. F. Olsen, J. Weickert (Eds.), "Scale-space theories in computer vision", Lecture
Notes in Computer Science, Vol. 1682, Springer, Berlin, 1999. ISBN 3-540-66498-X.
[523] M. Nielsen, M. Lillholm, "What do features tell about images?", In "Scale-space theories in computer
vision", Lecture Notes in Computer Science, Vol. 2106, pp. 39-50, Springer, Berlin, 2001.
[524] W. J. Niessen, B. M. ter Haar Romeny, and M. A. Viergever, "Numerical analysis of geometry-driven
diffusion equations", in Geometry-Driven Diffusion in Computer Vision (B. M. ter Haar Romeny, ed.), vol. 1 of
Computational Imaging and Vision, pp. 393-410, Dordrecht: Kluwer Academic Publishers, 1994.
[525] W. J. Niessen, B. M. ter Haar Romeny, L. M. J. Florack, A. H. Salden, and M. A. Viergever, "Nonlinear
diffusion of scalar images using well-posed differential operators", in Proc. of Computer Vision and Pattern
Recognition, (Seattle, WA), pp. 92-97, IEEE Computer Society Press, 1994.
[526] W. J. Niessen, J. S. Duncan, L. M. J. Florack, B. M. ter Haar Romeny, and M. A. Viergever,
"Spatiotemporal operators and optic flow", in Physics-Based Modeling in Computer Vision (T. S. Huang and D.
N. Metaxas, eds.), pp. 78-84, IEEE Computer Society Press, 1995.
[527] W. J. Niessen, J. S. Duncan, B. M. ter Haar Romeny, and M. A. Viergever, "Spatiotemporal analysis of left
ventricular motion", in Medical Imaging 95: Image Processing (M. H. Loew, ed.), pp. 250-261, SPIE Press,
Bellingham, 1995.
[528] W. J. Niessen, J. S. Duncan, and M. A. Viergever, "A scale-space approach to motion analysis", in
Computing Science in the Netherlands 95 (J. C. Van Vliet, ed.), pp. 170-181, Stichting Mathematisch Centrum,
Amsterdam, 1995.
[529] W. J. Niessen, B. M. ter Haar Romeny, L. M. J. Florack, and M. A. Viergever, "A general framework for
geometry-driven evolution equations", Intern. Journal of Computer Vision, vol. 21, no. 3, pp. 187-205, 1997.
[530] W. J. Niessen, K. L. Vincken, and M. A. Viergever, "Comparison of multiscale representations for a
linking-based image segmentation model", in Proc. IEEE Workshop on Mathematical Methods in Biomedical
Image Analysis, (San Francisco), pp. 263-272, 1996.
[531] W. J. Niessen, M. Nielsen, L. M. J. Florack, R. Maas, B. M. ter Haar Romeny, and M. A. Viergever,
"Multiscale optic flow using physical constraints", in Proc. DIKU PhD Summerschool on Gaussian Scale-Space
Theory, no. 96/19 in DIKU Tech. Rep., (Copenhagen, Denmark), 1996.
[532] W. J. Niessen, J. S. Duncan, M. Nielsen, L. M. J. Florack, B. M. ter Haar Romeny, and M. A. Viergever,
"A multi-scale approach to image sequence analysis", Computer Vision and Image Understanding, vol. 65, no. 2,
pp. 259-268, 1997.
[533] W. J. Niessen and R. Maas, "Multiscale optic flow and stereo", in Gaussian Scale-Space Theory (J.
Sporting, M. Nielsen, L. Florack, and P. Johansen, eds.), Computational Imaging and Vision, pp. 31-42,
Dordrecht: Kluwer Academic Publishers, 1997.
[534] W. J. Niessen, B. M. ter Haar Romeny, L. M. J. Florack, and M. A. Viergever, "A general framework for
geometry-driven evolution equations", Intern. Journal of Computer Vision, vol. 21, no. 3, pp. 187-205, 1997.
[535] W. J. Niessen, A. M. López, W. J. Van Enk, P. M. Van Roermund, B. M. ter Haar Romeny, and M. A.
Viergever, "In vivo analysis of trabecular bone architecture", in Proc. Information Processing in Medical Imaging
1997 (J. S. Duncan and G. Gindi, eds.), vol. 1230 of Lecture Notes in Computer Science, pp. 435-440, 1997.
[536] W.J. Niessen, K.L. Vincken, J. Weickert, M.A. Viergever, "Nonlinear multiscale representations for image
segmentation", Computer Vision and Image Understanding, Vol. 66, 233-245, 1997.
[537] W. J. Niessen, K. L. Vincken, J. Weickert, and M. A. Viergever, "Three-dimensional MR brain
segmentation", in Proc. Sixth Int. Conf. on Computer Vision (ICCV '98, Bombay, Jan. 4-7, 1998), 53-58, 1998.
[538] W.J. Niessen, K.L. Vincken, J. Weickert, B.M. ter Haar Romeny, M.A. Viergever, "Multiscale
segmentation of three-dimensional MR brain images", Intern. Journal of Computer Vision, Vol. 31, 185-202,
1999.
[539] M. Nitzberg and T. Shiota, "Nonlinear image filtering with edge and corner enhancement", IEEE Tr. on
Pattern Analysis and Machine Intelligence, vol. 14, no. 8, pp. 826-833, 1992.
[540] N. Nordström, "Biased anisotropic diffusion -- a unified regularization and diffusion approach to edge
detection", Image and Vision Computing, vol. 8, no. 11, pp. 318-327, 1990. Also in: Proc. 1st European Conf. on
Computer Vision, LNCS-Series Vol. 427, Springer-Verlag, pages 18-27.
[541] I. Ohzawa, G. C. DeAngelis, and R. D. Freeman, "Stereoscopic depth discrimination in the visual cortex:
Neurons ideally suited as disparity detectors", Science, vol. 249, pp. 1037-1041, August 1990.
[542] I. Ohzawa, G. C. DeAngelis, and R. D. Freeman, "Encoding of binocular disparity by simple cells in the
cat’s visual cortex", Journal of Neurophysiology, vol. 75, no. 5, pp. 1779-1805, 1996.
[543] M. Okutomi and T. Kanade, "A locally adaptive window for signal matching", Intern. Journal of Computer
Vision, vol. 7, no. 2, pp. 143-162, 1992.
[544] O. F. Olsen and M. Nielsen, "Multi-scale gradient magnitude watershed segmentation". Lecture Notes in
Computer Science, vol. 1310, pp. 6-13, 1997.
[545] O. F. Olsen, "Generic image structure", PhD Thesis, University of Copenhagen, Dept. of Computer
Science, Techn. Rep. DIKU-00-04, June 2000.
[546] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse
code for natural images". Nature, 381: 607-609, 1996.
[547] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by
VI?", Vision Research, 37: 3311-3325, 1997.
[548] P. J. Olver, "Applications of Lie Groups to Differential Equations", vol. 107 of Graduate Texts in
Mathematics. Springer-Verlag, 1993.
[549] P. J. Olver, G. Sapiro, and A. Tannenbaum, "Differential invariant signatures and flows in computer vision: A
symmetry group approach", in Geometry-Driven Diffusion in Computer Vision (B. M. ter Haar Romeny, ed.),
Computational Imaging and Vision, pp. 255-306, Dordrecht: Kluwer Academic Publishers, 1994.
[550] P. J. Olver, Equivalence, Invariants, and Symmetry. Cambridge University Press, 1995.
[551] A. H. J. Oomes and P. R. Snoeren, "Structural information in scale space", in Proceedings DIKU PhD
Summerschool on Classical Scale-Space Theory, (Copenhagen, Denmark), 1996.
[552] S. Osher and J. Sethian, "Fronts propagating with curvature dependent speed: algorithms based on the
Hamilton-Jacobi formalism", Journal of Computational Physics, vol. 79, pp. 12-49, 1988.
[553] G. Osterberg, "Topography of the layer of rods and cones in the human retina", Acta Ophthalmologica, vol.
6, pp. 1-103, 1935.
[554] N. Otsu, Mathematical Studies on Feature Extraction in Pattern Recognition. PhD thesis, Researches of the
Electrotechnical Laboratory, Ibaraki, Japan, 1981.
[555] M. Otte and H.-H. Nagel, "Optical flow estimation: Advances and comparisons", in: Proc. European Conf.
on Computer Vision (Stockholm), Lecture Notes in Computer Science, vol. 800, pp. 51-60, Springer Berlin,
1994.
[556] J. Palacios, Dimensional Analysis. London: MacMillan and Co. Ltd, 1964.
[557] R. Pankhurst, "Dimensional Analysis and Scale Factors". Chapman and Hall Ltd, London, 1964.
[558] A. C. Papanicolaou, "Fundamentals of Functional Brain Imaging", Swets & Zeitlinger, 2000.
[559] E. J. Pauwels, M. Proesmans, L. J. Van Gool, T. Moons, and A. Oosterlinck, "Image enhancement using
coupled anisotropic diffusion equations", in Proc. of the 11th European Conf. on Circuit Theory and Design, vol.
2, pp. 1459-1464, 1993.
[560] E. J. Pauwels, P. Fiddelaers, T. Moons, and L. J. van Gool, "An extended class of scale-invariant and
recursive scale-space filters", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 17, no. 1, pp. 691-701,
1995.
[561] S. Pedersen and M. Nielsen, "The Hausdorff dimension and scale-space normalisation of natural images", J.
of Visual Communication and Image Representation, vol. 11, no. 2, pp. 266-277, 2000.
[562] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion", in IEEE Computer
Society Workshop on Computer Vision, (Miami, FL), pp. 16-22, 1987.
[563] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 12, pp. 629-639, July 1990.
[564] P. Perona, "Deformable kernels for early vision", in IEEE CVPR, pp. 222-227, June 1994.
[565] P. Perona, "Steerable-scalable kernels for edge detection and junction analysis", in Proc. 2nd European
Conf. on Computer Vision, (Santa Margherita Ligure, Italy), pp. 3-18, May 1992.
[566] P. Perona, T. Shiota, and J. Malik, "Anisotropic diffusion", in Geometry-Driven Diffusion in Computer Vision
(B. M. ter Haar Romeny, ed.), Computational Imaging and Vision, pp. 73-92, Kluwer Academic Publishers B.V.,
1994.
[567] P. Perona, "Deformable kernels for early vision", IEEE Tr. on Pattern Analysis and Machine Intelligence,
vol. 17, no. 5, pp. 488-499, 1995.
[568] E. Peterhans and R. von der Heydt, "Subjective contours - bridging the gap between psychophysics and
physiology", Trends in Neurosciences, vol. 14, no. 3, pp. 112-119, 1991.
[569] E. Peterhans and R. von der Heydt, "Elements of form perception in monkey prestriate cortex", in
Representation of Vision: Trends and Tacit Assumptions (A. Gorea, Y. Fregnac, Z. Kapoula and J. Findlay, eds.),
pp. 111-124, Cambridge: Cambridge University Press, 1991.
[570] M. Petrou and P. Bosdogianni, "Image Processing: The Fundamentals", John Wiley & Sons, Chichester,
1999.
[571] C. A. Pickover, "Strange Brains and Genius: The Secret Lives of Eccentric Scientists and Madmen",
Plenum Publishing, 1998. ISBN 0-306-45784-9.
[572] M. A. Piech, "Decomposing the Laplacian", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol.
12, no. 8, pp. 830-831, 1990.
[573] S. M. Pizer, J. J. Koenderink, L. M. Lifshitz, L. Helmink, and A. D. J. Kaasjager, "An image description for
object definition, based on extremal regions in the stack", Information Processing in Medical Imaging,
Proceedings of the 8th conference, pp. 24-37, 1985.
[574] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. M. ter Haar Romeny, J.
B. Zimmerman and K. J. Zuiderveld: "Adaptive Histogram Equalization and its variations", Computer Vision,
Graphics and Image Processing, vol. 39, pp. 355-368, 1987.
[575] S. M. Pizer, J. M. Gauch, L. M. Lifshitz, and W. R. Oliver, "Image description via annihilation of essential
structures", Tech. Rep. TR88-001, University of North Carolina at Chapel Hill, 1988.
[576] S. M. Pizer, J. M. Gauch, J. M. Coggins, R. E. Fredericksen, T. J. Cullip, and V. L. Interrante, "Multiscale,
geometric image descriptions for interactive object definition", in Mustererkennung 1989 (Proc. 11th Symposium
of DAGM), Informatik-Fachberichte 219, pp. 229-239, DAGM [The German Association for Pattern
Recognition], Springer-Verlag, 1989.
[577] S. M. Pizer, J. M. Gauch, T. J. Cullip, and R. E. Fredericksen, "Descriptions of intensity structure via scale
and symmetry", in Proceedings First Conf. on Visualization in Biomedical Computing, pp. 94-101, 1990.
[578] S. M. Pizer and B. M. ter Haar Romeny, "Fundamental properties of medical image perception", Journal of
Digital Imaging, vol. 4, pp. 1-20, Febr. 1990.
[579] S. M. Pizer, C. A. Burbeck, J. M. Coggins, D. S. Fritsch, and B. S. Morse, "Object shape before boundary
shape: Scale-space medial axes", Journal of Mathematical Imaging and Vision, vol. 4, no. 3, pp. 303-313, 1994.
[580] T. Poggio, H. Voorhees, and A. Yuille, "A regularized solution to edge detection", AI memo 833, MIT,
May 1985.
[581] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory", Nature, vol. 317, pp.
314-319, 1985.
[582] G. F. Poggio, F. Gonzalez, and F. Krause, "Stereoscopic mechanisms in monkey visual cortex: Binocular
correlation and disparity sensitivity", J. Neurosc, vol. 8, no. 12, pp. 4531-4550, 1988.
[583] D. A. Pollen and S. F. Ronner, "Phase Relationships Between Adjacent Simple Cells in the Visual Cortex",
Science, Vol. 212, 1981.
[584] D. A. Pollen and S. F. Ronner, "Spatial computation performed by simple and complex cells in the visual
cortex of the cat", Vision Research, vol. 22, pp. 101-118, 1982.
[585] E. Pöppel, "Time Perception", In: Handbook of Sensory Physiology, R. Held, H. W. Leibowitz, H.-L.
Teuber, eds., pp. 713-729, Springer, Heidelberg (1978).
[586] T. Poston and I. Stewart, "Catastrophe theory and its applications". London: Pitman, 1978.
[587] F. Preteux, "Watershed and skeleton by influence zones: A distance-based approach", Journal of
Mathematical Imaging and Vision, vol. 1, pp. 239-256, September 1992.
[588] K. H. Pribram, "Brain and perception: Holonomy and structure in figural processing". Lawrence Erlbaum
Associates, 1991.
[589] M. Proesmans, E. Pauwels, and L. Van Gool, "Coupled geometry-driven diffusion equations for low level
vision", in Geometry-Driven Diffusion in Computer Vision (B. M. ter Haar Romeny, ed.), pp. 191-228, Kluwer
Academic Publishers B.V., 1994.
[590] M. H. Protter and H. F. Weinberger, "Maximum principles in differential equations". New York:
Prentice-Hall, 1984.
[591] D. Puff, D. Eberly, and S. Pizer, "Object-based interpolation via the multiscale medial axis", in Proc. SPIE
Medical Imaging VIII, February 1994.
[592] E. Radmoser, O. Scherzer, J. Weickert, "Scale-space properties of regularization methods",
M. Nielsen, P. Johansen, O.F. Olsen, J. Weickert (Eds.), Scale-space theories in computer vision, Lecture Notes
in Computer Science, Vol. 1682, Springer, Berlin, 211-222, 1999.
[593] E. Radmoser, O. Scherzer, J. Weickert, "Scale-space properties of nonstationary iterative regularization
methods", Journal of Visual Communication and Image Representation (Special Issue on Scale-Space Theories in
Computer Vision, invited paper), 2000.
[594] S. V. Raman, S. Sarkar, and K. L. Boyer, "Tissue boundary refinement in magnetic resonance images using
contour-based scale-space matching", IEEE Tr. on Medical Imaging, vol. 10, pp. 109-121, June 1991.
[595] K. Rangarajan, M. Shah, and D. Van Brackle, "Optimal corner detector", in Proc. IEEE ICCV, (Tampa,
FL), pp. 90-94, 1988.
[596] A. R. Rao and B. G. Schunk, "Computing oriented texture fields", Computer Vision, Graphical Models and
Image Processing, vol. 53, pp. 157-185, 1991.
[597] R. P. N. Rao, B. A. Olshausen, and Michael S. Lewicki (Eds.), "Probabilistic models of the brain:
perception and neural function", MIT Press, 2001.
[598] Lord Rayleigh, "The principle of similitude", Nature, vol. XCV, pp. 66-68, 644, March 1915.
[599] W. E. Reichardt, "Autocorrelation, a principle for the evaluation of sensory information by the central
nervous system", in Sensory Communication (W. A. Rosenblith, ed.), MIT Press, Cambridge, Mass., 1961.
[600] W. E. Reichardt, "Movement perception in insects", In W. E. Reichardt (ed.), Processing of optical data by
organisms and by machines. New York, Academic Press, 1961.
[601] W. E. Reichardt and M. Egelhaaf, "Properties of individual movement detectors as derived from
behavioural experiments on the visual system of the fly", Biological Cybernetics, vol. 58, pp. 287-294, 1988.
[602] Z. Réti, "Deblurring images blurred by the discrete Gaussian", Applied Mathematics Letters, vol. 8, no. 4,
pp. 29-35, 1995.
[603] W. Richards, ed., Natural Computation. Cambridge, Ma.: MIT Press, 1988.
[604] J. H. Rieger, "Generic evolutions of edges on families of diffused greyvalue surfaces", Journal of
Mathematical Imaging and Vision, vol. 5, pp. 207-217, September 1995.
[605] D. L. Ringach, G. Sapiro and R. Shapley, "A subspace reverse correlation technique for the study of visual
neurons", Vision Research, Vol 37, No 17, pp. 2455-2464, 1997.
[606] J. Rissanen, "Modeling by the shortest data description," Automatica, vol. 14, pp. 465-471, 1978.
[607] G. X. Ritter and J. N. Wilson, "Handbook of computer vision algorithms in image algebra", CRC Press,
Boca Raton, 2001.
[608] R. W. Rodieck, "The first steps in seeing". Sinauer Associates, Inc., Sunderland MA, 1998.
[609] A. Rosenfeld and M. Thurston, "Edge and curve detection for visual scene analysis", IEEE Trans. on
Computers, vol. C-20, pp. 562-569, May 1971.
[610] A. Rosenfeld, "Multiresolution Image Processing and Analysis", vol. 12 of Springer Series in Information
Sciences. Springer-Verlag, 1984.
[611] P. L. Rosin, A. C. F. Colchester, and D. J. Hawkes, "Early image representation using regions defined by
maximum gradient profiles between singular points", Pattern Recognition, vol. 25, no. 7, pp. 695-711, 1992.
[612] P. L. Rosin, "Representing curves at their natural scales", Pattern Recognition, vol. 25, no. 11, pp. 1315-1325,
1992.
[613] J. Rubner and K. Schulten, "Development of feature detectors by self-organization", Biological
Cybernetics, vol. 62, pp. 193-199, 1990.
[614] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms", Physica
D, vol. 60, pp. 259-268, 1992.
[615] H. Ruskeepää, "Mathematica Navigator: Graphics and Methods of Applied Mathematics". Academic Press,
London, 1999. ISBN 0126036403.
[616] J. C. Russ, The Image Processing Handbook. Boca Raton: CRC Press, 1994. Second Edition.
[617] P. Saint-Marc, J. S. Chen, and G. Medioni, "Adaptive smoothing: A general tool for early vision", IEEE Tr.
on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 514-529, 1991.
[618] A. H. Salden, B. M. ter Haar Romeny, L. M. J. Florack, J. J. Koenderink, and M. A. Viergever, "A
complete and irreducible set of local orthogonally invariant features of 2-dimensional images", in Proceedings
11th IAPR Internat. Conf. on Pattern Recognition (I. T. Young, ed.), (The Hague, the Netherlands), pp. 180-184,
IEEE Computer Society Press, Los Alamitos, August 30-September 3 1992.
[619] A. H. Salden, L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "Multi-
scale analysis and description of image structure", Nieuw Archief voor Wiskunde, vol. 10, no. 3, pp. 309-326,
1992.
[620] A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Image structure generating normalised
geometric scale spaces", in Volume Image Processing ’93 (M. A. Viergever, ed.), (Utrecht, the Netherlands), pp.
141-143, 1993.
[621] A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Dynamic scale-space theories", in Proc. Conf.
on Differential Geometry and Computer Vision: From Pure over Applicable to Applied Differential Geometry,
(Nordfjordeid, Norway), August 1-7 1995.
[622] A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Classical scale space theory from physical
principles", Journal of Mathematical Imaging and Vision, 1998.
[623] A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Linear scale space theory from physical
properties", J. of Mathematical Imaging and Vision, vol. 9. no.2, pp. 103-140, 1998.
[624] A. H. Salden, B. M. ter Haar Romeny, and M. A. Viergever, "Algebraic invariants of linear scale spaces."
Journal of Mathematical Imaging and Vision, March 1999.
[625] P. Sander and S. W. Zucker, "Singularities of principal direction fields from 3D images", IEEE Tr. on
Pattern Analysis and Machine Intelligence, vol. 14, no. 3, pp. 309-317, 1992.
[626] W. Sanns, "Catastrophe theory with Mathematica, a geometric approach", Der Andere Verlag, Osnabrück,
Germany. ISBN 3-934366-76-7.
[627] G. Sapiro and A. Tannenbaum, "Affine invariant scale-space", Intern. Journal of Computer Vision, vol. 11,
pp. 25-44, 1993.
[628] G. Sapiro and A. Tannenbaum, "On invariant curve evolution and image analysis", Indiana Journal of
Mathematics, vol. 42, no. 3, pp. 985-1009, 1993.
[629] G. Sapiro and A. Tannenbaum, "Area and length preserving geometric invariant scale-spaces", Tech. Rep.
LIDS-2200, MIT, 1993. Accepted for publication in IEEE-PAMI. Also in Proc. ECCV ’94, Stockholm, May 1994.
[630] G. Sapiro, "From active contours to anisotropic diffusion: connections between basic pde’s in image
processing", in Proc. third Intern. Conf. on Image Processing (P. Delogne, ed.), pp. 477-480, IEEE, 1996.
[631] W. Sanns, "Catastrophe theory with Mathematica, a geometric approach", Der Andere Verlag, ISBN
3934366767, 2000.
[632] H. Scharr, J. Weickert, "An anisotropic diffusion algorithm with optimized rotation invariance",
G. Sommer, N. Krüger, C. Perwass (Eds.), Mustererkennung 2000, Springer, Berlin, 460-467, 2000.
[633] O. Scherzer and J. Weickert, "Relations between regularization and diffusion filtering", J. of Math. Imaging
and Vision, vol. 12, no. 1, pp. 43-63, 2000. Revised version of Technical Report DIKU-98/23, Dept. of Computer
Science, University of Copenhagen, Denmark, 1998.
[634] C. Schmid and R. Mohr, "Combining greyvalue invariants with local constraints for object recognition", in
Proc. Intern. Conf. on Computer Vision and Pattern Recognition CVPR, (San Francisco), IAPR, June 16-20 1996.
[635] C. Schmid and R. Mohr, "Object recognition using local characterization and semi-local constraints", tech.
rep., INRIA, 1996.
[636] C. Schnörr, J. Weickert, "Variational image motion computation: theoretical framework, problems and
perspectives", G. Sommer, N. Krüger, C. Perwass (Eds.), Mustererkennung 2000, Springer, Berlin, 476-487,
2000. Invited paper.
[637] I. J. Schönberg, "On smoothing operations and their generating functions", Bull. Amer. Math. Soc., vol. 59,
pp. 199-230, 1953.
[638] B. G. Schunck, "The motion constraint equation for optical flow", in Proceedings of the 7th Intern. Conf. on
Pattern Recognition, pp. 22-24, 1984.
[639] L. Schwartz, "Théorie des distributions", vol. I, II of Actualités scientifiques et industrielles; Vol. 1091 and
1122. Paris: Publications de l’Institut de Mathématique de l’Université de Strasbourg, 1950-1951. See also:
Hermann, Paris, 1951, 2nd edition 1966.
[640] E. L. Schwartz, "Topographical mapping in primate visual cortex: history, anatomy, and computation". In
D.H. Kelly, editor, Visual Science and Engineering: models and applications, chapter 8, pages 293-360. Marcel
Dekker, Inc, New York, 1994.
[641] A. Seckel, "The art of optical illusions", Carlton Books Ltd., 2000.
[642] J. Serra, Image Analysis and Mathematical Morphology. London, New York, Paris, San Diego, San
Francisco, São Paulo, Sydney, Tokyo and Toronto: Academic Press, 1982.
[643] J. A. Sethian, An Analysis of Flame Propagation. Ph.D. thesis, Dept. of Mathematics, University of
California, Berkeley, CA, 1982.
[644] J. A. Sethian, "Curvature and the evolution of fronts", Communications in Mathematical Physics, vol. 101, pp.
487-499, 1985.
[645] J. A. Sethian, "A review of recent numerical algorithms for hypersurfaces moving with curvature dependent
speed", J. Differential Geometry, vol. 31, pp. 131-161, 1989.
[646] J. A. Sethian, "Fast Marching Methods and Level Set Methods: Evolving Interfaces in Computational
Geometry, Fluid Mechanics, Computer Vision and Materials Sciences", Cambridge University Press, 1999.
[647] J. Shah, "Segmentation by nonlinear diffusion", Proc. Conf. on Computer Vision and Pattern Recognition,
pp. 202-207, June 1991.
[648] J. Shah, "Segmentation by minimizing functionals: Smoothing properties", SIAM J. Control and
Optimization, vol. 30, pp. 99-111, January 1992.
[649] S. M. Sherman and C. Koch, "The control of retinogeniculate transmission in the mammalian lateral
geniculate nucleus", Experimental Brain Research, vol. 63, pp. 1-20, 1986.
[650] S. M. Sherman and C. Koch, "Thalamus", in The Synaptic Organization of the Brain (G. M. Shepherd, ed.),
pp. 246-278, New York: Oxford University Press, 1990. Third edition.
[651] S. M. Sherman, "Dynamic gating of retinal transmission to the visual cortex by the lateral geniculate
nucleus", in Thalamic Networks for Relay and Modulation (D. Minciacchi, M. Molinari, G. Macchi, and E. G.
Jones, eds.), pp. 61-79, Oxford: Pergamon Press, 1993.
[652] S. M. Sherman, "Dual response modes in lateral geniculate neurons: Mechanisms and functions", Visual
Neuroscience, vol. 13, no. 2, pp. 205-213, 1996.
[653] D. A. Sholl, "Dendritic organization in the neurons of the visual cortices of the cat". Journal of Anatomy,
87: 387-406, 1953.
[654] A. Shokoufandeh, S. Dickinson, C. Jonsson, L. Bretzner, and T. Lindeberg, "On the representation and
matching of qualitative shape at multiple scales", Proceedings European Conference on Computer Vision,
Copenhagen, May, pp. 759-775, 2002.
[655] D. Shy and P. Perona, "X-y separable pyramid steerable scalable kernels", in Proc. IEEE Computer Soc.
Conf. on Computer Vision and Pattern Recognition, CVPR’94, pp. 237-244, IEEE, 1994.
[656] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. W. Zucker, "Shock graphs and shape matching", Int. J.
of Computer Vision, 35(1): 13-32, 1999.
[657] J. G. Simmonds, A Brief on Tensor Analysis. Undergraduate Texts in Mathematics, Springer-Verlag, 1995.
Second Edition.
[658] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. Heeger, "Shiftable multi-scale transforms",
IEEE Tr. on Information Theory, vol. 38, pp. 587-607, March 1992.
[659] E. P. Simoncelli and W. Freeman, "The steerable pyramid: a flexible architecture for multi-scale derivative
computation", in Proc. of the 2nd Annual IEEE Int. Conf. on Image Processing, IEEE, Oct. 1995.
[660] E. P. Simoncelli and H. Farid, "Steerable wedge filters for local orientation analysis", IEEE Tr. on Image
Processing, vol. 5, no. 9, pp. 1377, 1996.
[661] E. P. Simoncelli, "A rotation invariant pattern signature", in Proc. of the 3rd IEEE Int. Conf. on Image
Processing, pp. 185-188, 1996.
[662] E. P. Simoncelli, "A rotation invariant pattern signature", in Proc. of the 3rd IEEE Int. Conf. on Image
Processing, pp. 185-188, 1996.
[663] W. Snyder, Y.-S. Han, G. Bilbro, R. Whitaker, and S. Pizer, "Image relaxation: Restoration and feature
extraction", IEEE Tr. PAMI, vol. 17, no. 6, pp. 620-624, 1995.
[664] G. Sperling and Z. L. Lu, "A systems analysis of visual motion perception". In: High-level motion
processing. Takeo Watanabe (Ed). Cambridge MA: MIT Press, pp.153-183, 1998.
[665] L. Spillmann and J. S. Werner, Visual Perception: the Neurophysiological Foundations. Academic Press
Inc., 1989.
[666] M. Spivak, Calculus on Manifolds. New York, New York, USA: W. A. Benjamin, Inc., 1965.
[667] M. Spivak, A Comprehensive Introduction to Differential Geometry, vol. I-V. Houston, Texas: Publish or
Perish, Inc., second edition, 1979.
[668] J. Sporring, "The entropy of scale-space", in Proceedings 13th ICPR, Austria, 1996.
[669] J. Sporring, M. Nielsen, L. Florack, and P. Johansen (eds.), "Gaussian Scale-Space". Dordrecht: Kluwer Academic
Publishers, 1996.
[670] J. Sporring and J. Weickert, "On generalized entropies and scale-space", in Scale-Space Theory in
Computer Vision (B. ter Haar Romeny, L. Florack, J. Koenderink, and M. Viergever, eds.), vol. 1252 of Lecture
Notes in Computer Science, pp. 53-64, Springer, Berlin, 1997.
[671] J. Sporring, M. Nielsen, J. Weickert, O.F. Olsen, "A note on differential corner measures", Proc. 14th Int.
Conf. Pattern Recognition (ICPR 14, Brisbane, Aug. 17-20, 1998), IEEE Computer Society Press, Los Alamitos,
Vol. 1, 652-654, 1998.
[672] J. Sporring, M. Nielsen, O.F. Olsen, J. Weickert, "Smoothing images creates corners", Image and Vision
Computing, Vol. 18, 261-266, 2000. Revised version of Technical Report DIKU-98/1, Dept. of Computer
Science, University of Copenhagen, Denmark, 1998.
[673] J. Staal, S. Kalitzin, B. M. ter Haar Romeny and M. A. Viergever, "Detection of critical structures in scale-
space", Lecture Notes in Computer Science, vol. 1682, pp. 105-116, 1999.
[674] P. Tavan, H. Grubmüller, and Kühnel, "Self-organization of associative memory and pattern classification:
Recurrent signal processing on topological feature maps", Biological Cybernetics, vol. 64, pp. 95-105, 1990.
[675] P. C. Teo, Y. Hel-Or, "Lie generators for computing steerable functions", Pattern Recognition Letters, vol.
19, pp. 7-17, 1998.
[676] B. M. ter Haar Romeny, L. M. J. Florack, J. J. Koenderink, and M. A. Viergever, "Invariant third order
properties of isophotes: T-junction detection", in Proc. 7th Scand. Conf. on Image Analysis (P. Johansen and S.
Olsen, eds.), Aalborg DK, pp. 346-353, August 1991. Also in: Theory & Applications of Image Analysis (P.
Johansen and S. Olsen, eds.), vol. 2 of Series in Machine Perception and Artificial Intelligence, pp. 30-37,
Singapore: World Scientific, 1992.
[677] B. M. ter Haar Romeny, L. M. J. Florack, J. J. Koenderink, and M. A. Viergever, "Scale-space: Its natural
operators and differential invariants", in Information Processing in Medical Imaging (A. C. F. Colchester and D.
J. Hawkes, eds.), vol. 511 of Lecture Notes in Computer Science, pp. 239-255, Springer-Verlag, Berlin, July 1991.
[678] B. M. ter Haar Romeny and L. M. J. Florack, "A multiscale geometric model of human vision", in
Perception of Visual Information (W. R. Hendee and P. N. T. Wells, eds.), ch. 4, pp. 73-114, Berlin: Springer-
Verlag, 1993. Second edition 1996.
[679] B. M. ter Haar Romeny, L. M. J. Florack, A. H. Salden, and M. A. Viergever, "Higher order geometrical
image structure", in Proc. Information Processing in Medical Imaging ’93, Flagstaff AZ (H. Barrett, ed.), (Berlin),
pp. 77-93, Springer-Verlag, 1993.
[680] B. M. ter Haar Romeny, L. M. J. Florack, M. de Swart, J. Wilting, and M. A. Viergever, "Deblurring
Gaussian blur", in Proceedings Mathematical Methods in Medical Imaging II, vol. 2299, (San Diego, CA), pp.
139-148, SPIE, July 25-26, 1994.
[681] B. M. ter Haar Romeny, W. J. Niessen, J. Wilting, and L. M. J. Florack, "Differential structure of images:
Accuracy of representation", in Proc. First IEEE Internat. Conf. on Image Processing, (Austin, TX), pp. 21-25,
IEEE, November, 13-16 1994.
[682] B. M. ter Haar Romeny (ed.), "Geometry-driven diffusion in computer vision". Dordrecht: Kluwer
Academic Publishers, 1994.
[683] B. M. ter Haar Romeny, "Scale-space research at Utrecht University", in Proc. 12th Intern. Conf. on
Analysis and Optimization of Systems: Images, Wavelets and PDE’s (M.-O. Berger, R. Deriche, I. Herlin, J.
Jaffré, and J.-M. Morel, eds.), Lecture Notes in Control and Information Sciences, vol. 219, pp. 15-30, Springer,
London, June 26-28 1996.
[684] B. M. ter Haar Romeny, W. J. Niessen, J. Weickert, P. van Roermund, W. van Enk, A. Lopez, and R. Maas,
"Orientation detection of trabecular bone", in Progress in Biophysics and Molecular Biology, vol. 65, pp. P-
H5-43, August 11-16 1996. Proc. 12th Intern. Biophysics Congress.
[685] B. M. ter Haar Romeny, "Applications of scale-space theory", in Gaussian Scale-Space Theory (J. Sporring,
M. Nielsen, L. Florack, and P. Johansen, eds.), Computational Imaging and Vision, pp. 3-19, Dordrecht: Kluwer
Academic Publishers, 1997.
[686] B. M. ter Haar Romeny, L. M. J. Florack, J. J. Koenderink, and M. A. Viergever, eds., "Scale-Space ’97:
Proc. First Internat. Conf. on Scale-Space Theory in Computer Vision", vol. 1252 of Lecture Notes in Computer
Science. Berlin: Springer Verlag, 1997.
[687] B. M. ter Haar Romeny, B. Titulaer, S. Kalitzin, G. Scheffer, F. Broekmans and E. te Velde, "Computer
assisted human follicle analysis for fertility prospects with 3D ultrasound", Proceedings Intern. Conf. on
Information processing in Medical Imaging (IPMI99), vol. 1613, Lecture Notes in Computer Science, Springer-
Verlag, Heidelberg, 1999.
[688] B. M. ter Haar Romeny, L.M.J. Florack, "Front-End Vision, a Multiscale Geometry Engine". Proc. First
IEEE Intern. Workshop on Biologically Motivated Computer Vision (BMCV2000), May 15-17, 2000, Seoul,
Korea. Lecture Notes in Computer Science, 2000.
[689] B. M. ter Haar Romeny, "Computer Vision and Mathematica", J. of Computing and Visualization in
Science, 2002.
[690] E. ter Haar Romeny, "Edlef ter Haar Romeny, painter", Van Gruting Publishers, Westervoort, the
Netherlands, 2002.
[691] D. Terzopoulos, "Regularization of inverse visual problems involving discontinuities", IEEE Tr. on Pattern
Analysis and Machine Intelligence, vol. 8, pp. 413-424, 1986.
[692] J.-P. Thirion, S. Benayoun, "Image surface extremal points, new feature points for image registration", INRIA
Technical Report RR-2003, 1993.
[693] J.-P. Thirion and A. Gourdon, "Computing the differential characteristics of isodensity surfaces", Computer
Vision, Graphics, and Image Processing: Image Understanding, vol. 61, pp. 190-202, March 1995.
[694] J.-P. Thirion, "The extremal mesh and the understanding of 3D surfaces", Intern. J. of Computer Vision,
vol. 19, no. 2, pp. 115-128, 1996.
[695] R. Thom, "Structural stability and Morphogenesis" (transl. D. H. Fowler). New York: Benjamin-Addison
Wesley, 1975.
[696] D. W. Thompson, "On Growth and Form". Cambridge University Press, 1942.
[697] P. Thompson, "Margaret Thatcher: a new illusion". Perception, vol. 9, pp. 483-484, 1980.
[698] A. N. Tikhonov and V. Y. Arsenin, "Solution of Ill-Posed Problems". Washington DC: Winston and Wiley,
1977.
[699] M. Tistarelli and G. Sandini, "On the advantages of polar and log-polar mapping for direct estimation of
time-to-impact from optical flow", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 15, pp. 401-416,
April 1993.
[700] A. Toet, J. Blom and J. J. Koenderink, "The construction of a simultaneous functional order in nervous
systems", Biol. Cybern., vol. 57, pp. 115-125, 127-136, 331-340, 1987.
[701] V. Torre and T. A. Poggio, "On edge detection", IEEE Tr. on Pattern Analysis and Machine Intelligence,
vol. 8, no. 2, pp. 147-163, 1986.
[702] D. Y. Ts’o, R. D. Frostig, E. E. Lieke, and A. Grinvald, "Functional organization of primate visual cortex
revealed by high resolution optical imaging". Science 249: 417-20, 1990.
[703] W. A. van de Grind, J. J. Koenderink and A. J. van Doorn, "Motion detection from photopic to low scotopic
luminance levels". Vision Research, vol. 40, no. 2, pp. 187-199, 1999.
[704] R. van den Boomgaard, "The morphological equivalent of the Gauss convolution", Nieuw Archief voor
Wiskunde (in English), vol. 10, pp. 219-236, November 1992.
[705] R. van den Boomgaard and A. W. M. Smeulders, "Morphological multi-scale image analysis", in
Mathematical Morphology and its Applications to Signal Processing (J. Serra and P. Salembier, eds.), (Barcelona,
Spain), pp. 180-185, Universitat Politècnica de Catalunya, May 1993.
[706] R. van den Boomgaard and A. W. M. Smeulders, "The morphological structure of images, the differential
equations of morphological scale-space", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 16, pp.
1101-1113, November 1994.
[707] R. van den Boomgaard and L. Dorst, "The morphological equivalent of Gaussian scale-space", Gaussian
Scale-Space Theory, J. Sporring and M. Nielsen and L. Florack and P. Johansen (eds), Series: Computational
Imaging and Vision, Kluwer Academic Publishers, pp 203-220, 1997.
[708] P. van den Elsen, E. J. D. Pol, J. B. A. Maintz, and M. A. Viergever, "Image fusion using geometrical
features", in SPIE Vol. 1808 Visualization in Biomedical Computing (R. A. Robb, ed.), (Bellingham, WA), SPIE
Press, 1992.
[709] P. van den Elsen and M. A. Viergever, "Fully automated CT and MR brain image registration by
correlation of geometrical features", in Proc. Information Processing in Medical Imaging ’93, Flagstaff AZ (H.
Barrett, ed.), Berlin, Springer Verlag, 1993.
[710] P. A. van den Elsen, J. B. A. Maintz, E. J. D. Pol, and M. A. Viergever, "Automatic registration of CT and
MR brain images using correlation of geometrical features", IEEE Tr. on Medical Imaging, vol. 14, no. 2, pp.
384-398, 1995.
[711] B. van Ginneken and B. M. ter Haar Romeny, "Applications of locally orderless images", In: Scale-space
theories in computer vision, Lecture Notes in Computer Science, vol. 1682, pp. 10-21, Springer, Berlin, 1999.
[712] B. van Ginneken and B. M. ter Haar Romeny, "Applications of locally orderless images", Journal of Visual
Communication and Image Representation, vol. 11, no. 2, pp. 196-208, June 2000.
[713] K. L. Vincken, C. N. de Graaf, A. S. E. Koster, M. A. Viergever, F. J. R. Appelman, and G. R. Timmens,
"Multiresolution segmentation of 3D images by the hyperstack", in Proc. First Conf. on Visualization in
Biomedical Computing, pp. 115-122, Los Alamitos, CA: IEEE Computer Society Press, 1990.
[714] K. L. Vincken, W. J. Niessen, and M. A. Viergever, "Blurring strategies for image segmentation using a
multiscale linking model", in Proc. Computer Vision and Pattern Recognition, (San Francisco, CA), pp. 21-26,
IEEE Computer Society Press, 1996.
[715] K. L. Vincken, A. S. E. Koster, and M. A. Viergever, "Probabilistic multiscale image segmentation", IEEE
Tr. on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 109-120, 1997.
[716] R. von der Heydt and E. Peterhans, "Mechanisms of contour perception in monkey visual cortex I. Lines of
pattern discontinuity", Journal of Neuroscience, vol. 9, no. 5, pp. 1731-1748, 1989.
[717] G. Wallis, "Neural Mechanisms Underlying Processing in the Visual Areas of the Occipital and Temporal
Lobes", PhD thesis Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 1994. URL:
www.kyb.tuebingen.mpg.de/bu/people/guy/allrdalla.html.
[718] B. A. Wandell, Foundations of Vision. Sunderland MA: Sinauer Associates, Inc., 1995.
[719] H. Wässle, Visual neuroscience, ch. Sampling of visual space by retinal ganglion cells, pp.
19-32. Cambridge University Press, 1986. J. D. Pettigrew, K. J. Sanderson and W. R. Levick, Eds.
[720] H. Wässle, U. Gruenert, J. Roehrenbeck, and B. Boycott, "Retinal ganglion cell density and cortical
magnification factor in the primate", Vision Research, vol. 30, pp. 1897-1911, 1990.
[721] H. Wässle and B. B. Boycott. "Functional architecture of the mammalian retina". Physiological Review,
71:447-480, 1991.
[722] A. B. Watson, "Summation of grating patches indicate many types of detector at one retinal location",
Vision Research, vol. 22, 17-25, 1982.
[723] A. B. Watson, "The cortex transform: Rapid computation of simulated neural images", Computer Vision,
Graphics, and Image Processing, 39 (3), 311-327, 1987.
[724] D. J. Watts, P. Sheridan Dodds, M. E. J. Newman: "Identity and search in social networks", Science vol.
296, no. 5571, pp. 1302-1305, 17 May 2002.
[725] J. Weber and J. Malik, "Robust computation of optical-flow in a multiscale differential framework", Intern.
Journal of Computer Vision, vol. 14, no. 1, pp. 67-81, 1995.
[726] J. Weickert, "Anisotropic diffusion filters for image processing based quality control", in Proc. Seventh
European Conf. on Mathematics in Industry (A. Fasano and M. Primicerio, eds.), pp. 355-362, Teubner, Stuttgart,
1994.
[727] J. Weickert, "Scale-space properties of nonlinear diffusion filtering with a diffusion tensor", Tech. Rep.
110, Laboratory of Technomathematics, Univ. of Kaiserslautern, Germany, October 1994.
[728] J. Weickert, "Multiscale texture enhancement", in Computer Analysis of Images and Patterns (V. Hlavac
and R. Sara, eds.), vol. 970 of Lecture Notes in Computer Science, pp. 230-237, Springer, Berlin, 1995.
[729] J. Weickert, "Foundations and applications of nonlinear anisotropic diffusion filtering", Z. Angew. Math.
Mech., Suppl. 1, vol. 76, pp. 283-286, 1996.
[730] J. Weickert, "Theoretical foundations of anisotropic diffusion in image processing", Computing Suppl., vol.
11, pp. 221-236, 1996.
[731] J. Weickert, "Nonlinear diffusion scale-spaces: From the continuous to the discrete setting", in ICAOS ’96:
Images, Wavelets and PDEs (M.-O. Berger, R. Deriche, I. Herlin, J. Jaffré, and J.-M. Morel, eds.), vol. 219 of
Lecture Notes in Control and Information Sciences, pp. 111-118, Springer, London, 1996.
[732] J. Weickert, "A model for the cloudiness of fabrics", in Progress in Industrial Mathematics at ECMI 94 (H.
Neunzert, ed.), pp. 258-265, Wiley-Teubner, Chichester, 1996.
[733] J. A. Weickert, B. M. ter Haar Romeny, and M. A. Viergever, "Conservative image transformations with
restoration and scale-space properties", in Proc. 1996 IEEE Int. Conf. Image Processing, vol. I, (ICIP-96,
Lausanne, Sept. 16-19, 1996), pp. 465-468, 1996.
[734] J. Weickert, S. Ishikawa, A. Imiya, "On the history of Gaussian scale-space axiomatics", in J. Sporring, M.
Nielsen, L. Florack, P. Johansen (Eds.), Gaussian scale-space theory, Kluwer, Dordrecht, 45-59, 1997.
[735] J. Weickert, "Recursive separable schemes for nonlinear diffusion filters", in Scale-Space Theory in
Computer Vision (B. ter Haar Romeny, L. Florack, J. Koenderink, and M. Viergever, eds.), vol. 1252 of Lecture
Notes in Computer Science, pp. 260-271, Springer, Berlin, 1997.
[736] J. Weickert and B. Benhamouda, "A semidiscrete nonlinear scale-space theory and its relation to the Perona-
Malik paradox", in Advances in Computer Vision (F. Solina, W. G. Kropatsch, R. Klette, and R. Bajcsy, eds.),
pp. 1-10, Springer, Wien, 1997.
[737] J. Weickert, "Nonlinear diffusion scale-spaces", in Gaussian Scale-Space Theory (J. Sporring, M. Nielsen,
L. Florack, and P. Johansen, eds.), pp. 221-234, Dordrecht: Kluwer, 1997.
[738] J. Weickert, K. J. Zuiderveld, B. M. ter Haar Romeny, and W. J. Niessen, "Parallel implementations of AOS
schemes: A fast way of nonlinear diffusion filtering", in Proc. 1997 IEEE Int. Conf. Image Processing, vol. 3,
(ICIP-97, Santa Barbara, Oct. 26-29, 1997), pp. 396-399, 1997.
[739] J. Weickert, "A review of nonlinear diffusion filtering", in Scale-Space Theory in Computer Vision (B. ter
Haar Romeny, L. Florack, J. Koenderink, and M. Viergever, eds.), vol. 1252 of Lecture Notes in Computer
Science, pp. 3-28, Springer, Berlin, 1997.
[740] J. Weickert, "Coherence-enhancing diffusion of colour images", A. Sanfeliu, J.J. Villanueva, J. Vitrià
(Eds.), Proc. VII National Symposium on Pattern Recognition and Image Analysis (VII NSPRIA, Barcelona,
April 21-25, 1997), Vol. 1, 239-244, 1997.
[741] J. Weickert, B. M. ter Haar Romeny, A. Lopez, and W. J. van Enk, "Orientation analysis by coherence-
enhancing diffusion", in Proc. Symposium on Real World Computing, (RWC ’97, Tokyo, Jan. 29-31, 1997), pp.
96-103, 1997.
[742] J. Weickert, "Anisotropic diffusion in image processing", ECMI Series, Teubner Verlag, Stuttgart, 1998.
ISBN 3-519-02606-6.
[743] J. Weickert, B. M. ter Haar Romeny, and M. A. Viergever, "Efficient and reliable schemes for nonlinear
diffusion filtering", IEEE Tr. on Image Processing, Vol. 7, 398-410, 1998.
[744] J. Weickert, "Fast segmentation methods based on partial differential equations and the watershed
transformation", P. Levi, R.-J. Ahlers, F. May, M. Schanz (Eds.), Mustererkennung 1998, Springer, Berlin,
93-100, 1998.
[745] J. Weickert, "On discontinuity-preserving optic flow", S. Orphanoudakis, P. Trahanias, J. Crowley, N.
Katevas (Eds.), Proc. Computer Vision and Mobile Robotics Workshop (CVMR ’98, Santorini, Sept. 17-18,
1998), 115-122, 1998.
[746] J. Weickert, S. Ishikawa, A. Imiya, "Linear scale-space has first been proposed in Japan", J. Mathematical
Imaging and Vision, Vol. 10, 237-252, 1999.
[747] J. Weickert, "Nonlinear diffusion filtering", B. Jähne, H. Haußecker, P. Geißler (Eds.), Handbook on
Computer Vision and Applications, Vol. 2: Signal Processing and Pattern Recognition, Academic Press, San
Diego, 423-450, 1999.
[748] J. Weickert, J. Heers, C. Schnörr, K.J. Zuiderveld, O. Scherzer, H.S. Stiehl, "Fast parallel algorithms for a
broad class of nonlinear variational diffusion approaches", Real-Time Imaging, 2001. Revised version of
Technical Report 5/1999, Computer Science Series, University of Mannheim, 68131 Mannheim, Germany, 1999.
[749] J. Weickert, "Coherence-enhancing diffusion filtering", Intern. Journal of Computer Vision, Vol. 31,
111-127, 1999.
[750] J. Weickert, "Coherence-enhancing diffusion of colour images", Image and Vision Computing, Vol. 17,
201-212, 1999.
[751] J. Weickert, "Design of nonlinear diffusion filters", B. Jähne, H. Haußecker (Eds.), Computer Vision and
Applications, Academic Press, San Diego, 439-458, 2000.
[752] J. Weickert, C. Schnörr, "PDE-based preprocessing of medical images", Künstliche Intelligenz, No. 3, 5-10,
2000. Revised version of Technical Report 8/2000, Computer Science Series, University of Mannheim, 68131
Mannheim, Germany, February 2000.
[753] J. Weickert, C. Schnörr, "Variational optic flow computation with a spatio-temporal smoothness
constraint", Journal of Mathematical Imaging and Vision, 2001. Revised version of Technical Report 15/2000,
Computer Science Series, University of Mannheim, 68131 Mannheim, Germany, July 2000.
[754] J. Weickert, "Efficient image segmentation using partial differential equations and morphology", Pattern
Recognition, 2001. Also available as Technical Report 3/2000, Computer Science Series, University of
Mannheim, 68131 Mannheim, Germany, February 2000.
[755] J. Weickert, H. Scharr, "A scheme for coherence-enhancing diffusion filtering with optimized rotation
invariance", Journal of Visual Communication and Image Representation, 2001. Revised and shortened version
of Technical Report 4/2000, Computer Science Series, University of Mannheim, 68131 Mannheim, Germany,
February 2000.
[756] R. Weitzenböck, "Invariantentheorie". Groningen: P. Noordhoff, 1923.
[757] G. B. West. "Scale and Dimension - From animals to quarks". In: Particle Physics, a Los Alamos Primer.
N.C. Cooper and G. B. West, Eds. Cambridge University Press, Cambridge 1988.
[758] H. Weyl, "The Classical Groups, their Invariants and Representations". Princeton, NJ: Princeton University
Press, 1946.
[759] H. Weyl, "Symmetry". Princeton, NJ: Princeton University Press, 1983 (reprint of 1952).
[760] R. T. Whitaker and S. M. Pizer, "A multi-scale approach to nonuniform diffusion", Tech. Rep. TR91-040,
Medical Image Display Group, Department of Radiation Oncology, The University of North Carolina, Chapel
Hill, NC 27599-3175, September 1991.
[761] R. T. Whitaker and S. M. Pizer, "A multi-scale approach to nonuniform diffusion", Computer Vision,
Graphics, and Image Processing: Image Understanding, vol. 57, pp. 99-110, January 1993.
[762] R. T. Whitaker, "Geometry-limited diffusion in the characterization of geometric patches in images",
Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 57, pp. 111-120, January 1993.
[763] R. T. Whitaker and S. M. Pizer, "Geometry-based image segmentation using anisotropic diffusion", in Proc.
of the NATO Advanced Research Workshop Shape in Picture -- Mathematical Description of Shape in Greylevel
Images (Y.-L. O, A. Toet, H. J. A. M. Heijmans, D. H. Foster, and P. Meer, eds.), vol. 126 of NATO ASI Series
F, pp. 641-650, Springer Verlag, Berlin, 1994.
[764] R. Whitaker and G. Gerig, "Vector-valued diffusion", in Geometry-Driven Diffusion in Computer Vision
(B. M. ter Haar Romeny, ed.), Computational Imaging and Vision, pp. 93-134, Kluwer Academic Publishers,
1994.
[765] D. J. Williams and M. Shah, "Edge contours using multiple scales", Computer Vision, Graphics, and Image
Processing, vol. 51, pp. 256-274, 1990.
[766] R. W. Williams, "The human retina has a cone-enriched rim", Visual Neuroscience, vol. 6, pp. 403-406,
1991.
[767] R. Wilson and A. H. Bhalerao, "Kernel design for efficient multiresolution edge detection and orientation
estimation", IEEE Tr. on Pattern Analysis and Machine Intelligence, vol. 14, no. 3, pp. 384-390, 1992.
[768] A. G. Wilson and V. E. Johnson, "Priors on scale-space templates", in Proc. Mathematical Methods in
Medical Imaging II, vol. 2299, (San Diego, CA), pp. 161-168, SPIE, July, 25-26 1994.
[769] R. A. Wilson and F. Keil, "The MIT Encyclopedia of the Cognitive Sciences", the MIT Press, 1999.
[770] A. Witkin, "Scale-space filtering", in Proc. Intern. Joint Conf. on Artificial Intelligence, (Karlsruhe,
Germany), pp. 1019-1023, 1983.
[771] A. Witkin, "Scale-space filtering: A new approach to multi-scale description", in Image Understanding
1984 (S. Ullman and W. Richards, eds.), NJ: Norwood Ablex, 1984.
[772] A. Witkin, D. Terzopoulos, and M. Kass, "Signal matching through scale-space", Intern. Journal of
Computer Vision, vol. 1, no. 2, pp. 134-144, 1988.
[773] S. Wolfram, "Mathematica: A System for doing Mathematics by Computer", Addison-Wesley, 1999.
Version 4.
[774] S. Wolfram, "A new kind of science", Addison-Wesley, 2002.
[775] G. Wyszecki and W. S. Stiles, "Color science: concepts and methods, quantitative data and formulae",
Wiley Series in Pure and Applied Optics, 2000.
[776] R. A. Young, "Oh say can you see? The physiology of vision", publication GMR-7364, General Motors
Research Labs, Computer Science Dept., 30500 Mound Road, Box 9055, Warren, Michigan 48090-9055, May 31,
1991.
[777] R. A. Young, "The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-
weighting profiles", publication GMR-4920, General Motors Research Labs, Computer Science Dept., 30500
Mound Road, Box 9055, Warren, Michigan 48090-9055, May 28 1985.
[778] R. A. Young, "The Gaussian derivative model for machine vision: Visual cortex simulation", publication
GMR-5323, General Motors Research Labs, Computer Science Dept., 30500 Mound Road, Box 9055, Warren,
Michigan 48090-9055, July 7 1986.
[779] R. A. Young, "Simulation of human retinal function with the Gaussian derivative model", in Proc. IEEE
CVPR CH2290-5, (Miami, Fla.), pp. 564-569, 1986.
[780] R. A. Young, "The Gaussian derivative model for machine vision: I. retinal mechanisms", Spatial Vision,
vol. 2, no. 4, pp. 273-293, 1987.
[781] A. L. Yuille and T. Poggio, "Fingerprint theorems for zero crossings", JOSA A, vol. 2, pp. 683-692, May
1985.
[782] A. L. Yuille and T. A. Poggio, "Scaling theorems for zero-crossings", IEEE Tr. on Pattern Analysis and
Machine Intelligence, vol. 8, pp. 15-25, January 1986.
[783] A. L. Yuille and T. A. Poggio, "Scaling and fingerprint theorems for zero-crossings", in Advances in
Computer Vision (C. Brown, ed.), pp. 47-78, Lawrence Erlbaum, 1988.
[784] J. C. Zagal, E. Bj6rkman, T. Lindeberg, P. E. Roland, "Significance determination for the scale-space
primal sketch by comparison of statistics of scale-space blob volumes computed from PET signals versus residual
noise", HBM’2000, Intern. Conf. on Functional Mapping of the Human Brain, San Antonio, Texas, in press,
2000.
[785] S. Zeki, "A vision of the brain". Oxford: Blackwell Scientific Publications, 1993.
[786] J. Zhang and J. P. Miller, "A mathematical model for resolution enhancement in layered sensory systems",
Biological Cybernetics, vol. 64, pp. 357-364, 1991.
[787] S. W. Zucker and R. A. Hummel, "Receptive fields and the representation of visual information", in
Seventh Intern. Conf. on Pattern Recognition (Montreal, Canada, July 30-August 2, 1984), IEEE Publ.
84CH2046-1, pp. 515-517, IEEE, 1984.
[788] S. W. Zucker, "Early orientation selection: Tangent fields and the dimensionality of their support",
Computer Vision, Graphics, and Image Processing, vol. 32, pp. 74-103, 1985.
[789] S. W. Zucker and R. A. Hummel, "Receptive fields and the representation of visual information", Human
Neurobiology, vol. 5, pp. 121-128, 1986.
[790] S. W. Zucker, "Which computation runs in visual cortical columns?", in J. Leo van Hemmen and T. J.
Sejnowski (eds.), Problems in Systems Neuroscience, Oxford University Press, 2001.
block function, 47
Index blood oxygenation level, 193
blurring, 33
3D, 424 BOLD fMRI, 193
3D rendering, 91 boundaries, 83
3D ultrasound, follicle detection, 225 brackets in Mathematica, 396
probe, 225 Brodmann’s area, 185

abort evaluation in Mathematica, 423 calcarine sulcus, 185


accuracy, of data representation, 281 calculus of variations, 26, 148
accuracy of differentiation, 137 Cartesian tensor, 92
acuity, 171 cascade property, 39, 151
adaptive filtering, 184 Cassini function, 94
affine invariant, corner detection, 113 Castan filter, 149
affine transformation, 113 catastrophe, 241
algebraic polynomial invariants, 92 cusp, 249
aliasing, 11 detection, 268
Alvarez, Luis, 378 fold, 248, 264
Alvarez’ equation, 109 swallowtail, 250
amacrine cell, 157, 288 catastrophe germ, 245
angular weight, 334 catastrophe theory, 243
animation, 75, 423 catchment basin, 232, 234
anisotropy, 48 causality, 23
anisotropy ratio, 68 temporal, 350
annihilation, 255 CCD element, 32
anti-aliasing, 11 CD-ROM content, 419
anti-symmetric tensor, 99 cell internal structure, 424
aperture, measurement, 2 central limit theorem, 46, 277
temporal, 1, 345 central nervous system, 161
aperture problem, 309 cerebellum, 180
aperture property, 309 changing the 3D viewpoint, 424
apertures, 1 characteristic equation, 119
arbitrary precision mode, 87 Chebyshev polynomials, 57
area 17, 185 checkerboard junction, 131
astigmatic eye, 48 cylindrical shapes, 117
automatic numbering objects, 424 coincidence detector, 287
automatic scale selection, 216, 219 coincidenceness,
autoradiogram, 190 of roots of a polynomial, 131
average grey level invariance, 38 color, image formation, 311
axioms, 15 color constancy problem, 313
color model, by Koenderink, 314
basilar membrane, 161 columnar structure, 185
basis, 92 commutation, 28
Benford’s Law, 352 Compile, 86, 88
bifurcation, 249 complex conjugate, 148
binomial coefficients, 43 complex cells, 187-188
biomimicking, 200 computational models, for vision, 155
biphasic cell, 356 computer aided analysis, 3D ultrasound,
biphasic index, 358 225
bipolar cells, 157 computer-aided diagnosis (CAD), 338
black body light source, 312 concave, 117
blob detector, 216 condition number, 307
blobness, 218 conductivity coefficient, 364
blobs, 190 cones, 157
wavelength sensitivity, 158
constant motion flow, 390 deep structure, 154
constraint, 148 toolbox, 265
constraint equations, 26 deep structure, 35, 167, 215, 223
context of voxels, 34 Default3D, 424
contextual structure, 154 deformable templates, 144
contour linking, 268 deformation, of images, 289
contour on the image, 423 degenerate Hessian, 244
ContourPlot, 93 Delta-Dirac function, 42
contraction, of tensor indices, 135 dendrites, 169
control parameter, 243 dendritic tree size, 170
convex, 117 dendritic tree area, as function of
convolution, 25,413 eccentricity, 170
cyclic, 72 density image, optic flow, 292
in Fourier domain, 141 Deriche filter, 150
in the Fourier domain, 416 derivative, directional, 102
convolution integral, 46 Fourier transform, 138
convolution integral, 29, 415 derivatives, in the visual system, 156
convolve, 39 of sampled data, 27
cooking of a turkey, 16 Descartes, 97
coordinate systems, 96 detection of maxima, 216
cornea, 48 difference of Gaussians, 175
corner detection, 114 differential operators, generic events in
affine invariant, 113 scale-space, 250
cornerness, 114 differential features, 29
coronal plane, 406 differential geometry, 31, 91
correlation, of Gaussian derivatives, 60 differential operator, temporal, 346
correlation coefficient, 61 differential structure, 91
cortical activity map, 194 differentiation, by integration, 28
cortical columns, 187 diffuse boundaries, 78
cortical hypercolumn, 190 diffusion, anisotropic, 363
cortical receptive fields, coherence enhancing, 363
Gabor function model, 65 geometry-driven, 361,363
cortical simple cell, 354 inhomogeneous, 363
corticofugal, 184 isotropic, 363
corticofugal projections, to the LGN, 183 linear, 363
Craik-O’Brien-Cornsweet illusion, 175 tensor-driven, 363
creation, 255 variable conductance, 363
crest line, 125 diffusion equation, 49, 51
critical isophote, 93 dilation, 387
self-intersection, 262 dimension matrix, 18
CT scanner, 2 dimensional analysis, 15
cubic splines, 144 dimensionless, coordinate, 218
cumulative Gaussian function, 42 numbers, 20
curvature, 23 quantities, 18
curve evolution, 274, 382 diopter, 156
curvedness, 120 Dirac delta function, 29, 41
cusp catastrophe, evolution, 249 direction, of vector, 330
cyclotron, 194 directional derivative, 102, 129
Dirichlet problem, 67
Damon, James, 250 discrete Gaussian kernels, 64
data smoothing, 144 discriminant, 131
deblurring, 371 disparity map, 285
analytical methods, 277 display a group of images, 424
Gaussian blur, 277 dissimilarity measure, 231
scale-space approach, 277 dissimilarity scale-space, 233
distribution, 30 differential, 29
distribution theory, 147 feedback, 183
distributions, regular, tempered, 144 feedback loops, 67
dithering, 5 field of view, 2
color, 6 fill-in effect, 201
Div, divergence operator, 294 filter, Castan, 149
divergence of flow, 365 Deriche, 150
divergence theorem, 257 normalized, 25
Dorst, Leo, 390 unity, 25
Dot, 100 filterbank of oriented filters, 184
draw a contour, 423 filterbank representation, 197
drift velocity, of singularities, 265 fingerprint, 253
duality, 95 of fold catastrophe, 249
PDE and curve evolution, 383 fingerprint enhancement, 239
dumb-bell example, 255 Fit, 229
dye injection, 170 fitting spherical harmonics to 3D points,
229
eccentricity, retinal, 159, 170 fixing the gauge, 103
edge focusing, 221 flowline, 92
implementation, 224 curvature, 110
edge hierarchy, 78 fluid density, 18
edge location, 223 fluid viscosity, 18
edge-preserving smoothing, 184, 362 fluorescence, 194
EEG, 191 fMRI, 193
Eigenvalues, 119 high resolution, 193
Einstein summation convention, 135 Fold, 235
electro-encephalography (EEG), 191 fold catastrophe, 264
elementary catastrophes, 246 fold catastrophe, evolution, 247
end-stopped cells, 187 follicle, 225
energy minimization, 144 follicle boundary, 228
ensemble measurements, 154 follicle detection in 3D ultrasound, 225
entropy, 26 font style, 424
entropy scale-space, 383 foreground-background illusion, 182
epileptic foci, 192 forward-Euler approximation, 365
Epilog, 423-424 Fourier, Jean-Baptiste, 15
erosion, 387 Fourier domain, boundary effects, 83
error function, 41 Fourier transform, Gaussian derivative, 28
Euclidean shortening flow, 378, 383 leakage, folding, 139
Euclidean tranformation, 101 fourth order structure, 131
Euler gamma function, 62 fractals, 35
Euler-Lagrange equations, 26, 148 frames, 345
EuroRad, 76, 102 Freeman’s lab, 162
Evaluate, 88 friction factor, 19
evolution, of images, 361 Frobenius norm, 307
evolutionary computing, 361 front-end, 4
exact representation, 86 axioms, 15
excitatory orientation coupling, 200 multi-scale geometry engine, 155
extremal line, 125 uncommitted, 9
extremal mesh, 126 front-end visual system,
extremality, 124 cortical columns, 197
eye physics, 155-156 Froude’s number, 20
function fitting, 144
Fast Fourier Transform, 89 functional, 148
Fast summation, 424 functional imaging, 155
features, 91 functional MRI (fMRI), 193
fundamental equation, 109 gDn, 79
gD~b, 336
Gabor kernels, 65, 147 gD~bN, 337
Gamma function, 139 generalized functions, 40
ganglion cells, midget, 169 generalized anisotropic non-linear
parasol, 169, 288 diffusion, 240
Ganzfeld illumination, 160 generate an animation, 423
gauge, first order, 103 generic event, 246
gauge coordinates, first order, 103 geometric information, 29
for discrete images, 107 geometric reasoning, 92, 127, 130, 361
gauge frame, plot, 106 geometrical structure, 154
gauge invariants, 134 geometry-driven diffusion, 109, 361
Gauss, Carl Friedrich, 37 Giuseppe Arcimboldo, 9
Gaussian derivative, implementation, global structure, 271
N-dimensional, 78 Grad, gradient operator, 294
Gaussian derivatives, correlation, 60 gradient, integral curve, 95
Fourier domain, 57 magnitude, 101
implementation, Fourier domain, 79 gradient magnitude, 221
separability, 67 gradient magnitude squared, 231
zero crossings, 56 gradient scale-space, 222
Gaussian extremality, 125 gradient squared, generic events, 251
Gaussian curvature, 123, 327 gradient vector, 100
minimal surfaces, 126 graduated convexity, 30, 144
Gaussian derivative family, 27 graph theory, 274
Gaussian derivatives, 24, 53 Gray, Alfred, 126
algebraic structure, 53 grayscale invariance, 379
as bandpass filters, 58 grayscale transformation, 93
as smooth test functions, 145 Green’s function, 24
functions graphs, 74 group, of transformations, 97
implementation, recursive, 85 group theory, 351
implementation, separable, 73
implementation, spatial domain, 71 Hadamard, Jacques, 143
of infinite order, 60 half axis, 349
orthogonality, 57 Hausdorff dimension, 34, 220, 263
Szegö’s formula, 60 Heaviside function, 42
Zernike’s formula, 60 heightline, 94
zero crossings, 59 helical shell shape, 404
Gaussian envelope, 55 help browser, 423
Gaussian kernel, 37, 51 heltocat surface, 127
axiomatic derivation, 20 Hering color basis, 316
discrete, 64 Hermann grid illusion, 175
Fourier transform, 44 Hermite, Charles, 54
Fourier spectra, 46 polynomial, 55
half width at half maximum, 38 Hermitian matrix, 101
normalization, 38 Hessian matrix, 49
normalization factor, 55 Hessian matrix, 119, 131
orientation, 331 determinant, 123
spectral, 315 Eigenvectors, 122
temporally skewed, 353 trace, 124
variance, 37 hierarchy, of structures, 223
width, 37 Hilbert, David, 131, 134
Gaussian noise, 282, 424 Hildreth, 112
Gaussian weighted extent, 40 hills and dales, 117
gDc, 72 histogram, 366
gDf, 80 Histogram function, 424
histogram stretching, 78 name, 107
histogram equalization, 77 to coordinate transformations, 92
histology, 325 invariant theory, 31
horizontal cells, 157 invariants, meaning of, 100
Horn, Berthold, 290 tensor notation, 135
horseradish peroxidase, 199 inverse heat equation, 370
horseshoe crab, 4 inverse Fourier transform, 44
Hubel, David, 160 InverseFourier, 89
human visual system, bibiography, 153 ipsi-lateral, 181
hypercolumn, 190, 197 irreducible invariants, 134
interconnectivity, 199 iso-orientation contour, 199
hypercomplex cells, 187 isophote, 92
hypersurface, 258 3D, 95
hypo-echoic, 225 critical, 262
curvature, 108, 110
identity matrix, 99 density, 112
Iijima, 13 self-intersection, 94
ill-posed, 30, 143, 370 isophote curvature, gradient of, 129
illusion, Craik-O’Brien-Cornsweet, 175 isophote curvature, 134
curvature of straight lines, 115 generic events, 255
foreground-background, 182 MR image, 110
Hermann grid, 175 isophote imaging, 194
parallel lines, 330 isotropic, 48, 68
step function, 175 isotropy, 15
Thatcher, 102
visual fading, 200 Jacobi, Carl, 98
image read, 423 polynomials, 57
image database indexing, 275 Jacobian, matrix, 98, 294
image differential structure, 91 Japan, 35
image mosaic, 7 Jordan curve, 93
image partitioning, 231 Josephson junction, 192
image restoration, 277 junction detection, 131
implicit function theorem, 245
ImplicitPlot3D, 249 kernel, family, 27
Import, 74 normalized, 22, 38
incommensurable dimensions, 345 width, 13
inner scale, 3, 33, 37 kernels, complete family, 35
installing the book, instructions, 419 keyhole observation, 34
Windows 95/98/2000/NT/XP, 420 knowledge incorporation, 361
integration time, 345 Kuffler, Stephen, 160
integration area, 25
intensity surface, 95 Lagrangian, 26, 148
interactive 3D display, 424 Laguerre polynomials, 57
interactive segmentation, 231 Laplace’s equation, 229
interpolation, 144, 424 Laplacian, 216
cubic spline, 289 in gauge coordinates, 105
nearest neighbour, 11 in natural coordinates, 133
interrupt, 423 of Gaussian, 172
intrinsic geometry, 103 operator, 24, 101
introduction to Mathematica, 395 repeated, 278
invariance, 100 zero crossings, 112
grayscale, 379 Laplacian operator, A, 49
scale, 15 lateral inhibition, 175
invariant, complete family, 107 Lateral Geniculate Nucleus, 162, 169, 179,
expressed in gauge coordinates, 103 181
law of scale invariance, 16 MathGL3d, OpenGL viewer, 95
least square approximation, 229 MathSource, 412
Legendre polynomials, 57 matrix, inverse, 101
lemniscate of Bernoulli, 94 transpose, 101
level sets, 382 maximum principle, 50
Levi-Civita tensor, 99 maximum-length sequence, 163
LGN, Lateral Geniculate Nucleus, 169 MDL, minimum description length, 272
LGN pathway, 186 mean curvature, 123
Lie derivative, 293 MEG, 192
limits on differentiation, 137 mesh, by isophotes and flowlines, 95
Lindeberg, Tony, 13 Mexican hat function, 172
linear isotropic diffusion equation, 24 midbrain, 179
linearity, 15 midget ganglion cells, 169
linking, of contours, 268 minimal surface, 126
linking of regions, 234 minimization, in L 2 sense, 152
linking over scale, 215, 234 minimum description length (MDL), 272
ListContourPlot3D, 95 mirrored tiling, 84
ListConvolve, 71, 79 model, 34, 92
ListPlot3D, 74 modulation transfer function, 46
LiveGraphics3D, 318, 424 mirrored tiling, 84
local generic features, 33 Mona Lisa, 7
localization scale, 236 monkey saddle, winding number, 259
log-polar mapping, 186 monkey saddle, higher order, 260
logarithmic half-axis sampling, 349 monosynaptic connection, 179
longevity, 223 Morel, Jean-Michel, 362
low-pass filter, 46 morphological gradient operator, 388
Lvv, ridge detector, 109 Morse lemma, 245
Morse theory, 107
macaque monkey, 170, 185 motion blur, 284
magnetic fields, 192 motion detection, with receptive field
magneto-encephalography (MEG), 192 pairs, 286
magno-cellular layers, LGN, 169, 181 MR brain segmentation, watershed, 238
Magritte, René, 83, 128 Moebius strip, 126
Malik, Jitendra, 363 multi-scale derivative operators, 30
manifolds, 91 multi-scale geometry engine, 155
manual shortcuts, 423 multi-scale grouping, 273
MapThread, 84 multi-scale optic flow, 285
marching lines, 126 Multi-scale Optic Flow Constraint
Marr, 112 Equation (MOFCE), 286
matching, of images, 289 multi-scale optic flow constraint equation,
Mathematica, front-end, 395 derivation, 292
introduction, tutorial, 395 multi-scale orientation, texture, 329
kernel, 395 multi-scale segmentation, 231
mathematical notation, 423 multi-scale signature, of a signal, 223
speed concerns, 85 multi-scale watershed segmentation, 237
startup suggestions, 423 multiplanar reconstruction, 277
suggested reading, 410 multiplanar reformatting, 406
tips and tricks, 423 Mumford, David, 362
web resources, 412
mathematical notation, 423 N-jet, 31
mathematical point, 40 nabla operator, 101, 129
mathematical morphology, 386 natural coordinates, 40, 132, 383
grayvalued images, 389 natural scale parameter, 32, 35
relation to gaussian scale-space, 390 Nest, 104
mathematical point, 29
neural activity in the brain, measurement, orientation analysis, classical papers, 342
191 orientation bundle, 342
Nielsen, Mads, 25 orthogonal polynomial, 229
Niessen, Wiro, 373 orthogonal invariance, 103
noise, structure, 109 orthogonal matrix, 101
noise images, 424 orthogonal transformation, 101
noise is structure, 10 orthonormal transformation, 101
non-creation, of local extrema, 352 Osher, Stanley, 362
non-linear diffusion, for multi-scale Outer, 99
watershed segmentation, 239 outer scale, 3
nonlinear diffusion, 109 ovary, 225
norm, summation, 151
normal constraint, 292 packages, 423
normal distribution, 13 palettes, 423
normal flow, constraint, 301 palindrome, 401
normal motion flow, 383 parabolic lines, 124
normalization, 38 parallel lines illusion, 330
normalized filter, 25 parasol ganglion cell, 169, 288
normalized derivative operator, 218 Parceval theorem, 148
normalized derivatives, 218 partial differential equation, 361
normalized feature detection, 218 partial volume effect, 12
nullspace, 17 parvo-cellular layers, 181
LGN, 169
observation scale, 216 pattern matching, in Mathematica, 278
observations, 1 pattern matching, 401
critical view, 9 in Mathematica, 322
observed derivative, 137 PDE’s in computer vision, 184
occlusion boundary, 292 perceptual grouping, 273
occlusion boundaries, 128 perceptual grouping, 184, 200
ocular dominance bands, 191, 197 pinwheel, 198
off-center center-surround, 160 Perona and Malik, equation, 364
Olver, Peter, 384 Perona-Malik diffusion scheme, 239
on-center center-surround, 160 PET, 194
OpenGL viewer MathGL3d, 95 photopic vision, 158
operator, differential, 13 physical dimensions, 15
local, 2 physics of observation, 1, 21
spatio-temporal, 347 Pi-theorem, 18
operators, 13 pinwheel, 198
optic radiation, 183 singularity, 199
optic chiasm, 179 pyramidal datavolume, 225
optic chiasma, 157 pitfalls in Mathematica, 407
optic flow, 285 Planck’s formula, 312
scale selection, 307 plot commands, 397
optic flow constraint equation (OFCE), PlotGradientField, 48
290 PlotLegend, 64
optic nerve, 5 pointspread function, 48, 74
optical imaging, voltage sensitive dyes, Poisseuille coefficient, 20
194 Poisson scale-space, 67
orderlessness, 26 positron, 194
organ of Corti, 161 positron emission tomography (PET), 194
orientation, column, 199 post stimulus-time histogram, 164
of vector, 330 post-synaptic potential, excitatory (EPSP),
sensitivity, 197 inhibitory (IPSP), 287
sensitivity tuning, 197 Powers of Ten, 4
tuning curve, 189 prevent the textual output, 423
primal sketch, 271 resolution, CT scanner, 2
primary visual cortex, 162, 185 infinite, 9
principal curvature, 119 spatial, 2
principal directions, 122 spurious, 11
Printing Style Environment, 423 temporal, 2, 346
projective transformation, 101 retina, 153
proper print layout, 423 as multi-scale sampling device, 177
psychophysics, 5, 155 eccentricity, 170
Pacini pressure-sensitive receptors, 161 layers, 157
pure function, 104, 401 sensitivity, 159
pure functions, 424 retinal sampling, scale-space model, 167
pyramidal cell, 199 retinal receptive fields, scale-space model,
172
radioactive isotope, 194 retinotopic projection, 181
radioactive tracer, 190 retrograde projection, 184
random numbers, 424 reverse correlation, 162
random dot stereogram, 8 Reynold’s number, 18
read an image, 423 RGB color space, 317
read binary 3D data, 405 rhodopsine, 159
RealTime3D, 424 ridge, 117
receptive field, retinal, 154 ridge detection, 108
sensitivity pattern, 160 ridgeness, 329
receptive field, 4, 32, 156 rods, 157
as smooth test function, 146 electron-microscopic cross-section, 159
center-surround, 173 rods and cones, 32
cortical, directional selectivity, 189 RotationMatrix2D, 98
diameter, 170 RotationMatrix3D, 98
dynamic, 165 rowing, 19
real-time, 353 rut, 120
recording, 354
retina, 160 saddle point, 93, 117
sensitivity profile measurement, 162 saddle rot, 120
skin, 161 saddle shapes, 117
spatio-temporal, 182 sagittal plane, 406
static, 164 sampling property of derivatives, 41
temporal modulation, 188 sampling function, 41
time-causal, model, 354 sampling points, 423
red-green edge detector, 324 sampling rate reduction, 33
region focusing, 235 Sapiro, Guillermo, 384
region label, 233 scalar, 100
region linking, 234 scalar image, optic flow, 292
regular point, 244 scale, exponential parametrization, 35
regular tempered distribution, 145 temporal, 346
regular tempered distributions, 144 scale invariance, 15
regularization, 35, 143 scale selection, 216
example, 147 optic flow, 307
methods, 152 scale invariance, 132, 151
Tikhonov, 148 scale parameter, 40
regularization property, scale-space scale selection, 215
kernels, 277 scale-adaptive system, 215
Reichardt, Werner, 287 scale-invariance, 32, 167
Reichardt detector, 286 scale-space, 8
generalized, 288 foundations, 13
Reichardt motion sensitive cell, 190 from causality, 23
resizing graphics, 424 from entropy, 25
history, 13 singularities, 220
regularization properties, 148 singularity, 94, 107, 241
stack, 168 drift velocity, 265
temporal, 346 evolution in scale-space, 242
scale-space theory, linear, 9 singularity points, annihilation, 223
scale-space saddle, 272 size of notebooks, 423
scale-space stack, 31 skewness, 356
scale-space theory, theory of apertures, smooth test function, 30, 144
144 smoothing data, 144
scale-step, in nonlinear diffusion schemes, smoothing property, 39
384 snakes, 144
scale-time, 346 somatotopic projection, 179
scaling laws, 20 spatial frequency, 21
Schwartz, 30, 41 spatial shift invariance, 15
Laurent, 144, 146 spatio-temporal operator, 347
Schwartz space, 144 special transformation, 99
scotopic vision, 158 special orthogonal transformation, 101
Screen Style Environment, 423 special plot forms, 404
second order invariants, color mapping, spectral reflection function, 312
118 spectrum, 46
second order structure, 117 speed concerns, 85
segmentation, 216 spherical harmonics, to second order, 229
need, 91 splines, 30
self similar functions, 35 splitting lemma, 245, 248
self-similar function, 39 spurious resolution, 11, 30
self-similarity, 32 stability, of Gaussian linear diffusion, 373
selfsimilarity, 39 of numerical PDE’s, 372
semi-axis, 349 stabilized retinal images, 200
semicolon, as separator, 92 state parameter, 243
sensitivity profile, 28 steerable filter, 331
separability, 43 steerable kernels, 329
Separability, 43 steering, with Cartesian partial derivatives,
separatrix, 272 336
sequence of images, 423 with self-similar functions, 332
Series, Taylor expansion, 97 stellate tumor detection, 338
Sethian, James, 362 step function illusion, 175
shading, 313 step-edge, 42
shadows, 78 stepsize, in numerical PDE evolution, 365
Shah, Jayant, 362 stop, 423
shape operator, 119 structural equivalence, 244
shape graph, 120 structure, generation, 23
shape index, 120 of noise, 109
shape-context operator, 274 second order, 117
shear transformation, 113 structure matrix, 343, 363
shift-integral, 415 structuring element, 386
shocks, 274 subimage, 423
shortening flow, 386 submatrix, 423
shortnotation, 105 superficial structure, 215
signal-to-noise ratio, 368 swallowtail catastrophe, 250
signature, of a signal, 223 symmetric tensor, 99
signature function, noisy edge, 223
simple cells, 187 T-junction detection, 127
simple cells, 187 T-junction-likeniness, 128
simultaneous scales observation, 167 Tannenbaum, Andy, 384
single cell recordings, 191 taxonomy, 187
Taylor expansion, 96, 278 vicinity, on- and off-center receptive
tempered distributions, 144 fields, 170
temporal width, 345 visual system, self-organization, 190
tensor, contraction, 135 visual acuity, 171
indices, 135 visual column, 198
tensor analysis, 31 visual context, 155
tensor notation, 135 visual fading illusion, 200
test function, 30 visual front-end, 4
text on graphics, 424 visual pathway, 4, 156, 179
texture, multi-scale orientation, 329 visual system, color processing, 190
thalamus, 179 voltage sensitive dyes, 191, 194
Thatcher illusion, 102 Von Neumann stability criterion, 372
theorem, Parseval, 148 voting scheme, 235
theory of distributions, 41
thin plate splines, 30, 144 warping, 289
third order structure, 127 watershed segmentation, MR brain, 238
Thom, René, 245 watershed segmentation, 215, 232
Thom’s theorem, 245 Weickert, Joachim, 362
Tikhonov regularization, 144, 148 well-posed, 143
time-resolution, 350 well-posed derivatives, 147
time-sequences, pre-recorded, 346 wetware, 154
Toeplitz matrix, 277 Weyl, Herman, 100
Tolansky’s curvature illusion, 111 Wiesel, Torsten, 160
tomographic image, 130 winding number, 257, 259
topological organization, 198 3D, 226
topological number, in scale-space, 258 on 2D images, 260
topological numbers, 257 Wolfram, Stephen, 395
topology, 31, 94
torus, 405 yellow-blue edge detector, 324
total edge strength, color, 322
transformation, affine, 113 zero crossings, Laplacian, 112
rotation, 100 zero-crossing detection, 222
shear, 113 zooming, 33
transformation group, 97
transformations, 96 δ-operator, 99
transversal plane, 406 δ(x), 41
transversality, 247 ε-operator, 99
tree structure, 215 γ-normalized scale selection, 220
triangle function, 47
trough, 120

ultrasound, 382
umbilical point, 119, 124, 263
uncommitted, 9, 238
uncommittment, 27
unit vector, 100
unitstep function, 42
unity filter, 25

V1, primary visual cortex, 185


van den Boomgaard, Rein, 390
variance, 40
variational calculus, 148
vector, orthogonal, 25
vectorfield, 293
Computational Imaging and Vision

1. B.M. ter Haar Romeny (ed.): Geometry-Driven Diffusion in Computer Vision. 1994
ISBN 0-7923-3087-0
2. J. Serra and P. Soille (eds.): Mathematical Morphology and Its Applications to Image
Processing. 1994 ISBN 0-7923-3093-5
3. Y. Bizais, C. Barillot, and R. Di Paola (eds.): Information Processing in Medical
Imaging. 1995 ISBN 0-7923-3593-7
4. P. Grangeat and J.-L. Amans (eds.): Three-Dimensional Image Reconstruction in
Radiology and Nuclear Medicine. 1996 ISBN 0-7923-4129-5
5. P. Maragos, R.W. Schafer and M.A. Butt (eds.): Mathematical Morphology
and Its Applications to Image and Signal Processing. 1996 ISBN 0-7923-9733-9
6. G. Xu and Z. Zhang: Epipolar Geometry in Stereo, Motion and Object Recognition.
A Unified Approach. 1996 ISBN 0-7923-4199-6
7. D. Eberly: Ridges in Image and Data Analysis. 1996 ISBN 0-7923-4268-2
8. J. Sporring, M. Nielsen, L. Florack and P. Johansen (eds.): Gaussian Scale-Space
Theory. 1997 ISBN 0-7923-4561-4
9. M. Shah and R. Jain (eds.): Motion-Based Recognition. 1997 ISBN 0-7923-4618-1
10. L. Florack: Image Structure. 1997 ISBN 0-7923-4808-7
11. L.J. Latecki: Discrete Representation of Spatial Objects in Computer Vision. 1998
ISBN 0-7923-4912-1
12. H.J.A.M. Heijmans and J.B.T.M. Roerdink (eds.): Mathematical Morphology and its
Applications to Image and Signal Processing. 1998 ISBN 0-7923-5133-9
13. N. Karssemeijer, M. Thijssen, J. Hendriks and L. van Erning (eds.): Digital Mammo-
graphy. 1998 ISBN 0-7923-5274-2
14. R. Highnam and M. Brady: Mammographic Image Analysis. 1999
ISBN 0-7923-5620-9
15. I. Amidror: The Theory of the Moiré Phenomenon. 2000 ISBN 0-7923-5949-6;
Pb: ISBN 0-7923-5950-X
16. G.L. Gimel’farb: Image Textures and Gibbs Random Fields. 1999 ISBN 0-7923-5961
17. R. Klette, H.S. Stiehl, M.A. Viergever and K.L. Vincken (eds.): Performance Char-
acterization in Computer Vision. 2000 ISBN 0-7923-6374-4
18. J. Goutsias, L. Vincent and D.S. Bloomberg (eds.): Mathematical Morphology and
Its Applications to hnage and Signal Processing. 2000 ISBN 0-7923-7862-8
19. A.A. Petrosian and F.G. Meyer (eds.): Wavelets in Signal and Image Analysis. From
Theory to Practice. 2001 ISBN 1-4020-0053-7
20. A. Jaklič, A. Leonardis and F. Solina: Segmentation and Recovery of Superquadrics.
2000 ISBN 0-7923-6601-8
21. K. Rohr: Landmark-Based Image Analysis. Using Geometric and Intensity Models.
2001 ISBN 0-7923-6751-0
22. R.C. Veltkamp, H. Burkhardt and H.-P. Kriegel (eds.): State-of-the-Art in Content-
Based Image and Video Retrieval. 2001 ISBN 1-4020-0109-6
23. A.A. Amini and J.L. Prince (eds.): Measurement of Cardiac Deformations from MRI:
Physical and Mathematical Models. 2001 ISBN 1-4020-0222-X
24. M.I. Schlesinger and V. Hlaváč: Ten Lectures on Statistical and Structural Pattern
Recognition. 2002 ISBN 1-4020-0642-X
25. F. Mokhtarian and M. Bober: Curvature Scale Space Representation: Theory, Appli-
cations, and MPEG-7 Standardization. 2003 ISBN 1-4020-1233-0
26. N. Sebe and M.S. Lew: Robust Computer Vision: Theory and Applications.
2003 ISBN 1-4020-1293-8
27. B.M. ter Haar Romeny: Front-End Vision and Multi-Scale Image Analysis:
Multi-Scale Computer Vision Theory and Applications, written in Mathematica.
2008 ISBN 978-1-4020-1503-8 (HB); 978-1-4020-1507-6 (PB)

springer.com
