Image Processing: The Fundamentals, Second Edition Maria Petrou and Costas Petrou
2010 John Wiley & Sons, Ltd. ISBN: 978-0-470-74586-1
Image Processing: The Fundamentals
Maria Petrou Costas Petrou
A John Wiley and Sons, Ltd., Publication
This edition first published 2010
© 2010 John Wiley & Sons Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for
permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the
Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of
the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not
be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand
names and product names used in this book are trade names, service marks, trademarks or registered
trademarks of their respective owners. The publisher is not associated with any product or vendor
mentioned in this book. This publication is designed to provide accurate and authoritative information in
regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in
rendering professional services. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Petrou, Maria.
Image processing : the fundamentals / Maria Petrou, Costas Petrou. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-74586-1 (cloth)
1. Image processing--Digital techniques.
TA1637.P48 2010
621.36/7--dc22
2009053150
ISBN 978-0-470-74586-1
A catalogue record for this book is available from the British Library.
Set in 10/12 Computer Modern by Laserwords Private Ltd, Chennai, India.
Printed in Singapore by Markono
This book is dedicated to our mother and grandmother
Dionisia, for all her love and sacrifices.
Contents
Preface xxiii
1 Introduction 1
Why do we process images? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
What is an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
What is a digital image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
What is a spectral band? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Why do most image processing algorithms refer to grey images, while most images
we come across are colour images? . . . . . . . . . . . . . . . . . . . . . . . . 2
How is a digital image formed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
If a sensor corresponds to a patch in the physical world, how come we can have more
than one sensor type corresponding to the same patch of the scene? . . . . . 3
What is the physical meaning of the brightness of an image at a pixel position? . . 3
Why are images often quoted as being 512 × 512, 256 × 256, 128 × 128, etc? . . . . 6
How many bits do we need to store an image? . . . . . . . . . . . . . . . . . . . . . 6
What determines the quality of an image? . . . . . . . . . . . . . . . . . . . . . . . 7
What makes an image blurred? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
What is meant by image resolution? . . . . . . . . . . . . . . . . . . . . . . . . . . 7
What does "good contrast" mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
What is the purpose of image processing? . . . . . . . . . . . . . . . . . . . . . . . 11
How do we do image processing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Do we use nonlinear operators in image processing? . . . . . . . . . . . . . . . . . 12
What is a linear operator? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How are linear operators defined? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
What is the relationship between the point spread function of an imaging device
and that of a linear operator? . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How does a linear operator transform an image? . . . . . . . . . . . . . . . . . . . 12
What is the meaning of the point spread function? . . . . . . . . . . . . . . . . . . 13
Box 1.1. The formal definition of a point source in the continuous domain . . . . . 14
How can we express in practice the effect of a linear operator on an image? . . . . 18
Can we apply more than one linear operator to an image? . . . . . . . . . . . . . 22
Does the order by which we apply the linear operators make any difference to the
result? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Box 1.2. Since matrix multiplication is not commutative, how come we can change
the order by which we apply shift invariant linear operators? . . . . . . . . . 22
Box 1.3. What is the stacking operator? . . . . . . . . . . . . . . . . . . . . . . . . 29
What is the implication of the separability assumption on the structure of matrix H? 38
How can a separable transform be written in matrix form? . . . . . . . . . . . . . 39
What is the meaning of the separability assumption? . . . . . . . . . . . . . . . . . 40
Box 1.4. The formal derivation of the separable matrix equation . . . . . . . . . . 41
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 43
What is the significance of equation (1.108) in linear image processing? . . . . . . 43
What is this book about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2 Image Transformations 47
What is this chapter about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
How can we define an elementary image? . . . . . . . . . . . . . . . . . . . . . . . 47
What is the outer product of two vectors? . . . . . . . . . . . . . . . . . . . . . . . 47
How can we expand an image in terms of vector outer products? . . . . . . . . . . 47
How do we choose matrices h_c and h_r? . . . . . . . . . . . . . . . . . . . . . . . . . 49
What is a unitary matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
What is the inverse of a unitary transform? . . . . . . . . . . . . . . . . . . . . . . 50
How can we construct a unitary matrix? . . . . . . . . . . . . . . . . . . . . . . . . 50
How should we choose matrices U and V so that g can be represented by fewer bits
than f? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
What is matrix diagonalisation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Can we diagonalise any matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.1 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 51
How can we diagonalise an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Box 2.1. Can we expand any image in vector outer products? . . . . . . . . . . . . 54
How can we compute matrices U, V and Λ^(1/2) needed for image diagonalisation? . . 56
Box 2.2. What happens if the eigenvalues of matrix gg^T are negative? . . . . . . . 56
What is the singular value decomposition of an image? . . . . . . . . . . . . . . . . 60
Can we analyse an eigenimage into eigenimages? . . . . . . . . . . . . . . . . . . . 61
How can we approximate an image using SVD? . . . . . . . . . . . . . . . . . . . . 62
Box 2.3. What is the intuitive explanation of SVD? . . . . . . . . . . . . . . . . . 62
What is the error of the approximation of an image by SVD? . . . . . . . . . . . . 63
How can we minimise the error of the reconstruction? . . . . . . . . . . . . . . . . 65
Are there any sets of elementary images in terms of which any image may be expanded? 72
What is a complete and orthonormal set of functions? . . . . . . . . . . . . . . . . 72
Are there any complete sets of orthonormal discrete valued functions? . . . . . . . 73
2.2 Haar, Walsh and Hadamard transforms . . . . . . . . . . . . . . . . . . 74
How are the Haar functions defined? . . . . . . . . . . . . . . . . . . . . . . . . . . 74
How are the Walsh functions defined? . . . . . . . . . . . . . . . . . . . . . . . . . 74
Box 2.4. Definition of Walsh functions in terms of the Rademacher functions . . . 74
How can we use the Haar or Walsh functions to create image bases? . . . . . . . . 75
How can we create the image transformation matrices from the Haar and Walsh
functions in practice? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
What do the elementary images of the Haar transform look like? . . . . . . . . . . 80
Can we define an orthogonal matrix with entries only +1 or −1? . . . . . . . . . . 85
Box 2.5. Ways of ordering the Walsh functions . . . . . . . . . . . . . . . . . . . . 86
What do the basis images of the Hadamard/Walsh transform look like? . . . . . . 88
What are the advantages and disadvantages of the Walsh and the Haar transforms? 92
What is the Haar wavelet? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.3 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . 94
What is the discrete version of the Fourier transform (DFT)? . . . . . . . . . . . . 94
Box 2.6. What is the inverse discrete Fourier transform? . . . . . . . . . . . . . . . 95
How can we write the discrete Fourier transform in a matrix form? . . . . . . . . . 96
Is matrix U used for DFT unitary? . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Which are the elementary images in terms of which DFT expands an image? . . . 101
Why is the discrete Fourier transform more commonly used than the other
transforms? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
What does the convolution theorem state? . . . . . . . . . . . . . . . . . . . . . . . 105
Box 2.7. If a function is the convolution of two other functions, what is the rela-
tionship of its DFT with the DFTs of the two functions? . . . . . . . . . . . . 105
How can we display the discrete Fourier transform of an image? . . . . . . . . . . . 112
What happens to the discrete Fourier transform of an image if the image
is rotated? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
What happens to the discrete Fourier transform of an image if the image
is shifted? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
What is the relationship between the average value of the image and its DFT? . . 118
What happens to the DFT of an image if the image is scaled? . . . . . . . . . . . . 119
Box 2.8. What is the Fast Fourier Transform? . . . . . . . . . . . . . . . . . . . . . 124
What are the advantages and disadvantages of DFT? . . . . . . . . . . . . . . . . . 126
Can we have a real valued DFT? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Can we have a purely imaginary DFT? . . . . . . . . . . . . . . . . . . . . . . . . . 130
Can an image have a purely real or a purely imaginary valued DFT? . . . . . . . . 137
2.4 The even symmetric discrete cosine transform (EDCT) . . . . . . . . 138
What is the even symmetric discrete cosine transform? . . . . . . . . . . . . . . . . 138
Box 2.9. Derivation of the inverse 1D even discrete cosine transform . . . . . . . . 143
What is the inverse 2D even cosine transform? . . . . . . . . . . . . . . . . . . . . 145
What are the basis images in terms of which the even cosine transform expands an
image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.5 The odd symmetric discrete cosine transform (ODCT) . . . . . . . . 149
What is the odd symmetric discrete cosine transform? . . . . . . . . . . . . . . . . 149
Box 2.10. Derivation of the inverse 1D odd discrete cosine transform . . . . . . . . 152
What is the inverse 2D odd discrete cosine transform? . . . . . . . . . . . . . . . . 154
What are the basis images in terms of which the odd discrete cosine transform
expands an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
2.6 The even antisymmetric discrete sine transform (EDST) . . . . . . . 157
What is the even antisymmetric discrete sine transform? . . . . . . . . . . . . . . . 157
Box 2.11. Derivation of the inverse 1D even discrete sine transform . . . . . . . . . 160
What is the inverse 2D even sine transform? . . . . . . . . . . . . . . . . . . . . . . 162
What are the basis images in terms of which the even sine transform expands an
image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
What happens if we do not remove the mean of the image before we compute its
EDST? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
2.7 The odd antisymmetric discrete sine transform (ODST) . . . . . . . 167
What is the odd antisymmetric discrete sine transform? . . . . . . . . . . . . . . . 167
Box 2.12. Derivation of the inverse 1D odd discrete sine transform . . . . . . . . . 171
What is the inverse 2D odd sine transform? . . . . . . . . . . . . . . . . . . . . . . 172
What are the basis images in terms of which the odd sine transform expands an
image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 176
3 Statistical Description of Images 177
What is this chapter about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Why do we need the statistical description of images? . . . . . . . . . . . . . . . . 177
3.1 Random fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
What is a random field? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
What is a random variable? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
What is a random experiment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
How do we perform a random experiment with computers? . . . . . . . . . . . . . 178
How do we describe random variables? . . . . . . . . . . . . . . . . . . . . . . . . . 178
What is the probability of an event? . . . . . . . . . . . . . . . . . . . . . . . . . . 179
What is the distribution function of a random variable? . . . . . . . . . . . . . . . 180
What is the probability of a random variable taking a specic value? . . . . . . . . 181
What is the probability density function of a random variable? . . . . . . . . . . . 181
How do we describe many random variables? . . . . . . . . . . . . . . . . . . . . . 184
What relationships may n random variables have with each other? . . . . . . . . . 184
How do we define a random field? . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
How can we relate two random variables that appear in the same random field? . . 190
How can we relate two random variables that belong to two different random
fields? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
If we have just one image from an ensemble of images, can we calculate expectation
values? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
When is a random field homogeneous with respect to the mean? . . . . . . . . . . 195
When is a random field homogeneous with respect to the autocorrelation function? 195
How can we calculate the spatial statistics of a random field? . . . . . . . . . . . . 196
How do we compute the spatial autocorrelation function of an image in practice? . 196
When is a random field ergodic with respect to the mean? . . . . . . . . . . . . . . 197
When is a random field ergodic with respect to the autocorrelation function? . . . 197
What is the implication of ergodicity? . . . . . . . . . . . . . . . . . . . . . . . . . 199
Box 3.1. Ergodicity, fuzzy logic and probability theory . . . . . . . . . . . . . . . . 200
How can we construct a basis of elementary images appropriate for expressing in an
optimal way a whole set of images? . . . . . . . . . . . . . . . . . . . . . . . . 200
3.2 Karhunen-Loeve transform . . . . . . . . . . . . . . . . . . . . . . . . . . 201
What is the Karhunen-Loeve transform? . . . . . . . . . . . . . . . . . . . . . . . . 201
Why does diagonalisation of the autocovariance matrix of a set of images define a
desirable basis for expressing the images in the set? . . . . . . . . . . . . . . . 201
How can we transform an image so its autocovariance matrix becomes diagonal? . 204
What is the form of the ensemble autocorrelation matrix of a set of images, if the
ensemble is stationary with respect to the autocorrelation? . . . . . . . . . . 210
How do we go from the 1D autocorrelation function of the vector representation of
an image to its 2D autocorrelation matrix? . . . . . . . . . . . . . . . . . . . 211
How can we transform the image so that its autocorrelation matrix is diagonal? . . 213
How do we compute the K-L transform of an image in practice? . . . . . . . . . . 214
How do we compute the Karhunen-Loeve (K-L) transform of an ensemble of
images? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Is the assumption of ergodicity realistic? . . . . . . . . . . . . . . . . . . . . . . . . 215
Box 3.2. How can we calculate the spatial autocorrelation matrix of an image, when
it is represented by a vector? . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Is the mean of the transformed image expected to be really 0? . . . . . . . . . . . 220
How can we approximate an image using its K-L transform? . . . . . . . . . . . . . 220
What is the error with which we approximate an image when we truncate its K-L
expansion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
What are the basis images in terms of which the Karhunen-Loeve transform expands
an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Box 3.3. What is the error of the approximation of an image using the Karhunen-
Loeve transform? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
3.3 Independent component analysis . . . . . . . . . . . . . . . . . . . . . . 234
What is Independent Component Analysis (ICA)? . . . . . . . . . . . . . . . . . . 234
What is the cocktail party problem? . . . . . . . . . . . . . . . . . . . . . . . . . . 234
How do we solve the cocktail party problem? . . . . . . . . . . . . . . . . . . . . . 235
What does the central limit theorem say? . . . . . . . . . . . . . . . . . . . . . . . 235
What do we mean by saying that the samples of x_1(t) are more Gaussianly dis-
tributed than either s_1(t) or s_2(t) in relation to the cocktail party problem?
Are we talking about the temporal samples of x_1(t), or are we talking about
all possible versions of x_1(t) at a given time? . . . . . . . . . . . . . . . . . . 235
How do we measure non-Gaussianity? . . . . . . . . . . . . . . . . . . . . . . . . . 239
How are the moments of a random variable computed? . . . . . . . . . . . . . . . . 239
How is the kurtosis defined? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
How is negentropy defined? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
How is entropy defined? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Box 3.4. From all probability density functions with the same variance, the Gaussian
has the maximum entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
How is negentropy computed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Box 3.5. Derivation of the approximation of negentropy in terms of moments . . . 252
Box 3.6. Approximating the negentropy with nonquadratic functions . . . . . . . . 254
Box 3.7. Selecting the nonquadratic functions with which to approximate the ne-
gentropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
How do we apply the central limit theorem to solve the cocktail party problem? . . 264
How may ICA be used in image processing? . . . . . . . . . . . . . . . . . . . . . . 264
How do we search for the independent components? . . . . . . . . . . . . . . . . . 264
How can we whiten the data? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
How can we select the independent components from whitened data? . . . . . . . . 267
Box 3.8. How does the method of Lagrange multipliers work? . . . . . . . . . . . . 268
Box 3.9. How can we choose a direction that maximises the negentropy? . . . . . . 269
How do we perform ICA in image processing in practice? . . . . . . . . . . . . . . 274
How do we apply ICA to signal processing? . . . . . . . . . . . . . . . . . . . . . . 283
What are the major characteristics of independent component analysis? . . . . . . 289
What is the difference between ICA as applied in image and in signal processing? . 290
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 292
4 Image Enhancement 293
What is image enhancement? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
How can we enhance an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
What is linear filtering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
4.1 Elements of linear filter theory . . . . . . . . . . . . . . . . . . . . . . . . 294
How do we define a 2D filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
How are the frequency response function and the unit sample response of the filter
related? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Why are we interested in the filter function in the real domain? . . . . . . . . . . . 294
Are there any conditions which h(k, l) must fulfil so that it can be used as a convo-
lution filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Box 4.1. What is the unit sample response of the 2D ideal low pass filter? . . . . . 296
What is the relationship between the 1D and the 2D ideal lowpass filters? . . . . . 300
How can we implement in the real domain a filter that is infinite in extent? . . . . 301
Box 4.2. z-transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Can we define a filter directly in the real domain for convenience? . . . . . . . . . 309
Can we define a filter in the real domain, without side lobes in the frequency
domain? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
4.2 Reducing high frequency noise . . . . . . . . . . . . . . . . . . . . . . . . 311
What are the types of noise present in an image? . . . . . . . . . . . . . . . . . . . 311
What is impulse noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
What is Gaussian noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
What is additive noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
What is multiplicative noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
What is homogeneous noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
What is zero-mean noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
What is biased noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
What is independent noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
What is uncorrelated noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
What is white noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
What is the relationship between zero-mean uncorrelated and white noise? . . . . 313
What is iid noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Is it possible to have white noise that is not iid? . . . . . . . . . . . . . . . . . . . 315
Box 4.3. The probability density function of a function of a random variable . . . 320
Why is noise usually associated with high frequencies? . . . . . . . . . . . . . . . . 324
How do we deal with multiplicative noise? . . . . . . . . . . . . . . . . . . . . . . . 325
Box 4.4. The Fourier transform of the delta function . . . . . . . . . . . . . . . . . 325
Box 4.5. Wiener-Khinchine theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Is the assumption of Gaussian noise in an image justified? . . . . . . . . . . . . . . 326
How do we remove shot noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
What is a rank order filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
What is median filtering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
What is mode filtering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
How do we reduce Gaussian noise? . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
Can we have weighted median and mode filters like we have weighted mean filters? 333
Can we filter an image by using the linear methods we learnt in Chapter 2? . . . . 335
How do we deal with mixed noise in images? . . . . . . . . . . . . . . . . . . . . . 337
Can we avoid blurring the image when we are smoothing it? . . . . . . . . . . . . . 337
What is edge adaptive smoothing? . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Box 4.6. Efficient computation of the local variance . . . . . . . . . . . . . . . . . 339
How does the mean shift algorithm work? . . . . . . . . . . . . . . . . . . . . . . . 339
What is anisotropic diffusion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Box 4.7. Scale space and the heat equation . . . . . . . . . . . . . . . . . . . . . . 342
Box 4.8. Gradient, Divergence and Laplacian . . . . . . . . . . . . . . . . . . . . . 345
Box 4.9. Differentiation of an integral with respect to a parameter . . . . . . . . . 348
Box 4.10. From the heat equation to the anisotropic diffusion algorithm . . . . . . 348
How do we perform anisotropic diffusion in practice? . . . . . . . . . . . . . . . . . 349
4.3 Reducing low frequency interference . . . . . . . . . . . . . . . . . . . . 351
When does low frequency interference arise? . . . . . . . . . . . . . . . . . . . . . . 351
Can variable illumination manifest itself in high frequencies? . . . . . . . . . . . . 351
In which other cases may we be interested in reducing low frequencies? . . . . . . . 351
What is the ideal high pass filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
How can we enhance small image details using nonlinear filters? . . . . . . . . . . . 357
What is unsharp masking? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
How can we apply the unsharp masking algorithm locally? . . . . . . . . . . . . . . 357
How does the locally adaptive unsharp masking work? . . . . . . . . . . . . . . . . 358
How does the retinex algorithm work? . . . . . . . . . . . . . . . . . . . . . . . . . 360
Box 4.11. Which are the grey values that are stretched most by the retinex
algorithm? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
How can we improve an image which suffers from variable illumination? . . . . . . 364
What is homomorphic filtering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
What is photometric stereo? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
What does flatfielding mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
How is flatfielding performed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
4.4 Histogram manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
What is the histogram of an image? . . . . . . . . . . . . . . . . . . . . . . . . . . 367
When is it necessary to modify the histogram of an image? . . . . . . . . . . . . . 367
How can we modify the histogram of an image? . . . . . . . . . . . . . . . . . . . . 367
What is histogram manipulation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
What affects the semantic information content of an image? . . . . . . . . . . . . . 368
How can we perform histogram manipulation and at the same time preserve the
information content of the image? . . . . . . . . . . . . . . . . . . . . . . . . . 368
What is histogram equalisation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Why do histogram equalisation programs usually not produce images with flat his-
tograms? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
How do we perform histogram equalisation in practice? . . . . . . . . . . . . . . . 370
Can we obtain an image with a perfectly flat histogram? . . . . . . . . . . . . . . . 372
What if we do not wish to have an image with a flat histogram? . . . . . . . . . . 373
How do we do histogram hyperbolisation in practice? . . . . . . . . . . . . . . . . . 373
How do we do histogram hyperbolisation with random additions? . . . . . . . . . . 374
Why should one wish to perform something other than histogram equalisation? . . 374
What if the image has inhomogeneous contrast? . . . . . . . . . . . . . . . . . . . 375
Can we avoid damaging flat surfaces while increasing the contrast of genuine tran-
sitions in brightness? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
How can we enhance an image by stretching only the grey values that appear in
genuine brightness transitions? . . . . . . . . . . . . . . . . . . . . . . . . . . 377
How do we perform pairwise image enhancement in practice? . . . . . . . . . . . . 378
4.5 Generic deblurring algorithms . . . . . . . . . . . . . . . . . . . . . . . . 383
How does mode filtering help deblur an image? . . . . . . . . . . . . . . . . . . . . 383
Can we use an edge adaptive window to apply the mode filter? . . . . . . . . . . . 385
How can mean shift be used as a generic deblurring algorithm? . . . . . . . . . . . 385
What is toboggan contrast enhancement? . . . . . . . . . . . . . . . . . . . . . . . 387
How do we do toboggan contrast enhancement in practice? . . . . . . . . . . . . . 387
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 393
5 Image Restoration 395
What is image restoration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Why may an image require restoration? . . . . . . . . . . . . . . . . . . . . . . . . 395
What is image registration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
How is image restoration performed? . . . . . . . . . . . . . . . . . . . . . . . . . . 395
What is the difference between image enhancement and image restoration? . . . . 395
5.1 Homogeneous linear image restoration: inverse filtering . . . . . . . . 396
How do we model homogeneous linear image degradation? . . . . . . . . . . . . . . 396
How may the problem of image restoration be solved? . . . . . . . . . . . . . . . . 396
How may we obtain information on the frequency response function H(u, v) of the
degradation process? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
If we know the frequency response function of the degradation process, isn't the
solution to the problem of image restoration trivial? . . . . . . . . . . . . . . 407
What happens at frequencies where the frequency response function is zero? . . . . 408
Will the zeros of the frequency response function and the image always
coincide? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
How can we avoid the amplification of noise? . . . . . . . . . . . . . . . . . . . . . 408
How do we apply inverse filtering in practice? . . . . . . . . . . . . . . . . . . . . . 410
Can we define a filter that will automatically take into consideration the noise in
the blurred image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
5.2 Homogeneous linear image restoration: Wiener filtering . . . . . . . 419
How can we express the problem of image restoration as a least square error esti-
mation problem? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Can we find a linear least squares error solution to the problem of image
restoration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
What is the linear least mean square error solution of the image restoration
problem? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
Box 5.1. The least squares error solution . . . . . . . . . . . . . . . . . . . . . . . . 420
Box 5.2. From the Fourier transform of the correlation functions of images to their
spectral densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Box 5.3. Derivation of the Wiener filter . . . . . . . . . . . . . . . . . . . . . . . . 428
What is the relationship between Wiener filtering and inverse filtering? . . . . . . 430
How can we determine the spectral density of the noise field? . . . . . . . . . . . . 430
How can we possibly use Wiener filtering, if we know nothing about the statistical
properties of the unknown image? . . . . . . . . . . . . . . . . . . . . . . . . 430
How do we apply Wiener filtering in practice? . . . . . . . . . . . . . . . . . . . . . 431
5.3 Homogeneous linear image restoration: Constrained matrix inversion 436
If the degradation process is assumed linear, why don't we solve a system of linear
equations to reverse its effect instead of invoking the convolution theorem? . 436
Equation (5.146) seems pretty straightforward, why bother with any other
approach? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
Is there any way by which matrix H can be inverted? . . . . . . . . . . . . . . . . 437
When is a matrix block circulant? . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
When is a matrix circulant? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
Why can block circulant matrices be inverted easily? . . . . . . . . . . . . . . . . . 438
Which are the eigenvalues and eigenvectors of a circulant matrix? . . . . . . . . . . 438
How does the knowledge of the eigenvalues and the eigenvectors of a matrix help in
inverting the matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
How do we know that matrix H that expresses the linear degradation process is
block circulant? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
How can we diagonalise a block circulant matrix? . . . . . . . . . . . . . . . . . . . 445
Box 5.4. Proof of equation (5.189) . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
Box 5.5. What is the transpose of matrix H? . . . . . . . . . . . . . . . . . . . . . 448
How can we overcome the extreme sensitivity of matrix inversion to noise? . . . . . 455
How can we incorporate the constraint in the inversion of the matrix? . . . . . . . 456
Box 5.6. Derivation of the constrained matrix inversion filter . . . . . . . . . . . . 459
What is the relationship between the Wiener filter and the constrained matrix in-
version filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
How do we apply constrained matrix inversion in practice? . . . . . . . . . . . . . 464
5.4 Inhomogeneous linear image restoration: the whirl transform . . . . 468
How do we model the degradation of an image if it is linear but inhomogeneous? . 468
How may we use constrained matrix inversion when the distortion matrix is not
circulant? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
What happens if matrix H is really very big and we cannot take its inverse? . . . . 481
Box 5.7. Jacobi's method for inverting large systems of linear equations . . . . . . 482
Box 5.8. Gauss-Seidel method for inverting large systems of linear equations . . . . 485
Does matrix H as constructed in examples 5.41, 5.43, 5.44 and 5.45 fulfil the condi-
tions for using the Gauss-Seidel or the Jacobi method? . . . . . . . . . . . . . 485
What happens if matrix H does not satisfy the conditions for the Gauss-Seidel
method? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
How do we apply the gradient descent algorithm in practice? . . . . . . . . . . . . 487
What happens if we do not know matrix H? . . . . . . . . . . . . . . . . . . . . . 489
5.5 Nonlinear image restoration: MAP estimation . . . . . . . . . . . . . . 490
What does MAP estimation mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
How do we formulate the problem of image restoration as a MAP estimation? . . . 490
How do we select the most probable configuration of restored pixel values, given the
degradation model and the degraded image? . . . . . . . . . . . . . . . . . . . 490
Box 5.9. Probabilities: prior, a priori, posterior, a posteriori, conditional . . . . . . 491
Is the minimum of the cost function unique? . . . . . . . . . . . . . . . . . . . . . 491
How can we select then one solution from all possible solutions that minimise the
cost function? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Can we combine the posterior and the prior probabilities for a configuration x? . . 493
Box 5.10. Parseval's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
How do we model in general the cost function we have to minimise in order to restore
an image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
What is the reason we use a temperature parameter when we model the joint prob-
ability density function, since it does not change the configuration for which
the probability takes its maximum? . . . . . . . . . . . . . . . . . . . . . . . . 501
How does the temperature parameter allow us to focus or defocus in the solution
space? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
How do we model the prior probabilities of configurations? . . . . . . . . . . . . . 501
What happens if the image has genuine discontinuities? . . . . . . . . . . . . . . . 502
How do we minimise the cost function? . . . . . . . . . . . . . . . . . . . . . . . . 503
How do we create a possible new solution from the previous one? . . . . . . . . . . 503
How do we know when to stop the iterations? . . . . . . . . . . . . . . . . . . . . . 505
How do we reduce the temperature in simulated annealing? . . . . . . . . . . . . . 506
How do we perform simulated annealing with the Metropolis sampler in practice? . 506
How do we perform simulated annealing with the Gibbs sampler in practice? . . . 507
Box 5.11. How can we draw random numbers according to a given probability
density function? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
Why is simulated annealing slow? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
How can we accelerate simulated annealing? . . . . . . . . . . . . . . . . . . . . . . 511
How can we coarsen the configuration space? . . . . . . . . . . . . . . . . . . . . . 512
5.6 Geometric image restoration . . . . . . . . . . . . . . . . . . . . . . . . . 513
How may geometric distortion arise? . . . . . . . . . . . . . . . . . . . . . . . . . . 513
Why do lenses cause distortions? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
How can a geometrically distorted image be restored? . . . . . . . . . . . . . . . . 513
How do we perform the spatial transformation? . . . . . . . . . . . . . . . . . . . . 513
How may we model the lens distortions? . . . . . . . . . . . . . . . . . . . . . . . . 514
How can we model the inhomogeneous distortion? . . . . . . . . . . . . . . . . . . 515
How can we specify the parameters of the spatial transformation model? . . . . . . 516
Why is grey level interpolation needed? . . . . . . . . . . . . . . . . . . . . . . . . 516
Box 5.12. The Hough transform for line detection . . . . . . . . . . . . . . . . . . . 520
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 526
6 Image Segmentation and Edge Detection 527
What is this chapter about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
What exactly is the purpose of image segmentation and edge detection? . . . . . . 527
6.1 Image segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
How can we divide an image into uniform regions? . . . . . . . . . . . . . . . . . . 528
What do we mean by labelling an image? . . . . . . . . . . . . . . . . . . . . . . 528
What can we do if the valley in the histogram is not very sharply defined? . . . . . 528
How can we minimise the number of misclassified pixels? . . . . . . . . . . . . . . 529
How can we choose the minimum error threshold? . . . . . . . . . . . . . . . . . . 530
What is the minimum error threshold when object and background pixels are nor-
mally distributed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
What is the meaning of the two solutions of the minimum error threshold
equation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
How can we estimate the parameters of the Gaussian probability density functions
that represent the object and the background? . . . . . . . . . . . . . . . . . 537
What are the drawbacks of the minimum error threshold method? . . . . . . . . . 541
Is there any method that does not depend on the availability of models for the
distributions of the object and the background pixels? . . . . . . . . . . . . . 541
Box 6.1. Derivation of Otsu's threshold . . . . . . . . . . . . . . . . . . . . . . . . 542
Are there any drawbacks in Otsu's method? . . . . . . . . . . . . . . . . . . . . . . 545
How can we threshold images obtained under variable illumination? . . . . . . . . 545
If we threshold the image according to the histogram of ln f(x, y), are we
thresholding it according to the reflectance properties of the imaged
surfaces? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Box 6.2. The probability density function of the sum of two random variables . . . 546
Since straightforward thresholding methods break down under variable
illumination, how can we cope with it? . . . . . . . . . . . . . . . . . . . . . . 548
What do we do if the histogram has only one peak? . . . . . . . . . . . . . . . . . 549
Are there any shortcomings of the grey value thresholding methods? . . . . . . . . 550
How can we cope with images that contain regions that are not uniform, but are
perceived as uniform? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Can we improve histogramming methods by taking into consideration the spatial
proximity of pixels? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
Are there any segmentation methods that take into consideration the spatial prox-
imity of pixels? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
How can one choose the seed pixels? . . . . . . . . . . . . . . . . . . . . . . . . . . 554
How does the split and merge method work? . . . . . . . . . . . . . . . . . . . . . 554
What is morphological image reconstruction? . . . . . . . . . . . . . . . . . . . . . 554
How does morphological image reconstruction allow us to identify the seeds needed
for the watershed algorithm? . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
How do we compute the gradient magnitude image? . . . . . . . . . . . . . . . . . 557
What is the role of the number we subtract from f to create mask g in the morpho-
logical reconstruction of f by g? . . . . . . . . . . . . . . . . . . . . . . . . . 558
What is the role of the shape and size of the structuring element in the morphological
reconstruction of f by g? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
How does the use of the gradient magnitude image help segment the image by the
watershed algorithm? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
Are there any drawbacks in the watershed algorithm which works with the gradient
magnitude image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
Is it possible to segment an image by filtering? . . . . . . . . . . . . . . . . . . . . 574
How can we use the mean shift algorithm to segment an image? . . . . . . . . . . . 574
What is a graph? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
How can we use a graph to represent an image? . . . . . . . . . . . . . . . . . . . . 576
How can we use the graph representation of an image to segment it? . . . . . . . . 576
What is the normalised cuts algorithm? . . . . . . . . . . . . . . . . . . . . . . . . 576
Box 6.3. The normalised cuts algorithm as an eigenvalue problem . . . . . . . . . . 576
Box 6.4. How do we minimise the Rayleigh quotient? . . . . . . . . . . . . . . . . . 585
How do we apply the normalised graph cuts algorithm in practice? . . . . . . . . . 589
Is it possible to segment an image by considering the dissimilarities between regions,
as opposed to considering the similarities between pixels? . . . . . . . . . . . 589
6.2 Edge detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
How do we measure the dissimilarity between neighbouring pixels? . . . . . . . . . 591
What is the smallest possible window we can choose? . . . . . . . . . . . . . . . . 592
What happens when the image has noise? . . . . . . . . . . . . . . . . . . . . . . . 593
Box 6.5. How can we choose the weights of a 3 × 3 mask for edge detection? . . . . 595
What is the best value of parameter K? . . . . . . . . . . . . . . . . . . . . . . . . 596
Box 6.6. Derivation of the Sobel filters . . . . . . . . . . . . . . . . . . . . . . . . . 596
In the general case, how do we decide whether a pixel is an edge pixel or not? . . . 601
How do we perform linear edge detection in practice? . . . . . . . . . . . . . . . . 602
Are Sobel masks appropriate for all images? . . . . . . . . . . . . . . . . . . . . . . 605
How can we choose the weights of the mask if we need a larger mask owing to the
presence of significant noise in the image? . . . . . . . . . . . . . . . . . . . . 606
Can we use the optimal filters for edges to detect lines in an image in an
optimal way? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
What is the fundamental difference between step edges and lines? . . . . . . . . . . 609
Box 6.7. Convolving a random noise signal with a filter . . . . . . . . . . . . . . . 615
Box 6.8. Calculation of the signal to noise ratio after convolution of a noisy edge
signal with a filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
Box 6.9. Derivation of the good locality measure . . . . . . . . . . . . . . . . . . . 617
Box 6.10. Derivation of the count of false maxima . . . . . . . . . . . . . . . . . . 619
Can edge detection lead to image segmentation? . . . . . . . . . . . . . . . . . . . 620
What is hysteresis edge linking? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
Does hysteresis edge linking lead to closed edge contours? . . . . . . . . . . . . . . 621
What is the Laplacian of Gaussian edge detection method? . . . . . . . . . . . . . 623
Is it possible to detect edges and lines simultaneously? . . . . . . . . . . . . . . . . 623
6.3 Phase congruency and the monogenic signal . . . . . . . . . . . . . . . 625
What is phase congruency? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
What is phase congruency for a 1D digital signal? . . . . . . . . . . . . . . . . . . 625
How does phase congruency allow us to detect lines and edges? . . . . . . . . . . . 626
Why does phase congruency coincide with the maximum of the local energy of the
signal? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
How can we measure phase congruency? . . . . . . . . . . . . . . . . . . . . . . . . 627
Couldn't we measure phase congruency by simply averaging the phases of the har-
monic components? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
How do we measure phase congruency in practice? . . . . . . . . . . . . . . . . . . 630
How do we measure the local energy of the signal? . . . . . . . . . . . . . . . . . . 630
Why should we perform convolution with the two basis signals in order to get the
projection of the local signal on the basis signals? . . . . . . . . . . . . . . . . 632
Box 6.11. Some properties of the continuous Fourier transform . . . . . . . . . . . 637
If all we need to compute is the local energy of the signal, why don't we use Parseval's
theorem to compute it in the real domain inside a local window? . . . . . . . 647
How do we decide which filters to use for the calculation of the local energy? . . . 648
How do we compute the local energy of a 1D signal in practice? . . . . . . . . . . . 651
How can we tell whether the maximum of the local energy corresponds to a sym-
metric or an antisymmetric feature? . . . . . . . . . . . . . . . . . . . . . . . 652
How can we compute phase congruency and local energy in 2D? . . . . . . . . . . 659
What is the analytic signal? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
How can we generalise the Hilbert transform to 2D? . . . . . . . . . . . . . . . . . 660
How do we compute the Riesz transform of an image? . . . . . . . . . . . . . . . . 660
How can the monogenic signal be used? . . . . . . . . . . . . . . . . . . . . . . . . 660
How do we select the even filter we use? . . . . . . . . . . . . . . . . . . . . . . . . 661
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 668
7 Image Processing for Multispectral Images 669
What is a multispectral image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
What are the problems that are special to multispectral images? . . . . . . . . . . 669
What is this chapter about? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
7.1 Image preprocessing for multispectral images . . . . . . . . . . . . . . 671
Why may one wish to replace the bands of a multispectral image with other
bands? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
How do we usually construct a grey image from a multispectral image? . . . . . . 671
How can we construct a single band from a multispectral image that contains the
maximum amount of image information? . . . . . . . . . . . . . . . . . . . . . 671
What is principal component analysis? . . . . . . . . . . . . . . . . . . . . . . . . . 672
Box 7.1. How do we measure information? . . . . . . . . . . . . . . . . . . . . . . . 673
How do we perform principal component analysis in practice? . . . . . . . . . . . . 674
What are the advantages of using the principal components of an image, instead of
the original bands? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
What are the disadvantages of using the principal components of an image instead
of the original bands? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
Is it possible to work out only the first principal component of a multispectral image
if we are not interested in the other components? . . . . . . . . . . . . . . . . 682
Box 7.2. The power method for estimating the largest eigenvalue of a matrix . . . 682
What is the problem of spectral constancy? . . . . . . . . . . . . . . . . . . . . . . 684
What influences the spectral signature of a pixel? . . . . . . . . . . . . . . . . . . . 684
What is the reflectance function? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
Does the imaging geometry influence the spectral signature of a pixel? . . . . . . . 684
How does the imaging geometry influence the light energy a pixel receives? . . . . 685
How do we model the process of image formation for Lambertian surfaces? . . . . 685
How can we eliminate the dependence of the spectrum of a pixel on the imaging
geometry? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
How can we eliminate the dependence of the spectrum of a pixel on the spectrum
of the illuminating source? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
What happens if we have more than one illuminating source? . . . . . . . . . . . . 687
How can we remove the dependence of the spectral signature of a pixel on the
imaging geometry and on the spectrum of the illuminant? . . . . . . . . . . . 687
What do we have to do if the imaged surface is not made up from the same
material? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
What is the spectral unmixing problem? . . . . . . . . . . . . . . . . . . . . . . . . 688
How do we solve the linear spectral unmixing problem? . . . . . . . . . . . . . . . 689
Can we use library spectra for the pure materials? . . . . . . . . . . . . . . . . . . 689
How do we solve the linear spectral unmixing problem when we know the spectra
of the pure components? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690
Is it possible that the inverse of matrix Q cannot be computed? . . . . . . . . . . . 693
What happens if the library spectra have been sampled at different wavelengths
from the mixed spectrum? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
What happens if we do not know which pure substances might be present in the
mixed substance? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
How do we solve the linear spectral unmixing problem if we do not know the spectra
of the pure materials? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
7.2 The physics and psychophysics of colour vision . . . . . . . . . . . . . 700
What is colour? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
What is the interest in colour from the engineering point of view? . . . . . . . . . 700
What influences the colour we perceive for a dark object? . . . . . . . . . . . . . . 700
What causes the variations of the daylight? . . . . . . . . . . . . . . . . . . . . . . 701
How can we model the variations of the daylight? . . . . . . . . . . . . . . . . . . . 702
Box 7.3. Standard illuminants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
What is the observed variation in the natural materials? . . . . . . . . . . . . . . . 706
What happens to the light once it reaches the sensors? . . . . . . . . . . . . . . . . 711
Is it possible for different materials to produce the same recording by a sensor? . . 713
How does the human visual system achieve colour constancy? . . . . . . . . . . . . 714
What does the trichromatic theory of colour vision say? . . . . . . . . . . . . . . . 715
What defines a colour system? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
How are the tristimulus values specied? . . . . . . . . . . . . . . . . . . . . . . . . 715
Can all monochromatic reference stimuli be matched by simply adjusting the inten-
sities of the primary lights? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
Do all people require the same intensities of the primary lights to match the same
monochromatic reference stimulus? . . . . . . . . . . . . . . . . . . . . . . . . 717
Who are the people with normal colour vision? . . . . . . . . . . . . . . . . . . . . 717
What are the most commonly used colour systems? . . . . . . . . . . . . . . . . . . 717
What is the CIE RGB colour system? . . . . . . . . . . . . . . . . . . . . . . . . . 717
What is the XYZ colour system? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
How do we represent colours in 3D? . . . . . . . . . . . . . . . . . . . . . . . . . . 718
How do we represent colours in 2D? . . . . . . . . . . . . . . . . . . . . . . . . . . 718
What is the chromaticity diagram? . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Box 7.4. Some useful theorems from 3D geometry . . . . . . . . . . . . . . . . . . 721
What is the chromaticity diagram for the CIE RGB colour system? . . . . . . . . 724
How does the human brain perceive colour brightness? . . . . . . . . . . . . . . . . 725
How is the alychne defined in the CIE RGB colour system? . . . . . . . . . . . . 726
How is the XYZ colour system defined? . . . . . . . . . . . . . . . . . . . . . . . . 726
What is the chromaticity diagram of the XYZ colour system? . . . . . . . . . . . 728
How is it possible to create a colour system with imaginary primaries, in practice? 729
What if we wish to model the way a particular individual sees colours? . . . . . . . 729
If different viewers require different intensities of the primary lights to see white,
how do we calibrate colours between dierent viewers? . . . . . . . . . . . . . 730
How do we make use of the reference white? . . . . . . . . . . . . . . . . . . . . . . 730
How is the sRGB colour system defined? . . . . . . . . . . . . . . . . . . . . . . . . 732
Does a colour change if we double all its tristimulus values? . . . . . . . . . . . . . 733
How does the description of a colour, in terms of a colour system, relate to the way
we describe colours in everyday language? . . . . . . . . . . . . . . . . . . . . 733
How do we compare colours? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
What is a metric? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
Can we use the Euclidean metric to measure the difference of two colours? . . . . . 734
Which are the perceptually uniform colour spaces? . . . . . . . . . . . . . . . . . . 734
How is the Luv colour space defined? . . . . . . . . . . . . . . . . . . . . . . . . . . 734
How is the Lab colour space defined? . . . . . . . . . . . . . . . . . . . . . . . . . . 735
How do we choose values for (Xn, Yn, Zn)? . . . . . . . . . . . . . . . . . . . . . . 735
How can we compute the RGB values from the Luv values? . . . . . . . . . . . . . 735
How can we compute the RGB values from the Lab values? . . . . . . . . . . . . . 736
How do we measure perceived saturation? . . . . . . . . . . . . . . . . . . . . . . . 737
How do we measure perceived differences in saturation? . . . . . . . . . . . . . . . 737
How do we measure perceived hue? . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
How is the perceived hue angle defined? . . . . . . . . . . . . . . . . . . . . . . . . 738
How do we measure perceived differences in hue? . . . . . . . . . . . . . . . . . . . 738
What affects the way we perceive colour? . . . . . . . . . . . . . . . . . . . . . . . 740
What is meant by temporal context of colour? . . . . . . . . . . . . . . . . . . . . 740
What is meant by spatial context of colour? . . . . . . . . . . . . . . . . . . . . . . 740
Why distance matters when we talk about spatial frequency? . . . . . . . . . . . . 741
How do we explain the spatial dependence of colour perception? . . . . . . . . . . 741
7.3 Colour image processing in practice . . . . . . . . . . . . . . . . . . . . 742
How does the study of the human colour vision affect the way we do image
processing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
How perceptually uniform are the perceptually uniform colour spaces in practice? . 742
How should we convert the image RGB values to the Luv or the Lab colour
spaces? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
How do we measure hue and saturation in image processing applications? . . . . . 747
How can we emulate the spatial dependence of colour perception in image
processing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
What is the relevance of the phenomenon of metamerism to image processing? . . 756
How do we cope with the problem of metamerism in an industrial inspection appli-
cation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
What is a Monte-Carlo method? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757
How do we remove noise from multispectral images? . . . . . . . . . . . . . . . . . 759
How do we rank vectors? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
How do we deal with mixed noise in multispectral images? . . . . . . . . . . . . . . 760
How do we enhance a colour image? . . . . . . . . . . . . . . . . . . . . . . . . . . 761
How do we restore multispectral images? . . . . . . . . . . . . . . . . . . . . . . . . 767
How do we compress colour images? . . . . . . . . . . . . . . . . . . . . . . . . . . 767
How do we segment multispectral images? . . . . . . . . . . . . . . . . . . . . . . . 767
How do we apply k-means clustering in practice? . . . . . . . . . . . . . . . . . . . 767
How do we extract the edges of multispectral images? . . . . . . . . . . . . . . . . 769
What is the take home message of this chapter? . . . . . . . . . . . . . . . . . . 770
Bibliographical notes 775
References 777
Index 781
Preface
Since the first edition of this book in 1999, the field of Image Processing has seen many
developments. First of all, the proliferation of colour sensors caused an explosion of research
in colour vision and colour image processing. Second, application of image processing to
biomedicine has really taken off, with medical image processing nowadays being almost a
field of its own. Third, image processing has become more sophisticated, having reached out
even further afield, into other areas of research, as diverse as graph theory and psychophysics,
to borrow methodologies and approaches.
This new edition of the book attempts to capture these new insights, without, however,
forgetting the well known and established methods of image processing of the past. The book
may be treated as three books interlaced: the advanced proofs and peripheral material are
presented in grey boxes; they may be omitted in a first reading or for an undergraduate course.
The back bone of the book is the text given in the form of questions and answers. We believe
that the order of the questions is that of coming naturally to the reader when they encounter
a new concept. There are 255 figures and 384 fully worked out examples aimed at clarifying
these concepts. Examples with a number prefixed with a B refer to the boxed
material and again they may be omitted in a first reading or an undergraduate
course. The book is accompanied by a CD with all the MatLab programs that produced
the examples and the figures. There is also a collection of slide presentations in pdf format,
available from the accompanying web page of the book, that may help the lecturer who wishes
to use this material for teaching.
We have made a great effort to make the book easy to read and we hope that learning
about the nuts and bolts behind the image processing algorithms will make the subject
even more exciting and a pleasure to delve into.
Over the years of writing this book, we were helped by various people. We would par-
ticularly like to thank Mike Brookes, Nikos Mitianoudis, Antonis Katartzis, Mohammad Ja-
hangiri, Tania Stathaki and Vladimir Jeliazkov, for useful discussions, Mohammad Jahangiri,
Leila Favaedi and Olga Duran for help with some figures, and Pedro Garcia-Sevilla for help
with typesetting the book.
Maria Petrou and Costas Petrou
Plates
Plate I: (a) The colours of the Macbeth colour chart. (b) The chromaticity diagram of the
XYZ colour system. Points A and B represent colours which, although further apart than
points C and D, are perceived as more similar than the colours represented by C and D.

Plate II: Reconstructions using (a) one, (b) two, (c) three, (d) four, (e) five and (f) six
eigenvalues. The inclusion of extra eigenvalues beyond the third one changes the colour
appearance very little (see example 7.12, on page 713).
Plate III: Colour perception depends on colour spatial frequency (see page 740).
Plate IV: Colour perception depends on colour context (see page 740).
Plate V: At the top, images affected by (a) 5% impulse noise and (b) 5% impulse plus
Gaussian noise (σ = 15), and at the bottom their restored versions, using (c) vector median
filtering, with window size 3 × 3, and (d) α-trimmed vector median filtering, with α = 0.2
and window size 5 × 5 (example 7.32, page 761).
Plate VI: (a) A Street in Shanghai (344 × 512). As seen from (b) 2m, (c) 4m and (d) 10m
distance. In (b) a border of 10 pixels around should be ignored, in (c) the stripe affected by
border effects is 22 pixels wide, while in (d) it is 34 pixels wide (example 7.28, page 754).
Plate VII: (a) Abu-Dhabi building. (b) After colour enhancement. Enhancing colours by
increasing their saturation to its maximum, while retaining their hue. Threshold = 0.04.
f = [ f(1, 1)  f(1, 2)  . . .  f(1, N)
      f(2, 1)  f(2, 2)  . . .  f(2, N)
        .         .      .        .
      f(N, 1)  f(N, 2)  . . .  f(N, N) ]                                          (1.1)
with 0 ≤ f(x, y) ≤ G − 1, where usually N and G are expressed as positive integer powers of
2 (N = 2^n, G = 2^m).
What is a spectral band?
A colour band is a range of wavelengths of the electromagnetic spectrum, over which the
sensors we use to capture an image have nonzero sensitivity. Typical colour images consist
of three colour bands. This means that they have been captured by three different sets
of sensors, each set made to have a different sensitivity function. Figure 1.1 shows typical
sensitivity curves of a multispectral camera.
All the methods presented in this book, apart from those in Chapter 7, will refer to single
band images.
Figure 1.1: The spectrum of the light which reaches a sensor is multiplied with the sensitivity
function of the sensor and recorded by the sensor. This recorded value is the brightness of
the image in the location of the sensor and in the band of the sensor. This figure shows the
sensitivity curves of three different sensor types (sensors B, sensors G and sensors R), plotted
as sensitivity, between 0 and 1, against wavelength.
Why do most image processing algorithms refer to grey images, while most images
we come across are colour images?
For various reasons.
1. A lot of the processes we apply to a grey image can be easily extended to a colour image
by applying them to each band separately.
2. A lot of the information conveyed by an image is expressed in its grey form and so
colour is not necessary for its extraction. That is the reason black and white television
receivers had been perfectly acceptable to the public for many years and black and
white photography is still popular with many photographers.
3. For many years colour digital cameras were expensive and not widely available. A
lot of image processing techniques were developed around the type of image that was
available. These techniques have been well established in image processing.
Nevertheless, colour is an important property of the natural world, and so we shall examine
its role in image processing in a separate chapter in this book.
1. Introduction 3
How is a digital image formed?
Each pixel of an image corresponds to a part of a physical object in the 3D world. This
physical object is illuminated by some light which is partly reflected and partly absorbed by
it. Part of the reflected light reaches the array of sensors used to image the scene and is
responsible for the values recorded by these sensors. One of these sensors makes up a pixel
and its field of view corresponds to a small patch in the imaged scene. The recorded value by
each sensor depends on its sensitivity curve. When a photon of a certain wavelength reaches
the sensor, its energy is multiplied with the value of the sensitivity curve of the sensor at
that wavelength and is accumulated. The total energy collected by the sensor (during the
exposure time) is eventually used to compute the grey value of the pixel that corresponds to
this sensor.
If a sensor corresponds to a patch in the physical world, how come we can have
more than one sensor type corresponding to the same patch of the scene?
Indeed, it is not possible to have three different sensors with three different sensitivity curves
corresponding exactly to the same patch of the physical world. That is why digital cameras
have the three different types of sensor slightly displaced from each other, as shown in figure
1.2, with the sensors that are sensitive to the green wavelengths being twice as many as those
sensitive to the blue and the red wavelengths. The recordings of the three sensor types are
interpolated and superimposed to create the colour image. Recently, however, cameras have
been constructed where the three types of sensor are combined so they exactly exist on top of
each other and so they view exactly the same patch in the real world. These cameras produce
much sharper colour images than the ordinary cameras.
Figure 1.2: The RGB sensors as they are arranged in a typical digital camera.
What is the physical meaning of the brightness of an image at a pixel position?
The brightness values of different pixels have been created by using the energies recorded
by the corresponding sensors. They have significance only relative to each other and they
are meaningless in absolute terms. So, pixel values between different images should only be
compared if either care has been taken for the physical processes used to form the two images
to be identical, or the brightness values of the two images have somehow been normalised so
that the effects of the different physical processes have been removed. In that case, we say
that the sensors are calibrated.
Example 1.1
You are given a triple array of 3 × 3 sensors, arranged as shown in figure 1.3,
with their sampled sensitivity curves given in table 1.1. The last column of
this table gives, in some arbitrary units, the energy, Ei, carried by photons
with wavelength λi.
Wavelength   Sensors B   Sensors G   Sensors R   Energy
   λ0           0.2         0.0         0.0       1.00
   λ1           0.4         0.2         0.1       0.95
   λ2           0.8         0.3         0.2       0.90
   λ3           1.0         0.4         0.2       0.88
   λ4           0.7         0.6         0.3       0.85
   λ5           0.2         1.0         0.5       0.81
   λ6           0.1         0.8         0.6       0.78
   λ7           0.0         0.6         0.8       0.70
   λ8           0.0         0.3         1.0       0.60
   λ9           0.0         0.0         0.6       0.50

Table 1.1: The sensitivity curves of three types of sensor for wavelengths
in the range [λ0, λ9], and the corresponding energy of photons of these
particular wavelengths in some arbitrary units.
The shutter of the camera is open long enough for 10 photons to reach the
locations of the sensors.
Figure 1.3: Three 3 × 3 sensor arrays interlaced. On the right, the loca-
tions of the pixels they make up, labelled (1, 1) to (3, 3). Although the
three types of sensor are slightly misplaced with respect to each other, we
assume that each triplet is coincident and forms a single pixel.
For simplicity, we consider that exactly the same types of photon reach all
sensors that correspond to the same location.
The wavelengths of the photons that reach the pixel locations of each
triple sensor, as identified in figure 1.3, are:

Location (1, 1): λ0, λ9, λ9, λ8, λ7, λ8, λ1, λ0, λ1, λ1
Location (1, 2): λ1, λ3, λ3, λ4, λ4, λ5, λ2, λ6, λ4, λ5
Location (1, 3): λ6, λ7, λ7, λ0, λ5, λ6, λ6, λ1, λ5, λ9
Location (2, 1): λ0, λ1, λ0, λ2, λ1, λ1, λ4, λ3, λ3, λ1
Location (2, 2): λ3, λ3, λ4, λ3, λ4, λ4, λ5, λ2, λ9, λ4
Location (2, 3): λ7, λ7, λ6, λ7, λ6, λ1, λ5, λ9, λ8, λ7
Location (3, 1): λ6, λ6, λ1, λ8, λ7, λ8, λ9, λ9, λ8, λ7
Location (3, 2): λ0, λ4, λ3, λ4, λ1, λ5, λ4, λ0, λ2, λ1
Location (3, 3): λ3, λ4, λ1, λ0, λ0, λ4, λ2, λ5, λ2, λ4
Calculate the values that each sensor array will record and thus produce
the three photon energy bands recorded.
We denote by gX(i, j) the value that will be recorded by sensor type X at location (i, j).
For sensors R, G and B in location (1, 1), the recorded values will be:

gR(1, 1) = 2E0 × 0.0 + 2E9 × 0.6 + 2E8 × 1.0 + 1E7 × 0.8 + 3E1 × 0.1
         = 1.0 × 0.6 + 1.2 × 1.0 + 0.7 × 0.8 + 2.85 × 0.1
         = 2.645                                                           (1.2)

gG(1, 1) = 2E0 × 0.0 + 2E9 × 0.0 + 2E8 × 0.3 + 1E7 × 0.6 + 3E1 × 0.2
         = 1.2 × 0.3 + 0.7 × 0.6 + 2.85 × 0.2
         = 1.35                                                            (1.3)

gB(1, 1) = 2E0 × 0.2 + 2E9 × 0.0 + 2E8 × 0.0 + 1E7 × 0.0 + 3E1 × 0.4
         = 2.0 × 0.2 + 2.85 × 0.4
         = 1.54                                                            (1.4)
Working in a similar way, we deduce that the energies recorded by the three sensor
arrays are:
ER = [ 2.645 2.670 3.729      EG = [ 1.350 4.938 4.522      EB = [ 1.540 5.047 1.138
       1.167 4.053 4.576             2.244 4.176 4.108             4.995 5.902 0.698
       4.551 1.716 1.801 ]           2.818 2.532 2.612 ]           0.536 4.707 5.047 ]     (1.5)
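For readers who wish to check such calculations by computer, the short Python sketch below (our own illustration — the programs that accompany the book are in MatLab — and all variable and function names are ours) reproduces the recorded energies of equation (1.5) from table 1.1 and the photon lists above.

import numpy as np

# Sampled sensitivity curves of the B, G and R sensors and the photon
# energies E0 ... E9 of table 1.1, for wavelengths lambda_0 ... lambda_9.
sens = {'B': [0.2, 0.4, 0.8, 1.0, 0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
        'G': [0.0, 0.2, 0.3, 0.4, 0.6, 1.0, 0.8, 0.6, 0.3, 0.0],
        'R': [0.0, 0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 0.8, 1.0, 0.6]}
energy = [1.00, 0.95, 0.90, 0.88, 0.85, 0.81, 0.78, 0.70, 0.60, 0.50]

# Wavelength indices of the 10 photons that reach each pixel location.
photons = {(1, 1): [0, 9, 9, 8, 7, 8, 1, 0, 1, 1], (1, 2): [1, 3, 3, 4, 4, 5, 2, 6, 4, 5],
           (1, 3): [6, 7, 7, 0, 5, 6, 6, 1, 5, 9], (2, 1): [0, 1, 0, 2, 1, 1, 4, 3, 3, 1],
           (2, 2): [3, 3, 4, 3, 4, 4, 5, 2, 9, 4], (2, 3): [7, 7, 6, 7, 6, 1, 5, 9, 8, 7],
           (3, 1): [6, 6, 1, 8, 7, 8, 9, 9, 8, 7], (3, 2): [0, 4, 3, 4, 1, 5, 4, 0, 2, 1],
           (3, 3): [3, 4, 1, 0, 0, 4, 2, 5, 2, 4]}

def recorded(band, location):
    # Each photon contributes its energy weighted by the sensitivity of the
    # sensor at the photon's wavelength, as in equations (1.2)-(1.4).
    return sum(energy[i] * sens[band][i] for i in photons[location])

E_R = np.array([[recorded('R', (i, j)) for j in (1, 2, 3)] for i in (1, 2, 3)])
print(np.round(E_R, 3))   # compare with matrix ER in equation (1.5)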
Figure 1.4: Keeping the number of grey levels constant and decreasing the number of pixels
with which we digitise the same field of view (256 × 256, 128 × 128, 64 × 64 and 32 × 32
pixels) produces the checkerboard effect.

Why are images often quoted as being 512 × 512, 256 × 256, 128 × 128 etc?
Many calculations with images are simplied when the size of the image is a power of 2. We
shall see some examples in Chapter 2.
How many bits do we need to store an image?
The number of bits, b, we need to store an image of size N × N, with 2^m grey levels, is:

b = N × N × m                                                              (1.6)

So, for a typical 512 × 512 image with 256 grey levels (m = 8) we need 2,097,152 bits or
262,144 8-bit bytes. That is why we often try to reduce m and N, without significant loss of
image quality.
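This calculation is trivial to script. A minimal Python sketch (ours; the function name is our own) follows.

def bits_needed(N, m):
    # Equation (1.6): b = N * N * m bits for an N x N image with 2**m grey levels.
    return N * N * m

print(bits_needed(512, 8))        # 2097152 bits
print(bits_needed(512, 8) // 8)   # 262144 bytes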
What determines the quality of an image?
The quality of an image is a complicated concept, largely subjective and very much application
dependent. Basically, an image is of good quality if it is not noisy and
(1) it is not blurred;
(2) it has high resolution;
(3) it has good contrast.
What makes an image blurred?
Image blurring is caused by incorrect image capturing conditions. For example, out of focus
camera, or relative motion of the camera and the imaged object. The amount of image
blurring is expressed by the so called point spread function of the imaging system.
What is meant by image resolution?
The resolution of an image expresses how much detail we can see in it and clearly depends
on the number of pixels we use to represent a scene (parameter N in equation (1.6)) and the
number of grey levels used to quantise the brightness values (parameter m in equation (1.6)).
Keeping m constant and decreasing N results in the checkerboard effect (figure 1.4).
Keeping N constant and reducing m results in false contouring (figure 1.5). Experiments
have shown that the more detailed a picture is, the less it improves by keeping N constant
and increasing m. So, for a detailed picture, like a picture of crowds (figure 1.6), the number
of grey levels we use does not matter much.
Example 1.2
Assume that the range of values recorded by the sensors of example 1.1 is
from 0 to 10. From the values of the three bands captured by the three
sets of sensors, create digital bands with 3 bits each (m = 3).
For 3-bit images, the pixels take values in the range [0, 2³ − 1], ie in the range
[0, 7]. Therefore, we have to divide the expected range of values into 8 equal intervals:
10/8 = 1.25. So, we use the following conversion table:
All pixels with recorded value in the range [0.0, 1.25) get grey value 0
All pixels with recorded value in the range [1.25, 2.5) get grey value 1
All pixels with recorded value in the range [2.5, 3.75) get grey value 2
All pixels with recorded value in the range [3.75, 5.0) get grey value 3
All pixels with recorded value in the range [5.0, 6.25) get grey value 4
All pixels with recorded value in the range [6.25, 7.5) get grey value 5
All pixels with recorded value in the range [7.5, 8.75) get grey value 6
All pixels with recorded value in the range [8.75, 10.0] get grey value 7
This mapping leads to the following bands of the recorded image.
R = [ 2 2 2      G = [ 1 3 3      B = [ 1 4 0
      0 3 3            1 3 3            3 4 0
      3 1 1 ]          2 2 2 ]          0 3 4 ]                            (1.7)
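The quantisation of example 1.2 may be reproduced with a few lines of Python (our own sketch, assuming, as in the example, that the sensors record values in the range [0, 10]).

import numpy as np

# Recorded energies of band R, from equation (1.5).
E_R = np.array([[2.645, 2.670, 3.729],
                [1.167, 4.053, 4.576],
                [4.551, 1.716, 1.801]])

# Quantise to m = 3 bits: the bin width is 10/8 = 1.25 and grey values run from 0 to 7.
R = np.clip(np.floor(E_R / 1.25), 0, 7).astype(int)
print(R)    # [[2 2 2] [0 3 3] [3 1 1]], as in equation (1.7)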
Figure 1.5: Keeping the size of the image constant (249 × 199 pixels) and reducing the number
of grey levels (= 2^m, for m = 8, 7, 6, 5, 4, 3, 2, 1) produces false contouring. To display the
images, we always map the different grey values to the range [0, 255].
Figure 1.6: Keeping the number of pixels constant and reducing the number of grey levels
(256, 128, 64, 32, 16, 8, 4 and 2 grey levels, ie m = 8 down to m = 1) does not affect much
the appearance of an image that contains a lot of details.
What does good contrast mean?
Good contrast means that the grey values present in the image range from black to white,
making use of the full range of brightness to which the human vision system is sensitive.
Example 1.3
Consider each band created in example 1.2 as a separate grey image. Do
these images have good contrast? If not, propose some way by which bands
with good contrast could be created from the recorded sensor values.
The images created in example 1.2 do not have good contrast because none of them
contains the value 7 (which corresponds to the maximum brightness and which would
be displayed as white by an image displaying device).
The reason for this is the way the quantisation was performed: the look up table created
to convert the real recorded values to digital values took into consideration the full range
of possible values a sensor may record (ie from 0 to 10). To utilise the full range of
grey values for each image, we should have considered the minimum and the maximum
value of its pixels, and map that range to the 8 distinct grey levels. For example, for
the image that corresponds to band R, the values are in the range [1.167, 4.576]. If we
divide this range into 8 equal sub-ranges, we shall have:
All pixels with recorded value in the range [1.167, 1.593125) get grey value 0
All pixels with recorded value in the range [1.593125, 2.01925) get grey value 1
All pixels with recorded value in the range [2.01925, 2.445375) get grey value 2
All pixels with recorded value in the range [2.445375, 2.8715) get grey value 3
All pixels with recorded value in the range [2.8715, 3.297625) get grey value 4
All pixels with recorded value in the range [3.297625, 3.72375) get grey value 5
All pixels with recorded value in the range [3.72375, 4.149875) get grey value 6
All pixels with recorded value in the range [4.149875, 4.576] get grey value 7
We must create one such look up table for each band. The grey images we create this
way are:
R = [ 3 3 6      G = [ 0 7 7      B = [ 1 6 0
      0 6 7            1 6 6            6 7 0
      7 1 1 ]          3 2 2 ]          0 6 6 ]                            (1.8)
Example 1.4
Repeat example 1.3, now treating the three bands as parts of the same
colour image.
If we treat all three bands as a single colour image (as they are meant to be), we must
find the minimum and maximum value over all three recorded bands, and create a
look up table appropriate for all bands. In this case, the range of the recorded values
is [0.536, 5.902]. The look up table we create by mapping this range to the range [0, 7] is
All pixels with recorded value in the range [0.536, 1.20675) get grey value 0
All pixels with recorded value in the range [1.20675, 1.8775) get grey value 1
All pixels with recorded value in the range [1.8775, 2.54825) get grey value 2
All pixels with recorded value in the range [2.54825, 3.219) get grey value 3
All pixels with recorded value in the range [3.219, 3.88975) get grey value 4
All pixels with recorded value in the range [3.88975, 4.5605) get grey value 5
All pixels with recorded value in the range [4.5605, 5.23125) get grey value 6
All pixels with recorded value in the range [5.23125, 5.902] get grey value 7
The three image bands we create this way are:
R = [ 3 3 4      G = [ 1 6 5      B = [ 1 6 0
      1 5 6            2 5 5            6 7 0
      5 1 1 ]          3 2 3 ]          0 6 6 ]                            (1.9)
Note that each of these bands, if displayed as a separate grey image, may not display the
full range of grey values, but it will have grey values that are consistent with those of
the other two bands, and so they can be directly compared. For example, by looking
at the three digital bands we can say that pixel (1, 1) has the same brightness (within
the limits of the digitisation error) in bands G and B. Such a statement was not possible
in example 1.3 because the three digital bands were not calibrated.
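The two ways of building the look up tables may be compared with the following Python sketch (ours; the function and variable names are our own, and values that fall exactly on a bin boundary may come out one grey level different from the tables above).

import numpy as np

E_R = np.array([[2.645, 2.670, 3.729], [1.167, 4.053, 4.576], [4.551, 1.716, 1.801]])
E_G = np.array([[1.350, 4.938, 4.522], [2.244, 4.176, 4.108], [2.818, 2.532, 2.612]])
E_B = np.array([[1.540, 5.047, 1.138], [4.995, 5.902, 0.698], [0.536, 4.707, 5.047]])

def quantise(band, lo, hi, levels=8):
    # Map the range [lo, hi] onto the grey values 0 .. levels-1 in equal sub-ranges.
    q = np.floor((band - lo) / (hi - lo) * levels)
    return np.clip(q, 0, levels - 1).astype(int)

# Example 1.3: each band is stretched with its own minimum and maximum.
R_own = quantise(E_R, E_R.min(), E_R.max())

# Example 1.4: all bands share the same minimum and maximum, so the grey
# values of the three bands remain directly comparable (calibrated).
lo = min(E_R.min(), E_G.min(), E_B.min())
hi = max(E_R.max(), E_G.max(), E_B.max())
R_cal, G_cal, B_cal = (quantise(E, lo, hi) for E in (E_R, E_G, E_B))
print(R_own)                            # compare with the R band of equation (1.8)
print(R_cal, G_cal, B_cal, sep='\n')    # compare with equation (1.9)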
What is the purpose of image processing?
Image processing has multiple purposes.
• To improve the quality of an image in a subjective way, usually by increasing its contrast.
This is called image enhancement.
• To use as few bits as possible to represent the image, with minimum deterioration in
its quality. This is called image compression.
• To improve an image in an objective way, for example by reducing its blurring. This is
called image restoration.
• To make explicit certain characteristics of the image which can be used to identify the
contents of the image. This is called feature extraction.
How do we do image processing?
We perform image processing by using image transformations. Image transformations are
performed using operators. An operator takes as input an image and produces another
image. In this book we shall put emphasis on a particular class of operators, called linear
operators.
Do we use nonlinear operators in image processing?
Yes. We shall see several examples of them in this book. However, nonlinear operators cannot
be collectively characterised. They are usually problem- and application-specic, and they
are studied as individual processes used for specic tasks. On the contrary, linear operators
can be studied collectively, because they share important common characteristics, irrespective
of the task they are expected to perform.
What is a linear operator?
Consider O to be an operator which takes images into images. If f is an image, O(f) is the
result of applying O to f. O is linear if
O[af + bg] = aO[f] + bO[g] (1.10)
for all images f and g and all scalars a and b.
How are linear operators defined?
Linear operators are defined in terms of their point spread functions. The point spread
function of an operator is what we get out if we apply the operator on a point source:

O[point source] ≡ point spread function                                   (1.11)

Or

O[δ(α − x, β − y)] ≡ h(x, α, y, β)                                         (1.12)

where δ(α − x, β − y) is a point source of brightness 1 centred at point (x, y).
What is the relationship between the point spread function of an imaging device
and that of a linear operator?
They both express the effect of either the imaging device or the operator on a point source. In
the real world a star is the nearest to a point source. Assume that we capture the image of a
star by using a camera. The star will appear in the image like a blob: the camera received the
light of the point source and spread it into a blob. The bigger the blob, the more blurred the
image of the star will look. So, the point spread function of the camera measures the amount
of blurring present in the images captured by this camera. The camera, therefore, acts like
a linear operator which accepts as input the ideal brightness function of the continuous real
world and produces the recorded digital image. That is why we use the term point spread
function to characterise both cameras and linear operators.
How does a linear operator transform an image?
If the operator is linear, when the point source is a times brighter, the result will be a times
higher:
O[aδ(α − x, β − y)] = ah(x, α, y, β)                                       (1.13)
An image is a collection of point sources (the pixels) each with its own brightness value. For
example, assuming that an image f is 3 × 3 in size, we may write:

f = [ f(1, 1) 0 0     [ 0 f(1, 2) 0           [ 0 0    0
      0       0 0  +    0 0       0  + ... +    0 0    0
      0       0 0 ]     0 0       0 ]            0 0 f(3, 3) ]             (1.14)

We may say that an image is the sum of these point sources. Then the effect of an operator
characterised by point spread function h(x, α, y, β) on an image f(x, y) can be written as:

g(α, β) = Σ_{x=1}^{N} Σ_{y=1}^{N} f(x, y) h(x, α, y, β)                    (1.15)

where g(α, β) is the output image, f(x, y) is the input image and the size of the image is
N × N. Here we treat f(x, y) as the brightness of a point source located at position (x, y).
Applying an operator on it produces the point spread function of the operator times the
strength of the source, ie times the grey value f(x, y) at that location. Then, as the operator
is linear, we sum over all such point sources, ie we sum over all pixels.
What is the meaning of the point spread function?
The point spread function h(x, α, y, β) expresses how much the input value at position (x, y)
influences the output value at position (α, β). If the influence expressed by the point spread
function is independent of the actual positions but depends only on the relative position of
the influencing and the influenced pixels, we have a shift invariant point spread function:

h(x, α, y, β) = h(α − x, β − y)                                            (1.16)

Then equation (1.15) is a convolution:

g(α, β) = Σ_{x=1}^{N} Σ_{y=1}^{N} f(x, y) h(α − x, β − y)                  (1.17)

If the columns are influenced independently from the rows of the image, then the point
spread function is separable:

h(x, α, y, β) ≡ hc(x, α) hr(y, β)                                          (1.18)

The above expression serves also as the definition of functions hc(x, α) and hr(y, β). Then
equation (1.15) may be written as a cascade of two 1D transformations:

g(α, β) = Σ_{x=1}^{N} hc(x, α) Σ_{y=1}^{N} f(x, y) hr(y, β)                (1.19)

If the point spread function is both shift invariant and separable, then equation (1.15)
may be written as a cascade of two 1D convolutions:

g(α, β) = Σ_{x=1}^{N} hc(α − x) Σ_{y=1}^{N} f(x, y) hr(β − y)              (1.20)
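Equation (1.15) can be implemented directly as a quadruple sum. The following Python sketch (our own illustration; it uses 0-based indices in place of the book's 1-based ones, and the names are ours) shows one way to do it.

import numpy as np

def apply_linear_operator(f, h):
    # Direct implementation of equation (1.15):
    #     g(a, b) = sum_x sum_y f(x, y) h(x, a, y, b)
    # f is an N x N image; h is an N x N x N x N array indexed as h[x, a, y, b],
    # where a and b stand for the output coordinates alpha and beta.
    N = f.shape[0]
    g = np.zeros((N, N))
    for a in range(N):
        for b in range(N):
            g[a, b] = sum(f[x, y] * h[x, a, y, b] for x in range(N) for y in range(N))
    return g

# Sanity check: the identity operator (h = 1 only when x = a and y = b) leaves f unchanged.
N = 3
h = np.zeros((N, N, N, N))
for i in range(N):
    for j in range(N):
        h[i, i, j, j] = 1.0
f = np.arange(9.0).reshape(3, 3)
print(np.allclose(apply_linear_operator(f, h), f))   # True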
Box 1.1. The formal definition of a point source in the continuous domain

Let us define an extended source of constant brightness

δn(x, y) ≡ n² rect(nx, ny)                                                 (1.21)

where n is a positive constant and

rect(nx, ny) ≡ 1 inside a rectangle |nx| ≤ 1/2, |ny| ≤ 1/2, and 0 elsewhere (1.22)

The total brightness of this source is given by

∫∫_{−∞}^{+∞} δn(x, y) dx dy = n² ∫∫_{−∞}^{+∞} rect(nx, ny) dx dy = 1        (1.23)

(the last double integral is the area of the rectangle, 1/n²) and is independent of n.
As n → +∞, we create a sequence, δn, of extended square sources which gradually
shrink with their brightness remaining constant. At the limit, δn becomes Dirac's delta
function

δ(x, y) ≠ 0 for x = y = 0,   δ(x, y) = 0 elsewhere                          (1.24)

with the property:

∫∫_{−∞}^{+∞} δ(x, y) dx dy = 1                                              (1.25)

The integral

∫∫_{−∞}^{+∞} δn(x, y) g(x, y) dx dy                                         (1.26)

is the average of image g(x, y) over a square with sides 1/n centred at (0, 0). At the limit,
we have

∫∫_{−∞}^{+∞} δ(x, y) g(x, y) dx dy = g(0, 0)                                (1.27)

The integral

∫∫_{−∞}^{+∞} g(x, y) δn(x − a, y − b) dx dy                                 (1.28)
is the average value of g over a square (1/n) × (1/n) centred at x = a, y = b, since:

δn(x − a, y − b) = n² rect[n(x − a), n(y − b)]
                 = n² for |n(x − a)| ≤ 1/2 and |n(y − b)| ≤ 1/2, and 0 elsewhere   (1.29)

We can see that this is a square source centred at (a, b) by considering that |n(x − a)| ≤ 1/2
means −1/2 ≤ n(x − a) ≤ 1/2, ie −1/(2n) ≤ x − a ≤ 1/(2n), or a − 1/(2n) ≤ x ≤ a + 1/(2n).
Thus, we have δn(x − a, y − b) = n² in the region a − 1/(2n) ≤ x ≤ a + 1/(2n),
b − 1/(2n) ≤ y ≤ b + 1/(2n).
At the limit of n → +∞, integral (1.28) is the value of the image g at x = a, y = b, ie:

∫∫_{−∞}^{+∞} g(x, y) δn(x − a, y − b) dx dy = g(a, b)                       (1.30)

This equation is called the shifting property of the delta function. This equation also
shows that any image g(a, b) can be expressed as a superposition of point sources.
Example 1.5
The following 3 × 3 image

f = [ 0 2 6
      1 4 7
      3 5 7 ]                                                              (1.31)

is processed by a linear operator O which has a point spread function
h(x, α, y, β) defined as:
h(1, 1, 1, 1) = 1.0 h(1, 1, 1, 2) = 0.5 h(1, 1, 1, 3) = 0.0 h(1, 2, 1, 1) = 0.5
h(1, 2, 1, 2) = 0.0 h(1, 2, 1, 3) = 0.4 h(1, 3, 1, 1) = 0.5 h(1, 3, 1, 2) = 1.0
h(1, 3, 1, 3) = 0.6 h(2, 1, 1, 1) = 0.8 h(2, 1, 1, 2) = 0.7 h(2, 1, 1, 3) = 0.4
h(2, 2, 1, 1) = 0.6 h(2, 2, 1, 2) = 0.5 h(2, 2, 1, 3) = 0.4 h(2, 3, 1, 1) = 0.4
h(2, 3, 1, 2) = 0.8 h(2, 3, 1, 3) = 1.0 h(3, 1, 1, 1) = 0.9 h(3, 1, 1, 2) = 0.5
h(3, 1, 1, 3) = 0.5 h(3, 2, 1, 1) = 0.6 h(3, 2, 1, 2) = 0.5 h(3, 2, 1, 3) = 0.3
h(3, 3, 1, 1) = 0.5 h(3, 3, 1, 2) = 0.9 h(3, 3, 1, 3) = 1.0 h(1, 1, 2, 1) = 1.0
h(1, 1, 2, 2) = 0.6 h(1, 1, 2, 3) = 0.2 h(1, 2, 2, 1) = 0.0 h(1, 2, 2, 2) = 0.2
h(1, 2, 2, 3) = 0.4 h(1, 3, 2, 1) = 0.4 h(1, 3, 2, 2) = 1.0 h(1, 3, 2, 3) = 0.6
h(2, 1, 2, 1) = 0.8 h(2, 1, 2, 2) = 0.7 h(2, 1, 2, 3) = 0.6 h(2, 2, 2, 1) = 0.6
h(2, 2, 2, 2) = 0.5 h(2, 2, 2, 3) = 0.5 h(2, 3, 2, 1) = 0.5 h(2, 3, 2, 2) = 1.0
h(2, 3, 2, 3) = 1.0 h(3, 1, 2, 1) = 0.7 h(3, 1, 2, 2) = 0.5 h(3, 1, 2, 3) = 0.5
h(3, 2, 2, 1) = 0.6 h(3, 2, 2, 2) = 0.5 h(3, 2, 2, 3) = 0.5 h(3, 3, 2, 1) = 0.5
h(3, 3, 2, 2) = 1.0 h(3, 3, 2, 3) = 1.0 h(1, 1, 3, 1) = 1.0 h(1, 1, 3, 2) = 0.6
h(1, 1, 3, 3) = 1.0 h(1, 2, 3, 1) = 0.5 h(1, 2, 3, 2) = 0.1 h(1, 2, 3, 3) = 0.6
h(1, 3, 3, 1) = 0.5 h(1, 3, 3, 2) = 1.0 h(1, 3, 3, 3) = 0.6 h(2, 1, 3, 1) = 0.5
h(2, 1, 3, 2) = 0.7 h(2, 1, 3, 3) = 0.5 h(2, 2, 3, 1) = 0.6 h(2, 2, 3, 2) = 0.5
h(2, 2, 3, 3) = 0.5 h(2, 3, 3, 1) = 0.4 h(2, 3, 3, 2) = 0.9 h(2, 3, 3, 3) = 1.0
16 Image Processing: The Fundamentals
h(3, 1, 3, 1) = 0.8 h(3, 1, 3, 2) = 0.5 h(3, 1, 3, 3) = 0.5 h(3, 2, 3, 1) = 0.5
h(3, 2, 3, 2) = 0.5 h(3, 2, 3, 3) = 0.8 h(3, 3, 3, 1) = 0.4 h(3, 3, 3, 2) = 0.4
h(3, 3, 3, 3) = 1.0
Work out the output image.
The point spread function of the operator h(x, α, y, β) gives the weight with which the
pixel value at input position (x, y) contributes to output pixel position (α, β). Let us
call the output image g(α, β). We show next how to calculate g(1, 1). For g(1, 1), we
need to use the values of h(x, 1, y, 1) to weigh the values of pixels (x, y) of the input
image:

g(1, 1) = Σ_{x=1}^{3} Σ_{y=1}^{3} f(x, y) h(x, 1, y, 1)
        = f(1, 1)h(1, 1, 1, 1) + f(1, 2)h(1, 1, 2, 1) + f(1, 3)h(1, 1, 3, 1)
        + f(2, 1)h(2, 1, 1, 1) + f(2, 2)h(2, 1, 2, 1) + f(2, 3)h(2, 1, 3, 1)
        + f(3, 1)h(3, 1, 1, 1) + f(3, 2)h(3, 1, 2, 1) + f(3, 3)h(3, 1, 3, 1)
        = 2 × 1.0 + 6 × 1.0 + 1 × 0.8 + 4 × 0.8 + 7 × 0.5 + 3 × 0.9
        + 5 × 0.7 + 7 × 0.8 = 27.3                                          (1.32)

For g(1, 2), we need to use the values of h(x, 1, y, 2):

g(1, 2) = Σ_{x=1}^{3} Σ_{y=1}^{3} f(x, y) h(x, 1, y, 2) = 20.1              (1.33)

The other values are computed in a similar way. Finally, the output image is:

g = [ 27.3 20.1 18.9
      19.4 16.0 18.4
      16.0 29.3 33.4 ]                                                      (1.34)
Example 1.6
Is the operator of example 1.5 shift invariant?
No, it is not. For example, pixel (2, 2) influences the value of pixel (1, 2) with weight
h(2, 1, 2, 2) = 0.7. These two pixels are at distance 2 − 1 = 1 along the x axis and at
distance 2 − 2 = 0 along the y axis. At the same relative distance are also pixels (3, 3)
and (2, 3). The value of h(3, 2, 3, 3), however, is 0.8 ≠ 0.7.
Example 1.7
The point spread function of an operator that operates on images of size
3 × 3 is
h(1, 1, 1, 1) = 1.0 h(1, 1, 1, 2) = 0.5 h(1, 1, 1, 3) = 0.0 h(1, 2, 1, 1) = 0.5
h(1, 2, 1, 2) = 0.0 h(1, 2, 1, 3) = 0.4 h(1, 3, 1, 1) = 0.5 h(1, 3, 1, 2) = 1.0
h(1, 3, 1, 3) = 0.6 h(2, 1, 1, 1) = 0.8 h(2, 1, 1, 2) = 0.7 h(2, 1, 1, 3) = 0.4
h(2, 2, 1, 1) = 1.0 h(2, 2, 1, 2) = 0.5 h(2, 2, 1, 3) = 0.0 h(2, 3, 1, 1) = 0.5
h(2, 3, 1, 2) = 0.0 h(2, 3, 1, 3) = 0.4 h(3, 1, 1, 1) = 0.9 h(3, 1, 1, 2) = 0.5
h(3, 1, 1, 3) = 0.5 h(3, 2, 1, 1) = 0.8 h(3, 2, 1, 2) = 0.7 h(3, 2, 1, 3) = 0.4
h(3, 3, 1, 1) = 1.0 h(3, 3, 1, 2) = 0.5 h(3, 3, 1, 3) = 0.0 h(1, 1, 2, 1) = 1.0
h(1, 1, 2, 2) = 1.0 h(1, 1, 2, 3) = 0.5 h(1, 2, 2, 1) = 0.0 h(1, 2, 2, 2) = 0.5
h(1, 2, 2, 3) = 0.0 h(1, 3, 2, 1) = 0.4 h(1, 3, 2, 2) = 0.5 h(1, 3, 2, 3) = 1.0
h(2, 1, 2, 1) = 0.8 h(2, 1, 2, 2) = 0.8 h(2, 1, 2, 3) = 0.7 h(2, 2, 2, 1) = 1.0
h(2, 2, 2, 2) = 1.0 h(2, 2, 2, 3) = 0.5 h(2, 3, 2, 1) = 0.0 h(2, 3, 2, 2) = 0.5
h(2, 3, 2, 3) = 0.0 h(3, 1, 2, 1) = 0.7 h(3, 1, 2, 2) = 0.9 h(3, 1, 2, 3) = 0.5
h(3, 2, 2, 1) = 0.8 h(3, 2, 2, 2) = 0.8 h(3, 2, 2, 3) = 0.7 h(3, 3, 2, 1) = 1.0
h(3, 3, 2, 2) = 1.0 h(3, 3, 2, 3) = 0.5 h(1, 1, 3, 1) = 1.0 h(1, 1, 3, 2) = 1.0
h(1, 1, 3, 3) = 1.0 h(1, 2, 3, 1) = 0.5 h(1, 2, 3, 2) = 0.0 h(1, 2, 3, 3) = 0.5
h(1, 3, 3, 1) = 0.5 h(1, 3, 3, 2) = 0.4 h(1, 3, 3, 3) = 0.5 h(2, 1, 3, 1) = 0.5
h(2, 1, 3, 2) = 0.8 h(2, 1, 3, 3) = 0.8 h(2, 2, 3, 1) = 1.0 h(2, 2, 3, 2) = 1.0
h(2, 2, 3, 3) = 1.0 h(2, 3, 3, 1) = 0.5 h(2, 3, 3, 2) = 0.0 h(2, 3, 3, 3) = 0.5
h(3, 1, 3, 1) = 0.8 h(3, 1, 3, 2) = 0.7 h(3, 1, 3, 3) = 0.9 h(3, 2, 3, 1) = 0.5
h(3, 2, 3, 2) = 0.8 h(3, 2, 3, 3) = 0.8 h(3, 3, 3, 1) = 1.0 h(3, 3, 3, 2) = 1.0
h(3, 3, 3, 3) = 1.0
Is it shift variant or shift invariant?
In order to show that the function is shift variant, it is enough to show that for at
least two pairs of pixels, that correspond to input-output pixels in the same relative
position, the values of the function are different. This is what we did in example
1.6. If we cannot find such an example, we must then check for shift invariance.
The function must have the same value for all pairs of input-output pixels that are in
the same relative position in order to be shift invariant. As the range of values each
of the arguments of this function takes is [1, 3], the relative coordinates of pairs of
input-output pixels take values in the range [−2, 2]. We observe the following.
For x − α = −2 and y − β = −2 we have: h(1, 3, 1, 3) = 0.6
For x − α = −2 and y − β = −1 we have: h(1, 3, 1, 2) = h(1, 3, 2, 3) = 1.0
For x − α = −2 and y − β = 0 we have: h(1, 3, 1, 1) = h(1, 3, 2, 2) = h(1, 3, 3, 3) = 0.5
For x − α = −2 and y − β = 1 we have: h(1, 3, 2, 1) = h(1, 3, 3, 2) = 0.4
For x − α = −2 and y − β = 2 we have: h(1, 3, 3, 1) = 0.5
For x − α = −1 and y − β = −2 we have: h(1, 2, 1, 3) = h(2, 3, 1, 3) = 0.4
For x − α = −1 and y − β = −1 we have:
h(1, 2, 1, 2) = h(2, 3, 2, 3) = h(1, 2, 2, 3) = h(2, 3, 1, 2) = 0.0
For x − α = −1 and y − β = 0 we have:
h(1, 2, 1, 1) = h(1, 2, 2, 2) = h(1, 2, 3, 3) = h(2, 3, 1, 1) = h(2, 3, 2, 2) = h(2, 3, 3, 3) = 0.5
For x − α = −1 and y − β = 1 we have:
h(1, 2, 2, 1) = h(1, 2, 3, 2) = h(2, 3, 2, 1) = h(2, 3, 3, 2) = 0.0
For x − α = −1 and y − β = 2 we have: h(1, 2, 3, 1) = 0.5
For x − α = 0 and y − β = −2 we have: h(1, 1, 1, 3) = h(2, 2, 1, 3) = h(3, 3, 1, 3) = 0.0
For x − α = 0 and y − β = −1 we have:
h(1, 1, 1, 2) = h(2, 2, 1, 2) = h(3, 3, 1, 2) = h(1, 1, 2, 3) = h(2, 2, 2, 3) = h(3, 3, 2, 3) = 0.5
For x − α = 0 and y − β = 0 we have:
h(1, 1, 1, 1) = h(2, 2, 1, 1) = h(3, 3, 1, 1) = h(1, 1, 2, 2) = h(2, 2, 2, 2) =
h(3, 3, 2, 2) = h(1, 1, 3, 3) = h(2, 2, 3, 3) = h(3, 3, 3, 3) = 1.0
For x − α = 0 and y − β = 1 we have:
h(1, 1, 2, 1) = h(2, 2, 2, 1) = h(3, 3, 2, 1) = h(1, 1, 3, 2) = h(2, 2, 3, 2) = h(3, 3, 3, 2) = 1.0
For x − α = 0 and y − β = 2 we have: h(1, 1, 3, 1) = h(2, 2, 3, 1) = h(3, 3, 3, 1) = 1.0
For x − α = 1 and y − β = −2 we have: h(2, 1, 1, 3) = h(3, 2, 1, 3) = 0.4
For x − α = 1 and y − β = −1 we have:
h(2, 1, 1, 2) = h(2, 1, 2, 3) = h(3, 2, 1, 2) = h(3, 2, 2, 3) = 0.7
For x − α = 1 and y − β = 0 we have:
h(2, 1, 1, 1) = h(2, 1, 2, 2) = h(2, 1, 3, 3) = h(3, 2, 1, 1) = h(3, 2, 2, 2) = h(3, 2, 3, 3) = 0.8
For x − α = 1 and y − β = 1 we have:
h(2, 1, 2, 1) = h(2, 1, 3, 2) = h(3, 2, 2, 1) = h(3, 2, 3, 2) = 0.8
For x − α = 1 and y − β = 2 we have: h(2, 1, 3, 1) = h(3, 2, 3, 1) = 0.5
For x − α = 2 and y − β = −2 we have: h(3, 1, 1, 3) = 0.5
For x − α = 2 and y − β = −1 we have: h(3, 1, 1, 2) = h(3, 1, 2, 3) = 0.5
For x − α = 2 and y − β = 0 we have: h(3, 1, 1, 1) = h(3, 1, 2, 2) = h(3, 1, 3, 3) = 0.9
For x − α = 2 and y − β = 1 we have: h(3, 1, 2, 1) = h(3, 1, 3, 2) = 0.7
For x − α = 2 and y − β = 2 we have: h(3, 1, 3, 1) = 0.8
So, this is a shift invariant point spread function.
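The exhaustive check performed in this example is easy to automate. The Python sketch below (ours; the function name is our own) applies the same test to any point spread function supplied as a function of (x, α, y, β), here written as (x, a, y, b).

def is_shift_invariant(h, N=3):
    # h(x, a, y, b) is shift invariant if its value depends only on the
    # relative position (x - a, y - b) of the input and output pixels.
    seen = {}
    for x in range(1, N + 1):
        for a in range(1, N + 1):
            for y in range(1, N + 1):
                for b in range(1, N + 1):
                    key = (x - a, y - b)
                    if key in seen and seen[key] != h(x, a, y, b):
                        return False
                    seen[key] = h(x, a, y, b)
    return True

# A function that depends only on the relative position passes the test:
print(is_shift_invariant(lambda x, a, y, b: 0.5 ** (abs(x - a) + abs(y - b))))   # True
# One that depends on the absolute position does not:
print(is_shift_invariant(lambda x, a, y, b: x + a + y + b))                      # False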
How can we express in practice the effect of a linear operator on an image?
This is done with the help of matrices. We can rewrite equation (1.15) as follows:

g(α, β) = f(1, 1)h(1, α, 1, β) + f(2, 1)h(2, α, 1, β) + . . . + f(N, 1)h(N, α, 1, β)
        + f(1, 2)h(1, α, 2, β) + f(2, 2)h(2, α, 2, β) + . . . + f(N, 2)h(N, α, 2, β)
        + . . . + f(1, N)h(1, α, N, β) + f(2, N)h(2, α, N, β) + . . .
        + f(N, N)h(N, α, N, β)                                              (1.35)

The right-hand side of this expression can be thought of as the dot product of vector

h^T ≡ [h(1, α, 1, β), h(2, α, 1, β), . . . , h(N, α, 1, β), h(1, α, 2, β), h(2, α, 2, β), . . . ,
       h(N, α, 2, β), . . . , h(1, α, N, β), h(2, α, N, β), . . . , h(N, α, N, β)]          (1.36)

with vector:

f^T ≡ [f(1, 1), f(2, 1), . . . , f(N, 1), f(1, 2), f(2, 2), . . . , f(N, 2),
       . . . , f(1, N), f(2, N), . . . , f(N, N)]                            (1.37)

This last vector is actually the image f(x, y) written as a vector by stacking its columns one
under the other. If we imagine writing g(α, β) in the same way, then vectors h^T will arrange
themselves as the rows of a matrix H, where for β = 1, α will run from 1 to N to give the
first N rows of the matrix, then for β = 2, α will run again from 1 to N to give the second
N rows of the matrix, and so on. Thus, equation (1.15) may be written in a more compact
way as:

g = Hf                                                                      (1.38)

This is the fundamental equation of linear image processing. H here is a square N² × N²
matrix that is made up of N × N submatrices of size N × N each, arranged in the following
way:
H = [ [ y = 1, β = 1 ]   [ y = 2, β = 1 ]   . . .   [ y = N, β = 1 ]
      [ y = 1, β = 2 ]   [ y = 2, β = 2 ]   . . .   [ y = N, β = 2 ]
            .                  .                          .
      [ y = 1, β = N ]   [ y = 2, β = N ]   . . .   [ y = N, β = N ] ]       (1.39)

In this representation each bracketed expression represents an N × N submatrix made up
from function h(x, α, y, β) for fixed values of y and β and with variables x and α taking up
all their possible values: x runs from 1 to N along the columns of each submatrix and α runs
from 1 to N down its rows. This schematic structure of matrix H is said to correspond to a
partition of this matrix into N² square submatrices.
Example 1.8
A linear operator is such that it replaces the value of each pixel by the
average of its four nearest neighbours. Apply this operator to a 3 × 3 image
g. Assume that the image is repeated ad infinitum in all directions, so that
all its pixels have neighbours. Work out the 9 × 9 matrix H that corresponds
to this operator by using equation (1.39).

If the image is repeated in all directions, the image and the neighbours of its border
pixels will look like this:

g33 | g31 g32 g33 | g31
g13 | g11 g12 g13 | g11
g23 | g21 g22 g23 | g21
g33 | g31 g32 g33 | g31
g13 | g11 g12 g13 | g11                                                      (1.40)
The result then of replacing every pixel by the average of its four nearest neighbours
is:
(1/4) [ g31+g12+g21+g13   g32+g13+g22+g11   g33+g11+g23+g12
        g11+g22+g31+g23   g12+g23+g32+g21   g13+g21+g33+g22
        g21+g32+g11+g33   g22+g33+g12+g31   g23+g31+g13+g32 ]                (1.41)

In order to construct matrix H we must deduce from the above result the weight by
which a pixel at position (x, y) of the input image contributes to the value of a pixel at
position (α, β) of the output image. Such value will be denoted by h(x, α, y, β). Matrix
H will be made up from these values arranged as follows:

H = [ h(1,1,1,1) h(2,1,1,1) h(3,1,1,1) h(1,1,2,1) h(2,1,2,1) h(3,1,2,1) h(1,1,3,1) h(2,1,3,1) h(3,1,3,1)
      h(1,2,1,1) h(2,2,1,1) h(3,2,1,1) h(1,2,2,1) h(2,2,2,1) h(3,2,2,1) h(1,2,3,1) h(2,2,3,1) h(3,2,3,1)
      h(1,3,1,1) h(2,3,1,1) h(3,3,1,1) h(1,3,2,1) h(2,3,2,1) h(3,3,2,1) h(1,3,3,1) h(2,3,3,1) h(3,3,3,1)
      h(1,1,1,2) h(2,1,1,2) h(3,1,1,2) h(1,1,2,2) h(2,1,2,2) h(3,1,2,2) h(1,1,3,2) h(2,1,3,2) h(3,1,3,2)
      h(1,2,1,2) h(2,2,1,2) h(3,2,1,2) h(1,2,2,2) h(2,2,2,2) h(3,2,2,2) h(1,2,3,2) h(2,2,3,2) h(3,2,3,2)
      h(1,3,1,2) h(2,3,1,2) h(3,3,1,2) h(1,3,2,2) h(2,3,2,2) h(3,3,2,2) h(1,3,3,2) h(2,3,3,2) h(3,3,3,2)
      h(1,1,1,3) h(2,1,1,3) h(3,1,1,3) h(1,1,2,3) h(2,1,2,3) h(3,1,2,3) h(1,1,3,3) h(2,1,3,3) h(3,1,3,3)
      h(1,2,1,3) h(2,2,1,3) h(3,2,1,3) h(1,2,2,3) h(2,2,2,3) h(3,2,2,3) h(1,2,3,3) h(2,2,3,3) h(3,2,3,3)
      h(1,3,1,3) h(2,3,1,3) h(3,3,1,3) h(1,3,2,3) h(2,3,2,3) h(3,3,2,3) h(1,3,3,3) h(2,3,3,3) h(3,3,3,3) ]   (1.42)
By inspection we deduce that:
H = [  0  1/4 1/4 1/4  0   0  1/4  0   0
      1/4  0  1/4  0  1/4  0   0  1/4  0
      1/4 1/4  0   0   0  1/4  0   0  1/4
      1/4  0   0   0  1/4 1/4 1/4  0   0
       0  1/4  0  1/4  0  1/4  0  1/4  0
       0   0  1/4 1/4 1/4  0   0   0  1/4
      1/4  0   0  1/4  0   0   0  1/4 1/4
       0  1/4  0   0  1/4  0  1/4  0  1/4
       0   0  1/4  0   0  1/4 1/4 1/4  0 ]                                   (1.43)
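The construction of matrix H from a local operator can be mechanised. The following Python sketch (our own; the helper name build_H is ours) assembles the N² × N² matrix of any local operator with wrap-around boundaries, with images turned into vectors by stacking their columns, as in the book.

import numpy as np

def build_H(N, weights):
    # 'weights' maps a relative displacement (drow, dcol) to the weight with
    # which the input pixel at that displacement contributes to the output pixel.
    # The image is assumed to wrap around (periodic boundary).
    H = np.zeros((N * N, N * N))
    for r in range(N):              # output pixel (r, c), 0-based
        for c in range(N):
            row = c * N + r         # columns of the image are stacked
            for (dr, dc), w in weights.items():
                rr, cc = (r + dr) % N, (c + dc) % N
                H[row, cc * N + rr] += w
    return H

# Average of the four nearest neighbours, as in example 1.8:
H_avg = build_H(3, {(-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25})
print(H_avg)    # compare with equation (1.43)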
Example 1.9
The eect of a linear operator is to subtract from every pixel its right
neighbour. This operator is applied to image (1.40). Work out the output
image and the H matrix that corresponds to this operator.
The result of this operator will be:

g11 − g12   g12 − g13   g13 − g11
g21 − g22   g22 − g23   g23 − g21
g31 − g32   g32 − g33   g33 − g31                                            (1.44)

By following the procedure we followed in example 1.8, we can work out matrix H,
which here, for convenience, we call H̃:

H̃ = [  1  0  0 −1  0  0  0  0  0
        0  1  0  0 −1  0  0  0  0
        0  0  1  0  0 −1  0  0  0
        0  0  0  1  0  0 −1  0  0
        0  0  0  0  1  0  0 −1  0
        0  0  0  0  0  1  0  0 −1
       −1  0  0  0  0  0  1  0  0
        0 −1  0  0  0  0  0  1  0
        0  0 −1  0  0  0  0  0  1 ]                                          (1.45)
Example 1.10
The eect of a linear operator is to subtract from every pixel its bottom
right neighbour. This operator is applied to image (1.40). Work out the
output image and the H matrix that corresponds to this operator.
The result of this operator will be:

g11 − g22   g12 − g23   g13 − g21
g21 − g32   g22 − g33   g23 − g31
g31 − g12   g32 − g13   g33 − g11                                            (1.46)

By following the procedure we followed in example 1.8, we can work out matrix H,
which here, for convenience, we call H̃̃:

H̃̃ = [  1  0  0  0 −1  0  0  0  0
         0  1  0  0  0 −1  0  0  0
         0  0  1 −1  0  0  0  0  0
         0  0  0  1  0  0  0 −1  0
         0  0  0  0  1  0  0  0 −1
         0  0  0  0  0  1 −1  0  0
         0 −1  0  0  0  0  1  0  0
         0  0 −1  0  0  0  0  1  0
        −1  0  0  0  0  0  0  0  1 ]                                         (1.47)
Can we apply more than one linear operator to an image?
We can apply as many operators as we like.
Does the order by which we apply the linear operators make any difference to
the result?
No, if the operators are shift invariant. This is a very important and convenient property of
the commonly used linear operators.
Box 1.2. Since matrix multiplication is not commutative, how come we can
change the order by which we apply shift invariant linear operators?
In general, if A and B are two matrices, AB ≠ BA. However, matrix H, which expresses
the effect of a linear operator, has a particular structure. We can see that from equation
(1.39): the N² × N² matrix H can be divided into N² submatrices of size N × N each.
There are at most N distinct such submatrices and they have a particular structure:
the second column of their elements can be produced from the first column, by shifting
all elements by one position down. The element that sticks out at the bottom is put
at the empty position at the top. The third column is produced by applying the same
procedure to the second column and so on (see example 1.11). A matrix that has this
property is called a circulant matrix. At the same time, these submatrices are also
arranged in a circulant way: the second column of them is produced from the first
column by shifting all of them by one position down. The submatrix that sticks out at
the bottom is put at the empty position at the top. The third column of matrices is
created from the second by applying the same procedure and so on (see example 1.12).
A matrix that has this property is called block circulant. It is this particular structure
that allows one to exchange the order by which two operators with matrices H and H̃,
respectively, are applied to an image g written as a vector:

H̃ H g = H H̃ g                                                               (1.48)
Example B1.11
Write down the form you expect two 3 × 3 circulant matrices to have and
show that their product commutes.

A 3 × 3 circulant matrix will contain at most 3 distinct elements. Let us call them α, β
and γ for matrix A, and α̃, β̃ and γ̃ for matrix Ã. Since these matrices are circulant,
they must have the following structure:

A = [ α  γ  β        Ã = [ α̃  γ̃  β̃
      β  α  γ              β̃  α̃  γ̃
      γ  β  α ]            γ̃  β̃  α̃ ]                                          (1.49)

We can work out the product AÃ:

AÃ = [ αα̃ + γβ̃ + βγ̃    αγ̃ + γα̃ + ββ̃    αβ̃ + γγ̃ + βα̃
       βα̃ + αβ̃ + γγ̃    βγ̃ + αα̃ + γβ̃    ββ̃ + αγ̃ + γα̃
       γα̃ + ββ̃ + αγ̃    γγ̃ + βα̃ + αβ̃    γβ̃ + βγ̃ + αα̃ ]                       (1.50)

We get the same result by working out ÃA:

ÃA = [ α̃α + γ̃β + β̃γ    α̃γ + γ̃α + β̃β    α̃β + γ̃γ + β̃α
       β̃α + α̃β + γ̃γ    β̃γ + α̃α + γ̃β    β̃β + α̃γ + γ̃α
       γ̃α + β̃β + α̃γ    γ̃γ + β̃α + α̃β    γ̃β + β̃γ + α̃α ]                       (1.51)

We can see that AÃ = ÃA, ie that these two matrices commute.
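The result of this example is easy to confirm numerically. A small Python sketch follows (ours; the function name is our own).

import numpy as np

def circulant3(first_column):
    # Build a 3 x 3 circulant matrix: each column is the previous one shifted
    # down by one position, the element that sticks out at the bottom
    # re-entering at the top (as described in Box 1.2).
    a, b, c = first_column
    return np.array([[a, c, b],
                     [b, a, c],
                     [c, b, a]], dtype=float)

A  = circulant3((1.0, 2.0, 3.0))
At = circulant3((0.5, -1.0, 4.0))
print(np.allclose(A @ At, At @ A))    # True: circulant matrices commute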
Example B1.12
Identify the 3 × 3 submatrices from which the H matrices of examples 1.8,
1.9 and 1.10 are made.

We can easily identify that matrix H of example 1.8 is made up from submatrices:

H11 ≡ [  0  1/4 1/4        H21 = H31 ≡ [ 1/4  0   0
        1/4  0  1/4                       0  1/4  0
        1/4 1/4  0  ]                     0   0  1/4 ]                        (1.52)

Matrix H̃ of example 1.9 is made up from submatrices:

H̃11 ≡ [ 1 0 0      H̃21 ≡ [ 0 0 0      H̃31 ≡ [ −1  0  0
         0 1 0              0 0 0               0 −1  0
         0 0 1 ]            0 0 0 ]             0  0 −1 ]                     (1.53)

Matrix H̃̃ of example 1.10 is made up from submatrices:

H̃̃11 ≡ [ 1 0 0      H̃̃21 ≡ [ 0 0 0      H̃̃31 ≡ [  0 −1  0
          0 1 0               0 0 0                0  0 −1
          0 0 1 ]             0 0 0 ]             −1  0  0 ]                  (1.54)

Thus, matrices H, H̃ and H̃̃ of examples 1.8, 1.9 and 1.10, respectively, may be written
as:

H = [ H11 H31 H21      H̃ = [ H̃11 H̃31 H̃21      H̃̃ = [ H̃̃11 H̃̃31 H̃̃21
      H21 H11 H31            H̃21 H̃11 H̃31            H̃̃21 H̃̃11 H̃̃31
      H31 H21 H11 ]          H̃31 H̃21 H̃11 ]          H̃̃31 H̃̃21 H̃̃11 ]        (1.55)

Each one of the submatrices is circulant and they are arranged in a circulant manner,
so matrices H, H̃ and H̃̃ are block circulant.
Example 1.13
Apply the operator of example 1.9 to the output image (1.41) of example
1.8 by working directly on the output image. Then apply the operator of
example 1.8 to the output image (1.44) of example 1.9. Compare the two
answers.
The operator of example 1.9 subtracts from every pixel the value of its right neighbour.
We remember that the image is assumed to be wrapped round so that all pixels have
neighbours in all directions. We perform this operation on image (1.41) and obtain:
(1/4) [ g31+g12+g21−g32−g22−g11   g32+g13+g22−g33−g23−g12   g33+g11+g23−g31−g21−g13
        g11+g22+g31−g12−g32−g21   g12+g23+g32−g13−g33−g22   g13+g21+g33−g11−g31−g23
        g21+g32+g11−g22−g12−g31   g22+g33+g12−g23−g13−g32   g23+g31+g13−g21−g11−g33 ]   (1.56)
The operator of example 1.8 replaces every pixel with the average of its four neighbours.
We apply it to image (1.44) and obtain:
(1/4) [ g31−g32+g12−g13+g21−g22+g13−g11   g32−g33+g13−g11+g22−g23+g11−g12   g33−g31+g11−g12+g23−g21+g12−g13
        g11−g12+g22−g23+g31−g32+g23−g21   g12−g13+g23−g21+g32−g33+g21−g22   g13−g11+g21−g22+g33−g31+g22−g23
        g21−g22+g32−g33+g11−g12+g33−g31   g22−g23+g33−g31+g12−g13+g31−g32   g23−g21+g31−g32+g13−g11+g32−g33 ]   (1.57)
By comparing outputs (1.56) and (1.57) we see that we get the same answer whichever
is the order by which we apply the operators.
Example 1.14
Use matrix multiplication to derive the matrix with which an image (writ-
ten in vector form) has to be multiplied from the left so that the operators
of examples 1.8 and 1.9 are applied in a cascaded way. Does the answer
depend on the order by which the operators are applied?
If we apply first the operator of example 1.9 and then the operator of example 1.8, we
must compute the product H H̃ of matrices H and H̃ given by equations (1.43) and
(1.45), respectively:

H H̃ = [ −1/4  1/4  1/4  1/4 −1/4 −1/4   0    0    0
          1/4 −1/4  1/4 −1/4  1/4 −1/4   0    0    0
          1/4  1/4 −1/4 −1/4 −1/4  1/4   0    0    0
           0    0    0  −1/4  1/4  1/4  1/4 −1/4 −1/4
           0    0    0   1/4 −1/4  1/4 −1/4  1/4 −1/4
           0    0    0   1/4  1/4 −1/4 −1/4 −1/4  1/4
          1/4 −1/4 −1/4   0    0    0  −1/4  1/4  1/4
         −1/4  1/4 −1/4   0    0    0   1/4 −1/4  1/4
         −1/4 −1/4  1/4   0    0    0   1/4  1/4 −1/4 ]                       (1.58)
If we apply first the operator of example 1.8 and then the operator of example 1.9, we
must compute the product H̃ H of matrices H̃ and H given by equations (1.45) and
(1.43), respectively:

H̃ H = [ −1/4  1/4  1/4  1/4 −1/4 −1/4   0    0    0
          1/4 −1/4  1/4 −1/4  1/4 −1/4   0    0    0
          1/4  1/4 −1/4 −1/4 −1/4  1/4   0    0    0
           0    0    0  −1/4  1/4  1/4  1/4 −1/4 −1/4
           0    0    0   1/4 −1/4  1/4 −1/4  1/4 −1/4
           0    0    0   1/4  1/4 −1/4 −1/4 −1/4  1/4
          1/4 −1/4 −1/4   0    0    0  −1/4  1/4  1/4
         −1/4  1/4 −1/4   0    0    0   1/4 −1/4  1/4
         −1/4 −1/4  1/4   0    0    0   1/4  1/4 −1/4 ]                       (1.59)
By comparing results (1.58) and (1.59) we see that the order by which we multiply the
matrices, and by extension the order by which we apply the operators, does not matter.
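The commutativity demonstrated in this example may also be checked numerically. The Python sketch below (our own; all names are ours) builds the two matrices by applying each operator to the nine possible point-source images and verifies that the two products agree.

import numpy as np

def operator_matrix(op, N=3):
    # Build the N^2 x N^2 matrix of a linear operator by applying it to every
    # point-source image and stacking the results as columns (columns of the
    # image are stacked to form vectors, as in the book).
    H = np.zeros((N * N, N * N))
    for k in range(N * N):
        e = np.zeros((N, N))
        e[k % N, k // N] = 1.0          # point source at (row, col) = (k mod N, k div N)
        H[:, k] = op(e).flatten(order='F')
    return H

def average4(f):
    # Average of the four nearest neighbours, with wrap-around (example 1.8).
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

def subtract_right(f):
    # Subtract from every pixel its right neighbour, with wrap-around (example 1.9).
    return f - np.roll(f, -1, 1)

H  = operator_matrix(average4)
Ht = operator_matrix(subtract_right)
print(np.allclose(H @ Ht, Ht @ H))      # True: these shift invariant operators commute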
Example 1.15
Process image (1.40) with matrix (1.59) and compare your answer with the
output images produced in example 1.13.
To process image (1.40) by matrix (1.59), we must write it in vector form, by stacking
its columns one under the other:
H̃ H [ g11 g21 g31 g12 g22 g32 g13 g23 g33 ]^T =

(1/4) [ −g11 + g21 + g31 + g12 − g22 − g32
         g11 − g21 + g31 − g12 + g22 − g32
         g11 + g21 − g31 − g12 − g22 + g32
        −g12 + g22 + g32 + g13 − g23 − g33
         g12 − g22 + g32 − g13 + g23 − g33
         g12 + g22 − g32 − g13 − g23 + g33
         g11 − g21 − g31 − g13 + g23 + g33
        −g11 + g21 − g31 + g13 − g23 + g33
        −g11 − g21 + g31 + g13 + g23 − g33 ]                                  (1.60)

To create an image out of the output vector (1.60), we have to use its first three
elements as the first column of the image, the next three elements as the second column
of the image, and so on. The image we obtain is:

(1/4) [ −g11+g21+g31+g12−g22−g32   −g12+g22+g32+g13−g23−g33    g11−g21−g31−g13+g23+g33
         g11−g21+g31−g12+g22−g32    g12−g22+g32−g13+g23−g33   −g11+g21−g31+g13−g23+g33
         g11+g21−g31−g12−g22+g32    g12+g22−g32−g13−g23+g33   −g11−g21+g31+g13+g23−g33 ]   (1.61)

By comparing (1.61) and (1.56) we see that we obtain the same output image whether we
apply the operators locally or we operate on the whole image by using the corresponding
matrix.
Example 1.16
By examining matrices H, H̃ and H̃̃ given by equations (1.43), (1.45) and
(1.47), respectively, deduce the point spread function of the corresponding
operators.

As these operators are shift invariant, by definition, in order to work out their point
spread functions starting from their corresponding H matrix, we have to reason about
the structure of the matrix as exemplified by equation (1.39). The point spread function
will be the same for all pixels of the input image, so we might as well pick one of them;
say we pick the pixel in the middle of the input image, with coordinates x = 2 and
y = 2. Then, we have to read the values of the elements of the matrix that correspond
to all possible combinations of α and β, which indicate the coordinates of the output
image. According to equation (1.39), β takes all its possible values along a column
of submatrices, while α takes all its possible values along a column of any one of the
submatrices. Fixing the value of x = 2 means that we shall have to read only the
middle column of the submatrix we use. Fixing the value of y = 2 means that we
shall have to read only the middle column of submatrices. The middle column then of
matrix H, when wrapped to form a 3 × 3 image, will be the point spread function of the
operator, ie it will represent the output image we shall get if the operator is applied to
an input image that consists of only 0s except in the central pixel where it has value 1
(a single point source). We have to write the first three elements of the central column
of the H matrix as the first column of the output image, the next three elements as
the second column, and the last three elements as the last column of the output image.
For the three operators that correspond to matrices H, H̃ and H̃̃, we obtain:

h = [  0  1/4  0       h̃ = [  0 0 0       h̃̃ = [ −1 0 0
      1/4  0  1/4             −1 1 0              0 1 0
       0  1/4  0 ]             0 0 0 ]            0 0 0 ]                     (1.62)
Example 1.17
What will be the effect of the operator that corresponds to matrix (1.59)
on a 3 × 3 input image that depicts a point source in the middle?

The output image will be the point spread function of the operator. Following the
reasoning of example 1.16, we deduce that the point spread function of this composite
operator is:

[ −1/4   1/4  0
   1/4  −1/4  0
  −1/4   1/4  0 ]                                                             (1.63)
Example 1.18
What will be the effect of the operator that corresponds to matrix (1.59)
on a 3 × 3 input image that depicts a point source in position (1, 2)?

In this case we want to work out the output for x = 1 and y = 2. We select from
matrix (1.59) the second column of submatrices, and the first column of them, and wrap
it to form a 3 × 3 image. We obtain:

[  1/4  −1/4  0
  −1/4   1/4  0
  −1/4   1/4  0 ]                                                             (1.64)

Alternatively, we multiply from the left the input image

g^T = [ 0 0 0 1 0 0 0 0 0 ]                                                   (1.65)
with matrix H H̃ given by equation (1.58) and write the output vector as an image.
We get exactly the same answer.
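Reading the point spread function off a column of H, as done in examples 1.16-1.18, can be scripted as follows (a Python sketch of ours; the names are our own, and for the demonstration the matrix H of example 1.8 is rebuilt inline).

import numpy as np

def psf_from_H(H, x, y, N=3):
    # Response to a point source at pixel (x, y) (1-based): take the column of H
    # that corresponds to that input pixel and refold it into an N x N image,
    # filling it column by column, as in examples 1.16-1.18.
    return H[:, (y - 1) * N + (x - 1)].reshape(N, N, order='F')

# Rebuild the matrix H of example 1.8 (average of the four neighbours, eq. (1.43)).
H = np.zeros((9, 9))
for k in range(9):
    e = np.zeros((3, 3)); e[k % 3, k // 3] = 1.0
    out = (np.roll(e, 1, 0) + np.roll(e, -1, 0) + np.roll(e, 1, 1) + np.roll(e, -1, 1)) / 4.0
    H[:, k] = out.flatten(order='F')
print(psf_from_H(H, 2, 2))   # the cross-shaped point spread function h of (1.62)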
Box 1.3. What is the stacking operator?
The stacking operator allows us to write an N × N image array as an N^2 × 1 vector, or an N^2 × 1 vector as an N × N square array.
We define some vectors V_n and some matrices N_n as:

V_n \equiv \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\quad\text{(rows 1 to } n-1 \text{ are 0, row } n \text{ is 1, rows } n+1 \text{ to } N \text{ are 0)}   (1.66)

N_n \equiv \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}
\quad\text{(} n-1 \text{ square } N \times N \text{ matrices on top of each other with all their elements 0, then the } N \times N \text{ unit matrix as the } n\text{th block, then } N-n \text{ square } N \times N \text{ matrices with all their elements 0)}   (1.67)

The dimensions of V_n are N × 1 and those of N_n are N^2 × N. Then vector f, which corresponds to the N × N square matrix f, is given by:

f = \sum_{n=1}^{N} N_n f V_n   (1.68)

It can be shown that, if f is an N^2 × 1 vector, we can write it as an N × N matrix f, the first column of which is made up from the first N elements of f, the second column from the second N elements of f, and so on, by using the following expression:

f = \sum_{n=1}^{N} N_n^T f V_n^T   (1.69)
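As a quick illustration of equations (1.66)-(1.69), here is a small NumPy sketch; the variable names are ours, and building N_n with a Kronecker product is just one convenient way to obtain the stacked blocks of zeros and the unit matrix.

```python
import numpy as np

N = 3
f = np.arange(1, N * N + 1).reshape(N, N)          # any N x N test image

V = [np.eye(N)[:, [n]] for n in range(N)]           # V_n: N x 1 unit vectors (eq 1.66)
Nmat = [np.kron(np.eye(N)[:, [n]], np.eye(N))       # N_n: N^2 x N matrices (eq 1.67)
        for n in range(N)]

# Stacking (eq 1.68): image -> N^2 x 1 vector, column by column
fvec = sum(Nmat[n] @ f @ V[n] for n in range(N))
assert np.allclose(fvec.ravel(), f.flatten(order='F'))

# Unstacking (eq 1.69): vector -> image
frec = sum(Nmat[n].T @ fvec @ V[n].T for n in range(N))
assert np.allclose(frec, f)
```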
Example B1.19
You are given a 3 × 3 image f and you are asked to use the stacking operator to write it in vector form.

Let us say that:

f = \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix}   (1.70)
We define vectors V_n and matrices N_n for n = 1, 2, 3:

V_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad V_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad V_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}   (1.71)

N_1 = \begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{pmatrix}, \quad
N_2 = \begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 1&0&0 \\ 0&1&0 \\ 0&0&1 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \end{pmatrix}, \quad
N_3 = \begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}   (1.72)

According to equation (1.68):

f = N_1 f V_1 + N_2 f V_2 + N_3 f V_3   (1.73)
We shall calculate each term separately:

N_1 f V_1 = N_1 \begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
= N_1 \begin{pmatrix} f_{11} \\ f_{21} \\ f_{31} \end{pmatrix}
= ( f_{11}\;\; f_{21}\;\; f_{31}\;\; 0\;\; 0\;\; 0\;\; 0\;\; 0\;\; 0 )^T   (1.74)

Similarly:

N_2 f V_2 = ( 0\;\; 0\;\; 0\;\; f_{12}\;\; f_{22}\;\; f_{32}\;\; 0\;\; 0\;\; 0 )^T, \qquad
N_3 f V_3 = ( 0\;\; 0\;\; 0\;\; 0\;\; 0\;\; 0\;\; f_{13}\;\; f_{23}\;\; f_{33} )^T   (1.75)
Then by substituting into (1.73), we obtain vector f .
Example B1.20
You are given a 9 × 1 vector f. Use the stacking operator to write it as a 3 × 3 matrix.

Let us say that:

f = ( f_{11}\;\; f_{21}\;\; f_{31}\;\; f_{12}\;\; f_{22}\;\; f_{32}\;\; f_{13}\;\; f_{23}\;\; f_{33} )^T   (1.76)
According to equation (1.69):

f = N_1^T f V_1^T + N_2^T f V_2^T + N_3^T f V_3^T   (1.77)

(where N_1, N_2, N_3, V_1, V_2 and V_3 are defined in Box 1.3).
We shall calculate each term separately:

N_1^T f V_1^T =
\begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \end{pmatrix}
\begin{pmatrix} f_{11} \\ f_{21} \\ f_{31} \\ f_{12} \\ f_{22} \\ f_{32} \\ f_{13} \\ f_{23} \\ f_{33} \end{pmatrix}
( 1\;\; 0\;\; 0 )
=
\begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \end{pmatrix}
\begin{pmatrix} f_{11}&0&0 \\ f_{21}&0&0 \\ f_{31}&0&0 \\ f_{12}&0&0 \\ f_{22}&0&0 \\ f_{32}&0&0 \\ f_{13}&0&0 \\ f_{23}&0&0 \\ f_{33}&0&0 \end{pmatrix}
=
\begin{pmatrix} f_{11} & 0 & 0 \\ f_{21} & 0 & 0 \\ f_{31} & 0 & 0 \end{pmatrix}   (1.78)

Similarly:

N_2^T f V_2^T = \begin{pmatrix} 0 & f_{12} & 0 \\ 0 & f_{22} & 0 \\ 0 & f_{32} & 0 \end{pmatrix}, \qquad
N_3^T f V_3^T = \begin{pmatrix} 0 & 0 & f_{13} \\ 0 & 0 & f_{23} \\ 0 & 0 & f_{33} \end{pmatrix}   (1.79)
Then by substituting into (1.77), we obtain matrix f.
Example B1.21
Show that the stacking operator is linear.
To show that an operator is linear we must show that we get the same answer whether we apply it to two images w and g and sum up the results with weights \alpha and \beta, respectively, or we apply it to the weighted sum \alpha w + \beta g directly. We start by applying the stacking operator to the composite image \alpha w + \beta g, following equation (1.68):

\sum_{n=1}^{N} N_n (\alpha w + \beta g) V_n
= \sum_{n=1}^{N} N_n (\alpha w V_n + \beta g V_n)
= \sum_{n=1}^{N} (\alpha N_n w V_n + \beta N_n g V_n)
= \sum_{n=1}^{N} \alpha N_n w V_n + \sum_{n=1}^{N} \beta N_n g V_n   (1.80)

Since \alpha and \beta do not depend on the summing index n, they may come out of the sums:

\sum_{n=1}^{N} N_n (\alpha w + \beta g) V_n = \alpha \sum_{n=1}^{N} N_n w V_n + \beta \sum_{n=1}^{N} N_n g V_n   (1.81)

Then, we define vector w to be the vector version of image w, given by \sum_{n=1}^{N} N_n w V_n, and vector g to be the vector version of image g, given by \sum_{n=1}^{N} N_n g V_n, to obtain:

\sum_{n=1}^{N} N_n (\alpha w + \beta g) V_n = \alpha w + \beta g   (1.82)

This proves that the stacking operator is linear, because if we apply it separately to images w and g we shall get vectors w and g, respectively, which, when added with weights \alpha and \beta, will produce the result we obtained above by applying the operator to the composite image \alpha w + \beta g directly.
Example B1.22
Consider a 9 × 9 matrix H that is partitioned into nine 3 × 3 submatrices. Show that, if we multiply it from the left with matrix N_2^T, defined in Box 1.3, we shall extract the second row of its submatrices.
We apply definition (1.67) for N = 3 and n = 2 to define matrix N_2, and we write explicitly all the elements of matrix H before we perform the multiplication:

N_2^T H =
\begin{pmatrix} 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \end{pmatrix}
\begin{pmatrix}
h_{11}&h_{12}&h_{13}&h_{14}&h_{15}&h_{16}&h_{17}&h_{18}&h_{19} \\
h_{21}&h_{22}&h_{23}&h_{24}&h_{25}&h_{26}&h_{27}&h_{28}&h_{29} \\
h_{31}&h_{32}&h_{33}&h_{34}&h_{35}&h_{36}&h_{37}&h_{38}&h_{39} \\
h_{41}&h_{42}&h_{43}&h_{44}&h_{45}&h_{46}&h_{47}&h_{48}&h_{49} \\
h_{51}&h_{52}&h_{53}&h_{54}&h_{55}&h_{56}&h_{57}&h_{58}&h_{59} \\
h_{61}&h_{62}&h_{63}&h_{64}&h_{65}&h_{66}&h_{67}&h_{68}&h_{69} \\
h_{71}&h_{72}&h_{73}&h_{74}&h_{75}&h_{76}&h_{77}&h_{78}&h_{79} \\
h_{81}&h_{82}&h_{83}&h_{84}&h_{85}&h_{86}&h_{87}&h_{88}&h_{89} \\
h_{91}&h_{92}&h_{93}&h_{94}&h_{95}&h_{96}&h_{97}&h_{98}&h_{99}
\end{pmatrix}
=
\begin{pmatrix}
h_{41}&h_{42}&h_{43}&h_{44}&h_{45}&h_{46}&h_{47}&h_{48}&h_{49} \\
h_{51}&h_{52}&h_{53}&h_{54}&h_{55}&h_{56}&h_{57}&h_{58}&h_{59} \\
h_{61}&h_{62}&h_{63}&h_{64}&h_{65}&h_{66}&h_{67}&h_{68}&h_{69}
\end{pmatrix}   (1.83)

We note that the result is a matrix made up from the middle row of partitions of the original matrix H.
Example B1.23
Consider a 9 × 9 matrix H that is partitioned into nine 3 × 3 submatrices. Show that, if we multiply it from the right with matrix N_3, defined in Box 1.3, we shall extract the third column of its submatrices.
We apply definition (1.67) for N = 3 and n = 3 to define matrix N_3, and we write explicitly all the elements of matrix H before we perform the multiplication:

H N_3 =
\begin{pmatrix}
h_{11}&h_{12}&h_{13}&h_{14}&h_{15}&h_{16}&h_{17}&h_{18}&h_{19} \\
h_{21}&h_{22}&h_{23}&h_{24}&h_{25}&h_{26}&h_{27}&h_{28}&h_{29} \\
h_{31}&h_{32}&h_{33}&h_{34}&h_{35}&h_{36}&h_{37}&h_{38}&h_{39} \\
h_{41}&h_{42}&h_{43}&h_{44}&h_{45}&h_{46}&h_{47}&h_{48}&h_{49} \\
h_{51}&h_{52}&h_{53}&h_{54}&h_{55}&h_{56}&h_{57}&h_{58}&h_{59} \\
h_{61}&h_{62}&h_{63}&h_{64}&h_{65}&h_{66}&h_{67}&h_{68}&h_{69} \\
h_{71}&h_{72}&h_{73}&h_{74}&h_{75}&h_{76}&h_{77}&h_{78}&h_{79} \\
h_{81}&h_{82}&h_{83}&h_{84}&h_{85}&h_{86}&h_{87}&h_{88}&h_{89} \\
h_{91}&h_{92}&h_{93}&h_{94}&h_{95}&h_{96}&h_{97}&h_{98}&h_{99}
\end{pmatrix}
\begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}
=
\begin{pmatrix}
h_{17}&h_{18}&h_{19} \\ h_{27}&h_{28}&h_{29} \\ h_{37}&h_{38}&h_{39} \\
h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \\
h_{77}&h_{78}&h_{79} \\ h_{87}&h_{88}&h_{89} \\ h_{97}&h_{98}&h_{99}
\end{pmatrix}   (1.84)

We observe that the result is a matrix made up from the last column of the partitions of the original matrix H.
Example B1.24
Multiply the 3 × 9 matrix produced in example 1.22 with the 9 × 3 matrix produced in example 1.23 and show that the resultant 3 × 3 matrix is the sum of the individual multiplications of the corresponding partitions.
\begin{pmatrix}
h_{41}&h_{42}&h_{43}&h_{44}&h_{45}&h_{46}&h_{47}&h_{48}&h_{49} \\
h_{51}&h_{52}&h_{53}&h_{54}&h_{55}&h_{56}&h_{57}&h_{58}&h_{59} \\
h_{61}&h_{62}&h_{63}&h_{64}&h_{65}&h_{66}&h_{67}&h_{68}&h_{69}
\end{pmatrix}
\begin{pmatrix}
h_{17}&h_{18}&h_{19} \\ h_{27}&h_{28}&h_{29} \\ h_{37}&h_{38}&h_{39} \\
h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \\
h_{77}&h_{78}&h_{79} \\ h_{87}&h_{88}&h_{89} \\ h_{97}&h_{98}&h_{99}
\end{pmatrix} =

Carrying out the multiplication, each element of the resulting 3 × 3 matrix is a sum of nine products; the (1,1) element, for example, is
h_{41}h_{17}+h_{42}h_{27}+h_{43}h_{37}+h_{44}h_{47}+h_{45}h_{57}+h_{46}h_{67}+h_{47}h_{77}+h_{48}h_{87}+h_{49}h_{97}.
Grouping, in every element, the terms that come from corresponding partitions, the product may be written as

\begin{pmatrix} h_{41}&h_{42}&h_{43} \\ h_{51}&h_{52}&h_{53} \\ h_{61}&h_{62}&h_{63} \end{pmatrix}
\begin{pmatrix} h_{17}&h_{18}&h_{19} \\ h_{27}&h_{28}&h_{29} \\ h_{37}&h_{38}&h_{39} \end{pmatrix}
+
\begin{pmatrix} h_{44}&h_{45}&h_{46} \\ h_{54}&h_{55}&h_{56} \\ h_{64}&h_{65}&h_{66} \end{pmatrix}
\begin{pmatrix} h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \end{pmatrix}
+
\begin{pmatrix} h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \end{pmatrix}
\begin{pmatrix} h_{77}&h_{78}&h_{79} \\ h_{87}&h_{88}&h_{89} \\ h_{97}&h_{98}&h_{99} \end{pmatrix}

ie the result is the sum of the products of the corresponding partitions of the two matrices.
Example B1.25
Use the stacking operator to show that the order of two linear operators
can be interchanged as long as the multiplication of two circulant matrices
is commutative.
Consider an operator with matrix H applied to vector f, constructed from an image f by using equation (1.68):

\tilde{f} \equiv H f = H \sum_{n=1}^{N} N_n f V_n   (1.85)

The result is vector \tilde{f}, to which we can apply the matrix \tilde{H} that corresponds to another linear operator:

\hat{f} \equiv \tilde{H} \tilde{f} = \tilde{H} H \sum_{n=1}^{N} N_n f V_n   (1.86)

From this output vector \hat{f} we can recover the output image \hat{f} by applying equation (1.69):

\hat{f} = \sum_{m=1}^{N} N_m^T \hat{f} V_m^T   (1.87)

We may replace \hat{f} from (1.86):

\hat{f} = \sum_{m=1}^{N} N_m^T \tilde{H} H \sum_{n=1}^{N} N_n f V_n V_m^T
= \sum_{m=1}^{N} \sum_{n=1}^{N} (N_m^T \tilde{H})(H N_n) f (V_n V_m^T)   (1.88)

The various factors in (1.88) have been grouped together to facilitate interpretation. Factor (N_m^T \tilde{H}) extracts from matrix \tilde{H} the mth row of its partitions (see example 1.22), while factor (H N_n) extracts from matrix H the nth column of its partitions (see example 1.23). The product of these two factors extracts the (m, n) partition of matrix \tilde{H}H, which is a submatrix of size N × N. This submatrix is equal to the sum of the products of the corresponding partitions of matrices (N_m^T \tilde{H}) and (H N_n) (see example 1.24). Since these products are products of circulant matrices (see example 1.11), the order by which they are multiplied does not matter. So, we shall get the same answer whether we have (N_m^T \tilde{H})(H N_n) or (N_m^T H)(\tilde{H} N_n), ie we shall get the same answer whichever way we apply the two linear operators.
What is the implication of the separability assumption on the structure of matrix H?

According to the separability assumption, we can replace h(x, \alpha, y, \beta) with the product of two functions, h_c(x, \alpha) h_r(y, \beta). Then, inside each partition of H in equation (1.39), h_r(y, \beta) remains constant, and we may write for H:

H = \begin{pmatrix}
h_{r11} h_c^T & h_{r21} h_c^T & \cdots & h_{rN1} h_c^T \\
h_{r12} h_c^T & h_{r22} h_c^T & \cdots & h_{rN2} h_c^T \\
\vdots & & & \vdots \\
h_{r1N} h_c^T & h_{r2N} h_c^T & \cdots & h_{rNN} h_c^T
\end{pmatrix}
\quad\text{where}\quad
h_c^T \equiv \begin{pmatrix}
h_{c11} & \cdots & h_{cN1} \\
h_{c12} & \cdots & h_{cN2} \\
\vdots & & \vdots \\
h_{c1N} & \cdots & h_{cNN}
\end{pmatrix}   (1.89)

Here the arguments of the functions h_c(x, \alpha) and h_r(y, \beta) have been written as indices to save space. We say then that matrix H is the Kronecker product of matrices h_r^T and h_c^T, and we write this as:

H = h_r^T \otimes h_c^T   (1.90)
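Equation (1.90) can be verified numerically. The sketch below, with arbitrary (hypothetical) h_c, h_r and f, checks that the big matrix H = h_r^T ⊗ h_c^T acting on the column-stacked image gives the same result as h_c^T f h_r.

```python
import numpy as np

N = 3
rng = np.random.default_rng(0)
hc = rng.random((N, N))      # h_c(x, alpha)
hr = rng.random((N, N))      # h_r(y, beta)
f = rng.random((N, N))       # input image

# Separable operator applied in matrix form (eq 1.96 / 1.105)
g = hc.T @ f @ hr

# The same operator as one big matrix acting on the column-stacked image (eq 1.90)
H = np.kron(hr.T, hc.T)
assert np.allclose(H @ f.flatten(order='F'), g.flatten(order='F'))
```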
Example 1.26
Calculate the Kronecker product A \otimes B, where

A \equiv \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 1 \\ 2 & 4 & 1 \end{pmatrix} \qquad
B \equiv \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 3 \\ 2 & 1 & 0 \end{pmatrix}   (1.91)
A \otimes B = \begin{pmatrix} 1B & 2B & 3B \\ 4B & 3B & 1B \\ 2B & 4B & 1B \end{pmatrix}
= \begin{pmatrix}
2&0&1&4&0&2&6&0&3 \\
0&1&3&0&2&6&0&3&9 \\
2&1&0&4&2&0&6&3&0 \\
8&0&4&6&0&3&2&0&1 \\
0&4&12&0&3&9&0&1&3 \\
8&4&0&6&3&0&2&1&0 \\
4&0&2&8&0&4&2&0&1 \\
0&2&6&0&4&12&0&1&3 \\
4&2&0&8&4&0&2&1&0
\end{pmatrix}   (1.92)
How can a separable transform be written in matrix form?
Consider again equation (1.19), which expresses the separable linear transform of an image:

g(\alpha, \beta) = \sum_{x=1}^{N} h_c(x, \alpha) \sum_{y=1}^{N} f(x, y) h_r(y, \beta)   (1.93)

Notice that factor \sum_{y=1}^{N} f(x, y) h_r(y, \beta) actually represents the product f h_r of two N × N matrices, which must be another matrix s \equiv f h_r of the same size. Let us define an element of s as:

s(x, \beta) \equiv \sum_{y=1}^{N} f(x, y) h_r(y, \beta), \quad\text{ie}\quad s = f h_r   (1.94)

Then (1.93) may be written as:

g(\alpha, \beta) = \sum_{x=1}^{N} h_c(x, \alpha) s(x, \beta)   (1.95)
Thus, in matrix form:

g = h_c^T s = h_c^T f h_r   (1.96)
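In code, equation (1.96) is just two matrix multiplications; the following short NumPy check (with arbitrary test matrices) mirrors the two steps (1.94) and (1.95).

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((4, 4)); hr = rng.random((4, 4)); hc = rng.random((4, 4))

s = f @ hr          # eq (1.94): operate on the rows of f
g = hc.T @ s        # eq (1.95)-(1.96): then operate on the columns
assert np.allclose(g, hc.T @ f @ hr)
```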
What is the meaning of the separability assumption?
Let us assume that operator O has point spread function h(x, \alpha, y, \beta), which is separable. The separability assumption implies that operator O operates on the rows of the image matrix f independently from the way it operates on its columns. These independent operations are expressed by the two matrices h_r and h_c, respectively. That is why we chose subscripts r and c to denote these matrices (r = rows, c = columns). Matrix h_r is used to multiply the image from the right. Ordinary matrix multiplication then means that the rows of the image are multiplied with it. Matrix h_c is used to multiply the image from the left. Thus, the columns of the image are multiplied with it.
Example B1.27
Are the operators which correspond to matrices H, \tilde{H} and \hat{H}, given by equations (1.43), (1.45) and (1.47), respectively, separable?

If an operator is separable, we must be able to write its H matrix as the Kronecker product of two matrices. In other words, we must check whether from every submatrix of H we can take out a common factor, such that the submatrix that remains is the same for all partitions. We can see that this is possible for matrix \tilde{H}, but it is not possible for matrices H and \hat{H}. For example, some of the partitions of matrices H and \hat{H} are diagonal, while others are not. It is impossible, then, to express them as the product of a scalar with the same 3 × 3 matrix. On the other hand, all partitions of \tilde{H} are diagonal (or zero), so we can see that the common factors we can take out from each partition form matrix

A \equiv \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix}   (1.97)

while the common matrix that is multiplied in each partition by these coefficients is the 3 × 3 identity matrix:

I \equiv \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}   (1.98)

So, we may write:

\tilde{H} = A \otimes I   (1.99)
Box 1.4. The formal derivation of the separable matrix equation
We can use equations (1.68) and (1.69) with (1.38) as follows. First, express the output image g, using (1.69), in terms of the output vector g:

g = \sum_{m=1}^{N} N_m^T g V_m^T   (1.100)

Then express g in terms of H and f from (1.38), and replace f in terms of f using (1.68):

g = H \sum_{n=1}^{N} N_n f V_n   (1.101)

Substitute (1.101) into (1.100) and group factors with the help of brackets to obtain:

g = \sum_{m=1}^{N} \sum_{n=1}^{N} (N_m^T H N_n) f (V_n V_m^T)   (1.102)

H is an N^2 × N^2 matrix. We may think of it as partitioned into N × N submatrices stacked together. Then it can be shown that N_m^T H N_n is the H_{mn} such submatrix (see example 1.28).

Under the separability assumption, matrix H is the Kronecker product of matrices h_r and h_c:

H = h_r^T \otimes h_c^T   (1.103)

Then partition H_{mn} is essentially h_r^T(m, n) h_c^T. If we substitute this in (1.102), we obtain:

g = \sum_{m=1}^{N} \sum_{n=1}^{N} \underbrace{h_r^T(m, n)}_{\text{a scalar}}\; h_c^T f (V_n V_m^T)
\;\Rightarrow\;
g = h_c^T f \sum_{m=1}^{N} \sum_{n=1}^{N} h_r^T(m, n)\, V_n V_m^T   (1.104)

Product V_n V_m^T is the product of an N × 1 matrix, with its only nonzero element at position n, with a 1 × N matrix, with its only nonzero element at position m. So, it is an N × N square matrix with its only nonzero element at position (n, m). When multiplied with h_r^T(m, n), it places the (m, n) element of the h_r^T matrix at position (n, m) and sets all other elements to zero. The sum over all m and n is then matrix h_r. So, from (1.104) we have:

g = h_c^T f h_r   (1.105)
Example 1.28
You are given a 9 × 9 matrix H, which is partitioned into nine 3 × 3 submatrices. Show that N_2^T H N_3, where N_2 and N_3 are matrices of the stacking operator, is partition H_{23} of matrix H.

H \equiv \begin{pmatrix}
h_{11}&h_{12}&h_{13}&h_{14}&h_{15}&h_{16}&h_{17}&h_{18}&h_{19} \\
h_{21}&h_{22}&h_{23}&h_{24}&h_{25}&h_{26}&h_{27}&h_{28}&h_{29} \\
h_{31}&h_{32}&h_{33}&h_{34}&h_{35}&h_{36}&h_{37}&h_{38}&h_{39} \\
h_{41}&h_{42}&h_{43}&h_{44}&h_{45}&h_{46}&h_{47}&h_{48}&h_{49} \\
h_{51}&h_{52}&h_{53}&h_{54}&h_{55}&h_{56}&h_{57}&h_{58}&h_{59} \\
h_{61}&h_{62}&h_{63}&h_{64}&h_{65}&h_{66}&h_{67}&h_{68}&h_{69} \\
h_{71}&h_{72}&h_{73}&h_{74}&h_{75}&h_{76}&h_{77}&h_{78}&h_{79} \\
h_{81}&h_{82}&h_{83}&h_{84}&h_{85}&h_{86}&h_{87}&h_{88}&h_{89} \\
h_{91}&h_{92}&h_{93}&h_{94}&h_{95}&h_{96}&h_{97}&h_{98}&h_{99}
\end{pmatrix}   (1.106)

N_2^T H N_3 =
\begin{pmatrix} 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \end{pmatrix}
H
\begin{pmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 0&0&0 \\ 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}
=
\begin{pmatrix} 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \end{pmatrix}
\begin{pmatrix}
h_{17}&h_{18}&h_{19} \\ h_{27}&h_{28}&h_{29} \\ h_{37}&h_{38}&h_{39} \\
h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \\
h_{77}&h_{78}&h_{79} \\ h_{87}&h_{88}&h_{89} \\ h_{97}&h_{98}&h_{99}
\end{pmatrix}
=
\begin{pmatrix} h_{47}&h_{48}&h_{49} \\ h_{57}&h_{58}&h_{59} \\ h_{67}&h_{68}&h_{69} \end{pmatrix}
= H_{23}   (1.107)
What is the take home message of this chapter?
Under the assumption that the operator with which we manipulate an image is linear and separable, this operation may be expressed by an equation of the form

g = h_c^T f h_r   (1.108)

where f and g are the input and output images, respectively, and h_c and h_r are matrices expressing the point spread function of the operator.
Figure 1.7: The original image Ancestors and a compressed version of it.
What is the significance of equation (1.108) in linear image processing?

In linear image processing we are trying to solve the following four problems in relation to equation (1.108).

Given an image f, choose matrices h_c and h_r so that the output image g is better than f, according to some subjective criteria. This is the problem of image enhancement. Linear methods are not very successful here. Most of image enhancement is done with the help of nonlinear methods.

Given an image f, choose matrices h_c and h_r so that g can be represented by fewer bits than f, without much loss of detail. This is the problem of image compression. Quite a few image compression methods rely on such an approach.

Given an image g and an estimate of h(x, \alpha, y, \beta), recover image f. This is the problem of image restoration. A lot of commonly used approaches to image restoration follow this path.

Given an image f, choose matrices h_c and h_r so that the output image g salienates certain features of f. This is the problem of feature extraction. Algorithms that attempt to do that often include a linear step that can be expressed by equation (1.108), but most of the time they also include nonlinear components.

Figures 1.7 to 1.11 show examples of these processes.
Figure 1.8: The original image Birthday and its enhanced version.
What is this book about?
This book is about introducing the mathematical foundations of image processing in the context of specific applications in the four main themes of image processing as identified above. The themes of image enhancement, image restoration and feature extraction will be discussed in detail. The theme of image compression is only touched upon, as this could be the topic of a whole book on its own. This book puts emphasis on linear methods, but several nonlinear techniques relevant to image enhancement, image restoration and feature extraction will also be presented.
Figure 1.9: The blurred original image Hara and its restored version.
Figure 1.10: The original image Mitsos and its edge maps of decreasing detail (indicating
locations where the brightness of the image changes abruptly).
(a) Image Siblings (b) Thresholding and binarisation
(c) Gradient magnitude (d) Region segmentation
Figure 1.11: There are various ways to reduce the information content of an image and
salienate aspects of interest for further analysis.
Chapter 2
Image Transformations
What is this chapter about?
This chapter is concerned with the development of some of the most important tools of linear
image processing, namely the ways by which we express an image as the linear superposition
of some elementary images.
How can we define an elementary image?

There are many ways to do that. We already saw one in Chapter 1: an elementary image has all its pixels black, except one that has value 1. By shifting the position of the nonzero pixel to all possible positions, we may create N^2 different such elementary images, in terms of which we may expand any N × N image. In this chapter, we shall use more sophisticated elementary images and define an elementary image as the outer product of two vectors.
What is the outer product of two vectors?
Consider two N × 1 vectors:

u_i^T = (u_{i1}, u_{i2}, \ldots, u_{iN})
v_j^T = (v_{j1}, v_{j2}, \ldots, v_{jN})   (2.1)

Their outer product is defined as:

u_i v_j^T = \begin{pmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iN} \end{pmatrix}
( v_{j1}\; v_{j2}\; \cdots\; v_{jN} )
= \begin{pmatrix}
u_{i1}v_{j1} & u_{i1}v_{j2} & \cdots & u_{i1}v_{jN} \\
u_{i2}v_{j1} & u_{i2}v_{j2} & \cdots & u_{i2}v_{jN} \\
\vdots & & & \vdots \\
u_{iN}v_{j1} & u_{iN}v_{j2} & \cdots & u_{iN}v_{jN}
\end{pmatrix}   (2.2)

Therefore, the outer product of these two vectors is an N × N matrix, which may be thought of as an image.
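In NumPy the outer product of equation (2.2) is computed directly with np.outer; the small sketch below also confirms that the resulting matrix has rank 1, a point that will matter later for eigenimages.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

A = np.outer(u, v)                      # N x N matrix with elements u_i * v_j, as in (2.2)
print(A)
print(np.linalg.matrix_rank(A))         # 1: every row is a multiple of every other row
```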
How can we expand an image in terms of vector outer products?
We saw in the previous chapter that a general separable linear transformation of an image matrix f may be written as

g = h_c^T f h_r   (2.3)

where g is the output image and h_c and h_r are the transforming matrices.

We may use the inverse matrices of h_c^T and h_r to solve this expression for f in terms of g, as follows: multiply both sides of the equation with (h_c^T)^{-1} on the left and with h_r^{-1} on the right:

(h_c^T)^{-1} g h_r^{-1} = (h_c^T)^{-1} h_c^T f h_r h_r^{-1} = f   (2.4)

Thus we write:

f = (h_c^T)^{-1} g h_r^{-1}   (2.5)

Let us assume that we partition matrices (h_c^T)^{-1} and h_r^{-1} into their column and row vectors, respectively:

(h_c^T)^{-1} \equiv ( u_1 \,|\, u_2 \,|\, \cdots \,|\, u_N ), \qquad
h_r^{-1} \equiv \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_N^T \end{pmatrix}   (2.6)

Then:

f = ( u_1\; u_2\; \cdots\; u_N )\; g\; \begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_N^T \end{pmatrix}   (2.7)

We may also write matrix g as the sum of N^2 N × N matrices, each one having only one nonzero element:

g = \begin{pmatrix} g_{11} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}
+ \begin{pmatrix} 0 & g_{12} & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}
+ \cdots
+ \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & g_{NN} \end{pmatrix}   (2.8)

Then equation (2.7) may be written as:

f = \sum_{i=1}^{N} \sum_{j=1}^{N} g_{ij}\, u_i v_j^T   (2.9)

This is an expansion of image f in terms of vector outer products. The outer product u_i v_j^T may be interpreted as an image, so that the sum over all combinations of the outer products, appropriately weighted by the g_{ij} coefficients, represents the original image f.
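Equation (2.9) can be checked numerically. In the sketch below, h_c and h_r are arbitrary invertible matrices (an assumption made for illustration only): the columns u_i of (h_c^T)^{-1} and the rows v_j^T of h_r^{-1} are extracted, and the weighted sum of their outer products reproduces f.

```python
import numpy as np

N = 4
rng = np.random.default_rng(2)
f = rng.random((N, N))
hc = rng.random((N, N)); hr = rng.random((N, N))   # assumed invertible transforms

g = hc.T @ f @ hr                                  # forward transform, eq (2.3)
U = np.linalg.inv(hc.T)                            # columns u_i of (h_c^T)^{-1}
V = np.linalg.inv(hr).T                            # column j of V is v_j (row j of h_r^{-1})

f_rec = sum(g[i, j] * np.outer(U[:, i], V[:, j])
            for i in range(N) for j in range(N))   # eq (2.9)
assert np.allclose(f_rec, f)
```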
Example 2.1
Derive the term with i = 2 and j = 1 on the right-hand side of equation (2.9).

Let us denote by u_{i1}, u_{i2}, \ldots, u_{iN} the elements of vector u_i and by v_{i1}, v_{i2}, \ldots, v_{iN} the elements of vector v_i.

If we substitute g from equation (2.8) into equation (2.7), the right-hand side of equation (2.7) will consist of N^2 terms of similar form. One such term is:

( u_1\; u_2\; \cdots\; u_N )
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ g_{21} & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}
\begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_N^T \end{pmatrix}
= ( u_1\; u_2\; \cdots\; u_N )
\begin{pmatrix} 0 & 0 & \cdots & 0 \\ g_{21}v_{11} & g_{21}v_{12} & \cdots & g_{21}v_{1N} \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}
= g_{21} \begin{pmatrix}
u_{21}v_{11} & u_{21}v_{12} & \cdots & u_{21}v_{1N} \\
u_{22}v_{11} & u_{22}v_{12} & \cdots & u_{22}v_{1N} \\
\vdots & & & \vdots \\
u_{2N}v_{11} & u_{2N}v_{12} & \cdots & u_{2N}v_{1N}
\end{pmatrix}
= g_{21}\, u_2 v_1^T
How do we choose matrices h_c and h_r?

There are various options for the choice of matrices h_c and h_r, according to what we wish to achieve. For example, we may choose them so that the transformed image may be represented by fewer bits than the original one, or we may choose them so that truncation of the expansion of the original image smooths it by omitting its high frequency components, or optimally approximates it according to some predetermined criterion. It is often convenient to choose matrices h_c and h_r to be unitary, so that the transform is easily invertible. If matrices h_c and h_r are chosen to be unitary, equation (2.3) represents a unitary transform of f, and g is termed the unitary transform domain of image f.
What is a unitary matrix?

A matrix U is called unitary if its inverse is the complex conjugate of its transpose, ie

U U^{*T} = I   (2.10)

where I is the unit matrix. We sometimes write superscript H instead of *T and call U^{*T} \equiv U^H the Hermitian transpose or conjugate transpose of matrix U.

If the elements of the matrix are real numbers, we use the term orthogonal instead of unitary.
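A quick numerical check of equation (2.10), using the normalised discrete Fourier transform matrix as one example of a unitary matrix (any matrix with orthonormal columns would do):

```python
import numpy as np

N = 4
n = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # normalised DFT matrix

print(np.allclose(U @ U.conj().T, np.eye(N)))               # U U^H = I, eq (2.10)
```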
What is the inverse of a unitary transform?
If matrices h_c and h_r in (2.3) are unitary, then the inverse of it is:

f = h_c^* g h_r^H   (2.11)

For simplicity, from now on we shall write U instead of h_c and V instead of h_r, so that the expansion of an image f in terms of vector outer products may be written as:

f = U^* g V^H   (2.12)
How can we construct a unitary matrix?
If we consider equation (2.10), we see that for matrix U to be unitary, the requirement is that the dot product of any of its columns with the complex conjugate of any other column must be zero, while the magnitude of any of its column vectors must be 1. In other words, U is unitary if its columns form a set of orthonormal vectors.

How should we choose matrices U and V so that g can be represented by fewer bits than f?

If we want to represent image f with fewer than N^2 elements, we may choose matrices U and V so that the transformed image g is a diagonal matrix. Then we could represent image f, with the help of equation (2.9), using only the N nonzero elements of g. This can be achieved with a process called matrix diagonalisation, and the result is called the Singular Value Decomposition (SVD) of the image.
What is matrix diagonalisation?
Diagonalisation of a matrix A is the process by which we identify two matrices A_u and A_v, so that matrix A_u A A_v \equiv \Lambda is diagonal.

Can we diagonalise any matrix?

In general, no. For a start, a matrix has to be square in order to be diagonalisable. If a matrix is square and symmetric, then we can always diagonalise it.
2.1 Singular value decomposition

How can we diagonalise an image?

An image is not always square and almost never symmetric. We cannot, therefore, apply matrix diagonalisation directly. What we do is to create a symmetric matrix from it, which is then diagonalised. The symmetric matrix we create out of an image g is g g^T (see example 2.2). The matrices, which help us then express the image as the sum of vector outer products, are constructed from matrix g g^T, rather than from the image itself directly. This is the process of Singular Value Decomposition (SVD). It can be shown (see Box 2.1) that, if g g^T is a matrix of rank r, matrix g can be written as

g = U \Lambda^{1/2} V^T   (2.13)

where U and V are orthogonal matrices of size N × r and \Lambda^{1/2} is a diagonal r × r matrix.
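In practice the decomposition (2.13) is computed with a library routine. A minimal NumPy sketch, for a small test image: the singular values returned by numpy.linalg.svd are the square roots of the eigenvalues of g g^T.

```python
import numpy as np

g = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [1., 0., 1.]])

U, s, Vt = np.linalg.svd(g)               # g = U diag(s) V^T
print(s)                                   # singular values
print(np.allclose(U @ np.diag(s) @ Vt, g))

eigvals = np.linalg.eigvalsh(g @ g.T)      # eigenvalues of g g^T
print(np.sort(eigvals)[::-1])              # equal to s**2: [4., 1., 0.]
```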
Example 2.2
You are given an image which is represented by a matrix g. Show that matrix g g^T is symmetric.

A matrix is symmetric when it is equal to its transpose. Therefore, we must show that the transpose of g g^T is equal to g g^T. Consider the transpose of g g^T:

(g g^T)^T = (g^T)^T g^T = g g^T   (2.14)
Example B2.3
If \Lambda is a diagonal 2 × 2 matrix and \Lambda^m is defined by putting all nonzero elements of \Lambda to the power of m, show that:

\Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I \quad\text{and}\quad \Lambda^{1/2} \Lambda^{-1/2} = I   (2.15)

Indeed:

\Lambda^{-1/2} \Lambda \Lambda^{-1/2}
= \begin{pmatrix} \lambda_1^{-1/2} & 0 \\ 0 & \lambda_2^{-1/2} \end{pmatrix}
\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
\begin{pmatrix} \lambda_1^{-1/2} & 0 \\ 0 & \lambda_2^{-1/2} \end{pmatrix}
= \begin{pmatrix} \lambda_1^{1/2} & 0 \\ 0 & \lambda_2^{1/2} \end{pmatrix}
\begin{pmatrix} \lambda_1^{-1/2} & 0 \\ 0 & \lambda_2^{-1/2} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}   (2.16)

This also shows that \Lambda^{1/2} \Lambda^{-1/2} = I.
Example B2.4

Assume that H is a 3 × 3 matrix and partition it into a 2 × 3 submatrix H_1 and a 1 × 3 submatrix H_2. Show that:

H^T H = H_1^T H_1 + H_2^T H_2   (2.17)

Let us say that:

H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}, \quad
H_1 \equiv \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \end{pmatrix}, \quad\text{and}\quad
H_2 \equiv ( h_{31}\; h_{32}\; h_{33} )   (2.18)

We start by computing the left-hand side of (2.17):

H^T H = \begin{pmatrix} h_{11} & h_{21} & h_{31} \\ h_{12} & h_{22} & h_{32} \\ h_{13} & h_{23} & h_{33} \end{pmatrix}
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
= \begin{pmatrix}
h_{11}^2+h_{21}^2+h_{31}^2 & h_{11}h_{12}+h_{21}h_{22}+h_{31}h_{32} & h_{11}h_{13}+h_{21}h_{23}+h_{31}h_{33} \\
h_{12}h_{11}+h_{22}h_{21}+h_{32}h_{31} & h_{12}^2+h_{22}^2+h_{32}^2 & h_{12}h_{13}+h_{22}h_{23}+h_{32}h_{33} \\
h_{13}h_{11}+h_{23}h_{21}+h_{33}h_{31} & h_{13}h_{12}+h_{23}h_{22}+h_{33}h_{32} & h_{13}^2+h_{23}^2+h_{33}^2
\end{pmatrix}   (2.19)

Next, we compute the right-hand side of (2.17), by computing each term separately:

H_1^T H_1 = \begin{pmatrix} h_{11} & h_{21} \\ h_{12} & h_{22} \\ h_{13} & h_{23} \end{pmatrix}
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \end{pmatrix}
= \begin{pmatrix}
h_{11}^2+h_{21}^2 & h_{11}h_{12}+h_{21}h_{22} & h_{11}h_{13}+h_{21}h_{23} \\
h_{12}h_{11}+h_{22}h_{21} & h_{12}^2+h_{22}^2 & h_{12}h_{13}+h_{22}h_{23} \\
h_{13}h_{11}+h_{23}h_{21} & h_{13}h_{12}+h_{23}h_{22} & h_{13}^2+h_{23}^2
\end{pmatrix}   (2.20)

H_2^T H_2 = \begin{pmatrix} h_{31} \\ h_{32} \\ h_{33} \end{pmatrix} ( h_{31}\; h_{32}\; h_{33} )
= \begin{pmatrix}
h_{31}^2 & h_{31}h_{32} & h_{31}h_{33} \\
h_{32}h_{31} & h_{32}^2 & h_{32}h_{33} \\
h_{33}h_{31} & h_{33}h_{32} & h_{33}^2
\end{pmatrix}   (2.21)

Adding H_1^T H_1 and H_2^T H_2 we obtain the same answer as the one we obtained by calculating the left-hand side of equation (2.17) directly.
Example B2.5

Show that, if we partition an N × N matrix S into an r × N submatrix S_1 and an (N − r) × N submatrix S_2, equation

S A S^T = \begin{pmatrix} S_1 A S_1^T & S_1 A S_2^T \\ S_2 A S_1^T & S_2 A S_2^T \end{pmatrix}   (2.22)

is correct, with A being an N × N matrix.

Trivially:

S A S^T = \begin{pmatrix} S_1 \\ S_2 \end{pmatrix} A ( S_1^T \,|\, S_2^T )   (2.23)

Consider first the multiplication of A with ( S_1^T | S_2^T ). The rows of A are multiplied with the columns of ( S_1^T | S_2^T ); the first r of these columns belong to S_1^T and the remaining N − r belong to S_2^T. It then becomes clear that the result is ( A S_1^T | A S_2^T ). Next, consider the multiplication of \begin{pmatrix} S_1 \\ S_2 \end{pmatrix} with ( A S_1^T | A S_2^T ). The first r rows of the left factor belong to S_1 and the remaining N − r rows belong to S_2, so the product consists of the four blocks S_1 A S_1^T, S_1 A S_2^T, S_2 A S_1^T and S_2 A S_2^T, arranged as in (2.22). Then the result is obvious.
Example B2.6

Show that, if A g g^T A^T = 0, then A g = 0, where A and g are an r × N and an N × N real matrix, respectively.

We may write

A g g^T A^T = A g (A g)^T = 0   (2.26)

A g is an r × N matrix. Let us call it B. We have, therefore, B B^T = 0:

\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1N} \\
b_{21} & b_{22} & \cdots & b_{2N} \\
\vdots & & & \vdots \\
b_{r1} & b_{r2} & \cdots & b_{rN}
\end{pmatrix}
\begin{pmatrix}
b_{11} & b_{21} & \cdots & b_{r1} \\
b_{12} & b_{22} & \cdots & b_{r2} \\
\vdots & & & \vdots \\
b_{1N} & b_{2N} & \cdots & b_{rN}
\end{pmatrix}
=
\begin{pmatrix}
b_{11}^2+b_{12}^2+\cdots+b_{1N}^2 & \cdots & \cdots \\
\cdots & b_{21}^2+b_{22}^2+\cdots+b_{2N}^2 & \cdots \\
\cdots & \cdots & b_{r1}^2+b_{r2}^2+\cdots+b_{rN}^2
\end{pmatrix}
= \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}   (2.27)

Equating the corresponding elements, we obtain, for example:

b_{11}^2 + b_{12}^2 + \cdots + b_{1N}^2 = 0   (2.28)

The only way that the sum of the squares of real numbers can be 0 is if each one of them is 0. Similarly for all other diagonal elements of B B^T. This means that B = 0, ie that A g = 0.
Box 2.1. Can we expand in vector outer products any image?
Yes. Consider an image g and its transpose g^T. Matrix g g^T is real and symmetric (see example 2.2); let us say that it has r nonzero eigenvalues, and let \lambda_i be its ith eigenvalue. Then it is known, from linear algebra, that there exists an orthogonal matrix S (made up from the eigenvectors of g g^T) such that:

S g g^T S^T =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \ddots & & & & \vdots \\
0 & 0 & \cdots & \lambda_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
= \begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}   (2.29)

where \Lambda and the 0s represent the partitions of the diagonal matrix above. Similarly, we can partition matrix S into an r × N matrix S_1 and an (N − r) × N matrix S_2:

S = \begin{pmatrix} S_1 \\ S_2 \end{pmatrix}   (2.30)

Because S is orthogonal, and by using the result of example 2.4, we have:

S^T S = I \;\Rightarrow\; S_1^T S_1 + S_2^T S_2 = I \;\Rightarrow\;
S_1^T S_1 = I - S_2^T S_2 \;\Rightarrow\; S_1^T S_1 g = g - S_2^T S_2 g   (2.31)

From (2.29) and examples 2.5 and 2.6 we clearly have:

S_1 g g^T S_1^T = \Lambda   (2.32)
S_2 g g^T S_2^T = 0 \;\Rightarrow\; S_2 g = 0   (2.33)

Using (2.33) in (2.31), we have:

S_1^T S_1 g = g   (2.34)

This means that S_1^T S_1 = I, ie S_1 is an orthogonal matrix. We multiply both sides of equation (2.32) from left and right with \Lambda^{-1/2} to get:

\Lambda^{-1/2} S_1 g g^T S_1^T \Lambda^{-1/2} = \Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I   (2.35)

Since \Lambda^{-1/2} is diagonal, \Lambda^{-1/2} = (\Lambda^{-1/2})^T. So the above equation may be rewritten as:

(\Lambda^{-1/2} S_1 g)(\Lambda^{-1/2} S_1 g)^T = I   (2.36)

Therefore, there exists a matrix q \equiv \Lambda^{-1/2} S_1 g, the inverse of which is its transpose (ie it is orthogonal). We may express matrix S_1 g as \Lambda^{1/2} q and substitute in (2.34) to obtain:

S_1^T \Lambda^{1/2} q = g \quad\text{or}\quad g = S_1^T \Lambda^{1/2} q   (2.37)

In other words, g is expressed as a diagonal matrix \Lambda^{1/2}, made up from the square roots of the nonzero eigenvalues of g g^T, multiplied from left and right with the two orthogonal matrices S_1 and q. This result expresses the diagonalisation of image g.
How can we compute matrices U, V and \Lambda^{1/2} needed for image diagonalisation?

We know, from linear algebra, that matrix diagonalisation means that a real square matrix A may be written as U \Lambda U^T, where U is made up from the eigenvectors of A written as columns, and \Lambda is a diagonal matrix made up from the eigenvalues of A written along the diagonal, in the order corresponding to the eigenvectors that make up the columns of U. We need this information in the proof that follows.

If we take the transpose of (2.13) we have:

g^T = V \Lambda^{1/2} U^T   (2.38)

Multiply (2.13) with (2.38) by parts, to obtain:

g g^T = U \Lambda^{1/2} V^T V \Lambda^{1/2} U^T = U \Lambda^{1/2} I \Lambda^{1/2} U^T = U \Lambda^{1/2} \Lambda^{1/2} U^T = U \Lambda U^T   (2.39)

This shows that matrix \Lambda consists of the r nonzero eigenvalues of matrix g g^T, while U is made up from the eigenvectors of the same matrix.

Similarly, if we multiply (2.38) with (2.13) by parts, we get:

g^T g = V \Lambda V^T   (2.40)

This shows that matrix V is made up from the eigenvectors of matrix g^T g.
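The construction described above can be mirrored in a few lines of NumPy: the eigen-decomposition of g g^T supplies U and \Lambda, and the columns of V can be obtained as g^T u_i / \sqrt{\lambda_i} (a relation shown in example 2.7 below). This is only a sketch; the rank threshold is an arbitrary choice.

```python
import numpy as np

g = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [1., 0., 1.]])

lam, U = np.linalg.eigh(g @ g.T)          # eigenvalues/eigenvectors of g g^T (eq 2.39)
idx = np.argsort(lam)[::-1]               # sort in decreasing order
lam, U = lam[idx], U[:, idx]

r = int(np.sum(lam > 1e-10))              # keep only the nonzero eigenvalues (rank r)
lam, U = lam[:r], U[:, :r]

V = (g.T @ U) / np.sqrt(lam)              # v_i = g^T u_i / sqrt(lambda_i)
g_rec = U @ np.diag(np.sqrt(lam)) @ V.T   # eq (2.13): g = U Lambda^(1/2) V^T
print(np.allclose(g_rec, g))
```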
Box 2.2. What happens if the eigenvalues of matrix g g^T are negative?

We shall show that the eigenvalues of g g^T are always non-negative numbers. Let us assume that \lambda is an eigenvalue of matrix g g^T and u is the corresponding eigenvector. We have then:

g g^T u = \lambda u   (2.41)

Multiply both sides with u^T from the left:

u^T g g^T u = \lambda u^T u   (2.42)

Since \lambda is a scalar, it can change position on the right-hand side of the equation. Also, because of the associativity of matrix multiplication, we may write:

(u^T g)(g^T u) = \lambda u^T u   (2.43)

Since u is an eigenvector, u^T u = 1. Therefore:

(g^T u)^T (g^T u) = \lambda   (2.44)

g^T u is some vector y. Then we have \lambda = y^T y, which means that \lambda is non-negative, since y^T y is the square magnitude of vector y.
Example 2.7
If \lambda_i are the eigenvalues of g g^T and u_i the corresponding eigenvectors, show that g^T g has the same eigenvalues, with the corresponding eigenvectors given by v_i = g^T u_i.

By definition:

g g^T u_i = \lambda_i u_i   (2.45)

Multiply both sides from the left with g^T:

g^T g g^T u_i = g^T \lambda_i u_i   (2.46)

As \lambda_i is a scalar, it may change position with respect to the other factors on the right-hand side of (2.46). Also, by the associativity of matrix multiplication:

g^T g (g^T u_i) = \lambda_i (g^T u_i)   (2.47)

This identifies g^T u_i as an eigenvector of g^T g, with \lambda_i the corresponding eigenvalue.
Example 2.8
You are given an image:

g = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}

Compute the eigenvectors u_i of g g^T and v_i of g^T g.

The transpose of g is:

g^T = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}   (2.48)

We start by computing first g g^T:

g g^T = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 0 \\ 2 & 6 & 1 \\ 0 & 1 & 1 \end{pmatrix}   (2.49)

The eigenvalues of g g^T will be computed from its characteristic equation:

\det\begin{pmatrix} 1-\lambda & 2 & 0 \\ 2 & 6-\lambda & 1 \\ 0 & 1 & 1-\lambda \end{pmatrix} = 0
\;\Rightarrow\; (1-\lambda)(\lambda^2 - 7\lambda + 1) = 0   (2.50)

\Rightarrow\; \lambda = 1 \quad\text{or}\quad \lambda = \frac{7 \pm \sqrt{49-4}}{2} = \frac{7 \pm 6.7}{2}
\;\Rightarrow\; \lambda = 6.854 \quad\text{or}\quad \lambda = 0.146   (2.51)

In descending order, the eigenvalues are:

\lambda_1 = 6.854, \quad \lambda_2 = 1, \quad \lambda_3 = 0.146   (2.52)

Let u_i = (x_1, x_2, x_3)^T be the eigenvector which corresponds to eigenvalue \lambda_i. Then:

\begin{pmatrix} 1 & 2 & 0 \\ 2 & 6 & 1 \\ 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \lambda_i \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\;\Rightarrow\;
x_1 + 2x_2 = \lambda_i x_1, \quad
2x_1 + 6x_2 + x_3 = \lambda_i x_2, \quad
x_2 + x_3 = \lambda_i x_3   (2.53)

For \lambda_i = 6.854:

2x_2 - 5.854 x_1 = 0   (2.54)
2x_1 - 0.854 x_2 + x_3 = 0   (2.55)
x_2 - 5.854 x_3 = 0   (2.56)

Multiply (2.55) with 5.854 and add equation (2.56) to get:

11.7 x_1 - 4 x_2 = 0   (2.57)

Equation (2.57) is the same as (2.54). So we really have only two independent equations for the three unknowns. We choose the value of x_1 to be 1. Then:

x_2 = 2.927 \quad\text{and, from (2.55),}\quad x_3 = -2 + 0.854 \times 2.927 = -2 + 2.5 = 0.5   (2.58)

Thus, the first eigenvector is

(1,\; 2.927,\; 0.5)^T   (2.59)

and, after normalisation, ie division with \sqrt{1^2 + 2.927^2 + 0.5^2} = 3.133, we obtain:

u_1 = \begin{pmatrix} 0.319 \\ 0.934 \\ 0.160 \end{pmatrix}   (2.60)

For \lambda_i = 1, the system of linear equations we have to solve is:

x_1 + 2x_2 = x_1 \;\Rightarrow\; x_2 = 0, \qquad
2x_1 + x_3 = 0 \;\Rightarrow\; x_3 = -2x_1   (2.61)

Choose x_1 = 1. Then x_3 = -2. Since x_2 = 0, we must divide all components with \sqrt{1^2 + 2^2} = \sqrt{5}.

f = \sum_{i=1}^{r} \lambda_i^{1/2} u_i v_i^T   (2.71)

since the only nonzero terms are those with i = j. Elementary images u_i v_i^T are known as the eigenimages of image f.
Can we analyse an eigenimage into eigenimages?

No. An N × N eigenimage may be written as the outer product of two vectors, say vectors u and v:

u v^T = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix} ( v_1\; v_2\; \cdots\; v_N )
= \begin{pmatrix}
u_1 v_1 & u_1 v_2 & \cdots & u_1 v_N \\
u_2 v_1 & u_2 v_2 & \cdots & u_2 v_N \\
\vdots & & & \vdots \\
u_N v_1 & u_N v_2 & \cdots & u_N v_N
\end{pmatrix}   (2.72)

Any row of the outer product of two vectors may be written as a multiple of any other row. For example, we can see from (2.72) that row number 1 is row number 2 times u_1/u_2. So, an eigenimage is a matrix of rank 1, ie it has only one nonzero eigenvalue and only one eigenvector: it cannot be analysed any further.
Example 2.9
Consider a 2 × 2 image that can be written as the outer product of two vectors. Show that it has only one nonzero eigenvalue and that the corresponding eigenvector is parallel to the first of the two vectors, the outer product of which makes up the image.

Let us say that the image can be written as the outer product of vectors a^T = (a_1, a_2) and b^T = (b_1, b_2):

a b^T = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} ( b_1\; b_2 )
= \begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix}   (2.73)

We solve the characteristic equation of this matrix to work out its eigenvalues:

\det\begin{pmatrix} a_1 b_1 - \lambda & a_1 b_2 \\ a_2 b_1 & a_2 b_2 - \lambda \end{pmatrix} = 0
\;\Rightarrow\; (a_1 b_1 - \lambda)(a_2 b_2 - \lambda) - a_1 b_2 a_2 b_1 = 0
\;\Rightarrow\; a_1 b_1 a_2 b_2 - \lambda a_2 b_2 - \lambda a_1 b_1 + \lambda^2 - a_1 b_2 a_2 b_1 = 0
\;\Rightarrow\; \lambda(\lambda - a_2 b_2 - a_1 b_1) = 0
\;\Rightarrow\; \lambda = a_2 b_2 + a_1 b_1 \quad\text{or}\quad \lambda = 0   (2.74)

So, only one eigenvalue is different from zero. The corresponding eigenvector (x_1, x_2)^T is the solution of:

\begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= (a_2 b_2 + a_1 b_1) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\;\Rightarrow\;
a_1 b_1 x_1 + a_1 b_2 x_2 = (a_2 b_2 + a_1 b_1) x_1, \quad
a_2 b_1 x_1 + a_2 b_2 x_2 = (a_2 b_2 + a_1 b_1) x_2
\;\Rightarrow\; a_1 x_2 = a_2 x_1 \;\Rightarrow\; x_2 = \frac{a_2}{a_1} x_1   (2.75)

Choose x_1 = a_1. Then x_2 = a_2, and the eigenvector is (a_1, a_2)^T times a constant that will make sure its length is normalised to 1. So, the eigenvector is parallel to vector a, since they only differ by a multiplicative constant.
How can we approximate an image using SVD?
If in equation (2.71) we decide to keep only k < r terms, we shall reproduce an approximated version of the image:

f_k = \sum_{i=1}^{k} \lambda_i^{1/2} u_i v_i^T   (2.76)
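A minimal sketch of equation (2.76) with NumPy: keep the k largest terms of the SVD and rebuild the image (the helper name is ours).

```python
import numpy as np

def svd_approximate(f, k):
    """Keep only the k largest terms of the SVD expansion, eq (2.76)."""
    U, s, Vt = np.linalg.svd(f, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

f = np.random.default_rng(3).random((8, 8))
f2 = svd_approximate(f, 2)
print(np.sum((f - f2) ** 2))     # squared error of the rank-2 approximation
```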
Example 2.10
A 256 × 256 grey image with 256 grey levels is to be transmitted. How many terms can be kept in its SVD before the transmission of the transformed image becomes too inefficient in comparison with the transmission of the original image? (Assume that real numbers require 32 bits each.)

Assume that \lambda_i^{1/2} is incorporated into one of the vectors u_i or v_i in equation (2.76). When we transmit term i of the SVD expansion of the image, we must transmit the two vectors u_i and v_i, which are made up from 256 elements each, all real numbers. We must, therefore, transmit 2 × 32 × 256 bits per term.

If we want to transmit the full image, we shall have to transmit 256 × 256 × 8 bits (since each pixel requires 8 bits). Then the maximum number of terms transmitted before the SVD becomes uneconomical is:

k = \frac{256 \times 256 \times 8}{2 \times 32 \times 256} = \frac{256}{8} = 32   (2.77)
Box 2.3. What is the intuitive explanation of SVD?
Let us consider a 2 × 3 matrix (an image) A. Matrix A^T A is a 3 × 3 matrix. Let us consider its effect on a 3 × 1 vector u: A^T A u = A^T (A u). When matrix A operates on vector u, it produces the 2 × 1 vector \tilde{u}:

\tilde{u} \equiv A u \;\Rightarrow\;
\begin{pmatrix} \tilde{u}_1 \\ \tilde{u}_2 \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}   (2.78)

This is nothing else than a projection of vector u from a 3D space to a 2D space. Next, let us consider the effect of A^T on this vector:

A^T (A u) = A^T \tilde{u} \;\Rightarrow\;
\begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}
\begin{pmatrix} \tilde{u}_1 \\ \tilde{u}_2 \end{pmatrix}
= \begin{pmatrix} u'_1 \\ u'_2 \\ u'_3 \end{pmatrix}   (2.79)

This is nothing else than an upsampling and embedding of vector \tilde{u} from a 2D space into a 3D space. Now, if vector u is an eigenvector of matrix A^T A, the result of this operation, namely projecting it on a lower dimensionality space and then embedding it back into the high dimensionality space we started from, will produce a vector that has the same orientation as the original vector u, and magnitude \lambda times the original magnitude: A^T A u = \lambda u, where \lambda is the corresponding eigenvalue of matrix A^T A. When \lambda is large (\lambda > 1), this process, of projecting the vector in a low dimensionality space and upsampling it again back to its original space, will make the vector larger and stronger, while if \lambda is small (\lambda < 1), the vector will shrink because of this process.

We may think of this operation as a resonance: eigenvectors with large eigenvalues gain energy from this process and emerge \lambda times stronger, as if they resonate with the matrix. So, when we compute the eigenimages of matrix A, as the outer products of the eigenvectors that resonate with matrix A^T A (or A A^T), and arrange them in order of decreasing corresponding eigenvalues, effectively we find the modes of the image: those components that contain the most energy and resonate best with the image, when the image is seen as an operator that projects a vector to a lower dimensionality space and then embeds it back to its original space.
What is the error of the approximation of an image by SVD?
The difference between the original and the approximated image is:

D \equiv f - f_k = \sum_{i=k+1}^{r} \lambda_i^{1/2} u_i v_i^T   (2.80)

We may calculate how big this error is by calculating the norm of matrix D, ie the sum of the squares of its elements. From (2.80) it is obvious that, if u_{im} is the mth element of vector u_i and v_{in} is the nth element of vector v_i, the mnth element of D is:

d_{mn} = \sum_{i=k+1}^{r} \lambda_i^{1/2} u_{im} v_{in} \;\Rightarrow\;
d_{mn}^2 = \left( \sum_{i=k+1}^{r} \lambda_i^{1/2} u_{im} v_{in} \right)^2
= \sum_{i=k+1}^{r} \lambda_i u_{im}^2 v_{in}^2
+ 2 \sum_{i=k+1}^{r} \sum_{j=k+1,\, j \neq i}^{r} \lambda_i^{1/2} \lambda_j^{1/2} u_{im} v_{in} u_{jm} v_{jn}   (2.81)

The norm of matrix D will be the sum of the squares of all its elements:

||D|| = \sum_m \sum_n d_{mn}^2
= \sum_m \sum_n \sum_{i=k+1}^{r} \lambda_i u_{im}^2 v_{in}^2
+ 2 \sum_m \sum_n \sum_{i=k+1}^{r} \sum_{j=k+1,\, j \neq i}^{r} \lambda_i^{1/2} \lambda_j^{1/2} u_{im} v_{in} u_{jm} v_{jn}
= \sum_{i=k+1}^{r} \lambda_i \sum_m u_{im}^2 \sum_n v_{in}^2
+ 2 \sum_{i=k+1}^{r} \sum_{j=k+1,\, j \neq i}^{r} \lambda_i^{1/2} \lambda_j^{1/2} \sum_m u_{im} u_{jm} \sum_n v_{in} v_{jn}   (2.82)

However, u_i, v_i are eigenvectors and therefore they form an orthonormal set. So

\sum_m u_{im}^2 = 1, \quad \sum_n v_{in}^2 = 1, \quad \sum_n v_{in} v_{jn} = 0 \quad\text{and}\quad \sum_m u_{im} u_{jm} = 0 \quad\text{for } i \neq j   (2.83)

since u_i^T u_j = 0 and v_i^T v_j = 0 for i \neq j. Then:

||D|| = \sum_{i=k+1}^{r} \lambda_i   (2.84)

Therefore, the square error of the approximate reconstruction of the image using equation (2.76) is equal to the sum of the omitted eigenvalues.
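This result is easy to confirm numerically: the squared error of the truncated reconstruction equals the sum of the omitted eigenvalues, ie the sum of the squares of the omitted singular values.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.random((6, 6))
U, s, Vt = np.linalg.svd(f)

k = 3
f_k = (U[:, :k] * s[:k]) @ Vt[:k, :]      # truncated expansion, eq (2.76)
error = np.sum((f - f_k) ** 2)
omitted = np.sum(s[k:] ** 2)              # sum of the omitted eigenvalues lambda_i = s_i^2
print(np.isclose(error, omitted))         # True, as in eq (2.84)
```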
Example 2.11
For a 3 × 3 matrix D, show that its norm, defined as the trace of D^T D, is equal to the sum of the squares of its elements.

Let us assume that:

D \equiv \begin{pmatrix} d_{11} & d_{12} & d_{13} \\ d_{21} & d_{22} & d_{23} \\ d_{31} & d_{32} & d_{33} \end{pmatrix}   (2.85)

Then:

D^T D = \begin{pmatrix} d_{11} & d_{21} & d_{31} \\ d_{12} & d_{22} & d_{32} \\ d_{13} & d_{23} & d_{33} \end{pmatrix}
\begin{pmatrix} d_{11} & d_{12} & d_{13} \\ d_{21} & d_{22} & d_{23} \\ d_{31} & d_{32} & d_{33} \end{pmatrix}
= \begin{pmatrix}
d_{11}^2+d_{21}^2+d_{31}^2 & d_{11}d_{12}+d_{21}d_{22}+d_{31}d_{32} & d_{11}d_{13}+d_{21}d_{23}+d_{31}d_{33} \\
d_{12}d_{11}+d_{22}d_{21}+d_{32}d_{31} & d_{12}^2+d_{22}^2+d_{32}^2 & d_{12}d_{13}+d_{22}d_{23}+d_{32}d_{33} \\
d_{13}d_{11}+d_{23}d_{21}+d_{33}d_{31} & d_{13}d_{12}+d_{23}d_{22}+d_{33}d_{32} & d_{13}^2+d_{23}^2+d_{33}^2
\end{pmatrix}   (2.86)

Finally:

trace[D^T D] = (d_{11}^2+d_{21}^2+d_{31}^2) + (d_{12}^2+d_{22}^2+d_{32}^2) + (d_{13}^2+d_{23}^2+d_{33}^2)
= sum of all elements of D squared.   (2.87)
How can we minimise the error of the reconstruction?
If we arrange the eigenvalues \lambda_i of matrices f^T f and f f^T in decreasing order and truncate the expansion at some integer k < r, where r is the rank of these matrices, we approximate the image f by f_k, which is the least square error approximation. This is because the sum of the squares of the elements of the difference matrix is minimal, since it is equal to the sum of the unused eigenvalues, which have been chosen to be the smallest ones.

Notice that the singular value decomposition of an image is optimal in the least square error sense, but the basis images (eigenimages), with respect to which we expanded the image, are determined by the image itself. (They are determined by the eigenvectors of f^T f and f f^T.)
Example 2.12
In the singular value decomposition of the image of example 2.8, only the first term is kept, while the others are set to zero. Verify that the square error of the reconstructed image is equal to the sum of the omitted eigenvalues.

If we keep only the first eigenvalue, the image is approximated by the first eigenimage only, weighted by the square root of the corresponding eigenvalue:

g_1 = \sqrt{\lambda_1}\, u_1 v_1^T
= \sqrt{6.85} \begin{pmatrix} 0.319 \\ 0.934 \\ 0.160 \end{pmatrix} ( 0.835\; 0.357\; 0.418 )
= \begin{pmatrix} 0.835 \\ 2.444 \\ 0.419 \end{pmatrix} ( 0.835\; 0.357\; 0.418 )
= \begin{pmatrix} 0.697 & 0.298 & 0.349 \\ 2.041 & 0.873 & 1.022 \\ 0.350 & 0.150 & 0.175 \end{pmatrix}   (2.88)

The error of the reconstruction is given by the difference between g_1 and the original image:

g - g_1 = \begin{pmatrix} 0.303 & -0.298 & -0.349 \\ -0.041 & 0.127 & -0.022 \\ -0.350 & -0.150 & 0.825 \end{pmatrix}   (2.89)

The sum of the squares of the errors is:

0.303^2 + 0.298^2 + 0.349^2 + 0.041^2 + 0.127^2 + 0.022^2 + 0.350^2 + 0.150^2 + 0.825^2 = 1.146   (2.90)

This is exactly equal to the sum of the two omitted eigenvalues \lambda_2 and \lambda_3.
Example 2.13
Perform the singular value decomposition (SVD) of the following image:

g = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}   (2.91)

Thus, identify the eigenimages of the above image.

We start by computing g g^T:

g g^T = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 2 \end{pmatrix}   (2.92)

The eigenvalues of g g^T are the solutions of:

\det\begin{pmatrix} 2-\lambda & 0 & 2 \\ 0 & 1-\lambda & 0 \\ 2 & 0 & 2-\lambda \end{pmatrix} = 0
\;\Rightarrow\; (2-\lambda)^2(1-\lambda) - 4(1-\lambda) = 0
\;\Rightarrow\; (1-\lambda)\lambda(\lambda - 4) = 0   (2.93)

The eigenvalues are \lambda_1 = 4, \lambda_2 = 1, \lambda_3 = 0. The first corresponding eigenvector is the solution of the system of equations:

\begin{pmatrix} 2 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= 4 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\;\Rightarrow\;
2x_1 + 2x_3 = 4x_1, \quad x_2 = 4x_2, \quad 2x_1 + 2x_3 = 4x_3
\;\Rightarrow\; x_1 = x_3, \; x_2 = 0   (2.94)

We choose x_1 = x_3 = \frac{1}{\sqrt{2}}, so that the eigenvector has unit length. Thus, u_1^T = ( \frac{1}{\sqrt{2}}\;\; 0\;\; \frac{1}{\sqrt{2}} ). For the second eigenvalue, we have:

\begin{pmatrix} 2 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\;\Rightarrow\;
2x_1 + 2x_3 = x_1, \quad x_2 = x_2, \quad 2x_1 + 2x_3 = x_3
\;\Rightarrow\; x_1 = -2x_3, \; x_3 = -2x_1   (2.95)

The second of the above equations conveys no information, giving us the option to choose whatever value of x_2 we want. However, we cannot choose x_2 = 0, because the first and the third equations appear to be incompatible, unless x_1 = x_3 = 0. If x_2 were 0 too, we would have the trivial solution, which does not represent an eigenvector. So, the above three equations are satisfied if x_1 = x_3 = 0 and x_2 is anything apart from 0. Then x_2 is chosen to be 1, so that u_2 also has unit length. Thus, u_2^T = ( 0\; 1\; 0 ).

Because g is symmetric, g g^T = g^T g and the eigenvectors of g g^T are the same as the eigenvectors of g^T g. Then the SVD of g is:

g = \sqrt{\lambda_1}\, u_1 u_1^T + \sqrt{\lambda_2}\, u_2 u_2^T
= 2 \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix} ( \frac{1}{\sqrt{2}}\;\; 0\;\; \frac{1}{\sqrt{2}} )
+ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} ( 0\; 1\; 0 )
= \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}   (2.96)
These two matrices are the eigenimages of g.
Example 2.14
Perform the singular value decomposition of the following image and identify its eigenimages:

g = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}   (2.97)

Start by computing g g^T:

g g^T = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}   (2.98)

The eigenvalues of this matrix are the solutions of:

\det\begin{pmatrix} 1-\lambda & 0 & 1 \\ 0 & 2-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{pmatrix} = 0
\;\Rightarrow\; (1-\lambda)^2(2-\lambda) - (2-\lambda) = 0
\;\Rightarrow\; (2-\lambda)\left[ (1-\lambda)^2 - 1 \right] = 0
\;\Rightarrow\; (2-\lambda)(1-\lambda-1)(1-\lambda+1) = 0   (2.99)

So, \lambda_1 = 2, \lambda_2 = 2, \lambda_3 = 0.

The first eigenvector satisfies:

\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= 2 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\;\Rightarrow\;
x_1 + x_3 = 2x_1, \quad 2x_2 = 2x_2, \quad x_1 + x_3 = 2x_3
\;\Rightarrow\; x_1 = x_3, \; x_2 \text{ any value}   (2.100)

Choose x_1 = x_3 = \frac{1}{\sqrt{2}} and x_2 = 0, so u_1 = \left( \frac{1}{\sqrt{2}},\, 0,\, \frac{1}{\sqrt{2}} \right)^T.

The second eigenvector must satisfy the same constraints and must be orthogonal to u_1. Therefore:

u_2 = (0, 1, 0)^T   (2.101)

Because g is symmetric, g g^T = g^T g and the eigenvectors of g g^T are the same as the eigenvectors of g^T g. Then the SVD of g is:

g = \sqrt{\lambda_1}\, u_1 v_1^T + \sqrt{\lambda_2}\, u_2 v_2^T
= \sqrt{2} \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix} ( 0\; 1\; 0 )
+ \sqrt{2} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} ( \frac{1}{\sqrt{2}}\;\; 0\;\; \frac{1}{\sqrt{2}} )
= \sqrt{2} \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
+ \sqrt{2} \begin{pmatrix} 0 & 0 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}   (2.102)

These two matrices are the eigenimages of g. Note that the answer would not have changed if we exchanged the definitions of u_1 and u_2, the order of which is immaterial, since they both correspond to the same eigenvalue with multiplicity 2 (ie it is 2-fold degenerate).
Example 2.15

Show the different stages of the SVD of the following image:

g = \begin{pmatrix}
255 & 255 & 255 & 255 & 255 & 255 & 255 & 255 \\
255 & 255 & 255 & 100 & 100 & 100 & 255 & 255 \\
255 & 255 & 100 & 150 & 150 & 150 & 100 & 255 \\
255 & 255 & 100 & 150 & 200 & 150 & 100 & 255 \\
255 & 255 & 100 & 150 & 150 & 150 & 100 & 255 \\
255 & 255 & 255 & 100 & 100 & 100 & 255 & 255 \\
255 & 255 & 255 & 255 & 50 & 255 & 255 & 255 \\
50 & 50 & 50 & 50 & 255 & 255 & 255 & 255
\end{pmatrix}   (2.103)

The g g^T matrix is:

g g^T = \begin{pmatrix}
520200 & 401625 & 360825 & 373575 & 360825 & 401625 & 467925 & 311100 \\
401625 & 355125 & 291075 & 296075 & 291075 & 355125 & 381125 & 224300 \\
360825 & 291075 & 282575 & 290075 & 282575 & 291075 & 330075 & 205025 \\
373575 & 296075 & 290075 & 300075 & 290075 & 296075 & 332575 & 217775 \\
360825 & 291075 & 282575 & 290075 & 282575 & 291075 & 330075 & 205025 \\
401625 & 355125 & 291075 & 296075 & 291075 & 355125 & 381125 & 224300 \\
467925 & 381125 & 330075 & 332575 & 330075 & 381125 & 457675 & 258825 \\
311100 & 224300 & 205025 & 217775 & 205025 & 224300 & 258825 & 270100
\end{pmatrix}   (2.104)

Its eigenvalues, sorted in decreasing order, are:

2593416.500, 111621.508, 71738.313, 34790.875, 11882.712, 0.009, 0.001, 0.000

The last three eigenvalues are practically 0, so we compute only the eigenvectors that correspond to the first five eigenvalues. These eigenvectors are the columns of the following matrix:

\begin{pmatrix}
0.441 & 0.167 & 0.080 & 0.388 & 0.764 \\
0.359 & 0.252 & 0.328 & 0.446 & 0.040 \\
0.321 & 0.086 & 0.440 & 0.034 & 0.201 \\
0.329 & 0.003 & 0.503 & 0.093 & 0.107 \\
0.321 & 0.086 & 0.440 & 0.035 & 0.202 \\
0.359 & 0.252 & 0.328 & 0.446 & 0.040 \\
0.407 & 0.173 & 0.341 & 0.630 & 0.504 \\
0.261 & 0.895 & 0.150 & 0.209 & 0.256
\end{pmatrix}   (2.105)

The v_i eigenvectors, computed as g^T u_i, turn out to be the columns of the following matrix:

\begin{pmatrix}
0.410 & 0.389 & 0.264 & 0.106 & 0.012 \\
0.410 & 0.389 & 0.264 & 0.106 & 0.012 \\
0.316 & 0.308 & 0.537 & 0.029 & 0.408 \\
0.277 & 0.100 & 0.101 & 0.727 & 0.158 \\
0.269 & 0.555 & 0.341 & 0.220 & 0.675 \\
0.311 & 0.449 & 0.014 & 0.497 & 0.323 \\
0.349 & 0.241 & 0.651 & 0.200 & 0.074 \\
0.443 & 0.160 & 0.149 & 0.336 & 0.493
\end{pmatrix}   (2.106)

Figure 2.1: The original image and its five eigenimages, each scaled independently to have values from 0 to 255.

In figure 2.1 the original image and its five eigenimages are shown. Each eigenimage has been scaled so that its grey values vary between 0 and 255. These eigenimages have to be weighted by the square root of the appropriate eigenvalue and added to produce the original image. The five images shown in figure 2.2 are the reconstructed images when one, two, ..., five eigenvalues were used for the reconstruction.

Figure 2.2: Image reconstruction using one, two, ..., five eigenimages from top right to bottom left sequentially, with the original image shown in (f).

Then we calculate the sum of the squared errors for each reconstructed image according to the formula:

\sum_{\text{all pixels}} (\text{reconstructed pixel} - \text{original pixel})^2   (2.107)

We obtain:

Square error for image 2.2a: 230033.32 (\lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = 230033.41)
Square error for image 2.2b: 118412.02 (\lambda_3 + \lambda_4 + \lambda_5 = 118411.90)
Square error for image 2.2c: 46673.53 (\lambda_4 + \lambda_5 = 46673.59)
Square error for image 2.2d: 11882.65 (\lambda_5 = 11882.71)
Square error for image 2.2e: 0

We see that the sum of the omitted eigenvalues agrees very well with the value of the square error for each reconstructed image.
Are there any sets of elementary images in terms of which any image may be
expanded?
Yes. They are dened in terms of complete and orthonormal sets of discrete valued discrete
functions.
What is a complete and orthonormal set of functions?
A set of functions $S_n(t)$, where $n$ is an integer, is said to be orthogonal over an interval $[0, T]$ with weight function $w(t)$, if:
$$\int_0^T w(t)S_n(t)S_m(t)\,dt = \begin{cases} k \neq 0 & \text{if } n = m\\ 0 & \text{if } n \neq m\end{cases}\qquad(2.108)$$
In other words, the set of functions $S_n(t)$, for $n$ an integer index identifying the individual functions, is orthogonal when the integral of the product of any two of these functions over a certain interval, possibly weighted by a function $w(t)$, is zero, unless the two functions are the same function, in which case the result is equal to a nonzero constant $k$. The set is called orthonormal, if $k = 1$. Note that from an orthogonal set of functions we can easily create an orthonormal set by a simple scaling of the functions. The set is called complete, if we cannot find any other function which is orthogonal to the set and does not belong to the set. An example of a complete and orthogonal set is the set of functions $S_n(t) \equiv e^{jnt}$, which are used as the basis functions of the Fourier transform.
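As a quick numerical illustration (not part of the book), the orthogonality of the complex exponentials over $[0, 2\pi]$ can be checked by approximating the integral of equation (2.108) with a Riemann sum; for complex-valued functions the product is taken with the complex conjugate of the second function.

```python
import numpy as np

T = 2 * np.pi
t = np.linspace(0.0, T, 4096, endpoint=False)
dt = T / t.size

def inner(n, m):
    # approximate integral of S_n(t) * conj(S_m(t)) dt over [0, T], with w(t) = 1
    return np.sum(np.exp(1j * n * t) * np.conj(np.exp(1j * m * t))) * dt

print(abs(inner(3, 3)))   # ~ 2*pi: the nonzero constant k of equation (2.108)
print(abs(inner(3, 5)))   # ~ 0: different functions of the set are orthogonal
```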
Example 2.16

Show that the columns of an orthogonal matrix form a set of orthonormal vectors.

Let us say that $A$ is an $N \times N$ orthogonal matrix (ie $A^T = A^{-1}$), and let us consider its column vectors $u_1, u_2, \dots, u_N$. We obviously have:
$$A^{-1}A = I \;\Rightarrow\; A^T A = I \;\Rightarrow\;
\begin{pmatrix} u_1^T\\ u_2^T\\ \vdots\\ u_N^T\end{pmatrix}
\begin{pmatrix} u_1 & u_2 & \cdots & u_N\end{pmatrix} = I \;\Rightarrow$$
$$\begin{pmatrix}
u_1^T u_1 & u_1^T u_2 & \cdots & u_1^T u_N\\
u_2^T u_1 & u_2^T u_2 & \cdots & u_2^T u_N\\
\vdots & \vdots & \ddots & \vdots\\
u_N^T u_1 & u_N^T u_2 & \cdots & u_N^T u_N
\end{pmatrix}=
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}\qquad(2.109)$$
This proves that the columns of $A$ form an orthonormal set of vectors, since $u_i^T u_j = 0$ for $i \neq j$ and $u_i^T u_i = 1$ for every $i$.
Example 2.17

Show that the inverse of an orthogonal matrix is also orthogonal.

An orthogonal matrix is defined as:
$$A^T = A^{-1}\qquad(2.110)$$
To prove that $A^{-1}$ is also orthogonal, it is enough to prove that $(A^{-1})^T = (A^{-1})^{-1}$. This is equivalent to $(A^{-1})^T = A$, which is readily derived if we take the transpose of equation (2.110).
Example 2.18

Show that the rows of an orthogonal matrix also form a set of orthonormal vectors.

Since $A$ is an orthogonal matrix, so is $A^{-1}$ (see example 2.17). The columns of an orthogonal matrix form a set of orthonormal vectors (see example 2.16). Therefore, the columns of $A^{-1}$, which are the rows of $A$, form a set of orthonormal vectors.
Are there any complete sets of orthonormal discrete valued functions?
Yes. There is, for example, the set of Haar functions, which take values from the set of numbers $0, \pm 1, \pm(\sqrt{2})^p$, for $p = 1, 2, 3, \dots$ and the set of Walsh functions, which take values from the set of numbers $+1, -1$.
2.2 Haar, Walsh and Hadamard transforms

How are the Haar functions defined?

They are defined recursively by equations
$$H_0(t) \equiv 1 \quad \text{for } 0 \le t < 1$$
$$H_1(t) \equiv \begin{cases} 1 & \text{if } 0 \le t < \tfrac{1}{2}\\ -1 & \text{if } \tfrac{1}{2} \le t < 1\end{cases}$$
$$H_{2^p+n}(t) \equiv \begin{cases} \sqrt{2^p} & \text{for } \tfrac{n}{2^p} \le t < \tfrac{n+0.5}{2^p}\\ -\sqrt{2^p} & \text{for } \tfrac{n+0.5}{2^p} \le t < \tfrac{n+1}{2^p}\\ 0 & \text{elsewhere}\end{cases}\qquad(2.111)$$
where $p = 1, 2, 3, \dots$ and $n = 0, 1, \dots, 2^p - 1$.
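A minimal Python sketch of equation (2.111) is given below; it is not from the book, and it uses the fact that every index $k \ge 1$ can be written as $k = 2^p + n$ with $0 \le n < 2^p$ (so that $H_1$ is the case $p = 0$, $n = 0$).

```python
import numpy as np

def haar(k, t):
    """Haar function H_k(t) on [0, 1), following equation (2.111)."""
    t = np.asarray(t, dtype=float)
    if k == 0:
        return np.ones_like(t)
    p = int(np.floor(np.log2(k)))          # k = 2**p + n
    n = k - 2 ** p
    lo, mid, hi = n / 2 ** p, (n + 0.5) / 2 ** p, (n + 1) / 2 ** p
    out = np.zeros_like(t)
    out[(t >= lo) & (t < mid)] = 2 ** (p / 2)      # +sqrt(2^p) on the first half of the support
    out[(t >= mid) & (t < hi)] = -2 ** (p / 2)     # -sqrt(2^p) on the second half
    return out

# the first four Haar functions sampled at t = 0, 1/4, 2/4, 3/4 (compare example 2.19)
t = np.arange(4) / 4.0
print(np.array([haar(k, t) for k in range(4)]))
```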
How are the Walsh functions defined?

They are defined in various ways, all of which can be shown to be equivalent. We use here the definition from the recursive equation
$$W_{2j+q}(t) \equiv (-1)^{\lfloor j/2\rfloor + q}\left[ W_j(2t) + (-1)^{j+q}\,W_j(2t-1)\right]\qquad(2.112)$$
where $\lfloor j/2\rfloor$ means the largest integer which is smaller than or equal to $j/2$, $q = 0$ or $1$, $j = 0, 1, 2, \dots$ and:
$$W_0(t) \equiv \begin{cases} 1 & \text{for } 0 \le t < 1\\ 0 & \text{elsewhere}\end{cases}\qquad(2.113)$$
Different definitions (eg see Box 2.4) define these functions in different orders (see Box 2.5).
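The recursion of equations (2.112)-(2.113) translates directly into code. The sketch below (not from the book) evaluates $W_k(t)$ for any $k$; the base case returns 0 outside $[0, 1)$, which is what makes the shifted terms $W_j(2t)$ and $W_j(2t-1)$ occupy the two halves of the interval.

```python
import numpy as np

def walsh(k, t):
    """Walsh function W_k(t), sequency order, via the recursion of equation (2.112)."""
    t = np.asarray(t, dtype=float)
    if k == 0:
        return np.where((t >= 0) & (t < 1), 1.0, 0.0)      # equation (2.113)
    j, q = divmod(k, 2)                                     # k = 2j + q
    sign = (-1) ** (j // 2 + q)
    return sign * (walsh(j, 2 * t) + (-1) ** (j + q) * walsh(j, 2 * t - 1))

# the four Walsh functions needed for a 4x4 transform, sampled at t = 0, 1/4, 2/4, 3/4
t = np.arange(4) / 4.0
print(np.array([walsh(k, t) for k in range(4)]))   # compare with example 2.22
```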
Box 2.4. Definition of Walsh functions in terms of the Rademacher functions

A Rademacher function of order $n$ ($n \neq 0$) is defined as:
$$R_n(t) \equiv \mathrm{sign}\left[\sin(2^n\pi t)\right] \quad \text{for } 0 \le t \le 1\qquad(2.114)$$
For $n = 0$:
$$R_0(t) \equiv 1 \quad \text{for } 0 \le t \le 1\qquad(2.115)$$
These functions look like square pulse versions of the sine function. The Walsh functions in terms of them are defined as
$$W_n(t) \equiv \prod_{i=1,\, b_i \neq 0}^{m+1} R_i(t)\qquad(2.116)$$
where $b_i$ are the digits of $n$ when expressed as a binary number:
$$n = b_{m+1}2^m + b_m 2^{m-1} + \dots + b_2 2^1 + b_1 2^0\qquad(2.117)$$
For example, the binary expression for $n$ when $n = 4$ is 100. This means that $m = 2$, $b_3 = 1$ and $b_2 = b_1 = 0$. Then:
$$W_4(t) = R_3(t) = \mathrm{sign}\left[\sin(8\pi t)\right]\qquad(2.118)$$
Figure 2.3 shows $\sin(8\pi t)$, $R_3(t)$ and $W_4(t)$.
Figure 2.3: The sine function used to define the corresponding Rademacher function, which is Walsh function $W_4(t)$.
How can we use the Haar or Walsh functions to create image bases?
We saw that a unitary matrix has its columns forming an orthonormal set of vectors (=discrete functions). We can use the discretised Walsh or Haar functions as vectors that constitute such an orthonormal set. In other words, we can create transformation matrices that are made up from Walsh or Haar functions of different orders.
How can we create the image transformation matrices from the Haar and Walsh functions in practice?

We first scale the independent variable $t$ by the size of the matrix we want to create. Then we consider only its integer values $i$. Then $H_k(i)$ can be written in a matrix form for $k = 0, 1, 2, \dots, N-1$ and $i = 0, 1, \dots, N-1$ and be used for the transformation of a discrete $N \times N$ image. We work similarly for $W_k(i)$.

Note that the Haar/Walsh functions defined this way are not orthonormal. Each has to be normalised by being multiplied with $1/\sqrt{T}$ in the continuous case, or with $1/\sqrt{N}$ in the discrete case, if $t$ takes up $N$ equally spaced discrete values.
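The sketch below (not from the book) follows this recipe for the Haar case: it samples the functions of equation (2.111) at $t = i/N$, stacks them as rows and divides by $\sqrt{N}$, then checks that the resulting matrix is orthogonal.

```python
import numpy as np

def haar(k, t):
    # Haar function of equation (2.111), for an array t in [0, 1)
    t = np.asarray(t, dtype=float)
    if k == 0:
        return np.ones_like(t)
    p = int(np.floor(np.log2(k))); n = k - 2 ** p
    lo, mid, hi = n / 2 ** p, (n + 0.5) / 2 ** p, (n + 1) / 2 ** p
    return np.where((t >= lo) & (t < mid), 2 ** (p / 2),
                    np.where((t >= mid) & (t < hi), -2 ** (p / 2), 0.0))

def haar_matrix(N):
    """N x N Haar transform matrix: sample H_k at i/N and normalise by 1/sqrt(N)."""
    i = np.arange(N) / N
    return np.array([haar(k, i) for k in range(N)]) / np.sqrt(N)

H = haar_matrix(4)
print(np.round(H, 4))            # compare with equation (2.119)
print(np.round(H @ H.T, 10))     # the identity matrix: the rows are orthonormal
```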
Example 2.19

Derive the matrix which may be used to calculate the Haar transform of a 4 x 4 image.

First, by using equation (2.111), we shall calculate and plot the Haar functions of the continuous variable $t$ which are needed for the calculation of the transformation matrix.
$$H(0, t) = 1 \quad \text{for } 0 \le t < 1$$
$$H(1, t) = \begin{cases} 1 & \text{for } 0 \le t < \tfrac{1}{2}\\ -1 & \text{for } \tfrac{1}{2} \le t < 1\end{cases}$$
In the definition of the Haar functions, when $p = 1$, $n$ takes the values 0 and 1.

Case $p = 1$, $n = 0$:
$$H(2, t) = \begin{cases} \sqrt{2} & \text{for } 0 \le t < \tfrac{1}{4}\\ -\sqrt{2} & \text{for } \tfrac{1}{4} \le t < \tfrac{1}{2}\\ 0 & \text{for } \tfrac{1}{2} \le t < 1\end{cases}$$
Case $p = 1$, $n = 1$:
$$H(3, t) = \begin{cases} 0 & \text{for } 0 \le t < \tfrac{1}{2}\\ \sqrt{2} & \text{for } \tfrac{1}{2} \le t < \tfrac{3}{4}\\ -\sqrt{2} & \text{for } \tfrac{3}{4} \le t < 1\end{cases}$$
To transform a 4 x 4 image we need a 4 x 4 matrix. If we scale the $t$ axis by multiplying it with 4 and take only the integer values of $t$ (ie $t = 0, 1, 2, 3$), we can construct the transformation matrix. (The plots of the scaled functions $H(0,t),\dots,H(3,t)$ are simply the above functions stretched over the interval $[0, 4)$.)

The entries of the transformation matrix are the values of $H(s, t)$ where $s$ and $t$ take values 0, 1, 2, 3. Obviously then, the transformation matrix is:
$$H = \frac{1}{2}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1\\
\sqrt{2} & -\sqrt{2} & 0 & 0\\
0 & 0 & \sqrt{2} & -\sqrt{2}
\end{pmatrix}\qquad(2.119)$$
Factor $\frac{1}{2}$ is introduced to normalise matrix $H$ so that $HH^T = I$, the unit matrix.
Example 2.20

Calculate the Haar transform of image:
$$g = \begin{pmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
0 & 1 & 1 & 0
\end{pmatrix}\qquad(2.120)$$
The Haar transform of image $g$ is $A = HgH^T$. We use matrix $H$ derived in example 2.19:
$$A = \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1\\
\sqrt{2} & -\sqrt{2} & 0 & 0\\
0 & 0 & \sqrt{2} & -\sqrt{2}
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
0 & 1 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 1 & \sqrt{2} & 0\\
1 & 1 & -\sqrt{2} & 0\\
1 & -1 & 0 & \sqrt{2}\\
1 & -1 & 0 & -\sqrt{2}
\end{pmatrix}$$
$$= \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1\\
\sqrt{2} & -\sqrt{2} & 0 & 0\\
0 & 0 & \sqrt{2} & -\sqrt{2}
\end{pmatrix}
\begin{pmatrix}
2 & 0 & -\sqrt{2} & \sqrt{2}\\
2 & 0 & \sqrt{2} & -\sqrt{2}\\
2 & 0 & \sqrt{2} & -\sqrt{2}\\
2 & 0 & -\sqrt{2} & \sqrt{2}
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
8 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -4 & 4\\
0 & 0 & 4 & -4
\end{pmatrix}
= \begin{pmatrix}
2 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -1 & 1\\
0 & 0 & 1 & -1
\end{pmatrix}\qquad(2.121)$$
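A quick numerical check of this example (a sketch, not from the book) can be made with the matrices typed in directly:

```python
import numpy as np

s = np.sqrt(2.0)
H = 0.5 * np.array([[1, 1, 1, 1],        # the 4x4 Haar matrix of equation (2.119)
                    [1, 1, -1, -1],
                    [s, -s, 0, 0],
                    [0, 0, s, -s]])
g = np.array([[0, 1, 1, 0],              # the image of equation (2.120)
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A = H @ g @ H.T                          # forward Haar transform A = H g H^T
print(np.round(A, 6))                    # compare with equation (2.121)
print(np.round(H.T @ A @ H, 6))          # inverse transform recovers g exactly
```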
Example 2.21

Reconstruct the image of example 2.20 using an approximation of its Haar transform by setting its bottom right element equal to 0.

The approximate transformation matrix becomes:
$$A = \begin{pmatrix}
2 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -1 & 1\\
0 & 0 & 1 & 0
\end{pmatrix}\qquad(2.122)$$
The reconstructed image is given by $g = H^T A H$:
$$g = \frac{1}{4}\begin{pmatrix}
1 & 1 & \sqrt{2} & 0\\
1 & 1 & -\sqrt{2} & 0\\
1 & -1 & 0 & \sqrt{2}\\
1 & -1 & 0 & -\sqrt{2}
\end{pmatrix}
\begin{pmatrix}
2 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -1 & 1\\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1\\
\sqrt{2} & -\sqrt{2} & 0 & 0\\
0 & 0 & \sqrt{2} & -\sqrt{2}
\end{pmatrix}$$
$$= \frac{1}{4}\begin{pmatrix}
1 & 1 & \sqrt{2} & 0\\
1 & 1 & -\sqrt{2} & 0\\
1 & -1 & 0 & \sqrt{2}\\
1 & -1 & 0 & -\sqrt{2}
\end{pmatrix}
\begin{pmatrix}
2 & 2 & 2 & 2\\
0 & 0 & 0 & 0\\
-\sqrt{2} & \sqrt{2} & \sqrt{2} & -\sqrt{2}\\
\sqrt{2} & -\sqrt{2} & 0 & 0
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
0 & 4 & 4 & 0\\
4 & 0 & 0 & 4\\
4 & 0 & 2 & 2\\
0 & 4 & 2 & 2
\end{pmatrix}
= \begin{pmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0.5 & 0.5\\
0 & 1 & 0.5 & 0.5
\end{pmatrix}\qquad(2.123)$$
The square error is equal to:
$$0.5^2 + 0.5^2 + 0.5^2 + 0.5^2 = 1\qquad(2.124)$$
which agrees with the square of the single omitted coefficient.
Note that the error is localised in the bottom-right corner of the reconstructed image.
What do the elementary images of the Haar transform look like?
Figure 2.4 shows the basis images for the expansion of an 8 x 8 image in terms of the Haar functions. Each of these images has been produced by taking the outer product of a discretised Haar function either with itself or with another one. The numbers along the left and on the top indicate the order of the function used along each row or column, respectively. The discrete values of each image have been scaled in the range [0, 255] for displaying purposes.
Figure 2.4: Haar transform basis images. In each image, grey means 0, black means a negative and white means a positive number. Note that each image has been scaled separately: black and white indicate different numbers from one image to the next. The vectors used to construct these images were constructed in the same way as the vectors in example 2.19, the only difference being that the $t$ axis was scaled by multiplication with 8 instead of 4, and functions up to $H(7, t)$ had to be defined. Each image here is the outer product of two such vectors. For example, the image in row 4 and column 5 is the outer product of 8 x 1 vectors $H(4, t)H(5, t)^T$, for $t$ in the range [0, 8) sampled at values 0, 1, 2, ..., 7.
Example 2.22

Derive the matrix which can be used to calculate the Walsh transform of a 4 x 4 image.

First, by using equation (2.112), we calculate and plot the Walsh functions of the continuous variable $t$ which are needed for the calculation of the transformation matrix.
$$W(0, t) = \begin{cases} 1 & \text{for } 0 \le t < 1\\ 0 & \text{elsewhere}\end{cases}$$
Case $j = 0$, $q = 1$, $\lfloor j/2\rfloor = 0$.
$$W(1, t) = -\left[ W(0, 2t) - W\!\left(0, 2\left(t - \tfrac{1}{2}\right)\right)\right]\qquad(2.125)$$
We must compute the values of $W(0, 2t)$ and $W\!\left(0, 2(t - \tfrac{1}{2})\right)$. We have to use the definition of $W(0, t)$ and examine the range of values of the expression that appears instead of $t$ in the above two functions. For example, for $2t$ to be in the range $[0, 1]$, so that $W(0, 2t) \neq 0$, $t$ must be in the range $[0, 1/2]$. So, we have to consider conveniently chosen ranges of $t$.

For $0 \le t < \tfrac{1}{2}$:
$$0 \le 2t < 1 \Rightarrow W(0, 2t) = 1\qquad(2.126)$$
$$-\tfrac{1}{2} \le t - \tfrac{1}{2} < 0 \Rightarrow -1 \le 2\left(t - \tfrac{1}{2}\right) < 0 \Rightarrow W\!\left(0, 2\left(t - \tfrac{1}{2}\right)\right) = 0\qquad(2.127)$$
Therefore:
$$W(1, t) = -1 \quad \text{for } 0 \le t < \tfrac{1}{2}\qquad(2.128)$$
For $\tfrac{1}{2} \le t < 1$:
$$1 \le 2t < 2 \Rightarrow W(0, 2t) = 0\qquad(2.129)$$
$$0 \le t - \tfrac{1}{2} < \tfrac{1}{2} \Rightarrow 0 \le 2\left(t - \tfrac{1}{2}\right) < 1 \Rightarrow W\!\left(0, 2\left(t - \tfrac{1}{2}\right)\right) = 1\qquad(2.130)$$
Therefore:
$$W(1, t) = -(-1) = 1 \quad \text{for } \tfrac{1}{2} \le t < 1\qquad(2.131)$$
$$W(1, t) = \begin{cases} -1 & \text{for } 0 \le t < \tfrac{1}{2}\\ 1 & \text{for } \tfrac{1}{2} \le t < 1\end{cases}$$
Case $j = 1$, $q = 0$, $\lfloor j/2\rfloor = 0$.
$$W(2, t) = W(1, 2t) - W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right)\qquad(2.132)$$
For $0 \le t < \tfrac{1}{4}$:
$$0 \le 2t < \tfrac{1}{2} \Rightarrow W(1, 2t) = -1\qquad(2.133)$$
$$-\tfrac{1}{2} \le t - \tfrac{1}{2} < -\tfrac{1}{4} \Rightarrow -1 \le 2\left(t - \tfrac{1}{2}\right) < -\tfrac{1}{2} \Rightarrow W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 0\qquad(2.134)$$
Therefore:
$$W(2, t) = -1 \quad \text{for } 0 \le t < \tfrac{1}{4}\qquad(2.135)$$
For $\tfrac{1}{4} \le t < \tfrac{1}{2}$:
$$\tfrac{1}{2} \le 2t < 1 \Rightarrow W(1, 2t) = 1\qquad(2.136)$$
$$-\tfrac{1}{4} \le t - \tfrac{1}{2} < 0 \Rightarrow -\tfrac{1}{2} \le 2\left(t - \tfrac{1}{2}\right) < 0 \Rightarrow W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 0\qquad(2.137)$$
Therefore:
$$W(2, t) = 1 \quad \text{for } \tfrac{1}{4} \le t < \tfrac{1}{2}\qquad(2.138)$$
For $\tfrac{1}{2} \le t < \tfrac{3}{4}$:
$$1 \le 2t < \tfrac{3}{2} \Rightarrow W(1, 2t) = 0\qquad(2.139)$$
$$0 \le t - \tfrac{1}{2} < \tfrac{1}{4} \Rightarrow 0 \le 2\left(t - \tfrac{1}{2}\right) < \tfrac{1}{2} \Rightarrow W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = -1\qquad(2.140)$$
Therefore:
$$W(2, t) = 1 \quad \text{for } \tfrac{1}{2} \le t < \tfrac{3}{4}\qquad(2.141)$$
For $\tfrac{3}{4} \le t < 1$:
$$\tfrac{3}{2} \le 2t < 2 \Rightarrow W(1, 2t) = 0\qquad(2.142)$$
$$\tfrac{1}{4} \le t - \tfrac{1}{2} < \tfrac{1}{2} \Rightarrow \tfrac{1}{2} \le 2\left(t - \tfrac{1}{2}\right) < 1 \Rightarrow W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 1\qquad(2.143)$$
Therefore:
$$W(2, t) = -1 \quad \text{for } \tfrac{3}{4} \le t < 1\qquad(2.144)$$
$$W(2, t) = \begin{cases} -1 & \text{for } 0 \le t < \tfrac{1}{4}\\ 1 & \text{for } \tfrac{1}{4} \le t < \tfrac{3}{4}\\ -1 & \text{for } \tfrac{3}{4} \le t < 1\end{cases}$$
Case $j = 1$, $q = 1$, $\lfloor j/2\rfloor = 0$.
$$W(3, t) = -\left[ W(1, 2t) + W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right)\right]\qquad(2.145)$$
For $0 \le t < \tfrac{1}{4}$:
$$W(1, 2t) = -1, \quad W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 0\qquad(2.146)$$
Therefore:
$$W(3, t) = 1 \quad \text{for } 0 \le t < \tfrac{1}{4}\qquad(2.147)$$
For $\tfrac{1}{4} \le t < \tfrac{1}{2}$:
$$W(1, 2t) = 1, \quad W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 0\qquad(2.148)$$
Therefore:
$$W(3, t) = -1 \quad \text{for } \tfrac{1}{4} \le t < \tfrac{1}{2}\qquad(2.149)$$
For $\tfrac{1}{2} \le t < \tfrac{3}{4}$:
$$W(1, 2t) = 0, \quad W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = -1\qquad(2.150)$$
Therefore:
$$W(3, t) = 1 \quad \text{for } \tfrac{1}{2} \le t < \tfrac{3}{4}\qquad(2.151)$$
For $\tfrac{3}{4} \le t < 1$:
$$W(1, 2t) = 0, \quad W\!\left(1, 2\left(t - \tfrac{1}{2}\right)\right) = 1\qquad(2.152)$$
Therefore:
$$W(3, t) = -1 \quad \text{for } \tfrac{3}{4} \le t < 1\qquad(2.153)$$
$$W(3, t) = \begin{cases} 1 & \text{for } 0 \le t < \tfrac{1}{4}\\ -1 & \text{for } \tfrac{1}{4} \le t < \tfrac{1}{2}\\ 1 & \text{for } \tfrac{1}{2} \le t < \tfrac{3}{4}\\ -1 & \text{for } \tfrac{3}{4} \le t < 1\end{cases}$$
To create a 4 x 4 matrix, we multiply $t$ with 4 and consider only its integer values ie 0, 1, 2, 3. The first row of the matrix will be formed from $W(0, t)$. The second from $W(1, t)$, the third from $W(2, t)$ and so on:
$$W = \frac{1}{2}\begin{pmatrix}
1 & 1 & 1 & 1\\
-1 & -1 & 1 & 1\\
-1 & 1 & 1 & -1\\
1 & -1 & 1 & -1
\end{pmatrix}\qquad(2.154)$$
This matrix has been normalised by multiplying it with $\frac{1}{2}$ so that $W^T W = I$, where $I$ is the unit matrix.
Example 2.23

Calculate the Walsh transform of image:
$$g = \begin{pmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
0 & 1 & 1 & 0
\end{pmatrix}\qquad(2.155)$$
In the general formula of a separable linear transform with real matrices $U$ and $V$, $A = UgV^T$, use $U = V = W$ as derived in example 2.22:
$$A = \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
-1 & -1 & 1 & 1\\
-1 & 1 & 1 & -1\\
1 & -1 & 1 & -1
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
0 & 1 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
1 & -1 & -1 & 1\\
1 & -1 & 1 & -1\\
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1
\end{pmatrix}$$
$$= \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
-1 & -1 & 1 & 1\\
-1 & 1 & 1 & -1\\
1 & -1 & 1 & -1
\end{pmatrix}
\begin{pmatrix}
2 & 0 & 2 & 0\\
2 & 0 & -2 & 0\\
2 & 0 & -2 & 0\\
2 & 0 & 2 & 0
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
8 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -8 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}
= \begin{pmatrix}
2 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & -2 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.156)$$
Can we define an orthogonal matrix with entries only +1 or -1?

Yes. These are the Hadamard matrices named after the mathematician who studied them in 1893. For a general size, these matrices have been shown to exist only for sizes up to 200 x 200. Beyond this size, the Hadamard matrices are defined only for sizes that are powers of 2, using a recursive algorithm, as follows:
$$H_1 = \begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix} \quad \text{and} \quad H_{2N} = \begin{pmatrix} H_N & H_N\\ H_N & -H_N\end{pmatrix}\qquad(2.157)$$
The rows of such matrices can be shown to be discretised Walsh functions. So the Walsh functions may be calculated from these matrices for $N = 2^n$, for $n$ a positive integer.
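The recursion of equation (2.157) is a two-line function in code. In the sketch below (not from the book) the matrices are indexed by their size rather than by the book's subscript, so hadamard(2) corresponds to $H_1$ of equation (2.157).

```python
import numpy as np

def hadamard(N):
    """Hadamard matrix of size N (N a power of 2), built with the recursion (2.157)."""
    if N == 1:
        return np.array([[1]])
    H = hadamard(N // 2)
    return np.block([[H, H], [H, -H]])

H8 = hadamard(8)
print(H8)
print(np.allclose(H8 @ H8.T, 8 * np.eye(8)))   # rows are orthogonal, each of squared norm 8
```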
Box 2.5. Ways of ordering the Walsh functions

Equation (2.112) defines the Walsh functions in what is called sequency order, or Walsh order or Walsh-Kaczmarz order.

Definition of the Walsh functions in terms of the Rademacher functions (see equation (2.116)) results in the Walsh functions being in natural or normal or binary or dyadic or Paley order. Let us denote these functions by $\tilde{W}$. Note that as all Rademacher functions start with a positive sign, no matter how many of them we multiply to create a Walsh function, the Walsh function created will always start with a positive value. So, some of the Walsh functions created that way will be equal to the negative of the corresponding Walsh function created by the difference equation (2.112).

The Walsh functions generated from the Hadamard matrices are said to be in Kronecker or lexicographic ordering. Let us denote these functions by $\bar{W}$. All these functions also start from a positive value, so again, some of them will be equal to the negative of a Walsh function created by the difference equation (2.112). Because of that, we say that the Walsh functions created from the Rademacher functions and the Hadamard matrices have positive phase.

n | Binary | Gray | Nat. order of seq. n | Bit-rev. | Nat. order of lex. n | Relationship sequency-natural | Relationship lexicographic-natural
0 | 000 | 000 | 0 | 000 | 0 | W_0(t) = W~_0(t) | W^-_0(t) = W~_0(t)
1 | 001 | 001 | 1 | 100 | 4 | W_1(t) = W~_1(t) | W^-_1(t) = W~_4(t)
2 | 010 | 011 | 3 | 010 | 2 | W_2(t) = W~_3(t) | W^-_2(t) = W~_2(t)
3 | 011 | 010 | 2 | 110 | 6 | W_3(t) = W~_2(t) | W^-_3(t) = W~_6(t)
4 | 100 | 110 | 6 | 001 | 1 | W_4(t) = W~_6(t) | W^-_4(t) = W~_1(t)
5 | 101 | 111 | 7 | 101 | 5 | W_5(t) = W~_7(t) | W^-_5(t) = W~_5(t)
6 | 110 | 101 | 5 | 011 | 3 | W_6(t) = W~_5(t) | W^-_6(t) = W~_3(t)
7 | 111 | 100 | 4 | 111 | 7 | W_7(t) = W~_4(t) | W^-_7(t) = W~_7(t)

Table 2.1: n is either the sequency or the lexicographic order. Functions W are computed from equation (2.112); functions W~ are computed from equation (2.116), and functions W^- come from the rows of the Hadamard matrices. For example, W^-_3(t) = W~_6(t); here W^-_3(t) is the 4th row of the Hadamard matrix of size 8 x 8. Table 2.1 lists the corresponding order of the first 8 Walsh functions.
Example B2.24

Compute the Walsh function with natural order n = 7.

We write $n$ in binary code: $7 = 111$. In order to create the function with this natural order, we shall use equation (2.116), on page 75, with $m = 2$, $b_1 = b_2 = b_3 = 1$. Then:
$$\tilde{W}_7(t) = R_1(t)R_2(t)R_3(t)\qquad(2.158)$$
Figure 2.5 shows the plots of the three Rademacher functions we have to multiply and the resultant Walsh function created this way. The formula for $\tilde{W}_7(t)$ is:
$$\tilde{W}_7(t) = \begin{cases}
1 & \text{for } 0 \le t < \tfrac{1}{8}\\
-1 & \text{for } \tfrac{1}{8} \le t < \tfrac{3}{8}\\
1 & \text{for } \tfrac{3}{8} \le t < \tfrac{4}{8}\\
-1 & \text{for } \tfrac{4}{8} \le t < \tfrac{5}{8}\\
1 & \text{for } \tfrac{5}{8} \le t < \tfrac{7}{8}\\
-1 & \text{for } \tfrac{7}{8} \le t < 1
\end{cases}\qquad(2.159)$$
Figure 2.5: The top three functions are multiplied to produce function $\tilde{W}_7(t)$ at the bottom.
What do the basis images of the Hadamard/Walsh transform look like?
Figure 2.6 shows the basis images for the expansion of an 8 x 8 image in terms of Walsh functions in the order they are produced by applying equation (2.113), on page 74. The basis images were produced by taking the vector outer product of all possible pairs of the discretised 8-samples long Walsh functions.
Figure 2.6: Hadamard/Walsh transform basis images. The vectors used to construct these images were constructed in the same way as the vectors in example 2.22, the only difference being that the $t$ axis was scaled by multiplication with 8 instead of 4, and functions up to $W(7, t)$ had to be defined. Each image here is the outer product of two such vectors. For example, the image in row 4 and column 5 is the outer product of 8 x 1 vectors $W(4, t)W(5, t)^T$, for $t$ in the range [0, 8) sampled at values 0, 1, 2, ..., 7.
Example 2.25

Show the different stages of the Haar transform of the image of example 2.15, on page 69.

We can perform the reconstruction by keeping only basis images made up from one, up to eight Haar functions. Each such reconstruction will be an improved approximation of the original image over the previous approximation. The series of images we obtain by these reconstructions are shown in figure 2.7. For example, figure 2.7b is the reconstructed image when only the coefficients that multiply the four basis images at the top left corner of figure 2.4 are retained. These four basis images are created from the first two Haar functions, H(0, t) and H(1, t). Image 2.7g is reconstructed when all the coefficients that multiply the basis images along the bottom row and the right column in figure 2.4 are set to 0. In other words, the basis images used for this reconstruction were created from the first seven Haar functions, ie H(0, t), H(1, t), ..., H(6, t).

Figure 2.7: Reconstructed images when the basis images used are those created from the first one, two, three, ..., eight Haar functions, from top left to bottom right, respectively.
The sum of the square errors for each reconstructed image is as follows:
Square error for image 2.7a: 366394
Square error for image 2.7b: 356192
Square error for image 2.7c: 291740
Square error for image 2.7d: 222550
Square error for image 2.7e: 192518
Square error for image 2.7f: 174625
Square error for image 2.7g: 141100
Square error for image 2.7h: 0
Example 2.26

Show the different stages of the Walsh/Hadamard transform of the image of example 2.15, on page 69.

We can perform the reconstruction by keeping only basis images made up from one, up to eight Walsh functions. Each such reconstruction will be an improved approximation of the original image over the previous approximation. The series of images we obtain by these reconstructions are shown in figure 2.8. For example, figure 2.8f has been reconstructed from the inverse Walsh/Hadamard transform, by setting to 0 all elements of the transformation matrix that multiply the basis images in the bottom two rows and the two rightmost columns in figure 2.6. These omitted basis images are those that are created from functions W(6, t) and W(7, t).

Figure 2.8: Reconstructed images when the basis images used are those created from the first one, two, three, ..., eight Walsh functions, from top left to bottom right, respectively.
The sum of the square errors for each reconstructed image is as follows.
Square error for image 2.8a: 366394
Square error for image 2.8b: 356190
Square error for image 2.8c: 262206
Square error for image 2.8d: 222550
Square error for image 2.8e: 148029
Square error for image 2.8f: 92078
Square error for image 2.8g: 55905
Square error for image 2.8h: 0
Figure 2.9: The flower image approximated with the same number of terms using (a) the Haar transform and (b) the Walsh transform. Note how the reconstruction with Haar has localised error, at the positions where the expansion coefficients have been set to 0, while the Walsh reconstruction distributes the error over the whole image.
What are the advantages and disadvantages of the Walsh and the Haar transforms?

From figure 2.4, on page 80, notice that the higher order Haar basis images use the same basic pattern that scans the whole image, as if every basis image attempts to capture more accurately the local characteristics of the image focusing every time at one place only. For example, all the 16 basis images in the bottom right quadrant of figure 2.4 use a window of 2 x 2 pixels to reproduce detail in various parts of the image. If we are not interested in that level of detail, we can set the corresponding 16 coefficients of the transform to zero. Alternatively, if, for example, we are not interested in the details only on the right part of the image, we may set to 0 all coefficients that multiply the basis images of the last column of figure 2.4. In other words, the Haar basis functions allow us to reconstruct with different levels of detail different parts of an image.

In contrast, higher order Walsh basis images try to approximate the image as a whole, with uniformly distributed detail structure. This is because Walsh functions cannot take the 0 value. Notice how this difference between the two bases is reflected in the reconstructed images: both images 2.7g (repeated here in figure 2.9a) and 2.8g (repeated here in figure 2.9b) have been reconstructed by retaining the same number of basis images. In figure 2.9a the flower has been almost fully reconstructed apart from some details on the right and at the bottom, because the omitted basis images were those that would describe the image in those locations, and the image happened to have significant detail there. That is why the reconstruction error in this case is higher for the Haar than the Walsh case. Notice that the error in the Walsh reconstruction is uniformly distributed over the whole image.

Walsh transforms have the advantage over Haar transforms that the Walsh functions take up only two values, namely +1 or -1, and thus they are easily implemented in a computer as their values correspond to binary logic.
Figure 2.10: The empty panels shown correspond to the basis images shown in figure 2.4. The thick lines divide them into sets of elementary images of the same resolution. Letters L and H are used to indicate low and high resolution, respectively. The number next to letter H indicates which level of high resolution. The pairs of letters used indicate which resolution we have along the vertical and horizontal axis. For example, pair L-H2 indicates that the corresponding panels have low resolution along the vertical axis, but high second order resolution along the horizontal axis.
What is the Haar wavelet?
The property of the Haar basis functions to concentrate at one part of the image at a time is a characteristic property of a more general class of functions called wavelets. The Haar wavelets are all scaled and translated versions of the same function. For an 8 x 8 image they are shown in the 16 bottom right panels of figure 2.4 for the finest scale of resolution. The function represented by the top left panel of figure 2.4, ie the average flat image, is called the scaling function. The basis images represented by the other panels in the first column and the top row of figure 2.4 are produced from combinations of the scaling function and the wavelet. The rest of the panels correspond to intermediate scales and they may be grouped in sets of the same resolution panels, that cover the full image (figure 2.10). All panels together constitute a complete basis in terms of which any 8 x 8 image may be expanded.
2.3 Discrete Fourier Transform

What is the discrete version of the Fourier transform (DFT)?

The 1D discrete Fourier transform (DFT) of a function $f(k)$, defined at discrete points $k = 0, 1, \dots, N-1$, is defined as:
$$F(m) \equiv \frac{1}{N}\sum_{k=0}^{N-1} f(k)\exp\left(-j\frac{2\pi mk}{N}\right)\qquad(2.160)$$
The 2D discrete Fourier transform for an $N \times N$ image is defined as:
$$\hat{g}_{mn} \equiv \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g_{kl}\,e^{-j2\pi\frac{km+nl}{N}}\qquad(2.161)$$
Note that in all previous sections the index of the elements of a signal or an image was taking values starting from 1. If we had retained that convention here, in the exponent of the exponential function in (2.160), instead of having $k$ we should have had $(k-1)$. For the sake of simplicity, in this section, we assume that for an $N$-sample long signal, the indices start from 0 and go up to $N-1$, instead of starting from 1 and going up to $N$.

Unlike the other transforms that were developed directly in the discrete domain, this transform was initially developed in the continuous domain. To preserve this historical consistency, we shall go back into using function arguments rather than indices. Further, because we shall have to associate Fourier transforms of different functions, we shall use the convention of the Fourier transform being denoted by the same letter as the function, but with a hat on the top. Different numbers of hats will be used to distinguish the Fourier transforms that refer to different versions of the same function. The reason for this will become clear when the case arises. So, for the time being, we define the Fourier transform of an $M \times N$ digital image as follows:
$$\hat{g}(m, n) \equiv \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\,e^{-j2\pi\left[\frac{km}{M}+\frac{ln}{N}\right]}\qquad(2.162)$$
We must think of this formula as a slot machine: when we slot in a function, out pops its DFT:
$$\underbrace{\dots}_{\text{DFT}} = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} \underbrace{\dots}_{\text{function}}\;e^{-j2\pi\left[\frac{km}{M}+\frac{ln}{N}\right]}\qquad(2.163)$$
This way of thinking will be very useful when we try to prove the various properties of the DFT.
Example B2.27

For S and t integers, show that:
$$\sum_{m=0}^{S-1} e^{j2\pi t\frac{m}{S}} = S\,\delta(t)\qquad(2.164)$$
This is a geometric progression with $S$ elements, first term 1 ($m = 0$) and ratio $q \equiv e^{j2\pi\frac{t}{S}}$. The sum of the first $S$ terms of such a geometric progression is given by:
$$\sum_{m=0}^{S-1} q^m = \frac{q^S - 1}{q - 1} \quad \text{for } q \neq 1\qquad(2.165)$$
For $q \neq 1$, ie for $e^{j2\pi\frac{t}{S}} \neq 1$, ie for $t \neq 0$, sum (2.164) is, therefore, equal to:
$$\sum_{m=0}^{S-1} e^{j2\pi t\frac{m}{S}} = \frac{e^{j2\pi t} - 1}{e^{j2\pi\frac{t}{S}} - 1} = \frac{\cos(2\pi t) + j\sin(2\pi t) - 1}{e^{j2\pi\frac{t}{S}} - 1} = \frac{1 + j0 - 1}{e^{j2\pi\frac{t}{S}} - 1} = 0\qquad(2.166)$$
If, however, $t = 0$, all terms in (2.164) are equal to 1 and we have $\sum_{m=0}^{S-1} 1 = S$. So
$$\sum_{m=0}^{S-1} e^{j2\pi t\frac{m}{S}} = \begin{cases} S & \text{if } t = 0\\ 0 & \text{if } t \neq 0\end{cases}\qquad(2.167)$$
and (2.164) follows.
Box 2.6. What is the inverse discrete Fourier transform?

To solve equation (2.162) for $g(k, l)$, we multiply both sides with $e^{j2\pi\left[\frac{qm}{M}+\frac{pn}{N}\right]}$ and sum over all $m$ and $n$ from 0 to $M-1$ and $N-1$, respectively. We get:
$$\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\hat{g}(m, n)e^{j2\pi\left[\frac{qm}{M}+\frac{pn}{N}\right]}
= \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} g(k, l)e^{j2\pi\left[\frac{m(q-k)}{M}+\frac{n(p-l)}{N}\right]}$$
$$= \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\sum_{m=0}^{M-1}e^{j2\pi m\frac{q-k}{M}}\sum_{n=0}^{N-1}e^{j2\pi n\frac{p-l}{N}}\qquad(2.168)$$
Applying formula (2.164) once for $t \equiv q - k$ and once for $t \equiv p - l$ and substituting into equation (2.168), we deduce that the right-hand side of (2.168) is
$$\frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\,M\delta(q - k)\,N\delta(p - l)\qquad(2.169)$$
where $\delta(a - b)$ is 0 unless $a = b$. Therefore, the above expression is $g(q, p)$, ie:
$$g(q, p) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\hat{g}(m, n)e^{j2\pi\left[\frac{qm}{M}+\frac{pn}{N}\right]}\qquad(2.170)$$
This is the inverse 2D discrete Fourier transform.
How can we write the discrete Fourier transform in a matrix form?
We construct matrix $U$ with elements
$$U_{x\alpha} = \frac{1}{N}\exp\left(-j\frac{2\pi x\alpha}{N}\right)\qquad(2.171)$$
where $x$ takes values $0, 1, \dots, N-1$ along each column and $\alpha$ takes the same values along each row. Notice that $U$ is symmetric, ie $U^T = U$. Then, according to equation (2.3), on page 48, the 2D discrete Fourier transform of an image $g$ is given by:
$$\hat{g} = UgU\qquad(2.172)$$
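The matrix form (2.172) is easy to try out numerically. The sketch below (not from the book) builds $U$ from equation (2.171) and applies it to the image of equation (2.177); the last line checks the result against NumPy's FFT, rescaled to this book's $1/N^2$ convention.

```python
import numpy as np

def dft_matrix(N):
    """Matrix U of equation (2.171): U[x, a] = (1/N) exp(-2j*pi*x*a/N)."""
    x = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(x, x) / N) / N

U = dft_matrix(4)
g = np.array([[0, 0, 1, 0]] * 4, dtype=float)    # the image of equation (2.177)
G = U @ g @ U                                     # 2D DFT via equation (2.172)
print(np.round(G, 4))
print(np.round(np.fft.fft2(g) / 16, 4))           # same values, using the FFT rescaled by 1/N^2
```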
Example 2.28

Derive the matrix with which the discrete Fourier transform of a 4 x 4 image may be obtained.

Apply formula (2.171) with $N = 4$, $0 \le x \le 3$, $0 \le \alpha \le 3$:
$$U = \frac{1}{4}\begin{pmatrix}
e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}0}\\
e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}1} & e^{-j\frac{2\pi}{4}2} & e^{-j\frac{2\pi}{4}3}\\
e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}2} & e^{-j\frac{2\pi}{4}4} & e^{-j\frac{2\pi}{4}6}\\
e^{-j\frac{2\pi}{4}0} & e^{-j\frac{2\pi}{4}3} & e^{-j\frac{2\pi}{4}6} & e^{-j\frac{2\pi}{4}9}
\end{pmatrix}\qquad(2.173)$$
Or:
$$U = \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & e^{-j\frac{\pi}{2}} & e^{-j\pi} & e^{-j\frac{3\pi}{2}}\\
1 & e^{-j\pi} & e^{-j2\pi} & e^{-j3\pi}\\
1 & e^{-j\frac{3\pi}{2}} & e^{-j3\pi} & e^{-j\frac{9\pi}{2}}
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & e^{-j\frac{\pi}{2}} & e^{-j\pi} & e^{-j\frac{3\pi}{2}}\\
1 & e^{-j\pi} & 1 & e^{-j\pi}\\
1 & e^{-j\frac{3\pi}{2}} & e^{-j\pi} & e^{-j\frac{\pi}{2}}
\end{pmatrix}\qquad(2.174)$$
Recall that:
$$e^{-j\frac{\pi}{2}} = \cos\frac{\pi}{2} - j\sin\frac{\pi}{2} = -j, \qquad
e^{-j\pi} = \cos\pi - j\sin\pi = -1, \qquad
e^{-j\frac{3\pi}{2}} = \cos\frac{3\pi}{2} - j\sin\frac{3\pi}{2} = j\qquad(2.175)$$
Therefore:
$$U = \frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}\qquad(2.176)$$
Example 2.29

Use matrix U of example 2.28 to compute the discrete Fourier transform of the following image:
$$g = \begin{pmatrix}
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0
\end{pmatrix}\qquad(2.177)$$
Calculate first $gU$:
$$\begin{pmatrix}
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0
\end{pmatrix}
\frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1
\end{pmatrix}\qquad(2.178)$$
Multiply the result with $U$ from the left to get $UgU = \hat{g}$ (the discrete Fourier transform of $g$):
$$\frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
\frac{1}{4}\begin{pmatrix}
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1\\
1 & -1 & 1 & -1
\end{pmatrix}
= \frac{1}{16}\begin{pmatrix}
4 & -4 & 4 & -4\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{4} & -\frac{1}{4} & \frac{1}{4} & -\frac{1}{4}\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.179)$$
Example 2.30

Using the definition of DFT by formula (2.162), verify that the DFT values of image (2.177) for m = 1 and n = 0, and for m = 0 and n = 1 are as worked out in example 2.29.

Applying (2.162) for $M = N = 4$, we obtain:
$$\hat{g}(m, n) = \frac{1}{16}\sum_{k=0}^{3}\sum_{l=0}^{3} g(k, l)e^{-j2\pi\frac{km+nl}{4}}\qquad(2.180)$$
For image (2.177) we have $g(0, 2) = g(1, 2) = g(2, 2) = g(3, 2) = 1$ and all other $g(k, l)$ values are 0. Then:
$$\hat{g}(m, n) = \frac{1}{16}\left[e^{-j2\pi\frac{2n}{4}} + e^{-j2\pi\frac{m+2n}{4}} + e^{-j2\pi\frac{2m+2n}{4}} + e^{-j2\pi\frac{3m+2n}{4}}\right]\qquad(2.181)$$
For $m = 1$ and $n = 0$ we obtain:
$$\hat{g}(1, 0) = \frac{1}{16}\left[e^{0} + e^{-j2\pi\frac{1}{4}} + e^{-j2\pi\frac{2}{4}} + e^{-j2\pi\frac{3}{4}}\right]
= \frac{1}{16}\left[1 + \cos\frac{\pi}{2} - j\sin\frac{\pi}{2} + \cos\pi - j\sin\pi + \cos\frac{3\pi}{2} - j\sin\frac{3\pi}{2}\right]
= \frac{1}{16}\left[1 - j - 1 + j\right] = 0\qquad(2.182)$$
For $m = 0$ and $n = 1$ we obtain:
$$\hat{g}(0, 1) = \frac{1}{16}\left[e^{-j2\pi\frac{2}{4}} + e^{-j2\pi\frac{2}{4}} + e^{-j2\pi\frac{2}{4}} + e^{-j2\pi\frac{2}{4}}\right]
= \frac{1}{16}\left[4(\cos\pi - j\sin\pi)\right] = -\frac{1}{4}\qquad(2.183)$$
We note that both values deduced agree with the $\hat{g}(1, 0)$ and $\hat{g}(0, 1)$ we worked out using the matrix multiplication approach in example 2.29.
Is matrix U used for DFT unitary?
We must show that any row of this matrix is orthogonal to the complex conjugate of any other row(1). Using definition (2.171), the product of rows corresponding to $x = x_1$ and $x = x_2$ is given by
$$\frac{1}{N^2}\sum_{\alpha=0}^{N-1} e^{-j\frac{2\pi x_1\alpha}{N}}e^{j\frac{2\pi x_2\alpha}{N}}
= \frac{1}{N^2}\sum_{\alpha=0}^{N-1} e^{j\frac{2\pi(x_2 - x_1)\alpha}{N}}
= \frac{1}{N^2}N\delta(x_2 - x_1) = \frac{1}{N}\delta(x_2 - x_1)\qquad(2.184)$$
where we made use of equation (2.164), on page 95.

So, $UU^H$ does not produce the unit matrix, but a diagonal matrix with all its elements along the diagonal equal to $1/N$. On the other hand, matrix
$$\tilde{U} \equiv \sqrt{N}\,U\qquad(2.185)$$
is unitary. However, if we were to compute DFT using matrix $\tilde{U}$ instead of $U$, the produced DFT would not have been identical with that produced by using the conventional definition formulae (2.160) and (2.161). To have full agreement between the matrix version of DFT and the formula version we should also redefine DFT as
$$F(m) \equiv \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} f(k)\exp\left(-j\frac{2\pi mk}{N}\right)\qquad(2.186)$$
for 1D and as
$$\hat{g}_{mn} \equiv \frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g_{kl}\,e^{-j2\pi\frac{km+nl}{N}}\qquad(2.187)$$
for 2D. These are perfectly acceptable alternative definitions. However, they have certain consequences for some theorems, in the sense that they alter some scaling constants. A scaling constant is not usually a problem in a transformation, as long as one is consistent in all subsequent manipulations and careful when taking the inverse transform. In other words, you either always use $U$ given by (2.171) and definitions (2.160) and (2.161), remembering that $U$ is unitary apart from a scaling constant, or you always use $\tilde{U}$ given by (2.185) and definitions (2.186) and (2.187), remembering that some theorems may involve different multiplicative constants from those found in conventional books on DFT.

(1) The need to take the complex conjugate of the second row arises because when we multiply imaginary numbers, in order to get 1, we have to multiply $j$ with $-j$, ie its complex conjugate, rather than $j$ with $j$.
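The scaling of equation (2.184) is easy to confirm numerically. A minimal sketch (not from the book; the rescaling by $\sqrt{N}$ follows the unitary variant discussed around equation (2.185)):

```python
import numpy as np

N = 8
x = np.arange(N)
U = np.exp(-2j * np.pi * np.outer(x, x) / N) / N     # matrix of equation (2.171)

print(np.round(U @ U.conj().T, 10))                  # (1/N) * identity, not the identity
U_unitary = np.sqrt(N) * U                           # rescaled so that the rows have unit norm
print(np.allclose(U_unitary @ U_unitary.conj().T, np.eye(N)))   # True: unitary
```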
Example 2.31

Show that
$$e^{-j\frac{2\pi}{N}x} = e^{-j\frac{2\pi}{N}\left[\mathrm{mod}_N(x)\right]}\qquad(2.188)$$
where x is an integer.

For integer $x$ we may write $x = qN + r$, where $q$ is the integer number of times $N$ fits in $x$ and $r$ is the residual. For example, if $N = 32$ and $x = 5$, $q = 0$ and $r = 5$. If $N = 32$ and $x = 36$, $q = 1$ and $r = 4$. The residual $r$ is called modulus of $x$ over $N$ and is denoted as $\mathrm{mod}_N(x)$.

A complex exponential $e^{j\theta}$ may be written as $\cos\theta + j\sin\theta$. We may, therefore, write:
$$e^{-j\frac{2\pi}{N}x} = e^{-j\frac{2\pi}{N}(qN+r)} = e^{-j\frac{2\pi}{N}qN - j\frac{2\pi}{N}r} = e^{-j2\pi q}e^{-j\frac{2\pi}{N}r}
= \left[\cos(2\pi q) - j\sin(2\pi q)\right]e^{-j\frac{2\pi}{N}r} = (1 - j\,0)\,e^{-j\frac{2\pi}{N}r} = e^{-j\frac{2\pi}{N}\left[\mathrm{mod}_N(x)\right]}\qquad(2.189)$$
Example 2.32

Derive matrix U needed for the calculation of the DFT of an 8 x 8 image.

By applying formula (2.171) with $N = 8$ we obtain:
$$64\,U = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & e^{-j\frac{2\pi}{8}1} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}3} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}5} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}7}\\
1 & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}8} & e^{-j\frac{2\pi}{8}10} & e^{-j\frac{2\pi}{8}12} & e^{-j\frac{2\pi}{8}14}\\
1 & e^{-j\frac{2\pi}{8}3} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}9} & e^{-j\frac{2\pi}{8}12} & e^{-j\frac{2\pi}{8}15} & e^{-j\frac{2\pi}{8}18} & e^{-j\frac{2\pi}{8}21}\\
1 & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}8} & e^{-j\frac{2\pi}{8}12} & e^{-j\frac{2\pi}{8}16} & e^{-j\frac{2\pi}{8}20} & e^{-j\frac{2\pi}{8}24} & e^{-j\frac{2\pi}{8}28}\\
1 & e^{-j\frac{2\pi}{8}5} & e^{-j\frac{2\pi}{8}10} & e^{-j\frac{2\pi}{8}15} & e^{-j\frac{2\pi}{8}20} & e^{-j\frac{2\pi}{8}25} & e^{-j\frac{2\pi}{8}30} & e^{-j\frac{2\pi}{8}35}\\
1 & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}12} & e^{-j\frac{2\pi}{8}18} & e^{-j\frac{2\pi}{8}24} & e^{-j\frac{2\pi}{8}30} & e^{-j\frac{2\pi}{8}36} & e^{-j\frac{2\pi}{8}42}\\
1 & e^{-j\frac{2\pi}{8}7} & e^{-j\frac{2\pi}{8}14} & e^{-j\frac{2\pi}{8}21} & e^{-j\frac{2\pi}{8}28} & e^{-j\frac{2\pi}{8}35} & e^{-j\frac{2\pi}{8}42} & e^{-j\frac{2\pi}{8}49}
\end{pmatrix}$$
Using formula (2.188), the above matrix may be simplified to:
$$U = \frac{1}{64}\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & e^{-j\frac{2\pi}{8}1} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}3} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}5} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}7}\\
1 & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}0} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}6}\\
1 & e^{-j\frac{2\pi}{8}3} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}1} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}7} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}5}\\
1 & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}0} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}0} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}0} & e^{-j\frac{2\pi}{8}4}\\
1 & e^{-j\frac{2\pi}{8}5} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}7} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}1} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}3}\\
1 & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}0} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}2}\\
1 & e^{-j\frac{2\pi}{8}7} & e^{-j\frac{2\pi}{8}6} & e^{-j\frac{2\pi}{8}5} & e^{-j\frac{2\pi}{8}4} & e^{-j\frac{2\pi}{8}3} & e^{-j\frac{2\pi}{8}2} & e^{-j\frac{2\pi}{8}1}
\end{pmatrix}\qquad(2.190)$$
Which are the elementary images in terms of which DFT expands an image?
As the kernel of DFT is a complex function, these images are complex. They may be created by taking the outer product of any two rows of matrix U. Figure 2.11 shows the real parts of these elementary images and figure 2.12 the imaginary parts, for the U matrix computed in example 2.32.

Figure 2.11: Real part of the Fourier transform basis images, appropriate for expanding an 8 x 8 image. All panels have been scaled together for presentation purposes.

The values of all the images have been linearly scaled to vary between 0 (black) and 255 (white). The numbers along the left and the top indicate which transposed row of matrix U was multiplied with which row to produce the corresponding image.
Figure 2.12: Imaginary part of the Fourier transform basis images, appropriate for expanding an 8 x 8 image. All panels have been scaled together for presentation purposes.
Example 2.33

Compute the real and imaginary parts of the discrete Fourier transform of image:
$$g = \begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 1 & 1 & 0\\
0 & 1 & 1 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.191)$$
We shall use matrix U of example 2.28. We have to compute $\hat{g} = UgU$. We start by computing first $gU$:
$$gU = \frac{1}{4}\begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 1 & 1 & 0\\
0 & 1 & 1 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
= \frac{1}{4}\begin{pmatrix}
0 & 0 & 0 & 0\\
2 & -1-j & 0 & -1+j\\
2 & -1-j & 0 & -1+j\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.192)$$
We then multiply this result with U from the left:
$$\hat{g} = \frac{1}{16}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 0 & 0\\
2 & -1-j & 0 & -1+j\\
2 & -1-j & 0 & -1+j\\
0 & 0 & 0 & 0
\end{pmatrix}
= \frac{1}{16}\begin{pmatrix}
4 & -2-2j & 0 & -2+2j\\
-2-2j & 2j & 0 & 2\\
0 & 0 & 0 & 0\\
-2+2j & 2 & 0 & -2j
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{4} & \frac{-1-j}{8} & 0 & \frac{-1+j}{8}\\
\frac{-1-j}{8} & \frac{j}{8} & 0 & \frac{1}{8}\\
0 & 0 & 0 & 0\\
\frac{-1+j}{8} & \frac{1}{8} & 0 & -\frac{j}{8}
\end{pmatrix}\qquad(2.193)$$
Splitting the real and imaginary parts, we obtain:
$$\Re(\hat{g}) = \begin{pmatrix}
\frac{1}{4} & -\frac{1}{8} & 0 & -\frac{1}{8}\\
-\frac{1}{8} & 0 & 0 & \frac{1}{8}\\
0 & 0 & 0 & 0\\
-\frac{1}{8} & \frac{1}{8} & 0 & 0
\end{pmatrix}
\quad\text{and}\quad
\Im(\hat{g}) = \begin{pmatrix}
0 & -\frac{1}{8} & 0 & \frac{1}{8}\\
-\frac{1}{8} & \frac{1}{8} & 0 & 0\\
0 & 0 & 0 & 0\\
\frac{1}{8} & 0 & 0 & -\frac{1}{8}
\end{pmatrix}\qquad(2.194)$$
Example 2.34

Show the different stages of the approximation of the image of example 2.15, on page 69, by its Fourier transform.

The eight images shown in figure 2.13 are the reconstructed images when one, two, ..., eight lines of matrix U were used for the reconstruction. The sums of the squared errors for each reconstructed image are:
Square error for image 2.13a: 366394
Square error for image 2.13b: 285895
Square error for image 2.13c: 234539
Square error for image 2.13d: 189508
Square error for image 2.13e: 141481
Square error for image 2.13f: 119612
Square error for image 2.13g: 71908
Square error for image 2.13h: 0
Note that the reconstructed images are complex and in each case we consider only the real part of the reconstructed image.

From the basis images shown in figures 2.11 and 2.12, we can see that after the rows and columns marked with number 3, the basis images are symmetrically repeated. This is due to the nature of the complex exponential functions. This also means that by the 4th reconstruction shown in figure 2.13, the highest resolution details of the image have already been in place, and after that point, the extra components that are incorporated are gradually improving the details at various scales in the real part and gradually reduce the information in the imaginary part, which becomes exactly 0 for the full reconstruction.

Figure 2.13: Reconstructed image when the basis images used are those created from the first one, two, ..., eight lines of matrix U of example 2.32, from top left to bottom right, respectively.
Why is the discrete Fourier transform more commonly used than the other transforms?

The major advantage of the discrete Fourier transform over the Walsh transform is that it obeys the convolution theorem. One may define a corresponding theorem for the Walsh functions, but the relationship between the Walsh transform and the convolution is not as simple and it cannot be implemented cheaply on a computer. The convolution theorem makes the Fourier transform by far the most attractive in image processing.

Apart from that, the Fourier transform uses very detailed basis functions, so in general it can approximate an image with smaller error than the other transforms for a fixed number of terms retained. This may be judged from the reconstruction errors of example 2.34, when compared with the reconstruction errors of examples 2.25 and 2.26. We must compare the errors for reconstructed images (a), (b) and (d) which correspond to keeping the first $2^0$, $2^1$ and $2^2$ basis functions, respectively. In this particular example, however, the Walsh transform seems to produce better approximations for high numbers of retained coefficients, as judged by the square error, although the Fourier reconstructions appear visually more acceptable. This touches upon the problem of expressing the quality of an image by some function of its values: there are no really quantitative measures of image quality that correspond to the perceived quality of an image by human viewers.

In any case, we must remember that when we say we retained n number of basis images, in the case of the Fourier transform we actually require 2n coefficients for the reconstruction, while in the case of Haar and Walsh transforms we require only n coefficients. This is because the Fourier coefficients are complex and both their real and imaginary parts have to be stored or transmitted.
What does the convolution theorem state?
The convolution theorem states that: the Fourier transform of the convolution of two functions is proportional to the product of the individual Fourier transforms of the two functions. If the functions are images defined over a finite space, this theorem is true only if we assume that each image is repeated periodically in all directions.
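With the $1/(NM)$ normalisation used in this book, the proportionality constant works out to $NM$ (this is what Box 2.7 derives). A brute-force numerical check of the statement (a sketch, not from the book) is given below; the convolution is computed with periodic (wrap-around) indices, as the theorem requires.

```python
import numpy as np

N, M = 8, 8
rng = np.random.default_rng(0)
g = rng.random((N, M))
w = rng.random((N, M))

# periodic (circular) convolution, as in equation (2.195)
v = np.zeros((N, M))
for n in range(N):
    for m in range(M):
        for n1 in range(N):
            for m1 in range(M):
                v[n, m] += g[(n - n1) % N, (m - m1) % M] * w[n1, m1]

# DFTs with the book's 1/(NM) normalisation
G = np.fft.fft2(g) / (N * M)
W = np.fft.fft2(w) / (N * M)
V = np.fft.fft2(v) / (N * M)

print(np.allclose(V, N * M * G * W))   # True: DFT(convolution) = NM * DFT(g) * DFT(w)
```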
Box 2.7. If a function is the convolution of two other functions, what is the relationship of its DFT with the DFTs of the two functions?

Assume that we convolve two discrete 2D functions $g(n, m)$ and $w(n, m)$ to produce another function $v(n, m)$:
$$v(n, m) = \sum_{n'=0}^{N-1}\sum_{m'=0}^{M-1} g(n - n', m - m')\,w(n', m')\qquad(2.195)$$
Let us say that the discrete Fourier transforms of these three functions are $\hat{v}$, $\hat{g}$ and $\hat{w}$, respectively. To find a relationship between them, we shall try to calculate the DFT of $v(n, m)$. For this purpose, we multiply both sides of equation (2.195) with the kernel
$$\frac{1}{NM}\exp\left[-j2\pi\left(\frac{pn}{N}+\frac{qm}{M}\right)\right]\qquad(2.196)$$
and sum over all $m$ and $n$. Equation (2.195) then becomes:
$$\frac{1}{NM}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1} v(n, m)e^{-j2\pi\left[\frac{pn}{N}+\frac{qm}{M}\right]}
= \frac{1}{NM}\sum_{n'=0}^{N-1}\sum_{m'=0}^{M-1}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1} g(n-n', m-m')\,w(n', m')\,e^{-j2\pi\left[\frac{pn}{N}+\frac{qm}{M}\right]}\qquad(2.197)$$
We recognise the left-hand side of this expression to be the discrete Fourier transform of $v$, ie $\hat{v}(p, q)$. We would like to split the expression on the right-hand side into the product of two double sums, which eventually will be identified as the DFTs of $g$ and $w$. To achieve this, we must have independent indices for $g$ and $w$. We introduce new indices:
$$n'' \equiv n - n', \qquad m'' \equiv m - m'\qquad(2.198)$$
Then $n = n' + n''$ and $m = m' + m''$, and we must map the area over which we sum in the $(n, m)$ space into the corresponding area in the $(n'', m'')$ space. The lines $n = 0$, $n = N-1$, $m = 0$ and $m = M-1$, which bound the original area of summation, are mapped onto the lines $n'' = -n'$, $n'' = N-1-n'$, $m'' = -m'$ and $m'' = M-1-m'$, respectively (figure 2.14).

Figure 2.14: The area over which we sum is like a floating rectangle which is shifted about in the coordinate space we use, according to the change of variables we perform.

With the new indices we may write:
$$\hat{v}(p, q) = \frac{1}{NM}\sum_{n'=0}^{N-1}\sum_{m'=0}^{M-1} w(n', m')\,e^{-j2\pi\left[\frac{pn'}{N}+\frac{qm'}{M}\right]}
\sum_{m''=-m'}^{M-1-m'}\sum_{n''=-n'}^{N-1-n'} g(n'', m'')\,e^{-j2\pi\left[\frac{pn''}{N}+\frac{qm''}{M}\right]}\qquad(2.199)$$
Let us concentrate on the last two sums of (2.199). Let us call them factor $T$. We may separate the negative from the positive indices of $n''$ and write:
$$T \equiv \sum_{m''=-m'}^{M-1-m'}\left[\sum_{n''=-n'}^{-1} + \sum_{n''=0}^{N-1-n'}\right] g(n'', m'')\,e^{-j2\pi\left[\frac{pn''}{N}+\frac{qm''}{M}\right]}$$
$$= \sum_{m''=-m'}^{M-1-m'} e^{-j2\pi\frac{qm''}{M}}\sum_{n''=-n'}^{-1} g(n'', m'')\,e^{-j2\pi\frac{pn''}{N}}
+ \sum_{m''=-m'}^{M-1-m'} e^{-j2\pi\frac{qm''}{M}}\sum_{n''=0}^{N-1-n'} g(n'', m'')\,e^{-j2\pi\frac{pn''}{N}}\qquad(2.200)$$
Clearly the two images $g$ and $w$ are not defined for negative indices. We may choose to extend their definition for indices outside the range $[0, N-1]$, $[0, M-1]$ in any which way suits us. Let us examine the factor:
$$\sum_{n''=-n'}^{-1} g(n'', m'')\,e^{-j2\pi p\frac{n''}{N}}\qquad(2.201)$$
We define a new variable $n''' \equiv N + n''$. Then the above sum becomes
$$\sum_{n'''=N-n'}^{N-1} g(n''' - N, m'')\,e^{-j2\pi p\frac{n'''}{N}}\,e^{j2\pi p}\qquad(2.202)$$
As $p$ is an integer, $e^{j2\pi p} = 1$. Now if we choose to define $g(n''' - N, m'') \equiv g(n''', m'')$, the above sum is:
$$\sum_{n'''=N-n'}^{N-1} g(n''', m'')\,e^{-j2\pi p\frac{n'''}{N}}\qquad(2.203)$$
Since $n'''$ is a dummy summation index, we may rename it $n''$, so that this term, added to the term
$$\sum_{n''=0}^{N-1-n'} g(n'', m'')\,e^{-j2\pi p\frac{n''}{N}}\qquad(2.204)$$
in (2.200), gives
$$\sum_{n''=0}^{N-1} g(n'', m'')\,e^{-j2\pi p\frac{n''}{N}}\qquad(2.205)$$
ie a summation over one full period, regardless of the shift $n'$. We can work in a similar way for the summation over index $m''$, assuming that $g$ is also periodically extended in its second argument. Then
$$T = \sum_{n''=0}^{N-1}\sum_{m''=0}^{M-1} g(n'', m'')\,e^{-j2\pi\left[\frac{pn''}{N}+\frac{qm''}{M}\right]} = NM\,\hat{g}(p, q)\qquad(2.206)$$
This does not contain indices $n'$ and $m'$ any more, so substituting back into (2.199) we obtain
$$\hat{v}(p, q) = NM\,\hat{g}(p, q)\,\hat{w}(p, q)\qquad(2.207)$$
which is the convolution theorem for the DFT, under the assumption that both images are periodically repeated in both directions.

Example B2.35

Show that the DFT of the product of two images is the convolution of their DFTs.

The DFT of the product of two images $g(n, m)$ and $w(n, m)$ is
$$\hat{x}(k, l) \equiv \frac{1}{NM}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1} g(n, m)w(n, m)\,e^{-j2\pi\left[\frac{kn}{N}+\frac{lm}{M}\right]}\qquad(2.211)$$
We may express images $g$ and $w$ in terms of their DFTs $\hat{g}$ and $\hat{w}$ as follows:
$$g(n, m) = \sum_{p=0}^{N-1}\sum_{q=0}^{M-1}\hat{g}(p, q)\,e^{j2\pi\left[\frac{pn}{N}+\frac{qm}{M}\right]}, \qquad
w(n, m) = \sum_{s=0}^{N-1}\sum_{r=0}^{M-1}\hat{w}(s, r)\,e^{j2\pi\left[\frac{ns}{N}+\frac{rm}{M}\right]}\qquad(2.212)$$
Substituting these expressions into (2.211) we obtain:
$$\hat{x}(k, l) = \frac{1}{NM}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1}\sum_{p=0}^{N-1}\sum_{q=0}^{M-1}\sum_{s=0}^{N-1}\sum_{r=0}^{M-1}
\hat{g}(p, q)\hat{w}(s, r)\,e^{j2\pi\left[\frac{n(s+p)}{N}+\frac{m(r+q)}{M}\right]}\,e^{-j2\pi\left[\frac{kn}{N}+\frac{lm}{M}\right]}\qquad(2.213)$$
We notice that indices $n$ and $m$ do not appear in $\hat{g}(p, q)$ and $\hat{w}(s, r)$, so we may collect the terms that depend on $n$ and $m$ separately and sum over them:
$$\hat{x}(k, l) = \frac{1}{NM}\sum_{p=0}^{N-1}\sum_{q=0}^{M-1}\sum_{s=0}^{N-1}\sum_{r=0}^{M-1}
\hat{g}(p, q)\hat{w}(s, r)\sum_{n=0}^{N-1}e^{j2\pi n\frac{s+p-k}{N}}\sum_{m=0}^{M-1}e^{j2\pi m\frac{r+q-l}{M}}\qquad(2.214)$$
To compute the sums of the exponential functions we apply formula (2.164), on page 95, once for $S \equiv N$ and $t \equiv s + p - k$ and once for $S \equiv M$ and $t \equiv r + q - l$:
$$\hat{x}(k, l) = \frac{1}{NM}\sum_{p=0}^{N-1}\sum_{q=0}^{M-1}\sum_{s=0}^{N-1}\sum_{r=0}^{M-1}
\hat{g}(p, q)\hat{w}(s, r)\,N\delta(s+p-k)\,M\delta(r+q-l)\qquad(2.215)$$
The delta functions will pick from all values of $s$ and $r$ only the ones that may zero their arguments, ie they will only retain the terms for which $s = k - p$ and $r = l - q$. Therefore:
$$\hat{x}(k, l) = \underbrace{\sum_{p=0}^{N-1}\sum_{q=0}^{M-1}\hat{g}(p, q)\,\hat{w}(k-p, l-q)}_{\text{Convolution of }\hat{g}\text{ with }\hat{w}}\qquad(2.216)$$
Example 2.36

Show that if g(k, l) is an M x N image defined as a periodic function with periods M and N in the whole (k, l) space, its DFT ĝ(m, n) is also periodic in the (m, n) space, with the same periods.

We must show that $\hat{g}(m + M, n + N) = \hat{g}(m, n)$. We start from the definition of $\hat{g}(m, n)$:
$$\hat{g}(m, n) = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\,e^{-j2\pi\left[\frac{km}{M}+\frac{ln}{N}\right]}\qquad(2.217)$$
Then
$$\hat{g}(m+M, n+N) = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\,e^{-j2\pi\left[\frac{k(m+M)}{M}+\frac{l(n+N)}{N}\right]}
= \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} g(k, l)\,e^{-j2\pi\frac{km}{M}}e^{-j2\pi\frac{ln}{N}} = \hat{g}(m, n)\qquad(2.218)$$
where we made use of $e^{-j2\pi t} = \cos(2\pi t) - j\sin(2\pi t) = 1$, for $t$ an integer.
Example B2.37

Show that if v(n, m) is defined as
$$v(n, m) \equiv \sum_{n'=0}^{N-1}\sum_{m'=0}^{M-1} g(n - n', m - m')\,w(n', m')\qquad(2.219)$$
where g(n, m) and w(n, m) are two periodically defined images with periods N and M in the two variables respectively, v(n, m) is also given by:
$$v(n, m) = \sum_{n'=0}^{N-1}\sum_{m'=0}^{M-1} w(n - n', m - m')\,g(n', m')\qquad(2.220)$$
Define some new variables of summation $k$ and $l$, so that:
$$k \equiv n - n' \Rightarrow n' = n - k, \qquad l \equiv m - m' \Rightarrow m' = m - l\qquad(2.221)$$
As $n'$ takes values from 0 to $N-1$, $k$ will take values from $n$ to $n - N + 1$. Similarly, as $m'$ takes values from 0 to $M-1$, $l$ will take values from $m$ to $m - M + 1$. Substituting in equation (2.219) we have:
$$v(n, m) = \sum_{k=n}^{n-N+1}\sum_{l=m}^{m-M+1} g(k, l)\,w(n - k, m - l)\qquad(2.222)$$
Consider the sum:
$$\sum_{k=n}^{n-N+1} g(k, l)\,w(n - k, m - l)\qquad(2.223)$$
First reverse the order of summation, with no consequence, and write it as:
$$\sum_{k=-N+n+1}^{n} g(k, l)\,w(n - k, m - l)\qquad(2.224)$$
Next split the range of indices $[-N+n+1, n]$ into the two ranges $[-N+n+1, -1]$ and $[0, n]$:
$$\sum_{k=-N+n+1}^{-1} g(k, l)\,w(n - k, m - l) + \sum_{k=0}^{n} g(k, l)\,w(n - k, m - l)\qquad(2.225)$$
Then note that the range of indices $[-N+n+1, -1]$ is effectively indices $[-N, -1]$ minus indices $[-N, -N+n]$:
$$\underbrace{\sum_{k=-N}^{-1} g(k, l)w(n-k, m-l)}_{\text{change variable }\tilde{k}\equiv k+N}
- \underbrace{\sum_{k=-N}^{-N+n} g(k, l)w(n-k, m-l)}_{\text{change variable }\tilde{k}\equiv k+N}
+ \sum_{k=0}^{n} g(k, l)w(n-k, m-l)\qquad(2.226)$$
After the change of variables:
$$\sum_{\tilde{k}=0}^{N-1} g(\tilde{k}-N, l)w(n-\tilde{k}+N, m-l)
- \sum_{\tilde{k}=0}^{n} g(\tilde{k}-N, l)w(n-\tilde{k}+N, m-l)
+ \sum_{k=0}^{n} g(k, l)w(n-k, m-l)\qquad(2.227)$$
$$g \text{ periodic} \Rightarrow g(k - N, l) = g(k, l), \qquad w \text{ periodic} \Rightarrow w(s + N, t) = w(s, t)\qquad(2.228)$$
Therefore, the last two sums in (2.227) are identical and cancel each other, and the summation over $k$ in (2.222) is from 0 to $N-1$. Similarly, we can show that the summation over $l$ in (2.222) is from 0 to $M-1$, and thus prove equation (2.220).
How can we display the discrete Fourier transform of an image?
Assume that the discrete Fourier transform of an image is $\hat{g}(p, q)$. Scalars $\hat{g}(p, q)$ are the coefficients of the expansion of the image into discrete Fourier functions, each one of which corresponds to a different pair of spatial frequencies in the 2D (p, q) plane. As p and q increase, the contributions of these high frequencies to the image become less and less significant (in terms of the effect they have on the mean square error of the image when it is reconstructed without them) and thus the values of the corresponding coefficients $\hat{g}(p, q)$ become smaller. We may find it difficult to display these coefficients, because their values span a great range. So, for displaying purposes only, we use the following logarithmic function:
$$d(p, q) \equiv \log_{10}\left(1 + |\hat{g}(p, q)|\right)\qquad(2.229)$$
This function is then scaled into a displayable range of grey values and displayed instead of $\hat{g}(p, q)$. Notice that when $\hat{g}(p, q) = 0$, $d(p, q) = 0$ too. This function has the property of reducing the ratio between the high values of $\hat{g}$ and the small ones, so that small and large values can be displayed in the same scale. For example, if $\hat{g}_{max} = 100$ and $\hat{g}_{min} = 0.1$, it is rather difficult to draw these numbers on the same graph, as their ratio is 1000. However, $\log_{10}(101) = 2.0043$ and $\log_{10}(1.1) = 0.0414$ and their ratio is only 48. So, both numbers can be drawn on the same scale more easily.

In order to display the values of $d(p, q)$ as a grey image, the scaling is done as follows. The minimum and the maximum values of $d(p, q)$ are identified and are denoted by $d_{min}$ and $d_{max}$, respectively. Then each frequency sample (p, q) is assigned a new value $d_{new}(p, q)$ defined as:
$$d_{new}(p, q) \equiv \left\lfloor\frac{d(p, q) - d_{min}}{d_{max} - d_{min}}\,255 + 0.5\right\rfloor\qquad(2.230)$$
Note that when $d(p, q) = d_{min}$, the fraction is 0, and taking the integer part of 0.5 yields 0. When $d(p, q) = d_{max}$, the fraction becomes 1 and multiplied with 255 yields 255. The term 0.5 is used to ensure that the real numbers that result from the division and multiplication with 255 are rounded to the nearest integer, rather than truncated to their integer part. For example, if the resultant number is 246.8, the integer part is 246 but if we add 0.5 first and then take the integer part, we get 247 which is an integer that represents 246.8 much better than 246.
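Equations (2.229)-(2.230) translate into a few lines of code. The sketch below (not from the book) maps the DFT of an arbitrary image into the displayable range of grey values 0-255.

```python
import numpy as np

def display_dft(G):
    """Map DFT coefficients to grey values 0..255 via equations (2.229)-(2.230)."""
    d = np.log10(1.0 + np.abs(G))                  # equation (2.229)
    dmin, dmax = d.min(), d.max()
    scaled = (d - dmin) / (dmax - dmin) * 255 + 0.5
    return np.floor(scaled).astype(np.uint8)       # equation (2.230): round, not truncate

# example: DFT of a random 64x64 image, in the book's 1/N^2 convention
g = np.random.rand(64, 64)
G = np.fft.fft2(g) / g.size
d_new = display_dft(G)
print(d_new.min(), d_new.max())                    # 0 and 255
```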
What happens to the discrete Fourier transform of an image if the image is rotated?

We rewrite here the definition of the discrete Fourier transform, ie equation (2.162), on page 94, for a square image ($M = N$):
$$\hat{g}(m, n) = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g(k, l)\,e^{-j2\pi\frac{km+ln}{N}}\qquad(2.231)$$
We may introduce polar coordinates on the planes (k, l) and (m, n), as follows: $k \equiv r\cos\theta$, $l \equiv r\sin\theta$, $m \equiv \omega\cos\phi$, $n \equiv \omega\sin\phi$. We note that $km + ln = r\omega(\cos\theta\cos\phi + \sin\theta\sin\phi) = r\omega\cos(\theta - \phi)$. Then equation (2.231) becomes:
$$\hat{g}(\omega, \phi) = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g(r, \theta)\,e^{-j2\pi\frac{r\omega\cos(\theta-\phi)}{N}}\qquad(2.232)$$
Variables $k$ and $l$, over which we sum, do not appear in the summand explicitly. However, they are there implicitly and the summation is supposed to happen over all relevant points. From the values of $k$ and $l$, we are supposed to find the corresponding values of $r$ and $\theta$.

Assume now that we rotate $g(r, \theta)$ by an angle $\theta_0$. It becomes $g(r, \theta + \theta_0)$. We want to find the discrete Fourier transform of this rotated function. Formula (2.232) is another slot machine. We slot in the appropriate place the function, the transform of which we require, and out comes its DFT. Therefore, we shall use formula (2.232) to calculate the DFT of $g(r, \theta + \theta_0)$ by simply replacing $g(r, \theta)$ with $g(r, \theta + \theta_0)$. We denote the DFT of $g(r, \theta + \theta_0)$ as $\hat{\hat{g}}(\omega, \phi)$. We get:
$$\hat{\hat{g}}(\omega, \phi) = \frac{1}{N^2}\underbrace{\sum\sum}_{\text{all points}} g(r, \theta + \theta_0)\,e^{-j2\pi\frac{r\omega\cos(\theta-\phi)}{N}}\qquad(2.233)$$
To find the relationship between $\hat{\hat{g}}(\omega, \phi)$ and $\hat{g}(\omega, \phi)$ we have somehow to make $g(r, \theta)$ appear on the right-hand side of this expression. For this purpose, we introduce a new variable, $\tilde{\theta} \equiv \theta + \theta_0$ and replace $\theta$ by $\tilde{\theta} - \theta_0$ in (2.233):
$$\hat{\hat{g}}(\omega, \phi) = \frac{1}{N^2}\underbrace{\sum\sum}_{\text{all points}} g(r, \tilde{\theta})\,e^{-j2\pi\frac{r\omega\cos(\tilde{\theta}-\theta_0-\phi)}{N}}\qquad(2.234)$$
Then on the right-hand side we recognise the DFT of the unrotated image calculated at $\phi + \theta_0$ instead of $\phi$: $\hat{g}(\omega, \phi + \theta_0)$. That is, we have:
$$\hat{\hat{g}}(\omega, \phi) = \hat{g}(\omega, \phi + \theta_0)\qquad(2.235)$$
We conclude that:

The DFT of the image rotated by $\theta_0$ = the DFT of the unrotated image rotated by the same angle $\theta_0$.
Example 2.38

Rotate the image of example 2.29, on page 97, clockwise by 90 degrees about its top left corner and recalculate its discrete Fourier transform. Thus, verify the relationship between the discrete Fourier transform of a 2D image and the discrete Fourier transform of the same image rotated by angle $\theta_0$.

The rotated by 90 degrees image is:
$$\begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
1 & 1 & 1 & 1\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.236)$$
To calculate its DFT we multiply it first from the right with matrix U of example 2.28,
$$\frac{1}{4}\begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
1 & 1 & 1 & 1\\
0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
= \begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
1 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}\qquad(2.237)$$
and then multiply the result from the left with the same matrix U:
$$\frac{1}{4}\begin{pmatrix}
1 & 1 & 1 & 1\\
1 & -j & -1 & j\\
1 & -1 & 1 & -1\\
1 & j & -1 & -j
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
1 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}
= \begin{pmatrix}
1/4 & 0 & 0 & 0\\
-1/4 & 0 & 0 & 0\\
1/4 & 0 & 0 & 0\\
-1/4 & 0 & 0 & 0
\end{pmatrix}\qquad(2.238)$$
By comparing the above result with the result of example 2.29 we see that the discrete Fourier transform of the rotated image is the discrete Fourier transform of the unrotated image rotated clockwise by 90 degrees.
What happens to the discrete Fourier transform of an image if the image is shifted?

Assume that we shift the image to the point $(k_0, l_0)$, so that it becomes $g(k - k_0, l - l_0)$. To calculate the DFT of the shifted image, we slot this function into formula (2.231). We denote the DFT of $g(k - k_0, l - l_0)$ as $\hat{\hat{g}}(m, n)$ and obtain:
$$\hat{\hat{g}}(m, n) = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g(k - k_0, l - l_0)\,e^{-j2\pi\frac{km+ln}{N}}\qquad(2.239)$$
Figure 2.15: A 3 x 3 image repeated ad infinitum in both directions. Any 3 x 3 floating window (depicted with the thick black line) will pick up exactly the same pixels wherever it is placed. If each pattern represents a different number, the average inside each black frame will always be the same. When we take a weighted average, as long as the weights are also periodic with the same period as the image and are shifted in the same way as the elements inside each window, the result will also be always the same. Kernel $e^{-j2\pi m/3}$ used for DFT has such properties and that is why the range of indices over which we sum does not matter, as long as they are consecutive and equal in number to the size of the image.
To find a relationship between $\hat{\hat{g}}(m, n)$ and $\hat{g}(m, n)$, we must somehow make $g(k, l)$ appear on the right-hand side of this expression. For this purpose, we define new variables $k' \equiv k - k_0$ and $l' \equiv l - l_0$. Then:
$$\hat{\hat{g}}(m, n) = \frac{1}{N^2}\sum_{k'=-k_0}^{N-1-k_0}\sum_{l'=-l_0}^{N-1-l_0} g(k', l')\,e^{-j2\pi\frac{k'm+l'n}{N}}\,e^{-j2\pi\frac{k_0 m + l_0 n}{N}}\qquad(2.240)$$
Because of the assumed periodic repetition of the image in both directions and the easily proven periodicity of the exponential kernel, also in both directions with the same period $N$, where exactly we perform the summation (ie between which indices) does not really matter, as long as a window of the right size is used for the summation (see figure 2.15). In other words, as long as summation indices $k'$ and $l'$ take $N$ consecutive values each, the double sum over them equals the DFT of $g(k, l)$. The factor $e^{-j2\pi(k_0 m + l_0 n)/N}$ does not depend on $k'$ and $l'$ and, therefore, it can come out of the summation. Then, we recognise in (2.240) the DFT of $g(k, l)$ appearing on the right-hand side, so that:
$$\hat{\hat{g}}(m, n) = \hat{g}(m, n)\,e^{-j2\pi\frac{k_0 m + l_0 n}{N}}\qquad(2.241)$$
The DFT of the shifted image = the DFT of the unshifted image multiplied with $e^{-j2\pi(k_0 m + l_0 n)/N}$.

Example 2.39

Compute the DFT of image (2.191), on page 102, when the origin of the axes is moved to the centre of the image, by using the shifting property of the DFT.

Moving the origin to the centre of the image amounts to shifting the image by $k_0 = l_0 = -3/2$, with $N = 4$. Each element $\hat{g}(m, n)$ of the DFT computed in example 2.33 must, therefore, be multiplied with the correction factor
$$F \equiv e^{-j2\pi\frac{k_0 m + l_0 n}{N}} = e^{j2\pi\frac{3(m+n)}{8}}\qquad(2.242)$$
We obtain:
m = 0, n = 0:  F = 1
m = 0, n = 1:  F = $e^{j2\pi\frac{3}{8}} = \cos\frac{3\pi}{4} + j\sin\frac{3\pi}{4} = -\frac{\sqrt{2}}{2} + j\frac{\sqrt{2}}{2} = \frac{\sqrt{2}}{2}(-1 + j)$
m = 0, n = 2:  F = $e^{j2\pi\frac{6}{8}} = \cos\frac{3\pi}{2} + j\sin\frac{3\pi}{2} = -j$
m = 0, n = 3:  F = $e^{j2\pi\frac{9}{8}} = e^{j\frac{\pi}{4}} = \cos\frac{\pi}{4} + j\sin\frac{\pi}{4} = \frac{\sqrt{2}}{2}(1 + j)$
m = 1, n = 0:  F = $e^{j2\pi\frac{3}{8}} = \frac{\sqrt{2}}{2}(-1 + j)$
m = 1, n = 1:  F = $e^{j2\pi\frac{6}{8}} = -j$
m = 1, n = 2:  F = $e^{j2\pi\frac{9}{8}} = \frac{\sqrt{2}}{2}(1 + j)$
m = 1, n = 3:  F = $e^{j2\pi\frac{12}{8}} = e^{j\pi} = \cos\pi + j\sin\pi = -1$
m = 2, n = 0:  F = $e^{j2\pi\frac{6}{8}} = -j$
m = 2, n = 1:  F = $e^{j2\pi\frac{9}{8}} = \frac{\sqrt{2}}{2}(1 + j)$
m = 2, n = 2:  F = $e^{j2\pi\frac{12}{8}} = -1$
m = 2, n = 3:  F = $e^{j2\pi\frac{15}{8}} = e^{j\frac{7\pi}{4}} = \cos\frac{7\pi}{4} + j\sin\frac{7\pi}{4} = \frac{\sqrt{2}}{2}(1 - j)$
m = 3, n = 0:  F = $e^{j2\pi\frac{9}{8}} = \frac{\sqrt{2}}{2}(1 + j)$
m = 3, n = 1:  F = $e^{j2\pi\frac{12}{8}} = -1$
m = 3, n = 2:  F = $e^{j2\pi\frac{15}{8}} = \frac{\sqrt{2}}{2}(1 - j)$
m = 3, n = 3:  F = $e^{j2\pi\frac{18}{8}} = e^{j\frac{\pi}{2}} = \cos\frac{\pi}{2} + j\sin\frac{\pi}{2} = j\qquad(2.243)$
So, the DFT of the shifted function is given by (2.193) if we multiply each element of that matrix with the corresponding correction factor:
$$\hat{g}_{shifted} = \begin{pmatrix}
\frac{1}{4}\cdot 1 & \frac{-1-j}{8}\cdot\frac{\sqrt{2}}{2}(-1+j) & 0\cdot(-j) & \frac{-1+j}{8}\cdot\frac{\sqrt{2}}{2}(1+j)\\
\frac{-1-j}{8}\cdot\frac{\sqrt{2}}{2}(-1+j) & \frac{j}{8}\cdot(-j) & 0\cdot\frac{\sqrt{2}}{2}(1+j) & \frac{1}{8}\cdot(-1)\\
0\cdot(-j) & 0\cdot\frac{\sqrt{2}}{2}(1+j) & 0\cdot(-1) & 0\cdot\frac{\sqrt{2}}{2}(1-j)\\
\frac{-1+j}{8}\cdot\frac{\sqrt{2}}{2}(1+j) & \frac{1}{8}\cdot(-1) & 0\cdot\frac{\sqrt{2}}{2}(1-j) & -\frac{j}{8}\cdot j
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{4} & \frac{\sqrt{2}}{8} & 0 & -\frac{\sqrt{2}}{8}\\
\frac{\sqrt{2}}{8} & \frac{1}{8} & 0 & -\frac{1}{8}\\
0 & 0 & 0 & 0\\
-\frac{\sqrt{2}}{8} & -\frac{1}{8} & 0 & \frac{1}{8}
\end{pmatrix}\qquad(2.244)$$
Example 2.40

Compute the DFT of image (2.191), on page 102, using formula (2.162) and assuming that the centre of the axes is in the centre of the image.

If the centre of the axes is in the centre of the image, the only nonzero elements of this image are at half-integer positions. They are:
$$g\left(-\tfrac{1}{2}, -\tfrac{1}{2}\right) = g\left(-\tfrac{1}{2}, \tfrac{1}{2}\right) = g\left(\tfrac{1}{2}, -\tfrac{1}{2}\right) = g\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1\qquad(2.245)$$
Applying formula (2.162) then for $k$ and $l$ values from the set $\{-1/2, 1/2\}$, we obtain:
$$\hat{g}(m, n) = \frac{1}{16}\left[e^{-j\frac{2\pi}{4}\frac{-m-n}{2}} + e^{-j\frac{2\pi}{4}\frac{-m+n}{2}} + e^{-j\frac{2\pi}{4}\frac{m-n}{2}} + e^{-j\frac{2\pi}{4}\frac{m+n}{2}}\right]\qquad(2.246)$$
We apply this formula now to work out the elements of the DFT of the image. For $m = n = 1$ we obtain:
$$\hat{g}(1, 1) = \frac{1}{16}\left[e^{j\frac{\pi}{2}} + 1 + 1 + e^{-j\frac{\pi}{2}}\right] = \frac{1}{16}\left[j + 2 - j\right] = \frac{1}{8}\qquad(2.247)$$
For $m = 0$ and $n = 1$ we obtain:
$$\hat{g}(0, 1) = \frac{1}{16}\left[e^{j\frac{\pi}{4}} + e^{-j\frac{\pi}{4}} + e^{j\frac{\pi}{4}} + e^{-j\frac{\pi}{4}}\right]
= \frac{1}{16}\left[\frac{\sqrt{2}}{2} + j\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} + j\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} - j\frac{\sqrt{2}}{2}\right] = \frac{\sqrt{2}}{8}\qquad(2.248)$$
We work similarly for the other terms. Eventually we obtain the same DFT we obtained in example 2.39, given by equation (2.244), where we applied the shifting property of the Fourier transform.
What is the relationship between the average value of the image and its DFT?
The average value of the image is given by:
$$\bar{g} = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g(k, l)\qquad(2.249)$$
If we set $m = n = 0$ in (2.231), we get:
$$\hat{g}(0, 0) = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} g(k, l)\qquad(2.250)$$
Therefore, the mean of an image and the direct component (or dc) of its DFT (ie the component at frequency (0, 0)) are equal:
$$\bar{g} = \hat{g}(0, 0)\qquad(2.251)$$
Example 2.41
Conrm the relationship between the average of image
g =
_
_
_
_
0 0 0 0
0 1 1 0
0 1 1 0
0 0 0 0
_
_
_
_
(2.252)
and its discrete Fourier transform.
Apply the discrete Fourier transform formula (2.162) for N = M = 4 and for
Discrete Fourier transform 119
m = n = 0:
g(0, 0) =
1
16
3
k=0
3
l=0
g(k, l) =
1
16
(0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 1 + 0
+0 + 0 + 0 + 0) =
1
4
(2.253)
The mean of g is:
g
1
16
3
k=0
3
l=0
g(k, l) =
1
16
(0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + 1 + 1 + 0
+0 + 0 + 0 + 0) =
4
16
=
1
4
(2.254)
Thus (2.251) is conrmed.
What happens to the DFT of an image if the image is scaled?
When we take the average of a discretised function over an area over which this function is
dened, we implicitly perform the following operation: we divide the area into small elemen-
tary areas of size x y say, take the value of the function at the centre of each of these
little tiles and assume that it represents the value of the function over the whole tile. Thus,
we sum and divide by the total number of tiles. So, really the average of a function is dened
as:
g =
1
N
2
N1
x=0
N1
y=0
g(x, y)xy (2.255)
We simply omit x and y because x and y are incremented by 1 at a time, so x =
y = 1. We also notice, from the denition of the discrete Fourier transform, that really, the
discrete Fourier transform is a weighted average, where the value of g(k, l) is multiplied with
a dierent weight inside each little tile. Seeing the DFT that way, we realise that the correct
denition of the discrete Fourier transform should include a factor k l too, as the area
of the little tile over which we assume the value of the function g to be constant. We omit it
because k = l = 1. So, the formula for DFT that explicitly states this is:
g(m, n) =
1
N
2
N1
k=0
N1
l=0
g(k, l)e
j2
km+ln
N
kl (2.256)
Now assume that we change the scales in the (k, l) plane and g(k, l) becomes g(k, l).
We denote the discrete Fourier transform of the scaled g as
g(m, n). In order to calculate it,
we must slot function g(k, l) in place of g(k, l) in formula (2.256). We obtain:
g(m, n) =
1
N
2
N1
k=0
N1
l=0
g(k, l)e
j2
km+ln
N
kl (2.257)
120 Image Processing: The Fundamentals
We wish to nd a relationship between
g(m, n) and g(m, n). Therefore, somehow we must
make g(k, l) appear on the right-hand side of equation (2.257). For this purpose, we dene
new variables of summation k
k and l
l. Then:
g(m, n) =
1
N
2
(N1)
=0
(N1)
=0
g(k
, l
)e
j2
k
m
+l
n
N
k
(2.258)
The summation that appears in this expression spans all points over which function g(k
, l
)
is dened, except that the summation variables k
and l
,
n
g(m, n) =
1
g
_
m
,
n
_
(2.259)
The DFT of the scaled function =
1
|product of scaling factors|
the DFT of the
unscaled function calculated at the same point inversely scaled.
Example 2.42
You are given a continuous function f(x, y) where < x < + and <
y < +, dened as
f(x, y) =
_
_
_
1 for 0.5 < x < 4.5 and 0.5 < y < 1.5
0 elsewhere
(2.260)
Sample this function at integer positions (i, j), where 0 i, j < 4, to create
a 4 4 digital image.
Figure 2.16 on the left shows a plot of this function. On the right it shows the sampling
points (i, j) marked as black dots. The region highlighted with grey is the area where
the function has value 1. At all other points the function has value 0. So, the image
created by sampling this function at the marked points is:
g =
_
_
_
_
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
_
_
_
_
(2.261)
Discrete Fourier transform 121
x
y
f(x,y)
1
2
3
4
y
3 2
1
4
x
1
1
2
2
3
3
4
4
1
5
1
5
Figure 2.16: On the left, the plot of a continuous function and on the right the region
where the function takes nonzero values highlighted in grey. The black dots are the
sampling points we use to create a digital image out of this function.
Example 2.43
Scale function f(x, y) of example 2.42 to produce function
f(x, y) where
= = 2. Plot the scaled function and sample it at points (i, j) where i
and j take all possible values from the set 0, 0.5, 1, 1.5.
We note that function
f(x, y) will be nonzero when 0.5 < x < 4.5 and 0.5 <
y < 1.5. That is
f(x, y) will be nonzero when 0.5/ < x < 4.5/ and 0.5/ <
y < 1.5/. So, for 0.25 < x < 2.25 and 0.25 < y < 0.75,
f(x, y) will be 1 and it
will be 0 for all other values of its argument. This function is plotted in the left panel
of gure 2.17. On the right we can see how this plot looks from above, marking with
grey the region where the function takes value 1, and with black dots the points where
it will be sampled to create a digital version of it. The digital image we create this way
is:
g =
_
_
_
_
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
_
_
_
_
(2.262)
122 Image Processing: The Fundamentals
x
y
f(x,y)
1
2
3
4
x
y
3 2
1
4
1
1
2
2
3
3
4
4
5
5
1
1 1
Figure 2.17: On the left, the plot of the scaled function and on the right the region
where the function takes nonzero values highlighted in grey. The black dots are the
sampling points we use to create a digital image out of this function.
Example 2.44
Use formula (2.256) to compute the DFT of the digital image you created
in example 2.43.
Here:
N = 4, k = l =
1
2
(2.263)
So, formula (2.256) takes the form:
g(m, n) =
1
16
k{0,0.5,1,1.5}
l{0,0.5,1,1.5}
g(k, l)e
j2
km+ln
4
1
4
=
1
64
_
g(0, 0.5)e
j2
n
8
+g(0.5, 0.5)e
j2
m+n
4
+g(1, 0.5)e
j2
2m+n
8
+g(1.5, 0.5)e
j2
3m+n
8
_
=
1
64
_
e
j
n
4
+e
j
m+n
4
+e
j
2m+n
4
+e
j
3m+n
4
_
(2.264)
This is the DFT of the image.
Discrete Fourier transform 123
Example 2.45
Compute the DFT of image (2.261). Then use the result of example 2.44
to verify formula (2.259).
We use formula (2.256) with N = 4 and k = l = 1 to obtain the DFT of (2.261):
g(m, n) =
1
16
3
k=0
3
l=0
g(k, l)e
j2
km+ln
4
=
1
16
_
e
j2
n
4
+e
j2
m+n
4
+e
j2
2m+n
4
+e
j2
3m+n
4
_
=
1
16
_
e
j
n
2
+e
j
m+n
2
+e
j
2m+n
2
+e
j2
3m+n
2
_
(2.265)
For = = 2, according to formula (2.259), we must have:
g(m, n) =
1
4
g
_
m
2
,
n
2
_
(2.266)
By comparing (2.264) and (2.265) we see that (2.266) is veried.
Example B2.46
If w
N
e
j2
N
, show that
w
2t
2M
= w
t
M
w
2ut+u
2M
= w
ut
M
w
u
2M
(2.267)
where N, M, t and u are integers.
By denition:
w
2t
2M
=
_
e
j2
2M
_
2t
= e
j22t
2M
= e
j2t
M
=
_
e
j2
M
_
t
= w
t
M
(2.268)
Similarly:
w
2ut+u
2M
=
_
e
j2
2M
_
2ut+u
= e
j2(2ut+u)
2M
= e
j22ut
2M
e
j2u
2M
= e
j2ut
M
w
u
2M
= w
ut
M
w
u
2M
(2.269)
124 Image Processing: The Fundamentals
Example B2.47
If w
M
e
j2
M
, show that
w
u+M
M
= w
u
M
w
u+M
2M
= w
u
2M
(2.270)
where M and u are integers.
By denition:
w
u+M
M
= e
j2(u+M)
M
= e
j2u
M
e
j2M
M
= w
u
M
e
j2
= w
u
M
(2.271)
Also:
w
u+M
2M
= e
j2(u+M)
2M
= e
j2u
2M
e
j2M
2M
= w
u
2M
e
j
= w
u
2M
(2.272)
Box 2.8. What is the Fast Fourier Transform?
All the transforms we have dealt with so far are separable. This means that they may
be computed as two 1D transforms as opposed to one 2D transform. The discrete
Fourier transform in 2D may be computed as two discrete Fourier transforms in 1D,
using special algorithms which have been especially designed for speed and eciency.
Such algorithms are called Fast Fourier Transforms (FFT). We shall describe briey
here the Fast Fourier Transform algorithm called successive doubling. We shall work
in 1D. The discrete Fourier transform is dened as
f(u) =
1
N
N1
x=0
f(x)w
ux
N
(2.273)
where w
N
e
j2
N
. Assume now that N = 2
n
. Then we may write N as 2M and
substitute in (2.273):
f(u) =
1
2M
2M1
x=0
f(x)w
ux
2M
(2.274)
We may separate the odd and even values of the argument of f. Let us express that by
writing:
x 2y when x is even
x 2y + 1 when x is odd (2.275)
Discrete Fourier transform 125
Then:
f(u) =
1
2
_
1
M
M1
y=0
f(2y)w
u(2y)
2M
+
1
M
M1
y=0
f(2y + 1)w
u(2y+1)
2M
_
(2.276)
From example 2.46 we know that w
2uy
2M
= w
uy
M
and w
2uy+u
2M
= w
uy
M
w
u
2M
. Then:
f(u) =
1
2
_
1
M
M1
y=0
f(2y)w
uy
M
+
1
M
M1
y=0
f(2y + 1)w
uy
M
w
u
2M
_
(2.277)
We may write
f(u)
1
2
_
f
even
(u) +
f
odd
(u)w
u
2M
_
(2.278)
where we have dened
f
even
(u) to be the DFT of the even samples of function f and
f
odd
to be the DFT of the odd samples of function f:
f
even
(u)
1
M
M1
y=0
f(2y)w
uy
M
f
odd
(u)
1
M
M1
y=0
f(2y + 1)w
uy
M
(2.279)
Formula (2.278), however, denes
f(u) only for u < M because denitions (2.279) are
valid for 0 u < M, being the DFTs of M-sample long functions. We need to dene
f(u +M) =
1
2
_
1
M
M1
y=0
f(2y)w
uy+My
M
+
1
M
M1
y=0
f(2y + 1)w
uy+My
M
w
u+M
2M
_
(2.280)
Making use of equations (2.270) we obtain:
f(u +M) =
1
2
_
f
even
(u)
f
odd
(u)w
u
2M
_
(2.281)
We note that formulae (2.278) and (2.281) with denitions (2.279) fully dene
f(u).
Thus, an N point transform may be computed as two N/2 point transforms given by
equations (2.279). Then equations (2.278) and (2.281) may be used to calculate the
full transform. It can be shown that the number of operations required reduces from
being proportional to N
2
to being proportional to Nlog
2
N. This is another reason why
images with dimensions powers of 2 are preferred.
126 Image Processing: The Fundamentals
What are the advantages and disadvantages of DFT?
DFT oers a much richer representation of an image than the Walsh or the Haar transforms.
However, it achieves that at the expense of using complex numbers. So, although in theory
the approximation of an image in terms of Fourier coecients is more accurate than its
approximation in terms of Walsh transforms for a xed number of terms retained, the Fourier
coecients are complex numbers and each one requires twice as many bits to be represented
as a Walsh coecient. So, it is not fair to compare the error of the DFT with that of the
other transforms for a xed number of terms retained, but rather the error of the DFT for K
terms retained with that of the other transforms for 2K terms retained.
Another disadvantage of DFT is shared with the Walsh transform when these two trans-
forms are compared with the Haar transform: they are both global transforms as can be
inferred from the structure of the basis functions in terms of which they expand an image.
So, neither DFT nor the Walsh transform allows the preferential reconstruction of an image
at certain localities like Haar transform does. This situation, however, is mitigated by ap-
plying the DFT in local windows of the image. This leads to Gabor functions which are
extensively examined in Book II and are beyond the scope of this book.
Can we have a real valued DFT?
Yes, if the signal is real and symmetric, dened over a symmetric range of values.
Let us consider the DFT of a symmetric signal f(k) that is dened over a symmetric range
of indices. Let us say that this signal consists of N samples where N is even, so that we may
write N 2J.
We have to be careful now about the values the indices of this signal take when we use
them in the exponent of the kernel of the DFT. As the origin of the axes is in between the two
central samples of the signal, all samples are at half integer locations, starting from
1
2
and
going up to
_
J
1
2
_
. For example, in example 2.40, on page 117, we have a 2D image. One
line of it may be treated as a 1D signal with N = 4 and therefore J = 2, with the indices of
the available samples taking values 2, 1, 0, 1, corresponding to true coordinate locations
_
3
2
,
1
2
,
1
2
,
3
2
_
. It is these coordinate locations that will have to be used in the exponent of
the DFT kernel to weigh the corresponding sample values.
So, the DFT of signal f(k) will be given by:
F(m) =
1
2J
J1
k=J
f(k)e
j
2
2J
m(k+
1
2
)
=
1
2J
J1
k=J
f(k) cos
_
2m
2J
_
k +
1
2
__
j
1
2J
J1
k=J
f(k) sin
_
2m
2J
_
k +
1
2
__
. .
S
(2.282)
It can be shown that S = 0 (see example 2.48).
Discrete Fourier transform 127
Example B2.48
Show that the imaginary part of (2.282) is zero.
Let us split S into two parts, made up from the negative and non-negative indices:
S =
1
k=J
f(k) sin
_
2m
2J
_
k +
1
2
__
+
J1
k=0
f(k) sin
_
2m
2J
_
k +
1
2
__
(2.283)
In the rst sum on the right-hand side of the above equation, let us dene a new
summation variable k
k 1 k = k
then will be from J 1 to 0. As the order by which we sum does not matter, we may
exchange the lower with the upper limit. We shall then have:
S =
J1
=0
f(k
1) sin
_
2m
2J
_
k
1 +
1
2
__
+
J1
k=0
f(k) sin
_
2m
2J
_
k +
1
2
__
=
J1
=0
f(k
1) sin
_
2m
2J
_
k
1
2
__
+
J1
k=0
f(k) sin
_
2m
2J
_
k +
1
2
__
(2.284)
Function f(k) is symmetric, so the values at the negative indices [J, J +
1, . . . , 2, 1] are mirrored in the values of the non-negative indices [0, 1, . . . , J
2, J 1]. This means that f(k
1) = f(k
=0
f(k
) sin
_
2m
2J
_
k
+
1
2
__
+
J1
k=0
f(k) sin
_
2m
2J
_
k +
1
2
__
= 0 (2.285)
Example 2.49
Show that a real symmetric signal f(k) made up from an odd number of
samples 2J +1, dened over a symmetric range of indices, has a real DFT.
The signal is dened over indices [J, J + 1, . . . , 1, 0, 1, . . . , J 1, J]. Let us take
its DFT:
F(m) =
1
2J + 1
J
k=J
f(k)e
j2
mk
2J+1
(2.286)
128 Image Processing: The Fundamentals
We may separate the real and imaginary parts, to write:
F(m) =
1
2J + 1
J
k=J
f(k) cos
2mk
2J + 1
j
1
2J + 1
J
k=J
f(k) sin
2mk
2J + 1
. .
S
(2.287)
Let us concentrate on the imaginary part and let us split the sum into three terms, the
negative indices, the 0 index and the positive indices:
S =
1
k=J
f(k) sin
2mk
2J + 1
+f(0) sin0 +
J
k=1
f(k) sin
2mk
2J + 1
(2.288)
In the rst sum, we change summation variable from k to k
k k = k
. The
summation limits then become from J to 1, and since in summation the order of the
summands does not matter, we may say that the summation over k
runs from 1 to J:
S =
J
=1
f(k
) sin
_
2mk
2J + 1
_
+
J
k=1
f(k) sin
2mk
2J + 1
(2.289)
If f(k) is symmetric, f(k
) = f(k
=1
f(k
) sin
2mk
2J + 1
+
J
k=1
f(k) sin
2mk
2J + 1
(2.290)
As k
k=1
f(k) cos
2mk
2J + 1
_
(2.291)
Discrete Fourier transform 129
From example 2.49 we know that the DFT of such a signal is:
F(m) =
1
2J + 1
J
k=J
f(k) cos
2mk
2J + 1
. .
S
(2.292)
Let us split the sum into three terms, the negative indices, the 0 index and the positive
indices:
S =
1
k=J
f(k) cos
2mk
2J + 1
+f(0) cos 0 +
J
k=1
f(k) cos
2mk
2J + 1
(2.293)
In the rst sum we change summation variable from k to k
k k = k
. The
summation limits then become from J to 1, and since in summation the order of the
summands does not matter, we may say that the summation over k
runs from 1 to J:
S = f(0) +
J
=1
f(k
) cos
_
2mk
2J + 1
_
+
J
k=1
f(k) cos
2mk
2J + 1
(2.294)
If f(k) is symmetric, f(k
) = f(k
=1
f(k
) cos
2mk
2J + 1
+
J
k=1
f(k) cos
2mk
2J + 1
(2.295)
As k
k=0
f(k) cos
_
m
J
_
k +
1
2
__
(2.296)
According to equation (2.282) the DFT of such a signal is given by:
F(m) =
1
2J
J1
k=J
f(k) cos
_
2m
2J
_
k +
1
2
__
. .
S
(2.297)
130 Image Processing: The Fundamentals
Let us split the sum into two parts, made up from the negative and non-negative indices:
S =
1
k=J
f(k) cos
_
2m
2J
_
k +
1
2
__
+
J1
k=0
f(k) cos
_
2m
2J
_
k +
1
2
__
(2.298)
In the rst sum on the right-hand side of the above equation, let us dene a new
summation variable k
k 1 k = k
then will be from J 1 to 0. As the order by which we sum does not matter, we may
exchange the lower with the upper limit. We shall then have:
S =
J1
=0
f(k
1) cos
_
2m
2J
_
k
1 +
1
2
__
+
J1
k=0
f(k) cos
_
2m
2J
_
k +
1
2
__
=
J1
=0
f(k
1) cos
_
2m
2J
_
k
1
2
__
+
J1
k=0
f(k) cos
_
2m
2J
_
k +
1
2
__
(2.299)
Function f(k) is symmetric, so the values at the negative indices [J, J +
1, . . . , 2, 1] are mirrored in the values of the non-negative indices [0, 1, . . . , J
2, J 1]. This means that f(k
1) = f(k
1) = f(k
k=0
f(k) sin
_
m
J
_
k +
1
2
__
(2.300)
In the case of an antisymmetric signal dened over an odd set of indices 2J + 1, we have
f(k
) = f(k
) = f(k
), f(0) has to be 0 as no other number is equal to its opposite. Further, the two
sums in (2.290), instead of cancelling out, are identical. According to all these observations
then, the DFT of such a signal is given by:
F(m) = j
2
2J + 1
J
k=1
f(k) sin
2mk
2J + 1
(2.301)
Discrete Fourier transform 131
Example B2.52
A 2D function f(k, l) is dened for k taking integer values in the range
[M, M 1] and l taking integer values in the range [N, N 1], and it has
the following properties:
f(k1, l1) = f(k, l) f(k1, l) = f(k, l) f(k, l1) = f(k, l) (2.302)
Work out the DFT of this function.
Applying (2.282) to 2D, we obtain:
F(m, n) =
1
2M2N
M1
k=M
N1
l=N
f(k, l)e
j
2
2M
m(k+
1
2
)
e
j
2
2N
n(l+
1
2
)
=
1
4MN
M1
k=M
N1
l=N
f(k, l)e
j[
m
M
(k+
1
2
)+
n
N
(l+
1
2
)]
(2.303)
We split the negative from the non-negative indices in each fraction:
F(m, n) =
1
4MN
(2.304)
_
1
k=M
1
l=N
. .
A
1
+
1
k=M
N1
l=0
. .
A
2
+
M1
k=0
1
l=N
. .
A
3
+
M1
k=0
N1
l=0
. .
A
4
_
f(k, l)e
j[
m
M
(k+
1
2
)+
n
N
(l+
1
2
)]
We shall change variables of summation in A
1
to
k k 1 k =
k 1 and
l l 1 l =
k=0
N1
l=0
f(
k 1,
l 1)e
j[
m
M
(
k
1
2
)+
n
N
(
l
1
2
)]
(2.305)
Or:
132 Image Processing: The Fundamentals
A
1
=
M1
k=0
N1
l=0
f(
k,
l)
_
cos
_
m
M
_
k
1
2
_
+
n
N
_
l
1
2
__
j sin
_
m
M
_
k
1
2
_
+
n
N
_
l
1
2
___
=
M1
k=0
N1
l=0
f(
k,
l)
_
cos
_
m
M
_
k
1
2
__
cos
_
n
N
_
l
1
2
__
sin
_
m
M
_
k
1
2
__
sin
_
n
N
_
l
1
2
__
j sin
_
m
M
_
k
1
2
__
cos
_
n
N
_
l
1
2
__
j cos
_
m
M
_
k
1
2
__
sin
_
n
N
_
l
1
2
___
=
M1
k=0
N1
l=0
f(
k,
l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
__
+j sin
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+j cos
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.306)
Term A
4
may be written as:
A
4
=
M1
k=0
N1
l=0
f(k, l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
__
j sin
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
j cos
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.307)
We observe that A
1
+A
4
may be written as:
A
1
+A
4
=
M1
k=0
N1
l=0
2f(k, l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.308)
Discrete Fourier transform 133
Working in a similar way and changing variable of summation to
k k 1 k =
k 1 in term A
2
, we deduce:
A
2
=
M1
k=0
N1
l=0
f(
k 1, l)e
j[
m
M
(
k
1
2
)+
n
N
(l+
1
2
)]
=
M1
k=0
N1
l=0
f(
k, l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
__
+j sin
_
m
M
_
k
1
2
__
cos
_
n
N
_
l +
1
2
__
j cos
_
m
M
_
k
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.309)
Working in a similar way and changing variable of summation to
l l 1 l =
l 1 in term A
3
, we deduce:
A
3
=
M1
k=0
N1
l=0
f(k,
l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
__
j sin
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+j cos
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.310)
Sum A
2
+A
3
then is:
A
2
+A
3
=
M1
k=0
N1
l=0
2f(k, l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.311)
Combining the sums (2.308) and (2.311) into (2.304), we obtain:
F(m, n) =
1
MN
M1
k=0
N1
l=0
f(k, l) cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
(2.312)
134 Image Processing: The Fundamentals
Example B2.53
A 2D function f(k, l) is dened for k taking integer values in the range
[M, M1] and l taking integer values in the range [N, N 1], and has the
following properties:
f(k 1, l 1) = f(k, l) f(k 1, l) = f(k, l) f(k, l 1) = f(k, l)
(2.313)
Work out the DFT of this function.
This case is similar to that of example 2.52, except for the last two properties of
function f(k, l). The antisymmetry of the function in terms of each of its arguments
separately, does not aect the sum of terms A
1
+ A
4
of the DFT, given by (2.308).
However, because the function that appears in terms A
2
and A
3
changes sign with the
change of summation variable, the sum of terms A
2
and A
3
now has the opposite sign:
A
2
+A
3
=
M1
k=0
N1
l=0
2f(k, l)
_
cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
+sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
___
(2.314)
Combining sums (2.308) and (2.314) into (2.304), we obtain:
F(m, n) =
1
MN
M1
k=0
N1
l=0
f(k, l) sin
_
m
M
_
k +
1
2
__
sin
_
n
N
_
l +
1
2
__
(2.315)
Example B2.54
A 2D function f(k, l) is dened for k taking integer values in the range
[M, M] and l taking integer values in the range [N, N], and has the fol-
lowing properties:
f(k, l) = f(k, l) f(k, l) = f(k, l) f(k, l) = f(k, l) (2.316)
Work out the DFT of this function.
Applying equation (2.286) for 2D, we obtain:
F(m, n) =
1
(2M + 1)(2N + 1)
M
k=M
N
l=N
f(k, l)e
j
2mk
2M+1
e
j
2nl
2N+1
(2.317)
Discrete Fourier transform 135
We may separate the negative from the zero and the positive indices in the double sum:
F(m, n) =
1
(2M + 1)(2N + 1)
_
1
k=M
1
l=N
. .
A
1
+
0
k=0
1
l=N
. .
A
2
+
M
k=1
1
l=N
. .
A
3
+
1
k=M
0
l=0
. .
A
4
+
0
k=0
0
l=0
. .
A
5
+
M
k=1
0
l=0
. .
A
6
+
1
k=M
N
l=1
. .
A
7
+
0
k=0
N
l=1
. .
A
8
+
M
k=1
N
l=1
. .
A
9
f(k, l)e
j[
2mk
2M+1
+
2nl
2N+1
]
_
(2.318)
We shall use the identities cos(a + b) = cos a cos b sin a sin b and sin(a + b) =
sin a cos b + cos a sin b, and the fact that cos(a) = cos a and sin(a) = sin a, and
express the complex exponential in terms of trigonometric functions. Then:
A
9
=
M
k=1
N
l=1
f(k, l)
_
cos
2mk
2M + 1
cos
2nl
2N + 1
sin
2mk
2M + 1
sin
2nl
2N + 1
j sin
2mk
2M + 1
cos
2nl
2N + 1
j cos
2mk
2M + 1
sin
2nl
2N + 1
_
(2.319)
In A
1
, change summation variables to
k k k =
k and
l l l =
l. Also,
use the rst of properties (2.316):
A
1
=
M
k=1
N
l=1
f(
k,
l)
_
cos
2m
k
2M + 1
cos
2n
l
2N + 1
sin
2m
k
2M + 1
sin
2n
l
2N + 1
+j sin
2m
k
2M + 1
cos
2n
l
2N + 1
+j cos
2m
k
2M + 1
sin
2n
l
2N + 1
_
(2.320)
Then:
A
1
+A
9
=2
M
k=1
N
l=1
f(k, l)
_
cos
2mk
2M + 1
cos
2nl
2N + 1
sin
2mk
2M + 1
sin
2nl
2N + 1
_
(2.321)
We observe that:
A
8
=
N
l=1
f(0, l)
_
cos
2nl
2N + 1
j sin
2nl
2N + 1
_
(2.322)
In A
2
we set k = 0 and
l l l =
l:
A
2
=
N
l=1
f(0,
l)
_
cos
2n
l
2N + 1
+j sin
2n
l
2N + 1
_
(2.323)
136 Image Processing: The Fundamentals
Then:
A
2
+A
8
= 2
N
l=1
f(0, l) cos
2nl
2N + 1
(2.324)
In A
3
, we set
l l l =
l) = f(k,
l):
A
3
=
M
k=1
N
l=1
f(k,
l)
_
cos
2mk
2M + 1
cos
2n
l
2N + 1
+ sin
2mk
2M + 1
sin
2n
l
2N + 1
j sin
2mk
2M + 1
cos
2n
l
2N + 1
+j cos
2mk
2M + 1
sin
2n
l
2N + 1
_
(2.325)
In A
7
, we set
k k k =
k, l) = f(
k, l):
A
7
=
M
k=1
N
l=1
f(
k, l)
_
cos
2m
k
2M + 1
cos
2nl
2N + 1
+ sin
2m
k
2M + 1
sin
2nl
2N + 1
+j sin
2m
k
2M + 1
cos
2nl
2N + 1
j cos
2m
k
2M + 1
sin
2nl
2N + 1
_
(2.326)
Then:
A
3
+A
7
= 2
M
k=1
N
l=1
f(k, l)
_
cos
2mk
2M + 1
cos
2nl
2N + 1
+ sin
2mk
2M + 1
sin
2nl
2N + 1
_
(2.327)
We observe that:
A
6
=
M
k=1
f(k, 0)
_
cos
2mk
2M + 1
j sin
2mk
2M + 1
_
(2.328)
In A
4
, we set
k k k =
k, 0) = f(
k, 0):
A
4
=
M
k=1
f(
k, 0)
_
cos
2m
k
2M + 1
+j sin
2m
k
2M + 1
_
(2.329)
Then:
A
4
+A
6
= 2
M
k=1
f(k, 0) cos
2mk
2M + 1
(2.330)
Finally, we observe that A
5
= f(0, 0).
Putting all these together, we deduce that:
Discrete Fourier transform 137
F(m, n) =
1
(2M + 1)(2N + 1)
_
4
M
k=1
N
l=1
f(k, l) cos
2mk
2M + 1
cos
2nl
2N + 1
+2
M
k=1
f(k, 0) cos
2mk
2M + 1
+ 2
N
l=1
f(0, l) +f(0, 0)
_
(2.331)
Example B2.55
A 2D function f(k, l) is dened for k taking integer values in the range
[M, M] and l taking integer values in the range [N, N], and has the fol-
lowing properties:
f(k, l) = f(k, l)
f(k, l) = f(k, l)
f(k, l) = f(k, l)
f(0, l) = f(k, 0) = f(0, 0) = 0 (2.332)
Work out the DFT of this function.
We work as we did in example 2.54. However, due to the dierent properties of the
function, now terms A
2
= A
4
= A
5
= A
6
= A
8
= 0. Further, sum A
3
+ A
7
now has
the opposite sign. This results to:
F(m, n) = 4
1
(2M + 1)(2N + 1)
M
k=1
N
l=1
f(k, l) sin
2mk
2M + 1
sin
2nl
2N + 1
(2.333)
Can an image have a purely real or a purely imaginary valued DFT?
In general no. The image has to be symmetric about both axes in order to have a real valued
DFT (see example 2.40) and antisymmetric about both axes in order to have an imaginary
valued DFT. In general this is not the case. However, we may double the size of the image
by reecting it about its axes in order to form a symmetric or an antisymmetric image four
times the size of the original image. We may then take the DFT of the enlarged image
which will be guaranteed to be real or imaginary, accordingly. This will result in the so
called even symmetric discrete cosine transform, or the odd symmetric discrete
cosine transform, or the even antisymmetric discrete sine transform or the odd
antisymmetric discrete sine transform.
138 Image Processing: The Fundamentals
2.4 The even symmetric discrete cosine
transform (EDCT)
What is the even symmetric discrete cosine transform?
Assume that we have an M N image f and reect it about its left and top border so that
we have a 2M 2N image. The DFT of the 2M 2N image will be real (see example 2.52,
on page 131) and given by:
f
ec
(m, n)
1
MN
M1
k=0
N1
l=0
f(k, l) cos
_
m
M
_
k +
1
2
__
cos
_
n
N
_
l +
1
2
__
(2.334)
This is the even symmetric cosine transform (EDCT) of the original image.
Example 2.56
Compute matrix U
ec
appropriate for multiplying from left and right an 88
image with the origin of the axes in its centre, in order to obtain its DFT.
When the origin of the axis of a 1D signal is in the middle of the signal, the kernel of
the DFT is
1
2J
e
j
J
m(k+
1
2
)
(2.335)
where m is the frequency index taking integer values from 0 to 7 and k is the signal
index, taking integer values from 4 to +3 (see equation (2.282)). As J is half the
size of the signal, here J = 4. Matrix U
ec
is not symmetric in its arguments, so the
general DFT transform of an image g will be U
ec
gU
T
ec
, instead of UGU we had for the
case of the DFT using matrix U of equation (2.190), on page 100.
We may then construct matrix U
ec
by allowing k to take all its possible values along
each row and m to take all its possible values along each column. Then matrix 8U
ec
is:
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 . . . 1
e
j
4
(
7
2
)
e
j
4
(
5
2
)
e
j
4
(
3
2
)
e
j
4
(
1
2
)
e
j
4
1
2
. . . e
j
4
7
2
e
j
4
2(
7
2
)
e
j
4
2(
5
2
)
e
j
4
2(
3
2
)
e
j
4
2(
1
2
)
e
j
4
2
1
2
. . . e
j
4
2
7
2
e
j
4
3(
7
2
)
e
j
4
3(
5
2
)
e
j
4
3(
3
2
)
e
j
4
3(
1
2
)
e
j
4
3
1
2
. . . e
j
4
3
7
2
e
j
4
4(
7
2
)
e
j
4
4(
5
2
)
e
j
4
4(
3
2
)
e
j
4
4(
1
2
)
e
j
4
4
1
2
. . . e
j
4
4
7
2
e
j
4
5(
7
2
)
e
j
4
5(
5
2
)
e
j
4
5(
3
2
)
e
j
4
5(
1
2
)
e
j
4
5
1
2
. . . e
j
4
5
7
2
e
j
4
6(
7
2
)
e
j
4
6(
5
2
)
e
j
4
6(
3
2
)
e
j
4
6(
1
2
)
e
j
4
6
1
2
. . . e
j
4
6
7
2
e
j
4
7(
7
2
)
e
j
4
7(
5
2
)
e
j
4
7(
3
2
)
e
j
4
7(
1
2
)
e
j
4
7
1
2
. . . e
j
4
7
7
2
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(2.336)
Even symmetric cosine transform 139
After simplication, this matrix becomes:
U
ec
=
1
8
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1 1
e
j
7
8
e
j
5
8
e
j
3
8
e
j
8
e
j
8
e
j
3
8
e
j
5
8
e
j
7
8
e
j
7
4
e
j
5
4
e
j
3
4
e
j
4
e
j
4
e
j
3
4
e
j
5
4
e
j
7
4
e
j
5
8
e
j
15
8
e
j
9
8
e
j
3
8
e
j
3
8
e
j
9
8
e
j
15
8
e
j
5
8
e
j
7
2
e
j
5
2
e
j
3
2
e
j
2
e
j
2
e
j
3
2
e
j
5
2
e
j
7
2
e
j
3
8
e
j
9
8
e
j
15
8
e
j
5
8
e
j
5
8
e
j
15
8
e
j
9
8
e
j
3
8
e
j
5
4
e
j
7
4
e
j
4
e
j
3
4
e
j
3
4
e
j
4
e
j
7
4
e
j
5
4
e
j
8
e
j
3
8
e
j
5
8
e
j
7
8
e
j
7
8
e
j
5
8
e
j
3
8
e
j
8
_
_
_
_
_
_
_
_
_
_
_
_
_
(2.337)
Example 2.57
Compute the even symmetric cosine transform of image
g =
_
_
_
_
1 2 0 1
1 0 0 0
0 0 2 2
1 2 2 0
_
_
_
_
(2.338)
by taking the DFT of the corresponding image of size 8 8.
We start by creating rst the corresponding large image of size 8 8:
g =
_
_
_
_
_
_
_
_
_
_
_
_
0 2 2 1 1 2 2 0
2 2 0 0 0 0 2 2
0 0 0 1 1 0 0 0
1 0 2 1 1 2 0 1
1 0 2 1 1 2 0 1
0 0 0 1 1 0 0 0
2 2 0 0 0 0 2 2
0 2 2 1 1 2 2 0
_
_
_
_
_
_
_
_
_
_
_
_
(2.339)
To take the DFT of this image we multiply it from the left with matrix (2.337) and
from the right with the transpose of the same matrix. The result is:
140 Image Processing: The Fundamentals
G
ec
=
_
_
_
_
_
_
_
_
_
_
_
_
0.875 0 0.088 0 0 0 0.088 0
0.129 0.075 0.139 0.146 0 0.146 0.139 0.075
0.177 0.149 0.125 0.129 0 0.129 0.125 0.149
0.149 0.208 0.010 0.013 0 0.013 0.010 0.208
0 0 0 0 0 0 0 0
0.149 0.208 0.010 0.013 0 0.013 0.010 0.208
0.177 0.149 0.125 0.129 0 0.129 0.125 0.149
0.129 0.075 0.139 0.146 0 0.146 0.139 0.075
_
_
_
_
_
_
_
_
_
_
_
_
(2.340)
Example 2.58
Compute the (1, 2) element of the even symmetric cosine transform of im-
age (2.338) by using formula (2.334). Compare your answer with that of
example 2.57.
Applying the formula for m = 1, n = 2 and M = N = 4, we obtain:
g
ec
(1, 2) =
1
16
3
k=0
3
l=0
g(k, l) cos
_
4
_
k +
1
2
__
cos
_
2
4
_
l +
1
2
__
(2.341)
=
1
16
_
g(0, 0) cos
8
cos
4
+g(0, 1) cos
8
cos
3
4
+g(0, 3) cos
8
cos
7
4
+
g(1, 0) cos
3
8
cos
4
+g(2, 2) cos
5
8
cos
5
4
+g(2, 3) cos
5
8
cos
7
4
+
g(3, 0) cos
7
8
cos
4
+g(3, 1) cos
7
8
cos
3
4
+g(3, 2) cos
7
8
cos
5
4
_
Here we omitted terms for which g(k, l) = 0. Substituting the values of g(k, l) in
(2.341) and doing the calculation, we deduce that g
ec
(1, 2) = 0.139402656. We observe
from (2.340) that
G
ec
(1, 2) = 0.139, so the two values agree.
Example B2.59
The even symmetric cosine transform of an M-sample long signal f(k) is
dened as:
f
ec
(m)
1
M
M1
k=0
f(k) cos
m(2k + 1)
2M
(2.342)
Even symmetric cosine transform 141
Identify the period of
f
ec
(m).
The period of a function is the smallest number X for which
f
ec
(m + X) =
f
ec
(m),
for all m. If X +, the function is not periodic.
Using denition (2.342), we have:
f
ec
(m+X) =
1
M
M1
k=0
f(k) cos
(m+X)(2k + 1)
2M
=
1
M
M1
k=0
f(k) cos
_
_
_
_
m(2k + 1)
2M
. .
+
X(2k + 1)
2M
_
_
_
_
(2.343)
In order to have
f
ec
(m+X) =
f
ec
(m), we must have
cos
_
+
X(2k + 1)
2M
_
= cos (2.344)
This is only true if X(2k + 1)/(2M) is an integer multiple of 2. The rst number
for which this is guaranteed is for X = 4M. So,
f
ec
(m) is periodic with period 4M.
Example B2.60
You are given a 5-sample long signal with the following values: f(0) = 0,
f(1) = 1, f(2) = 2, f(3) = 3 and f(4) = 4. Compute its EDCT
f
ec
(m) and plot
both the extended signal and its EDCT for 60 consecutive samples.
To compute the EDCT of the data we apply formula (2.342) for M = 5. According to
example 2.59,
f
ec
(m) is periodic with period 4M = 20. So, we work out its values for
m = 0, 1, 2, . . . , 19. The values of
f
ec
(m) for one period are:
(1, 0, 0.09, 0, 0, 0, 0.09, 0, 1, 2, 1, 0, 0.09, 0, 0, 0, 0.09, 0, 1, 2) (2.345)
The extended version of the given signal is a 10-sample long signal formed by reecting
the original signal about its origin. The added samples are: f(5) = 4, f(4) = 3,
f(3) = 2, f(2) = 1 and f(1) = 0. So, the DFT of signal (4, 3, 2, 1, 0, 0, 1, 2, 3, 4)
is the EDCT of the original signal. The DFT sees the extended signal repeated ad
innitum. Figure 2.18 shows on the left the plot of 60 samples of this signal, and on
the right three periods of the EDCT of the original data.
142 Image Processing: The Fundamentals
0 20 40 60
0
1
2
3
4
0 20 40 60
2
1
0
1
2
Figure 2.18: On the left, 60 consecutive samples of the extended signal seen by the
DFT. On the right, the EDCT of the original 5-sample long signal also for 60 consec-
utive samples.
Example B2.61
Use denition (2.342) to show that
f
ec
(m) =
f
ec
(m).
By applying denition (2.342), we may write:
f
ec
(m) =
1
M
M1
k=0
f(k) cos
(m)(2k + 1)
2M
=
1
M
M1
k=0
f(k) cos
m(2k + 1)
2M
=
f
ec
(m) (2.346)
Example B2.62
If t is an integer, show that:
M1
m=M
e
j
tm
M
= 2M(t) (2.347)
Even symmetric cosine transform 143
We dene a new variable of summation m m+M m = mM. Then:
M1
m=M
e
j
tm
M
=
2M1
m=0
e
j
t( mM)
M
(2.348)
We observe that e
j
t( mM)
M
= e
j
t m
M
e
jt
. Since e
jt
= cos(t) j sin(t) = (1)
t
,
we may write:
M1
m=M
e
j
tm
M
= (1)
t
2M1
m=0
e
j
t m
M
(2.349)
The sum on the right-hand side of the above equation is a geometric progression with
rst term 1 and ratio q e
j
t
M
. We apply formula (2.165), on page 95, to compute
the sum of the rst 2M terms of it, when q ,= 1, ie when t ,= 0, and obtain:
M1
m=M
e
j
tm
M
= (1)
t
_
e
j
t
M
_
2M
1
e
j
t
M
1
= (1)
t
e
j2t
1
e
j
t
M
1
= 0 (2.350)
This is because e
j2t
= cos(2t) +j sin(2t) = 1.
If t = 0, all terms in the sum on the left-hand side of (2.347) are equal to 1, so the
sum is equal to 2M. This completes the proof of (2.347).
Box 2.9. Derivation of the inverse 1D even discrete cosine transform
The 1D EDCT is dened by (2.342). Let us dene f(k 1) f(k) for all values of
k = 0, 1, . . . , M 1. We also note that:
cos
m[2(k 1) + 1]
2M
= cos
m(2k 1)
2M
= cos
m(2k + 1)
2M
= cos
m(2k + 1)
2M
(2.351)
Then:
1
k=M
f(k) cos
m(2k + 1)
2M
=
M1
k=0
f(k) cos
m(2k + 1)
2M
(2.352)
We can see that easily by changing variable of summation in the sum on the left-hand
side from k to
k k 1. The limits of summation will become from M 1 to 0 and
the summand will not change, as f(
k 1) = f(
f
ec
(m)
1
2M
M1
k=M
f(k) cos
m(2k + 1)
2M
(2.353)
144 Image Processing: The Fundamentals
To derive the inverse transform we must solve this equation for f(k). To achieve this,
we multiply both sides of the equation with cos
m(2p+1)
2M
and sum over m from M to
M 1:
M1
m=M
f
ec
(m) cos
m(2p + 1)
2M
. .
S
=
1
2M
M1
m=M
M1
k=M
f(k) cos
m(2k + 1)
2M
cos
m(2p + 1)
2M
(2.354)
On the right-hand side we replace the trigonometric functions by using formula cos
_
e
j
+e
j
_
/2, where is real. We also exchange the order of summations, observing
that summation over m applies only to the kernel functions:
S =
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(2k+1)
2M
+e
j
m(2k+1)
2M
_ _
e
j
m(2p+1)
2M
+e
j
m(2p+1)
2M
_
=
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(2k+2p+2)
2M
+e
j
m(2k2p)
2M
+e
j
m(2k+2p)
2M
+e
j
m(2k2p2)
2M
_
=
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(k+p+1)
M
+e
j
m(kp)
M
+e
j
m(k+p)
M
+e
j
m(kp1)
M
_
(2.355)
To compute the sums over m, we make use of (2.347):
S =
1
8M
M1
k=M
f(k) [2M(k +p + 1) + 2M(k p)
+2M(k +p) + 2M(k p 1)]
=
1
4
M1
k=M
f(k) [(k +p + 1) +(k p) +(k +p) +(k p 1)]
=
1
4
M1
k=M
f(k)[2(k +p + 1) + 2(k p)]
=
1
2
M1
k=M
f(k)[(k +p + 1) +(k p)] (2.356)
We used here the property of the delta function that (x) = (x). We note that, from
all the terms in the sum, only two will survive, namely the one for k = p 1 and the
one for k = p. Given that we dened f(k 1) = f(k), both these terms will be equal,
Even symmetric cosine transform 145
ie f(p 1) = f(p), and so we shall have that S = f(p). This allows us to write the
1D inverse EDCT as:
f(p) =
M1
m=M
f
ec
(m) cos
m(2p + 1)
2M
(2.357)
We split the negative from the non-negative indices in the above sum:
f(p) =
1
m=M
f
ec
(m) cos
m(2p + 1)
2M
. .
S
+
M1
m=0
f
ec
(m) cos
m(2p + 1)
2M
(2.358)
In the rst sum we change variable of summation from m to m m m = m. The
summation limits over m are then from M to 1, or from 1 to M:
S =
M
m=1
f
ec
( m) cos
( m)(2p + 1)
2M
=
M
m=1
f
ec
( m) cos
m(2p + 1)
2M
(2.359)
Here we made use of the result of example 2.61. Using (2.359) in (2.358), we may write:
f(p) =
f
ec
(0) cos
0(2p + 1)
2M
+
f
ec
(M) cos
M(2p + 1)
2M
+ 2
M1
m=1
f
ec
(m) cos
m(2p + 1)
2M
(2.360)
We note that
cos
M(2p + 1)
2M
= cos
(2p + 1)
2
= 0 (2.361)
since the cosine of an odd multiple of /2 is always 0. Finally, we may write for the
inverse 1D EDCT
f(p) =
f
ec
(0) + 2
M1
m=1
f
ec
(m) cos
m(2p + 1)
2M
=
M1
m=0
C(m)
f
ec
(m) cos
m(2p + 1)
2M
(2.362)
where C(0) = 1 and C(m) = 2 for m = 1, 2, . . . , M 1.
What is the inverse 2D even cosine transform?
The inverse of equation (2.334) is
f(k, l) =
M1
m=0
N1
n=0
C(m)C(n)
f
ec
(m, n) cos
m(2k + 1)
2M
cos
n(2l + 1)
2N
(2.363)
where C(0) = 1 and C(m) = C(n) = 2 for m, n ,= 0.
146 Image Processing: The Fundamentals
What are the basis images in terms of which the even cosine transform expands
an image?
In equation (2.363), we may view function
T
m
(k) C(m) cos
m(2k + 1)
2M
(2.364)
as a function of k with parameter m. Then the basis functions in terms of which an M N
image is expanded are the vector outer products of vector functions T
m
(k)T
T
n
(l), where k =
0, . . . , M 1 and l = 0, . . . , N 1. For xed (m, n) this vector outer product creates an
elementary image of size MN. Coecient
f
ec
(m, n) in (2.363) tells us the degree to which
this elementary image is present in the original image f(k, l).
0 1 2 3 4 5 6 7
0
1
2
3
4
5
6
7
Figure 2.19: The basis images in terms of which any 8 8 image is expanded by EDCT.
The numbers on the left and at the top indicate the indices of the T
m
(k) functions, the outer
product of which resulted in the corresponding basis image. For example, the image in line 3
and column 0 corresponds to T
3
T
T
0
, where the elements of these vectors are given by (2.364)
for k = 0, 1, . . . , 7.
Figure 2.19 shows the elementary images in terms of which any 8 8 image is expanded
by the EDCT. These images have been produced by setting M = 8 in (2.364) and allowing
parameter m to take values 0, . . . , 7. For every value of m we have a dierent function T
m
(k).
Each one of these functions is then sampled at values of k = 0, . . . , 7 to form an 8 1 vector.
The plots of these eight functions are shown in gure 2.20.
Even symmetric cosine transform 147
7 5 3 1
0
k
2
1
1
2
0
T (k)
7 5 3 1
0
k
2
1
1
2
T (k)
1
7 5 3 1
0
k
2
1
1
2
T (k)
2
7 5 3 1
0
k
2
1
1
2
T (k)
3
7 5 3 1
0
k
2
1
1
2
T (k)
4
7 5 3 1
0
k
2
1
1
2
T (k)
5
7 5 3 1
0
k
2
1
1
2
T (k)
6
7 5 3 1
0
k
2
1
1
2
T (k)
7
Figure 2.20: These plots are the digitised versions of T
m
(k) for m = 0, 1, . . . , 7, from top left
to bottom right, respectively. Continuous valued functions T
m
(k) dened by equation (2.364)
are sampled for integer values of k to form vectors. The outer product of these vectors in all
possible combinations form the basis images of gure 2.19. In these plots the values of the
functions at non-integer arguments are rounded to the nearest integer.
Figure 2.19 shows along the left and at the top which function T
m
(k) (identied by index
m) was multiplied with which other function to create the corresponding elementary image.
Each one of these elementary images is then scaled individually so that its grey values range
from 1 to 255.
148 Image Processing: The Fundamentals
Example 2.63
Take the EDCT transform of image (2.103), on page 69, and show the
various approximations of it by reconstructing it using only the rst 1, 4,
9, etc elementary images in terms of which it is expanded.
The eight images shown in gure 2.21 are the reconstructed images when for the recon-
struction the basis images created from the rst one, two,. . ., eight functions T
m
(k)
are used. For example, gure 2.21f has been reconstructed from the inverse EDCT
transform, by setting to 0 all elements of the transformation matrix that multiply the
basis images in the bottom two rows and the two right-most columns in gure 2.19.
These omitted basis images are those that are created from functions T
6
(k) and T
7
(k).
(a) (b) (c) (d)
(e) (f ) (g) (h)
Figure 2.21: Approximate reconstructions of the ower image, by keeping only the
coecients that multiply the basis images produced by the outer products of the T
m
T
T
n
vectors dened by equation (2.364), for all possible combinations of m and n, when m
and n are allowed to take the value of 0 only, values 0, 1, values 0, 1, 2, etc, from
top left to bottom right, respectively. In these reconstructions, values smaller than 0
and larger than 255 were truncated to 0 and 255, respectively, for displaying purposes.
The sum of the square errors for each reconstructed image is as follows.
Square error for image 2.21a: 366394
Square error for image 2.21b: 338683
Square error for image 2.21c: 216608
Square error for image 2.21d: 173305
Square error for image 2.21e: 104094
Square error for image 2.21f: 49179
Square error for image 2.21g: 35662
Square error for image 2.21h: 0
Odd symmetric cosine transform 149
2.5 The odd symmetric discrete cosine
transform (ODCT)
What is the odd symmetric discrete cosine transform?
Assume that we have an MN image f and reect it about its left-most column and about its
topmost row so that we have a (2M1)(2N1) image. The DFT of the (2M1)(2N1)
image will be real (see example 2.54) and given by:
f
oc
(m, n)
1
(2M 1)(2N 1)
_
f(0, 0) + 4
M1
k=1
N1
l=1
f(k, l) cos
2mk
2M 1
cos
2nl
2N 1
+
2
M1
k=1
f(k, 0) cos
2mk
2M 1
+ 2
N1
l=1
f(0, l) cos
2nl
2N 1
_
(2.365)
In a more concise way, this may be written as
f
oc
(m, n)
1
(2M 1)(2N 1)
M1
k=0
N1
l=0
C(k)C(l)f(k, l) cos
2mk
2M 1
cos
2nl
2N 1
(2.366)
where C(0) = 1 and C(k) = C(l) = 2 for k, l ,= 0. This is the odd symmetric discrete
cosine transform (ODCT) of the original image.
Example 2.64
Compute the odd symmetric cosine transform of image
g =
_
_
_
_
1 2 0 1
1 0 0 0
0 0 2 2
1 2 2 0
_
_
_
_
(2.367)
by taking the DFT of the corresponding image of size 7 7.
We start by creating rst the corresponding large image of size 7 7:
g =
_
_
_
_
_
_
_
_
_
_
0 2 2 1 2 2 0
2 2 0 0 0 2 2
0 0 0 1 0 0 0
1 0 2 1 2 0 1
0 0 0 1 0 0 0
2 2 0 0 0 2 2
0 2 2 1 2 2 0
_
_
_
_
_
_
_
_
_
_
(2.368)
150 Image Processing: The Fundamentals
To take the DFT of this image we have to multiply it from left and right with the appro-
priate matrix U for images of these dimensions. We create this matrix using denition
(2.286). Here J = 3 and the elements of matrix U are given by
1
7
e
j2mk/7
, where
k takes values 3, 2, 1, 0, 1, 2, 3 along each row and m takes values 0, 1, 2, 3, 4, 5, 6
along each column.
U
oc
=
1
7
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1
e
j
6
7
e
j
4
7
e
j
2
7
1 e
j
2
7
e
j
4
7
e
j
6
7
e
j
12
7
e
j
8
7
e
j
4
7
1 e
j
4
7
e
j
8
7
e
j
12
7
e
j
4
7
e
j
12
7
e
j
6
7
1 e
j
6
7
e
j
12
7
e
j
4
7
e
j
10
7
e
j
2
7
e
j
8
7
1 e
j
8
7
e
j
2
7
e
j
10
7
e
j
2
7
e
j
6
7
e
j
10
7
1 e
j
10
7
e
j
6
7
e
j
2
7
e
j
8
7
e
j
10
7
e
j
12
7
1 e
j
12
7
e
j
10
7
e
j
8
7
_
_
_
_
_
_
_
_
_
_
_
(2.369)
We use this matrix to compute U gU
T
. We keep separate the real and imaginary parts.
We observe that the imaginary part turns out to be 0.
G
oc
=
_
_
_
_
_
_
_
_
_
_
0.878 0.002 0.119 0.040 0.040 0.119 0.002
0.235 0.005 0.192 0.047 0.047 0.192 0.005
0.069 0.257 0.029 0.133 0.133 0.029 0.257
0.228 0.140 0.006 0.057 0.057 0.006 0.140
0.228 0.140 0.006 0.057 0.057 0.006 0.140
0.069 0.257 0.029 0.133 0.133 0.029 0.257
0.235 0.005 0.192 0.047 0.047 0.192 0.005
_
_
_
_
_
_
_
_
_
_
(2.370)
Example 2.65
Compute the (1, 2) element of the odd symmetric cosine transform of im-
age (2.367) by using formula (2.365). Compare your answer with that of
example 2.64.
Applying the formula for m = 1, n = 2 and M = N = 4, we obtain
g
oc
(1, 2)
1
49
3
k=0
3
l=0
C(k)C(l)g(k, l) cos
2k
7
cos
22l
7
(2.371)
where C(0) = 1 and C(k) = C(l) = 2 for k, l ,= 0. Expanding the sums and keeping
only the nonzero elements, we deduce:
Odd symmetric cosine transform 151
g
oc
(1, 2) =
1
49
_
g(0, 0) + 2g(0, 1) cos
4
7
+ 2g(0, 3) cos
12
7
+ 2g(1, 0) cos
2
7
+
4g(2, 2) cos
4
7
cos
8
7
+ 4g(2, 3) cos
4
7
cos
12
7
+ 2g(3, 0) cos
6
7
+
4g(3, 1) cos
6
7
cos
4
7
+ 4g(3, 2) cos
6
7
cos
8
7
_
(2.372)
Substituting the values of g(k, l) and performing the calculation, we deduce that
g
oc
(1, 2) = 0.191709.
We see from (2.370) that the value of
G
oc
(1, 2) = 0.192.
Example B2.66
The odd symmetric cosine transform of an M-sample long signal f(k) is
dened as
f
oc
(m)
1
2M 1
M1
k=0
C(k)f(k) cos
2mk
2M 1
(2.373)
where C(0) = 1 and C(k) = 2 for k ,= 0. Identify the period of
f
oc
(m).
The period of a function is the smallest number X for which
f
oc
(m + X) =
f
oc
(m),
for all m.
Using denition (2.373), we have:
f
oc
(m+X) =
1
2M 1
M1
k=0
C(k)f(k) cos
2(m+X)k
2M 1
=
1
2M 1
M1
k=0
C(k)f(k) cos
_
2mk
2M 1
. .
+
2Xk
2M 1
_
(2.374)
In order to have
f
oc
(m+X) =
f
oc
(m), we must have:
cos
_
+
2Xk
2M 1
_
= cos (2.375)
This is only true if 2Xk/(2M 1) is an integer multiple of 2. The rst number
for which this is guaranteed is for X = 2M 1. So,
f
oc
(m) is periodic with period
2M 1.
152 Image Processing: The Fundamentals
Example B2.67
Show that
M1
m=M+1
e
j
2tm
2M1
= (2M 1)(t) (2.376)
for t integer.
We dene a new summation variable m m + M 1 m = mM + 1. Then we
have
M1
m=M+1
e
j
2tm
2M1
=
2M2
m=0
e
j
2t( mM+1)
2M1
= e
j
2t(M+1)
2M1
2M2
m=0
e
j
2t m
2M1
= (2M 1)(t)
(2.377)
where we made use of (2.164), on page 95, with S = 2M 1 and the fact that for
t = 0, e
j
2t(M+1)
2M1
= 1.
Box 2.10. Derivation of the inverse 1D odd discrete cosine transform
The 1D ODCT is dened by (2.373). Let us dene f(k) f(k) for values of k =
1, . . . , M 1. As the cosine function is an even function with respect to its argument
k, we may rewrite denition (2.373) as:
f
oc
(m) =
1
2M 1
_
f(0) + 2
M1
k=1
f(k) cos
2mk
2M 1
_
=
1
2M 1
M1
k=M+1
f(k) cos
2mk
2M 1
(2.378)
To derive the inverse transform we must solve this equation for f(k). To achieve this,
we multiply both sides of the equation with cos
2mp
2M1
and sum over m from M +1 to
M 1:
M1
m=M+1
f
oc
(m) cos
2mp
2M 1
. .
S
=
1
2M 1
M1
m=M+1
M1
k=M+1
f(k) cos
2mk
2M 1
cos
2mp
2M 1
(2.379)
On the right-hand side we replace the trigonometric functions by using formula cos
_
e
j
+e
j
_
/2, where is real. We also exchange the order of summations, observing
that summation over m applies only to the kernel functions:
Odd symmetric cosine transform 153
S =
1
4(2M 1)
M1
k=M+1
f(k)
M1
m=M+1
_
e
j
2mk
2M1
+e
j
2mk
2M1
_ _
e
j
2mp
2M1
+e
j
2mp
2M1
_
=
1
4(2M 1)
M1
k=M+1
f(k)
M1
m=M+1
_
e
j
2m(k+p)
2M1
+e
j
2m(kp)
2M1
+e
j
2m(k+p)
2M1
+e
j
2m(kp)
2M1
_
(2.380)
To compute the sums over m, we make use of (2.376):
S =
1
4(2M 1)
M1
k=M+1
f(k) [(2M 1)(k +p) + (2M 1)(k p)
+(2M 1)(k +p) + (2M 1)(k p)]
=
1
4
M1
k=M+1
f(k) [(k +p) +(k p) +(k +p) +(k p)]
=
1
4
M1
k=M+1
f(k)[2(k +p) + 2(k p)]
=
1
2
M1
k=M+1
f(k)[(k +p) +(k p)] (2.381)
We used here the property of the delta function that (x) = (x). We note that, from
all the terms in the sum, only two will survive, namely the one for k = p and the one
for k = p. Given that we dened f(k) = f(k), both these terms will be equal and so
we shall have S = f(p). This allows us to write the 1D inverse ODCT as:
f(p) =
M1
m=M+1
f
oc
(m) cos
2mp
2M 1
(2.382)
From denition (2.373) it is obvious that
f
oc
(m) =
f
oc
(m). The cosine function is
also an even function of m, so we may write
f(p) =
f
oc
(0) + 2
M1
m=1
f
oc
(m) cos
2mp
2M 1
(2.383)
or, in a more concise way,
f(p) =
M1
m=0
C(m)
f
oc
(m) cos
2mp
2M 1
(2.384)
where C(0) = 1 and C(m) = 2 for m ,= 0.
154 Image Processing: The Fundamentals
What is the inverse 2D odd discrete cosine transform?
The inverse of equation (2.365) is
f(k, l) =
M1
m=0
N1
n=0
C(m)C(n)
f
oc
(m, n) cos
2mk
2M 1
cos
2nl
2N 1
(2.385)
where C(0) = 1 and C(m) = C(n) = 2 for m, n ,= 0.
What are the basis images in terms of which the odd discrete cosine transform
expands an image?
In equation (2.385), we may view function
U
m
(k) C(m) cos
2mk
2M 1
(2.386)
as a function of k with parameter m. Then the basis functions, in terms of which an M N
image is expanded, are the vector outer products of vector functions U
m
(k)U
T
n
(l), where
k = 0, . . . , M 1 and l = 0, . . . , N 1. For xed (m, n), each such vector outer product
creates an elementary image of size M N. Coecient
f
oc
(m, n) in (2.385) tells us the
degree to which this elementary image is present in the original image f(k, l).
0 1 2 3 4 5 6 7
0
1
2
3
4
5
6
7
Figure 2.22: The basis images in terms of which ODCT expands any 8 8 image. The
numbers on the left and at the top are the indices n and m, respectively, of the functions
dened by (2.386), the vector outer product of which, U
m
U
T
n
, is the corresponding elementary
image.
Odd symmetric cosine transform 155
Figure 2.22 shows the elementary images in terms of which any 8 8 image is expanded
by the ODCT. These images have been produced by setting M = 8 in (2.386) and allowing
parameter m to take values 0, . . . , 7. For every value of m we have a dierent function U
m
(k).
Each one of these functions is then sampled at values of k = 0, . . . , 7 to form an 8 1 vector.
The plots of these eight functions are shown in gure 2.23.
7 5 3 1
0
k
2
1
1
2
U (k)
0
7 5 3 1
0
k
2
1
1
2
U (k)
1
7 5 3 1
0
k
2
1
1
2
U (k)
2
7 5 3 1
0
k
2
1
1
2
U (k)
3
7 5 3 1
0
k
2
1
1
2
U (k)
4
7 5 3 1
0
k
2
1
1
2
U (k)
5
7 5 3 1
0
k
2
1
1
2
U (k)
6
7 5 3 1
0
k
2
1
1
2
U (k)
7
Figure 2.23: Functions U
m
(k) dened by (2.386), for m = 0, 1, . . . , 7, from top left to bottom
right, respectively. Values of non-integer arguments are rounded to the nearest integer. The
sampled versions of these functions at integer values of k are used to create the basis images
of gure 2.22, by taking their vector outer product in all possible combinations.
Figure 2.22 shows along the left and at the top which function U
m
(k) (identied by index
m) was multiplied with which other function to create the corresponding elementary image.
Each one of these elementary images is then scaled individually so that its grey values range
from 1 to 255.
156 Image Processing: The Fundamentals
Example 2.68
Take the ODCT transform of image (2.103), on page 69, and show the
various approximations of it, by reconstructing it using only the rst 1, 4,
9, etc elementary images in terms of which it is expanded.
The eight images shown in gure 2.24 are the reconstructed images, when, for the
reconstruction, the basis images created from the rst one, two,. . ., eight functions
U
m
(k) are used. For example, gure 2.24f has been reconstructed from the inverse
ODCT transform, by setting to 0 all elements of the transformation matrix that mul-
tiply the basis images in the bottom two rows and the two right-most columns in gure
2.22. The omitted basis images are those that are created from functions U
6
(k) and
U
7
(k).
(a) (b) (c) (d)
(e) (f ) (g) (h)
Figure 2.24: Gradually improved approximations of the ower image as more and
more terms in its expansion, in terms of the basis images of gure 2.22, are retained,
starting from the at image at the top left corner and gradually adding one row and
one column of images at a time, until the bottom-most row and the right-most column
are added. In the approximate reconstructions, negative values and values larger than
255 were truncated to 0 and 255, respectively, for displaying purposes.
The sum of the square errors for each reconstructed image is as follows.
Square error for image 2.24a: 368946
Square error for image 2.24b: 342507
Square error for image 2.24c: 221297
Square error for image 2.24d: 175046
Square error for image 2.24e: 96924
Square error for image 2.24f: 55351
Square error for image 2.24g: 39293
Square error for image 2.24h: 0
Even sine transform 157
2.6 The even antisymmetric discrete sine
transform (EDST)
What is the even antisymmetric discrete sine transform?
Assume that we have an M N image f, change its sign and reect it about its left and top
border so that we have a 2M 2N image. The DFT of the 2M 2N image will be real and
given by (see example 2.53):
f
es
(m, n)
1
MN
M1
k=0
N1
l=0
f(k, l) sin
m(2k + 1)
2M
sin
n(2l + 1)
2N
(2.387)
This is the even antisymmetric discrete sine transform (EDST) of the original image.
Example 2.69
Compute the even antisymmetric sine transform of image
g =
_
_
_
_
1 2 0 1
1 0 0 0
0 0 2 2
1 2 2 0
_
_
_
_
(2.388)
by taking the DFT of the corresponding enlarged image of size 8 8.
We start by creating rst the corresponding large image of size 8 8:
g =
_
_
_
_
_
_
_
_
_
_
_
_
0 2 2 1 1 2 2 0
2 2 0 0 0 0 2 2
0 0 0 1 1 0 0 0
1 0 2 1 1 2 0 1
1 0 2 1 1 2 0 1
0 0 0 1 1 0 0 0
2 2 0 0 0 0 2 2
0 2 2 1 1 2 2 0
_
_
_
_
_
_
_
_
_
_
_
_
(2.389)
To take the DFT of this image we multiply it from the left with matrix U given by
(2.337) and from the right with the transpose of the same matrix. The result is:
158 Image Processing: The Fundamentals
G
es
=
_
_
_
_
_
_
_
_
_
_
_
_
0 0 0 0 0 0 0 0
0 0.333 0.072 0.127 0.034 0.127 0.072 0.333
0 0.163 0.188 0.068 0.088 0.068 0.188 0.163
0 0.315 0.173 0.021 0.082 0.021 0.173 0.315
0 0.048 0.177 0.115 0.250 0.115 0.177 0.048
0 0.315 0.173 0.021 0.082 0.021 0.173 0.315
0 0.163 0.188 0.068 0.088 0.068 0.188 0.163
0 0.333 0.072 0.127 0.034 0.127 0.072 0.333
_
_
_
_
_
_
_
_
_
_
_
_
(2.390)
Example 2.70
Compute the (1, 2) element of the even antisymmetric sine transform of
image (2.388) by using formula (2.387). Compare your answer with that
of example 2.69.
Applying the formula for m = 1, n = 2 and M = N = 4, we obtain:
g
es
(1, 2) =
1
16
3
k=0
3
l=0
g(k, l) sin
(2k + 1)
8
sin
2(2l + 1)
8
=
1
16
_
g(0, 0) sin
8
sin
4
+g(0, 1) sin
8
sin
3
4
+g(0, 3) sin
8
sin
7
4
+g(1, 0) sin
3
8
sin
4
+g(2, 2) sin
5
8
sin
5
4
+g(2, 3) sin
5
8
sin
7
4
+g(3, 0) sin
7
8
sin
4
+g(3, 1) sin
7
8
sin
3
4
+g(3, 2) sin
7
8
sin
5
4
_
(2.391)
Here we omitted terms for which g(k, l) = 0. Substituting the values of g(k, l) in
(2.391) and performing the calculation, we deduce that g
es
(1, 2) = 0.0718.
We note from (2.390) that the (1, 2) element of
G
es
is 0.072.
Even sine transform 159
Example B2.71
The even antisymmetric sine transform of an M-sample long signal f(k) is
dened as:
f
es
(m) j
1
M
M1
k=0
f(k) sin
m(2k + 1)
2M
(2.392)
Identify the period of
f
es
(m).
The period of a function is the smallest number X for which
f
es
(m + X) =
f
es
(m),
for all m.
Using denition (2.392), we have:
f
es
(m+X) = j
1
M
M1
k=0
f(k) sin
(m+X)(2k + 1)
2M
= j
1
M
M1
k=0
f(k) sin
_
m(2k + 1)
2M
. .
+
X(2k + 1)
2M
_
(2.393)
In order to have
f
es
(m+X) =
f
es
(m), we must have:
sin
_
+
X(2k + 1)
2M
_
= sin (2.394)
This is only true if X(2k + 1)/(2M) is an integer multiple of 2. The rst number
for which this is guaranteed is for X = 4M. So,
f
es
(m) is periodic with period 4M.
Example B2.72
You are given a 5-sample long signal with the following values: f(0) = 0,
f(1) = 1, f(2) = 2, f(3) = 3 and f(4) = 4. Compute its EDST
f
es
(m) and plot
both the extended signal and its EDST, for 50 consecutive samples.
The extended signal we create is 4, 3, 2, 1, 0, 0, 1, 2, 3, 4. DFT sees this signal
repeated ad innitum. Since M = 5 here, the EDST of the original signal has period
20. The values of
f
es
(m) for one period are:
160 Image Processing: The Fundamentals
(1.29, 0.85, 0.49, 0.53, 0.4, 0.53, 0.49, 0.85, 1.29, 0,
1.29, 0.85, 0.49, 0.53, 0.4, 0.53, 0.49, 0.85, 1.29, 0)
0 25 50
4
2
0
2
4
0 25 50
2
1
0
1
2
Figure 2.25: On the left, 50 consecutive samples of the extended signal as seen by
the DFT. On the right, the EDST of the original 5-sample long signal, also for 50
consecutive samples.
Figure 2.25 shows 50 samples of the signal as seen by the DFT and 2.5 periods of the
EDST of the original signal.
Box 2.11. Derivation of the inverse 1D even discrete sine transform
The 1D EDST is dened by (2.392). Let us dene f(k 1) f(k) for all values of
k = 0, 1, . . . , M 1. We also note that:
sin
m(2(k 1) + 1)
2M
= sin
m(2k 1)
2M
= sin
m(2k + 1)
2M
= sin
m(2k + 1)
2M
(2.395)
Then:
1
k=M
f(k) sin
m(2k + 1)
2M
=
M1
k=0
f(k) sin
m(2k + 1)
2M
(2.396)
We can see that easily by changing variable of summation in the sum on the left-hand
side from k to
k k 1. The limits of summation will become from M 1 to 0 and
the summand will not change, as f(
k 1) = f(
f
es
(m) j
1
2M
M1
k=M
f(k) sin
m(2k + 1)
2M
(2.397)
To derive the inverse transform we must solve this equation for f(k). To achieve this
we multiply both sides of the equation with j sin
m(2p+1)
2M
and sum over m from M
to M 1:
j
M1
m=M
f
es
(m) sin
m(2p + 1)
2M
. .
S
=
1
2M
M1
m=M
M1
k=M
f(k) sin
m(2k + 1)
2M
sin
m(2p + 1)
2M
(2.398)
On the right-hand side we replace the trigonometric functions by using formula sin
_
e
j
e
j
_
/(2j), where is real. We also exchange the order of summations, observ-
ing that summation over m applies only to the kernel functions:
S =
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(2k+1)
2M
e
j
m(2k+1)
2M
_ _
e
j
m(2p+1)
2M
e
j
m(2p+1)
2M
_
=
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(2k+2p+2)
2M
e
j
m(2k2p)
2M
e
j
m(2k+2p)
2M
+e
j
m(2k2p2)
2M
_
=
1
8M
M1
k=M
f(k)
M1
m=M
_
e
j
m(k+p+1)
M
e
j
m(kp)
M
e
j
m(k+p)
M
+e
j
m(kp1)
M
_
(2.399)
To compute the sums over m, we make use of (2.347), on page 142:
S =
1
8M
M1
k=M
f(k) [2M(k +p + 1) 2M(k p)
2M(k +p) + 2M(k p 1)]
=
1
4
M1
k=M
f(k) [(k +p + 1) (k p) (k +p) +(k p 1)]
=
1
4
M1
k=M
f(k)[2(k +p + 1) 2(k p)]
=
1
2
M1
k=M
f(k)[(k +p + 1) (k p)] (2.400)
162 Image Processing: The Fundamentals
We used here the property of the delta function that (x) = (x). We note that, from
all the terms in the sum, only two will survive, namely the one for k = p 1 and the
one for k = p. Given that we dened f(k 1) = f(k), both these terms will be
equal, ie f(p 1) = f(p), and so we shall obtain S = 2f(p). This allows us to write
the 1D inverse EDST as:
f(p) = j
M1
m=M
f
es
(m) sin
m(2p + 1)
2M
(2.401)
We split the negative from the non-negative indices in the above sum:
f(p) = j
_
1
m=M
f
es
(m) sin
m(2p + 1)
2M
. .
S
+
M1
m=0
f
es
(m) sin
m(2p + 1)
2M
_
(2.402)
In the rst sum we change the variable of summation from m to m m m = m.
The summation limits over m are then from M to 1, or from 1 to M:
S =
M
m=1
f
es
( m) sin
( m)(2p + 1)
2M
=
M
m=1
f
es
( m) sin
m(2p + 1)
2M
(2.403)
Here we made use of the fact that the sine function is antisymmetric with respect to m
and so is
f
es
( m) if we look at its denition. So their product is symmetric with respect
to change of sign of m. Using (2.403) into (2.402), we may write:
f(p) = j
f
es
(0) sin
0(2p + 1)
2M
+j
f
es
(M) sin
M(2p + 1)
2M
+j2
M1
m=1
f
es
(m) sin
m(2p + 1)
2M
(2.404)
Finally, we may write for the inverse 1D EDST
f(p) = j
M
m=1
S(m)
f
es
(m) sin
m(2p + 1)
2M
(2.405)
where S(M) = 1 and S(m) = 2 for m ,= M.
What is the inverse 2D even sine transform?

The inverse of equation (2.387) is
\[
f(k,l)=\sum_{m=1}^{M}\sum_{n=1}^{N}S(m)S(n)\hat{f}_{es}(m,n)\sin\frac{\pi m(2k+1)}{2M}\sin\frac{\pi n(2l+1)}{2N} \quad (2.406)
\]
where S(M) = 1, S(N) = 1, and S(m) = S(n) = 2 for m ≠ M, n ≠ N.
What are the basis images in terms of which the even sine transform expands an image?

In equation (2.406), we may view function
\[
V_m(k)\equiv jS(m)\sin\frac{\pi m(2k+1)}{2M} \quad (2.407)
\]
as a function of k with parameter m. Then the basis functions, in terms of which an M × N image is expanded, are the vector outer products V_m(k)V_n^T(l) of these vector functions, where k = 0, . . . , M−1 and l = 0, . . . , N−1. For fixed (m, n), such a vector outer product creates an elementary image of size M × N. Coefficient f̂_es(m, n) in (2.406) tells us the degree to which this elementary image is present in the original image f(k, l).
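The construction of figure 2.26 can be reproduced directly from (2.407). The sketch below, in Python with NumPy, is a minimal illustration under the conventions reconstructed above: it builds the eight discretised functions V_m(k) for M = 8, forms their outer products (which are real, because the product of two purely imaginary vectors is real), and rescales each elementary image to the range 1-255 for display, as described in the text. The function names are ours, not the book's.

```python
import numpy as np

def edst_basis_vectors(M=8):
    """Discretised V_m(k) = j S(m) sin(pi m (2k+1) / (2M)), m = 1..M, k = 0..M-1."""
    k = np.arange(M)
    vectors = []
    for m in range(1, M + 1):
        S = 1.0 if m == M else 2.0                      # S(M) = 1, S(m) = 2 otherwise
        vectors.append(1j * S * np.sin(np.pi * m * (2 * k + 1) / (2 * M)))
    return vectors                                       # purely imaginary M x 1 vectors

def edst_basis_images(M=8):
    """All M*M elementary images V_m(k) V_n(l)."""
    V = edst_basis_vectors(M)
    return [[np.real(np.outer(V[m], V[n])) for n in range(M)] for m in range(M)]

def rescale_for_display(img):
    """Scale an elementary image individually so its grey values span 1..255."""
    lo, hi = img.min(), img.max()
    return 1 + 254 * (img - lo) / (hi - lo)

if __name__ == "__main__":
    raw = np.outer(edst_basis_vectors(8)[0], edst_basis_vectors(8)[1])
    print(np.allclose(raw.imag, 0.0))          # True: the basis images are real
    images = edst_basis_images(8)
    scaled = rescale_for_display(images[0][0])
    print(scaled.shape, scaled.min(), scaled.max())   # (8, 8) 1.0 255.0
```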
Figure 2.26: The basis images in terms of which the EDST expands an 8 × 8 image. These basis images are the vector outer products of imaginary functions. The numbers on the left and at the top are the indices m in (2.407), identifying which functions produced the corresponding basis image. Note that this basis does not include a flat image, ie there is no dc component. This means that, for best results, the mean of the image that is to be expanded in terms of these functions should be removed before the expansion.

Figure 2.26 shows the elementary images in terms of which any 8 × 8 image is expanded by the EDST. These images have been produced by setting M = 8 in (2.407) and allowing parameter m to take values 1, . . . , 8. For every value of m, we have a different function V_m(k). Each one of these functions is then sampled at values of k = 0, . . . , 7 to form an 8 × 1 vector. The plots of these eight functions are shown in figure 2.27.
Figure 2.27: These are the discretised versions of the imaginary functions V_m(k), given by (2.407), for m = 1, . . . , 8. The vector outer products of all possible combinations of them produce the basis images shown in figure 2.26, useful for the expansion of any 8 × 8 image. Note that the basis images are real, because they are the products of the multiplications of two purely imaginary functions.

Figure 2.26 shows along the left and at the top which function V_m(k) (identified by index m) was multiplied with which other function to create the corresponding elementary image. Each one of these elementary images is then scaled individually so that its grey values range from 1 to 255.
Example 2.73

Take the EDST of image (2.103), on page 69, and show the various approximations of it, by reconstructing it using only the first 1, 4, 9, etc elementary images in terms of which it is expanded.

Before we apply the EDST, we remove the mean from all pixels of the image. After each reconstruction, and before we calculate the reconstruction error, we add the mean to all pixels. The eight images shown in figure 2.28 are the reconstructed images when, for the reconstruction, the basis images created from the first one, two, . . ., eight functions V_m(k) are used. For example, figure 2.28f has been reconstructed from the inverse EDST, by setting to 0 all elements of the transformation matrix that multiply the basis images in the bottom two rows and the two right-most columns of figure 2.26. The omitted basis images are those that are created from functions V_7(k) and V_8(k).

(a) (b) (c) (d)
(e) (f) (g) (h)

Figure 2.28: Successive approximations of the flower image by retaining an increasing number of basis functions, from m = 1 to m = 8, from top left to bottom right, respectively. For example, panel (b) was created by keeping only the coefficients that multiply the four basis images at the top left corner of figure 2.26. Values smaller than 0 and larger than 255 were truncated to 0 and 255, respectively, for displaying purposes.
The sum of the square errors for each reconstructed image is as follows.
Square error for image 2.28a: 341243
Square error for image 2.28b: 328602
Square error for image 2.28c: 259157
Square error for image 2.28d: 206923
Square error for image 2.28e: 153927
Square error for image 2.28f: 101778
Square error for image 2.28g: 55905
Square error for image 2.28h: 0
What happens if we do not remove the mean of the image before we compute its EDST?

The algorithm will work perfectly well even if we do not remove the mean of the image, but the approximation error, at least for the reconstructions that are based only on the first few components, will be very high. Figure 2.29 shows the successive reconstructions of the flower image without removing the mean before the transformation is taken. The various approximations of the image should be compared with those shown in figure 2.28. The corresponding approximation errors are:
Square error for image 2.29a: 1550091
Square error for image 2.29b: 1537450
Square error for image 2.29c: 749053
Square error for image 2.29d: 696820
Square error for image 2.29e: 342055
Square error for image 2.29f: 289906
Square error for image 2.29g: 55905
Square error for image 2.29h: 0
(a) (b) (c) (d)
(e) (f) (g) (h)
Figure 2.29: Successive approximations of the flower image by retaining an increasing number of basis functions, from m = 1 to m = 8, from top left to bottom right, respectively. In this case the mean value of the image was not removed before the transformation was computed.
2.7 The odd antisymmetric discrete sine transform (ODST)

What is the odd antisymmetric discrete sine transform?

Assume that we have an M × N image f. We change its sign and reflect it about its left and top borders, and we also insert a row and a column of 0s along the reflection lines, so that we have a (2M+1) × (2N+1) image. The DFT of the (2M+1) × (2N+1) image will be real (see example 2.55) and given by:
\[
\frac{4}{(2M+1)(2N+1)}\sum_{\tilde k=1}^{M}\sum_{\tilde l=1}^{N} f(\tilde k,\tilde l)\sin\frac{2\pi m\tilde k}{2M+1}\sin\frac{2\pi n\tilde l}{2N+1} \quad (2.408)
\]
Note that here indices k̃ and l̃ are not the indices of the original image, which were running from 0 to M−1 and N−1, respectively. Because of the insertion of the row and column of 0s, the indices have been shifted by 1. In order to retain the original indices, we define the odd discrete sine transform (ODST) of the original image as:
\[
\hat{f}_{os}(m,n)\equiv\frac{4}{(2M+1)(2N+1)}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f(k,l)\sin\frac{2\pi m(k+1)}{2M+1}\sin\frac{2\pi n(l+1)}{2N+1} \quad (2.409)
\]
Example 2.74

Compute the odd antisymmetric sine transform of image
\[
g=\begin{pmatrix}1&2&0&1\\1&0&0&0\\0&0&2&2\\1&2&2&0\end{pmatrix} \quad (2.410)
\]
by taking the DFT of the corresponding image of size 9 × 9.

We start by creating first the corresponding large image of size 9 × 9:
\[
g=\begin{pmatrix}
0&2&2&1&0&1&2&2&0\\
2&2&0&0&0&0&0&2&2\\
0&0&0&1&0&1&0&0&0\\
1&0&2&1&0&1&2&0&1\\
0&0&0&0&0&0&0&0&0\\
1&0&2&1&0&1&2&0&1\\
0&0&0&1&0&1&0&0&0\\
2&2&0&0&0&0&0&2&2\\
0&2&2&1&0&1&2&2&0
\end{pmatrix} \quad (2.411)
\]
To take the DFT of this image we multiply it from the left with the appropriate matrix U and from the right with its transpose. We create this matrix using definition (2.286), on page 127. Here J = 4 and the elements of matrix U are given by (1/9)e^{−j2πmk/9}, where k takes values −4, −3, −2, −1, 0, 1, 2, 3, 4 along each row and m takes values 0, 1, 2, 3, 4, 5, 6, 7, 8 along each column.
\[
U_{os}=\frac{1}{9}\left[e^{-j2\pi mk/9}\right],\qquad m=0,\dots,8 \text{ (rows)},\quad k=-4,\dots,4 \text{ (columns)} \quad (2.412)
\]
For example, the row for m = 0 consists of nine 1s, while the row for m = 1 is
\[
\frac{1}{9}\left(e^{j\frac{8\pi}{9}},\,e^{j\frac{6\pi}{9}},\,e^{j\frac{4\pi}{9}},\,e^{j\frac{2\pi}{9}},\,1,\,e^{-j\frac{2\pi}{9}},\,e^{-j\frac{4\pi}{9}},\,e^{-j\frac{6\pi}{9}},\,e^{-j\frac{8\pi}{9}}\right)
\]
and, in general, the row for index m has entries e^{−j2πmk/9} with k = −4, . . . , 4.
Multiplying the 9 × 9 image from the left with U_{os} and from the right with its transpose, we obtain:
\[
G_{os}=\begin{pmatrix}
0&0&0&0&0&0&0&0&0\\
0&0.302&0.050&0.102&0.041&0.041&0.102&0.050&0.302\\
0&0.087&0.198&0.032&0.103&0.103&0.032&0.198&0.087\\
0&0.285&0.001&0.074&0.063&0.063&0.074&0.001&0.285\\
0&0.078&0.140&0.089&0.092&0.092&0.089&0.140&0.078\\
0&0.078&0.140&0.089&0.092&0.092&0.089&0.140&0.078\\
0&0.285&0.001&0.074&0.063&0.063&0.074&0.001&0.285\\
0&0.087&0.198&0.032&0.103&0.103&0.032&0.198&0.087\\
0&0.302&0.050&0.102&0.041&0.041&0.102&0.050&0.302
\end{pmatrix} \quad (2.413)
\]
Example 2.75

Compute the (1, 2) element of the odd antisymmetric sine transform of image (2.410) by using formula (2.408). Compare your answer with that of example 2.74.

Applying the formula for m = 1, n = 2 and M = N = 4, we obtain:
\[
\hat{g}_{os}(1,2)=\frac{4}{81}\sum_{k=0}^{3}\sum_{l=0}^{3} g(k,l)\sin\frac{2\pi(k+1)}{9}\sin\frac{2\pi\,2(l+1)}{9} \quad (2.414)
\]
Or:
\[
\hat{g}_{os}(1,2)=\frac{4}{81}\Big[
g(0,0)\sin\tfrac{2\pi}{9}\sin\tfrac{4\pi}{9}
+g(0,1)\sin\tfrac{2\pi}{9}\sin\tfrac{8\pi}{9}
+g(0,3)\sin\tfrac{2\pi}{9}\sin\tfrac{16\pi}{9}
+g(1,0)\sin\tfrac{4\pi}{9}\sin\tfrac{4\pi}{9}
+g(2,2)\sin\tfrac{6\pi}{9}\sin\tfrac{12\pi}{9}
\]
\[
+g(2,3)\sin\tfrac{6\pi}{9}\sin\tfrac{16\pi}{9}
+g(3,0)\sin\tfrac{8\pi}{9}\sin\tfrac{4\pi}{9}
+g(3,1)\sin\tfrac{8\pi}{9}\sin\tfrac{8\pi}{9}
+g(3,2)\sin\tfrac{8\pi}{9}\sin\tfrac{12\pi}{9}
\Big] \quad (2.415)
\]
Here we omitted the terms for which g(k, l) = 0. Substituting the values of g(k, l) in (2.415) and performing the calculation, we deduce that ĝ_os(1, 2) = 0.0497. This is in agreement with the value of G_os(1, 2) = 0.050 we deduced in example 2.74.
Example B2.76

The odd antisymmetric sine transform of an M-sample long signal f(k), defined for values k = 0, . . . , M−1, is defined as:
\[
\hat{f}_{os}(m)\equiv -j\frac{2}{2M+1}\sum_{k=0}^{M-1} f(k)\sin\frac{2\pi m(k+1)}{2M+1} \quad (2.416)
\]
Identify the period of f̂_os(m).

The period of a function is the smallest number X for which f̂_os(m + X) = f̂_os(m), for all m.
Using definition (2.416), we have:
\[
\hat{f}_{os}(m+X)=-j\frac{2}{2M+1}\sum_{k=0}^{M-1} f(k)\sin\frac{2\pi(m+X)(k+1)}{2M+1}
=-j\frac{2}{2M+1}\sum_{k=0}^{M-1} f(k)\sin\Big[\underbrace{\frac{2\pi m(k+1)}{2M+1}}_{\theta}+\frac{2\pi X(k+1)}{2M+1}\Big] \quad (2.417)
\]
In order to have f̂_os(m+X) = f̂_os(m), we must have:
\[
\sin\Big[\theta+\frac{2\pi X(k+1)}{2M+1}\Big]=\sin\theta \quad (2.418)
\]
This is only true if 2πX(k+1)/(2M+1) is an integer multiple of 2π for every k. The first number for which this is guaranteed is X = 2M+1. So, f̂_os(m) is periodic with period 2M+1.
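The conclusion of example 2.76 is easy to confirm numerically. The short Python sketch below evaluates the sine sum of (2.416) (up to the constant −j·2/(2M+1), which does not affect periodicity) for the 5-sample signal of example 2.77, and checks that the values repeat with period 2M+1 = 11; the helper name is ours.

```python
import numpy as np

def odst1_raw(f, m):
    """The sine sum of (2.416), without the constant factor -j*2/(2M+1)."""
    f = np.asarray(f, dtype=float)
    M = len(f)
    k = np.arange(M)
    return np.sum(f * np.sin(2 * np.pi * m * (k + 1) / (2 * M + 1)))

if __name__ == "__main__":
    f = [0.0, 1.0, 2.0, 3.0, 4.0]          # the signal of example 2.77
    M = len(f)
    values = np.array([odst1_raw(f, m) for m in range(4 * (2 * M + 1))])
    # The values repeat every 2M+1 = 11 samples:
    print(np.allclose(values[:11], values[11:22]),
          np.allclose(values[:11], values[22:33]))     # True True
```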
Example B2.77

You are given a 5-sample long signal with the following values: f(0) = 0, f(1) = 1, f(2) = 2, f(3) = 3 and f(4) = 4. Compute its ODST f̂_os(m) and plot both the extended signal and its ODST, for 55 consecutive samples.

The extended signal is −4, −3, −2, −1, 0, 0, 0, 1, 2, 3, 4. The DFT sees this signal repeated ad infinitum with period 11. Since M = 5, according to the result of example 2.76, the ODST of the original data is periodic with period 2M+1 = 11 as well. The values of f̂_os(m) for one period are:

(1.14, 0.9, 0.46, 0.49, 0.4, 0.4, 0.49, 0.46, 0.9, 1.14, 0)    (2.419)

Figure 2.30 shows the plots of 55 consecutive samples of the extended signal and the ODST of the original data.

Figure 2.30: On the left, 55 consecutive samples of the extended signal seen by the DFT. On the right, five periods of the ODST of the original 5-sample long signal.
Box 2.12. Derivation of the inverse 1D odd discrete sine transform

To derive the inverse ODST we shall make use of (2.408), where indices k̃ and l̃ refer to the indices of the enlarged image, and they are related to the indices of the original image by being increased by 1 in relation to them. The 1D version of the ODST is then:
\[
\hat{f}_{os}(m)\equiv -j\frac{2}{2M+1}\sum_{\tilde k=1}^{M} f(\tilde k)\sin\frac{2\pi m\tilde k}{2M+1} \quad (2.420)
\]
(This definition is equivalent to (2.416), remembering that k̃ = k + 1.)

Let us define f(−k̃) ≡ −f(k̃) and f(0) ≡ 0. Then we may extend the summation over the negative indices as well and write:
\[
\hat{f}_{os}(m)\equiv -j\frac{1}{2(2M+1)}\sum_{\tilde k=-M}^{M} f(\tilde k)\sin\frac{2\pi m\tilde k}{2M+1} \quad (2.421)
\]
To derive the inverse transform we must solve this equation for f(k̃). To achieve this we multiply both sides of the equation with j sin(2πmp/(2M+1)) and sum over m from −M to M:
\[
\underbrace{j\sum_{m=-M}^{M}\hat{f}_{os}(m)\sin\frac{2\pi mp}{2M+1}}_{S}
=\frac{1}{2(2M+1)}\sum_{m=-M}^{M}\sum_{\tilde k=-M}^{M} f(\tilde k)\sin\frac{2\pi m\tilde k}{2M+1}\sin\frac{2\pi mp}{2M+1} \quad (2.422)
\]
On the right-hand side we replace the trigonometric functions by using formula sin θ = (e^{jθ} − e^{−jθ})/(2j), where θ is real. We also exchange the order of summations, observing that summation over m applies only to the kernel functions:
\[
S=-\frac{1}{8(2M+1)}\sum_{\tilde k=-M}^{M} f(\tilde k)\sum_{m=-M}^{M}
\left(e^{j\frac{2\pi m\tilde k}{2M+1}}-e^{-j\frac{2\pi m\tilde k}{2M+1}}\right)
\left(e^{j\frac{2\pi mp}{2M+1}}-e^{-j\frac{2\pi mp}{2M+1}}\right)
\]
\[
=-\frac{1}{8(2M+1)}\sum_{\tilde k=-M}^{M} f(\tilde k)\sum_{m=-M}^{M}
\left(e^{j\frac{2\pi m(\tilde k+p)}{2M+1}}-e^{j\frac{2\pi m(\tilde k-p)}{2M+1}}-e^{j\frac{2\pi m(-\tilde k+p)}{2M+1}}+e^{j\frac{2\pi m(-\tilde k-p)}{2M+1}}\right) \quad (2.423)
\]
To compute the sums over m, we apply formula (2.164), on page 95, for S = 2M + 1:
\[
S=-\frac{1}{8(2M+1)}\sum_{\tilde k=-M}^{M} f(\tilde k)\left[(2M+1)\delta(\tilde k+p)-(2M+1)\delta(\tilde k-p)-(2M+1)\delta(-\tilde k+p)+(2M+1)\delta(-\tilde k-p)\right]
\]
\[
=-\frac{1}{8}\sum_{\tilde k=-M}^{M} f(\tilde k)\left[\delta(\tilde k+p)-\delta(\tilde k-p)-\delta(-\tilde k+p)+\delta(-\tilde k-p)\right]
=-\frac{1}{8}\sum_{\tilde k=-M}^{M} f(\tilde k)\left[2\delta(\tilde k+p)-2\delta(\tilde k-p)\right]
\]
\[
=-\frac{1}{4}\sum_{\tilde k=-M}^{M} f(\tilde k)\left[\delta(\tilde k+p)-\delta(\tilde k-p)\right] \quad (2.424)
\]
We used here the property of the delta function that δ(−x) = δ(x). We note that, from all the terms in the sum, only two will survive, namely the one for k̃ = −p and the one for k̃ = p. Given that we defined f(−k̃) = −f(k̃), both these contributions will be equal, and so we shall obtain S = f(p)/2. This allows us to write:
\[
f(p)=j2\sum_{m=-M}^{M}\hat{f}_{os}(m)\sin\frac{2\pi mp}{2M+1}\qquad\text{for }p=1,2,\dots,M \quad (2.425)
\]
As the sine function is antisymmetric with respect to m, and f̂_os(m) can also be seen to be antisymmetric from its definition (2.416), we may conclude that their product is symmetric, and so we may write:
\[
f(p)=j4\sum_{m=1}^{M}\hat{f}_{os}(m)\sin\frac{2\pi mp}{2M+1}\qquad\text{for }p=1,2,\dots,M \quad (2.426)
\]
To go back to the original indices, we remember that p refers to the indices of the enlarged image, and so it is shifted by 1 in relation to the original data. So, in terms of the original indices, the inverse ODST is:
\[
f(k)=j4\sum_{m=1}^{M}\hat{f}_{os}(m)\sin\frac{2\pi m(k+1)}{2M+1}\qquad\text{for }k=0,1,\dots,M-1 \quad (2.427)
\]
What is the inverse 2D odd sine transform?

The inverse of equation (2.408) is:
\[
f(k,l)=16\sum_{m=1}^{M}\sum_{n=1}^{N}\hat{f}_{os}(m,n)\sin\frac{2\pi m(k+1)}{2M+1}\sin\frac{2\pi n(l+1)}{2N+1} \quad (2.428)
\]
What are the basis images in terms of which the odd sine transform expands an image?

In equation (2.428), we may view function
\[
W_m(k)\equiv j4\sin\frac{2\pi m(k+1)}{2M+1} \quad (2.429)
\]
as a function of k with parameter m. Then the basis functions, in terms of which an M × N image is expanded, are the vector outer products W_m(k)W_n^T(l) of these vector functions, where k = 0, . . . , M−1 and l = 0, . . . , N−1. For fixed (m, n), such a vector outer product creates an elementary image of size M × N. Coefficient f̂_os(m, n) in (2.428) tells us the degree to which this elementary image is present in the original image f(k, l).
Figure 2.31: Basis images created as vector outer products of functions W_m(k), defined by (2.429). The indices m and n of the functions that are multiplied, W_m W_n^T, to form each image are given on the left and at the top, respectively.

Figure 2.31 shows the elementary images in terms of which any 8 × 8 image is expanded by the ODST. These images have been produced by setting M = 8 in (2.429) and allowing parameter m to take values 1, . . . , 8. For every value of m we have a different function W_m(k). Each one of these functions is then sampled at values of k = 0, . . . , 7 to form an 8 × 1 vector. The plots of these eight functions are shown in figure 2.32.
Figure 2.32: The imaginary functions W_m(k) defined by (2.429), used to construct the basis images of size 8 × 8 shown in figure 2.31.

Figure 2.31 shows along the left and at the top which function W_m(k) (identified by index m) was multiplied with which other function to create the corresponding elementary image. Each one of these elementary images is then scaled individually, so that its grey values range from 1 to 255.
Example 2.78

Take the ODST of image (2.103), on page 69, and show the various approximations of it, when for the reconstruction the basis images are created from the first one, two, . . ., eight functions W_m(k).

We first remove the mean value of the image from the values of all pixels. Then we perform the transformation and the reconstructions. Before we display the reconstructions, we add the mean value to all pixels. The results are shown in figure 2.33.

(a) (b) (c) (d)
(e) (f) (g) (h)

Figure 2.33: Successive approximations of the flower image using the ODST.
The sum of the squared errors for each reconstructed image is as follows.
Square error for image 2.33a: 350896
Square error for image 2.33b: 326264
Square error for image 2.33c: 254763
Square error for image 2.33d: 205803
Square error for image 2.33e: 159056
Square error for image 2.33f: 109829
Square error for image 2.33g: 67374
Square error for image 2.33h: 0
Approximation order:      0       1       2       3       4       5       6
SVD                   230033  118412   46673   11882
Haar                  366394  356192  291740  222550  192518  174625  141100
Walsh                 366394  356190  262206  222550  148029   92078   55905
DFT                   366394  285895  234539  189508  141481  119612   71908
EDCT                  366394  338683  216608  173305  104094   49179   35662
ODCT                  368946  342507  221297  175046   96924   55351   39293
EDST                  341243  328602  259157  206923  153927  101778   55905
ODST                  350896  326264  254763  205803  159056  109829   67374

Table 2.2: The errors of the successive approximations of image (2.103) by the various transforms of this chapter. The numbers at the top indicate the order of the approximation.
What is the take home message of this chapter?

This chapter presented the linear, unitary and separable transforms we apply to images. These transforms analyse each image into a linear superposition of elementary basis images. Usually these elementary images are arranged in increasing order of structure (detail). This allows us to represent an image with as much detail as we wish, by using only as many of these basis images as we like, starting from the first one. The optimal way to do that is to use as basis images those that are defined by the image itself, the eigenimages of the image (SVD). This, however, is not very efficient, as our basis images change from one image to the next.

Alternatively, some bases of predefined images may be created with the help of orthonormal sets of functions. These bases try to capture the basic characteristics of all images. Once the basis used has been agreed, images may be communicated between different agents by simply transmitting the weights with which each of the basis images has to be multiplied before all of them are added to create the original image. The first one of these basis images is usually a uniform image, except in the case of the sine transform. The form of the rest of the images of each basis depends on the orthonormal set of functions used to generate them. As these basis images are used to represent a large number of images, more of them are needed to represent a single image than if the eigenimages of the image itself were used for its representation. However, the gain in the number of bits used comes from the fact that the basis images are pre-agreed and they do not need to be stored or transmitted with each image separately.

The bases constructed with the help of orthonormal sets of discrete functions (eg Haar and Walsh) are easy to implement in hardware. However, the basis constructed with the help of the orthonormal set of complex exponential functions is by far the most popular. The representation of an image in terms of it is called the discrete Fourier transform. Its popularity stems from the fact that manipulation of the weights with which the basis images are superimposed to form the original image, for the purpose of omitting, for example, certain details in the image, can be achieved by manipulating the image itself with the help of a simple convolution. By-products of the discrete Fourier transform are the sine and cosine transforms, which artificially enlarge the image so that its discrete Fourier transform becomes real.
Table 2.2 lists the errors of the reconstructions of image (2.103) we saw in this chapter. EDCT has a remarkably good reconstruction and that is why it is used in JPEG. SVD has the least error, but remember that the basis images also have to be transmitted when this approximation is used. EDST and ODST also require the dc component to be transmitted in addition to the coefficients of the expansion. Finally, DFT requires two numbers to be transmitted per coefficient retained, as it is a complex transform and so the coefficients it produces have real and imaginary parts. So, the 0th approximation, for this particular example, requires 16 real numbers for SVD (two 8 × 1 real vectors), only 1 number for Haar, Walsh, DFT, EDCT and ODCT, and 2 for EDST and ODST. The 1st approximation requires 32 real numbers for SVD, only 4 real numbers for Haar, Walsh, EDCT and ODCT, 7 real numbers for DFT (the real valued dc component, plus 3 complex numbers), and 5 real numbers for EDST and ODST (the dc component, plus the 4 coefficients for the first 4 basis images). The 2nd approximation requires 48 real numbers for SVD, only 9 real numbers for Haar, Walsh, EDCT and ODCT, 17 real numbers for DFT and 10 real numbers for EDST and ODST. (Remember that the 2nd order approximation uses the 9 basis images at the top left corner of the images shown in figures 2.4, 2.6, 2.11, 2.12, 2.19, 2.22, 2.26 and 2.31.)
Chapter 3
Statistical Description of Images
What is this chapter about?
This chapter provides the necessary background for the statistical description of images from
the signal processing point of view. It treats each image as the outcome of some random
process, and it shows how we can reason about images using concepts from probability and
statistics, in order to express a whole collection of images as composites of some basic images.
In some cases it treats an image as the only available version of a large collection of similar
(in some sense) images and reasons on the statistical properties of the whole collection.
Why do we need the statistical description of images?

In the previous chapter we saw how we may construct bases of elementary images in terms of which any image of the same size may be expressed as a linear combination of them. Examples of such bases are the Fourier, Walsh and Haar bases of elementary images. These bases are universal but not optimal in any sense, when the expansion of an image in terms of any of them is truncated. They simply allow the user to approximate an image by omitting certain frequencies (Fourier), or certain structural details (Walsh), or even reconstruct preferentially certain parts of the image (Haar). We also saw how to construct basis images that are optimal for a specific image. This led to the Singular Value Decomposition of an image, which allows the approximation of an image in the least square error sense. Between these two extremes of universal bases, appropriate for all images, and very specific bases, appropriate for one image only, we may wish to consider bases of elementary images that might be optimal for a specific collection of images. For example, in various applications, we often have to deal with sets of images of a certain type, like X-ray images, traffic scene images, etc. Each image in the set may be different from all the others, but at the same time all images may share certain common characteristics. We need the statistical description of sets of images so that we capture these common characteristics and use them in order to represent an image with fewer bits and reconstruct it with the minimum error on average. In such a case, a pixel in an image may be thought of as taking a value randomly selected from the set of values that appear in the same grid location over all images in the set. A pixel, therefore, becomes a random variable. As we have many pixels arranged in the spatial coordinates of an image, an image becomes a random field.
3.1 Random fields

What is a random field?

A random field is a spatial function that assigns a random variable to each spatial position.

What is a random variable?

A random variable is the value we assign to the outcome of a random experiment.

What is a random experiment?

It is a process that produces an unpredictable outcome, from a set of possible outcomes. Throwing a die is a random experiment. Drawing the lottery is a random experiment.

How do we perform a random experiment with computers?

We do not. We produce random variables which are not truly random, and that is why they are better described as pseudorandom. They are produced by applying a sequence of formulae designed to produce different sequences of numbers when initialised by different numbers, called seeds. These sequences of produced numbers are usually repeated with a very long cycle. For example, they may be repeated after 2^32 numbers have been produced. Usually the number used as a seed is the time from the clock. Normally, the user has the option to specify the seed, so that the user may at a later stage reproduce the pseudorandom sequence of numbers in order to debug or investigate an algorithm that depends on them.
How do we describe random variables?
Random variables are described in terms of their distribution functions which in turn are
dened in terms of the probability of an event happening. An event is a collection of
outcomes of a random experiment.
Example 3.1
Consider the cube shown in gure 3.1. Consider that you perform the
following random experiment: you throw it in the air and let it land on
the ground. The outcome of this random experiment is the particular side
of the cube that faces up when the cube rests on the ground. We agree to
associate with each possible outcome the following values of variable x:
Outcome: Side ABCH face up x = 28
Outcome: Side BCDG face up x = 23
Outcome: Side GDEF face up x = 18
Outcome: Side EFAH face up x = 25
Outcome: Side EDCH face up x = 14
Outcome: Side FGBA face up x = 18
Random elds 179
Variable x is random because it takes values according to the outcome of a
random experiment. What is the probability of x taking values in the set
14, 18?
B
C
H
G
D
E
F
A
Figure 3.1: A solid cube made up from uniformly dense material.
Assuming that the cube is made from material with uniform density, all sides are
equally likely to end up being face up. This means that each one of them has one in six
chances to end up in that position. Since number 18 is associated with two faces and
number 14 with one, the chances to get an 18 or a 14 are three in six, ie the probability
of x getting value either 14 or 18 is 0.5.
What is the probability of an event?
The probability of an event happening is a non-negative number which has the following
properties:
(i) the probability of the event, which includes all possible outcomes of the experiment, is
1;
(ii) the probability of two events which do not have any common outcomes is the sum of
the probabilities of the two events separately.
Example 3.2
In the random experiment of example 3.1, what is the event that includes
all possible outcomes?
In terms of which side lands face up, the event that includes all possible outcomes is
the set ABCH,BCDG,GDEF,EFAH,EDCH,FGBA. In terms of values of random
variable x, it is 28, 23, 18, 25, 14.
180 Image Processing: The Fundamentals
Example 3.3
In the random experiment of example 3.1, what is the probability of events
14, 18 and 23?
In example 3.1 we worked out that the probability of event 14, 18 is 0.5. The prob-
ability of event 23 is one in six. Since these two events do not have any common
outcomes, the probability of either one or the other happening is the sum of the prob-
abilities of each one happening individually. So, it is 1/2 + 1/6 = 4/6 = 2/3.
What is the distribution function of a random variable?

The distribution function of a random variable f is a function which tells us how likely it is for f to be less than the argument of the function:
\[
\underbrace{P_f(z)}_{\text{distribution function of }f}=\underbrace{\mathcal{P}}_{\text{probability}}\{\underbrace{f}_{\text{random variable}}<\underbrace{z}_{\text{a number}}\} \quad (3.1)
\]
Clearly, P_f(−∞) = 0 and P_f(+∞) = 1.
Example 3.4
If z
1
z
2
, show that P
f
(z
1
) P
f
(z
2
).
Assume that A is the event (ie the set of outcomes) which makes f < z
1
and B is the
event which makes f < z
2
. Since z
1
z
2
, A B B = (BA) A. Events (BA)
and A do not have common outcomes (see gure 3.2).
BA
z
2
z
1
z
2
B
f
f
A
Figure 3.2: The representation of events A and B as sets.
Then by property (ii) in the denition of the probability of an event:
T(B) = T(B A) +T(A)
P
f
(z
2
) = T(B A)
. .
a nonnegative number
+P
f
(z
1
) P
f
(z
2
) P
f
(z
1
) (3.2)
Example 3.5
Show that:
T(z
1
f < z
2
) = P
f
(z
2
) P
f
(z
1
) (3.3)
According to the notation of example 3.4, z
1
f < z
2
when the outcome of the
random experiment belongs to B A (the shaded area in gure 3.2); ie T(z
1
f <
z
2
) = P
f
(B A). Since B = (B A) A, P
f
(B A) = P
f
(B) P
f
(A) and (3.3)
follows.
What is the probability of a random variable taking a specific value?

If the random variable takes values from the set of real numbers, it has zero probability of taking a specific value. (This can be seen if in (3.3) we set z_1 = z_2.) However, it may have nonzero probability of taking a value within an infinitesimally small range of values. This is expressed by its probability density function.

What is the probability density function of a random variable?

The derivative of the distribution function of a random variable is called the probability density function of the random variable:
\[
p_f(z)\equiv\frac{dP_f(z)}{dz} \quad (3.4)
\]
The expected or mean value of the random variable f is defined as
\[
\mu_f\equiv E\{f\}\equiv\int_{-\infty}^{+\infty} z\,p_f(z)\,dz \quad (3.5)
\]
and the variance as:
\[
\sigma_f^2\equiv E\{(f-\mu_f)^2\}\equiv\int_{-\infty}^{+\infty}(z-\mu_f)^2\,p_f(z)\,dz \quad (3.6)
\]
The standard deviation is the positive square root of the variance, ie σ_f.
Example 3.6

Starting from definition (3.4) and using the properties of P_f(z), prove that ∫_{−∞}^{+∞} p_f(z) dz = 1.
\[
\int_{-\infty}^{+\infty} p_f(z)\,dz=\int_{-\infty}^{+\infty}\frac{dP_f(z)}{dz}\,dz=P_f(z)\Big|_{-\infty}^{+\infty}=P_f(+\infty)-P_f(-\infty)=1-0=1 \quad (3.7)
\]
Example 3.7

The distribution function of a random variable f is given by
\[
P_f(z)=\frac{1}{1+e^{-z}} \quad (3.8)
\]
Compute the corresponding probability density function and plot both functions as functions of z.

The probability density function p_f(z) is given by the first derivative of the distribution function:
\[
p_f(z)\equiv\frac{dP_f(z)}{dz}=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{e^{-z}}{(1+e^{-z})(1+e^{-z})}=\frac{1}{(1+e^{z})(1+e^{-z})}=\frac{1}{1+e^{z}+e^{-z}+1}=\frac{1}{2+2\cosh z}=\frac{1}{2(1+\cosh z)} \quad (3.9)
\]
Here we made use of the definition of the hyperbolic cosine, cosh z ≡ (e^{z}+e^{-z})/2. The plots of functions (3.8) and (3.9) are shown in figure 3.3.

Figure 3.3: A plot of P_f(z) on the left and p_f(z) on the right.
Example B3.8

Compute the mean and variance of the random variable of example 3.7.

According to definition (3.5) and equation (3.9), we have:
\[
\mu_f=\int_{-\infty}^{+\infty} z\,p_f(z)\,dz=\int_{-\infty}^{+\infty} z\,\frac{1}{2(1+\cosh z)}\,dz=0 \quad (3.10)
\]
This is because the integrand is an antisymmetric function and the integration is over a symmetric interval (since cosh(−z) = cosh z).

According to definition (3.6) and equation (3.9), we have:
\[
\sigma_f^2=\int_{-\infty}^{+\infty}(z-\mu_f)^2\,p_f(z)\,dz=\int_{-\infty}^{+\infty} z^2\,\frac{1}{2(1+\cosh z)}\,dz=\int_{0}^{+\infty} z^2\,\frac{1}{1+\cosh z}\,dz \quad (3.11)
\]
This is because the integrand now is symmetric and the integration is over a symmetric interval. To compute this integral, we make use of a formula taken from a table of integrals,
\[
\int_{0}^{+\infty}\frac{x^{\nu-1}}{1+\cosh x}\,dx=\left(2-2^{3-\nu}\right)\Gamma(\nu)\,\zeta(\nu-1) \quad (3.12)
\]
valid for ν ≠ 2, with ζ(x) being Riemann's ζ function. Functions Γ and ζ are well known transcendental functions with values and formulae that can be found in many function books. We apply (3.12) for ν = 3 and obtain:
\[
\sigma_f^2=\Gamma(3)\,\zeta(2) \quad (3.13)
\]
The Γ function for integer arguments is given by Γ(z) = (z−1)!. So, Γ(3) = 2! = 2. Riemann's ζ function, on the other hand, for positive integer arguments, is defined as
\[
\zeta(n)\equiv\sum_{k=1}^{+\infty} k^{-n} \quad (3.14)
\]
and for n = 2 it can be shown to be equal to π²/6. We deduce then that σ_f² = π²/3.
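The results of examples 3.7 and 3.8 are easy to check numerically. The Python sketch below evaluates the density p_f(z) = 1/(2(1 + cosh z)) on a fine grid and confirms, by simple numerical integration, that it integrates to 1, has mean 0 and variance close to π²/3; it is only a sanity check of the analytic results above.

```python
import numpy as np

# Density of example 3.7: p_f(z) = 1 / (2 (1 + cosh z))
z = np.linspace(-40.0, 40.0, 400001)
dz = z[1] - z[0]
p = 1.0 / (2.0 * (1.0 + np.cosh(z)))

total = np.sum(p) * dz                   # should be ~1
mean = np.sum(z * p) * dz                # should be ~0
variance = np.sum(z**2 * p) * dz         # should be ~pi^2 / 3

print(round(total, 6), round(mean, 6), round(variance, 6), round(np.pi**2 / 3, 6))
```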
How do we describe many random variables?

If we have n random variables we can define their joint distribution function:
\[
P_{f_1 f_2\dots f_n}(z_1,z_2,\dots,z_n)\equiv\mathcal{P}\{f_1<z_1,\,f_2<z_2,\dots,f_n<z_n\} \quad (3.15)
\]
We can also define their joint probability density function:
\[
p_{f_1 f_2\dots f_n}(z_1,z_2,\dots,z_n)\equiv\frac{\partial^n P_{f_1 f_2\dots f_n}(z_1,z_2,\dots,z_n)}{\partial z_1\,\partial z_2\cdots\partial z_n} \quad (3.16)
\]
What relationships may n random variables have with each other?

If the distribution of the n random variables can be written as
\[
P_{f_1 f_2\dots f_n}(z_1,z_2,\dots,z_n)=P_{f_1}(z_1)P_{f_2}(z_2)\cdots P_{f_n}(z_n) \quad (3.17)
\]
then these random variables are called independent. They are called uncorrelated if:
\[
E\{f_i f_j\}=E\{f_i\}E\{f_j\},\qquad\forall\, i,j,\ i\neq j \quad (3.18)
\]
Any two random variables are orthogonal to each other if:
\[
E\{f_i f_j\}=0 \quad (3.19)
\]
The covariance of any two random variables is defined as:
\[
c_{ij}\equiv E\{(f_i-\mu_{f_i})(f_j-\mu_{f_j})\} \quad (3.20)
\]
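Definitions (3.18)-(3.20) can be made concrete with sample estimates. The Python sketch below draws two pseudorandom variables that are independent by construction and two that are deliberately correlated, and estimates E{f_i f_j}, E{f_i}E{f_j} and the covariance c_ij for each pair; this is an illustrative experiment of ours, not part of the book's text.

```python
import numpy as np

def sample_stats(x, y):
    """Return (E{xy}, E{x}E{y}, covariance) estimated from samples."""
    e_xy = np.mean(x * y)
    e_x_e_y = np.mean(x) * np.mean(y)
    cov = np.mean((x - np.mean(x)) * (y - np.mean(y)))   # definition (3.20)
    return e_xy, e_x_e_y, cov

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200000
    a = rng.normal(1.0, 1.0, n)
    b = rng.normal(2.0, 1.0, n)              # independent of a
    c = a + 0.5 * rng.normal(0.0, 1.0, n)    # strongly dependent on a
    print("independent pair:", [round(v, 3) for v in sample_stats(a, b)])  # covariance ~ 0
    print("correlated pair: ", [round(v, 3) for v in sample_stats(a, c)])  # covariance ~ 1
```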
Example 3.9
Show that if the covariance c
ij
of two random variables is zero, the two
variables are uncorrelated.
Expanding the right-hand side of the denition of the covariance, we get:
c
ij
= Ef
i
f
j
f
i
f
j
f
j
f
i
+
f
i
f
j
= Ef
i
f
j
f
i
Ef
j
f
j
Ef
i
+
f
i
f
j
= Ef
i
f
j
f
i
f
j
f
j
f
i
+
f
i
f
j
= Ef
i
f
j
f
i
f
j
(3.21)
Notice that the operation of taking the expectation value of a xed number has no eect
on it; ie E
fi
=
fi
. If c
ij
= 0, we obtain
Ef
i
f
j
=
f
i
f
j
= Ef
i
Ef
j
(3.22)
which shows that f
i
and f
j
are uncorrelated, according to (3.18).
Example 3.10
Show that if two random variables are independent, their joint probability
density function may be written as the product of their individual proba-
bility density functions.
According to denition (3.17), when two random variables f
1
and f
2
are independent,
their joint distribution function P
f
1
f
2
(x, y) may be written as the product of their
individual distribution functions:
P
f
1
f
2
(x, y) = P
f
1
(x)P
f
2
(y) (3.23)
According to denition (3.16), their joint probability density function p
f
1
f
2
(x, y) is
p
f
1
f
2
(x, y) =
2
P
f
1
f
2
(x, y)
xy
=
2
P
f
1
(x)P
f
2
(y)
xy
=
dP
f
1
(x)
dx
dP
f
2
(y)
dy
= p
f
1
(x)p
f
2
(y) (3.24)
where we recognised dP
f
1
(x)/dx to be the probability density function of f
1
and
dP
f
2
(y)/dy to be the probability density function of f
2
.
Example 3.11
Show that if two random variables f
1
and f
2
are independent, they are
uncorrelated.
According to example 3.10, when two random variables f
1
and f
2
are independent, their
joint probability density function p
f
1
f
2
(x, y) is equal to the product of their individual
probability density functions p
f
1
(x) and p
f
2
(y). To show that they are uncorrelated,
we must show that they satisfy equation (3.18). We start by applying a generalised
version of denition (3.10) to compute the mean of their product:
f
1
f
2
=
_
+
_
+
xyp
f
1
f
2
(x, y)dxdy
=
_
+
_
+
xyp
f
1
(x)p
f
2
(y)dxdy
=
_
+
xp
f
1
(x)dx
_
+
yp
f
2
(y)dy
=
f
1
f
2
(3.25)
This concludes the proof.
Example B3.12

Variables f_1 and f_2 have a joint probability density function which is uniform inside the square ABCD shown in figure 3.4 and 0 outside it. Work out a formula for p_{f_1 f_2}(x, y).

Figure 3.4: Joint probability density function p(x, y) is nonzero and uniform inside square ABCD.

First we have to write down equations for lines l_1, l_2, l_3 and l_4:
l_1: x + y = 8
l_2: x + y = 6
l_3: x − y = 2
l_4: x − y = 4

Next we have to derive the coordinates of the intersection points A, B, C and D. By solving the pairs of the equations that correspond to the intersecting lines, we deduce that:
Point A: intersection of l_1 and l_3: coordinates (5, 3)
Point B: intersection of l_1 and l_4: coordinates (6, 2)
Point C: intersection of l_2 and l_4: coordinates (5, 1)
Point D: intersection of l_2 and l_3: coordinates (4, 2)

Since p_{f_1 f_2}(x, y) is uniform inside a square with side √2, ie inside an area of 2, and since it has to integrate to 1 over all values of x and y, p_{f_1 f_2}(x, y) = 1/2 inside square ABCD. So, we may write:
\[
p_{f_1 f_2}(x,y)=\begin{cases}\dfrac{1}{2} & \text{if } 6\leq x+y\leq 8 \text{ and } 2\leq x-y\leq 4\\[4pt] 0 & \text{elsewhere}\end{cases} \quad (3.26)
\]
Example B3.13
Compute the expectation value of
f
1
f
2
if the joint probability density func-
tion of variables f
1
and f
2
is given by equation (3.26).
By denition, the expectation value Ef
1
f
2
is given by:
Ef
1
f
2
=
_
+
_
+
xyp
f
1
f
2
(x, y)dxdy (3.27)
As the region over which p
f
1
f
2
(x, y) is nonzero has a boundary made up from linear
segments, we must split the integration over x from the x coordinate of point D (ie from
4, see example 3.12) to the x coordinate of point C (ie 5) and from the x coordinate
of point C to the x coordinate of point B (ie 6). From gure 3.4 we can see that the
limits of integration over y in each one of these ranges of x are from line l
2
to line l
3
for x from 4 to 5, and from line l
4
to line l
1
for x from 5 to 6:
Ef
1
f
2
=
_
5
4
_
y=x2
y=6x
xy
1
2
dydx +
_
6
5
_
y=8x
y=x4
1
2
dydx
=
1
2
_
_
5
4
x
_
y
2
2
x2
6x
dx +
_
6
5
x
_
y
2
2
8x
x4
dx
_
=
1
2
__
5
4
x
_
x
2
+ 4 4x
2
36 +x
2
12x
2
_
dx
+
_
6
5
x
_
64 +x
2
16x
2
x
2
+ 16 8x
2
_
dx
_
=
1
4
__
5
4
x(32 + 8x)dx +
_
6
5
x(48 8x)dx
_
=
_
5
4
(2x
2
8x)dx +
_
6
5
(12x 2x
2
)dx
=
_
2x
3
3
8x
2
2
5
4
+
_
12x
2
2
2x
3
3
6
5
=
_
250
3
200
2
128
3
+
128
2
_
+
_
432
2
432
3
300
2
+
250
3
_
= 10 (3.28)
Example B3.14

The joint probability density function of variables f_1 and f_2 is given by equation (3.26). Compute the probability density functions of variables f_1 and f_2. Are variables f_1 and f_2 independent?

We are asked to compute the so called marginal probability density functions of variables f_1 and f_2. The probability density function of f_1 will express the probability of finding a value of f_1 in a particular range of width dx, irrespective of the value of f_2. So, in order to work out this probability, we have to eliminate the dependence of p_{f_1 f_2}(x, y) on y. This means that we have to integrate p_{f_1 f_2}(x, y) over all values of y. We have to do this by splitting the range of values of x from 4 to 5 and from 5 to 6 and integrating over the appropriate limits of y in each range:
\[
p_{f_1}(x)=\int_{y=6-x}^{y=x-2}\frac{1}{2}\,dy=\frac{1}{2}(x-2-6+x)=x-4 \qquad\text{for } 4\leq x\leq 5
\]
\[
p_{f_1}(x)=\int_{y=x-4}^{y=8-x}\frac{1}{2}\,dy=\frac{1}{2}(8-x-x+4)=6-x \qquad\text{for } 5\leq x\leq 6 \quad (3.29)
\]
Working in a similar way for the probability density function of f_2, we obtain:
\[
p_{f_2}(y)=\int_{x=6-y}^{x=4+y}\frac{1}{2}\,dx=\frac{1}{2}(4+y-6+y)=y-1 \qquad\text{for } 1\leq y\leq 2
\]
\[
p_{f_2}(y)=\int_{x=2+y}^{x=8-y}\frac{1}{2}\,dx=\frac{1}{2}(8-y-2-y)=3-y \qquad\text{for } 2\leq y\leq 3 \quad (3.30)
\]
The two random variables are not independent, because we cannot write p_{f_1 f_2}(x, y) = p_{f_1}(x)p_{f_2}(y).
Example B3.15

Compute the mean values of variables f_1 and f_2 of example 3.14. Are these two variables uncorrelated?

We shall make use of the probability density functions of these two variables, given by equations (3.29) and (3.30). The mean value of f_1 is given by:
\[
E\{f_1\}=\int_{-\infty}^{+\infty} x\,p_{f_1}(x)\,dx=\int_4^5 x(x-4)\,dx+\int_5^6 x(6-x)\,dx
=\left[\frac{x^3}{3}-\frac{4x^2}{2}\right]_4^5+\left[\frac{6x^2}{2}-\frac{x^3}{3}\right]_5^6
\]
\[
=\left(\frac{125}{3}-\frac{100}{2}-\frac{64}{3}+\frac{64}{2}\right)+\left(\frac{216}{2}-\frac{216}{3}-\frac{150}{2}+\frac{125}{3}\right)=5 \quad (3.31)
\]
In a similar way, we can work out that E{f_2} = 2. In example 3.13 we worked out that E{f_1 f_2} = 10. So, E{f_1 f_2} = E{f_1}E{f_2}, and the two random variables are uncorrelated. This is an example of random variables that are dependent (because we cannot write p_{f_1 f_2}(x, y) = p_{f_1}(x)p_{f_2}(y)) but uncorrelated (because we can write E{f_1 f_2} = E{f_1}E{f_2}). In general, uncorrelatedness is a much weaker condition than independence.
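Examples 3.12-3.15 can be verified with a short Monte Carlo experiment: sample points uniformly inside the square ABCD (most easily by sampling u = x + y uniformly in [6, 8] and v = x − y uniformly in [2, 4], which covers exactly that square), and estimate the means and E{f_1 f_2}. The Python sketch below should give values close to E{f_1} = 5, E{f_2} = 2 and E{f_1 f_2} = 10, illustrating variables that are uncorrelated yet dependent; the sampling trick is ours, not the book's.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# The square ABCD is defined by 6 <= x+y <= 8 and 2 <= x-y <= 4 (equation (3.26)).
u = rng.uniform(6.0, 8.0, n)          # u = x + y
v = rng.uniform(2.0, 4.0, n)          # v = x - y
x = (u + v) / 2.0
y = (u - v) / 2.0

print(round(x.mean(), 3))             # ~5 = E{f1}
print(round(y.mean(), 3))             # ~2 = E{f2}
print(round((x * y).mean(), 3))       # ~10 = E{f1 f2} = E{f1} E{f2}: uncorrelated
# Yet f1 and f2 are dependent: the range of y allowed depends on the value of x.
```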
How do we define a random field?

If we define a random variable at every point in a 2D space, we say that we have a 2D random field. The position of the space where the random variable is defined is like a parameter of the random field: f(r; ω_i). This function, for fixed r, is a random variable, but for fixed ω_i (fixed outcome) it is a 2D function in the plane, an image, say. As ω_i scans all possible outcomes of the underlying statistical experiment, the random field represents a series of images. On the other hand, for a given outcome (fixed ω_i), the random field gives the grey level values at the various positions in an image.
Example 3.16

Using an unloaded die, we conducted a series of experiments. Each experiment consisted of throwing the die four times. The outcomes ω_1, ω_2, ω_3, ω_4 . . .

The mean of a random field is defined as:
\[
\mu_f(r)=E\{f(r;\omega_i)\}=\int_{-\infty}^{+\infty} z\,p_f(z;r)\,dz \quad (3.33)
\]
Since for different values of r we have different random variables, f(r_1; ω_i) and f(r_2; ω_i), we can define their correlation, called autocorrelation (we use "auto" because the two variables come from the same random field), as:
\[
R_{ff}(r_1,r_2)=E\{f(r_1;\omega_i)f(r_2;\omega_i)\}=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} z_1 z_2\,p_f(z_1,z_2;r_1,r_2)\,dz_1\,dz_2 \quad (3.34)
\]
The autocovariance C_{ff}(r_1, r_2) is defined as:
\[
C_{ff}(r_1,r_2)=E\{[f(r_1;\omega_i)-\mu_f(r_1)][f(r_2;\omega_i)-\mu_f(r_2)]\} \quad (3.35)
\]
Example B3.17
Show that for a random eld:
C
ff
(r
1
, r
2
) = R
ff
(r
1
, r
2
)
f
(r
1
)
f
(r
2
) (3.36)
Starting from equation (3.35):
C
ff
(r
1
, r
2
) = E [f(r
1
;
i
)
f
(r
1
)] [f(r
2
;
i
)
f
(r
2
)]
= E f(r
1
;
i
)f(r
2
;
i
) f(r
1
;
i
)
f
(r
2
)
f
(r
1
)f(r
2
;
i
)
+
f
(r
1
)
f
(r
2
)
= Ef(r
1
;
i
)f(r
2
;
i
) Ef(r
1
;
i
)
f
(r
2
)
f
(r
1
)Ef(r
2
;
i
)
+
f
(r
1
)
f
(r
2
)
= R
ff
(r
1
, r
2
)
f
(r
1
)
f
(r
2
)
f
(r
1
)
f
(r
2
) +
f
(r
1
)
f
(r
2
)
= R
ff
(r
1
, r
2
)
f
(r
1
)
f
(r
2
) (3.37)
Example 3.18

Compute the mean of the ensemble of images (3.32).

The mean of a random field is given by (3.33). However, in this case, instead of having explicitly the probability density function of the random variable associated with each position, p_f(z; r), we have an ensemble of values. These values are assumed to have been drawn according to p_f(z; r). All we have to do then, in order to find the mean, is simply to average these values. The result is:
\[
\mu=\begin{pmatrix}2.50&3.50&4.75&1.75\\3.75&3.50&3.75&3.25\\4.25&4.00&3.00&3.25\\2.75&3.75&2.25&3.50\end{pmatrix} \quad (3.38)
\]
Example 3.19

Compute the autocorrelation matrix for the ensemble of images (3.32).

The autocorrelation matrix for a random field is given by (3.34). If, however, we do not have an expression for the joint probability density function of the random variables at two positions, namely p_f(z_1, z_2; r_1, r_2), we cannot use this formula. If we have instead an ensemble of versions of the random field, all we have to do is to perform the relevant statistics on the ensemble of images we have. This is the case here.

As we have 16 positions, ie 16 random variables, we may have 16² = 256 combinations of positions. We shall work out here just a couple of the values of the autocorrelation function:
\[
R_{ff}((1,1),(1,1))=\frac{1^2+2^2+1^2+6^2}{4}=10.5
\]
\[
R_{ff}((1,1),(1,2))=\frac{1\cdot 3+2\cdot 5+1\cdot 2+6\cdot 4}{4}=9.75
\]
\[
R_{ff}((2,3),(4,1))=\frac{1\cdot 2+5\cdot 5+3\cdot 3+6\cdot 1}{4}=10.5 \quad (3.39)
\]
Example 3.20

Compute the autocovariance matrix for the ensemble of images (3.32).

The autocovariance matrix for a random field is given by (3.35). As we do not have an explicit formula for the joint probability density function of the random variables at two positions, we compute the autocovariance matrix using ensemble statistics. We only show here a couple of relative positions. In this calculation we make use of the mean value at each position, as computed in example 3.18:
\[
C_{ff}((1,1),(1,1))=\frac{(1-2.5)^2+(2-2.5)^2+(1-2.5)^2+(6-2.5)^2}{4}=4.25
\]
\[
C_{ff}((1,1),(1,2))=\frac{1}{4}\big[(1-2.5)(3-3.5)+(2-2.5)(5-3.5)+(1-2.5)(2-3.5)+(6-2.5)(4-3.5)\big]=1
\]
\[
C_{ff}((2,3),(4,1))=\frac{1}{4}\big[(1-3.75)(2-2.75)+(5-3.75)(5-2.75)+(3-3.75)(3-2.75)+(6-3.75)(1-2.75)\big]=0.1875 \quad (3.40)
\]
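The ensemble calculations of examples 3.18-3.20 generalise directly to any stack of images of the same size. The Python sketch below computes the ensemble mean at every pixel, and the ensemble autocorrelation and autocovariance for any pair of positions; since ensemble (3.32) is not reproduced here, it is demonstrated on a small made-up stack, and the function names are ours.

```python
import numpy as np

def ensemble_mean(stack):
    """Mean at every pixel position over all images in the ensemble (equation (3.33))."""
    return np.mean(stack, axis=0)

def ensemble_autocorrelation(stack, r1, r2):
    """R_ff(r1, r2): average of the product of values at positions r1 and r2 (3.34)."""
    return np.mean(stack[:, r1[0], r1[1]] * stack[:, r2[0], r2[1]])

def ensemble_autocovariance(stack, r1, r2):
    """C_ff(r1, r2): as above, but with the position means removed (3.35)."""
    mu = ensemble_mean(stack)
    a = stack[:, r1[0], r1[1]] - mu[r1[0], r1[1]]
    b = stack[:, r2[0], r2[1]] - mu[r2[0], r2[1]]
    return np.mean(a * b)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    stack = rng.integers(1, 7, size=(4, 4, 4)).astype(float)  # 4 hypothetical 4x4 images
    mu = ensemble_mean(stack)
    r1, r2 = (0, 0), (0, 1)   # positions (1,1) and (1,2) in the book's 1-based notation
    R = ensemble_autocorrelation(stack, r1, r2)
    C = ensemble_autocovariance(stack, r1, r2)
    # Relation of example 3.17: C_ff = R_ff - mu(r1) mu(r2)
    print(np.isclose(C, R - mu[r1] * mu[r2]))   # True
```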
How can we relate two random variables that belong to two different random fields?

If we have two random fields, ie two series of images generated by two different underlying random experiments, represented by f and g, we can define their cross-correlation
\[
R_{fg}(r_1,r_2)=E\{f(r_1;\omega_i)g(r_2;\omega_j)\} \quad (3.41)
\]
and their cross-covariance:
\[
C_{fg}(r_1,r_2)=E\{[f(r_1;\omega_i)-\mu_f(r_1)][g(r_2;\omega_j)-\mu_g(r_2)]\}=R_{fg}(r_1,r_2)-\mu_f(r_1)\mu_g(r_2) \quad (3.42)
\]
Two random fields are called uncorrelated if for all values r_1 and r_2:
\[
C_{fg}(r_1,r_2)=0 \quad (3.43)
\]
This is equivalent to:
\[
E\{f(r_1;\omega_i)g(r_2;\omega_j)\}=E\{f(r_1;\omega_i)\}E\{g(r_2;\omega_j)\} \quad (3.44)
\]
Example B3.21
Show that for two uncorrelated random elds we have:
E f(r
1
;
i
)g(r
2
;
j
) = E f(r
1
;
i
) E g(r
2
;
j
) (3.45)
This follows trivially from the denition of uncorrelated random elds
(C
fg
(r
1
, r
2
) = 0) and the expression
C
fg
(r
1
, r
2
) = Ef(r
1
;
i
)g(r
2
;
j
)
f
(r
1
)
g
(r
2
) (3.46)
which can be proven in a similar way as (3.36).
Example 3.22
You are given an ensemble of versions of a random eld:
_
_
_
_
5 6 5 7
7 6 6 8
6 7 4 5
6 4 5 3
_
_
_
_
_
_
_
_
5 5 4 8
7 6 5 9
5 6 7 5
8 6 4 5
_
_
_
_
_
_
_
_
5 4 6 7
8 4 4 3
3 5 5 5
4 4 4 5
_
_
_
_
_
_
_
_
7 5 5 4
3 3 5 6
4 6 7 5
3 3 5 6
_
_
_
_
(3.47)
Compute the cross-covariance between this random eld and that of the
random eld represented by ensemble (3.32).
First we have to compute the average at each position in this second random eld.
This is given by the average of the versions of the eld we are given, and found to be:
_
_
_
_
5.50 5.00 5.00 6.50
6.25 4.75 5.00 6.50
4.50 6.00 5.75 5.00
5.25 4.25 4.50 4.75
_
_
_
_
(3.48)
The cross-covariance then is computed by using formula (3.42). This formula tells us
to consider pairs of positions, and for each member of a pair to subtract the corre-
sponding mean, multiply the value the rst member of the pair has in a version of the
rst random eld with the value the second member of the pair has in the corresponding
version of the second random eld, and average over all versions.
Let us consider two positions (2, 1) and (3, 3). The mean of the rst eld in position
(2, 1) is 3.75 according to (3.38). The mean of the second eld in position (3, 3) is
5.75 according to (3.48). The cross-covariance for these two positions between the two
elds is:
C
fg
((2, 1), (3, 3)) =
1
4
[(3 3.75)(4 5.75) + (4 3.75)(7 5.75)+
(4 3.75)(5 5.75) + (4 3.75)(7 5.75)] = 0.4375
(3.49)
Similarly, for positions (1, 1) and (4, 4), and positions (1, 1) and (1, 1), we nd:
C
fg
((1, 1), (4, 4)) =
1
4
[(1 2.5)(3 4.75) + (2 2.5)(5 4.75)+
(1 2.5)(5 4.75) + (6 2.5)(6 4.75)] = 1.625
C
fg
((1, 1), (1, 1)) =
1
4
[(1 2.5)(5 5.5) + (2 2.5)(5 5.5)+
(1 2.5)(5 5.5) + (6 2.5)(7 5.5)] = 1.75 (3.50)
We can work out the full cross-covariance by considering all pairs of positions in the
two random elds.
Example 3.23
Show that if the values at any two positions r
1
and r
2
, where r
1
,= r
2
, in a
random eld are uncorrelated, the covariance matrix of the random eld
is diagonal.
The values at two dierent positions of a random eld are two random variables.
According to denition (3.18), two random variables are uncorrelated if the expectation
of their product is equal to the product of their expectations. The expectation of the
product of the two random variables in this case is the value of the autocorrelation
function of the eld for the two positions, let us call it R
ff
(r
1
, r
2
). If we denote by
f
(r
1
) and
f
(r
2
) the mean of the random eld at positions r
1
and r
2
, respectively,
we may then write
R
ff
(r
1
, r
2
) =
f
(r
1
)
f
(r
2
) (3.51)
since we are told that the values at r
1
and r
2
are uncorrelated. According to example
3.17 then, C
ff
(r
1
, r
2
) = 0. Note that this refers only to the case r
1
,= r
2
. If r
1
=
r
2
, according to denition (3.35), C
ff
(r
1
, r
1
) is the expectation value of a squared
number, ie it is the average of non-negative numbers, and as such it cannot be 0. It
is actually the variance of the random eld at position r
1
: C
ff
(r
1
, r
1
) =
2
(r
1
). So,
the autocovariance matrix of an uncorrelated random eld is diagonal, with all its o-
diagonal elements 0 and all its diagonal elements equal to the variance of the eld at
the corresponding positions.
If we have just one image from an ensemble of images, can we calculate expectation values?

Yes. We make the assumption that the image we have is an instantiation of a random field that is homogeneous (stationary) with respect to the mean and the autocorrelation function, and ergodic with respect to the mean and the autocorrelation function. This assumption allows us to replace the ensemble statistics of the random field (ie the statistics we could compute over a collection of images) with the spatial statistics of the single image we have.

When is a random field homogeneous with respect to the mean?

A random field is homogeneous (stationary) with respect to the mean if the expectation value at all positions is the same, ie if the left-hand side of equation (3.33) does not depend on r.

When is a random field homogeneous with respect to the autocorrelation function?

If the expectation value of the random field does not depend on r, and if its autocorrelation function is translation invariant, then the field is called homogeneous (or stationary) with respect to the autocorrelation function. A translation invariant autocorrelation function depends on only one argument, the relative shifting of the positions at which we calculate the values of the random field:
\[
R_{ff}(r_0)=E\{f(r;\omega_i)f(r+r_0;\omega_i)\} \quad (3.52)
\]
Example 3.24
Show that the autocorrelation function R(r
1
, r
2
) of a homogeneous (station-
ary) random eld depends only on the dierence vector r
1
r
2
.
The autocorrelation function of a homogeneous random eld is translation
invariant. Therefore, for any translation vector r
0
we may write:
R
ff
(r
1
, r
2
) = Ef(r
1
;
i
)f(r
2
;
i
) = Ef(r
1
+r
0
;
i
)f(r
2
+r
0
;
i
)
= R
ff
(r
1
+r
0
, r
2
+r
0
) r
0
(3.53)
Choosing r
0
= r
2
we see that for a homogeneous random eld:
R
ff
(r
1
, r
2
) = R
ff
(r
1
r
2
, 0) = R
ff
(r
1
r
2
) (3.54)
How can we calculate the spatial statistics of a random field?

Given a random field, we can define its spatial average as
\[
\mu(\omega_i)\equiv\lim_{S\to+\infty}\frac{1}{S}\int_{\mathcal{S}} f(r;\omega_i)\,dx\,dy \quad (3.55)
\]
where r = (x, y) and ∫_𝒮 is the integral over the whole space 𝒮 with area S. The result μ(ω_i) is clearly a function of the outcome on which f depends; ie μ(ω_i) is a random variable.

The spatial autocorrelation function of the random field is defined as:
\[
R(r_0;\omega_i)\equiv\lim_{S\to+\infty}\frac{1}{S}\int_{\mathcal{S}} f(r;\omega_i)f(r+r_0;\omega_i)\,dx\,dy \quad (3.56)
\]
This is another random variable.

How do we compute the spatial autocorrelation function of an image in practice?

Let us say that the image is M × N in size. The relative positions of two pixels may be (i, j) and (i+k, j+l), where k and l may take values from −M+1 to M−1 and from −N+1 to N−1, respectively. The autocorrelation function then will be of size (2M−1) × (2N−1), and its (k, l) element will have value
\[
R(k,l)=\frac{1}{NM}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} g(i,j)\,g(i+k,j+l) \quad (3.57)
\]
where k = −M+1, −M+2, . . . , 0, . . . , M−1, l = −N+1, −N+2, . . . , 0, . . . , N−1 and g(i, j) is the grey value of the image at position (i, j). In order to have the same number of pairs for all possible relative positions, we assume that the image is repeated ad infinitum in all directions, so that all pixels have neighbours at all distances and in all orientations.
Example 3.25

Compute the spatial autocorrelation function of the following image:
\[
\begin{pmatrix}1&2&1\\1&2&1\\1&2&1\end{pmatrix} \quad (3.58)
\]
We apply formula (3.57) for M = N = 3. The autocorrelation function will be a 2D discrete function (ie a matrix) of size 5 × 5. The result is:
\[
R=\frac{1}{9}\begin{pmatrix}15&15&18&15&15\\15&15&18&15&15\\15&15&18&15&15\\15&15&18&15&15\\15&15&18&15&15\end{pmatrix} \quad (3.59)
\]
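Formula (3.57), with the cyclic repetition of the image described above, is easy to implement directly. The Python sketch below does it with explicit loops for clarity (for large images one would use the FFT instead); applied to image (3.58), it reproduces the matrix in (3.59), ie 18/9 for purely vertical shifts and 15/9 elsewhere. The function name is ours.

```python
import numpy as np

def spatial_autocorrelation(g):
    """R(k,l) of equation (3.57), assuming the image is repeated cyclically."""
    g = np.asarray(g, dtype=float)
    M, N = g.shape
    R = np.zeros((2 * M - 1, 2 * N - 1))
    for k in range(-M + 1, M):
        for l in range(-N + 1, N):
            total = 0.0
            for i in range(M):
                for j in range(N):
                    total += g[i, j] * g[(i + k) % M, (j + l) % N]
            R[k + M - 1, l + N - 1] = total / (M * N)
    return R

if __name__ == "__main__":
    g = [[1, 2, 1],
         [1, 2, 1],
         [1, 2, 1]]                                      # image (3.58)
    print((9 * spatial_autocorrelation(g)).round(1))     # matrix of 15s and 18s, as in (3.59)
```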
When is a random field ergodic with respect to the mean?

A random field is ergodic with respect to the mean if it is homogeneous with respect to the mean and its spatial average, defined by (3.55), is independent of the outcome on which f depends, ie it is the same from whichever version of the random field it is computed, and is equal to the ensemble average defined by equation (3.33):
\[
E\{f(r;\omega_i)\}=\lim_{S\to+\infty}\frac{1}{S}\int_{\mathcal{S}} f(r;\omega_i)\,dx\,dy=\mu=\text{a constant} \quad (3.60)
\]

When is a random field ergodic with respect to the autocorrelation function?

A random field is ergodic with respect to the autocorrelation function if it is homogeneous (stationary) with respect to the autocorrelation function and its spatial autocorrelation function, defined by (3.56), is independent of the outcome of the experiment on which f depends, depends only on the displacement r_0, and is equal to the ensemble autocorrelation function defined by equation (3.52):
\[
\underbrace{E\{f(r;\omega_i)f(r+r_0;\omega_i)\}}_{\text{ensemble autocorrelation function}}
=\underbrace{\lim_{S\to+\infty}\frac{1}{S}\int_{\mathcal{S}} f(r;\omega_i)f(r+r_0;\omega_i)\,dx\,dy}_{\text{spatial autocorrelation function}}
=\underbrace{R(r_0)}_{\text{independent of }\omega_i} \quad (3.61)
\]
Example 3.26
You are given the following ensemble of images:
_
_
_
_
5 4 6 2
5 3 4 3
6 6 7 1
5 4 2 3
_
_
_
_
,
_
_
_
_
4 2 2 1
7 2 4 9
3 5 4 5
4 6 6 2
_
_
_
_
,
_
_
_
_
3 5 2 3
5 4 4 3
2 2 6 6
6 5 4 6
_
_
_
_
,
_
_
_
_
6 4 2 8
3 5 6 4
4 4 2 2
5 4 3 4
_
_
_
_
,
_
_
_
_
4 3 5 4
6 5 6 2
4 3 3 4
3 3 6 5
_
_
_
_
,
_
_
_
_
4 5 4 5
1 6 2 6
4 8 4 4
1 3 2 7
_
_
_
_
,
_
_
_
_
2 7 6 4
2 4 2 4
6 3 4 7
4 3 6 2
_
_
_
_
,
_
_
_
_
5 3 6 6
4 4 5 2
4 2 3 4
5 5 4 4
_
_
_
_
Is this ensemble of images ergodic with respect to the mean? Is it ergodic
with respect to the autocorrelation?
It is ergodic with respect to the mean because the average of each image is 4.125 and
the average at each pixel position over all eight images is also 4.125.
It is not ergodic with respect to the autocorrelation function. To prove this, let us
calculate one element of the autocorrelation matrix, say element Eg
23
g
34
which is
the average of product values of all pixels at positions (2, 3) and (3, 4) over all images:
Eg
23
g
34
=
4 1 + 4 5 + 4 6 + 6 2 + 6 4 + 2 4 + 2 7 + 5 4
8
=
4 + 20 + 24 + 12 + 24 + 8 + 14 + 20
8
=
126
8
= 15.75 (3.62)
This should be equal to the element of the autocorrelation function which expresses the
spatial average of pairs of pixels which are diagonal neighbours from top left to bottom
right direction, computed from any image. Consider the last image in the ensemble.
We have:
< g
ij
g
i+1,j+1
> =
5 4 + 3 5 + 6 2 + 4 2 + 4 3 + 5 4 + 4 5 + 2 4 + 3 4
9
=
20 + 15 + 12 + 8 + 12 + 20 + 20 + 8 + 12
9
=
127
9
= 13 ,= 15.75
(3.63)
The two results are not the same, and therefore the ensemble is not ergodic with respect
to the autocorrelation function.
What is the implication of ergodicity?
If an ensemble of images is ergodic, then we can calculate its mean and autocorrelation
function by simply calculating spatial averages over any image of the ensemble we happen to
have (see gure 3.5).
For example, assume that we have a collection of M images of similar type
g
1
(x, y), g
2
(x, y), . . . , g
M
(x, y). The mean and autocorrelation function of this collection can
be calculated by taking averages over all images in the collection. On the other hand, if we
assume ergodicity, we can pick up only one of these images and calculate the mean and the
autocorrelation function from it with the help of spatial averages. This will be correct if the
natural variability of all the dierent images is statistically the same as the natural variability
exhibited by the contents of each single image separately.
Average across images
at a single position=
Average over all positions
in the same image
E
n
s
e
m
b
l
e
o
f
i
m
a
g
e
s
Figure 3.5: Ergodicity in a nutshell.
Example 3.27

You are given the following ensemble of four 3 × 3 images:
\[
\begin{pmatrix}3&2&1\\0&1&2\\3&3&3\end{pmatrix}\quad
\begin{pmatrix}2&2&2\\2&3&2\\2&1&2\end{pmatrix}\quad
\begin{pmatrix}3&2&3\\3&2&3\\0&2&0\end{pmatrix}\quad
\begin{pmatrix}0&2&2\\3&2&1\\3&2&3\end{pmatrix} \quad (3.64)
\]
(i) Is this ensemble ergodic with respect to the mean?
(ii) Is this ensemble ergodic with respect to the autocorrelation function?

(i) This ensemble is ergodic with respect to the mean, because the average across the four images for each pixel is 2 and the spatial average of every image is 2 too.

(ii) Let us compute the ensemble autocorrelation function for pixels (1, 1) and (2, 1):
\[
\frac{3\cdot 0+2\cdot 2+3\cdot 3+0\cdot 3}{4}=\frac{13}{4}=3.25 \quad (3.65)
\]
Let us also compute the spatial autocorrelation function of the first image of the ensemble for the same relative position. We have 6 pairs in this relative position:
\[
\frac{3\cdot 0+2\cdot 1+1\cdot 2+0\cdot 3+1\cdot 3+2\cdot 3}{6}=\frac{13}{6}\simeq 2.17 \quad (3.66)
\]
The ensemble is not ergodic with respect to the autocorrelation function, because the spatial autocorrelation function computed for two positions, one below the other, gave a different answer from that obtained by taking two positions one under the other and averaging over the ensemble.
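The comparison carried out in example 3.27 can be automated. The Python sketch below computes, for a vertical shift of one pixel, both the ensemble autocorrelation (averaging over the four images of (3.64) at a fixed pair of positions) and the spatial autocorrelation of the first image (averaging over all pairs in that relative position, without wraparound, as in the example); it reproduces the values 3.25 and 13/6 found above. The function names are ours.

```python
import numpy as np

ensemble = np.array([
    [[3, 2, 1], [0, 1, 2], [3, 3, 3]],
    [[2, 2, 2], [2, 3, 2], [2, 1, 2]],
    [[3, 2, 3], [3, 2, 3], [0, 2, 0]],
    [[0, 2, 2], [3, 2, 1], [3, 2, 3]],
], dtype=float)                              # the four images of (3.64)

def ensemble_autocorr(stack, r1, r2):
    """Average over the ensemble of the product of values at positions r1 and r2."""
    return np.mean(stack[:, r1[0], r1[1]] * stack[:, r2[0], r2[1]])

def spatial_autocorr_vertical(img):
    """Average of products of vertically adjacent pixels of one image (no wraparound)."""
    img = np.asarray(img, dtype=float)
    return np.mean(img[:-1, :] * img[1:, :])

print(ensemble_autocorr(ensemble, (0, 0), (1, 0)))   # 3.25, as in (3.65)
print(spatial_autocorr_vertical(ensemble[0]))        # 13/6 ~ 2.1667, as in (3.66)
```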
Box 3.1. Ergodicity, fuzzy logic and probability theory
Ergodicity is the key that connects probability theory and fuzzy logic. Probability theory performs all its operations assuming that situations, objects, images, or in general the items with which it deals, are the results of some random process which may create an ensemble of versions of each item. It always computes functions over that virtual ensemble, the properties of which are modelled by some parametric function. Fuzzy logic, on the other hand, instead of saying "this item has probability x% of being red, y% of being green, z% of being yellow", etc, says "this item consists of a red part making up x% of it, a green part making up y% of it, a yellow part making up z% of it", and so on. If ergodicity were applicable, all items that make up the ensemble used by probability theory would consist of parts that reflect the variety of objects in the ensemble in the right proportions. So, if ergodicity were applicable, whether we computed functions over the ensemble of items, as done by probability theory, or over a single item, as done by fuzzy logic, we would find the same answer, as every item would be expected to contain all variations that may be encountered, in the right proportions; every item would be a fair representative of the whole ensemble, and one would not need to have the full ensemble in order to have a complete picture of the world.
How can we construct a basis of elementary images appropriate for expressing in
an optimal way a whole set of images?
We do that by choosing a transformation that diagonalises the ensemble autocovariance ma-
trix of the set of images. Such a transformation is called Karhunen-Loeve transform.
3.2 Karhunen-Loeve transform
What is the Karhunen-Loeve transform?
It is the transformation of an image into a basis of elementary images, defined by diagonalising the covariance matrix of a collection of images, which are treated as instantiations of the same random field, and to which collection of images the transformed image is assumed to belong.
Why does diagonalisation of the autocovariance matrix of a set of images define a desirable basis for expressing the images in the set?

Let us consider a space where we have as many coordinate axes as we have pixels in an image, and let us assume that we measure the value of each pixel along one of the axes. Each image then would be represented by a point in this space. The set of all images will make a cluster of such points. The shape of this cluster of points is most simply described in a coordinate system made up from the axes of symmetry of the cluster. For example, in geometry, the equation of a 2D ellipse in an (x, y) coordinate system defined by its axes of symmetry is x²/α² + y²/β² = 1, where α and β are the semi-major and semi-minor axes of the ellipse. In a general coordinate system, however, the equation of the ellipse is ax² + by² + cxy + dx + ey + f = 0, where a, b, c, d, e and f are some constants (see figure 3.6). This example demonstrates that every shape implies an intrinsic coordinate system, in terms of which it is described in the simplest possible way. Let us go back now to the cluster of points made up from the images in a set. We would like to represent that cloud of points in the simplest possible way.
Figure 3.6: The equation of an ellipse is much simpler in the coordinate system Cxy, which is intrinsic to the ellipse, than in the coordinate system Oxy.
The first step in identifying an intrinsic coordinate system for it is to shift the origin of the axes to the centre of the cluster (see figure 3.7). It can be shown that, if we rotate the axes so that they coincide with the axes of symmetry of the cloud of points, the autocorrelation matrix of the points described in this rotated system will be diagonal (see example 3.29).
Figure 3.7: To find the axis of symmetry of a cloud of points we first translate the original coordinate system to the centre of the cloud. In practice this means that we remove the average coordinate values from all coordinates of the points of the cloud. Then we postulate that the axis of symmetry is at orientation θ with respect to the horizontal axis, and we work out what θ should be.
Example B3.28
Show that the axis of symmetry of a cloud of points in 2D is the one for which the sum of the squares of the distances of the points from it is minimal.

Let us consider a cloud of points with respect to a coordinate system centred at their average position. Let us call that system (x, y) and let us use subscript i to identify individual points. Let us consider an axis passing through the origin of the coordinate system and having orientation θ with respect to the x axis (see figure 3.7). The axis of symmetry of the cloud of points will be such that the sum of the signed distances d_i of all points from the axis is as close as possible to 0. So, the symmetry axis will have orientation angle θ such that:

Σ_i d_i = minimum  ⟹  (Σ_i d_i)² = minimum  ⟹  Σ_i d_i² + Σ_i Σ_{j, j≠i} d_i d_j = minimum    (3.67)

The distance of a point (x_i, y_i) from a line with directional vector (cos θ, sin θ) is given by:

d_i = y_i cos θ - x_i sin θ    (3.68)

Let us examine the second term in (3.67):
Σ_i Σ_{j,j≠i} d_i d_j = Σ_i Σ_{j,j≠i} (y_i cos θ - x_i sin θ)(y_j cos θ - x_j sin θ)
  = (cos θ)² Σ_i Σ_{j,j≠i} y_i y_j + (sin θ)² Σ_i Σ_{j,j≠i} x_i x_j
    - cos θ sin θ Σ_i Σ_{j,j≠i} y_i x_j - sin θ cos θ Σ_i Σ_{j,j≠i} x_i y_j
  = (cos θ)² Σ_i y_i Σ_{j,j≠i} y_j + (sin θ)² Σ_i x_i Σ_{j,j≠i} x_j
    - cos θ sin θ Σ_i y_i Σ_{j,j≠i} x_j - sin θ cos θ Σ_i x_i Σ_{j,j≠i} y_j = 0    (3.69)

The result is 0 because the axes are centred at the centre of the cloud of points and so Σ_i y_i = Σ_i x_i = 0. So, the second term in (3.67) is 0 and the axis of symmetry is defined by the angle θ that minimises Σ_i d_i², ie the sum of the square distances of all points from it.
Example B3.29
Show that the axis of symmetry of a 2D cloud of points is such that the correlation of the values of the points along the two axes is 0.

According to example 3.28, the axis of symmetry is defined by the angle θ for which Σ_i d_i² is minimal:

Σ_i d_i² = Σ_i (y_i cos θ - x_i sin θ)² = minimal
⟹ (cos θ)² Σ_i y_i² + (sin θ)² Σ_i x_i² - sin(2θ) Σ_i y_i x_i = minimal    (3.70)
Here we made use of sin(2θ) = 2 sin θ cos θ. This expression is minimal for the value of θ that makes its first derivative with respect to θ zero:

-2 cos θ sin θ Σ_i y_i² + 2 sin θ cos θ Σ_i x_i² - 2 cos(2θ) Σ_i y_i x_i = 0
⟹ sin(2θ) ( Σ_i y_i² - Σ_i x_i² ) = -2 cos(2θ) Σ_i y_i x_i
⟹ tan(2θ) = - (2 Σ_i y_i x_i) / ( Σ_i y_i² - Σ_i x_i² ) = (2 Σ_i y_i x_i) / ( Σ_i x_i² - Σ_i y_i² )    (3.71)
Note now that, if the correlation between x_i and y_i is zero, ie if Σ_i y_i x_i = 0, then tan(2θ) = 0 and so θ = 0, ie the symmetry axis coincides with axis x. So, the set of axes defined by the symmetry of the cloud of points is the same set of axes for which the correlation between the values along the different axes is zero. This means that the autocorrelation matrix of points (x_i, y_i) will be diagonal, with elements along the diagonal the variances of the two components. (The autocovariance is the same as the autocorrelation when we are dealing with zero mean random variables. See also example 3.23.)
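The result of examples 3.28 and 3.29 can be illustrated numerically. The sketch below (a minimal Python/NumPy example on a synthetic cloud of points of our own making) estimates the orientation θ of the symmetry axis from equation (3.71) and checks that, in the rotated coordinate system, the cross-correlation of the two coordinates indeed vanishes:

import numpy as np

rng = np.random.default_rng(0)

# A synthetic elongated, zero-mean cloud of 2D points (our own test data).
pts = rng.normal(size=(500, 2)) * np.array([3.0, 0.7])
theta_true = np.deg2rad(25.0)
R = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])
pts = pts @ R.T
pts -= pts.mean(axis=0)                     # centre the cloud, as in figure 3.7
x, y = pts[:, 0], pts[:, 1]

# Orientation of the symmetry axis, from equation (3.71).
theta = 0.5 * np.arctan2(2 * np.sum(x * y), np.sum(x**2) - np.sum(y**2))

# Rotate the points into the frame defined by the symmetry axis: the
# cross-correlation of the new coordinates should be (numerically) zero.
c, s = np.cos(theta), np.sin(theta)
u =  c * x + s * y
v = -s * x + c * y
print(np.degrees(theta))        # close to the 25 degrees used to build the cloud
print(np.sum(u * v))            # approximately 0: diagonal autocorrelation matrix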
How can we transform an image so its autocovariance matrix becomes diagonal?
We wish to express the autocovariance matrix of the image in terms of the image itself in a linear way, so that from the transformation of the autocovariance matrix we can easily work out the transformation of the image itself. This is not possible if we carry on treating the image as 2D. We saw in Chapter 1 that in the simplest of cases an image has to be manipulated by two matrices of the same size, one from the left and one from the right. Such a manipulation would make the relationship between the transformed image and the transform of the covariance matrix of the original image far too complicated. On the other hand, if we express the image as a vector, by stacking its columns one under the other, the relationship between the covariance matrix of the elements of this vector and the vector itself is much more straightforward. Another way to see the necessity of using the image in its vector form is to think in terms of the space where the value of each pixel is measured along one axis. The whole image in such a space is represented by a single point, ie by a vector with coordinates the values of its pixels: the spatial arrangement of the pixels in the original image is no longer relevant. All we need to do in order to find the best basis for the cloud of points made up from all images we wish to consider is to find a transformation that makes the correlation matrix of the points, with respect to the new basis system, diagonal, ie the correlation between any pair of pixels zero, irrespective of their spatial positions in the image.
Example 3.30
Write the images of example 3.26 in vector form and compute their ensemble autocovariance matrix.

The images in vector form are:

g_1^T = (5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3)
g_2^T = (4, 7, 3, 4, 2, 2, 5, 6, 2, 4, 4, 6, 1, 9, 5, 2)
g_3^T = (3, 5, 2, 6, 5, 4, 2, 5, 2, 4, 6, 4, 3, 3, 6, 6)
g_4^T = (6, 3, 4, 5, 4, 5, 4, 4, 2, 6, 2, 3, 8, 4, 2, 4)
g_5^T = (4, 6, 4, 3, 3, 5, 3, 3, 5, 6, 3, 6, 4, 2, 4, 5)
g_6^T = (4, 1, 4, 1, 5, 6, 8, 3, 4, 2, 4, 2, 5, 6, 4, 7)
g_7^T = (2, 2, 6, 4, 7, 4, 3, 3, 6, 2, 4, 6, 4, 4, 7, 2)
g_8^T = (5, 4, 4, 5, 3, 4, 2, 5, 6, 5, 3, 4, 6, 2, 4, 4)    (3.72)
To compute the autocovariance matrix of the ensemble of these vectors we first remove from them the average vector, which is calculated to be the 16 × 1 vector with all its elements equal to 4.125:

μ_g^T = (4.125, 4.125, . . . , 4.125)    (3.73)
The new vectors are:

ḡ_1^T = ( 0.875,  0.875,  1.875,  0.875, -0.125, -1.125,  1.875, -0.125,  1.875, -0.125,  2.875, -2.125, -2.125, -1.125, -3.125, -1.125)
ḡ_2^T = (-0.125,  2.875, -1.125, -0.125, -2.125, -2.125,  0.875,  1.875, -2.125, -0.125, -0.125,  1.875, -3.125,  4.875,  0.875, -2.125)
ḡ_3^T = (-1.125,  0.875, -2.125,  1.875,  0.875, -0.125, -2.125,  0.875, -2.125, -0.125,  1.875, -0.125, -1.125, -1.125,  1.875,  1.875)
ḡ_4^T = ( 1.875, -1.125, -0.125,  0.875, -0.125,  0.875, -0.125, -0.125, -2.125,  1.875, -2.125, -1.125,  3.875, -0.125, -2.125, -0.125)
ḡ_5^T = (-0.125,  1.875, -0.125, -1.125, -1.125,  0.875, -1.125, -1.125,  0.875,  1.875, -1.125,  1.875, -0.125, -2.125, -0.125,  0.875)
ḡ_6^T = (-0.125, -3.125, -0.125, -3.125,  0.875,  1.875,  3.875, -1.125, -0.125, -2.125, -0.125, -2.125,  0.875,  1.875, -0.125,  2.875)
ḡ_7^T = (-2.125, -2.125,  1.875, -0.125,  2.875, -0.125, -1.125, -1.125,  1.875, -2.125, -0.125,  1.875, -0.125, -0.125,  2.875, -2.125)
ḡ_8^T = ( 0.875, -0.125, -0.125,  0.875, -1.125, -0.125, -2.125,  0.875,  1.875,  0.875, -1.125, -0.125,  1.875, -2.125, -0.125, -0.125)    (3.74)
The autocovariance matrix of the set is given by

C(k, l) = (1/16) Σ_{i=1}^{8} ḡ_i(k) ḡ_i(l)    (3.75)

where ḡ_i(k) is the kth element of vector ḡ_i. Since the vectors have 16 elements, k and l take values from 1 to 16 and so the autocovariance matrix of the set is 16 × 16.
C =
[  9.19  8.63  8.50  8.63  8.00  8.56  8.75  8.63  8.38  9.06  8.25  8.00  9.06  8.38  7.56  8.56 ]
[  8.63 10.31  8.06  9.06  7.50  7.75  7.94  9.13  8.13  9.25  8.81  9.19  7.31  8.69  8.38  8.00 ]
[  8.50  8.06  9.31  8.38  8.88  8.50  8.81  8.13  9.38  8.25  8.63  8.38  8.56  8.19  8.19  8.00 ]
[  8.63  9.06  8.38  9.56  8.44  8.06  7.56  8.94  8.31  8.94  8.81  8.56  8.44  7.94  8.44  8.06 ]
[  8.00  7.50  8.88  8.44  9.56  8.81  8.50  8.06  8.81  7.81  8.75  8.38  8.75  8.19  9.06  8.63 ]
[  8.56  7.75  8.50  8.06  8.81  9.19  8.69  8.06  8.56  8.50  8.13  8.19  9.38  8.06  8.44  9.25 ]
[  8.75  7.94  8.81  7.56  8.50  8.69 10.44  8.25  8.44  7.88  8.81  7.63  8.19  9.69  7.75  8.81 ]
[  8.63  9.13  8.13  8.94  8.06  8.06  8.25  9.06  8.06  8.69  8.63  8.63  8.13  8.94  8.56  8.25 ]
[  8.38  8.13  9.38  8.31  8.81  8.56  8.44  8.06 10.06  8.25  8.68  8.50  8.50  7.50  8.38  8.19 ]
[  9.06  9.25  8.25  8.94  7.81  8.50  7.88  8.69  8.25  9.56  8.06  8.62  9.00  7.88  7.88  8.50 ]
[  8.25  8.81  8.63  8.81  8.75  8.13  8.81  8.63  8.69  8.06  9.69  8.13  7.38  8.44  8.44  8.50 ]
[  8.00  9.19  8.38  8.56  8.38  8.19  7.63  8.63  8.50  8.63  8.13  9.81  8.00  8.75  9.50  7.88 ]
[  9.06  7.31  8.56  8.44  8.75  9.38  8.19  8.13  8.50  9.00  7.38  8.00 10.69  7.63  8.06  9.06 ]
[  8.38  8.69  8.19  7.94  8.19  8.06  9.69  8.94  7.50  7.88  8.44  8.75  7.63 10.94  8.88  8.06 ]
[  7.56  8.38  8.19  8.44  9.06  8.44  7.75  8.56  8.38  7.88  8.44  9.50  8.06  8.88 10.19  8.44 ]
[  8.56  8.00  8.00  8.06  8.63  9.25  8.81  8.25  8.20  8.50  8.50  7.88  9.06  8.06  8.44  9.94 ]
   (3.76)
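The computation of example 3.30 may be transcribed directly into code. The sketch below follows equations (3.73)-(3.75) as printed, including the normalisation by 16 used in (3.75); the array names are ours:

import numpy as np

# The eight images of example 3.26 in vector (column-stacked) form, eq. (3.72).
G = np.array([
    [5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3],
    [4, 7, 3, 4, 2, 2, 5, 6, 2, 4, 4, 6, 1, 9, 5, 2],
    [3, 5, 2, 6, 5, 4, 2, 5, 2, 4, 6, 4, 3, 3, 6, 6],
    [6, 3, 4, 5, 4, 5, 4, 4, 2, 6, 2, 3, 8, 4, 2, 4],
    [4, 6, 4, 3, 3, 5, 3, 3, 5, 6, 3, 6, 4, 2, 4, 5],
    [4, 1, 4, 1, 5, 6, 8, 3, 4, 2, 4, 2, 5, 6, 4, 7],
    [2, 2, 6, 4, 7, 4, 3, 3, 6, 2, 4, 6, 4, 4, 7, 2],
    [5, 4, 4, 5, 3, 4, 2, 5, 6, 5, 3, 4, 6, 2, 4, 4],
], dtype=float)

mean_vec = G.mean(axis=0)          # the average vector of equation (3.73)
Gbar = G - mean_vec                # the zero-mean vectors of equation (3.74)

# Ensemble autocovariance matrix, C(k, l) = (1/16) * sum_i gbar_i(k) gbar_i(l),
# following the normalisation shown in equation (3.75).
C = (Gbar.T @ Gbar) / 16.0
print(C.shape)                     # (16, 16)
print(np.round(C[:3, :3], 2))      # top-left corner of the matrix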
Example 3.31
Compute the spatial autocorrelation function of the vector representation of the first image of the images of example 3.26.

The vector representation of this image is

g_1^T = (5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3)    (3.77)
The spatial autocorrelation function of this digital signal now is a function of the relative shift between the samples that make up the pairs of pixels we have to consider. Let us call the function R(h), where h takes values from 0 to a maximum of 15, as this signal consists of a total of 16 samples and it is not possible to find samples at distances larger than 15 from each other. We compute:

R(0)  = (1/16)(5² + 5² + 6² + 5² + 4² + 3² + 6² + 4² + 6² + 4² + 7² + 2² + 2² + 3² + 1² + 3²) = 19.75
R(1)  = (1/15)(5·5 + 5·6 + 6·5 + 5·4 + 4·3 + 3·6 + 6·4 + 4·6 + 6·4 + 4·7 + 7·2 + 2·2 + 2·3 + 3·1 + 1·3) = 17.67
R(2)  = (1/14)(5·6 + 5·5 + 6·4 + 5·3 + 4·6 + 3·4 + 6·6 + 4·4 + 6·7 + 4·2 + 7·2 + 2·3 + 2·1 + 3·3) = 18.79
R(3)  = (1/13)(5·5 + 5·4 + 6·3 + 5·6 + 4·4 + 3·6 + 6·4 + 4·7 + 6·2 + 4·2 + 7·3 + 2·1 + 2·3) = 17.54
R(4)  = (1/12)(5·4 + 5·3 + 6·6 + 5·4 + 4·6 + 3·4 + 6·7 + 4·2 + 6·2 + 4·3 + 7·1 + 2·3) = 17.83
R(5)  = (1/11)(5·3 + 5·6 + 6·4 + 5·6 + 4·4 + 3·7 + 6·2 + 4·2 + 6·3 + 4·1 + 7·3) = 18.09
R(6)  = (1/10)(5·6 + 5·4 + 6·6 + 5·4 + 4·7 + 3·2 + 6·2 + 4·3 + 6·1 + 4·3) = 18.2
R(7)  = (1/9)(5·4 + 5·6 + 6·4 + 5·7 + 4·2 + 3·2 + 6·3 + 4·1 + 6·3) = 18.11
R(8)  = (1/8)(5·6 + 5·4 + 6·7 + 5·2 + 4·2 + 3·3 + 6·1 + 4·3) = 17.13
R(9)  = (1/7)(5·4 + 5·7 + 6·2 + 5·2 + 4·3 + 3·1 + 6·3) = 15.71
R(10) = (1/6)(5·7 + 5·2 + 6·2 + 5·3 + 4·1 + 3·3) = 14.17
R(11) = (1/5)(5·2 + 5·2 + 6·3 + 5·1 + 4·3) = 11
R(12) = (1/4)(5·2 + 5·3 + 6·1 + 5·3) = 11.5
R(13) = (1/3)(5·3 + 5·1 + 6·3) = 12.67
R(14) = (1/2)(5·1 + 5·3) = 10
R(15) = 5·3 = 15    (3.78)
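A direct transcription of this calculation into Python/NumPy might look as follows (a sketch; the variable names are ours). For each shift h only the N - h available pairs are averaged, as in equation (3.78):

import numpy as np

g = np.array([5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3], dtype=float)
N = len(g)

# Non-circular spatial autocorrelation: for shift h there are only N - h pairs.
R = np.array([(g[:N - h] * g[h:]).mean() for h in range(N)])
print(np.round(R, 2))
# R(0)=19.75, R(1)=17.67, R(2)=18.79, ..., R(14)=10.0, R(15)=15.0, as in (3.78)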
Example 3.32
Compute the spatial autocorrelation function of the 1D representation of the first image of the images of example 3.26, assuming that the 1D signal is repeated ad infinitum.

The difference with example 3.31 is that now we can have an equal number of pairs of samples for all relative shifts. Let us write the augmented signal we shall be using:

g_1^T = (5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3 | 5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3)    (3.79)
The spatial autocorrelation function now takes the following values:

R(0)  = (1/16)(5² + 5² + 6² + 5² + 4² + 3² + 6² + 4² + 6² + 4² + 7² + 2² + 2² + 3² + 1² + 3²) = 19.75
R(1)  = (1/16)(5·5 + 5·6 + 6·5 + 5·4 + 4·3 + 3·6 + 6·4 + 4·6 + 6·4 + 4·7 + 7·2 + 2·2 + 2·3 + 3·1 + 1·3 + 3·5) = 17.5
R(2)  = (1/16)(5·6 + 5·5 + 6·4 + 5·3 + 4·6 + 3·4 + 6·6 + 4·4 + 6·7 + 4·2 + 7·2 + 2·3 + 2·1 + 3·3 + 1·5 + 3·5) = 17.69
R(3)  = (1/16)(5·5 + 5·4 + 6·3 + 5·6 + 4·4 + 3·6 + 6·4 + 4·7 + 6·2 + 4·2 + 7·3 + 2·1 + 2·3 + 3·5 + 1·5 + 3·6) = 16.63
R(4)  = (1/16)(5·4 + 5·3 + 6·6 + 5·4 + 4·6 + 3·4 + 6·7 + 4·2 + 6·2 + 4·3 + 7·1 + 2·3 + 2·5 + 3·5 + 1·6 + 3·5) = 16.25
R(5)  = (1/16)(5·3 + 5·6 + 6·4 + 5·6 + 4·4 + 3·7 + 6·2 + 4·2 + 6·3 + 4·1 + 7·3 + 2·5 + 2·5 + 3·6 + 1·5 + 3·4) = 15.88
R(6)  = (1/16)(5·6 + 5·4 + 6·6 + 5·4 + 4·7 + 3·2 + 6·2 + 4·3 + 6·1 + 4·3 + 7·5 + 2·5 + 2·6 + 3·5 + 1·4 + 3·3) = 16.69
R(7)  = (1/16)(5·4 + 5·6 + 6·4 + 5·7 + 4·2 + 3·2 + 6·3 + 4·1 + 6·3 + 4·5 + 7·5 + 2·6 + 2·5 + 3·4 + 1·3 + 3·6) = 17.06
R(8)  = (1/16)(5·6 + 5·4 + 6·7 + 5·2 + 4·2 + 3·3 + 6·1 + 4·3 + 6·5 + 4·5 + 7·6 + 2·5 + 2·4 + 3·3 + 1·6 + 3·4) = 17.13
R(9)  = (1/16)(5·4 + 5·7 + 6·2 + 5·2 + 4·3 + 3·1 + 6·3 + 4·5 + 6·5 + 4·6 + 7·5 + 2·4 + 2·3 + 3·6 + 1·4 + 3·6) = 17.06
R(10) = (1/16)(5·7 + 5·2 + 6·2 + 5·3 + 4·1 + 3·3 + 6·5 + 4·5 + 6·6 + 4·5 + 7·4 + 2·3 + 2·6 + 3·4 + 1·6 + 3·4) = 16.69
R(11) = (1/16)(5·2 + 5·2 + 6·3 + 5·1 + 4·3 + 3·5 + 6·5 + 4·6 + 6·5 + 4·4 + 7·3 + 2·6 + 2·4 + 3·6 + 1·4 + 3·7) = 15.88
R(12) = (1/16)(5·2 + 5·3 + 6·1 + 5·3 + 4·5 + 3·5 + 6·6 + 4·5 + 6·4 + 4·3 + 7·6 + 2·4 + 2·6 + 3·4 + 1·7 + 3·2) = 16.25
R(13) = (1/16)(5·3 + 5·1 + 6·3 + 5·5 + 4·5 + 3·6 + 6·5 + 4·4 + 6·3 + 4·6 + 7·4 + 2·6 + 2·4 + 3·7 + 1·2 + 3·2) = 16.63
R(14) = (1/16)(5·1 + 5·3 + 6·5 + 5·5 + 4·6 + 3·5 + 6·4 + 4·3 + 6·6 + 4·4 + 7·6 + 2·4 + 2·7 + 3·2 + 1·2 + 3·3) = 17.69
R(15) = (1/16)(5·3 + 5·5 + 6·5 + 5·6 + 4·5 + 3·4 + 6·3 + 4·6 + 6·4 + 4·6 + 7·4 + 2·7 + 2·2 + 3·2 + 1·3 + 3·1) = 17.5

Note that by assuming repetition of the signal we have introduced some symmetry in the autocorrelation function, as samples that are at a distance h apart from each other can also be thought of as being at a distance 16 - h apart. So, R(h) = R(16 - h).
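The periodic version of the same computation is even simpler, because np.roll implements the wrap-around assumed in equation (3.79). A possible sketch, which also verifies the symmetry R(h) = R(16 - h):

import numpy as np

g = np.array([5, 5, 6, 5, 4, 3, 6, 4, 6, 4, 7, 2, 2, 3, 1, 3], dtype=float)
N = len(g)

# Circular (periodic) spatial autocorrelation: the signal is assumed to repeat,
# so every shift h has exactly N pairs of samples.
R = np.array([np.mean(g * np.roll(g, -h)) for h in range(N)])
print(np.round(R, 2))                     # 19.75, 17.5, 17.69, ..., 17.69, 17.5
print(np.allclose(R[1:], R[1:][::-1]))    # True: R(h) = R(16 - h)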
Example 3.33
Show that the spatial autocorrelation function R(h) and the spatial autocovariance function C(h) of an N-sample long signal g with spatial mean ḡ are related by:

C(h) = R(h) - ḡ²    (3.80)

By definition,

C(h) = (1/N) Σ_i [g(i) - ḡ][g(i + h) - ḡ]
     = (1/N) Σ_i g(i) g(i + h) - ḡ (1/N) Σ_i g(i) - ḡ (1/N) Σ_i g(i + h) + ḡ² (1/N) Σ_i 1
     = R(h) - ḡ² - ḡ² + ḡ²    (3.81)
Formula (3.80) then follows. Note that this result could not have been obtained if we had not considered that the signal was repeated, because otherwise we could not have replaced (1/N) Σ_i g(i + h) with ḡ.
Example 3.34
Compute the spatial autocovariance function of the 1D representation of the first image of the images of example 3.26, on page 198, assuming that the 1D signal is repeated ad infinitum.

The only difference with example 3.32 is that, before performing the calculation of R(h), we should have removed the spatial mean of the samples. The spatial mean is ḡ = 4.125. According to example 3.33, all we need do to go from R(h) to C(h) is to remove ḡ² from the values of R(h). The result is:

C = (15.625, 13.375, 13.565, 12.505, 12.125, 11.755, 12.565, 12.935, 13.005,
     12.935, 12.565, 11.755, 12.125, 12.505, 13.565, 13.375)    (3.82)
What is the form of the ensemble autocorrelation matrix of a set of images, if
the ensemble is stationary with respect to the autocorrelation?
The ensemble being stationary with respect to the autocorrelation means that the value of the autocorrelation will be the same for all pairs of samples that are in the same relative position with respect to each other. We can see that better if we consider a set of 3 × 3 images. An element of this set is image g^i and it has the form:

[ g^i_11  g^i_12  g^i_13 ]
[ g^i_21  g^i_22  g^i_23 ]    (3.83)
[ g^i_31  g^i_32  g^i_33 ]

The autocorrelation function of the set takes a pair of positions and finds the average value of their product over the whole set of images. To visualise this, we create a double entry table and place all possible positions along the rows and the columns of the matrix, so that we have all possible combinations. We represent the average value of each pair of positions with a different letter, using the same letter for positions that are at the same relative position from each other. We obtain:
        g_11  g_21  g_31  g_12  g_22  g_32  g_13  g_23  g_33
 g_11     A     B     C     D     E     F     G     H     I
 g_21     B     A     B     J     D     E     K     G     H
 g_31     C     B     A     L     J     D     M     K     G
 g_12     D     J     L     A     B     C     D     E     F
 g_22     E     D     J     B     A     B     J     D     E
 g_32     F     E     D     C     B     A     L     J     D
 g_13     G     K     M     D     J     L     A     B     C
 g_23     H     G     K     E     D     J     B     A     B
 g_33     I     H     G     F     E     D     C     B     A     (3.84)

Note that, if the ensemble were not stationary, we could have had a different letter (value) at every position in the above table.
Example B3.35
In (3.84) the relative position of two pixels has been decided according to their 2D positions. What form would the same matrix have had if we had written all images as vectors, and thus decided the relative positions of two samples from the vector arrangement?

When we write image (3.83) as a vector, pixel g^i_12 is brought next to pixel g^i_31, and thus these two positions now become next-door neighbours; in a stationary signal their product is expected to have an average value equal to that of positions g^i_11 and g^i_21, for example. So, the autocorrelation matrix now takes the form:

        g_11  g_21  g_31  g_12  g_22  g_32  g_13  g_23  g_33
 g_11     A     B     C     D     E     F     G     H     I
 g_21     B     A     B     C     D     E     F     G     H
 g_31     C     B     A     B     C     D     E     F     G
 g_12     D     C     B     A     B     C     D     E     F
 g_22     E     D     C     B     A     B     C     D     E
 g_32     F     E     D     C     B     A     B     C     D
 g_13     G     F     E     D     C     B     A     B     C
 g_23     H     G     F     E     D     C     B     A     B
 g_33     I     H     G     F     E     D     C     B     A     (3.85)
How do we go from the 1D autocorrelation function of the vector representation
of an image to its 2D autocorrelation matrix?
Assuming ergodicity, the values of the 1D spatial autocorrelation function of the vector representation of the image are used to fill in the ensemble 2D autocorrelation matrix (see example 3.36).
Example 3.36
You are given only the first image of those in example 3.26 and you are told that it is representative of a whole collection of images that share the same statistical properties. Assuming ergodicity, estimate the autocovariance matrix of the vector representations of the ensemble of these images.

The spatial autocovariance function of this image has been computed in example 3.34. We use those values to create the autocovariance matrix of the ensemble, assuming that, due to ergodicity, it will have the banded structure shown in (3.85):

C =
[ 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 ]
[ 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 ]
[ 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 ]
[ 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 ]
[ 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 ]
[ 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 ]
[ 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 ]
[ 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 ]
[ 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 12.935 ]
[ 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 12.565 ]
[ 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 11.755 ]
[ 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 12.125 ]
[ 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 12.505 ]
[ 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 13.565 ]
[ 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 13.375 ]
[ 13.375 13.565 12.505 12.125 11.755 12.565 12.935 13.005 12.935 12.565 11.755 12.125 12.505 13.565 13.375 15.625 ]
   (3.86)
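The banded matrix of equation (3.86) can be generated mechanically from the sixteen values of C(h). A possible NumPy sketch (the array names are ours):

import numpy as np

# Spatial autocovariance values C(0), ..., C(15) from equation (3.82).
c = np.array([15.625, 13.375, 13.565, 12.505, 12.125, 11.755, 12.565, 12.935,
              13.005, 12.935, 12.565, 11.755, 12.125, 12.505, 13.565, 13.375])
N2 = len(c)

# Build the matrix of equation (3.86): element (i, j) equals C((j - i) mod 16),
# which gives the banded structure of (3.85)/(3.87).
idx = (np.arange(N2)[None, :] - np.arange(N2)[:, None]) % N2
C = c[idx]
print(C[0, :4], C[1, :4])        # rows start 15.625 13.375 ... and 13.375 15.625 ...
print(np.allclose(C, C.T))       # True: symmetric, because C(h) = C(16 - h)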
Example B3.37
What is the structure of the autocorrelation matrix of an ergodic set of 3 × 3 images, worked out from a single available image, using its 1D representation, and assuming that it is repeated ad infinitum?

When the samples are repeated, the value of the spatial autocorrelation function for shift h is the same as that for shift 9 - h. Then matrix (3.85) takes the form:

        g_11  g_21  g_31  g_12  g_22  g_32  g_13  g_23  g_33
 g_11     A     B     C     D     E     E     D     C     B
 g_21     B     A     B     C     D     E     E     D     C
 g_31     C     B     A     B     C     D     E     E     D
 g_12     D     C     B     A     B     C     D     E     E
 g_22     E     D     C     B     A     B     C     D     E
 g_32     E     E     D     C     B     A     B     C     D
 g_13     D     E     E     D     C     B     A     B     C
 g_23     C     D     E     E     D     C     B     A     B
 g_33     B     C     D     E     E     D     C     B     A     (3.87)
A matrix that has the same value along each diagonal parallel to the main diagonal is called a Toeplitz matrix.
How can we transform the image so that its autocorrelation matrix is diagonal?
Let us say that the original image is g, of size N × N, and its transformed version is g̃. We shall use the vector versions of them, g and g̃ respectively; ie we stack the columns of the two matrices one below the other to create two N² × 1 vectors. We assume that the transformation we are seeking has the form

g̃ = A(g - m)    (3.88)

where the transformation matrix A is N² × N² and the arbitrary vector m is N² × 1. We assume that the image is ergodic. The mean vector of the transformed image is given by

μ_g̃ = E{g̃} = E{A(g - m)} = A E{g} - A m = A(μ_g - m)    (3.89)

where we have used the fact that A and m are nonrandom and, therefore, the expectation operator leaves them unaffected. Notice that, although we talk about an expectation value and use the same notation as that used for ensemble averaging, because of the assumed ergodicity E{g} means nothing else than finding the average grey value of image g and creating an N² × 1 vector with all its elements equal to this average grey value. If ergodicity had not been assumed, E{g} would have meant that the averaging would have to be done over all the versions of image g, and its elements most likely would not have been all equal, unless the ensemble were stationary with respect to the mean.

We can conveniently choose m = μ_g = E{g} in (3.89). Then μ_g̃ = 0; ie the transformed image will have zero mean.
The autocorrelation function of g̃ then is the same as its autocovariance function, and is computed as:

C_g̃g̃ = E{g̃ g̃^T} = E{ A(g - μ_g)[A(g - μ_g)]^T } = E{ A(g - μ_g)(g - μ_g)^T A^T }
      = A E{ (g - μ_g)(g - μ_g)^T } A^T    (3.90)

where the factor E{(g - μ_g)(g - μ_g)^T} is the autocovariance matrix of the untransformed image. Note that, because matrix A is not a random field, it is not affected by the expectation operator. Also note that, due to ergodicity, the ensemble autocovariance function of the untransformed image may be replaced by its spatial autocovariance function.

So: C_g̃g̃ = A C_gg A^T. Then it is obvious that C_g̃g̃ is the diagonalised version of the covariance matrix of the untransformed image. Such a diagonalisation is achieved if the transformation matrix A is the matrix formed by the eigenvectors of the autocovariance matrix of the image, used as rows. The diagonal elements of C_g̃g̃ then are the eigenvalues of matrix C_gg. The autocovariance matrix of the image may be calculated from the image itself, since we assumed ergodicity (no large ensemble of similar images is needed). Equation (3.88) then represents the Karhunen-Loeve transform of image g.
How do we compute the K-L transform of an image in practice?
Step 1: Compute the mean of input image G, of size M × N, and remove it from all its elements, to form image Ḡ.
Step 2: Write the columns of Ḡ one under the other to form a column vector ḡ.
Step 3: Compute the spatial autocorrelation function C(h) = Σ_i ḡ(i) ḡ(i + h)/(MN) and from its elements the autocorrelation matrix C (of size MN × MN), as in example 3.37. (Alternatively, you may use the formula derived in Box 3.2.)
Step 4: Compute the eigenvalues and eigenvectors of C. If C has E nonzero eigenvalues, you will produce E eigenvectors of size MN × 1.
Step 5: Arrange the eigenvectors in decreasing order of the corresponding eigenvalues.
Step 6: Create a matrix A (of size E × MN) made up from the eigenvectors written one under the other as its rows.
Step 7: Multiply matrix A with the image vector ḡ, to produce the transformed vector g̃.
Step 8: Wrap vector g̃ into an image G̃ of size M × N.
Step 9: Before you display G̃ you may scale it to the range [0, 255], to avoid the negative values it will contain, and round its values to the nearest integer.
To produce the basis images of the K-L transform, you need Step 10:
Step 10: Wrap every eigenvector you produced in Step 4 to form an image M × N in size. These will be the E basis images, appropriate for the representation of all images of the same size that have the same autocovariance matrix as image G.
To reproduce the original image as a linear superposition of the basis images, you need Step 11:
Step 11: Multiply each basis image with the corresponding element of g̃ and sum up the results; a code sketch of the whole procedure is given below.
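A minimal NumPy sketch of Steps 1-11 might look as follows. It builds the autocorrelation matrix from the single available image by assuming that its column-stacked form repeats, as in examples 3.34-3.37; the function and variable names are our own, and np.linalg.eigh is used for Step 4:

import numpy as np

def kl_transform(G):
    """Karhunen-Loeve transform of a single image G (M x N), following Steps 1-10.
    The autocorrelation matrix is built from the image itself (ergodicity assumed),
    with the column-stacked signal taken to repeat, as in examples 3.34-3.37."""
    M, N = G.shape
    Gbar = G - G.mean()                                  # Step 1
    g = Gbar.flatten(order='F')                          # Step 2: stack the columns
    MN = M * N
    C_h = np.array([np.mean(g * np.roll(g, -h)) for h in range(MN)])   # Step 3
    idx = (np.arange(MN)[None, :] - np.arange(MN)[:, None]) % MN
    C = C_h[idx]                                         # banded autocorrelation matrix
    lam, U = np.linalg.eigh(C)                           # Step 4 (columns of U are eigenvectors)
    order = np.argsort(lam)[::-1]                        # Step 5: decreasing eigenvalues
    lam, U = lam[order], U[:, order]
    A = U.T                                              # Step 6: eigenvectors as rows
    g_tilde = A @ g                                      # Step 7: transformed vector
    G_tilde = g_tilde.reshape((M, N), order='F')         # Step 8: wrap into an image
    basis = [A[i].reshape((M, N), order='F') for i in range(MN)]        # Step 10
    return G_tilde, A, lam, basis

# Usage on the 4 x 4 image of equation (3.101).
G = np.array([[3, 5, 2, 3], [5, 4, 4, 3], [2, 2, 6, 6], [6, 5, 4, 6]], dtype=float)
G_tilde, A, lam, basis = kl_transform(G)
recon = sum(gt * b for gt, b in zip(G_tilde.flatten(order='F'), basis)) + G.mean()
print(np.allclose(recon, G))     # True: Step 11 reproduces the original image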
Box 3.2. How do we compute the spatial autocorrelation function of an N × N image directly from its 2D form?

Let G be the zero-mean image and ḡ its column-by-column vector representation, with elements ḡ_i, i = 0, 1, . . . , N² - 1. Element ḡ_i is pixel G(k_i, l_i), where

i = l_i N + k_i    (3.91)

so that

k_i = i modulo N,    l_i = (i - k_i)/N = ⌊i/N⌋    (3.92)

Assuming that signal ḡ is repeated ad infinitum, its spatial autocorrelation function is

C(h) = (1/N²) Σ_{i=0}^{N²-1} ḡ_i ḡ_{i+h}    for h = 0, . . . , N² - 1    (3.93)

where we must remember that ḡ_{i+h} = ḡ_{i+h-N²} if i + h ≥ N².
We observe that:

k_{i+h} = (i + h) modulo N,    l_{i+h} = (i + h - k_{i+h})/N = ⌊(i + h)/N⌋    (3.94)

Since ḡ_{i+h} = ḡ_{i+h-N²} if i + h ≥ N², instead of just i + h we may write (i + h) modulo N². Then the above equations become:

k_{i+h} = [ (i + h) modulo N² ] modulo N = (i + h) modulo N
l_{i+h} = ( [ (i + h) modulo N² ] - k_{i+h} )/N = ⌊ [ (i + h) modulo N² ] / N ⌋    (3.95)
Element C(h) of the autocorrelation function of the 1D image representation, ḡ, may then be computed from its 2D representation, G, directly, using

C(h) = (1/N²) Σ_{i=0}^{N²-1} G(k_i, l_i) G(k_{i+h}, l_{i+h})    (3.96)

where (k_i, l_i) are given by equations (3.92) and (k_{i+h}, l_{i+h}) are given by equations (3.95). Elements C(h) may be used to build the 2D autocorrelation matrix of the vector representation of the image (see example 3.36).
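The index bookkeeping of equations (3.92)-(3.96) is easy to get wrong, so a small self-check may be useful. The sketch below (our own, for a zero-mean N × N image) computes C(h) directly from the 2D array with the modulo formulas and compares the result with the value obtained after flattening the image column by column:

import numpy as np

def spatial_acf_from_2d(G):
    """C(h) of the column-stacked representation of a zero-mean N x N image G,
    computed directly from its 2D form with the index formulas (3.92) and (3.95)."""
    N = G.shape[0]
    N2 = N * N
    i = np.arange(N2)
    k_i, l_i = i % N, i // N                      # equations (3.92), 0-based indices
    C = np.empty(N2)
    for h in range(N2):
        ih = (i + h) % N2                         # wrap-around: g_{i+h} = g_{i+h-N^2}
        k_ih, l_ih = ih % N, ih // N              # equations (3.95)
        C[h] = np.mean(G[k_i, l_i] * G[k_ih, l_ih])   # equation (3.96)
    return C

# Check against flattening the image first (Step 2 of the practical procedure).
rng = np.random.default_rng(1)
G = rng.integers(0, 9, size=(4, 4)).astype(float)
G -= G.mean()
g = G.flatten(order='F')
C_direct = spatial_acf_from_2d(G)
C_vector = np.array([np.mean(g * np.roll(g, -h)) for h in range(len(g))])
print(np.allclose(C_direct, C_vector))            # True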
Example B3.38
Work out the formula for computing the spatial autocorrelation function of the 1D representation ḡ of an image G of size M × N, when its indices k and l take values from 1 to M and from 1 to N, respectively, and index i has to take values from 1 to MN.

Figure 3.8: The pixel at position (k_i, l_i) has before it l_i - 1 complete columns, with M elements each, and is the k_ith element in its own column.
Consider the image of figure 3.8, where each dot represents a pixel. The (k_i, l_i) dot, representing the ith element of the vector representation of the image, will have index i equal to:

i = (l_i - 1)M + k_i    (3.97)

Then:

k_i = i modulo M,    l_i = 1 + (i - k_i)/M = ⌊i/M⌋ + 1    (3.98)

Element C(h) of the autocorrelation function may then be computed using

C(h) = (1/(NM)) Σ_{i=1}^{NM} G(k_i, l_i) G(k_{i+h}, l_{i+h})    (3.99)

where (k_i, l_i) are given by equations (3.98) and (k_{i+h}, l_{i+h}) are given by:

k_{i+h} = [ (i + h) modulo MN ] modulo M
l_{i+h} = 1 + ( [ (i + h) modulo MN ] - k_{i+h} )/M = ⌊ [ (i + h) modulo MN ] / M ⌋ + 1    (3.100)
Example B3.39
Assuming ergodicity, calculate the K-L transform of an ensemble of images, one of which is:

[ 3 5 2 3 ]
[ 5 4 4 3 ]
[ 2 2 6 6 ]    (3.101)
[ 6 5 4 6 ]

The mean value of this image is 66/16 = 4.125. We subtract this from all the elements of the image and then we compute its spatial autocorrelation function C(h), for h = 0, 1, . . . , 15, and use its values to construct the autocorrelation matrix with banded structure similar to that shown in (3.87), but of size 16 × 16 instead of 9 × 9. The elements of C(h) are:

( 2.11, -0.52, -0.39, -0.45, 0.3, 0.23, 0.36, -0.27,
 -0.64, -0.27, 0.36, 0.23, 0.3, -0.45, -0.39, -0.52 )    (3.102)
Then we compute the eigenvectors of the autocorrelation matrix and sort them so that their corresponding eigenvalues are in decreasing order. Finally, we use them as rows to form the transformation matrix A with which our image can be transformed:

A =
[ 0.35 0.14 0.25 0.33 0.00 0.33 0.25 0.14 0.35 0.14 0.25 0.33 0.00 0.33 0.25 0.14 ]
[ 0.00 0.33 0.25 0.14 0.35 0.14 0.25 0.33 0.00 0.32 0.25 0.14 0.35 0.14 0.25 0.33 ]
[ 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ]
[ 0.35 0.13 0.25 0.33 0.00 0.33 0.25 0.14 0.35 0.13 0.25 0.33 0.00 0.33 0.25 0.14 ]
[ 0.00 0.33 0.25 0.14 0.35 0.13 0.25 0.33 0.00 0.33 0.25 0.14 0.35 0.13 0.25 0.33 ]
[ 0.02 0.15 0.26 0.33 0.35 0.32 0.24 0.12 0.02 0.15 0.26 0.33 0.35 0.32 0.24 0.12 ]
[ 0.35 0.32 0.24 0.12 0.02 0.15 0.26 0.33 0.35 0.32 0.24 0.12 0.02 0.15 0.26 0.33 ]
[ 0.04 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 ]
[ 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 0.35 0.03 ]
[ 0.10 0.31 0.34 0.17 0.10 0.31 0.34 0.17 0.10 0.31 0.34 0.17 0.10 0.31 0.34 0.17 ]
[ 0.34 0.17 0.10 0.31 0.34 0.17 0.10 0.31 0.34 0.17 0.10 0.31 0.34 0.17 0.10 0.31 ]
[ 0.34 0.27 0.17 0.03 0.10 0.23 0.31 0.35 0.34 0.27 0.17 0.03 0.10 0.23 0.31 0.35 ]
[ 0.10 0.23 0.31 0.35 0.34 0.27 0.17 0.03 0.10 0.23 0.31 0.35 0.34 0.27 0.17 0.03 ]
[ 0.04 0.27 0.35 0.22 0.04 0.27 0.35 0.22 0.04 0.27 0.35 0.22 0.04 0.27 0.35 0.22 ]
[ 0.35 0.22 0.04 0.27 0.35 0.22 0.04 0.27 0.35 0.22 0.04 0.27 0.35 0.22 0.04 0.27 ]
[ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ]
   (3.103)

The corresponding eigenvalues are:

( 4.89, 4.89, 4, 2.73, 2.73, 2.68, 2.68, 2.13, 2.13, 1.67, 1.67, 0.70, 0.70, 0.08, 0.08, 0 )    (3.104)
We stack the columns of the image minus its mean one below the other to form vector g - μ_g:

(g - μ_g)^T = ( -1.125, 0.875, -2.125, 1.875, 0.875, -0.125, -2.125, 0.875,
                -2.125, -0.125, 1.875, -0.125, -1.125, -1.125, 1.875, 1.875 )    (3.105)

We then multiply this vector with matrix A to derive the Karhunen-Loeve transform of the image. Written back in matrix form, this is:

g̃ = [ 0.30 2.30 0.89 0.03 ]
     [ 3.11 2.29 0.76 0.22 ]
     [ 2.00 0.34 1.66 0.34 ]
     [ 0.43 1.86 1.18 0.00 ]    (3.106)
Is the mean of the transformed image expected to be really 0?
No. The choice of vector m in equation (3.89) is meant to make the average of g̃ zero. However, that calculation is based on ensemble averages. In practice, the ith element of g̃ is given by:

g̃_i = Σ_k A_ik (g_k - μ_g)    (3.107)

To compute the average of the transformed image we sum over all values of i:

Σ_i g̃_i = Σ_i Σ_k A_ik (g_k - μ_g) = Σ_k (g_k - μ_g) Σ_i A_ik    (3.108)

Obviously, Σ_k (g_k - μ_g) = 0, given that μ_g is the average value of the elements of g. The only way the spatial average of g̃ can be guaranteed to be zero is for Σ_i g̃_i to be 0, and this will happen only if Σ_i A_ik is a constant number, independent of k. There is no reason for this to be true, because matrix A is made up from the eigenvectors of the covariance matrix of g, written as rows one under the other, and there is no reason to expect the sums of the elements of all columns of A to be the same. So, in general, the average of the transformed image will not be 0, because we compute this average as a spatial average and not as the ensemble average assumed by the theory.
How can we approximate an image using its K-L transform?
The K-L transform of an image is given by

g̃ = A(g - μ_g)    (3.109)

where μ_g is an N² × 1 vector with all its elements equal to the average grey value of the image, and A is a matrix made up from the eigenvectors of the autocorrelation matrix of image g, used as rows and arranged in decreasing order of the corresponding eigenvalues. The inverse transform is:

g = A^T g̃ + μ_g    (3.110)

If we set equal to 0 the last few eigenvalues of the autocorrelation matrix of g, matrix A will have its corresponding rows replaced by zeros, and so will the transformed image g̃. The image we shall then reconstruct using (3.110), with the truncated version of A or the truncated version of g̃, will be an approximation of the original image.

What is the error with which we approximate an image when we truncate its K-L expansion?

It can be shown (see Box 3.3) that, if we truncate the K-L expansion of an image, the image will on average be approximated with a square error that is equal to the sum of the omitted eigenvalues of the autocovariance matrix of the image.
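This property is easy to verify numerically on a synthetic ensemble. The sketch below (Python/NumPy; the ensemble is artificial and all names are ours) builds an ensemble of zero-mean vectors, truncates the K-L expansion after K terms and compares the average squared reconstruction error with the sum of the omitted eigenvalues:

import numpy as np

rng = np.random.default_rng(2)

# A synthetic ensemble: 2000 vectors of length 16 with a fixed covariance
# (this stands in for a collection of 4 x 4 images in vector form).
L = rng.normal(size=(16, 16))
samples = rng.normal(size=(2000, 16)) @ L.T
samples -= samples.mean(axis=0)                   # remove the ensemble mean (m = mu_g)

C = np.cov(samples, rowvar=False, bias=True)      # ensemble autocovariance matrix
lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, A = lam[order], U[:, order].T                # eigenvectors as rows, eq. (3.109)

K = 10                                            # keep the K largest eigenvalues
A_trunc = A.copy()
A_trunc[K:, :] = 0.0                              # zero the omitted rows, as in (3.113)

g_tilde = samples @ A_trunc.T                     # truncated K-L transform
recon = g_tilde @ A_trunc                         # inverse transform, eq. (3.110)
mse = np.mean(np.sum((samples - recon) ** 2, axis=1))
print(mse, lam[K:].sum())                         # essentially equal, as Box 3.3 predicts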
What are the basis images in terms of which the Karhunen-Loeve transform
expands an image?
Since g̃ = A(g - μ_g) and A is an orthogonal matrix, the inverse transformation is given by g - μ_g = A^T g̃. We can write this expression explicitly. The vector g - μ_g has elements g_11 - μ_g, g_21 - μ_g, . . . , g_N1 - μ_g, g_12 - μ_g, . . . , g_NN - μ_g, and the columns of A^T are the eigenvectors (a_{i,1}, a_{i,2}, . . . , a_{i,N²})^T, so that:

g_11 - μ_g = a_{1,1} g̃_11 + a_{2,1} g̃_21 + . . . + a_{N²,1} g̃_NN
g_21 - μ_g = a_{1,2} g̃_11 + a_{2,2} g̃_21 + . . . + a_{N²,2} g̃_NN
. . .
g_N1 - μ_g = a_{1,N} g̃_11 + a_{2,N} g̃_21 + . . . + a_{N²,N} g̃_NN
g_12 - μ_g = a_{1,N+1} g̃_11 + a_{2,N+1} g̃_21 + . . . + a_{N²,N+1} g̃_NN
. . .
g_N2 - μ_g = a_{1,2N} g̃_11 + a_{2,2N} g̃_21 + . . . + a_{N²,2N} g̃_NN
. . .
g_1N - μ_g = a_{1,N²-N+1} g̃_11 + a_{2,N²-N+1} g̃_21 + . . . + a_{N²,N²-N+1} g̃_NN
. . .
g_NN - μ_g = a_{1,N²} g̃_11 + a_{2,N²} g̃_21 + . . . + a_{N²,N²} g̃_NN

Here g̃_11, g̃_21, . . . , g̃_NN are the elements of the transformed image, listed in the same column-by-column order. We can rearrange these equations into matrix form:

[ g_11 - μ_g  g_12 - μ_g  . . .  g_1N - μ_g ]
[ g_21 - μ_g  g_22 - μ_g  . . .  g_2N - μ_g ]
[    ...                            ...     ]
[ g_N1 - μ_g  g_N2 - μ_g  . . .  g_NN - μ_g ]

          [ a_{1,1}  a_{1,N+1}  . . .  a_{1,N²-N+1} ]          [ a_{2,1}  a_{2,N+1}  . . .  a_{2,N²-N+1} ]
 = g̃_11  [ a_{1,2}  a_{1,N+2}  . . .  a_{1,N²-N+2} ] + g̃_21  [ a_{2,2}  a_{2,N+2}  . . .  a_{2,N²-N+2} ] + . . .
          [   ...                          ...       ]          [   ...                          ...       ]
          [ a_{1,N}  a_{1,2N}   . . .  a_{1,N²}      ]          [ a_{2,N}  a_{2,2N}   . . .  a_{2,N²}      ]

          [ a_{N²,1}  a_{N²,N+1}  . . .  a_{N²,N²-N+1} ]
 + g̃_NN  [ a_{N²,2}  a_{N²,N+2}  . . .  a_{N²,N²-N+2} ]    (3.111)
          [    ...                            ...       ]
          [ a_{N²,N}  a_{N²,2N}   . . .  a_{N²,N²}      ]
This expression makes it obvious that the eigenimages in terms of which the K-L transform expands an image are formed from the eigenvectors of its spatial autocorrelation matrix, by writing them in matrix form; ie by using the first N elements of an eigenvector to form the first column of the corresponding eigenimage, the next N elements to form the next column, and so on. The coefficients of this expansion are the elements of the transformed image.

We may understand this more easily by thinking in terms of the multidimensional space where each image is represented by a point (see figure 3.9). The tip of each unit vector along each of the axes, which we specified to be the symmetry axes of the cloud of points, represents a point in this space, ie an image. This is the elementary image that corresponds to that axis, and the unit vector is nothing else than an eigenvector of the autocovariance matrix of the set of dots.
Figure 3.9: An image is a point in a multidimensional space where the value of each pixel is measured along a different axis. The ensemble of images is a cloud of points. The new coordinate system created, centred at the cloud of points, allows each point to be expressed by its coordinates along these new axes, each one of which is defined by a unit vector. These unit vectors are the eigenvectors of the autocovariance matrix of the cloud of points. The tip of the thick unit vector along one of the new axes in this figure represents one of the basis images created to represent the images in the ensemble. The coordinates of this basis image, ie its pixel values, are the components of its position vector in the original coordinate system, represented by the dashed vector.
Example 3.40
Consider a 3 × 3 image with column representation g. Write down an expression for the K-L transform of the image in terms of the elements of g and the elements a_ij of the transformation matrix A. Calculate an approximation to the image g by setting the last six rows of A to zero. Show that the approximation will be a 9 × 1 vector with the first three elements equal to those of the full transformation of g and the remaining six elements zero.
Assume that μ_g is the average grey value of image g. Then the transformed image will have the form:

[ g̃_11 ]   [ a_11 a_12 . . . a_19 ] [ g_11 - μ_g ]   [ a_11(g_11 - μ_g) + a_12(g_21 - μ_g) + . . . + a_19(g_33 - μ_g) ]
[ g̃_21 ]   [ a_21 a_22 . . . a_29 ] [ g_21 - μ_g ]   [ a_21(g_11 - μ_g) + a_22(g_21 - μ_g) + . . . + a_29(g_33 - μ_g) ]
[ g̃_31 ]   [ a_31 a_32 . . . a_39 ] [ g_31 - μ_g ]   [ a_31(g_11 - μ_g) + a_32(g_21 - μ_g) + . . . + a_39(g_33 - μ_g) ]
[ g̃_12 ] = [ a_41 a_42 . . . a_49 ] [ g_12 - μ_g ] = [ a_41(g_11 - μ_g) + a_42(g_21 - μ_g) + . . . + a_49(g_33 - μ_g) ]
[  ...  ]   [  ...            ...  ] [     ...     ]   [                              ...                               ]
[ g̃_33 ]   [ a_91 a_92 . . . a_99 ] [ g_33 - μ_g ]   [ a_91(g_11 - μ_g) + a_92(g_21 - μ_g) + . . . + a_99(g_33 - μ_g) ]
                                                                                                                 (3.112)

If we set a_41 = a_42 = . . . = a_49 = a_51 = . . . = a_59 = . . . = a_99 = 0, clearly the last six rows of the above vector will be 0 and the truncated transformation of the image will be the vector:

g̃' = ( g̃_11  g̃_21  g̃_31  0  0  0  0  0  0 )^T    (3.113)

According to formula (3.111), the approximation of the image is then:

[ g_11 g_12 g_13 ]   [ μ_g μ_g μ_g ]          [ a_11 a_14 a_17 ]          [ a_21 a_24 a_27 ]          [ a_31 a_34 a_37 ]
[ g_21 g_22 g_23 ] = [ μ_g μ_g μ_g ] + g̃_11 [ a_12 a_15 a_18 ] + g̃_21 [ a_22 a_25 a_28 ] + g̃_31 [ a_32 a_35 a_38 ]
[ g_31 g_32 g_33 ]   [ μ_g μ_g μ_g ]          [ a_13 a_16 a_19 ]          [ a_23 a_26 a_29 ]          [ a_33 a_36 a_39 ]
                                                                                                                 (3.114)
Example B3.41
Show that, if A is an N² × N² matrix the ith row of which is the vector u_i^T, and C_2 is an N² × N² matrix with all its elements zero except the element at position (2, 2), which is equal to c_2, then:

A^T C_2 A = c_2 u_2 u_2^T    (3.115)

Assume that u_ij indicates the jth component of vector u_i. The columns of A^T are the vectors u_i, so:

A^T C_2 A = [ u_11   u_21   . . .  u_N²1  ] [ 0  0    . . .  0 ] [ u_11   u_12   . . .  u_1N²  ]
            [ u_12   u_22   . . .  u_N²2  ] [ 0  c_2  . . .  0 ] [ u_21   u_22   . . .  u_2N²  ]
            [ u_13   u_23   . . .  u_N²3  ] [ 0  0    . . .  0 ] [ u_31   u_32   . . .  u_3N²  ]
            [  ...                   ...  ] [ ...          ... ] [  ...                   ...  ]
            [ u_1N²  u_2N²  . . .  u_N²N² ] [ 0  0    . . .  0 ] [ u_N²1  u_N²2  . . .  u_N²N² ]

          = [ u_11   u_21   . . .  u_N²1  ] [ 0         0         . . .  0         ]
            [ u_12   u_22   . . .  u_N²2  ] [ c_2 u_21  c_2 u_22  . . .  c_2 u_2N² ]
            [  ...                   ...  ] [ 0         0         . . .  0         ]
            [ u_1N²  u_2N²  . . .  u_N²N² ] [ ...                        ...       ]

          = [ c_2 u_21²        c_2 u_21 u_22   . . .  c_2 u_21 u_2N²  ]
            [ c_2 u_22 u_21    c_2 u_22²       . . .  c_2 u_22 u_2N²  ]
            [      ...                                     ...        ]
            [ c_2 u_2N² u_21   c_2 u_2N² u_22  . . .  c_2 u_2N²²      ]  = c_2 u_2 u_2^T    (3.116)

The last equality follows by observing that c_2 is a common factor of all matrix elements and, after it is taken out, what remains is the outer product of vector u_2 with itself.
Example B3.42
Assuming a 3 × 3 image, and accepting that we approximate it by retaining only the first three eigenvalues of its autocovariance matrix, show that:

E{g̃ g̃'^T} = C_g̃'g̃'    (3.117)

Using the result of example 3.40 concerning the truncated transform g̃' of image g, we have:

E{g̃ g̃'^T} = E{ (g̃_11, g̃_21, g̃_31, g̃_12, g̃_22, g̃_32, g̃_13, g̃_23, g̃_33)^T (g̃_11, g̃_21, g̃_31, 0, 0, 0, 0, 0, 0) }

           = E [ g̃_11²      g̃_11 g̃_21  g̃_11 g̃_31  0 0 0 0 0 0 ]
               [ g̃_21 g̃_11  g̃_21²      g̃_21 g̃_31  0 0 0 0 0 0 ]
               [ g̃_31 g̃_11  g̃_31 g̃_21  g̃_31²      0 0 0 0 0 0 ]
               [     ...          ...          ...                ]
               [ g̃_33 g̃_11  g̃_33 g̃_21  g̃_33 g̃_31  0 0 0 0 0 0 ]

           = [ E{g̃_11²}      E{g̃_11 g̃_21}  E{g̃_11 g̃_31}  0 0 0 0 0 0 ]
             [ E{g̃_21 g̃_11}  E{g̃_21²}      E{g̃_21 g̃_31}  0 0 0 0 0 0 ]
             [ E{g̃_31 g̃_11}  E{g̃_31 g̃_21}  E{g̃_31²}      0 0 0 0 0 0 ]
             [      ...             ...             ...                 ]
             [ E{g̃_33 g̃_11}  E{g̃_33 g̃_21}  E{g̃_33 g̃_31}  0 0 0 0 0 0 ]    (3.118)

The transformed image g̃ is constructed in such a way that it has zero mean and all the off-diagonal elements of its covariance matrix are equal to 0. Therefore, we have:

E{g̃ g̃'^T} = diag( E{g̃_11²}, E{g̃_21²}, E{g̃_31²}, 0, 0, 0, 0, 0, 0 ) = C_g̃'g̃'    (3.119)
Box 3.3. What is the error of the approximation of an image using the
Karhunen-Loeve transform?
We shall show now that the Karhunen-Loeve transform not only expresses an image in terms of uncorrelated data, but also, if truncated after a certain term, it can be used to approximate the image in the least mean square error sense.

Assume that the image is of size N × N. The transformation has the form:

g̃ = Ag - Aμ_g    ⟺    g = A^T g̃ + μ_g    (3.120)

We assume that we have ordered the eigenvalues of C_gg in decreasing order. Assume that we decide to neglect the last few eigenvalues and, say, we retain the first K most significant ones. C_g̃g̃ is an N² × N² matrix and its truncated version, C_g̃'g̃', has the last N² - K diagonal elements 0. The transformation matrix A^T was an N² × N² matrix, the columns of which were the eigenvectors of C_gg. Neglecting the N² - K eigenvalues is like omitting N² - K eigenvectors, so the new transformation matrix A'^T has the last N² - K columns 0. The approximated image then is:

g' = A'^T g̃' + μ_g    (3.121)

The error of the approximation is g - g' = A^T g̃ - A'^T g̃'. Its magnitude is

||g - g'|| = trace{ (g - g')(g - g')^T }    (3.122)

where trace means the sum of the diagonal elements of a square matrix. Therefore, the mean square error is:

E{||g - g'||} = E{ trace[ (g - g')(g - g')^T ] }    (3.123)

We can exchange the order of taking the expectation value and taking the trace:

E{||g - g'||} = trace{ E[ (g - g')(g - g')^T ] }
             = trace{ E[ (A^T g̃ - A'^T g̃')(A^T g̃ - A'^T g̃')^T ] }
             = trace{ E[ (A^T g̃ - A'^T g̃')(g̃^T A - g̃'^T A') ] }
             = trace{ E[ A^T g̃ g̃^T A - A^T g̃ g̃'^T A' - A'^T g̃' g̃^T A + A'^T g̃' g̃'^T A' ] }    (3.124)

Matrices A and A' are fixed, so the expectation operator does not affect them. Therefore:

E{||g - g'||} = trace{ A^T E{g̃ g̃^T} A - A^T E{g̃ g̃'^T} A' - A'^T E{g̃' g̃^T} A + A'^T E{g̃' g̃'^T} A' }    (3.125)
In this expression we recognise E{g̃ g̃^T} and E{g̃' g̃'^T} as the correlation matrices of the transformed images before and after the truncation: C_g̃g̃ and C_g̃'g̃'.

Matrix g̃ g̃'^T is the product of a vector and its transpose, but with the last N² - K components of the transpose replaced by 0. The expectation operator will make all the off-diagonal elements of g̃ g̃'^T zero, and will also make its last N² - K diagonal elements 0 (see example 3.42). So, the result is:

E{g̃ g̃'^T} = C_g̃'g̃'    (3.126)

Similar reasoning leads to:

E{g̃' g̃^T} = C_g̃'g̃'    (3.127)

So:

E{||g - g'||} = trace{ A^T C_g̃g̃ A - A^T C_g̃'g̃' A' - A'^T C_g̃'g̃' A + A'^T C_g̃'g̃' A' }    (3.128)

Consider the sum of the second and fourth terms: -A^T C_g̃'g̃' A' + A'^T C_g̃'g̃' A' = -(A - A')^T C_g̃'g̃' A'. We can partition A into two sections, a K × N² submatrix A_1 and an (N² - K) × N² submatrix A_2. A' consists of A_1 and an (N² - K) × N² submatrix with all its elements zero:

A = [ A_1 ]     A' = [ A_1 ]     A - A' = [ 0   ]     (A - A')^T = ( 0 | A_2^T )    (3.129)
    [ A_2 ]          [ 0   ]              [ A_2 ]

where, in (A - A')^T, the zero block is N² × K and A_2^T is N² × (N² - K). Then (A - A')^T C_g̃'g̃' A' = ( 0 | A_2^T ) C_g̃'g̃' A'. C_g̃'g̃' can be partitioned into four submatrices

C_g̃'g̃' = [ C_1 | 0 ]    (3.130)
          [ 0   | 0 ]

where C_1 is K × K diagonal. Then the product is:

( 0 | A_2^T ) [ C_1 | 0 ] = (0)    (3.131)
              [ 0   | 0 ]

so the second and fourth terms of (3.128) cancel each other. Using this result in (3.128), we obtain:

E{||g - g'||} = trace{ A^T C_g̃g̃ A - A'^T C_g̃'g̃' A }    (3.132)
Consider the term A^T C_g̃g̃ A. We may write C_g̃g̃ as the sum of N² matrices, each one being N² × N² and having only one nonzero element:

C_g̃g̃ = [ λ_1 0 . . . 0 ]   [ 0 0   . . . 0 ]             [ 0 0 . . . 0    ]
        [ 0   0 . . . 0 ] + [ 0 λ_2 . . . 0 ] + . . . +   [ 0 0 . . . 0    ]    (3.133)
        [ ...       ... ]   [ ...        ... ]             [ ...        ... ]
        [ 0   0 . . . 0 ]   [ 0 0   . . . 0 ]             [ 0 0 . . . λ_N² ]

A is made up of rows of eigenvectors, while A^T is made up of columns of eigenvectors. Then we may write

A^T C_g̃g̃ A = Σ_{i=1}^{N²} ( u_1  u_2  . . .  u_N² ) C_i [ u_1^T  ]
                                                          [ u_2^T  ]
                                                          [  ...   ]
                                                          [ u_N²^T ]    (3.134)

where C_i is the matrix with its ith diagonal element nonzero and equal to λ_i. Generalising the result of example 3.41, each term of this sum is λ_i u_i u_i^T, so:

trace[ A^T C_g̃g̃ A ] = trace[ Σ_{i=1}^{N²} λ_i u_i u_i^T ]
                     = Σ_{i=1}^{N²} λ_i trace[ u_i u_i^T ]
                     = Σ_{i=1}^{N²} λ_i ( u_i1² + u_i2² + . . . + u_iN²² ) = Σ_{i=1}^{N²} λ_i    (3.135)

To obtain this result we made use of the fact that u_i is an eigenvector with unit magnitude and, therefore, u_i1² + u_i2² + . . . + u_iN²² = 1. The same reasoning applied to the term A'^T C_g̃'g̃' A gives Σ_{i=1}^{K} λ_i.

Applying this to equation (3.132) we eventually get:

Mean square error = Σ_{i=1}^{N²} λ_i - Σ_{i=1}^{K} λ_i = Σ_{i=K+1}^{N²} λ_i    (3.136)

Note that all eigenvalues of C_gg are non-negative, as C_gg is a Gram matrix and, therefore, positive semidefinite.
Thus, when an image is approximated by its truncated Karhunen-Loeve expansion, the mean square error committed is equal to the sum of the omitted eigenvalues of the covariance matrix. Since the λ_i are arranged in decreasing order, this shows that the mean square error is the minimum possible.
Example 3.43
The autocovariance matrix of a 2 × 2 image is given by:

C = [ 3 0 1 0 ]
    [ 0 3 0 1 ]
    [ 1 0 3 0 ]    (3.137)
    [ 0 1 0 3 ]

Calculate the transformation matrix A for the image which, when used for the inverse transform, will approximate the image with mean square error equal to 2.

We must find the eigenvalues of this matrix, by solving the following equation:

| 3-λ  0    1    0   |
| 0    3-λ  0    1   |
| 1    0    3-λ  0   | = 0
| 0    1    0    3-λ |

⟹ (3 - λ)[(3 - λ)³ - (3 - λ)] - [(3 - λ)² - 1] = 0
⟹ (3 - λ)²[(3 - λ)² - 1] - [(3 - λ)² - 1] = 0
⟹ [(3 - λ)² - 1]² = 0
⟹ (3 - λ - 1)²(3 - λ + 1)² = 0
⟹ (2 - λ)²(4 - λ)² = 0
⟹ λ_1 = 4, λ_2 = 4, λ_3 = 2, λ_4 = 2    (3.138)

The corresponding eigenvectors for λ = 4 satisfy:

[ 3 0 1 0 ] [ x_1 ]     [ x_1 ]      3x_1 + x_3 = 4x_1     x_3 = x_1
[ 0 3 0 1 ] [ x_2 ] = 4 [ x_2 ]  ⟹  3x_2 + x_4 = 4x_2  ⟹  x_4 = x_2
[ 1 0 3 0 ] [ x_3 ]     [ x_3 ]      x_1 + 3x_3 = 4x_3     x_1 = x_3
[ 0 1 0 3 ] [ x_4 ]     [ x_4 ]      x_2 + 3x_4 = 4x_4     x_2 = x_4    (3.139)

Choose: x_1 = x_3 = 0, x_2 = x_4 = 1/√2. Or choose: x_1 = x_3 = 1/√2, x_2 = x_4 = 0.

The first two eigenvectors, therefore, are (0, 1/√2, 0, 1/√2)^T and (1/√2, 0, 1/√2, 0)^T, which are orthogonal to each other. For λ = 2 we have:

[ 3 0 1 0 ] [ x_1 ]     [ x_1 ]      3x_1 + x_3 = 2x_1     x_3 = -x_1
[ 0 3 0 1 ] [ x_2 ] = 2 [ x_2 ]  ⟹  3x_2 + x_4 = 2x_2  ⟹  x_4 = -x_2
[ 1 0 3 0 ] [ x_3 ]     [ x_3 ]      x_1 + 3x_3 = 2x_3     x_1 = -x_3
[ 0 1 0 3 ] [ x_4 ]     [ x_4 ]      x_2 + 3x_4 = 2x_4     x_2 = -x_4    (3.140)

Choose: x_1 = x_3 = 0, x_2 = 1/√2, x_4 = -1/√2.

We do not need to calculate the fourth eigenvector, because we are interested in an approximate transformation matrix. By setting some eigenvectors to (0 0 0 0), the mean square error we commit when reconstructing the image is equal to the sum of the corresponding eigenvalues. In this case, if we consider as transformation matrix the matrix A,

A = [ 0     1/√2  0     1/√2 ]
    [ 1/√2  0     1/√2  0    ]
    [ 0     1/√2  0    -1/√2 ]    (3.141)
    [ 0     0     0     0    ]

the error will be equal to λ_4 = 2.
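The numbers of this example are easy to confirm with a few lines of NumPy. Note that, because the eigenvalues 4 and 2 are double, the eigenvectors are not unique, so the rows returned by the library may legitimately differ from those chosen in (3.141); the eigenvalues, and therefore the approximation error, are the same:

import numpy as np

# Autocovariance matrix of equation (3.137).
C = np.array([[3, 0, 1, 0],
              [0, 3, 0, 1],
              [1, 0, 3, 0],
              [0, 1, 0, 3]], dtype=float)

lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]
print(lam)                         # [4. 4. 2. 2.], as in equation (3.138)

# Keep the first three eigenvectors as rows and set the fourth row to zero;
# the mean square reconstruction error equals the omitted eigenvalue.
A = U.T.copy()
A[3, :] = 0.0
print(lam[3])                      # 2.0, the error of the approximation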
Example 3.44
Show the different stages of the Karhunen-Loeve transform of the image in example 2.15, on page 69.

There are 63 nonzero eigenvalues of the spatial autocorrelation matrix of this image. Figure 3.10 shows the corresponding 63 eigenimages. The eight images shown in figure 3.11 are the reconstructed images when 8, 16, 24, 32, 40, 48, 56 and 63 terms were used for the reconstruction. The square errors for the reconstructed images, together with the corresponding sums of omitted eigenvalues, are:

Square error for image 3.11a: 196460   ( Σ_{i=9}^{63} λ_i = 197400 )
Square error for image 3.11b: 136290   ( Σ_{i=17}^{63} λ_i = 129590 )
Square error for image 3.11c:  82906   ( Σ_{i=25}^{63} λ_i = 82745 )
Square error for image 3.11d:  55156   ( Σ_{i=33}^{63} λ_i = 52036 )
Square error for image 3.11e:  28091   ( Σ_{i=41}^{63} λ_i = 29030 )
Square error for image 3.11f:  13840   ( Σ_{i=49}^{63} λ_i = 12770 )
Square error for image 3.11g:    257   ( Σ_{i=57}^{63} λ_i = 295 )
Square error for image 3.11h:      0

The square errors of the reconstructions do not agree exactly with the sums of the omitted eigenvalues, because each approximation is optimal in the mean square error sense only over a whole collection of images with the same autocorrelation function, not for any single image.
Figure 3.10: The 63 eigenimages, each scaled separately to have values from 0 to 255.
They are displayed in lexicographic order, ie from top left to bottom right, sequentially.
Figure 3.11: Reconstructed images (a)-(h), obtained when the first 8, 16, 24, 32, 40, 48, 56 and 63 eigenimages shown in figure 3.10 were used, respectively.
Example 3.45
The autocovariance matrix of a 2 × 2 image is given by:

C = [ 4 0 1 0 ]
    [ 0 4 0 1 ]
    [ 1 0 4 0 ]    (3.142)
    [ 0 1 0 4 ]

Calculate the transformation matrix A for the image which, when used for the inverse transform, will approximate the image with mean square error equal to 6.

We first find the eigenvalues of the autocovariance matrix:

| 4-λ  0    1    0   |
| 0    4-λ  0    1   |
| 1    0    4-λ  0   | = 0
| 0    1    0    4-λ |

⟹ (4 - λ)[(4 - λ)³ - (4 - λ)] - [(4 - λ)² - 1] = 0
⟹ [(4 - λ)² - 1]² = 0
⟹ (4 - λ - 1)²(4 - λ + 1)² = 0    (3.143)
⟹ λ_1 = 5, λ_2 = 5, λ_3 = 3, λ_4 = 3

Since we allow an error of image reconstruction equal to 6, we do not need to calculate the eigenvectors that correspond to λ = 3.

Eigenvectors for λ = 5:

[ 4 0 1 0 ] [ x_1 ]     [ x_1 ]      4x_1 + x_3 = 5x_1  ⟹  x_1 = x_3
[ 0 4 0 1 ] [ x_2 ] = 5 [ x_2 ]  ⟹  4x_2 + x_4 = 5x_2  ⟹  x_2 = x_4
[ 1 0 4 0 ] [ x_3 ]     [ x_3 ]      x_1 + 4x_3 = 5x_3  ⟹  x_1 = x_3
[ 0 1 0 4 ] [ x_4 ]     [ x_4 ]      x_2 + 4x_4 = 5x_4  ⟹  x_2 = x_4    (3.144)

Choose x_1 = x_3 = 0, x_2 = x_4 = 1/√2. For λ_2 choose an orthogonal eigenvector, eg x_2 = x_4 = 0, x_1 = x_3 = 1/√2. Then the transformation matrix which allows reconstruction with mean square error 6 (equal to the sum of the omitted eigenvalues) is:

A = [ 0     1/√2  0     1/√2 ]
    [ 1/√2  0     1/√2  0    ]
    [ 0     0     0     0    ]    (3.145)
    [ 0     0     0     0    ]
3.3 Independent component analysis
What is Independent Component Analysis (ICA)?
Independent Component Analysis (ICA) allows one to construct independent components from an ensemble of data.

In the previous section, we saw how to define a basis in terms of which we may create uncorrelated components from an ensemble of data. Remember that independence is a much stronger requirement than decorrelation (see example 3.15, on page 188). So, identifying independent components is expected to be a much more difficult problem than identifying uncorrelated components. Further, independence implies uncorrelatedness (see example 3.11, on page 185), and when the random variables are of zero mean, uncorrelatedness implies orthogonality. (This follows trivially from definitions (3.18) and (3.19), on page 184.) This relationship is schematically shown in figure 3.12.

The problem of identification of independent components is best understood in terms of the so-called cocktail party problem.
Figure 3.12: The set of all pairs of zero-mean independent random variables is a subset of the set of all zero-mean uncorrelated random variables, which is a subset of the set of all zero-mean orthogonal random variables, which is a subset of the set of all zero-mean random variables. So, when we want to search for independent zero-mean random variables, we may restrict our search among the uncorrelated zero-mean ones.
What is the cocktail party problem?
Imagine that you are in a room where several people are talking. Imagine that there are several microphones recording the conversations. At any instant in time, you have several blended recordings of the same speech signals. Let us say that there are two people talking, producing signals s_1(t) and s_2(t), and there are two microphones recording. The recorded signals x_1(t) and x_2(t) are

x_1(t) = a_11 s_1(t) + a_12 s_2(t)
x_2(t) = a_21 s_1(t) + a_22 s_2(t)    (3.146)

where a_11, a_12, a_21 and a_22 are the blending factors, which are unknown. The question is, given that (3.146) constitutes a system of two linear equations with six unknowns (the four blending factors and the two original signals), can we solve it to recover the unknown signals?
How do we solve the cocktail party problem?
Clearly it is impossible to solve system (3.146) in any deterministic way. We solve it by considering the statistical properties that characterise independent signals and by invoking the central limit theorem.

What does the central limit theorem say?

According to the central limit theorem, the probability density function of a random variable that is the sum of n independent random variables tends to a Gaussian as n tends to infinity, no matter what the probability density functions of the independent variables are. In other words, in (3.146) the samples of x_1(t) are more Gaussianly distributed than either s_1(t) or s_2(t). So, in order to estimate the values of the independent components, the first thing we need is a way to quantify the non-Gaussianity of a probability density function.
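A small simulation makes this tangible. In the sketch below (our own illustrative sources and blending factors), two non-Gaussian signals are mixed according to equation (3.146), and their excess kurtosis, one simple and commonly used indicator of non-Gaussianity (it is zero for a Gaussian), is compared with that of the recordings:

import numpy as np

rng = np.random.default_rng(3)
t = np.arange(10000)

# Two non-Gaussian sources: a sinusoid and uniformly distributed noise.
s1 = np.sin(2 * np.pi * t / 97.0)
s2 = rng.uniform(-1.0, 1.0, size=t.size)

# Blending factors a11, a12, a21, a22 of equation (3.146), fixed here for the simulation.
x1 = 0.60 * s1 + 0.40 * s2
x2 = 0.45 * s1 + 0.55 * s2

def excess_kurtosis(v):
    """Excess kurtosis: zero for a Gaussian, negative for these flat distributions."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2)**2 - 3.0

# The mixtures are closer to Gaussian (excess kurtosis nearer to 0) than the sources.
print([round(excess_kurtosis(v), 2) for v in (s1, s2, x1, x2)])
# roughly: [-1.5, -1.2, -0.95, -0.68]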
What do we mean by saying that the samples of x_1(t) are more Gaussianly distributed than either s_1(t) or s_2(t) in relation to the cocktail party problem? Are we talking about the temporal samples of x_1(t), or are we talking about all possible versions of x_1(t) at a given time?

The answer depends on the application we are interested in, which determines the nature of the implied random experiment. Note that, if ergodicity were assumed, it would have made no difference to the outcome whether we were using temporal or ensemble statistics, but the nature of the problem is such that ergodicity is not assumed here. Instead, what determines the choice of the random experiment we assume is the application we are interested in. This is different in signal and in image processing.
Example B3.46
It is known that a signal is corrupted by ten different sources of noise, all of which produce random numbers that are added to the true signal value. The random numbers produced by one such source of noise are uniformly distributed in the range [-α_i, α_i], where i = 1, 2, . . . , 10 identifies the noise source. We know that the values of α_i are: α_1 = 0.9501, α_2 = 0.2311, α_3 = 0.6068, α_4 = 0.4860, α_5 = 0.8913, α_6 = 0.7621, α_7 = 0.4565, α_8 = 0.0185, α_9 = 0.8214 and α_10 = 0.4447. Work out a model for the probability density function of the composite noise that corrupts this signal.

Let us assume that the signal consists of 3000 samples. The choice of this number is not crucial, as long as the number is large enough to allow us to perform reliable statistical estimates. Since we know that each source of noise produces random numbers uniformly distributed in the range [-α_i, α_i], let us draw 3000 such random numbers for each of the various values of α_i. Let us call them x_ij, for j = 1, . . . , 3000. From these numbers we may create numbers z_j ≡ Σ_i x_ij, which could represent the total error added to each true value of the signal. The histograms of the random numbers we drew for i = 1, 2, . . . , 10 and the histogram of the 3000 numbers z_j we created from them are shown in figure 3.13. We can see that the z_j numbers have a bell-shaped distribution. We may try to fit it with a Gaussian, by computing their mean μ and standard deviation σ. It turns out that μ = 0.0281 and σ = 1.1645. In the bottom right panel of figure 3.13, the Gaussian G(z) ≡ e^{-(z-μ)²/(2σ²)}/(σ√(2π)) is plotted on the same axes as the normalised histogram of the z_j values. To convert the histogram of the eleventh panel into a probability density function that can be compared with the corresponding Gaussian, we first divide the bin entries by the total number of elements we used (ie by 3000) and then we divide each such number by the bin width, in order to make it into a density. The bin width here is 0.3715.
[Figure 3.13: eleven histogram panels and one fitted-density panel; the axis tick values are omitted here.]

Figure 3.13: The ten histograms of $x_{ij}$ and the histogram of $z_j$, from top left to bottom
middle, respectively. In the bottom right panel, the normalised histogram of $z_j$ and the
fitted Gaussian probability density function. The normalised histogram was produced
from the histogram in the previous panel, by dividing its values with the total number
of samples we used to produce it and with the bin width we used, in order to convert
it into a density.
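The experiment of this example is easy to reproduce. The sketch below (illustrative only; the variable names are ours) draws 3000 numbers from each of the ten uniform noise sources, sums them, and fits a Gaussian to the result by computing the sample mean and standard deviation, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmas = [0.9501, 0.2311, 0.6068, 0.4860, 0.8913,
          0.7621, 0.4565, 0.0185, 0.8214, 0.4447]
n_samples = 3000

# x[i, j]: jth draw of the ith noise source, uniform in [-sigma_i, sigma_i]
x = np.array([rng.uniform(-s, s, n_samples) for s in sigmas])
z = x.sum(axis=0)                      # composite noise, one value per sample

mu, sigma = z.mean(), z.std()
print("fitted Gaussian: mu = %.4f, sigma = %.4f" % (mu, sigma))

# normalised histogram of z, comparable with the fitted Gaussian density
counts, edges = np.histogram(z, bins=20)
width = edges[1] - edges[0]
density = counts / (n_samples * width)
centres = 0.5 * (edges[:-1] + edges[1:])
gauss = np.exp(-(centres - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
# `density` and `gauss` may now be plotted together, as in figure 3.13
```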
Example B3.47
Confirm that the Gaussian function you used to model the normalised
histogram of the random $z_j$ numbers you created in example 3.46 does
indeed fit the data, by using the $\chi^2$-test.

For 3000 points, the Gaussian function predicts a certain number of events per bin for
the histogram. Let us say that histogram bin $i$ contains values in the range $(b_{i1}, b_{i2}]$.
The probability of finding a value in this range according to the Gaussian probability
density function with mean $\mu$ and standard deviation $\sigma$ is:

$$p_i \equiv \frac{1}{\sigma\sqrt{2\pi}} \int_{b_{i1}}^{b_{i2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \qquad (3.147)$$

Let us define a new variable of integration $z \equiv (x-\mu)/(\sigma\sqrt{2})$. Then $dx = \sigma\sqrt{2}\,dz$,
$z_{i1} \equiv (b_{i1}-\mu)/(\sigma\sqrt{2})$ and $z_{i2} \equiv (b_{i2}-\mu)/(\sigma\sqrt{2})$:

$$p_i = \frac{1}{\sqrt{\pi}} \int_{z_{i1}}^{z_{i2}} e^{-z^2}\,dz
= \frac{1}{\sqrt{\pi}}\left[\int_0^{z_{i2}} e^{-z^2}\,dz - \int_0^{z_{i1}} e^{-z^2}\,dz\right] \qquad (3.148)$$

We may express these integrals in terms of the error function, defined as:

$$\mathrm{erf}(z) \equiv \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt \qquad (3.149)$$

Then:

$$p_i = \frac{1}{2}\left[\mathrm{erf}(z_{i2}) - \mathrm{erf}(z_{i1})\right] \qquad (3.150)$$

If we multiply $p_i$ with the total number of random numbers we have, we shall have the
number of samples we expect to find in bin $i$. Table 3.1 lists the boundaries of each
bin in the bottom middle panel of figure 3.13, the corresponding value $p_i$ computed
from (3.150), and the expected value of occupancy of each bin, $E_i \equiv 3000 p_i$, since
the histogram was created from 3000 numbers. The last column of this table, under
$O_i$, lists the actual number of samples in each bin. Note that the first and the last
bin are modified so the lower limit of the first and the upper limit of the last bin are
$-\infty$ and $+\infty$, respectively. When computing then $p_i$ for these bins, we remember that
$\mathrm{erf}(-\infty) = -1$ and $\mathrm{erf}(+\infty) = 1$.
    bin boundaries              p_i        E_i        O_i
    (-inf, -3.3950]          0.0016     4.9279         4
    (-3.3950, -3.0235]       0.0027     8.2366         6
    (-3.0235, -2.6520]       0.0063    18.8731        15
    (-2.6520, -2.2805]       0.0130    39.0921        33
    (-2.2805, -1.9089]       0.0244    73.1964        70
    (-1.9089, -1.5374]       0.0413   123.8935       140
    (-1.5374, -1.1659]       0.0632   189.5689       224
    (-1.1659, -0.7943]       0.0874   262.2085       243
    (-0.7943, -0.4228]       0.1093   327.8604       332
    (-0.4228, -0.0513]       0.1235   370.5906       387
    (-0.0513, 0.3202]        0.1262   378.6724       330
    (0.3202, 0.6918]         0.1166   349.7814       340
    (0.6918, 1.0633]         0.0974   292.0743       294
    (1.0633, 1.4348]         0.0735   220.4717       225
    (1.4348, 1.8063]         0.0501   150.4437       163
    (1.8063, 2.1779]         0.0309    92.8017        89
    (2.1779, 2.5494]         0.0172    51.7483        68
    (2.5494, 2.9209]         0.0087    26.0851        25
    (2.9209, 3.2925]         0.0040    11.8862         9
    (3.2925, +inf)           0.0025     7.5870         3

Table 3.1: The boundaries of the bins used to produce the two bottom right plots in
figure 3.13, and the corresponding probabilities of observing an event inside that range
of values, $p_i$, computed from (3.150). The third column is the number of expected
events in each interval, produced by multiplying $p_i$ with the number of samples we
have, ie with 3000. The last column is the number of observed events in each interval.
The question we have to answer then is: are the numbers in the last column of table
3.1 in agreement with the numbers in the penultimate column, which are the expected
numbers according to the normal probability density function? To answer this question,
we compute the $\chi^2$ value of these data, as follows:

$$\chi^2 \equiv \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} \qquad (3.151)$$

Here $N$ is the total number of bins with $E_i \geq 5$. Note that if any of the bins is expected
to have fewer than 5 points, that bin is joined with a neighbouring bin, so no bin is
expected to have fewer than 5 points when we perform this calculation. This happens
for the first bin, according to the entries of table 3.1, which has to be joined with the
second bin, to form a single bin with expected value 13.1645 (= 4.9279 + 8.2366) and
observed value 10 (= 4 + 6). Therefore, $N = 19$ in this example. The value of $\chi^2$ has
to be compared with a value that we shall read from some statistical tables, identified
by two parameters. The first is the number of degrees of freedom of this test:
this is equal to $N$ minus the number of parameters we estimated from the data and
used for the theoretical prediction. The number of parameters we used was 2, ie the
mean and the standard deviation of the Gaussian function used to make the theoretical
prediction. So, the degrees of freedom are $\nu = N - 2 = 17$. The other parameter,
which specifies which statistical table we should use, is the confidence with which we
want to perform the test. This is usually set to $\alpha = 0.95$. This is the probability with
which, for $\nu$ degrees of freedom, one may find a smaller value of $\chi^2$ than the one given
in the table. If the computed $\chi^2$ value is higher than the one given in the table, we
may say that the data are not Gaussianly distributed, with confidence 95%. If the
value of $\chi^2$ we computed is lower than the value given in the table, we may say that
our data are compatible with the hypothesis of a Gaussian distribution at the 95%
confidence level.

The value of $\chi^2$ for this particular example is $\chi^2 = 0.7347$. The threshold value for 17
degrees of freedom and at the 95% confidence level is 8.67. So, we conclude that the
hypothesis that the $z_j$ numbers are drawn from a Gaussian probability density function
is compatible with the data.
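The computation of this example may be sketched as follows. The bin merging, the use of the error function for the expected counts and the book-keeping of the degrees of freedom follow the description above; the helper below is an illustration with our own function name, not library code.

```python
import numpy as np
from math import erf, sqrt

def chi_square_gaussian(samples, n_bins=20, min_expected=5.0):
    """Chi-square statistic for the hypothesis that `samples` are Gaussian."""
    mu, sigma = samples.mean(), samples.std()
    observed, edges = np.histogram(samples, bins=n_bins)
    edges[0], edges[-1] = -np.inf, np.inf        # open-ended first/last bin
    z = (edges - mu) / (sigma * sqrt(2.0))
    cdf = np.array([0.5 * (1.0 + erf(v)) for v in z])   # erf(+-inf) = +-1
    expected = samples.size * np.diff(cdf)       # E_i = N p_i, as in (3.150)

    chi2, n_used, o_acc, e_acc = 0.0, 0, 0.0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o                               # merge bins with E_i < 5
        e_acc += e
        if e_acc >= min_expected:
            chi2 += (o_acc - e_acc) ** 2 / e_acc
            n_used += 1
            o_acc, e_acc = 0.0, 0.0
    # two parameters (mu, sigma) were estimated from the data
    return chi2, n_used - 2                      # statistic, degrees of freedom
```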
How do we measure non-Gaussianity?
A Gaussian probability density function is fully characterised by its mean and standard
deviation. All higher order moments of it are either 0 or they can be expressed in terms
of these two parameters. To check for non-Gaussianity, therefore, we check by how much
one of the higher order moments differs from the value expected for Gaussianly distributed
data. For example, a Gaussian probability density function has zero third moment (also
known as skewness) (see example 3.48) and its fourth order moment is $3\sigma^4$, where $\sigma^2$ is the
second order moment (also known as variance) (see example 3.50). It is also known that,
from all probability density functions with fixed standard deviation, the Gaussian has the
maximum entropy (see Box 3.4). This leads to another measure of non-Gaussianity, known
as negentropy.
How are the moments of a random variable computed?
The moments$^1$ of a random variable $x$ with probability density function $p(x)$ are defined as

$$\mu_i \equiv \int_{-\infty}^{+\infty} (x - \mu)^i p(x)\,dx \qquad (3.152)$$

where $\mu$ is the mean value of $x$, ie $\mu \equiv \int_{-\infty}^{+\infty} x\,p(x)\,dx$.

In practice, these integrals are replaced by sums over all $N$ available values $x_n$ of the
random variable

$$\mu_i \simeq \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu)^i \qquad (3.153)$$

where

$$\mu \simeq \frac{1}{N}\sum_{n=1}^{N} x_n \qquad (3.154)$$

In ICA we often use functions of these moments rather than the moments themselves,
like, for example, the kurtosis.

$^1$In many books these moments are known as central moments, because the mean is removed from the
random variable before it is integrated with the probability density function. In those books the moments are
defined as $\mu_i \equiv \int_{-\infty}^{+\infty} x^i p(x)\,dx$. For simplicity, here we drop the qualification "central" in the terminology we
use.
How is the kurtosis defined?

The kurtosis proper, or Pearson kurtosis, is defined as:

$$\beta_2 \equiv \frac{\mu_4}{\mu_2^2} \qquad (3.155)$$

The kurtosis excess is defined as:

$$\gamma_2 \equiv \frac{\mu_4}{\mu_2^2} - 3 \qquad (3.156)$$

The kurtosis excess is also known as the fourth order cumulant.

When a probability density function is flatter than a Gaussian, it has negative kurtosis
excess and is said to be platykurtic or sub-Gaussian. If it is more peaky, it has positive
kurtosis excess and is said to be leptokurtic or super-Gaussian (see figure 3.14).

[Figure 3.14: sketch of a super-Gaussian ($\gamma_2 > 0$), a Gaussian ($\gamma_2 = 0$) and a sub-Gaussian ($\gamma_2 < 0$) probability density function $p(x)$ plotted against $x$.]

Figure 3.14: The value of the kurtosis excess may be used to characterise a probability density
function as being super- or sub-Gaussian.

Usually in ICA when we say kurtosis we refer to the kurtosis excess, which has zero
value for a Gaussian probability density function. We may, therefore, use the square of the
kurtosis excess, or its absolute value, as a measure of non-Gaussianity. The higher its value,
the more non-Gaussian the probability density function of the data is.
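In sample form, equations (3.153)-(3.156) translate directly into code. The sketch below (with our own helper names) computes the central moments and both versions of the kurtosis for a set of samples.

```python
import numpy as np

def central_moment(x, i):
    """mu_i of equation (3.153): average of (x_n - mean)^i."""
    return np.mean((x - x.mean()) ** i)

def kurtosis_proper(x):
    """beta_2 = mu_4 / mu_2^2, equation (3.155); equals 3 for Gaussian data."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def kurtosis_excess(x):
    """gamma_2 = mu_4 / mu_2^2 - 3, equation (3.156); 0 for Gaussian data."""
    return kurtosis_proper(x) - 3.0

rng = np.random.default_rng(2)
print(kurtosis_excess(rng.normal(size=100000)))     # close to 0
print(kurtosis_excess(rng.uniform(-1, 1, 100000)))  # close to -6/5 (sub-Gaussian)
print(kurtosis_excess(rng.laplace(size=100000)))    # close to +3 (super-Gaussian)
```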
Example B3.48
The third order moment is called skewness and quantifies the asymmetry of
a probability density function. Compute the third moment of the Gaussian
probability density function.

The third moment of probability density function $g(x) \equiv \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ is:

$$\mu_3 \equiv \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} (x-\mu)^3 e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \qquad (3.157)$$

We change the variable of integration to $\tilde{x} \equiv x - \mu \Rightarrow dx = d\tilde{x}$. The limits of integration
remain unaffected. Then:

$$\mu_3 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} \tilde{x}^3 e^{-\frac{\tilde{x}^2}{2\sigma^2}}\,d\tilde{x} = 0 \qquad (3.158)$$

This integral is 0 because it is the integral of an antisymmetric (odd) integrand over a
symmetric interval.
Example B3.49
Show that:

$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi} \qquad (3.159)$$

Consider the integral of $e^{-x^2-y^2}$ over the whole $(x, y)$ 2D plane. It can be computed
either using Cartesian or polar coordinates. In Cartesian coordinates:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-x^2-y^2}\,dx\,dy
= \int_{-\infty}^{+\infty} e^{-x^2}\,dx \int_{-\infty}^{+\infty} e^{-y^2}\,dy
= \left[\int_{-\infty}^{+\infty} e^{-x^2}\,dx\right]^2 \qquad (3.160)$$

In polar coordinates $(r, \theta)$, where $r^2 \equiv x^2 + y^2$ and $\theta$ is such that $x = r\cos\theta$ and
$y = r\sin\theta$, we have:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-x^2-y^2}\,dx\,dy
= \int_0^{+\infty}\int_0^{2\pi} e^{-r^2} r\,dr\,d\theta
= 2\pi\left[-\frac{1}{2}e^{-r^2}\right]_0^{+\infty} = \pi \qquad (3.161)$$

By combining the results of equations (3.160) and (3.161), (3.159) follows.
Example B3.50
Compute the fourth moment of the Gaussian probability density function.

The fourth moment of probability density function $g(x) \equiv \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ is:

$$\mu_4 \equiv \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} (x-\mu)^4 e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \qquad (3.162)$$

We change the variable of integration to $\tilde{x} \equiv (x-\mu)/(\sigma\sqrt{2}) \Rightarrow dx = \sigma\sqrt{2}\,d\tilde{x}$. The limits
of integration remain unaffected. Then:

$$\mu_4 = \frac{1}{\sigma\sqrt{2\pi}}\left(\sigma\sqrt{2}\right)^5\int_{-\infty}^{+\infty} \tilde{x}^4 e^{-\tilde{x}^2}\,d\tilde{x}
= \frac{4\sigma^4}{\sqrt{\pi}}\int_{-\infty}^{+\infty} \tilde{x}^3\left(-\frac{1}{2}\right)d\left(e^{-\tilde{x}^2}\right)$$
$$= -\frac{2\sigma^4}{\sqrt{\pi}}\left\{\left[\tilde{x}^3 e^{-\tilde{x}^2}\right]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} e^{-\tilde{x}^2}\,3\tilde{x}^2\,d\tilde{x}\right\}
= \frac{6\sigma^4}{\sqrt{\pi}}\int_{-\infty}^{+\infty} \tilde{x}\left(-\frac{1}{2}\right)d\left(e^{-\tilde{x}^2}\right)$$
$$= -\frac{3\sigma^4}{\sqrt{\pi}}\left\{\left[\tilde{x}e^{-\tilde{x}^2}\right]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} e^{-\tilde{x}^2}\,d\tilde{x}\right\}
= \frac{3\sigma^4}{\sqrt{\pi}}\int_{-\infty}^{+\infty} e^{-\tilde{x}^2}\,d\tilde{x}
= \frac{3\sigma^4}{\sqrt{\pi}}\sqrt{\pi} = 3\sigma^4 \qquad (3.163)$$

Here we made use of (3.159) and the fact that the terms $\tilde{x}^3 e^{-\tilde{x}^2}$ and $\tilde{x}e^{-\tilde{x}^2}$ vanish at
the limits of integration.
Example B3.51
Compute the kurtosis of the Gaussian probability density function.

Remembering that for the Gaussian probability density function $\mu_2 = \sigma^2$, and applying
formulae (3.155) and (3.156) and the result of example 3.50, we obtain for the kurtosis
proper and the kurtosis excess, respectively:

$$\beta_2 = \frac{3\sigma^4}{\sigma^4} = 3 \qquad\qquad \gamma_2 = 0 \qquad (3.164)$$
How is negentropy defined?

The negentropy of a random variable $y$ is defined as

$$J(y) \equiv H(\nu) - H(y) \qquad (3.165)$$

where $H(y)$ is the entropy of $y$ and $\nu$ is a Gaussianly distributed random variable with the
same covariance matrix as $y$.

How is entropy defined?

If $P(y = \alpha_i)$ is the probability of the discrete random variable $y$ taking value $\alpha_i$, the entropy
of this random variable is given by:

$$H(y) \equiv -\sum_i P(y = \alpha_i)\ln P(y = \alpha_i) \qquad (3.166)$$
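Definition (3.166) is straightforward to evaluate for a discrete variable. The short sketch below (illustrative, with our own helper name) estimates the probabilities from observed frequencies and sums $-P\ln P$ over the values that actually occur.

```python
import numpy as np

def entropy_discrete(samples):
    """H(y) = -sum_i P(y = alpha_i) ln P(y = alpha_i), equation (3.166)."""
    values, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))      # values with p = 0 never appear here

rng = np.random.default_rng(3)
fair_die = rng.integers(1, 7, 10000)
loaded_die = rng.choice([1, 2, 3, 4, 5, 6], 10000,
                        p=[0.5, 0.1, 0.1, 0.1, 0.1, 0.1])
print(entropy_discrete(fair_die))      # close to ln 6 = 1.79, the maximum
print(entropy_discrete(loaded_die))    # smaller: the loaded die is more predictable
```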
Example B3.52

For a continuous variable $x$, with probability density function $g(x)$, the
entropy is defined as:

$$H(x) \equiv -\int_{-\infty}^{+\infty} g(x)\ln[g(x)]\,dx \qquad (3.167)$$

Compute the entropy of the Gaussian probability density function
$g(x) \equiv \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.

We remember that, since $g(x)$ is a density function, $\int_{-\infty}^{+\infty} g(x)\,dx = 1$. Then:

$$H(x) = -\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\left[-\ln(\sigma\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma^2}\right]dx$$
$$= \ln(\sigma\sqrt{2\pi})\underbrace{\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx}_{=1\ \text{(density)}}
+ \frac{1}{2\sigma^2}\int_{-\infty}^{+\infty} (x-\mu)^2\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$

In the last integral we set $z \equiv (x-\mu)/(\sigma\sqrt{2}) \Rightarrow dx = \sigma\sqrt{2}\,dz$, so that:

$$H(x) = \ln(\sigma\sqrt{2\pi}) + \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} z^2 e^{-z^2}\,dz
= \ln(\sigma\sqrt{2\pi}) - \frac{1}{2\sqrt{\pi}}\int_{-\infty}^{+\infty} z\,d\left(e^{-z^2}\right)$$
$$= \ln(\sigma\sqrt{2\pi}) - \frac{1}{2\sqrt{\pi}}\left\{\left[ze^{-z^2}\right]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} e^{-z^2}\,dz\right\}
= \ln(\sigma\sqrt{2\pi}) + \frac{1}{2}$$
$$= \ln(\sigma\sqrt{2\pi}) + \frac{1}{2}\ln e = \ln(\sigma\sqrt{2\pi}) + \ln\sqrt{e} = \ln(\sigma\sqrt{2\pi e}) \qquad (3.168)$$

Example B3.53

Compute the entropy of the zero mean uniform probability density function
with standard deviation $\sigma$.

Let the uniform probability density function be

$$p(x) = \begin{cases} \frac{1}{2A} & \text{for } -A \leq x \leq A \\ 0 & \text{otherwise} \end{cases} \qquad (3.169)$$

Its variance is:

$$\sigma^2 = \frac{1}{2A}\int_{-A}^{A} x^2\,dx = \frac{1}{2A}\left[\frac{x^3}{3}\right]_{-A}^{A} = \frac{A^2}{3} \;\Rightarrow\; A = \sigma\sqrt{3} \qquad (3.170)$$

The entropy then is:

$$H(x) = -\int_{-\infty}^{+\infty} p(x)\ln p(x)\,dx
= -\int_{-\sigma\sqrt{3}}^{\sigma\sqrt{3}} \frac{1}{2\sigma\sqrt{3}}\ln\left[\frac{1}{2\sigma\sqrt{3}}\right]dx
= \ln(2\sigma\sqrt{3}) \qquad (3.171)$$

Example B3.54

Compute the negentropy of the zero mean uniform probability density
function with standard deviation $\sigma$.

We substitute the results of examples 3.52 and 3.53 in the definition of negentropy
given by (3.165):

$$J(y) = \ln(\sigma\sqrt{2\pi e}) - \ln(2\sigma\sqrt{3})
= \ln\left[\frac{\sqrt{2\pi e}}{2\sqrt{3}}\right]
= \ln\left[\sqrt{\frac{\pi e}{6}}\right] = 0.176 \qquad (3.172)$$
Example B3.55
For $z$ being a positive real number, prove the inequality:

$$\ln z \leq z - 1 \qquad (3.173)$$

Consider function $f(z) \equiv \ln z - z + 1$. Its first and second derivatives are:

$$\frac{df}{dz} = \frac{1}{z} - 1 \qquad\qquad \frac{d^2 f}{dz^2} = -\frac{1}{z^2} < 0 \qquad (3.174)$$

Since the second derivative is always negative, the point where $df/dz = 0$ is a maximum
for $f(z)$. This maximum is at $z = 1$ and at this point $f(1) = 0$. So, always $f(z) \leq 0$,
and (3.173) follows.
Example B3.56
Consider two probability density functions $a(x)$ and $b(x)$. Show that:

$$-\int_{-\infty}^{+\infty} a(x)\ln[a(x)]\,dx \leq -\int_{-\infty}^{+\infty} a(x)\ln[b(x)]\,dx \qquad (3.175)$$

Using inequality (3.173) with $z = b(x)/a(x)$, we may write:

$$\int_{-\infty}^{+\infty} a(x)\ln\frac{b(x)}{a(x)}\,dx
\leq \int_{-\infty}^{+\infty} a(x)\left[\frac{b(x)}{a(x)} - 1\right]dx
= \int_{-\infty}^{+\infty} b(x)\,dx - \int_{-\infty}^{+\infty} a(x)\,dx = 1 - 1 = 0 \qquad (3.176)$$

The last equality follows from the fact that $a(x)$ and $b(x)$ are probability density func-
tions, and so each one integrates to 1. Inequality (3.175) then follows trivially.
Box 3.4. From all probability density functions with the same variance, the
Gaussian has the maximum entropy
Consider a probability density function $f(x)$ with zero mean. We wish to define $f(x)$
so that $H(x)$ is maximal, subject to the constraint:

$$\int_{-\infty}^{+\infty} x^2 f(x)\,dx = \sigma^2 \qquad (3.177)$$

Assume that $f(x) = Ae^{-\alpha x^2}$, where parameters $A$ and $\alpha$ should be chosen so that $f(x)$
integrates to 1 and has variance $\sigma^2$. Then the entropy of $x$ is:

$$H(x) = -\int_{-\infty}^{+\infty} f(x)\ln[f(x)]\,dx
= -\int_{-\infty}^{+\infty} f(x)\left[\ln A - \alpha x^2\right]dx
= -\ln A + \alpha\sigma^2 \qquad (3.178)$$

If $\phi(x)$ is another probability density function with the same variance, we must show
that the entropy of $x$ now will be less than $\alpha\sigma^2 - \ln A$. From (3.175) we have:

$$-\int_{-\infty}^{+\infty} \phi(x)\ln[\phi(x)]\,dx \leq -\int_{-\infty}^{+\infty} \phi(x)\ln[f(x)]\,dx$$
$$\Rightarrow\; -\int_{-\infty}^{+\infty} \phi(x)\ln[\phi(x)]\,dx
\leq -\int_{-\infty}^{+\infty} \phi(x)\left[\ln A - \alpha x^2\right]dx = -\ln A + \alpha\sigma^2 \qquad (3.179)$$

This completes the proof.
How is negentropy computed?
Negentropy cannot be computed directly from equations (3.165) and (3.166). However, some
approximations have been proposed for its calculation and they are often used in practice.
Most of them are valid for probability density functions that are not too different from the
Gaussian. These approximations are valid when $y$ and $\nu$ are zero-mean and unit variance
random variables, with $\nu$ being Gaussianly distributed. Some of these approximations are:

$$J_1 \simeq \frac{1}{12}\mu_3^2 + \frac{1}{48}\gamma_2^2 \qquad (3.180)$$

$$J_2 \simeq \left[E\left\{\frac{1}{a}\ln[\cosh(ay)]\right\} - E\left\{\frac{1}{a}\ln[\cosh(a\nu)]\right\}\right]^2 \qquad (3.181)$$

$$J_3 \simeq \left[E\left\{e^{-\frac{y^2}{2}}\right\} - \sqrt{\frac{1}{2}}\right]^2 \qquad (3.182)$$

$$J_4 \simeq \frac{36}{8\sqrt{3}-9}\left[E\left\{ye^{-\frac{y^2}{2}}\right\}\right]^2
+ \frac{24}{16\sqrt{3}-27}\left[E\left\{e^{-\frac{y^2}{2}}\right\} - \sqrt{\frac{1}{2}}\right]^2 \qquad (3.183)$$

In (3.181), parameter $a$ may take values in the range $[1, 2]$. In practice, often $a = 1$. $E\{\ldots\}$
is the expectation operator.
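The expectations in (3.181)-(3.183) are estimated in practice as sample averages over the (centred, unit-variance) data. A sketch of the three sample-based approximations is given below; the constant $1/\sqrt{2}$ is the Gaussian value of $E\{e^{-\nu^2/2}\}$ derived in example 3.62, and the reference term of (3.181) is estimated from Gaussian samples, as in example 3.59. The helper names are ours.

```python
import numpy as np

def standardise(x):
    return (x - x.mean()) / x.std()

def j2(y, nu, a=1.0):
    """Approximation (3.181): [E{ln cosh(ay)}/a - E{ln cosh(a nu)}/a]^2."""
    term_y = np.mean(np.log(np.cosh(a * y))) / a
    term_nu = np.mean(np.log(np.cosh(a * nu))) / a
    return (term_y - term_nu) ** 2

def j3(y):
    """Approximation (3.182): [E{exp(-y^2/2)} - sqrt(1/2)]^2."""
    return (np.mean(np.exp(-y**2 / 2)) - np.sqrt(0.5)) ** 2

def j4(y):
    """Approximation (3.183), with both an odd and an even term."""
    t1 = np.mean(y * np.exp(-y**2 / 2))
    t2 = np.mean(np.exp(-y**2 / 2)) - np.sqrt(0.5)
    return (36.0 / (8 * np.sqrt(3) - 9) * t1**2
            + 24.0 / (16 * np.sqrt(3) - 27) * t2**2)

rng = np.random.default_rng(4)
y = standardise(rng.uniform(-1, 1, 100000))    # non-Gaussian test variable
nu = rng.normal(size=100000)                   # Gaussian reference variable
print(j2(y, nu), j3(y), j4(y))
```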
Example B3.57
Compute the negentropy of the zero mean uniform probability density
function with standard deviation $\sigma$, using approximation (3.180). Compare
your answer with the exact answer of example 3.54.

According to example 3.53, the zero mean uniform probability density function with
standard deviation $\sigma$ is defined as:

$$p(x) = \begin{cases} \frac{1}{2\sigma\sqrt{3}} & \text{for } -\sigma\sqrt{3} \leq x \leq \sigma\sqrt{3} \\ 0 & \text{otherwise} \end{cases} \qquad (3.184)$$

This is a symmetric function, and so $\mu_3 = 0$. Also, $\mu_2 = \sigma^2$. So, to compute $\gamma_2$ from
equation (3.156), we require $\mu_4$:

$$\mu_4 = \frac{1}{2\sigma\sqrt{3}}\int_{-\sigma\sqrt{3}}^{\sigma\sqrt{3}} y^4\,dy
= \frac{1}{2\sigma\sqrt{3}}\left[\frac{y^5}{5}\right]_{-\sigma\sqrt{3}}^{\sigma\sqrt{3}}
= \frac{1}{2\sigma\sqrt{3}}\frac{2(\sigma\sqrt{3})^5}{5}
= \frac{9}{5}\sigma^4 \qquad (3.185)$$

Then:

$$\gamma_2 = \frac{\frac{9}{5}\sigma^4}{(\sigma^2)^2} - 3 = \frac{9}{5} - 3 = -\frac{6}{5} \qquad (3.186)$$

Upon substitution into (3.180), we obtain:

$$J_1 = \frac{1}{48}\frac{36}{25} = 0.03 \qquad (3.187)$$

This result is very different from the exact value computed in example 3.54. This is
because approximation (3.180) is valid only for probability density functions that are
very near the Gaussian (see Box 3.5, on page 252).
Example 3.58
Four samples of random variable $y$ are: $-3, -2, 2, 3$. Four samples of a
Gaussianly distributed variable $\nu$ are: $-1, 0, 0, 1$. Use (3.181), with $a = 1$,
to compute a number proportional to the negentropy of $y$.

We note that both variables have 0 mean. We must also make sure that they have unit
variance. The variance of $y$ is:

$$\sigma_y^2 = \frac{1}{4}\left[(-3)^2 + (-2)^2 + 2^2 + 3^2\right] = 6.5 \qquad (3.188)$$

The variance of $\nu$ is:

$$\sigma_\nu^2 = \frac{1}{4}\left[(-1)^2 + 0^2 + 0^2 + 1^2\right] = 0.5 \qquad (3.189)$$

If we divide now all values of $y$ with $\sigma_y = \sqrt{6.5} = 2.55$ and all values of $\nu$ with
$\sigma_\nu = \sqrt{0.5} = 0.71$, both variables acquire unit variance. Applying then (3.181) with
$a = 1$ to the normalised samples, we obtain:

$$J_2 = \left\{\frac{1}{4}\left[\ln\cosh(1.18) + \ln\cosh(0.78) + \ln\cosh(0.78) + \ln\cosh(1.18)\right]\right.$$
$$\left. - \frac{1}{4}\left[\ln\cosh(1.41) + \ln\cosh(0) + \ln\cosh(0) + \ln\cosh(1.41)\right]\right\}^2 = 0.0016 \qquad (3.190)$$
Example 3.59
Draw $N$ Gaussianly distributed random numbers with 0 mean and variance
1. Allow $N$ to take values 10, 100, 500, 1000, 2000, ..., up to 100,000. For each
set of numbers compute $S \equiv E\{\ln[\cosh(a\nu)]\}/a$, for various values of $a$. What
do you observe?

Table 3.2 gives the values of $S$ for $a = 1$ and $a = 1.2$ and for the first few values of $N$.
Figure 3.15 shows how the value of $S$ varies with changing $N$, for the same two values
of $a$. We notice that at least a few tens of thousands of numbers are needed to somehow
stabilise the value of this expression. The average of the values of $S$ obtained for $N$
taking values from 50,000 up to 100,000, in steps of 1000, for various values of $a$ is
given in table 3.3. We observe that the value of $S$ strongly depends on the number of
samples drawn and the value of $a$.

       N      a = 1     a = 1.2
      10     0.4537     0.3973
     100     0.3314     0.4649
     500     0.3603     0.3793
    1000     0.3629     0.3978

Table 3.2: Number of Gaussianly distributed random samples used and the correspond-
ing value of $S \equiv E\{\ln[\cosh(a\nu)]\}/a$ for $a = 1$ and $a = 1.2$.

       a     average $S = \langle S \rangle$
     1.0     0.3749
     1.1     0.3966
     1.2     0.4140
     1.3     0.4371
     1.4     0.4508
     1.5     0.4693
     2.0     0.5279

Table 3.3: Value of $\langle S \rangle$ for various values of $a$, estimated as the average over all
values obtained for $S$, for $N$ taking values from 50,000 to 100,000, in steps of 1,000.

Figure 3.15: The value of $S \equiv E\{\ln[\cosh(a\nu)]\}/a$ for $a = 1$ and $a = 1.2$ as a function
of the number of samples used to produce it.
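The dependence of $S$ on the number of samples reported in tables 3.2 and 3.3 can be checked with a few lines of code (a sketch; the exact values will of course differ from run to run, which is the point of the example):

```python
import numpy as np

rng = np.random.default_rng(5)

def S(n, a):
    """Sample estimate of E{ln[cosh(a nu)]}/a from n Gaussian samples."""
    nu = rng.normal(size=n)
    return np.mean(np.log(np.cosh(a * nu))) / a

for n in (10, 100, 500, 1000, 10000, 100000):
    print(n, round(S(n, 1.0), 4), round(S(n, 1.2), 4))
# small n gives widely varying values; tens of thousands of samples are
# needed before the estimate settles, as figure 3.15 shows
```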
Example 3.60
Draw 1000 uniformly distributed random numbers with 0 mean and unit
variance, in the range $[-1, 1]$. These numbers make up the samples of vari-
able $y$. Also draw 1000 samples of variable $\nu$ from a Gaussian distribution
with 0 mean and variance 1. Estimate the negentropy of $y$ using the three
formulae (3.181), (3.182) and (3.183). Are the estimates you obtain simi-
lar?

After we draw 1000 uniformly distributed random numbers in the range $[-1, 1]$, we
calculate their mean $m$ and standard deviation $s$ and normalise them so that they
have 0 mean and standard deviation 1, by removing from each number the value of
$m$ and dividing the result with $s$. These normalised numbers make up the values of
variable $y$ in formulae (3.181), (3.182) and (3.183). We compute $J_3 = 0.0019$ and
$J_4 = 0.0635$.

The value of $J_2$ depends on the value of $a$. The values of $J_2$ for the various values of
$a$ are listed in table 3.4.

       a     $J_2$          a     $J_2$
     1.0     0.0010       1.6     0.0027
     1.1     0.0013       1.7     0.0029
     1.2     0.0016       1.8     0.0032
     1.3     0.0018       1.9     0.0034
     1.4     0.0021       2.0     0.0036
     1.5     0.0024

Table 3.4: The values of $J_2$ for various values of parameter $a$.

The estimates we obtain are not similar and they are not expected to be similar. First of
all, $J_2$ and $J_3$ are approximate functions proportional to the true value of $J$, presumably
with different constants of proportionality. Second, $J_4$ is an approximation and not
just proportional to the true value of $J$. It is actually a bad approximation, as the
approximation is valid for probability density functions similar to the Gaussian, and
the uniform probability density function is not that similar.

We must remember, however, that these approximations are used to maximise the
negentropy of the solution, and that in a maximisation or minimisation problem, con-
stants of proportionality do not matter.
Example 3.61
Compute the negentropy of the zero mean uniform probability density
function with unit variance, using formulae (3.182) and (3.183). Compare
your answer with the values of $J_3$ and $J_4$ estimated empirically in example
3.60.

We must compute the expectation value of function $\exp(-x^2/2)$ with respect to the
uniform probability density function given by (3.184) for $\sigma = 1$:

$$E\left\{e^{-\frac{x^2}{2}}\right\} \equiv \frac{1}{2\sqrt{3}}\int_{-\sqrt{3}}^{\sqrt{3}} e^{-\frac{x^2}{2}}\,dx
= \frac{1}{2\sqrt{3}}\,2\int_0^{\sqrt{3}} e^{-\frac{x^2}{2}}\,dx
\underbrace{=}_{\text{set } t \equiv x/\sqrt{2}} \frac{\sqrt{2}}{\sqrt{3}}\int_0^{\sqrt{3/2}} e^{-t^2}\,dt$$
$$= \frac{\sqrt{2}}{\sqrt{3}}\frac{\sqrt{\pi}}{2}\,\mathrm{erf}(\sqrt{1.5})
= \sqrt{\frac{\pi}{6}}\,\mathrm{erf}(\sqrt{1.5}) = 0.6687 \qquad (3.191)$$

Here we made use of the definition of the error function, equation (3.149), on page
237. Then, according to formula (3.182), we have:

$$J_3 = \left[0.6687 - \sqrt{\frac{1}{2}}\right]^2 = 0.001478 \qquad (3.192)$$

Note that according to formula (3.182) the negentropy is proportional to this value.
Formula (3.183), however, is a real approximation. The first term of (3.183) is zero
for the uniform probability density function, because the uniform probability density
function is symmetric and thus the integrand is an odd function integrated over a
symmetric interval of integration. The second term of (3.183) is nothing else than the
value of (3.182) multiplied with factor $24/(16\sqrt{3} - 27)$.
Box 3.5. Approximating the negentropy of a probability density function
that is close to a Gaussian

Consider a zero-mean, unit variance probability density function $f(x)$ that is close to
the Gaussian $g(x)$ with the same mean and variance, so that we may write

$$f(x) = g(x)[1 + \epsilon(x)] \qquad (3.193)$$

where $\epsilon(x)$ is a small function. The difference between the entropy of $f(x)$ and the
entropy of $g(x)$ is:

$$H_f - H_g = -\int_{-\infty}^{+\infty} f(x)\ln[f(x)]\,dx + \underbrace{\int_{-\infty}^{+\infty} g(x)\ln[g(x)]\,dx}_{-\,\text{entropy of } g(x)}$$
$$= -\int_{-\infty}^{+\infty} g(x)\epsilon(x)\ln[g(x)]\,dx - \int_{-\infty}^{+\infty} g(x)[1+\epsilon(x)]\ln[1+\epsilon(x)]\,dx \qquad (3.194)$$

Since $g(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$,

$$\ln[g(x)] = -\frac{x^2}{2} - \frac{1}{2}\ln(2\pi) \qquad (3.195)$$

Then, in the last integral, we use the approximation $(1+\epsilon)\ln(1+\epsilon) \simeq \epsilon + \frac{1}{2}\epsilon^2$. So,
equation (3.194) takes the form:

$$H_f - H_g \simeq \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\epsilon(x)x^2\,dx + \frac{1}{2}\ln(2\pi)\int_{-\infty}^{+\infty} g(x)\epsilon(x)\,dx
- \int_{-\infty}^{+\infty} g(x)\epsilon(x)\,dx - \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\epsilon^2(x)\,dx \qquad (3.196)$$

On the left-hand side of this expression we recognise the negentropy of $f(x)$ with a
minus sign in front. From (3.193) we note that

$$f(x) \simeq g(x) + g(x)\epsilon(x) \;\Rightarrow\; \int_{-\infty}^{+\infty} f(x)\,dx \simeq \int_{-\infty}^{+\infty} g(x)\,dx + \int_{-\infty}^{+\infty} g(x)\epsilon(x)\,dx$$
$$\Rightarrow\; 1 \simeq 1 + \int_{-\infty}^{+\infty} g(x)\epsilon(x)\,dx \;\Rightarrow\; \int_{-\infty}^{+\infty} g(x)\epsilon(x)\,dx \simeq 0 \qquad (3.197)$$

where we made use of the fact that probability density functions integrate to 1.
Similarly,

$$\int_{-\infty}^{+\infty} f(x)x^2\,dx \simeq \int_{-\infty}^{+\infty} g(x)x^2\,dx + \int_{-\infty}^{+\infty} g(x)\epsilon(x)x^2\,dx$$
$$\Rightarrow\; 1 \simeq 1 + \int_{-\infty}^{+\infty} g(x)\epsilon(x)x^2\,dx \;\Rightarrow\; \int_{-\infty}^{+\infty} g(x)\epsilon(x)x^2\,dx \simeq 0 \qquad (3.198)$$

where we made use of the fact that both probability density functions have zero-mean
and unit variance.

Using these results in (3.196) we obtain:

$$J \simeq \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\epsilon^2(x)\,dx \qquad (3.199)$$

It has been shown by statisticians that:

$$\frac{1}{2}\int_{-\infty}^{+\infty} g(x)\epsilon^2(x)\,dx \simeq \frac{1}{12}\mu_3^2 + \frac{1}{48}\gamma_2^2 \qquad (3.200)$$

Then approximation (3.180) follows.
Box 3.6. Approximating the negentropy with nonquadratic functions
Consider all probability density functions $f(x)$ that satisfy the following $n$ constraints

$$E\{G_i(x)\} \equiv \int_{-\infty}^{+\infty} f(x)G_i(x)\,dx = c_i \qquad \text{for } i = 1, 2, \ldots, n \qquad (3.201)$$

where $G_i(x)$ are some known functions and $c_i$ some known constants. It can be shown
that the function with the maximum entropy from among all these functions is the one
with the form

$$f_0(x) = Ae^{\sum_i a_i G_i(x)} \qquad (3.202)$$

where $A$ and $a_i$ are some functions of $c_i$. For the special case of $n = 2$, $G_1(x) = x$,
$c_1 = 0$, $G_2(x) = x^2$ and $c_2 = 1$, see Box 3.4, on page 246.

Let us assume that $f(x)$ is similar to a normal probability density function $g(x)$ with
the same mean (zero) and the same variance (unit). We may express that by saying
that $f(x)$, on the top of the $n$ constraints listed above, obeys two more constraints,
with:

$$G_{n+1}(x) = x \qquad c_{n+1} = 0$$
$$G_{n+2}(x) = x^2 \qquad c_{n+2} = 1 \qquad (3.203)$$

For completeness, we also consider function $G_0 \equiv$ a constant, written as $\ln A$, with
$c_0 = \ln A$.

Further, we assume that functions $G_i$ form an orthonormal system of functions, for
$i = 1, \ldots, n$, with weight $g(x)$

$$\int_{-\infty}^{+\infty} g(x)G_i(x)G_j(x)\,dx = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad (3.204)$$

and they are orthogonal to functions $G_0$, $G_{n+1}(x)$ and $G_{n+2}(x)$:

$$\int_{-\infty}^{+\infty} g(x)G_i(x)x^r\,dx = 0 \qquad \text{for } r = 0, 1, 2 \text{ and } \forall i \qquad (3.205)$$

In view of the above, the function with the maximum entropy that obeys the constraints
must have the form:

$$f_0(x) = e^{\ln A + a_{n+1}x + a_{n+2}x^2 + \sum_{i=1}^{n} a_i G_i(x)}
= Ae^{-\frac{x^2}{2}}\,e^{a_{n+1}x + \left(a_{n+2}+\frac{1}{2}\right)x^2 + \sum_{i=1}^{n} a_i G_i(x)} \qquad (3.206)$$

If $f(x)$ is very near a Gaussian, coefficient $a_{n+2}$ in the above expansion must be the
dominant one, with its value near $-1/2$. So, the last factor on the right-hand side of
(3.206) may be considered to be of the form $e^\epsilon \simeq 1 + \epsilon$:

$$f_0(x) \simeq Ae^{-\frac{x^2}{2}}\left[1 + a_{n+1}x + \left(a_{n+2}+\frac{1}{2}\right)x^2 + \sum_{i=1}^{n} a_i G_i(x)\right]$$
$$= A\sqrt{2\pi}\,g(x)\left[1 + a_{n+1}x + \left(a_{n+2}+\frac{1}{2}\right)x^2 + \sum_{i=1}^{n} a_i G_i(x)\right] \qquad (3.207)$$

We shall take the various moments of this expression in order to work out values for
the various parameters that appear in it. First, let us multiply both sides with $dx$ and
integrate them from $-\infty$ to $+\infty$:

$$\underbrace{\int_{-\infty}^{+\infty} f_0(x)\,dx}_{=1\ \text{(integral of a pdf)}}
= A\sqrt{2\pi}\underbrace{\int_{-\infty}^{+\infty} g(x)\,dx}_{=1\ \text{(integral of a pdf)}}
+ A\sqrt{2\pi}\,a_{n+1}\underbrace{\int_{-\infty}^{+\infty} g(x)x\,dx}_{=0\ \text{(mean)}}
+ A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right)\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=1\ \text{(unit variance)}}$$
$$+ A\sqrt{2\pi}\sum_{i=1}^{n} a_i\underbrace{\int_{-\infty}^{+\infty} g(x)G_i(x)\,dx}_{=0\ \text{(orthogonality of } G_i \text{ with } G_0)}
\;\Rightarrow\; 1 = A\sqrt{2\pi} + A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right) \qquad (3.208)$$

We then multiply both sides of (3.207) with $x\,dx$ and integrate:

$$\underbrace{\int_{-\infty}^{+\infty} f_0(x)x\,dx}_{=0\ \text{(mean)}}
= A\sqrt{2\pi}\underbrace{\int_{-\infty}^{+\infty} g(x)x\,dx}_{=0\ \text{(mean)}}
+ A\sqrt{2\pi}\,a_{n+1}\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=1\ \text{(unit variance)}}
+ A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right)\underbrace{\int_{-\infty}^{+\infty} g(x)x^3\,dx}_{=0\ \text{(odd integrand)}}$$
$$+ A\sqrt{2\pi}\sum_{i=1}^{n} a_i\underbrace{\int_{-\infty}^{+\infty} g(x)G_i(x)x\,dx}_{=0\ \text{(orthogonality of } G_i \text{ with } G_{n+1})}
\;\Rightarrow\; 0 = A\sqrt{2\pi}\,a_{n+1} \;\Rightarrow\; a_{n+1} = 0 \qquad (3.209)$$

Next multiply both sides of (3.207) with $x^2\,dx$ and integrate:

$$\underbrace{\int_{-\infty}^{+\infty} f_0(x)x^2\,dx}_{=1\ \text{(unit variance)}}
= A\sqrt{2\pi}\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=1\ \text{(unit variance)}}
+ A\sqrt{2\pi}\,a_{n+1}\underbrace{\int_{-\infty}^{+\infty} g(x)x^3\,dx}_{=0\ \text{(odd integrand)}}
+ A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right)\underbrace{\int_{-\infty}^{+\infty} g(x)x^4\,dx}_{=3\ \text{(see example 3.50)}}$$
$$+ A\sqrt{2\pi}\sum_{i=1}^{n} a_i\underbrace{\int_{-\infty}^{+\infty} g(x)G_i(x)x^2\,dx}_{=0\ \text{(orthogonality of } G_i \text{ with } G_{n+2})}
\;\Rightarrow\; 1 = A\sqrt{2\pi} + 3A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right) \qquad (3.210)$$

By subtracting (3.208) from (3.210) by parts, we obtain:

$$0 = 2A\sqrt{2\pi}\left(a_{n+2}+\frac{1}{2}\right) \;\Rightarrow\; a_{n+2} = -\frac{1}{2} \qquad (3.211)$$

From (3.208) and (3.211) we deduce that:

$$1 = A\sqrt{2\pi} \;\Rightarrow\; A = \frac{1}{\sqrt{2\pi}} \qquad (3.212)$$

Finally, we multiply both sides of (3.207) with $G_j(x)\,dx$ and integrate. We also make
use of the fact that $A\sqrt{2\pi} = 1$:

$$\underbrace{\int_{-\infty}^{+\infty} f_0(x)G_j(x)\,dx}_{=c_j}
= \underbrace{\int_{-\infty}^{+\infty} g(x)G_j(x)\,dx}_{=0\ \text{(orthogonality of } G_j \text{ with } G_0)}
+ a_{n+1}\underbrace{\int_{-\infty}^{+\infty} g(x)xG_j(x)\,dx}_{=0\ \text{(orthogonality of } G_j \text{ with } G_{n+1})}
+ \left(a_{n+2}+\frac{1}{2}\right)\underbrace{\int_{-\infty}^{+\infty} g(x)x^2 G_j(x)\,dx}_{=0\ \text{(orthogonality of } G_j \text{ with } G_{n+2})}$$
$$+ \sum_{i=1}^{n} a_i\underbrace{\int_{-\infty}^{+\infty} g(x)G_i(x)G_j(x)\,dx}_{=\delta_{ij}}
\;\Rightarrow\; c_j = a_j \;\Rightarrow\; a_j = c_j \qquad (3.213)$$

Then, equation (3.207) takes the form:

$$f_0(x) \simeq g(x)\left[1 + \sum_{i=1}^{n} c_i G_i(x)\right] \qquad (3.214)$$

This equation has the same form as (3.193), with $\epsilon(x) \equiv \sum_{i=1}^{n} c_i G_i(x)$. Then we know
from Box 3.5, on page 252, that the negentropy of $f_0(x)$ is given by (3.199). That is:

$$J \simeq \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\left[\sum_{i=1}^{n} c_i G_i(x)\right]^2 dx
= \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\sum_{i=1}^{n} c_i G_i(x)\sum_{j=1}^{n} c_j G_j(x)\,dx$$
$$= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j\int_{-\infty}^{+\infty} g(x)G_i(x)G_j(x)\,dx
= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j\delta_{ij}
= \frac{1}{2}\sum_{i=1}^{n} c_i^2$$
$$= \frac{1}{2}\sum_{i=1}^{n}\left[\int_{-\infty}^{+\infty} f(x)G_i(x)\,dx\right]^2
= \frac{1}{2}\sum_{i=1}^{n}\left[E\{G_i(x)\}\right]^2 \qquad (3.215)$$
In practice, we may use only one function in this sum.
Box 3.7. Selecting the nonquadratic functions with which to approximate
the negentropy
First we must select functions $G_i(x)$ so that function $f_0(x)$, given by (3.202), is inte-
grable. This will happen only if functions $G_i(x)$ grow at most as fast as $x^2$ with $|x|$
increasing. The next criterion is to choose these functions so that they satisfy constraints
(3.204) and (3.205).

Let us consider two functions $\bar{G}_1(x)$ and $\bar{G}_2(x)$ that grow slowly enough with $|x|$, the
first one odd and the second one even. We shall show here how to modify them
so they satisfy constraints (3.204) and (3.205).

Let us construct from them two functions $\tilde{G}_1(x)$ and $\tilde{G}_2(x)$

$$\tilde{G}_1(x) \equiv \bar{G}_1(x) + \alpha x \qquad (3.216)$$
$$\tilde{G}_2(x) \equiv \bar{G}_2(x) + \beta x^2 + \gamma \qquad (3.217)$$

where $\alpha$, $\beta$ and $\gamma$ are some constants, the values of which will be determined so that
constraints (3.205) are satisfied. Note that the orthogonality constraint is automatically
satisfied, since $g(x)\tilde{G}_1(x)\tilde{G}_2(x)$ is an odd function. To ensure orthonormality, some
scaling of these functions has to take place, but this may be done at the end very easily.
Also note that $\tilde{G}_1(x)$ is automatically orthogonal to $G_0$ and $G_{n+2} \equiv x^2$ with weight $g(x)$,
since $g(x)\tilde{G}_1(x)x^r$ for $r = 0, 2$ is an odd function. For $r = 1$, constraint (3.205) has the
form:

$$\int_{-\infty}^{+\infty} g(x)\tilde{G}_1(x)x\,dx = 0
\;\Rightarrow\; \int_{-\infty}^{+\infty} g(x)\bar{G}_1(x)x\,dx + \alpha\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=\text{variance}=1} = 0$$
$$\Rightarrow\; \alpha = -\int_{-\infty}^{+\infty} g(x)\bar{G}_1(x)x\,dx \qquad (3.218)$$

Next note that, as $\tilde{G}_2(x)$ is even, it automatically satisfies constraint (3.205) for $r = 1$.
To satisfy the same constraints for $r = 0$ and $r = 2$, we must have:

$$\int_{-\infty}^{+\infty} g(x)\tilde{G}_2(x)\,dx = 0 \qquad\qquad \int_{-\infty}^{+\infty} g(x)\tilde{G}_2(x)x^2\,dx = 0 \qquad (3.219)$$

Or:

$$\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)\,dx + \beta\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=\text{variance}=1} + \gamma\underbrace{\int_{-\infty}^{+\infty} g(x)\,dx}_{=1} = 0$$
$$\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)x^2\,dx + \beta\underbrace{\int_{-\infty}^{+\infty} g(x)x^4\,dx}_{=3\ \text{(example 3.50)}} + \gamma\underbrace{\int_{-\infty}^{+\infty} g(x)x^2\,dx}_{=\text{variance}=1} = 0 \qquad (3.220)$$

Or:

$$\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)\,dx + \beta + \gamma = 0$$
$$\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)x^2\,dx + 3\beta + \gamma = 0 \qquad (3.221)$$

Subtracting the first from the second equation, we obtain:

$$\beta = \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)\,dx - \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)x^2\,dx \qquad (3.222)$$

Using this in the first of (3.221), we obtain:

$$\gamma = \frac{1}{2}\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)x^2\,dx - \frac{3}{2}\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)\,dx \qquad (3.223)$$

So, the two functions we should use to expand the unknown probability density function
$f(x)$ could be

$$G_1(x) \equiv \frac{1}{\nu_1}\left[\bar{G}_1(x) - x\int_{-\infty}^{+\infty} g(z)\bar{G}_1(z)z\,dz\right] \qquad (3.224)$$

$$G_2(x) \equiv \frac{1}{\nu_2}\left\{\bar{G}_2(x) + x^2\left[\frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)\,dz - \frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)z^2\,dz\right]\right.$$
$$\left. + \frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)z^2\,dz - \frac{3}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)\,dz\right\} \qquad (3.225)$$

where $\nu_1$ and $\nu_2$ ensure orthonormality. Note that here we replaced the dummy variable
in the integrals of (3.224) and (3.225) with $z$ to avoid confusion.
We can now compute the values of the corresponding parameters $c_i$ which appear in
the approximation of the negentropy by (3.215):

$$c_1 \equiv \int_{-\infty}^{+\infty} f(x)G_1(x)\,dx
= \frac{1}{\nu_1}\left[\int_{-\infty}^{+\infty} f(x)\bar{G}_1(x)\,dx
- \underbrace{\int_{-\infty}^{+\infty} f(x)x\,dx}_{=0\ \text{(mean)}}\int_{-\infty}^{+\infty} g(z)\bar{G}_1(z)z\,dz\right]
= \frac{1}{\nu_1}E\{\bar{G}_1(x)\} \qquad (3.226)$$

$$c_2 \equiv \int_{-\infty}^{+\infty} f(x)G_2(x)\,dx$$
$$= \frac{1}{\nu_2}\left\{\int_{-\infty}^{+\infty} f(x)\bar{G}_2(x)\,dx
+ \underbrace{\int_{-\infty}^{+\infty} f(x)x^2\,dx}_{=1\ \text{(unit variance)}}\left[\frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)\,dz - \frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)z^2\,dz\right]\right.$$
$$\left. + \int_{-\infty}^{+\infty} f(x)\,dx\left[\frac{1}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)z^2\,dz - \frac{3}{2}\int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)\,dz\right]\right\}$$
$$= \frac{1}{\nu_2}\left[\int_{-\infty}^{+\infty} f(x)\bar{G}_2(x)\,dx - \int_{-\infty}^{+\infty} g(z)\bar{G}_2(z)\,dz\right]
= \frac{1}{\nu_2}\left[E\{\bar{G}_2(x)\} - E\{\bar{G}_2(z)\}\right] \qquad (3.227)$$

In the last equality it is understood that the expectation of function $\bar{G}_2(z)$ is computed
over a normally distributed variable $z$.

If we decide to use only one function $G_i$, and we select to use, say, only an even
function, then for $\bar{G}_2(x) \equiv \ln(\cosh(ax))/a$ we obtain expression (3.181), while for
$\bar{G}_2(x) \equiv \exp(-x^2/2)$, we obtain expression (3.182). Note that these expressions pro-
duce numbers approximately proportional to the true value of the negentropy, because
we have omitted from them the normalising factors $\nu_1$ and $\nu_2$.

Note also that none of these approximations captures any asymmetric characteristics
of probability density function $f(x)$. For example, for a dark image, where no negative
numbers are allowed, the assumption that the unknown probability density function
$f(x)$ is symmetric must clearly be wrong, as any tail of negative numbers is either
truncated or mapped onto the positive numbers. This is particularly so for medical
images, where the histograms of the grey values are not really Gaussian, but they
exhibit strong asymmetries. To capture such asymmetries in the histograms of the
data, we must use at least two functions in the expansion of the negentropy, one of
them odd and one of them even.

So, if we wish to use also an odd function in the expansion, we may use $\bar{G}_1(x) \equiv
x\exp(-x^2/2)$. The use of this function in conjunction with $\bar{G}_2(x) \equiv \exp(-x^2/2)$ results
in approximation (3.183). The coefficients with which the two terms in the formula are
multiplied come from factor $1/2$, which has to multiply $c_i^2$, and from the normalisation
constants of the functions used (see examples 3.63 and 3.64), ie these coefficients are
$1/(2\nu_1^2)$ and $1/(2\nu_2^2)$, respectively.
Example B3.62
For function $\bar{G}_2(x) \equiv \exp(-x^2/2)$ compute $E\{\bar{G}_2(z)\}$ that appears in (3.227).

$$E\{\bar{G}_2(z)\} \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{z^2}{2}}\,e^{-\frac{z^2}{2}}\,dz
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-z^2}\,dz
= \frac{1}{\sqrt{2}} \qquad (3.228)$$

Here we made use of (3.159), on page 241. This value appears in approximations
(3.182) and (3.183), on page 246, instead of $E\{\bar{G}_2(z)\}$.
Example B3.63
For functions $\bar{G}_1(x) \equiv x\exp(-x^2/2)$ and $\bar{G}_2(x) \equiv \exp(-x^2/2)$ compute the
values of parameters $\alpha$, $\beta$ and $\gamma$, using equations (3.218), (3.222) and (3.223).

From (3.218) we have

$$\alpha = -\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x e^{-\frac{x^2}{2}}\,e^{-\frac{x^2}{2}}\,x\,dx
= -\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-x^2}\,dx
= \frac{1}{2\sqrt{2\pi}}\int_{-\infty}^{+\infty} x\,d\left(e^{-x^2}\right)$$
$$= \frac{1}{2\sqrt{2\pi}}\left[xe^{-x^2}\right]_{-\infty}^{+\infty} - \frac{1}{2\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2}\,dx
= -\frac{\sqrt{\pi}}{2\sqrt{2\pi}} = -\frac{1}{2\sqrt{2}} \qquad (3.229)$$

where we made use of (3.159), on page 241.

For $\bar{G}_2(x)$ we need the following integral

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,e^{-\frac{x^2}{2}}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2}\,dx
= \frac{\sqrt{\pi}}{\sqrt{2\pi}} = \frac{1}{\sqrt{2}} \qquad (3.230)$$

where we made use of (3.159). Note also that $\int_{-\infty}^{+\infty} g(x)\bar{G}_2(x)x^2\,dx = -\alpha$, so the second
term in (3.222) is $\alpha/2$, ie it is $-1/(4\sqrt{2})$.

Then from (3.222) we have:

$$\beta = \frac{1}{2}\frac{1}{\sqrt{2}} - \frac{1}{4\sqrt{2}} = \frac{1}{4\sqrt{2}} \qquad (3.231)$$

From (3.223) we have:

$$\gamma = \frac{1}{4\sqrt{2}} - \frac{3}{2}\frac{1}{\sqrt{2}} = -\frac{5}{4\sqrt{2}} \qquad (3.232)$$
Example B3.64
Calculate the normalisation constants that will make functions $\tilde{G}_1(x) \equiv
x\exp(-x^2/2) + \alpha x$ and $\tilde{G}_2(x) \equiv \exp(-x^2/2) + \beta x^2 + \gamma$ orthonormal with weight
the Gaussian kernel $g(x)$ with zero mean and unit variance. The values of
$\alpha$, $\beta$ and $\gamma$ have been computed in example 3.63.

These two functions are already orthogonal. Further, each one must integrate to 1
when squared and integrated with kernel $g(x)$. We must work out the values of these
integrals.

$$\int_{-\infty}^{+\infty} g(x)\left[\tilde{G}_1(x)\right]^2 dx
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\left[x^2 e^{-x^2} + \alpha^2 x^2 + 2\alpha x^2 e^{-\frac{x^2}{2}}\right]dx$$
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-\frac{3x^2}{2}}\,dx
+ 2\alpha\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-x^2}\,dx
+ \alpha^2\underbrace{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-\frac{x^2}{2}}\,dx}_{=1=\text{variance}}$$

Integrating by parts, as in example 3.63, and using (3.159), we find

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-\frac{3x^2}{2}}\,dx = \frac{1}{3\sqrt{3}}
\qquad\qquad
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-x^2}\,dx = \frac{1}{2\sqrt{2}}$$

so that:

$$\int_{-\infty}^{+\infty} g(x)\left[\tilde{G}_1(x)\right]^2 dx
= \frac{1}{3\sqrt{3}} + \frac{\alpha}{\sqrt{2}} + \alpha^2
= \frac{1}{3\sqrt{3}} - \frac{1}{4} + \frac{1}{8}
= \frac{8\sqrt{3}-9}{72} \equiv \nu_1^2 \qquad (3.233)$$

Here we made use of (3.229). So, $\tilde{G}_1(x)$ should be normalised by being multiplied with
$\sqrt{72/(8\sqrt{3}-9)}$ to form function $G_1(x)$. When $G_1(x)$ is used in the calculation of the
negentropy, this factor squared and divided by 2 will appear as the coefficient of the first
term of approximation (3.183) (factor $1/(2\nu_1^2)$, see Box 3.7, on page 257).

For $\tilde{G}_2(x)$ we have:

$$\int_{-\infty}^{+\infty} g(x)\left[\tilde{G}_2(x)\right]^2 dx
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\left[e^{-x^2} + \beta^2 x^4 + \gamma^2
+ 2\beta x^2 e^{-\frac{x^2}{2}} + 2\gamma e^{-\frac{x^2}{2}} + 2\beta\gamma x^2\right]dx$$
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{3x^2}{2}}\,dx
+ \beta^2\underbrace{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^4 e^{-\frac{x^2}{2}}\,dx}_{=3\ \text{(example 3.50)}}
+ \gamma^2\underbrace{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{x^2}{2}}\,dx}_{=1}$$
$$+ 2\beta\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-x^2}\,dx
+ 2\gamma\underbrace{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2}\,dx}_{=\frac{1}{\sqrt{2}}\ \text{(eqn (3.159))}}
+ 2\beta\gamma\underbrace{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^2 e^{-\frac{x^2}{2}}\,dx}_{=1=\text{variance}}$$
$$= \frac{1}{\sqrt{3}} + 3\beta^2 + \gamma^2 + \frac{2\beta}{2\sqrt{2}} + \frac{2\gamma}{\sqrt{2}} + 2\beta\gamma
= \frac{1}{\sqrt{3}} + \frac{3}{32} + \frac{25}{32} + \frac{1}{8} - \frac{5}{4} - \frac{10}{32}
= \frac{1}{\sqrt{3}} - \frac{18}{32}
= \frac{16\sqrt{3}-27}{48} \equiv \nu_2^2 \qquad (3.234)$$

Here we made use of the values of $\beta$ and $\gamma$ given by (3.231) and (3.232), respectively.
So, $\tilde{G}_2(x)$ should be normalised by being multiplied with $1/\nu_2 = \sqrt{48/(16\sqrt{3}-27)}$
to form function $G_2(x)$. The factor then that should appear in the approximation of
the negentropy should be $1/(2\nu_2^2) = 24/(16\sqrt{3}-27)$.
Assume that we have $I$ images, each of size $M \times N$, and let $p_{ki}$ denote the value of pixel
position $k$ ($k = 1, \ldots, MN$) in image $i$. The mean value of image $i$ is:

$$\bar{p}_i \equiv \frac{1}{MN}\sum_{k=1}^{MN} p_{ki} \qquad (3.236)$$

Let us call $\tilde{p}_{ki}$ the zero-mean versions of $p_{ki}$: $\tilde{p}_{ki} \equiv p_{ki} - \bar{p}_i$. Note that $\bar{p}_i$ is the mean value
of image $i$. This is equivalent to describing the cloud of points in figure 3.16a using a coordinate
system that is centred at the centre of the cloud and has its axes parallel one by one with
those of the original coordinate system (see figure 3.17a).
[Figure 3.16: (a) a cloud of $MN$ points (one per pixel) in a coordinate system with axes image 1, image 2, ..., image $I$; (b) a cloud of $I$ points (one per image) in a coordinate system with axes pixel 1, pixel 2, ..., pixel $MN$.]

Figure 3.16: (a) If we assume that a random experiment decides which combination of pixel
values across all images corresponds to a pixel position, then each pixel is a point in a
coordinate system with as many axes as images we have. Assuming that we have $I$ images
of size $M \times N$, the cloud of points created this way represents $MN$ outcomes of the random
experiment, as each pixel is one such outcome. (b) If we assume that a random experiment
decides which combination of pixel values makes up an image, then each image is a point
in a coordinate system with as many axes as pixels in the image. Assuming that we have
$I$ images of size $M \times N$, the cloud of points created this way represents $I$ outcomes of the
random experiment, as each image is one such outcome.
We may then define a coordinate system in which the components that make up each
pixel are uncorrelated. We learnt how to do that in the section on K-L: we have to work out
the eigenvectors of the $I \times I$ autocorrelation matrix of the zero-mean data. Such a matrix
may have at most $I$ eigenvalues, but in general it will have $E \leq I$. The $C_{ij}$ component of the
autocorrelation matrix, corresponding to the correlation between images $i$ and $j$, is given by:

$$C_{ij} \equiv \frac{1}{MN}\sum_{k=1}^{MN} \tilde{p}_{ki}\tilde{p}_{kj} \qquad (3.237)$$

The eigenvectors of this matrix define a very specific coordinate system in which each
point is represented by uncorrelated values. The eigenvector that corresponds to the largest
eigenvalue coincides with the axis of maximum elongation of the cloud, while the eigenvector
that corresponds to the smallest eigenvalue coincides with the axis of minimum elongation
of the cloud. This does not allow us any choice of the coordinate system. Imagine, however,
if the cloud were perfectly round. All coordinate systems defined with their origins at the
centre of the cloud would have been equivalent (see figure 3.17b). We could then choose any
one of them to express the data. Such a degeneracy would be expressed by matrix $C$ having
only one multiple eigenvalue. The multiplicity of the eigenvalue would be the same as the
dimensionality of the cloud of points, ie the same as the number of axes we could define for
the coordinate system. Data that are represented by a spherical cloud are called whitened
data. Such data allow one to choose, from among many coordinate systems, one in which the
components of the data are more independent than in any other. So, identifying independent
components from our data may be achieved by first whitening the data, in order to have an
unlimited number of options of creating uncorrelated components from them, and choosing
from among them the most independent ones.
[Figure 3.17: two sketches of the cloud of points in the space spanned by image 1, image 2, ..., image $I$; (a) with the shifted coordinate system and (b) with an arbitrarily rotated coordinate system for a spherical cloud.]

Figure 3.17: (a) Removing the mean of each component is equivalent to shifting the original
coordinate system so that its centre coincides with the centre of the cloud of points and each
axis remains parallel with itself. (b) When the cloud of points is spherical, all coordinate
systems, centred at the centre of the cloud, like the one indicated here by the thick arrows,
are equivalent in describing the data.
How can we whiten the data?
Let us consider first how we create the uncorrelated components of $\tilde{p}_{ki}$ by using the eigenvec-
tors of matrix $C$. Let us call the $l$th eigenvector of $C$ $\mathbf{u}_l$ and the corresponding eigenvalue $\lambda_l$.
Let us say that $C$ has $E$ eigenvectors in all, so $l = 1, 2, \ldots, E$. Each point in figure 3.16a is
represented by a position vector $\tilde{\mathbf{p}}_k \equiv (\tilde{p}_{k1}, \tilde{p}_{k2}, \ldots, \tilde{p}_{kI})^T$ in the coordinate system centred
at the centre of the cloud. This position vector is projected on each one of the eigenvectors
$\mathbf{u}_l$ in turn, to identify the components of vector $\tilde{\mathbf{p}}_k$ in the new coordinate system made up
from these eigenvectors. Let us denote these projections by $w_{kl} \equiv \tilde{\mathbf{p}}_k^T\mathbf{u}_l$. The combination
of values $(w_{1l}, w_{2l}, \ldots, w_{MN,l})$ makes up the $l$th uncorrelated component of the original data.
For fixed $l$, the values of $w_{kl}$ for $k = 1, 2, \ldots, MN$ have a standard deviation equal to $\sqrt{\lambda_l}$. If we
want the spread of these values to be the same along all axes, defined by $\mathbf{u}_l$ for the different
values of $l$, we must divide them with the corresponding $\sqrt{\lambda_l}$, ie we must use $\tilde{w}_{kl}$ instead of
$w_{kl}$, given by $\tilde{w}_{kl} \equiv w_{kl}/\sqrt{\lambda_l}$. This is equivalent to saying

$$\tilde{w}_{kl} \equiv \frac{w_{kl}}{\sqrt{\lambda_l}}
= \tilde{\mathbf{p}}_k^T\mathbf{u}_l\frac{1}{\sqrt{\lambda_l}}
= \tilde{\mathbf{p}}_k^T\left(\mathbf{u}_l\frac{1}{\sqrt{\lambda_l}}\right)
\equiv \tilde{\mathbf{p}}_k^T\tilde{\mathbf{u}}_l \qquad (3.238)$$

where we defined the unit vectors of the axes to be $\tilde{\mathbf{u}}_l$, so that the points are equally spread
along all axes.

In summary, in order to whiten the data, we use as unit vectors of the coordinate system
the eigenvectors of matrix $C$, divided by the square root of the corresponding eigenvalue.
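In matrix form, whitening amounts to projecting the centred data on the eigenvectors of $C$ and dividing each projection by $\sqrt{\lambda_l}$. A minimal sketch, assuming the data are given as the columns of an array (one column per point of the cloud), is shown below; the function name and threshold for discarding zero eigenvalues are our own choices.

```python
import numpy as np

def whiten(P):
    """Whiten data given as columns of P (one column per sample).

    Returns the whitened data Q and the scaled eigenvectors used, so that
    Q = U_scaled^T (P - mean).  Only eigenvectors with nonzero eigenvalue
    are kept, which also reduces the dimensionality.
    """
    m = P.mean(axis=1, keepdims=True)
    P0 = P - m                                   # centre the cloud
    n_samples = P0.shape[1]
    C = P0 @ P0.T / n_samples                    # autocorrelation matrix, as in (3.237)
    lam, U = np.linalg.eigh(C)                   # eigenvalues in ascending order
    keep = lam > 1e-10 * lam.max()               # drop (numerically) zero ones
    lam, U = lam[keep][::-1], U[:, keep][:, ::-1]
    U_scaled = U / np.sqrt(lam)                  # u_l / sqrt(lambda_l), eq. (3.238)
    Q = U_scaled.T @ P0                          # whitened components
    return Q, U_scaled, U, lam, m

# sanity check: the whitened components are uncorrelated with unit variance
rng = np.random.default_rng(6)
P = rng.normal(size=(5, 2000)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
Q, *_ = whiten(P)
print(np.round(Q @ Q.T / Q.shape[1], 2))         # approximately the identity matrix
```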
How can we select the independent components from whitened data?
Once the data are whitened, we must select our first axis so that the projections of all points
on this axis are as non-Gaussianly distributed as possible. Then we select a second axis so
that it is orthogonal to the first and at the same time the projections of all points on it are
as non-Gaussian as possible. The process continues until we select all axes, making sure that
each new axis we define is orthogonal to all previously defined axes. In Box 3.9 it is shown
that such axes may be defined by iteratively solving an appropriate equation.

The $MN$ $E$-tuples of values we shall define this way are the independent components of
the original $MN$ $I$-tuples we started with. They are the coefficients of the expansion of each
of the $MN$ original $I$-tuples in terms of the basis vectors defined by the axes we selected.
Indeed, the tip of each unit vector we select has a certain position vector in relation to the
original coordinate system. The components of this position vector make up the basis $I$-tuples
in terms of which all other $I$-tuples may be expressed.

So far, our discussion referred to figure 3.16a. This is useful for linear spectral unmixing,
a problem we shall discuss in Chapter 7 (see page 695). However, in most other image
processing applications, the assumed underlying random experiment is usually the one shown
in figure 3.16b.
Example B3.65
Differentiation by a vector is defined as differentiation with respect to each
of the elements of the vector. For vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{f}$ show that:

$$\frac{\partial\mathbf{f}^T\mathbf{a}}{\partial\mathbf{f}} = \mathbf{a}
\qquad\text{and}\qquad
\frac{\partial\mathbf{b}^T\mathbf{f}}{\partial\mathbf{f}} = \mathbf{b} \qquad (3.239)$$

Assume that vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{f}$ are $N \times 1$. Then we have:

$$\mathbf{f}^T\mathbf{a} = \begin{pmatrix} f_1 & f_2 & \ldots & f_N \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix}
= f_1 a_1 + f_2 a_2 + \ldots + f_N a_N \qquad (3.240)$$

Use this in:

$$\frac{\partial\mathbf{f}^T\mathbf{a}}{\partial\mathbf{f}}
\equiv \begin{pmatrix} \frac{\partial\mathbf{f}^T\mathbf{a}}{\partial f_1} \\ \frac{\partial\mathbf{f}^T\mathbf{a}}{\partial f_2} \\ \vdots \\ \frac{\partial\mathbf{f}^T\mathbf{a}}{\partial f_N} \end{pmatrix}
= \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix}
\;\Rightarrow\; \frac{\partial\mathbf{f}^T\mathbf{a}}{\partial\mathbf{f}} = \mathbf{a} \qquad (3.241)$$

Similarly:

$$\mathbf{b}^T\mathbf{f} = b_1 f_1 + b_2 f_2 + \ldots + b_N f_N \qquad (3.242)$$

Then:

$$\frac{\partial\mathbf{b}^T\mathbf{f}}{\partial\mathbf{f}}
\equiv \begin{pmatrix} \frac{\partial\mathbf{b}^T\mathbf{f}}{\partial f_1} \\ \frac{\partial\mathbf{b}^T\mathbf{f}}{\partial f_2} \\ \vdots \\ \frac{\partial\mathbf{b}^T\mathbf{f}}{\partial f_N} \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}
\;\Rightarrow\; \frac{\partial\mathbf{b}^T\mathbf{f}}{\partial\mathbf{f}} = \mathbf{b} \qquad (3.243)$$
Box 3.8. How does the method of Lagrange multipliers work?
Assume that we wish to satisfy two equations simultaneously:

$$f(x, y) = 0 \qquad\qquad g(x, y) = 0 \qquad (3.244)$$

Let us assume that in the $(x, y)$ plane the first of these equations is satisfied at point
$A$ and the second at point $B$, so that it is impossible to satisfy both equations exactly
for the same value of $(x, y)$.

[Figure 3.18: contours of constant values of $f(x,y)$ around point $A$ and contours of constant values of $g(x,y)$ around point $B$, touching at point $C$.]

Figure 3.18: Two incompatible constraints are exactly satisfied at points $A$ and $B$. The
point where we make the minimum total violation of the two constraints is the point
where two isocontours of the two functions just touch (point $C$). This is the point
identified by the method of Lagrange multipliers.

We wish to find a point $C$ on the plane where we make the least compromise in violating
these two equations. The location of this point will depend on how fast the values of
$f(x, y)$ and $g(x, y)$ change from 0, as we move away from points $A$ and $B$, respectively.
Let us consider the isocontours of $f$ and $g$ around each of the points $A$ and $B$, re-
spectively. As the contours grow away from point $A$, function $|f(x, y)|$ takes larger
and larger values, while as contours grow away from point $B$, the values function
$|g(x, y)|$ takes become larger as well. Point $C$, where the values of $|f(x, y)|$ and $|g(x, y)|$
are as small as possible (minimum violation of the constraints, which demand that
$|f(x, y)| = |g(x, y)| = 0$), must be the point where an isocontour around $A$ just touches
an isocontour around $B$, without crossing each other. When two curves just touch
each other, their normals become parallel. The vector normal to a curve along which
$f = $ constant is $\nabla f$, and the vector normal to a curve along which $g = $ constant is $\nabla g$.
The two normal vectors do not need to have the same magnitude for the minimum
violation of the constraints. It is enough for them to have the same orientation. There-
fore, we say that point $C$ is determined by the solution of equation $\nabla f = -\lambda\nabla g$, where
$\lambda$ is some constant that takes care of the (possibly) different magnitudes of the two
vectors. In other words, the solution to the problem of simultaneous satisfaction of the
two incompatible equations (3.244) is the solution of the differential set of equations

$$\nabla f + \lambda\nabla g = 0 \qquad (3.245)$$

where $\lambda$ is the Lagrange multiplier, an arbitrary constant.
Box 3.9. How can we choose a direction that maximises the negentropy?
Let us consider that the negentropy we wish to maximise is given by approximation
(3.181), on page 246, which we repeat here in a more concise way,

$$J_1(y) \simeq \left[E\{G(y)\} - E\{G(\nu)\}\right]^2 \qquad (3.246)$$

where:

$$G(y) \equiv \frac{1}{a}\ln[\cosh(ay)] \qquad (3.247)$$

First of all we observe that the second term in (3.246) is a constant (see example 3.59, on
page 248), and so the maximum of $J_1(y)$ will coincide with an extremum of its first term:
$E\{G(y)\}$. So, our problem is to select an axis $\mathbf{w}$, such that the projections $y_i$ on it of
all data vectors $\mathbf{x}_i$, ($y_i \equiv \mathbf{w}^T\mathbf{x}_i$), are distributed as non-Gaussianly as possible. As $\mathbf{w}$ is
effectively a directional vector, its magnitude should be 1. So, our problem is phrased
as follows: extremise $E\{G(\mathbf{w}^T\mathbf{x}_i)\}$ subject to the constraint $\mathbf{w}^T\mathbf{w} = 1$. According to
the method of Lagrange multipliers, the solution of such a problem is given by the
solution of the following system of equations (see Box 3.8)

$$\frac{\partial}{\partial\mathbf{w}}\left[E\{G(\mathbf{w}^T\mathbf{x}_i)\} + \lambda\mathbf{w}^T\mathbf{w}\right] = 0 \qquad (3.248)$$

where $\lambda$ is a parameter called Lagrange multiplier. Note that the expectation opera-
tor means nothing else than averaging over all $\mathbf{x}_i$ vectors in the ensemble. So, expecta-
tion and differentiation may be exchanged, as both are linear operators. By using then
the rules of differentiating with respect to a vector (see example 3.65), we equivalently
may write

$$E\left\{\frac{\partial}{\partial\mathbf{w}}G(\mathbf{w}^T\mathbf{x}_i)\right\} + \lambda\frac{\partial}{\partial\mathbf{w}}\left(\mathbf{w}^T\mathbf{w}\right) = 0$$
$$\Rightarrow\; E\left\{\frac{\partial G(\mathbf{w}^T\mathbf{x}_i)}{\partial(\mathbf{w}^T\mathbf{x}_i)}\frac{\partial(\mathbf{w}^T\mathbf{x}_i)}{\partial\mathbf{w}}\right\} + \lambda\frac{\partial}{\partial\mathbf{w}}\left(\mathbf{w}^T\mathbf{w}\right) = 0
\;\Rightarrow\; E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} + 2\lambda\mathbf{w} = 0 \qquad (3.249)$$

where $G'(y)$ is the derivative of $G(y)$ with respect to its argument. For $G(y)$ given by
(3.247), we have:

$$G'(y) \equiv \frac{dG(y)}{dy} = \frac{1}{a}\frac{1}{\cosh(ay)}\frac{d[\cosh(ay)]}{dy}
= \frac{\sinh(ay)}{\cosh(ay)} = \tanh(ay) \qquad (3.250)$$

It is convenient to call $2\lambda \equiv -\beta$, so we may say that the solution $\mathbf{w}$ we need is the
solution of equation:

$$E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} - \beta\mathbf{w} = 0 \qquad (3.251)$$

This is a system of as many nonlinear equations as components of vectors $\mathbf{w}$ and $\mathbf{x}_i$. If
we denote the left-hand side of (3.251) by $\mathbf{F}$, we may write:

$$\mathbf{F} \equiv E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} - \beta\mathbf{w} \qquad (3.252)$$

These equations represent a mapping from input vector $\mathbf{w}$ to output vector $\mathbf{F}$. The
Jacobian matrix of such a mapping is defined as the matrix of all first order partial
derivatives of $\mathbf{F}$ with respect to $\mathbf{w}$:

$$J_F(\mathbf{w}) \equiv \frac{\partial(F_1, F_2, \ldots, F_N)}{\partial(w_1, w_2, \ldots, w_N)}
\equiv \begin{pmatrix}
\frac{\partial F_1}{\partial w_1} & \frac{\partial F_1}{\partial w_2} & \cdots & \frac{\partial F_1}{\partial w_N} \\
\frac{\partial F_2}{\partial w_1} & \frac{\partial F_2}{\partial w_2} & \cdots & \frac{\partial F_2}{\partial w_N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial F_N}{\partial w_1} & \frac{\partial F_N}{\partial w_2} & \cdots & \frac{\partial F_N}{\partial w_N}
\end{pmatrix} \qquad (3.253)$$

Here $N$ is assumed to be the number of components of vectors $\mathbf{w}$ and $\mathbf{F}$. The Jacobian
matrix may be used to expand function $\mathbf{F}(\mathbf{w}^+)$ about a point $\mathbf{w}$, near $\mathbf{w}^+$, using Taylor
series, where we keep only the first order terms:

$$\mathbf{F}(\mathbf{w}^+) \simeq \mathbf{F}(\mathbf{w}) + J_F(\mathbf{w})(\mathbf{w}^+ - \mathbf{w}) \qquad (3.254)$$

Now, if point $\mathbf{w}^+$ is where the function becomes 0, ie if $\mathbf{F}(\mathbf{w}^+) = 0$, the above equation
may be used to identify point $\mathbf{w}^+$, starting from point $\mathbf{w}$:

$$\mathbf{w}^+ = \mathbf{w} - [J_F(\mathbf{w})]^{-1}\mathbf{F}(\mathbf{w}) \qquad (3.255)$$

It can be shown that the Jacobian of system (3.251) is given by

$$J_F(\mathbf{w}) = E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\mathbf{x}_i^T\right\} - \beta I \qquad (3.256)$$

where $G''(y)$ is the second derivative of $G(y)$. The first term may be approximated as

$$E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\mathbf{x}_i^T\right\}
\simeq E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\right\}E\left\{\mathbf{x}_i\mathbf{x}_i^T\right\}
= E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\right\}I \qquad (3.257)$$

The last equality follows because the data represented by vectors $\mathbf{x}_i$ have been centred
and whitened. This approximation allows one to write for the Jacobian:

$$J_F(\mathbf{w}) \simeq \left[E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\right\} - \beta\right]I
\;\Rightarrow\; [J_F(\mathbf{w})]^{-1} \simeq \frac{1}{E\{G''(\mathbf{w}^T\mathbf{x}_i)\} - \beta}\,I \qquad (3.258)$$

If we substitute from (3.258) and (3.252) into (3.255), we deduce that the solution of
system (3.251) may be approached by updating the value of an initial good guess of $\mathbf{w}$,
using:

$$\mathbf{w}^+ = \mathbf{w} - \frac{E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} - \beta\mathbf{w}}{E\{G''(\mathbf{w}^T\mathbf{x}_i)\} - \beta} \qquad (3.259)$$

This equation may be further simplified to:

$$\mathbf{w}^+ = \frac{\mathbf{w}E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\right\} - \beta\mathbf{w} - E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} + \beta\mathbf{w}}{E\{G''(\mathbf{w}^T\mathbf{x}_i)\} - \beta} \qquad (3.260)$$

The denominator is a scalar that scales all components of $\mathbf{w}^+$ equally, so it may be
omitted, as long as after every update we scale $\mathbf{w}^+$ to have unit magnitude. Thus, we
deduce the following updating formula:

$$\mathbf{w}^+ = \mathbf{w}E\left\{G''(\mathbf{w}^T\mathbf{x}_i)\right\} - E\left\{G'(\mathbf{w}^T\mathbf{x}_i)\mathbf{x}_i\right\} \qquad (3.261)$$

After every update, we must check for convergence: if vectors $\mathbf{w}$ and $\mathbf{w}^+$ are almost
identical, we stop the process. These two vectors may be deemed to be identical if the
absolute value of their dot product is more than, say, 0.999 (it would have to be 1 if we
insisted on them being exactly identical).
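The fixed-point update (3.261), together with the normalisation and the convergence test, is only a few lines of code. The sketch below assumes the data have already been centred and whitened (one column of X per sample) and uses $G'(y)=\tanh(y)$, $G''(y)=1-\tanh^2(y)$, as above; the function name and default parameters are our own choices.

```python
import numpy as np

def one_ica_direction(X, max_iter=200, tol=0.9999, seed=0):
    """Estimate one direction w that extremises E{G(w^T x)}, via update (3.261).

    X: whitened data, shape (E, n_samples), one sample per column.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ X                                    # projections w^T x_i
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y)**2
        w_new = w * g_prime.mean() - (X * g).mean(axis=1)   # update (3.261)
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > tol:                     # convergence test
            return w_new
        w = w_new
    return w
```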
Example B3.66
Our input data consist of an ensemble of four 3D vectors $\mathbf{x}_i$, where $i =
1, 2, 3, 4$. In terms of their components, these four vectors are:

$$\mathbf{x}_1^T = (x_{11}, x_{12}, x_{13}) \qquad \mathbf{x}_2^T = (x_{21}, x_{22}, x_{23})$$
$$\mathbf{x}_3^T = (x_{31}, x_{32}, x_{33}) \qquad \mathbf{x}_4^T = (x_{41}, x_{42}, x_{43}) \qquad (3.262)$$

We wish to identify a vector $\mathbf{w}^T \equiv (w_1, w_2, w_3)$ such that the projections $y_i$,
where $i = 1, 2, 3, 4$, of vectors $\mathbf{x}_i$ on this direction extremise $E\{G(y)\}$. Write
down the equations that have to be solved to identify $\mathbf{w}$.

For a start, we write down the expressions for $y_i$:

$$y_1 \equiv \mathbf{w}^T\mathbf{x}_1 = w_1 x_{11} + w_2 x_{12} + w_3 x_{13}$$
$$y_2 \equiv \mathbf{w}^T\mathbf{x}_2 = w_1 x_{21} + w_2 x_{22} + w_3 x_{23}$$
$$y_3 \equiv \mathbf{w}^T\mathbf{x}_3 = w_1 x_{31} + w_2 x_{32} + w_3 x_{33}$$
$$y_4 \equiv \mathbf{w}^T\mathbf{x}_4 = w_1 x_{41} + w_2 x_{42} + w_3 x_{43} \qquad (3.263)$$

Applying formula (3.251), we can write down the equations we have to solve:

$$\frac{1}{4}\left[G'(y_1)x_{11} + G'(y_2)x_{21} + G'(y_3)x_{31} + G'(y_4)x_{41}\right] - \beta w_1 = 0$$
$$\frac{1}{4}\left[G'(y_1)x_{12} + G'(y_2)x_{22} + G'(y_3)x_{32} + G'(y_4)x_{42}\right] - \beta w_2 = 0$$
$$\frac{1}{4}\left[G'(y_1)x_{13} + G'(y_2)x_{23} + G'(y_3)x_{33} + G'(y_4)x_{43}\right] - \beta w_3 = 0 \qquad (3.264)$$

Here $G'(y) = \tanh(ay)$. Note that the expectation operator that appears in (3.251) was
interpreted to mean the average over all vectors $\mathbf{x}_i$ and we wrote down one equation
for each component of vector $\mathbf{w}$.
Example B3.67
Work out the Jacobian matrix of system (3.264).

We start by naming the left-hand sides of the equations of the system:

$$F_1 \equiv \frac{1}{4}\left[G'(y_1)x_{11} + G'(y_2)x_{21} + G'(y_3)x_{31} + G'(y_4)x_{41}\right] - \beta w_1$$
$$F_2 \equiv \frac{1}{4}\left[G'(y_1)x_{12} + G'(y_2)x_{22} + G'(y_3)x_{32} + G'(y_4)x_{42}\right] - \beta w_2$$
$$F_3 \equiv \frac{1}{4}\left[G'(y_1)x_{13} + G'(y_2)x_{23} + G'(y_3)x_{33} + G'(y_4)x_{43}\right] - \beta w_3 \qquad (3.265)$$

The Jacobian of this system is defined as:

$$J_F(\mathbf{w}) \equiv \begin{pmatrix}
\frac{\partial F_1}{\partial w_1} & \frac{\partial F_1}{\partial w_2} & \frac{\partial F_1}{\partial w_3} \\
\frac{\partial F_2}{\partial w_1} & \frac{\partial F_2}{\partial w_2} & \frac{\partial F_2}{\partial w_3} \\
\frac{\partial F_3}{\partial w_1} & \frac{\partial F_3}{\partial w_2} & \frac{\partial F_3}{\partial w_3}
\end{pmatrix} \qquad (3.266)$$

The elements of this matrix may be computed with the help of equations (3.265) and
(3.263). We compute explicitly only the first one:

$$\frac{\partial F_1}{\partial w_1}
= \frac{1}{4}\left[\frac{\partial G'(y_1)}{\partial y_1}\frac{\partial y_1}{\partial w_1}x_{11}
+ \frac{\partial G'(y_2)}{\partial y_2}\frac{\partial y_2}{\partial w_1}x_{21}
+ \frac{\partial G'(y_3)}{\partial y_3}\frac{\partial y_3}{\partial w_1}x_{31}
+ \frac{\partial G'(y_4)}{\partial y_4}\frac{\partial y_4}{\partial w_1}x_{41}\right]
- \beta\frac{\partial w_1}{\partial w_1} \qquad (3.267)$$

We call $G''(y) \equiv dG'(y)/dy$. Then the elements of the Jacobian matrix are:

$$\frac{\partial F_1}{\partial w_1} = \frac{1}{4}\left[G''(y_1)x_{11}^2 + G''(y_2)x_{21}^2 + G''(y_3)x_{31}^2 + G''(y_4)x_{41}^2\right] - \beta$$
$$\frac{\partial F_1}{\partial w_2} = \frac{1}{4}\left[G''(y_1)x_{11}x_{12} + G''(y_2)x_{21}x_{22} + G''(y_3)x_{31}x_{32} + G''(y_4)x_{41}x_{42}\right]$$
$$\frac{\partial F_1}{\partial w_3} = \frac{1}{4}\left[G''(y_1)x_{11}x_{13} + G''(y_2)x_{21}x_{23} + G''(y_3)x_{31}x_{33} + G''(y_4)x_{41}x_{43}\right]$$
$$\frac{\partial F_2}{\partial w_1} = \frac{1}{4}\left[G''(y_1)x_{12}x_{11} + G''(y_2)x_{22}x_{21} + G''(y_3)x_{32}x_{31} + G''(y_4)x_{42}x_{41}\right]$$
$$\frac{\partial F_2}{\partial w_2} = \frac{1}{4}\left[G''(y_1)x_{12}^2 + G''(y_2)x_{22}^2 + G''(y_3)x_{32}^2 + G''(y_4)x_{42}^2\right] - \beta$$
$$\frac{\partial F_2}{\partial w_3} = \frac{1}{4}\left[G''(y_1)x_{12}x_{13} + G''(y_2)x_{22}x_{23} + G''(y_3)x_{32}x_{33} + G''(y_4)x_{42}x_{43}\right]$$
$$\frac{\partial F_3}{\partial w_1} = \frac{1}{4}\left[G''(y_1)x_{13}x_{11} + G''(y_2)x_{23}x_{21} + G''(y_3)x_{33}x_{31} + G''(y_4)x_{43}x_{41}\right]$$
$$\frac{\partial F_3}{\partial w_2} = \frac{1}{4}\left[G''(y_1)x_{13}x_{12} + G''(y_2)x_{23}x_{22} + G''(y_3)x_{33}x_{32} + G''(y_4)x_{43}x_{42}\right]$$
$$\frac{\partial F_3}{\partial w_3} = \frac{1}{4}\left[G''(y_1)x_{13}^2 + G''(y_2)x_{23}^2 + G''(y_3)x_{33}^2 + G''(y_4)x_{43}^2\right] - \beta \qquad (3.268)$$
Example B3.68
Show that the Jacobian given by equations (3.266) and (3.268) may be
written in the form (3.256).

For this case, equation (3.256) takes the form:

$$J_F(\mathbf{w}) = \frac{1}{4}\left[G''(y_1)\mathbf{x}_1\mathbf{x}_1^T + G''(y_2)\mathbf{x}_2\mathbf{x}_2^T
+ G''(y_3)\mathbf{x}_3\mathbf{x}_3^T + G''(y_4)\mathbf{x}_4\mathbf{x}_4^T\right] - \beta I \qquad (3.269)$$

We may start by computing the vector outer products that appear on the right-hand
side:

$$\mathbf{x}_1\mathbf{x}_1^T = \begin{pmatrix} x_{11} \\ x_{12} \\ x_{13} \end{pmatrix}
\begin{pmatrix} x_{11} & x_{12} & x_{13} \end{pmatrix}
= \begin{pmatrix}
x_{11}^2 & x_{11}x_{12} & x_{11}x_{13} \\
x_{12}x_{11} & x_{12}^2 & x_{12}x_{13} \\
x_{13}x_{11} & x_{13}x_{12} & x_{13}^2
\end{pmatrix}$$

$$\mathbf{x}_2\mathbf{x}_2^T = \begin{pmatrix} x_{21} \\ x_{22} \\ x_{23} \end{pmatrix}
\begin{pmatrix} x_{21} & x_{22} & x_{23} \end{pmatrix}
= \begin{pmatrix}
x_{21}^2 & x_{21}x_{22} & x_{21}x_{23} \\
x_{22}x_{21} & x_{22}^2 & x_{22}x_{23} \\
x_{23}x_{21} & x_{23}x_{22} & x_{23}^2
\end{pmatrix}$$

$$\mathbf{x}_3\mathbf{x}_3^T = \begin{pmatrix} x_{31} \\ x_{32} \\ x_{33} \end{pmatrix}
\begin{pmatrix} x_{31} & x_{32} & x_{33} \end{pmatrix}
= \begin{pmatrix}
x_{31}^2 & x_{31}x_{32} & x_{31}x_{33} \\
x_{32}x_{31} & x_{32}^2 & x_{32}x_{33} \\
x_{33}x_{31} & x_{33}x_{32} & x_{33}^2
\end{pmatrix}$$

$$\mathbf{x}_4\mathbf{x}_4^T = \begin{pmatrix} x_{41} \\ x_{42} \\ x_{43} \end{pmatrix}
\begin{pmatrix} x_{41} & x_{42} & x_{43} \end{pmatrix}
= \begin{pmatrix}
x_{41}^2 & x_{41}x_{42} & x_{41}x_{43} \\
x_{42}x_{41} & x_{42}^2 & x_{42}x_{43} \\
x_{43}x_{41} & x_{43}x_{42} & x_{43}^2
\end{pmatrix} \qquad (3.270)$$

If we substitute from (3.270) into (3.269), we shall obtain (3.266).
How do we perform ICA in image processing in practice?
The algorithm that follows is applicable to all choices of random experiment we make. The
only thing that has to change from one application to the other is the way we read the data,
ie the way we form the input vectors. In order to make the algorithm specic, we show here
how it is applied to the case shown in gure 3.16b.
Let us assume that we have I grey images of size M N.
Step 0: Remove the mean of each image. This step is not necessary, but it is advisable.
If the means are not removed, one of the independent components identied may be a at
component. Not particularly interesting, as we are really interested in identifying the modes
of image variation.
Step 1: Write the columns of image i one under the other to form an MN1 vector p
i
. You
will have I such vectors. Plotted in an MN-dimensional coordinate system they will create
a cloud of points as shown in gure 3.16b.
Independent component analysis 275
You may write these vectors next to each other to form the columns of a matrix P that will
be MN I in size.
Step 2: Compute the average of all vectors, say vector m, and remove it from each vector,
thus creating I vectors p
i
of size MN 1.
This operation moves the original coordinate system to the centre of the cloud of points-an
analogous operation to the one shown in gure 3.17a.
The new vectors form the MN I matrix
P, when written next to each other.
Step 3: Compute the autocorrelation matrix of the new vectors. Let us call p
ki
the kth
component of vector p
i
. Then the elements of the autocorrelation matrix C are:
C
kj
=
1
I
I
i=1
p
ki
p
ji
(3.271)
Matrix C is of size MN MN and it may also be computed as:
C =
1
I
P
P
T
(3.272)
Step 4: Compute the nonzero eigenvalues of C and arrange them in decreasing order. Let
us say that they are E. Let us denote by u
l
the eigenvector that corresponds to eigenvalue
l
. We may write these eigenvectors next to each other to form matrix U, of size MN E.
Step 5: Scale the eigenvectors so that the projected components of vectors p
i
will have the
same variance along all eigendirections: u
l
u
l
/
l
.
You may write the scaled eigenvectors next to each other to form matrix
U of size MN E.
Step 6: Project all vectors p
i
on the scaled eigenvectors to produce vectors q
i
, where q
i
is
an E 1 vector with components q
li
given by:
q
li
= u
T
l
p
i
(3.273)
This step achieves dimensionality reduction, as usually E < MN and at the same time
produces whitened data to work with.
This step may be performed in a concise way as
Q
U
T
P, with vectors q
i
being the columns
of matrix
Q.
Step 7: Select randomly an E × 1 vector w_1, with the values of its components drawn from a uniform distribution in the range [−1, 1]. (Any other range will do.)
Step 8: Normalise vector w_1 so that it has unit norm: if w_i1 is the ith component of vector w_1, define vector w_1 with components:
\[
w_{i1}\rightarrow\frac{w_{i1}}{\sqrt{\sum_j w_{j1}^2}} \tag{3.274}
\]
Step 9: Project all data vectors q_i on w_1, to produce the I different projection components:
\[
y_i=w_1^T q_i \tag{3.275}
\]
These components (the y_1, ..., y_4 values in example 3.66) will be stored in a 1 × I matrix/row vector, which may be produced in one go as Y ≡ w_1ᵀQ.
Step 10: Update each component of vector w_1 according to
\[
w_{k1}^{+}=w_{k1}\frac{1}{I}\sum_{i=1}^{I}G''(y_i)-\frac{1}{I}\sum_{i=1}^{I}q_{ki}\,G'(y_i) \tag{3.276}
\]
(corresponding to equation (3.261), on page 271).
Note that for G'(y) = tanh y, G''(y) ≡ dG'(y)/dy = 1 − (tanh y)².
Step 11: Normalise vector w⁺_1 by dividing each of its elements with the square root of the sum of the squares of its elements, √(Σ_j (w⁺_j1)²), so that it has unit magnitude. Call the normalised version of vector w⁺_1 vector w̃⁺_1.
Step 12: Check whether vectors w̃⁺_1 and w_1 are sufficiently close. If, say, |w̃⁺ᵀ_1 w_1| > 0.9999, the two vectors are considered identical and we may adopt the normalised vector w̃⁺_1 as the first axis of the ICA system.
If the two vectors are different, ie if the absolute value of their dot product is less than 0.9999, we set w_1 = w̃⁺_1 and go to Step 9.
After the first ICA direction has been identified, we proceed to identify the remaining directions. The steps we follow are the same as Steps 7–12, with one extra step inserted: we have to make sure that any new direction we select is orthogonal to the already selected directions. This is achieved by inserting an extra step between Steps 10 and 11, to make sure that we use only the part of vector w⁺_e (where e = 2, ..., E) which is orthogonal to all previously identified vectors w⁺_t for t = 1, ..., e − 1. This extra step is as follows.
Step 10.5: When trying to work out vector w⁺_e, create a matrix B that contains as columns all w⁺_t, t = 1, ..., e − 1, vectors worked out so far. Then, in Step 11, instead of using vector w⁺_e, use vector w⁺_e − BBᵀw⁺_e. (See example 3.69.)
To identify the coefficients of the expansion of the input images in terms of the ICA basis, the following steps have to be added to the algorithm.
Step 13: Project all vectors p̃_i on the unscaled eigenvectors to produce vectors q_i, where q_i is an E × 1 vector with components q_li given by:
\[
q_{li}=u_l^T\tilde{p}_i \tag{3.277}
\]
This step may be performed in a concise way as Q ≡ UᵀP̃, with vectors q_i being the columns of matrix Q. Matrix U has been computed in Step 4.
Step 14: Write the identified vectors w⁺_e next to each other as columns, to form matrix W. Then compute matrix Z ≡ WᵀQ. The ith column of matrix Z consists of the coefficients of the expansion of the ith pattern in terms of the identified basis.
Each of the w⁺_e vectors is of size E × 1. The components of each such vector are measured along the eigenaxes of matrix C. They may, therefore, be used to express vector w⁺_e in terms of the original coordinate system, via vectors u_l. So, if we want to view the basis images we identified, the following step may be added to the algorithm.
Step 15: We denote by v_e the position vector of the tip of vector w⁺_e in the original coordinate system:
\[
v_e=w_{1e}^{+}u_1+w_{2e}^{+}u_2+\cdots+w_{Ee}^{+}u_E+m \tag{3.278}
\]
Here m is the mean vector we removed originally from the cloud of points to move the original coordinate system to the centre of the cloud. All these vectors may be computed simultaneously as columns of matrix V, given by V = UW + M, where matrix M is made up from vector m repeated E times to form its columns.
There are E vectors v_e, and they are of size MN × 1. Each one may be wrapped round to form an M × N image, by reading its first M elements and placing them as the first column of the image, then the next M elements and placing them as the next image column, and so on. These will be the basis images we have created from the original ensemble of images, and the coefficients of the expansion of each original image in terms of them are the so called independent components of the original image.
If we wish to reconstruct an image, we must add the following steps to the algorithm.
Step 14.5: Construct vectors ṽ_e:
\[
\tilde{v}_e=w_{1e}^{+}u_1+w_{2e}^{+}u_2+\cdots+w_{Ee}^{+}u_E \tag{3.279}
\]
All these vectors may be computed simultaneously as columns of matrix Ṽ, given by Ṽ = UW.
Step 14.6: To reconstruct the ith pattern, we consider the ith column of matrix Z. The elements of this column are the coefficients with which the columns of Ṽ have to be multiplied and added to form the original pattern i. We must remember to add vector m, ie the mean pattern, in order to have full reconstruction. To visualise the reconstructed pattern we shall have to wrap it into an image.
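The whole procedure, Steps 1 to 14, condenses into a few lines of matrix operations. The sketch below is in Python with NumPy (the book prescribes no particular language, so this choice, the function name ica_basis and the eigenvalue threshold of 1e-4 are assumptions made here for illustration); it follows the steps above, using G'(y) = tanh y and the 0.9999 convergence test of Step 12.

```python
import numpy as np

def ica_basis(P, E=None, tol=0.9999, max_iter=1000, rng=np.random.default_rng(0)):
    """Steps 1-14 of the ICA algorithm. P is MN x I; its columns are the input vectors."""
    MN, I = P.shape
    m = P.mean(axis=1, keepdims=True)            # Step 2: mean vector
    Pt = P - m                                    # centred data (P tilde)
    C = (Pt @ Pt.T) / I                           # Step 3: autocorrelation matrix
    lam, U = np.linalg.eigh(C)                    # Step 4: eigen-analysis
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    keep = lam > 1e-4 if E is None else np.arange(len(lam)) < E   # drop tiny eigenvalues (assumed threshold)
    lam, U = lam[keep], U[:, keep]
    Ut = U / np.sqrt(lam)                         # Step 5: scaled eigenvectors (U tilde)
    Q = Ut.T @ Pt                                 # Step 6: whitened data
    E = Q.shape[0]
    W = np.zeros((E, E))
    for e in range(E):                            # Steps 7-12 for each direction
        w = rng.uniform(-1, 1, E)                 # Step 7
        w /= np.linalg.norm(w)                    # Step 8
        for _ in range(max_iter):
            y = w @ Q                             # Step 9: projections
            gy = np.tanh(y)                       # G'(y)
            g2y = 1.0 - gy**2                     # G''(y)
            w_new = w * g2y.mean() - (Q * gy).mean(axis=1)   # Step 10, equation (3.276)
            if e > 0:                             # Step 10.5: deflation against earlier directions
                B = W[:, :e]
                w_new -= B @ (B.T @ w_new)
            w_new /= np.linalg.norm(w_new)        # Step 11
            converged = abs(w_new @ w) > tol      # Step 12
            w = w_new
            if converged:
                break
        W[:, e] = w
    Qu = U.T @ Pt                                 # Step 13: projections on the unscaled eigenvectors
    Z = W.T @ Qu                                  # Step 14: expansion coefficients
    return U, W, Z, m
```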
Example B3.69
Consider two 3 × 1 vectors w_1 and w_2 of unit length and orthogonal to each other. Write them one next to the other to form a 3 × 2 matrix B. Consider also a 3 × 1 vector w_3. Show that vector w_3 − BBᵀw_3 is the component of w_3 that is orthogonal to both vectors w_1 and w_2.
Matrix B is [w_1, w_2]. Matrix Bᵀ is:
\[
B^T=\begin{pmatrix}w_1^T\\w_2^T\end{pmatrix}=\begin{pmatrix}w_{11}&w_{21}&w_{31}\\w_{12}&w_{22}&w_{32}\end{pmatrix} \tag{3.280}
\]
If we multiply Bᵀ with w_3, we obtain a vector with its first element the dot product of w_3 with vector w_1 and its second element the dot product of w_3 with vector w_2:
\[
B^T w_3=\begin{pmatrix}w_{11}&w_{21}&w_{31}\\w_{12}&w_{22}&w_{32}\end{pmatrix}\begin{pmatrix}w_{13}\\w_{23}\\w_{33}\end{pmatrix}=\begin{pmatrix}w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33}\\w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33}\end{pmatrix} \tag{3.281}
\]
As vectors w_1 and w_2 are of unit length, the values of these two dot products are the projections of w_3 on w_1 and w_2, respectively. When multiplied with the corresponding unit vector (w_1 or w_2), they become the components of vector w_3 along the directions of vectors w_1 and w_2, respectively. By subtracting them from vector w_3, we are left with the component of w_3 orthogonal to both directions:
\[
\begin{pmatrix}w_{13}\\w_{23}\\w_{33}\end{pmatrix}-(w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33})\begin{pmatrix}w_{11}\\w_{21}\\w_{31}\end{pmatrix}-(w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33})\begin{pmatrix}w_{12}\\w_{22}\\w_{32}\end{pmatrix} \tag{3.282}
\]
This is the same as w_3 − BBᵀw_3 = w_3 − B(Bᵀw_3):
\[
\begin{pmatrix}w_{13}\\w_{23}\\w_{33}\end{pmatrix}-\begin{pmatrix}w_{11}&w_{12}\\w_{21}&w_{22}\\w_{31}&w_{32}\end{pmatrix}\begin{pmatrix}w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33}\\w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33}\end{pmatrix} \tag{3.283}
\]
\[
=\begin{pmatrix}w_{13}-w_{11}(w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33})-w_{12}(w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33})\\
w_{23}-w_{21}(w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33})-w_{22}(w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33})\\
w_{33}-w_{31}(w_{11}w_{13}+w_{21}w_{23}+w_{31}w_{33})-w_{32}(w_{12}w_{13}+w_{22}w_{23}+w_{32}w_{33})\end{pmatrix}
\]
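The algebra of this example is easy to confirm numerically. The short sketch below (Python with NumPy, an assumption made here for illustration) builds two orthonormal vectors, forms B and checks that the residual w_3 − BBᵀw_3 is orthogonal to both of them.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two orthonormal 3x1 vectors w1, w2 (columns of B, obtained via QR) and an arbitrary w3.
B, _ = np.linalg.qr(rng.normal(size=(3, 2)))
w3 = rng.normal(size=3)
w3_perp = w3 - B @ (B.T @ w3)       # component of w3 orthogonal to w1 and w2
print(B.T @ w3_perp)                # both dot products are ~0, as (3.283) predicts
```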
Example B3.70
For three 2 × 1 vectors a_1 = (a_11, a_21)ᵀ, a_2 = (a_12, a_22)ᵀ and a_3 = (a_13, a_23)ᵀ, show that formulae (3.271) and (3.272) give the same answer.
Applying formula (3.271) for I = 3, we obtain the four elements of the 2 × 2 matrix C as follows:
\[
C_{11}=\frac{1}{3}\sum_{k=1}^{3}a_{1k}^2=\frac{1}{3}\left(a_{11}^2+a_{12}^2+a_{13}^2\right)
\]
\[
C_{12}=\frac{1}{3}\sum_{k=1}^{3}a_{1k}a_{2k}=\frac{1}{3}\left(a_{11}a_{21}+a_{12}a_{22}+a_{13}a_{23}\right)
\]
\[
C_{21}=\frac{1}{3}\sum_{k=1}^{3}a_{2k}a_{1k}=\frac{1}{3}\left(a_{21}a_{11}+a_{22}a_{12}+a_{23}a_{13}\right)
\]
\[
C_{22}=\frac{1}{3}\sum_{k=1}^{3}a_{2k}^2=\frac{1}{3}\left(a_{21}^2+a_{22}^2+a_{23}^2\right) \tag{3.284}
\]
To apply formula (3.272), we must first write the three vectors as the columns of a matrix and then multiply it with its transpose. We obtain:
\[
C=\frac{1}{3}\begin{pmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{pmatrix}\begin{pmatrix}a_{11}&a_{21}\\a_{12}&a_{22}\\a_{13}&a_{23}\end{pmatrix}=\frac{1}{3}\begin{pmatrix}a_{11}^2+a_{12}^2+a_{13}^2&a_{11}a_{21}+a_{12}a_{22}+a_{13}a_{23}\\a_{21}a_{11}+a_{22}a_{12}+a_{23}a_{13}&a_{21}^2+a_{22}^2+a_{23}^2\end{pmatrix} \tag{3.285}
\]
The two results are the same.
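A quick numerical check of this equivalence, for arbitrary numbers, might look as follows (Python with NumPy, an assumption made here for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],      # columns are a1, a2, a3
              [0.5, -1.0, 3.0]])
I = A.shape[1]
C_sum = sum(np.outer(A[:, i], A[:, i]) for i in range(I)) / I   # formula (3.271)
C_mat = (A @ A.T) / I                                           # formula (3.272)
print(np.allclose(C_sum, C_mat))     # True: the two formulae agree
```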
Example 3.71
Consider the image of figure 3.19. Divide it into blocks of 8 × 8 and treat each block as an image of a set. Work out the basis images that will allow the identification of the independent components for each image in the set.
Figure 3.19: An original image from which 8 × 8 tiles are extracted.
From image 3.19, 1000 patches of size 8 × 8 were extracted at random, allowing overlap. Each patch had its mean removed. Then they were all written as columns of size 64 × 1. They were written next to each other to form matrix P of size 64 × 1000. The average of all columns was computed, as vector m. This was removed from all columns of matrix P, to form matrix P̃. From this, matrix C of size 64 × 64 was created as C = P̃P̃ᵀ/1000. The eigenvalues of this matrix were computed and all those smaller than 0.0002 were set to 0. It is important to use a threshold for neglecting small eigenvalues, because eigenvalues may become arbitrarily small, and sometimes negative, due to numerical errors. Such erroneous negative values may cause problems when the square root of the eigenvalue is used in the whitening process. The process of eigenvalue thresholding left us with E = 27 eigenvalues. The 27 basis images identified by Step 15 of the algorithm are shown in figure 3.20.
Figure 3.20: The 27 basis images identified by the ICA algorithm for the 1000 patches extracted from image 3.19.
Example 3.72
Consider one of the 8 × 8 tiles you identified in example 3.71 and reconstruct it using one, two, ..., twenty-seven basis images identified by the ICA algorithm.
In order to identify the basis images by the ICA algorithm, the data had to be whitened. Whitening is only necessary for the process of identifying the basis images, and it should be bypassed when we are interested in working out the coefficients of the expansion of a specific image (8 × 8 tile) in terms of the identified basis. So, to work out the coefficients of the expansion of any patch in terms of the basis images, we first have to know the coefficients of its expansion in terms of the unscaled eigenvectors. This way we create matrix Q of size 27 × 1000, the columns of which are vectors q_i. The elements of vector q_i are given by q_li = u_lᵀp̃_i, where vector u_l is the eigenvector of unit length (Step 13 of the algorithm). We may then form matrix Z ≡ WᵀQ (Step 14 of the algorithm). The ith column of matrix Z consists of the coefficients of the expansion of the ith input pattern in terms of the basis images. To form the various approximations of a particular original subimage (tile), we work as follows:
1st order approximation: Use the first element of the ith column of matrix Z to multiply the first column of matrix Ṽ constructed by (3.279) in Step 14.5 of the algorithm. Add the mean vector m, and wrap up the result into an 8 × 8 image.
2nd order approximation: Multiply the first element of the ith column of matrix Z with the first column of matrix Ṽ, add the product of the second element of the ith column of matrix Z with the second column of matrix Ṽ, add the mean vector m, and wrap up the result into an 8 × 8 image.
3rd order approximation: Multiply the first element of the ith column of matrix Z with the first column of matrix Ṽ, add the product of the second element of the ith column of matrix Z with the second column of matrix Ṽ, add the product of the third element of the ith column of matrix Z with the third column of matrix Ṽ, add the mean vector m, and wrap up the result into an 8 × 8 image.
Continue the process until you have incorporated all components in the reconstruction. Figure 3.21 shows the 27 successive reconstructions of the 20th input tile, as well as the original input image.
Figure 3.21: The 27 reconstructions of one of the original images, by incorporating the basis images one at a time. The bottom right panel shows the original image. The final reconstruction is almost perfect. The minor differences between the original image and the full reconstruction are probably due to the omission of some (weak) eigenvalues of matrix C.
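The successive approximations described above amount to accumulating one term of the expansion at a time. A possible sketch of this loop, assuming the matrices Z, Ṽ and the vector m produced by Steps 14, 14.5 and 2 (Python with NumPy; the function name and arguments are illustrative assumptions), is:

```python
import numpy as np

def reconstructions(Z, V_tilde, m, i, M=8, N=8):
    """Successive approximations of the ith pattern: add one basis image at a time.
    Z: E x I coefficients (Step 14); V_tilde = U @ W (Step 14.5); m: mean vector (Step 2)."""
    E = Z.shape[0]
    approx = []
    for e in range(1, E + 1):
        v = V_tilde[:, :e] @ Z[:e, i] + m.ravel()      # e-th order approximation plus the mean
        approx.append(v.reshape(M, N, order="F"))      # wrap column-wise into an M x N tile
    return approx
```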
Example 3.73
Analyse the flower image in terms of the basis constructed in example 3.71 and reconstruct it using one, two, ..., twenty-seven basis images.

The flower image was not one of the input patches used to construct the ICA basis we shall use. To analyse it according to this basis, we first remove from it the mean vector m and then we project it onto the unnormalised eigenvectors stored in matrix U, to work out its vector q_flower. This vector is then projected onto the ICA vectors in order to construct its coefficients of expansion, stored in vector z_flower ≡ Wᵀq_flower. These coefficients are then used to multiply the corresponding basis vectors stored in matrix Ṽ. Figure 3.22 shows the 27 successive reconstructions of the flower image. The reconstruction is not very good, because this image is totally different from the set of images used to construct the basis.
Figure 3.22: The 27 reconstructions of the flower image, by incorporating the basis images one at a time. The original image is shown at the bottom right.
How do we apply ICA to signal processing?
Let us revisit the cocktail party problem. Let us consider that we have S microphones and each signal we record consists of T samples. There are again two ways to view the problem from the statistical point of view.
(i) We may assume that the underlying random experiment produces combinations of values that are recorded at one time instant by the S microphones. Every such combination produces a point in a coordinate system with as many axes as we have microphones. We shall have T such points, as many as the time instances at which we sampled the recorded signals. This case is depicted in figure 3.23a. First we must remove the mean recording of each microphone over time, from each component of the S-tuple, ie we must centre the data. Let us say that, after centring the data, x_ik represents what the ith microphone recorded at time k. The elements of the correlation matrix we shall have to compute in this case, in order to whiten the data, will be given by:
\[
C_{ij}\equiv\frac{1}{T}\sum_{k=1}^{T}x_{ik}x_{jk} \tag{3.286}
\]
The tips of the unit vectors of the axes we shall define by performing ICA will be the basis S-tuple signals, ie combinations of recorded values by the microphones in terms of which all observed combinations of values might be expressed. These signals are of no particular interest. What we are interested in, in this case, is to identify a set of virtual microphones, which, if they were used for recording, would have recorded the independent components (speeches) that make up the mixtures. These virtual microphones are represented by the axes we identify by the ICA algorithm, and therefore the components of the signals, when expressed in terms of these axes, are what we are looking for. These are the rows of matrix Z, and so the independent components of the mixtures are the signals represented by the rows of matrix Z computed at Step 14.
The method of performing ICA is the same as the algorithm given on page 274, with the only difference that our input vectors p_i now are T vectors of size S × 1, ie matrix P is of size S × T.
(ii) We may assume that the underlying random experiment produces combinations of values that are recorded by a single microphone over T time instances. Every such combination produces a point in a coordinate system with as many axes as we have time instances. We shall have S such points, as many as microphones. This case is depicted in figure 3.23b. Again we have to centre the input vectors by removing the mean over all microphones from each component of the T-tuple. The elements of the correlation matrix we shall have to compute in this case, in order to whiten the data, will be given by:
\[
C_{ij}\equiv\frac{1}{S}\sum_{k=1}^{S}x_{ik}x_{jk} \tag{3.287}
\]
Note that the x_ik that appears in (3.286) and the x_ik that appears in (3.287) are not the same. First of all, vector x_k in (3.286) represents what combination of values was recorded by the various microphones at instant k, while here vector x_k represents what combination of values microphone k recorded over time. Further, different mean values were removed from the original vectors in order to produce these centred versions of them.
The tips of the unit vectors of the axes we shall define by performing ICA in this case will be the basis T-sample long signals. However, the problem now is that we usually do not have as many microphones as we might wish, ie S is not large enough to allow the calculation of reliable statistics, and so this approach is not usually adopted.
Figure 3.23: (a) If we assume that a random experiment decides which combination of sample values is recorded at each instant by the S microphones, then each such combination is a point in a coordinate system with as many axes as microphones we have. Assuming that each microphone over time recorded T samples, the cloud of points created this way represents T outcomes of the random experiment. (b) If we assume that a random experiment decides which combination of sample values makes up a recorded signal, then each recorded signal is a point in a coordinate system with as many axes as samples in the signal. Assuming that we have S recorded signals, the cloud of points created this way represents S outcomes of the random experiment.
Example 3.74
Figure 3.24 shows three signals which were created using formulae
\[
f(i)=\sin\frac{i}{19}\qquad g(i)=\sin\frac{i}{5}+\sin\frac{i}{31}\qquad h(i)=0.3\,(i\ \mathrm{modulo}\ 5) \tag{3.288}
\]
where i takes integer values from 1 to 1000. Let us consider the three mixtures of them shown in figure 3.25, which were created using:
\[
m_1(i)=0.3f(i)+0.4g(i)+0.3h(i)
\]
\[
m_2(i)=0.5f(i)+0.2g(i)+0.3h(i)
\]
\[
m_3(i)=0.1f(i)+0.1g(i)+0.8h(i) \tag{3.289}
\]
Assume that you are only given the mixed signals and, of course, that you do not know the mixing proportions that appear in (3.289). Use the ICA algorithm to recover the original signals.
Figure 3.24: The first 100 samples of the three original signals.
Figure 3.25: The first 100 samples of the three mixed signals.
We are going to run the ICA algorithm, assuming that the underlying random experiment produces triplets of numbers recorded by the three sensors over a period of 1000 time instances. This is the case depicted in figure 3.23a. We shall not use Step 0 of the algorithm, as there is no point here. We shall apply, however, Steps 1–14. Note that matrix P now is 3 × 1000, and matrix C is 3 × 3:
\[
C=\begin{pmatrix}0.2219&0.1725&0.0985\\0.1725&0.1825&0.0886\\0.0985&0.0886&0.1303\end{pmatrix} \tag{3.290}
\]
The eigenvalues of this matrix are: λ_1 = 0.4336, λ_2 = 0.0725 and λ_3 = 0.0286. The corresponding eigenvectors, written as columns of matrix U, are:
\[
U=\begin{pmatrix}0.6835&0.3063&0.6626\\0.6105&0.2577&-0.7489\\0.4002&-0.9164&0.0109\end{pmatrix} \tag{3.291}
\]
After we divide each vector with the square root of the corresponding eigenvalue, we obtain matrix Ũ:
\[
\tilde{U}=\begin{pmatrix}1.0379&1.1379&3.9184\\0.9271&0.9574&-4.4287\\0.6077&-3.4042&0.0642\end{pmatrix} \tag{3.292}
\]
The initial guess for vector w_1 is (0.9003, 0.5377, 0.2137)ᵀ. After the algorithm runs, it produces the following three unit vectors along the directions that will allow the unmixing of the signals, written as columns of matrix W:
\[
W=\begin{pmatrix}0.4585&0.6670&0.5872\\0.8877&0.3125&0.3382\\0.0421&0.6764&0.7354\end{pmatrix} \tag{3.293}
\]
Steps 13 and 14 of the algorithm allow us to recover the original signals, as shown in figure 3.26. Note that these signals are the rows of matrix Z, computed in Step 14, as these are the components that would have been recorded by the fictitious microphones represented by the three recovered axes of matrix W.
Figure 3.26: The first 100 samples of the three signals recovered by the ICA algorithm.
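A minimal sketch of this experiment, reusing the ica_basis function sketched earlier (Python with NumPy; the exact sinusoid arguments of (3.288) are taken as read here), might be:

```python
import numpy as np

i = np.arange(1, 1001)
f = np.sin(i / 19)                       # the three sources of (3.288), as reconstructed above
g = np.sin(i / 5) + np.sin(i / 31)
h = 0.3 * (i % 5)
S = np.vstack([f, g, h])
A = np.array([[0.3, 0.4, 0.3],           # mixing proportions of (3.289)
              [0.5, 0.2, 0.3],
              [0.1, 0.1, 0.8]])
P = A @ S                                # the three mixtures: matrix P is 3 x 1000
U, W, Z, m = ica_basis(P)                # Steps 1-14 (Step 0 is not used here)
# The rows of Z are the recovered sources, up to ordering and scaling.
```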
Example 3.75
Figure 3.27 shows the three mixed signals of example 3.74 with some Gaussian noise added to each one. The added noise values were drawn from a Gaussian probability density function with mean 0 and standard deviation 0.5. Perform again ICA to recover the original signals.

This time matrix C is given by:
\[
C=\begin{pmatrix}0.2326&0.1729&0.1005\\0.1729&0.1905&0.0892\\0.1005&0.0892&0.1392\end{pmatrix} \tag{3.294}
\]
The eigenvalues of this matrix are: λ_1 = 0.4449, λ_2 = 0.0801 and λ_3 = 0.0374. The corresponding eigenvectors, written as columns of matrix U, are:
\[
U=\begin{pmatrix}0.6852&0.3039&0.6619\\0.6070&0.2639&-0.7496\\0.4025&-0.9154&0.0037\end{pmatrix} \tag{3.295}
\]
Figure 3.27: The first 100 samples of the three mixed signals with noise.
After we divide each vector with the square root of the corresponding eigenvalue, we obtain matrix Ũ:
\[
\tilde{U}=\begin{pmatrix}1.0273&1.0736&3.4246\\0.9101&0.9322&-3.8781\\0.6034&-3.2338&0.0190\end{pmatrix} \tag{3.296}
\]
The initial guess for vector w_1 is again (0.9003, 0.5377, 0.2137)ᵀ. After the algorithm runs, it produces the following three unit vectors along the directions that will allow the unmixing of the signals, written as columns of matrix W:
\[
W=\begin{pmatrix}0.4888&0.6579&0.5729\\0.8723&0.3775&0.3107\\0.0118&0.6516&0.7584\end{pmatrix} \tag{3.297}
\]
The recovered signals are shown in figure 3.28. We note that even a moderate amount of noise deteriorates significantly the quality of the recovered signals. There are two reasons for that: first, each mixed signal has its own noise component, which means that it is as if we had 6 original signals mixed; second, ICA is designed to recover non-Gaussian signals, and the noise added is Gaussian.
Figure 3.28: The first 100 samples of the three signals recovered by the ICA algorithm, when a Gaussian noise component was present in each mixed signal.
Example 3.76
In a magnetoencephalographic experiment, somebody used 250 channels to record the magnetic field outside a human head, every one millisecond, producing 1000 samples per channel. There are various processes that take place in the human brain, some of which control periodic functions of the human body, like breathing, heart beating, etc. Let us assume that the true signals that are produced by the brain are those shown in figure 3.24. We created 250 mixtures of them by choosing triplets of mixing components at random, the only constraint being that the numbers were in the range [0, 1]. They were not normalised to sum to 1, as such normalisation is not realistic in a real situation. Identify the original source signals hidden in the mixtures.

Let us try to solve the problem according to the interpretation of figure 3.23b. This means that our P matrix is T × S in size, ie its dimensions are 1000 × 250, and the sources are given by the columns of matrix V. Figure 3.29 shows the first 100 points of the recovered original signals, identified as the columns of matrix V. We note that the recovery is not so good. The signals are pretty noisy. This is because 250 samples are not really enough to perform the statistics.
Next, let us try to solve the problem according to the interpretation of figure 3.23a. This means that our P matrix is the transpose of the previous one, ie it is S × T, ie its dimensions are 250 × 1000. In this case the sources are given by the rows of matrix Z. Figure 3.30 shows the first 100 points of the recovered original signals. The recovery is almost perfect. Note, of course, that in a real experiment the recovery is never perfect. If nothing else, each recording has its own noisy signal superimposed, and that leads to a recovery of the form shown in figure 3.28.
Figure 3.29: The three recovered signals by the ICA algorithm when the P matrix is arranged to be 1000 × 250 in size, ie T × S.
Figure 3.30: The three recovered signals by the ICA algorithm when the P matrix is arranged to be 250 × 1000 in size, ie S × T.
What are the major characteristics of independent component analysis?
Independent component analysis extracts the components of a set of blended recordings in an unpredictable order.
The identified independent components are identified up to a scaling factor.
Independent component analysis does not produce a basis of elementary signals in terms of which we may express any other signal that fulfils certain constraints, like the Karhunen-Loeve transform does. Instead, independent component analysis identifies the possible independent sources that are hidden in mixed recordings. That is why it is part of the family of methods known as blind source separation.
Due to the above characteristic, independent component analysis is always performed over a set of data. It is not meaningful here to talk about a single representative signal of a set of signals.
What is the difference between ICA as applied in image and in signal processing?
There is a confusion in the literature in the terminology used by the different communities of users. Cognitive vision scientists are interested in the actual axes defined by ICA to describe the cloud of points, and in particular in the coordinates of the tips of their unit vectors in terms of the original coordinate system, because these are actually the basis images, linear mixtures of which form all real images we observe. They refer to them as independent components. Image processing people, interested in sparse representations of images, refer to the coefficients of the expansion of the images in terms of these basis images as the independent components. Finally, signal processing people view the axes defined by ICA as virtual microphones and for them the independent components are the projections of their original samples along each one of these axes. This point is schematically shown in figure 3.31.
Figure 3.31: The origin of the original coordinate system, in which we are given the data, lies outside the cloud of points. O is the origin of the same system translated to the centre of the cloud of points, when we remove the mean signal from each signal. The thick arrows are the unit vectors of the coordinate system we create by performing ICA. Note that, in general, this coordinate system has fewer axes than the original one. A is the tip of one of these unit vectors. The position vector of this point, with respect to the original coordinate system, corresponds to one column of matrix V constructed at Step 15 of the algorithm. People interested in cognitive vision are interested in the components of this vector. Each such vector defines an elementary image. Such elementary images of size 8 × 8 or 16 × 16 correspond well to the receptive fields of some cells in the human visual cortex. Examples are those shown in figure 3.20. B is one of the original signals. If we project it onto the new axes, we shall get its coordinates with respect to the new axes. The signed lengths OC, OD and OE will constitute the ICA components of this point. Image processing people, interested in image coding and other similar applications, refer to this set of values as the source signal. These values correspond to the columns of matrix Z constructed at Step 14 of the algorithm. Signal processing people consider the projections of all points B along one of the ICA axes and treat them as one of the source signals they are looking for. These sources correspond to the rows of matrix Z.
Figure 3.32: Embroidering and Rice field in Chengdu. From each image 5000 patches of size 16 × 16 were selected at random. The ICA algorithm identified 172 independent components from the first image and 130 from the second image, shown at the bottom.
What is the take home message of this chapter?
If we view an image as an instantiation of a whole lot of images, which are the result of a random process, then we can try to represent it as the linear superposition of some eigenimages which are appropriate for representing the whole ensemble of images. For an N × N image, there may be as many as N² such eigenimages, while in the SVD approach there were only N. The difference is that with these N² eigenimages we can represent the whole ensemble of images, while in the case of SVD the N eigenimages were appropriate only for representing the one image. If the set of eigenimages is arranged in decreasing order of the corresponding eigenvalues, truncating the expansion of the image in terms of them approximates any image in the set with the minimum mean square error, over the whole ensemble of images. In the SVD case similar truncation led to the minimum square error approximation.
The crux of the K-L expansion is the assumption of ergodicity. This assumption states that the spatial statistics of a single image are the same as the ensemble statistics over the whole set of images. If a restricted type of image is considered, this assumption is clearly unrealistic: images are not simply the outcomes of a random process; there is always a deterministic underlying component which makes the assumption invalid. So, in such a case, the K-L transform effectively puts more emphasis on the random component of the image, ie the noise, rather than the component of interest.
However, if many different images are considered, the average grey value over the ensemble, even of the deterministic component, may be the same from pixel to pixel, and the assumption of ergodicity may be nearly valid. Further, if one has available a collection of images representative of the type of image of interest, the assumption of ergodicity is not needed: the K-L transform may be calculated using ensemble statistics and used to define a basis tailor-made for the particular type of image. Such is the case of applications dealing with large databases. The so called eigenface method of person identification is nothing more than the use of the K-L transform using ensemble statistics, as opposed to invoking the ergodicity assumption.
The K-L transform leads to an orthogonal basis of uncorrelated elementary images, in terms of which we may express any image that shares the same statistical properties as the image (or images) used to construct the transformation matrix. We may further seek the identification of a basis of independent elementary images. Such a basis, however, does not have the same meaning as a K-L basis. It rather serves the purpose of identifying components in the images that act as building blocks of the set considered. They are extracted usually in the hope that they may correspond to semantic components. Indeed, they tend to be elementary image structures like bands and edges in various orientations. Some researchers have identified them with the elementary structures the human vision system has been known to detect with the various processing cells it relies on, known as ganglion cells. Figure 3.32 shows the independent components identified from 16 × 16 patches of two very different images. The two sets of the extracted independent components, however, appear to be very similar, indicating that the building blocks of all images are fundamentally the same.
Chapter 4
Image Enhancement
What is image enhancement?
Image enhancement is the process by which we improve an image so that it looks subjectively better. We do not really know what the image should look like, but we can tell whether it has been improved or not, by considering, for example, whether more detail can be seen, or whether unwanted flickering has been removed, or the contrast is better.
How can we enhance an image?
An image is enhanced when we
remove additive noise and interference;
remove multiplicative interference;
increase its contrast;
decrease its blurring.
Some of the methods we use to achieve the above are
smoothing and low pass filtering;
sharpening or high pass filtering;
histogram manipulation; and
generic deblurring algorithms, or algorithms that remove noise while avoiding blurring the image.
Some of the methods in the first two categories are versions of linear filtering.
What is linear filtering?
Manipulation of images often entails omitting or enhancing details of certain spatial frequencies. This can be done by multiplying the Fourier transform of the image with a certain function that kills or modifies certain frequency components and then taking the inverse Fourier transform. When we do that, we say that we filter the image, and the function we use is said to be a linear filter.
4.1 Elements of linear filter theory
How do we define a 2D filter?
A 2D filter may be defined in terms of its Fourier transform ĥ(μ, ν), called the frequency response function¹. By taking the inverse Fourier transform of ĥ(μ, ν) we may calculate the filter in the real domain. This is called the unit sample (or impulse) response of the filter and is denoted by h(k, l). Filters may be defined in the frequency domain, so that they have exactly the desirable effect on the signal, or they may be defined in the real domain, so that they are easy to implement.
How are the frequency response function and the unit sample response of the filter related?
The frequency response function ĥ(μ, ν) is defined as a function of continuous frequencies (μ, ν). The unit sample response h(k, l) is defined as the inverse Fourier transform of ĥ(μ, ν). However, since it has to be used for the manipulation of a digital image, h(k, l) is defined at discrete points only. Then the equations relating these two functions are:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int\!\!\int\hat{h}(\mu,\nu)e^{j(\mu k+\nu l)}\,d\mu\,d\nu \tag{4.1}
\]
\[
\hat{h}(\mu,\nu)=\sum_{n=-\infty}^{+\infty}\sum_{m=-\infty}^{+\infty}h(n,m)e^{-j(\mu n+\nu m)} \tag{4.2}
\]
If we are interested in real filters only, these equations may be modified as follows:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int\!\!\int\hat{h}(\mu,\nu)\cos(\mu k+\nu l)\,d\mu\,d\nu \tag{4.3}
\]
\[
\hat{h}(\mu,\nu)=\sum_{n=-\infty}^{+\infty}\sum_{m=-\infty}^{+\infty}h(n,m)\cos(\mu n+\nu m) \tag{4.4}
\]
Why are we interested in the filter function in the real domain?
Because we may achieve the enhancement of the image as desired by simply convolving it with h(k, l), instead of multiplying its Fourier transform with ĥ(μ, ν). Figure 4.1 shows this schematically for the 1D case. The 2D case is totally analogous.
Are there any conditions which h(k, l) must fulfil so that it can be used as a convolution filter?
Yes, h(k, l) must be zero for |k| > K and |l| > L, for some finite values K and L; ie the filter with which we want to convolve the image must be a finite array of numbers. The ideal low pass, band pass and high pass filters do not fulfil this condition. That is why they are called infinite impulse response (IIR) filters.
¹Strictly speaking, a filter is defined in terms of its system transfer function, which is the Laplace or z-transform of the sample response of the filter. The frequency response function is a special case of the transfer function. Its limitation is that it does not allow one to assess the stability of the filter. However, in image processing applications we rarely deal with unstable filters, so the issue does not arise.
Figure 4.1: The case of a filter defined in the frequency domain for maximum performance. The processing route followed in this case is: image → FT of the image → multiplication with the FT of the filter → inverse FT of the product → filtered image. Top row: a signal and its Fourier transform. Middle row: on the right the ideal filter we define in the frequency domain and on the left its unit sample response in the real domain, which is not finite and thus has a rather inconvenient form. Bottom row: on the right the Fourier transform of the filtered signal obtained by multiplying the Fourier transform of the signal at the top with the Fourier transform of the filter in the middle; on the left the filtered signal that could have been obtained by convolving the signal at the top with the filter in the middle, if filter h(x) were finite.
Example 4.1
Calculate the impulse response of the ideal 1D low pass filter.

The ideal low pass filter in 1D is defined as
\[
\hat{h}(\omega)=\begin{cases}1&\text{if }-\omega_0<\omega<\omega_0\\0&\text{otherwise}\end{cases} \tag{4.5}
\]
where ω is the frequency variable and 0 < ω_0 < π is the cutoff frequency parameter. The inverse Fourier transform of this function is:
\[
h(k)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\hat{h}(\omega)e^{j\omega k}\,d\omega
=\frac{1}{2\pi}\int_{-\omega_0}^{\omega_0}e^{j\omega k}\,d\omega
=\frac{1}{2\pi}\int_{-\omega_0}^{\omega_0}\cos(\omega k)\,d\omega+j\underbrace{\frac{1}{2\pi}\int_{-\omega_0}^{\omega_0}\sin(\omega k)\,d\omega}_{=0}
=\frac{1}{2\pi}\left.\frac{\sin(\omega k)}{k}\right|_{-\omega_0}^{\omega_0}
=\frac{1}{2\pi k}2\sin(\omega_0 k)=\frac{\sin(\omega_0 k)}{\pi k} \tag{4.6}
\]
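For reference, the unit sample response (4.6) is trivial to tabulate; a small sketch (Python with NumPy, an assumption made here) that also handles the k = 0 limit, where sin(ω₀k)/(πk) → ω₀/π, is:

```python
import numpy as np

def ideal_lowpass_1d(k, omega0):
    """Unit sample response of the ideal 1D lowpass filter, equation (4.6)."""
    k = np.asarray(k, dtype=float)
    safe = np.where(k == 0, 1.0, k)               # avoid division by zero at k = 0
    h = np.sin(omega0 * k) / (np.pi * safe)
    return np.where(k == 0, omega0 / np.pi, h)    # limit value at k = 0

print(ideal_lowpass_1d(np.arange(-5, 6), omega0=1.0))
```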
Box 4.1. What is the unit sample response of the 2D ideal low pass filter?
The 2D ideal low pass filter (see figure 4.2), which sets to zero all frequencies above a certain radial frequency R, is defined as:
\[
\hat{h}(\mu,\nu)=\begin{cases}1&\text{for }\sqrt{\mu^2+\nu^2}\le R\\0&\text{otherwise}\end{cases} \tag{4.7}
\]
Figure 4.2: The ideal lowpass filter in 2D in the frequency domain. On the right, a cross-section of this filter with cutoff frequency R (r ≡ √(μ²+ν²)).
We may use this definition of ĥ(μ, ν) to calculate the corresponding unit sample response from equation (4.3):
\[
h(k,l)=\frac{1}{(2\pi)^2}\int\!\!\int_{\mu^2+\nu^2\le R^2}\cos(\mu k+\nu l)\,d\mu\,d\nu \tag{4.8}
\]
We change to polar coordinates (r, θ) in the frequency domain, with μ ≡ r cos θ and ν ≡ r sin θ, so that
\[
\mu^2+\nu^2=r^2\quad\text{and}\quad d\mu\,d\nu=r\,dr\,d\theta \tag{4.9}
\]
Then:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^{R}\cos(rk\cos\theta+rl\sin\theta)\,r\,dr\,d\theta \tag{4.10}
\]
We may write
\[
k\cos\theta+l\sin\theta=\sqrt{k^2+l^2}\left[\frac{k}{\sqrt{k^2+l^2}}\cos\theta+\frac{l}{\sqrt{k^2+l^2}}\sin\theta\right]\equiv\sqrt{k^2+l^2}\,[\sin\phi\cos\theta+\cos\phi\sin\theta]=\sqrt{k^2+l^2}\sin(\theta+\phi) \tag{4.11}
\]
where angle φ has been defined so that:
\[
\sin\phi\equiv\frac{k}{\sqrt{k^2+l^2}}\quad\text{and}\quad\cos\phi\equiv\frac{l}{\sqrt{k^2+l^2}} \tag{4.12}
\]
We define a new variable t ≡ θ + φ. Then equation (4.10) may be written as:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int_{\phi}^{2\pi+\phi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt
=\frac{1}{(2\pi)^2}\int_{\phi}^{2\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt
+\frac{1}{(2\pi)^2}\int_{2\pi}^{2\pi+\phi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt \tag{4.13}
\]
In the second term we change variable t to t̃ ≡ t − 2π ⇒ t = t̃ + 2π ⇒ sin t = sin t̃. Therefore, we may write:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int_{\phi}^{2\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt
+\frac{1}{(2\pi)^2}\int_{0}^{\phi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin\tilde{t}\right)r\,dr\,d\tilde{t}
=\frac{1}{(2\pi)^2}\int_{0}^{2\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt \tag{4.14}
\]
This may be written as:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int_{0}^{\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt
+\frac{1}{(2\pi)^2}\int_{\pi}^{2\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt \tag{4.15}
\]
In the second term, we define a new variable of integration: t̃ ≡ t − π ⇒ t = t̃ + π ⇒ sin t = −sin t̃ ⇒ cos(r√(k²+l²) sin t) = cos(r√(k²+l²) sin t̃), and dt = dt̃. Then:
\[
h(k,l)=\frac{1}{(2\pi)^2}\int_{0}^{\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)r\,dr\,dt
+\frac{1}{(2\pi)^2}\int_{0}^{\pi}\!\!\int_0^{R}\cos\!\left(r\sqrt{k^2+l^2}\sin\tilde{t}\right)r\,dr\,d\tilde{t}
=\frac{1}{2\pi^2}\int_0^{R}\left[\int_{0}^{\pi}\cos\!\left(r\sqrt{k^2+l^2}\sin t\right)dt\right]r\,dr \tag{4.16}
\]
We know that the Bessel function of the first kind of zero order is defined as:
\[
J_0(z)\equiv\frac{1}{\pi}\int_0^{\pi}\cos(z\sin\theta)\,d\theta \tag{4.17}
\]
If we use definition (4.17) in equation (4.16), we obtain:
\[
h(k,l)=\frac{1}{2\pi}\int_0^{R}rJ_0\!\left(r\sqrt{k^2+l^2}\right)dr \tag{4.18}
\]
We define a new variable of integration x ≡ r√(k²+l²) ⇒ dr = dx/√(k²+l²). Then:
\[
h(k,l)=\frac{1}{2\pi}\frac{1}{k^2+l^2}\int_0^{R\sqrt{k^2+l^2}}xJ_0(x)\,dx \tag{4.19}
\]
From the theory of Bessel functions, it is known that:
\[
\int x^{p+1}J_p(x)\,dx=x^{p+1}J_{p+1}(x) \tag{4.20}
\]
We apply formula (4.20) with p = 0 to equation (4.19):
\[
h(k,l)=\frac{1}{2\pi}\frac{1}{k^2+l^2}\Big[xJ_1(x)\Big]_0^{R\sqrt{k^2+l^2}}
\Rightarrow h(k,l)=\frac{R}{2\pi\sqrt{k^2+l^2}}J_1\!\left(R\sqrt{k^2+l^2}\right) \tag{4.21}
\]
This function is a function of infinite extent, defined at each point (k, l) of integer coordinates. It corresponds, therefore, to an array of infinite dimensions. The implication is that this filter cannot be implemented as a linear convolution filter.
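Equation (4.21) is easy to evaluate numerically if a Bessel function routine is available; the sketch below (Python with NumPy and SciPy, an assumption made here) tabulates a truncated version of the filter, handling the k = l = 0 limit, where J₁(x)/x → 1/2, explicitly:

```python
import numpy as np
from scipy.special import j1    # Bessel function of the first kind, order 1

def ideal_lowpass_2d(k, l, R):
    """Unit sample response of the 2D ideal lowpass filter, equation (4.21)."""
    rho = np.hypot(k, l).astype(float)
    safe = np.where(rho == 0, 1.0, rho)                     # avoid division by zero
    h = R * j1(R * safe) / (2.0 * np.pi * safe)
    return np.where(rho == 0, R**2 / (4.0 * np.pi), h)      # limit value at k = l = 0

k, l = np.meshgrid(np.arange(-10, 11), np.arange(-10, 11))
h = ideal_lowpass_2d(k, l, R=1.0)                           # a truncated 21 x 21 sample of the filter
```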
Example B4.2
What is the impulse response of the 2D ideal band pass filter?

The ideal band pass filter for band [R_1, R_2] is defined as:
\[
\hat{h}(\mu,\nu)=\begin{cases}1&\text{for }R_1\le\sqrt{\mu^2+\nu^2}\le R_2\\0&\text{otherwise}\end{cases} \tag{4.22}
\]
The only difference, therefore, with the ideal lowpass filter, derived in Box 4.1, is in the limits of equation (4.19):
\[
h(k,l)=\frac{1}{2\pi}\frac{1}{k^2+l^2}\int_{R_1\sqrt{k^2+l^2}}^{R_2\sqrt{k^2+l^2}}xJ_0(x)\,dx
=\frac{1}{2\pi}\frac{1}{k^2+l^2}\Big[xJ_1(x)\Big]_{R_1\sqrt{k^2+l^2}}^{R_2\sqrt{k^2+l^2}}
=\frac{1}{2\pi}\frac{1}{\sqrt{k^2+l^2}}\left[R_2J_1\!\left(R_2\sqrt{k^2+l^2}\right)-R_1J_1\!\left(R_1\sqrt{k^2+l^2}\right)\right] \tag{4.23}
\]
This is a function defined for all values (k, l). Therefore, the ideal band pass filter is an infinite impulse response filter.
Example B4.3
What is the impulse response of the 2D ideal high pass filter?

The ideal high pass filter, with cutoff radial frequency R, is defined as:
\[
\hat{h}(\mu,\nu)=\begin{cases}0&\text{for }\sqrt{\mu^2+\nu^2}\le R\\1&\text{otherwise}\end{cases} \tag{4.24}
\]
The only difference, therefore, with the ideal lowpass filter, derived in Box 4.1, is in the limits of equation (4.19):
\[
h(k,l)=\frac{1}{2\pi}\frac{1}{k^2+l^2}\int_{R\sqrt{k^2+l^2}}^{+\infty}xJ_0(x)\,dx
=\frac{1}{2\pi}\frac{1}{k^2+l^2}\Big[xJ_1(x)\Big]_{R\sqrt{k^2+l^2}}^{+\infty} \tag{4.25}
\]
Bessel function J_1(x) tends to 0 for x → +∞. However, its asymptotic behaviour is J_1(x) ∝ 1/√x as x → +∞. This means that J_1(x) does not tend to 0 fast enough to compensate for the factor x which multiplies it, ie lim_{x→+∞} xJ_1(x) → +∞. Therefore, there is no real domain function that has as Fourier transform the ideal high pass filter. In practice, of course, the highest frequency we may possibly be interested in is finite, being determined by N, the number of samples along one image axis, so the issue of the infinite upper limit in equation (4.25) does not arise and the ideal high pass filter becomes the same as the ideal band pass filter.
What is the relationship between the 1D and the 2D ideal lowpass filters?
For a cutoff frequency ω_0 = 1, the 1D ideal lowpass filter is given by (see example 4.1):
\[
h_1(k)=\frac{\sin k}{\pi k} \tag{4.26}
\]
For a cutoff radial frequency R = 1, the 2D ideal lowpass filter is given by (see Box 4.1)
\[
h_2(k,l)=\frac{J_1\!\left(\sqrt{k^2+l^2}\right)}{2\pi\sqrt{k^2+l^2}} \tag{4.27}
\]
where J_1(x) is the first-order Bessel function of the first kind. Figure 4.3 shows the plot of h_1(k) versus k and the plot of h_2(k, l) versus k for l = 0. It can be seen that although the two filters look similar, they differ in significant details: their zero crossings are at different places, and the amplitudes of their side-lobes are different.
The plots in this figure were created by observing that sin k/k = 1 for k = 0 and that J_1(k)/k = 1/2 for k = 0. Further, for k = 1, 2 and 3, the following approximation formula holds:
\[
\frac{J_1(k)}{k}\approx 0.5-0.56249985\left(\frac{k}{3}\right)^2+0.21093573\left(\frac{k}{3}\right)^4-0.03954289\left(\frac{k}{3}\right)^6+0.00443319\left(\frac{k}{3}\right)^8-0.00031761\left(\frac{k}{3}\right)^{10}+0.00001109\left(\frac{k}{3}\right)^{12} \tag{4.28}
\]
For k > 3, the approximation is
\[
\frac{J_1(k)}{k}=\frac{1}{k\sqrt{k}}\Bigg[0.79788456+0.00000156\frac{3}{k}+0.01659667\left(\frac{3}{k}\right)^2+0.00017105\left(\frac{3}{k}\right)^3-0.00249511\left(\frac{3}{k}\right)^4+0.00113653\left(\frac{3}{k}\right)^5-0.00020033\left(\frac{3}{k}\right)^6\Bigg]\cos\theta_1 \tag{4.29}
\]
where:
\[
\theta_1=k-2.35619449+0.12499612\frac{3}{k}+0.00005650\left(\frac{3}{k}\right)^2-0.00637879\left(\frac{3}{k}\right)^3+0.00074348\left(\frac{3}{k}\right)^4+0.00079824\left(\frac{3}{k}\right)^5-0.00029166\left(\frac{3}{k}\right)^6 \tag{4.30}
\]
The differences in the two filters imply that we cannot take an ideal or optimal (according to some criteria) 1D filter, replace its variable by the polar radius (ie replace k by √(k²+l²) in equation (4.26)) and create the corresponding ideal or optimal filter in 2D. However, although the 2D filter we shall create this way will not be the ideal or optimal one, according to the corresponding criteria in 2D, it will be a good suboptimal filter with qualitatively the same behaviour as the optimal one.
Figure 4.3: The cross-section of the 2D ideal lowpass filter (h_2(k, l), represented here by the continuous line) is similar to but different from the cross-section of the 1D ideal lowpass filter (h_1(k), represented here by the dashed line).
How can we implement in the real domain a filter that is infinite in extent?
A filter which is of infinite extent in real space may be implemented in a recursive way, and that is why it is called a recursive filter. Filters which are of finite extent in real space are called nonrecursive filters. Filters are usually represented and manipulated with the help of their z-transforms (see Box 4.2). The z-transforms of infinite in extent filters lead to recursive implementation formulae.
Box 4.2. z-transforms
A filter of finite extent is essentially a finite string of numbers x_l, x_{l+1}, x_{l+2}, ..., x_m, where l and m are some integers. Sometimes an arrow is used to denote the element of the string that corresponds to the 0th position. The z-transform of such a string is defined as:
\[
X(z)\equiv\sum_{k=l}^{m}x_k z^{-k} \tag{4.31}
\]
If the filter is of infinite extent, the sequence of numbers which represents it is of infinite extent too and its z-transform is given by an infinite sum, of the form:
\[
X(z)=\sum_{k=-\infty}^{+\infty}x_k z^{-k}\quad\text{or}\quad\sum_{k=0}^{+\infty}x_k z^{-k}\quad\text{or}\quad\sum_{k=-\infty}^{0}x_k z^{-k} \tag{4.32}
\]
In such a case, we can usually write this sum in closed form as the ratio of two polynomials in z, as opposed to writing it as a single polynomial in z (which is the case for the z-transform of the finite filter):
\[
H(z)=\frac{\sum_{i=0}^{M_a}a_i z^{-i}}{\sum_{j=0}^{M_b}b_j z^{-j}} \tag{4.33}
\]
Here M_a and M_b are some integers. Conventionally we choose b_0 = 1.
The reason we use z-transforms is because digital filters can easily be realised in hardware in terms of their z-transforms. The z-transform of a sequence together with its region of convergence uniquely defines the sequence. Further, it obeys the convolution theorem: the z-transform of the convolution of two sequences is the product of the z-transforms of the two sequences.
When we convolve a signal with a digital filter, we essentially multiply the z-transform of the signal with the z-transform of the filter:
\[
\underbrace{R(z)}_{\substack{\text{z-transform of}\\\text{output signal}}}=\underbrace{H(z)}_{\substack{\text{z-transform}\\\text{of filter}}}\underbrace{D(z)}_{\substack{\text{z-transform of}\\\text{input signal}}} \tag{4.34}
\]
If we substitute from (4.33) into (4.34) and bring the denominator to the left-hand side of the equation, we have:
\[
R(z)\sum_{j=0}^{M_b}b_j z^{-j}=\left(\sum_{i=0}^{M_a}a_i z^{-i}\right)D(z) \tag{4.35}
\]
In the sum on the left-hand side, we separate the j = 0 term and remember that b_0 = 1:
\[
R(z)+\left(\sum_{j=1}^{M_b}b_j z^{-j}\right)R(z)=\left(\sum_{i=0}^{M_a}a_i z^{-i}\right)D(z) \tag{4.36}
\]
Therefore:
\[
R(z)=\left(\sum_{i=0}^{M_a}a_i z^{-i}\right)D(z)-\left(\sum_{j=1}^{M_b}b_j z^{-j}\right)R(z) \tag{4.37}
\]
Remember that R(z) is a sum in z^{-m} with coefficients, say, r_m. It is clear from the above equation that the value of r_m may be calculated in terms of the previously calculated values of r_m, since polynomial R(z) appears on the right-hand side of the equation too. That is why such a filter is called recursive. In the case of a finite filter, all b_j are zero (except b_0, which is 1) and so the coefficients r_m of R(z) are expressed in terms of the a_i and the coefficients which appear in D(z) only (ie we have no recursion).
Example B4.4
A good approximation of the ideal low pass filter is the so called Butterworth filter. Butterworth filters constitute a whole family of filters. The z-transform of one of the Butterworth filters is given by:
\[
H(z)=\frac{0.58z^{-1}+0.21z^{-2}}{1-0.40z^{-1}+0.25z^{-2}-0.044z^{-3}} \tag{4.38}
\]
Using equation (4.37), work out how you may use this filter in a recursive way to smooth an image of size 256 × 256 line by line.

We shall treat each line of the image as a signal of 256 samples. Let us call the samples along this sequence d_0, d_1, ..., d_255. Then its z-transform is:
\[
D(z)=\sum_{k=0}^{255}d_k z^{-k} \tag{4.39}
\]
Let us denote the samples of the smoothed sequence by r_0, r_1, ..., r_255. Then the z-transform of the output sequence is:
\[
R(z)=\sum_{k=0}^{255}r_k z^{-k} \tag{4.40}
\]
Our task is to calculate the values of r_k from the known values d_k and the filter parameters.
According to the notation of Box 4.2:
\[
a_0=a_3=0,\quad a_1=0.58,\quad a_2=0.21
\]
\[
b_1=-0.40,\quad b_2=0.25,\quad b_3=-0.044 \tag{4.41}
\]
We substitute from (4.39), (4.40) and (4.41) into (4.37), in order to work out the values of r_k in terms of d_k (the input pixel values) and the filter coefficients a_1, a_2, b_1, b_2 and b_3:
\[
\sum_{k=0}^{255}r_k z^{-k}=a_1 z^{-1}\sum_{k=0}^{255}d_k z^{-k}+a_2 z^{-2}\sum_{k=0}^{255}d_k z^{-k}-\left(b_1 z^{-1}+b_2 z^{-2}+b_3 z^{-3}\right)\sum_{k=0}^{255}r_k z^{-k}
\]
\[
\Rightarrow\sum_{k=0}^{255}r_k z^{-k}=\underbrace{\sum_{k=0}^{255}a_1 d_k z^{-k-1}}_{\text{set }\tilde{k}\equiv k+1\Rightarrow k=\tilde{k}-1}+\underbrace{\sum_{k=0}^{255}a_2 d_k z^{-k-2}}_{\text{set }\tilde{k}\equiv k+2\Rightarrow k=\tilde{k}-2}-\underbrace{\sum_{k=0}^{255}b_1 r_k z^{-k-1}}_{\text{set }\tilde{k}\equiv k+1\Rightarrow k=\tilde{k}-1}-\underbrace{\sum_{k=0}^{255}b_2 r_k z^{-k-2}}_{\text{set }\tilde{k}\equiv k+2\Rightarrow k=\tilde{k}-2}-\underbrace{\sum_{k=0}^{255}b_3 r_k z^{-k-3}}_{\text{set }\tilde{k}\equiv k+3\Rightarrow k=\tilde{k}-3}
\]
\[
=\sum_{\tilde{k}=1}^{256}a_1 d_{\tilde{k}-1}z^{-\tilde{k}}+\sum_{\tilde{k}=2}^{257}a_2 d_{\tilde{k}-2}z^{-\tilde{k}}-\sum_{\tilde{k}=1}^{256}b_1 r_{\tilde{k}-1}z^{-\tilde{k}}-\sum_{\tilde{k}=2}^{257}b_2 r_{\tilde{k}-2}z^{-\tilde{k}}-\sum_{\tilde{k}=3}^{258}b_3 r_{\tilde{k}-3}z^{-\tilde{k}} \tag{4.42}
\]
We may drop the tilde from the dummy summation variable k̃ and we may also split off the terms in the various sums that are outside the range [3, 255]:
\[
r_0+r_1 z^{-1}+r_2 z^{-2}+\sum_{k=3}^{255}r_k z^{-k}=\sum_{k=3}^{255}a_1 d_{k-1}z^{-k}+a_1 d_0 z^{-1}+a_1 d_1 z^{-2}+a_1 d_{255}z^{-256}
+\sum_{k=3}^{255}a_2 d_{k-2}z^{-k}+a_2 d_0 z^{-2}+a_2 d_{254}z^{-256}+a_2 d_{255}z^{-257}
\]
\[
-\sum_{k=3}^{255}b_1 r_{k-1}z^{-k}-b_1 r_0 z^{-1}-b_1 r_1 z^{-2}-b_1 r_{255}z^{-256}
-\sum_{k=3}^{255}b_2 r_{k-2}z^{-k}-b_2 r_0 z^{-2}-b_2 r_{254}z^{-256}-b_2 r_{255}z^{-257}
\]
\[
-\sum_{k=3}^{255}b_3 r_{k-3}z^{-k}-b_3 r_{253}z^{-256}-b_3 r_{254}z^{-257}-b_3 r_{255}z^{-258} \tag{4.43}
\]
We may then collect together all terms with equal powers of z:
\[
r_0+(r_1-a_1 d_0+b_1 r_0)z^{-1}+(r_2-a_1 d_1-a_2 d_0+b_1 r_1+b_2 r_0)z^{-2}
+\sum_{k=3}^{255}(r_k-a_1 d_{k-1}-a_2 d_{k-2}+b_1 r_{k-1}+b_2 r_{k-2}+b_3 r_{k-3})z^{-k}
\]
\[
+(-a_1 d_{255}-a_2 d_{254}+b_1 r_{255}+b_2 r_{254}+b_3 r_{253})z^{-256}
+(-a_2 d_{255}+b_2 r_{255}+b_3 r_{254})z^{-257}+b_3 r_{255}z^{-258}=0 \tag{4.44}
\]
As this equation has to be valid for all values of z, we must set equal to 0 all coefficients of all powers of z:
\[
\begin{aligned}
&r_0=0\\
&r_1-a_1 d_0+b_1 r_0=0\\
&r_2-a_1 d_1-a_2 d_0+b_1 r_1+b_2 r_0=0\\
&r_k-a_1 d_{k-1}-a_2 d_{k-2}+b_1 r_{k-1}+b_2 r_{k-2}+b_3 r_{k-3}=0\quad\text{for }k=3,4,\ldots,255\\
&-a_1 d_{255}-a_2 d_{254}+b_1 r_{255}+b_2 r_{254}+b_3 r_{253}=0\\
&-a_2 d_{255}+b_2 r_{255}+b_3 r_{254}=0\\
&b_3 r_{255}=0
\end{aligned} \tag{4.45}
\]
The last three equations may be solved to yield values for r_253, r_254 and r_255, which are incompatible with the values for the same unknowns that will be computed from the recursive expression. This problem arises because the sequence is considered finite. If the upper limits in (4.39) and (4.40) were +∞, instead of 255, the last three equations in (4.45) would not have arisen. In practice, we only keep the recursive relation from (4.45), and use it to compute the smoothed values r_k of the line of the image, given the input pixel values d_k and the filter coefficients a_1, a_2, b_1, b_2 and b_3, as follows:
\[
r_k=a_1 d_{k-1}+a_2 d_{k-2}-b_1 r_{k-1}-b_2 r_{k-2}-b_3 r_{k-3}\quad\text{for }k=0,1,2,\ldots,255 \tag{4.46}
\]
Note that the first three of equations (4.45) are special cases of the recursive formula, if we consider that the values d_{-3}, d_{-2}, d_{-1}, r_{-1}, r_{-2} and r_{-3} are 0.
Alternative options exist to define the variables with the negative indices in the recursive formula. The input image values sometimes are set equal to the last few values of the row of pixels, assuming wrap round boundary conditions (ie signal repetition): d_{-1} = d_255, d_{-2} = d_254 and d_{-3} = d_253. This arbitrariness in the initial conditions of the recursive relationship is the reason some scientists say that recursive filters have an infinitely long boundary effect. That is, the choice of boundary conditions we make for the recursive relationship affects all subsequent pixel values, while this is not the case for nonrecursive filters.
Example B4.5
You are given the sequence: 12, 13, 13, 14, 12, 11, 12, 13, 4, 5, 6, 5, 6, 4, 3, 6. Use the filter of example 4.4 to work out a smooth version of it, assuming
(i) that the sequence is repeated ad infinitum in both directions;
(ii) that the values of the samples outside the index range given are all 0.
Plot the initial and the two smoothed sequences and comment on the result.

We apply equation (4.46), assuming that the values of r_k for negative indices are 0. The reconstructed sequences we obtain are:
(i) 4.1100, 9.8640, 12.9781, 13.1761, 13.3099, 12.5010, 11.1527, 11.1915, 12.2985, 7.6622, 4.2227, 4.8447, 5.3793, 5.6564, 4.7109, 3.2870.
(ii) 0.0000, 6.9600, 12.8440, 13.6676, 13.4123, 12.4131, 11.1136, 11.2023, 12.3087, 7.6619, 4.2205, 4.8443, 5.3797, 5.6565, 4.7108, 3.2869.
Figure 4.4 shows the plots of the original and the smoothed sequences. We observe that in practice the effect of the boundary conditions we choose does not make much difference to the signal after a few samples. We also observe that the smoothed signal is shifted one position to the right of the input sequence. This is expected, as the filter we use has in its numerator the common factor z^{-1}. Every z^{-1} we use to multiply an input sequence with shifts the sequence by one position to the right.
Figure 4.4: The points of the original sequence are denoted by circles. The two smoothed versions of it are denoted by crosses and triangles. After the first few samples, the two sequences become indistinguishable. However, both are shifted by one sample to the right in relation to the original sequence.
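The recursion (4.46) is straightforward to code directly; the sketch below (Python with NumPy, an assumption made here; the function name recursive_smooth is illustrative) reproduces the two smoothed sequences of this example, with and without wrap round boundary conditions.

```python
import numpy as np

def recursive_smooth(d, a=(0.0, 0.58, 0.21), b=(-0.40, 0.25, -0.044), wrap=False):
    """Apply equation (4.46): r_k = a1 d_{k-1} + a2 d_{k-2} - b1 r_{k-1} - b2 r_{k-2} - b3 r_{k-3},
    with r at negative indices taken as 0."""
    d = list(d)
    n = len(d)
    dd = {k: d[k] for k in range(n)}
    if wrap:                                  # case (i): the sequence repeats in both directions
        dd[-1], dd[-2] = d[-1], d[-2]
    r = [0.0, 0.0, 0.0] + [0.0] * n           # r_{-3}, r_{-2}, r_{-1} stored at the front
    for k in range(n):
        dk1 = dd.get(k - 1, 0.0)
        dk2 = dd.get(k - 2, 0.0)
        r[k + 3] = (a[1] * dk1 + a[2] * dk2
                    - b[0] * r[k + 2] - b[1] * r[k + 1] - b[2] * r[k])
    return r[3:]

seq = [12, 13, 13, 14, 12, 11, 12, 13, 4, 5, 6, 5, 6, 4, 3, 6]
print(np.round(recursive_smooth(seq, wrap=True), 4))    # matches case (i) above
print(np.round(recursive_smooth(seq, wrap=False), 4))   # matches case (ii) above
```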
Example B4.6
The z-transform of a filter is:
\[
H(z)=\frac{0.58+0.21z^{-1}}{1-0.40z^{-1}+0.25z^{-2}-0.044z^{-3}} \tag{4.47}
\]
Work out the formulae that will allow you to use this filter to smooth a sequence d_0, d_1, ..., d_255.

According to the notation of Box 4.2:
\[
a_0=0.58,\quad a_1=0.21,\quad a_2=a_3=0
\]
\[
b_1=-0.40,\quad b_2=0.25,\quad b_3=-0.044 \tag{4.48}
\]
The z-transform of the input sequence is given by (4.39), and that of the output sequence by (4.40). We substitute from (4.39), (4.40) and (4.48) into (4.37), in order to work out the values of r_k in terms of d_k (the input pixel values) and the filter coefficients a_0, a_1, b_1, b_2 and b_3:
\[
\sum_{k=0}^{255}r_k z^{-k}=a_0\sum_{k=0}^{255}d_k z^{-k}+a_1 z^{-1}\sum_{k=0}^{255}d_k z^{-k}-\left(b_1 z^{-1}+b_2 z^{-2}+b_3 z^{-3}\right)\sum_{k=0}^{255}r_k z^{-k}
\]
\[
\Rightarrow\sum_{k=0}^{255}r_k z^{-k}=a_0\sum_{k=0}^{255}d_k z^{-k}+\underbrace{\sum_{k=0}^{255}a_1 d_k z^{-k-1}}_{\text{set }\tilde{k}\equiv k+1}-\underbrace{\sum_{k=0}^{255}b_1 r_k z^{-k-1}}_{\text{set }\tilde{k}\equiv k+1}-\underbrace{\sum_{k=0}^{255}b_2 r_k z^{-k-2}}_{\text{set }\tilde{k}\equiv k+2}-\underbrace{\sum_{k=0}^{255}b_3 r_k z^{-k-3}}_{\text{set }\tilde{k}\equiv k+3}
\]
\[
=a_0\sum_{k=0}^{255}d_k z^{-k}+\sum_{\tilde{k}=1}^{256}a_1 d_{\tilde{k}-1}z^{-\tilde{k}}-\sum_{\tilde{k}=1}^{256}b_1 r_{\tilde{k}-1}z^{-\tilde{k}}-\sum_{\tilde{k}=2}^{257}b_2 r_{\tilde{k}-2}z^{-\tilde{k}}-\sum_{\tilde{k}=3}^{258}b_3 r_{\tilde{k}-3}z^{-\tilde{k}} \tag{4.49}
\]
We may drop the tilde from the dummy summation variable k̃ and we may also split off the terms in the various sums that are outside the range [3, 255]:
\[
r_0+r_1 z^{-1}+r_2 z^{-2}+\sum_{k=3}^{255}r_k z^{-k}=\sum_{k=3}^{255}a_0 d_k z^{-k}+a_0 d_0+a_0 d_1 z^{-1}+a_0 d_2 z^{-2}
+\sum_{k=3}^{255}a_1 d_{k-1}z^{-k}+a_1 d_0 z^{-1}+a_1 d_1 z^{-2}+a_1 d_{255}z^{-256}
\]
\[
-\sum_{k=3}^{255}b_1 r_{k-1}z^{-k}-b_1 r_0 z^{-1}-b_1 r_1 z^{-2}-b_1 r_{255}z^{-256}
-\sum_{k=3}^{255}b_2 r_{k-2}z^{-k}-b_2 r_0 z^{-2}-b_2 r_{254}z^{-256}-b_2 r_{255}z^{-257}
\]
\[
-\sum_{k=3}^{255}b_3 r_{k-3}z^{-k}-b_3 r_{253}z^{-256}-b_3 r_{254}z^{-257}-b_3 r_{255}z^{-258} \tag{4.50}
\]
We may then collect together all terms with equal powers of z:
\[
(r_0-a_0 d_0)+(r_1-a_0 d_1-a_1 d_0+b_1 r_0)z^{-1}+(r_2-a_0 d_2-a_1 d_1+b_1 r_1+b_2 r_0)z^{-2}
+\sum_{k=3}^{255}(r_k-a_0 d_k-a_1 d_{k-1}+b_1 r_{k-1}+b_2 r_{k-2}+b_3 r_{k-3})z^{-k}
\]
\[
+(-a_1 d_{255}+b_1 r_{255}+b_2 r_{254}+b_3 r_{253})z^{-256}
+(b_2 r_{255}+b_3 r_{254})z^{-257}+b_3 r_{255}z^{-258}=0 \tag{4.51}
\]
As this equation has to be valid for all values of z, we must set equal to 0 all coefficients of all powers of z:
\[
\begin{aligned}
&r_0-a_0 d_0=0\\
&r_1-a_0 d_1-a_1 d_0+b_1 r_0=0\\
&r_2-a_0 d_2-a_1 d_1+b_1 r_1+b_2 r_0=0\\
&r_k-a_0 d_k-a_1 d_{k-1}+b_1 r_{k-1}+b_2 r_{k-2}+b_3 r_{k-3}=0\quad\text{for }k=3,4,\ldots,255\\
&-a_1 d_{255}+b_1 r_{255}+b_2 r_{254}+b_3 r_{253}=0\\
&b_2 r_{255}+b_3 r_{254}=0\\
&b_3 r_{255}=0
\end{aligned} \tag{4.52}
\]
The last three equations are ignored in practice. The recursive equation we have to use
is:
\[
r_k=a_0 d_k+a_1 d_{k-1}-b_1 r_{k-1}-b_2 r_{k-2}-b_3 r_{k-3}\quad\text{for }k=0,1,2,\ldots,255 \tag{4.53}
\]
Example B4.7
Smooth the sequence of example 4.5 using the recursive filter of example 4.6, for the cases when
(i) the sequence is repeated ad infinitum in both directions;
(ii) the values of the samples outside the index range given are all 0.
Compare the results with those of example 4.5.

We apply equation (4.53), assuming that the values of r_k for negative indices are 0. The reconstructed sequences we obtain are:
(i) 8.2200, 13.3480, 13.5542, 13.2964, 12.4173, 11.1392, 11.2064, 12.3041, 7.6602, 4.2211, 4.8448, 5.3797, 5.6564, 4.7108, 3.2869, 4.4960.
(ii) 6.9600, 12.8440, 13.6676, 13.4123, 12.4131, 11.1136, 11.2023, 12.3087, 7.6619, 4.2205, 4.8443, 5.3797, 5.6565, 4.7108, 3.2869, 4.4959.
We note that in (ii) the results are identical with those of example 4.5, except now they are shifted one position to the left, so the smoothed sequence follows the input sequence more faithfully. In (i) the results are similar to those of example 4.5 but not identical. This is expected, given the different samples of the wrapped round input sequence which affect the first value of the reconstructed sequence.
Can we define a filter directly in the real domain for convenience?
Yes, but we should always keep an eye on what the filter does in the frequency domain. For example, if we use a flat averaging filter to convolve the image with, we may create high frequency artifacts in the output image. This is due to the side lobes the Fourier transform of the flat filter has, which may enhance certain high frequencies in the image while suppressing others. This is schematically shown in figure 4.5.
Can we define a filter in the real domain, without side lobes in the frequency domain?
Yes. The Fourier transform of a Gaussian function is also a Gaussian function. So, if we choose the shape of the filter to be given by a Gaussian, we shall avoid the danger of creating artifacts in the image. However, a Gaussian filter is infinite in extent and, in order to use it as a convolution filter, it has to be truncated. If we truncate it, its Fourier transform will no longer be a Gaussian, but it will still be a function with minimal side lobes.
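A truncated Gaussian convolution kernel is easy to construct directly in the real domain; the sketch below (Python with NumPy, an assumption made here; the parameter values are arbitrary) builds such a kernel and, for comparison with the flat filter of figure 4.5, computes the magnitude of its discrete Fourier transform.

```python
import numpy as np

def gaussian_kernel(sigma, half_width):
    """Truncated, normalised 2D Gaussian kernel of size (2*half_width+1) x (2*half_width+1)."""
    x = np.arange(-half_width, half_width + 1)
    X, Y = np.meshgrid(x, x)
    g = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))
    return g / g.sum()                       # normalise so the kernel does not change the mean grey value

g = gaussian_kernel(sigma=1.5, half_width=4)
# Its DFT has no appreciable side lobes, unlike a flat averaging kernel of the same size.
G = np.abs(np.fft.fftshift(np.fft.fft2(g, s=(64, 64))))
```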
Figure 4.5: The case of a filter defined in the real domain, for maximum convenience. The processing route here is: image, convolution with the filter, filtered image. Compare this with the process followed in figure 4.1. Top row: a signal and its Fourier transform. Middle row: a flat filter on the left, and its Fourier transform on the right. Bottom row: on the left, the filtered signal that may be obtained by convolving the signal at the top with the filter in the middle; on the right, the Fourier transform of the filtered signal, obtained by multiplying the Fourier transform of the signal at the top with the Fourier transform of the filter in the middle. Note that this filter is very convenient to implement in the real domain, but its side lobes in the frequency domain may cause artifacts in the signal, by keeping some high frequencies while killing others.
4.2 Reducing high frequency noise
What are the types of noise present in an image?
Noise in images is often assumed to be either impulse noise or Gaussian noise. Image noise is often assumed to be additive, zero-mean, unbiased, independent, uncorrelated, homogeneous, white, Gaussian and iid. For special cases, where high accuracy is required, it is advisable to work out the noise model specifically, as some or all of these assumptions may be violated.
What is impulse noise?

Impulse noise, also known as shot noise or speck noise, alters at random the values of some pixels. In a binary image this means that some black pixels become white and some white pixels become black. This is why this noise is also called salt and pepper noise. It is assumed to be Poisson distributed. A Poisson distribution has the form

p(k) = \frac{e^{-\lambda} \lambda^k}{k!}    (4.54)

where p(k) is the probability of having k pixels affected by the noise in a window of a certain size, and \lambda is the average number of affected pixels in a window of the same fixed size. The variance of the Poisson distribution is also \lambda.
What is Gaussian noise?

Gaussian noise is the type of noise in which, at each pixel position (i, j), the random noise value, that affects the true pixel value, is drawn from a Gaussian probability density function with mean \mu(i, j) and standard deviation \sigma(i, j). Unlike shot noise, which influences a few pixels only, this type of noise affects all pixel values.
What is additive noise?

If the random number of the noise field is added to the true value of the pixel, the noise is additive.

What is multiplicative noise?

If the random number of the noise field is multiplied with the true value of the pixel, the noise is multiplicative.
What is homogeneous noise?

If the noise parameters are the same for all pixels, the noise is homogeneous. For example, in the case of Gaussian noise, if \mu(i, j) and \sigma(i, j) are the same for all pixels (i, j) and equal, say, to \mu and \sigma, respectively, the noise is homogeneous.
What is zero-mean noise?

If the mean value of the noise is zero (\mu = 0), the noise is zero-mean. Another term for zero-mean noise is unbiased noise.

What is biased noise?

If \mu(i, j) \neq 0 for at least some pixels, the noise is called biased. This is also known as fixed pattern noise. Such noise can easily be converted to zero-mean by removing \mu(i, j) from the value of pixel (i, j).
What is independent noise?

As the noise value that affects each pixel is random, we may think of the noise process as a random field, the same size as the image, which, point by point, is added to (or multiplied with) the field that represents the image. We may say then, that the value of the noise at each pixel position is the outcome of a random experiment. If the result of the random experiment, which is assumed to be performed at a pixel position, is not affected by the outcome of the random experiment at other pixel positions, the noise is independent.
What is uncorrelated noise?

If the average value of the product of the noise values at any combination of n pixel positions (averaged over all such n-tuples of positions in the image) is equal to the product of the average noise values at the corresponding positions, the noise is uncorrelated:

Average of product = Product of averages    (4.55)

For zero-mean noise this is the same as saying that if we consider any n-tuple of pixel positions and multiply their values and average these values over all such n-tuples we find in the image, the answer will always be 0. In practice, we consider only the autocorrelation function of the noise field, ie we consider only pairs of pixels in order to decide whether the noise is correlated or uncorrelated.
Let us perform the following thought experiment. Let us consider the noise field. We consider a pair of samples at a certain relative position from each other and multiply their values. If the noise is zero-mean, sometimes these two noise values will be both positive, sometimes both negative, and sometimes one will be positive and the other negative. This means that sometimes their product will be positive and sometimes negative. If the noise is assumed to be uncorrelated, we expect to have about equal numbers of positive and negative products if we consider all pairs of samples at the same relative position. So, the average value of the product of the noise values at a pair of positions, over all similar pairs of positions in the noise field, is expected to be 0. We shall get 0 for all relative positions of sample pairs, except when we consider a sample paired with itself, because in that case we average the square of the noise value over all samples. In that case it is not possible to get 0 because we would be averaging non-negative numbers.
What we are calculating with this thought experiment is the spatial autocorrelation function of the random field of noise. The average of the squared noise value is nothing other than the variance \sigma^2 of the homogeneous noise field, and the result indicates that the autocorrelation function of this field is a delta function with value \sigma^2 at zero shift and 0 for all other shifts. So,

C(h) = \sigma^2 \delta(h)    (4.56)

where h is the distance between the pixels we pair together in order to compute the autocorrelation function C(h).
What is white noise?

It is noise that has the same power at all frequencies (flat power spectrum). The term comes from white light, which is supposed to have equal power at all frequencies of the electromagnetic spectrum. If the spectrum of the noise were not flat, but it had more power in some preferred frequencies, the noise would have been called coloured noise. For example, if the noise had more power in the high frequencies, which in the electromagnetic spectrum correspond to the blue light, we might have characterised the noise as blue noise. Note that the analogy with the colour spectrum is only a metaphor: in the case of noise we are talking about spatial frequencies, while in the case of light we are talking about electromagnetic frequencies.
What is the relationship between zero-mean uncorrelated and white noise?

The two terms effectively mean the same thing. The autocorrelation function of zero-mean uncorrelated noise is a delta function (see equation (4.56)). The Fourier transform of the autocorrelation function is the power spectrum of the field, according to the Wiener-Khinchine theorem (see Box 4.5, on page 325). The Fourier transform of a delta function is a function with equal amplitude at all frequencies (see Box 4.4, on page 325). So, the power spectrum of the uncorrelated zero-mean noise is a flat spectrum, with equal power at all frequencies. Therefore, for zero-mean noise, the terms uncorrelated and white are interchangeable.
What is iid noise?

This means independent, identically distributed noise. The term independent means that the joint probability density function of the combination of the noise values may be written as the product of the probability density functions of the individual noise components at the different pixels. The term identically distributed means that the noise components at all pixel positions come from identical probability density functions. For example, if the noise value at every pixel is drawn from the same Gaussian probability density function, but with no regard as to what values have been drawn in other pixel positions, the noise is described as iid.
If the noise component n_{ij} at pixel (i, j) is drawn from a Gaussian probability density function with mean \mu and standard deviation \sigma, we may write for the joint probability density function p(n_{11}, n_{12}, \ldots, n_{NM}) of all noise components

p(n_{11}, n_{12}, \ldots, n_{NM}) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(n_{11}-\mu)^2}{2\sigma^2}} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(n_{12}-\mu)^2}{2\sigma^2}} \cdots \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(n_{NM}-\mu)^2}{2\sigma^2}}    (4.57)

where the size of the image has been assumed to be N \times M.
Example B4.8

Show that zero-mean iid noise is white.

Independent random variables are also uncorrelated random variables:

Independent \Rightarrow p(n_{11}, n_{12}, \ldots, n_{NM}) = \prod_{(i,j)=(1,1)}^{(N,M)} p(n_{ij})
\Rightarrow Mean(n_{11} n_{12} \cdots n_{NM}) = \prod_{(i,j)=(1,1)}^{(N,M)} Mean(n_{ij})
\Rightarrow Mean(n_{ij} n_{i+k,j+l}) = Mean(n_{ij}) Mean(n_{i+k,j+l}) = 0 \quad \forall (i,j), (k,l)
\Rightarrow Autocorrelation function = Delta function
\Rightarrow White spectrum    (4.58)
Example B4.9

Show that biased iid noise is coloured.

Consider two pixels (i, j) and (i + k, j + l). Each one has a noise component that consists of a constant value b and a zero-mean part, \nu_{ij} and \nu_{i+k,j+l}, respectively, so that n_{ij} = b + \nu_{ij} and n_{i+k,j+l} = b + \nu_{i+k,j+l}. Consider the autocorrelation function for these two pixel positions:

< n_{ij} n_{i+k,j+l} > = < (b + \nu_{ij})(b + \nu_{i+k,j+l}) >
= b^2 + b \underbrace{< \nu_{i+k,j+l} >}_{=0} + b \underbrace{< \nu_{ij} >}_{=0} + \underbrace{< \nu_{ij} \nu_{i+k,j+l} >}_{=0}
= b^2 = \text{a constant}    (4.59)

The Fourier transform of a constant is a delta function: 2\pi b^2 \delta(\omega). Such a spectrum is clearly an impulse, ie not white.
Is it possible to have white noise that is not iid?

Yes. The white spectrum means that the autocorrelation function of the noise is a delta function. For zero-mean noise this implies uncorrelatedness, but not independence (see examples 4.10, 4.11 and 4.12).
Example B4.10

Consider an iid 1D zero-mean uniform noise signal x(i) in the range [-3, 3]. From this construct a noise signal y(j) as follows:

y(2i) = x(i)
y(2i + 1) = k ( x(i)^2 - 3 )    (4.60)

Select a value for k so that the variance of y(j) is also 3 and show that y(j) is a zero-mean noise.

From example 3.53, on page 244, we know that the variance \sigma^2 of x(i) is 3 (A = 3 in (3.170)). The average (expectation value) of the even samples of y(j) is clearly zero, as the average of x(i) is zero. The average of the odd samples of y(j) is

< y(2i + 1) > = < k ( x(i)^2 - 3 ) > = k ( < x(i)^2 > - 3 ) = k [3 - 3] = 0    (4.61)

since the variance, < x(i)^2 >, of x(i) is 3. So y(j) has mean 0. Then its variance is the same as the average of the squares of its samples. Obviously, the average of its even samples is < x(i)^2 > = 3. The average of its odd samples is:

< y(2i + 1)^2 > = < k^2 x(i)^4 - 6 k^2 x(i)^2 + 9 k^2 > = k^2 < x(i)^4 > - 6 k^2 < x(i)^2 > + 9 k^2 = k^2 \frac{9}{5} 3^2 - 6 k^2 \cdot 3 + 9 k^2 = k^2 \frac{36}{5}    (4.62)

Here we made use of (3.185), on page 247, for the value of < x(i)^4 > in terms of \sigma^4. So, the variance of the odd samples of y(j) will be 3 if:

k = \sqrt{\frac{15}{36}} = \frac{\sqrt{15}}{6} = 0.6455    (4.63)
Example B4.11

Show that the noise signal y(j) you constructed in example 4.10 is white but not independent.

To show that the signal is white, we must show that it has 0 mean and its autocorrelation function is a delta function. We have already shown in example 4.10 that it has 0 mean. Its autocorrelation function for 0 shift (h = 0) is its variance, which was shown to be 3 in example 4.10. The autocorrelation function for shift h \geq 2 is expected to be 0, because the pairs of values that will be averaged will be from independently drawn values according to signal x(i). We have only to worry about shift h = 1, because clearly, in this case, the second member of a pair of such values depends on the value of the first member of the pair, according to (4.60). Let us consider the average of the product of such a pair of values:

< y(2i) y(2i + 1) > = < x(i) k ( x(i)^2 - 3 ) > = k < x(i)^3 > - 3 k < x(i) > = 0    (4.64)

Here we made use of the fact that x(i) are uniformly distributed numbers with 0 mean, and so their third moment must be 0. So, the autocorrelation function of y(j) is 0 for all shifts except shift 0. This makes it a delta function and its Fourier transform flat, ie noise y(j) is white.
Noise y(j), however, is clearly not independent by construction. Figure 4.6a shows the first 100 samples of a noise sequence x(i) we created. Figure 4.6b shows the first 100 samples of the corresponding noise sequence y(j). In total we created a sequence 1000 samples long. The mean of x(i) was computed to be 0.0182, and its variance 2.8652. The mean of y(j) was computed to be 0.0544, and its variance 2.9303. Figure 4.6c shows its autocorrelation function as a function of the shift h, computed using

C(h) \approx \frac{1}{N_h} \sum_{j=1}^{N-h} y(j) y(j + h)    (4.65)

where N is the total number of samples in the sequence and N_h is the number of pairs of samples at shift h.
Figure 4.6d shows all pairs of two successive samples of the sequence, plotted against each other. We can clearly see that the samples are not independent.
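The construction of y(j) and the autocorrelation estimate of equation (4.65) can be checked numerically with a short sketch like the following (Python/NumPy; the sequence length and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.uniform(-3.0, 3.0, N // 2)       # iid zero-mean uniform noise, variance 3

k = np.sqrt(15.0) / 6.0
y = np.empty(N)
y[0::2] = x                               # y(2i)   = x(i)
y[1::2] = k * (x ** 2 - 3.0)              # y(2i+1) = k (x(i)^2 - 3)

def autocorr(y, h):
    """Estimate C(h) ~ (1/N_h) sum_j y(j) y(j+h), as in equation (4.65)."""
    n_h = len(y) - h
    return np.sum(y[:n_h] * y[h:]) / n_h

# roughly [3, 0, 0, 0, 0, 0]: a delta function, as argued above
print([round(autocorr(y, h), 3) for h in range(6)])
```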
Figure 4.6: (a) The first 100 samples of the noise sequence x(i). (b) The first 100 samples of the noise sequence y(j). (c) The autocorrelation function of y(j) for the first 11 possible shifts (h = 0, 1, 2, ..., 10). It is clearly a delta function, indicating uncorrelated noise. (d) y(j + 1) plotted versus y(j).
Example B4.12

Consider an iid 1D zero-mean Gaussian noise signal x(i) with unit variance. From this construct a noise signal y(j) as follows:

y(2i) = x(i)
y(2i + 1) = \frac{x(i)^2 - 1}{\sqrt{2}}    (4.66)

Show that y(j) is zero-mean white noise with unit variance, which is not iid.
The noise is zero-mean: the mean value of the even samples is clearly 0 by construction. The mean value of the odd samples is:

< y(2i + 1) > = \frac{1}{\sqrt{2}} ( < x(i)^2 > - 1 ) = \frac{1}{\sqrt{2}} (1 - 1) = 0    (4.67)

Next, we must examine whether the signal has unit variance. The variance of the even samples is 1 by construction. The variance of the odd samples is:

< y(2i + 1)^2 > = \frac{1}{2} ( < x(i)^4 > + 1 - 2 < x(i)^2 > ) = \frac{1}{2} (3 + 1 - 2) = 1    (4.68)

Here we made use of (3.163), on page 242.
Then we must examine whether the signal may be used to represent white noise. To show that the signal has a white power spectrum, we must show that its autocorrelation function is a delta function. Its autocorrelation function for 0 shift (h = 0) is its variance, which is 1. The autocorrelation function for shift h \geq 2 is expected to be 0, because the pairs of values that will be averaged will be from independently drawn values according to signal x(i). We have only to worry about shift h = 1, because clearly, in this case, the second value of each pair of samples depends on the first value of the pair, according to (4.66). Let us consider the average of the product of such a pair of values:

< y(2i) y(2i + 1) > = \frac{1}{\sqrt{2}} < x(i) ( x(i)^2 - 1 ) > = \frac{1}{\sqrt{2}} ( < x(i)^3 > - < x(i) > ) = 0    (4.69)

Here we made use of the fact that x(i) are Gaussianly distributed numbers with 0 mean, and, therefore, their third moment must be 0 too. So, the autocorrelation function of y(j) is 0 for all shifts except shift 0. This makes it a delta function and its Fourier transform flat, ie noise y(j) is white.
Noise y(j), however, is clearly not independent by construction. Its even samples are clearly Gaussianly distributed, while its odd samples are not (see example 4.13).
Figure 4.7a shows the first 100 samples of a noise sequence x(i) we created. Figure 4.7b shows the first 100 samples of the corresponding noise sequence y(j). In total we created a sequence 1000 samples long. The mean of x(i) was computed to be 0.0264, and its variance 1.0973. The mean of y(j) was computed to be 0.0471, and its variance 1.1230. Figure 4.7c shows its autocorrelation function as a function of the shift h, computed using

C(h) \approx \frac{1}{N_h} \sum_{j=1}^{N-h} y(j) y(j + h)    (4.70)

where N is the total number of samples in the sequence and N_h is the number of pairs of samples at shift h.
Figure 4.7d shows all pairs of two successive samples of the sequence, plotted against each other. We can clearly see that the samples are not independent.
Figure 4.7: (a) The first 100 samples of the noise sequence x(i). (b) The first 100 samples of the noise sequence y(j). (c) The autocorrelation function of y(j) for the first 11 possible shifts (h = 0, 1, 2, ..., 10). It is clearly a delta function, indicating uncorrelated noise. (d) y(j) plotted versus y(j + 1).
Box 4.3. The probability density function of a function of a random variable

Assume that variable x is distributed according to probability density function p_1(x). Assume also that y = g(x). Finally, assume that the real roots of this equation are x_1, x_2, \ldots, x_n, ie for a specific value of y = y_0,

x_1 = g^{-1}(y_0), \ldots, x_n = g^{-1}(y_0)    (4.71)

Then the probability density function p_2(y) of y is

p_2(y) = \frac{p_1(x_1)}{|g'(x_1)|} + \frac{p_1(x_2)}{|g'(x_2)|} + \cdots + \frac{p_1(x_n)}{|g'(x_n)|}    (4.72)

where g'(x) = \frac{dg(x)}{dx}.
This formula is intuitively obvious: p_2(y_0) expresses the number of values of y that fall inside interval dy, around y_0. If we know that n different values of x give rise to the same value of y = y_0, we must consider all of them. Inside an interval dx around the first of these roots, there are p_1(x_1)dx values of x that will give rise to values of y about the y_0 value, inside an interval dy related to dx by dy/dx|_{x=x_1} = g'(x_1) \Rightarrow dy = g'(x_1)dx. Inside an interval dx around the second of these roots, there are p_1(x_2)dx values of x that will give rise to values of y about the y_0 value, inside an interval dy related to dx by dy = g'(x_2)dx, and so on. To obtain the density we need, we must sum up all these contributing densities. Each number of contributed values has first to be divided by the width in which it falls, in order to become a density. The width of the interval is given by the absolute value of the interval, eg the first of these widths is |g'(x_1)|dx. The dx of the denominator in each term cancels the dx of the numerator, and thus formula (4.72) follows.
Example B4.13

Work out the probability density function according to which the odd samples of sequence y(j) in example 4.12 are distributed.

We shall use (4.72) with

p_1(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}
g(x) = \frac{1}{\sqrt{2}} ( x^2 - 1 )
g'(x) = \frac{1}{\sqrt{2}} 2x = \sqrt{2}\, x    (4.73)
The roots of g(x) are:

y = \frac{1}{\sqrt{2}} ( x^2 - 1 ) \Rightarrow \sqrt{2}\, y + 1 = x^2 \Rightarrow x = \pm\sqrt{\sqrt{2}\, y + 1} \Rightarrow
x_1 = +\sqrt{\sqrt{2}\, y + 1}, \quad x_2 = -\sqrt{\sqrt{2}\, y + 1}    (4.74)

Then the probability density function p_2(y) of the odd samples of sequence y(j) in example 4.12 is:

p_2(y) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}\,\sqrt{\sqrt{2}\, y + 1}} e^{-\frac{\sqrt{2}\, y + 1}{2}} + \frac{1}{\sqrt{2\pi}\,\sqrt{2}\,\sqrt{\sqrt{2}\, y + 1}} e^{-\frac{\sqrt{2}\, y + 1}{2}} = \frac{1}{\sqrt{\pi}\,\sqrt{\sqrt{2}\, y + 1}} e^{-\frac{\sqrt{2}\, y + 1}{2}}    (4.75)

Figure 4.8 shows a plot of this function. This function is clearly not a Gaussian. So, signal y(j), of example 4.12, has a different probability density function for its even and its odd samples, and, therefore, it is not iid.
Figure 4.8: The probability density function of the odd samples of sequence y(j) defined by equation (4.66).
Example B4.14

Work out the probability density function according to which the odd samples of sequence y(j) in example 4.10 are distributed.

We shall use (4.72) with

p_1(x) = \begin{cases} \frac{1}{6} & \text{for } -3 \leq x \leq 3 \\ 0 & \text{elsewhere} \end{cases}
g(x) = \frac{\sqrt{15}}{6} ( x^2 - 3 )
g'(x) = \frac{\sqrt{15}}{6} 2x = \frac{\sqrt{15}}{3} x    (4.76)
The roots of g(x) are:

y = \frac{\sqrt{15}}{6} ( x^2 - 3 ) \Rightarrow \frac{6}{\sqrt{15}} y + 3 = x^2 \Rightarrow
x_1 = +\sqrt{\frac{6}{\sqrt{15}} y + 3}, \quad x_2 = -\sqrt{\frac{6}{\sqrt{15}} y + 3}    (4.77)

Then the probability density function p_2(y) of the odd samples of sequence y(j) in example 4.10 is:

p_2(y) = \begin{cases}
\frac{3}{\sqrt{15}\sqrt{\frac{6}{\sqrt{15}}y+3}} \cdot \frac{1}{3} & \text{for } -3 \leq \sqrt{\frac{6}{\sqrt{15}}y+3} \leq 3 \text{ and } -3 \leq -\sqrt{\frac{6}{\sqrt{15}}y+3} \leq 3 \\
\frac{3}{\sqrt{15}\sqrt{\frac{6}{\sqrt{15}}y+3}} \cdot \frac{1}{6} & \text{for } -3 \leq \sqrt{\frac{6}{\sqrt{15}}y+3} \leq 3 \text{ and } -\sqrt{\frac{6}{\sqrt{15}}y+3} \notin [-3, 3] \\
\frac{3}{\sqrt{15}\sqrt{\frac{6}{\sqrt{15}}y+3}} \cdot \frac{1}{6} & \text{for } \sqrt{\frac{6}{\sqrt{15}}y+3} \notin [-3, 3] \text{ and } -3 \leq -\sqrt{\frac{6}{\sqrt{15}}y+3} \leq 3 \\
0 & \text{for } \sqrt{\frac{6}{\sqrt{15}}y+3} \notin [-3, 3] \text{ and } -\sqrt{\frac{6}{\sqrt{15}}y+3} \notin [-3, 3]
\end{cases}    (4.78)

Let us examine the inequalities that appear as conditions of the above equation. First of all, we observe that if -3 \leq \sqrt{\frac{6}{\sqrt{15}}y+3} \leq 3, it is not possible for -\sqrt{\frac{6}{\sqrt{15}}y+3} not to be in the same interval, as if z is smaller than 3, -z will be bigger than -3. So, the second and third branches of the equation are impossible. Next, we note that \frac{6}{\sqrt{15}}y + 3 should always be positive, otherwise we shall not have a real root. With this understanding, we can work out the range of values of y:

-3 \leq \sqrt{\frac{6}{\sqrt{15}} y + 3} \leq 3 \Rightarrow 0 \leq \frac{6}{\sqrt{15}} y + 3 \leq 9 \Rightarrow -3 \leq \frac{6}{\sqrt{15}} y \leq 6 \Rightarrow -\frac{\sqrt{15}}{2} \leq y \leq \sqrt{15}    (4.79)

Then, the probability density function of y is:

p_2(y) = \begin{cases} \frac{1}{\sqrt{15}\sqrt{\frac{6}{\sqrt{15}}y+3}} & \text{for } -\frac{\sqrt{15}}{2} \leq y \leq \sqrt{15} \\ 0 & \text{otherwise} \end{cases}    (4.80)

Figure 4.9 shows a plot of this function. This function clearly does not represent a uniform distribution. So, signal y(j), of example 4.10, has a different probability density function for its even and its odd samples, and, therefore, is not iid.

Figure 4.9: The probability density function of the odd samples of sequence y(j) defined by equation (4.60).
Why is noise usually associated with high frequencies?

This is a misconception, particularly when the assumption of white noise is made. White noise affects all frequencies. However, in general, the deterministic component of the image has higher power in the low frequencies. If the noise has the same power in all frequencies, there is a cutoff frequency, beyond which the power spectrum of a noisy image is dominated by the noise spectrum (see figure 4.10 for the case of additive noise). It is this cutoff point that various methods try to identify, so that they rid the image of frequencies higher than that in order to remove the noise. Of course, useful high frequencies are also removed at the same time and noise at low frequencies remains, and that is why it is not possible to remove white noise entirely and create a perfect noise-free image.
Figure 4.10: The case of additive white noise. (a) Beyond a certain frequency, the spectrum of a noisy image is dominated by the spectrum of the noise. Ideally, we should use a filter that will eliminate all frequencies beyond the changeover frequency r. This is the idea behind low pass filtering a noisy image. The issue is which low pass filter one should use: (b) the ideal low pass filter with cutoff frequency R = r, which has to be implemented in the frequency domain, or (c) a filter conveniently defined in the real domain with imperfect frequency response, which may enhance some of the frequencies we wish to kill. (d) A compromise is a Gaussian filter that goes smoothly to zero in the real as well as in the frequency domain. However, the Gaussian filter will not treat all frequencies the same: some desirable frequencies will be subdued and some undesirable frequencies will be suppressed less than others.
How do we deal with multiplicative noise?

Multiplicative noise may easily be converted to additive noise by taking the logarithm of the noisy signal. If s(i) is a signal that is affected by multiplicative noise n(i), the result will be a noisy signal t(i) = s(i)n(i). To remove n(i) from t(i), we first take the logarithm of t(i): t(i) \rightarrow \log t(i) = \log s(i) + \log n(i). We may call \log s(i) \equiv \tilde{s}(i) and \log n(i) \equiv \tilde{n}(i). We then have a noisy signal \tilde{t}(i) = \tilde{s}(i) + \tilde{n}(i), from which we have to remove the additive noise \tilde{n}(i). To perform this task we may apply any method appropriate for dealing with additive noise. A case of multiplicative noise/interference is discussed in the next section.
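A minimal sketch of this log-transform idea (Python/NumPy; the moving-average smoother and the test signal are hypothetical stand-ins for whatever additive-noise method one prefers, and both signal and noise are assumed strictly positive so that the logarithm exists):

```python
import numpy as np

def remove_multiplicative_noise(t, smooth):
    """t = s * n. Take logs, remove the now-additive noise with the supplied
    smoother, and exponentiate back."""
    log_t = np.log(t)            # log t = log s + log n
    log_s_hat = smooth(log_t)    # any method suitable for additive noise
    return np.exp(log_s_hat)

# hypothetical example: a 5-sample moving-average smoother and a synthetic signal
smooth = lambda v: np.convolve(v, np.ones(5) / 5, mode="same")
s = np.linspace(1.0, 2.0, 100)
n = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.1, size=100)
s_hat = remove_multiplicative_noise(s * n, smooth)
```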
Box 4.4. The Fourier transform of the delta function

We insert the delta function into the definition formula of the Fourier transform:

\Delta(\omega) = \int_{-\infty}^{+\infty} \delta(t) e^{-j\omega t} dt = e^{-j\omega t} \Big|_{t=0} = 1    (4.81)

Here we made use of the following property of the delta function: when the delta function is multiplied with another function and integrated from -\infty to +\infty, it picks up the value of the other function at the point where the argument of the delta function is zero. In this particular case, the argument of the delta function is t and it is zero at t = 0.
Box 4.5. Wiener-Khinchine theorem

We shall show that the Fourier transform of the spatial autocorrelation function of a real-valued random field f(x, y) is equal to the spectral power density |\hat{F}(u, v)|^2 of the field.
The spatial autocorrelation function of f(x, y) is defined as

R_{ff}(\tilde{x}, \tilde{y}) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x + \tilde{x}, y + \tilde{y}) f(x, y)\, dx\, dy

Consider the Fourier transform \hat{R}_{ff}(u, v) of R_{ff}(\tilde{x}, \tilde{y}):

\hat{R}_{ff}(u, v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} R_{ff}(\tilde{x}, \tilde{y}) e^{-j(\tilde{x}u + \tilde{y}v)} d\tilde{x}\, d\tilde{y}
= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x + \tilde{x}, y + \tilde{y}) f(x, y) e^{-j(\tilde{x}u + \tilde{y}v)} dx\, dy\, d\tilde{x}\, d\tilde{y}

If we set s_1 \equiv x + \tilde{x} and s_2 \equiv y + \tilde{y}, so that \tilde{x} = s_1 - x and \tilde{y} = s_2 - y, we obtain:

\hat{R}_{ff}(u, v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(s_1, s_2) f(x, y) e^{-j((s_1 - x)u + (s_2 - y)v)} dx\, dy\, ds_1\, ds_2

The two double integrals on the right-hand side are separable, so we may write:

\hat{R}_{ff}(u, v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(s_1, s_2) e^{-j(s_1 u + s_2 v)} ds_1\, ds_2 \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y) e^{j(xu + yv)} dx\, dy

We recognise the first of the double integrals on the right-hand side of this equation to be the Fourier transform \hat{F}(u, v) of f(s_1, s_2) and the second double integral its complex conjugate \hat{F}^*(u, v), since f is real. So:

\hat{R}_{ff}(u, v) = \hat{F}(u, v) \hat{F}^*(u, v) = |\hat{F}(u, v)|^2
Is the assumption of Gaussian noise in an image justified?

According to the central limit theorem we discussed in Chapter 3, page 235, when several random numbers are added, the sum tends to be Gaussianly distributed. There are many sources of noise in an image, like instrument noise, quantisation noise, etc. We may, therefore, assume that all these noise components combined may be modelled as Gaussian noise, and that is why it is very common to assume that the noise in an image is Gaussian. Shot noise appears in special cases: in synthetic aperture radar (SAR) images due to the special imaging conditions, or in ordinary images due to degradations caused by specific sources, like, for example, damage of old photographs by insects or sprayed chemicals.
How do we remove shot noise?

We use various statistical filters, like rank order filtering or mode filtering.

What is a rank order filter?

A rank order filter is a filter the output value of which depends on the ranking of the pixels according to their grey values inside the filter window. The most common rank order filter is the median filter.

What is median filtering?

The median is the value which divides a distribution into two equally numbered populations. For example, if we use a 5 x 5 window, we have 25 grey values, which we order in an increasing sequence. Then the median is the thirteenth value. Median filtering has the effect of forcing
(a) Image with impulse noise (b) Image with additive Gaussian noise
(c) Median filtering of (a) (d) Median filtering of (b)
(e) Smoothing of (a) by averaging (f) Smoothing of (b) by averaging
Figure 4.11: We must use the right type of filtering for each type of noise: on the left, the image Officer damaged by impulse noise, and below it, attempts to remove this noise by median filtering and by spatial averaging; on the right, the same image damaged by zero-mean, additive, white, Gaussian noise, and below it, attempts to remove it by median filtering and by spatial averaging. Note that spatial averaging does not really work for impulse noise, and median filtering is not very effective for Gaussian noise.
points with distinct intensities to be more like their neighbours, thus eliminating intensity spikes which appear isolated.
Figure 4.11c shows image 4.11a processed with a median filter and with a window of size 5 x 5, while figure 4.11d shows image 4.11b (which contains Gaussian noise) having been processed in the same way. It is clear that the median filter removes the impulse noise almost completely.
What is mode filtering?

Mode filtering involves assigning to the central pixel the most common value inside the local window around the pixel (the mode of the histogram of the local values).
How do we reduce Gaussian noise?

We can remove Gaussian noise by smoothing the image. For example, we may replace the value of each pixel by the average value inside a small window around the pixel. Figures 4.11e and 4.11f show the result of applying this process to images 4.11a and 4.11b, respectively. The size of the window used is the same as for the median filtering of the same images, ie 5 x 5. We note that this type of filtering is much more effective for the Gaussian noise, but produces bad results in the case of impulse noise. This is a simple form of low pass filtering of the image. A better way to low pass filter the image is to use a Gaussian filter of size (2M + 1) x (2M + 1), rather than using a flat filter. Note that all low pass filters (like the Gaussian filter) are effectively averaging filters, computing a weighted average of the values inside the local window.
Example 4.15

We process an image by using windows of size 5 x 5. The grey pixel values inside a 5 x 5 subimage are:
15, 17, 15, 17, 16, 10, 8, 9, 18, 15, 16, 12, 14, 11, 15, 14, 15, 18, 100, 15, 14, 13, 12, 12, 17.
Which value would
(i) a local averaging,
(ii) a median and
(iii) a mode
filter assign to the central pixel of this subimage?

(i) The local averaging filter would assign to the central pixel of the subimage the average value of the pixels inside the 5 x 5 window, rounded to the nearest integer:

Average = \frac{1}{25} (15 + 17 + 15 + 17 + 16 + 10 + 8 + 9 + 18 + 15 + 16 + 12 + 14 + 11 + 15 + 14 + 15 + 18 + 100 + 15 + 14 + 13 + 12 + 12 + 17) = 17.52

So, the assigned value will be 18.
(ii) The median filter will assign to the central pixel the median value of the grey values inside the window. To identify the median value, we first rank all the grey values we are given:
8, 9, 10, 11, 12, 12, 12, 13, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 100
The 13th number in this sequence is the median, which is 15, and this is the value assigned to the central pixel by the median filter.
(iii) The mode in the above list of numbers is also 15, because this is the most frequent number. So, the mode filter will also assign value 15 to the central pixel.
We note that the outlier value 100, which most likely is the result of impulse noise, severely affected the output of the mean filter, but it did not affect the value of either the mode or the median filter. That is why an averaging filter (and in general a low pass convolution filter) is not appropriate for removing impulse noise.
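The three outputs of example 4.15 can be reproduced in a few lines (a sketch in Python/NumPy):

```python
import numpy as np
from collections import Counter

window = np.array([15, 17, 15, 17, 16, 10, 8, 9, 18, 15, 16, 12, 14, 11, 15,
                   14, 15, 18, 100, 15, 14, 13, 12, 12, 17])

mean_out = int(round(window.mean()))                 # 17.52 rounded -> 18
median_out = int(np.sort(window)[len(window) // 2])  # 13th of 25 ranked values -> 15
mode_out = Counter(window.tolist()).most_common(1)[0][0]   # most frequent value -> 15
print(mean_out, median_out, mode_out)
```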
Example 4.16

Work out the weights of Gaussian smoothing filters of size (2M + 1) x (2M + 1), for M = 2, 3 and 4.

The values of these filters will be computed according to function

g(r) = e^{-\frac{r^2}{2\sigma^2}}    (4.84)

where \sigma is a filter parameter, that has to be specified, and r^2 \equiv x^2 + y^2. Since the filter will be (2M + 1) x (2M + 1), the value of the filter at the truncation point will be given by g(M) = e^{-M^2/(2\sigma^2)} (see figure 4.12). We wish the filter to go smoothly to 0, so this final value of the filter should be small. Let us call it \epsilon. Parameter \sigma should be chosen so that:

\epsilon = e^{-\frac{M^2}{2\sigma^2}} \Rightarrow \ln\epsilon = -\frac{M^2}{2\sigma^2} \Rightarrow \sigma = \frac{M}{\sqrt{-2\ln\epsilon}}    (4.85)

For \epsilon = 0.01 we work out that \sigma should be 0.659 for M = 2, 0.989 for M = 3 and 1.318 for M = 4.
Figure 4.12: The circles represent the isocontours of the Gaussian function we use to compute the values of the (2M + 1) x (2M + 1) smoothing filter at the black dots. In this case, M = 2. Point O is at position (0, 0) and function (4.84) has value 1 there. The value of the function is set to 0 outside the 5 x 5 square. The places where this truncation creates the maximum step are points A, B, C and D. At those points, function (4.84) has value e^{-2^2/(2\sigma^2)}, as those points are at coordinates (\pm M, 0) and (0, \pm M).
Finally, we compute the values of the filter using (4.84) for all positions in a (2M + 1) x (2M + 1) grid, assuming that the (0, 0) point is the central cell of the grid. The values we find for M = 2, 3 and 4, respectively, are:
0.0001 0.0032 0.0100 0.0032 0.0001
0.0032 0.1000 0.3162 0.1000 0.0032
0.0100 0.3162 1.0000 0.3162 0.0100
0.0032 0.1000 0.3162 0.1000 0.0032
0.0001 0.0032 0.0100 0.0032 0.0001
0.0001 0.0013 0.0060 0.0100 0.0060 0.0013 0.0001
0.0013 0.0167 0.0774 0.1292 0.0774 0.0167 0.0013
0.0060 0.0774 0.3594 0.5995 0.3594 0.0774 0.0060
0.0100 0.1292 0.5995 1.0000 0.5995 0.1292 0.0100
0.0060 0.0774 0.3594 0.5995 0.3594 0.0774 0.0060
0.0013 0.0167 0.0774 0.1292 0.0774 0.0167 0.0013
0.0001 0.0013 0.0060 0.0100 0.0060 0.0013 0.0001
0.0001 0.0007 0.0032 0.0075 0.0100 0.0075 0.0032 0.0007 0.0001
0.0007 0.0056 0.0237 0.0562 0.0750 0.0562 0.0237 0.0056 0.0007
0.0032 0.0237 0.1000 0.2371 0.3162 0.2371 0.1000 0.0237 0.0032
0.0075 0.0562 0.2371 0.5623 0.7499 0.5623 0.2371 0.0562 0.0075
0.0100 0.0750 0.3162 0.7499 1.0000 0.7499 0.3162 0.0750 0.0100
0.0075 0.0562 0.2371 0.5623 0.7499 0.5623 0.2371 0.0562 0.0075
0.0032 0.0237 0.1000 0.2371 0.3162 0.2371 0.1000 0.0237 0.0032
0.0007 0.0056 0.0237 0.0562 0.0750 0.0562 0.0237 0.0056 0.0007
0.0001 0.0007 0.0032 0.0075 0.0100 0.0075 0.0032 0.0007 0.0001
To make sure that the filter will not alter the values of a flat patch, we normalise its values so that they sum up to 1, by dividing each of them by the sum of all. The result is:
0.0000 0.0012 0.0037 0.0012 0.0000
0.0012 0.0366 0.1158 0.0366 0.0012
0.0037 0.1158 0.3662 0.1158 0.0037
0.0012 0.0366 0.1158 0.0366 0.0012
0.0000 0.0012 0.0037 0.0012 0.0000
0.0000 0.0002 0.0010 0.0016 0.0010 0.0002 0.0000
0.0002 0.0027 0.0126 0.0210 0.0126 0.0027 0.0002
0.0010 0.0126 0.0586 0.0977 0.0586 0.0126 0.0010
0.0016 0.0210 0.0977 0.1629 0.0977 0.0210 0.0016
0.0010 0.0126 0.0586 0.0977 0.0586 0.0126 0.0010
0.0002 0.0027 0.0126 0.0210 0.0126 0.0027 0.0002
0.0000 0.0002 0.0010 0.0016 0.0010 0.0002 0.0000
0.0000 0.0001 0.0003 0.0007 0.0009 0.0007 0.0003 0.0001 0.0000
0.0001 0.0005 0.0022 0.0052 0.0069 0.0052 0.0022 0.0005 0.0001
0.0003 0.0022 0.0092 0.0217 0.0290 0.0217 0.0092 0.0022 0.0003
0.0007 0.0052 0.0217 0.0516 0.0688 0.0516 0.0217 0.0052 0.0007
0.0009 0.0069 0.0290 0.0688 0.0917 0.0688 0.0290 0.0069 0.0009
0.0007 0.0052 0.0217 0.0516 0.0688 0.0516 0.0217 0.0052 0.0007
0.0003 0.0022 0.0092 0.0217 0.0290 0.0217 0.0092 0.0022 0.0003
0.0001 0.0005 0.0022 0.0052 0.0069 0.0052 0.0022 0.0005 0.0001
0.0000 0.0001 0.0003 0.0007 0.0009 0.0007 0.0003 0.0001 0.0000
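The unnormalised and normalised weight tables above may be regenerated as follows (a sketch in Python/NumPy, using \sigma = M/\sqrt{-2\ln\epsilon} from equation (4.85)):

```python
import numpy as np

def gaussian_filter_weights(M, eps=0.01, normalise=True):
    """(2M+1)x(2M+1) Gaussian weights, with sigma chosen so that the value at
    the truncation point (M, 0) equals eps, as in example 4.16."""
    sigma = M / np.sqrt(-2.0 * np.log(eps))
    x = np.arange(-M, M + 1)
    xx, yy = np.meshgrid(x, x)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum() if normalise else g

print(np.round(gaussian_filter_weights(2, normalise=False), 4))  # first 5x5 table
print(np.round(gaussian_filter_weights(2), 4))                   # normalised 5x5 table
```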
Example 4.17

The image Fun Fair of figure 4.13 is corrupted with Gaussian noise. Reduce its noise with the help of the 5 x 5 filter produced in example 4.16.

First we create an empty grid the same size as the original image. To avoid boundary effects, we do not process a stripe 2 pixels wide all around the input image. These pixels retain their original value in the output image. For processing all other pixels, we place the filter with its centre coinciding with the pixel, the value of which is to be recalculated, multiply each filter value with the corresponding pixel value under it, sum up the 25 products and assign the result as the new value of the central pixel in the output image. The window is shifted by one pixel in both directions until all pixels are processed. This process is shown schematically in figure 4.14. In relation to this figure, the output values at the central pixels of the two positions of the window shown in (a) are given by:
o_{22} = g_{00}f_{2,2} + g_{01}f_{2,1} + g_{02}f_{2,0} + g_{03}f_{2,-1} + g_{04}f_{2,-2}
+ g_{10}f_{1,2} + g_{11}f_{1,1} + g_{12}f_{1,0} + g_{13}f_{1,-1} + g_{14}f_{1,-2}
+ g_{20}f_{0,2} + g_{21}f_{0,1} + g_{22}f_{0,0} + g_{23}f_{0,-1} + g_{24}f_{0,-2}
+ g_{30}f_{-1,2} + g_{31}f_{-1,1} + g_{32}f_{-1,0} + g_{33}f_{-1,-1} + g_{34}f_{-1,-2}
+ g_{40}f_{-2,2} + g_{41}f_{-2,1} + g_{42}f_{-2,0} + g_{43}f_{-2,-1} + g_{44}f_{-2,-2}    (4.86)

And:

o_{43} = g_{21}f_{2,2} + g_{22}f_{2,1} + g_{23}f_{2,0} + g_{24}f_{2,-1} + g_{25}f_{2,-2}
+ g_{31}f_{1,2} + g_{32}f_{1,1} + g_{33}f_{1,0} + g_{34}f_{1,-1} + g_{35}f_{1,-2}
+ g_{41}f_{0,2} + g_{42}f_{0,1} + g_{43}f_{0,0} + g_{44}f_{0,-1} + g_{45}f_{0,-2}
+ g_{51}f_{-1,2} + g_{52}f_{-1,1} + g_{53}f_{-1,0} + g_{54}f_{-1,-1} + g_{55}f_{-1,-2}
+ g_{61}f_{-2,2} + g_{62}f_{-2,1} + g_{63}f_{-2,0} + g_{64}f_{-2,-1} + g_{65}f_{-2,-2}    (4.87)
The result of applying this process to the image of figure 4.13a is shown in 4.13b.

(a) Image with Gaussian noise (b) Image after Gaussian filtering
Figure 4.13: Gaussian filtering applied to remove Gaussian noise from an image.
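The procedure of example 4.17 may be sketched as follows (Python/NumPy; the image is assumed to be a 2D array of grey values, and since the Gaussian mask is symmetric, the mask is not flipped, so correlating and convolving give the same result):

```python
import numpy as np

def smooth_with_filter(image, g):
    """Filter the image with the (2M+1)x(2M+1) mask g; pixels within M of the
    border keep their original values, as in example 4.17."""
    M = g.shape[0] // 2
    out = image.astype(float).copy()
    for i in range(M, image.shape[0] - M):
        for j in range(M, image.shape[1] - M):
            window = image[i - M:i + M + 1, j - M:j + M + 1]
            out[i, j] = np.sum(g * window)   # 25 products summed for a 5x5 mask
    return out
```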
Figure 4.14: (a) The input image with grey values g_{ij}. (b) The 5 x 5 smoothing filter with values f_{ij}. (c) The result of processing the input image with the filter. The pixels marked with crosses have unreliable values, as their values are often chosen arbitrarily. For example, they could be set identical to the input values, or they might be calculated by assuming that those pixels have full neighbourhoods, with the missing neighbours having value 0, or values as if the image were repeated in all directions.
Can we have weighted median and mode filters like we have weighted mean filters?

Yes. The weights of a median or a mode indicate how many times the corresponding number should be repeated. Figure 4.15 shows an image with impulse noise added to it, and two versions of improving it by unweighted and by weighted median filtering. The weights used are given in table 4.1.

0 1 1 1 0
1 2 2 2 1
1 2 4 2 1
1 2 2 2 1
0 1 1 1 0

Table 4.1: Weights that might be used in conjunction with a median or a mode filter.
(a) Image with impulse noise (b) Image detail with no noise
(c) Unweighted median filter (d) Weighted median filter
(e) Detail of (c) (f) Detail of (d)
Figure 4.15: Median filtering applied to the Officer with impulse noise (where 10% of the pixels are set to grey level 255). The weighted version produces better results, as may be judged by looking at some image detail and comparing it with the original shown in (b).
Example 4.18

The sequence of 25 numbers of example 4.15 was created by reading sequentially the grey values of an image in a 5 x 5 window. You are asked to compute the weighted median of that image, using the weights of table 4.1. What value will this filter give for the particular window of this example?

First we write the grey values of example 4.15 in their original spatial arrangement:

15 17 15 17 16
10 8 9 18 15
16 12 14 11 15
14 15 18 100 15
14 13 12 12 17

Then we use the weights to repeat each entry of the above table the corresponding number of times and thus create the sequence of numbers that we shall have to rank:
17, 15, 17, 10, 8, 8, 9, 9, 18, 18, 15, 16, 12, 12, 14, 14, 14, 14, 11, 11, 15, 14, 15, 15, 18, 18, 100, 100, 15, 13, 12, 12.
Ranking these numbers in increasing order yields:
8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 12, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 16, 17, 17, 18, 18, 18, 18, 100, 100.
There are 32 numbers in this sequence and the median is between the 16th and the 17th number. Both these numbers are equal to 14, so the median is 14. (If the numbers were different, their average rounded to the nearest integer would have been considered.)
The most frequently occurring number is 15, so the output of the mode filter would be 15.
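The weighted median and mode of example 4.18 may be computed with a sketch like this (Python/NumPy, with the window of example 4.15 and the weights of table 4.1):

```python
import numpy as np
from collections import Counter

values = np.array([[15, 17, 15, 17, 16],
                   [10, 8, 9, 18, 15],
                   [16, 12, 14, 11, 15],
                   [14, 15, 18, 100, 15],
                   [14, 13, 12, 12, 17]])
weights = np.array([[0, 1, 1, 1, 0],
                    [1, 2, 2, 2, 1],
                    [1, 2, 4, 2, 1],
                    [1, 2, 2, 2, 1],
                    [0, 1, 1, 1, 0]])

expanded = np.repeat(values.ravel(), weights.ravel())   # repeat each value weight times
expanded.sort()
n = len(expanded)                                        # 32 values here
median = int(round((expanded[n // 2 - 1] + expanded[n // 2]) / 2.0))   # 16th, 17th -> 14
mode = Counter(expanded.tolist()).most_common(1)[0][0]                 # 15
print(median, mode)
```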
Can we filter an image by using the linear methods we learnt in Chapter 2?

Yes, often low, band and high pass image filtering is done by convolving the image with a suitable filter. This is the reason we prefer the filters to be finite in extent: finite convolution filters may be implemented as matrix operators applied to the image.
Example 4.19

You have a 3 x 3 image which may be represented by a 9 x 1 vector. Derive a matrix which, when it operates on this image, smooths its columns by averaging every three successive pixels, giving them weights 1/4, 1/2, 1/4, and assigning the result to the central pixel. To deal with the border pixels, assume that the image is repeated periodically in all directions.

Let us say that the original image is
\begin{pmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{pmatrix}    (4.88)

and its smoothed version is:

\begin{pmatrix} \tilde{g}_{11} & \tilde{g}_{12} & \tilde{g}_{13} \\ \tilde{g}_{21} & \tilde{g}_{22} & \tilde{g}_{23} \\ \tilde{g}_{31} & \tilde{g}_{32} & \tilde{g}_{33} \end{pmatrix}    (4.89)

Let us also say that the smoothing matrix we wish to identify is A, with elements a_{ij}:

\begin{pmatrix} \tilde{g}_{11} \\ \tilde{g}_{21} \\ \tilde{g}_{31} \\ \tilde{g}_{12} \\ \tilde{g}_{22} \\ \tilde{g}_{32} \\ \tilde{g}_{13} \\ \tilde{g}_{23} \\ \tilde{g}_{33} \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{19} \\
a_{21} & a_{22} & \ldots & a_{29} \\
a_{31} & a_{32} & \ldots & a_{39} \\
a_{41} & a_{42} & \ldots & a_{49} \\
a_{51} & a_{52} & \ldots & a_{59} \\
a_{61} & a_{62} & \ldots & a_{69} \\
a_{71} & a_{72} & \ldots & a_{79} \\
a_{81} & a_{82} & \ldots & a_{89} \\
a_{91} & a_{92} & \ldots & a_{99}
\end{pmatrix}
\begin{pmatrix} g_{11} \\ g_{21} \\ g_{31} \\ g_{12} \\ g_{22} \\ g_{32} \\ g_{13} \\ g_{23} \\ g_{33} \end{pmatrix}    (4.90)

From the above equation we have:

\tilde{g}_{11} = a_{11}g_{11} + a_{12}g_{21} + a_{13}g_{31} + a_{14}g_{12} + a_{15}g_{22} + a_{16}g_{32} + a_{17}g_{13} + a_{18}g_{23} + a_{19}g_{33}    (4.91)

From the definition of the smoothing mask, we have:

\tilde{g}_{11} = \frac{1}{4}g_{31} + \frac{1}{2}g_{11} + \frac{1}{4}g_{21}    (4.92)

Comparison of equations (4.91) and (4.92) shows that we must set:

a_{11} = \frac{1}{2}, \quad a_{12} = \frac{1}{4}, \quad a_{13} = \frac{1}{4}, \quad a_{14} = a_{15} = \ldots = a_{19} = 0    (4.93)

Working in a similar way for a few more elements, we can see that the matrix we wish to identify has the form:

A = \begin{pmatrix}
\frac{1}{2} & \frac{1}{4} & \frac{1}{4} & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{4} & \frac{1}{4} & \frac{1}{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\
0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\
0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{4} & \frac{1}{4} & \frac{1}{2}
\end{pmatrix}    (4.94)
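The matrix of equation (4.94) can be built and applied as follows (a sketch in Python/NumPy; the 3 x 3 test image is hypothetical):

```python
import numpy as np

block = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])
# A operates on the image written column by column as a 9x1 vector,
# so it is block diagonal with one block per image column
A = np.kron(np.eye(3), block)

g = np.arange(1, 10).reshape(3, 3).astype(float)   # hypothetical 3x3 image
g_vec = g.flatten(order="F")                        # stack the columns
smoothed = (A @ g_vec).reshape(3, 3, order="F")
print(smoothed)
```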
How do we deal with mixed noise in images?

If an image is affected by additive Gaussian as well as impulse noise, then we may use the \alpha-trimmed filter: after we rank the grey values inside the smoothing window, we keep only the N(1 - \alpha) values that are closest to the median value, where N is the number of values in the window. We then compute, from these retained values only, the mean value that we shall assign to the central pixel of the window.
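A sketch of the \alpha-trimmed mean for a single window (Python/NumPy; the window of example 4.15 and the value \alpha = 0.2 are used only as an illustration):

```python
import numpy as np

def alpha_trimmed_mean(window, alpha):
    """Keep the N(1-alpha) values closest to the median and average them."""
    values = np.sort(np.asarray(window, dtype=float).ravel())
    N = len(values)
    keep = int(round(N * (1.0 - alpha)))
    median = np.median(values)
    # order the values by their distance from the median and keep the closest ones
    closest = values[np.argsort(np.abs(values - median))[:keep]]
    return closest.mean()

window = [15, 17, 15, 17, 16, 10, 8, 9, 18, 15, 16, 12, 14, 11, 15,
          14, 15, 18, 100, 15, 14, 13, 12, 12, 17]
print(alpha_trimmed_mean(window, alpha=0.2))
```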
Can we avoid blurring the image when we are smoothing it?

Yes. Some methods we may use are:
(i) edge adaptive smoothing;
(ii) mean shift smoothing;
(iii) anisotropic diffusion.
What is edge adaptive smoothing?

When we smooth an image, we place a window around a pixel, compute an average value inside the window and assign it to the central pixel. If the window we use happens to span two different regions, the boundary between the two regions will be blurred. In edge preserving smoothing, we place several windows around the pixel, having the pixel in all possible relative positions with respect to the window centre. Inside each window we compute the variance of the pixel values. We select the window with the minimum variance. We compute the average (weighted or not) of the pixels inside that window and assign that value to the pixel under consideration. Figure 4.16 shows the example of an image that has a diagonal edge. In this example, window C is expected to be the most homogeneous of all windows to which the pixel identified with the cross belongs. (Window C will have the least variance.) Then the new value for the marked pixel will be computed from the values inside this window.
Figure 4.16: This grid represents a 14 x 14 image with an edge. The cross identifies the pixel for which we have to compute a new value. Let us say that the new value is computed inside a 5 x 5 window. Conventionally, we would use window A. However, in edge preserving smoothing, we may consider 25 windows of size 5 x 5, all of which contain the pixel, but in different locations in relation to the centre of the window. Two of those windows are shown here, identified as windows B and C.

Figure 4.17 shows some noisy images, their smoothed versions with a flat or a Gaussian filter and their smoothed versions with an edge preserving flat or Gaussian filter. Box 4.6 shows how to compute the local variance in an efficient way.
(a) Original Greek Flag (b) Flat filter (c) Edge preserving flat
(d) Gaussian filter (e) Edge preserving Gaussian
(f) Original Roof Tiles (g) Flat filter (h) Edge preserving flat
(i) Gaussian filter (j) Edge preserving Gaussian
Figure 4.17: Noisy images and their smoothed versions by a 5 x 5 averaging window placed around each pixel or with its position selected according to the local variance.
Box 4.6. Efficient computation of the local variance

Let us say that we wish to compute the variance of numbers x_i, where i = 1, 2, \ldots, N. Let us denote by <> the averaging operator. The mean of these numbers then is \mu \equiv < x_i >. The variance is:

\sigma^2 \equiv < (x_i - \mu)^2 > = < x_i^2 + \mu^2 - 2\mu x_i > = < x_i^2 > + \mu^2 - 2\mu < x_i > = < x_i^2 > + \mu^2 - 2\mu^2 = < x_i^2 > - < x_i >^2    (4.95)

To select then the window with the least variance that contains each pixel, we use the following algorithm.
Step 1: Convolve the original image I with a flat averaging window of the dimensions you have preselected, say 5 x 5. Call the output array A.
Step 2: Square the elements of array A. Call the output array B.
Step 3: Construct an array the same size as the input image where the value of each pixel is squared. Call this array C.
Step 4: Convolve array C with a flat averaging window of the dimensions you have preselected, say 5 x 5. Call this array D.
Step 5: Subtract array B from array D. Call this array E.
Step 6: When you want to select a new value for pixel (i, j), consider all pixels inside a window of the preselected size, say 5 x 5, centred at (i, j), and identify the pixel with the smallest value in array E. Use that pixel as the centre of the window from which you will compute the new value of pixel (i, j), from the values of the original image I.
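Steps 1-6 of Box 4.6 may be sketched as follows (Python, with SciPy's uniform_filter playing the role of the flat averaging window; the window size and the treatment of the image border are simplifying assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance_map(I, size=5):
    """E = <I^2> - <I>^2 computed with flat averaging windows (Box 4.6)."""
    A = uniform_filter(I.astype(float), size)    # Step 1: local mean
    B = A ** 2                                   # Step 2
    C = I.astype(float) ** 2                     # Step 3
    D = uniform_filter(C, size)                  # Step 4: local mean of squares
    return D - B                                 # Step 5

def edge_preserving_smooth(I, size=5):
    """Step 6: re-centre the averaging window on the lowest-variance neighbour."""
    E = local_variance_map(I, size)
    A = uniform_filter(I.astype(float), size)
    M = size // 2
    out = I.astype(float).copy()
    for i in range(M, I.shape[0] - M):
        for j in range(M, I.shape[1] - M):
            patch = E[i - M:i + M + 1, j - M:j + M + 1]
            di, dj = np.unravel_index(np.argmin(patch), patch.shape)
            out[i, j] = A[i - M + di, j - M + dj]   # mean of the chosen window
    return out
```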
How does the mean shift algorithm work?

According to this algorithm, a pixel is represented by a triplet in a 3D space, where two dimensions represent the position of the pixel in the image and the third dimension is used to measure its brightness. A pixel (i, j) then in this space is represented by point (x_{ij}, y_{ij}, g_{ij}), where, to begin with, x_{ij} = i, y_{ij} = j and g_{ij} is its grey value. The pixels in this 3D space are allowed to move and create agglomerations. The movement of the pixels happens iteratively, where at each iteration step a new vector (x_{ij}, y_{ij}, g_{ij}) is computed for each pixel (i, j). At the m + 1 iteration, this new vector is given by
x_{ij}^{m+1} = \frac{\sum_{(k,l)\in N_{ij}} x_{kl}^m\, g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}{\sum_{(k,l)\in N_{ij}} g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}

y_{ij}^{m+1} = \frac{\sum_{(k,l)\in N_{ij}} y_{kl}^m\, g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}{\sum_{(k,l)\in N_{ij}} g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}    (4.96)

g_{ij}^{m+1} = \frac{\sum_{(k,l)\in N_{ij}} g_{kl}^m\, g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}{\sum_{(k,l)\in N_{ij}} g\!\left(\frac{(x_{ij}^m - x_{kl}^m)^2}{h_x^2}\right) g\!\left(\frac{(y_{ij}^m - y_{kl}^m)^2}{h_y^2}\right) g\!\left(\frac{(g_{ij}^m - g_{kl}^m)^2}{h_g^2}\right)}    (4.97)
where h_x, h_y and h_g are appropriately chosen scaling constants, N_{ij} is a neighbourhood of pixel (i, j), defined as a 3D sphere using the Euclidean metric, and

g(x) \equiv e^{-x} \quad \text{or} \quad g(x) \equiv \begin{cases} 1 & \text{for } |x| \leq w \\ 0 & \text{otherwise} \end{cases}    (4.98)

with w being a parameter specifying the size of the flat kernel. The iterations may be repeated for a prespecified number of times. At the end of the last iteration, pixel (i, j) takes, as grey value, the value g_{ij}^{m_{final}}, rounded to the nearest integer.
Figures 4.18 and 4.19 show the results of the mean shift algorithm after it was applied for a few iterations to some noisy images. The algorithm was run with h_x = h_y = 15 and h_g = 25.5 (that is, h_g = 0.1 for grey values scaled in the range [0, 1]). Neighbourhood N_{ij} was the full image. This algorithm converges only when all pixels have the same values.
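One iteration of equations (4.96) and (4.97), with the exponential kernel of (4.98), may be sketched as follows (Python/NumPy; as in the experiments above, the neighbourhood is taken to be the whole image, which makes the cost quadratic in the number of pixels):

```python
import numpy as np

def mean_shift_iteration(x, y, g, hx, hy, hg):
    """x, y, g are 1D arrays holding the current (x_ij, y_ij, g_ij) of every pixel."""
    kernel = lambda d2, h2: np.exp(-d2 / h2)     # g(x) = e^{-x}
    x_new, y_new, g_new = np.empty_like(x), np.empty_like(y), np.empty_like(g)
    for p in range(len(x)):
        # weight of every other pixel with respect to pixel p
        w = (kernel((x[p] - x) ** 2, hx ** 2) *
             kernel((y[p] - y) ** 2, hy ** 2) *
             kernel((g[p] - g) ** 2, hg ** 2))
        W = w.sum()
        x_new[p] = (w * x).sum() / W
        y_new[p] = (w * y).sum() / W
        g_new[p] = (w * g).sum() / W
    return x_new, y_new, g_new
```

Repeating this function for the desired number of iterations and rounding the final g values to the nearest integer gives the smoothed image.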
(a) Original (b) Iteration 1 (c) Iteration 2
(d) Iteration 3 (e) Iteration 4 (f) Iteration 5
Figure 4.18: Noisy Leonidas (128 x 128 in size) and its smoothed versions by the mean shift algorithm, after a few iterations. As the iterations progress, the noise reduces, but also significant image details are lost, as small regions are incorporated in larger neighbouring regions.
(a) Original (b) Iteration 1 (c) Iteration 2
(d) Iteration 3 (e) Iteration 4 (f) Iteration 5
(g) Original (h) Iteration 1 (i) Iteration 2
(j) Iteration 3 (k) Iteration 4 (l) Iteration 5
Figure 4.19: Noisy images (128 x 128 in size) and their smoothed versions by the mean shift algorithm.
What is anisotropic diffusion?

It is an algorithm that generalises Gaussian filtering, used to reduce additive Gaussian noise (see Box 4.7), to make it adaptive to the local image gradient (see Box 4.10), so that edges are preserved.
Box 4.7. Scale space and the heat equation

Let us imagine that we smooth an image with low pass Gaussian filters of increasing standard deviation \sigma. This way we create a stack of images of progressively lower and lower detail. We may view this stack of images as the image representation in a 3D space, where two axes are the (x, y) image axes and the third axis is standard deviation \sigma, which in this context is referred to as scale. This 3D space is known as scale space. Figure 4.20 shows an example image and some of its smoothed versions with filters of increasing scale. We can see that as the scale increases more and more image features disappear, while the features that survive are the most prominent image features, but appear very blurred. The places where the borders of the distinct image regions meet become progressively blurred and the gradient magnitude that measures the contrast of the image in those places gradually diffuses, so only the borders with the strongest contrast survive, albeit in a blurred way. Figure 4.21 shows the gradient magnitude images corresponding to those shown in figure 4.20.
This diffusion of image information observed in scale space may also be seen in a cross section of the stack of images created, as shown in figure 4.22. We can see how the grey value from a location diffuses to neighbouring locations as the scale increases. It can be shown (see example 4.21) that this diffusion of grey value from one pixel to other pixels can be modelled by the heat diffusion equation

\frac{\partial I(x, y; \sigma)}{\partial \sigma} = \sigma \left( \frac{\partial^2 I(x, y; \sigma)}{\partial x^2} + \frac{\partial^2 I(x, y; \sigma)}{\partial y^2} \right)    (4.99)

where I(x, y; \sigma) is the image seen as a function defined in the 3D scale space. The bracketed expression on the right-hand side of (4.99) is known as the Laplacian of the image, sometimes denoted as \Delta I or \nabla^2 I. Equation (4.99) may, therefore, be written in all the following equivalent forms (see also Box 4.8):

\frac{\partial I(x, y; \sigma)}{\partial \sigma} = \sigma \Delta I(x, y; \sigma) = \sigma \nabla^2 I(x, y; \sigma) = \sigma\, div(grad\, I(x, y; \sigma)) = \sigma \nabla \cdot \nabla I(x, y; \sigma)    (4.100)

In Physics, \sigma corresponds to time and the image grey value to temperature.
(a) Father and daughters (b) \sigma = 1.98, M = 6
(c) \sigma = 4.28, M = 13 (d) \sigma = 8.24, M = 25
(e) \sigma = 16.48, M = 50 (f) \sigma = 32.95, M = 100
Figure 4.20: As the scale of the filter with which we smooth an image increases, less and less information survives. The size of the image is 362 x 512. The filters used were designed to have size (2M + 1) x (2M + 1) with discontinuity \epsilon = 0.01, using the method described in example 4.16, on page 329.
(a) Father and daughters (b) \sigma = 1.98
(c) \sigma = 4.28 (d) \sigma = 8.24
(e) \sigma = 16.48 (f) \sigma = 32.95
Figure 4.21: The gradient magnitude images of figure 4.20, each one individually scaled to the range [0, 255]. As the scale of the filter with which we smooth the image increases, the transition between regions becomes less and less sharp and this manifests itself in very broad stripes of large gradient magnitude.
Figure 4.22: Along the vertical axis we measure scale \sigma, which increases from \sigma = 0 (original image, bottom) to \sigma = 32.95 (top), in 101 steps, inclusive. The horizontal axis is the x axis of the image. Left: a cross section of the 3D image representation in scale space. Right: a cross section of the corresponding gradient magnitude images. Note how the strength of the gradient magnitude weakens as the edges diffuse. That is why we had to scale each gradient image individually to be able to visualise it in figure 4.21.
Box 4.8. Gradient, Divergence and Laplacian

The gradient of a function f(x, y) is a vector denoted and defined as:

grad(f(x, y)) \equiv \nabla f(x, y) \equiv \left( \frac{\partial f(x, y)}{\partial x}, \frac{\partial f(x, y)}{\partial y} \right)^T    (4.101)

The gradient vector of a function identifies the direction of maximum change of the function.
The divergence of a vector u \equiv (u_x, u_y) is a function, defined and denoted as:

div(u) \equiv \frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y}    (4.102)

If the vector is thought of as a velocity vector, its divergence measures the total flow away from the point of its definition.
The Laplacian of a function f(x, y) is the divergence of its gradient vector:

\Delta f(x, y) \equiv div(grad(f(x, y))) = \nabla \cdot \nabla f(x, y) = \nabla^2 f(x, y) = \nabla \cdot \left( \frac{\partial f(x, y)}{\partial x}, \frac{\partial f(x, y)}{\partial y} \right)^T = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2}    (4.103)

Thus, the Laplacian of a function is equal to the sum of its second derivatives. These formulae generalise trivially to higher dimensions.
Example B4.20

Show that for function g(x, y; \sigma) = e^{-(x^2+y^2)/(2\sigma^2)}/(2\pi\sigma^2), the following is correct:

\frac{1}{\sigma} \frac{\partial g(x, y; \sigma)}{\partial \sigma} = \frac{\partial^2 g(x, y; \sigma)}{\partial x^2} + \frac{\partial^2 g(x, y; \sigma)}{\partial y^2}    (4.104)
We start by computing the derivative on the left-hand side of (4.104):

\frac{\partial g(x, y; \sigma)}{\partial \sigma} = -\frac{2}{2\pi\sigma^3} e^{-\frac{x^2+y^2}{2\sigma^2}} + \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}} \frac{2x^2 + 2y^2}{2\sigma^3} = \frac{1}{2\pi} \left( \frac{x^2+y^2}{\sigma^5} - \frac{2}{\sigma^3} \right) e^{-\frac{x^2+y^2}{2\sigma^2}}    (4.105)

Let us then compute the first derivative of g(x, y; \sigma) with respect to x:

\frac{\partial g(x, y; \sigma)}{\partial x} = \frac{1}{2\pi\sigma^2} \left( -\frac{2x}{2\sigma^2} \right) e^{-\frac{x^2+y^2}{2\sigma^2}} = -\frac{x}{2\pi\sigma^4} e^{-\frac{x^2+y^2}{2\sigma^2}}    (4.106)

The second derivative with respect to x is:

\frac{\partial^2 g(x, y; \sigma)}{\partial x^2} = -\frac{1}{2\pi\sigma^4} e^{-\frac{x^2+y^2}{2\sigma^2}} + \frac{x^2}{2\pi\sigma^6} e^{-\frac{x^2+y^2}{2\sigma^2}}    (4.107)

In a similar way, the second derivative with respect to y is worked out to be:

\frac{\partial^2 g(x, y; \sigma)}{\partial y^2} = -\frac{1}{2\pi\sigma^4} e^{-\frac{x^2+y^2}{2\sigma^2}} + \frac{y^2}{2\pi\sigma^6} e^{-\frac{x^2+y^2}{2\sigma^2}}    (4.108)

Combining (4.107) and (4.108), we can work out the right-hand side of (4.104):

\frac{\partial^2 g(x, y; \sigma)}{\partial x^2} + \frac{\partial^2 g(x, y; \sigma)}{\partial y^2} = -\frac{2}{2\pi\sigma^4} e^{-\frac{x^2+y^2}{2\sigma^2}} + \frac{x^2+y^2}{2\pi\sigma^6} e^{-\frac{x^2+y^2}{2\sigma^2}} = \frac{1}{2\pi\sigma} \left( \frac{x^2+y^2}{\sigma^5} - \frac{2}{\sigma^3} \right) e^{-\frac{x^2+y^2}{2\sigma^2}}    (4.109)

Upon comparison with (4.105), equation (4.104) follows.
Example B4.21

The embedding of an image into the 3D scale space is achieved by smoothing the image with a Gaussian low pass filter g(x, y; \sigma), with increasing values of \sigma. Show that the change in the grey value of a pixel as \sigma changes is expressed by equation (4.99).

We may say that the value of pixel (x, y), when the image has been smoothed with filter g(x, y; \sigma), is given by the convolution integral:

I(x, y; \sigma) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(u, v; \sigma) I(x - u, y - v)\, du\, dv

Differentiating with respect to \sigma, and using the result of example 4.20, we obtain:

\frac{\partial I(x, y; \sigma)}{\partial \sigma} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \frac{\partial g(u, v; \sigma)}{\partial \sigma} I(x - u, y - v)\, du\, dv
= \sigma \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \frac{\partial^2 g(u, v; \sigma)}{\partial u^2} I(x - u, y - v)\, du\, dv + \sigma \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \frac{\partial^2 g(u, v; \sigma)}{\partial v^2} I(x - u, y - v)\, du\, dv    (4.112)

We shall see in Chapter 6 (example 6.49, page 622) that convolution of the image with the second derivative of the Gaussian along the x direction yields an estimate of the second derivative of the image along the x axis, and convolution of the image with the second derivative of the Gaussian along the y direction yields an estimate of the second derivative of the image along the y axis. Thus, equation (4.99) is valid.
Box 4.9. Differentiation of an integral with respect to a parameter

Assume that the definite integral I(\lambda) depends on a parameter \lambda, as follows:

I(\lambda) = \int_{a(\lambda)}^{b(\lambda)} f(x; \lambda)\, dx    (4.113)

Its derivative with respect to \lambda is given by the following formula, known as the Leibniz rule:

\frac{dI(\lambda)}{d\lambda} = \frac{db(\lambda)}{d\lambda} f(b(\lambda); \lambda) - \frac{da(\lambda)}{d\lambda} f(a(\lambda); \lambda) + \int_{a(\lambda)}^{b(\lambda)} \frac{\partial f(x; \lambda)}{\partial \lambda}\, dx    (4.114)
Box 4.10. From the heat equation to the anisotropic diffusion algorithm

When we smooth an image I with a Gaussian filter of standard deviation \sigma, the value of a pixel diffuses according to equation (4.100), which may be rewritten in a more general form, with the constant diffusion coefficient replaced by a function of the local gradient, such as e^{-\frac{|\nabla I(x,y;\sigma)|}{b}}. Note that when |\nabla I(x, y; \sigma)| >> b > 0, the exponent is large and this function becomes very small. This happens along the direction of maximum change, ie orthogonal to the direction of an image edge. If |\nabla I(x, y; \sigma)| << b, e^{-\frac{|\nabla I(x,y;\sigma)|}{b}} is large and the diffusion of the grey values along this direction is facilitated. This happens parallel to lines of constant grey value, ie parallel to image edges. Thus, the modified heat equation for anisotropic diffusion becomes:

\frac{\partial I(x, y; \sigma)}{\partial \sigma} = div\left( e^{-\frac{|\nabla I(x,y;\sigma)|}{b}}\, grad(I(x, y; \sigma)) \right)    (4.116)
How do we perform anisotropic diffusion in practice?

Assume that I(i, j) is the image we wish to process.
Step 0: Decide upon a value C, which indicates that the C% weakest gradient magnitude values are assumed to be due to noise and the rest due to genuine image discontinuities. This may be decided by looking at the histogram of the gradient magnitude values.
Step 1: At each iteration, compute the gradient magnitude of the image pixels by using one of the filters for this purpose (discussed in Chapter 6, pages 596 and 608). Create the histogram of the values of the magnitude of all gradient vectors. Starting from the first bin, accumulate the entries of the successive bins until C% of pixels have been accounted for. The value of the last bin is noted as threshold B.
Step 2: For each image pixel (i, j), compute the following quantities:

∇_N(i, j) ≡ I(i − 1, j) − I(i, j)
∇_S(i, j) ≡ I(i + 1, j) − I(i, j)
∇_E(i, j) ≡ I(i, j + 1) − I(i, j)
∇_W(i, j) ≡ I(i, j − 1) − I(i, j)        (4.117)
Step 3: For each image pixel (i, j), compute the following quantities:

c_N(i, j) ≡ g(∇_N(i, j))
c_S(i, j) ≡ g(∇_S(i, j))
c_E(i, j) ≡ g(∇_E(i, j))
c_W(i, j) ≡ g(∇_W(i, j))        (4.118)

where

g(x) ≡ e^{−(x/B)²}        (4.119)
Step 4: Update the value of pixel (i, j) using

I(i, j)_new = I(i, j)_old + λ [c_N(i, j) ∇_N(i, j) + c_S(i, j) ∇_S(i, j) + c_E(i, j) ∇_E(i, j) + c_W(i, j) ∇_W(i, j)]        (4.120)

where 0 < λ ≤ 0.25.
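As an illustration, Steps 0-4 may be sketched in Python as follows. This is a minimal sketch only: the percentile-based choice of B, the use of np.gradient for Step 1 and the wrap-around treatment of the image borders by np.roll are our simplifications, not the book's implementation.

import numpy as np

def anisotropic_diffusion(image, iterations=20, C=50.0, lam=0.25):
    # C   : percentage of weakest gradient magnitudes attributed to noise (Steps 0-1)
    # lam : update constant lambda of equation (4.120), 0 < lam <= 0.25
    I = image.astype(float)
    for _ in range(iterations):
        # Step 1: threshold B as the C-th percentile of the gradient magnitude
        gy, gx = np.gradient(I)
        B = np.percentile(np.sqrt(gx**2 + gy**2), C)
        B = max(B, 1e-6)                      # guard against a zero threshold
        # Step 2: differences with the four neighbours (np.roll wraps around at the borders)
        dN = np.roll(I, 1, axis=0) - I        # I(i-1, j) - I(i, j)
        dS = np.roll(I, -1, axis=0) - I       # I(i+1, j) - I(i, j)
        dE = np.roll(I, -1, axis=1) - I       # I(i, j+1) - I(i, j)
        dW = np.roll(I, 1, axis=1) - I        # I(i, j-1) - I(i, j)
        # Step 3: conduction coefficients g(x) = exp(-(x/B)^2), equation (4.119)
        cN, cS, cE, cW = (np.exp(-(d / B) ** 2) for d in (dN, dS, dE, dW))
        # Step 4: update, equation (4.120)
        I = I + lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return I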
Figure 4.23 shows the results of applying this algorithm to some noisy images. The value of C was chosen to be 50 and λ = 0.25. Note that this algorithm does not converge, and so one has to run it for several iterations and assess the results as desired.
Figure 4.23: The noisy images of figures 4.18a, 4.19a and 4.19g, after 3, 7, 14 and 20 iterations of anisotropic diffusion. The weakest half of the gradient magnitudes were attributed to fluctuations in the grey value due to noise, so C = 50 was used. Parameter λ in (4.120) was set to 0.25.
4.3 Reducing low frequency interference
When does low frequency interference arise?
Low frequency interference arises when the image has been captured under variable illumi-
nation. This is almost always true for indoor scenes, because of the inverse square law of
light propagation: the parts of the imaged scene that are furthest away from the illuminating
source receive much less light than those near the source. This is not true for outdoor scenes,
under natural light, because the sun is so far away, that all points of an imaged scene may
be considered at equal distance from it.
Can variable illumination manifest itself in high frequencies?
Yes. Shadows are a form of variable illumination. They may appear in both indoor and out-
door images. Parts of the scene in shadow do not receive light directly from the illuminating
source, but indirectly via diffusion of the light by the surrounding objects. This is called ambient light. Indoor scenes suffer from both shadows and gradually varying illumination, while outdoor scenes suffer only from shadows. Shadows create sudden changes of brightness, which may be mistaken for real object boundaries. Their effect cannot be corrected by the
methods discussed in this section. However, they may be taken care of by the locally adaptive
methods discussed later on in this chapter.
In which other cases may we be interested in reducing low frequencies?
It is also possible that we may be interested in the small details of an image, or details that manifest themselves in high frequencies. The process of enhancing the high frequencies of an image is called sharpening and it may be achieved by high pass linear filtering. Small image details may also be enhanced by using nonlinear filters based on local image statistics.
What is the ideal high pass filter?

The ideal high pass filter is schematically shown in figure 4.24, in the frequency domain.

Figure 4.24: The spectrum of the ideal high pass filter is 1 everywhere, except inside a circle of radius r_0 in the frequency domain, where it is 0. On the right, a cross-section of such a filter. Here r ≡ √(μ² + ν²).
Filtering with such a filter in the frequency domain is equivalent to convolving in the real domain with the function that has this filter as its Fourier transform. There is no finite function which corresponds to the ideal high pass filter (see example 4.3, on page 299). So, often, high pass filters are defined in the real domain, for convenience of use rather than optimality in performance, just like we do for low pass filters. Convenient high pass filters, with good properties in the frequency domain, are the various derivatives of the Gaussian function, truncated and discretised (see example 4.17, on page 331). The first derivatives of the Gaussian function (4.84), on page 329, are:

g_x(x, y) ≡ x e^{−(x²+y²)/(2σ²)}        g_y(x, y) ≡ y e^{−(x²+y²)/(2σ²)}        (4.121)

Note that constant factors in these definitions have been omitted as they are irrelevant, given that the weights of the truncated and discretised filters created from them will be normalised. These filters, used as convolution filters, will enhance the horizontal and vertical transitions of brightness in the image.
The second derivative based Gaussian filter, derived from (4.84), is

g_r ≡ (1 − r²/(2σ²)) e^{−r²/(2σ²)}        (4.122)

where r ≡ √(x² + y²). This function may be used to enhance spots and small blobs in the image.
Example 4.22
Apply to the image of figure 4.25 filters (4.121) and (4.122).

Figure 4.25: A building in Ghent. Size 256 × 323 pixels.
Let us consider that the filters we shall use will be (2M + 1) × (2M + 1) in size. When filter g_x(x) is truncated, its value is M e^{−M²/(2σ²)}. We wish this value to be equal to ε. This way, we may work out the value of σ, given M:

ε = M e^{−M²/(2σ²)}  ⇒  ln(ε/M) = −M²/(2σ²)  ⇒  σ = M / √(2(ln M − ln ε))        (4.123)
The values of filter g_x may be computed by allowing x to take values −M, −M + 1, . . . , −1, 0, 1, 2, . . . , M − 1, M. The values of g_x should sum up to 0, as this is a high pass filter, so a signal that consists of only a zero frequency component (ie a flat signal) should yield 0 as output. Further, if we wish to have control over the amount by which transitions in brightness are enhanced, we should make sure that all positive weights sum up to 1 and all negative weights sum up to −1, and multiply the whole filter with a factor A that allows us to control the level of enhancement. In general, the weights computed from continuous function g_x(x) may not sum up to 0. If we divide the positive weights by their sum and the negative weights by the absolute value of their own sum, we ensure that both the above conditions are fulfilled. Using this methodology and for ε = 0.01, and M = 2, M = 3 and M = 4, we constructed the following filters:

For M = 2 the filter is: −0.01, −0.27, 0.00, 0.27, 0.01
For M = 3 the filter is: −0.01, −0.16, −0.53, 0.00, 0.53, 0.16, 0.01
For M = 4 the filter is: −0.01, −0.10, −0.45, −0.69, 0.00, 0.69, 0.45, 0.10, 0.01

After normalising (so that the positive weights add up to 1, while the negative weights add up to −1), the filters are:

For M = 2 the filter is: −0.04, −0.96, 0.00, 0.96, 0.04
For M = 3 the filter is: −0.01, −0.23, −0.76, 0.00, 0.76, 0.23, 0.01
For M = 4 the filter is: −0.01, −0.08, −0.36, −0.55, 0, 0.55, 0.36, 0.08, 0.01
The above filters may be used on their own as 1D convolution filters, or they may be combined with the 1D version of the smoothing filter developed in example 4.17, applied in the orthogonal direction, to form 2D filters that smooth along one direction while enhancing the brightness transitions along the other. Note that filters g_x and g_y differ only in the direction along which they are applied. The 1D versions of smoothing filter (4.84), on page 329, are:

For M = 2: 0.006, 0.191, 0.605, 0.191, 0.006
For M = 3: 0.004, 0.052, 0.242, 0.404, 0.242, 0.052, 0.004
For M = 4: 0.003, 0.022, 0.096, 0.227, 0.303, 0.227, 0.096, 0.022, 0.003
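A small Python sketch of how such 1D filters may be constructed is given below. It follows equation (4.123) and the normalisation of the positive and negative weights described above; the function names and defaults are our own, and the choice of σ for the smoothing filter of example 4.17 is not reproduced (σ is simply passed in).

import numpy as np

def gx_filter(M, eps=0.01):
    # sigma chosen so that the value of the truncated filter at x = M equals eps, eq. (4.123)
    sigma = M / np.sqrt(2.0 * (np.log(M) - np.log(eps)))
    x = np.arange(-M, M + 1, dtype=float)
    w = x * np.exp(-x**2 / (2.0 * sigma**2))      # samples of g_x, eq. (4.121)
    w[w > 0] /= w[w > 0].sum()                    # positive weights sum to +1
    w[w < 0] /= abs(w[w < 0].sum())               # negative weights sum to -1
    return w

def smoothing_filter(M, sigma):
    # 1D Gaussian smoothing filter with weights normalised to sum to 1
    x = np.arange(-M, M + 1, dtype=float)
    w = np.exp(-x**2 / (2.0 * sigma**2))
    return w / w.sum()

# A separable 2D filter that smooths horizontally and enhances vertical transitions
# may be formed as the outer product of the two 1D filters, for example:
# kernel = np.outer(gx_filter(2), smoothing_filter(2, 0.6))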
Figure 4.26 shows the output of applying the 1D version of filter (4.84) along the horizontal direction, followed by filter (4.121) applied along the vertical direction to image 4.25, for various values of M, in order to enhance its horizontal details. Notice that such a filter may create negative outputs, responding with an absolutely large, but negative, number to transitions in brightness from bright to dark and with a large positive number to transitions in brightness from dark to bright. To avoid discriminating
between these two types of transition, the absolute value of the filter output is taken. Then, in order to visualise the results, we use the histogram of the output values in order to select two thresholds: any value below the low threshold t_1 is set to 0, while any value above the high threshold t_2 is set to 255. The values in between are linearly mapped to the range [0, 255]:

g_new = 0                                              if g_old ≤ t_1
g_new = 255                                            if g_old ≥ t_2
g_new = ⌊ (g_old − t_1)/(t_2 − t_1) × 255 + 0.5 ⌋      if t_1 < g_old < t_2        (4.124)

This type of stretching allows a much better visualisation of the results than straightforward mapping of the output values to the range [0, 255], as it allows us to remove the effect of outliers.
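In code, the mapping of equation (4.124) may be written as below (a minimal sketch; the function name and the use of numpy are our own):

import numpy as np

def stretch(values, t1, t2):
    # equation (4.124): clip below t1 to 0, above t2 to 255, linear stretch in between
    v = np.asarray(values, dtype=float)
    out = np.floor((v - t1) / (t2 - t1) * 255.0 + 0.5)
    return np.clip(out, 0, 255).astype(np.uint8)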
Figure 4.26: Enhancing the horizontal details of the building in Ghent, by low pass filtering along the horizontal direction and high pass filtering along the vertical one. Panels correspond to M = 2, 3, 4 and 5.
Figure 4.27 shows the output of applying filter (4.84) along the vertical direction, followed by filter (4.121) applied along the horizontal direction to image 4.25, for various values of M, in order to enhance its vertical details. As above, the absolute value of the filter output is considered and scaled for visualisation.

Figure 4.27: Enhancing the vertical details of the building in Ghent, by low pass filtering along the vertical direction and high pass filtering along the horizontal one. Panels correspond to M = 2, 3, 4 and 5.
To construct filter g_r we work as follows. Note that the value of σ for this filter determines the radius r at which its values change sign. So, when we select it, we must consider the size of the spots we wish to enhance. The example filters we present next have been computed by selecting σ = M/2 in (4.122) and allowing x and y to take values 0, ±1, ±2, . . . , ±M. The weights of this filter have to sum up to 0, so after we compute them, we find their sum, Σ say, and we subtract Σ/(2M + 1)² from each weight.
The filters that result in this way, for M = 2, M = 3 and M = 4, are:
0.0913 0.0755 0.0438 0.0755 0.0913
0.0755 0.2122 0.3468 0.2122 0.0755
0.0438 0.3468 1.0918 0.3468 0.0438
0.0755 0.2122 0.3468 0.2122 0.0755
0.0913 0.0755 0.0438 0.0755 0.0913
0.1008 0.0969 0.0804 0.0663 0.0804 0.0969 0.1008
0.0969 0.0436 0.1235 0.2216 0.1235 0.0436 0.0969
0.0804 0.1235 0.3311 0.0409 0.3311 0.1235 0.0804
0.0663 0.2216 0.0409 1.1010 0.0409 0.2216 0.0663
0.0804 0.1235 0.3311 0.0409 0.3311 0.1235 0.0804
0.0969 0.0436 0.1235 0.2216 0.1235 0.0436 0.0969
0.1008 0.0969 0.0804 0.0663 0.0804 0.0969 0.1008
0.1032 0.1018 0.0955 0.0832 0.0759 0.0832 0.0955 0.1018 0.1032
0.1018 0.0886 0.0362 0.0501 0.0940 0.0501 0.0362 0.0886 0.1018
0.0955 0.0362 0.1462 0.3187 0.3429 0.3187 0.1462 0.0362 0.0955
0.0832 0.0501 0.3187 0.1321 0.2760 0.1321 0.3187 0.0501 0.0832
0.0759 0.0940 0.3429 0.2760 1.1033 0.2760 0.3429 0.0940 0.0759
0.0832 0.0501 0.3187 0.1321 0.2760 0.1321 0.3187 0.0501 0.0832
0.0955 0.0362 0.1462 0.3187 0.3429 0.3187 0.1462 0.0362 0.0955
0.1018 0.0886 0.0362 0.0501 0.0940 0.0501 0.0362 0.0886 0.1018
0.1032 0.1018 0.0955 0.0832 0.0759 0.0832 0.0955 0.1018 0.1032
Figure 4.28: Enhancing the blob-like details of the building in Ghent, by high pass filtering with filter (4.122). Panels correspond to M = 2, 3, 4 and 5.
Figure 4.28 shows the output of applying filter (4.122) to image 4.25, for various values of M, in order to enhance its blob-like details. To avoid discriminating between dark or bright blob-like details in the image, the absolute value of the filter output is considered before it is scaled in the range [0, 255] for displaying.
How can we enhance small image details using nonlinear filters?

The basic idea of such algorithms is to enhance the local high frequencies. These high frequencies may be identified by considering the level of variation present inside a local window, or by suppressing the low frequencies. This leads to the algorithms of unsharp masking and retinex, which has been inspired by the human visual system (retinex = retina + cortex). Both algorithms may be used as global ones or as locally adaptive ones.
What is unsharp masking?
This algorithm subtracts from the original image a blurred version of it, so that only high
frequency details are left, which are subsequently used to form the enhanced image. The
blurred version is usually created by convolving the original image with a Gaussian mask, like
one of those defined in example 4.17.
The use of a smoothing filter creates a wide band of pixels around the image, that have either to be left unprocessed or omitted from the final result. As the filters we use here are quite big, such a band around the image would result in neglecting a significant fraction of the image. So, we apply some correction procedure that allows us to use all image pixels: we create an array the same size as the image, with all its elements having value 1; then we convolve this array with the same filter we use for the image and divide the result of the convolution of the image by the result of the convolution of the array of 1s, pixel by pixel. This way the value of a pixel near the border of the image is computed from the available neighbours it has, with weights of the filter that always sum up to 1, even if the neighbourhood used is not complete.
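A possible implementation of this border correction, assuming scipy's Gaussian filter with zero padding outside the image, is sketched below; the names and parameters are illustrative only.

import numpy as np
from scipy.ndimage import gaussian_filter

def normalised_blur(image, sigma):
    # Low pass version of the image with the border correction described above:
    # the blurred image is divided, pixel by pixel, by the blurred array of ones,
    # so that near the borders the weights that actually fall on the image sum up to 1.
    img = image.astype(float)
    ones = np.ones_like(img)
    blurred = gaussian_filter(img, sigma, mode='constant', cval=0.0)
    weight = gaussian_filter(ones, sigma, mode='constant', cval=0.0)
    return blurred / weight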
Figure 4.29 shows an original image and its enhanced version by unsharp masking it, using a smoothing Gaussian window of size 121 × 121. Figure 4.30 shows another example where either a Gaussian filter was used to create the low pass version of the image, or the mean grey value of the image was considered to be its low pass version. The histograms of the enhanced values are also shown in order to demonstrate how thresholds t_1 and t_2 were selected for applying equation (4.124), on page 354, to produce the displayable result.
How can we apply the unsharp masking algorithm locally?
In the local application of the unsharp masking algorithm, we consider a local window. From the value of a pixel we subtract the value the same pixel has in the low pass filtered version of the image. The difference from the global algorithm is that now the low pass filtered version has been produced by using a small window. The residual is multiplied with an amplifying constant if it is greater than a threshold. The threshold allows one to suppress small high frequency fluctuations which are probably due to noise. The low pass version of the image
(a) Original (b) Global, Gaussian 121 × 121
Figure 4.29: Unsharp masking A Street in Malta shown in (a) (size 512 × 512). (b) Global algorithm, where the low pass version of the image was obtained by convolving it with a 121 × 121 Gaussian window. Residuals below −50 were set to 0 and above 50 to 255. The in-between values were linearly stretched to the range [0, 255].
may be created by convolving the original image either with a flat averaging window, or with a Gaussian window. Figure 4.31 shows the results of applying this algorithm to image Leaves, with threshold 15, a local window of size 21 × 21 and with an amplifying constant of 2. In these results, any values outside the range [0, 255] were collapsed either to 0 or to 255, accordingly. Figures 4.32a and 4.32b show the enhancement of image 4.29a, using a Gaussian and a flat window, respectively, for the estimation of its low pass version.
How does the locally adaptive unsharp masking work?
In the locally adaptive unsharp masking, the amplification factor is selected according to the local variance of the image.
Let us say that the low pass grey value at (x, y) is m(x, y), the variance of the pixels inside a local window is σ(x, y), and the value of pixel (x, y) is f(x, y). We may enhance the variance inside each such window by using a transformation of the form

g(x, y) = A [f(x, y) − m(x, y)] + m(x, y)        (4.125)

where A is some scalar.
We would like areas which have low variance to have their variance amplified most. So, we choose the amplification factor A inversely proportional to σ(x, y),

A = kM / σ(x, y)        (4.126)

where k is a constant, and M is the average grey value of the whole image. The value of the pixel is not changed if the difference f(x, y) − m(x, y) is above a certain threshold.
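A rough sketch of this adaptive scheme is given below. It uses a flat local window for m(x, y), the local standard deviation for σ(x, y), and parameter names of our own; it follows the description above only as an illustration.

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_unsharp(image, window=21, k=0.5, threshold=15):
    f = image.astype(float)
    m = uniform_filter(f, size=window)                 # local mean m(x, y)
    var = uniform_filter(f**2, size=window) - m**2     # local variance
    sigma = np.sqrt(np.maximum(var, 1e-12))            # local spread sigma(x, y)
    A = k * f.mean() / sigma                           # amplification factor, eq. (4.126)
    residual = f - m
    # pixels with a residual above the threshold are left unchanged,
    # the others are enhanced according to equation (4.125)
    return np.where(np.abs(residual) > threshold, f, A * residual + m)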
(a) Original (b) Gaussian (c) Mean; (d) and (e): histograms of the residual values
Figure 4.30: (a) The image Leaves of size 460 × 540. (b) Unsharp masking it by using a Gaussian window of size 121 × 121 to produce a smoothed version of it, which is subtracted from the original image. In (d) the histogram of the residual values. The result was produced by linearly stretching the range of values [−75, 75], while letting values outside this range become either 0 or 255. (c) Unsharp masking the original image by simply removing from each pixel the mean grey value of the image. In (e) the histogram of the residual values. The result was produced by linearly stretching the range of values [−50, 100], while letting values outside this range become either 0 or 255.
(a) Original (b) Local, flat window (c) Local, Gaussian window
Figure 4.31: Unsharp masking applied locally to image (a). (b) The low pass version of each image patch was created by convolving the original image with an averaging window of size 21 × 21. (c) The low pass version of each image patch was created by convolving the original image with a Gaussian window of radius 10. For both results, only differences larger than 15 were multiplied with a factor of 2.
Figure 4.32 shows the results of the various versions of the unsharp masking algorithm applied to an image. Figure 4.33 demonstrates the effect of selecting the range of values that will be linearly stretched to the range [0, 255], or simply allowing out of range values to be set to either 0 or 255, without bothering to check the histogram of the resultant values.
How does the retinex algorithm work?
There are many algorithms referred to with the term retinex. The simplest one discussed
here is also known as logarithmic transform or single scale retinex.
This algorithm consists of two basic ingredients:
(i) local grey value normalisation by division with the local mean value;
(ii) conversion into a logarithmic scale that spreads more the dark grey values and less the
bright values (see Box 4.11).
The transformation of the original grey value f(x, y) to a new grey value g(x, y) is expressed as:

g(x, y) = ln(f(x, y) + 1) − ln f̄(x, y) = ln[(f(x, y) + 1) / f̄(x, y)]        (4.127)

Note the necessity to add 1 to the image function to avoid having to take the logarithm of 0. (This means that if we wish to scale the image values to be in the range (0, 1], we must divide them by 256.) Function f̄(x, y) is computed by convolving the image with a large Gaussian smoothing filter. The filter is chosen to be large, so that very little detail is left in f̄(x, y). The novelty of the retinex algorithm over unsharp masking is effectively the use of logarithms of the grey image values. Figure 4.34 shows the results of applying this algorithm to the original image of 4.29a.
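A minimal single scale retinex sketch along these lines, with scipy's Gaussian filter standing in for the large smoothing mask, could look as follows (function name and parameter values are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(image, sigma=60):
    f = image.astype(float)
    f_bar = gaussian_filter(f + 1.0, sigma)   # heavily smoothed version, kept positive
    g = np.log(f + 1.0) - np.log(f_bar)       # equation (4.127)
    return g                                  # for display, stretch g as in (4.124)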
Box 4.11. Which are the grey values that are stretched most by the retinex
algorithm?
Let us consider a difference δg in the grey values of the output image, and a corresponding grey level difference δf in the grey values of the input image. Because of equation (4.127), we may write:

δg ≃ δf / f        (4.128)

This relationship indicates that when f is small (dark image patches), a fixed difference in grey values δf will appear larger in the output image, while when f is large, the same difference in grey values δf will be reduced. This imitates what the human visual system does, which is known to be more discriminative in dark grey levels than in bright ones. One may easily work this out from the psychophysical law of Weber-Fechner. This law says that

ΔI / I ≃ 0.02        (4.129)

where ΔI is the minimum grey level difference which may be discriminated by the human eye when the brightness level is I. Since the ratio ΔI/I is constant, at smaller values of I (darker greys), we can discriminate smaller differences in I.
(a) Local, Gaussian (b) Local, flat
(c) Adaptive, Gaussian, k = 0.5 (d) Adaptive, flat, k = 0.5
Figure 4.32: Unsharp masking image 4.29a. (a) Local algorithm, where the low pass version of the image was obtained by convolution with a Gaussian mask of size 21 × 21. The amplification factor was 2 and differences below 3 meant that the pixel value was not changed. (b) As in (a), but a flat 21 × 21 pixels window was used to obtain the low pass version of the image. (c) The adaptive algorithm where the amplification factor is given by (4.126) with k = 0.5. Value m(x, y) used in (4.125) was obtained with a 21 × 21 Gaussian window. Differences above 15 were not enhanced. Finally, only enhanced values in the range [−75, 275] were linearly stretched to the [0, 255] range; those below −75 were set to 0 and those above 275 were set to 255. (d) As in (c), but a 21 × 21 flat window was used to obtain the value of m(x, y). The range of linearly stretched enhanced values was [−100, 200].
Adaptive, Gaussian, k = 1.5 (top row); Adaptive, Gaussian, k = 3 (bottom row)
Figure 4.33: Adaptive unsharp masking. On the left, the enhanced values were simply truncated if they were outside the range [0, 255]. On the right, the histogram of the enhanced values was inspected and two thresholds were selected manually. Any value outside the range of the thresholds was either set to 0 or to 255. Values within the two thresholds were linearly stretched to the range [0, 255]. This is very important, particularly for large values of k, which may produce extreme enhanced values. The selected range of values for linear stretching, from top to bottom, respectively, was: [−50, 250] and [−200, 400].
(a) Retinex 61 × 61 (b) Retinex 121 × 121
(c) Retinex 241 × 241 (d) Retinex 361 × 361
Figure 4.34: Retinex enhancement of the street in Malta. The high frequency details are enhanced by first taking the logarithm of the image and then removing from it its smoothed version, obtained by convolving it with a Gaussian mask of size (a) 61 × 61, (b) 121 × 121, (c) 241 × 241 and (d) 361 × 361. For the top two panels, equation (4.124) was applied with t_1 = −200 and t_2 = 150, while for the bottom two panels it was applied with t_1 = −300 and t_2 = 150. These thresholds were selected by visually inspecting the histograms of the enhanced values.
How can we improve an image which suffers from variable illumination?
The type of illumination variation we are interested in here is due to the inverse square law
of the propagation of light. Indeed, according to the laws of physics, the intensity of light
reduces according to the inverse of the square of the distance away from the lighting source.
This may cause problems in two occasions:
(i) when the lighting source is very directional and strong, like when we are capturing an image indoors with the light coming from a window somewhere outside the field of view of the camera;
(ii) when we are interested in performing very accurate measurements using the grey image values. Examples of such applications arise when we use photometric stereo, or when we perform industrial inspection that relies on the accurate estimation of the colour of the inspected product.
In both cases (i) and (ii), the problem can be dealt with if we realise that every image function f(x, y) is the product of two factors: an illumination function i(x, y) and a reflectance function r(x, y) that is intrinsic to the imaged surface:

f(x, y) = i(x, y) r(x, y)        (4.130)

To improve the image in the first case, we may use homomorphic filtering. To improve the image in the second case, we may apply a procedure called flatfielding.
What is homomorphic filtering?

A homomorphic filter enhances the high frequencies and suppresses the low frequencies, so that the variation in the illumination is reduced, while edges (and details) are sharpened.
Illumination is generally of uniform nature and yields low-frequency components in the Fourier transform of the image. Different materials (objects), on the other hand, imaged next to each other, cause sharp changes of the reflectance function, which cause sharp transitions in the intensity of the image. These sharp changes are associated with high-frequency components. We can try to separate these two factors by first taking the logarithm of equation (4.130), so that the two effects are additive rather than multiplicative: ln f(x, y) = ln i(x, y) + ln r(x, y).
The homomorphic filter is applied to this logarithmic image. The cross-section of a homomorphic filter looks like the one shown in figure 4.35.
Figure 4.35: A cross-section H(r) of a homomorphic filter as a function of polar frequency, r ≡ √(μ² + ν²).
Figure 4.36a shows two images with smoothly varying illumination from left to right. The results after homomorphic filtering, shown in figures 4.36b, constitute clear improvements, with the effect of variable illumination greatly reduced and several details, particularly in the darker parts of the images, made visible.

(a) Original images (b) After homomorphic filtering
Figure 4.36: These images were captured indoors, with the light of the window coming from the right. The light propagates according to the inverse square law, so its intensity changes gradually as we move to the left of the image.
These results were obtained by applying, to the logarithm of the original image, a filter with the following frequency response function:

h(μ, ν) = 1 / (1 + e^{−s(√(μ²+ν²) − r_0)}) + A        (4.131)

with s = 1, r_0 = 128 and A = 10. The parameters of this filter are related as follows to the parameters γ_H and γ_L of figure 4.35:

γ_L = 1 / (1 + e^{s r_0}) + A,        γ_H = 1 + A        (4.132)
What is photometric stereo?
In photometric stereo we combine images captured by the same camera, but illuminated by
directional light coming from several different directions, in order to work out the orientation
of the illuminated surface patch in relation to some coordinate system. The basic point, on
which photometric stereo relies, is the observation that the intensity of light received by a
surface patch depends on the relative orientation of the surface with respect to the direction
of illumination. Exploiting the variation of greyness a pixel exhibits in images captured under
dierent illumination directions, but by the same camera and from the same viewing direction
and distance, one can work out the exact orientation of the surface patch depicted by the
pixel. The basic assumption is that the variation in greyness, observed for the same pixel, is
entirely due to the variation in the relative orientation the corresponding surface patch has
with respect to the various illumination sources. In practice, however, part of the variation
will also be due to the inverse square law of the propagation of light, and if one ignores that,
erroneous estimates of the surface orientation will be made. So, an important first step, before applying such algorithms, is to flatfield the images used.
What does flatfielding mean?
It means to correct an image so that it behaves as if it were captured under illumination of
uniform intensity throughout the whole extent of the image.
How is flatfielding performed?

The cases in which flatfielding is required usually arise when the images are captured under
controlled conditions, as it happens in systems of visual industrial inspection, or in photo-
metric stereo. In such cases, we have the opportunity to capture also a reference image, by
imaging, for example, a uniformly coloured piece of paper, under the same imaging conditions
as the image of interest. Then we know that any variation in grey values across this reference
image must be due to variation in illumination and noise. The simplest thing to do is to
view the reference image as a function g(x, y), where (x, y) are the image coordinates and g
is the grey value, and fit this function with a low order polynomial in x and y. This way the high frequency noise is smoothed out, while the low order polynomial captures the variation of illumination across the field of view of the camera. Then the image of interest has to be
divided point by point by this low order polynomial function, that models the illumination
field, in order to be corrected for the variable illumination. One might divide the image of
interest by the raw values of the reference image, point by point, but this may amplify noise.
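A sketch of this procedure in Python is given below; the plain least squares polynomial fit and the normalisation of the coordinates are our own choices, not a prescription from the book.

import numpy as np

def flatfield(image, reference, order=2):
    # fit a low order 2D polynomial to the reference image (the picture of a uniformly
    # coloured sheet) and divide the image of interest by the fitted illumination field
    ref = reference.astype(float)
    rows, cols = ref.shape
    y, x = np.mgrid[0:rows, 0:cols]
    x = x.ravel() / cols
    y = y.ravel() / rows
    # design matrix with all monomials x^p * y^q, p + q <= order
    terms = [x**p * y**q for p in range(order + 1) for q in range(order + 1 - p)]
    design = np.stack(terms, axis=1)
    coeffs, *_ = np.linalg.lstsq(design, ref.ravel(), rcond=None)
    illumination = (design @ coeffs).reshape(rows, cols)
    return image.astype(float) / illumination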
4.4 Histogram manipulation
What is the histogram of an image?
The histogram of an image is a discrete function that is formed by counting the number of
pixels in the image that have a certain grey value. When this function is normalised to sum up
to 1 for all the grey values, it can be treated as a probability density function that expresses
how probable it is for a certain grey value to be found in the image. Seen this way, the grey
value of a pixel becomes a random variable which takes values according to the outcome of
an underlying random experiment.
When is it necessary to modify the histogram of an image?
If we cannot see much detail in an image, the reason could be that pixels, which represent
different objects or parts of objects, have grey values which are very similar to each other.
This is demonstrated with the example histograms shown in gure 4.37. The histogram of
the bad image is very narrow and it does not occupy the full range of possible grey values,
while the histogram of the good image is more spread. In order to improve the bad
image, we might like to modify its histogram so that it looks like that of the good image.
Figure 4.37: (a) The histogram of a bad image. (b) The histogram of a good image.
How can we modify the histogram of an image?
The simplest way is histogram stretching. Let us say that the histogram of the low contrast image ranges from grey value g_min to g_max. We wish to spread these values over the range [0, G − 1], where G − 1 > g_max − g_min. We may map the grey values to the new range, if the grey value of a pixel g_old is replaced with the value g_new, given by:

g_new = ⌊ (g_old − g_min)/(g_max − g_min) × G + 0.5 ⌋        (4.133)
Term 0.5 was added so that the real number (g_old − g_min)G/(g_max − g_min) is rounded by the floor operator ⌊·⌋ to its nearest integer, as opposed to being truncated to its integer part. We saw a version of this method on page 354, where equation (4.124) is used instead, designed to trim out extreme values.
Note that all we do by applying equation (4.133) is to spread the grey values, without changing the number of pixels per grey level. There are more sophisticated methods, which, as well as stretching the range of grey values, allocate a predefined number of pixels at each grey level. These methods are collectively known as histogram manipulation.
What is histogram manipulation?
Histogram manipulation is the change of the grey values of an image, without affecting its semantic information content.
What affects the semantic information content of an image?

The information content of an image is conveyed by the relative grey values of its pixels. Usually, the grey values of the pixels do not have meaning in absolute terms, but only in relative terms. If the order (ranking) of pixels in terms of their grey value is destroyed, the information content of the image will be affected. So, an image enhancing method should preserve the relative brightness of pixels.
How can we perform histogram manipulation and at the same time preserve the
information content of the image?
Let us assume that the grey values in the original image are represented by variable r and in the new image by variable s. We would like to find a transformation s = T(r) such that the probability density function p_old(r), which might look like the one in figure 4.37a, is transformed into a probability density function p_new(s), which might look like that in figure 4.37b.
In order to preserve the information content of the image, all pixels that were darker than a pixel with grey value R, say, should remain darker than this pixel even after the transformation, when this pixel gets a new value S, say. So, for every grey value R, the number of pixels with grey values lower than R should be the same as the number of pixels with grey values lower than S, where S is the value to which R is mapped. This may be expressed by saying that the transformation T between the two histograms must preserve the distribution function of the normalised histograms:

P_old(R) = P_new(S)  ⇒  ∫_0^R p_old(r) dr = ∫_0^S p_new(s) ds        (4.134)

This equation can be used to define the transformation T that must be applied to the value R of variable r to obtain the corresponding value S of variable s, provided we define function p_new(s).
Example 4.23

The histogram of an image may be approximated by the probability density function

p_old(r) = A e^{−r}        (4.135)

where r is the grey level variable taking values between 0 and b, and A is a normalising factor. Calculate the transformation s = T(r), where s is the grey level value in the transformed image, such that the transformed image has probability density function

p_new(s) = B s e^{−s²}        (4.136)

where s takes values between 0 and b, and B is some normalising factor.

Transformation S = T(R) may be calculated using equation (4.134):

B ∫_0^S s e^{−s²} ds = A ∫_0^R e^{−r} dr        (4.137)

The left-hand side of (4.137) is:

∫_0^S s e^{−s²} ds = (1/2) ∫_0^S e^{−s²} d(s²) = −(1/2) [e^{−s²}]_0^S = (1 − e^{−S²})/2        (4.138)

The right-hand side of (4.137) is:

∫_0^R e^{−r} dr = −[e^{−r}]_0^R = 1 − e^{−R}        (4.139)

We substitute from (4.138) and (4.139) into (4.137) to obtain:

(1 − e^{−S²})/2 = (A/B)(1 − e^{−R})  ⇒  e^{−S²} = 1 − (2A/B)(1 − e^{−R})
⇒  S² = −ln[1 − (2A/B)(1 − e^{−R})]  ⇒  S = √( −ln[1 − (2A/B)(1 − e^{−R})] )        (4.140)

So, each grey value R of the original image should be transformed into grey value S in the enhanced image, according to equation (4.140).
What is histogram equalisation?
Histogram equalisation is the process by which we make all grey values in an image equally probable, ie we set p_new(s) = c, where c is a constant. Transformation S = T(R) may be calculated from equation (4.134) by substitution of p_new(s) and integration. Figures 4.38a-4.38d show an example of applying this transformation to a low contrast image. Notice how narrow the histogram 4.38b of the original image 4.38a is. After histogram equalisation, the histogram in 4.38d is much more spread, but, contrary to our expectations, it is not flat, ie it does not look equalised.
Why do histogram equalisation programs usually not produce images with flat histograms?

In the above analysis, we tacitly assumed that variables r and s can take continuous values. In reality, of course, the grey level values are discrete. In the continuous domain there is an infinite number of values in any interval [r, r + dr]. In digital images, we have only a finite number of pixels in each range. As the range is stretched, and the number of pixels in it is preserved, there is only a finite number of pixels with which the stretched range is populated. The histogram that results is spread over the whole range of grey values, but it is far from flat.
How do we perform histogram equalisation in practice?

In practice, r takes discrete values g, ranging between, say, g_min and g_max. Also, s takes discrete values t, ranging from 0 to G − 1, where typically G = 256. Then equation (4.134) becomes:

Σ_{t=0}^{S} p_new(t) = Σ_{g=g_min}^{R} p_old(g)        (4.141)

For histogram equalisation, p_new(t) = 1/G, so that the values of p_new(t) over the range [0, G − 1] sum up to 1. Then:

(1/G)(S + 1) = Σ_{g=g_min}^{R} p_old(g)  ⇒  S = G Σ_{g=g_min}^{R} p_old(g) − 1        (4.142)

For every grey value R, this equation produces a corresponding value S. In general, this S will not be integer, so in order to get an integer value, we round it to an integer by taking its ceiling. This is because, when R = g_min, the first term on the right-hand side of equation (4.142) may be less than 1 and so S may become negative instead of 0. When R = g_max, the sum on the right-hand side of (4.142) is 1 and so S becomes G − 1, as it should be. So, finally, a pixel with grey value g_old in the original image should get value g_new in the enhanced image, given by

g_new = ⌈ G Σ_{g=g_min}^{g_old} p_old(g) − 1 ⌉        (4.143)

where p_old(g) is the normalised histogram of the old image.
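Equation (4.143) translates almost directly into code. The sketch below assumes an integer-valued grey image and G = 256; the function name is ours.

import numpy as np

def histogram_equalise(image, G=256):
    img = image.astype(int)
    hist = np.bincount(img.ravel(), minlength=G).astype(float)
    p_old = hist / hist.sum()            # normalised histogram
    cdf = np.cumsum(p_old)               # sum of p_old(g) up to each grey value
    mapping = np.ceil(G * cdf - 1)       # equation (4.143)
    mapping = np.clip(mapping, 0, G - 1).astype(int)
    return mapping[img]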
(a) Original image (b) Histogram of (a)
(c) After histogram equalisation (d) Histogram of (c)
(e) After histogram equalisation with random additions (f) Histogram of (e)
Figure 4.38: Enhancing the image of The Bathtub Cleaner by histogram equalisation.
(a) After histogram hyperbolisation (b) Histogram of (a)
(c) After histogram hyperbolisation with random additions (d) Histogram of (c)
Figure 4.39: Histogram hyperbolisation with α = 0.01 applied to the image of figure 4.38a.
Can we obtain an image with a perfectly flat histogram?

Yes, if we remove the constraint that the ranking of pixels in terms of their grey values has to be strictly preserved. We may allow, for example, pixels to be moved into neighbouring bins in the histogram, so that all bins have equal number of pixels. This method is known as histogram equalisation with random additions. Let us say that the (unnormalised) histogram of the image, after stretching or equalising it, is represented by the 1D array H(g), where g ∈ [0, G − 1], and that the image has NM pixels. The algorithm of histogram equalisation with random additions should work as follows.
Step 1: To the grey value of each pixel, add a random number drawn from a uniform distribution in the range [−0.5, 0.5].
Step 2: Order the grey values, keeping track which grey value corresponds to which pixel.
Step 3: Change the first ⌊NM/G⌋ grey values to 0. Change the next ⌊NM/G⌋ grey values to 1, ... etc, until the last ⌊NM/G⌋, which change to G − 1.
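The three steps above may be sketched as follows (assuming a numpy array image and a fixed random seed for reproducibility):

import numpy as np

def equalise_random_additions(image, G=256, seed=0):
    rng = np.random.default_rng(seed)
    flat = image.astype(float).ravel()
    jittered = flat + rng.uniform(-0.5, 0.5, size=flat.shape)    # Step 1
    order = np.argsort(jittered)                                 # Step 2
    out = np.empty_like(flat)
    out[order] = np.floor(np.arange(flat.size) * G / flat.size)  # Step 3
    return out.reshape(image.shape).astype(int)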
The result of applying this algorithm to the image of figure 4.38a is shown in 4.38e.
What if we do not wish to have an image with a flat histogram?

We may define p_new(s) in (4.134) to be any function we wish. Once p_new(s) is known (the desired histogram), one can solve the integral on the right-hand side to derive a function f_1 of S. Similarly, the integral on the left-hand side may be performed to yield a function f_2 of R, ie

f_1(S) = f_2(R)  ⇒  S = f_1^{−1}(f_2(R))        (4.144)

A special case of this approach is histogram hyperbolisation, where p_new(s) = A e^{−αs}, with A and α being some positive constants. The effect of this choice is to give more emphasis to low grey values and less to the high ones. This algorithm may also be used in conjunction with random additions, to yield an image with a perfectly hyperbolic histogram (see figure 4.39). In figure 4.39d this can be seen clearly because the method of random additions was used.
How do we do histogram hyperbolisation in practice?

Set p_new(s) = A e^{−αs} in (4.134). First, we work out the value of A, noticing that p_new(s) has to integrate to 1, from 0 to G − 1:

∫_0^{G−1} A e^{−αs} ds = 1  ⇒  (A/α) ∫_0^{α(G−1)} e^{−t} dt = 1  ⇒  (A/α)(1 − e^{−α(G−1)}) = 1  ⇒  A = α / (1 − e^{−α(G−1)})        (4.145)

The right-hand side then of (4.134) becomes:

∫_0^S A e^{−αs} ds = −(A/α) [e^{−αs}]_0^S = (A/α)(1 − e^{−αS})        (4.146)

For the discrete normalised histogram of the original image p_old(g), equation (4.144) then takes the form:

(A/α)(1 − e^{−αS}) = Σ_{g=g_min}^{R} p_old(g)  ⇒  S = −(1/α) ln( 1 − (α/A) Σ_{g=g_min}^{R} p_old(g) )        (4.147)

Since S has to take integer values, we add 0.5 and take the floor, so we finally arrive at the transformation:

g_new = ⌊ −(1/α) ln( 1 − (α/A) Σ_{g=g_min}^{g_old} p_old(g) ) + 0.5 ⌋        (4.148)
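A sketch of equations (4.145) and (4.148) in code, for an integer-valued image and the symbol names used above, is:

import numpy as np

def hyperbolise(image, alpha=0.01, G=256):
    A = alpha / (1.0 - np.exp(-alpha * (G - 1)))     # equation (4.145)
    img = image.astype(int)
    hist = np.bincount(img.ravel(), minlength=G).astype(float)
    cdf = np.cumsum(hist / hist.sum())               # sum of p_old(g) up to g_old
    mapping = np.floor(-np.log(1.0 - (alpha / A) * cdf) / alpha + 0.5)   # equation (4.148)
    mapping = np.clip(mapping, 0, G - 1).astype(int)
    return mapping[img]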
In figure 4.39, α = 0.01 was used, as this gave the most aesthetically pleasing result. It was found with experimentation that α = 0.05 and α = 0.1 gave images that were too dark, whereas α = 0.001 gave an image that was too bright. For this particular image, G = 256.
How do we do histogram hyperbolisation with random additions?

The only difference with histogram equalisation with random additions is that now each bin of the desired histogram has to have a different number of pixels. First we have to decide the number of pixels per bin. If t denotes the discrete grey values of the enhanced image, bin H(t) of the desired histogram will have V(t) pixels. So, first we calculate the number of pixels we require per bin. This may be obtained by multiplying the total number of pixels with the integral of the desired probability density function over the width of the bin, ie the integral from t to t + 1. For an N × M image, the total number of pixels is NM. We then have

V(t) = NM A ∫_t^{t+1} e^{−αs} ds  ⇒  V(t) = −NM (A/α) [e^{−αs}]_t^{t+1}  ⇒  V(t) = −NM (A/α) (e^{−α(t+1)} − e^{−αt})  ⇒  V(t) = NM (A/α) (e^{−αt} − e^{−α(t+1)})        (4.149)

where A is given by (4.145).
The algorithm then of histogram hyperbolisation with random additions is as follows.
Step 1: To the grey value of each pixel add a random number drawn from a uniform distribution in the range [−0.5, 0.5].
Step 2: Order the grey values, keeping track which grey value corresponds to which pixel.
Step 3: Set the first ⌊V(0)⌋ pixels to 0.
Step 4: For t from 1 to G − 1, assign to the next ⌊V(t) + V(t − 1) − ⌊V(t − 1)⌋⌋ pixels grey value t. Note the correction term V(t − 1) − ⌊V(t − 1)⌋, which we incorporate in order to account for the left-over part of V(t − 1), which, when added to V(t), may produce a value incremented by 1 when the floor operator is applied.
Why should one wish to perform something other than histogram equalisation?
One may wish to emphasise certain grey values more than others, in order to compensate for a certain effect; for example, to compensate for the way the human eye responds to the different degrees of brightness. This is a reason for doing histogram hyperbolisation: it produces a more pleasing picture.
The human eye can discriminate darker shades better than brighter ones. This is known from psychophysical experiments, which have shown that the threshold difference in brightness ΔI, for which the human eye can separate two regions, over the average brightness I, is constant and roughly equal to 0.02 (see equation (4.129) of Box 4.11, on page 360). So, the brighter the scene (higher I), the more different two brightness levels have to be in order for us to be able to discriminate them. In other words, the eye shows more sensitivity to dark shades and this is why histogram hyperbolisation is believed to produce better enhanced images, as it places more pixels in the dark end of the grey spectrum.
(a) Original image (b) After global histogram equalisation
(c) Local histogram equalisation 81 × 81 (d) Local histogram equalisation 241 × 241
Figure 4.40: Enhancing the image of A Young Train Driver (of size 512 × 512).
What if the image has inhomogeneous contrast?
The approach described above is global, ie we modify the histogram which refers to the whole
image. However, the image may have variable quality at various parts. In that case, we may
apply the above techniques locally: we scan the image with a window inside which we modify
the histogram but we alter only the value of the central pixel. Clearly, such a method is costly
and various algorithms have been devised to make it more efficient.
Figure 4.40a shows a classical example of an image that requires local enhancement. The
picture was taken indoors looking towards windows with plenty of ambient light coming
through. All outdoor sections are fine, but in the indoor part the film was under-exposed.
The result of global histogram equalisation, shown in gure 4.40b, makes the outdoor parts
over-exposed in order to allow us to see the details of the interior. The results of local
histogram equalisation, shown in gures 4.40c and 4.40d, are overall much more pleasing.
(a) Original image (b) After global histogram equalisation
(c) Local histogram equalisation 81 × 81 (d) Local histogram equalisation 241 × 241
Figure 4.41: Enhancing the image At the Karlstejn Castle (of size 512 × 512).
The window size used for 4.40c was 81 × 81, while for 4.40d it was 241 × 241, with the original image being of size 512 × 512. Notice that no part of the image gives the impression of being over-exposed or under-exposed. There are parts of the image, however, that look damaged, particularly at the bottom of the image. They correspond to parts of the original film which received too little light to record anything. They correspond to flat black patches, and, by trying to enhance them, we simply enhance the film grain or the instrument noise.
A totally different effect becomes evident in figure 4.41c, which shows the local histogram enhancement of a picture taken at Karlstejn castle in the Czech Republic, shown in figure 4.41a. The castle at the back consists of flat grey walls. The process of local histogram equalisation amplifies every small variation of the wall to such a degree that the wall looks like the rough surface of a rock. Further, on the left of the image, we observe again the effect of trying to enhance a totally black area. However, increasing the window size to 241 × 241 removes most of the undesirable effects.
Can we avoid damaging flat surfaces while increasing the contrast of genuine transitions in brightness?

Yes, there are algorithms that try to do that by taking into consideration pairs of pixel values. Such algorithms may be best understood if we consider the mapping function between input and output grey values. This function has to be one-to-one, but it may be chosen so that it stretches different ranges of grey values by different amounts. This idea is shown schematically in figure 4.42.
Figure 4.42: Mapping from input grey value to output grey value. (a) A simple stretching (see equation (4.133), on page 367) takes the input range of grey values [A, B] and maps it linearly to the output range [O, G], where G − O > B − A. (b) An algorithm that knows that grey values in the range [A, C] belong to more or less uniform regions, may suppress the stretching of these values and map them to range [O, E], such that E − O < C − A. The same algorithm, knowing that values in the range [C, D] often appear in regions of true brightness transitions, may map the grey values in the range [C, D] to the range [E, F], so that F − E > D − C. Grey values in the range [D, B] may neither be stretched nor suppressed.
How can we enhance an image by stretching only the grey values that appear in
genuine brightness transitions?
Let us consider the image of figure 4.43a. It is a 3-bit 4 × 4 image. First, let us count how many pairs of grey values of a certain type we find next to each other, assuming 8-connectivity. We are not interested in ordered pairs, ie (3, 5) is counted the same as (5, 3). The 2D histogram of pairs of values we construct that way occupies only half of the 2D array, as shown at the top of figure 4.43c.
We may select a threshold and say, for example, that pixels that are next to each other and differ by less than 2 grey levels owe their difference to noise only, and so we do not wish to stretch, but rather suppress, their differences. These pixels correspond to range [A, C] of figure 4.42b. They are the pairs that form the main diagonal of the 2D histogram and the diagonal adjacent to it, as the members of those pairs differ from each other either by 0 or by 1 grey level. The differences of all other pairs are to be stretched. Differences that are to be suppressed should be associated with some negative force, that will have to bring the dashed line in figure 4.42b down, while differences that are to be stretched are to be associated with some positive force, that will have to bring the dashed line in figure 4.42b up. The more neighbouring pairs a particular grey level participates in, the more likely it should be to stretch the mapping curve for its value upwards. So, in the 2D histogram of pairs of values we constructed, we sum the values in each column, ignoring the values along the diagonal strip that represent pairs of values that are to be suppressed. Those are summed separately to form the forces that will have to pull the mapping curve down. The two strings of numbers created that way are shown at the bottom of figure 4.43c as positive and negative forces that try to push the mapping curve upwards or downwards, respectively.
Of course, the mapping function cannot be pushed down and up simultaneously at the same point, so the two forces have somehow to be combined. We may multiply the negative forces with a constant, say γ = 0.2, and add the result to the positive forces to form the combined net forces shown in 4.43c. The mapping curve should be stretched so that neighbouring grey values differ proportionally to these numbers. At this point we do not worry about scaling the values to be in the right range. We shall do that at the end.
Next, we work out the cumulative of this string of numbers, so that, as the grey levels advance, each new grey value differs from its previous one by as much as the net force we computed for that value. Now, these numbers should be added to the ordinary stretching numbers, ie those represented by the dashed line in 4.42b, which were computed using equation (4.133) of page 367. The ordinary stretching and the calculated cumulative force are added to form the mapping line. The values of this line far exceed the allowed range of 3 bits. So, we scale and round the numbers to be integers between 0 and 7. This is the final mapping curve. The original grey values are mapped to the values in the very bottom line of figure 4.43c. The input values and these values form a look-up table for image enhancement. The mapping curve worked out this way is shown in figure 4.43d. Note that this mapping is not one-to-one. In a real application, one may experiment with the weight γ of the negative forces, to avoid many-to-one mappings.
How do we perform pairwise image enhancement in practice?
The algorithm for such an image enhancement is as follows.
Step 0: If the grey values of the image are in the range [A, B], create a 2D array C of size (B − A + 1) × (B − A + 1) and initialise all its elements to 0. The elements of this array are identified along both directions with indices from A to B.
Step 1: Accumulate the pairs of grey values that appear next to each other in the image, using 8-connectivity, in the bottom triangle of array C. You need only the bottom triangle, because you accumulate in the same cell pairs (g_1, g_2) and (g_2, g_1). To avoid counting a pair twice, start from the top left corner of the image and proceed from top left to bottom right, by considering for each pixel (i, j) only the pairs it forms with pixels (i + 1, j), (i + 1, j + 1), (i, j + 1) and (i − 1, j + 1), as long as these pixels are within the image boundaries.
Step 2: Decide what the threshold difference d should be, below which you will not enhance
[Figure 4.43c tabulates, for each grey value, the positive forces, the negative forces, the net forces, the cumulative net force, the ordinary stretching, the sum mapping and the final mapping; figure 4.43d plots the final mapping curve.]
Figure 4.43: An example of image enhancement where pairs of values that appear next to each other and differ by more than a threshold (here equal to 1) are stretched more than other pairs. (a) The original image. (b) The enhanced image. (c) All steps of the algorithm. The grey boxes indicate the cells from which the negative forces are computed. (d) The plot of the final mapping curve.
grey value differences. This determines how wide a strip along the diagonal of array C will be used to form the negative forces.
Step 3: Add the cells of the array you formed in Step 1 that belong to the strip of differences you wish to suppress, to form the string of negative forces:

F_−(g) = Σ_{i=max(A, g−d)}^{g} C(g, i)        (4.150)

Step 4: Add the values of the remaining cells of the 2D array, to form the string of the positive forces:

F_+(g) = Σ_{i=A}^{g−d−1} C(g, i)   for g − d − 1 ≥ A,   and F_+(g) = 0 otherwise        (4.151)
Step 5: Multiply the negative forces with a number γ in the range (0, 1] and subtract them point by point from the positive forces, to form the net forces:

F_net(g) = F_+(g) − γ F_−(g)        (4.152)

Step 6: Compute the cumulative net force:

S(g) = Σ_{i=A}^{g} F_net(i)    for A ≤ g ≤ B        (4.153)

Step 7: Create the mapping from the old to the new values, using equation (4.133), g_new(g).
Step 8: Add the corresponding values you produced in Steps 6 and 7:

g̃(g) = S(g) + g_new(g)    for A ≤ g ≤ B        (4.154)

Step 9: Scale and round the resultant values:

g_new(g̃(g)) = ⌊ (g̃(g) − g̃(A)) / (g̃(B) − g̃(A)) × G + 0.5 ⌋        (4.155)

Step 10: Use the above formula to enhance the image.
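The sketch below strings Steps 0-10 together for a small grey image. The pair counting loop is deliberately naive, and the force definitions follow the reconstruction given above rather than a verified transcription of the original, so treat it as illustrative only.

import numpy as np

def pairwise_enhance(image, d=1, gamma=0.2, G=256):
    img = image.astype(int)
    A, B = int(img.min()), int(img.max())
    n = B - A + 1
    C = np.zeros((n, n))
    rows, cols = img.shape
    # Steps 0-1: count 8-connected neighbouring pairs, stored at (larger, smaller)
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (1, 1), (0, 1), (-1, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    g1, g2 = sorted((img[i, j], img[ii, jj]))
                    C[g2 - A, g1 - A] += 1
    # Steps 2-4: negative forces from pairs differing by at most d, positive from the rest
    F_neg = np.array([C[g, max(0, g - d):g + 1].sum() for g in range(n)])
    F_pos = np.array([C[g, :max(0, g - d)].sum() for g in range(n)])
    # Steps 5-6: net forces and their cumulative sum
    F_net = F_pos - gamma * F_neg
    S = np.cumsum(F_net)
    # Steps 7-8: add the ordinary stretching of equation (4.133) (without the rounding)
    ordinary = (np.arange(n) / max(n - 1, 1)) * G
    g_tilde = S + ordinary
    # Step 9: scale and round to [0, G-1]
    mapping = np.floor((g_tilde - g_tilde[0]) / (g_tilde[-1] - g_tilde[0]) * (G - 1) + 0.5)
    mapping = np.clip(mapping, 0, G - 1).astype(int)
    # Step 10: apply the look-up table
    return mapping[img - A]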
Figure 4.44: (a) A Catholic Procession (size 512 × 425). (b) Enhanced by simple stretching. (c) Enhanced by considering pairs of pixels, using parameters d = 1 and γ = 0.5. (d) The mapping function for stretching (dashed line) and its modification by considering neighbouring pairs of pixels (continuous line).
Note that the above algorithm may be applied globally or locally, inside running windows.
When running windows are used, we only change the value of the pixel in the centre of the
window and then shift the window by one pixel and repeat the whole process.
Figure 4.44 shows an original image with very low contrast, its enhanced version by simple
stretching its range of grey values, and its enhanced version by applying the above algorithm.
Figure 4.44d shows the mapping function between the original and the final grey image values.
The dashed line is the mapping of simple stretching, while the continuous line is the mapping
obtained by the above algorithm. Figure 4.45 shows a bad image and various enhanced
versions of it.
(a) Original image (b) After global histogram equalisation
(c) Local histogram equalisation 81 × 81 (d) Enhancement with pairwise relations
Figure 4.45: Enhancing the image of The Hanging Train of Wuppertal. For the enhancement with the pairwise relations approach, γ = 0.1 and d = 3.
4.5 Generic deblurring algorithms
Proper image deblurring will be discussed in the next chapter, under image restoration. This
is because it requires some prior knowledge of the blurring process in order to work correctly.
However, one may use some generic methods of deblurring, that may work without any prior
knowledge.
If we plot a cross-section of a blurred image, it may look like that of figure 4.46a. The
purpose of deblurring is to sharpen the edges, so that they look like those in 4.46b. We
shall discuss here some algorithms that may achieve this: mode filtering, mean shift and toboggan² contrast enhancement.
Figure 4.46: A cross section of a blurred image looks like (a). The purpose of deblurring algorithms discussed in this section is to make profiles like (a) become like (b).
How does mode filtering help deblur an image?
It has been shown that repeated application of the mode filter (see page 333) may result in
an image made up from patches of uniform grey value with sharp boundaries. The mode may
be applied with or without the use of weights. Figure 4.47 shows an image blurred due to a
shaken camera, and the results of the successive application of mode filtering. The algorithm
took 90 iterations to converge. The weights used were

\begin{pmatrix} 1 & 3 & 1 \\ 3 & 5 & 3 \\ 1 & 3 & 1 \end{pmatrix}    (4.156)

These weights were chosen so that the chance of multiple modes was reduced, and the central
pixel was given reasonable chance to survive, so that image details might be preserved. What
the algorithm does in the case of multiple modes is very critical for the outcome. For these
results, when multiple modes were observed, the algorithm worked out the average mode and
rounded it to the nearest integer before assigning it to the central pixel. Note that by doing
that, we create grey values that might not have been present in the original image. This
leads to slow convergence, and ultimately to the creation of artifacts, as one can see from
figures 4.47e and 4.47f. Figure 4.48a shows the result of applying the mode filtering with
the same weights, but leaving the value of the pixel unchanged if multiple modes occurred.
² Toboggan is a type of sledge (originally used by the Canadian Indians) for transportation over snow.
Convergence now was achieved after only 11 iterations. There are no artifacts in the result.
Figure 4.48b shows the result of mode filtering with weights:

\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}    (4.157)

There is more chance for these weights to create multiple modes than the previous ones.
Convergence now was achieved after 12 iterations if a pixel was left unchanged when multiple
modes were detected. If the average of multiple modes was used, the output after 12 iterations
is shown in 4.48c. After a few more iterations severe artifacts were observed.
(a) Original (b) Iteration 1 (c) Iteration 10
(d) Iteration 20 (e) Iteration 40 (f) Iteration 90
Figure 4.47: Alison (size 172 × 113). A blurred image and its deblurred versions by using
weighted mode filtering with weights (4.156). If the output of the filter had multiple modes,
the average of the modes was used.
The quantisation of the image values used is crucial for this algorithm. Programming
environments like Matlab, that convert the image values to real numbers between 0 and 1,
have to be used with care: the calculation of the mode requires discrete (preferably integer)
values. In general, mode filtering is very slow. The result does not necessarily improve with
the number of iterations, and so mode filtering may be applied a small number of times, say
for 5 or 6 iterations.
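A minimal sketch of the weighted mode filter described above is given below, for an integer-valued grey image. The weights repeat each neighbour's value the corresponding number of times before the mode is taken, and, when multiple modes occur, the pixel is left unchanged, which is the faster variant discussed in the text; the function name and the default number of iterations are illustrative assumptions.

import numpy as np

def weighted_mode_filter(img, weights, max_iterations=6):
    """Repeatedly replace each pixel by the weighted mode of its 3x3 neighbourhood.

    img     : 2D integer array (discrete grey values are essential for the mode)
    weights : 3x3 array of non-negative integers, e.g. [[1,3,1],[3,5,3],[1,3,1]]
    """
    img = img.astype(int)
    weights = np.asarray(weights, dtype=int)
    for _ in range(max_iterations):
        out = img.copy()
        rows, cols = img.shape
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                patch = img[i - 1:i + 2, j - 1:j + 2]
                # repeat each value according to its weight, then histogram it
                values = np.repeat(patch.ravel(), weights.ravel())
                counts = np.bincount(values)
                modes = np.flatnonzero(counts == counts.max())
                if len(modes) == 1:          # unique mode: assign it
                    out[i, j] = modes[0]
                # multiple modes: leave the central pixel unchanged
        if np.array_equal(out, img):         # convergence
            break
        img = out
    return img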
(a) (b) (c)
Figure 4.48: (a) Image 4.47a processed with weights (4.156). When the output of the filter
had multiple modes, the pixel value was not changed. This is the convergent result after 11
iterations. (b) Image 4.47a processed with weights (4.157). When the output of the filter
had multiple modes, the pixel value was not changed. This is the convergent result after 12
iterations. (c) Image 4.47a processed with weights (4.157). When the output of the filter had
multiple modes, the average value of these modes was used, rounded to the nearest integer.
This is the output after 12 iterations. Further iterations created severe artifacts.
Can we use an edge adaptive window to apply the mode filter?
Yes. The way we use such a window is described on page 337. Once the appropriate window
for each pixel has been selected, the mode is computed from the values inside this window.
Figure 4.49 shows the results of applying the mode filter to image 4.47a, with a 5 × 5 edge
adaptive window and no weights (top row). In the second row are the results obtained if we
use a 3 × 3 locally adaptive window and weights (4.156).
How can mean shift be used as a generic deblurring algorithm?
The mean shift algorithm, described on page 339, naturally sharpens the edges because it
reduces the number of grey values present in the image, and thus forces intermediate grey
values to shift either to one or the other extreme. Figure 4.49 shows the result of applying it
to the image of figure 4.47a, with h_x = 15, h_y = 15 and h_g = 1.
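A simplified grey-level version of the idea is sketched below: each pixel is repeatedly moved towards the mean of the neighbours that fall inside its spatial window (h_x, h_y) and its grey-value window h_g. This is only an illustrative sketch of the mechanism, not the exact algorithm of page 339 (which also shifts the spatial coordinates); the parameter defaults mirror the values quoted above.

import numpy as np

def mean_shift_filter(img, hx=15, hy=15, hg=1.0, max_iterations=10):
    """Shift each grey value towards the local mean of similar neighbours."""
    rows, cols = img.shape
    out = img.astype(float).copy()
    for _ in range(max_iterations):
        new = out.copy()
        for i in range(rows):
            for j in range(cols):
                i0, i1 = max(0, i - hy), min(rows, i + hy + 1)
                j0, j1 = max(0, j - hx), min(cols, j + hx + 1)
                window = out[i0:i1, j0:j1]
                mask = np.abs(window - out[i, j]) <= hg
                new[i, j] = window[mask].mean()   # shift towards the local mean
        if np.allclose(new, out):
            break
        out = new
    return out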
(a) Iteration 1 (b) Iteration 2 (c) Convergence (iter. 12)
(d) Iteration 1 (e) Iteration 2 (f) Convergence (iter. 63)
(g) Iteration 1 (h) Iteration 2 (i) Iteration 3
Figure 4.49: Top: edge adaptive 5 × 5 mode filter. Middle: edge adaptive 3 × 3 mode filter
with weights (4.156). Bottom: mean shift with h_x = 15, h_y = 15 and h_g = 1.
What is toboggan contrast enhancement?
The basic idea of toboggan contrast enhancement is shown in figure 4.50. The pixels slide
along the arrows shown in 4.50a, so that the blurred profile sharpens. This algorithm consists
of three stages.
Stage 1: Work out the magnitude of the gradient vector of each pixel.
Stage 2: Inside a local window around each pixel, identify a pixel with its gradient magnitude
being a local minimum.
Stage 3: Assign to the central pixel the value of the pixel with the local minimum gradient
magnitude.
How the gradient magnitude of an image may be estimated is covered in Chapter 6 (see
pages 596 and 608), so here we are concerned only with Stage 2 of the algorithm.
How do we do toboggan contrast enhancement in practice?
Assuming that the input to the algorithm is a grey image I and an array T of the same
size that contains the magnitude of the gradient vector at each pixel position, the following
algorithm may be used.
Step 0: Create an array O the same size as the image I and flag all its elements as undefined.
The flag may be, for example, a negative number, say -1 for the flag being up. Create also
an empty stack where you may temporarily store pixel positions.
Step 1: For each pixel (i, j) in the image: add it to the stack and consider whether its
gradient T(i, j) is a local minimum, by comparing it with the values of all its neighbours in
its 3 × 3 neighbourhood.
Step 2: If T(i, j) is a local minimum, set the values of all pixels in the stack equal to the
value of the current pixel, empty the stack and go to Step 1.
Step 3: If it is not a local minimum, identify the neighbour with the minimum gradient
magnitude.
Step 4: If the flag of the neighbour is down in array O, ie if O(neighbour) ≠ -1, give to
all pixels in the stack the value the neighbour has in the output array, ie set O(in stack) =
O(neighbour). Empty the stack and go to Step 1.
Step 5: If the neighbour is still flagged in O (ie if O(neighbour) = -1), and if the gradient
magnitude of the neighbour is a local minimum, in array O assign to the neighbour and to
all pixels in the stack the same grey value the neighbour has in image I.
Empty the stack and go to Step 1.
Step 6: If the neighbour is still flagged in O (ie if O(neighbour) = -1), and if the gradient
magnitude of the neighbour is not a local minimum, add the address of the neighbour to the
stack, find the pixel in its 8-neighbourhood with the minimum gradient magnitude and go to
Step 2.
Step 7: Exit the algorithm when all pixels in the output array have their flags down, ie all
pixels have acquired grey values.
Figure 4.51 shows figure 4.47a deblurred by this algorithm.
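A minimal sketch of Steps 0-7 in Python is given below. The flag value -1, the descent along the minimum-gradient neighbour and the stack follow the description above, while boundary pixels are handled simply by clipping the neighbourhood; the function and variable names are illustrative. It can be exercised on the 4 × 4 arrays of example 4.24 that follows.

import numpy as np

def toboggan_enhance(I, T):
    """Each pixel slides to the nearest local minimum of the gradient magnitude T
    along the steepest descent path and inherits the grey value found there."""
    rows, cols = I.shape
    O = np.full((rows, cols), -1, dtype=int)          # Step 0: all flags up (-1)

    def neighbours(i, j):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                    yield i + di, j + dj

    def is_local_min(i, j):
        return all(T[i, j] <= T[p, q] for p, q in neighbours(i, j))

    for i in range(rows):
        for j in range(cols):
            if O[i, j] != -1:
                continue
            stack = [(i, j)]                           # Step 1: start a new path
            ci, cj = i, j
            while True:
                if is_local_min(ci, cj):               # Steps 2 and 5
                    value = I[ci, cj]
                    break
                ni, nj = min(neighbours(ci, cj), key=lambda p: T[p])   # Step 3
                if O[ni, nj] != -1:                    # Step 4: neighbour already done
                    value = O[ni, nj]
                    break
                stack.append((ni, nj))                 # Step 6: keep descending
                ci, cj = ni, nj
            for p in stack:                            # give the value to the whole path
                O[p] = value
    return O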
Figure 4.50: The black dots represent pixels. Both panels plot grey value along the image.
Pixels at the flat parts of the image (extreme left and extreme right in (a)) bequest their
grey values to their neighbouring pixels. In other words, pixels in the slanted parts of the
cross-section inherit the values of the pixels with zero gradient. In (a) the arrows show the
direction along which information is transferred, while in (b) the arrows show which pixels
have their grey values increased or reduced.
Figure 4.51: Toboggan contrast enhancement applied to Alison (figure 4.47a).
Example 4.24
Apply toboggan contrast enhancement to the image of figure 4.52a. The
gradient magnitude for each pixel location is given in 4.52b. Show all
intermediate steps.

(a) input image I          (b) gradient magnitude T
 4   4   3   2             12  14  23  13
 3   7   6   1             16  11  21  18
 7   2   6   2             13  23  24  21
12   2   1   0             10  12  21  20

Figure 4.52: (a) An original image. (b) The value of the gradient magnitude
at each pixel position.
All steps of the algorithm are shown in figures 4.53-4.57, where the first array is always
the gradient magnitude map T, the second is the input image I and the third is the
output image O.
[In figures 4.53-4.57 each panel shows, from left to right, the gradient magnitude map T,
the input image I and the output array O as it is progressively filled.]
Figure 4.53: Pixel (0, 0): the neighbour with the minimum gradient is pixel (1, 1)
(T(1, 1) = 11). As T(1, 1) is a local minimum, the value of I(1, 1) is assigned in
the output array to both pixels (0, 0) and (1, 1). Pixel (1, 0): the neighbour with the
minimum gradient is pixel (1, 1). As this pixel has already a value assigned to it in
the output array, pixel (1, 0) inherits that value. The same happens to pixel (2, 0).
Figure 4.54: Pixel (3, 0) is a local minimum in T, so it gets value 2 in the output array,
ie the same value it had in the input array. Pixels (0, 1) and (2, 1) have neighbours with
minimum gradient, which already have values assigned to them in the output array,
so they inherit that value. The neighbour with the minimum gradient for pixel (3, 1)
is pixel (3, 2). T(3, 2) is not a local minimum; its own neighbour with the minimum
gradient magnitude is pixel (3, 3). T(3, 3) is a local minimum. Pixels (3, 1), (3, 2) and
(3, 3) all take value I(3, 3) in the output array. Pixels (0, 2) and (1, 2) have neighbours
with minimum gradient magnitude, which have already assigned values in the output
array, so they inherit the values of those neighbours.
Figure 4.55: Pixel (2, 2) has a neighbour with minimum gradient magnitude, which has
already an assigned value in the output array, so it inherits the value of that neighbour.
Figure 4.56: Pixel (0, 3) has gradient magnitude that is a local minimum. Its value in
the output array is set to be the same as that in the input array: O(0, 3) = I(0, 3).
Figure 4.57: Pixels (1, 3) and (2, 3) have neighbours with minimum gradient magnitude,
which have already assigned values in the output array, so they inherit the values of
those neighbours.
Example 4.25
Deblur the image of figure 4.58a using toboggan deblurring and mode filtering
with weights (4.157) and (4.156).
(a) Original (b) Toboggan
(c) Mode (iter. 2, weights (4.157)) (d) Mode (iter. 6, weights (4.157))
(e) Mode (iter. 1, weights (4.156)) (f) Mode (iter. 8, weights (4.156))
Figure 4.58: (a) Faces, blurred due to shaky camera (size 300 × 358). (b) Using toboggan
deblurring. (c)-(f): Mode filtering with different weights for various iterations.
What is the take home message of this chapter?
With image enhancement we try to make images look better according to subjective criteria.
We may enhance an image in a desirable way, by manipulating its Fourier spectrum: we
can preferentially kill frequency bands we do not want, or enhance frequencies we want. This
can be achieved with the help of filters defined in the frequency domain, with exactly specified
spectra. The use of such filters involves taking the Fourier transform of the image, multiplying
it with the Fourier transform of the filter, and then taking the inverse Fourier transform. We
can avoid this tedious process by working solely in the real domain, but the filters we shall use
then have to be finite (to be implemented using convolution) or infinite but approximate (to
be implemented using z-transforms). In either case, these filters are optimal for convenience
of use, rather than optimal for their frequency characteristics.
Further, we may enhance an image using nonlinear methods, which manipulate its grey
values directly, by mapping them to a broader range of values. When applying such methods,
care should be taken so the ranking of pixels is more or less preserved, in order to preserve
the semantic content of the image and not create artifacts.
Contrast enhancement of a grey image can be achieved by manipulating the grey values of
the pixels so that they become more diverse. This can be done by defining a transformation
that converts the distribution of the grey values to a prespecified shape. The choice of this
shape may be totally arbitrary.
Finally, in the absence of any information, generic deblurring may be achieved by using
mode filtering, mean shift or toboggan enhancement. Figures 4.59 and 4.60 show the profile
of the same cross section of the various restored versions of image 4.47a. We can see that
the edges have indeed been sharpened, but unless an algorithm that takes into consideration
spatial information is used, the edges may be shifted away from their true position and thus
lose their lateral continuity. So, these algorithms do not really restore the image into its
unblurred version, but they simply sharpen its edges and make it look patchy.
[Each panel plots grey value (0-200) against position along the line (0-100).]
(a) Original (b) Figure 4.47b (c) Figure 4.47c
(d) Figure 4.47d (e) Figure 4.47e (f) Figure 4.48a
Figure 4.59: Line 124 of Alison, originally and after deblurring. Averaging multiple modes
introduces an artifact on the left in (e). The dashed line in each panel is the original profile.
[Each panel plots grey value (0-200) against position along the line (0-100).]
(a) Figure 4.48b (b) Figure 4.48c (c) Figure 4.49a
(d) Figure 4.49b (e) Figure 4.49c (f) Figure 4.49d
(g) Figure 4.49e (h) Figure 4.49f (i) Figure 4.49g
(j) Figure 4.49h (k) Figure 4.49i (l) Figure 4.51
Figure 4.60: The profile of line 124 of image Alison, originally and after applying the various
deblurring methods. Note how the mean shift algorithm (panels (i), (j) and (k)) creates large
flat patches in the images. All algorithms make edges sharper and reduce small grey value
fluctuations. The original profile is shown as a dashed line superimposed on each resultant
profile.
Chapter 5
Image Restoration
What is image restoration?
Image restoration is the improvement of an image using objective criteria and prior knowledge
as to what the image should look like.
Why may an image require restoration?
An image may be degraded because the grey values of individual pixels may be altered, or
it may be distorted because the position of individual pixels may be shifted away from their
correct position. The second case is the subject of geometric restoration, which is a type
of image registration.
What is image registration?
Image registration is the establishment of a correspondence between the pixels of two images,
depicting the same scene, on the basis that the corresponding pixels are images of the same
physical patch of the imaged scene. Image registration is a very broad topic, with applications
in medical image processing, remote sensing and multiview vision, and it is beyond the scope
of this book.
How is image restoration performed?
Grey value restoration may be modelled as a linear process, in which case it may be solved by
a linear method. If the degradation is homogeneous, ie the degradation model is the same for
the whole image, then the problem becomes that of defining an appropriate convolution filter
with which to process the degraded image in order to remove the degradation. For linear
but inhomogeneous degradations, a linear solution may be found, but it cannot be expressed
in the form of a simple convolution. For general degradation processes, where linear and
nonlinear effects play a role, nonlinear restoration methods should be used.
What is the difference between image enhancement and image restoration?
In image enhancement we try to improve the image using subjective criteria, while in image
restoration we are trying to reverse a specific damage suffered by the image, using objective
criteria.
5.1 Homogeneous linear image restoration: inverse filtering
How do we model homogeneous linear image degradation?
Under the assumption that the effect which causes the damage is linear, equation (1.15), on
page 13, should be used. Then, in the continuous domain, the output image g(\alpha, \beta) may be
written in terms of the input image f(x, y) as

g(\alpha, \beta) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, h(x, \alpha, y, \beta)\, dx\, dy    (5.1)

where h is the point spread function of the degradation process. If the point spread function
is shift invariant, h(x, \alpha, y, \beta) = h(\alpha - x, \beta - y), the above integral becomes a convolution,
and taking the Fourier transform of both sides yields

\hat{G}(u, v) = \hat{F}(u, v)\,\hat{H}(u, v)    (5.3)

where \hat{G}, \hat{F} and \hat{H} are the Fourier transforms of functions g, f and h, respectively.
How may the problem of image restoration be solved?
The problem of image restoration may be solved if we have prior knowledge of the point
spread function or its Fourier transform (the frequency response function) of the
degradation process.
How may we obtain information on the frequency response function \hat{H}(u, v) of the
degradation process?
1. From the knowledge of the physical process that caused the degradation. For example,
if the degradation is due to diffraction, \hat{H}(u, v) may be calculated. Similarly, if the
degradation is due to atmospheric turbulence or motion, the physical process may be
modelled and \hat{H}(u, v) calculated.
2. We may try to extract information on \hat{H}(u, v) or h(x, y) from the image itself,
ie from the effect the process has on the images of some known objects, ignoring the
actual nature of the underlying physical process that takes place.
Example 5.1
When a certain static scene was being recorded, the camera underwent
planar motion parallel to the image plane (x, y). This motion appeared
as if the scene moved in the x and y directions by distances, which are
functions of time t, x_0(t) and y_0(t), respectively. The shutter of the camera
remained open from t = 0 to t = T where T is a positive real number. Write
down the equation that expresses the intensity recorded at pixel position
(x, y) in terms of the scene intensity function f(x, y).

The total exposure time at any point of the recording medium (say the film) will be T
and we shall have for the blurred image:

g(x, y) = \int_0^T f(x - x_0(t), y - y_0(t))\, dt    (5.4)

This equation says that all points that were at close enough distances from point (x, y)
to be shifted past point (x, y) in time interval T, will have their values recorded and
accumulated by the sensor at position (x, y).
Example 5.2
In example 5.1, derive the frequency response function with which you can
model the degradation suffered by the image due to the camera motion,
assuming that the degradation was linear with a shift invariant point spread
function.

Consider the Fourier transform \hat{G}(u, v) of g(x, y) defined in example 5.1:

\hat{G}(u, v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x, y)\, e^{-2\pi j(ux+vy)}\, dx\, dy    (5.5)

If we substitute (5.4) into (5.5), we have:

\hat{G}(u, v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \int_0^T f(x - x_0(t), y - y_0(t))\, dt\; e^{-2\pi j(ux+vy)}\, dx\, dy    (5.6)

We may exchange the order of the integrals:

\hat{G}(u, v) = \int_0^T \underbrace{\left[\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x - x_0(t), y - y_0(t))\, e^{-2\pi j(ux+vy)}\, dx\, dy\right]}_{\text{the Fourier transform of a function shifted by } x_0 \text{ and } y_0 \text{ in directions } x \text{ and } y, \text{ respectively}}\, dt    (5.7)

We have shown (see equation (2.241), on page 115) that the Fourier transform of a
shifted function and the Fourier transform of the unshifted function are related by:

\text{FT of shifted function} = (\text{FT of unshifted function})\, e^{-2\pi j(ux_0(t)+vy_0(t))}    (5.8)

Therefore,

\hat{G}(u, v) = \int_0^T \hat{F}(u, v)\, e^{-2\pi j(ux_0(t)+vy_0(t))}\, dt    (5.9)

where \hat{F}(u, v) is the Fourier transform of the scene intensity function f(x, y), ie the
unblurred image. \hat{F}(u, v) is independent of time, so it may come out of the integral
sign:

\hat{G}(u, v) = \hat{F}(u, v) \int_0^T e^{-2\pi j(ux_0(t)+vy_0(t))}\, dt    (5.10)

Comparing this equation with (5.3), we conclude that:

\hat{H}(u, v) = \int_0^T e^{-2\pi j(ux_0(t)+vy_0(t))}\, dt    (5.11)
Example 5.3
Assume that the motion in example 5.1 was in the x direction only and
with constant speed \alpha/T, so that y_0(t) = 0, x_0(t) = \alpha t/T. Calculate the
frequency response function of the process that caused motion blurring.

In equation (5.11), substitute y_0(t) and x_0(t) to obtain:

\hat{H}(u, v) = \int_0^T e^{-2\pi j u \alpha t/T}\, dt = \left[\frac{e^{-2\pi j u \alpha t/T}}{-2\pi j u \alpha/T}\right]_0^T
= \frac{T}{2\pi j u \alpha}\left(1 - e^{-2\pi j u \alpha}\right)
= \frac{T e^{-\pi j u \alpha}}{2\pi j u \alpha}\left(e^{\pi j u \alpha} - e^{-\pi j u \alpha}\right)
= \frac{T e^{-\pi j u \alpha}\, 2j\sin(\pi u \alpha)}{2\pi j u \alpha}
= T\, \frac{\sin(\pi \alpha u)}{\pi \alpha u}\, e^{-j\pi \alpha u}    (5.12)
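The closed form (5.12) is easy to check numerically against a direct evaluation of the integral (5.11). The short Python/NumPy snippet below is only such a sanity check; the values of T and \alpha are arbitrary choices made for the comparison.

import numpy as np

T, alpha = 1.0, 0.2           # arbitrary shutter time and total displacement
u = np.linspace(-20, 20, 401)

# closed form of equation (5.12); np.sinc(x) = sin(pi x)/(pi x)
H_closed = T * np.sinc(alpha * u) * np.exp(-1j * np.pi * alpha * u)

# direct numerical evaluation of equation (5.11) with x0(t) = alpha*t/T, y0(t) = 0
t = np.linspace(0.0, T, 2000)
H_numeric = np.trapz(np.exp(-2j * np.pi * np.outer(u, alpha * t / T)), t, axis=1)

assert np.allclose(H_closed, H_numeric, atol=1e-3)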
Example B5.4
It was established that during the time interval T, when the shutter of the
camera was open, the camera moved in such a way that it appeared as
if the objects in the scene moved along the positive y axis, with constant
acceleration 2\alpha and initial velocity s_0, starting from zero displacement.
Derive the frequency response function of the degradation process for this
case.

In this case x_0(t) = 0 in (5.11) and

\frac{d^2 y_0}{dt^2} = 2\alpha \;\Rightarrow\; \frac{dy_0}{dt} = 2\alpha t + b \;\Rightarrow\; y_0(t) = \alpha t^2 + bt + c    (5.13)

where b and c are some constants of integration, to be specified by the initial conditions
of the problem. We have the following initial conditions:

t = 0: \text{zero shifting} \;\Rightarrow\; c = 0 \qquad t = 0: \text{velocity of shifting} = s_0 \;\Rightarrow\; b = s_0    (5.14)

Therefore:

y_0(t) = \alpha t^2 + s_0 t    (5.15)

We substitute x_0(t) and y_0(t) in equation (5.11) for \hat{H}(u, v):

\hat{H}(u, v) = \int_0^T e^{-2\pi j v(\alpha t^2 + s_0 t)}\, dt
= \int_0^T \cos\left(2\pi\alpha v t^2 + 2\pi v s_0 t\right) dt - j\int_0^T \sin\left(2\pi\alpha v t^2 + 2\pi v s_0 t\right) dt    (5.16)

We may use formulae

\int \cos(ax^2+bx+c)\, dx = \sqrt{\frac{\pi}{2a}}\left[\cos\frac{4ac-b^2}{4a}\, C\!\left(\frac{2ax+b}{2\sqrt{a}}\right) - \sin\frac{4ac-b^2}{4a}\, S\!\left(\frac{2ax+b}{2\sqrt{a}}\right)\right]

\int \sin(ax^2+bx+c)\, dx = \sqrt{\frac{\pi}{2a}}\left[\cos\frac{4ac-b^2}{4a}\, S\!\left(\frac{2ax+b}{2\sqrt{a}}\right) + \sin\frac{4ac-b^2}{4a}\, C\!\left(\frac{2ax+b}{2\sqrt{a}}\right)\right]    (5.17)

where S(x) and C(x) are

S(x) \equiv \sqrt{\frac{2}{\pi}} \int_0^x \sin\left(t^2\right) dt \qquad C(x) \equiv \sqrt{\frac{2}{\pi}} \int_0^x \cos\left(t^2\right) dt    (5.18)

and they are called Fresnel integrals.
We shall use the above formulae with

a \equiv 2\pi\alpha v \qquad b \equiv 2\pi v s_0 \qquad c \equiv 0    (5.19)

to obtain, with X(t) \equiv \sqrt{2\pi\alpha v}\left(t + \frac{s_0}{2\alpha}\right):

\hat{H}(u, v) = \frac{1}{2\sqrt{\alpha v}}\Big\{\cos\frac{\pi v s_0^2}{2\alpha}\left[C(X(t)) - jS(X(t))\right] + \sin\frac{\pi v s_0^2}{2\alpha}\left[S(X(t)) + jC(X(t))\right]\Big\}_0^T

= \frac{1}{2\sqrt{\alpha v}}\Big\{\cos\frac{\pi v s_0^2}{2\alpha}\left[C(X(T)) - jS(X(T))\right] + \sin\frac{\pi v s_0^2}{2\alpha}\left[S(X(T)) + jC(X(T))\right]
- \cos\frac{\pi v s_0^2}{2\alpha}\left[C(X(0)) - jS(X(0))\right] - \sin\frac{\pi v s_0^2}{2\alpha}\left[S(X(0)) + jC(X(0))\right]\Big\}    (5.20)

where X(T) = \sqrt{2\pi\alpha v}\left(T + \frac{s_0}{2\alpha}\right) and X(0) = \sqrt{2\pi\alpha v}\,\frac{s_0}{2\alpha}.
Example B5.5
What is the frequency response function for the case of example 5.4, if the
shutter remained open for a very long time and the starting velocity of the
shifting was negligible?

It is known that functions S(x) and C(x), that appear in (5.20), have the following
asymptotic behaviour:

\lim_{x\to+\infty} S(x) = \frac{1}{2} \qquad \lim_{x\to+\infty} C(x) = \frac{1}{2} \qquad \lim_{x\to 0} S(x) = 0 \qquad \lim_{x\to 0} C(x) = 0    (5.21)

Therefore, for s_0 \to 0 and T \to +\infty, we have:

C(X(T)) \to \frac{1}{2} \qquad S(X(T)) \to \frac{1}{2} \qquad C(X(0)) \to 0 \qquad S(X(0)) \to 0
\cos\frac{\pi v s_0^2}{2\alpha} \to 1 \qquad \sin\frac{\pi v s_0^2}{2\alpha} \to 0    (5.22)

Therefore, equation (5.20) becomes:

\hat{H}(u, v) \simeq \frac{1}{2\sqrt{\alpha v}}\left(\frac{1}{2} - j\frac{1}{2}\right) = \frac{1 - j}{4\sqrt{\alpha v}}    (5.23)
Example B5.6
It was established that during the time interval T, when the shutter of the
camera was open, the camera moved in such a way that it appeared as
if the objects in the scene moved along the positive y axis, with constant
acceleration 2\alpha and initial speed 0, until time T_1, from which time onwards
they carried on moving with constant speed. Derive the frequency response
function of the degradation process for this case.

Following the notation of example 5.4, we have s_0 = 0 and so equation (5.15) takes
the form y_0(t) = \alpha t^2. The first derivative of y_0(t) is the speed of the motion at time
t. The final speed attained at the end of the constant acceleration period is therefore:

s_1 \equiv \left.\frac{dy_0(t)}{dt}\right|_{t=T_1} = 2\alpha T_1    (5.24)

Then, applying the results of examples 5.2 and 5.4, for s_0 = 0, and by using (5.21),
the frequency response function of the motion blurring induced is:

\hat{H}(u, v) = \int_0^{T_1} e^{-2\pi j v \alpha t^2}\, dt + \int_{T_1}^{T} e^{-2\pi j v s_1 t}\, dt

= \frac{1}{2\sqrt{\alpha v}}\left[C\!\left(\sqrt{2\pi\alpha v}\,T_1\right) - jS\!\left(\sqrt{2\pi\alpha v}\,T_1\right)\right] + \left[\frac{e^{-2\pi j v s_1 t}}{-2\pi j v s_1}\right]_{T_1}^{T}

= \frac{1}{2\sqrt{\alpha v}}\left[C\!\left(\sqrt{2\pi\alpha v}\,T_1\right) - jS\!\left(\sqrt{2\pi\alpha v}\,T_1\right)\right]
- \frac{1}{2\pi j v s_1}\left[\cos(2\pi v s_1 T) - j\sin(2\pi v s_1 T) - \cos(2\pi v s_1 T_1) + j\sin(2\pi v s_1 T_1)\right]

= \frac{1}{2\sqrt{\alpha v}}\, C\!\left(\sqrt{2\pi\alpha v}\,T_1\right) + \frac{1}{4\pi\alpha v T_1}\left[\sin(4\pi\alpha v T_1 T) - \sin(4\pi\alpha v T_1^2)\right]
- j\,\frac{1}{2\sqrt{\alpha v}}\, S\!\left(\sqrt{2\pi\alpha v}\,T_1\right) + j\,\frac{1}{4\pi\alpha v T_1}\left[\cos(4\pi\alpha v T_1 T) - \cos(4\pi\alpha v T_1^2)\right]    (5.25)
Example B5.7
Explain how you may infer the point spread function of the degradation
process from an astronomical image.

We know that by definition the point spread function is the output of the imaging
system when the input is a point source. In an astronomical image, a very distant
star may be considered as a point source. By measuring then the brightness profile of
a star, we immediately have the point spread function of the degradation process this
image has been subjected to.
Example B5.8
Assume that we have an ideal bright straight line in the scene parallel to
the image axis x. Use this information to derive the point spread function
of the process that degrades the captured image.

Mathematically, the undegraded image of a bright line may be represented as

f(x, y) = \delta(y)    (5.26)

where we assume that the line actually coincides with the x axis. Then the image of
this line will be:

h_l(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x - x', y - y')\,\delta(y')\, dy'\, dx'
= \int_{-\infty}^{+\infty} h(x - x', y)\, dx'
= \int_{-\infty}^{+\infty} h(\tilde{x}, y)\, d\tilde{x} \qquad (\text{changing variable } \tilde{x} \equiv x - x')    (5.27)

The right-hand side of this equation does not depend on x and therefore the left-hand
side should not depend on it either. This means that the image of the line will be parallel
to the x axis (or rather coincident with it) and its profile will be constant all along it:

h_l(x, y) = h_l(y) = \int_{-\infty}^{+\infty} h(\tilde{x}, y)\, d\tilde{x} \qquad (\tilde{x} \text{ is a dummy variable, independent of } x)    (5.28)

Take the Fourier transform of h_l(y):

\hat{H}_l(v) \equiv \int_{-\infty}^{+\infty} h_l(y)\, e^{-2\pi j v y}\, dy    (5.29)

The Fourier transform of the point spread function is the frequency response function,
given by:

\hat{H}(u, v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x, y)\, e^{-2\pi j(ux+vy)}\, dx\, dy    (5.30)

If we set u = 0 in this expression, we obtain

\hat{H}(0, v) = \int_{-\infty}^{+\infty} \underbrace{\left[\int_{-\infty}^{+\infty} h(x, y)\, dx\right]}_{h_l(y) \text{ from } (5.28)} e^{-2\pi j v y}\, dy    (5.31)

By comparing equation (5.29) with (5.31), we get:

\hat{H}(0, v) = \hat{H}_l(v)    (5.32)
This equation is known as the Fourier slice theorem. This theorem states that by
taking a slice of the Fourier transform of a function h(x, y) (ie by setting u = 0 in
\hat{H}(u, v)), we obtain the Fourier transform of the projection of the function onto the
axis of the retained frequency (here, the projection \int h(x, y)\, dx onto the y axis).

Example B5.9
Assume that we have an ideal step edge in the scene, parallel to the image
axis x. Use this information to derive the point spread function of the
process that degrades the captured image.

If u(y) is the unit step function, the undegraded image of the edge may be represented
as f(x, y) = u(y), and its degraded version will be:

h_e(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x - x', y - y')\, u(y')\, dx'\, dy'    (5.34)

We may define new variables \tilde{x} \equiv x - x', \tilde{y} \equiv y - y'. Obviously dx' = -d\tilde{x} and
dy' = -d\tilde{y}, and the two changes in the sign of the limits of integration cancel out:

h_e(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(\tilde{x}, \tilde{y})\, u(y - \tilde{y})\, d\tilde{x}\, d\tilde{y}    (5.35)

Differentiating both sides with respect to y, we obtain:

\frac{\partial h_e(x, y)}{\partial y} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(\tilde{x}, \tilde{y})\, \frac{\partial u(y - \tilde{y})}{\partial y}\, d\tilde{x}\, d\tilde{y}    (5.36)

It is known that the derivative of a step function with respect to its argument is a delta
function:

\frac{\partial h_e(x, y)}{\partial y} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(\tilde{x}, \tilde{y})\, \delta(y - \tilde{y})\, d\tilde{x}\, d\tilde{y}
= \int_{-\infty}^{+\infty} h(\tilde{x}, y)\, d\tilde{x}    (5.37)

If we compare (5.37) with equation (5.27), we see that the derivative of the image of the
edge is the image of a line parallel to the edge. Therefore, we can derive information
concerning the point spread function of the imaging process by obtaining images of
ideal step edges at various orientations. Each such image should be differentiated
along a direction orthogonal to the direction of the edge. Each resultant derivative
image should be treated as the image of an ideal line and used to yield the profile of
the point spread function along the direction orthogonal to the line, as described in
example 5.8.
Example B5.10
Use the methodology of example 5.9 to derive the point spread function of
the camera of your mobile phone.

Using a ruler and black ink we create the chart shown in figure 5.1a.
(a) (b)
Figure 5.1: (a) A test chart for the derivation of the point spread function of an
imaging device. (b) The image of the test chart captured with the camera of a Motorola
U9 mobile phone (size 200 × 284).
This chart can be used to measure the point spread function of our imaging system at
orientations 0°, 45°, 90° and 135°. To do that, we compute the derivatives of the captured
image along the directions orthogonal to 0°, 45°, 90° and 135°, using
the Robinson operators. These operators are shown in figure 5.2.
M0 = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix} \quad
M1 = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & -1 & -2 \end{pmatrix} \quad
M2 = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix} \quad
M3 = \begin{pmatrix} 0 & -1 & -2 \\ 1 & 0 & -1 \\ 2 & 1 & 0 \end{pmatrix}

Figure 5.2: Filters used to compute the derivative of an image along directions orthogonal
to 0°, 45°, 90° and 135°.
Figure 5.3: (a) The point spread function of the camera when several cross-sections
of the convolved image are averaged. (b) The thresholded version of 5.1b, showing the
problem of variable illumination of the background. The threshold used was 137.
Figure 5.4: The point spread function of the camera when single cross-sections at
orientations (a) 45° and 135° and (b) 0° and 90° are considered. Each panel plots h(x)
against x.
The profiles of the resultant images, along several lines orthogonal to the original edges,
are computed and averaged to produce the four profiles for 0°, 45°, 90° and 135° plotted
in figure 5.3a. These are the profiles of the point spread function. However, they do not
look very satisfactory. For a start, they do not have the same peak value. The reason
for this becomes obvious once the original image is thresholded: the illumination of the
background is variable, and so the white background has different value in the right
part of the image from that in the left part of the image. This is a case which would
have benefited if we had applied the process of flatfielding described on page 366.
In addition, the edges of the test chart are not perfectly aligned with the axes of the
image. This means that averaging several profiles will make the point spread function
appear wider than it actually is for the edges that are not perfectly aligned with the
selected direction. To avoid these two problems, we select cross-sections from the left
part of the convolved image only and also use only single cross-sections to avoid the
effect of misalignment. The obtained profiles of the point spread function are shown in
figure 5.4. We plot separately the profiles that correspond to orientations 45° and 135°,
as the distance of the pixels along these orientations is \sqrt{2} times longer than the distance
of the pixels along 0° and 90°.

Once the frequency response function \hat{H}(u, v) of the degradation process is known, the
Fourier transform of the original image may, in principle, be recovered from that of the
degraded image as

\hat{F}(u, v) = \frac{\hat{G}(u, v)}{\hat{H}(u, v)}    (5.38)

Then, by taking the inverse Fourier transform of \hat{F}(u, v), we should be able to recover f(x, y),
which is what we want. However, this straightforward approach produces unacceptably poor
results.
What happens at frequencies where the frequency response function is zero?
\hat{H}(u, v) probably becomes 0 at some points in the (u, v) plane and this means that \hat{G}(u, v)
will also be zero at the same points, as seen from equation (5.3), on page 396. The ratio
\hat{G}(u, v)/\hat{H}(u, v), as it appears in (5.38), will be 0/0, ie undetermined. All this means is that
for the particular frequencies (u, v) the frequency content of the original image cannot be
recovered. One can overcome this problem by simply omitting the corresponding points in
the frequency plane, provided of course they are countable.
Will the zeros of the frequency response function and the image always coincide?
No. If there is the slightest amount of noise in equation (5.3), the zeros of \hat{H}(u, v) will not
coincide with the zeros of \hat{G}(u, v). Even if the numerator of (5.38) is extremely small, when
the denominator becomes 0, the result is infinitely large. This means that frequencies killed
by the imaging process will actually be infinitely amplified. \hat{G}(u, v) will always have noise, if
nothing else, because of the digitisation process that produces integer valued images.
How can we avoid the amplification of noise?
In many cases, |\hat{H}(u, v)| becomes too small at, or even before, its first zero, while the noise
remains significant, so we apply the inverse filter only up to a certain distance \omega_0 from the
origin of the frequency plane, before its first zero. In other words, we use

\hat{F}(u, v) = \hat{M}(u, v)\,\hat{G}(u, v)    (5.39)

where

\hat{M}(u, v) \equiv \begin{cases} \frac{1}{\hat{H}(u, v)} & \text{for } u^2 + v^2 \le \omega_0^2 \\ 1 & \text{for } u^2 + v^2 > \omega_0^2 \end{cases}    (5.40)

with \omega_0 chosen so that all zeros of \hat{H}(u, v) are excluded. Of course, one may use other
windowing functions instead of a window with rectangular profile, to make \hat{M}(u, v) go
smoothly to zero at \omega_0.
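A sketch of equations (5.39)-(5.40) using the FFT is given below. It assumes that the frequency response H is supplied on the same grid and in the same layout as the image transform, and that \omega_0 is expressed in the normalised frequency units returned by np.fft.fftfreq; the function name and these conventions are illustrative choices, not prescriptions from the text.

import numpy as np

def windowed_inverse_filter(g, H, omega0):
    """Inverse filtering with a hard frequency cut-off (equations (5.39)-(5.40)).

    g      : degraded image (2D array)
    H      : frequency response of the degradation, same shape as g, FFT layout
    omega0 : radius in the (u, v) plane up to which 1/H is applied
    """
    G = np.fft.fft2(g)
    u = np.fft.fftfreq(g.shape[0])[:, None]
    v = np.fft.fftfreq(g.shape[1])[None, :]
    M = np.ones_like(H, dtype=complex)
    inside = (u ** 2 + v ** 2) <= omega0 ** 2
    M[inside] = 1.0 / H[inside]          # assumes H has no zeros inside the window
    F = M * G                            # equation (5.39)
    return np.real(np.fft.ifft2(F))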
Example 5.11
When a certain static scene was being recorded, the camera underwent
planar motion parallel to the i axis of the image plane, from right to left.
This motion appeared as if the scene moved by the same distance from left
to right. The shutter of the camera remained open long enough for the
values of i_T consecutive scene patches, that otherwise would have produced
i_T consecutive pixels, to be recorded by the same pixel of the produced
image. Write down the equation that expresses the intensity recorded at
pixel position (i, j) in terms of the unblurred image f(i, j) that might have
been produced under ideal conditions.

The blurred image g(i, j) in terms of the ideal image f(i, j) is given by

g(i, j) = \frac{1}{i_T}\sum_{k=0}^{i_T-1} f(i - k, j) \qquad i = 0, 1, \ldots, N-1    (5.41)

where i_T is the total number of pixels with their brightness recorded by the same cell
of the camera and N is the total number of pixels in a row of the image.
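Equation (5.41) is a running average of i_T pixels along each row. The small sketch below simulates it directly; pixels near the left edge average only what falls inside the image, which is just one of several possible boundary conventions and is an assumption made here, not part of equation (5.41) itself.

import numpy as np

def motion_blur_rows(f, i_T=10):
    """g(i, j) = (1/i_T) * sum_{k=0}^{i_T-1} f(i-k, j), with i indexing the columns,
    as in equation (5.41); near the left boundary the average uses only the pixels
    that fall inside the image."""
    f = f.astype(float)
    g = np.zeros_like(f)
    N = f.shape[1]
    for i in range(N):
        lo = max(0, i - i_T + 1)
        g[:, i] = f[:, lo:i + 1].mean(axis=1)
    return g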
Example 5.12
In example 5.11, derive the frequency response function with which you
can model the degradation suffered by the image due to the camera motion,
assuming that the degradation is linear with a shift invariant point spread
function.

The discrete Fourier transform of the blurred image g(i, j) is given by (see (2.161)):

\hat{G}(m, n) = \frac{1}{N^2}\sum_{l=0}^{N-1}\sum_{t=0}^{N-1} g(l, t)\, e^{-j\left(\frac{2\pi m l}{N} + \frac{2\pi n t}{N}\right)}    (5.42)

If we substitute g(l, t) from equation (5.41), we have:

\hat{G}(m, n) = \frac{1}{N^2}\,\frac{1}{i_T}\sum_{l=0}^{N-1}\sum_{t=0}^{N-1}\sum_{k=0}^{i_T-1} f(l - k, t)\, e^{-j\left(\frac{2\pi m l}{N} + \frac{2\pi n t}{N}\right)}    (5.43)

We rearrange the order of the summations to obtain:

\hat{G}(m, n) = \frac{1}{i_T}\sum_{k=0}^{i_T-1} \underbrace{\frac{1}{N^2}\sum_{l=0}^{N-1}\sum_{t=0}^{N-1} f(l - k, t)\, e^{-j\left(\frac{2\pi m l}{N} + \frac{2\pi n t}{N}\right)}}_{\text{DFT of shifted } f(l, t)}    (5.44)

By applying the property of the Fourier transform concerning shifted functions (see
(2.241), on page 115), we have

\hat{G}(m, n) = \frac{1}{i_T}\sum_{k=0}^{i_T-1} \hat{F}(m, n)\, e^{-j\frac{2\pi m}{N}k}    (5.45)

where \hat{F}(m, n) is the Fourier transform of the original image. As \hat{F}(m, n) does not
depend on k, it can be taken out of the summation:
\hat{G}(m, n) = \hat{F}(m, n)\,\frac{1}{i_T}\sum_{k=0}^{i_T-1} e^{-j\frac{2\pi m}{N}k}    (5.46)

We identify then the Fourier transform of the point spread function of the degradation
process as:

\hat{H}(m, n) = \frac{1}{i_T}\sum_{k=0}^{i_T-1} e^{-j\frac{2\pi m}{N}k}    (5.47)

The sum on the right-hand side of this equation is a geometric progression with ratio
between successive terms:

q \equiv e^{-j\frac{2\pi m}{N}}    (5.48)

We apply then formula (2.165), on page 95, with this q and S = i_T, to obtain:

\hat{H}(m, n) = \frac{1}{i_T}\,\frac{e^{-j\frac{2\pi m}{N}i_T} - 1}{e^{-j\frac{2\pi m}{N}} - 1}
= \frac{1}{i_T}\,\frac{e^{-j\frac{\pi m}{N}i_T}\left(e^{-j\frac{\pi m}{N}i_T} - e^{j\frac{\pi m}{N}i_T}\right)}{e^{-j\frac{\pi m}{N}}\left(e^{-j\frac{\pi m}{N}} - e^{j\frac{\pi m}{N}}\right)}    (5.49)

Therefore:

\hat{H}(m, n) = \frac{1}{i_T}\,\frac{\sin\left(\frac{\pi m}{N}i_T\right)}{\sin\frac{\pi m}{N}}\, e^{-j\frac{\pi m}{N}(i_T - 1)} \qquad m \neq 0    (5.50)

Notice that for m = 0 we have q = 1 and we cannot apply the formula of the geometric
progression. Instead, we have a sum of 1s in (5.47), which is equal to i_T, and so:

\hat{H}(0, n) = 1    (5.51)

\hat{F}(u, v) = \begin{cases} \frac{\hat{G}(u, v)}{\hat{H}(u, v)} & \text{if } u < u_0 \text{ and } v < v_0 \\ \hat{G}(u, v) & \text{if } u \geq u_0 \text{ or } v \geq v_0 \end{cases}    (5.52)

Step 4: Take the inverse DFT of \hat{F}(u, v) to reconstruct the image.
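The closed form (5.50)-(5.51) can be checked against the DFT of the averaging kernel itself. The helper below is only such a cross-check; it builds \hat{H}(m, n) for one row of frequencies (the response does not depend on n) and compares it with the FFT of the i_T-tap averaging kernel under the cyclic convention.

import numpy as np

def motion_blur_H(N=128, i_T=10):
    """Frequency response of the 1D motion blur of equation (5.41),
    using the closed form (5.50) and H(0, n) = 1 from (5.51)."""
    m = np.arange(N)
    H = np.empty(N, dtype=complex)
    H[0] = 1.0
    phase = np.exp(-1j * np.pi * m[1:] * (i_T - 1) / N)
    H[1:] = (np.sin(np.pi * m[1:] * i_T / N)
             / (i_T * np.sin(np.pi * m[1:] / N))) * phase
    # cross-check: DFT of the averaging kernel (value 1/i_T over i_T taps)
    kernel = np.zeros(N)
    kernel[:i_T] = 1.0 / i_T
    assert np.allclose(H, np.fft.fft(kernel), atol=1e-10)
    return H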
Example 5.13
Consider the 128 × 128 image of figure 5.5a. To imitate the way this image
would look if it were blurred by motion, we take every 10 consecutive pixels
along the x axis, find their average value, and assign it to the tenth pixel.
This is what would have happened if, when the image was being recorded,
the camera had moved 10 pixels to the left: the brightness of a line segment
in the scene with length equivalent to 10 pixels would have been recorded
by a single pixel. The result would look like figure 5.5b. Restore this image
by using inverse filtering omitting division by 0.

The blurred image may be modelled by equation (5.41) with i_T = 10 and N = 128.
Let us denote by \hat{G}(m, n) the Fourier transform of the blurred image. We may analyse
it in its real and imaginary parts:

\hat{G}(m, n) \equiv G_1(m, n) + jG_2(m, n)    (5.53)

We may then write it in magnitude-phase form

\hat{G}(m, n) = \sqrt{G_1^2(m, n) + G_2^2(m, n)}\; e^{j\phi(m, n)}    (5.54)

where

\cos\phi(m, n) = \frac{G_1(m, n)}{\sqrt{G_1^2(m, n) + G_2^2(m, n)}} \qquad
\sin\phi(m, n) = \frac{G_2(m, n)}{\sqrt{G_1^2(m, n) + G_2^2(m, n)}}    (5.55)

To obtain the Fourier transform of the original image, we divide \hat{G}(m, n) by \hat{H}(m, n):

\hat{F}(m, n) = \sqrt{G_1^2(m, n) + G_2^2(m, n)}\;\frac{i_T\sin\frac{\pi m}{N}}{\sin\frac{i_T\pi m}{N}}\; e^{j\left(\phi(m, n) + \frac{\pi m}{N}(i_T - 1)\right)}    (5.56)

Therefore, the real and the imaginary parts of \hat{F}(m, n), F_1(m, n) and F_2(m, n),
respectively, are given by:

F_1(m, n) = i_T\sin\frac{\pi m}{N}\;\frac{\sqrt{G_1^2(m, n) + G_2^2(m, n)}}{\sin\frac{i_T\pi m}{N}}\;\cos\left(\phi(m, n) + \frac{\pi m}{N}(i_T - 1)\right)

F_2(m, n) = i_T\sin\frac{\pi m}{N}\;\frac{\sqrt{G_1^2(m, n) + G_2^2(m, n)}}{\sin\frac{i_T\pi m}{N}}\;\sin\left(\phi(m, n) + \frac{\pi m}{N}(i_T - 1)\right)    (5.57)
If we use formulae cos(a + b) = cos a cos b - sin a sin b and sin(a + b) = cos a sin b +
sin a cos b, and substitute for cos \phi(m, n) and sin \phi(m, n) from equations (5.55), we
obtain:

F_1(m, n) = i_T\sin\frac{\pi m}{N}\;\frac{G_1(m, n)\cos\frac{\pi m(i_T-1)}{N} - G_2(m, n)\sin\frac{\pi m(i_T-1)}{N}}{\sin\frac{i_T\pi m}{N}}

F_2(m, n) = i_T\sin\frac{\pi m}{N}\;\frac{G_1(m, n)\sin\frac{\pi m(i_T-1)}{N} + G_2(m, n)\cos\frac{\pi m(i_T-1)}{N}}{\sin\frac{i_T\pi m}{N}}    (5.58)

For m = 0 (see equation (5.51)) we have to set:

F_1(0, n) = G_1(0, n) \quad \text{for } 0 \leq n \leq N-1
F_2(0, n) = G_2(0, n) \quad \text{for } 0 \leq n \leq N-1    (5.59)

To restore the image, we use F_1(m, n) and F_2(m, n) as the real and the imaginary
parts of the Fourier transform of the undegraded image and take the inverse Fourier
transform. As we are trying to recover a real image, we expect that this inverse trans-
form will yield a zero imaginary part, while the real part will be the restored image. It
turns out that, if we simply take the inverse DFT of F_1(m, n) + jF_2(m, n), both real
and imaginary parts consist of irrelevant huge positive and negative numbers. The real
part of the result is shown in 5.5d. Note the strong saturation of the vertical lines.
They are due to the presence of infinitely large positive and negative values that are
truncated to the extreme allowable grey values.
This result is totally wrong because in equations (5.58) we divide by 0 for several values
of m. Indeed, the denominator \sin\frac{i_T\pi m}{N} becomes 0 every time \frac{i_T\pi m}{N} is a multiple of \pi:

\frac{i_T\pi m}{N} = k\pi \;\Rightarrow\; m = \frac{kN}{i_T} \quad \text{where } k = 1, 2, \ldots    (5.60)

Our image is 128 × 128, ie N = 128, and i_T = 10. Therefore, we divide by 0 when
m = 12.8, 25.6, 38.4, etc. As m takes only integer values, the denominator becomes
very small for m = 13, 26, 38, etc. It is actually exactly 0 only for m = 64. Let us
omit this value of m, ie let us use:

F_1(64, n) = G_1(64, n) \quad \text{for } 0 \leq n \leq 127
F_2(64, n) = G_2(64, n) \quad \text{for } 0 \leq n \leq 127    (5.61)

The rest of the values of F_1(m, n) and F_2(m, n) are as defined by equations (5.58).
If we take the inverse Fourier transform now, we obtain as real part the image in figure
5.5e, with the imaginary part being very nearly 0. The most striking characteristic of
this image is the presence of some vertical stripes.
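The essence of this example is easy to reproduce with a few lines of NumPy. The sketch below blurs an image with the cylindrical boundary condition of figure 5.5c (the realistic blur of figure 5.5b can be produced with the row-averaging function shown earlier instead) and restores it by dividing by the frequency response, leaving \hat{G}(m, n) unchanged wherever \hat{H} is exactly zero, as in equation (5.61); the function name and the numerical tolerance are illustrative assumptions.

import numpy as np

def blur_and_restore(f, i_T=10):
    """Blur by averaging i_T consecutive pixels along each row (cyclic boundary),
    then restore by inverse filtering, keeping G(m, n) unchanged wherever the
    frequency response is exactly zero."""
    N = f.shape[1]
    kernel = np.zeros(N)
    kernel[:i_T] = 1.0 / i_T
    H = np.fft.fft(kernel)                                          # 1D response
    g = np.real(np.fft.ifft(np.fft.fft(f, axis=1) * H, axis=1))    # blurred image

    G = np.fft.fft(g, axis=1)
    nonzero = np.abs(H) > 1e-12
    F = np.where(nonzero, G / np.where(nonzero, H, 1.0), G)
    restored = np.real(np.fft.ifft(F, axis=1))
    return g, np.clip(np.round(restored), 0, 255)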
Example 5.14
Restore the image of figure 5.5a using inverse filtering and setting the
frequency response function of the degradation process equal to 1 after its
first 0.

When we select certain frequencies to handle them in a different way from the way
we handle other frequencies, we must be careful to treat in the same way positive and
negative frequencies, so that the restored image we get at the end is real. From the way
we have defined the DFT we are using here (see equation (5.42), on page 409), it is
not obvious which frequencies correspond to the negative frequencies, ie which values
of m in (5.58) are paired frequencies.
According to example 5.13, the blurring is along the horizontal direction only. So, our
problem is really only 1D: we may restore the image line by line, as if each image line
were a separate signal that needed restoration. We shall treat formulae (5.58) as if
they were 1D, ie consider n fixed, identifying only the image line we are restoring. Let
us remind ourselves what the DFT of a 1D signal f(k) looks like:

\hat{F}(m) \equiv \frac{1}{N}\sum_{k=0}^{N-1} f(k)\, e^{-j\frac{2\pi m}{N}k}    (5.62)

Since m takes values from 0 to 127, we appreciate that the negative frequencies must
be frequencies mirrored from the 127 end of the range. Let us manipulate (5.62) to
identify the exact frequency correspondence:

\hat{F}(m) = \frac{1}{N}\sum_{k=0}^{N-1} f(k)\, e^{-j\frac{2\pi(m - N + N)}{N}k}
= \frac{1}{N}\sum_{k=0}^{N-1} f(k)\, e^{-j\frac{2\pi(m - N)}{N}k}\underbrace{e^{-j\frac{2\pi N}{N}k}}_{=1}
= \frac{1}{N}\sum_{k=0}^{N-1} f(k)\, e^{j\frac{2\pi(N - m)}{N}k} = \hat{F}^*(N - m)    (5.63)

It is obvious from this that, if we wish to obtain a real image, whatever we do to
frequencies lower than m_0, we must also do it to the frequencies higher than N - m_0.
In our example, the first zero of the frequency response function is for k = 1, ie for
m = \frac{N}{i_T} = 12.8 (see (5.60)). We use formulae (5.58), therefore, only for 0 \leq m \leq 12
and 116 \leq m \leq 127, and for 0 \leq n \leq 127. Otherwise we use:

F_1(m, n) = G_1(m, n) \quad F_2(m, n) = G_2(m, n) \qquad \text{for } 13 \leq m \leq 115,\; 0 \leq n \leq 127    (5.64)

If we now take the inverse Fourier transform of F_1(m, n) + jF_2(m, n), we obtain the
image shown in figure 5.5f (the imaginary part is again virtually 0). This image
looks better than the previous one, but more blurred, with the vertical lines (the horizontal
interfering frequency) still there, but less prominent. The blurring is understandable:
we have effectively done nothing to improve the frequencies above m = 12, so the high
frequencies of the image, responsible for any sharp edges, remain degraded.
(a) Original image. (b) Realistic blurring, MSE = 893. (c) Blurring with cylindrical
boundary condition, MSE = 1260. (d) Real part of inverse filtering of (b), MSE = 19962.
(e) Inverse filtering of (b) omitting division by 0, MSE = 7892. (f) Inverse filtering of (b)
omitting division with terms beyond the first 0, MSE = 2325. (g) Real part of inverse
filtering of (c), MSE = 20533. (h) Inverse filtering of (c) omitting division by 0, MSE = 61.
(i) Inverse filtering of (c) omitting division with terms beyond the first 0, MSE = 194.
Figure 5.5: Restoring Dionisia with inverse filtering.
Example 5.15
Explain the presence of the vertical stripes in restored images 5.5e and
5.5f.

We observe that we have almost 13 vertical stripes in an image of width 128, ie they
repeat every 10 pixels. They are due to the boundary effect: the Fourier transform
assumes that the image is repeated ad infinitum in all directions. So it assumes that
the pixels on the left of the blurred image carry the true values of the pixels on the
right of the image. In reality, of course, this is not the case, as the blurred pixels
on the left carry the true values of some points further left that do not appear in the
image. For example, figure 5.6 shows the results of restoring the same original image
when blurred with i_T = 5, 6, 8. One can clearly count the interfering stripes being
128/5 ≈ 26, 128/6 ≈ 21 and 128/8 = 16 in these images, respectively. To demonstrate
further that this explanation is correct, we blurred the original image with i_T = 10,
assuming cylindrical boundary conditions, ie assuming that the image is repeated on the
left. The result is the blurred image of figure 5.5c. The results of restoring this image
by the three versions of inverse filtering are shown in the bottom row of figure 5.5.
The vertical lines have disappeared entirely and we have a remarkably good restoration
in 5.5h, obtained by simply omitting the frequency for which the frequency response
function is exactly 0. The only noise present in this image is quantisation noise: the
restored values are not necessarily integer and they are mapped to the nearest integer.
Unfortunately, in real situations, the blurring is going to be like that of figure 5.5b
and the restoration results are expected to be more like those in figures 5.5e and 5.5f,
rather than those in 5.5h and 5.5i.
(a) i_T = 5, MSE = 11339 (b) i_T = 6, MSE = 6793 (c) i_T = 8, MSE = 2203
Figure 5.6: Restored versions of Dionisia blurred with different number i_T of columns
recorded on top of each other. The interfering horizontal frequency depends on i_T.
The restorations are by simply omitting division by 0.
Note that the quality of the restorations in figure 5.6 does not follow what we would
instinctively expect: we would expect the restoration of the image blurred with i_T = 8
to be worse than that of the restoration of the image blurred with i_T = 5. And yet,
the opposite is true. This is because 8 is an exact divisor of 128, so we have several
divisions by 0 and we omit all of them. On the contrary, when i_T = 5, we have
not a single frequency for which the denominator in (5.58) becomes exactly 0, and
so we do not omit any frequency. However, we have many frequencies, where the
denominator in (5.58) becomes near 0. Those frequencies are amplified unnecessarily
and unrealistically, and cause the problem.
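The effect just described is easy to quantify: for each i_T one can count how many frequencies have an exactly zero denominator in (5.58) (these are omitted) and how many have a merely tiny one (these are amplified). The thresholds in the loop below are illustrative choices.

import numpy as np

N = 128
for i_T in (5, 6, 8, 10):
    m = np.arange(1, N)
    denom = np.abs(np.sin(np.pi * m * i_T / N))
    exact_zeros = np.sum(denom < 1e-12)                    # frequencies omitted
    near_zeros = np.sum((denom >= 1e-12) & (denom < 0.1))  # frequencies amplified
    print(i_T, exact_zeros, near_zeros)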
Example 5.16
Quantify the quality of the restoration you achieved in examples 5.13, 5.14
and 5.15 by computing the mean square error (MSE) of each restoration.
Comment on the suitability of this measure for image quality assessment.

The mean square error is the sum of the square differences between the corresponding
pixels of the original (undistorted) image and the restored one, divided by the total
number of pixels in the image. The values of MSE are given in the captions of
figures 5.6 and 5.5. MSE is not suitable for image quality assessment in practical
applications, because it requires the availability of the perfect image, used as reference.
It is only suitable for evaluating and comparing different algorithms in simulated
situations. One, however, has to be careful what one measures. The restored images
contain errors and interfering frequencies. These result in out of range grey values.
So, the restored image matrix we obtain contains real values in a range broader than
[0, 255]. To visualise this matrix as an image, we have to decide on whether we truncate
the values outside the range to the extreme 0 and 255 values, or we map the full range
to the [0, 255] range. The results shown in figures 5.6 and 5.5 were produced by clipping
the out of range values. That is why the stripes due to the interfering frequencies have
high contrast. These extreme valued pixels contribute significantly to the MSE we
compute. Figure 5.7 shows the histograms of the original image 5.5a and the two
restored versions of it shown in 5.5e and 5.5f. We can see the dominant peaks at grey
values 0 and 255, which contribute to the extreme values of MSE for these images.
Figure 5.8 shows the results we would have obtained if the whole range of obtained
values had been mapped to the range [0, 255]. The value of MSE now is lower, but the
images are not necessarily better.
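The two ways of visualising an out-of-range result, and the MSE computed from each, can be compared with a couple of lines. In the sketch below, restored stands for any real-valued result of the inverse filtering and original for the reference image; the helper names are illustrative.

import numpy as np

def mse(a, b):
    return np.mean((a.astype(float) - b.astype(float)) ** 2)

def clip_to_range(x):
    return np.clip(x, 0, 255)          # how figures 5.5 and 5.6 were produced

def rescale_to_range(x):
    return (x - x.min()) / (x.max() - x.min()) * 255   # how figure 5.8 was produced

# mse(clip_to_range(restored), original) and mse(rescale_to_range(restored), original)
# generally give different values, even though the underlying restoration is the same.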
Figure 5.7: Histograms of the original image 5.5a (left), and the restored images 5.5e
(middle) and 5.5f (right). Each histogram plots pixel count against grey value (0-255).
(a) MSE = 4205 (b) MSE = 2096
Figure 5.8: Images obtained by mapping the full range of restored values to [0, 255].
They should be compared with 5.5e and 5.5f, respectively, as those were produced by
clipping the out of range values. We note that scaling linearly the full range of values
tends to produce images of low contrast.
Can we define a filter that will automatically take into consideration the noise in
the blurred image?
Yes. One such filter is the Wiener filter, which treats the image restoration problem as an
estimation problem and solves it in the least square error sense. This will be discussed in the
next section.
Example 5.17
Restore the blurred and noisy images shown in figure 5.9. They were
produced by adding white Gaussian noise with standard deviation 10 or 20
to the blurred images 5.5b and 5.5c.

The results shown in figures 5.9d-5.9f are really very bad: high frequencies dominated
by noise are amplified by the filter to the extent that they dominate the restored image.
When the filter is truncated beyond its first 0, the results, shown in figures 5.9g-5.9i,
are reasonable.
(a) Realistic blurring with added Gaussian noise with σ = 10, MSE = 994. (b) Realistic
blurring with added Gaussian noise with σ = 20, MSE = 1277. (c) Blurring (cylindrical
boundary condition) plus Gaussian noise (σ = 10), MSE = 1364. (d) Inverse filtering of (a),
omitting division by 0, MSE = 15711. (e) Inverse filtering of (b), omitting division by 0,
MSE = 18010. (f) Inverse filtering of (c), omitting division by 0, MSE = 14673. (g) Inverse
filtering of (a), but omitting division with terms beyond the first 0, MSE = 2827. (h) Inverse
filtering of (b), but omitting division with terms beyond the first 0, MSE = 3861. (i) Inverse
filtering of (c), but omitting division with terms beyond the first 0, MSE = 698.
Figure 5.9: Restoring the image of noisy Dionisia, with inverse filtering. All MSE
values have been computed in relation to 5.5a.
5.2 Homogeneous linear image restoration: Wiener filtering
How can we express the problem of image restoration as a least square error
estimation problem?
If \hat{f}(r) is an estimate of the original undegraded image f(r), we wish to calculate \hat{f}(r) so that
the norm of the residual image f(r) - \hat{f}(r) is minimal over all possible versions of image f(r).
This is equivalent to saying that we wish to identify \hat{f}(r) which minimises:

e^2 \equiv E\left\{\left[f(r) - \hat{f}(r)\right]^2\right\}    (5.65)

Can we find a linear least squares error solution to the problem of image restoration?
Yes, by imposing the constraint that the solution \hat{f}(r) is a linear function of the degraded
image g(r). This constraint is valid if the process of image degradation is assumed to be
linear, ie modelled by an equation like (5.1), on page 396. Clearly, if this assumption is
wrong, the solution found this way will not give the absolute minimum of e^2, but it will make
e^2 minimum within the limitations of the constraints imposed.
The solution of a linear problem is also linear, so we may express \hat{f}(r) as a linear function
of the grey levels of the degraded image, ie
\hat{f}(r) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} m(r, r')\, g(r')\, dr'    (5.66)

where m(r, r') is the function we want to determine and which gives the weight by which the
grey level value of the degraded image g at position r' contributes to the estimate of the image
value at position r. If the random fields involved are homogeneous, this weight depends only
on the difference r - r' of the two positions, as opposed to depending on them separately.
In that case (5.66) may be written as:

\hat{f}(r) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} m(r - r')\, g(r')\, dr'    (5.67)

This equation means that we wish to identify a filter m(r) with which to convolve the
degraded image g(r) in order to obtain the estimate \hat{f}(r) of the original image. The degraded
image itself is assumed to be given by

g(r) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(r - r')\, f(r')\, dr' + \nu(r)    (5.68)

where g(r), f(r) and \nu(r) are considered to be random fields, with \nu(r) being the noise field.
What is the linear least mean square error solution of the image restoration
problem?
If \hat{M}(u, v) is the Fourier transform of filter m(r), it can be shown (see Box 5.3, on page 428)
that the linear solution of equation (5.65) can be obtained if

\hat{M}(u, v) = \frac{\hat{H}^*(u, v)}{|\hat{H}(u, v)|^2 + \frac{S_{\nu\nu}(u, v)}{S_{ff}(u, v)}}    (5.69)

where \hat{H}(u, v) is the Fourier transform of the point spread function of the degradation process,
\hat{H}^*(u, v) is its complex conjugate, S_{\nu\nu}(u, v) is the spectral density of the noise field \nu(r) and
S_{ff}(u, v) is the spectral density of the undegraded image f(r).
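Equation (5.69) translates directly into a frequency-domain filter. In practice the ratio S_{\nu\nu}/S_{ff} is rarely known exactly and is often replaced by a single constant K; that replacement, together with the function name below, is an assumption made purely for this sketch and is not how the text defines the filter.

import numpy as np

def wiener_restore(g, H, K=0.01):
    """Wiener filtering as in equation (5.69), with the spectral ratio
    S_nn/S_ff approximated by the constant K.

    g : degraded image (2D array)
    H : frequency response of the degradation, same shape as g, FFT layout
    """
    G = np.fft.fft2(g)
    M = np.conj(H) / (np.abs(H) ** 2 + K)    # equation (5.69)
    return np.real(np.fft.ifft2(M * G))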
If the filter m(r - r') used in (5.67) satisfies

E\left\{\left[f(r) - \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} m(r - r')\, g(r')\, dr'\right] g(s)\right\} = 0    (5.70)

then it minimises the error defined by equation (5.65).
Intuitively, we can see that this is true, because equation (5.70) says that the error of
the estimation (expressed by the quantity inside the square bracket) is orthogonal to
the data. This is what least squares error estimation does. (Remember how, when we
fit a least squares error line to some points, we minimise the sum of the distances of
these points from the line.) Next, we shall prove this mathematically.
If we substitute equation (5.67) into equation (5.65) we obtain:

e^2 = E\left\{\left[f(r) - \iint m(r - r')\, g(r')\, dr'\right]^2\right\}    (5.71)

Consider now another function m'(r - r'), which does not satisfy (5.70). We shall show that
m'(r), when used for the restoration of the image, will produce an estimate \hat{f}'(r) with
error e'^2, greater than the error of the estimate obtained by m(r), which does satisfy (5.70).
Error e'^2 will be:

e'^2 \equiv E\left\{\left[f(r) - \iint m'(r - r')\, g(r')\, dr'\right]^2\right\}    (5.72)

Inside the integrand in (5.72) we add to and subtract from m'(r - r') function m(r - r').
We split the integral into two parts and then expand the square:

e'^2 = E\left\{\left[f(r) - \iint [m'(r - r') + m(r - r') - m(r - r')]\, g(r')\, dr'\right]^2\right\}
= E\left\{\left[\left(f(r) - \iint m(r - r')\, g(r')\, dr'\right) + \iint [m(r - r') - m'(r - r')]\, g(r')\, dr'\right]^2\right\}
= \underbrace{E\left\{\left[f(r) - \iint m(r - r')\, g(r')\, dr'\right]^2\right\}}_{e^2}
+ \underbrace{E\left\{\left[\iint [m(r - r') - m'(r - r')]\, g(r')\, dr'\right]^2\right\}}_{\text{a non-negative number}}
+ 2E\left\{\left[f(r) - \iint m(r - r')\, g(r')\, dr'\right] \iint [m(r - r') - m'(r - r')]\, g(r')\, dr'\right\}    (5.73)

The expectation value of the first term is e^2 and clearly the expectation value of the
second term is a non-negative number. In the last term, in the second factor, change the
dummy variable of integration from r' to s:

2E\left\{\left[f(r) - \iint m(r - r')\, g(r')\, dr'\right] \iint [m(r - s) - m'(r - s)]\, g(s)\, ds\right\}    (5.74)

The first factor in the above expression does not depend on s and thus it can be put
inside the double integral sign:

2E\left\{\iint \left[f(r) - \iint m(r - r')\, g(r')\, dr'\right] [m(r - s) - m'(r - s)]\, g(s)\, ds\right\}    (5.75)

The difference [m(r - s) - m'(r - s)] is not a random field, so it may be taken out of the
expectation operator:

2\iint E\left\{\left[f(r) - \iint m(r - r')\, g(r')\, dr'\right] g(s)\right\} [m(r - s) - m'(r - s)]\, ds    (5.76)

However, the expectation value in the above term is 0, according to (5.70), so from (5.73)
we get:

e'^2 = e^2 + \text{a non-negative term}    (5.77)

We conclude that the error of the restoration created with the m' filter can never be smaller
than the error of the restoration created with the filter m that satisfies condition (5.70).
Example B5.18
Show that, if

g(x, y) = \iint h(\tilde{x} - x, \tilde{y} - y)\, f(\tilde{x}, \tilde{y})\, d\tilde{x}\, d\tilde{y}    (5.78)

then

\hat{G}(u, v) = \hat{H}^*(u, v)\, \hat{F}(u, v)    (5.79)

where \hat{G}, \hat{F} and \hat{H} are the Fourier transforms of g, f and h, respectively, defined as

\hat{G}(u, v) = \iint g(x, y)\, e^{-j(ux+vy)}\, dx\, dy    (5.80)
\hat{F}(u, v) = \iint f(x, y)\, e^{-j(ux+vy)}\, dx\, dy    (5.81)
\hat{H}(u, v) = \iint h(x, y)\, e^{-j(ux+vy)}\, dx\, dy    (5.82)

The complex conjugate of \hat{H}(u, v) is

\hat{H}^*(u, v) = \iint h(x, y)\, e^{j(ux+vy)}\, dx\, dy    (5.83)

since h(x, y) is real.
Let us substitute g(x, y) from (5.78) into the right-hand side of (5.80):

\hat{G}(u, v) = \iint \left[\iint h(\tilde{x} - x, \tilde{y} - y)\, f(\tilde{x}, \tilde{y})\, d\tilde{x}\, d\tilde{y}\right] e^{-j(ux+vy)}\, dx\, dy    (5.84)

We define new variables of integration s_1 \equiv \tilde{x} - x and s_2 \equiv \tilde{y} - y to replace integration
over x and y. Since dx = -ds_1 and dy = -ds_2, dx\, dy = ds_1\, ds_2. Also, as the limits
of both s_1 and s_2 run from +\infty to -\infty, we can change their order without worrying
about a change of sign:

\hat{G}(u, v) = \iint\iint h(s_1, s_2)\, f(\tilde{x}, \tilde{y})\, e^{-j\left(u(\tilde{x} - s_1) + v(\tilde{y} - s_2)\right)}\, d\tilde{x}\, d\tilde{y}\, ds_1\, ds_2    (5.85)

The two double integrals are separable:

\hat{G}(u, v) = \iint h(s_1, s_2)\, e^{j(us_1 + vs_2)}\, ds_1\, ds_2 \;\iint f(\tilde{x}, \tilde{y})\, e^{-j(u\tilde{x} + v\tilde{y})}\, d\tilde{x}\, d\tilde{y}    (5.86)

On the right-hand side of this equation we recognise the product of \hat{H}^*(u, v) and \hat{F}(u, v)
from equations (5.83) and (5.81), respectively. Therefore, equation (5.79) has been
proven.
Example B5.19
If \hat{R}_{ff} and \hat{R}_{gf} are the Fourier transforms of the autocorrelation function
of field f(r), and the cross-correlation function between fields g(r) and f(r),
respectively, related by equation (5.68), show that

\hat{R}_{gf}(u, v) = \hat{H}^*(u, v)\, \hat{R}_{ff}(u, v)    (5.87)

where \hat{H}^* is the complex conjugate of the Fourier transform of the point spread
function h.

We multiply both sides of (5.68) with f(r + s) and take the expectation value:

E\{g(r)f(r + s)\} = \iint h(r - r')\, \underbrace{E\{f(r')f(r + s)\}}_{R_{ff}(r + s - r')}\, dr' + E\{f(r + s)\nu(r)\}    (5.88)

The last term in the above equation is 0, because, due to the uncorrelatedness of f(r)
and \nu(r), it may be written as E\{f(r + s)\}E\{\nu(r)\} and E\{\nu(r)\} = 0. Therefore:

R_{gf}(s) = \iint h(r - r')\, R_{ff}(r - r' + s)\, dr'    (5.89)

Let us define a new vector of integration \tilde{s} \equiv r - r' + s:

R_{gf}(s) = \iint h(\tilde{s} - s)\, R_{ff}(\tilde{s})\, d\tilde{s}    (5.90)

According to (5.79), this equation may be written as

\hat{R}_{gf}(u, v) = \hat{H}^*(u, v)\, \hat{R}_{ff}(u, v)    (5.91)

in terms of Fourier transforms, where (u, v) are the frequencies along the two axes.
Example B5.20
If \hat{R}_{gg}(u, v), \hat{R}_{fg}(u, v) and \hat{R}_{\nu g}(u, v) are the Fourier transforms of the autocorrelation
function of the homogeneous random field g(x, y), and the cross-correlation
functions between the homogeneous random fields f(x, y) and g(x, y), and \nu(x, y) and
g(x, y), respectively, \hat{H}(u, v) is the Fourier transform of h(x, y), and

g(x, y) = \iint h(x - \tilde{x}, y - \tilde{y})\, f(\tilde{x}, \tilde{y})\, d\tilde{x}\, d\tilde{y} + \nu(x, y)    (5.92)

show that

\hat{R}_{gg}(u, v) = \hat{H}^*(u, v)\, \hat{R}_{fg}(u, v) + \hat{R}_{\nu g}(u, v)    (5.93)

If we multiply both sides of equation (5.92) with g(x + s_1, y + s_2) and take the ensemble
average over all versions of random field g(x, y), we obtain:

E\{g(x, y)g(x + s_1, y + s_2)\} = E\left\{g(x + s_1, y + s_2)\iint h(x - \tilde{x}, y - \tilde{y})\, f(\tilde{x}, \tilde{y})\, d\tilde{x}\, d\tilde{y}\right\} + E\{\nu(x, y)g(x + s_1, y + s_2)\}    (5.94)

The left-hand side is the autocorrelation function R_{gg}(s_1, s_2). In the first term on the
right-hand side we exchange the order of integration and expectation, noting that the
expectation operator applies only to the random fields, while the last term is the
cross-correlation R_{\nu g}(s_1, s_2). Thus:

R_{gg}(s_1, s_2) = \iint h(x - \tilde{x}, y - \tilde{y})\, R_{fg}(x - \tilde{x} + s_1, y - \tilde{y} + s_2)\, d\tilde{x}\, d\tilde{y} + R_{\nu g}(s_1, s_2)    (5.97)

We may define new variables of integration: \epsilon \equiv x - \tilde{x}, \delta \equiv y - \tilde{y}. Then d\tilde{x}\, d\tilde{y} = d\epsilon\, d\delta,
and the changes of sign of the two sets of limits of integration cancel each other out:

R_{gg}(s_1, s_2) = \iint h(\epsilon, \delta)\, R_{fg}(\epsilon + s_1, \delta + s_2)\, d\epsilon\, d\delta + R_{\nu g}(s_1, s_2)    (5.98)

We may change variables of integration again, to w \equiv \epsilon + s_1, z \equiv \delta + s_2. Then
\epsilon = w - s_1, \delta = z - s_2, d\epsilon\, d\delta = dw\, dz and the limits of integration are not affected:

R_{gg}(s_1, s_2) = \iint h(w - s_1, z - s_2)\, R_{fg}(w, z)\, dw\, dz + R_{\nu g}(s_1, s_2)    (5.99)

If we take the Fourier transform of both sides of this expression, and make use of
(5.79), we may write:

\hat{R}_{gg}(u, v) = \hat{H}^*(u, v)\, \hat{R}_{fg}(u, v) + \hat{R}_{\nu g}(u, v)    (5.100)
Example B5.21
If
R
ff
(u, v) and
R
fg
(u, v) are the Fourier transforms of the autocorrelation
function of the homogeneous random eld f(x, y), and the cross-correlation
function between the homogeneous random elds f(x, y) and g(x, y), respec-
tively,
H(u, v) is the Fourier transform of h(x, y), and
g(x, y) =
_
+
_
+
R
fg
(u, v) =
H(u, v)
R
ff
(u, v) (5.102)
We multiply both sides of (5.101) with f(xs
1
, y s
2
) and take the expectation value.
The reason we multiply with f(xs
1
, y s
2
) and not with f(x+s
1
, y +s
2
) is in order
to be consistent with example 5.19. In that example, we formed the shifting arguments
of R
gf
by subtracting the arguments of g from the arguments of f. Following the same
convention here will result in positive arguments for R
fg
.
With this proviso, on the left-hand side of the resultant equation then we shall have
the cross-correlation between random elds f(x, y) and g(x, y), R
fg
(s
1
, s
2
). On the
right-hand side we exchange the order of expectation and integration and observe that
the expectation operator is applied only to the random components:
R
fg
(s
1
, s
2
) =
_
+
_
+
_
+
h(x x, y y)R
ff
( x x +s
1
, y y +s
2
)d xd y (5.105)
We dene new variables of integration x x, y y:
R
fg
(s
1
, s
2
) =
_
+
_
+
h(, )R
ff
(s
1
, s
2
)dd (5.106)
The above equation is a straightforward convolution, and in terms of Fourier trans-
forms it is written as (5.102).
Example B5.22
If
R
(u, v) and
R
g
(u, v) are the Fourier transforms of the autocorrelation
function of the homogeneous random eld (x, y), and the cross-correlation
function between the homogeneous random elds (x, y) and g(x, y), and
g(x, y) =
_
+
_
+
R
g
(u, v) =
R
(u, v) (5.108)
We multiply both sides of equation (5.107) with (xs
1
, ys
2
) and take the expectation
value:
R
g
(s
1
, s
2
) =
_
+
_
+
(s
1
, s
2
)
(5.109)
The integral term vanishes because the two elds are uncorrelated and at least one of
them has zero mean. Then, taking the Fourier transform of both sides yields (5.108).
Box 5.2. From the Fourier transform of the correlation functions of images
to their spectral densities
The autocorrelation and the cross-correlation functions that were computed in examples
5.195.22 were computed in the ensemble sense. If we invoke the ergodicity assumption,
we may say that these correlations are the same as the spatial correlations of the corre-
sponding images (treated as random elds). Then, we may apply the Wiener-Khinchine
theorem (see Box 4.5, on page 325) to identify the Fourier transforms of the correla-
tion functions with the corresponding power spectral densities of the images. Thus,
equations (5.87), (5.93), (5.102) and (5.108) may be replaced with
S_{gf}(u,v)=\hat{H}^{*}(u,v)S_{ff}(u,v) \qquad (5.110)

S_{gg}(u,v)=\hat{H}^{*}(u,v)S_{fg}(u,v)+S_{\nu g}(u,v) \qquad (5.111)

S_{fg}(u,v)=\hat{H}(u,v)S_{ff}(u,v) \qquad (5.112)

S_{\nu g}(u,v)=S_{\nu\nu}(u,v) \qquad (5.113)

where S_{fg}(u,v), S_{gf}(u,v) and S_{νg}(u,v) are the cross-spectral densities between the real and the observed image, the observed and the real image, and the noise field and the observed image, respectively, with the convention that the arguments of the first field are subtracted from those of the second to yield the shifting variable. S_{gg}(u,v), S_{ff}(u,v) and S_{νν}(u,v) are the power spectral densities of the observed image, the unknown image and the noise field, respectively.
In general, however, the elds are not ergodic. Then, it is postulated that the Fourier
transforms of the auto- and cross-correlation functions are the spectral and cross-
spectral densities respectively, of the corresponding random elds.
The above equations are used in the development of the Wiener lter, and so the
ergodicity assumption is tacitly made in this derivation.
Box 5.3. Derivation of the Wiener filter

In order to identify filter m(r − r′), we start from the condition it has to satisfy, equation (5.70):

E\left\{\left[f(\mathbf{r})-\iint_{-\infty}^{+\infty} m(\mathbf{r}-\mathbf{r}')g(\mathbf{r}')\,d\mathbf{r}'\right]g(\mathbf{s})\right\}=0 \qquad (5.114)

\Rightarrow\quad E\left\{f(\mathbf{r})g(\mathbf{s})-\iint_{-\infty}^{+\infty} m(\mathbf{r}-\mathbf{r}')g(\mathbf{r}')g(\mathbf{s})\,d\mathbf{r}'\right\}=0 \qquad (5.115)

where g(s) has gone inside the integral because it does not depend on r′. The expectation operator applied to the second term operates really only on the random fields g(r′) and g(s). Therefore, we may write:

\underbrace{E\{f(\mathbf{r})g(\mathbf{s})\}}_{R_{gf}(\mathbf{r},\mathbf{s})}=\iint_{-\infty}^{+\infty} m(\mathbf{r}-\mathbf{r}')\underbrace{E\{g(\mathbf{r}')g(\mathbf{s})\}}_{R_{gg}(\mathbf{r}',\mathbf{s})}\,d\mathbf{r}' \qquad (5.116)

In this expression we recognise the definitions of the autocorrelation and cross-correlation functions of the random fields, so we may write:

\iint_{-\infty}^{+\infty} m(\mathbf{r}-\mathbf{r}')R_{gg}(\mathbf{r}',\mathbf{s})\,d\mathbf{r}'=R_{gf}(\mathbf{r},\mathbf{s}) \qquad (5.117)

We have seen that, for homogeneous random fields, the correlation function can be written as a function of the difference of its two arguments (example 3.24, page 196). So:

\iint_{-\infty}^{+\infty} m(\mathbf{r}-\mathbf{r}')R_{gg}(\mathbf{r}'-\mathbf{s})\,d\mathbf{r}'=R_{gf}(\mathbf{r}-\mathbf{s}) \qquad (5.118)

We introduce some new variables: r′ − s ≡ t and r − s ≡ τ. Therefore, dr′ = dt and r − r′ = τ − t. Then:

\iint_{-\infty}^{+\infty} m(\boldsymbol{\tau}-\mathbf{t})R_{gg}(\mathbf{t})\,d\mathbf{t}=R_{gf}(\boldsymbol{\tau}) \qquad (5.119)

This is a convolution between the autocorrelation function of the degraded image and the sought filter. According to the convolution theorem, the effect is equivalent to the multiplication of the Fourier transforms of the two functions:

\hat{M}(u,v)S_{gg}(u,v)=S_{gf}(u,v) \qquad (5.120)

Here S_{gg} and S_{gf} are the spectral density of the degraded image and the cross-spectral density of the undegraded and degraded images, respectively, ie the Fourier transforms of the autocorrelation function of g and of the cross-correlation function of f and g, respectively (see Box 5.2, on page 427). Therefore:

\hat{M}(u,v)=\frac{S_{gf}(u,v)}{S_{gg}(u,v)} \qquad (5.121)

The Fourier transform of the optimal restoration filter, which minimises the mean square error between the real image and the reconstructed one, is equal to the ratio of the cross-spectral density of the degraded image and the true image, over the spectral density of the degraded image.
If we substitute from equations (5.112) and (5.113) into equation (5.111), we obtain:

S_{gg}(u,v)=S_{ff}(u,v)|\hat{H}(u,v)|^2+S_{\nu\nu}(u,v) \qquad (5.122)

If we then substitute equations (5.110) and (5.122) into (5.121), we obtain

\hat{M}(u,v)=\frac{\hat{H}^{*}(u,v)S_{ff}(u,v)}{S_{ff}(u,v)|\hat{H}(u,v)|^2+S_{\nu\nu}(u,v)} \qquad (5.123)

or:

\hat{M}(u,v)=\frac{\hat{H}^{*}(u,v)}{|\hat{H}(u,v)|^2+\frac{S_{\nu\nu}(u,v)}{S_{ff}(u,v)}} \qquad (5.124)

This equation gives the Fourier transform of the Wiener filter for image restoration.
What is the relationship between Wiener filtering and inverse filtering?

If we multiply the numerator and denominator of (5.124) with \hat{H}(u,v), we obtain:

\hat{M}(u,v)=\frac{1}{\hat{H}(u,v)}\,\frac{|\hat{H}(u,v)|^2}{|\hat{H}(u,v)|^2+\frac{S_{\nu\nu}(u,v)}{S_{ff}(u,v)}} \qquad (5.125)

In the absence of noise, S_{νν}(u,v) = 0 and the Wiener filter reduces to the inverse filter. In practice, the spectra S_{νν}(u,v) and S_{ff}(u,v) are usually not known, and it is common to replace the ratio S_{νν}(u,v)/S_{ff}(u,v) in equation (5.125) with a constant Γ and experiment with various values of Γ.
This is clearly rather an oversimplification, as the ratio S_{νν}(u,v)/S_{ff}(u,v) is a function of (u,v) and not a constant. So, we may try to estimate both the spectrum of the noise and the spectrum of the undegraded image from the spectrum of the degraded image. Let us assume, for simplicity, that all functions that appear in filter (5.69), on page 420, are functions of ω ≡ √(u²+v²).
Let us also say that the first zero of \hat{H}(u,v) happens for ω₀ ≡ √(u₀²+v₀²). Then there will be a strip of frequencies around frequency ω₀, inside which \hat{H}(u,v) stops being reliable and the noise effects become serious. Let us also consider two frequencies, ω₁ and ω₂, such that for frequencies ω < ω₁, \hat{H}(u,v) behaves well and we may use inverse filtering, while for frequencies ω > ω₂, the power spectrum we observe is totally dominated by noise. This assumption is valid, as long as the noise is assumed white, and so it has a constant power spectrum, while the unknown image is assumed to have a spectrum that decays fast for high frequencies. We may then consider the power spectrum of the observed image beyond frequency ω₂ and use it to estimate the power spectrum of the noise, by taking, for example, its average over all frequencies beyond ω₂. Further, we may make the assumption that the power spectrum of the unknown image decays exponentially beyond frequency ω₁. We may then apply (5.122) at frequency ω₁ to work out the model parameters for S_{ff}:

S_{gg}(\omega_1)=S_{ff}(\omega_1)|\hat{H}(\omega_1)|^2+S_{\nu\nu} \qquad (5.127)
Note that S_{νν} has already been estimated from the flat part of the observed power spectrum, so we may solve (5.127) for the value of the spectrum of the unknown image at ω₁:

S_{ff}(\omega_1)=\frac{S_{gg}(\omega_1)-S_{\nu\nu}}{|\hat{H}(\omega_1)|^2} \qquad (5.128)

Assuming an exponential decay for S_{ff}(ω) when ω > ω₁, we may write

S_{ff}(\omega)=S_{ff}(\omega_1)e^{-\alpha(\omega-\omega_1)} \qquad (5.129)
where α is some positive constant. We may then define a filter as follows:

\hat{M}(\omega)=\begin{cases}\dfrac{1}{\hat{H}(\omega)} & \text{if }\omega<\omega_1\\[2ex] \dfrac{\hat{H}^{*}(\omega)}{|\hat{H}(\omega)|^2+S(\omega)} & \text{if }\omega\geq\omega_1\end{cases} \qquad (5.130)

Here:

S(\omega)\equiv\frac{S_{\nu\nu}}{S_{ff}(\omega_1)e^{-\alpha(\omega-\omega_1)}}-\frac{S_{\nu\nu}}{S_{ff}(\omega_1)}=\frac{S_{\nu\nu}}{S_{ff}(\omega_1)}\left[e^{\alpha(\omega-\omega_1)}-1\right] \qquad (5.131)

Note that, when ω = ω₁, the Wiener branch of this filter coincides with the inverse filter. For ω ≫ ω₁ the 1 in the denominator of the second branch of the filter is negligible in comparison with the exponential term, and so the filter behaves like the Wiener filter. Parameter α should be selected so that S_{ff}(ω₁)e^{−α(ω₂−ω₁)} ≪ S_{νν}.
Figure 5.10 shows schematically how this filter is defined.
How do we apply Wiener ltering in practice?
In summary, Wiener ltering may be implemented as follows.
Step 0: Somehow work out the Fourier transform of the point spread function of the degradation process, \hat{H}(u,v).
Step 1: Take the Fourier transform of the observed degraded image, \hat{G}(u,v).
Step 2: Select a value for constant Γ and multiply \hat{G}(u,v) point by point with

\hat{M}(u,v)=\frac{\hat{H}^{*}(u,v)}{|\hat{H}(u,v)|^2+\Gamma} \qquad (5.132)

Step 3: Take the inverse Fourier transform to recover the restored image.
Figure 5.10: S_{gg}(ω) is the power spectrum of the observed image. Let us say that the frequency response function with which we model the degradation process has its first zero at frequency ω₀. We observe that for frequencies beyond ω₂ the spectrum flattens out and, therefore, under the assumption of white noise, we may infer that at those frequencies the spectrum is dominated by the noise. We may thus compute the constant power spectrum of the noise, S_{νν}. (The schematic plots S_{gg}(ω) against ω, with the frequencies ω₀, ω₁ and ω₂ marked on the frequency axis.)
Step 4: From the flat part of the power spectrum of the degraded image, beyond the frequencies where it levels off, estimate the constant power spectrum of the noise, S_{νν} (see figure 5.10).
Step 5: Identify some frequencies u₁ < u₀ and v₁ < v₀, compute S_{gg}(u₁,v₁) and set:

S_{ff}(u_1,v_1)=\frac{S_{gg}(u_1,v_1)-S_{\nu\nu}}{|\hat{H}(u_1,v_1)|^2} \qquad (5.133)

Step 6: Select a value for α so that

S_{ff}(u_1,v_1)\,e^{-\alpha\left(\sqrt{u_2^2+v_2^2}-\sqrt{u_1^2+v_1^2}\right)}\leq 0.1\,S_{\nu\nu} \qquad (5.134)

Step 7: Multiply \hat{G}(u,v) point by point with

\hat{M}(u,v)=\begin{cases}\dfrac{1}{\hat{H}(u,v)} & \text{if }u<u_1\text{ and }v<v_1\\[2ex] \dfrac{\hat{H}^{*}(u,v)}{|\hat{H}(u,v)|^2+\dfrac{S_{\nu\nu}}{S_{ff}(u_1,v_1)}\left[e^{\alpha\left(\sqrt{u^2+v^2}-\sqrt{u_1^2+v_1^2}\right)}-1\right]} & \text{if }u\geq u_1\text{ or }v\geq v_1\end{cases} \qquad (5.135)
Step 8: Take the inverse Fourier transform to recover the restored image.
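A minimal sketch of the simple version of this algorithm (Steps 0–3, with the ratio of spectral densities replaced by the constant Γ) is given below, using Python with NumPy. The test image, the motion-blur point spread function and the value of Γ are illustrative assumptions only, not part of the text.

```python
import numpy as np

def wiener_restore(g, h, gamma=0.01):
    """Restore a degraded image g given a point spread function h (same size as g),
    with the ratio S_nunu/S_ff replaced by the constant gamma: M = H*/(|H|^2 + gamma)."""
    G = np.fft.fft2(g)                              # Step 1: DFT of the degraded image
    H = np.fft.fft2(h)                              # Step 0: DFT of the point spread function
    M = np.conj(H) / (np.abs(H)**2 + gamma)         # Step 2: the filter of equation (5.132)
    return np.real(np.fft.ifft2(M * G))             # Step 3: inverse DFT gives the restoration

# Illustrative use: horizontal motion blur over iT pixels, plus additive noise.
N, iT = 128, 10
f = np.zeros((N, N)); f[40:90, 40:90] = 200.0       # a hypothetical test image
h = np.zeros((N, N)); h[0, :iT] = 1.0 / iT          # motion-blur point spread function
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))
g += np.random.normal(0.0, 10.0, g.shape)           # additive noise
f_hat = wiener_restore(g, h, gamma=0.03)
```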
Example 5.23
Restore the blurred images of gures 5.5a, on page 414, 5.9a and 5.9b, on
page 418, by using Wiener ltering.
From equation (5.50), on page 410, we have:
[
H(m, n)[
2
=
1
i
2
T
sin
2 m
N
sin
2
i
T
m
N
(5.136)
We shall use the Wiener lter as given by equation (5.132), with the ratio of the
spectral densities in the denominator replaced by a constant :
M(m, n) =
1
i
T
sin(
m
N
i
T )
sin
m
N
e
j
m(i
T
1)
N
1
i
2
T
sin
2 m
N
i
T
sin
2 m
N
+
(5.137)
Or:
M(m, n) =
i
T
sin
m
N
sin
_
m
N
i
T
_
sin
2
_
m
N
i
T
_
+ i
2
T
sin
2 m
N
e
j
m(i
T
1)
N
(5.138)
We must be careful for the case m = 0, when we have:
M(0, n) =
1
1 +
for 0 n N 1 (5.139)
If we multiply with this function the Fourier transform of the blurred image, as dened
by equation (5.54), on page 411, we obtain:
F(m, n) =
i
T
sin
m
N
sin
i
T
m
N
_
G
2
1
(m, n) +G
2
2
(m, n)e
j
(m,n)+
(i
T
1)m
N
sin
2 i
T
m
N
+ i
2
T
sin
2 m
N
(5.140)
For the case m = 0, we have:
F(0, n) =
_
G
2
1
(0, n) +G
2
2
(0, n)
1 +
e
j(0,n)
for 0 n N 1 (5.141)
The real and the imaginary parts of
F(m, n) are given by:
F
1
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
_
G
2
1
(m, n) +G
2
2
(m, n)
sin
2 i
T
m
N
+ i
2
T
sin
2 m
N
cos
_
(m, n) +
(i
T
1)m
N
_
F
2
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
_
G
2
1
(m, n) +G
2
2
(m, n)
sin
2 i
T
m
N
+ i
2
T
sin
2 m
N
sin
_
(m, n) +
(i
T
1)m
N
_
(5.142)
If we use formulae cos(a + b) = cos a cos b sin a sin b and sin(a + b) = sin a cos b +
cos a sin b and substitute cos (m, n) and sin (m, n) from equations (5.55), we obtain:
F
1
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
sin
2i
T
m
N
+i
2
T
sin
2 m
N
_
G
1
(m, n)cos
(i
T
1)m
N
G
2
(m, n)sin
(i
T
1)m
N
_
F
2
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
sin
2 i
T
m
N
+i
2
T
sin
2 m
N
_
G
1
(m, n)sin
(i
T
1)m
N
+G
2
(m, n)cos
(i
T
1)m
N
_
(5.143)
For m = 0 we must remember to use:
F
1
(0, n) =
G
1
(0, n)
1 +
for 0 n N 1
F
2
(0, n) =
G
2
(0, n)
1 +
for 0 n N 1 (5.144)
If we take the inverse Fourier transform, using functions F
1
(m, n) and F
2
(m, n) as
the real and the imaginary part of the transform, respectively, we obtain the restored
image shown in gure 5.11a. This image should be compared with images 5.5e and
5.5f, on page 414, which were obtained by inverse ltering.
The restoration of the noisy images of gures 5.9a and 5.9b by Wiener ltering is
shown in gures 5.11b and 5.11c. These images should be compared with gures 5.9g
and 5.9h, respectively. In all cases, Wiener filtering produces superior results. We note that, if we use too small a value of Γ, the effect of the correction term in the denominator of the filter is insignificant. If we use too high a Γ, the restored image is very smoothed. For the case with no noise, we obtained acceptable results for Γ in the range from about 0.01 to 0.03. For σ = 10, we obtained acceptable results for Γ from about 0.025 to 0.05, while for σ = 20, the best results were obtained for Γ from about 0.03 to 0.06.
Input: image 5.5b Input: image 5.9a Input: image 5.9b
= 0.001, MSE = 2273 = 0.001, MSE = 7242 = 0.001, MSE = 11889
= 0.012, MSE = 859 = 0.04, MSE = 1001 = 0.043, MSE = 1707
= 0.099, MSE = 745 = 0.099, MSE = 831 = 0.099, MSE = 1076
= 0.999, MSE = 4603 = 0.999, MSE = 4610 = 0.999, MSE = 4612
Figure 5.11: Dionisia restored with Wiener ltering.
5.3 Homogeneous linear image restoration: Constrained matrix inversion
If the degradation process is assumed linear, why dont we solve a system of linear
equations to reverse its eect instead of invoking the convolution theorem?
Indeed, the system of linear equations we must invert is given in matrix form by equation
(1.38), on page 19, g = Hf . However, g is expected to be noisy, so we shall rewrite this
equation including an explicit noise term:
g = Hf + ν \qquad (5.145)

Here ν is the noise field written in vector form.
Since we assumed that we have some knowledge about the point spread function of the
degradation process, matrix H is assumed to be known. Then
f = H^{-1}g - H^{-1}\nu \qquad (5.146)

where H is an N² × N² matrix, and f, g and ν are N² × 1 vectors, for an N × N image.
Equation (5.146) seems pretty straightforward, why bother with any other ap-
proach?
There are two major problems with equation (5.146).
1) It is extremely sensitive to noise. It has been shown that one needs impossibly low levels
of noise for the method to work.
2) The solution of equation (5.146) requires the inversion of an N² × N² square matrix, with N typically being 500, which is a formidable task even for modern computers.
Example 5.24
Demonstrate the sensitivity to noise of the inverse matrix restoration.
Let us consider the signal given by:
f(x) = 25 sin
2x
30
for x = 0, 1, . . . , 29 (5.147)
Assume that this signal is blurred by a function that averages every three samples after
multiplying them with some weights. We can express this by saying that the discrete
signal is multiplied with matrix
H =
_
_
_
_
_
_
_
_
_
0.4 0.3 0 0 . . . 0 0 0 0.3
0.3 0.4 0.3 0 . . . 0 0 0 0
0 0.3 0.4 0.3 . . . 0 0 0 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 0 . . . 0 0 0.4 0.3
0.3 0 0 0 . . . 0 0 0.3 0.4
_
_
_
_
_
_
_
_
_
(5.148)
to produce a blurred signal g(x). In a digital system, the elements of g(x) are rounded
to the nearest integer. To recover the original signal we multiply the blurred signal
g(x) with the inverse of matrix H. The original and the restored signals are shown in
gure 5.12.
Figure 5.12: An original signal (smooth line) and its restored version by direct matrix inversion, using the exact inverse of the matrix that caused the distortion in the first place. The noise in the signal was only due to rounding errors. (The plot shows f(x), from −50 to 50, against x, from 0 to 30.)
Is there any way by which matrix H can be inverted?
Yes, for the case of homogeneous linear degradation, matrix H can easily be inverted because
it is a block circulant matrix.
When is a matrix block circulant?
A matrix H is block circulant if it has the following structure:
H=\begin{bmatrix}H_0 & H_{M-1} & H_{M-2} & \dots & H_1\\ H_1 & H_0 & H_{M-1} & \dots & H_2\\ H_2 & H_1 & H_0 & \dots & H_3\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ H_{M-1} & H_{M-2} & H_{M-3} & \dots & H_0\end{bmatrix} \qquad (5.149)

Here H_0, H_1, \dots, H_{M-1} are partitions of matrix H, and they are themselves circulant matrices.
When is a matrix circulant?
A matrix D is circulant if it has the following structure:
D=\begin{bmatrix}d(0) & d(M-1) & d(M-2) & \dots & d(1)\\ d(1) & d(0) & d(M-1) & \dots & d(2)\\ d(2) & d(1) & d(0) & \dots & d(3)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ d(M-1) & d(M-2) & d(M-3) & \dots & d(0)\end{bmatrix} \qquad (5.150)
In such a matrix, each column can be obtained from the previous one by shifting all
elements one place down and putting the last element at the top.
Why can block circulant matrices be inverted easily?
Circulant and block circulant matrices can easily be inverted because we can nd easily their
eigenvalues and eigenvectors.
Which are the eigenvalues and eigenvectors of a circulant matrix?
We define the set of scalars

\lambda(k)\equiv d(0)+d(M-1)\exp\left(\frac{2\pi j}{M}k\right)+d(M-2)\exp\left(\frac{2\pi j}{M}2k\right)+\dots+d(1)\exp\left(\frac{2\pi j}{M}(M-1)k\right) \qquad (5.151)

and the set of vectors

w(k)\equiv\frac{1}{\sqrt{M}}\begin{bmatrix}1\\ \exp\left(\frac{2\pi j}{M}k\right)\\ \exp\left(\frac{2\pi j}{M}2k\right)\\ \vdots\\ \exp\left(\frac{2\pi j}{M}(M-1)k\right)\end{bmatrix} \qquad (5.152)

where k takes up values k = 0, 1, 2, \dots, M−1. It can be shown then, by direct substitution, that

Dw(k)=\lambda(k)w(k) \qquad (5.153)

ie λ(k) are the eigenvalues of matrix D (defined by equation (5.150)) and w(k) are its corresponding eigenvectors.
How does the knowledge of the eigenvalues and the eigenvectors of a matrix help
in inverting the matrix?
If we form matrix W, which has the eigenvectors of matrix D as its columns, we know that we can write

D=W\Lambda W^{-1} \qquad (5.154)

where W^{-1} has elements (see example 5.25)

W^{-1}(k,i)=\frac{1}{\sqrt{M}}\exp\left(-j\frac{2\pi}{M}ki\right) \qquad (5.155)

and Λ is a diagonal matrix with the eigenvalues along its diagonal. Then, the inversion of matrix D is trivial:

D^{-1}=\left(W\Lambda W^{-1}\right)^{-1}=\left(W^{-1}\right)^{-1}\Lambda^{-1}W^{-1}=W\Lambda^{-1}W^{-1} \qquad (5.156)
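A quick numerical check of equations (5.150)–(5.156) is sketched below; the size M and the values d(k) are arbitrary choices.

```python
import numpy as np

M = 5
d = np.array([2.0, -1.0, 0.5, 0.0, 3.0])     # d(0), d(1), ..., d(M-1), arbitrary values
D = np.array([[d[(i - j) % M] for j in range(M)] for i in range(M)])   # equation (5.150)

k = np.arange(M)
W = np.exp(2j * np.pi * np.outer(k, k) / M) / np.sqrt(M)   # columns are w(k), eq. (5.152)
# Eigenvalues of eq. (5.151): lambda(k) = sum_n d((-n) mod M) exp(2 pi j n k / M)
lam = np.array([sum(d[(-n) % M] * np.exp(2j * np.pi * n * kk / M) for n in range(M))
                for kk in k])

assert np.allclose(D @ W, W * lam)            # eigen-relation (5.153), column by column
W_inv = np.conj(W)                            # equation (5.155): W is symmetric
D_inv = W @ np.diag(1.0 / lam) @ W_inv        # equation (5.156)
assert np.allclose(D_inv, np.linalg.inv(D))
```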
Example 5.25
Consider matrix W the columns of which w(0), w(1), . . . , w(M 1) are given
by equation (5.152). Show that matrix Z with elements
Z(k, i) =
1
M
exp
_
2j
M
ki
_
(5.157)
is the inverse of matrix W.
We have:
W =
1
M
_
_
_
_
_
_
_
_
1 1 1 . . . 1
1 e
2j
M
e
2j
M
2
. . . e
2j
M
(M1)
1 e
2j
M
2
e
2j
M
4
. . . e
2j
M
2(M1)
.
.
.
.
.
.
.
.
.
.
.
.
1 e
2j
M
(M1)
e
2j
M
2(M1)
. . . e
2j
M
(M1)
2
_
_
_
_
_
_
_
_
(5.158)
Z =
1
M
_
_
_
_
_
_
_
_
1 1 1 . . . 1
1 e
2j
M
e
2j
M
2
. . . e
2j
M
(M1)
1 e
2j
M
2
e
2j
M
4
. . . e
2j
M
2(M1)
.
.
.
.
.
.
.
.
.
.
.
.
1 e
2j
M
(M1)
e
2j
M
2(M1)
. . . e
2j
M
(M1)
2
_
_
_
_
_
_
_
_
(5.159)
ZW =
1
M
_
_
_
_
_
_
_
_
M
M1
k=0
e
2j
M
k
. . .
M1
k=0
e
2j
M
(M1)k
M1
k=0
e
2j
M
k
M . . .
M1
k=0
e
2j
M
2(M2)k
.
.
.
.
.
.
.
.
.
M1
k=0
e
2j
M
(M1)k
M1
k=0
e
2j
M
2(M2)k
. . . M
_
_
_
_
_
_
_
_
(5.160)
All the o-diagonal elements of this matrix are of the form:
M1
k=0
e
2jt
M
k
where t is
some positive or negative integer. We may then apply (2.167), on page 95, with m k
and S M to show that all the o-diagonal elements are 0 and thus recognise that
ZW is the identity matrix, ie that Z = W
1
.
Example 5.26
For M = 3 show that (k) dened by equation (5.151) and w(k) dened
by (5.152) are the eigenvalues and eigenvectors, respectively, of matrix
(5.150), for k = 0, 1, 2.
Let us redene matrix D for M = 3 as:
D =
_
_
d
0
d
2
d
1
d
1
d
0
d
2
d
2
d
1
d
0
_
_
(5.161)
We also have:
w(k) =
1
3
_
_
1
e
2j
3
k
e
2j
3
2k
_
_
for k = 0, 1, 2 (5.162)
(k) = d
0
+d
2
e
2j
3
k
+d
1
e
2j
3
2k
for k = 0, 1, 2 (5.163)
We must show that:
Dw(k) = (k)w(k) (5.164)
We compute rst the left-hand side of this expression:
Dw(k) =
_
_
d
0
d
2
d
1
d
1
d
0
d
2
d
2
d
1
d
0
_
_
1
3
_
_
1
e
2j
3
k
e
2j
3
2k
_
_
=
1
3
_
_
_
d
0
+d
2
e
2j
3
k
+d
1
e
2j
3
2k
d
1
+d
0
e
2j
3
k
+d
2
e
2j
3
2k
d
2
+d
1
e
2j
3
k
+d
0
e
2j
3
2k
_
_
_
(5.165)
We also compute the right-hand side of (5.164):
(k)w(k) =
1
3
_
_
_
d
0
+d
2
e
2j
3
k
+d
1
e
2j
3
2k
d
0
e
2j
3
k
+d
2
e
2j
3
2k
+d
1
e
2j
3
3k
d
0
e
2j
3
2k
+d
2
e
2j
3
3k
+d
1
e
2j
3
4k
_
_
_
(5.166)
If we compare the elements of the matrices on the right-hand sides of expressions
(5.165) and (5.166) one by one, and take into consideration the fact that
e
2jk
= 1 for any integer k (5.167)
and
e
2j
3
4k
= e
2j
3
3k
e
2j
3
k
= e
2jk
e
2j
3
k
= e
2j
3
k
, (5.168)
we see that equation (5.164) is correct.
Example 5.27
Find the inverse of the following matrix:
\begin{bmatrix}-1 & 0 & 2 & 3\\ 3 & -1 & 0 & 2\\ 2 & 3 & -1 & 0\\ 0 & 2 & 3 & -1\end{bmatrix} \qquad (5.169)

This is a circulant matrix and, according to the notation of equation (5.150), we have M = 4, d(0) = −1, d(1) = 3, d(2) = 2, d(3) = 0. Then, applying formulae (5.151) and (5.152), we obtain:

\lambda(0)=-1+2+3=4\;\Rightarrow\;\lambda(0)^{-1}=\tfrac{1}{4}

\lambda(1)=-1+2e^{\frac{2\pi j}{4}2}+3e^{\frac{2\pi j}{4}3}=-1-2-3j=-3-3j\;\Rightarrow\;\lambda(1)^{-1}=\tfrac{-3+3j}{18}=\tfrac{-1+j}{6}

\lambda(2)=-1+2e^{\frac{2\pi j}{4}4}+3e^{\frac{2\pi j}{4}6}=-1+2-3=-2\;\Rightarrow\;\lambda(2)^{-1}=-\tfrac{1}{2}

\lambda(3)=-1+2e^{\frac{2\pi j}{4}6}+3e^{\frac{2\pi j}{4}9}=-1-2+3j=-3+3j\;\Rightarrow\;\lambda(3)^{-1}=\tfrac{-3-3j}{18}=\tfrac{-1-j}{6} \qquad (5.170)
w^{T}(0)=\tfrac{1}{2}\begin{bmatrix}1 & 1 & 1 & 1\end{bmatrix}

w^{T}(1)=\tfrac{1}{2}\begin{bmatrix}1 & e^{\frac{2\pi j}{4}} & e^{\frac{2\pi j}{4}2} & e^{\frac{2\pi j}{4}3}\end{bmatrix}=\tfrac{1}{2}\begin{bmatrix}1 & j & -1 & -j\end{bmatrix}

w^{T}(2)=\tfrac{1}{2}\begin{bmatrix}1 & e^{\frac{2\pi j}{4}2} & e^{\frac{2\pi j}{4}4} & e^{\frac{2\pi j}{4}6}\end{bmatrix}=\tfrac{1}{2}\begin{bmatrix}1 & -1 & 1 & -1\end{bmatrix}

w^{T}(3)=\tfrac{1}{2}\begin{bmatrix}1 & e^{\frac{2\pi j}{4}3} & e^{\frac{2\pi j}{4}6} & e^{\frac{2\pi j}{4}9}\end{bmatrix}=\tfrac{1}{2}\begin{bmatrix}1 & -j & -1 & j\end{bmatrix} \qquad (5.171)
We use these vectors to construct matrices W and W^{-1} and then apply formula (5.156):

D^{-1}=\frac{1}{4}\begin{bmatrix}1 & 1 & 1 & 1\\ 1 & j & -1 & -j\\ 1 & -1 & 1 & -1\\ 1 & -j & -1 & j\end{bmatrix}\begin{bmatrix}\tfrac{1}{4} & 0 & 0 & 0\\ 0 & \tfrac{-1+j}{6} & 0 & 0\\ 0 & 0 & -\tfrac{1}{2} & 0\\ 0 & 0 & 0 & \tfrac{-1-j}{6}\end{bmatrix}\begin{bmatrix}1 & 1 & 1 & 1\\ 1 & -j & -1 & j\\ 1 & -1 & 1 & -1\\ 1 & j & -1 & -j\end{bmatrix}

=\frac{1}{4}\begin{bmatrix}1 & 1 & 1 & 1\\ 1 & j & -1 & -j\\ 1 & -1 & 1 & -1\\ 1 & -j & -1 & j\end{bmatrix}\begin{bmatrix}\tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4}\\ \tfrac{-1+j}{6} & \tfrac{1+j}{6} & \tfrac{1-j}{6} & \tfrac{-1-j}{6}\\ -\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2}\\ \tfrac{-1-j}{6} & \tfrac{1-j}{6} & \tfrac{1+j}{6} & \tfrac{-1+j}{6}\end{bmatrix}

=\frac{1}{4}\begin{bmatrix}-\tfrac{7}{12} & \tfrac{13}{12} & \tfrac{1}{12} & \tfrac{5}{12}\\ \tfrac{5}{12} & -\tfrac{7}{12} & \tfrac{13}{12} & \tfrac{1}{12}\\ \tfrac{1}{12} & \tfrac{5}{12} & -\tfrac{7}{12} & \tfrac{13}{12}\\ \tfrac{13}{12} & \tfrac{1}{12} & \tfrac{5}{12} & -\tfrac{7}{12}\end{bmatrix}=\frac{1}{48}\begin{bmatrix}-7 & 13 & 1 & 5\\ 5 & -7 & 13 & 1\\ 1 & 5 & -7 & 13\\ 13 & 1 & 5 & -7\end{bmatrix} \qquad (5.172)
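The result may be verified numerically; the short sketch below assumes the entries of matrix (5.169) as given above.

```python
import numpy as np

D = np.array([[-1, 0, 2, 3],
              [ 3, -1, 0, 2],
              [ 2, 3, -1, 0],
              [ 0, 2, 3, -1]], dtype=float)          # matrix of equation (5.169)
D_inv_book = np.array([[-7, 13,  1,  5],
                       [ 5, -7, 13,  1],
                       [ 1,  5, -7, 13],
                       [13,  1,  5, -7]], dtype=float) / 48.0   # result of equation (5.172)
print(np.allclose(np.linalg.inv(D), D_inv_book))     # True
```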
Example B5.28
The elements of a matrix W_N are given by

W_N(k,n)=\frac{1}{\sqrt{N}}\exp\left(\frac{2\pi j}{N}kn\right) \qquad (5.173)

where k and n take values 0, 1, 2, \dots, N−1. The elements of the inverse matrix W_N^{-1} (see example 5.25) are given by:

W_N^{-1}(k,n)=\frac{1}{\sqrt{N}}\exp\left(-\frac{2\pi j}{N}kn\right) \qquad (5.174)
We dene matrix W as the Kronecker product of W
N
with itself. Show that
the inverse of matrix W is formed by the Kronecker product of matrix W
1
N
with itself.
Let us consider an element W(m, l) of matrix W. We write integers m and l in terms
of their quotients and remainders when divided with N:
m m
1
N +m
2
l l
1
N +l
2
(5.175)
Since W is W
N
W
N
we have:
W(m, l) =
1
N
e
2j
N
m
1
l
1
e
2j
N
m
2
l
2
(5.176)
Indices (m
1
, l
1
) identify the partition of matrix W to which element W(m, l) belongs.
Indices m
2
and l
2
vary inside each partition, taking all their possible values (see gure
5.13).
l
1
m
l
2
m
2
l
2
m
2
l
2
m
2
l
2
m
2
l
2
m
2
l
2
m
2
1
Figure 5.13: There are N N partitions, enumerated by indices (m
1
, l
1
), and inside
each partition there are N N elements, enumerated by indices (m
2
, l
2
).
In a similar way, we can write an element of matrix Z W
1
N
W
1
N
, by writing
t t
1
N +t
2
and n n
1
N +n2:
Z(t, n) =
1
N
e
2j
N
t
1
n
1
e
2j
N
t
2
n
2
(5.177)
An element of the product matrix A WZ is given by:
A(k, n) =
N
2
1
t=0
W(k, t)Z(t, n)
=
1
N
2
N
2
1
t=0
e
2j
N
k
1
t
1
e
2j
N
k
2
t
2
e
2j
N
t
1
n
1
e
2j
N
t
2
n
2
=
1
N
2
N
2
1
t=0
e
2j
N
(k
1
n
1
)t
1
e
2j
N
(k
2
n
2
)t
2
(5.178)
If we write again t t
1
N + t
2
, we can break the sum over t into two sums, one over
t
1
and one over t
2
:
A(k, n) =
1
N
2
N1
t
1
=0
_
e
2j
N
(k
1
n
1
)t
1
N1
t
2
=0
e
2j
N
(k
2
n
2
)t
2
_
(5.179)
We apply formula (2.164), on page 95, for the inner sum rst, with S N, m t
2
and t k
2
n
2
, and the outer sum afterwards, with S N, m t
1
and t k
1
n
1
:
A(k, n) =
1
N
2
N1
t
1
=0
e
2j
N
(k
1
n
1
)t
1
(k
2
n
2
)N
=
1
N
(k
2
n
2
)N(k
1
n
1
)
= (k
2
n
2
)(k
1
n
1
)
= (k n) (5.180)
Therefore, matrix A has all its elements 0, except the diagonal ones (obtained for
k = n), which are equal to 1. So, A is the unit matrix and this proves that matrix
Z, with elements given by (5.177), is the inverse of matrix W, with elements given by
(5.176).
How do we know that matrix H that expresses the linear degradation process is
block circulant?
We saw in Chapter 1, that matrix H, which corresponds to a shift invariant linear operator
expressed by equation (1.17), on page 13, may be partitioned into submatrices as expressed
by equation (1.39), on page 19.
Let us consider one of the partitions of the partitioned matrix H (see equation (5.149), on
page 437). Inside every partition, the values of l and j remain constant; ie j l is constant
inside each partition. The value of i k along each line runs from i to i N + 1, taking all
integer values in between. When i is incremented by 1 in the next row, all the values of i k
are shifted by one position to the right (see equations (1.39) and (5.149) ). So, each partition
submatrix of H is characterised by the value of j l u and has a circulant form:
H_u\equiv\begin{bmatrix}h(0,u) & h(N-1,u) & h(N-2,u) & \dots & h(1,u)\\ h(1,u) & h(0,u) & h(N-1,u) & \dots & h(2,u)\\ h(2,u) & h(1,u) & h(0,u) & \dots & h(3,u)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ h(N-1,u) & h(N-2,u) & h(N-3,u) & \dots & h(0,u)\end{bmatrix} \qquad (5.181)

Notice that here we assume that h(v,u) is periodic with period N in each of its arguments, and so h(1−N, u) = h((1−N)+N, u) = h(1,u) etc.
The full matrix H may be written in the form
H=\begin{bmatrix}H_0 & H_{-1} & H_{-2} & \dots & H_{-M+1}\\ H_1 & H_0 & H_{-1} & \dots & H_{-M+2}\\ H_2 & H_1 & H_0 & \dots & H_{-M+3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ H_{M-1} & H_{M-2} & H_{M-3} & \dots & H_0\end{bmatrix} \qquad (5.182)

where again, owing to the periodicity of h(v,u), H_{-1} = H_{M-1}, H_{-M+1} = H_1 etc.
How can we diagonalise a block circulant matrix?
Define a matrix with elements

W_N(k,n)\equiv\frac{1}{\sqrt{N}}\exp\left(\frac{2\pi j}{N}kn\right) \qquad (5.183)

and matrix

W\equiv W_N\otimes W_N \qquad (5.184)

where ⊗ is the Kronecker product of the two matrices (see example 1.26, on page 38). The inverse of W_N(k,n) is a matrix with elements:

W_N^{-1}(k,n)=\frac{1}{\sqrt{N}}\exp\left(-\frac{2\pi j}{N}kn\right) \qquad (5.185)

The inverse of W is given by (see example 5.28):

W^{-1}=W_N^{-1}\otimes W_N^{-1} \qquad (5.186)

We also define a diagonal matrix Λ as

\Lambda(k,i)=\begin{cases}N^2\,\hat{H}\!\left(\mathrm{mod}_N(k),\left\lfloor\frac{k}{N}\right\rfloor\right) & \text{if }i=k\\ 0 & \text{if }i\neq k\end{cases} \qquad (5.187)

where \hat{H} is the discrete Fourier transform of the point spread function h:

\hat{H}(u,v)=\frac{1}{N^2}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}h(x,y)e^{-2\pi j\left(\frac{ux}{N}+\frac{vy}{N}\right)} \qquad (5.188)

It can be shown then, by direct matrix multiplication, that:

H=W\Lambda W^{-1}\;\Rightarrow\;H^{-1}=W\Lambda^{-1}W^{-1} \qquad (5.189)

Thus, H can be inverted easily, since it has been written as the product of matrices the inversion of which is trivial.
Box 5.4. Proof of equation (5.189)
First we have to nd how an element H(f, g) of matrix H is related to the point spread
function h(x, y). Let us write indices f and g as multiples of the dimension N of one of
the partitions, plus a remainder:
f f
1
N +f
2
g g
1
N +g
2
(5.190)
As f and g scan all possible values from 0 to N
2
1 each, we can visualise the N N
partitions of matrix H, indexed by subscript u f
1
g
1
, as follows:
f
1
= 0
g
1
= 0
u = 0
f
1
= 0
g
1
= 1
u = 1
f
1
= 0
g
1
= 2
u = 2
. . .
f
1
= 1
g
1
= 0
u = 1
f
1
= 1
g
1
= 1
u = 0
f
1
= 1
g
1
= 2
u = 1
. . .
. . . . . . . . . . . .
We observe that each partition is characterised by index u = f
1
g
1
and inside each
partition the elements computed from h(x, y) are computed for various values of f
2
g
2
.
We conclude that:
H(f, g) = h(f
2
g
2
, f
1
g
1
) (5.191)
Let us consider next an element of matrix WW
1
:
A(m, n)
N
2
1
l=0
N
2
1
t=0
W
ml
lt
W
1
tn
(5.192)
Since matrix is diagonal, the sum over t collapses to values of t = l only. Then:
A(m, n) =
N
2
1
l=0
W
ml
ll
W
1
ln
(5.193)
ll
is a scalar and therefore it may change position inside the summand:
A(m, n) =
N
2
1
l=0
W
ml
W
1
ln
ll
(5.194)
In example 5.28 we saw how the elements of matrices W
ml
and W
1
ln
can be written if
we write their indices in terms of their quotients and remainders when divided by N:
m Nm
1
+m
2
l Nl
1
+l
2
n Nn
1
+n
2
(5.195)
Using these expressions and the denition of
ll
as N
2
H(l
2
, l
1
) from equation (5.187),
we obtain:
A(m, n) =
N
2
1
l=0
e
2j
N
m
1
l
1
e
2j
N
m
2
l
2
1
N
2
e
2j
N
l
1
n
1
e
2j
N
l
2
n
2
N
2
H(l
2
, l
1
) (5.196)
On rearranging, we have:
A(m, n) =
N1
l
1
=0
N1
l
2
=0
H(l
2
, l
1
)e
2j
N
(m
1
n
1
)l
1
e
2j
N
(m
2
n
2
)l
2
(5.197)
We recognise this expression as the inverse Fourier transform of
h(m
2
n
2
, m
1
n
1
). Therefore:
A(m, n) = h(m
2
n
2
, m
1
n
1
) (5.198)
By comparing equations (5.191) and (5.198) we can see that the elements of matrices H
and WW
1
have been shown to be identical, and so equation (5.189) has been proven.
Box 5.5. What is the transpose of matrix H?
We shall show that H
T
= W
W
1
, where
W
1
will be given by an equation similar to (5.197), but instead of having factor
H(l
2
, l
1
), it will have factor
H(l
2
, l
1
), coming from the element of
ll
being dened
in terms of the complex conjugate of the Fourier transform
H(u, v) given by equation
(5.188):
A(m, n) =
N1
l
1
=0
N1
l
2
=0
H(l
2
, l
1
)e
2j
N
(m
1
n
1
)l
1
e
2j
N
(m
2
n
2
)l
2
(5.200)
We change the dummy variables of summation to:
l
1
l
1
and
l
2
l
2
(5.201)
Then:
A(m, n) =
N+1
l
1
=0
N+1
l
2
=0
H(
l
2
,
l
1
)e
2j
N
(m
1
+n
1
)
l
1
e
2j
N
(m
2
+n
2
)
l
2
(5.202)
Since we are dealing with periodic functions summed over a period, the range over which
we sum does not really matter, as long as N consecutive values are considered. Then
we can write:
A(m, n) =
N1
l
1
=0
N1
l
2
=0
H(
l
2
,
l
1
)e
2j
N
(m
1
+n
1
)
l
1
e
2j
N
(m
2
+n
2
)
l
2
(5.203)
We recognise on the right-hand side of the above expression the inverse Fourier transform
of
H(
l
2
,
l
1
), computed at (n
2
m
2
, n
1
m
1
):
A(m, n) = h(n
2
m
2
, n
1
m
1
) (5.204)
By direct comparison with equation (5.199), we prove that matrices H
T
and W
W
1
are equal, element by element.
Example 5.29
Show that the Laplacian, ie the sum of the second derivatives, of a discrete image at a pixel position (i,j) may be estimated by:

\nabla^2 f(i,j)=f(i-1,j)+f(i,j-1)+f(i+1,j)+f(i,j+1)-4f(i,j) \qquad (5.205)

At inter-pixel position (i+0.5, j), the first derivative of the image function along the i axis is approximated by the first difference:

\Delta_i f(i+0.5,j)=f(i+1,j)-f(i,j) \qquad (5.206)

Similarly, the first difference at (i−0.5, j) along the i axis is:

\Delta_i f(i-0.5,j)=f(i,j)-f(i-1,j) \qquad (5.207)

The second derivative at (i,j) along the i axis may be approximated by the first difference of the first differences, computed at positions (i+0.5, j) and (i−0.5, j), that is:

\Delta_i^2 f(i,j)=\Delta_i f(i+0.5,j)-\Delta_i f(i-0.5,j)=f(i+1,j)-2f(i,j)+f(i-1,j) \qquad (5.208)

Similarly, the second derivative at (i,j) along the j axis may be approximated by:

\Delta_j^2 f(i,j)=\Delta_j f(i,j+0.5)-\Delta_j f(i,j-0.5)=f(i,j+1)-2f(i,j)+f(i,j-1) \qquad (5.209)

Adding equations (5.208) and (5.209) by parts we obtain the result.
Example 5.30
Consider a 3 × 3 image represented by a column vector f. Identify a 9 × 9 matrix L such that, if we multiply vector f by it, the output will be a vector with the estimate of the value of the Laplacian at each position. Assume that image f is periodic in each direction with period 3. What type of matrix is L?

From example 5.29 we know that the point spread function of the operator that returns the estimate of the Laplacian at each position is:

\begin{bmatrix}0 & 1 & 0\\ 1 & -4 & 1\\ 0 & 1 & 0\end{bmatrix} \qquad (5.210)

To avoid boundary effects, we first extend the image in all directions periodically:

\begin{matrix} & f_{31} & f_{32} & f_{33} & \\ f_{13} & f_{11} & f_{12} & f_{13} & f_{11}\\ f_{23} & f_{21} & f_{22} & f_{23} & f_{21}\\ f_{33} & f_{31} & f_{32} & f_{33} & f_{31}\\ & f_{11} & f_{12} & f_{13} & \end{matrix} \qquad (5.211)

By observing which values will contribute to the value of the Laplacian at a pixel position, and with what weight, we construct the 9 × 9 matrix with which we must multiply the column vector f to obtain its Laplacian:

\begin{bmatrix}-4 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0\\ 1 & -4 & 1 & 0 & 1 & 0 & 0 & 1 & 0\\ 1 & 1 & -4 & 0 & 0 & 1 & 0 & 0 & 1\\ 1 & 0 & 0 & -4 & 1 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 1 & -4 & 1 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 1 & -4 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 & 0 & -4 & 1 & 1\\ 0 & 1 & 0 & 0 & 1 & 0 & 1 & -4 & 1\\ 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & -4\end{bmatrix}\begin{bmatrix}f_{11}\\ f_{21}\\ f_{31}\\ f_{12}\\ f_{22}\\ f_{32}\\ f_{13}\\ f_{23}\\ f_{33}\end{bmatrix} \qquad (5.212)

This matrix is a block circulant matrix with easily identifiable partitions of size 3 × 3.
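The matrix of equation (5.212) can be built mechanically from the kernel (5.210) and the assumed periodicity. The sketch below does this and, as a check, applies both the matrix and the local formula (5.205) to the image of example 5.31.

```python
import numpy as np

N = 3
# N^2 x N^2 Laplacian operator L for a periodic N x N image,
# for the column-major ordering f = (f11, f21, f31, f12, ..., f33)^T.
L = np.zeros((N * N, N * N))
for col in range(N):              # j index of the pixel (image column)
    for row in range(N):          # i index of the pixel (image row)
        p = col * N + row         # position of pixel (row, col) inside vector f
        L[p, p] = -4.0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):    # the four neighbours
            q = ((col + dc) % N) * N + ((row + dr) % N)      # periodic wrap-around
            L[p, q] += 1.0

f_img = np.array([[3, 2, 1],
                  [2, 0, 1],
                  [0, 0, 1]], dtype=float)       # image of example 5.31
f_vec = f_img.flatten(order='F')                 # column-major vector
lap_matrix = (L @ f_vec).reshape((N, N), order='F')

# Same result with the local formula (5.205), using periodic neighbours:
lap_local = (np.roll(f_img, 1, 0) + np.roll(f_img, -1, 0) +
             np.roll(f_img, 1, 1) + np.roll(f_img, -1, 1) - 4 * f_img)
assert np.allclose(lap_matrix, lap_local)
```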
Example B5.31
Using the matrix dened in example 5.30, estimate the Laplacian of the
following image:
_
_
3 2 1
2 0 1
0 0 1
_
_
(5.213)
Then re-estimate the Laplacian of the above image using the formula of
example 5.29.
_
_
_
_
_
_
_
_
_
_
_
_
_
_
4 1 1 1 0 0 1 0 0
1 4 1 0 1 0 0 1 0
1 1 4 0 0 1 0 0 1
1 0 0 4 1 1 1 0 0
0 1 0 1 4 1 0 1 0
0 0 1 1 1 4 0 0 1
1 0 0 1 0 0 4 1 1
0 1 0 0 1 0 1 4 1
0 0 1 0 0 1 1 1 4
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
3
2
0
2
0
0
1
1
1
_
_
_
_
_
_
_
_
_
_
_
_
_
_
=
_
_
_
_
_
_
_
_
_
_
_
_
_
_
7
4
6
4
5
3
3
0
2
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.214)
If we use the formula, we need to augment rst the image by writing explicitly the
boundary pixels:
0 0 1
1
1
1
_
_
3 2 1
2 0 1
0 0 1
_
_
3
2
0
(5.215)
3 2 1
The Laplacian is:
_
_
1 + 2 + 2 4 3 3 + 1 4 2 1 + 2 + 1 + 3 4 1
3 + 1 4 2 2 + 2 + 1 1 + 1 + 2 4 1
2 + 1 + 3 2 + 1 1 + 1 4 1
_
_
=
_
_
7 4 3
4 5 0
6 3 2
_
_
(5.216)
Note that we obtain the same answer, whether we use the local formula or matrix
multiplication.
Example B5.32
Find the eigenvalues and eigenvectors of the matrix worked out in example
5.30.
Matrix L worked out in example 5.30 is:
L =
_
_
_
_
_
_
_
_
_
_
_
_
_
_
4 1 1 1 0 0 1 0 0
1 4 1 0 1 0 0 1 0
1 1 4 0 0 1 0 0 1
1 0 0 4 1 1 1 0 0
0 1 0 1 4 1 0 1 0
0 0 1 1 1 4 0 0 1
1 0 0 1 0 0 4 1 1
0 1 0 0 1 0 1 4 1
0 0 1 0 0 1 1 1 4
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.217)
This matrix is a block circulant matrix with easily identiable 3 3 partitions. To
nd its eigenvectors, we rst use equation (5.152), on page 438, for M = 3 to dene
vectors w:
w(0) =
1
3
_
_
1
1
1
_
_
w(1) =
1
3
_
_
1
e
2j
3
e
4j
3
_
_
w(2) =
1
3
_
_
1
e
4j
3
e
8j
3
_
_
(5.218)
These vectors are used as columns to construct the matrix dened by equation (5.183):
W
3
=
1
3
_
_
1 1 1
1 e
2j
3
e
4j
3
1 e
4j
3
e
8j
3
_
_
(5.219)
We take the Kronecker product of this matrix with itself to create matrix W as dened
by equation (5.184), on page 445:
W =
1
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1 1 1
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 1 1 e
2j
3
e
2j
3
e
2j
3
e
4j
3
e
4j
3
e
4j
3
1 e
2j
3
e
4j
3
e
2j
3
e
4j
3
e
6j
3
e
4j
3
e
6j
3
e
8j
3
1 e
4j
3
e
8j
3
e
2j
3
e
6j
3
e
10j
3
e
4j
3
e
8j
3
e
12j
3
1 1 1 e
4j
3
e
4j
3
e
4j
3
e
8j
3
e
8j
3
e
8j
3
1 e
2j
3
e
4j
3
e
4j
3
e
6j
3
e
8j
3
e
8j
3
e
10j
3
e
12j
3
1 e
4j
3
e
8j
3
e
4j
3
e
8j
3
e
12j
3
e
8j
3
e
12j
3
e
16j
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.220)
The columns of this matrix are the eigenvectors of matrix L. These eigenvectors are
the same for all block circulant matrices with the same structure, independent of what
the exact values of the elements are. The inverse of matrix W can be constructed using
equation (5.185), on page 445, ie by taking the complex conjugate of matrix W. (Note
that for a general unitary matrix we must take the complex conjugate of its transpose
in order to construct its inverse. This is not necessary here as W is a symmetric
matrix and therefore it is equal to its transpose.)
W
1
=
1
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1 1 1
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 1 1 e
2j
3
e
2j
3
e
2j
3
e
4j
3
e
4j
3
e
4j
3
1 e
2j
3
e
4j
3
e
2j
3
e
4j
3
e
6j
3
e
4j
3
e
6j
3
e
8j
3
1 e
4j
3
e
8j
3
e
2j
3
e
6j
3
e
10j
3
e
4j
3
e
8j
3
e
12j
3
1 1 1 e
4j
3
e
4j
3
e
4j
3
e
8j
3
e
8j
3
e
8j
3
1 e
2j
3
e
4j
3
e
4j
3
e
6j
3
e
8j
3
e
8j
3
e
10j
3
e
12j
3
1 e
4j
3
e
8j
3
e
4j
3
e
8j
3
e
12j
3
e
8j
3
e
12j
3
e
16j
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.221)
The eigenvalues of matrix L may be computed from its Fourier transform, using equa-
tion (5.187), on page 446. First, however, we need to identify the kernel l(x, y) of the
operator represented by matrix L and take its Fourier transform
L(u, v) using equation
(5.188), on page 446. From example 5.30 we know that the kernel function is:
0 1 0
1 4 1
0 1 0
(5.222)
We can identify then the following values for the discrete function l(x, y):
l(0, 0) = 4, l(1, 1) = 0, l(1, 0) = 1, l(1, 1) = 0
l(0, 1) = 1, l(0, 1) = 1, l(1, 1) = 0, l(1, 0) = 1, l(1, 1) = 0 (5.223)
However, these values cannot be directly used in equation (5.188), which assumes a
function h(x, y) dened with positive values of its arguments only. We therefore need
a shifted version of our kernel, one that puts the value 4 at the top left corner of the
matrix representation of the kernel. We can obtain such a version by reading the rst
column of matrix L and wrapping it around to form a 3 3 matrix:
4 1 1
1 0 0
1 0 0
(5.224)
Then we have:
l(0, 0) = 4, l(0, 1) = 1, l(0, 2) = 1, l(1, 0) = 1
l(2, 0) = 1, l(1, 1) = 0, l(1, 2) = 0, l(2, 1) = 0, l(2, 2) = 0 (5.225)
We can use these values in equation (5.188) to derive:
L(u, v) =
1
9
_
4 +e
2j
3
v
+e
2j
3
2v
+e
2j
3
u
+e
2j
3
2u
_
(5.226)
Formula (5.187) says that the eigenvalues of matrix L, which appear along the diagonal
of matrix (k, i), are the values of the Fourier transform
L(u, v), computed for u =
mod
3
(k) and v =
_
k
3
_
, where k = 0, 1, . . . , 8. These values may be computed using
formula (5.226):
L(0, 0) = 0
L(0, 1) =
1
9
_
4 +e
2j
3
+e
4j
3
+ 1 + 1
_
=
1
9
[2 2 cos 60
] =
1
3
L(0, 2) =
1
9
_
4 +e
4j
3
+e
8j
3
+ 2
_
=
1
9
[2 +e
4j
3
+e
2j
3
] =
1
3
L(1, 0) =
L(0, 1) = 1
L(1, 1) =
1
9
_
4 + 2e
2j
3
+ 2e
4j
3
_
=
1
9
[4 4 cos 60
] =
2
3
L(1, 2) =
1
9
_
4 +e
4j
3
+e
8j
3
+e
2j
3
+e
4j
3
_
=
2
3
L(2, 0) =
L(0, 2) =
1
3
L(2, 1) =
L(1, 2) =
2
3
L(2, 2) =
1
9
_
4 + 2e
4j
3
+ 2e
8j
3
_
=
2
3
(5.227)
Here we made use of the following:
e
2j
3
= cos 60
j sin 60
=
1
2
j
3
2
e
4j
3
= cos 60
+j sin 60
=
1
2
+j
3
2
e
6j
3
= 1
e
8j
3
= e
2j
3
= cos 60
j sin 60
=
1
2
j
3
2
(5.228)
Note that the rst eigenvalue of matrix L is 0. This means that matrix L is singular,
and even though we can diagonalise it using equation (5.189), we cannot invert it
by taking the inverse of this equation. This should not be surprising as matrix L
expresses the Laplacian operator on an image, and we know that from the knowledge
of the Laplacian alone we can never recover the original image.
Applying equation (5.187) we dene matrix
L
for L to be:
L
=
_
_
0 0 0 0 0 0 0 0 0
0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0
0 0 0 3 0 0 0 0 0
0 0 0 0 6 0 0 0 0
0 0 0 0 0 6 0 0 0
0 0 0 0 0 0 3 0 0
0 0 0 0 0 0 0 6 0
0 0 0 0 0 0 0 0 6
_
_
(5.229)
Having dened matrices W, W
1
and we can then write:
L = W
L
W
1
(5.230)
This equation may be conrmed by direct substitution. First we compute matrix
L
W
1
:
_
_
0 0 0 0 0 0 0 0 0
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
2j
3
e
4j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 e
4j
3
e
8j
3
1 1 1 e
2j
3
e
2j
3
e
2j
3
e
4j
3
e
4j
3
e
4j
3
2 2e
2j
3
2e
4j
3
2e
2j
3
2e
4j
3
2e
6j
3
2e
4j
3
2e
6j
3
2e
8j
3
2 2e
4j
3
2e
8j
3
2e
2j
3
2e
6j
3
2e
10j
3
2e
4j
3
2e
8j
3
2e
12j
3
1 1 1 e
4j
3
e
4j
3
e
4j
3
e
8j
3
e
8j
3
e
8j
3
2 2e
2j
3
2e
4j
3
2e
4j
3
2e
6j
3
2e
8j
3
2e
8j
3
2e
10j
3
2e
12j
3
2 2e
4j
3
2e
8j
3
2e
4j
3
2e
8j
3
2e
12j
3
2e
8j
3
2e
12j
3
2e
16j
3
_
_
If we take into consideration that
e
10j
3
= e
4j
3
= cos 60
+j sin 60
=
1
2
+j
3
2
e
12j
3
= 1
e
16j
3
= e
4j
3
= cos 60
+j sin 60
=
1
2
+j
3
2
(5.231)
and multiply the above matrix with W from the left, we recover matrix L.
How can we overcome the extreme sensitivity of matrix inversion to noise?
We can do it by imposing a smoothness constraint on the solution, so that it does not fluctuate too much. Let us say that we would like the second derivative of the reconstructed image to be small overall. At each pixel, the sum of the second derivatives of the image along each axis, known as the Laplacian, may be approximated by ∇²f(i,k), given by equation (5.205), derived in example 5.29. The constraint we choose to impose then is for the sum of the squares of the Laplacian values at each pixel position to be minimal:

\sum_{k=0}^{N-1}\sum_{i=0}^{N-1}\left[\nabla^2 f(i,k)\right]^2=\text{minimal} \qquad (5.232)

The value of the Laplacian at each pixel position may be computed by using the Laplacian operator, which has the form of an N² × N² matrix L acting on column vector f (of size N² × 1), Lf. Lf is a vector. The sum of the squares of its elements is given by (Lf)ᵀLf. The constraint then is:

(Lf)^{T}Lf=\text{minimal} \qquad (5.233)
How can we incorporate the constraint in the inversion of the matrix?
Let us write again in matrix form the equation we want to solve for f:

g=Hf+\nu \qquad (5.234)

We assume that the noise vector ν is not known, but some of its statistical properties are known; say we know that:

\nu^{T}\nu=\varepsilon \qquad (5.235)

This quantity is related to the variance of the noise and it could be estimated from the image itself, using areas of uniform brightness only. If we substitute ν from (5.234) into (5.235), we have:

(g-Hf)^{T}(g-Hf)=\varepsilon \qquad (5.236)

The problem then is to minimise (5.233) under the constraint (5.236). The solution of this problem is a filter with Fourier transform (see Box 5.6, on page 459, and example 5.36):

\hat{M}(u,v)=\frac{\hat{H}^{*}(u,v)}{|\hat{H}(u,v)|^2+\gamma|\hat{L}(u,v)|^2} \qquad (5.237)

By multiplying numerator and denominator with \hat{H}(u,v), we can bring this filter into a form directly comparable with the inverse and the Wiener filters:

\hat{M}(u,v)=\frac{1}{\hat{H}(u,v)}\,\frac{|\hat{H}(u,v)|^2}{|\hat{H}(u,v)|^2+\gamma|\hat{L}(u,v)|^2} \qquad (5.238)

Here γ is a constant and \hat{L}(u,v) is the Fourier transform of an N × N matrix L, with the following property: if we use it to multiply the image (written as a vector) from the left, the output will be an array, the same size as the image, with an estimate of the value of the Laplacian at each pixel position. The role of parameter γ is to strike the balance between smoothing the output and paying attention to the data.
Example B5.33
If f is an N × 1 real vector and A is an N × N matrix, show that

\frac{\partial\left(f^{T}Af\right)}{\partial f}=(A+A^{T})f \qquad (5.239)

Using the results of example 3.65, on page 269, we can easily see that:

\frac{\partial\left(f^{T}Af\right)}{\partial f}=\frac{\partial\left(f^{T}(Af)\right)}{\partial f}+\frac{\partial\left((f^{T}A)f\right)}{\partial f}=Af+\frac{\partial\left((A^{T}f)^{T}f\right)}{\partial f}=Af+A^{T}f=(A+A^{T})f \qquad (5.240)

Here we made use of the fact that Af and A^{T}f are vectors.
Example B5.34
If g is the column vector that corresponds to a 3 3 image G and matrix
W
1
is dened as in example 5.28 for N = 3, show that vector W
1
g is
proportional to the discrete Fourier transform
G of G.
Assume that:
G =
_
_
g
11
g
12
g
13
g
21
g
22
g
23
g
31
g
32
g
33
_
_
and W
1
3
=
1
3
_
_
1 1 1
1 e
2j
3
e
2j
3
2
1 e
2j
3
2
e
2j
3
_
_
(5.241)
Then:
W
1
= W
1
3
W
1
3
=
1
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1 1 1
1 e
2j
3
e
2j
3
2
1 e
2j
3
e
2j
3
2
1 e
2j
3
e
2j
3
2
1 e
2j
3
2
e
2j
3
1 e
2j
3
2
e
2j
3
1 e
2j
3
2
e
2j
3
1 1 1 e
2j
3
e
2j
3
e
2j
3
e
2j
3
2
e
2j
3
2
e
2j
3
2
1 e
2j
3
e
2j
3
2
e
2j
3
e
2j
3
2
e
2j
3
3
e
2j
3
2
e
2j
3
3
e
2j
3
4
1 e
2j
3
2
e
2j
3
e
2j
3
e
2j
3
3
e
2j
3
2
e
2j
3
2
e
2j
3
4
e
2j
3
3
1 1 1 e
2j
3
2
e
2j
3
2
e
2j
3
2
e
2j
3
e
2j
3
e
2j
3
1 e
2j
3
e
2j
3
2
e
2j
3
2
e
2j
3
3
e
2j
3
4
e
2j
3
e
2j
3
2
e
2j
3
3
1 e
2j
3
2
e
2j
3
e
2j
3
2
e
2j
3
4
e
2j
3
3
e
2j
3
e
2j
3
3
e
2j
3
2
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.242)
If we use e
2j
3
3
= e
2j
= 1 and e
2j
3
4
= e
2j
3
3
e
2j
3
= e
2j
3
, this matrix simpli-
es somehow. So we get:
W
1
g =
1
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 1 1 1 1 1 1 1 1
1 e
2j
3
e
2j
3
2
1 e
2j
3
e
2j
3
2
1 e
2j
3
e
2j
3
2
1 e
2j
3
2
e
2j
3
1 e
2j
3
2
e
2j
3
1 e
2j
3
2
e
2j
3
1 1 1 e
2j
3
e
2j
3
e
2j
3
e
2j
3
2
e
2j
3
2
e
2j
3
2
1 e
2j
3
e
2j
3
2
e
2j
3
e
2j
3
2
1 e
2j
3
2
1 e
2j
3
1 e
2j
3
2
e
2j
3
e
2j
3
1 e
2j
3
2
e
2j
3
2
e
2j
3
1
1 1 1 e
2j
3
2
e
2j
3
2
e
2j
3
2
e
2j
3
e
2j
3
e
2j
3
1 e
2j
3
e
2j
3
2
e
2j
3
2
1 e
2j
3
e
2j
3
e
2j
3
2
1
1 e
2j
3
2
e
2j
3
e
2j
3
2
e
2j
3
1 e
2j
3
1 e
2j
3
2
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
g
11
g
21
g
31
g
12
g
22
g
32
g
13
g
23
g
33
_
_
_
_
_
_
_
_
_
_
_
_
_
_
=
1
3
_
_
_
_
_
_
_
_
_
_
g
11
+g
21
+g
31
+g
12
+g
22
+g
32
+g
13
+g
23
+g
33
g
11
+g
21
e
2j
3
+g
31
e
2j
3
2
+g
12
+g
22
e
2j
3
+g
32
e
2j
3
2
+g
13
+g
23
e
2j
3
+g
33
e
2j
3
2
.
.
.
g
11
+g
21
+g
31
+g
12
e
2j
3
2
+g
22
e
2j
3
2
+g
32
e
2j
3
2
+g
13
e
2j
3
+g
23
e
2j
3
+g
33
e
2j
3
.
.
.
_
_
_
_
_
_
_
_
_
_
(5.243)
Careful examination of the elements of this vector shows that they are the Fourier
components of G, multiplied with 3, computed at various combinations of frequencies
(u, v), for u = 0, 1, 2 and v = 0, 1, 2, and arranged as follows:
3
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
G(0, 0)
G(1, 0)
G(2, 0)
G(0, 1)
G(1, 1)
G(2, 1)
G(0, 2)
G(1, 2)
G(2, 2)
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.244)
This shows that W
1
g yields N times the Fourier transform of G, as a column vector.
Example B5.35
Show that, if matrix is dened by equation (5.187), then
is a diagonal
matrix with its k
th
element along the diagonal being N
4
[
H(k
2
, k
1
)[
2
, where
k
2
mod
N
(k) and k
1
_
k
N
_
.
From the denition of , equation (5.187), we can write:
=
_
_
_
_
_
_
_
N
2
H(0, 0) 0 0 . . . 0
0 N
2
H(1, 0) 0 . . . 0
0 0 N
2
H(2, 0) . . . 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 . . . N
2
H(N 1, N 1)
_
_
_
_
_
_
_
(5.245)
Then:
=
_
_
_
_
_
_
_
N
2
H
(0, 0) 0 0 . . . 0
0 N
2
H
(1, 0) 0 . . . 0
0 0 N
2
H
(2, 0) . . . 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 . . . N
2
H
(N 1, N 1)
_
_
_
_
_
_
_
(5.246)
Obviously:
=
_
_
_
_
_
_
_
_
_
N
4
[
H(0, 0)[
2
0 0 . . . 0
0 N
4
[
H(1, 0)[
2
0 . . . 0
0 0 N
4
[
H(2, 0)[
2
. . . 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 . . . N
4
[
H(N 1, N 1)[
2
_
_
_
_
_
_
_
_
_
(5.247)
Box 5.6. Derivation of the constrained matrix inversion filter

We must find the solution of the problem: minimise (Lf)ᵀLf with the constraint:

(g-Hf)^{T}(g-Hf)=\varepsilon \qquad (5.248)
According to the method of Lagrange multipliers (see Box 3.8, on page 268), the solution must satisfy

\frac{\partial}{\partial f}\left\{f^{T}L^{T}Lf+\lambda(g-Hf)^{T}(g-Hf)\right\}=0 \qquad (5.249)

where λ is a constant. This differentiation is with respect to a vector and it will yield a system of N² equations (one for each component of vector f) which, with equation (5.248), form a system of N²+1 equations, for the N²+1 unknowns: the N² components of f plus λ.
If a is a vector and b another one, then it can be shown (example 3.65, page 267) that:

\frac{\partial\left(f^{T}a\right)}{\partial f}=a \qquad (5.250)

\frac{\partial\left(b^{T}f\right)}{\partial f}=b \qquad (5.251)

Also, if A is an N² × N² square matrix, then (see example 5.33, on page 456):

\frac{\partial\left(f^{T}Af\right)}{\partial f}=(A+A^{T})f \qquad (5.252)
We apply equations (5.250), (5.251) and (5.252) to (5.249) to perform the differentiation:

\frac{\partial}{\partial f}\Big\{\underbrace{f^{T}(L^{T}L)f}_{\text{eqn (5.252) with }A\equiv L^{T}L}+\lambda\Big(g^{T}g-\underbrace{g^{T}Hf}_{\text{eqn (5.251) with }b\equiv H^{T}g}-\underbrace{f^{T}H^{T}g}_{\text{eqn (5.250) with }a\equiv H^{T}g}+\underbrace{f^{T}H^{T}Hf}_{\text{eqn (5.252) with }A\equiv H^{T}H}\Big)\Big\}=0

\Rightarrow\;(2L^{T}L)f+\lambda\left(-H^{T}g-H^{T}g+2H^{T}Hf\right)=0

\Rightarrow\;(H^{T}H+\gamma L^{T}L)f=H^{T}g \qquad (5.253)

Here γ ≡ 1/λ. From the results of Boxes 5.4 and 5.5, matrices H, L and their transposes may be written as

H=W\Lambda_H W^{-1},\qquad H^{T}=W\Lambda_H^{*}W^{-1} \qquad (5.254)

L=W\Lambda_L W^{-1},\qquad L^{T}=W\Lambda_L^{*}W^{-1} \qquad (5.255)
Then:

H^{T}H+\gamma L^{T}L=W\Lambda_H^{*}W^{-1}W\Lambda_H W^{-1}+\gamma W\Lambda_L^{*}W^{-1}W\Lambda_L W^{-1}=W\Lambda_H^{*}\Lambda_H W^{-1}+\gamma W\Lambda_L^{*}\Lambda_L W^{-1}=W\left(\Lambda_H^{*}\Lambda_H+\gamma\Lambda_L^{*}\Lambda_L\right)W^{-1} \qquad (5.256)

We substitute from (5.255) and (5.256) into (5.253) to obtain:

W\left(\Lambda_H^{*}\Lambda_H+\gamma\Lambda_L^{*}\Lambda_L\right)W^{-1}f=W\Lambda_H^{*}W^{-1}g \qquad (5.257)

First we multiply both sides of the equation from the left with W^{-1}, to get:

\left(\Lambda_H^{*}\Lambda_H+\gamma\Lambda_L^{*}\Lambda_L\right)W^{-1}f=\Lambda_H^{*}W^{-1}g \qquad (5.258)

Notice that, as \Lambda_H^{*}\Lambda_H, \Lambda_H^{*} and \Lambda_L^{*}\Lambda_L are diagonal matrices, this equation expresses a relationship between the corresponding elements of vectors W^{-1}f and W^{-1}g one by one.
Applying the result of example 5.35, we may write

\Lambda_H^{*}\Lambda_H=N^4|\hat{H}(u,v)|^2\qquad\text{and}\qquad\Lambda_L^{*}\Lambda_L=N^4|\hat{L}(u,v)|^2 \qquad (5.259)

where \hat{L}(u,v) is the Fourier transform of matrix L. Also, by applying the results of example 5.34, we may write:

W^{-1}f=N\hat{F}(u,v)\qquad\text{and}\qquad W^{-1}g=N\hat{G}(u,v) \qquad (5.260)

Finally, we replace \Lambda_H^{*} by its definition, equation (5.187), so that (5.258) becomes:

N^4\left[|\hat{H}(u,v)|^2+\gamma|\hat{L}(u,v)|^2\right]N\hat{F}(u,v)=N^2\hat{H}^{*}(u,v)N\hat{G}(u,v)\;\Rightarrow\;N^2\left[|\hat{H}(u,v)|^2+\gamma|\hat{L}(u,v)|^2\right]\hat{F}(u,v)=\hat{H}^{*}(u,v)\hat{G}(u,v) \qquad (5.261)

Note that, when we work fully in the discrete domain, we have to use the form of the convolution theorem that applies to DFTs (see equation (2.208), on page 108). Then the correct form of equation (5.3), on page 396, is \hat{G}(u,v)=N^2\hat{H}(u,v)\hat{F}(u,v). This means that the filter with which we have to multiply the DFT of the degraded image, in order to obtain the DFT of the original image, is given by equation (5.237).
What is the relationship between the Wiener lter and the constrained matrix
inversion lter?
Both lters look similar (see equations (5.125) and (5.238)), but they dier in many ways.
1. The Wiener lter is designed to optimise the restoration in an average statistical sense
over a large ensemble of similar images. The constrained matrix inversion deals with
one image only and imposes constraints on the solution sought.
2. The Wiener lter is based on the assumption that the random elds involved are homo-
geneous with known spectral densities. In the constrained matrix inversion it is assumed
that we know only some statistical property of the noise.
In the constrained matrix restoration approach, various lters may be constructed using
the same formulation, by simply changing the smoothing criterion. For example, one may try
to minimise the sum of the squares of the rst derivatives at all positions as opposed to the
second derivatives. The only dierence from formula (5.237) will be in matrix L.
Example B5.37
Calculate the DFT of the N × N matrix L′ defined as:

L'\equiv\begin{bmatrix}-4 & 1 & 1 & \dots & 1\\ 1 & 0 & 0 & \dots & 0\\ 0 & 0 & 0 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & 0\\ 1 & 0 & 0 & \dots & 0\end{bmatrix} \qquad (5.262)

By applying formula (5.188) to L′ we obtain:

\hat{L}'(u,v)=\frac{1}{N^2}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}L'(x,y)e^{-2\pi j\left(\frac{ux}{N}+\frac{vy}{N}\right)}
=\frac{1}{N^2}\left[-4+\sum_{x=1}^{N-1}e^{-2\pi j\frac{ux}{N}}+e^{-2\pi j\frac{v}{N}}+e^{-2\pi j\frac{(N-1)v}{N}}\right]
=\frac{1}{N^2}\left[-4+\sum_{x=0}^{N-1}e^{-2\pi j\frac{ux}{N}}-1+e^{-2\pi j\frac{v}{N}}+e^{-2\pi j\frac{(N-1)v}{N}}\right]
=\frac{1}{N^2}\left[-5+N\delta(u)+e^{-2\pi j\frac{v}{N}}+e^{-2\pi j\frac{(N-1)v}{N}}\right] \qquad (5.263)
Here we made use of the geometric progression formula (2.165), on page 95.
Example B5.38
Calculate the magnitude of the DFT of matrix L′ defined by equation (5.262).

The real and the imaginary parts of the DFT computed in example 5.37 are:

\hat{L}'_1(m,n)\equiv\frac{1}{N^2}\left[-5+N\delta(m)+\cos\frac{2\pi n}{N}+\cos\frac{2\pi(N-1)n}{N}\right]

\hat{L}'_2(m,n)\equiv\frac{1}{N^2}\left[-\sin\frac{2\pi n}{N}-\sin\frac{2\pi(N-1)n}{N}\right] \qquad (5.264)

Then:

\hat{L}'_1(m,n)=\begin{cases}\frac{1}{N^2}\left[N-5+\cos\frac{2\pi n}{N}+\cos\frac{2\pi(N-1)n}{N}\right] & m=0,\;n=0,1,\dots,N-1\\[1ex] \frac{1}{N^2}\left[-5+\cos\frac{2\pi n}{N}+\cos\frac{2\pi(N-1)n}{N}\right] & m\neq 0,\;n=0,1,\dots,N-1\end{cases} \qquad (5.265)

Then:

\hat{L}'^2_1(0,n)+\hat{L}'^2_2(0,n)=\frac{1}{N^4}\Big[(N-5)^2+2+2(N-5)\cos\frac{2\pi n}{N}+2(N-5)\cos\frac{2\pi(N-1)n}{N}+2\cos\frac{2\pi n}{N}\cos\frac{2\pi(N-1)n}{N}+2\sin\frac{2\pi n}{N}\sin\frac{2\pi(N-1)n}{N}\Big]
=\frac{1}{N^4}\left[(N-5)^2+2+2(N-5)\cos\frac{2\pi n}{N}+2(N-5)\cos\frac{2\pi(N-1)n}{N}+2\cos\frac{2\pi(N-2)n}{N}\right] \qquad (5.266)

And, for m ≠ 0:

\hat{L}'^2_1(m,n)+\hat{L}'^2_2(m,n)=\frac{1}{N^4}\left[25+2-10\cos\frac{2\pi n}{N}-10\cos\frac{2\pi(N-1)n}{N}+2\cos\frac{2\pi(N-2)n}{N}\right] \qquad (5.267)
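Formulae (5.266) and (5.267) can be checked against a direct DFT of matrix L′. In the sketch below the size N is arbitrary; which of the two frequency indices carries the δ term depends on the orientation in which L′ is stored, and here it is the second one.

```python
import numpy as np

N = 16
Lp = np.zeros((N, N))
Lp[0, 0] = -4.0
Lp[0, 1:] = 1.0          # the "all ones" line of equation (5.262)
Lp[1, 0] = 1.0           # the two isolated 1s
Lp[N - 1, 0] = 1.0

Lhat = np.fft.fft2(Lp) / N**2          # DFT normalised as in equation (5.188)
mag2 = np.abs(Lhat)**2

k = np.arange(N)
c1 = np.cos(2 * np.pi * k / N)
c2 = np.cos(2 * np.pi * (N - 1) * k / N)
c3 = np.cos(2 * np.pi * (N - 2) * k / N)
expected0 = ((N - 5)**2 + 2 + 2 * (N - 5) * (c1 + c2) + 2 * c3) / N**4   # equation (5.266)
expected1 = (27 - 10 * (c1 + c2) + 2 * c3) / N**4                        # equation (5.267)

assert np.allclose(mag2[:, 0], expected0)   # the index paired with the "all ones" line is 0
assert np.allclose(mag2[:, 3], expected1)   # any other value of that index
```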
How do we apply constrained matrix inversion in practice?
Apply the following algorithm.
Step 0: Select a smoothing operator and compute |\hat{L}(u,v)|². If you select to use the Laplacian, use formulae (5.266) and (5.267) to compute |\hat{L}(u,v)|².
Step 1: Select a value for parameter γ. It has to be higher for higher levels of noise in the image. The rule of thumb is that γ should be selected such that the two terms in the denominator of (5.237) are roughly of the same order of magnitude.
Step 2: Compute the mean grey value of the degraded image.
Step 3: Compute the DFT of the degraded image.
Step 4: Multiply the DFT of the degraded image with function \hat{M}(u,v) of equation (5.237), point by point.
Step 5: Take the inverse DFT of the result.
Step 6: Add the mean grey value of the degraded image to all elements of the result, to obtain the restored image.
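A sketch of these steps follows. The point spread function and the value of γ are placeholders; |\hat{L}(u,v)|² is computed here directly from a wrapped Laplacian kernel rather than from the closed-form expressions (5.266)–(5.267), and the mean grey value is removed before filtering and added back afterwards, a slight variation of Steps 2 and 6 with the same effect.

```python
import numpy as np

def cls_restore(g, h, gamma):
    """Constrained matrix inversion (constrained least squares) restoration,
    M = H* / (|H|^2 + gamma |L|^2), equation (5.237)."""
    N = g.shape[0]
    # Step 0: smoothing operator: wrapped Laplacian kernel and its |DFT|^2.
    lap = np.zeros_like(g, dtype=float)
    lap[0, 0] = -4.0
    lap[0, 1] = lap[1, 0] = lap[0, -1] = lap[-1, 0] = 1.0
    L2 = np.abs(np.fft.fft2(lap))**2
    mean_g = g.mean()                                 # Step 2: mean grey value
    G = np.fft.fft2(g - mean_g)                       # Step 3: DFT of the degraded image
    H = np.fft.fft2(h)
    M = np.conj(H) / (np.abs(H)**2 + gamma * L2)      # Step 4: the filter of eq. (5.237)
    f = np.real(np.fft.ifft2(M * G))                  # Step 5: inverse DFT
    return f + mean_g                                 # Step 6: restore the mean grey value

# e.g. f_hat = cls_restore(g, h, gamma=0.01), with h the padded point spread function
```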
Example 5.39
Restore the images of gures 5.5a, on page 414, and 5.9a and 5.9b, on page
418, using constrained matrix inversion.
We must rst dene matrix
L(u, v) which expresses the constraint.
Following the steps of example 5.29, on page 449, we can see that matrix L(i, j), with
which we have to multiply an N N image in order to obtain the value of the
Laplacian at each position, is given by an N
2
N
2
matrix of the following structure:
NxN
matrix L
NxN
matrix L
NxN
matrix L
...
...
...
.
.
.
.
.
.
.
.
.
N-1
unit
matrices
NxN
N-1 unit matrices NxN
Matrix
L has the following form:
L =
_
_
_
_
_
_
_
_
_
_
_
_
_
4 1
1 4
0 1
0 0
0 0
.
.
.
.
.
.
0 0
1 0
N3 zeros
..
0 0 . . . 0
1 0 . . . 0
4 1 . . . 0
1 4 . . . 0
0 1 . . . 0
.
.
.
.
.
. . . .
.
.
.
0 0 . . . 4
0 0 . . . 1
1
0
0
0
0
.
.
.
1
4
_
_
_
_
_
_
_
_
_
_
_
_
_
(5.268)
To form the kernel we require, we must take the rst column of matrix L and wrap it
to form an N N matrix. The rst column of matrix L consists of the rst column
of matrix
L (N elements) plus the rst columns of N 1 unit matrices of size N N.
These N
2
elements have to be written as N columns of size N next to each other, to
form an N N matrix L
, say:
L
=
_
_
_
_
_
_
_
_
_
4 1 1 . . . 1
1 0 0 . . . 0
0 0 0 . . . 0
.
.
.
.
.
.
.
.
. . . .
.
.
.
0 0 0 . . . 0
1 0 0 . . . 0
_
_
_
_
_
_
_
_
_
(5.269)
It is the Fourier transform of this matrix that appears in the constrained matrix inver-
sion lter. This Fourier transform may be computed analytically easily (see examples
5.37 and 5.38). Note that
[
L(m, n)[
2
= L
2
1
(m, n) +L
2
2
(m, n) (5.270)
where
L is the Fourier transform of L
_
15131 + 246 cos
2n
128
+ 246 cos
2127n
128
+ 2 cos
2126n
128
m = 0
n = 0, 1, . . . , 127
27 10 cos
2n
128
10 cos
2127n
128
+ 2 cos
2126n
128
m = 1, 2, . . . , 127
n = 0, 1, . . . , 127
(5.271)
The frequency response function of the lter we must use is then given by substituting
the frequency response function (5.50), on page 410, into equation (5.237):
M(m, n) =
1
i
T
sin
m
N
sin
i
T
m
N
sin
2
i
T
m
N
i
2
T
sin
2 m
N
+A(m, n)
e
j
m
N
(i
T
1)
(5.272)
For m = 0 we must use:
M(0, n) =
1
1 +A(0, n)
for 0 n N 1 (5.273)
Note from equation (5.271) that A(0, 0) is much larger than 1, making the dc com-
ponent of lter
M virtually 0. So, when we multiply the DFT of the degraded image
with
M, we kill its dc component. This is because the constraint we have imposed did
not have a dc component. To restore, therefore, the dc component of the image, after
ltering, we have to compute the dc component of the input image and add it to the
result, before we visualise it as an image.
Working as for the case of the Wiener ltering, we can work out that the real and
imaginary parts of the Fourier transform of the original image are given by:
F
1
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
_
G
1
(m, n) cos
(i
T
1)m
N
G
2
(m, n) sin
(i
T
1)m
N
_
sin
2 i
T
m
N
+A(m, n)i
2
T
sin
2 m
N
F
2
(m, n)=
i
T
sin
m
N
sin
i
T
m
N
_
G
1
(m, n) sin
(i
T
1)m
N
+G
2
(m, n) cos
(i
T
1)m
N
_
sin
2 i
T
m
N
+A(m, n)i
2
T
sin
2 m
N
(5.274)
These formulae are valid for 0 < m N 1 and 0 n N 1. For m = 0 we must
use formulae:
F
1
(0, n) =
G
1
(0, n)
1 +A(0, n)
F
2
(0, n) =
G
2
(0, n)
1 +A(0, n)
(5.275)
If we take the inverse Fourier transform using functions F
1
(m, n) and F
2
(m, n) as the
real and the imaginary parts, and add the dc component, we obtain the restored image.
The results of restoring images 5.5b, 5.9a and 5.9b are shown in gure 5.14. Note
that dierent values of , ie dierent levels of smoothing, have to be used for dierent
levels of noise in the image.
Input: image 5.5b Input: image 5.9a Input: image 5.9b
= 0.001, MSE = 1749 = 0.001, MSE = 3186 = 0.001, MSE = 6489
= 0.002, MSE = 1617 = 0.004, MSE = 1858 = 0.006, MSE = 2312
= 0.005, MSE = 1543 = 0.007, MSE = 1678 = 0.010, MSE = 1934
= 0.01, MSE = 1530 = 0.02, MSE = 1593 = 0.0999, MSE = 2144
Figure 5.14: Image restoration with constrained matrix inversion.
5.4 Inhomogeneous linear image restoration: the whirl transform
How do we model the degradation of an image if it is linear but inhomogeneous?
In the general case, equation (1.15), on page 13, applies:

g(i,j)=\sum_{k=1}^{N}\sum_{l=1}^{N}f(k,l)h(k,l,i,j) \qquad (5.276)
We have shown in Chapter 1 that this equation can be written in matrix form (see equation
(1.38), on page 19):
g = Hf (5.277)
For inhomogeneous linear distortions, matrix H is not circulant or block circulant. In order
to solve system (5.277) we can no longer use ltering. Instead, we must solve it by directly
inverting matrix H. However, this will lead to a noisy solution, so some regularisation process
must be included.
Example 5.40

In a notorious case in 2007, a criminal was putting on the Internet images of himself while committing crimes, with his face scrambled in a whirl pattern. Work out the distortion he might have been applying to the subimage of his face.

First we have to create a whirl scanning pattern. This may be given by coordinates (x, y) defined as

x(t) = x_0 + \alpha t \cos(\beta t)        y(t) = y_0 + \alpha t \sin(\beta t)     (5.278)

where (x_0, y_0) is the eye of the whirl, ie its starting point, t is a parameter incremented along the scanning path, and \alpha and \beta are parameters that define the exact shape of the whirl. For example, for a tight whirl pattern, \alpha must be small. The integer coordinates (i, j) of the image that will make up the scanning path will be given by

i = i_0 + \lfloor \alpha t \cos(\beta t) + 0.5 \rfloor        j = j_0 + \lfloor \alpha t \sin(\beta t) + 0.5 \rfloor     (5.279)

where (i_0, j_0) are the coordinates of the starting pixel, and \alpha and \beta are chosen to be much smaller than 1. Parameter t is allowed to take positive integer values starting from 0. Once we have the sequence of pixels that make up the scanning pattern, we may smear their values by, for example, averaging the previous K values and assigning the result to the current pixel of the scanning sequence. For example, if the values of three successive pixels are averaged and assigned to the most recent pixel in the sequence (K = 3), the values of the scrambled image \tilde g will be computed according to:

\tilde g(i_0 + \lfloor \alpha t\cos(\beta t)+0.5\rfloor,\; j_0 + \lfloor \alpha t\sin(\beta t)+0.5\rfloor) = \frac{1}{3}\Big\{
 g(i_0 + \lfloor \alpha(t-2)\cos[\beta(t-2)]+0.5\rfloor,\; j_0 + \lfloor \alpha(t-2)\sin[\beta(t-2)]+0.5\rfloor) +
 g(i_0 + \lfloor \alpha(t-1)\cos[\beta(t-1)]+0.5\rfloor,\; j_0 + \lfloor \alpha(t-1)\sin[\beta(t-1)]+0.5\rfloor) +
 g(i_0 + \lfloor \alpha t\cos[\beta t]+0.5\rfloor,\; j_0 + \lfloor \alpha t\sin[\beta t]+0.5\rfloor) \Big\}     (5.280)
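A small sketch of this smearing, not the book's code: it follows the spiral of (5.279) and replaces the current pixel by the average of the last K+1 samples along the path, as in (5.280). Parameter names, default values and the 0-based array indices are illustrative assumptions.

import numpy as np

def whirl_scramble(img, i0, j0, alpha=0.001, beta=2*np.pi/360*2, K=2, t_max=50000):
    M, N = img.shape
    out = img.astype(float).copy()
    for t in range(t_max + 1):
        # positions t, t-1, ..., t-K along the spiral (cf equation (5.281))
        ts = np.arange(max(t - K, 0), t + 1)
        ii = np.clip(i0 + np.floor(alpha * ts * np.cos(beta * ts) + 0.5).astype(int), 0, M - 1)
        jj = np.clip(j0 + np.floor(alpha * ts * np.sin(beta * ts) + 0.5).astype(int), 0, N - 1)
        out[ii[-1], jj[-1]] = img[ii, jj].mean()   # smear: average of the K+1 samples
    return out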
Example 5.41

Use the scrambling pattern of example 5.40 to work out the elements of matrix H with which one should operate on an M × N image in order to scramble it in a whirl-like way. Assume that in the scrambling pattern the values of K + 1 successive pixels are averaged.

Remember that the H mapping should be of size MN × MN, because it will operate on the image written as a column vector with its columns written one under the other. To compute the elements of this matrix we apply the following algorithm.

Step 1: Create an array H of size MN × MN with all its elements 0.

Step 2: Choose (i_0, j_0) to be the coordinates of a pixel near the centre of the image.

Step 3: Select values for \alpha and \beta, say \alpha = 0.1 and \beta = \frac{2\pi}{360}\times 5. Select the maximum value of t you will use, say t_{max} = 10,000.

Step 4: Create a 1D array S of MN samples, all with flag 0. This array will be used to keep track of which rows of matrix H have all their elements 0.

Step 5: Starting with t = 0 and carrying on with t = 1, 2, ..., t_{max}, or until all elements of array S have been flagged, perform the following computations. Compute the indices of the 2D image pixels that will have to be mixed to yield the value of output pixel (i_c, j_c):

i_c = i_0 + \lfloor \alpha t\cos(\beta t)+0.5 \rfloor                    j_c = j_0 + \lfloor \alpha t\sin(\beta t)+0.5 \rfloor
i_1 = i_0 + \lfloor \alpha(t-1)\cos[\beta(t-1)]+0.5 \rfloor              j_1 = j_0 + \lfloor \alpha(t-1)\sin[\beta(t-1)]+0.5 \rfloor
i_2 = i_0 + \lfloor \alpha(t-2)\cos[\beta(t-2)]+0.5 \rfloor              j_2 = j_0 + \lfloor \alpha(t-2)\sin[\beta(t-2)]+0.5 \rfloor
...
i_K = i_0 + \lfloor \alpha(t-K)\cos[\beta(t-K)]+0.5 \rfloor              j_K = j_0 + \lfloor \alpha(t-K)\sin[\beta(t-K)]+0.5 \rfloor     (5.281)

In the above we must make sure that the values of the coordinates do not go out of range, ie i_x should take values between 1 and M and j_x should take values between 1 and N. To ensure that, we use

i_k = \min\{i_k, M\}    i_k = \max\{i_k, 1\}    j_k = \min\{j_k, N\}    j_k = \max\{j_k, 1\}

for every k = 1, 2, ..., K.

Step 6: Convert the coordinates computed in Step 5 into the indices of the column vector we have created from the input image by writing its columns one under the other. Given that a column has M elements indexed by i, with first value 1, the pixels identified in (5.281) will have the following indices in the column image:

I_c = (i_c − 1)M + j_c
I_1 = (i_1 − 1)M + j_1
I_2 = (i_2 − 1)M + j_2
...
I_K = (i_K − 1)M + j_K     (5.282)

Step 7: If S(I_c) = 0, we proceed to apply (5.284).
If S(I_c) ≠ 0, the elements of the I_c row of matrix H have already been computed. We wish to retain, however, the most recent scrambling values, so we set them all again to 0:

H(I_c, J) = 0  for all J = 1, 2, ..., MN     (5.283)

Then we proceed to apply (5.284):

S(I_c) = 1
H(I_c, I_c) = H(I_c, I_1) = H(I_c, I_2) = H(I_c, I_3) = ... = H(I_c, I_K) = 1     (5.284)

There will be some rows of H that have all their elements 0. This means that the output pixel that corresponds to such a row will have value 0. We may decide to allow this, in which case the scrambling we perform will not be easily invertible, as matrix H will be singular. Alternatively, we may use the following fix.

Step 8: Check all rows of matrix H, and in a row that contains only 0s, set the diagonal element equal to 1. For example, if the 5th row contains only 0s, set the 5th element of this row to 1. This means that the output pixel that corresponds to this row will have the same value as the input pixel, and matrix H will not be singular.

Step 9: Normalise each row of matrix H so that its elements sum up to 1.

After you have computed matrix H, you may produce the scrambled image \tilde g in column form, from the input image g, also in column form, by using:

\tilde g = Hg     (5.285)
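A minimal sketch of this algorithm, not the book's code: it builds the MN × MN scrambling matrix H for the whirl pattern and then scrambles a column image with \tilde g = Hg. Function and parameter names are illustrative, and 0-based indices are used instead of the 1-based indices of the text.

import numpy as np

def build_whirl_H(M, N, i0, j0, alpha=0.1, beta=2*np.pi/360*5, K=3, t_max=10000):
    H = np.zeros((M * N, M * N))
    done = np.zeros(M * N, dtype=bool)                 # the flag array S
    for t in range(t_max + 1):
        ts = np.arange(max(t - K, 0), t + 1)[::-1]     # t, t-1, ..., t-K
        ii = np.clip(i0 + np.floor(alpha*ts*np.cos(beta*ts) + 0.5).astype(int), 0, M - 1)
        jj = np.clip(j0 + np.floor(alpha*ts*np.sin(beta*ts) + 0.5).astype(int), 0, N - 1)
        idx = ii * M + jj                              # column-image indices, as in (5.282)
        Ic = idx[0]                                    # index of the current output pixel
        H[Ic, :] = 0.0                                 # keep only the most recent mixing
        H[Ic, idx] = 1.0
        done[Ic] = True
    H[~done, ~done] = 1.0                              # untouched rows become identity (Step 8)
    H /= H.sum(axis=1, keepdims=True)                  # Step 9: rows sum to 1
    return H

# Usage example: scramble a 70 x 70 patch written as a 4900-element column vector.
# H = build_whirl_H(70, 70, 35, 35); g_tilde = H @ patch.reshape(-1)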
Example 5.42

Figure 5.15a shows the image of a criminal who wishes to hide his face. Use a window of 70 × 70 around his face to scramble it using the algorithm of example 5.41.

In this case, M = N = 70. We select t_{max} = 50,000, \alpha = 0.001 and \beta = (2\pi/360)\times 2. The value of \alpha was chosen small so that the spiral is tight and, therefore, more likely to pass through most, if not all, pixels. The value of \beta was selected to mean that each time parameter t was incremented by 1, the spiral was rotated by 2°. The value of t_{max} was selected high enough so that the spiral covers the whole square we wish to scramble. After matrix H has been created and equation (5.285) applied to the 70 × 70 patch written as a 4900-element vector, the result is wrapped again to form a 70 × 70 patch, which is embedded in the original image. The result is shown in figure 5.15b.

Figure 5.15: (a) Zoom (size 360 × 256). (b) After a patch of size 70 × 70 around the face region is scrambled.
Example B5.43

Instead of using the spiral of example 5.41 to scramble a subimage, use concentric circles to scan a square subimage of size M × M.

Step 1: Create an array H of size M² × M² with all its elements 0.

Step 2: Choose (i_0, j_0) to be the coordinates of a pixel near or at the centre of the image.

Step 3: Create a 1D array S of M² samples, all with flag 0. This array will be used to keep track of which rows of matrix H have all their elements 0.

Step 4: Set \beta = (2\pi/360)x, where x is a small number like 1 or 2. Select a value of K, say K = 10.

Step 5: For \rho taking values from 1 to \lfloor M/2 \rfloor in steps of 1, do the following.

Step 6: Starting with t = 0 and carrying on with t = 1, 2, ..., 359, perform the following computations. Compute the indices of the 2D image pixels that will have to be mixed to yield the value of output pixel (i_c, j_c):

i_c = i_0 + \lfloor \rho\cos(\beta t)+0.5 \rfloor                    j_c = j_0 + \lfloor \rho\sin(\beta t)+0.5 \rfloor
i_1 = i_0 + \lfloor \rho\cos[\beta(t-1)]+0.5 \rfloor                 j_1 = j_0 + \lfloor \rho\sin[\beta(t-1)]+0.5 \rfloor
i_2 = i_0 + \lfloor \rho\cos[\beta(t-2)]+0.5 \rfloor                 j_2 = j_0 + \lfloor \rho\sin[\beta(t-2)]+0.5 \rfloor
...
i_K = i_0 + \lfloor \rho\cos[\beta(t-K)]+0.5 \rfloor                 j_K = j_0 + \lfloor \rho\sin[\beta(t-K)]+0.5 \rfloor     (5.286)

In the above, we must make sure that the values of the coordinates do not go out of range, ie i_x and j_x should take values between 1 and M. To ensure that, we use

i_k = \min\{i_k, M\}    i_k = \max\{i_k, 1\}    j_k = \min\{j_k, M\}    j_k = \max\{j_k, 1\}

for every k = 1, 2, ..., K.

Step 7: Convert the coordinates computed in Step 6 into the indices of the column vector we have created from the input image by writing its columns one under the other. Given that a column has M elements indexed by i, with first value 1, the pixels identified in (5.286) will have the following indices in the column image:

I_c = (i_c − 1)M + j_c
I_1 = (i_1 − 1)M + j_1
I_2 = (i_2 − 1)M + j_2
...
I_K = (i_K − 1)M + j_K     (5.287)

Step 8: If S(I_c) = 0, proceed to apply (5.289).
If S(I_c) ≠ 0, the elements of the I_c row of matrix H have already been computed. We wish to retain, however, the most recent scrambling values, so we set them all again to 0:

H(I_c, J) = 0  for all J = 1, 2, ..., M²     (5.288)

We then set:

S(I_c) = 1
H(I_c, I_c) = H(I_c, I_1) = H(I_c, I_2) = H(I_c, I_3) = ... = H(I_c, I_K) = 1     (5.289)

Step 9: Check all rows of matrix H, and in a row that contains only 0s, set the diagonal element equal to 1.

Step 10: Normalise each row of matrix H so that its elements sum up to 1.
Example B5.44

Construct a scrambling matrix that averages the values of 2K + 1 pixels on the arc of a circle centred at each pixel of an M × M patch of an image and leaves no pixel unchanged.

Let us say that we wish to scramble a patch centred at pixel (i_0, j_0). Consider a pixel (i, j) of the patch. The polar coordinates of this pixel with respect to the patch centre are

r = \sqrt{(i - i_0)^2 + (j - j_0)^2}     (5.290)

and \theta, such that:

i = r\cos\theta  and  j = r\sin\theta     (5.291)

Then points on the arc of the same circle, centred at this pixel and symmetrically placed on either side of it, have coordinates

i_k = i_0 + r\cos(\theta + k\Delta\theta)
j_k = j_0 + r\sin(\theta + k\Delta\theta)     (5.292)

where k takes values −K, −K+1, ..., 0, ..., K−1, K and \Delta\theta is the angle subtended by a single pixel on the circle of radius r, with its vertex at the centre of the circle, measured in rads: \Delta\theta = 1/r.

Then the algorithm of creating matrix H is as follows.

Step 1: Create an array H of size M² × M² with all its elements 0.

Step 2: Choose (i_0, j_0) to be the coordinates of a pixel near or at the centre of the image.

Step 3: Scan every pixel (i, j) inside the subimage you wish to scramble, compute for it its polar coordinates using equations (5.290) and (5.291), and set \Delta\theta = 1/r.

Step 4: Compute the indices of the 2D image pixels that will have to be mixed to yield the value of output pixel (i, j):

i_k = i_0 + \lfloor r\cos(\theta + k\Delta\theta) + 0.5 \rfloor        j_k = j_0 + \lfloor r\sin(\theta + k\Delta\theta) + 0.5 \rfloor     (5.293)

for k = −K, −K+1, ..., 0, ..., K−1, K.

In the above we must make sure that the values of the coordinates do not go out of range, ie i_k and j_k should take values between 1 and M. To ensure that, we use

i_k = \min\{i_k, M\}    i_k = \max\{i_k, 1\}    j_k = \min\{j_k, M\}    j_k = \max\{j_k, 1\}     (5.294)

for every k.

Step 5: Convert the coordinates computed in Step 4 into the indices of the column vector we have created from the input image by writing its columns one under the other. Given that a column has M elements indexed by i, with first value 1, the pixels identified in (5.293) will have the following indices in the column image:

I = (i − 1)M + j
I_k = (i_k − 1)M + j_k     (5.295)

Step 6: Set:

H(I, I_{-K}) = H(I, I_{-K+1}) = ... = H(I, I_0) = ... = H(I, I_K) = 1     (5.296)

Step 7: Check all rows of matrix H, and in a row that contains only 0s, set the diagonal element equal to 1.

Step 8: Normalise each row of matrix H so that its elements sum up to 1.
Example B5.45

Show how a thick whirl-like scrambling pattern might be created.

To keep it simple, when we compute a scrambled value for pixel (i_c, j_c), we assign it to all pixels around it inside a window of size (2L + 1) × (2L + 1). At first sight this may appear to create image patches with the same value, but due to the continuous and slow rotation of the spiral pattern, large parts of each square patch are continually over-written and the effect disappears.

Example B5.46

Use the algorithms you developed in examples 5.43 and 5.45 to scramble the face of figure 5.15a. Show the scrambled patterns and compare them with the one produced in example 5.42.

Figure 5.16: (a) The original image to be scrambled (size 70 × 70). (b) The scrambling obtained in example 5.42. (c) The scrambling obtained with the algorithm of example 5.43, with x = 1. (d) The scrambling obtained with the algorithm of example 5.43, with x = 2. (e) The scrambling obtained with the algorithm of example 5.45, with \alpha = 0.1, x = 2, K = 50, L = 3 and t_{max} = 50,000. (f) The scrambling obtained with the algorithm of example 5.45, with \alpha = 0.03, x = 2, K = 50, L = 3 and t_{max} = 50,000.
Example 5.47

Apply the algorithm of example 5.45 to construct matrix H with which an 8 × 8 image may be scrambled. Consider as the eye of the whirl pixel (4, 4). Use this matrix then to scramble the flower image. Take the inverse of matrix H and apply it to the scrambled image to reconstruct the flower.

As the image we have to scramble is small, only the inner part of the whirl will be used. A large part of the whirl remains close to the central pixel, so although the image is small, we have to use a large value of K. This is because K really represents the steps along the whirl we use for averaging, not necessarily the number of distinct pixels, as many of these steps are mapped to the same pixel. After trial and error, the following parameters gave good results: \alpha = 0.01, x = 1, K = 10, L = 1 and t_{max} = 5,000. Matrix H, of size 64 × 64, is shown in figure 5.17. Every black cell in this matrix represents a nonzero value. The values along each row of the matrix are all equal and sum up to 1. Every white cell represents a 0. We can see that there is no particular structure in this matrix.

Figure 5.17: Matrix H with which an 8 × 8 image may be scrambled. All black cells represent nonzero positive numbers that along each row are equal and sum up to 1. White cells represent value 0.

Figure 5.18 shows the original image, the scrambled one and the unscrambled one obtained by operating on the scrambled image with matrix H⁻¹. The sum of the squares of the errors of the reconstructed image is 219. This error is computed from the raw output values, where negative and higher than 255 grey values are allowed. This error is only due to the quantisation errors introduced by representing the scrambled image with integers, as matrix H could be inverted exactly. We observe that although we knew exactly what the scrambling matrix was and we applied its inverse exactly on the scrambled image, we did not get back the flower image exactly. This is because in equation (5.285), on page 471, the \tilde g is given to us as an image with all its elements rounded to the nearest integer, while the result of Hg is actually a vector with non-integer elements. So, the application of H⁻¹ to \tilde g is not the inverse of equation (5.285).

Figure 5.18: (a) Original image of size 8 × 8. (b) Image scrambled with matrix H shown in figure 5.17. (c) Image recovered by operating on the scrambled image with the inverse of matrix H. The output has all its values beyond 255 and below 0 truncated to 255 and 0, respectively. The square error is 79. (d) Image recovered by operating on the scrambled image with the inverse of matrix H. The output values are mapped to the range 0 to 255, without any truncation. The square error is 39957.
How may we use constrained matrix inversion when the distortion matrix is not circulant?

Here the regularisation term has to be applied to the whole matrix, and not in the form of a filter. To do that, we have to solve the following problem: if \tilde g is the observed distorted image, g is the undistorted image we are seeking and H is the distortion matrix that was used to produce \tilde g, minimise ||Lg||² and ||H^{-1}\tilde g − g||² simultaneously, ie minimise the cost function

U(g) ≡ ||H^{-1}\tilde g − g||^2 + \gamma ||Lg||^2     (5.297)

If we define d ≡ H^{-1}\tilde g, in terms of components this is

U(g) = \sum_{i=1}^{MN} (d_i − g_i)^2 + \gamma \sum_{i=1}^{MN} \left( \sum_{j=1}^{MN} L_{ij} g_j \right)^2     (5.298)

We have to set the first derivative of U(g) with respect to one of the components of g, say g_k, equal to 0:

\frac{\partial U}{\partial g_k} = 0 \Rightarrow
2(d_k − g_k)(−1) + 2\gamma \sum_{i=1}^{MN} \left( \sum_{j=1}^{MN} L_{ij} g_j \right) L_{ik} = 0 \Rightarrow
g_k + \gamma \sum_{i=1}^{MN} \sum_{j=1}^{MN} L_{ij} L_{ik} g_j = d_k     (5.299)
Example 5.48

A column image consisting of 3 elements has been scrambled by a 3 × 3 matrix, the inverse of which is matrix A. We wish to restore the image, using as a regularisation constraint the requirement that every pixel has a value as similar as possible to the value of the next pixel. Work out the system of equations you will have to solve in this case.

Let us call the scrambled image we have \tilde g. Operating on it with matrix A will yield a noisy estimate of the original image. Let us call it d:

d_1 = a_{11}\tilde g_1 + a_{12}\tilde g_2 + a_{13}\tilde g_3
d_2 = a_{21}\tilde g_1 + a_{22}\tilde g_2 + a_{23}\tilde g_3
d_3 = a_{31}\tilde g_1 + a_{32}\tilde g_2 + a_{33}\tilde g_3     (5.300)

We wish to estimate better values for the original image g, which will be such that the squares of the first differences g_1 − g_2, g_2 − g_3 and g_3 − g_1 are as small as possible, and at the same time g_i will be as near as possible to d_i. Note that we assumed here periodic boundary conditions, ie that the image is repeated ad infinitum, so that the last pixel g_3 has as its next neighbour the first pixel g_1. Then the cost function we have to minimise is:

U = (d_1 − g_1)^2 + (d_2 − g_2)^2 + (d_3 − g_3)^2 + \gamma(g_1 − g_2)^2 + \gamma(g_2 − g_3)^2 + \gamma(g_3 − g_1)^2     (5.301)

Parameter \gamma is used to control the balance between remaining faithful to the original estimate and imposing the smoothness constraint we selected to use. The derivatives of this function with respect to the three unknowns are:

\frac{\partial U}{\partial g_1} = 2(d_1 − g_1)(−1) + 2\gamma(g_1 − g_2) − 2\gamma(g_3 − g_1)
\frac{\partial U}{\partial g_2} = 2(d_2 − g_2)(−1) − 2\gamma(g_1 − g_2) + 2\gamma(g_2 − g_3)
\frac{\partial U}{\partial g_3} = 2(d_3 − g_3)(−1) − 2\gamma(g_2 − g_3) + 2\gamma(g_3 − g_1)     (5.302)

The right-hand sides of these equations should be set to 0. Then, after some manipulation, the system of equations we have to solve becomes:

(1 + 2\gamma) g_1 − \gamma g_2 − \gamma g_3 = d_1
−\gamma g_1 + (1 + 2\gamma) g_2 − \gamma g_3 = d_2
−\gamma g_1 − \gamma g_2 + (1 + 2\gamma) g_3 = d_3     (5.303)
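A quick numerical illustration of this result, not taken from the book: build system (5.303) for some value of \gamma and a hypothetical estimate d, and solve it. The numbers used are made up.

import numpy as np

gamma = 0.5
d = np.array([100.0, 110.0, 90.0])                 # hypothetical noisy estimate d = A g_tilde
S = np.array([[1 + 2*gamma, -gamma,       -gamma],
              [-gamma,       1 + 2*gamma, -gamma],
              [-gamma,      -gamma,        1 + 2*gamma]])
g = np.linalg.solve(S, d)
print(g)                                           # a smoothed version of d; as gamma -> 0 it tends to d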
Example 5.49

Verify that the result of example 5.48 could have been obtained by applying formula (5.299).

We first work out matrix L, which, when it operates on the image, yields at each position the first difference with the next pixel. It is easy to see that:

\begin{pmatrix} g_1 - g_2 \\ g_2 - g_3 \\ g_3 - g_1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} g_1 \\ g_2 \\ g_3 \end{pmatrix}     (5.304)

So, matrix L is:

L = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{pmatrix}     (5.305)

For MN = 3 and for k = 1, 2, 3, formula (5.299) has the form:

g_1 + \gamma \sum_{i=1}^{3}\sum_{j=1}^{3} L_{ij} L_{i1} g_j = d_1
g_2 + \gamma \sum_{i=1}^{3}\sum_{j=1}^{3} L_{ij} L_{i2} g_j = d_2
g_3 + \gamma \sum_{i=1}^{3}\sum_{j=1}^{3} L_{ij} L_{i3} g_j = d_3     (5.306)

Or:

g_1 + \gamma \sum_{i=1}^{3} (L_{i1}L_{i1}g_1 + L_{i2}L_{i1}g_2 + L_{i3}L_{i1}g_3) = d_1
g_2 + \gamma \sum_{i=1}^{3} (L_{i1}L_{i2}g_1 + L_{i2}L_{i2}g_2 + L_{i3}L_{i2}g_3) = d_2
g_3 + \gamma \sum_{i=1}^{3} (L_{i1}L_{i3}g_1 + L_{i2}L_{i3}g_2 + L_{i3}L_{i3}g_3) = d_3     (5.307)

Or:

g_1 + \gamma(L_{11}L_{11}g_1 + L_{12}L_{11}g_2 + L_{13}L_{11}g_3 + L_{21}L_{21}g_1 + L_{22}L_{21}g_2 + L_{23}L_{21}g_3 + L_{31}L_{31}g_1 + L_{32}L_{31}g_2 + L_{33}L_{31}g_3) = d_1
g_2 + \gamma(L_{11}L_{12}g_1 + L_{12}L_{12}g_2 + L_{13}L_{12}g_3 + L_{21}L_{22}g_1 + L_{22}L_{22}g_2 + L_{23}L_{22}g_3 + L_{31}L_{32}g_1 + L_{32}L_{32}g_2 + L_{33}L_{32}g_3) = d_2
g_3 + \gamma(L_{11}L_{13}g_1 + L_{12}L_{13}g_2 + L_{13}L_{13}g_3 + L_{21}L_{23}g_1 + L_{22}L_{23}g_2 + L_{23}L_{23}g_3 + L_{31}L_{33}g_1 + L_{32}L_{33}g_2 + L_{33}L_{33}g_3) = d_3     (5.308)

Substituting the values of L_{ij}, we finally get:

g_1 + \gamma(g_1 − g_2 + g_1 − g_3) = d_1
g_2 + \gamma(−g_1 + g_2 + g_2 − g_3) = d_2
g_3 + \gamma(−g_2 + g_3 − g_1 + g_3) = d_3     (5.309)

This set of equations is the same as (5.303).
What happens if matrix H is really very big and we cannot take its inverse?

In such cases, instead of estimating d ≡ H^{-1}\tilde g and trying to estimate g so that it is as close as possible to d, we try to estimate g so that Hg is as close as possible to \tilde g. The function we have to minimise now has the form:

U(g) ≡ ||\tilde g − Hg||^2 + \gamma ||Lg||^2     (5.310)

In terms of components, U(g) may be written as:

U(g) ≡ \sum_{i=1}^{MN} \left( \tilde g_i − \sum_{j=1}^{MN} H_{ij} g_j \right)^2 + \gamma \sum_{i=1}^{MN} \left( \sum_{j=1}^{MN} L_{ij} g_j \right)^2     (5.311)

We have to work out the first derivative of U(g) with respect to a component of g, say g_k, and set it equal to 0:

\frac{\partial U}{\partial g_k} = 0 \Rightarrow
2 \sum_{i=1}^{MN} \left( \tilde g_i − \sum_{j=1}^{MN} H_{ij} g_j \right)(−H_{ik}) + 2\gamma \sum_{i=1}^{MN} \sum_{j=1}^{MN} L_{ij} g_j L_{ik} = 0 \Rightarrow
−\sum_{i=1}^{MN} \tilde g_i H_{ik} + \sum_{i=1}^{MN} \sum_{j=1}^{MN} H_{ij} H_{ik} g_j + \gamma \sum_{i=1}^{MN} \sum_{j=1}^{MN} L_{ij} L_{ik} g_j = 0 \Rightarrow
\sum_{j=1}^{MN} g_j \left( \sum_{i=1}^{MN} H_{ij} H_{ik} \right) + \gamma \sum_{j=1}^{MN} g_j \left( \sum_{i=1}^{MN} L_{ij} L_{ik} \right) = \sum_{i=1}^{MN} H_{ik} \tilde g_i \Rightarrow
\sum_{j=1}^{MN} g_j \underbrace{\sum_{i=1}^{MN} (H_{ij} H_{ik} + \gamma L_{ij} L_{ik})}_{A_{kj}} = \underbrace{\sum_{i=1}^{MN} H_{ik} \tilde g_i}_{b_k} \Rightarrow
\sum_{j=1}^{MN} A_{kj} g_j = b_k \Rightarrow Ag = b     (5.312)

Matrix A ≡ H^T H + \gamma L^T L, with elements

A_{kj} ≡ \sum_{i=1}^{MN} (H_{ik} H_{ij} + \gamma L_{ik} L_{ij})     (5.313)

and vector b ≡ H^T \tilde g, with elements

b_k ≡ \sum_{i=1}^{MN} H_{ik} \tilde g_i     (5.314)

may easily be computed from the distorted image \tilde g, the distortion matrix H and the regularisation matrix L.

However, system (5.312) is still very difficult to solve, because matrix A cannot be inverted easily. It is neither circulant nor block circulant and it is of size MN × MN, where MN typically could be of the order of 250,000. Approximate iterative methods may be used in such a case, like Jacobi's method and its more elaborate version, the Gauss-Seidel method.
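A small sketch, not from the book, of setting up system (5.312) with matrix operations: A = HᵀH + γLᵀL and b = Hᵀ\tilde g. Function and variable names are illustrative; for realistic image sizes A is far too large to invert directly, so it would be passed to an iterative solver such as those in Boxes 5.7 and 5.8.

import numpy as np

def normal_equations(H, L, g_tilde, gamma):
    A = H.T @ H + gamma * (L.T @ L)        # elements A_kj of equation (5.313)
    b = H.T @ g_tilde                      # elements b_k of equation (5.314)
    return A, b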
Box 5.7. Jacobi's method for inverting large systems of linear equations

Consider the system of N linear equations with N unknowns

Ax = b     (5.315)

where A is an N × N known matrix and b is an N × 1 known vector. If matrix A has its largest values along its diagonal, then system (5.315) may be solved iteratively, as follows.

Step 1: Write matrix A as the sum of three matrices, one containing its elements along its diagonal, one containing its elements in the upper triangle of it, and the other containing its elements in the lower triangle of it:

D ≡ \begin{pmatrix} a_{11} & 0 & \dots & 0 \\ 0 & a_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_{NN} \end{pmatrix}    U ≡ \begin{pmatrix} 0 & a_{12} & \dots & a_{1N} \\ 0 & 0 & \dots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}    L ≡ \begin{pmatrix} 0 & 0 & \dots & 0 \\ a_{21} & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \dots & 0 \end{pmatrix}     (5.316)

Step 2: Make an initial guess for the values of the elements of vector x. Say, set them all to 0.

Step 3: Update the values of x until convergence, using the iterative formula

D x^{(k+1)} = −(L + U) x^{(k)} + b     (5.317)

where superscript (k) means the value at iteration k.

Note that in practice we do not use matrices, as the computer may not be able to handle such large matrices: we instead consider each equation in turn, as follows.

Step 2 (realistic): Initialise all elements of x:

x_i^{(0)} = \frac{b_i}{D_{ii}}    for i = 1, 2, ..., N     (5.318)

Step 3 (realistic): Update the values of x_i, iteratively, until convergence:

x_i^{(k+1)} = \frac{1}{D_{ii}} \left( b_i − \sum_{j=1, j \neq i}^{N} A_{ij} x_j^{(k)} \right)    for i = 1, 2, ..., N     (5.319)

The essence of this algorithm is to consider only the dominant term in each equation, assuming that this is the first term for the first equation, the second term for the second equation, the third term for the third equation, and so on, and express the unknown of the dominant term in terms of the other unknowns. Then compute an improved value of the dominant unknown in each equation from the estimated values of the other unknowns (see example 5.50). Convergence is guaranteed when the absolute value of the diagonal element in each row of matrix A is larger than the sum of the absolute values of the other elements.
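A minimal sketch of the Jacobi update (5.319), not the book's code: each unknown is recomputed from the previous iterate of all the others. The vectorised form, names and stopping rule are illustrative choices.

import numpy as np

def jacobi(A, b, n_iter=100, tol=1e-6):
    D = np.diag(A)                               # diagonal elements D_ii
    x = b / D                                    # initialisation, equation (5.318)
    for _ in range(n_iter):
        x_new = (b - A @ x + D * x) / D          # (b_i - sum_{j != i} A_ij x_j) / D_ii
        if np.sum((x_new - x) ** 2) < tol:       # stop when the update is negligible
            return x_new
        x = x_new
    return x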
Example B5.50

Consider the system of equations Ax = b, or explicitly

a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3     (5.320)

where |a_{11}| > |a_{12}| + |a_{13}|, |a_{22}| > |a_{21}| + |a_{23}| and |a_{33}| > |a_{31}| + |a_{32}|. Solve it for x_1, x_2 and x_3, iteratively.

First we keep the dominant term in each equation, and express it in terms of the other terms:

a_{11}x_1 = b_1 − a_{12}x_2 − a_{13}x_3 = b_1 − 0x_1 − a_{12}x_2 − a_{13}x_3
a_{22}x_2 = b_2 − a_{21}x_1 − a_{23}x_3 = b_2 − a_{21}x_1 − 0x_2 − a_{23}x_3
a_{33}x_3 = b_3 − a_{31}x_1 − a_{32}x_2 = b_3 − a_{31}x_1 − a_{32}x_2 − 0x_3     (5.321)

In terms of matrices, this system of equations may be written as:

\begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} − \begin{pmatrix} 0 & a_{12} & a_{13} \\ a_{21} & 0 & a_{23} \\ a_{31} & a_{32} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}     (5.322)

Note that the matrix that appears on the left-hand side is the matrix of the diagonal elements of matrix A, while the matrix that appears on the right-hand side is the sum of the upper and the lower triangular matrices of matrix A. The values of the unknowns on the right-hand side are to be the values we have estimated so far, while the values we shall compute this way will be improved estimates. This way of solving the problem is formally given in Box 5.7 as Jacobi's method.
Example B5.51

In example 5.50 observe that when you solve system (5.321), at each iteration you find first an improved value for x_1, then for x_2, and finally for x_3. Use the improved value of x_1 when you work out the new estimate for x_2, and the improved values of x_1 and x_2 when you work out the new estimate for x_3.

Let us call the improved values we work out at an iteration (\tilde x_1, \tilde x_2, \tilde x_3)^T. The system (5.321) may be written as:

a_{11}\tilde x_1 = b_1 − 0x_1 − a_{12}x_2 − a_{13}x_3
a_{22}\tilde x_2 = b_2 − a_{21}x_1 − 0x_2 − a_{23}x_3
a_{33}\tilde x_3 = b_3 − a_{31}x_1 − a_{32}x_2 − 0x_3     (5.323)

Now consider that when you compute \tilde x_2, instead of using on the right-hand side the old value of x_1, you use the newly estimated value \tilde x_1, and when you work out \tilde x_3, you make use of the values \tilde x_1 and \tilde x_2, instead of x_1 and x_2. Then this system becomes:

a_{11}\tilde x_1 = b_1 − 0x_1 − a_{12}x_2 − a_{13}x_3
a_{22}\tilde x_2 = b_2 − a_{21}\tilde x_1 − 0x_2 − a_{23}x_3
a_{33}\tilde x_3 = b_3 − a_{31}\tilde x_1 − a_{32}\tilde x_2 − 0x_3     (5.324)

In terms of matrices, this may be written as:

\begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix} \begin{pmatrix} \tilde x_1 \\ \tilde x_2 \\ \tilde x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} − \begin{pmatrix} 0 & 0 & 0 \\ a_{21} & 0 & 0 \\ a_{31} & a_{32} & 0 \end{pmatrix} \begin{pmatrix} \tilde x_1 \\ \tilde x_2 \\ \tilde x_3 \end{pmatrix} − \begin{pmatrix} 0 & a_{12} & a_{13} \\ 0 & 0 & a_{23} \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}     (5.325)

Collecting all the new estimates on the left-hand side, we eventually get:

\left[ \begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ a_{21} & 0 & 0 \\ a_{31} & a_{32} & 0 \end{pmatrix} \right] \begin{pmatrix} \tilde x_1 \\ \tilde x_2 \\ \tilde x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} − \begin{pmatrix} 0 & a_{12} & a_{13} \\ 0 & 0 & a_{23} \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}     (5.326)

This formulation of the problem is known as the Gauss-Seidel method (see Box 5.8).
Box 5.8. Gauss-Seidel method for inverting large systems of linear equations

Note that in Jacobi's method (Box 5.7) we may solve the system sequentially, using for some of the unknowns the improved values we have already estimated in the current iteration, instead of waiting to update all values using those of the previous iteration (see example 5.51). In that case, the algorithm is like that in Box 5.7, but with Step 3 replaced with:

Step 3 (Gauss-Seidel): Update the values of x until convergence, using the iterative formula

(D + L) x^{(k+1)} = −U x^{(k)} + b     (5.327)

where superscript (k) means the value at iteration k.

Again, this step in practice is performed by handling each equation separately:

Step 3 (Gauss-Seidel, realistic): Update the values of x_i until convergence, using:

x_i^{(k+1)} = \frac{1}{D_{ii}} \left( b_i − \sum_{j=1, j>i}^{N} A_{ij} x_j^{(k)} − \sum_{j=1, j<i}^{N} A_{ij} x_j^{(k+1)} \right)    for i = 1, 2, ..., N     (5.328)
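A minimal sketch of the Gauss-Seidel update (5.328), not the book's code: within one sweep, the unknowns already updated in the current iteration are used immediately. Names and the stopping rule are illustrative choices.

import numpy as np

def gauss_seidel(A, b, n_iter=100, tol=1e-6):
    N = len(b)
    x = b / np.diag(A)                          # same initialisation as in Jacobi's method
    for _ in range(n_iter):
        x_old = x.copy()
        for i in range(N):
            s = A[i, :] @ x - A[i, i] * x[i]    # mixes new values (j < i) and old values (j > i)
            x[i] = (b[i] - s) / A[i, i]
        if np.sum((x - x_old) ** 2) < tol:
            break
    return x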
Does matrix H, as constructed in examples 5.41, 5.43, 5.44 and 5.45, fulfil the conditions for using the Gauss-Seidel or the Jacobi method?

Only marginally. The diagonal elements of matrix H are not larger than the non-diagonal elements. We may, however, construct scrambling matrices H which fulfil this constraint, if we add the following step at the end of each algorithm:

Step easy: Count the nonzero elements of each row of matrix H. If it is only 1, do nothing. If there are N > 1 nonzero elements, set the value of the diagonal element to 0.55. Set the values of the other nonzero elements to 0.45/(N − 1).
Example 5.52

Degrade the face of image 5.15a with the algorithm of example 5.45, with the extra step that makes the scrambling matrix dominated by its diagonal values. Then unscramble it using the Jacobi and the Gauss-Seidel methods.

Figure 5.19a shows the original image 5.16a. Its degraded version, shown in 5.19b, was obtained by running the thick whirl algorithm with parameters \alpha = 0.01, x = 2, K = 25, L = 1 and using 5000 steps, and subsequently modifying the scrambling matrix H so that the dominant element of each row is the diagonal element, with a value larger than the sum of the values of the remaining elements. This is necessary for the restoration to be possible with the Jacobi or the Gauss-Seidel method. To achieve this, we set all diagonal elements of the matrix to 0.55 (unless the diagonal element was the only nonzero element in its row, in which case it was left to be 1), and distributed the remaining 0.45 (necessary to make all elements of a row sum up to 1) equally between the remaining nonzero elements of the same row. The H matrix created this way is not really effective in scrambling, as the face can be easily recognised. Nevertheless, the face is degraded, and our task is to recover its undegraded version.

Figures 5.19c and 5.19d show the recovered images obtained by using the Jacobi and the Gauss-Seidel algorithm, respectively. The algorithms were run with the following stopping criterion: stop either when the number of iterations reaches 35, or when the sum of the squares of the differences in the values of the pixels between successive iterations is less than 0.1. Note that the values of the pixels were in the range [0, 1] and not [0, 255]. The Jacobi algorithm stopped after 29 iterations with a sum square error 0.09, while the Gauss-Seidel algorithm stopped after 11 iterations with sum square error 0.011. Although one cannot see any difference in the quality of the results, the Gauss-Seidel algorithm converged faster, to a more accurate solution.

Figure 5.19: (a) An original image. (b) Its degraded version (MSE = 606). (c) The image recovered by the Jacobi algorithm (MSE = 0.2425). (d) The image recovered by the Gauss-Seidel algorithm (MSE = 0.0066). The mean square errors were computed using pixel values in the range [0, 255].
What happens if matrix H does not satisfy the conditions for the Gauss-Seidel method?

In that case we can minimise the cost function (5.297) by using gradient descent. Since the function is quadratic in all the unknowns, it has a well defined minimum that can be reached by simply starting from an initial guess of the solution and then updating the values of the unknowns, one at a time, so that we always move towards the minimum. This is shown schematically in figure 5.20.

Figure 5.20: A function U(x), which is quadratic in its argument, has one well defined minimum at point C, for x = x_0. If we guess an initial value for x_0 as x = x_1, we are at point A, where, for increasing value of x, the function decreases, ie dU/dx < 0. Then if we increase the value of the original guess, we shall move it towards x_0. If we guess an initial value for x_0 as x = x_2, we are at point B, where, for increasing value of x, the function also increases, ie dU/dx > 0. Then if we decrease the value of the original guess, we shall move it towards x_0. In either case, we change the value of the original guess by −dU/dx.
How do we apply the gradient descent algorithm in practice?

Step 1: Guess an initial restored image. You may set it to be the same as the distorted image, or equal to a flat image where all pixels have value 128.

Step 2: Consider, in turn, one pixel at a time. Compute the value of the first derivative of function (5.310) with respect to the pixel value under consideration:

\frac{\partial U}{\partial g_k} = \sum_{j=1}^{MN} g_j A_{kj} − b_k     (5.329)

where parameters A_{kj} and b_k are given by (5.313) and (5.314), respectively.

Step 3: If \partial U/\partial g_k > 0, decrease the value of pixel k by 1.
If \partial U/\partial g_k < 0, increase the value of pixel k by 1.
If \partial U/\partial g_k = 0, do nothing.

You may set as a termination criterion a fixed number of updates. For an image with MN pixels, and since we change the value of each pixel by 1 at each update, and if we start with all pixels having value 128, we may estimate that we shall need roughly 128MN updates, ie 128 iterations.
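A minimal sketch of these steps, not the book's code: every pixel is moved by ±1 against the sign of \partial U/\partial g_k from equation (5.329), computed for all pixels at once. The vectorised formulation and the optional clipping are illustrative assumptions.

import numpy as np

def gradient_descent_restore(A, b, g0, n_iter=128):
    g = g0.astype(float).copy()            # initial guess, eg the distorted column image
    for _ in range(n_iter):
        grad = A @ g - b                   # dU/dg_k for every pixel, equation (5.329)
        g -= np.sign(grad)                 # decrease where grad > 0, increase where grad < 0
        g = np.clip(g, 0, 255)             # optional: keep grey values in range (not in the text)
    return g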
Example 5.53

The image shown in figure 5.21a was produced with the thick whirl algorithm with parameter setting \alpha = 0.01, x = 2, K = 25, L = 1 and by using 5000 steps. Unscramble it with the gradient descent algorithm.

To unscramble this image we assume we know matrix H that produced it. We then have to work out matrix L, used in the regularisation term of the cost function that we have to minimise. To do that, we have to generalise for an N × N image the L matrix we worked out in example 5.30, on page 449, for a 3 × 3 image. It turns out that the N² × N² matrix L we need consists of N² partitions of size N × N each, with the following arrangement:

L = \begin{pmatrix} A & B & C & C & \dots & C & C & C & B \\ B & A & B & C & \dots & C & C & C & C \\ C & B & A & B & \dots & C & C & C & C \\ C & C & B & A & \dots & C & C & C & C \\ \vdots & & & & \ddots & & & & \vdots \\ C & C & C & C & \dots & B & A & B & C \\ C & C & C & C & \dots & C & B & A & B \\ B & C & C & C & \dots & C & C & B & A \end{pmatrix}     (5.330)

Partition type B is the unit matrix of size N × N. Partition type C has all its elements 0. Partition type A is:

A = \begin{pmatrix} -4 & 1 & 0 & 0 & \dots & 0 & 0 & 0 & 1 \\ 1 & -4 & 1 & 0 & \dots & 0 & 0 & 0 & 0 \\ 0 & 1 & -4 & 1 & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -4 & \dots & 0 & 0 & 0 & 0 \\ \vdots & & & & \ddots & & & & \vdots \\ 0 & 0 & 0 & 0 & \dots & 1 & -4 & 1 & 0 \\ 0 & 0 & 0 & 0 & \dots & 0 & 1 & -4 & 1 \\ 1 & 0 & 0 & 0 & \dots & 0 & 0 & 1 & -4 \end{pmatrix}     (5.331)

To compute matrix L in practice, we first create an N² × N² matrix with all its elements 0. Then we visit each element with indices (i, j) and compute:

i_1 ≡ \left\lfloor \frac{i-1}{N} \right\rfloor + 1        i_2 ≡ i − (i_1 − 1)N
j_1 ≡ \left\lfloor \frac{j-1}{N} \right\rfloor + 1        j_2 ≡ j − (j_1 − 1)N     (5.332)

This way, we have analysed indices i and j into their quotient and remainder with respect to N, so that indices (i_1, j_1) tell us to which partition element (i, j) belongs, and indices (i_2, j_2) tell us which element of partition (i_1, j_1) element (i, j) is. Note that indices i and j are assumed to take values from 1 to N². Then, we use the following rules to change or not the value of element L_{ij} (see the sketch after the list).

Rule 1: If i = j, set L_{ij} = −4.
Rule 2: If i_1 = j_1 and |i_2 − j_2| = 1, set L_{ij} = 1.
Rule 3: If i_1 = j_1 and [(i_2 = 1 and j_2 = N) or (i_2 = N and j_2 = 1)], set L_{ij} = 1.
Rule 4: If |i − j| = N, set L_{ij} = 1.
Rule 5: If j_1 = N and i_1 = 1 and i_2 = j_2, set L_{ij} = 1.
Rule 6: If j_1 = 1 and i_1 = N and i_2 = j_2, set L_{ij} = 1.
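A minimal sketch of these six rules, not the book's code, assuming the rules as listed above (with −4 on the diagonal). The 1-based indices of the text are kept in the loop variables and translated to 0-based array positions.

import numpy as np

def build_laplacian_L(N):
    L = np.zeros((N * N, N * N))
    for i in range(1, N * N + 1):
        i1, i2 = (i - 1) // N + 1, (i - 1) % N + 1        # partition and position, equation (5.332)
        for j in range(1, N * N + 1):
            j1, j2 = (j - 1) // N + 1, (j - 1) % N + 1
            if i == j:                                  L[i-1, j-1] = -4   # Rule 1
            elif i1 == j1 and abs(i2 - j2) == 1:        L[i-1, j-1] = 1    # Rule 2
            elif i1 == j1 and {i2, j2} == {1, N}:       L[i-1, j-1] = 1    # Rule 3
            elif abs(i - j) == N:                       L[i-1, j-1] = 1    # Rule 4
            elif j1 == N and i1 == 1 and i2 == j2:      L[i-1, j-1] = 1    # Rule 5
            elif j1 == 1 and i1 == N and i2 == j2:      L[i-1, j-1] = 1    # Rule 6
    return L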
Assuming that we know or can guess the scrambling matrix H, we then use equation (5.313) to compute matrix A for a selected value of \gamma, and equation (5.314) to compute vector b from the input image \tilde g. These computations are slow, but they have to be done only once, before the iterations of the gradient descent algorithm start.

Figures 5.21b, 5.21c and 5.21d show the results obtained after 250 iterations for \gamma = 0.01, 0.001 and 0.0001, respectively. The mean square error for each image and the final value of the cost function in each case are also given. In all cases the final value was reached with many fewer iterations than 250 (at around 150 iterations). Letting the program run for more iterations than necessary simply made the solution oscillate about the minimum. The starting solution in all cases was the input image, and the initial value of the cost function was of the order of 10–100.

Figure 5.21: Results of the gradient descent algorithm. The scrambling matrix is assumed known. The value of \gamma controls the level of smoothing imposed on the solution. (a) Input, MSE = 2713. (b) \gamma = 10^{-2}: MSE = 276, U_{final} = 0.53. (c) \gamma = 10^{-3}: MSE = 158, U_{final} = 0.09. (d) \gamma = 10^{-4}: MSE = 198, U_{final} = 0.02.
What happens if we do not know matrix H?

In that case we have also to guess the values of H. Any prior knowledge or educated guess concerning its structure might be used to guide the process, so it is not totally random. The reconstruction will have to proceed in alternating steps:

Step 1: Guess a matrix H.

Step 2: Restore the image using the guessed H. According to the result you get, update your guess concerning matrix H and go to Step 1.
5.5 Nonlinear image restoration: MAP
estimation
What does MAP estimation mean?
MAP stands for maximum a posteriori probability estimation. In general, it tries to
answer the question: what is the most probable solution, given the data and the model we
use?
How do we formulate the problem of image restoration as a MAP estimation?
Any combination of grey values assigned to the image pixels is a potential solution to the
restoration problem. We wish to identify the particular conguration that is most probable,
given the data and the degradation model. This is a global optimisation problem, as it seeks
the joint conguration of pixel values that is best, rather than trying to recover the image by
performing ltering that relies on local operations.
Thus, the MAP solution to the image restoration problem is
f = argmax
x
p(x[g, model) (5.333)
where f is the restored image, x is a combination of pixel values, g is the damaged image, and
model encompasses all knowledge we have on the degradation process, including the point
spread function of any blurring operator involved, any nonlinear degradation, the statistical
properties and the nature of noise, etc. Here p(x[g, model) is the a posteriori probability
density function of image x to be the image that has given rise to the observed image g, given
the degradation model.
How do we select the most probable conguration of restored pixel values, given
the degradation model and the degraded image?
To do that, we need to write down an expression for the a posteriori probability density
function p(x[g, model) that appears in (5.333) (see example 5.55).
We usually consider p(x[g, model) to have the following form
p(x[g, model) =
1
Z
e
H(x;g)
(5.334)
where H(x; g) is a non-negative function of conguration x, often referred to as the cost
function, or the energy function, or the Hamiltonian of the conguration, and Z
is a
normalising constant, called the partition function.
It is clear from (5.334) that in order to chose x so that p(x[g, model) is maximised, it is
enough to choose x so that H(x; g) is minimised.
Box 5.9. Probabilities: prior, a priori, posterior, a posteriori, conditional

Prior or a priori probability is the probability of an event happening when we have no information concerning any special circumstances.

Posterior or a posteriori probability is the probability of an event happening when we have some information concerning the circumstances.

Conditional probability is the probability of an event happening, subject to the condition that another event happens first.

Example B5.54

Estimate the probability that the first person somebody meets when she walks out in the street is Chinese. Also, estimate the probability that the first person somebody meets when she walks out in the street is Chinese, if this person lives in Girona in Spain. Finally, estimate the probability that the person you will see when you open your front door is Chinese, given that the person you saw from your entrance video ringing your bell was Chinese.

In the absence of any information, the prior probability that the first person somebody sees when they walk out in the street is Chinese is 0.2. This is because there are roughly 1.2 billion Chinese in the world and the world has roughly 6 billion people. If one is in Girona, however, the posterior probability that the first person she sees when she walks out in the street is Chinese is rather slim, tending to 0. The conditional probability that the person at the doorstep is Chinese, given that the person who rang the bell was Chinese, is near 1.
Is the minimum of the cost function unique?

In general, no, unless the cost function is quadratic in the unknowns, like function (5.297), on page 477. Notice that the cost function takes a single value for a combination of values of thousands of unknowns. (For a typical 500 × 500 image, H(x; g) depends on 250,000 unknowns.) If the function is not quadratic, it is very likely that different combinations of values of the unknowns may lead to the same value of the cost function. In some cases (see example 5.56), the cost function may take a unique, well defined minimum, which, however, corresponds to the trivial solution.
Example 5.55

Assuming that an image has been degraded by iid (independent, identically distributed) additive Gaussian noise with zero mean and variance \sigma^2, work out an expression for p(x|g, model) in (5.333).

Since the noise is additive, the true value x_{ij} of pixel (i, j) and its observed value g_{ij} are related by

x_{ij} = g_{ij} + n_{ij}     (5.335)

where n_{ij} is the noise value at this pixel. We may write:

n_{ij} = x_{ij} − g_{ij}     (5.336)

Since the noise is Gaussian, noise value n_{ij} for pixel (i, j) was drawn with probability:

p(n_{ij}) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{n_{ij}^2}{2\sigma^2}}     (5.337)

Then the joint probability density function of the combination of noise values in all pixels is simply the product of all such probabilities, since the noise values are independent:

p(n) ≡ p(n_{11}, n_{12}, n_{13}, \dots, n_{NM}) = \prod_{(i,j)} p(n_{ij}) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{NM} \prod_{(i,j)} e^{-\frac{n_{ij}^2}{2\sigma^2}} = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{NM} e^{-\frac{1}{2\sigma^2}\sum_{(i,j)} n_{ij}^2}     (5.338)

Here n is the noise field and the image is assumed to be N × M in size. If x is the undegraded image and g is the observed image, obviously n = x − g. So, p(n) is actually p(x − g) and it is the joint probability of pixel values x to have given rise to the observed values g. In other words, p(x − g) is nothing else than the a posteriori probability of x given g and the model (since we used the noise model to work out that n = x − g and the form of p(n)). We can then write:

p(x|g, model) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{NM} e^{-\frac{1}{2\sigma^2}\sum_{(i,j)} (x_{ij} - g_{ij})^2}     (5.339)
How can we select one solution from all the possible solutions that minimise the cost function?

We select the most probable solution, according to the general knowledge we have of the world. For example, we know that the world is, in general, smooth. This means that we may work out a probability p(x) for any particular configuration x to arise in the first place (a priori probability), depending on its smoothness. We can then select the configuration that maximises this probability. We usually adopt for this probability the form

p(x) = \frac{1}{\tilde Z} e^{-\tilde H(x)}     (5.340)

where \tilde Z is a normalising constant and \tilde H(x) is a non-negative function of configuration x. The configuration we select is then the one that maximises the product

p(x|g, model)\, p(x) = \frac{1}{Z} e^{-H(x;g) - \tilde H(x)}     (5.341)

where Z is a normalising constant. This combination is slightly naive, as it does not consider that the two functions that appear in the exponent may take values of totally different scale. For example, if H(x; g) takes values in the range [0, 10] while \tilde H(x) takes values in the range [0.01, 0.1], simply adding them implies that any processing we do will be dominated by H(x; g), which will drive p(x|g, model)p(x) down due to its high value, no matter what value \tilde H(x) takes. So, it is better if we introduce a positive scaling constant \gamma in the exponent of one of the probabilities when we multiply them:

p(x|g, model)\, p(x) = \frac{1}{Z} e^{-H(x;g) - \gamma\tilde H(x)}     (5.342)

Note that the insertion of a scaling factor in the exponent of (5.340) makes no difference to the relative ranking of the various configurations in terms of how probable they are. For example, the configuration that maximises p(x) remains the same. What changes is the sharpness of one of the probability density functions, in relation to the other. This way we avoid large patches of the configuration space over which the value of one of the probability density functions dominates over the other. This is shown schematically in figure 5.22.

In general, we always use a scaling parameter in the exponent of functions p(x|g, model), p(x) or p(x|g, model)p(x). This scaling constant is known as the temperature parameter. So, in general, we write

p(x|g, model)\, p(x) = \frac{1}{Z} e^{-\frac{1}{T} U(x;g)}     (5.343)

where T > 0 is the temperature parameter and U(x; g) is the overall cost function. The cost function, then, that we have to minimise in order to identify the most probable configuration is:

U(x; g) ≡ H(x; g) + \gamma\tilde H(x),    \gamma > 0     (5.344)
Figure 5.22: Two probability density functions (pdfs). The area under the curve of each of them is 1. (a) One of them is sharper than the other. This means that for a large range of values of variable x, pdf1 has a veto: when we multiply the two pdfs, pdf1 is virtually 0 in ranges A and B, forcing the product to be 0 too, irrespective of what pdf2 says. (b) We may multiply the exponent of pdf2 with a constant \gamma, so that we make it nonzero for roughly the same range of values of x as pdf1. The area under the curve of pdf2 will remain 1, but because its range is now restricted, it is forced to become more peaky. When we multiply these two pdfs now, neither of them has a veto over the other.

This function consists of two terms: H(x; g) is the term that expresses how faithful to the data the solution is. The second term expresses the prejudice we have for the world, that is, the expectation we have that the solution has to have certain characteristics in order to be acceptable. Another way of putting it is to say that the second term incorporates any prior knowledge we have concerning the solution. Often, this term in the cost function is referred to as a regularisation term, because it tends to be chosen so that it makes the solution smoother. Parameter \gamma allows us to control the relative importance we give to the prior model, in relation to the data term.

Example 5.56

Work out function H(x; g) for an image that has been degraded by iid additive Gaussian noise and identify the configuration that minimises it.

From example 5.55, maximising p(x|g, model) is equivalent to minimising

H(x; g) = \sum_{(i,j)} (x_{ij} − g_{ij})^2     (5.345)

This is a non-negative function. Its minimum value is 0 and it is obtained when x_{ij} = g_{ij} for all pixels (i, j). This is a trivial solution that does not help us restore the image.
Example 5.57

Write down a joint probability density function that favours configurations in which a pixel has similar value to its horizontal neighbours.

Let us consider a pixel x_{ij}. We would like this value to be chosen from a probability density function that favours values similar to the values of the neighbouring pixels (ie pixels x_{i-1,j} and x_{i+1,j}). Let us consider

p(x_{ij}) = \frac{1}{Z} e^{-(x_{ij} - x_{i-1,j})^2 - (x_{ij} - x_{i+1,j})^2}     (5.346)

where Z is a normalising constant. In the same way, the value of pixel x_{i-1,j} should be drawn from

p(x_{i-1,j}) = \frac{1}{Z} e^{-(x_{i-1,j} - x_{i-2,j})^2 - (x_{i-1,j} - x_{ij})^2}     (5.347)

where Z is a normalising constant. Note that the interaction between the pair of neighbours (x_{i-1,j}, x_{ij}) appears in both (5.346) and (5.347). To avoid counting each such pairwise interaction twice when we form the joint probability density function, we may associate with each pixel only the term that involves its next neighbour, ie use

p(x_{ij}) = \frac{1}{Z} e^{-(x_{ij} - x_{i+1,j})^2}    and    p(x_{i-1,j}) = \frac{1}{Z} e^{-(x_{i-1,j} - x_{ij})^2}     (5.348)

Then the joint probability density function for the whole configuration x is

p(x) = \frac{1}{Z} e^{-\sum_{(i,j)} (x_{ij} - x_{i+1,j})^2}     (5.349)

where Z is a normalising constant.
Example 5.58

Work out the cost function for the problem of example 5.57 and identify the configuration that minimises it. Comment on your answer.

The cost function is:

\tilde H(x) = \sum_{(i,j)} (x_{ij} - x_{i+1,j})^2     (5.350)

This is a non-negative function. Its minimum value is 0 and it is obtained when x_{ij} = x_{i+1,j} for all pixels (i, j). This is a trivial solution that yields a flat image: all pixels have to have the same value.
Example 5.59

Write down a cost function which, when minimised, will restore an image that has suffered from iid additive Gaussian noise and at the same time favours solutions that make a pixel similar to its horizontal neighbours.

This should be the weighted sum of the results of examples 5.56 and 5.58:

U(x; g) = H(x; g) + \gamma\tilde H(x) = \sum_{(i,j)} \left[ (x_{ij} - g_{ij})^2 + \gamma (x_{ij} - x_{i+1,j})^2 \right]     (5.351)
Example 5.60

Write down a cost function which, when minimised, will restore an image that has suffered from iid additive Gaussian noise and at the same time favours solutions that make a pixel similar to its vertical and horizontal neighbours.

U(x; g) = \sum_{(i,j)} \left[ (x_{ij} - g_{ij})^2 + a (x_{ij} - x_{i+1,j})^2 + b (x_{ij} - x_{i,j+1})^2 \right]     (5.352)

Here a and b are some scaling constants, which allow us to control the relative importance we give to each term. For example, if a > b > 0, or if a > 0 and b < 0, a pixel will tend to be more similar to its horizontal neighbours than to its vertical neighbours. The image we shall construct then will tend to have horizontal stripes.
Box 5.10. Parseval's theorem

If x_n is a real N-sample long signal and \hat x_k is its Fourier transform, we have:

\sum_{k=0}^{N-1} \hat x_k \hat x_k^* = \frac{1}{N} \sum_{n=0}^{N-1} x_n^2     (5.353)

For a 2D N × M image g_{nm}, we have:

\sum_{k=0}^{N-1} \sum_{l=0}^{M-1} \hat g_{kl} \hat g_{kl}^* = \frac{1}{NM} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} g_{nm}^2     (5.354)
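A quick numerical check of relation (5.353), not from the book, assuming the forward DFT carries the 1/N factor (the convention used in example 5.61 below); NumPy's fft is unnormalised, so it is divided by N here.

import numpy as np

x = np.random.rand(64)
x_hat = np.fft.fft(x) / len(x)                  # DFT with the 1/N convention
lhs = np.sum(x_hat * np.conj(x_hat)).real       # sum_k x_hat_k x_hat_k^*
rhs = np.sum(x ** 2) / len(x)                   # (1/N) sum_n x_n^2
print(np.isclose(lhs, rhs))                     # True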
Example B5.61

Prove Parseval's theorem for a real N-sample long signal x_n.

The DFT of a digital signal x_n is given by:

\hat x_k = \frac{1}{N} \sum_{n=0}^{N-1} x_n e^{-j\frac{2\pi}{N}kn}     (5.355)

For a real signal, the complex conjugate of \hat x_k is:

\hat x_k^* = \frac{1}{N} \sum_{n=0}^{N-1} x_n e^{j\frac{2\pi}{N}kn}     (5.356)

Then:

\sum_{k=0}^{N-1} \hat x_k \hat x_k^* = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x_n x_m e^{-j\frac{2\pi}{N}k(n-m)}
= \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x_n x_m \sum_{k=0}^{N-1} e^{-j\frac{2\pi}{N}k(n-m)}
= \frac{1}{N^2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x_n x_m N\delta(m-n)
= \frac{1}{N} \sum_{n=0}^{N-1} x_n^2     (5.357)

Here, we made use of (2.164), on page 95, with S ≡ N, m ≡ k and t ≡ m − n.
Example B5.62

Using Parseval's theorem, work out how the cost function defined in (5.352) may be defined in the frequency domain.

Let us consider the energy function U(x; g), as defined in (5.352), in the frequency domain. First of all, the sum of the squares of function x_{ij} − g_{ij}, according to Parseval's theorem, is equal to the sum of the squares of the Fourier coefficients of the same function, ie

\sum_{(i,j)} (x_{ij} - g_{ij})^2 = \frac{1}{NM} \sum_{(k,l)} (\hat x_{kl} - \hat g_{kl})^2     (5.358)

where (k, l) are frequency indices and the hats identify the DFTs of the functions. Next, we observe that x_{i+1,j} is function x_{ij} shifted by a single position along the i axis. This means that the DFT of x_{i+1,j} is the DFT of x_{ij} multiplied with e^{j\frac{2\pi k}{N}}. So, the DFT of function F_{ij} ≡ x_{ij} − x_{i+1,j} is \hat x_{kl}\left(1 - e^{j\frac{2\pi k}{N}}\right). Let us consider that we can write the DFT of x_{ij} in terms of amplitude and phase as A(k,l)e^{j\phi(k,l)}. Then the DFT of function F_{ij} is:

\hat F_{kl} = A(k,l)e^{j\phi(k,l)}\left(1 - e^{j\frac{2\pi k}{N}}\right) = A(k,l)e^{j\phi(k,l)} - A(k,l)e^{j\left(\phi(k,l) + \frac{2\pi k}{N}\right)}     (5.359)

Then, obviously,

\hat F_{kl}^* = A(k,l)e^{-j\phi(k,l)} - A(k,l)e^{-j\left(\phi(k,l) + \frac{2\pi k}{N}\right)}     (5.360)

According to Parseval's theorem, sum \sum_{(i,j)} (x_{ij} - x_{i+1,j})^2 is equal to (1/(NM)) \sum_{(k,l)} \hat F_{kl} \hat F_{kl}^*, where:

\hat F_{kl} \hat F_{kl}^* = A(k,l)^2 - A(k,l)^2 e^{-j\frac{2\pi k}{N}} - A(k,l)^2 e^{j\frac{2\pi k}{N}} + A(k,l)^2
= 2A(k,l)^2 - A(k,l)^2 \left( e^{j\frac{2\pi k}{N}} + e^{-j\frac{2\pi k}{N}} \right)
= 2A(k,l)^2 - 2A(k,l)^2 \cos\frac{2\pi k}{N}
= 2A(k,l)^2 \left( 1 - \cos\frac{2\pi k}{N} \right)
= 2\hat x_{kl}^2 \left( 1 - \cos\frac{2\pi k}{N} \right)     (5.361)

So, we may write, and similarly for the shift along the j axis:

\sum_{(i,j)} (x_{ij} - x_{i+1,j})^2 = \frac{2}{NM} \sum_{(k,l)} \hat x_{kl}^2 \left( 1 - \cos\frac{2\pi k}{N} \right)        \sum_{(i,j)} (x_{ij} - x_{i,j+1})^2 = \frac{2}{NM} \sum_{(k,l)} \hat x_{kl}^2 \left( 1 - \cos\frac{2\pi l}{M} \right)     (5.362)

Then the cost function that we have to minimise to restore the image may be written in the frequency domain as

U(x; g) = \sum_{(k,l)} \left[ (\hat x_{kl} - \hat g_{kl})^2 + 2a \hat x_{kl}^2 \left( 1 - \cos\frac{2\pi k}{N} \right) + 2b \hat x_{kl}^2 \left( 1 - \cos\frac{2\pi l}{M} \right) \right]     (5.363)

where we omitted the factor 1/(NM) as not relevant to the minimisation problem.
Example B5.63

By working in the frequency domain, work out the solution that minimises energy function (5.352).

Cost function (5.352) expressed in the frequency domain is given by (5.363). This expression involves only quantities that refer to the same site, so we may differentiate it with respect to the unknowns \hat x_{kl} and set the first derivative to 0 in order to find the values that minimise it:

\frac{\partial U(x; g)}{\partial \hat x_{nm}} = 2(\hat x_{nm} - \hat g_{nm}) + 4a \hat x_{nm}\left(1 - \cos\frac{2\pi n}{N}\right) + 4b \hat x_{nm}\left(1 - \cos\frac{2\pi m}{M}\right)     (5.364)

Set this derivative to 0 and solve for \hat x_{nm}:

\hat x_{nm}\left[ 1 + 2a\left(1 - \cos\frac{2\pi n}{N}\right) + 2b\left(1 - \cos\frac{2\pi m}{M}\right) \right] = \hat g_{nm} \Rightarrow
\hat x_{nm} = \frac{1}{1 + 2a\left(1 - \cos\frac{2\pi n}{N}\right) + 2b\left(1 - \cos\frac{2\pi m}{M}\right)}\, \hat g_{nm}     (5.365)

The operation, therefore, of minimising cost function (5.352) is equivalent to using a filter in the frequency domain defined as:

M(k,l) ≡ \frac{1}{1 + 2a\left(1 - \cos\frac{2\pi k}{N}\right) + 2b\left(1 - \cos\frac{2\pi l}{M}\right)}     (5.366)

This filter is very similar to the filter we worked out in the constrained matrix inversion section (see equation (5.272), on page 466). As we do not assume any image blurring here, we only have in the denominator the term that is used for regularisation.
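A minimal sketch of applying the smoothing filter (5.366) in the frequency domain, not the book's code; the function name and the use of NumPy's FFT are illustrative choices.

import numpy as np

def smooth_with_filter(g, a, b):
    N, M = g.shape
    k = np.arange(N).reshape(-1, 1)
    l = np.arange(M).reshape(1, -1)
    M_filter = 1.0 / (1 + 2*a*(1 - np.cos(2*np.pi*k/N)) + 2*b*(1 - np.cos(2*np.pi*l/M)))
    return np.real(np.fft.ifft2(M_filter * np.fft.fft2(g)))   # filtered (smoothed) image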
How do we model in general the cost function we have to minimise in order to restore an image?

Assume that the original value of each pixel has been distorted by:
(i) a known linear process h;
(ii) a known nonlinear process \psi;
(iii) additive or multiplicative white Gaussian noise with known mean \mu and standard deviation \sigma.

Then it has been shown that the cost function that we have to minimise in order to restore the image is given by

U(x; g) = \gamma\tilde H(x) + \frac{1}{2\sigma^2} \sum_{(i,j)} \left[ \phi^{-1}\big(g_{ij}, \psi(h(x))\big) \right]^2     (5.367)

where \phi^{-1} implies division of its arguments if the noise is multiplicative and subtraction of its arguments if the noise is additive, and \tilde H(x) is the regularisation term.
Example 5.64

An image was blurred by a linear process with a point spread function given by:

\begin{pmatrix} \frac{1}{9} & \frac{1}{9} & \frac{1}{9} \\ \frac{1}{9} & \frac{1}{9} & \frac{1}{9} \\ \frac{1}{9} & \frac{1}{9} & \frac{1}{9} \end{pmatrix}     (5.368)

The image was also subject to a nonlinear degradation process where, instead of recording the value of the pixel, the square root of the value was recorded. Finally, the image was damaged by additive white Gaussian noise with mean \mu and standard deviation \sigma. Use formula (5.367) to work out the cost function for restoring the image.

The degradation by the linear process means that if the true value at pixel (i, j) was x_{ij}, this value was replaced by the average of the 3 × 3 neighbourhood around position (i, j). So, h(x) in (5.367) means:

h(x) ≡ \frac{1}{9}(x_{i-1,j-1} + x_{i-1,j} + x_{i-1,j+1} + x_{i,j-1} + x_{i,j} + x_{i,j+1} + x_{i+1,j-1} + x_{i+1,j} + x_{i+1,j+1})     (5.369)

This value was corrupted by a nonlinear process \psi(\;) = \sqrt{\;}. So, \psi(h(x)) in (5.367) means:

\psi(h(x)) ≡ \sqrt{\frac{x_{i-1,j-1} + x_{i-1,j} + x_{i-1,j+1} + x_{i,j-1} + x_{i,j} + x_{i,j+1} + x_{i+1,j-1} + x_{i+1,j} + x_{i+1,j+1}}{9}}     (5.370)

However, this was not the value that was recorded, but rather the sum of this value with a random noise component:

g_{ij} = n_{ij} + \frac{1}{3}\sqrt{x_{i-1,j-1} + x_{i-1,j} + x_{i-1,j+1} + x_{i,j-1} + x_{i,j} + x_{i,j+1} + x_{i+1,j-1} + x_{i+1,j} + x_{i+1,j+1}}     (5.371)

As the noise was additive, its inverse (denoted by \phi^{-1} in (5.367)) means taking the difference between the recorded value g_{ij} and the \psi(h(x)) value, given by (5.370). Then, the cost function of the restoration problem is:

U(x; g) = \gamma\tilde H(x) + \frac{1}{2\sigma^2} \sum_{(i,j)} \left[ g_{ij} - \frac{1}{3}\sqrt{x_{i-1,j-1} + x_{i-1,j} + x_{i-1,j+1} + x_{i,j-1} + x_{i,j} + x_{i,j+1} + x_{i+1,j-1} + x_{i+1,j} + x_{i+1,j+1}} \right]^2     (5.372)
What is the reason we use a temperature parameter when we model the joint probability density function, since it does not change the configuration for which the probability takes its maximum?

The temperature parameter does not affect the solution of the problem, as is obvious from (5.343), on page 493. That is why it is omitted from the expression of the cost function we minimise in order to identify the most probable configuration (function (5.344)). However, the temperature parameter allows us to control how different the probabilities that correspond to two different configurations are. If we can control that, we can make the differentiation between configurations more or less sharp, at will. For example, we may start by allowing all configurations to be more or less equally probable, as if our vision were blurred at the beginning. Then, as we explore the solution space and we gradually get a better understanding of our problem, we may allow the various configurations to become more and more distinct, as if our vision sharpens and we can focus better on the most probable solution we seek.

How does the temperature parameter allow us to focus or defocus in the solution space?

Consider two configurations:

x ≡ (x_{11}, x_{12}, ..., x_{NN})
\tilde x ≡ (\tilde x_{11}, \tilde x_{12}, ..., \tilde x_{NN})     (5.373)

Their corresponding probabilities of existence are:

p(x) = \frac{1}{Z} e^{-\frac{U(x)}{T}}     (5.374)

p(\tilde x) = \frac{1}{Z} e^{-\frac{U(\tilde x)}{T}}     (5.375)

Let us assume that U(x) > U(\tilde x). Then the relative probability q of the two configurations is:

q ≡ \frac{p(x)}{p(\tilde x)} = e^{-\frac{U(x) - U(\tilde x)}{T}}     (5.376)

If T → +∞, then e^{-\frac{U(x)-U(\tilde x)}{T}} → 1 ⇒ \frac{p(x)}{p(\tilde x)} → 1: both configurations are equally likely.

If T → 0, then e^{-\frac{U(x)-U(\tilde x)}{T}} → 0 ⇒ \frac{p(x)}{p(\tilde x)} → 0: the configuration with the highest energy (U(x)) is much less probable than the other.

So, the temperature parameter allows us to control the differentiation between the different states of the system.
How do we model the prior probabilities of configurations?

This question is equivalent to asking how we define function \tilde H(x) in (5.344) and in (5.367), ie how we define the regularisation term in the cost function. This is largely arbitrary. We may decide that we would like to favour configurations in which the first differences between neighbouring pixels are minimised. Such a prior model is known as the membrane model:

\tilde H(x) = \sum_{(i,j)} \left[ (x_{ij} - x_{i+1,j})^2 + (x_{ij} - x_{i,j+1})^2 \right]     (5.377)

Alternatively, we may decide that we would like to favour configurations in which the second differences between neighbouring pixels are minimised. Such a prior model is known as the thin plate model:

\tilde H(x) = \sum_{(i,j)} \left[ (x_{i-1,j} - 2x_{ij} + x_{i+1,j})^2 + (x_{i,j-1} - 2x_{ij} + x_{i,j+1})^2 \right]     (5.378)

These two regularisation models are schematically shown in figure 5.23.

Figure 5.23: (a) For any two fixed points, the membrane model acts as if an elastic band is stretched between two pegs. (b) For any three fixed points, the thin plate model acts as if a stiff metal rod is bent to pass through them as closely as possible. In 2D, the elastic band corresponds to a membrane and the stiff metal rod to a stiff metal plate.
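A minimal sketch, not from the book, of evaluating the two prior energies (5.377) and (5.378) for a given configuration x; the wrap-around (periodic) boundary handling is an assumption made for brevity.

import numpy as np

def membrane_energy(x):
    # first differences with the next row and next column neighbours (periodic boundaries)
    return np.sum((x - np.roll(x, -1, axis=0))**2 + (x - np.roll(x, -1, axis=1))**2)

def thin_plate_energy(x):
    # second differences along rows and columns (periodic boundaries)
    d2i = np.roll(x, 1, axis=0) - 2*x + np.roll(x, -1, axis=0)
    d2j = np.roll(x, 1, axis=1) - 2*x + np.roll(x, -1, axis=1)
    return np.sum(d2i**2 + d2j**2)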
What happens if the image has genuine discontinuities?

The genuine discontinuities of the image will be smoothed out by the membrane and the thin plate models, and the restored image will look blurred. This is because these models impose the "I know the world is by and large smooth" prejudice everywhere in the image. This is a gross oversimplification. The following two more sophisticated models have been used to deal with this problem.
1) The imposition of a so called line process, that is defined in between pixels and takes binary values. Its effect is to switch off the smoothness constraint when large discontinuities in the pixel values are observed. In this case, (5.377) takes the form

$$H(x) = \sum_{(i,j)}\left[ l_{ij;i+1,j}(x_{ij}-x_{i+1,j})^2 + l_{ij;i,j+1}(x_{ij}-x_{i,j+1})^2 \right] \qquad (5.379)$$

where

$$l_{ij;mn} = \begin{cases} 1 & \text{if } |x_{ij}-x_{mn}| \le t \\ 0 & \text{otherwise} \end{cases} \qquad (5.380)$$
with t being a threshold. Note that when the difference in value between two successive pixels exceeds the threshold, the l factor sets to 0 the corresponding term that tries to drive the solution towards those two pixels having similar values.
2) The use of an edge preserving adaptive regularisation term. The line process defined above is discrete, not taking into consideration the strength of the discontinuity between two neighbouring pixels, other than thresholding it. We may define instead a continuous function that adapts according to the observed difference in value. In this case, (5.377) takes the form

$$H(x) = \sum_{(i,j)}\left[ f(x_{ij},x_{i+1,j})(x_{ij}-x_{i+1,j})^2 + f(x_{ij},x_{i,j+1})(x_{ij}-x_{i,j+1})^2 \right] \qquad (5.381)$$
where

$$f(x_{ij}, x_{mn}) = \frac{2}{1+e^{\gamma|x_{ij}-x_{mn}|}} \qquad (5.382)$$

with $\gamma > 0$ acting as a soft threshold. Note that if $|x_{ij}-x_{mn}| \rightarrow +\infty$, $e^{\gamma|x_{ij}-x_{mn}|} \rightarrow +\infty$ and $f(x_{ij},x_{mn}) \rightarrow 0$, so the smoothing is switched off. If $|x_{ij}-x_{mn}| \rightarrow 0$, $e^{\gamma|x_{ij}-x_{mn}|} \rightarrow 1$ and $f(x_{ij},x_{mn}) \rightarrow 1$, and the smoothing term of these two pixels is allowed to play a role in the minimisation of the cost function.
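A minimal sketch of the two discontinuity-preserving variants follows; the threshold t and the softness parameter (called gamma here, standing in for the symbol used in (5.382)) are free choices, and the function names are ours:

```python
import numpy as np

def line_process_weight(a, b, t=0.2):
    """l of eq. (5.380): 1 when the jump is small, 0 when it exceeds the threshold t."""
    return (np.abs(a - b) <= t).astype(float)

def adaptive_weight(a, b, gamma=10.0):
    """f of eq. (5.382): smoothly goes from 1 (similar values) to 0 (large discontinuity)."""
    return 2.0 / (1.0 + np.exp(gamma * np.abs(a - b)))

def edge_preserving_prior(x, weight=adaptive_weight):
    """H(x) of eq. (5.379)/(5.381): smoothness terms switched off across discontinuities."""
    v = weight(x[:-1, :], x[1:, :]) * (x[:-1, :] - x[1:, :])**2
    h = weight(x[:, :-1], x[:, 1:]) * (x[:, :-1] - x[:, 1:])**2
    return np.sum(v) + np.sum(h)
```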
How do we minimise the cost function?
The cost function typically depends on hundreds of thousands of variables, all the x_ij, and in general it is not quadratic. As a result, it has lots of local minima. If we were to use any conventional optimisation technique, like gradient descent for example, what we would achieve would be to identify the local minimum nearest to the starting point, which could be well away from the sought global minimum. To avoid that, we have to use stochastic methods, which allow the solution to escape from such minima. Most of the methods people use come under the general term Monte Carlo Markov Chain (MCMC) methods. This is because they create a chain of successive configurations in the solution space, each one differing from the previous one usually in the value of a single pixel (see figure 5.24). The way this chain of successive possible solutions is constructed differentiates one method from the other. The term Monte Carlo is used to indicate the stochastic nature of the method. The term Markov is used to indicate that each new solution is created from the previous one by an explicit process, while its dependence on all other previously considered solutions is only implicit. When the temperature parameter in the probability density function is used to sharpen up the solution space gradually, as our chain of configurations grows, the method is called simulated annealing. This is because the gradual reduction in the value of T in (5.343) imitates the way physicists grow, for example, crystals or agglomerates of ferromagnetic materials, by gradually lowering the temperature of the physical system, in order to allow them to reach their minimum energy states.
How do we create a possible new solution from the previous one?
Method 1: Select a pixel. Select a possible new value for it, uniformly distributed over all its possible values. If the new value reduces the cost function, accept the change. If the value increases the cost function, accept the change with probability q. This is called the Metropolis sampler. It tends to be very slow, but it can escape from local minima (see figure 5.25).
Method 2: From the regularisation term you have adopted, work out the local conditional probability density function for a pixel to take a value s, given the values of its neighbours: p(x_ij = s | values of neighbours). Select a pixel. Select the most probable value s for this pixel and assign it to the pixel. This is called the iterative conditional modes method. It tends to get stuck in local minima.
Method 3: From the regularisation term you have adopted, work out the local conditional probability density function for a pixel to take a value s, given the values of its neighbours: p(x_ij = s | values of neighbours). Use a temperature parameter T in the exponent of this function, so that you can control the sharpness of your configuration space. Select a new value for pixel (i, j) according to this probability. This is called the Gibbs sampler.
Figure 5.24: The configuration (solution) space for the restoration of an N × M image. In the configuration space we have as many axes as pixels in the image. Along each axis we measure the value of one pixel. Every point in this space represents an image. The black dot represents the starting configuration, possibly the degraded image. The cross represents the solution we are trying to find, ie the configuration which minimises the cost function globally. Stochastic optimisation methods create a chain of possible solutions, aiming to end up at the global minimum of the cost function. As each new possible solution s_i is created from the previous one s_{i-1}, this chain is a Markov chain. Some methods keep the value of the cost function for each configuration in the chain, and when they stop, they select the solution with the minimum value of the cost function. Some methods are cleverer than this and try to direct the construction of the chain so that the final configuration is the one that makes the cost function globally minimal.
Figure 5.25: The cost function U(x; g) is a function of thousands of variables, as many as the pixels in the image. Here we plot the value of the cost function in terms of the values of the pixels of an N × M image. If, when we minimise the cost function, we allow only new configurations that reduce the cost function, starting from point A we shall end up at point B, which is the nearest local minimum of function U(x; g). By allowing configurations that increase the value of the cost function, the Metropolis algorithm can escape from local minima.
The pixel we select to process may be chosen at random. However, a systematic approach, according to which we visit each pixel of the image in turn, is preferable. When we have visited all pixels of the image once, we say we have performed one iteration of the algorithm. In all methods, it is necessary to scan the image in two passes when we use the membrane model. In each pass we should update pixels that do not belong to the neighbourhood of each other. Figure 5.26a shows how the image should be split when we use the membrane model. The division of the image into two such disjoint sets is known as coding or colouring of the image. If we update the values of neighbouring pixels in the same pass, we tend to move in circles. Consider the case where we have obtained an improved value of pixel A in figure 5.26b on the basis of the values of its neighbours, and then we consider pixel B. If we update the value of B using the old value of pixel A, we may end up losing any gain we had achieved by changing the value of A in the output image.

Figure 5.26: (a) The coding or colouring scheme that should be used for the restoration of an image using the membrane model. The white pixels are neighbours of the black pixels. For all sampling methods, we have to update the values of the white pixels first, taking into consideration the values of their neighbours. This constitutes one pass of the image. Then, with the white pixels having new values, we update the values of the black pixels, taking into consideration the values of their neighbours. This is a second pass of the image. The two passes together constitute an iteration. (b) Pixels A and B should not have their values updated in the same pass. Each pixel and its neighbourhood is delineated with a line of constant thickness.
How do we know when to stop the iterations?
If T in Method 3 is fixed, we have a so called fixed temperature annealing (actually a contradiction in terms, but that is what it is called!). The method consists of a fixed number of passes (until one runs out of patience and time!). The value of the cost function for each constructed configuration is used to keep track of the configuration with the minimum cost value that was encountered. When the process ends, the configuration that produced the minimum cost is identified as the desired solution.
If the probability q with which we accept a configuration that increases the cost function in Method 1 reduces from one iteration to the next, or if T in Method 3 reduces from one iteration to the next, we have the method of simulated annealing. In the first case, we have simulated annealing with the Metropolis sampler; in the second, simulated annealing with the Gibbs sampler. The iterations stop when the algorithm converges, ie when the improvement in the cost function from one iteration to the next is below a threshold.
How do we reduce the temperature in simulated annealing?
The formula we adopt for reducing the temperature is known as the cooling schedule. It has been shown that, if we use

$$T(k) = \frac{C}{\ln(1+k)} \qquad \text{for } k = 1, 2, \ldots \qquad (5.383)$$

where k is the iteration number and C is some positive constant, the simulated annealing algorithm converges to the global minimum of the cost function. This cooling schedule, however, is extremely slow. That is why, often, alternative suboptimal cooling schedules are adopted:

$$T(k) = aT(k-1) \qquad \text{for } k = 1, 2, \ldots \qquad (5.384)$$

Here a is chosen to be a number near 1, but just below it. Typical values are 0.99 or 0.999. The starting value T(0) is selected to be high enough to make all configurations more or less equally probable. One has to have a look at the typical values the cost function takes in order to select an appropriate value of T(0). Typical values are of the order of 10.
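The two schedules can be compared numerically. In the sketch below, the constant C and the starting temperature are illustrative values only:

```python
import math

C, T0, a = 10.0, 10.0, 0.99     # illustrative constants

def logarithmic(k):
    """Optimal but very slow schedule of eq. (5.383)."""
    return C / math.log(1.0 + k)

def geometric(k, T=T0):
    """Suboptimal geometric schedule of eq. (5.384)."""
    for _ in range(k):
        T = a * T
    return T

for k in (1, 10, 100, 1000):
    print(f"k = {k:5d}   logarithmic: {logarithmic(k):7.3f}   geometric: {geometric(k):7.4f}")
# The logarithmic schedule stays warm for a very long time,
# while the geometric schedule cools much faster.
```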
How do we perform simulated annealing with the Metropolis sampler in practice?
You are advised to scale the image values from 0 to 1 when using this algorithm. The algorithm below is for the membrane model.

Step 0: Create an array the same size as the original image and set its values equal to the original image. Select a cooling schedule, a value for the cooling parameter a and a starting value for the temperature parameter. Set the current configuration x_c to be the same as the degraded image g. Compute the cost function U(x_c = g) ≡ U_old. Set threshold = 0.001 (or any other value of your choice).
Step 1: Consider the two subsets of pixels of the image, as identified in figure 5.26a, one after the other.
Step 2: Visit every pixel of the colouring you consider of the current configuration x_c in turn (say the white pixels in figure 5.26a). For the pixel you are currently visiting, select a new value uniformly distributed in the range of acceptable values, [0, 1].
    Step 2.1: Using the new value for this pixel, compute the value of the cost function. Call it U_new.
    Step 2.2: If U_new ≤ U_old, accept the new value for this pixel and use it to replace the value in the output array. Set U_old = U_new. Go to Step 2.6.
    Step 2.3: If U_new > U_old, compute

$$q \equiv e^{-\frac{1}{T}(U_{new}-U_{old})} \qquad (5.385)$$

    Step 2.4: Draw a random number ρ uniformly distributed in the range [0, 1].
    Step 2.5: If ρ ≤ q, accept the new value for this pixel and use it to replace the value in the output array. Set U_old = U_new.
    Step 2.6: If you have not finished visiting all pixels of the image of the current colouring, go to the next pixel (Step 2). If you have finished with all pixels of this colouring, set the current configuration x_c equal to the output configuration and go to Step 1 in order to carry on with the pixels of the other colouring.
Step 3: When all pixels of the image have been visited, compute the global cost function of the current configuration x_c, U_new. If |U_old − U_new| < threshold × U_old, exit. If not, go to Step 4.
Step 4: Set current array x_c equal to the output array. Reduce the temperature according to the cooling schedule. Set U_old = U_new. Go to Step 1.
Note that at the beginning of the iterations, while T is large, the ratio q of the probability of the configuration with the new pixel value over the configuration with the old pixel value is just below 1. It is then highly likely for the number ρ we draw uniformly distributed in the range [0, 1] (Step 2.4) to be smaller than q. This means that the new value of the pixel is accepted with high probability, even though it makes the configuration worse (increases the cost function). As the iterations progress and T becomes smaller, when the new configuration is less probable than the old configuration, q becomes smaller than 1 and no longer near 1. It rather tends to be near 0, because the lower temperature allows us to differentiate more sharply between the different configurations. Thus, when we draw a number uniformly distributed in the range [0, 1], we have a much lower probability of drawing a number smaller than q than we had at the beginning of the algorithm. This means that, as the iterations advance, we accept configurations that worsen the cost function with smaller and smaller probability. This way, at the beginning of the algorithm, the system is "hot", ie anything goes! We are prepared to accept bad configurations much more easily than later on. The solution is allowed to jump around in the configuration space rather easily. This allows the solution to avoid being trapped in local minima of the cost function. As the system cools down, and the values of the pixels begin to gel, we do not allow the solution to jump to other parts of the configuration space so much, as we consider that we are approaching the global minimum and we do not want to miss it by moving away from it.
Some tips concerning the implementation of this algorithm follow.
When we compute the value of the cost function for the proposed new value of a pixel, we only compute the terms of the cost function that are affected by the value of the pixel under consideration and replace their old contribution to the cost function by the new one. This saves considerable computation time.
We may keep track of how many times in succession we pass through steps 2.4 and 2.5. If we keep going through that loop without getting a new pixel value that genuinely reduces the cost function, we may decide to exit the algorithm and accept as solution the most recent configuration for which the cost function genuinely reduced. You may, therefore, use as an extra exit criterion the following: if for X successive pixels you did not manage to draw a new value that improved the cost function, go back to the configuration that you had before the latest X pixels were visited and output it as the solution. X here could be 50, or 100, or a similar number.
Typically, hundreds if not thousands of iterations are required.
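A compact sketch of the algorithm above, for the membrane model and a geometric cooling schedule, could look as follows. The tip of recomputing only the locally affected cost terms is not applied here, so this version is slow but easy to read; the data term and all parameter values are simple illustrative choices, not the exact ones of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(x, g, sigma2=0.01, alpha=1.0):
    """U(x; g): membrane prior plus a simple quadratic data term (illustrative choice)."""
    prior = np.sum((x[:-1, :] - x[1:, :])**2) + np.sum((x[:, :-1] - x[:, 1:])**2)
    data = np.sum((g - x)**2) / (2.0 * sigma2)
    return alpha * prior + data

def metropolis_annealing(g, T0=10.0, a=0.99, n_iter=200):
    x = g.copy()                       # start from the degraded image
    T, U_old = T0, cost(g, g)
    rows, cols = np.indices(g.shape)
    for _ in range(n_iter):
        for colour in (0, 1):          # the two "colourings" of figure 5.26a
            for i, j in zip(*np.where((rows + cols) % 2 == colour)):
                old = x[i, j]
                x[i, j] = rng.uniform(0.0, 1.0)      # propose a new value in [0, 1]
                U_new = cost(x, g)
                if U_new <= U_old or rng.uniform() <= np.exp(-(U_new - U_old) / T):
                    U_old = U_new                    # accept the proposal
                else:
                    x[i, j] = old                    # reject it
        T *= a                                       # cool down
    return x

if __name__ == "__main__":
    clean = np.zeros((16, 16)); clean[4:12, 4:12] = 1.0
    noisy = np.clip(clean + 0.2 * rng.standard_normal(clean.shape), 0.0, 1.0)
    restored = metropolis_annealing(noisy)
    print("cost before:", cost(noisy, noisy), " after:", cost(restored, noisy))
```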
How do we perform simulated annealing with the Gibbs sampler in practice?
You are advised to scale the image values from 0 to 1 when using this algorithm. The algorithm below is for the membrane model.
Step 0: Create an array the same size as the original image and set its values equal to those of the original image. Select a cooling schedule, a value for the cooling parameter and a starting value for the temperature parameter. Set the current configuration x_c to be the same as the degraded image g. Compute the cost function U(x_c = g) ≡ U_old. Set threshold = 0.001, or another similar value of your choice.
Step 1: Visit every odd pixel of the current configuration x_c in turn (the white pixels in figure 5.26a). For every pixel, given the values of its neighbours, draw a new value with probability:

$$p(x_{ij}=x_{new}\,|\,x_{i-1,j;c}, x_{i,j-1;c}, x_{i+1,j;c}, x_{i,j+1;c}) = \frac{1}{Z}e^{-\frac{1}{T}\left[(x_{new}-x_{i-1,j;c})^2+(x_{new}-x_{i,j-1;c})^2+(x_{new}-x_{i+1,j;c})^2+(x_{new}-x_{i,j+1;c})^2\right]} \qquad (5.386)$$

Use this value to replace the value of this pixel in the output array. When you have finished visiting all white pixels, set the current array x_c equal to the output array and go to Step 2.
Step 2: Visit every even pixel of the current configuration x_c in turn (the black pixels in figure 5.26a). For every pixel, given the values of its neighbours, draw a new value as in Step 1. Use this value to replace the value of this pixel in the output array.
Step 3: Compute the global cost function of the current configuration x_c, U_new. If |U_old − U_new| < threshold × U_old, exit. If not, go to Step 4.
Step 4: Set current array x_c equal to the output array. Reduce the temperature according to the cooling schedule. Set U_old = U_new. Go to Step 1.
Typically, hundreds or even thousands of iterations are required.
Box 5.11. How can we draw random numbers according to a given probability density function?

Most computers have programs that can produce uniformly distributed random numbers. Let us say that we wish to draw random numbers x according to a given probability density function p_x(x). Let us also say that we know how to draw random numbers y with a uniform probability density function defined in the range [A, B]. We may formulate the problem as follows.
Define a transformation y = g(x) which is one-to-one and which is such that, if y is drawn from a uniform probability density function in the range [A, B], samples x are distributed according to the given probability density function p_x(x).
Since we assume that relationship y = g(x) is one-to-one, we may schematically depict it as shown in figure 5.27.

Figure 5.27: A one-to-one relationship between x and y.
It is obvious from figure 5.27 that the distributions $P_y(y_1)$ and $P_x(x_1)$ of the two variables are identical, since whenever y is less than $y_1 \equiv g(x_1)$, x is less than $x_1$:

$$P_y(y_1) \equiv \mathcal{P}(y \le y_1) = \mathcal{P}(x \le x_1) \equiv P_x(x_1) \qquad (5.387)$$

The distribution of x is known, since the probability density function of x is known:

$$P_x(x_1) \equiv \int_{-\infty}^{x_1} p_x(x)\,dx \qquad (5.388)$$

The probability density function of y is given by:

$$p_y(y) = \begin{cases} \frac{1}{B-A} & \text{for } A \le y \le B \\ 0 & \text{otherwise} \end{cases} \qquad (5.389)$$

The distribution of y is then easily obtained:

$$P_y(y_1) \equiv \int_{-\infty}^{y_1} p_y(y)\,dy = \begin{cases} 0 & \text{for } y_1 \le A \\ \frac{y_1-A}{B-A} & \text{for } A \le y_1 \le B \\ 1 & \text{for } B \le y_1 \end{cases} \qquad (5.390)$$

Upon substitution in (5.387), we obtain

$$\frac{y_1 - A}{B - A} = P_x(x_1) \qquad (5.391)$$

which leads to:

$$y_1 = (B-A)P_x(x_1) + A \qquad (5.392)$$

Random number generators usually produce uniformly distributed numbers in the range [0, 1]. For A = 0 and B = 1 we have:

$$y_1 = P_x(x_1) \qquad (5.393)$$

So, to produce random numbers x, distributed according to a given probability density function $p_x(x)$, we follow these steps: we compute the distribution of x, $P_x(x_1)$, using equation (5.388); we tabulate pairs of numbers $(x_1, P_x(x_1))$; we draw uniformly distributed numbers $y_1$ in the range [0, 1]; we use our tabulated numbers as a look-up table, where for each $y_1 = P_x(x_1)$ we look up the corresponding $x_1$. These $x_1$ numbers are our random samples distributed according to the way we wanted.
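The look-up-table procedure of this box may be sketched as follows, here for an arbitrary example density of our own choosing; np.interp does the table look-up between the tabulated pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Example density p_x(x) on [0, 1] (any non-negative function would do).
xs = np.linspace(0.0, 1.0, 200)
px = 2.0 * xs                       # p_x(x) = 2x, already normalised on [0, 1]

# Tabulate the distribution P_x(x_1) by cumulative summation (eq. 5.388).
Px = np.cumsum(px)
Px = Px / Px[-1]                    # force P_x to end exactly at 1

# Draw uniform numbers y_1 in [0, 1] and look up the corresponding x_1 (eq. 5.393).
y = rng.uniform(0.0, 1.0, size=10000)
samples = np.interp(y, Px, xs)      # inverse look-up: y_1 = P_x(x_1)  ->  x_1

print("sample mean:", samples.mean())   # should be close to 2/3 for p_x(x) = 2x
```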
Example B5.65

Explain how you are going to draw random numbers according to the probability density function (5.386).

We start by rewriting the probability density function. For simplicity we call $p(x_{ij}=x_{new}\,|\,x_{i-1,j;c}, x_{i,j-1;c}, x_{i+1,j;c}, x_{i,j+1;c}) \equiv p_n$ and $x_{new} \equiv x_n$, and we drop the explicit dependence on the values of the neighbours in the current configuration, ie we drop ";c" from (5.386). We also expand the squares in the exponent and collect all similar terms together. Then:

$$p_n = \frac{1}{Z} e^{-\frac{1}{T}\left[4x_n^2 + x_{i-1,j}^2 + x_{i,j-1}^2 + x_{i+1,j}^2 + x_{i,j+1}^2 - 2x_n\left(x_{i-1,j}+x_{i,j-1}+x_{i+1,j}+x_{i,j+1}\right)\right]} \qquad (5.394)$$

Call:

$$\frac{4}{T} \equiv A \qquad \frac{1}{T}\left(x_{i-1,j}+x_{i,j-1}+x_{i+1,j}+x_{i,j+1}\right) \equiv B \qquad \frac{1}{T}\left(x_{i-1,j}^2+x_{i,j-1}^2+x_{i+1,j}^2+x_{i,j+1}^2\right) \equiv \Gamma \qquad (5.395)$$

Then we have:

$$p_n = \frac{1}{Z} e^{-\left(Ax_n^2 - 2Bx_n + \Gamma\right)} \qquad (5.396)$$

To draw random numbers according to this probability density function, we have to compute the cumulative distribution $P_n(z)$ of $p_n$:

$$P_n(z) = \int_0^z p_n\,dx_n = \frac{1}{Z}\int_0^z e^{-\left(Ax_n^2 - 2Bx_n + \Gamma\right)}dx_n \qquad (5.397)$$

We define a new variable of integration $y \equiv \sqrt{A}\,x_n \Rightarrow dx_n = dy/\sqrt{A}$. Then

$$P_n(z) = \frac{1}{Z\sqrt{A}}\int_0^{\sqrt{A}z} e^{-\left(y^2 - 2\beta y + \Gamma\right)}dy \qquad (5.398)$$

where we set $\beta \equiv B/\sqrt{A}$:

$$P_n(z) = \frac{1}{Z\sqrt{A}}\int_0^{\sqrt{A}z} e^{-\left(y^2 - 2\beta y + \Gamma + \beta^2 - \beta^2\right)}dy = \frac{1}{Z\sqrt{A}}\,e^{-\left(\Gamma-\beta^2\right)}\int_0^{\sqrt{A}z} e^{-(y-\beta)^2}dy = \frac{1}{Z\sqrt{A}}\,e^{-\left(\Gamma-\beta^2\right)}\int_{-\beta}^{\sqrt{A}z-\beta} e^{-t^2}dt \qquad (5.399)$$

where we set $t \equiv y - \beta$. We remember that the error function is defined as:

$$\mathrm{erf}(z) \equiv \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}dt \qquad (5.400)$$

Then:

$$P_n(z) = \underbrace{\frac{1}{Z\sqrt{A}}\,e^{-\left(\Gamma-\beta^2\right)}\,\frac{\sqrt{\pi}}{2}}_{\lambda}\left[\mathrm{erf}\!\left(\sqrt{A}z-\beta\right)+\mathrm{erf}(\beta)\right] = \lambda\left[\mathrm{erf}\!\left(\sqrt{A}z-\beta\right)+\mathrm{erf}(\beta)\right] \qquad (5.401)$$

If we want to draw random numbers according to $p_n$, we must draw random numbers uniformly distributed between 0 and 1, treat them as values of $P_n(z)$, and from them work out the values of z. The values of z we shall work out this way will be distributed according to $p_n$.
To deal with the unknown scaling constant $\lambda$ that multiplies the error function in (5.401), we sample the z variable using, say, 100 points, equally spaced between 0 and 1. For each value of z, we compute $\mathrm{erf}(\sqrt{A}z-\beta)+\mathrm{erf}(\beta)$, and we rescale the computed values so that they vary between 0 and 1. This way, we have a table of corresponding values of z and $P_n(z)$. Then, when we need to draw a number according to $p_n$, we draw a number from a uniform distribution between 0 and 1, and use it as the value of $P_n(z)$, to look up the corresponding value of z. This is the number we use in the Gibbs sampler as a new value for the pixel under consideration.
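Following the example, one may build the look-up table with math.erf and invert it numerically; the function name, the table size and the normalisation step below are our own choices:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def gibbs_draw(neighbours, T=0.1, n_table=100):
    """Draw a new pixel value in [0, 1] according to eq. (5.386), given its four neighbours."""
    A = 4.0 / T
    B = sum(neighbours) / T
    beta = B / math.sqrt(A)
    z = np.linspace(0.0, 1.0, n_table)
    # Values proportional to the distribution of eq. (5.401), up to additive/multiplicative constants.
    P = np.array([math.erf(math.sqrt(A) * zi - beta) for zi in z])
    P = (P - P[0]) / (P[-1] - P[0])        # rescale so that the table spans [0, 1]
    u = rng.uniform(0.0, 1.0)
    return float(np.interp(u, P, z))       # look up z for the drawn uniform number u

print(gibbs_draw([0.8, 0.9, 0.85, 0.95]))  # tends to return a value near the neighbours' mean
```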
Why is simulated annealing slow?
Simulated annealing is slow because it allows the long range interactions in the image to evolve through a large number of local interactions. This is counter-intuitive, as in everyday life we usually try to get the rough picture right first, ie the gross long range and low frequency characteristics, and then proceed to the details.
How can we accelerate simulated annealing?
We may speed up simulated annealing by using a multiresolution approach. The philosophy of multiresolution approaches is as follows. We impose some graininess on the solution space. Each grain represents many configurations of fine resolution. Each grain is represented by a single configuration of coarse resolution. We then scan the configuration space by jumping from grain to grain! That is, we find a solution by using only the coarse representative configurations. Then we search for the final solution of fine resolution only among the configurations of the final grain, ie only among the configurations that are compatible with the coarse solution we found. This is schematically shown in figure 5.28.
Figure 5.28: The configuration space with a coarse graininess imposed on it. Each blob represents thousands of possible solutions and is represented by a single coarse solution, indicated by an × here. The gross resolution allows us to jump from blob to blob, thus taking large strides in the configuration space. If we have done the coarsening correctly, the solution we seek, marked here by a cross, is within the last grain, ie it is compatible with the coarse solution we find. Starting from an upsampled version of that coarse solution, we may then proceed with careful small steps towards the final solution, examining only configurations that are compatible with the coarse solution.
How can we coarsen the configuration space?

Very often people coarsen the image by blurring and subsampling it. Then they find a solution for the downsampled image by using the same model they had adopted for the original image, upsample it (often by pixel replication) and use it as a starting point for finding the solution at the next higher resolution. This approach, however, is rather simplistic: the configuration space depends on the data and the model. It is not correct to coarsen the data only. For a start, coarsening the data only does not guarantee that the global minimum of the cost function we are seeking will be inside the final grain at which we stop in the configuration space (see figure 5.28). Furthermore, when we adopt a model for the image, we explicitly model the direct interaction between neighbouring pixels, but at the same time we implicitly model interactions at all scales, since a pixel is influenced by its immediate neighbours, but those are influenced by their own neighbours, and so on. So, when we coarsen the configuration space, the model implied for the coarsened space is already defined and we have no freedom to arbitrarily redefine it. The correct way to proceed is to coarsen the data and the model together. This method is known as the renormalisation group transform.
The use of the renormalisation group transform guarantees that the global solution sought is among the solutions represented by the gross solution we derive using the coarse data and model representations. It also allows one to define the model for the coarse data that is compatible with the model we originally adopted for the full image. However, the practical implementation of the renormalisation group transform is quite difficult, and some shortcuts have been proposed, like the super-coupling transform. These methods are quite involved and beyond the scope of this book.
5.6 Geometric image restoration
How may geometric distortion arise?
Geometric distortion may arise because of the imperfections of the lens or because of the irregular movement of the sensor during image capture. In the former case, the distortion looks regular, like the distortions shown in figure 5.29. The latter case arises, for example, when an aeroplane photographs the surface of the Earth with a line scan camera: as the aeroplane wobbles, the captured image may be inhomogeneously distorted, with pixels displaced by as much as 4–5 interpixel distances away from their true positions and in random directions.

Why do lenses cause distortions?

Lenses cause distortions because they are not ideal. They are usually very complicated devices, designed to minimise other undesirable lens effects. This is sometimes done at the expense of geometric fidelity. For example, a telephoto lens may cause the pin-cushion distortion, while a wide-angle lens may cause the barrel distortion (see figure 5.29).

Figure 5.29: Examples of geometric distortions caused by the lens. (a) Perfect image. (b) Pin-cushion distortion. (c) Barrel distortion.
How can a geometrically distorted image be restored?
We start by creating an empty array of numbers the same size as the distorted image. This
array will become the corrected image. Our purpose is to assign grey values to the elements of
this array. This can be achieved by performing a two-stage operation: spatial transformation
followed by grey level interpolation (see figure 5.30).
How do we perform the spatial transformation?
Assume that the correct position of a pixel is (x_c, y_c) and the distorted position is (x_d, y_d) (see figure 5.30). In general, there will be a transformation which leads from one set of coordinates to the other, say:

$$x_d = O_x(x_c, y_c) \qquad y_d = O_y(x_c, y_c) \qquad (5.402)$$

The type of transformation we choose depends on the type of distortion. For lens distortion we use a global transformation. For inhomogeneous distortion we use a mosaic of local transformations.
Figure 5.30: In this figure the pixels correspond to the nodes of the grids. Pixel A of the corrected grid, with coordinates (x_c, y_c), corresponds to inter-pixel position A′, with coordinates (x_d, y_d), in the distorted grid; in general, A′ will not coincide with a pixel position.
How may we model the lens distortions?

The distortions caused by the lens tend to be radially symmetric. So, the transformation we have to apply in order to correct for them should also be radially symmetric. If we define $r \equiv \sqrt{x^2+y^2}$, measured from the image centre, transformation (5.402) takes the form

$$r_d = f(r_c) \qquad (5.403)$$

with the obvious interpretation for $r_d$ and $r_c$. Function $f(r_c)$ has to have certain properties:
it has to yield 0 distortion at the image centre, with the level of distortion increasing as we move towards the image periphery;
it has to be invertible, so that one can easily move between the distorted and the undistorted coordinates;
it has to be antisymmetric in x and y, so that negative coordinate positions move away from the centre by the same amount as the corresponding positive coordinate positions.
For all the above reasons, the model adopted usually is:

$$r_d = r_c f(r_c) \;\Rightarrow\; \begin{cases} x_d = x_c f(r_c) \\ y_d = y_c f(r_c) \end{cases} \qquad (5.404)$$
Typical functions $f(r_c)$ used are:

$$f(r_c) = 1 + k_1 r_c \qquad (5.405)$$
$$f(r_c) = 1 + k_1 r_c^2 \qquad (5.406)$$
$$f(r_c) = 1 + k_1 r_c + k_2 r_c^2 \qquad (5.407)$$
$$f(r_c) = 1 + k_1 r_c^2 + k_2 r_c^4 \qquad (5.408)$$
$$f(r_c) = \frac{1}{1 + k_1 r_c} \qquad (5.409)$$
$$f(r_c) = \frac{1}{1 + k_1 r_c^2} \qquad (5.410)$$
$$f(r_c) = \frac{1 + k_1 r_c}{1 + k_2 r_c^2} \qquad (5.411)$$
$$f(r_c) = \frac{1}{1 + k_1 r_c + k_2 r_c^2} \qquad (5.412)$$
In these formulae $k_1$ and $k_2$ are model parameters that have to be specified. Often, when considering the individual components of such a transformation, as in (5.404), the parameters have different values for the x coordinate and the y coordinate, as the pixels are not always perfectly square.
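For instance, with the simplest model (5.405), the warping of correct to distorted coordinates, and a numerical inversion of it, may be sketched as below; the parameter values and the fixed-point inversion are our own illustrative choices:

```python
import numpy as np

def distort(xc, yc, k1=2e-4, k2=2e-4):
    """Map correct to distorted coordinates with model (5.404)-(5.405),
    using separate parameters for the two axes, as discussed above."""
    rc = np.sqrt(xc**2 + yc**2)          # radius measured from the image centre
    return xc * (1.0 + k1 * rc), yc * (1.0 + k2 * rc)

def undistort(xd, yd, k1=2e-4, k2=2e-4, n_iter=20):
    """Invert the model numerically by fixed-point iteration (a simple, common choice)."""
    xc, yc = xd.copy(), yd.copy()
    for _ in range(n_iter):
        rc = np.sqrt(xc**2 + yc**2)
        xc = xd / (1.0 + k1 * rc)
        yc = yd / (1.0 + k2 * rc)
    return xc, yc

xc, yc = np.array([100.0]), np.array([50.0])
xd, yd = distort(xc, yc)
print(xd, yd, undistort(xd, yd))          # recovers approximately (100, 50)
```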
Example B5.66

It has been noticed that the lens distortion models tend to change the scale of the image, either making it larger (in the case of the barrel distortion), or smaller (in the case of the pin-cushion distortion). In order to avoid this problem, the following spatial transformation model has been proposed:

$$x_c = x_d(1 + ky_d^2) \qquad y_c = y_d(1 + kx_d^2) \qquad (5.413)$$

Explain why this model may not be acceptable.

This model is not analytically invertible to yield the distorted coordinates as functions of the correct coordinates. If we use the second of equations (5.413) to solve for $y_d$ and substitute in the first, we obtain:

$$x_c = x_d\left[1 + k\left(\frac{y_c}{1+kx_d^2}\right)^2\right] \;\Rightarrow\; x_c(1+kx_d^2)^2 = x_d(1+kx_d^2)^2 + kx_dy_c^2$$
$$\Rightarrow\; x_c + x_ck^2x_d^4 + 2x_ckx_d^2 = x_d + 2kx_d^3 + k^2x_d^5 + kx_dy_c^2$$
$$\Rightarrow\; -k^2x_d^5 + x_ck^2x_d^4 - 2kx_d^3 + 2x_ckx_d^2 - (1+ky_c^2)x_d + x_c = 0 \qquad (5.414)$$

This is a fifth order equation in terms of $x_d$: not easily solvable.
In addition, this model is not radially symmetric, although lens distortion tends to be radially symmetric.
How can we model the inhomogeneous distortion?
In the general case we also consider a parametric spatial transformation model, only now we apply it locally. A commonly used transformation is

$$x_d = c_1 x_c + c_2 y_c + c_3 x_c y_c + c_4 \qquad y_d = c_5 x_c + c_6 y_c + c_7 x_c y_c + c_8 \qquad (5.415)$$

where $c_1, c_2, \ldots, c_8$ are some parameters. Alternatively, we may assume a nonlinear transformation, where squares of the coordinates $x_c$ and $y_c$ appear on the right-hand sides of the above equations.
How can we specify the parameters of the spatial transformation model?
In all cases, the values of the parameters of the spatial transformation model have to be determined from the known positions of pixels in the distorted and the undistorted grid. Such points are called tie points. For example, in aerial photographs of the surface of the Earth, the values of parameters $c_1, \ldots, c_8$ in (5.415) can be determined from the known positions of specific landmark points. There are several such points scattered all over the surface of the Earth. We may use, for example, four such points to find the values of the above eight parameters and assume that these transformation equations, with the derived parameter values, hold inside the whole quadrilateral region defined by these four tie points.
Then, we apply the transformation to find the position A′ that corresponds to a pixel A of the corrected grid. In general, A′ will not have integer coordinates, even though the coordinates of point A in the $(x_c, y_c)$ space are integer. This means that we do not actually know the grey value at position A′. That is when the grey level interpolation process comes into play. The grey value at position A′ can be estimated from the values at its four nearest neighbouring pixels in the $(x_d, y_d)$ space by some method, for example by bilinear interpolation. We assume that inside each little square the grey value is a simple function of the positional coordinates

$$g(x_d, y_d) = \alpha x_d + \beta y_d + \gamma x_d y_d + \delta \qquad (5.416)$$

where $\alpha, \ldots, \delta$ are some parameters. We apply this formula to the four corner pixels to derive values for $\alpha$, $\beta$, $\gamma$ and $\delta$, and then use these values to calculate $g(x_d, y_d)$ at the position of point A′.
Figure 5.31 shows in magnification the neighbourhood of point A′. With the simplest method, nearest neighbour interpolation, A′ gets the grey value of the pixel which is nearest to it. A more sophisticated method is to fit a higher order surface through a larger patch of pixels around A′ and find the value at A′ from that surface.

(From the tie point correspondences of the worked example:)

$$c_1 = 1 \qquad c_5 = \frac{1}{3}$$
$$\text{Pixel C:}\quad 1 = 3c_2,\;\; 3 = 3c_6 \;\Rightarrow\; c_2 = \frac{1}{3},\;\; c_6 = 1$$
$$\text{Pixel D:}\quad 4 = 3 + 3\cdot\frac{1}{3} + 9c_3,\;\; 4 = 3\cdot\frac{1}{3} + 3 + 9c_7 \;\Rightarrow\; c_3 = 0,\;\; c_7 = 0 \qquad (5.420)$$
The distorted coordinates, therefore, of any pixel within rectangle ABDC are given by:

$$x_d = x_c + \frac{y_c}{3} \qquad y_d = \frac{x_c}{3} + y_c \qquad (5.421)$$
For $x_c = y_c = 2$ we have $x_d = 2 + \frac{2}{3}$, $y_d = \frac{2}{3} + 2$. So, the coordinates of pixel (2, 2) in the distorted image are $(2\frac{2}{3}, 2\frac{2}{3})$. This position is located between pixels in the distorted image, and actually between pixels with grey values as shown in figure 5.32.

Figure 5.32: A local coordinate system $(\tilde{x}, \tilde{y})$ is defined to facilitate the calculation of the parameters of the bilinear interpolation between pixels at locations (2, 2), (3, 2), (2, 3) and (3, 3) of the original distorted grid, with corresponding grey values 5, 3, 2 and 1.

We define a local coordinate system $(\tilde{x}, \tilde{y})$, so that the pixel at the top left corner has coordinate position (0, 0), the pixel at the top right corner has coordinates (1, 0), the one at the bottom left (0, 1) and the one at the bottom right (1, 1).
Assuming that the grey value between four pixels can be computed from the grey values in the four corner pixels, with bilinear interpolation, we have:

$$g(\tilde{x},\tilde{y}) = \alpha\tilde{x} + \beta\tilde{y} + \gamma\tilde{x}\tilde{y} + \delta \qquad (5.422)$$

Applying this for the four neighbouring pixels, we have:

$$5 = \alpha\cdot 0 + \beta\cdot 0 + \gamma\cdot 0 + \delta \;\Rightarrow\; \delta = 5$$
$$3 = \alpha\cdot 1 + \beta\cdot 0 + \gamma\cdot 0 + 5 \;\Rightarrow\; \alpha = -2$$
$$2 = (-2)\cdot 0 + \beta\cdot 1 + \gamma\cdot 0 + 5 \;\Rightarrow\; \beta = -3$$
$$1 = (-2)\cdot 1 + (-3)\cdot 1 + \gamma\cdot 1\cdot 1 + 5 \;\Rightarrow\; \gamma = 1 \qquad (5.423)$$

Therefore:

$$g(\tilde{x},\tilde{y}) = -2\tilde{x} - 3\tilde{y} + \tilde{x}\tilde{y} + 5 \qquad (5.424)$$

We apply this for $\tilde{x} = \frac{2}{3}$ and $\tilde{y} = \frac{2}{3}$ to obtain:

$$g\left(\frac{2}{3},\frac{2}{3}\right) = -2\cdot\frac{2}{3} - 3\cdot\frac{2}{3} + \frac{2}{3}\cdot\frac{2}{3} + 5 = -\frac{4}{3} - \frac{6}{3} + \frac{4}{9} + 5 = 2\frac{1}{9} \qquad (5.425)$$
So, the grey value at position (2, 2) should be $2\frac{1}{9}$. Rounding it to the nearest integer, we have 2. If nearest neighbour interpolation were used, we would have assigned to this position the grey value 1, as the nearest neighbour is the pixel at the bottom right corner.
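The same computation in code, for the four grey values of figure 5.32; the function below solves for the four parameters exactly as in (5.423):

```python
def bilinear(gtl, gtr, gbl, gbr, x, y):
    """Bilinear interpolation g(x, y) = a*x + b*y + c*x*y + d inside a unit square,
    where (0,0), (1,0), (0,1), (1,1) carry the values gtl, gtr, gbl, gbr."""
    d = gtl                      # value at (0, 0)
    a = gtr - d                  # from  gtr = a + d
    b = gbl - d                  # from  gbl = b + d
    c = gbr - a - b - d          # from  gbr = a + b + c + d
    return a * x + b * y + c * x * y + d

# Grey values 5, 3, 2, 1 at the corners, interpolated at (2/3, 2/3):
print(bilinear(5, 3, 2, 1, 2/3, 2/3))    # 2.111..., ie 2 and 1/9, as in (5.425)
```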
Example 5.68

It was established that the image captured was rotated in relation to the x axis of the image by an angle θ in the counter-clockwise direction. Correct the image so that it has 0 rotation with respect to the image axes.

Clearly, the correct coordinates $(x_c, y_c)$ of a pixel will have to be rotated by angle θ in order to coincide with the distorted coordinates $(x_d, y_d)$ (see figure 5.33):

$$x_d = x_c\cos\theta + y_c\sin\theta \qquad y_d = -x_c\sin\theta + y_c\cos\theta \qquad (5.426)$$

We first create an empty grid, the same size as the original image. For every pixel $(x_c, y_c)$ of this grid, we apply transformation (5.426) to identify the corresponding position $(x_d, y_d)$ in the captured image. We then interpolate the values of pixels $(\lfloor x_d\rfloor, \lfloor y_d\rfloor)$, $(\lfloor x_d\rfloor, \lfloor y_d\rfloor+1)$, $(\lfloor x_d\rfloor+1, \lfloor y_d\rfloor)$ and $(\lfloor x_d\rfloor+1, \lfloor y_d\rfloor+1)$, using bilinear interpolation, to work out a grey value for position $(x_d, y_d)$, which we assign to pixel $(x_c, y_c)$ of the corrected image.

Figure 5.33: The coordinates of a point P in the correct coordinate system are (OA, OB). In the rotated coordinate system, the coordinates of the point are (OC, OD). From simple geometry: OC = OA cos θ + AP sin θ and OD = OB cos θ − BP sin θ. Then equations (5.426) follow.
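A sketch of this procedure, combining the rotation of (5.426) with the bilinear interpolation just described, follows. The rotation is taken about the coordinate origin and pixels that map outside the captured image are left at zero; both are simplifying choices of ours:

```python
import numpy as np

def rotate_correct(img, theta):
    """Undo a counter-clockwise rotation by angle theta (radians), as in example 5.68."""
    rows, cols = img.shape
    out = np.zeros_like(img)
    c, s = np.cos(theta), np.sin(theta)
    for yc in range(rows):
        for xc in range(cols):
            xd = xc * c + yc * s              # eq. (5.426)
            yd = -xc * s + yc * c
            x0, y0 = int(np.floor(xd)), int(np.floor(yd))
            if 0 <= x0 < cols - 1 and 0 <= y0 < rows - 1:
                fx, fy = xd - x0, yd - y0     # local coordinates inside the cell
                out[yc, xc] = ((1 - fx) * (1 - fy) * img[y0, x0]
                               + fx * (1 - fy) * img[y0, x0 + 1]
                               + (1 - fx) * fy * img[y0 + 1, x0]
                               + fx * fy * img[y0 + 1, x0 + 1])
    return out

test = np.tile(np.arange(32, dtype=float), (32, 1))
print(rotate_correct(test, np.deg2rad(3.0)).shape)
```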
Box 5.12. The Hough transform for line detection

The equation of a straight line may be written as

$$p = x\cos\theta + y\sin\theta \qquad (5.427)$$

where (θ, p) are parameters that fully define the line, defined as shown in figure 5.34a.

Figure 5.34: (a) A line may be fully defined if we know the length p = OC of the normal from the image centre O to the line, and the angle θ it forms with the reference direction Ox. (b) A point A in the Oxy space corresponds to a curve in the (θ, p) space. (c) Two points A and B fully define a line and their corresponding curves in the (θ, p) space intersect exactly at the point that defines the parameters of the line.
Let us consider a pixel $(x_i, y_i)$ which belongs to this line:

$$p = x_i\cos\theta + y_i\sin\theta \qquad (5.428)$$

This equation represents a trigonometric curve when plotted in the (θ, p) space, as shown in figure 5.34b. If we consider another point $(x_j, y_j)$, which belongs to the same line, we shall have another trigonometric curve in the (θ, p) space, crossing the first one. The point where the two curves cross specifies the parameters of the line defined by the two points, as shown in figure 5.34c. Using this principle, we may identify alignments of pixels in a binary image by using the following algorithm. This algorithm is for identifying alignments of black pixels in a binary image.

Step 0: Decide the number of bins you will use to quantise parameters (θ, p). Let us say that the input image is M × N and that m ≡ min{M, N}. Let us say that we shall allow p to take values up to $p_{max} = \lfloor m/2\rfloor$, in steps of 1, so that K ≡ $p_{max}$. Let us also say that we shall allow θ to take values from 0° to 359° in steps of 1°, so that L ≡ 360.
Step 1: Create an empty accumulator array A, of size K × L, as shown in figure 5.35.
Step 2: Consider every black pixel in the image, with coordinates $(x_i, y_i)$, and for every value of θ we allow, compute the corresponding value of p, using (5.428).
Step 3: For every value $p_l$ you compute, corresponding to angle θ = l, identify the bin to which it belongs in the accumulator array A, as $k = \lfloor p_l\rfloor$.
Step 4: Increase the corresponding cell of the accumulator array by 1: A(l+1, k+1) = A(l+1, k+1) + 1. Here we assumed that the elements of the accumulator array are identified by indices l and k that take values 1, 2, ..., L and 1, 2, ..., K, respectively.
Step 5: Identify the peaks in the accumulator array. The coordinates of these peaks yield the parameters of the straight lines in your image. If you expected to find I lines, consider the I strongest peaks.
Figure 5.35: An accumulator array is a 2D histogram where we count how many curves enter each bin; the columns correspond to bins of θ (θ = 0°, 1°, 2°, ..., up to θ_max) and the rows to bins of p (p = 0.5, 1.5, 2.5, ..., p_max − 0.5). Each bin has width 1 and the values given correspond to the centre of each bin. A bin with coordinates (l, k) has as central values θ = l − 1 and p = k − 0.5, and contains all (θ, p) values within half a bin width of these central values. A single point in the binary image creates a curve that passes through many bins (the black bins shown). Every time a curve passes through a bin, the value of the bin is increased by 1. The more curves pass through a bin, the more points vote for the corresponding values of θ and p. All those points form a straight line in the image defined by the parameters of the centre of this bin.
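A compact implementation of the accumulator construction described in Box 5.12 follows. The bin sizes of 1 pixel and 1 degree follow the text; p is measured from the image centre, and only non-negative p values are kept, which is a simplifying choice of ours:

```python
import numpy as np

def hough_lines(binary, n_lines=2):
    """Accumulate votes for (theta, p) over all black pixels and return the strongest peaks."""
    rows, cols = binary.shape
    p_max = min(rows, cols) // 2
    thetas = np.deg2rad(np.arange(360))              # 1-degree bins
    acc = np.zeros((360, p_max), dtype=int)          # accumulator array A
    ys, xs = np.nonzero(binary == 0)                 # black pixels
    xs = xs - cols / 2.0                             # coordinates relative to the image centre
    ys = ys - rows / 2.0
    for x, y in zip(xs, ys):
        p = x * np.cos(thetas) + y * np.sin(thetas)  # eq. (5.428) for every allowed theta
        valid = (p >= 0) & (p < p_max)
        acc[np.nonzero(valid)[0], p[valid].astype(int)] += 1
    peaks = np.argsort(acc.ravel())[::-1][:n_lines]
    return [(float(np.rad2deg(thetas[l])), float(k))
            for l, k in zip(*np.unravel_index(peaks, acc.shape))], acc

img = np.ones((64, 64)); img[20, :] = 0              # one horizontal black line
print(hough_lines(img)[0])
```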
Example B5.69

A line may be defined in terms of parameters (θ, p), as shown in figure 5.34. Work out the coordinates of the intersection of two lines.

Let us assume that the equations of the two lines are:

$$p_1 = x\cos\theta_1 + y\sin\theta_1 \qquad p_2 = x\cos\theta_2 + y\sin\theta_2 \qquad (5.429)$$
To solve this system of equations for x, we multiply the first one with $\sin\theta_2$ and the second with $\sin\theta_1$ and subtract them by parts:

$$p_1\sin\theta_2 - p_2\sin\theta_1 = x(\cos\theta_1\sin\theta_2 - \cos\theta_2\sin\theta_1) = x\sin(\theta_2-\theta_1)$$
$$\Rightarrow\; x = \frac{p_1\sin\theta_2 - p_2\sin\theta_1}{\sin(\theta_2-\theta_1)} \qquad (5.430)$$

To solve the system for y, we multiply the first equation with $\cos\theta_2$ and the second with $\cos\theta_1$ and subtract them by parts:

$$p_1\cos\theta_2 - p_2\cos\theta_1 = y(\sin\theta_1\cos\theta_2 - \sin\theta_2\cos\theta_1) = y\sin(\theta_1-\theta_2)$$
$$\Rightarrow\; y = \frac{p_1\cos\theta_2 - p_2\cos\theta_1}{\sin(\theta_1-\theta_2)} \qquad (5.431)$$

Note that if the two lines are parallel, $\theta_1 = \theta_2$, and formulae (5.430) and (5.431) will contain divisions by 0, which will yield infinite values, indicating that the two lines meet at infinity, which would be correct.
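In code, with a guard against (near-)parallel lines:

```python
import math

def line_intersection(p1, theta1, p2, theta2, eps=1e-9):
    """Intersection of two lines given in (theta, p) form, eqs (5.430) and (5.431).
    Angles in radians; returns None for (near-)parallel lines."""
    denom = math.sin(theta2 - theta1)
    if abs(denom) < eps:
        return None                                   # parallel: they meet at infinity
    x = (p1 * math.sin(theta2) - p2 * math.sin(theta1)) / denom
    y = (p1 * math.cos(theta2) - p2 * math.cos(theta1)) / math.sin(theta1 - theta2)
    return x, y

# A vertical line x = 3 (theta = 0) and a horizontal line y = 4 (theta = 90 degrees):
print(line_intersection(3.0, 0.0, 4.0, math.pi / 2))  # (3.0, 4.0)
```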
Example B5.70

Explain how you can work out the distortion parameters for the lens of the camera of your mobile phone, assuming the spatial transformation model given by (5.405), on page 514.

First I will construct a drawing consisting of L equidistant vertical black lines crossed by L equidistant horizontal lines, as shown in figure 5.29a. I will then photograph this drawing from a distance of, say, 5cm, with the centre of the drawing being roughly at the centre of the image. Let us say that this way I will create an image of size M × N pixels. This will be a colour image. I will convert it into a binary image by setting to 255 all pixels with colour values R, G and B such that R + G + B > 3 × 127 = 381, and to 0 all pixels with R + G + B ≤ 381.
I will then apply the Hough transform (see Box 5.12) to identify all straight lines in the image. I expect that I shall have several lines with angle θ around 0° and 180° (corresponding to the vertical lines of the grid), and several lines with angle θ around 90° and 270° (corresponding to the horizontal lines of the grid). If I had managed to photograph the grid exactly aligned with the axes of the image plane and if there were no discretisation problems, obviously, the lines would have been observed with exactly these values.
To identify any rotation of the grid in relation to the image axes, I will subtract 180° from all the angles that are near 180° and average the resultant values with the values that are near 0°. Let us say that this way I will find an angle θ_V. I will also subtract 180° from all the values that are around 270° and average the resultant angles with all those that are about 90°. Let us say that this way I will find an angle θ_H. Then I will average θ_V and θ_H − 90° in order to identify the angle by which the grid I photographed is misaligned with the image axes. Let us say that I identified an angle θ_R. I must first assess whether this angle is worth worrying about or not. To do that, I will multiply tan θ_R with min{M/2, N/2} to see how much shift this creates near the periphery of the image. If this shift is more than 0.5 (ie more than half a pixel), then I must correct for rotation.
How we correct for rotation is shown in example 5.68. I will have to apply the rotation to the original colour image I captured, then binarise the rotation-corrected image and perform the Hough transform again. This time the detected lines must have angles very close to 0°, 180°, 90° and 270°.
Let us consider the four lines identified by the Hough transform algorithm with the minimum values of parameter p, ie the four lines nearest to the image centre. Let us say that the image centre has coordinates $(i_0, j_0)$. I would like to identify the corners of the smallest rectangle that surrounds the centre. I can do that by combining the four lines two by two and computing their intersection points using formulae (5.430) and (5.431) of example 5.69. The coordinates of the intersection points are rounded to the nearest integer. Let us say that in this way we identify the four corners of the smallest grid cell around the centre of the image to be: $(i_{TL}, j_{TL})$, $(i_{BL}, j_{BL})$, $(i_{TR}, j_{TR})$ and $(i_{BR}, j_{BR})$. We know that the lens distortion is negligible near the image centre, so we may infer that the positions of these four points are the correct positions. Then I may find the lengths of the sides of the rectangle in terms of pixels, by using:

$$d_{LV} = \sqrt{(i_{TL}-i_{BL})^2 + (j_{TL}-j_{BL})^2} \qquad d_{RV} = \sqrt{(i_{TR}-i_{BR})^2 + (j_{TR}-j_{BR})^2}$$
$$d_{TH} = \sqrt{(i_{TL}-i_{TR})^2 + (j_{TL}-j_{TR})^2} \qquad d_{BH} = \sqrt{(i_{BL}-i_{BR})^2 + (j_{BL}-j_{BR})^2} \qquad (5.432)$$
We average the two vertical and the two horizontal lengths to have a better estimate:

$$d_H = \frac{d_{TH}+d_{BH}}{2} \qquad d_V = \frac{d_{LV}+d_{RV}}{2} \qquad (5.433)$$

Next, let us identify the corner of this rectangle that is closest to the image centre $(i_0, j_0)$, and let us shift that corner to coincide with the image centre. Let us call this new image centre $(\tilde{i}_0, \tilde{j}_0)$. These steps are shown schematically in figure 5.36.
Now we know that one of the corners of the grid I drew and photographed coincides with the image centre. We also know that the outer corners of the grid should be at coordinates which are known integer multiples of $d_H$ and $d_V$. These coordinates are expressed in terms of pixels and they are measured from the image centre. Let us call these locations of the four outer corners of the photographed grid their correct locations (see figure 5.37):

$$x_{TLc} = \tilde{i}_0 - d_H L_1 \qquad y_{TLc} = \tilde{j}_0 - d_V L_3$$
$$x_{BLc} = \tilde{i}_0 - d_H L_1 \qquad y_{BLc} = \tilde{j}_0 + d_V L_4$$
$$x_{TRc} = \tilde{i}_0 + d_H L_2 \qquad y_{TRc} = \tilde{j}_0 - d_V L_3$$
$$x_{BRc} = \tilde{i}_0 + d_H L_2 \qquad y_{BRc} = \tilde{j}_0 + d_V L_4 \qquad (5.434)$$
By considering now the four lines with the maximum values of p identified by the Hough transform, we can work out the observed coordinates of the four outer corners of the grid. Let us call them:

$$(x_{TLd}, y_{TLd}) \quad (x_{BLd}, y_{BLd}) \quad (x_{TRd}, y_{TRd}) \quad (x_{BRd}, y_{BRd}) \qquad (5.435)$$
Figure 5.36: The thick lines represent the imaged grid. The intersections of the thin lines represent the pixel centres. The four corners of any rectangle are identified as top left (TL), top right (TR), bottom left (BL) and bottom right (BR). (a) The imaged grid may not be perfectly aligned with the image axes. (b) After we correct the image by rotating it about the image centre. (c) After we shift the corner of the inner-most rectangle (highlighted with grey) that is nearest to the image centre to coincide with the image centre. In this case it is the TL corner of the rectangle.
The assumed transformation model is given by (5.404) and (5.405) as

$$x_d = x_c(1 + k_1 r_c) \qquad y_d = y_c(1 + k_2 r_c) \qquad (5.436)$$
where I decided to use different parameter values for the vertical and the horizontal distortion. I can solve these equations for $k_1$ and $k_2$:

$$k_1 = \frac{x_d - x_c}{x_c\sqrt{x_c^2+y_c^2}} \qquad k_2 = \frac{y_d - y_c}{y_c\sqrt{x_c^2+y_c^2}} \qquad (5.437)$$

For each one of these equations I shall have four pairs of correct and observed (distorted) values, given by (5.434) and (5.435), respectively. If I substitute them in (5.437) I shall obtain four values for $k_1$ and four values for $k_2$. I will average the four values to work out a single value for $k_1$ and a single value for $k_2$.
Figure 5.37: (a) The four black dots represent the inner-most cell of the grid, around the image centre, that has suffered the least lens distortion. From its sides we can compute the true size of each cell in terms of pixels (the values of d_H and d_V). (b) After we have shifted the nearest corner of the inner-most cell to coincide with the image centre, we can compute the true locations of the four outer corners of the grid (marked by black dots), as integer multiples of d_H and d_V. The number of integer multiples we need in each case is specified by the parameters L_1, L_2, L_3 and L_4.
What is the take home message of this chapter?
This chapter explored some techniques used to correct (ie restore) the damaged values of an image. The problem of restoration requires some prior knowledge concerning the original uncorrupted signal or the imaged scene, and in that way it differs from the image enhancement problem. Geometric restoration of an image requires knowledge of the correct location of some reference points.
Grey level restoration of an image requires knowledge of some statistical properties of the corrupting noise and the perfect image itself, as well as a model for the blurring process. Often, we bypass the requirement for knowing the statistical properties of the original image by imposing some spatial smoothness constraints on the solution, based on the heuristic that the world is largely smooth. Having chosen the correct model for the degradation process and the uncorrupted image, we have then to solve the problem of recovering the original image values.
The full problem of image restoration is a very difficult one, as it is nonlinear. It can be solved with the help of global optimisation approaches. However, simpler solutions can be found, in the form of convolution filters, if we make the assumption that the degradation process is shift invariant.
Chapter 6
Image Segmentation and Edge Detection
What is this chapter about?
This chapter is about the image processing techniques that are used to prepare an image as an input to an automatic vision system. These techniques perform image segmentation and edge detection, and their purpose is to extract information from an image in such a way that the output image contains much less information than the original one, but the little information it contains is much more relevant to the other modules of an automatic vision system than the discarded information.

What exactly is the purpose of image segmentation and edge detection?

The purpose of image segmentation and edge detection is to extract the outlines of different regions in the image, that is, to divide the image into regions which are made up of pixels that have something in common. For example, they may have similar brightness, or colour, which may indicate that they belong to the same object or facet of an object. An example is shown in figure 6.1.
Figure 6.1: An original image, its segmentation and its edge map.
6.1 Image segmentation
How can we divide an image into uniform regions?
One of the simplest methods is that of histogramming and thresholding. If we plot the number of pixels which have a specific grey value versus that value, we create the histogram of the image. Properly normalised, the histogram is essentially the probability density function of the grey values of the image (see page 236).
Assume that we have an image consisting of a bright object on a dark background and assume that we want to extract the object. For such an image, the histogram will have two peaks and a valley between them. We can then choose as the threshold the grey value which corresponds to the valley of the histogram, indicated by $t_0$ in figure 6.2a, and label all pixels with grey values greater than $t_0$ as object pixels and all pixels with grey values smaller than $t_0$ as background pixels.
Figure 6.2: The histogram (number of pixels versus grey value) of an image with a bright object on a dark background. (a) A single threshold $t_0$ at the valley of the histogram. (b) A low threshold $t_1$ and a high threshold $t_2$ on either side of the valley.
What do we mean by labelling an image?
When we say we extract an object in an image, we mean that we identify the pixels that make it up. To express this information, we create an array of the same size as the original image and we give each pixel a label. All pixels that make up the object are given the same label and all pixels that make up the background are given a different label. The label is usually a number, but it could be anything: a letter or a colour. It is essentially a name and it has symbolic meaning only. Labels, therefore, cannot be treated as numbers. Label images cannot be processed in the same way as grey level images. Often label images are also referred to as classified images, as they indicate the class to which each pixel belongs.
What can we do if the valley in the histogram is not very sharply defined?

If there is no clear valley in the histogram of an image, it means that there are several pixels in the background which have the same grey value as pixels in the object, and vice versa. Such pixels are particularly encountered near the boundaries of the objects, which may be fuzzy and not sharply defined. One may then use what is called hysteresis thresholding: instead of one, two threshold values are chosen, one on either side of the valley (see figure 6.2b). The higher of the two thresholds is used to define the hard core of the object. The lower is used in conjunction with the spatial proximity of the pixels: a pixel with intensity value greater than the smaller threshold but less than the larger threshold is labelled as an object pixel only if it is adjacent to a pixel which is a core object pixel. Sometimes we use different rules: we may label such a pixel as an object pixel only if the majority of its neighbouring pixels have already been labelled as object pixels.
Figure 6.3 shows an image depicting a dark object on a bright background and its histogram. In 6.3c the image is segmented with a single threshold, marked $t_0$ in the histogram, while in 6.3d it has been segmented using two thresholds, marked $t_1$ and $t_2$ in the histogram.

Figure 6.3: Simple thresholding versus hysteresis thresholding. (a) Original image. (b) Histogram of (a) (number of pixels versus grey level). (c) Thresholded with $t_0 = 134$. (d) Thresholded with $t_1 = 83$ and $t_2 = 140$.
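A simple sketch of hysteresis thresholding as described above follows, using the rule that a weak pixel becomes an object pixel if it is adjacent (8-connected) to an already labelled object pixel; repeated sweeps propagate the labels. The threshold values are illustrative, and np.roll wraps around at the image borders, which is good enough for this illustration:

```python
import numpy as np

def hysteresis_threshold(img, t_low, t_high):
    """Label as object: pixels above t_high (the core), plus pixels above t_low
    that are 8-connected to an already labelled object pixel."""
    strong = img > t_high
    weak = img > t_low
    labels = strong.copy()
    changed = True
    while changed:                      # propagate until no more pixels are added
        grown = labels.copy()
        for di in (-1, 0, 1):           # dilate the current labels by one pixel
            for dj in (-1, 0, 1):
                grown |= np.roll(np.roll(labels, di, axis=0), dj, axis=1)
        new_labels = weak & grown
        changed = bool((new_labels != labels).any())
        labels = new_labels
    return labels

rng = np.random.default_rng(3)
img = rng.uniform(0, 100, (32, 32)); img[8:24, 8:24] += 100
print(hysteresis_threshold(img, t_low=83, t_high=140).sum())
```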
Alternatively, we may try to choose the global threshold value in an optimal way, ie by trying to minimise the number of misclassified pixels.
How can we minimise the number of misclassified pixels?

We can minimise the number of misclassified pixels if we have some prior knowledge about the distributions of the grey values that make up the object and the background. For example, if we know that the object occupies a certain fraction θ of the area of the picture, then this is the prior probability for a pixel to be an object pixel. Clearly, the background pixels will occupy 1 − θ of the area and a pixel will have 1 − θ prior probability to be a background pixel. We may then choose the threshold so that the pixels we classify as object pixels are a fraction θ of the total number of pixels. This method is called the p-tile method. Further, if we also happen to know the probability density functions of the grey values of the object pixels and the background pixels, then we may choose the threshold that exactly minimises the error.
How can we choose the minimum error threshold?
Let us assume that the pixels which make up the object are distributed according to the probability density function $p_o(x)$ and the pixels which make up the background are distributed according to the function $p_b(x)$.

Figure 6.4: The probability density functions of the grey values of the pixels that make up the object ($p_o(x)$) and the background ($p_b(x)$). Their weighted sum, ie $p_o(x)$ and $p_b(x)$ multiplied with the total number of pixels that make up the object, $N_o$, and the background, $N_b$, respectively, and added, is the histogram of the image (depicted here by the thick black line). If we use threshold t, the pixels that make up the cross-over tails of the two probability density functions (shaded grey in this figure) will be misclassified. The minimum error threshold method tries to minimise the total number of misclassified pixels and it may yield two thresholds, one on either side of the narrower probability density function, implying that the pixels on the far right have more chance of coming from the long tail of the fat probability density function than from the short tail of the narrow probability density function.
Assume that we choose a threshold value t (see figure 6.4). Then the error committed by misclassifying object pixels as background pixels will be given by

$$\int_{-\infty}^{t} p_o(x)\,dx \qquad (6.1)$$

and the error committed by misclassifying background pixels as object pixels is:

$$\int_{t}^{+\infty} p_b(x)\,dx \qquad (6.2)$$

In other words, the error that we commit arises from misclassifying the two tails of the two probability density functions on either side of threshold t. Let us also assume that the fraction of the pixels that make up the object is θ, and, by inference, the fraction of the pixels that make up the background is 1 − θ. Then the total error is:

$$E(t) = \theta\int_{-\infty}^{t} p_o(x)\,dx + (1-\theta)\int_{t}^{+\infty} p_b(x)\,dx \qquad (6.3)$$

We would like to choose t so that E(t) is minimum. We take the first derivative of E(t) with respect to t (see Box 4.9, on page 348) and set it to zero:

$$\frac{\partial E}{\partial t} = \theta p_o(t) - (1-\theta)p_b(t) = 0 \;\Rightarrow\; \theta p_o(t) = (1-\theta)p_b(t) \qquad (6.4)$$

The solution of this equation gives the minimum error threshold, for any type of probability density functions that are used to model the two pixel populations.
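Equation (6.4) can also be solved numerically for any pair of densities, by scanning candidate thresholds and picking the one where the two weighted densities cross. The brute-force sketch below uses Gaussian densities and made-up parameter values purely as an illustration:

```python
import numpy as np

def minimum_error_threshold(p_o, p_b, theta, lo, hi, n=10001):
    """Find t in [lo, hi] where theta*p_o(t) - (1-theta)*p_b(t) is closest to zero (eq. 6.4)."""
    t = np.linspace(lo, hi, n)
    diff = theta * p_o(t) - (1.0 - theta) * p_b(t)
    idx = np.argmin(np.abs(diff))        # candidate closest to a crossing
    return t[idx]

def gaussian(mu, sigma):
    return lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative populations: bright object, darker background, object fraction 0.4.
t_star = minimum_error_threshold(gaussian(150, 20), gaussian(80, 15), 0.4, 80, 150)
print(t_star)
```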
Example B6.1

Derive equation (6.4) from (6.3).

We apply the Leibniz rule, given by equation (4.114) on page 348, to perform the differentiation of E(t) given by equation (6.3). We have the following correspondences: the parameter λ corresponds to t.
For the first integral:
a(λ) → −∞ (a constant, with zero derivative)
b(λ) → t
f(x; λ) → $p_o(x)$ (independent from the parameter with respect to which we differentiate)
For the second integral:
a(λ) → t
b(λ) → +∞ (a constant, with zero derivative)
f(x; λ) → $p_b(x)$ (independent from t)
Equation (6.4) then follows.
Example 6.2

The grey values of the object and the background pixels are distributed according to probability density function

p(x) = \begin{cases} \frac{3}{4a^3}\left[a^2 - (x-b)^2\right] & \text{for } b-a \leq x \leq b+a \\ 0 & \text{otherwise} \end{cases}    (6.5)

with a = 1 and b = 5 for the background, and a = 2 and b = 7 for the object. Sketch the two probability density functions and determine the range of possible thresholds.

Figure 6.5: The two probability density functions, p_b(x) (narrow, centred at 5) and p_o(x) (broad, centred at 7), plotted against x. The range of possible thresholds is from 5 to 6.
Example 6.3

If the object pixels are eight-ninths (8/9) of the total number of pixels, determine the threshold that minimises the fraction of misclassified pixels for the problem of example 6.2.

We substitute into equation (6.4) the following:

\theta = \frac{8}{9} \qquad 1-\theta = \frac{1}{9} \qquad p_b(t) = \frac{3}{4}\left(-t^2 + 10t - 24\right) \qquad p_o(t) = \frac{3}{32}\left(-t^2 + 14t - 45\right)    (6.6)

Then:

\frac{1}{9}\,\frac{3}{4}\left(-t^2 + 10t - 24\right) = \frac{8}{9}\,\frac{3}{32}\left(-t^2 + 14t - 45\right) \Rightarrow -24 + 10t = -45 + 14t \Rightarrow 4t = 21 \Rightarrow t = \frac{21}{4} = 5.25    (6.7)
Example 6.4

The grey values of the object and the background pixels are distributed according to the probability density function

p(x) = \begin{cases} \frac{\pi}{4a}\cos\frac{\pi(x-x_0)}{2a} & \text{for } x_0-a \leq x \leq x_0+a \\ 0 & \text{otherwise} \end{cases}    (6.8)

with x_0 = 1 and a = 1 for the objects, and x_0 = 3 and a = 2 for the background. Sketch the two probability density functions. If one-third of the total number of pixels are object pixels, determine the fraction of misclassified object pixels by optimal thresholding.

Figure 6.6: The two probability density functions, p_o(x) (narrow, centred at 1) and p_b(x) (broad, centred at 3), plotted against x. The range of possible thresholds is from 1 to 2.

Apply formula (6.4) with:

\theta = \frac{1}{3} \qquad 1-\theta = \frac{2}{3} \qquad p_o(x) = \frac{\pi}{4}\cos\frac{\pi(x-1)}{2} \qquad p_b(x) = \frac{\pi}{8}\cos\frac{\pi(x-3)}{4}    (6.9)

Equation (6.4) becomes:

\frac{1}{3}\,\frac{\pi}{4}\cos\frac{\pi(t-1)}{2} = \frac{2}{3}\,\frac{\pi}{8}\cos\frac{\pi(t-3)}{4} \Rightarrow \cos\frac{\pi(t-1)}{2} = \cos\frac{\pi(t-3)}{4} \Rightarrow \frac{\pi(t-1)}{2} = \pm\frac{\pi(t-3)}{4}    (6.10)

Consider first \frac{t-1}{2} = \frac{t-3}{4} \Rightarrow 2t - 2 = t - 3 \Rightarrow t = -1.
This value is outside the acceptable range, so it is a meaningless solution. Then:

\frac{\pi(t-1)}{2} = -\frac{\pi(t-3)}{4} \Rightarrow 2t - 2 = -t + 3 \Rightarrow 3t = 5 \Rightarrow t = \frac{5}{3}    (6.11)

This is the threshold for minimum error. The fraction of misclassified object pixels will be given by all those object pixels that have grey value greater than 5/3. We define a new variable of integration y ≡ x − 1, to obtain:

\int_{5/3}^{2}\frac{\pi}{4}\cos\frac{\pi(x-1)}{2}\,dx = \frac{\pi}{4}\int_{2/3}^{1}\cos\frac{\pi y}{2}\,dy = \frac{\pi}{4}\,\frac{2}{\pi}\left[\sin\frac{\pi y}{2}\right]_{2/3}^{1} = \frac{1}{2}\left(\sin\frac{\pi}{2} - \sin\frac{\pi}{3}\right) = \frac{1}{2}\left(1 - \sin 60^{\circ}\right) = \frac{1}{2}\left(1 - \frac{\sqrt{3}}{2}\right) = \frac{2 - 1.7}{4} = \frac{0.3}{4} = 0.075 = 7.5\%    (6.12)
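The threshold and the misclassified fraction of this example can be checked numerically. The sketch below is an illustration, not part of the original example; it uses scipy for the root finding and the integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# The two densities of example 6.4 (object: x0 = 1, a = 1; background: x0 = 3, a = 2).
def p_o(x):
    return np.where(np.abs(x - 1) <= 1, (np.pi / 4) * np.cos(np.pi * (x - 1) / 2), 0.0)

def p_b(x):
    return np.where(np.abs(x - 3) <= 2, (np.pi / 8) * np.cos(np.pi * (x - 3) / 4), 0.0)

theta = 1.0 / 3.0                                         # fraction of object pixels
t = brentq(lambda x: theta * p_o(x) - (1 - theta) * p_b(x), 1.0 + 1e-6, 2.0 - 1e-6)
frac, _ = quad(p_o, t, 2.0)                               # object pixels above the threshold
print("threshold:", round(t, 4))                          # expected 5/3 = 1.6667
print("misclassified object fraction:", round(frac, 4))   # expected 0.075
```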
What is the minimum error threshold when object and background pixels are normally distributed?

Let us assume that the pixels that make up the object are normally distributed with mean μ_o and standard deviation σ_o and the pixels that make up the background are normally distributed with mean μ_b and standard deviation σ_b:

p_o(x) = \frac{1}{\sqrt{2\pi}\sigma_o}\exp\left(-\frac{(x-\mu_o)^2}{2\sigma_o^2}\right) \qquad p_b(x) = \frac{1}{\sqrt{2\pi}\sigma_b}\exp\left(-\frac{(x-\mu_b)^2}{2\sigma_b^2}\right)    (6.13)
Upon substitution into equation (6.4), we obtain:

\frac{\theta}{\sqrt{2\pi}\sigma_o}\exp\left(-\frac{(t-\mu_o)^2}{2\sigma_o^2}\right) = \frac{1-\theta}{\sqrt{2\pi}\sigma_b}\exp\left(-\frac{(t-\mu_b)^2}{2\sigma_b^2}\right)

\Rightarrow \exp\left(-\frac{(t-\mu_o)^2}{2\sigma_o^2} + \frac{(t-\mu_b)^2}{2\sigma_b^2}\right) = \frac{\sigma_o(1-\theta)}{\sigma_b\theta}

\Rightarrow -\frac{(t-\mu_o)^2}{2\sigma_o^2} + \frac{(t-\mu_b)^2}{2\sigma_b^2} = \ln\left[\frac{\sigma_o(1-\theta)}{\sigma_b\theta}\right]

\Rightarrow (t^2 + \mu_b^2 - 2t\mu_b)\sigma_o^2 - (t^2 + \mu_o^2 - 2\mu_o t)\sigma_b^2 = 2\sigma_o^2\sigma_b^2\ln\left[\frac{\sigma_o(1-\theta)}{\sigma_b\theta}\right]

\Rightarrow (\sigma_o^2 - \sigma_b^2)t^2 + 2(\mu_o\sigma_b^2 - \mu_b\sigma_o^2)t + \mu_b^2\sigma_o^2 - \mu_o^2\sigma_b^2 - 2\sigma_o^2\sigma_b^2\ln\left[\frac{\sigma_o(1-\theta)}{\sigma_b\theta}\right] = 0    (6.14)

This is a quadratic equation in t. It has two solutions in general, except when the two populations have the same standard deviation. If σ_o = σ_b ≡ σ, the above expression takes the form:

2(\mu_o - \mu_b)\sigma^2 t + (\mu_b^2 - \mu_o^2)\sigma^2 - 2\sigma^4\ln\frac{1-\theta}{\theta} = 0 \Rightarrow t = \frac{\sigma^2}{\mu_o - \mu_b}\ln\frac{1-\theta}{\theta} + \frac{\mu_o + \mu_b}{2}    (6.15)

This is the minimum error threshold.
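For given μ_o, σ_o, μ_b, σ_b and θ, the roots of equation (6.14), or the single threshold of equation (6.15) when the standard deviations are equal, may be computed as in the following sketch; the helper name and the example values are illustrative.

```python
import numpy as np

def gaussian_min_error_thresholds(mu_o, s_o, mu_b, s_b, theta):
    """Roots of equation (6.14); single root of (6.15) if s_o == s_b."""
    if np.isclose(s_o, s_b):
        s = s_o
        return [s**2 / (mu_o - mu_b) * np.log((1 - theta) / theta) + (mu_o + mu_b) / 2]
    A = s_o**2 - s_b**2
    B = 2 * (mu_o * s_b**2 - mu_b * s_o**2)
    C = (mu_b**2 * s_o**2 - mu_o**2 * s_b**2
         - 2 * s_o**2 * s_b**2 * np.log(s_o * (1 - theta) / (s_b * theta)))
    return sorted(np.roots([A, B, C]).real)

# Example: bright object (mean 150) on a dark background (mean 60).
# Only roots that fall inside the grey value range are meaningful thresholds.
print(gaussian_min_error_thresholds(150, 25, 60, 10, theta=0.4))
```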
What is the meaning of the two solutions of the minimum error threshold equation?

When σ_o ≠ σ_b, the quadratic term in (6.14) does not vanish and we have two thresholds, t_1 and t_2. These turn out to be one on either side of the sharpest probability density function. Let us assume that the sharpest probability density function is that of the object pixels (see figure 6.4). Then the correct thresholding will be to label as object pixels only those pixels with grey value x such that t_1 < x < t_2.

The meaning of the second threshold is that the flatter probability density function has such a long tail, that the pixels with grey values x ≥ t_2 are more likely to belong to the long tail of the flat probability density function, than to the sharper probability density function.
Example 6.5

The grey values of the object pixels are distributed according to probability density function

p_o(x) = \frac{1}{2\sigma_o}\exp\left(-\frac{|x-\mu_o|}{\sigma_o}\right)    (6.16)

while the grey values of the background pixels are distributed according to probability density function:

p_b(x) = \frac{1}{2\sigma_b}\exp\left(-\frac{|x-\mu_b|}{\sigma_b}\right)    (6.17)

If μ_o = 60, μ_b = 40, σ_o = 10 and σ_b = 5, find the thresholds that minimise the fraction of misclassified pixels, when we know that the object occupies two-thirds of the area of the image.
We substitute in equation (6.4):

\frac{\theta}{2\sigma_o}e^{-\frac{|t-\mu_o|}{\sigma_o}} = \frac{1-\theta}{2\sigma_b}e^{-\frac{|t-\mu_b|}{\sigma_b}} \Rightarrow \exp\left(-\frac{|t-\mu_o|}{\sigma_o} + \frac{|t-\mu_b|}{\sigma_b}\right) = \frac{\sigma_o(1-\theta)}{\sigma_b\theta}    (6.18)

We have θ = 2/3 (therefore, 1 − θ = 1/3) and σ_o = 10, σ_b = 5. Then:

-\frac{|t-\mu_o|}{\sigma_o} + \frac{|t-\mu_b|}{\sigma_b} = \ln\frac{10\times\frac{1}{3}}{5\times\frac{2}{3}} = \ln 1 = 0    (6.19)

We have the following cases:

t < μ_b < μ_o: |t − μ_o| = −t + μ_o and |t − μ_b| = −t + μ_b, so
\frac{t-\mu_o}{\sigma_o} + \frac{-t+\mu_b}{\sigma_b} = 0 \Rightarrow (\sigma_b - \sigma_o)t = \sigma_b\mu_o - \sigma_o\mu_b \Rightarrow t = \frac{\mu_o\sigma_b - \mu_b\sigma_o}{\sigma_b - \sigma_o} = \frac{60\times 5 - 10\times 40}{5 - 10} \Rightarrow t_1 = 20

μ_b < t < μ_o: |t − μ_o| = −t + μ_o and |t − μ_b| = t − μ_b, so
\frac{t-\mu_o}{\sigma_o} + \frac{t-\mu_b}{\sigma_b} = 0 \Rightarrow (\sigma_o + \sigma_b)t = \mu_o\sigma_b + \mu_b\sigma_o \Rightarrow t = \frac{\mu_o\sigma_b + \mu_b\sigma_o}{\sigma_b + \sigma_o} = \frac{60\times 5 + 10\times 40}{15} \Rightarrow t_2 \simeq 47

μ_b < μ_o < t: |t − μ_o| = t − μ_o and |t − μ_b| = t − μ_b, so
\frac{-t+\mu_o}{\sigma_o} + \frac{t-\mu_b}{\sigma_b} = 0 \Rightarrow (\sigma_o - \sigma_b)t = \mu_b\sigma_o - \mu_o\sigma_b \Rightarrow t = \frac{-\mu_o\sigma_b + \mu_b\sigma_o}{\sigma_o - \sigma_b} = \frac{-60\times 5 + 10\times 40}{5} = 20 < \mu_o    (6.20)

The last solution is rejected because t was assumed greater than μ_o.
So, there are two thresholds, t_1 = 20 and t_2 = 47.
Figure 6.7: The two probability density functions p_b(x) and p_o(x) (scaled by 100 on the vertical axis), plotted against the grey value x, with the two thresholds t_1 and t_2 marked. Only pixels with grey values between t_1 and t_2 should be classified as background pixels in order to minimise the error. Notice how these two thresholds do not appear very intuitive. This is because the object pixels are twice as many as the background pixels, and this shifts the thresholds away from the intuitive positions, in order to minimise the total number of misclassified pixels.
How can we estimate the parameters of the Gaussian probability density functions that represent the object and the background?

In general, a multimodal histogram may be modelled by a Gaussian mixture model (GMM). The parameters of such a model may be estimated by using the expectation maximisation (EM) algorithm. In the case of a single object in a uniform background, the mixture model consists of two Gaussians. The following EM algorithm is for two classes.

Step 0: Decide the number of Gaussians you want to fit to your data. Say K = 2. Guess initial values for their parameters, ie μ_1, μ_2, σ_1 and σ_2. Select a value for the threshold of convergence X, say X = 0.1.
Step 1: For every pixel, (i, j), with grey value g_{ij}, compute the probability with which it belongs to the two classes:

p_{ij1} \equiv \frac{1}{\sqrt{2\pi}\sigma_1}e^{-\frac{(g_{ij}-\mu_1)^2}{2\sigma_1^2}} \qquad p_{ij2} \equiv \frac{1}{\sqrt{2\pi}\sigma_2}e^{-\frac{(g_{ij}-\mu_2)^2}{2\sigma_2^2}}    (6.21)

Step 2: Divide p_{ij1} and p_{ij2} with p_{ij1} + p_{ij2}, to produce the normalised probabilities \bar{p}_{ij1} and \bar{p}_{ij2}, respectively.
Step 3: For every class, compute updated values of its parameters, taking into consideration the probability with which each pixel belongs to that class:

\bar{\mu}_1 = \frac{\sum_{(i,j)}\bar{p}_{ij1}g_{ij}}{\sum_{(i,j)}\bar{p}_{ij1}} \qquad \bar{\sigma}_1 = \sqrt{\frac{\sum_{(i,j)}\bar{p}_{ij1}(g_{ij}-\bar{\mu}_1)^2}{\sum_{(i,j)}\bar{p}_{ij1}}} \qquad \bar{\mu}_2 = \frac{\sum_{(i,j)}\bar{p}_{ij2}g_{ij}}{\sum_{(i,j)}\bar{p}_{ij2}} \qquad \bar{\sigma}_2 = \sqrt{\frac{\sum_{(i,j)}\bar{p}_{ij2}(g_{ij}-\bar{\mu}_2)^2}{\sum_{(i,j)}\bar{p}_{ij2}}}    (6.22)
Step 4: Check whether the new parameter values are very different from the previous values: if |\bar{\mu}_1 - \mu_1| < X\sigma_1 and |\bar{\mu}_2 - \mu_2| < X\sigma_2 and |\bar{\sigma}_1 - \sigma_1| < X\sigma_1 and |\bar{\sigma}_2 - \sigma_2| < X\sigma_2, exit.
Else, set \mu_1 = \bar{\mu}_1, \mu_2 = \bar{\mu}_2, \sigma_1 = \bar{\sigma}_1 and \sigma_2 = \bar{\sigma}_2 and go to Step 1.

Step 5: Compute the fraction of pixels that belong to the first Gaussian:

\theta = \frac{1}{N}\sum_{(i,j)}\bar{p}_{ij1}    (6.23)

where N is the total number of pixels in the image. A compact code sketch of these steps is given below.
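The sketch below assumes the image is available as a numpy array, uses illustrative initial guesses, and interprets the convergence test of Step 4 as comparing each change against X times the current σ of the class; these are assumptions, not prescriptions of the text.

```python
import numpy as np

def em_two_gaussians(image, mu=(64.0, 192.0), sigma=(30.0, 30.0), X=0.1, max_iter=100):
    """Fit a two-component Gaussian mixture to the grey values of `image` (Steps 0-5)."""
    g = image.astype(float).ravel()
    mu1, mu2 = mu
    s1, s2 = sigma
    for _ in range(max_iter):
        # Step 1: unnormalised class probabilities for every pixel.
        p1 = np.exp(-(g - mu1)**2 / (2 * s1**2)) / (np.sqrt(2 * np.pi) * s1)
        p2 = np.exp(-(g - mu2)**2 / (2 * s2**2)) / (np.sqrt(2 * np.pi) * s2)
        # Step 2: normalise so that the two probabilities sum to 1 for each pixel.
        total = p1 + p2
        p1, p2 = p1 / total, p2 / total
        # Step 3: update the parameters of each class.
        new_mu1 = np.sum(p1 * g) / np.sum(p1)
        new_s1 = np.sqrt(np.sum(p1 * (g - new_mu1)**2) / np.sum(p1))
        new_mu2 = np.sum(p2 * g) / np.sum(p2)
        new_s2 = np.sqrt(np.sum(p2 * (g - new_mu2)**2) / np.sum(p2))
        # Step 4: stop when all parameters change by less than X times the class sigma.
        converged = (abs(new_mu1 - mu1) < X * s1 and abs(new_mu2 - mu2) < X * s2 and
                     abs(new_s1 - s1) < X * s1 and abs(new_s2 - s2) < X * s2)
        mu1, mu2, s1, s2 = new_mu1, new_mu2, new_s1, new_s2
        if converged:
            break
    # Step 5: fraction of pixels that belong to the first Gaussian.
    theta = np.mean(p1)
    return mu1, s1, mu2, s2, theta
```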
Example 6.6

Assuming Gaussian probability density functions for the object and the background in image 6.3a, estimate their parameters with the EM algorithm, derive the optimal threshold and use it to threshold the image.

The EM algorithm was initialised with μ_o = 64, μ_b = 192 and σ_o = σ_b = 3. It took two iterations to converge with X = 0.1 to: μ_o = 53.3, μ_b = 182.0, σ_o = 31.7, σ_b = 16.5, A_o = 588, A_b = 2207 and θ = 0.332. The optimal threshold was estimated by solving:

731.88t^2 - 336800t + 3.1783\times 10^7 = 0    (6.24)

This equation had two roots: t_1 = 132.54 and t_2 = 327.65. Using t_1 as the optimal threshold, we obtained the result shown in figure 6.8.

Figure 6.8: (a) The image thresholded with threshold 133. (b) The histogram of the image (number of pixels versus grey level) with the two Gaussians, estimated by the EM algorithm, overlayed and the chosen threshold (t = 133) marked.
Example 6.7

Use simulated annealing to fit the two peaks of the histogram of figure 6.3a with a mixture of two Gaussians.

Let us assume that the pairs of values of the histogram are (x_i, y_i), where x_i is the value of the centre of bin i and y_i is the bin value. It is assumed that \sum_{i=1}^{N} y_i \Delta y = 1, that is, it is assumed that the histogram has been normalised by dividing all its entries with the total number of pixels and the bin width Δy (see page 236). Here N is the number of bins of the histogram. Since the histogram has two main peaks, we shall try to fit it with the mixture of two Gaussians. We wish to select the parameters of the two Gaussians and their mixing proportions so that the square of the difference between the value obtained by the mixture of the Gaussians and the true histogram value is minimised. That is, we define the cost function of the problem as follows:

E \equiv \sum_{i=1}^{N}\left[y_i - \frac{\frac{\theta}{\sqrt{2\pi}\sigma_1}e^{-\frac{(x_i-\mu_1)^2}{2\sigma_1^2}} + \frac{1-\theta}{\sqrt{2\pi}\sigma_2}e^{-\frac{(x_i-\mu_2)^2}{2\sigma_2^2}}}{\sum_{m=1}^{N}\left(\frac{\theta}{\sqrt{2\pi}\sigma_1}e^{-\frac{(x_m-\mu_1)^2}{2\sigma_1^2}} + \frac{1-\theta}{\sqrt{2\pi}\sigma_2}e^{-\frac{(x_m-\mu_2)^2}{2\sigma_2^2}}\right)}\right]^2    (6.25)
Here θ is the mixing proportion with which the first Gaussian contributes to the mixture and 1 − θ is the mixing proportion with which the second Gaussian contributes to the mixture. Obviously, 0 < θ < 1. If we assume that the histogram has been constructed with one bin per grey value, Δy = 1 and N = 256. Also, parameters μ_1, μ_2, σ_1 and σ_2 must take values in the range [0, 255]. The simulated annealing algorithm then, which we may use to identify the values of all five parameters of the problem, is as follows.

Step 1: Select plausible values for parameters μ_1, μ_2, σ_1, σ_2 and θ, by inspecting the histogram you wish to fit. For the particular problem, we selected: μ_1 = 239, μ_2 = 22, σ_1 = 5, σ_2 = 5 and θ = 0.5.

Step 2: Compute E from (6.25), and call it E_old. Set E_min = E_old. Set μ_1^min, μ_2^min, σ_1^min, σ_2^min and θ^min equal to the current values of these parameters.

Step 3: Select a starting temperature T_0 = 10. Set k = 0. Set α = 0.99.
Step 4: Increase k by 1 and set T_k = αT_{k−1}.

Step 5: Consider one of the parameters. Select randomly a new value for it, uniformly distributed in the range of its plausible values.

Step 6: For the new set of parameter values, compute E from (6.25), and call it E_new.

Step 7: If E_new ≤ E_old, accept the new parameter value. Set E_old = E_new. If E_new < E_min, set E_min = E_new and μ_1^min, μ_2^min, σ_1^min, σ_2^min and θ^min equal to the parameter values with which E_new was computed. If all parameters have been considered in turn, go to Step 4. If not all parameters have been considered in this iteration step, go to Step 5.

Step 8: If E_new > E_old, compute q \equiv e^{-(E_{new} - E_{old})/T_k}. Then draw a random number ρ, uniformly distributed between 0 and 1. If ρ ≤ q, accept the new parameter value. Set E_old = E_new. If all parameters have been considered in turn, go to Step 4. If not all parameters have been considered in this iteration step, go to Step 5.

Step 9: If ρ > q, retain the old parameter value. If all parameters have been considered in turn, go to Step 4. If not all parameters have been considered in this iteration step, go to Step 5.
Exit the algorithm when a certain number of iterations has been performed, or when |E_old − E_new|/E_old < 0.01. In this example, we used as termination criterion the fixed number of iterations, set to 1000. Figure 6.9a shows how the value of the cost function E changes as a function of the iterations. The output values of the algorithm are μ_1^min, μ_2^min, σ_1^min, σ_2^min and θ^min, ie the values of the parameters for which E obtained its minimum value during the whole process. For this example, these values were: μ_1^min = 179.0, μ_2^min = 8.1, σ_1^min = 8.2, σ_2^min = 120.1 and θ^min = 0.35. Figure 6.9b shows how the mixture of the two Gaussians fits the histogram of the image.

Once we have fitted the histogram with the two Gaussians, we can apply formula (6.14) to derive the minimum error threshold. For this example, two valid roots were obtained: t_1 = 159.5 and t_2 = 200.1. This is because the Gaussian, identified by the algorithm for modelling the part of the histogram that corresponds to the background grey values, is much narrower than the Gaussian that models the object. So, we have to use both these thresholds to identify the object pixels.
Figure 6.9: (a) The value of the cost function E versus the number of iterations k. (b) The two Gaussians that are identified to model the object and the background, overlaid on the normalised histogram (frequency versus grey level).
The result is shown in figure 6.10. This result is not satisfactory. The use of the second threshold to restrict the background pixels in the range [160, 200] clearly created false object regions in the background. This is not because the significance of the second threshold is wrongly understood, but because the models do not fit very well the two humps of the histogram. So, although the theory is correct, the practice is wrong, because we model with Gaussians two probability density functions that are not Gaussians.

Figure 6.10: In white, the pixels with grey values in the range [160, 200].
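One possible way to code the annealing loop of this example is sketched below; this is an illustration, not the implementation used for the figures. The cost follows equation (6.25), while the parameter order (μ_1, σ_1, μ_2, σ_2, θ) and the search ranges are assumptions.

```python
import numpy as np

def mixture(x, mu1, s1, mu2, s2, theta):
    g1 = theta * np.exp(-(x - mu1)**2 / (2 * s1**2)) / (np.sqrt(2*np.pi) * s1)
    g2 = (1 - theta) * np.exp(-(x - mu2)**2 / (2 * s2**2)) / (np.sqrt(2*np.pi) * s2)
    return g1 + g2

def cost(params, x, y):
    m = mixture(x, *params)
    m = m / np.sum(m)                 # normalise over the bins, as in equation (6.25)
    return np.sum((y - m)**2)

def anneal(x, y, params, ranges, T0=10.0, alpha=0.99, iterations=1000, rng=np.random):
    params = list(params)
    E_old = E_min = cost(params, x, y)
    best = params[:]
    T = T0
    for _ in range(iterations):
        T *= alpha                                 # cooling schedule T_k = alpha * T_{k-1}
        for i, (lo, hi) in enumerate(ranges):      # perturb each parameter in turn
            old = params[i]
            params[i] = rng.uniform(lo, hi)        # new value, uniform in its range
            E_new = cost(params, x, y)
            if E_new <= E_old or rng.uniform() <= np.exp(-(E_new - E_old) / T):
                E_old = E_new                      # accept (always downhill, sometimes uphill)
                if E_new < E_min:
                    E_min, best = E_new, params[:]
            else:
                params[i] = old                    # reject: keep the old value
    return best, E_min

# Usage with a normalised 256-bin histogram y over bin centres x:
# x = np.arange(256); y = hist / hist.sum()
# best, E = anneal(x, y, [239, 5, 22, 5, 0.5],
#                  [(0, 255), (1, 50), (0, 255), (1, 50), (0.01, 0.99)])
```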
What are the drawbacks of the minimum error threshold method?

The method has various drawbacks. For a start, we must know the prior probabilities for the pixels to belong to the object or the background, ie we must know θ. Next, we must know the distributions of the two populations. Often, it is possible to approximate these probability density functions with normal probability density functions, but even in that case one would have to estimate the parameters μ and σ of each function.
Is there any method that does not depend on the availability of models for the distributions of the object and the background pixels?

A method which does not depend on modelling the probability density functions is the Otsu method. Unlike the previous analysis, this method has been developed directly in the discrete domain. The method identifies the threshold that maximises the distinctiveness of the two populations to which it divides the image. This distinctiveness is expressed by the interclass variance (see Box 6.1), which can be shown to be

\sigma_B^2(t) = \frac{[\mu\theta(t) - \mu(t)]^2}{\theta(t)[1-\theta(t)]}    (6.26)

where p_x are the values of the image histogram, μ is the mean grey value of the image, t is a hypothesised threshold and:

\mu(t) \equiv \sum_{x=1}^{t} x p_x \qquad \theta(t) \equiv \sum_{x=1}^{t} p_x    (6.27)

The idea is then to start from the beginning of the histogram and test each grey value t for the possibility of being the threshold that maximises \sigma_B^2(t), by calculating the values of μ(t) and θ(t) and substituting them into equation (6.26). We stop testing once the value of \sigma_B^2 starts decreasing. This way we identify the t for which \sigma_B^2(t) becomes maximal. This method tacitly assumes that function \sigma_B^2(t) is well-behaved, ie that it has only one maximum.
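A compact sketch of this search over a 256-bin histogram is given below; unlike the stopping rule described above, this sketch evaluates σ_B²(t) for all t and keeps the global maximum, which is a deliberate (and safer) simplification. It also assumes grey values indexed 1 to L, as in Box 6.1.

```python
import numpy as np

def otsu_threshold(hist):
    """Return the t that maximises the between-class variance of equation (6.26)."""
    p = hist.astype(float) / hist.sum()          # normalised histogram p_x
    x = np.arange(1, len(p) + 1)                 # grey values 1..L, as in Box 6.1
    mu = np.sum(x * p)                           # mean grey value of the whole image
    theta_t = np.cumsum(p)                       # theta(t): sum of p_x up to t
    mu_t = np.cumsum(x * p)                      # mu(t): sum of x*p_x up to t
    denom = theta_t * (1.0 - theta_t)
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_B2 = np.where(denom > 0, (mu * theta_t - mu_t)**2 / denom, 0.0)
    return int(np.argmax(sigma_B2)) + 1          # +1 because x starts at 1

if __name__ == "__main__":
    # Synthetic bimodal histogram as an illustration.
    rng = np.random.default_rng(0)
    values = np.concatenate([rng.normal(60, 10, 5000), rng.normal(170, 20, 3000)])
    hist, _ = np.histogram(np.clip(values, 0, 255), bins=256, range=(0, 256))
    print("Otsu threshold:", otsu_threshold(hist))
```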
Box 6.1. Derivation of Otsu's threshold

Consider that we have an image with L grey levels in total and its normalised histogram, so that for each grey value x, p_x represents the frequency with which the particular value arises. Then assume that we set the threshold to t. Let us assume that we are dealing with the case of a bright object on a dark background. The fraction of pixels that will be classified as background ones will be:

\theta(t) = \sum_{x=1}^{t} p_x    (6.28)

The fraction of pixels that will be classified as object pixels will be:

1 - \theta(t) = \sum_{x=t+1}^{L} p_x    (6.29)

The mean grey value of the background pixels and the object pixels, respectively, will be:

\mu_b = \frac{\sum_{x=1}^{t} x p_x}{\sum_{x=1}^{t} p_x} \equiv \frac{\mu(t)}{\theta(t)} \qquad \mu_o = \frac{\sum_{x=t+1}^{L} x p_x}{\sum_{x=t+1}^{L} p_x} = \frac{\sum_{x=1}^{L} x p_x - \sum_{x=1}^{t} x p_x}{1-\theta(t)} = \frac{\mu - \mu(t)}{1-\theta(t)}    (6.30)

Here we defined \mu(t) \equiv \sum_{x=1}^{t} x p_x, and μ is the mean grey value over the whole image, defined as:

\mu \equiv \frac{\sum_{x=1}^{L} x p_x}{\sum_{x=1}^{L} p_x}    (6.31)
Similarly, we may define the variance of each of the two populations, created by the choice of a threshold t, as:

\sigma_b^2 \equiv \frac{\sum_{x=1}^{t}(x-\mu_b)^2 p_x}{\sum_{x=1}^{t} p_x} = \frac{1}{\theta(t)}\sum_{x=1}^{t}(x-\mu_b)^2 p_x \qquad \sigma_o^2 \equiv \frac{\sum_{x=t+1}^{L}(x-\mu_o)^2 p_x}{\sum_{x=t+1}^{L} p_x} = \frac{1}{1-\theta(t)}\sum_{x=t+1}^{L}(x-\mu_o)^2 p_x    (6.32)

Let us consider next the total variance of the distribution of the pixels in the image:

\sigma_T^2 = \sum_{x=1}^{L}(x-\mu)^2 p_x    (6.33)
We may split this sum into two:

\sigma_T^2 = \sum_{x=1}^{t}(x-\mu)^2 p_x + \sum_{x=t+1}^{L}(x-\mu)^2 p_x    (6.34)

As we would like eventually to involve the statistics defined for the two populations, we add and subtract inside each sum the corresponding mean:

\sigma_T^2 = \sum_{x=1}^{t}(x-\mu_b+\mu_b-\mu)^2 p_x + \sum_{x=t+1}^{L}(x-\mu_o+\mu_o-\mu)^2 p_x
= \sum_{x=1}^{t}(x-\mu_b)^2 p_x + \sum_{x=1}^{t}(\mu_b-\mu)^2 p_x + 2\sum_{x=1}^{t}(x-\mu_b)(\mu_b-\mu)p_x
\;+ \sum_{x=t+1}^{L}(x-\mu_o)^2 p_x + \sum_{x=t+1}^{L}(\mu_o-\mu)^2 p_x + 2\sum_{x=t+1}^{L}(x-\mu_o)(\mu_o-\mu)p_x    (6.35)

Next, we substitute the two sums on the left of each line in terms of \sigma_b^2 and \sigma_o^2, using equations (6.32). We also notice that the two sums in the middle of each line can be expressed in terms of equations (6.28) and (6.29), since μ, μ_b and μ_o are constants and they do not depend on the summing variable x:

\sigma_T^2 = \theta(t)\sigma_b^2 + (\mu_b-\mu)^2\theta(t) + 2(\mu_b-\mu)\sum_{x=1}^{t}(x-\mu_b)p_x
\;+ (1-\theta(t))\sigma_o^2 + (\mu_o-\mu)^2(1-\theta(t)) + 2(\mu_o-\mu)\sum_{x=t+1}^{L}(x-\mu_o)p_x    (6.36)

The two terms with the sums are zero, since, for example:

\sum_{x=1}^{t}(x-\mu_b)p_x = \sum_{x=1}^{t}x p_x - \sum_{x=1}^{t}\mu_b p_x = \mu_b\theta(t) - \mu_b\theta(t) = 0    (6.37)

Then, by rearranging the remaining terms,

\sigma_T^2 = \underbrace{\theta(t)\sigma_b^2 + (1-\theta(t))\sigma_o^2}_{\text{terms depending on the variance within each class}} + \underbrace{(\mu_b-\mu)^2\theta(t) + (\mu_o-\mu)^2(1-\theta(t))}_{\text{terms depending on the variance between the two classes}} \equiv \sigma_W^2(t) + \sigma_B^2(t)    (6.38)

where \sigma_W^2(t) is defined to be the within-class variance and \sigma_B^2(t) is defined to be the between-class variance. Clearly, the total image variance \sigma_T^2 is a constant. We want to specify t so that \sigma_W^2(t) is as small as possible, ie the classes that are created are as compact as possible, and \sigma_B^2(t) is as large as possible. Let us choose to work with \sigma_B^2(t), ie let us try to choose t so that \sigma_B^2(t) is maximum. We substitute in the definition of \sigma_B^2(t) the expressions for μ_b and μ_o, as given by equations (6.30):

\sigma_B^2(t) = (\mu_b-\mu)^2\theta(t) + (\mu_o-\mu)^2(1-\theta(t))
= \left[\frac{\mu(t)}{\theta(t)} - \mu\right]^2\theta(t) + \left[\frac{\mu-\mu(t)}{1-\theta(t)} - \mu\right]^2(1-\theta(t))
= \frac{[\mu(t) - \mu\theta(t)]^2}{\theta(t)} + \frac{[\mu\theta(t) - \mu(t)]^2}{1-\theta(t)}
= \frac{[\mu(t) - \mu\theta(t)]^2[1-\theta(t)] + \theta(t)[\mu\theta(t) - \mu(t)]^2}{\theta(t)[1-\theta(t)]}
= \frac{[\mu(t) - \mu\theta(t)]^2}{\theta(t)[1-\theta(t)]}    (6.39)

This function expresses the interclass variance \sigma_B^2(t) in terms of the mean grey value μ of the image and quantities that can be computed once we know the values of the image histogram up to the chosen threshold t.
Example 6.8

Calculate Otsu's threshold for the image of Figure 6.3a and use it to threshold the image.

Figure 6.11a shows how \sigma_B^2(t) varies as t scans all possible grey values. The first maximum of this function is at t = 84 and we choose this threshold to produce the result shown in figure 6.11b. We can see that the result is not noticeably different from the result obtained with the empirical threshold (figure 6.3c) and a little worse than the optimal threshold result (figure 6.8a). It is worse than the result obtained by hysteresis thresholding, reinforcing again the conclusion that spatial and grey level characteristics used in thresholding is a powerful combination.

Figure 6.11: (a) \sigma_B^2(t) plotted against t; its maximum defines Otsu's threshold, which for this example is t = 114. (b) Resultant segmentation with Otsu's threshold.
Are there any drawbacks in Otsu's method?

Yes, a few:

1. Although the method does not make any assumption about the probability density functions p_o(x) and p_b(x), it describes them by using only their means and variances. Thus, it tacitly assumes that these two statistics are sufficient to represent them. This may not be true.

2. The method breaks down when the two populations are very unequal. When the two populations become very different in size from each other, \sigma_B^2(t) may have two maxima and actually the correct maximum is not necessarily the global maximum. That is why, in practice, the correct maximum is selected from among all maxima of \sigma_B^2(t) by checking that the value of the histogram at the selected threshold, p_t, is actually at a valley (ie p_t is lower than the histogram values at the modes of the two populations). Only if this is true should t be accepted as the best threshold.

3. The method, as presented above, assumes that the histogram of the image is bimodal, ie that the image contains two classes. For more than two classes present in the image, the method has to be modified so that multiple thresholds are defined which maximise the interclass variance and minimise the intraclass variance.

4. The method will divide the image into two classes, even if this division does not make sense. A case when the method should not be directly applied is that of variable illumination.
How can we threshold images obtained under variable illumination?

In Chapter 4 (page 364), we saw that an image is essentially the product of a reflectance function r(x, y), which is intrinsic to the viewed surfaces, and an illumination function i(x, y):

f(x, y) = r(x, y)\,i(x, y)    (6.40)

Thus, any spatial variation of the illumination results in a multiplicative interference to the reflectance function that is recorded during the imaging process. We can convert the multiplicative interference into additive, if we take the logarithm of the image:

\ln f(x, y) = \ln r(x, y) + \ln i(x, y)    (6.41)

Then, instead of forming the histogram of f(x, y), we can form the histogram of \ln f(x, y).
If we threshold the image according to the histogram of ln f(x, y), are we thresholding it according to the reflectance properties of the imaged surfaces?

No, under variable illumination. To be able to understand what is going on, we have to try to answer the following question. How is the histogram of ln f(x, y) expressed in terms of the histograms of ln r(x, y) and ln i(x, y)? Ideally, we are interested in thresholding the histogram of r(x, y), or ln r(x, y). Let us define some new variables:

z(x, y) \equiv \ln f(x, y) \qquad \tilde{r}(x, y) \equiv \ln r(x, y) \qquad \tilde{i}(x, y) \equiv \ln i(x, y)    (6.43)

If f(x, y), r(x, y) and i(x, y) are thought of as random variables, then z(x, y), \tilde{r}(x, y) and \tilde{i}(x, y) are also random variables. So, the question may be rephrased as follows. What is the histogram of the sum of two random variables in terms of the histograms of the two variables? A histogram may be thought of as a probability density function. Rephrasing the question again, we have the following. What is the probability density function of the sum of two random variables in terms of the probability density functions of the two variables?

In Box 6.2, we show that the probability density function of z(x, y) is the convolution of the probability density function of \tilde{r}(x, y) with the probability density function of \tilde{i}(x, y).

Box 6.2. The probability density function of the sum of two random variables

Since z = \tilde{r} + \tilde{i}, the line \tilde{r} + \tilde{i} = u divides the (\tilde{i}, \tilde{r}) plane into two half planes, one in which z > u and one where z < u. The probability of z < u is equal to the integral of the probability density function of pairs (\tilde{i}, \tilde{r}), over the area of the half plane in which z < u (see figure 6.12):

P_z(u) = \int_{\tilde{i}=-\infty}^{+\infty}\int_{\tilde{r}=-\infty}^{u-\tilde{i}} p_{\tilde{r}\tilde{i}}(\tilde{r}, \tilde{i})\,d\tilde{r}\,d\tilde{i}    (6.45)
Here p_{\tilde{r}\tilde{i}}(\tilde{r}, \tilde{i}) is the joint probability density function of the two random variables \tilde{r} and \tilde{i}. To work out the probability density function of z, we differentiate P_z(u) with respect to u, using Leibniz's rule (see Box 4.9, on page 348), applied twice; once with

f(x; \cdot) \rightarrow \int_{\tilde{r}=-\infty}^{u-\tilde{i}} p_{\tilde{r}\tilde{i}}(\tilde{r}, \tilde{i})\,d\tilde{r} \qquad b \rightarrow +\infty \qquad a \rightarrow -\infty    (6.46)

and once more when we need to differentiate f(x; \cdot), which itself is an integral that depends on parameter u, with respect to which we differentiate:

p_z(u) = \frac{dP_z(u)}{du} = \int_{\tilde{i}=-\infty}^{+\infty}\frac{d}{du}\left[\int_{\tilde{r}=-\infty}^{u-\tilde{i}} p_{\tilde{r}\tilde{i}}(\tilde{r}, \tilde{i})\,d\tilde{r}\right]d\tilde{i} = \int_{\tilde{i}=-\infty}^{+\infty} p_{\tilde{r}\tilde{i}}(u-\tilde{i}, \tilde{i})\,d\tilde{i}    (6.47)

The two random variables \tilde{r} and \tilde{i} are independent, as the reflectance of the imaged surfaces has nothing to do with the illumination of the scene, so their joint probability density function is the product of their individual probability density functions:

p_{\tilde{r}\tilde{i}}(\tilde{r}, \tilde{i}) = p_{\tilde{r}}(\tilde{r})\,p_{\tilde{i}}(\tilde{i})    (6.48)

Upon substitution in (6.47), we obtain:

p_z(u) = \int_{-\infty}^{+\infty} p_{\tilde{r}}(u-\tilde{i})\,p_{\tilde{i}}(\tilde{i})\,d\tilde{i}    (6.49)

This shows that the histogram (= probability density function) of z is equal to the convolution of the two histograms of the two random variables \tilde{r} and \tilde{i}.

If the illumination is uniform, then:

i(x, y) = \text{constant} \Rightarrow \tilde{i} = \ln i(x, y) = \tilde{i}_0 = \text{constant}    (6.50)

Then p_{\tilde{i}}(\tilde{i}) = \delta(\tilde{i} - \tilde{i}_0), and after substitution in (6.49) and integration, we obtain p_z(u) = p_{\tilde{r}}(u - \tilde{i}_0).

That is, under uniform illumination, the histogram of the reflectance function (intrinsic to the object) is the same as the histogram of the observed grey values, apart from a constant shift. If, however, the illumination is not uniform, even if we had a perfectly distinguishable object, the histogram is badly distorted and the various thresholding methods break down.
Since straightforward thresholding methods break down under variable illumination, how can we cope with it?

There are two ways in which we can circumvent the problem of variable illumination:

1. Divide the image into more or less uniformly illuminated patches and histogram and threshold each patch as if it were a separate image. Some adjustment may be needed when the patches are put together, as the threshold essentially will jump from one value in one patch to another value in a neighbouring patch.

2. Obtain an image of just the illumination field, using the image of a surface with uniform reflectance, and divide the image f(x, y) by i(x, y), ie essentially subtract the illumination component \tilde{i}(x, y) from z(x, y). Then multiply f(x, y)/i(x, y) with a reference value, say i(0, 0), to bring the whole image under the same illumination and proceed using the corrected image. This is essentially the method of flatfielding (see page 366). A small code sketch of this second option is given below.
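The sketch assumes the image and an image of the illumination field are available as numpy arrays; the function name and the clipping to the range [0, 255] are illustrative choices, not part of the original text.

```python
import numpy as np

def flatfield_correct(image, illumination, reference=None):
    """Divide out the illumination field i(x, y) and rescale by a reference value."""
    f = image.astype(float)
    i = illumination.astype(float)
    i = np.where(i > 0, i, 1e-6)             # guard against division by zero
    if reference is None:
        reference = i[0, 0]                  # eg i(0, 0), as suggested in the text
    corrected = f / i * reference            # f(x, y)/i(x, y) times the reference value
    return np.clip(corrected, 0, 255).astype(np.uint8)
```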
Example 6.9

Threshold the image of Figure 6.13a.

This image exhibits an illumination variation from left to right. Figure 6.13b shows the histogram of the image. Using Otsu's method, we identify threshold t = 99. The result of thresholding the image with this threshold is shown in Figure 6.13c. The result of dividing the image into three subimages from left to right and applying Otsu's method to each subimage separately is shown in Figure 6.13d. The three local thresholds, from left to right, were 114, 84 and 52.

Figure 6.13: Global versus local thresholding for an image with variable illumination. (a) Original image. (b) The histogram of (a) (number of pixels versus grey level). (c) Global thresholding. (d) Local thresholding.
Figure 6.14: Identifying a threshold when the histogram has a long tail (four iterations of the knee algorithm, shown as counts versus bins). In the first iteration, 5 bins on the right of the peak and 5 bins on the left of the last bin are fitted with straight lines. Their intersection defines a first estimate of the threshold. The points on the left of the threshold and up to the peak are fitted again and the outliers are removed. The inliers are refitted to produce the first line for the second iteration. The points on the right of the threshold are also fitted again and the outliers are removed. The inliers are refitted to produce the second line for the second iteration. Their intersection yields the second estimate of the threshold. In the third and fourth iterations, this process is repeated to refine the estimate of the threshold.
What do we do if the histogram has only one peak?

Such a situation is often encountered in practice. It arises when one of the two populations has a roughly flat distribution that creates a long tail of the histogram. One way to identify a meaningful threshold in such a case is to fit with straight lines the descending part of the peak and the long tail, and select as threshold the coordinate where the two lines meet. The so called knee algorithm is as follows (a code sketch is given after the steps).

Step 1: Consider the peak of the histogram, with abscissa b_peak, and n bins on the right of the peak towards the long tail. A typical value is n = 5. In the (bin, count) space, fit these points with a straight line using least square error.

Step 2: Consider the last bin of the histogram, with abscissa b_last, and n bins on the left of this bin, towards the peak. In the (bin, count) space, fit these points with a straight line, using least square error.

Step 3: Work out the point where the two lines meet. The abscissa of the point of intersection is the first estimate of the value of the threshold, t_1.

Step 4: Consider all points with abscissa in the range [b_peak, t_1] and fit them with a straight line in the least square error sense. If all points have residual error less than a tolerance, keep this line. If not, omit the points with error larger than the tolerance and refit the rest with a least square error line.

Step 5: Consider all points with abscissa in the range [t_1, b_last] and fit them with a straight line in the least square error sense. If all points have residual error less than a tolerance, keep this line. If not, omit the points with error larger than the tolerance and refit the rest with a least square error line.

Step 6: Find the intersection of the lines constructed in Steps 4 and 5. The abscissa of the point of intersection is the new estimate of the value of the threshold, t_2.

The process may be repeated as many times as we like. Every time we go to Steps 4 and 5, we consider the pairs of points that are assumed to be represented by the straight line we have for them. An example of using this algorithm to identify a threshold for a histogram with a long tail is shown in figure 6.14.
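A possible implementation sketch of the knee algorithm follows; the residual tolerance, the number of refinement iterations and the assumption of a single peak followed by a long descending tail are illustrative choices not fixed by the text.

```python
import numpy as np

def fit_line(xs, ys):
    """Least squares line y = a*x + b through the points (xs, ys)."""
    a, b = np.polyfit(xs, ys, 1)
    return a, b

def knee_threshold(hist, n=5, tol=None, iterations=3):
    bins = np.arange(len(hist), dtype=float)
    counts = hist.astype(float)
    b_peak = int(np.argmax(counts))              # abscissa of the peak
    b_last = len(counts) - 1                     # abscissa of the last bin
    if tol is None:
        tol = 0.05 * counts[b_peak]              # illustrative residual tolerance
    # Steps 1-3: two initial lines and their intersection.
    a1, c1 = fit_line(bins[b_peak:b_peak + n + 1], counts[b_peak:b_peak + n + 1])
    a2, c2 = fit_line(bins[b_last - n:b_last + 1], counts[b_last - n:b_last + 1])
    t = (c2 - c1) / (a1 - a2)
    # Steps 4-6, repeated: refit each side, dropping points with large residuals.
    for _ in range(iterations):
        for side, (lo, hi) in enumerate([(b_peak, int(t)), (int(t), b_last)]):
            xs, ys = bins[lo:hi + 1], counts[lo:hi + 1]
            a, c = fit_line(xs, ys)
            keep = np.abs(ys - (a * xs + c)) < tol       # inliers only
            if keep.sum() >= 2:
                a, c = fit_line(xs[keep], ys[keep])
            if side == 0:
                a1, c1 = a, c
            else:
                a2, c2 = a, c
        t = (c2 - c1) / (a1 - a2)                        # new estimate of the threshold
    return t
```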
Are there any shortcomings of the grey value thresholding methods?

Yes. With the exception of hysteresis thresholding, which is of limited use, the spatial proximity of the pixels in the image is not considered at all in the segmentation process. Instead, only the grey values of the pixels are used. For example, consider the two images in figure 6.15.

Figure 6.15: Two very different images with identical histograms (each histogram plots number of pixels against grey value, from 0 to 255).
Clearly, the first image is the image of a uniform region, while the second image contains two quite distinct regions. Even so, both images have identical histograms. Their histograms are bimodal and we can easily choose a threshold to split the pixels that make up the two peaks. However, if we use this threshold to segment the first image, we shall get nonsense.

How can we cope with images that contain regions that are not uniform but they are perceived as uniform?

Regions that are not uniform in terms of the grey values of their pixels, but are perceived as uniform, are called textured regions. For segmentation purposes then, each pixel is not characterised by its grey value, but by another number or numbers, which quantify the variation of the grey values in a small patch around that pixel. These numbers, which characterise the pixels, are called features or attributes. If only one attribute is used to characterise a pixel, any of the thresholding methods discussed so far may be used to segment the object (see example 6.10). Usually, however, more than one attribute are used to characterise the pixels of an image. So, we may envisage that each pixel is characterised not by a scalar, but by a vector, each component of which measures something different at or around the pixel position. Then each pixel is represented by a point in a multidimensional space, where we measure one component of the feature vector per axis. Pixels belonging to the same region will have similar or identical values in their attributes and, thus, they will cluster together. The problem then becomes one of identifying clusters of pixels in a multidimensional space. Essentially it is similar to histogramming, only now we deal with multidimensional histograms. There are several other clustering methods that may be used, but they are in the realm of pattern recognition and thus beyond the scope of this book¹.
Example 6.10

You are asked to segment the image of figure 6.16 into object and background. Black pixels in the image have value 0 and white pixels have value 1. Define a scalar that may be used to characterise each pixel and segment the image by thresholding the values of this scalar.

Figure 6.16: An object on a white background (the image is shown as its grid of 0 and 1 pixel values).

¹More on texture may be found in the book Image Processing, dealing with Texture, by Petrou and Garcia Sevilla, John Wiley & Sons, Ltd, ISBN-13: 978-0-470-02628-1.
We define as a characteristic value for each pixel the absolute difference between the grey value of the pixel and the average grey value of its four nearest neighbours. Figure 6.17a then shows the value this scalar has for all image pixels. Figure 6.17b shows the histogram of these values. We select, as an appropriate threshold, the value of 0.5. Any pixel with value greater than or equal to this threshold is considered as belonging to the same image region. The two segments we identify this way are presented as grey and white in figure 6.18.

Figure 6.17: (a) The values of the feature that may be used to characterise the image of figure 6.16 (shown as a grid of values between 0 and 1). (b) The histogram of these values (number of pixels for values 0, 0.25, 0.5, 0.75 and 1).

Figure 6.18: Segmentation result of the image of figure 6.16 using the values of figure 6.17.
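The feature of this example (absolute difference between a pixel and the mean of its four nearest neighbours) could be computed as in the sketch below; the replication of the border pixels is an assumption, since the example does not say how the image border is treated.

```python
import numpy as np

def four_neighbour_contrast(img):
    """|pixel - mean of its four nearest neighbours| for every pixel."""
    f = img.astype(float)
    padded = np.pad(f, 1, mode='edge')                    # replicate the border
    up    = padded[:-2, 1:-1]
    down  = padded[2:, 1:-1]
    left  = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    return np.abs(f - (up + down + left + right) / 4.0)

# Thresholding this feature at 0.5, as in the example, splits the pixels into two segments:
# segments = four_neighbour_contrast(binary_image) >= 0.5
```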
Figure 6.19: (a) The histogram of image 6.13a. (b) The histogram of the same image upsampled by a factor of 4 in each direction using linear interpolation. (Both histograms plot frequency against grey level.)
Can we improve histogramming methods by taking into consideration the spatial
proximity of pixels?
To a limited extent, yes. We may upsample the image by using bilinear (or some other form of)
interpolation and then compute the histogram. This way, the histogram becomes smoother
(because it is computed from many more samples) and, at the same time, the extra pixels
created have values that depend on the spatial arrangement of the original pixels. Other
more sophisticated methods of histogram estimation also exist, using the so called kernel
functions. A pixel may be thought of as a delta function with which we sample the scene at
a certain point. A kernel-based method replaces that delta function by a function with more
spread, the kernel function. Then it treats the image as a collection of overlapping kernel
functions.
Figure 6.19 shows the histogram of figure 6.13a, on page 548, and, next to it, the same
histogram computed after upsampling the original image by a factor of 4 in each direction:
three extra rows and three extra columns were inserted between any two adjacent rows and
columns, and the values of the new pixels were computed by using bilinear interpolation as de-
scribed in Chapter 5 (see page 518). We can see how much smoother the new histogram is. A
more accurate estimation of the histogram allows the more accurate estimation of subsequent
quantities from it, including, for example, the threshold for performing segmentation.
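A brief sketch of this idea follows, using scipy's zoom with order 1 (bilinear interpolation) as an assumed stand-in for the interpolation scheme of Chapter 5; the function name and the bin settings are illustrative.

```python
import numpy as np
from scipy.ndimage import zoom

def smoothed_histogram(image, factor=4, bins=256):
    """Histogram of the image upsampled by `factor` with bilinear interpolation."""
    up = zoom(image.astype(float), factor, order=1)       # order=1 -> bilinear
    hist, _ = np.histogram(up, bins=bins, range=(0, bins))
    return hist / hist.sum()                              # normalised, smoother histogram
```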
Are there any segmentation methods that take into consideration the spatial
proximity of pixels?
Yes, they are called region growing methods. Examples of such methods are the watershed
segmentation method and the less sophisticated but more straightforward split and merge
algorithm. In general, one starts from some seed pixels and attaches neighbouring pixels to
them, provided the attributes of the pixels in the region created in this way vary within a
predefined range. So, each seed grows gradually by accumulating more and more neighbouring
pixels, until all pixels in the image have been assigned to a region.
How can one choose the seed pixels?
There is no clear answer to this question, and this is the most important drawback of this
type of method. In some applications, the choice of seeds is easy. For example, in target
tracking in infrared images, the target will appear bright, and one can use as seeds the few
brightest pixels. The split and merge method does not require any seeds. The watershed
method identifies the seeds by morphological image reconstruction.
How does the split and merge method work?

Initially the whole image is considered as one region. If the range of attributes within this region is greater than a predetermined value, then the region is split into four quadrants and each quadrant is tested in the same way, until every square region created in this way contains pixels with range of attributes within the given value. At the end, all adjacent regions with attributes within the same range may be merged.

An example is shown in figure 6.20, where, for simplicity, a binary 8×8 image is considered. The tree structure shows the successive splitting of the image into quadrants. Such a tree is called a quad tree.

Figure 6.20: Image segmentation by splitting inhomogeneous quadrants. (a) The 8×8 binary image, with the corners of the successively created quadrants labelled by letters. (b) The corresponding quad tree, with root (ABCD), first-level nodes (AFIE), (FBGI), (IGCH) and (EIHD), and the leaves listed below.

We end up having the following regions:

(AFIE)(FBGI)(IJNM)(JGKN)(NKCL)(MNLH)
(EQRP)(QIMR)(PROD)(RSXV)(SMTX)(XTHU)(VXUO)

These are all the leaves of the quad tree. Any two adjacent regions are then checked for merging and eventually only the two main regions of irregular shape emerge. The above quad tree structure is clearly favoured when the image is square with N = 2^n pixels in each side. Split and merge algorithms often start at some intermediate level of the quad tree (ie with some blocks of size 2^l × 2^l, where l < n) and check each block for further splitting into four square sub-blocks and any two adjacent blocks for merging. At the end, again we check for merging any two adjacent regions. A sketch of the splitting phase is given below.
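The sketch below covers only the splitting phase; the block representation as (row, column, height, width) tuples is an implementation choice, images whose sides are not powers of two are split into unequal halves, and the final merging of adjacent leaves is left out.

```python
import numpy as np

def split(image, max_range):
    """Recursively split `image` into homogeneous square quadrants (quad tree leaves).

    Returns a list of (row, col, height, width) blocks whose grey value range
    (max - min) does not exceed `max_range`.  The merging pass is not included.
    """
    leaves = []

    def recurse(r, c, h, w):
        block = image[r:r + h, c:c + w]
        if block.max() - block.min() <= max_range or h <= 1 or w <= 1:
            leaves.append((r, c, h, w))
            return
        h2, w2 = h // 2, w // 2
        recurse(r, c, h2, w2)                     # top-left quadrant
        recurse(r, c + w2, h2, w - w2)            # top-right quadrant
        recurse(r + h2, c, h - h2, w2)            # bottom-left quadrant
        recurse(r + h2, c + w2, h - h2, w - w2)   # bottom-right quadrant

    recurse(0, 0, image.shape[0], image.shape[1])
    return leaves
```

Adjacent leaves whose pooled range stays within the same limit would then be merged in a second pass, as described above.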
What is morphological image reconstruction?

Consider an image f(i, j). Consider also another image g(i, j), called the mask, such that g(i, j) ≤ f(i, j) for every pixel (i, j). Define the dilation of image g(i, j), by a structuring element B, as follows: consider a neighbourhood of the shape and size of B around each pixel of g(i, j); select the largest value inside this neighbourhood and assign it to the central pixel in the output image. This operation is denoted as g(i, j) ⊕ B. The reconstruction of image f by g is obtained by iterating

g_{k+1}(i, j) = \min\{f(i, j),\; g_k(i, j) \oplus B\}    (6.51)

until image g_k does not change any further.

Consider also another mask image g, such that g(i, j) ≥ f(i, j) for every pixel (i, j). Define the erosion of image g by a structuring element B, as follows: consider a neighbourhood of the shape and size of B around each pixel of g; select the smallest value inside this neighbourhood and assign it to the central pixel in the output image. This operation is denoted as g(i, j) ⊖ B. The reconstruction of image f by g is obtained by iterating

g_{k+1}(i, j) = \max\{f(i, j),\; g_k(i, j) \ominus B\}    (6.52)
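Equations (6.51) and (6.52) can be sketched with scipy's grey scale morphology, as below. Note that scipy pads the image border (so, unlike the worked example that follows, the image does not shrink), and the helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def reconstruct_by_dilation(f, g, size=(3, 3)):
    """Iterate g_{k+1} = min(f, dilate(g_k)) until g stops changing (equation 6.51)."""
    g = np.minimum(f, g).astype(float)
    while True:
        g_next = np.minimum(f, grey_dilation(g, size=size))
        if np.array_equal(g_next, g):
            return g_next
        g = g_next

def reconstruct_by_erosion(f, g, size=(3, 3)):
    """Iterate g_{k+1} = max(f, erode(g_k)) until g stops changing (equation 6.52)."""
    g = np.maximum(f, g).astype(float)
    while True:
        g_next = np.maximum(f, grey_erosion(g, size=size))
        if np.array_equal(g_next, g):
            return g_next
        g = g_next

# Identifying the maxima of f that stick out by at most 1 (as in the examples that follow):
# maxima = f - reconstruct_by_dilation(f, f.astype(float) - 1.0)
```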
Example 6.11

Reconstruct morphologically image f, shown in figure 6.21a, using a structuring element of size 3×3. Start by creating mask g by subtracting 1 from all pixels of the original image f.

Figure 6.21b shows image g constructed from the original image.

Figure 6.21: (a) An original image f, shown as its grid of grey values. (b) Image g constructed from image f by subtracting 1 from all pixel values.
Figure 6.22 shows the result of the first iteration of the algorithm. Figure 6.23 shows the results of the second and third (final) iterations. Figure 6.23b is the morphological reconstruction of f by g, obtained after three iterations. As iterations progress, the image shrinks, as the border pixels cannot be processed.
Figure 6.22: (a) The dilation of image g, by a structuring element of size 3×3: it was created by taking the maximum of image 6.21b inside a sliding window of size 3×3 and placing it in the central position of the window in the output image. (b) The morphological reconstruction of f by g: it was created by taking the minimum between 6.21a and (a), pixel by pixel. (Both panels are shown as grids of grey values.)
Figure 6.23: (a) The result of applying the process shown in figure 6.22 to image 6.22a. (b) The result of applying the process shown in figure 6.22 to (a). Further applications of the same process do not change the result. (Both panels are shown as grids of grey values.)
How does morphological image reconstruction allow us to identify the seeds needed for the watershed algorithm?

If we subtract from image f its reconstruction by a mask g ≤ f (reconstruction by dilation), we are left with the significant maxima of f. If we subtract image f from its reconstruction by a mask g ≥ f (reconstruction by erosion), we are left with the significant minima of f. These may be used as seed patches which may be grown to segments of the image, by accumulating more and more pixels that surround them. Each region stops growing when it comes in contact with another region. To ensure that the meeting points between regions are at the correct places, we use the gradient magnitude image to perform the seed selection and region growing method.
How do we compute the gradient magnitude image?

The gradient vector of a 2D function f(x, y) is (∂f(x, y)/∂x, ∂f(x, y)/∂y). Its magnitude is

\sqrt{\left(\frac{\partial f(x,y)}{\partial x}\right)^2 + \left(\frac{\partial f(x,y)}{\partial y}\right)^2}

In the discrete domain, the partial derivatives are replaced by the first differences of the function, \Delta_i f(i, j) and \Delta_j f(i, j):

\text{Gradient Magnitude} \equiv \sqrt{(\Delta_i f(i, j))^2 + (\Delta_j f(i, j))^2}    (6.53)

Figure 6.24 shows an original image and the output obtained if the [−1, +1] and [−1, +1]^T convolution filters are applied along the horizontal and vertical directions, respectively, to compute the first image differences. The outputs of the two convolutions are squared, added and square rooted to produce the gradient magnitude associated with each pixel.

Figure 6.24: (a) Original image. (b) The image of the gradient magnitudes. For displaying purposes, the gradient image has been subjected to histogram equalisation.
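A minimal sketch of equation (6.53), using [−1, +1] first differences, is given below; leaving the last row and column of each difference at zero is an assumption, since the text does not specify the border treatment.

```python
import numpy as np

def gradient_magnitude(f):
    """First differences along i and j, then the square root of the sum of their squares."""
    f = f.astype(float)
    di = np.zeros_like(f)
    dj = np.zeros_like(f)
    di[:-1, :] = f[1:, :] - f[:-1, :]      # difference along the vertical direction
    dj[:, :-1] = f[:, 1:] - f[:, :-1]      # difference along the horizontal direction
    return np.sqrt(di**2 + dj**2)
```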
Example 6.12

Use the morphological reconstruction of image f, shown in 6.22a, by g, created in example 6.11, to identify the maxima of image f.

The morphological reconstruction of f by g is shown in 6.23b. To identify the maxima of f, we subtract it from the original image f. The result is shown in 6.25b. The maxima are highlighted in grey in the result and in the original image.

Figure 6.25: (a) The original image f. (b) After subtracting the morphological reconstruction of f by g. Grey indicates the identified maxima of f. (Both panels are shown as grids of values.)
What is the role of the number we subtract from f to create mask g in the morphological reconstruction of f by g?

If we create g by subtracting from f a number I_0, we are identifying maxima in f that stick out from their surroundings by at most I_0.

Example 6.13

Identify all maxima that are less significant than 3, ie they stick out from their surrounding pixels by at most 3 grey levels, in image f shown in figure 6.22a, using a structuring element of size 3×3.

We start by creating image g by subtracting 3 from all pixels of the original image f. The result is shown in 6.26a. Figure 6.26b shows the result of reconstructing f by this g, obtained after three iterations. Figure 6.27 shows the maxima identified this time. We note that they indeed are the maxima that differ from their surrounding pixels by at most 3 values.
Figure 6.26: (a) Image g constructed from f by subtracting 3. (b) The morphological reconstruction of f by g, using a 3×3 structuring element. (Both panels are shown as grids of grey values.)
Figure 6.27: (a) The original image f. (b) After subtracting the morphological reconstruction of f by g. Grey indicates the identified maxima of f within the region that is not affected by border effects. (Both panels are shown as grids of values.)
What is the role of the shape and size of the structuring element in the morphological reconstruction of f by g?

If we create g by subtracting from f a number I_0, we are identifying maxima in f that stick out from their surroundings by at most I_0. The structuring element we use identifies the shape and size of the neighbourhood of a pixel over which it must stick out in order to be identified as a maximum.

Example 6.14

Identify all maxima that stick out from their surrounding pixels, inside a local window of size 5×5, by one grey level, in image f, shown in figure 6.22a.

We start by creating image g by subtracting 1 from all pixels of the original image f. The result is shown in 6.22b. Figure 6.28a shows the result of reconstructing f by this g using a 5×5 structuring element. Two iterations were necessary. Figure 6.28b shows the maxima identified this time. We note that they indeed are only the maxima that differ from their surrounding 5×5 pixels by more than 1 grey value.
Figure 6.28: (a) The morphological reconstruction of f by g using a 5×5 structuring element. (b) After subtracting the morphological reconstruction of f by g from f. Grey indicates the identified maxima of f. (Both panels are shown as grids of values.)
Example 6.15

You are given the following sequence of numbers:

f = [6, 10, 10, 10, 7, 8, 8, 7, 4, 11, 6, 6, 6, 8, 8, 5, 5, 5, 5, 6, 4, 5, 4]    (6.54)

Identify all its local maxima by morphological reconstruction.

We must apply formula (6.51), on page 555. We first create sequence g by subtracting 1 from f. This is shown in figure 6.29a. Then we must dilate g. We select to do that by using a structuring element of length 3, ie by taking the maximum of every three successive elements of g. However, before we do that, we extend g by one sample on either side, so that all samples of interest have left and right neighbours. Finally, we take the minimum between the dilated version of g and f. The successive steps of iteratively applying this process, until convergence, are shown in figure 6.29.

Figure 6.29: The sequence of steps required in order to apply formula (6.51) to identify the local maxima in sequence f: (a) produce g from f; (b) dilate g; (c) take the min to create g_2; (d) dilate g_2; (e) take the min to create g_3; (f) dilate g_3; (g) take the min to create g_4; (h) subtract g_4 (= g_3) from f. In each panel, sequence f is shown with a dotted line, the sequence that is being processed with a dashed line and the result obtained with a continuous line.
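The steps of figure 6.29 can be reproduced with a few lines of code; this sketch uses the sequence of equation (6.54) and the border extension by sample repetition described above, and the function names are illustrative.

```python
import numpy as np

def dilate1d(g):
    """Grey scale dilation with a structuring element of length 3 (max of each triple)."""
    padded = np.concatenate(([g[0]], g, [g[-1]]))     # repeat the end samples
    return np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])

def reconstruct1d(f, g):
    """Iterate g <- min(f, dilate(g)) until convergence (formula 6.51 in 1D)."""
    g = np.minimum(f, g)
    while True:
        g_next = np.minimum(f, dilate1d(g))
        if np.array_equal(g_next, g):
            return g
        g = g_next

f = np.array([6, 10, 10, 10, 7, 8, 8, 7, 4, 11, 6, 6, 6, 8, 8, 5, 5, 5, 5, 6, 4, 5, 4])
maxima = f - reconstruct1d(f, f - 1)                  # nonzero entries mark local maxima
print(maxima)
```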
Example 6.16

For the sequence of example 6.15, identify the local maxima that stick out from their surroundings by at most 2 levels.

We must apply formula (6.51). As we are interested in local maxima that stick above their surroundings by at most 2 levels, we create sequence g by subtracting 2 from f. We then repeat the process of example 6.15 until convergence. All steps needed are shown in figures 6.30 and 6.31. We observe two things.

(i) The more g differs from f, the more iteration steps are needed for convergence. This is expected, as g has to evolve more in order to get stuck underneath f. Note that, if we had created g by subtracting from f a much larger number than 2, g would never reach f to be stopped by it, but rather g would converge to a flat signal below any trough or peak of f. So, the maximum number we should use, to create g from f, should be the peak-to-peak amplitude of the largest fluctuation of the f values.

(ii) If i_0 is the value we subtract from f in order to create g, the process identifies all peaks that stick above their surroundings by at most i_0. This means that if we want to identify only the peaks that stick above their surroundings by exactly i_0, we must either threshold the result, to keep only the maxima that have value exactly i_0, or we must apply a hierarchical approach, where we successively identify peaks of increasing maximum size, until no g converges to the flat signal, and then subtract each result from the next one. This way we can segment the image using successively lower and lower peaks in a multiresolution scheme, which allows us to stop at the desired resolution.

Figure 6.30: The sequence of steps required in order to apply formula (6.51) to identify the local maxima in sequence f that stick out by at most 2 levels above their surroundings: (a) produce g_1 from f; (b) dilate g_1; (c) take the min to create g_2; (d) dilate g_2. In each panel, sequence f is shown with a dotted line, the sequence that is being processed with a dashed line and the result obtained with a continuous line.
Image segmentation 563
[Figure 6.31 panels (e)-(n): the remaining alternating dilation and min steps, producing g_3 up to g_7, ending with (n) Subtract g_7 (= g_6) from f.]
Figure 6.31: Continuation of the process of figure 6.30. Note that the identified maxima stick above their surroundings by less than or equal to 2.
Example 6.17
For the sequence of example 6.15, identify all local minima, by morphological reconstruction.
First, we construct sequence g_1 by adding 1 to every sample of the sequence. This is shown with the solid line in figure 6.32a. We then erode g_1 by taking the minimum of every three successive numbers. Note that before we do that, we extend the sequence by one sample in each direction, by repeating the first and the last sample, so that all samples have left and right neighbours. The result is shown in 6.32b, with the solid line. Next, take the maximum between the eroded version of g_1 and the original sequence, to produce g_2. We repeat this process as shown in the panels of figure 6.32, until the new signal g we produce no longer changes. This happens at the fourth iteration. We then subtract the original sequence from g_3. The samples that have nonzero values identify the local minima.
[Figure 6.32 panels: (a) Produce g_1 from f; (b) Erode g_1; (c) Take the max to create g_2; (d) Erode g_2; (e) Take the max to create g_3; (f) Erode g_3; (g) Take the max to create g_4; (h) Subtract f from g_4 (= g_3).]
Figure 6.32: The sequence of steps required in order to apply formula (6.52) to identify the local minima in sequence f. In each panel, sequence f is shown with the dotted line, the sequence that is being processed is shown with a dashed line and the result obtained with a continuous line.
Example 6.18
Use the local minima identified in example 6.17 to segment the sequence of example 6.15.
Figuratively speaking, we first assume that the local minima identified are pierced at their bottom. Then we start raising the level of water from the bottom and every time the water enters a minimum, a new lake is created, with its own name. The water level carries on increasing, until all lakes meet. In practice, we start from the value of the deepest minimum identified. All samples around the (equally deep) deepest minima are given some labels that identify these distinct minima. Let us say that this minimum value of the equally deep deepest minima is f_min. As shown in figure 6.33a, we have for this sequence three deepest minima, with value 4. Each one is given its own label, identified in the figure with a different tone of grey. Then we raise the level of water by 1, and all samples with value 5 or less that are adjacent to an existing label inherit the value of this label (see figure 6.34). If a new local minimum is flooded this way, ie if a local minimum with value 5 exists, it is assigned a new label. Then we consider all samples with value 6: if they are adjacent to a labelled sample, they inherit the label of that sample; if not, a new label is created. This process carries on until no unlabelled samples are left. Note that samples that correspond to local maxima may be labelled on a first-come-first-served basis, ie they may get the label of a neighbouring pixel at random. One, however, may be a little more sophisticated and leave those samples for the end. Then one may assign them labels so that they cause the minimum increase of the variance of the values of the region to which they are assigned. We sequentially consider assigning the pixel to each one of the possible regions it might belong to. For each case we estimate the change in the variance of the grey values of the region that will be created by incorporating the pixel. The region that will suffer the least disturbance by the incorporation of the pixel will bequeath its label to the pixel. An alternative criterion would be simply to assign to a pixel the label of its neighbour that has the most similar grey value. Note that in this example, the sample with value 11 (the highest peak of the sequence) has been assigned the same label as the deepest trough of the sequence (the sample with value 4) by the ad hoc approach. A more elaborate approach would clearly assign to this sample the label of its right neighbour.
[Figure 6.33 panels: (a) All minima pierced; flood begins; (b) Flooding level 1.]
Figure 6.33: See caption of figure 6.34.
[Figure 6.34 panels: (c) Flooding level 2; (d) Flooding level 3; (e) Flooding level 4; (f) Flooding level 5; (g) Flooding level 6; (h) Flooding level 7.]
Figure 6.34: The sequence of steps required in order to segment the signal in 6.33a, with the watershed algorithm. The different tones of grey represent different labels.
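The flooding procedure of example 6.18 may be sketched in code as follows. This is our own minimal transcription for a 1D signal (Python with numpy); the function name watershed_1d and the first-come-first-served tie breaking are our choices.

import numpy as np

def watershed_1d(f):
    # Label a 1D signal by flooding from its minima: raise the water level one
    # grey level at a time; unlabelled wet samples adjacent to a labelled sample
    # inherit its label, and isolated wet samples start a new lake.
    f = np.asarray(f)
    labels = np.zeros(len(f), dtype=int)
    next_label = 1
    for level in range(int(f.min()), int(f.max()) + 1):
        while True:
            wet = np.where((f <= level) & (labels == 0))[0]
            if len(wet) == 0:
                break
            progressed = False
            for i in wet:
                neighbours = [labels[j] for j in (i - 1, i + 1)
                              if 0 <= j < len(f) and labels[j] != 0]
                if neighbours:
                    labels[i] = neighbours[0]    # first-come-first-served
                    progressed = True
            if not progressed:
                labels[wet[0]] = next_label      # a new lake is created
                next_label += 1
    return labels

f = [6, 10, 10, 10, 7, 8, 8, 7, 4, 11, 6, 6, 6, 8, 8, 5, 5, 5, 5, 6, 4, 5, 4]
print(watershed_1d(f))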
How does the use of the gradient magnitude image help segment the image by
the watershed algorithm?
The places where the magnitude of the gradient of the image is high are the places where we wish different regions of the image to meet. So, to prevent regions from growing inside the territory of neighbouring regions, we erect barriers at the places of high gradient magnitude, the so called watersheds. We also consider as seeds the minima of the gradient magnitude image, instead of the minima and maxima of the original grey image. The minima of the gradient magnitude usually correspond to flat patches of the image. These flat image patches could be either brighter or darker than their surroundings. Thus, working with the gradient minima as opposed to the image minima, we may segment simultaneously regions that are either brighter or darker than their surroundings. The algorithm is as follows.
Step 0: Create an empty output array o, the same size as the input image.
Step 1: Compute the gradient magnitude image w from the input image.
Step 2: By morphological reconstruction, identify the minima of the gradient magnitude image and use them as seeds from which regions will grow.
Step 3: In the output array, give a distinct label to each seed patch of pixels.
Step 4: Set the threshold at the minimum value of the gradient magnitude image, t_min.
Step 5: All pixels with value in w less than or equal to the threshold, which are adjacent to one of the labelled patches in o, and which have not yet been assigned a label, are given in o the label of the labelled patch to which they are adjacent.
Step 6: Repeat Step 5 until no assignment of labels happens.
Step 7: If there are pixels still with no labels, increment the threshold by ∆t and return to Step 5.
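A direct, unoptimised transcription of these steps might look like the sketch below. It is only an illustration (Python with numpy): seed detection is simplified to pixels that are not larger than any of their 4-connected neighbours, each seed pixel gets its own label, and the names are ours.

import numpy as np

def watershed_by_growing(w, delta_t=1.0):
    rows, cols = w.shape
    o = np.zeros((rows, cols), dtype=int)             # Step 0: empty output array
    label = 0
    for i in range(rows):                             # Steps 2-3 (simplified seeds)
        for j in range(cols):
            neigh = [w[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                     if 0 <= a < rows and 0 <= b < cols]
            if w[i, j] <= min(neigh):
                label += 1
                o[i, j] = label
    t = w.min()                                       # Step 4
    while (o == 0).any():
        changed = True
        while changed:                                # Steps 5-6: grow labelled patches
            changed = False
            for i in range(rows):
                for j in range(cols):
                    if o[i, j] == 0 and w[i, j] <= t:
                        for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                            if 0 <= a < rows and 0 <= b < cols and o[a, b] != 0:
                                o[i, j] = o[a, b]
                                changed = True
                                break
        t += delta_t                                  # Step 7: raise the threshold
    return o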
Figure 6.35 shows the result of applying this algorithm to an image. In general, the
watershed algorithm creates image oversegmentation.
[Figure 6.35 panels: (a) Original image; (b) Labels of the segments shown as grey values; (c) Outlines of the segments; (d) Segments shown with their average grey value.]
Figure 6.35: The watershed algorithm applied to the Greek flag.
Are there any drawbacks in the watershed algorithm which works with the gra-
dient magnitude image?
Yes. The gradient magnitude is usually very grossly estimated. This is particularly important
for medical image processing, where small image structures may have to be segmented and
minute image details preserved (see example 6.19). The situation may be improved if subpixel
gradient estimation is performed (see example 6.20).
However, another issue is that the minima of the gradient magnitude do not necessarily
correspond to peaks and troughs of the grey image (so that they can be used as seeds to allow
regions to grow). The gradient magnitude becomes 0 at saddle points of the image too, ie at places where the image brightness obtains a maximum along one direction and an inflection along the orthogonal direction (see example 6.21).
A much better method for using the watershed algorithm is to identify the ridges and waterlines of the image first, with subpixel accuracy. The minima and maxima of the image are identified as the points where non-ascending and non-descending paths from neighbouring pixels end up, respectively. All pixels, that have a non-ascending path of a chain of pixels leading to a particular minimum, correspond to that minimum and form the so called waterhole of that minimum. Non-ascending path means that the path we follow, to move away from the pixel, either leads to a pixel with the same value, or moves to a pixel with a lower value. All pixels, that have a non-descending path of a chain of pixels leading to a particular maximum, correspond to that maximum and form the so called hill of that maximum. Non-descending path means that the path we follow, to move away from the pixel, either leads to a pixel with the same value, or moves to a pixel with a higher value. Methods that use these concepts to segment images are beyond the scope of this book.
Example 6.19
Figure 6.36a shows an image with a dark object on a bright background.
Compute the first difference ∆_i f(i, j) at location (i, j), by subtracting the value at location (i − 1, j) from the value at location (i + 1, j) and dividing by 2, and the first difference ∆_j f(i, j), by subtracting the value at location (i, j − 1) from the value at location (i, j + 1) and dividing by 2. Then, compute the gradient map of this image. Comment on whether the gradient map you create this way can allow you to erect watershed barriers to segment this image by using the watershed algorithm.
Figures 6.36c and 6.36d show the first differences computed for the pixels of this image, while figure 6.36b shows the gradient map computed from them, by using:

Gradient Magnitude = sqrt[ (∆_i f(i, j))² + (∆_j f(i, j))² ]        (6.55)
The gradient values are high for several neighbouring pixels, and we cannot easily see
how we may join the ridges of them to form the watersheds we need.
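The central differences and the gradient magnitude of (6.55) take only a few lines to compute. The snippet below is our own sketch (Python with numpy); border pixels are handled by edge replication, which is an arbitrary choice of ours.

import numpy as np

def gradient_magnitude(f):
    # Central differences along the two image axes, as in example 6.19,
    # followed by equation (6.55).
    f = np.asarray(f, dtype=float)
    padded = np.pad(f, 1, mode='edge')
    d_i = (padded[2:, 1:-1] - padded[:-2, 1:-1]) / 2.0   # difference along i
    d_j = (padded[1:-1, 2:] - padded[1:-1, :-2]) / 2.0   # difference along j
    return np.sqrt(d_i ** 2 + d_j ** 2)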
[Figure 6.36, panels (a)-(d): arrays of grey values and first differences, not reproduced here; see the caption below.]
Figure 6.36: (a) A grey image. The thick black line indicates the object depicted in this image. We would like to use the gradient information to identify a ridge that will allow us to separate the object from the background. (b) The gradient magnitude of the image. It is rather gross for the job we want to do. (c) The first differences along the vertical direction. (d) The first differences along the horizontal direction. The gradient magnitude shown in (b) is the square root of the sum of the squares of the first differences shown in (c) and (d).
Example 6.20
The first difference between two neighbouring pixels corresponds to the image derivative at the position between the two pixels. Create the so called dual grid of image 6.36a, by considering these positions, and estimate the gradient magnitude image for this image at the subpixel positions.
To create the dual grid of image 6.36a, we create cells between any two neighbouring pixels along the vertical and horizontal directions, where we assign interpolated grey image values between the adjacent pixels. Figure 6.37 shows this calculation. We then
use these values to compute the first differences of the image at the positions of the dual grid, that correspond to the empty cells in figure 6.37. The square root of the sum of the squares of these first differences gives the gradient magnitude at the points of the dual grid. This is shown in figure 6.38.
Figure 6.37: The interpolated values of image 6.36a at the interpixel positions.
Figure 6.38: The values of the gradient magnitude of image 6.36a at the positions of
the dual grid. The pixels of the original image and their grey values are shown in the
grey cells.
Example 6.21
Use the result of example 6.20 to identify the watersheds in image 6.36a.
Identify also the minima of the gradient magnitude which may be used as
seeds by the watershed algorithm.
By observing the gradient magnitude values in 6.38, we note that we have two distinct sets of values that may be identified by selecting thresholds 1 and 3. We consider values below or equal to 1 as corresponding to the minima of the gradient map and values above 3 as corresponding to the watersheds of the image.
The values that may be linked to form the watersheds are shown in figure 6.39 as black dots. The minima that may be used as seeds are shown as open circles. Note that one of these open circles, near the narrow neck of the object, does not correspond to either a minimum or a maximum of the image, but to a saddle point.
Figure 6.39: The dark dots correspond to interpixel positions with high gradient mag-
nitude, appropriate for being joined to form watersheds that will stop one image region
spilling into another. The open circles correspond to minima of the gradient magnitude
and they may be used as seeds for the watershed algorithm.
Example B6.22
In image 6.36a, identify all non-descending paths starting from pixel (1, 1)
that has grey value 3. Can you identify any non-ascending paths starting
from this pixel?
We start from pixel (1, 1) and we examine all its neighbours in a 3 × 3 neighbourhood. We proceed to the pixel with the highest value. If more than one neighbour has the same value, we must consider all options. This creates a bifurcation of the path we follow. The result is shown in figure 6.40. Note that all branches of the non-descending path lead to the same patch of locally maximal values. If we construct similar paths from all pixels in the image, the pixels that have their paths leading to the same local maximum form the hill of that maximum.
There are no non-ascending paths starting from pixel (1, 1), as the value of this pixel is a local minimum. If we construct all non-ascending paths that start from each pixel in the image, the pixels that have their paths ending up at pixel (1, 1) constitute the water basin of this minimum.
Figure 6.40: All non-descending paths that start from the top left corner pixel of this
image.
Example B6.23
In image 6.36a identify all non-ascending paths starting from pixels (4, 1), (4, 2), (4, 3), (5, 1), (5, 2), (5, 3), (6, 1), (6, 2), (6, 3), (7, 1), (7, 2), (7, 3), (8, 1), (8, 2) and (8, 3). Pixels which have non-ascending paths to two different local minima form the ridges of the image. Identify which of these pixels belong to the ridges of the image.
All non-ascending paths, starting from the pixels mentioned, are marked in figure 6.41. The local minima, to which these paths end up, are pixels (1, 1), (6, 6) and (8, 1). Pixels which have non-ascending paths to more than one minimum are highlighted in grey.
If we construct all non-descending paths, starting from all pixels in the image, the pixels that have such paths ending at more than one local maximum form the waterlines of the image.
Figure 6.41: The grey pixels have non-ascending paths that end up at two different local minima. These pixels form the ridges of the image. Note that neighbouring pixels (5, 1) and (6, 1) have non-ascending paths that end up at two different minima. One may postulate the existence of a ridge between these two pixels, at subpixel location. This subpixel ridge is marked with a grey horizontal box in the image.
Is it possible to segment an image by filtering?
Yes. The edge preserving mode filter we encountered in Chapter 4 (page 328), if applied repeatedly, produces a segmentation. It turns out that only the edge adaptive version of the filter gives acceptable results (page 385). An example is shown in figure 6.42, where the weighted mode filter with weights

    1  3  1
    3  5  3
    1  3  1        (6.56)

was used 134 times. This filter is very slow.
[Figure 6.42 panels: (a) Original image; (b) Edge adaptive mode filtering.]
Figure 6.42: The edge preserving mode algorithm used to segment Leonidas.
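A plain (not edge adaptive) version of the weighted mode filter is easy to sketch; the snippet below is our own illustration in Python with numpy, using the weights of (6.56). The result of figure 6.42 was produced with the edge adaptive variant and many iterations, so this sketch only conveys the idea.

import numpy as np
from collections import Counter

WEIGHTS = np.array([[1, 3, 1],
                    [3, 5, 3],
                    [1, 3, 1]])

def weighted_mode_filter(img, iterations=1):
    # Replace every pixel by the grey value that gathers the largest total
    # weight inside its 3x3 neighbourhood, and repeat.
    out = np.asarray(img).copy()
    rows, cols = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode='edge')
        new = out.copy()
        for i in range(rows):
            for j in range(cols):
                votes = Counter()
                for a in range(3):
                    for b in range(3):
                        votes[padded[i + a, j + b]] += WEIGHTS[a, b]
                new[i, j] = votes.most_common(1)[0][0]
        out = new
    return out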
Another type of filtering is mean shift filtering, which we also saw in Chapter 4, used for smoothing (see pages 339 and 385). The smooth image produced by this algorithm may be segmented by agglomerating all segments which contain representative pixels that are within a certain distance from each other in the spatio-brightness domain.
How can we use the mean shift algorithm to segment an image?
The algorithm described on page 339 was shifting all pixels at each iteration, thus dynamically changing the feature space. It converges when all pixels belong to a single cluster. So, that version of the algorithm has to be run for a prespecified number of iterations. At the end the user has to select the desired result. However, one may also run the algorithm while keeping the feature space fixed: each time we concentrate on one pixel only, updating its values until they reach a fixed point. This fixed point is the nearest mode of the static feature space, ie the centre of the nearest agglomeration of pixels in the feature space. Once the final values of this pixel have been identified, the next pixel is considered, again working in the original unchanged feature space, where all pixels have their original values.
[Figure 6.43 panels: (a) Leonidas (128 × 128); (b) Dynamic feature space; (c) Static feature space.]
Figure 6.43: Leonidas segmented by mean shift. (b) This result was obtained after 80 iterations, using h_x = 15, h_y = 15 and h_g = 0.05. (The grey values were in the range [0, 1].) The triplets of all pixels in the feature space were updated at each iteration. (c) As in (b), but now trajectories of pixels in the static feature space were computed. Each trajectory was considered as having reached its fixed point when the value of (6.57) was less than or equal to 0.0001.
In this version of the algorithm, we compute trajectories of pixels that lead to the nearest cluster centre. Each trajectory is disregarded at the end, and all that matters is the final triplet of values associated with the pixel. The convergence of each trajectory is judged by comparing two successive triplets of values, using their distance, computed as
(x^{m+1}_{ij} − x^m_{ij})²/h²_x + (y^{m+1}_{ij} − y^m_{ij})²/h²_y + (g^{m+1}_{ij} − g^m_{ij})²/h²_g        (6.57)
where (x^{m+1}_{ij}, y^{m+1}_{ij}, g^{m+1}_{ij}) and (x^m_{ij}, y^m_{ij}, g^m_{ij}) are the triplets of values associated with pixel (i, j) at iterations m + 1 and m, respectively. This distance value is compared with a threshold t_h to assess convergence of the values of pixel (i, j).
After the mean shift algorithm has been run, either with a dynamic or a static feature space, every pixel is associated with a vector (x^{m_final}_{ij}, y^{m_final}_{ij}, g^{m_final}_{ij}). However, there are only a few such distinct vectors in comparison with the original number of pixels. Each such vector is represented by a point in the 3D space of position-grey value. Let us say that we have M such distinct vectors, each one linked to a large number of pixels. Let us call them z_m = (x_m, y_m, g_m), for m = 1, 2, . . . , M. We may create a double entry table of size M × M, where the element at position (m, n) is 1, if

|x_m − x_n| ≤ h_x        |y_m − y_n| ≤ h_y        |g_m − g_n| ≤ h_g        (6.58)

and it is 0 if the above conditions do not hold simultaneously. Then, we look for connected regions made up of 1s in this array (islands of 1s). All pixels, that are linked to all vectors z_m that make up the same island, receive the same label. This way the segmented image is formed. Figure 6.43 shows an example of applying this algorithm.
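The static-feature-space version described above can be sketched as follows. This is our own illustration (Python with numpy), practical only for small images because every pixel is compared with every other pixel; the parameter names h_x, h_y, h_g and the convergence tolerance follow the text, everything else is an assumption of ours.

import numpy as np

def mean_shift_segment(img, hx=4.0, hy=4.0, hg=0.05, tol=1e-4):
    rows, cols = img.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    feats = np.stack([xs.ravel(), ys.ravel(),
                      np.asarray(img, dtype=float).ravel()], axis=1)
    scale = np.array([hx, hy, hg])
    modes = feats.copy()
    for n in range(len(feats)):                        # one trajectory per pixel
        p = feats[n].copy()
        while True:
            d2 = (((feats - p) / scale) ** 2).sum(axis=1)
            new_p = feats[d2 <= 1.0].mean(axis=0)      # mean of pixels within one bandwidth
            converged = (((new_p - p) / scale) ** 2).sum() <= tol   # criterion (6.57)
            p = new_p
            if converged:
                break
        modes[n] = p
    labels = -np.ones(len(feats), dtype=int)           # merge modes that satisfy (6.58)
    current = 0
    for n in range(len(feats)):
        if labels[n] == -1:
            labels[n] = current
            stack = [n]
            while stack:
                m = stack.pop()
                close = np.all(np.abs(modes - modes[m]) <= scale, axis=1)
                for k in np.where(close & (labels == -1))[0]:
                    labels[k] = current
                    stack.append(k)
            current += 1
    return labels.reshape(rows, cols)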
Treating pixels as triplets of values frees us from the rectangular grid and allows us to
think of representing an image by a graph.
What is a graph?
A graph is a mathematical construction that consists of a set of nodes, called vertices, and
a set of edges, ie lines that join vertices that are related according to some relationship
we specify. If every node is connected to every other node, the graph is said to be fully
connected. If the edges represent relationships that are symmetric, ie the order of the
connected nodes does not matter, then the graph is called undirected. We often assign
weights to the edges to quantify the relationship that exists between the joined nodes.
How can we use a graph to represent an image?
We can consider each pixel to be a node. We may connect every pixel to every other pixel and assign as weight to the connection the inverse of the absolute grey value difference of the two connected pixels. These weights then measure the similarity between the two pixels. This way the image is represented by an undirected relational graph. To be consistent with the mean shift algorithm, we may define the similarity to be the inverse of the weighted sum of the distance and the square grey value difference of the two pixels.
How can we use the graph representation of an image to segment it?
We may start by removing the weakest links of the graph, ie disconnect pixels with the least
similarity in order to form disjoint sets of pixels. To do that in a systematic and unbiased
way, we may use the normalised cuts algorithm.
What is the normalised cuts algorithm?
Let us assume that the set of vertices V of a graph is divided into two subsets, A and B. We define the cut of sets A and B as

cut(A, B) ≡ Σ_{u∈A, v∈B} w(u, v)        (6.59)

where w(u, v) is the weight of the edge that joins nodes u and v. We then define the normalised cut between sets A and B as:

ncut(A, B) ≡ cut(A, B)/cut(A, V) + cut(A, B)/cut(B, V)        (6.60)

The normalised cuts algorithm divides the pixels of the image into the two subsets A and B that minimise ncut(A, B).
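For a small graph, ncut(A, B) can be evaluated directly from the weight matrix. The snippet below is our own sketch (Python with numpy); it uses the fully connected graph of example 6.24 below.

import numpy as np

def ncut(W, in_A):
    # Normalised cut (6.60) between the sets defined by the boolean vector in_A
    # (True for nodes in A), for a graph with symmetric weight matrix W.
    in_A = np.asarray(in_A, dtype=bool)
    cut_AB = W[np.ix_(in_A, ~in_A)].sum()        # equation (6.59)
    cut_AV = W[in_A, :].sum()                    # sum of the degrees of the nodes of A
    cut_BV = W[~in_A, :].sum()                   # sum of the degrees of the nodes of B
    return cut_AB / cut_AV + cut_AB / cut_BV     # equation (6.60)

W = np.array([[0, 2, 1, 2],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [2, 1, 2, 0]], dtype=float)
print(ncut(W, [True, False, False, True]))       # 6/10 + 6/12 = 1.1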
Box 6.3. The normalised cuts algorithm as an eigenvalue problem
Let us define the total connection a node V_i of a graph has to all other nodes, as d_i ≡ Σ_j w_ij, where w_ij is the weight between nodes V_i and V_j. This is called the degree of the node. Also, define a vector x, such that its element x_i has value 1, if node V_i belongs to subset A, and it has value −1, if V_i belongs to subset B. We may then express ncut(A, B), defined by (6.60), as:

ncut(A, B) = Σ_{x_i=1, x_j=−1} w_ij / Σ_{x_i=1} d_i + Σ_{x_i=−1, x_j=1} w_ij / Σ_{x_i=−1} d_i        (6.61)

Note that the problem we are trying to solve is to identify vector x, because if we know vector x, we know which pixel (node of the graph) belongs to set A and which does not belong to set A. So, we may denote function ncut(A, B) as C(x), which expresses the cost function we wish to minimise, as a function of the unknown vector x.
Let us denote by 1 a vector with all its elements equal to 1. Note that an element of vector (1 + x)/2 will be 1, if the corresponding node belongs to set A, and it will be 0, if it does not. Similarly, an element of vector (1 − x)/2 will be 1, if the corresponding node belongs to set B, and 0 otherwise. It can be shown then (see examples 6.24 and 6.25), that (6.61) may be written as

C(x) = (1 + x)^T L (1 + x) / [4k 1^T D 1] + (1 − x)^T L (1 − x) / [4(1 − k) 1^T D 1]        (6.62)

where D is a diagonal matrix with values d_i along its diagonal, L ≡ D − W with W being the matrix made up from weights w_ij, and:

k ≡ Σ_{x_i=1} d_i / Σ_i d_i        (6.63)

Matrix L is known as the Laplacian matrix of the graph.
Obviously:

C(x) = [(1 − k)(1 + x)^T L (1 + x) + k(1 − x)^T L (1 − x)] / [4k(1 − k) 1^T D 1]        (6.64)

Let us divide numerator and denominator by (1 − k)²:

C(x) = [(1/(1 − k)) (1 + x)^T L (1 + x) + (k/(1 − k)²) (1 − x)^T L (1 − x)] / [4 (k/(1 − k)) 1^T D 1]        (6.65)

Define:

b ≡ k/(1 − k)  ⇒  k = b/(1 + b),  1/(1 − k) = 1 + b  and  k/(1 − k)² = b(1 + b)        (6.66)

If we substitute from (6.66) into (6.65), we obtain:

C(x) = [(1 + b)(1 + x)^T L (1 + x) + b(1 + b)(1 − x)^T L (1 − x)] / [4b 1^T D 1]        (6.67)

Term 1^T L 1 is 0. This is because L1 means nothing else than summing all elements along the same row of matrix L. We know that the diagonal element in each row is the sum of the weights of all the edges that start from the corresponding node, while all other elements are the weights from that node to each other node, with a negative sign (see example 6.26). Term 1^T L 1 then may change sign as convenient. This allows us to write (see example 6.27):

C(x) = [1 + x − b(1 − x)]^T L [1 + x − b(1 − x)] / [4b 1^T D 1]        (6.68)

It can be shown (see example 6.32) that 4b 1^T D 1 = y^T D y, where y is:

y ≡ 1 + x − b(1 − x)        (6.69)

Then C(x) is given by:

C(x) = y^T L y / y^T D y        (6.70)

This is known as the Rayleigh quotient. The problem is then to identify a vector x, which determines which node of the graph belongs to set A and which not, so that the Rayleigh quotient C(x) is minimised.
Example B6.24
Consider the fully connected graph of figure 6.44. It consists of nodes V = {V_1, V_2, V_3, V_4}. Set A consists of nodes A = {V_1, V_4}, while set B consists of nodes B = {V_2, V_3}. The weights associated with each edge are marked on the figure. Compute the normalised cut between sets A and B, using definition (6.61).
[Figure 6.44: a fully connected graph on nodes V_1, V_2, V_3, V_4, with edge weights w_12 = 2, w_13 = 1, w_14 = 2, w_23 = 3, w_24 = 1 and w_34 = 2.]
Figure 6.44: A fully connected graph.
Vector x obviously is x^T = (1, −1, −1, 1). From (6.61) then:

ncut(A, B) = (w_12 + w_13 + w_42 + w_43)/(d_1 + d_4) + (w_21 + w_24 + w_31 + w_34)/(d_2 + d_3) = 6/10 + 6/12 = 11/10        (6.71)
Example B6.25
Compute the normalised cut of sets A and B of example 6.24, using formula (6.62).

From figure 6.44 we work out matrices D and W, and from them matrix L:

    D = | 5 0 0 0 |        W = | 0 2 1 2 |
        | 0 6 0 0 |            | 2 0 3 1 |
        | 0 0 6 0 |            | 1 3 0 2 |
        | 0 0 0 5 |            | 2 1 2 0 |        (6.72)

Then:

    L = D − W = |  5 −2 −1 −2 |
                | −2  6 −3 −1 |
                | −1 −3  6 −2 |
                | −2 −1 −2  5 |        (6.73)

Parameter k, defined by (6.63), is:

k = (d_1 + d_4)/(d_1 + d_2 + d_3 + d_4) = (5 + 5)/(5 + 6 + 6 + 5) = 10/22 = 5/11        (6.74)

Vectors 1 + x and 1 − x are:

1 + x = (1, 1, 1, 1)^T + (1, −1, −1, 1)^T = (2, 0, 0, 2)^T
1 − x = (1, 1, 1, 1)^T − (1, −1, −1, 1)^T = (0, 2, 2, 0)^T        (6.75)

Then we can compute the various quantities that appear in (6.62):

(1 + x)^T L (1 + x) = (2, 0, 0, 2) L (2, 0, 0, 2)^T = (2, 0, 0, 2)(6, −6, −6, 6)^T = 24        (6.76)

(1 − x)^T L (1 − x) = (0, 2, 2, 0) L (0, 2, 2, 0)^T = (0, 2, 2, 0)(−6, 6, 6, −6)^T = 24

1^T D 1 = (1, 1, 1, 1) D (1, 1, 1, 1)^T = (1, 1, 1, 1)(5, 6, 6, 5)^T = 22        (6.77)

Substituting in (6.62), we obtain:

ncut(A, B) = 24 / [4 (10/22) 22] + 24 / [4 (1 − 10/22) 22] = 11/10        (6.78)
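The algebra of examples 6.24 and 6.25 is easy to check numerically. The following lines are our own verification snippet (Python with numpy); they evaluate formula (6.62) directly and reproduce the value 11/10.

import numpy as np

W = np.array([[0, 2, 1, 2],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [2, 1, 2, 0]], dtype=float)
x = np.array([1, -1, -1, 1], dtype=float)        # A = {V1, V4}, B = {V2, V3}
d = W.sum(axis=1)
D = np.diag(d)
L = D - W
one = np.ones(4)
k = d[x == 1].sum() / d.sum()                    # equation (6.63): 10/22

C = ((one + x) @ L @ (one + x)) / (4 * k * (one @ D @ one)) \
  + ((one - x) @ L @ (one - x)) / (4 * (1 - k) * (one @ D @ one))
print(C)                                         # 1.1, as in (6.71) and (6.78)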
Example B6.26
For the graph of example 6.24 compute scalars x^T L 1, 1^T L x and 1^T L 1.

For this graph, vector x is x^T = (1, −1, −1, 1) and matrix L is given by (6.73). We have then:

x^T L 1 = (1, −1, −1, 1) L (1, 1, 1, 1)^T = (1, −1, −1, 1)(0, 0, 0, 0)^T = 0
1^T L x = (1, 1, 1, 1) L (1, −1, −1, 1)^T = (1, 1, 1, 1)(6, −6, −6, 6)^T = 0
1^T L 1 = (1, 1, 1, 1) L (1, 1, 1, 1)^T = (1, 1, 1, 1)(0, 0, 0, 0)^T = 0        (6.79)
Example B6.27
Write the numerator of the right-hand side of equation (6.67) in the form y^T L y, where y is some vector.

Let us expand the numerator of (6.67):

(1 + b)(1 + x)^T L (1 + x) + b(1 + b)(1 − x)^T L (1 − x) =
(1 + x)^T L (1 + x) + b(1 + x)^T L (1 + x) + b(1 − x)^T L (1 − x) + b²(1 − x)^T L (1 − x) =
(1 + x)^T L (1 + x) + b²(1 − x)^T L (1 − x) +
b [1^T L 1 + 1^T L x + x^T L 1 + x^T L x + 1^T L 1 − 1^T L x − x^T L 1 + x^T L x]        (6.80)

Let us call the quantity inside the square bracket S. As 1^T L 1 = 0, we can change the sign of these terms inside the square bracket. For those terms then, we have:

S = −1^T L 1 [a] + 1^T L x [b] + x^T L 1 [c] + x^T L x [c] − 1^T L 1 [d] − 1^T L x [a] − x^T L 1 [d] + x^T L x [b]        (6.81)

We combine together terms marked with the same letter:

S = −1^T L (1 + x) [e] + (1 + x)^T L x [f] + x^T L (1 + x) [e] − (1 + x)^T L 1 [f]        (6.82)

We again combine terms marked with the same letter:

S = −(1 − x)^T L (1 + x) − (1 + x)^T L (1 − x)        (6.83)

When we substitute S in (6.80), we obtain the numerator of (6.67) as:

(1 + x)^T L (1 + x) + b²(1 − x)^T L (1 − x) − b(1 − x)^T L (1 + x) − b(1 + x)^T L (1 − x) =
[1 + x − b(1 − x)]^T L (1 + x) − b[1 + x − b(1 − x)]^T L (1 − x) =
[1 + x − b(1 − x)]^T L [1 + x − b(1 − x)]        (6.84)

We note that the numerator of (6.67) may be written in the form y^T L y, if we define vector y as:

y ≡ 1 + x − b(1 − x)        (6.85)
Example B6.28
For vector y, defined by (6.85), and matrix D, defined in Box 6.3, show that y^T D 1 = 0.

First, we observe that D1 is nothing other than a vector with the d_i values of the nodes of the graph. When we then multiply it from the left with vector (1 + x)/2, which has as nonzero elements those that identify the nodes that belong to set A, we get Σ_{x_i=1} d_i. When we multiply it from the left with vector (1 − x)/2, which has as nonzero elements those that identify the nodes that do not belong to set A, we get Σ_{x_i=−1} d_i.
Also, from the definition of b (equation (6.66), on page 577) and the definition of k (equation (6.63)), we obtain:

b ≡ k/(1 − k) = Σ_{x_i=1} d_i / Σ_{x_i=−1} d_i        (6.86)

We make use of all these observations to compute y^T D 1, as follows:

y^T D 1 = [1 + x − b(1 − x)]^T D 1
        = (1 + x)^T D 1 − b(1 − x)^T D 1
        = 2 Σ_{x_i=1} d_i − 2b Σ_{x_i=−1} d_i
        = 2 Σ_{x_i=1} d_i − 2 [Σ_{x_i=1} d_i / Σ_{x_i=−1} d_i] Σ_{x_i=−1} d_i = 0        (6.87)
Example B6.29
For the graph of example 6.24, compute y^T D 1.

To compute vector y, as defined by (6.85), we need first the value of b. Parameter k in example 6.25 was computed as 5/11. Then from definition (6.66) we deduce that b = (5/11)/(6/11) = 5/6. We may then compute vector y, using definition (6.85), as:

y = (1, 1, 1, 1)^T + (1, −1, −1, 1)^T − (5/6) [(1, 1, 1, 1)^T − (1, −1, −1, 1)^T]
  = (2, 0, 0, 2)^T − (5/6)(0, 2, 2, 0)^T = (2, −5/3, −5/3, 2)^T        (6.88)

Using matrix D, as given by (6.72), we are then ready to compute y^T D 1:

y^T D 1 = (2, −5/3, −5/3, 2) D (1, 1, 1, 1)^T = (2, −5/3, −5/3, 2)(5, 6, 6, 5)^T = 0        (6.89)
Example B6.30
For the graph of example 6.24 calculate vector Dx and from that the values of 1^T D x and x^T D x. Generalise your results for any graph.

Dx = D (1, −1, −1, 1)^T = (5, −6, −6, 5)^T        (6.90)

Then:

1^T D x = (1, 1, 1, 1)(5, −6, −6, 5)^T = 5 − 6 − 6 + 5 = −2
x^T D x = (1, −1, −1, 1)(5, −6, −6, 5)^T = 5 + 6 + 6 + 5 = 22        (6.91)

We notice that operation with vector 1 from the left on vector Dx effectively sums the d_i values of all the nodes that belong to set A, and subtracts from them the sum of the d_i values of all the nodes that do not belong to set A. On the contrary, operation with vector x from the left on vector Dx effectively sums up all the d_i values. We conclude that:

1^T D x = Σ_{x_i=1} d_i − Σ_{x_i=−1} d_i        x^T D x = Σ_i d_i = Σ_{x_i=1} d_i + Σ_{x_i=−1} d_i        (6.92)
Example B6.31
Show that:

1^T D 1 = Σ_i d_i        (6.93)
L 1 = 0        (6.94)

Let us assume that the graph has N nodes.

1^T D 1 = (1, 1, . . . , 1) diag(d_1, d_2, . . . , d_N) (1, 1, . . . , 1)^T = (1, 1, . . . , 1)(d_1, d_2, . . . , d_N)^T = Σ_{i=1}^{N} d_i        (6.95)

L 1 = |  d_1    −w_12   . . .  −w_1N |  | 1 |     | d_1 − Σ_{i=2}^{N} w_1i        |     | 0 |
      | −w_21    d_2    . . .  −w_2N |  | 1 |  =  | d_2 − w_21 − Σ_{i=3}^{N} w_2i |  =  | 0 |  = 0        (6.96)
      |  . . .   . . .  . . .  . . . |  |...|     | . . .                         |     |...|
      | −w_N1   −w_N2   . . .   d_N  |  | 1 |     | d_N − Σ_{i=1}^{N−1} w_Ni      |     | 0 |
Example B6.32
For vector y, defined by (6.85), and matrix D, defined in Box 6.3, show that y^T D y = 4b 1^T D 1.

Let us calculate y^T D y, where we substitute only vector y on the right, from its definition (6.85):

y^T D y = y^T D 1 + y^T D x − b y^T D 1 + b y^T D x = y^T D x + b y^T D x        (6.97)

Here we made use of the result of example 6.28, namely that the terms y^T D 1 are 0. Let us compute y^T D x:

y^T D x = 1^T D x + x^T D x − b 1^T D x + b x^T D x        (6.98)

If we make use of (6.92), we obtain:

y^T D x = Σ_{x_i=1} d_i − Σ_{x_i=−1} d_i + Σ_i d_i − b Σ_{x_i=1} d_i + b Σ_{x_i=−1} d_i + b Σ_i d_i
        = 2 Σ_{x_i=1} d_i + 2b Σ_{x_i=−1} d_i        (6.99)

Let us substitute now in (6.97):

y^T D y = 2 Σ_{x_i=1} d_i + 2b Σ_{x_i=−1} d_i + 2b Σ_{x_i=1} d_i + 2b² Σ_{x_i=−1} d_i
        = 2 Σ_{x_i=1} d_i + 2 Σ_{x_i=1} d_i + 2b² Σ_{x_i=−1} d_i + 2b² Σ_{x_i=−1} d_i
          (using b Σ_{x_i=−1} d_i = Σ_{x_i=1} d_i and b Σ_{x_i=1} d_i = b² Σ_{x_i=−1} d_i)
        = 4 Σ_{x_i=1} d_i + 4b² Σ_{x_i=−1} d_i
        = 4b [Σ_{x_i=−1} d_i + b Σ_{x_i=−1} d_i]          (using Σ_{x_i=1} d_i = b Σ_{x_i=−1} d_i)
        = 4b [Σ_{x_i=−1} d_i + Σ_{x_i=1} d_i] = 4b Σ_i d_i        (6.100)

According to (6.95), Σ_i d_i = 1^T D 1, so the proof is completed.
Box 6.4. How do we minimise the Rayleigh quotient?
It is obvious from (6.70) that if we manage to find a number λ such that

L y = λ D y        (6.101)

then by substituting in the numerator of (6.70), we shall have C(x) = λ. If we select λ to be minimum, the corresponding vector y (and by extension the corresponding vector x) will be the solution of our minimisation problem. First of all, we must write (6.101) as a usual eigenvalue equation, where a matrix multiplies a vector that yields the vector itself times a scalar. To do that, we multiply both sides of this expression with D^{−1/2}. This is possible because D is a diagonal matrix with positive values along its diagonal, and so taking its inverse square root is trivial (see also example 2.3, on page 51):

D^{−1/2} L y = λ D^{1/2} y        (6.102)

Let us define vector z ≡ D^{1/2} y. Then obviously y = D^{−1/2} z and we have:

D^{−1/2} L D^{−1/2} z = λ z        (6.103)

It can be shown that matrix D^{−1/2} L D^{−1/2} does not have negative eigenvalues (see examples 6.33, 6.34 and 6.35). Its smallest eigenvalue then must be 0. This eigenvalue, however, corresponds to a vector x = 1, ie a vector that does not divide the set of nodes into two subsets (see example 6.36). Then it is obvious that the eigenvector we require is the one that corresponds to the smallest nonzero eigenvalue of matrix D^{−1/2} L D^{−1/2}. This is known as the Fiedler vector.
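For a graph small enough to fit in memory, the Fiedler vector can be obtained with a dense eigendecomposition. The sketch below is ours (Python with numpy); it assumes the graph is connected, so that exactly one eigenvalue is zero.

import numpy as np

def fiedler_partition(W):
    # Split the nodes of a graph with symmetric weight matrix W in two, using
    # the eigenvector of D^(-1/2) L D^(-1/2) with the smallest nonzero eigenvalue.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - W
    E = D_inv_sqrt @ L @ D_inv_sqrt
    evals, evecs = np.linalg.eigh(E)          # eigenvalues in ascending order
    z1 = evecs[:, 1]                          # skip the zero eigenvalue
    y1 = D_inv_sqrt @ z1                      # back to the y of equation (6.69)
    return y1 > 0                             # positive entries form one set

W = np.array([[0, 2, 1, 2],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [2, 1, 2, 0]], dtype=float)
print(fiedler_partition(W))                   # a two-way split of the four nodes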
Example B6.33
A symmetric matrix A is called positive semidefinite if x^T A x ≥ 0 for every real vector x. For a graph that consists of three nodes, show that its Laplacian matrix is positive semidefinite.

Remembering that the weights of the nodes are symmetric, ie that w_ij = w_ji, the Laplacian matrix of a 3-node graph has the form:

    L = |  d_1   −w_12  −w_13 |
        | −w_12   d_2   −w_23 |
        | −w_13  −w_23   d_3  |        (6.104)

Let us consider a vector x ≡ (x_1, x_2, x_3)^T and let us compute x^T L x:

x^T L x = (x_1, x_2, x_3) L (x_1, x_2, x_3)^T
        = (x_1, x_2, x_3) (d_1 x_1 − w_12 x_2 − w_13 x_3,  −w_12 x_1 + d_2 x_2 − w_23 x_3,  −w_13 x_1 − w_23 x_2 + d_3 x_3)^T
        = d_1 x_1² + d_2 x_2² + d_3 x_3² − 2w_12 x_1 x_2 − 2w_13 x_1 x_3 − 2w_23 x_2 x_3        (6.105)

We remember that the degree of a node is equal to the sum of the weights it has with the other nodes. So, d_1 = w_12 + w_13, d_2 = w_21 + w_23 and d_3 = w_31 + w_32. Then:

x^T L x = w_12 x_1² + w_13 x_1² + w_21 x_2² + w_23 x_2² + w_31 x_3² + w_32 x_3²
          − 2w_12 x_1 x_2 − 2w_13 x_1 x_3 − 2w_23 x_2 x_3
        = w_12 (x_1 − x_2)² + w_13 (x_1 − x_3)² + w_23 (x_2 − x_3)²        (6.106)

The right-hand side of (6.106) is always greater than or equal to 0, for any vector x. So, matrix L is positive semidefinite.
Example B6.34
Show that, if L is a positive semidefinite matrix and A is a diagonal matrix, ALA is also positive semidefinite.

To prove that ALA is positive semidefinite, we must show that x^T ALA x is always greater than or equal to 0 for every real vector x. We note that Ax is another vector, which we might call y. We also note that, since A is a diagonal matrix, y^T = x^T A. So, we may write: x^T ALA x = y^T L y. Since L is positive semidefinite, y^T L y ≥ 0 and so x^T ALA x ≥ 0. Therefore, ALA is also positive semidefinite.
Example B6.35
Show that, if A is a positive semidefinite matrix, all its eigenvalues are non-negative.

Assume that x is an eigenvector of A and λ is the corresponding eigenvalue. Then Ax = λx. Multiply both sides of this equation with x^T to obtain:

x^T A x = x^T λ x        (6.107)

As λ is a scalar, it can come just after the equal sign on the right-hand side of (6.107). Also, as x is an eigenvector, x^T x = 1. Then

λ = x^T A x ≥ 0        (6.108)

since A is positive semidefinite.
Example B6.36
Show that the eigenvector that corresponds to the 0 eigenvalue of matrix D^{−1/2} L D^{−1/2} in Box 6.4 is vector 1, ie a vector with all its elements equal to 1.

We observe that if we set y = 1 in (6.101), λ = 0, because we have shown that L1 = 0, where 0 is a vector of zeros (see (6.96)). So, vector 1, normalised so that the sum of the squares of its elements is 1, is the eigenvector of eigenvalue λ = 0. This eigenvector does not partition the graph and so it is of no interest to the minimisation of (6.70).
Example B6.37
Show that all solutions of (6.103) satisfy y^T D 1 = 0.

Since all eigenvectors are orthogonal to each other, they will also be orthogonal to eigenvector z_0 = D^{1/2} 1, the eigenvector that corresponds to the 0 eigenvalue. Then, eigenvector z_i, which corresponds to eigenvalue λ_i, must satisfy:

z_i^T z_0 = 0  ⇒  y_i^T D^{1/2} D^{1/2} 1 = 0  ⇒  y_i^T D 1 = 0        (6.109)
Example B6.38
Work out the values of the elements of vector y, as defined by (6.85), that identify the nodes that belong to the two sets.

The ith element of vector y is given by:

y_i = 1 + x_i − b(1 − x_i)        (6.110)

If node V_i belongs to set A, x_i = 1 and so y_i = 2 > 0. If node V_i does not belong to set A, x_i = −1, and so y_i = −2b < 0.
So, thresholding the y_i values into positive and negative allows us to identify the nodes that belong to the two classes.
How do we apply the normalised graph cuts algorithm in practice?
Step 1: Represent the image by a graph, where every pixel is represented by a node and the weights of the edges represent similarity between the pixels they join. As a measure of similarity, we may use the inverse of the distance computed by the mean shift algorithm:

w_{ij;kl} = [(i − k)²/h²_x + (j − l)²/h²_y + (g_ij − g_kl)²/h²_g]^{−1/2}        (6.111)

Here w_{ij;kl} is the weight between pixels (i, j) and (k, l), with grey values g_ij and g_kl, respectively, and h_x, h_y and h_g are suitably chosen positive parameters. However, any other plausible measure of pixel similarity may be used instead.
Step 2: Construct matrices L, D and E ≡ D^{−1/2} L D^{−1/2} as follows:
Matrix D is a diagonal matrix. Its ith element along the diagonal is the sum of the weights of the edges that connect node i with all other nodes.
Matrix W is made up from all the weights: its w_ij element is the weight between nodes V_i and V_j. This matrix has 0s along its diagonal.
Matrix L is L = D − W.
Note that if the image is M × N pixels in size, these matrices will be MN × MN in size.
Step 3: Find the smallest nonzero eigenvalue of E and the corresponding eigenvector z_1. It will be of size MN × 1.
Step 4: Compute vector y_1 as y_1 = D^{−1/2} z_1.
This vector will have as many elements as there are pixels in the image. Depending on the way we indexed the pixels of the image in order to form its graph representation, there is a one-to-one correspondence between the elements of vector y_1 and the image pixels.
Step 5: Consider all pixels (nodes of the graph) that correspond to positive elements of vector y_1 as belonging to one region and the ones that correspond to negative elements of y_1 as belonging to another region.
In practice, it is not possible to calculate directly the eigenvalues of matrix E, as it could be very big. If we allow edges only between pixels that are relatively near to each other, then this matrix is sparse. In that case, there are special methods that can be used, like, for example, the Lanczos method. Such methods, however, are beyond the scope of this book.
Once the original image has been partitioned into two regions, the pixels of each region may be represented by separate graphs which may also be partitioned in two regions each, and so on. Figure 6.45 shows two examples of applying this algorithm.
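For a toy image, Steps 1-5 can be followed literally with dense matrices. The sketch below is our own illustration (Python with numpy); a real image would need a sparse weight matrix and an iterative eigensolver, as noted above.

import numpy as np

def ncut_bipartition(img, hx=1.0, hy=1.0, hg=0.1):
    rows, cols = img.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    feats = np.stack([xs.ravel() / hx, ys.ravel() / hy,
                      np.asarray(img, dtype=float).ravel() / hg], axis=1)
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    W = np.where(dist > 0, 1.0 / np.maximum(dist, 1e-12), 0.0)  # Step 1, eq. (6.111)
    d = W.sum(axis=1)                                           # Step 2
    L = np.diag(d) - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    E = D_inv_sqrt @ L @ D_inv_sqrt
    evals, evecs = np.linalg.eigh(E)                            # Step 3
    y1 = D_inv_sqrt @ evecs[:, 1]                               # Step 4
    return (y1 > 0).reshape(rows, cols)                         # Step 5

img = np.array([[0.1, 0.1, 0.9, 0.9],
                [0.1, 0.1, 0.9, 0.9],
                [0.1, 0.1, 0.9, 0.9]])
print(ncut_bipartition(img))               # splits the image into its two vertical halves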
Is it possible to segment an image by considering the dissimilarities between
regions, as opposed to considering the similarities between pixels?
Yes, in such an approach we examine the differences between neighbouring pixels and say that pixels with different attribute values belong to different regions and, therefore, we postulate a boundary separating them. Such a boundary is called an edge and the process is called edge detection.
[Figure 6.45 panels: (a) Original images (64 × 64); (b) First division; (c) Second division.]
Figure 6.45: Using the normalised graph cuts algorithm to segment the Greek flag and Leonidas. (b) Division in two regions. (c) Each region is further subdivided into two regions. Parameter values for the Greek flag: h_x = h_y = 5 and h_g = 0.01; for Leonidas: h_x = h_y = 4 and h_g = 0.02.
6.2 Edge detection
How do we measure the dissimilarity between neighbouring pixels?
We may slide a window across the image, at each position calculate the statistical properties of the pixels within each half of the window and compare the two results. The places where these statistical properties differ most are where the boundaries of the regions are.
[Figure 6.46: a 12 × 12 grid of pixels with a sliding window divided into a left half A and a right half B.]
Figure 6.46: One may use a sliding window to measure the dissimilarity between two neighbouring patches of the image. A statistic is computed in each half of the window and the difference between the two values is assigned to the central pixel, highlighted in grey here.
For example, consider the 12 × 12 image of figure 6.46. Each dot represents a pixel. The rectangle drawn is a 3 × 9 window which could be placed so that its centre coincides with every pixel in the image, apart from those too close to the edge of the image. We can calculate the statistical properties of the twelve pixels inside the left part of the window (part A) and those of the twelve pixels inside the right part of the window (part B), and assign their difference to the central pixel. The simplest statistic we may compute is the mean grey value inside each subwindow. Other statistics may also (or alternatively) be computed. For example, we may calculate the standard deviation or even the skewness of the grey values of the pixels within each half of the window. However, one has to be careful as the number of pixels inside the window is rather small and high order statistics are not computed reliably from a small number of samples. Therefore, the size of the window plays a crucial role, as we need a large enough window, to calculate the statistics reliably, and a small enough window, to include within each half only part of a single region and avoid contamination from neighbouring regions.
We have to slide the window horizontally to scan the whole image. Local maxima of the assigned values are candidate positions for vertical boundaries. Local maxima where the value of the difference is greater than a certain threshold are accepted as vertical boundaries between adjacent regions. We can repeat the process by rotating the window by 90° and sliding it vertically to scan the whole image again.
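As an illustration of this sliding-window idea, the following sketch (our own, in Python with numpy) assigns to every interior pixel the absolute difference between the mean grey values of the two halves of a 3 × 9 window, as in figure 6.46.

import numpy as np

def vertical_boundary_response(img, half_width=4, half_height=1):
    # Part A is the block of pixels to the left of the central column of the
    # window, part B the block to its right; the central column is ignored.
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    response = np.zeros_like(img)
    for i in range(half_height, rows - half_height):
        for j in range(half_width, cols - half_width):
            A = img[i - half_height:i + half_height + 1, j - half_width:j]
            B = img[i - half_height:i + half_height + 1, j + 1:j + half_width + 1]
            response[i, j] = abs(A.mean() - B.mean())
    return response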
In general, such statistical methods, that rely on local calculations, are not very reliable boundary identifiers. However, if one has prior knowledge of the existence of a boundary, then one may slide the window perpendicularly to the suspected direction of the boundary, to locate its position very accurately. In such a case, statistical filters are very powerful tools of local boundary detection (see figure 6.47).
[Figure 6.47: a bipolar window (halves A and B) slid along a scanning line orthogonal to the hypothesised direction of the boundary; the difference in the local statistic peaks at the location of the boundary.]
Figure 6.47: If we have some prior knowledge of the existence of a boundary, we may slide a bipolar window orthogonal to the hypothesised direction of the boundary and compute the exact location of the boundary, as the place where the difference in the value of the computed statistic is maximal.
What is the smallest possible window we can choose?
The smallest possible window we can choose consists of two adjacent pixels. The only statistic we can calculate from such a window is the difference of the grey values of the two pixels. When this difference is high, we say we have an edge passing between the two pixels. Of course, the difference of the grey values of the two pixels is not a statistic but rather an estimate of the first derivative of the intensity function, with respect to the spatial variable along the direction we compute the difference. This is because first derivatives of image function f(i, j) are approximated by first differences in the discrete case:

f_x(i, j) ≡ f(i + 1, j) − f(i, j)
f_y(i, j) ≡ f(i, j + 1) − f(i, j)        (6.112)

Calculating f_x(i, j) at each pixel position is equivalent to convolving the image with a mask (filter) of the form [+1  −1] in the x direction, and calculating f_y(i, j) is equivalent to convolving the image with the filter [+1  −1]^T in the y direction.
The first and the simplest edge detection scheme then is to convolve the image with these two masks and produce two outputs. Note that these small masks have even lengths, so their centres are not associated with any particular pixel in the image as they slide across the image. So, the output of each calculation should be assigned to the position in between the
two adjacent pixels. These positions are said to constitute the dual grid of the image grid (see example 6.20, on page 569). In practice, we seldom invoke the dual grid. We usually adopt a convention and try to be consistent. For example, we may always assign the output value to the first pixel of the mask. If necessary, we may later remember that this value actually measures the difference between the two adjacent pixels at the position half a pixel to the left or the bottom of the pixel to which it is assigned. So, with this understanding, and for simplicity from now on, we shall be talking about edge pixels or edgels.
In the first output, produced by convolution with mask [+1  −1], any pixel that has an absolute value larger than the values of its left and right neighbours is a candidate to be a vertical edge pixel. In the second output, produced by convolution with mask [+1  −1]^T, any pixel that has an absolute value larger than the values of its top and bottom neighbours is a candidate to be a horizontal edge pixel. The process of identifying the local maxima as candidate edge pixels is called non-maxima suppression.
In the case of zero noise, this scheme will clearly pick up the discontinuities in image brightness.
What happens when the image has noise?
In the presence of noise every small and irrelevant fluctuation in the intensity value will be amplified by differentiating the image. It is common sense then, that one should smooth the image first, with a low pass filter, and then find the local differences.
Let us consider, for example, a 1D signal. Assume that one uses as a low pass filter a simple averaging procedure. We smooth the signal by replacing each intensity value with the average of three successive intensity values:

A_i ≡ (I_{i−1} + I_i + I_{i+1}) / 3        (6.113)

Then, we estimate the derivative at position i by averaging the two differences between the value at position i and its left and right neighbours:

F_i ≡ [(A_{i+1} − A_i) + (A_i − A_{i−1})] / 2 = (A_{i+1} − A_{i−1}) / 2        (6.114)

If we substitute from (6.113) into (6.114), we obtain:

F_i = (1/6) [I_{i+2} + I_{i+1} − I_{i−1} − I_{i−2}]        (6.115)

One may combine the two linear operations, of smoothing and finding differences, into one operation, if one uses large enough masks. In this case, the first difference at each position could be estimated by using a mask like [−1/6  −1/6  0  1/6  1/6]. It is clear that the larger the mask used, the more effective the smoothing. However, it is also clear that the more blurred the edge becomes too, so the more inaccurately its position will be specified (see figure 6.48).
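That the two-step route (smooth, then difference) and the single combined mask of (6.115) agree is easy to confirm numerically. The snippet below is our own check (Python with numpy, on an arbitrary test signal of ours); the kernels are written in the flipped order that np.convolve expects.

import numpy as np

signal = np.array([3, 4, 3, 4, 3, 4, 3, 9, 9, 10, 9, 10, 9, 10], dtype=float)

# Smooth with a three-sample average (6.113), then take half the difference of
# the two neighbours of each sample (6.114).
A = np.convolve(signal, np.ones(3) / 3.0, mode='same')
F_two_step = np.convolve(A, np.array([0.5, 0.0, -0.5]), mode='same')

# Apply the single combined mask of (6.115) directly.
F_one_step = np.convolve(signal, np.array([1, 1, 0, -1, -1]) / 6.0, mode='same')

print(np.allclose(F_two_step[2:-2], F_one_step[2:-2]))   # True away from the borders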
For an image which is a 2D signal, one should use 2D masks. The smallest mask that one should use, to combine minimum smoothing with differencing, is a 3 × 3 mask. In this case, we also have the option to smooth in one direction and take the difference along the other. This implies that the 2D mask may be the result of applying, in a cascaded way, first a 3 × 1 smoothing mask and then a 1 × 3 differencing mask, or vice versa. In general, however, a 2D 3 × 3 mask will have the form:

     1    K    1
     0    0    0
    −1   −K   −1
Figure 6.48: The left column shows a noisy signal and two smoothed versions of it, one produced by replacing each sample with the average of 3 successive samples (middle), and one produced by replacing each sample with the average of 9 successive samples (bottom). On the top right is the result of estimating the first difference of the original signal, by subtracting from the value of each sample the value of its previous sample. The edge manifests itself with a sharp peak in the unsmoothed signal. However, there are several secondary peaks present in this output. When the first difference is estimated from the smoothed version of the signal, the number of secondary peaks reduces, but at the same time, the main peak that corresponds to the true edge becomes blunt (right column, middle and bottom panels). The more severe the smoothing we applied was, the blunter the main peak. In the panels referring to the smoothed signal, the original input or its difference signal are shown by a dashed line to allow direct comparison.
Box 6.5. How can we choose the weights of a 3 × 3 mask for edge detection?

Let us denote the generic form of the 3 × 3 mask, we wish to define, as:

    a_11  a_12  a_13
    a_21  a_22  a_23
    a_31  a_32  a_33

We are going to use one such mask to calculate f_x and another to calculate f_y. Such masks must obey the following conditions.
1. The mask which calculates f_x must be produced from the mask that calculates f_y by a 90° rotation. Let us consider from now on the mask which will produce f_y only. The calculated value will be assigned to the central pixel.
2. We do not want to give any extra weight to the left or right neighbours of the central pixel, so we must have identical weights in the left and right columns. The 3 × 3 mask, therefore, must have the form:

    a_11  a_12  a_11
    a_21  a_22  a_21
    a_31  a_32  a_31

3. Let us say that we want to subtract the signal in front of the central pixel from the signal behind it, in order to find local differences, and we want these two subtracted signals to have equal weights. The 3 × 3 mask, therefore, must have the form:

     a_11   a_12   a_11
     a_21   a_22   a_21
    −a_11  −a_12  −a_11

4. If the image is absolutely smooth, we want to have zero response. So, the sum of all the weights must be zero. Therefore, a_22 = −2a_21:

     a_11    a_12    a_11
     a_21  −2a_21    a_21
    −a_11   −a_12   −a_11

5. In the case of a smooth signal, and as we differentiate in the direction of the columns, we expect each column to produce 0 output. Therefore, a_21 = 0:

     a_11   a_12   a_11
      0      0      0
    −a_11  −a_12  −a_11

We can divide these weights throughout by a_11 so that, finally, this mask depends only on one parameter. Then the filter on page 594 follows.
What is the best value of parameter K?
It can be shown that the orientations of edges which are almost aligned with the image axes are not affected by the differentiation, if we choose K = 2. We have then the Sobel masks for differentiating an image along two directions:

     1   2   1            −1   0   1
     0   0   0            −2   0   2
    −1  −2  −1            −1   0   1

Note that we have changed convention for the second mask and subtract the values of the pixels behind the central pixel from the values of the pixels in front. This is intentional, so that the orientation of the calculated edge, computed by using the components of the gradient vector derived using these masks, is measured from the horizontal axis anticlockwise (see Box 6.6).
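Applying the two Sobel masks and computing the edge strength and orientation of (6.116) can be sketched as follows (our own snippet, Python with numpy). The masks are slid over the image as in example 6.39 below; border pixels are simply skipped, and the correlation-style sliding used here is one possible sign convention.

import numpy as np

SOBEL_I = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)    # differentiates along the i axis
SOBEL_J = np.array([[-1,  0,  1],
                    [-2,  0,  2],
                    [-1,  0,  1]], dtype=float)    # differentiates along the j axis

def sobel_gradient(img):
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    f_i = np.zeros_like(img)
    f_j = np.zeros_like(img)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            f_i[i, j] = (patch * SOBEL_I).sum()
            f_j[i, j] = (patch * SOBEL_J).sum()
    strength = np.sqrt(f_i ** 2 + f_j ** 2)
    orientation = np.arctan2(f_j, f_i)              # edge orientation, as in (6.116)
    return strength, orientation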
Box 6.6. Derivation of the Sobel filters

Edges are characterised by their strength and orientation, defined as:

Strength: E(i, j) ≡ sqrt[ (f_x(i, j))² + (f_y(i, j))² ]
Orientation: a(i, j) ≡ tan^{−1} [ f_y(i, j) / f_x(i, j) ]        (6.116)

The idea is to try to specify parameter K of the filter on page 594, so that the output of the operator is as faithful as possible to the true values of E and a which correspond to the non-discretised image:

E = sqrt[ (∂f/∂x)² + (∂f/∂y)² ]        a = tan^{−1} [ (∂f/∂y) / (∂f/∂x) ]        (6.117)

Consider a straight step edge in the scene, passing through the middle of a pixel. Each pixel is assumed to be a tile of size 1 × 1. Assume that the edge has orientation θ and assume that θ is small enough so that the edge cuts lines AB and CD as opposed to cutting lines AC and BD (0 ≤ θ ≤ tan^{−1}(1/3)) (see figure 6.49).

[Figure 6.49: the 3 × 3 pixel patch around pixel (i, j), with the labelled points A, B, C, D, F, G, H, I, J, L, M, N, O used in the text and the i and j axes marked.]
Figure 6.49: Zooming into a 3 × 3 patch of an image around pixel (i, j).

Also, assume that on the left of the edge we have a dark region with grey value G_1 and on the right a bright region with grey value G_2. Then, clearly, the pixel values inside the mask are:

f(i − 1, j − 1) = f(i − 1, j) = f(i − 1, j + 1) = G_1
f(i + 1, j − 1) = f(i + 1, j) = f(i + 1, j + 1) = G_2        (6.118)

The pixels in the central column have mixed values. If we assume that each pixel is like a tile with dimensions 1 × 1 and denote the area of a polygon by the name of the polygon inside brackets, then pixel ABIL will have value:

f(i, j − 1) = G_1 (AFJL) + G_2 (FBIJ) = G_1 [1/2 − (FGHJ)] + G_2 [1/2 + (FGHJ)]        (6.119)

We must find the area of trapezium FGHJ. From the triangles OJH and OFG, we have:

JH = (1/2) tan θ,        FG = (3/2) tan θ        (6.120)

Therefore, (FGHJ) = (1/2)(JH + FG) = tan θ, and by substitution into equation (6.119), we obtain:

f(i, j − 1) = G_1 [1/2 − tan θ] + G_2 [1/2 + tan θ]        (6.121)

By symmetry:

f(i, j + 1) = G_2 [1/2 − tan θ] + G_1 [1/2 + tan θ]        (6.122)

Clearly:

f(i, j) = (G_1 + G_2)/2        (6.123)

Let us see now what the filter on page 594 will calculate in this case:

f_x = f(i + 1, j + 1) + f(i + 1, j − 1) + K f(i + 1, j)
      − [f(i − 1, j + 1) + f(i − 1, j − 1) + K f(i − 1, j)]
    = (G_2 + G_2 + K G_2) − (G_1 + G_1 + K G_1) = (G_2 − G_1)(2 + K)

f_y = −[f(i − 1, j + 1) + f(i + 1, j + 1) + K f(i, j + 1)]
      + f(i − 1, j − 1) + f(i + 1, j − 1) + K f(i, j − 1)
    = −G_1 − G_2 − K G_2 [1/2 − tan θ] − K G_1 [1/2 + tan θ]
      + G_1 + G_2 + K G_1 [1/2 − tan θ] + K G_2 [1/2 + tan θ]
    = −K(G_2 − G_1)[1/2 − tan θ] + K(G_2 − G_1)[1/2 + tan θ]
    = K(G_2 − G_1)[−1/2 + tan θ + 1/2 + tan θ]
    = 2K(G_2 − G_1) tan θ        (6.124)

The magnitude of the edge will then be

E = sqrt[ (G_2 − G_1)²(2 + K)² + (2K)²(G_2 − G_1)² tan² θ ]        (6.125)
  = (G_2 − G_1)(2 + K) sqrt[ 1 + (2K/(2 + K))² tan² θ ]        (6.126)

and the orientation of the edge:

tan a = f_y / f_x = 2K tan θ / (2 + K)        (6.127)

Note that if we choose K = 2,

tan a = tan θ        (6.128)

ie the calculated orientation of the edge will be equal to the true orientation.
One can perform a similar calculation for the case tan^{−1}(1/3) ≤ θ ≤ 45°. In that case, we have error introduced in the calculation of the orientation of the edge.
Example 6.39
Write down the formula, which expresses the output O(i, j) at position (i, j) of the convolution of the image with the Sobel mask, that differentiates along the i axis, as a function of the input values I(i, j). (Note: ignore boundary effects, ie assume that (i, j) is sufficiently far from the image border.)

O(i, j) = −I(i − 1, j − 1) − 2I(i − 1, j) − I(i − 1, j + 1)
          + I(i + 1, j − 1) + 2I(i + 1, j) + I(i + 1, j + 1)        (6.129)
Example 6.40
You have a 3 3 image which can be represented by a 9 1 vector. Con-
struct a 99 matrix, which, when it operates on the image vector, produces
another vector, each element of which is the estimate of the gradient com-
ponent of the image along the i axis. (To deal with the boundary pixels,
assume that the image is repeated periodically in all directions.)
Consider a 3 3 image:
i
j
_
_
f
11
f
12
f
13
f
21
f
22
f
23
f
31
f
32
f
33
_
_
(6.130)
Periodic repetition of this image implies that we have:
f
33
f
31
f
32
f
33
f
31
(6.131)
f
13
f
23
f
33
_
_
f
11
f
12
f
13
f
21
f
22
f
23
f
31
f
32
f
33
_
_
f
11
f
21
f
31
(6.132)
f
13
f
11
f
12
f
13
f
11
(6.133)
Using the result of example 6.39, we notice that the rst derivative at position (1, 1)
is given by:
f
32
+ 2f
12
+f
22
f
33
2f
13
f
23
(6.134)
600 Image Processing: The Fundamentals
The vector representation of the image is:
_
_
_
_
_
_
_
_
_
_
_
_
_
_
f
11
f
21
f
31
f
12
f
22
f
32
f
13
f
23
f
33
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(6.135)
If we operate with a 9 9 matrix on this image, we notice that in order to get the
desired output, the rst row of the matrix should be:
0 0 0 2 1 1 2 1 1
The derivative at position (1, 2), ie at pixel with value f
21
, is given by:
f
12
+ 2f
22
+f
32
f
13
2f
23
f
33
(6.136)
Therefore, the second row of the 9 9 matrix should be:
0 0 0 1 2 1 1 2 1
The derivative at position (1, 3) is given by:
f
22
+ 2f
32
+f
12
f
23
2f
33
f
13
Therefore, the third row of the 9 9 matrix should be:
0 0 0 1 1 2 1 1 2
Reasoning this way, we conclude that the matrix we require must be:
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0 0 0 2 1 1 2 1 1
0 0 0 1 2 1 1 2 1
0 0 0 1 1 2 1 1 2
2 1 1 0 0 0 2 1 1
1 2 1 0 0 0 1 2 1
1 1 2 0 0 0 1 1 2
2 1 1 2 1 1 0 0 0
1 2 1 1 2 1 0 0 0
1 1 2 1 1 2 0 0 0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(6.137)
This is a block circulant matrix.
Edge detection 601
Example 6.41
Using the matrix derived in example 6.40, calculate the rst derivative
along the i axis of the following image:
_
_
3 1 0
3 1 0
3 1 0
_
_
(6.138)
What is the component of the gradient along the i axis at the centre of the
image?
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0 0 0 2 1 1 2 1 1
0 0 0 1 2 1 1 2 1
0 0 0 1 1 2 1 1 2
2 1 1 0 0 0 2 1 1
1 2 1 0 0 0 1 2 1
1 1 2 0 0 0 1 1 2
2 1 1 2 1 1 0 0 0
1 2 1 1 2 1 0 0 0
1 1 2 1 1 2 0 0 0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
3
3
3
1
1
1
0
0
0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
=
_
_
_
_
_
_
_
_
_
_
_
_
_
_
4
4
4
12
12
12
8
8
8
_
_
_
_
_
_
_
_
_
_
_
_
_
_
(6.139)
At the centre of the image the component of the gradient along the i axis is 12.
In the general case, how do we decide whether a pixel is an edge pixel or not?
Edges are positions in the image where the image function changes. As the image is a 2D
function, to nd these positions we have to calculate the gradient of the function f(x, y).
The gradient of a 2D function is a 2D vector, with the partial derivatives of the function along
two orthogonal directions as its components. In the discrete case, these partial derivatives
are the partial dierences computed along two orthogonal directions, by using masks like, for
example, the Sobel masks. If we convolve an image with these masks, we have a gradient
vector associated with each pixel. Edges are the places where the magnitude of the gradient
vector is a local maximum along the direction of the gradient vector (ie the orientation of the
gradient at that pixel position). For this purpose, the local value of the gradient magnitude
will have to be compared with the values of the gradient estimated along this orientation and
at unit distance on either side away from the pixel. In general, these gradient values will not
be known, because they will happen to be at positions in between the pixels. Then, either
a local surface is tted to the image and used for the estimation of the gradient magnitude
at any interpixel position required, or the value of the gradient magnitude is calculated by
interpolating the values of the gradient magnitudes at the known integer positions.
After this process of non-maxima suppression takes place, the values of the gradient
602 Image Processing: The Fundamentals
vectors that remain are thresholded and only pixels with gradient values above the threshold
are considered as edge pixels identied in the edge map (see gure 6.50).
(a) (b)
(c) (d)
Figure 6.50: (a) Original image. (b) The gradient magnitude image: the brighter the
pixel, the higher the value of the gradient of the image at that point. (c) After non-maxima
suppression: only pixels with locally maximal values of the gradient magnitude retain their
values. The gradient values of all other pixels are set to zero. (d) The edge map: the gradient
values in (c) are thresholded: all pixels with values above the threshold are labelled 0 (black)
and all pixels with values below the threshold are labelled 255 (white).
How do we perform linear edge detection in practice?
Step 0: Create an output array the same size as the input image and assign to every pixel
value 255 (white).
Step 1: Convolve the input image with a mask that estimates its horizontal derivative
f
x
(i, j). This mask may be such that, in order to perform the estimation, smooths the
image in the vertical direction.
Step 2: Convolve the input image with a mask that estimates its vertical derivative f
y
(i, j).
Edge detection 603
This mask may be such that, in order to perform the estimation, smooths the image in the
horizontal direction.
Step 3: Compute the gradient magnitude at each pixel position:
G(i, j)
_
f
x
(i, j)
2
+ f
y
(i, j)
2
(6.140)
Step 4: Pixels that have G(i, j) above a predened threshold have their gradient orientation
estimated. Let us say that the orientation of the gradient vector of such a pixel (i, j) is
(i, j), with respect to the horizontal (the i) axis. Angle (i, j) is allowed to vary in the range
[90
o
, 90
o
].
Step 5: For each pixel with gradient orientation (i, j), compute the gradient magnitude at
neighbouring positions:
(i + cos (i, j), j sin (i, j)) and (i cos (i, j), j + sin (i, j) (6.141)
You may use bilinear interpolation for that, as these positions are most likely in between
pixels (see page 518). If G(i, j) is greater than both these estimated gradient magnitude
values, mark pixel (i, j) as an edge pixel in the output array, by giving it value 0 (black).
Example 6.42
The following image is given
0 1 2 0 4 5
0 0 1 1 4 5
0 2 0 4 5 4
0 0 5 4 6 6
0 0 6 6 5 6
5 4 6 5 4 5
Use the following masks to estimate the magnitude and orientation of the
local gradient at all pixel positions of the image, except the boundary
pixels:
1 0 1 1 3 1
3 0 3 0 0 0
1 0 1 1 3 1
Then, indicate which pixels represent horizontal edge pixels (edgels) and
which represent vertical.
If we convolve the image with the rst mask, we shall have an estimate of the gradient
component along the x (horizontal) axis:
5 4 16 17
6 11 19 6 I
x
21 20 7 6
24 23 4 2
604 Image Processing: The Fundamentals
If we convolve the image with the second mask, we shall have an estimate of the
gradient component along the y (vertical) axis:
1 1 11 6
4 15 15 10 I
y
0 18 12 4
18 8 2 6
The gradient magnitude is given by: |G| =
_
(I
x
)
2
+ (I
y
)
2
:
26
17
377
325
52
346
586
136
441
724
193
52
900
593
20
40
The gradient orientation is given by:
= tan
1
I
y
I
x
(6.142)
tan
1
(1/5) tan
1
(1/4) tan
1
(11/16) tan
1
(6/17)
tan
1
(2/3) tan
1
(15/11) tan
1
(15/19) tan
1
(5/3)
tan
1
0 tan
1
(9/10) tan
1
(12/7) tan
1
(2/3)
tan
1
(3/4) tan
1
(8/23) tan
1
(1/2) tan
1
3
We know that for an angle to be in the range 45
to 45
(0
|| 45
) its
tangent must satisfy 0 | tan| 1. Also, an angle is in the range 45
to 90
, or
90
to 45
(45
< || 90
), if 1 < | tan| +.
As we want to quantise the gradient orientations to vertical and horizontal ones only,
we shall make all orientations 0
|| 45
< || 90
.
By inspecting the orientation array above, we infer the following gradient orientations:
0 0 0 0
0 90
o
0 90
o
0 0 90
o
0
0 0 0 90
o
Edge detection 605
A pixel is a horizontal edge pixel if the magnitude of its gradient is a local maximum,
when compared with its vertical neighbours, and its orientation is 0
. A pixel is a
vertical edge if its orientation is 90
Example 6.43
Are the masks used in example 6.42 separable? How can we take advan-
tage of the separability of a 2D mask to reduce the computational cost of
convolution?
The masks used in example 6.42 are separable because they may be implemented as a
sequence of two 1D lters, applied in a cascaded way one after the other:
1 0 1 1
3 0 3 3 followed by 1 0 1
1 0 1 1
1 3 1 1
0 0 0 1 3 1 followed by 0
1 0 1 1
Any 2D N N separable mask may be implemented as a cascade of two 1D masks of
size N. This implementation replaces N
2
multiplications and additions per pixel by
2N such operations per pixel.
Are Sobel masks appropriate for all images?
Sobel masks are appropriate for images with low levels of noise. They are inadequate for
noisy images. See, for example, gure 6.51, which shows the results of edge detection using
Sobel masks for a very noisy image.
606 Image Processing: The Fundamentals
(a) A very noisy image (b) Edge map (Sobel, threshold th
1
)
(c) Edge map (Sobel, threshold th
2
) (d) Edge map (Sobel, threshold th
3
)
Figure 6.51: Trying to detect edges in a blurred and noisy image using the Sobel edge detector
may be very tricky. One may experiment with dierent threshold values but the result is not
satisfactory. The three results shown were obtained by using dierent thresholds within the
same edge detection framework, and they were the best among many others obtained for
dierent threshold values.
How can we choose the weights of the mask if we need a larger mask owing to
the presence of signicant noise in the image?
We shall consider the problem of detecting abrupt changes in the value of a 1D signal, like
the one shown in gure 6.52.
Let us assume that the feature we want to detect is u(x) and it is immersed in additive
white Gaussian noise n(x). The mask we want to use for edge detection should have certain
desirable characteristics, called Cannys criteria, as follows.
Edge detection 607
O
x
signal
Figure 6.52: A noisy 1D edge.
1. Good signal to noise ratio.
2. Good locality, ie the edge should be detected where it actually is.
3. Small number of false alarms, ie the maxima of the lter response should be mainly due
to the presence of the true edges in the image, rather than due to noise.
Canny showed that a lter function f(x) has maximal signal to noise ratio, if it is chosen
in such a way that it maximises the quantity
SNR
_
+
f(x)u(x)dx
n
0
_
_
+
f
2
(x)dx
(6.143)
where n
0
is the standard deviation of the additive Gaussian noise. As this is a constant for
a particular image, when we try to decide what function f(x) should be, we omit it from the
computation. So, we try to maximise the quantity:
S
_
+
f(x)u(x)dx
_
_
+
f
2
(x)dx
(6.144)
Canny also showed that the lter function f(x) detects an edge with the minimum shifting
away from its true location, if it is chosen in such a way that its rst and second derivatives
maximise the quantity:
L
_
+
(x)u(x)dx
_
_
+
[f
(x)]
2
dx
(6.145)
608 Image Processing: The Fundamentals
Finally, he showed that the output of the convolution of the signal with lter f(x) will
contain minimal number of false responses, if function f(x) is chosen in such a way that its
rst and second derivatives maximise the quantity:
C
_
_
+
(f
(x))
2
dx
_
+
(f
(x))
2
dx
(6.146)
One, therefore, may design an optimal edge enhancing lter by trying to maximise the
above three quantities S, L and C.
Canny combined the rst two into a single performance measure
P, which he maximised
under the constraint that C is constant. He derived a convolution lter that way, which he
did not use, because he noticed that it could be approximated by the derivative of a Gaussian.
So, he adopted the derivative of a Gaussian as a good enough image derivative estimation
lter (see page 352 and example 4.22).
Alternatively, one may combine all three quantities S, L, and C into a single performance
measure P
P (S L C)
2
=
_
_
+
f(x)u(x)dx
_
2
_
_
+
(x)u(x)dx
_
2
_
+
f
2
(x)dx
_
+
(f
(x))
2
dx
(6.147)
and try to choose f(x) so that P is maximum. The free parameters which will appear in
the functional expression of f(x) can be calculated from the boundary conditions imposed on
f(x) and by substitution into the expression for P and selection of their numerical values so
that P is maximum.
Once function f(x) has been worked out, it must be sampled to produce a digital lter.
The number of samples used denes the size of the lter. The higher the level of noise in the
image, ie the higher the value of n
0
in (6.143), the more densely the lter should be sampled:
dense sampling makes the lter longer (consisting of more taps). This implies a higher degree
of smoothing imposed to the data, while estimating its rst dierence.
The lter dened in the above way is a lter that will respond maximally to a discontinuity
of the signal that has the form of a step function, ie it is a lter that estimates, somehow,
the local rst dierence of the signal. If we integrate it, we may obtain a lter that may be
used to smooth the signal. So, to move from 1D to 2D, we work out the dierentiating lter,
then we integrate it to form the corresponding smoothing lter, then we sample both lters
to make them usable for digital signals, and then we apply rst the smoothing lter along one
direction, followed by the dierencing lter along the orthogonal direction, to estimate the
rst derivative along the dierencing direction. We then repeat the process by exchanging
the directions of smoothing and dierentiation, to estimate the derivative along the other
direction. Finally, we use these two directional dierences to estimate the local gradient
magnitude and orientation, and proceed as we did earlier to work out the edge map of the
image, applying non-maxima suppression and thresholding.
It must be stressed that the lters dened in this way are appropriate for the detection of
antisymmetric features, ie step or ramp edges, when the noise in the signal is additive, white
and Gaussian.
Edge detection 609
Can we use the optimal lters for edges to detect lines in an image in an optimal
way?
No. Edge detection lters respond badly to lines. The theory for optimal edge detector lters
has been developed under the assumption that there is a single isolated step edge. We can
see that, from the limits used in the integrals in equation (6.143). For example, the limits
are from to +, assuming that in the whole innite length of the signal, there is only a
single step edge. If we have a line, its prole looks like the one shown in gure 6.53.
grey value
x
Figure 6.53: The prole of a line.
It looks like two step edges back to back. The responses of the lter to the two step edges
interfere and the result is not satisfactory: the two steps may or may not be detected. If
they are detected, they tend to be detected as two edges noticeably misplaced from their true
positions and shifted away from each other.
Apart from this, there is another more fundamental dierence between step edges and
lines.
What is the fundamental dierence between step edges and lines?
A step edge is scale invariant: it looks the same whether we stretch it or shrink it. A
line has an intrinsic length-scale: this is its width. From the moment the feature we wish
to detect has a characteristic size, the size of the lter we must use becomes important.
Similar considerations apply also if one wishes to develop appropriate lters for ramp edges as
opposed to step edges: the length over which the ramp rises (or drops) is characteristic of the
feature and it cannot be ignored, when designing the optimal lter. The criterion expressed
by equation (6.147) may be appropriately modied to develop optimal lters for the detection
of ramp edges of various slopes. Figure 6.54 shows attempts to detect edges in the image of
gure 6.51a, using the wrong lter. The edges are blurred, so they actually have ramp-like
proles, but the lter used is the optimal lter for step edges. Figure 6.55 shows the results of
using the optimal lter for ramps on the same image, and the eect the size of the lter has
on the result. Figure 6.56 shows an example of a relatively clean and unblurred image,
for which one does not need to worry too much about which lter one uses for edge detection.
Cannys criteria have also been modied for the case of lines and used to derive optimal
lters for line detection that depend on the width and sharpness of the line.
610 Image Processing: The Fundamentals
(a) Assuming a step edge and using a (b) The same parameters as in (a), with the
lter of size 17 17 size of the lter increased to 21 21. Some
detail is lost and no improvement has been
achieved
(c) Assuming a step edge and a lter of size (d) As in (c) with the threshold increased
17 17 and increasing the threshold even further
Figure 6.54: Trying to deal with the high level of noise by using an optimal lter of large size
and experimenting with the thresholds may not help, if the edges are blurred and resemble
ramps, and the optimal lter we use has been developed to be optimal for step edges and
not for ramp edges.
Edge detection 611
(a) Assuming a shallow ramp and using (b) Exactly the same as in (a), but
a lter of size 7 7 setting the lter size to 13 13
(c) Assuming a ramp edge and using (d) Assuming a ramp edge and using
a lter of size 17 17 a lter of size 21 21
Figure 6.55: Having chosen the right model for the features we wish to detect, using the right
lter size for the level of noise may prove crucial. Spectacularly good results may be obtained
if the right lter shape and size are used. The lter shape determines the type of feature
that will be enhanced, while the lter size determines the level of noise this lter can tolerate.
These results should be compared with those of gure 6.54, where the optimal lter for step
edges was used, while here we used the optimal lter of ramp edges.
612 Image Processing: The Fundamentals
Example B6.44
Assume that u(x) is dened for positive and negative values of x and that
we want to enhance a feature like u(x), at position x = 0, in a noisy signal.
Use equation (6.144) to show that if the feature we want to enhance is an
even function, we must choose an even lter, and if it is an odd function,
we must choose an odd lter.
Any function f(x) may be written as the sum of an odd and an even part:
f(x)
1
2
[f(x) f(x)]
. .
f
o
(x)(odd)
+
1
2
[f(x) +f(x)]
. .
f
e
(x)(even)
(6.148)
In general, therefore, f(x) may be written as:
f(x) = f
e
(x) +f
o
(x) (6.149)
Assume that u(x) is even. Then, the integral in the numerator of S is:
_
+
f(x)u(x)dx =
_
+
f
e
(x)u(x)dx +
_
+
f
o
(x)u(x)dx
. .
odd integrand integrated
over a symmetric interval:
it vanishes
=
_
+
f
e
(x)u(x)dx (6.150)
The integral in the denominator in the expression for S is:
_
+
f
2
(x)dx =
_
+
f
2
e
(x)dx +
_
+
f
2
o
(x)dx + 2
_
+
f
e
(x)f
o
(x)dx
. .
odd integrand integrated
over a symmetric interval:
it vanishes
=
_
+
f
2
e
(x)dx +
_
+
f
2
o
(x)dx (6.151)
So, we see that the odd part of the lter does not contribute at all to the signal response,
while it contributes to the noise response. That is, it reduces the signal to noise ratio.
Thus, to enhance an even feature we must use an even lter. Similarly, to enhance
an odd feature, we must use an odd lter.
Edge detection 613
Example 6.45
You are asked to use the following lter masks:
1 2 1 1 1 1
1 2 1 2 2 2
1 2 1 1 1 1
What type of feature are these masks appropriate for enhancing? (The
word feature here means structural feature, eg an edge, a line, a corner,
etc in an image. This appears to be dierent from the previous use of
the word feature we encountered in this chapter, where we dened it to
mean attribute, eg a number that characterises a pixel. If we consider
that masks like these measure somehow how strong a local edge or a local
line is, then we may relate the two uses of the word feature: the output
of applying such a mask produces a number that characterises a pixel by
telling us how much edge-like or line-like the pixel is. This number is an
attribute of the pixel, ie a feature.)
The rst lter enhances vertical lines and the second horizontal lines, in both cases
brighter than the background.
(a) Original image (b) Sobel lter
(c) Optimal (5 5) (d) Optimal (7 7)
Figure 6.56: Everyday life images are relatively clean and good results can be obtained by
using relatively small lters. Note that the smaller the lter, the more details of the image
are preserved.
614 Image Processing: The Fundamentals
Example 6.46
Use the masks of example 6.45 to process the image below. (Do not process
the border pixels.)
1 1 5 3 0
0 1 4 1 0
1 1 3 2 1
0 2 5 3 0
1 0 4 2 0
Then, by choosing an appropriate threshold for the output images calcu-
lated, produce the feature maps of the input image.
The result of the convolution of the image with the rst mask is:
8 15 1
5 14 1
8 14 1
The result of the convolution of the image with the second mask is:
2 3 4
2 4 1
4 8 4
Note that the outputs contain values that may easily be divided into two classes: posi-
tive and negative. We choose, therefore, as our threshold the zero value, t = 0. This
threshold seems to be in the largest gap between the two populations of the output values
(ie one population that presumably represents the background and one that represents
the features we want to detect). When thresholding, we shall give one label to one class
(say all numbers above the threshold will be labelled 1) and another label to the other
class (all numbers below the threshold will be labelled 0). The two outputs then become:
0 1 0 0 0 0
0 1 0 0 0 0
0 1 0 1 1 1
These are the feature maps of the given image.
Edge detection 615
Box 6.7. Convolving a random noise signal with a lter
Assume that our noise signal is n(x) and we convolve it with a lter f(x). The result
of the convolution will be:
g(x
0
) =
_
+
f(x)n(x
0
x)dx (6.152)
As the input noise is a random variable, this quantity is a random variable too. If we
take its expectation value, it will be 0 as long as the expectation value of the input noise
is 0. We may also try to characterise it by calculating its variance, that is, its mean
square value. This is given by:
E{[g(x
0
)]
2
} = E{g(x
0
)g(x
0
)} (6.153)
This is the denition of the autocorrelation function of the output, calculated at argu-
ment 0. So, we must calculate rst the autocorrelation function of g, R
gg
().
We multiply both sides of equation (6.152) with g(x
0
+) and take expectation values:
E{g(x
0
)g(x
0
+)} = E
__
+
f(x)g(x
0
+)n(x
0
x)dx
_
R
gg
() =
_
+
f(x)E{g(x
0
+)n(x
0
x)}dx
R
gg
() =
_
+
f(x)R
ng
(x
0
+ x
0
+x)dx
R
gg
() =
_
+
f(x)R
ng
( +x)dx (6.154)
Here R
ng
(a) is the cross-correlation function between the input noise signal and the
output noise signal, at relative shift a. We must calculate R
ng
. We multiply both sides
of (6.152) with n(x
0
) and then take expectation values. (We multiply with n(x
0
)
instead of n(x
0
+) because in (6.154) we dened the argument of R
ng
as the dierence
of the argument of g minus the argument of n.)
E{g(x
0
)n(x
0
)} =
_
+
f(x)E{n(x
0
x)n(x
0
)}dx
R
ng
() =
_
+
f(x)R
nn
(x
0
x x
0
+)dx
R
ng
() =
_
+
f(x)R
nn
( x)dx (6.155)
Here R
nn
(a) is the autocorrelation function of the input noise signal at relative shift a.
However, n(x) is assumed to be white Gaussian noise, so its autocorrelation function
616 Image Processing: The Fundamentals
is a delta function given by R
nn
() = n
2
0
() where n
2
0
is the variance of the noise.
Therefore:
R
ng
() =
_
+
f(x)n
2
0
( x)dx = n
2
0
f() (6.156)
So, R
ng
with argument +x, as it appears in (6.154), is:
R
ng
( +x) = n
2
0
f( +x) (6.157)
If we substitute this into (6.154), we have:
R
gg
() = n
2
0
_
+
f(x)f( +x)dx R
gg
(0) = n
2
0
_
+
f
2
(x)dx (6.158)
If we substitute this into equation (6.153), we obtain:
E
_
[g(x
0
)]
2
_
= n
2
0
_
+
f
2
(x)dx (6.159)
The mean lter response to noise is given by the square root of the above
expression.
Box 6.8. Calculation of the signal to noise ratio after convolution of a noisy
edge signal with a lter
Assume that f(x) is the lter we want to develop, so that it enhances an edge in a noisy
signal. The signal consists of two components: the deterministic signal of interest u(x)
and the random noise component n(x):
I(x) = u(x) +n(x) (6.160)
The response of the lter due to the deterministic component of the noisy signal, when
the latter is convolved with the lter, is:
s(x
0
) =
_
+
f(x)u(x
0
x)dx (6.161)
Let us assume that the edge we wish to detect is actually at position x
0
= 0. Then:
s(0) =
_
+
f(x)u(x)dx (6.162)
The mean response of the lter due to the noise component (see Box 6.7) is
Edge detection 617
n
0
_
_
+
f
2
(x)dx. The signal to noise ratio, therefore, is:
SNR =
_
+
f(x)u(x)dx
n
0
_
_
+
f
2
(x)dx
(6.163)
Box 6.9. Derivation of the good locality measure
The result of the convolution of signal (6.160) with lter f(x) is
O(x
0
)
_
+
f(x)I(x
0
x)dx =
_
+
f(x
0
x)I(x)dx
=
_
+
f(x
0
x)u(x)dx +
_
+
f(x
0
x)n(x)dx s(x
0
) +g(x
0
)(6.164)
where s(x
0
) is the output due to the deterministic signal of interest and g(x
0
) is the
output due to noise.
The edge is detected at the local maximum of this output, that is at the place where
the rst derivative of O(x
0
) with respect to x
0
becomes 0:
dO(x
0
)
dx
0
=
ds(x
0
)
dx
0
+
dg(x
0
)
dx
0
=
_
+
(x
0
x)u(x)dx +
_
+
(x
0
x)n(x)dx (6.165)
We expand the derivative of the lter about point x
0
= 0, the true position of the edge,
and keep only the rst two terms of the expansion:
f
(x
0
x) f
(x) +x
0
f
(x) (6.166)
Upon substitution into
ds(x
0
)
dx
0
, we obtain:
ds(x
0
)
dx
0
_
+
(x)u(x)dx +x
0
_
+
(x)u(x)dx (6.167)
The feature we want to detect is an antisymmetric feature, ie an edge that has a shape
like .
According to the result proven in example 6.44, f(x) should also be an antisymmetric
function. This means that its rst derivative will be symmetric and, when it is multi-
plied with the antisymmetric function u(x), it will produce an antisymmetric integrand,
which, upon integration over a symmetric interval, will make the rst term in equation
(6.167) vanish. Then:
ds(x
0
)
dx
0
x
0
_
+
(x)u(x)dx
. .
set xx
= x
0
_
+
(x
0
x)n(x)dx =
_
+
(x)n(x
0
x)dx (6.169)
The position of the edge will be marked at the value of x that makes the sum of the
right-hand sides of equations (6.168) and (6.169) zero:
ds(x
0
)
dx
0
+
dg(x
0
)
dx
0
= 0
x
0
_
+
(x)u(x)dx +
_
+
(x)n(x
0
x)dx = 0
x
0
_
+
(x)u(x)dx =
_
+
(x)n(x
0
x)dx (6.170)
The right-hand side of this expression is a random variable, indicating that the location
of the edge will be marked at various randomly distributed positions around the true
position, which is at x
0
= 0. We can calculate the mean shifting away from the true
position as the expectation value of x
0
. This, however, is expected to be 0. So, we
calculate instead the variance of the x
0
values. We square both sides of (6.170) and
take their expectation value:
E{x
2
0
}
__
+
(x)u(x)dx
_
2
= E
_
__
+
(x)n(x
0
x)dx
_
2
_
(6.171)
Note that the expectation value operator applies only to the random components.
In Box 6.7 we saw that if a noise signal with variance n
2
0
is convolved with a lter f(x),
the mean square value of the output signal is given by:
n
2
0
_
+
f
2
(x)dx (6.172)
(see equation (6.159)). The right-hand side of equation (6.170) here indicates the con-
volution of the noise component with lter f
[f
(x)]
2
dx. Then:
E{x
2
0
} =
n
2
0
_
+
[f
(x)]
2
dx
_
_
+
(x)u(x)dx
_
2
(6.173)
The smaller this expectation value is, the better the localisation of the edge. We may
dene, therefore, the good locality measure as the inverse of the square root of the
above quantity. We may also ignore factor n
0
, as it is the standard deviation of the
noise during the imaging process, over which we do not have control. So, a lter is
optimal with respect to good locality, if it maximises the quantity:
L
_
+
(x)u(x)dx
_
_
+
[f
(x)]
2
dx
(6.174)
Edge detection 619
Example B6.47
Show that the dierentiation of the output of a convolution, of a signal
u(x) with a lter f(x), may be achieved by convolving the signal with the
derivative of the lter.
The output of the convolution is
s(x
0
) =
_
+
f(x)u(x
0
x)dx (6.175)
or
s(x
0
) =
_
+
f(x
0
x)u(x)dx (6.176)
Applying Leibnizs rule for dierentiating an integral with respect to a parameter (see
Box 4.9, on page 348), we obtain from equation (6.176)
ds(x
0
)
dx
0
=
_
+
(x
0
x)u(x)dx (6.177)
which upon changing variables may be written as:
ds(x
0
)
dx
0
=
_
+
(x)u(x
0
x)dx (6.178)
Box 6.10. Derivation of the count of false maxima
It has been shown by Rice, that the average distance of any two successive zero crossings
of the convolution of a function h with Gaussian noise is given by
x
av
=
R
hh
(0)
R
hh
(0)
(6.179)
where R
hh
() is the spatial autocorrelation function of function h(x), ie:
R
hh
() =
_
+
(h(x))
2
dx (6.181)
Using Leibnizs rule (see Box 4.9, on page 348) we can dierentiate (6.180) with respect
to , to obtain:
R
hh
() =
_
+
h(x)h
(x +)dx (6.182)
We dene a new variable of integration x x + x = x and dx = d x to obtain:
R
hh
() =
_
+
h( x )h
( x)d x (6.183)
We dierentiate (6.183) once more:
R
hh
() =
_
+
( x )h
( x)d x R
hh
(0) =
_
+
(h
(x))
2
dx (6.184)
Therefore, the average distance of zero crossings of the output signal, when a noise
signal is ltered with function h(x), is given by:
x
av
=
_
_
+
(h(x))
2
dx
_
+
(h
(x))
2
dx
(6.185)
From the analysis in Box 6.9, we can see that the false maxima in our case will come
from equation
_
+
(x)n(x
0
x)dx = 0, which will give the false alarms in the absence
of any signal (u(x) = 0). This is equivalent to saying that the false maxima coincide
with the zeros in the output signal when the noise signal is ltered with function f
(x).
So, if we want to reduce the number of false local maxima, we should make the average
distance between the zero crossings as large as possible, for lter function h(x) f
(x).
Therefore, we dene the good measure of scarcity of false alarms as:
C
_
_
+
(f
(x))
2
dx
_
+
(f
(x))
2
dx
(6.186)
Can edge detection lead to image segmentation?
Not directly. The detected edges are usually fragmented and have gaps, so, in general, they
do not divide the image into regions. To use edge detection to segment an image, we must
Edge detection 621
follow one of two routes:
use hysteresis edge linking, as proposed by Canny, in combination with some further
postprocessing;
use a Laplacian of Gaussian lter to detect edges.
What is hysteresis edge linking?
It is a way of postprocessing the output of the non-maxima suppression stage of an edge
detector, so that more complete edges may be extracted. Instead of specifying one threshold
to threshold the gradient magnitude values, we specify two thresholds. Edgels with gradient
magnitude weaker than the lower threshold are considered as noise and discarded. All re-
maining edgels, with gradient magnitude above the lower threshold, are linked to form chains
of linked pixels. If at least one of the pixels in such a chain has gradient magnitude stronger
than the higher threshold, all pixels of the chain are kept as edgels.
So, the algorithm on page 602 has to be modied as follows. In Step 4, the word thresh-
old refers to the low threshold. Then the following steps have to be added.
Step 6: In the output image identify edgels with three or more neighbouring edgels. Such
edgels are junctions. Remove them from the edge map and mark them in some other array.
Step 7: After the junction edgels have been removed, the edge map consists of disconnected
strings of edgels. Examine each string in turn. If at least one of its edgels has gradient mag-
nitude above the high threshold, keep the whole string. If none of the edgels of the string has
gradient magnitude above the high threshold, remove the whole string from the edge map.
Step 8: Re-insert the junction pixels you removed in Step 6, if they are adjacent to at least
one edgel in the edge map.
Does hysteresis edge linking lead to closed edge contours?
No. It simply leads to longer and better edge strings. To extract closed contours, we must
combine the edges we extract with some other algorithm, like snakes or level set methods,
which t exible curves to the grey values of the image. These methods are beyond the scope
of this book. Alternatively, edge information is used in the watershed algorithm, which we
discussed in the previous section, and which is eectively a hybrid method that combines
region and gradient information to segment the image.
Example B6.48
A realistic ramp edge at position x = 0 may be modelled by a sigmoid
function
u(x) =
1
1 +e
sx
(6.187)
where s is a parameter that controls the slope of the edge. Show that the
second derivative of this function is 0 at the position of the edge.
The rst derivative of the model edge is:
622 Image Processing: The Fundamentals
u
(x) = s
e
sx
(1 +e
sx
)
2
= s
e
sx
1 +e
2sx
+ 2e
sx
= s
1
e
sx
+e
sx
+ 2
= s
1
2 cosh(sx) + 2
(6.188)
The second derivative of u(x) is:
u
(x) = s
2
sinh(sx)
2(cosh(sx) + 1)
2
(6.189)
For x = 0 this function is 0.
Example B6.49
Show that the second derivative of a Gaussianly smoothed signal may be
obtained by convolving the signal with the second derivative of the Gaus-
sian function.
Let us call the signal f(x), the smoothing Gaussian lter g(x) and the smoothed signal
o(x). We have:
o(x) =
_
+
(x) =
_
+
(x) =
_
+
(x) =
_
+
n=1
a
n
cos
nt
T
+
+
n=1
b
n
sin
nt
T
=
a
0
2
+
+
n=1
_
a
2
n
+b
2
n
_
a
n
_
a
2
n
+b
2
n
cos
nt
T
+
b
n
_
a
2
n
+b
2
n
sin
nt
T
_
a
0
2
+
+
n=1
A
n
cos
_
nt
T
n
_
(6.194)
The coecients of the series expansion are given by:
a
n
=
1
T
_
T
T
f(t) cos
nt
T
dt
b
n
=
1
T
_
T
T
f(t) sin
nt
T
dt (6.195)
In (6.194) the amplitude A
n
of the expansion was dened as
A
n
_
a
2
n
+b
2
n
(6.196)
and the phase
n
such that:
cos
n
a
n
_
a
2
n
+b
2
n
and sin
n
b
n
_
a
2
n
+b
2
n
(6.197)
We also made use of cos cos + sin sin = cos( ).
We have phase congruency at a point t when
nt
T
n
has the same value, modulo 2,
for all indices n.
What is phase congruency for a 1D digital signal?
A (2N + 1)-sample long digital signal may be represented by its DFT, which treats the
signal as if it were periodic with period N. So, for a digital real signal f(i), with i =
N, N + 1, . . . , 0, . . . , N 1, N, we have
f(i) =
N
k=N
F(k)e
j
2ki
2N+1
(6.198)
626 Image Processing: The Fundamentals
where F(k) is usually complex and given by:
F(k) =
1
2N + 1
N
i=N
f(i)e
j
2ki
2N+1
(6.199)
We may write F(k) A
k
e
j
k
, so that A
k
is a non-negative real number and
k
is such that
A
k
cos
k
= Real{F(k)} and A
k
sin
k
= Imaginary{F(k)}. Then (6.198) may be written
as:
f(i) =
N
k=N
A
k
e
j(
2ki
2N+1
+
k)
=
N
k=N
A
k
cos
_
2ki
2N + 1
+
k
_
+j
N
k=N
A
k
sin
_
2ki
2N + 1
+
k
_
=
N
k=N
A
k
cos
_
2ki
2N + 1
+
k
_
(6.200)
The last equality follows because f(i) is a real signal and so the imaginary component of
its expansion must be 0. This signal exhibits phase congruency at a sample i, when angles
2ki
2N+1
+
k
for all values of k have the same value, modulo 2.
How does phase congruency allow us to detect lines and edges?
It has been observed that phase congruency coincides with high local energy in a zero mean
signal and that when there is an edge or a line, the signal exhibits high local energy.
Why does phase congruency coincide with the maximum of the local energy of
the signal?
Let us consider what the DFT does to a signal that has zero mean: it maps a 1D signal to a
2D space, where along one axis we measure the real part of a vector and along the other
its imaginary part. The mapping is such that the value of the original signal at a point is
written as a sum of many such vectors. The energy of the signal at that point is dened as
the magnitude of the sum vector: expansion (6.200) for signal f(i) may be written as
f(i) =
N
k=N
_
A
k
cos
_
2ki
2N + 1
+
k
_
+jA
k
sin
_
2ki
2N + 1
+
k
__
(6.201)
So, the local energy of the signal at point i, being the magnitude of the sum vector, is:
E(i)
_
_
N
k=N
A
k
cos
_
2ki
2N + 1
+
k
_
_
2
+
_
N
k=N
A
k
sin
_
2ki
2N + 1
+
k
_
_
2
(6.202)
Figure 6.58 shows schematically how the constituent vectors are added to produce the
sum vector. Note that the sum vector we obtain will have the largest possible length if
Phase congruency and the monogenic signal 627
all constituent vectors point towards the same direction. This happens when the ratio of
the two components of each vector is constant for all vectors. The vectors we add are:
_
A
k
cos
_
2ki
2N+1
+
k
_
, A
k
sin
_
2ki
2N+1
+
k
__
. The ratio of the two components of such a
vector is
A
k
sin
_
2ki
2N+1
+
k
_
A
k
cos
_
2ki
2N+1
+
k
_ = tan
_
2ki
2N + 1
+
k
_
(6.203)
This ratio is constant when angle
2ki
2N+1
+
k
is the same (modulo ) for all k, ie when we
have phase congruency.
I
m
a
g
i
n
a
r
y
Real
4
t
h
h
a
r
m
o
n
i
c t
o
t
a
l
s
i
g
n
a
l
3
rd
h
a
rm
o
n
ic
3
rd
h
a
rm
o
n
ic
4
t
h
h
a
r
m
o
n
i
c
I
m
a
g
i
n
a
r
y
Fixed location
Real
(a) (b)
a
t
f
i
x
e
d
l
o
c
a
t
i
o
n
2
n
d
h
a
r
m
o
n
i
c
2
n
d
h
a
r
m
o
n
i
c
1
s
t
h
a
r
m
o
n
i
c
1
s
t
h
a
r
m
o
n
i
c
Figure 6.58: For a xed location (signal sample), the signal is made up from the addition of
several harmonic vectors.
How can we measure phase congruency?
Let us denote the phase of harmonic k as
k
(i)
2ki
2N+1
+
k
, for i xed. Let us dene the
orientation of the sum vector in gure 6.58 as
(i). This is shown pictorially in gure 6.59a.
So, the projection of each added vector on the sum vector, that represents the signal, is
actually A
k
cos(
k
(i)
k
A
k
cos(
k
(i)
(i))
k
A
k
(6.204)
Couldnt we measure phase congruency by simply averaging the phases of the
harmonic components?
Yes, but not in a straightforward way. Angles are measured modulo 360
o
. An angle of 1
o
and
an angle of 358
o
point almost in the same direction, but their average is an angle of 179.5
o
628 Image Processing: The Fundamentals
I
m
a
g
i
n
a
r
y
Real
a
t
f
i
x
e
d
l
o
c
a
t
i
o
n
t
o
t
a
l
s
i
g
n
a
l
(a)
(b)
2 2
1
3
3
4 4
I
m
a
g
i
n
a
r
y
Real
t
o
t
a
l
s
i
g
n
a
l
a
t
f
i
x
e
d
l
o
c
a
t
i
o
n
1
A
c
o
s
(
)
A cos( )
A cos( )
A
c
o
s
(
)
Figure 6.59: For a xed location (signal sample), the signal is made up from the projections
of the harmonic amplitudes on the signal direction. In (a) the relative orientations of the
various harmonics in relation to the orientation of the total signal are marked. In (b) the
projections of the various harmonics along the direction of the signal are added to make up
the signal.
Phase congruency and the monogenic signal 629
pointing in the opposite direction. There is a small trick one can use to average such numbers.
Let us assume that we have to average ve angles,
1
, . . . ,
5
. We add to each angle 360
o
and, thus, create a set of ten angles (the original ve ones plus the new ve ones we created).
We rank them in increasing order and we take the standard deviation of every successive ve
angles. The set of ve angles that has the smallest standard deviation is the desired set. Its
average is the average of the original angles.
This process is not very useful for large sets of angles to be averaged, so we prefer to use
denition (6.204) to quantify phase congruency.
Example 6.50
Calculate the average and the median angle of angles:
20
o
, 40
o
, 290
o
, 130
o
and 320
o
.
We add 360
o
to each given angle and, thus, we obtain angles 380
o
, 400
o
, 650
o
, 490
o
and 680
o
. We rank in increasing order the set of ten angles:
20
o
, 40
o
, 130
o
, 290
o
, 320
o
, 380
o
, 400
o
, 490
o
, 650
o
, 680
o
We consider every successive ve and compute their mean and variance:
1
=
20 + 40 + 130 + 290 + 320
5
= 160
2
1
=
(20 160)
2
+ (40 160)
2
+ (130 160)
2
+ (290 160)
2
+ (320 160)
2
5
= 15480 (6.205)
We do the same for the remaining four sets of successive ve angles (the last set is
expected to be identical in terms of variance with the rst set of numbers we started
with):
2
=
40 + 130 + 290 + 320 + 380
5
= 232
2
2
= 16056
3
=
130 + 290 + 320 + 380 + 400
5
= 304
2
3
= 9144
4
=
290 + 320 + 380 + 400 + 490
5
= 376
2
4
= 4824
5
=
320 + 380 + 400 + 490 + 650
5
= 448
2
5
= 13176 (6.206)
The set with the minimum variance is set 4 and its mean is 376. Therefore, the average
angle of the original angles is 376 360 = 16
o
and the median is 380 360 = 20
o
.
630 Image Processing: The Fundamentals
Example 6.51
Calculate the value of
in denition (6.204), on page 627.
This is the orientation of the sum vector (the total signal) given by equation (6.201).
So, angle
is:
= tan
1
N
k=N
A
k
sin
_
2ki
2N+1
+
k
_
N
k=N
A
k
cos
_
2ki
2N+1
+
k
_ (6.207)
Care should be taken so that the angle is in the range [0, 360
o
). To make sure that we
dene the angle correctly, in the right quadrant of the trigonometric circle, it is better
to dene it using its sine and cosine (or simply check the signs of the sine and cosine
functions):
cos
=
N
k=N
A
k
cos
_
2ki
2N+1
+
k
_
_
_
N
k=N
A
k
sin
_
2ki
2N+1
+
k
__
2
+
_
N
k=N
A
k
cos
_
2ki
2N+1
+
k
__
2
sin
=
N
k=N
A
k
sin
_
2ki
2N+1
+
k
_
_
_
N
k=N
A
k
sin
_
2ki
2N+1
+
k
__
2
+
_
N
k=N
A
k
cos
_
2ki
2N+1
+
k
__
2
(6.208)
How do we measure phase congruency in practice?
Since the maxima of the phase congruency coincide with the maxima of the local energy of the
signal, we measure the local signal energy and identify its local maxima, in order to identify
the places where signal features (eg edges or lines) are present.
How do we measure the local energy of the signal?
We note that in order to measure the local signal energy, we analyse the signal into its
harmonic components, that are dened in a 2D space (see gure 6.58a). All we need then
to do is to dene an appropriate basis for this space and identify our signal components
with respect to this basis. We note that a zero-mean real symmetric signal will have all its
harmonic components coincide with the real axis of gure 6.58a. (This is the basis of the
cosine transform we saw in Chapter 2.) We also note that a zero-mean real antisymmetric
signal will have all its harmonic components coincide with the imaginary axis of gure 6.58a.
(This is the basis of the sine transform we saw in Chapter 2.) So, if we dene two signals that
are real and with zero-mean, and one is symmetric and the other antisymmetric, and they
are orthogonal to each other in the frequency domain, we shall have an orthogonal basis for
this space. If the two signals are also of unit length each, we shall have an orthonormal basis.
One way to make sure that two signals are orthogonal in the frequency domain is to
dene them so that their phases are in quadrature, ie at 90
o
from each other. In particular,
Phase congruency and the monogenic signal 631
starting from one basis function, we may construct the other, by rotating its positive frequency
components by +90
o
, and its negative frequency components by 90
o
. This is explained
schematically in gure 6.60. Functions that have Fourier transforms that dier only by a
phase shift of +90
o
in the positive frequencies and by 90
o
in the negative frequencies are
said to constitute a Hilbert transform pair. These two basis signals, in the time domain,
they should be of the same size as the segment of the signal we wish to use in order to dene
the local energy, ie the size of the basis signals in the time domain is dened by what we
mean with the word local. Then all we have to do is to project the signal on these two basis
signals, square and add the two components and thus obtain the local energy of the signal.
frequeny
axis
0
Imaginary
frequeny
Real
axis
0
frequeny
axis
0
Imaginary
jH( )
H( )
sign( )jH( )
Figure 6.60: A symmetric lter has a real Fourier transform (top). As the lter is real in the
time domain, its Fourier transform is symmetric about the 0 frequency. If we multiply this
Fourier transform with j, it will refer to the imaginary axis (middle). If we multiply it with
j for the negative frequencies and with +j for the positive ones (bottom), we create a lter
that is orthogonal to the original one, in the frequency domain: if we multiply point by point
the function at the top with the function at the bottom and sum up the results, we get 0.
632 Image Processing: The Fundamentals
For example, signal (1, 2, 1) is a symmetric signal that may be used to represent the
imaginary axis in the 2D Fourier space. Signal (1, 0, +1) an antisymmetric signal that
may be used to represent the imaginary axis in the 2D Fourier space. The two signals are
orthogonal to each other in the time domain too, as their dot product is (1) (1) +
2 0 + (1) (+1) = 0. So, they constitute an orthogonal basis for the space of in-
terest. We can make the basis orthonormal if we divide the elements of the symmetric
signal with
_
(1)
2
+ 2
2
+ (1)
2
=
6 and the elements of the antisymmetric signal with
_
(1)
2
+ (+1)
2
=
2. Then all we have to do, in order to work out the local energy of the
signal at every point, is to convolve the signal with these two basis signals (=lters), get the
two convolution outputs, square the value of each sample of each output, and sum the two
squared results.
Why should we perform convolution with the two basis signals in order to get
the projection of the local signal on the basis signals?
Projection of the signal on the two unit signals (basis signals) involves taking every sample
of the signal and its local neighbourhood, the same size as the unit signals, and multiplying
them point by point and adding the products. This is eectively convolution if we ignore
the reversal of the convolving signal we have to do to perform real convolution. We may call
such a convolution pseudo-convolution. This is shown schematically in gure 6.61. The
pseudo-convolution is what we usually do when we deal with images: the pseudo-convolution
is equivalent to the projection of one signal on the other. The fact that we do not perform
the reversal is irrelevant when the unit (or basis, or lter) signal is symmetric and it results
in sign reversal if the unit (or basis, or lter) signal is antisymmetric. However, as we get the
square of each resultant value, a change in sign is irrelevant.
basis
projection
=convolution
=pseudoconvolution
signal
signal
x
on the basis
local segment
to be projected
convolution
=projection
(a) (b)
x
Figure 6.61: (a) The projection of a segment of the signal on a basis signal involves the
multiplication of the segment with the basis signal point by point and the addition of the
results. (b) Convolution, according to its algebraic denition, involves the reversal of the
basis signal before we perform the point by point multiplication and addition. Usually, in
image processing, we forget this reversal (because the sign of the result is usually ignored)
and we actually perform a pseudo-convolution.
Phase congruency and the monogenic signal 633
Example 6.52
Consider the lter (1/
6, 2/
6, 1/
6
_
e
j
2k
3
+ 2 e
j
2k
3
_
(6.209)
Next, we work out the DFT components by allowing k to take values 1, 0 and 1.
F(1) =
1
3
6
_
e
j
2
3
+ 2 e
j
2
3
_
=
1
3
6
_
cos
2
3
+j sin
2
3
+ 2 cos
2
3
j sin
2
3
_
=
1
3
6
_
2 cos
2
3
+ 2
_
=
1
3
6
_
2
_
1
2
_
+ 2
_
=
1
6
F(0) = 0
F(1) =
1
3
6
_
e
j
2
3
+ 2 e
j
2
3
_
=
1
3
6
_
cos
2
3
j sin
2
3
+ 2 cos
2
3
+j sin
2
3
_
=
1
6
(6.210)
The positive frequency component (the k = 1 component) of the DFT of the orthogonal
lter should be at +90
o
from F(1) and the negative frequency component (the k =
1 component) of the DFT of the orthogonal lter should be at 90
o
from F(1).
Therefore, each of the above F(k) has to be multiplied either with e
j
2
to change its
phase by +90
o
, or with e
j
2
to change its phase by 90
o
, in order to yield the DFT
components of the orthogonal lter. So, the components of the orthogonal lter are:
F(1) =
1
6
e
j
F(0) = 0
F(1) =
1
6
e
j
2
(6.211)
We use these values then in (6.198) to construct the orthogonal lter:
634 Image Processing: The Fundamentals
f(i) =
1
k=1
F(k)e
j
2ki
3
=
1
6
_
e
j
2
e
j
2i
3
+e
j
2
e
j
2i
3
_
=
1
6
_
e
j
6
(3+4i)
+e
j
6
(3+4i)
_
=
1
6
2 cos
(3 + 4i)
6
(6.212)
To work out the lter weights now, we allow i to take values 1, 0 and 1:
f(1) =
1
3
=
1
f(0) = 0
f(1) =
1
6
2 cos
7
6
=
1
2
(6.213)
So, lters
_
6
,
2
6
,
1
6
_
and
_
1
2
, 0,
1
2
_
(6.214)
constitute a Hilbert transform pair.
Example 6.53
Consider the lter (1/
2, 0, 1/
2
_
e
j
2k
3
+e
j
2k
3
_
(6.215)
Next, we work out the DFT components by allowing k to take values 1, 0 and 1.
Phase congruency and the monogenic signal 635
F(1) =
1
3
2
_
e
j
2
3
+e
j
2
3
_
=
1
3
2
_
cos
2
3
+j sin
2
3
+ cos
2
3
+j sin
2
3
_
=
1
3
2
2j sin
2
3
= j
1
3
3 = j
1
6
F(0) = 0
F(1) =
1
3
2
_
e
j
2
3
+e
j
2
3
_
=
1
3
2
_
cos
2
3
j sin
2
3
+ cos
2
3
j sin
2
3
_
= j
1
6
(6.216)
The k = 1 component of the DFT of the orthogonal lter should be at +90
o
from F(1)
and the k = 1 component should be at 90
o
from F(1). So, the components of the
orthogonal lter are:
F(1) = j
1
6
e
j
2
=
1
F(0) = 0
F(1) = j
1
6
e
j
2
=
1
6
(6.217)
We use these values then in (6.198) to construct the orthogonal lter:
f(i) =
1
k=1
F(k)e
j
2ki
3
=
1
6
_
e
j
2i
3
+e
j
2i
3
_
=
1
6
2 cos
2i
3
(6.218)
To work out the lter weights now, we allow i to take values 1, 0 and 1:
f(1) =
1
6
2 cos
2
3
=
1
f(0) =
2
f(1) =
1
6
(6.219)
636 Image Processing: The Fundamentals
Example 6.54
Construct the Hilbert transform pair of lter (1, 1, 1, 1).
First compute the DFT of the given lter. The centre of the lter is between its two
central samples, ie its samples are at half integer positions. Its DFT then is given by
F(k) =
1
4
1
i=2
f(i)e
j
2k(i+0.5)
4
=
1
4
_
e
j
2k3
8
+e
j
2k
8
+e
j
2k
8
e
j
2k3
8
_
=
1
2
_
cos
k3
4
+ cos
k
4
_
(6.220)
F(0) = 0
F(1) =
1
2
_
cos
3
4
+ cos
4
_
=
1
2
F(2) =
1
2
_
cos
3
2
+ cos
2
_
= 0
F(3) =
1
2
_
cos
4
+ cos
3
4
_
=
1
2
(6.221)
The elements that correspond to the negative frequencies are elements F(2) and F(3).
We multiply the positive frequency component (namely F(1)) with e
j
2
and the nonzero
negative frequency component, namely F(3), with e
j
2
to produce the DFT of the
desired lter:
F(0) = 0
F(1) =
1
2
e
j
F(2) = 0
F(3) =
1
2
e
j
2
(6.222)
We then take the inverse DFT to produce the lter:
f(i) =
3
k=0
F(k)e
j
2k(i+0.5)
4
=
1
2
e
j
(i+1.5)
2
2
e
j
(3i+0.5)
2
(6.223)
Phase congruency and the monogenic signal 637
f(2) =
1
2
_
e
j
4
e
j
11
4
_
=
1
2
_
cos
4
j sin
4
cos
11
4
+j sin
11
4
_
= 1
f(1) =
1
2
_
e
j
4
e
j
5
4
_
= 1
f(0) =
1
2
_
e
j
3
4
e
j
4
_
= 1
f(1) =
1
2
_
e
j
5
4
e
j
7
4
_
= 1 (6.224)
So, lters (1, 1, 1, 1) and (1, 1, 1, 1) constitute a Hilbert transform pair.
Box 6.11. Some properties of the continuous Fourier transform
The Fourier transform of a function f(x) is F(), dened as:
F()
_
+
f(x)e
jx
dx (6.225)
The inverse transform may be worked out as:
f(x) =
1
2
_
+
F()e
jx
d (6.226)
Some important properties of the Fourier transform are tabulated below.
Real domain Frequency domain Name of the property
f(x) F() Denition of the Fourier transform
F(x) 2f() Duality
d
n
f(x)
dx
(j)
n
F() FT of the nth derivative of a function
_
x
f(y)dy
j
F
_
_
Scaling property
f(x +) e
j
F() Shifting property (real domain)
f(x)e
jx
F( +) Shifting property (frequency domain)
_
+
|f(x)|
2
dx =
1
2
_
+
|F()|
2
d Parsevals theorem
1 2() Fourier transform of a constant
(x) 1 Fourier transform of the delta function
638 Image Processing: The Fundamentals
Example B6.55
Prove the duality property of the Fourier transform.
Let us consider the Fourier transform of function F(x), as dened by equation (6.225):
F() =
_
+
F(x)e
jx
dx
=
_
+
__
+
f(y)e
jxy
dy
_
e
jx
dx
=
_
+
f(y)
__
+
e
j(y)x
dx
_
dy
=
_
+
f(y)2(y )dy
=
_
+
f(y)2(y +)dy
= 2f() (6.227)
Here we made use of the fact that the Fourier transform of the delta function is 1,
which, by considering the inverse Fourier transform, immediately leads to the identity:
(x) =
1
2
_
+
e
jx
d (6.228)
We also used the property of delta function (x) = (x).
Example B6.56
The Fourier transform of a signal f(x) is F(). We multiply F() with
function H() sgn()j to obtain the Fourier transform
F(), of a signal
f(x) =
1
2
_
+
f(t)
x t
dt (6.230)
This equation expresses the Hilbert transform of function f(x).
Example 6.57
Show that
_
xcos(x)dx =
1
xsin(x) +
1
2
cos(x)
_
xsin(x)dx =
1
xcos(x) +
1
2
sin(x) (6.231)
where is a positive constant.
We use integration by parts and remember that the integral of cos x is sin x and the
integral of sin x is cos x:
_
xcos(x)dx =
_
xd[sin(x)]
1
=
1
xsin(x)
1
_
sin(x)dx
=
1
xsin(x) +
1
2
cos(x) (6.232)
640 Image Processing: The Fundamentals
_
xsin(x)dx =
_
xd[cos(x)]
1
=
1
xcos(x) +
1
_
cos(x)dx
=
1
xcos(x) +
1
2
sin(x) (6.233)
Example 6.58
Compute the Fourier series of the periodic signal dened as follows:
f(x) =
_
_
3
x
2
for 6 x < 4
1 for 4 x 2
1
2
x for 2 < x < 2
+1 for 2 x 4
+3
x
2
for 4 < x < 6
(6.234)
Then plot this function and its rst four nonzero harmonics on the same
axes. At which points do you observe phase congruency?
We must use formulae (6.194) and (6.195), on page 625, with T = 6.
Since f(x) is an odd function, its product with the cosine function is also odd. The
integrals of such products over a symmetric interval of integration, vanish. So, all
expansion coecients a
n
are 0.
Further, the integrands of the b
n
coecients in (6.195) are symmetric functions, so
instead of integrating from T to +T, we may integrate only from 0 to +T and double
the result.
In the following calculation, we make use of (6.231):
b
n
=
2
6
_
6
0
f(x) sin
nx
6
dx
=
1
3
__
2
0
x
2
sin
nx
6
dx +
_
4
2
sin
nx
6
dx +
_
6
4
_
3
x
2
_
sin
nx
6
dx
_
Phase congruency and the monogenic signal 641
=
1
3
_
1
2
_
6
n
xcos
nx
6
+
36
2
n
2
sin
nx
6
2
0
+
_
6
n
cos
nx
6
4
2
+3
_
6
n
cos
nx
6
6
4
1
2
_
6
n
xcos
nx
6
+
36
2
n
2
sin
nx
6
6
4
_
=
1
3
_
1
2
_
6
n
2 cos
n
3
+
36
2
n
2
sin
n
3
_
6
n
cos
n2
3
+
6
n
cos
n
3
+3
_
6
n
cos(n) +
6
n
cos
n2
3
_
1
2
_
36
n
cos(n) +
24
n
cos
n2
3
36
2
n
2
sin
n2
3
__
=
2
n
cos
n
3
+
6
2
n
2
sin
n
3
2
n
cos
n2
3
+
2
n
cos
n
3
6
n
cos(n)
+
6
n
cos
n2
3
+
6
n
cos(n)
4
n
cos
n2
3
+
6
2
n
2
sin
n2
3
=
6
2
n
2
_
sin
n
3
+ sin
2n
3
_
(6.235)
We may now work out the values of b
n
for n = 1, 2, . . .. We remember that sin(/3) =
2
b
5
=
6
2
25
b
7
=
6
2
49
b
11
=
6
2
121
(6.236)
So, the harmonic expansion of the function is:
f(x)
6
2
sin
x
6
. .
1st harmonic
+
6
2
25
sin
_
5x
6
+
_
. .
2nd harmonic
+
6
2
49
sin
7x
6
. .
3rd harmonic
+
6
2
121
sin
_
11x
6
+
_
. .
4th harmonic
(6.237)
Note that the negative coecients were made positive and a was introduced in the
phase of the sinusoid to account for the negative sign (remember that sin( + ) =
sin ).
642 Image Processing: The Fundamentals
6 4 2 0 2 4 6
1
0
1
6 4 2 0 2 4 6
1
0
1
f(x)
x
Figure 6.62: At the top the original signal and its rst four harmonics. At the bottom,
all harmonics plotted with the same amplitude. The thick dashes at the bottom of the
gure mark the ranges of the abscissa where all harmonics are in phase, in the sense
that they are all increasing, or all decreasing. These are the places where the signal
changes abruptly.
Figure 6.62 shows the plot of the original function and each of the four harmonics
identied above. It is dicult to distinguish the 3rd and 4th harmonics in the top
panel. In the bottom panel, we omit the coecient of the harmonic in order to make
clearer the phase of each sinusoid at each point. We observe that the phases coincide
only at the points where the at part of the trapezoid function starts or ends. These
are the points of maximum phase congruency and they are expected to be the places of
maximum local signal energy.
Example 6.59
Sample the continuous signal given by (6.234) at integer and half integer
values of x. Then compute the DFT of the sampled signal and plot its rst
ve harmonics on the same axes as the digital signal. At which points do
you observe phase congruency?
The sampled signal is:
Phase congruency and the monogenic signal 643
(0, 1/4, 1/2, 3/4, 1, 1, 1, 1, 1, 3/4, 1/2, 1/4, 0,
1/4, 1/2, 3/4, 1, 1, 1, 1, 1, 3/4, 1/2, 1/4) (6.238)
Note that we do not include the last 0 value at point x = 6, because in DFT the signal
is assumed to be repeated in both directions, so if we had a 0 at the end of this sampled
sequence, it would be as if the signal in two successive samples had value 0, given that
the next sample after the end of the sequence is the rst sample of the sequence. These
samples correspond to indices i: (12, 11, . . . , 0, . . . , 11).
The DFT of this signal is:
F(k) =
1
24
11
i=12
f(i)e
j
2ki
24
(6.239)
As f(12) = f(0) = 0 and f(i) = f(i), and since e
j
+ e
j
= 2j sin , this
expression may be written as:
F(k) =
2j
24
11
i=1
f(i) sin
2ki
24
=
2j
24
_
1
4
sin
2k
24
+
1
2
sin
4k
24
+
3
4
sin
6k
24
+ sin
8k
24
+ sin
10k
24
+ sin
12k
24
+sin
14k
24
+ sin
16k
24
+
3
4
sin
18k
24
+
1
2
sin
20k
24
+
1
4
sin
22k
24
_
(6.240)
Note that due to the antisymmetry of the sine function, F(−k) = −F(k).
F(1) = −0.5294j, F(2) = F(3) = F(4) = F(6) = F(8) = F(9) = F(10) = 0, F(5) = 0.0244j, F(7) = −0.0144j and F(11) = 0.0092j.
Now that we have the values of F(k), we may use (6.198), on page 625, to work out the DFT expansion of f(i):
$$f(i) = \sum_{k=-12}^{11} F(k)e^{j\frac{2\pi ki}{24}} = 2j\sum_{k=1}^{11} F(k)\sin\frac{2\pi ki}{24} \qquad (6.241)$$
Note that F(−12) = F(0) = 0 and that is why we could simplify the above formula. Also, as F(k) is an imaginary number, factor 2j in the above formula makes the result real. The individual terms of this sum are the harmonics of function f(i). As there are only four nonzero values of F(k), there are only four harmonics. The first one is 1.0588 sin(πi/12). Variable i is the index of the sampled function f(i). If we want to plot these harmonics as continuous functions, we must use as independent variable, variable x in the range [−6, 6), and replace i in these formulae with 2x, since x was sampled with step 0.5.
In figure 6.63, we plot the original function and its four harmonics as continuous functions. To identify the points of phase congruency we also plot all harmonics with the same amplitude. The places where all harmonics act in symphony, ie they are all increasing or all decreasing, are marked with the thick black dashes at the bottom of the figure. We observe that phase congruency coincides again with the places where the signal changes suddenly.
Figure 6.63: At the top, the original digital signal with its harmonics. At the bottom, the harmonics plotted with the same amplitude, in order to assess the places of phase congruency. Places where the harmonics all increase or all decrease together are marked with the thick dashes at the bottom of the graph.
Example 6.60
Compute the local energy of the digital signal of example 6.59, by using the filters developed in example 6.54, and identify the points where the local energy is maximum. To avoid border effects, do not process the first two and the last two samples.
We pseudo-convolve sequence (6.238) with filters (1, −1, −1, 1) and (−1, −1, 1, 1), square the output of each at every sample and add the two results to deduce the square of the local energy. The results are shown in table 6.1. The entries in the final column equal to 1 represent the local maxima of the energy. Note that these do not coincide very well with the places of phase congruency we identified in example 6.59. This is because the filters we use are not very compact in the time domain, ie they do not have very good localisation properties.
signal    conv1    (conv1)^2    conv2    (conv2)^2    loc.energy
  0
 -1/4
            0        0           -1        1            1
 -1/2
            0        0           -1        1            1
 -3/4
           1/4      1/16         -3/4      9/16         10/16
  -1
           1/4      1/16         -1/4      1/16         1/8
  -1
            0        0            0        0            0
  -1
            0        0            0        0            0
  -1
           1/4      1/16          1/4      1/16         1/8
  -1
           1/4      1/16          3/4      9/16         10/16
 -3/4
            0        0            1        1            1
 -1/2
            0        0            1        1            1
 -1/4
            0        0            1        1            1
   0
            0        0            1        1            1
  1/4
            0        0            1        1            1
  1/2
            0        0            1        1            1
  3/4
          -1/4      1/16          3/4      9/16         10/16
   1
          -1/4      1/16          1/4      1/16         1/8
   1
            0        0            0        0            0
   1
            0        0            0        0            0
   1
          -1/4      1/16         -1/4      1/16         1/8
   1
          -1/4      1/16         -3/4      9/16         10/16
  3/4
            0        0           -1        1            1
  1/2
            0        0           -1        1            1
  1/4
   0
Table 6.1: The first column is the signal. The second column is the result of pseudo-convolving the signal with filter (1, −1, −1, 1). The third column is the square of this output. The fourth column is the result of pseudo-convolving the signal with filter (−1, −1, 1, 1). The fifth column is the square of this result. The final column is the square of the local energy, computed as the sum of the two squares. Note that as the filters are of even length, the output value is assigned between two samples. The local maxima of the local energy are the entries of the final column equal to 1.
Example 6.61
Compute the local energy of the digital signal of example 6.59 by using the filters developed in example 6.52 and identify the points where the local energy is maximum. To avoid border effects, do not process the first and the last sample.
We pseudo-convolve sequence (6.238) with filters (−1, 2, −1) and (1, 0, −1) and square the output at each sample. Before we add the two outputs, we multiply the output of the first filter by 1/6 and that of the second filter by 1/2, to take into consideration the scaling factors of the filters. Then we add the two results to deduce the square of the local energy. The results are shown in table 6.2. The entries of the last column equal to 1/8 represent the local maxima of the energy. Note that these are closer to the maxima of the phase congruency than the points identified in example 6.60, as the filters we use here are more compact in the time domain.
signal    conv1    (conv1)^2    conv2    (conv2)^2    loc.energy
  0
 -1/4       0        0            1/2      1/4          1/8
 -1/2       0        0            1/2      1/4          1/8
 -3/4       0        0            1/2      1/4          1/8
  -1      -1/4      1/16          1/4      1/16         1/24
  -1        0        0            0        0            0
  -1        0        0            0        0            0
  -1        0        0            0        0            0
  -1      -1/4      1/16         -1/4      1/16         1/24
 -3/4       0        0           -1/2      1/4          1/8
 -1/2       0        0           -1/2      1/4          1/8
 -1/4       0        0           -1/2      1/4          1/8
   0        0        0           -1/2      1/4          1/8
  1/4       0        0           -1/2      1/4          1/8
  1/2       0        0           -1/2      1/4          1/8
  3/4       0        0           -1/2      1/4          1/8
   1       1/4      1/16         -1/4      1/16         1/24
   1        0        0            0        0            0
   1        0        0            0        0            0
   1        0        0            0        0            0
   1       1/4      1/16          1/4      1/16         1/24
  3/4       0        0            1/2      1/4          1/8
  1/2       0        0            1/2      1/4          1/8
  1/4       0        0            1/2      1/4          1/8
   0
Table 6.2: The first column is the signal. The second column is the result of pseudo-convolving the signal with filter (−1, 2, −1). The third column is the square of this output. The fourth column is the result of pseudo-convolving the signal with filter (1, 0, −1). The fifth column is the square of this result. The final column is the square of the local energy, computed as the sum of the two squares, after they are scaled by the squared values of the normalising factors of the two filters. The local maxima of the local energy are the entries of the final column equal to 1/8.
If all we need to compute is the local energy of the signal, why don't we use Parseval's theorem to compute it in the real domain inside a local window?
Indeed, one may do that. Parseval's theorem (see page 637) says that the energy computed in the frequency domain is the same as the energy computed in the real domain. Given that we are not interested in the dc component of the signal, all we have to do is to remove the local mean and take the sum of the squares of the remaining components. This is a simple operation, which might be quite effective for clean signals. However, the use of the Fourier domain allows one to omit some frequencies, that may correspond to noise, and also to develop filters that preferentially detect features of specific frequencies.
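The windowed computation just described is straightforward to sketch in code. The following is a minimal illustration (not the book's code), assuming a 1D numpy signal and an odd window length:

```python
import numpy as np

def local_energy_parseval(signal, window=3):
    """Local energy by removing the local mean inside a sliding window
    and summing the squares of the residuals (assigned to the window centre)."""
    half = window // 2
    energy = np.zeros(len(signal), dtype=float)
    for i in range(half, len(signal) - half):
        w = np.asarray(signal[i - half:i + half + 1], dtype=float)
        r = w - w.mean()            # remove the local dc component
        energy[i] = np.sum(r ** 2)  # sum of squared residuals
    return energy
```

Applied to the sampled trapezoid of (6.238) with a window of 3 samples, this reproduces the values worked out by hand in example 6.62 below.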
Example 6.62
Compute the local energy of the digital signal of example 6.59 by using a scanning 1 × 3 window. You must remove the local mean of the windowed signal and then work out the sum of the squares of the remainders and assign it to the central sample of the window. Thus, you may identify the points where the local energy is maximum. To avoid border effects, do not process the first and the last sample.
The result is shown in table 6.3. The entries of the last column equal to 1/8 represent the local maxima of the energy. Note that the energy values we computed in example 6.61 and here are the same. This is due to Parseval's theorem.
signal    local mean    local energy
  0
 -1/4      -1/4           1/8
 -1/2      -1/2           1/8
 -3/4      -3/4           1/8
  -1       -11/12         1/24
  -1       -1             0
  -1       -1             0
  -1       -1             0
  -1       -11/12         1/24
 -3/4      -3/4           1/8
 -1/2      -1/2           1/8
 -1/4      -1/4           1/8
   0        0             1/8
  1/4       1/4           1/8
  1/2       1/2           1/8
  3/4       3/4           1/8
   1        11/12         1/24
   1        1             0
   1        1             0
   1        1             0
   1        11/12         1/24
  3/4       3/4           1/8
  1/2       1/2           1/8
  1/4       1/4           1/8
   0
Table 6.3: The first column is the signal. The second column is the local average of the signal, inside a 3-samples long window. The last column is the sum of the squares of the residuals inside each window, after the local mean has been removed.
How do we decide which filters to use for the calculation of the local energy?
Apart from the obvious requirement that the two filters should constitute a Hilbert transform pair, have 0 dc component and the sum of squares of their elements should be 1, it is desirable for the filters to have also the following properties.
1) They should be compact in the real domain in order to have good locality. The filters used in example 6.60 lacked this property.
2) If we wish to process the image in a single scale only, the filters should be narrow in the real domain so that they have broad spectrum in the frequency domain and, thus, capture the local structural characteristics of the signal at any scale they occur. The filters used in example 6.61 had this property. However, such filters are not immune to noise.
3) It is better if we consider filters that are also compact in the frequency domain, so that they help us estimate the local signal energy within a certain frequency band. Such filters have the advantage that they will pick up signal features which will look like even and odd ripples of different frequencies, ie they will go beyond the mere detection of lines and edges.
4) If we decide to follow the multiband/multiscale approach, the filter should be easily scaled to allow the user to translate it into a different band.
5) Filters that are defined with finite bandwidth should have maximal concentration in the real domain with minimal side lobes, so that they do not create artifacts and they can be easily truncated to be used as convolution filters. Their exact shape will depend on the filter parameters.
A filter that exhibits most of the desirable properties is:
$$H(\omega) = \log\left[\frac{\cos^2(\omega-\omega_0)}{\cos^2\omega_1}\right] \quad \text{for } |\omega-\omega_0| < \omega_1 \qquad (6.242)$$
Here it is assumed that the total bandwidth of the signal is scaled to be in the range [0, 1] and ω₀ is a number within this range. The bandwidth of the filter is determined by ω₁. This filter should be used as the symmetric filter. It should be combined with its Hilbert transform pair to extract the local signal energy. Note that this filter will respond best to symmetric features, which, however, may not be just simple lines.
Example 6.63
Assume that the central frequency of the filter defined by equation (6.242) is ω₀ = 0.5 and the bandwidth parameter is ω₁ = 0.2. Calculate the frequency response of the filter and its weights in the real domain.
Figure 6.64a shows the plot of filter (6.242) as a function of ω. It was constructed by allowing ω to take values in the range [0.3, 0.7] in steps of 0.01. The logarithm used was base 10. To obtain the filter in the real domain, we sampled the range [0, 1] of ω in steps of 0.01. We also considered the same values of H(ω) for negative frequencies, ie in the range [−1, 0], with the same sampling step. This yielded 201 points in total, which were treated as the DFT of the filter. The inverse DFT yielded the real filter. In 6.64b we plot only the central 61 values of the filter. As the filter is of finite bandwidth in the frequency domain, it is of infinite extent in the real domain, with gradually dying down side lobes. Note that the filter is not a simple line detection filter, but rather one that will respond maximally to patterns in the signal that are similar to its appearance. The significant nonzero values of the filter are:
(0.132, 0.011, 0.466, 0.015, 0.795, 0.007, 0.933, 0.007, 0.795, 0.015, 0.466, 0.011, 0.132)
One may approximate this filter as a 9-tap filter with weights:
(0.249, 0.012, 0.488, 0.005, 0.492, 0.005, 0.488, 0.012, 0.249)
These values were obtained by keeping only the 9 central values of the filter and normalising them, so that the positive weights sum up to +1 and the negative weights sum up to −1. A simplified version of this filter might be:
(0.25, 0.00, 0.50, 0.00, 0.50, 0.00, 0.50, 0.00, 0.25).
Figure 6.64: (a) Filter (6.242) in the frequency domain and (b) in the real domain.
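The construction described in example 6.63 can be sketched as follows. The sampling step, the base-10 logarithm and the mirroring of H(ω) to negative frequencies are the choices quoted in the example; the variable names and the exact array handling are illustrative, not the book's code:

```python
import numpy as np

omega0, omega1 = 0.5, 0.2
omega = np.arange(-100, 101) / 100.0        # 201 samples of omega in [-1, 1]
H = np.zeros_like(omega)
a = np.abs(omega)                           # mirror the response to negative frequencies
band = np.abs(a - omega0) < omega1          # the filter is nonzero only inside its band
H[band] = np.log10(np.cos(a[band] - omega0) ** 2 / np.cos(omega1) ** 2)

# Treat the samples as the DFT of the filter (dc component first) and invert
# to obtain the real-domain weights; the imaginary part is negligible because
# H is real and even.
h = np.real(np.fft.ifft(np.fft.ifftshift(H)))
h = np.fft.fftshift(h)                      # centre the filter for inspection
```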
Example 6.64
Compute the Hilbert transform pair of filter (6.242).
The Fourier transform of the Hilbert pair of filter (6.242) should be:
$$\tilde{H}(\omega) = \begin{cases} -jH(\omega) & \text{for } \omega < 0\\ 0 & \text{for } \omega = 0\\ jH(\omega) & \text{for } \omega > 0\end{cases} \qquad (6.243)$$
We sample the range [−1, 1] of ω values with steps of 0.01, to produce 201 sampling points. We then take the inverse DFT of these samples to produce the filter in the real domain. In figure 6.65 we plot the central 61 values of this filter. This is an antisymmetric filter. However, it is not an edge detection filter. It is designed to respond maximally to antisymmetric multiple fluctuations of the signal. So, if used in conjunction with the filter in 6.64b, they will detect features in the signal that are of these two types. The phase computed from their two outputs will tell us how much like one or the other of the two filters the signal looks, in the locality of each energy maximum.
The most significant central values of the filter in figure 6.65 are:
(0.0057, 0.0805, 0.0034, 0.0160, 0.0062, 0.2877, 0.0146, 0.6437, 0.0124, 0.8974, 0, −0.8974, −0.0124, −0.6437, −0.0146, −0.2877, −0.0062, −0.0160, −0.0034, −0.0805, −0.0057).
By keeping only the central 9 values, we may construct a 9-tap long filter, making sure that the positive weights sum up to +1 and the negative weights sum up to −1:
(0.0093, 0.4105, 0.0079, 0.5723, 0.0000, −0.5723, −0.0079, −0.4105, −0.0093)
One may simplify such a filter to the form:
(0.20, 0.40, 0.60, 0.00, −0.60, −0.40, −0.20).
Figure 6.65: The Hilbert pair of the filter of figure 6.64b.
Example B6.65
We project a signal f(x) on two filters that constitute a Hilbert transform pair, by pseudo-convolving the signal with these filters. Work out the relationship of the Fourier transforms of the two outputs of these pseudo-convolutions.
Let us call the even filter e(x) and the odd filter o(x). Let us denote their Fourier transforms by E(ω) and O(ω), respectively. We shall denote the components of the DFTs that correspond to positive frequencies with a plus (+) superscript and the components that correspond to negative frequencies with a minus (−) superscript. We shall denote the real and the imaginary parts of the DFTs with indices R and I, respectively. We may then write:
$$E(\omega) = \begin{cases} E_R^-(\omega) + jE_I^-(\omega) & \text{for } \omega < 0\\ E_R^+(\omega) + jE_I^+(\omega) & \text{for } \omega > 0\end{cases} \qquad (6.244)$$
Since filter o(x) is the Hilbert transform pair of e(x), its DFT is produced from the DFT of e(x), by multiplying the components of the negative frequencies with −j and the components of the positive frequencies with +j:
$$O(\omega) = \begin{cases} E_I^-(\omega) - jE_R^-(\omega) & \text{for } \omega < 0\\ -E_I^+(\omega) + jE_R^+(\omega) & \text{for } \omega > 0\end{cases} \qquad (6.245)$$
Let us denote by F(ω) the DFT of the signal. The local projection of the signal on e(x) is identical to its convolution with e(x), given that e(x) is symmetric. The DFT of the output will be F(ω)E(ω). The local projection of the signal on o(x) is minus its convolution with o(x), given that o(x) is antisymmetric. The DFT of this output will be −F(ω)O(ω). If we were to perform the two pseudo-convolutions in one go, by using as a single filter the complex filter c(x) ≡ e(x) + jo(x), the DFT of the result would be F(ω)E(ω) − jF(ω)O(ω) ≡ C(ω):
$$C(\omega) = \begin{cases} F(\omega)E_R^-(\omega)+jF(\omega)E_I^-(\omega)-F(\omega)E_R^-(\omega)-jF(\omega)E_I^-(\omega) & \text{for } \omega<0\\ F(\omega)E_R^+(\omega)+jF(\omega)E_I^+(\omega)+F(\omega)E_R^+(\omega)+jF(\omega)E_I^+(\omega) & \text{for } \omega>0\end{cases}
= \begin{cases} 0 & \text{for } \omega<0\\ 2F(\omega)E_R^+(\omega)+2jF(\omega)E_I^+(\omega) & \text{for } \omega>0\end{cases} \qquad (6.246)$$
We note that only the DFT components of the positive frequencies of the even filter appear in the DFT of the projection onto the complex filter. This indicates that we may obtain the convolution with the two original filters, if we take the DFT of the signal, multiply it with the components of the DFT of the original even filter that correspond to positive frequencies only, double the result and take the inverse DFT: the real part of the inverse DFT will be the result of the local projection of the signal on the even filter and the imaginary part of the result will be the local projection of the signal on the odd filter.
How do we compute the local energy of a 1D signal in practice?
In practice we do not need to use two filters to estimate the local energy (see example 6.65). Let us assume that we have the DFT values of the even filter for positive frequencies and the DFT of the signal. The algorithm we have to use then is:
Step 0: Set to 0 all the DFT amplitudes that correspond to the negative frequencies of the even filter. Call this result E⁺(ω).
Step 1: Take the DFT of the signal, F(ω).
Step 2: Multiply F(ω) with 2E⁺(ω), point by point.
Step 3: Take the inverse DFT of this product.
Step 4: Take the real part of the result and square it sample by sample.
Step 5: Take the imaginary part of the result and square it sample by sample.
Step 6: Sum the two squared results: this is the local energy of the signal.
Step 7: Identify the local maxima of the local energy as the places where the signal has either a symmetric or an antisymmetric feature.
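A minimal sketch of this algorithm, assuming the signal and E⁺(ω) are numpy arrays of the same length and that E⁺(ω) is laid out in the same order as the output of np.fft.fft (these layout choices are assumptions of the sketch, not specified by the text):

```python
import numpy as np

def local_energy_1d(signal, E_plus):
    """Steps 1-7 above. `E_plus` is the DFT of the even filter with the
    amplitudes of its negative frequencies already set to zero (Step 0)."""
    F = np.fft.fft(signal)                 # Step 1
    r = np.fft.ifft(F * 2.0 * E_plus)      # Steps 2-3
    e = np.real(r)                         # projection on the even filter
    o = np.imag(r)                         # projection on the odd filter
    energy = e ** 2 + o ** 2               # Steps 4-6
    phase = np.arctan2(o, e)               # used below to classify the feature
    return energy, phase
```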
How can we tell whether the maximum of the local energy corresponds to a symmetric or an antisymmetric feature?
This is determined by the phase of the complex output produced by the above algorithm. Let us call it r(x) ≡ e(x) + jo(x). The real part of this result corresponds to the local projection of the signal on the even filter and the imaginary part to the projection of the signal on the odd filter. The phase of the result is given by tan⁻¹(o(x)/e(x)). The closer to π/2 this phase is, the more similar the feature is to the antisymmetric filter. The closer to 0 this phase is, the more similar the feature is to the symmetric filter.
Example 6.66
Use the filters developed in example 6.52, on page 633, to work out the location and type of the features of the following signal:
$$f(x) = \begin{cases} 7 & \text{for } -10 \le x < -8\\ 15 & \text{for } -8 \le x < -6\\ 9 & \text{for } -6 \le x < -4\\ 2 & \text{for } -4 \le x < -3.9\\ 9 & \text{for } -3.9 \le x < -2\\ 18 & \text{for } -2 \le x < -1.9\\ 9 & \text{for } -1.9 \le x < 0\\ 16 & \text{for } 0 \le x < 4\\ 10 & \text{for } 4 \le x < 7.3\\ 2 & \text{for } 7.3 \le x < 7.5\\ 10 & \text{for } 7.5 \le x < 8\\ 7 & \text{for } 8 \le x < 10\end{cases} \qquad (6.247)$$
Sample this signal with step 0.1 to produce the digital signal you will use in your calculations.
Sampling the signal in steps of 0.1 produces a digital signal f(i), where i ranges from 0 to 200. We convolve the signal with filters (−1, 2, −1)/√6 and (1, 0, −1)/√2, to produce outputs e(i) and o(i), respectively. To avoid boundary effects, and in order to be consistent with the treatment in the frequency domain, we assume that the signal is repeated ad infinitum in both directions. We then square and add the two outputs to produce the local energy. To identify the locations of the features, we have to identify the local energy maxima. We define as maxima the locations at which the local energy is larger than one of its neighbours and larger or equal than the other of its neighbours. Because we are dealing with real numbers, several insignificant maxima may be identified. As we know that this signal is clean (no noise), we use a pretty low threshold, accepting as local maxima only those that are larger than at least one of their neighbours by more than 10⁻³. Then we compute the phase for each index i as tan⁻¹(o(i)/e(i)). We threshold the phase so that when its absolute value is above π/4, we mark it as 1, and when it is below π/4 as 0. So, a local energy maximum with the corresponding phase above the threshold is marked as a spike (a line) and a local energy maximum with the corresponding phase below the threshold is marked as an edge. In figure 6.66 we plot the original signal, the local energy, and the local phase. The identified edges are marked with a cross on the original signal and the identified lines with an open circle.
Figure 6.66: At the top, the original signal sampled with 201 points. The crosses mark the locations of edges, while the open circles mark the locations of spikes. Note that as the filters are very small, lines wider than a single sample are identified as two back-to-back edges. In the middle, the local energy. We can see that its maxima coincide well with the signal features. At the bottom, the local phase thresholded: it is 1 when its value at a local energy maximum is greater than π/4, and 0 otherwise.
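A sketch of the procedure of example 6.66, under the assumption (stated in the rewritten text above) that the even and odd filters of example 6.52 are (−1, 2, −1)/√6 and (1, 0, −1)/√2. The helper correlate_periodic and the exact thresholding details are illustrative choices, not the book's code:

```python
import numpy as np

x = np.linspace(-10, 10, 201)                    # signal sampled with step 0.1
f = np.select(
    [x < -8, x < -6, x < -4, x < -3.9, x < -2, x < -1.9, x < 0,
     x < 4, x < 7.3, x < 7.5, x < 8, x <= 10],
    [7, 15, 9, 2, 9, 18, 9, 16, 10, 2, 10, 7]).astype(float)

h_even = np.array([-1.0, 2.0, -1.0]) / np.sqrt(6)
h_odd = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)

def correlate_periodic(s, h):
    """Pseudo-convolution (correlation) with wrap-around, as the signal
    is assumed to repeat ad infinitum in both directions."""
    n, m = len(s), len(h)
    out = np.zeros(n)
    for i in range(n):
        for k in range(m):
            out[i] += h[k] * s[(i + k - m // 2) % n]
    return out

e = correlate_periodic(f, h_even)
o = correlate_periodic(f, h_odd)
energy = e ** 2 + o ** 2
phase = np.arctan(o / (e + 1e-12))               # guard against division by zero

prev, nxt = np.roll(energy, 1), np.roll(energy, -1)
is_max = (energy > prev) & (energy >= nxt) & \
         (energy - np.minimum(prev, nxt) > 1e-3)  # suppress insignificant maxima
is_spike = is_max & (np.abs(phase) > np.pi / 4)   # line (symmetric feature)
is_edge = is_max & (np.abs(phase) <= np.pi / 4)   # edge (antisymmetric feature)
```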
Example 6.67
Repeat example 6.66 by working in the frequency domain.
The convolution of the signal with a filter is equivalent to the multiplication of their DFTs. As the signal is 201 samples long, its DFT is also 201 samples long. The filter, however, is 3-tap long. To compute its DFT in the form that it can be multiplied with the DFT of the signal point by point, it also has to be 201-tap long. Given that we assume the signal to be repeated ad infinitum in both directions, the left neighbour of the first sample of the signal is the last sample. So, we may consider that the signal is convolved with the following filters:
$$\left(\frac{2}{\sqrt{6}},\ -\frac{1}{\sqrt{6}},\ \underbrace{0, 0, 0, \ldots, 0, 0, 0,}_{198\ \text{zeros}}\ -\frac{1}{\sqrt{6}}\right) \qquad (6.248)$$
$$\left(0,\ -\frac{1}{\sqrt{2}},\ \underbrace{0, 0, 0, \ldots, 0, 0, 0,}_{198\ \text{zeros}}\ \frac{1}{\sqrt{2}}\right) \qquad (6.249)$$
The DFTs of these two filters are plotted in figure 6.67. Note that the DFT of the first filter is purely real and the DFT of the second filter is purely imaginary. The dot product of these two DFTs is 0, confirming that these two DFTs form an orthogonal basis in the Fourier domain.
Figure 6.67: The DFT of filter (6.248) at the top, and of filter (6.249) at the bottom. The horizontal axis is sampled with 201 points. O(k) is purely imaginary.
We then take the DFT of the signal and multiply it point by point with the DFTs of the two filters. Let us call the two outputs O_e(k) and O_o(k), respectively, with k taking values in the range [0, 200]. We take the inverse DFT of each one of these outputs to produce e(i) and o(i), respectively. The signal features then are computed in exactly the same way as in example 6.66. The values of e(i) and o(i) turn out to agree, to several decimal places, with the values obtained by direct convolutions in example 6.66. On the basis of that, one would expect the result of feature detection to be exactly identical with that in example 6.66. The result we obtain here is shown in figure 6.68, which is very different from that shown in figure 6.66. Upon closer inspection, it turns out that the non-maxima suppression we perform by taking the local maxima of the local energy is an unstable process. On either side of an edge or a spike, the values of the local energy tend to be the same, with differences only at the level of rounding error. When something changes in the run, these differences sometimes flip the choice of the maximum from one sample to the other, with consequences also on the computed phase of the feature. That is why the negative spike of the signal in example 6.66 was correctly identified as such, while here only one of its sides is identified as an edge.
Figure 6.68: At the top, the original signal sampled with 201 points. The crosses mark the locations of identified edges. In the middle, the local energy. At the bottom, the local phase thresholded: it is 1 when its value is greater than π/4, and 0 otherwise.
Figure 6.69: At the top, the original signal sampled with 201 points. The crosses mark the locations of the identified edges and the open circle marks the location of the identified spike. These results were produced by thresholding the local energy and keeping all its values above the threshold as marking the locations of features (no non-maxima suppression). In the middle, the local energy. At the bottom, the local phase thresholded: it is 1 when its value is greater than π/4, and 0 otherwise.
One way to avoid having such unstable results is not to apply non-maxima suppression, but simply to threshold the local energy values. This will lead to thick features, with their locations marked by multiple points, but it is a much more robust approach than the previous one. The result of keeping all energy values higher than 1 is shown in figure 6.69. We can see that the spike and the edges have been identified correctly.
Example B6.68
For the filters used in examples 6.66 and 6.67, confirm the relationships between their DFTs, as discussed in example 6.65.
Let us consider the even filter f_e(i) as given by (6.248) and its DFT, plotted at the top of figure 6.67. Given that this DFT is real, the DFT of its Hilbert transform pair should be imaginary. The values of its indices that correspond to positive frequencies should be the same as the values of the DFT of f_e(i) for the same indices, and the values of its indices that correspond to negative frequencies should be the negative of the values of the DFT of f_e(i) for the same indices. If the indices of the DFT go from 0 to 200, the indices that correspond to negative frequencies are from 101 to 200, inclusive.
So, at the top of figure 6.70, we plot the DFT, E(ω), of filter f_e(i) and below it, the DFT, O(ω), of its Hilbert pair. We then take the inverse DFT of O(ω) to recover the corresponding filter f_o(i) in the real domain. The filter we obtain is shown at the bottom of figure 6.70.
We can see that it does not resemble at all filter (6.249), which we know to be the Hilbert pair of f_e(i). However, it does start from 0, and its second value is its largest negative value, while its last value is exactly the opposite of that, both values having absolute value 0.5198, which is different from 1/√2.
Example 6.69
Use the filter defined by equation (6.242) to work out the location and type of the features of the following signal:
$$f(x) = \begin{cases} 7 & \text{for } -10 \le x < -8\\ 15 & \text{for } -8 \le x < -6\\ 9 & \text{for } -6 \le x < -4\\ 2 & \text{for } -4 \le x < -3.9\\ 12 & \text{for } -3.9 \le x < -2\\ 18 & \text{for } -2 \le x < -1.9\\ 10 & \text{for } -1.9 \le x < 0\\ 10 + 3x & \text{for } 0 \le x < 2\\ 16 & \text{for } 2 \le x < 4\\ 24 - 2x & \text{for } 4 \le x < 7\\ 10 & \text{for } 7 \le x < 7.3\\ 2 & \text{for } 7.3 \le x < 7.5\\ 18 & \text{for } 7.5 \le x < 8\\ 7 & \text{for } 8 \le x < 10\end{cases} \qquad (6.250)$$
Sample this signal with step 0.1 to produce the digital signal you will use
in your calculations.
First of all, given the type of signal we have, it is best if we select the parameters of the filter so that it is tuned to select features of the type we expect to have, namely edges and lines. So, we must select its parameters so that its frequency response is broad and in low frequencies, rather than in high frequencies like in examples 6.63 and 6.64, where its parameters tuned it to detect multiripple features. We select ω₀ = 0.2 and ω₁ = 0.2. At the top of figure 6.71, we plot H(ω) for these values of its parameters. The frequency range [−1, 1] is sampled with 201 points. The indices that correspond to negative frequencies are those above 101. Indices [101, 200] correspond to values of ω in the range [−1, 0). To produce the odd filter, the values of H(k) for k ∈ [0, 100] are multiplied with j and the values of H(k) for k ∈ [101, 200] are multiplied with −j. Taking the inverse DFTs yields the filters plotted in figure 6.71 as f_e(i) and f_o(i). These are filters appropriate for processing a 201-sample signal. Note that filter f_e(i) is essentially a filter with one positive weight surrounded by two negative weights, exactly the type of filter appropriate for detecting spikes. These weights are:
(0.005, 0.109, 0.288, 0.374, 0.193, 0.251, 0.728, 0.933, 0.728, 0.251, 0.193, 0.374, 0.288, 0.109, 0.005)
The f_o(i) filter is a first difference filter with weights:
(0.005, 0.279, 0.614, 0.755, 0.525, 0, 0.525, 0.755, 0.614, 0.279, 0.005).
Figure 6.71: At the top, filter H(k). In the middle, the real part of the inverse DFT of H(k); it is a line/spike detection filter. At the bottom, the real part of the inverse DFT of sign(ω)jH(k); it is an edge detection filter.
In order to process the signal, however, we do not use these filters. Instead, we use the algorithm described on page 651: we multiply the DFT of the signal with 2H(k), where for negative frequencies, ie for k > 100, we have set H(k) = 0, take the inverse DFT and separate the real from the imaginary part to extract results e(i) and o(i). From them we compute the local energy. Before we take its local maxima, we first smooth it with a lowpass Gaussian filter of size 11. Figure 6.72 shows the original signal with the identified features and below it the local energy and local phase, thresholded to be 1 for phase above π/4 and 0 otherwise.
Figure 6.72: At the top, the original signal with the features identified: an open circle indicates a symmetric feature and a cross indicates an antisymmetric feature. In the middle, the smoothed local energy function and, at the bottom, the thresholded local phase.
How can we compute phase congruency and local energy in 2D?
The problem in 2D is that the features may have a particular orientation. Thus, one requires three quantities in order to identify features in an image: local energy, local phase and local orientation. First attempts to generalise the 1D local energy calculation to 2D were based on the use of several orientations. Along each orientation, the 1D approach was applied and the orientation that yielded the maximum energy response was considered as the dominant local orientation. This ad hoc approach had several drawbacks, concerning bias and isotropy. The correct way to proceed is based on the generalisation of the Hilbert transform to 2D, known as the Riesz transform. Application of the Riesz transform to a 2D signal produces the monogenic signal, which corresponds to the analytic signal in 1D.
What is the analytic signal?
If f(x) is a signal and f_H(x) is its Hilbert transform, the analytic signal is f(x) + jf_H(x). If we know the real and the imaginary part of an analytic signal, we can compute the local energy and phase of the original signal. The whole process we developed so far was aimed at computing the local components of the analytic signal of a given real signal.
How can we generalise the Hilbert transform to 2D?
Let us recall what the Hilbert transform does. It first considers an even filter and processes the signal with that. An even filter can be easily generalised to 2D, as it is isotropic. So, we may consider a 2D even filter and process with that the 2D image. Then the Hilbert transform creates from the even filter an odd filter, by multiplying its DFT with sign(ω)j. Now, an odd filter has a preferred direction, as it is a differentiating filter. Given that an image has two directions, we must be able to generate two such filters, one for each direction. Their outputs have to be combined with the output of the original even filter, to produce the local energy. However, these filters, being two, would dominate the sum. So, we must find a way to combine their outputs proportionally. If the frequency along one direction is ω_x and along the other is ω_y, we weigh these filters using ω_x/√(ω_x² + ω_y²) and ω_y/√(ω_x² + ω_y²). When we take the inverse DFTs of the image processed by these three filters, the ratio of the outputs of the two odd filters will tell us the dominant orientation of the detected feature. In addition, the higher the output of the even filter is, the more likely the detected feature is to be a symmetric feature, and the higher the combined output of the two odd filters is, the more likely the detected feature is to be an antisymmetric feature. This intuitive understanding of what we have to do leads to the definition of the Riesz transform.
How do we compute the Riesz transform of an image?
Step 0: Take the DFT of the image, F(ω_x, ω_y).
Step 1: Consider an even filter with frequency response H(ω_x, ω_y).
Step 2: Multiply the DFT of the image with the DFT of the filter point by point, and take the inverse DFT of the product, to produce f_H(x, y).
Step 3: Define functions:
$$H_1(\omega_x,\omega_y) \equiv j\frac{\omega_x}{\sqrt{\omega_x^2+\omega_y^2}} \qquad H_2(\omega_x,\omega_y) \equiv j\frac{\omega_y}{\sqrt{\omega_x^2+\omega_y^2}} \qquad (6.251)$$
These functions will help produce the Hilbert pair of the even filter along each axis and, at the same time, share the odd energy component between the two filters, with weights that, when squared and summed, yield 1.
Step 4: Take the inverse DFT of F(ω_x, ω_y)H(ω_x, ω_y)H_1(ω_x, ω_y) to produce f_{H1}(x, y).
Step 5: Take the inverse DFT of F(ω_x, ω_y)H(ω_x, ω_y)H_2(ω_x, ω_y) to produce f_{H2}(x, y).
The output (f_H(x, y), f_{H1}(x, y), f_{H2}(x, y)) is the monogenic signal of the image.
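A minimal sketch of these steps, assuming the even filter is supplied as a frequency-response array H laid out like the output of np.fft.fft2; the use of np.fft.fftfreq for the relative frequencies is an implementation choice of this sketch:

```python
import numpy as np

def monogenic_signal(image, H):
    """Riesz transform / monogenic signal of a grey image, following
    Steps 0-5 above."""
    F = np.fft.fft2(image)                                   # Step 0
    rows, cols = image.shape
    wy = np.fft.fftfreq(rows).reshape(-1, 1)                 # frequencies along y
    wx = np.fft.fftfreq(cols).reshape(1, -1)                 # frequencies along x
    radius = np.sqrt(wx ** 2 + wy ** 2)
    radius[0, 0] = 1.0                                       # avoid division by zero at dc
    H1 = 1j * wx / radius                                    # equation (6.251)
    H2 = 1j * wy / radius
    f_H = np.real(np.fft.ifft2(F * H))                       # Step 2
    f_H1 = np.real(np.fft.ifft2(F * H * H1))                 # Step 4
    f_H2 = np.real(np.fft.ifft2(F * H * H2))                 # Step 5
    return f_H, f_H1, f_H2
```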
How can the monogenic signal be used?
The monogenic signal (f_H(x, y), f_{H1}(x, y), f_{H2}(x, y)), at a pixel position (x, y), may be used to compute the local image energy as:
$$E(x,y) \equiv \sqrt{f_H(x,y)^2 + f_{H1}(x,y)^2 + f_{H2}(x,y)^2} \qquad (6.252)$$
Local maxima of this quantity identify the locations of image features.
The local feature symmetry (ie how symmetric or antisymmetric the feature is) may be measured as:
$$\phi(x,y) \equiv \tan^{-1}\frac{f_H(x,y)}{\sqrt{f_{H1}(x,y)^2 + f_{H2}(x,y)^2}} \qquad (6.253)$$
Here we assume that function tan⁻¹ yields values in the range [−π/2, π/2]. If the feature is purely symmetric, the numerator of this fraction is maximal and the denominator is minimal. So the calculated angle will tend to be either close to 90° or to −90°. If we do not want to distinguish the two, we take the absolute value of the result. If the feature is mostly antisymmetric, the numerator approaches 0 while the denominator is large, and φ(x, y) is close to 0. So, φ(x, y) is a measure of local symmetry, taking values in the range [0, π/2].
The local feature orientation may be computed as:
$$\theta(x,y) \equiv \tan^{-1}\frac{f_{H2}(x,y)}{f_{H1}(x,y)} + \frac{\pi}{2}\left[1 - \mathrm{sign}\left(\tan^{-1}\frac{f_{H2}(x,y)}{f_{H1}(x,y)}\right)\right] \qquad (6.254)$$
This number varies between 0 and π, as factor π is added only to negative angles computed by tan⁻¹, to put them in the range [π/2, π].
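Continuing the sketch above, the three monogenic components can be turned into the quantities of equations (6.252)-(6.254); the small epsilon that guards the division is an implementation detail of this sketch, not part of the book's formulas:

```python
import numpy as np

def monogenic_features(f_H, f_H1, f_H2):
    """Local energy, symmetry and orientation from the monogenic signal."""
    energy = np.sqrt(f_H ** 2 + f_H1 ** 2 + f_H2 ** 2)           # (6.252)
    odd = np.sqrt(f_H1 ** 2 + f_H2 ** 2)
    symmetry = np.abs(np.arctan2(f_H, odd))                      # |phi| in [0, pi/2], (6.253)
    theta = np.arctan(f_H2 / (f_H1 + 1e-12))                     # raw angle in [-pi/2, pi/2]
    orientation = np.where(theta < 0, theta + np.pi, theta)      # map to [0, pi), (6.254)
    return energy, symmetry, orientation
```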
How do we select the even filter we use?
We usually wish to detect features that manifest themselves in one or more frequency bands. So, we place the nonzero frequency response of the filter in the band of interest. Some features, which are more prominent, manifest themselves in several bands. These features are considered more stable than others and one way of selecting them is to check their detectability in several frequency bands. The broader the band we use, the narrower the filter will be in the real domain and, therefore, the better it will respond to image details. The narrower the band, the broader the filter in the real domain and the grosser the features it will pick. So, using a selection of band widths allows us to detect features of different scales.
Figure 6.73 shows the frequency space of an image, with four different cases of band selection. In the first case, the band is such that the 2D even filter placed there will act as a low pass filter along the y axis and as a symmetric feature detector along the x axis. In the second case, the selected band contains the dc component along the x axis, but it does not contain it along the y axis. So, this filter will smooth along the x axis and enhance symmetric features along the y axis. In the third case, the filter does not contain the dc component along any of the axes. This filter, therefore, will respond to features that show local symmetry along both axes, namely diagonal features. Finally, the last case corresponds to a composite filter that will respond to features either along the main or the secondary diagonal of the axes.
Often, a systematic way is used to sample the bands. A Gaussian function defined with one of its axes along the radial direction is used to identify the band of interest. This type of frequency band creation leads to the so called Gabor functions². The bands that are identified in such cases are schematically shown in figure 6.74.
It has to be clarified that for each band we select, we can measure a different local energy, local phase and local orientation. So, from each selected band we can estimate how much the image locally resembles the feature that corresponds to that band, what is the local orientation of this feature, and whether the feature is mostly symmetric or mostly antisymmetric. The use of all frequency bands characterises the image and all its local structure fully.
² Gabor functions are fully covered in Image Processing, dealing with Texture, by Petrou and Garcia Sevilla.
Figure 6.73: The 2D frequency domain of an image. Positive and negative frequencies are shown. The large rectangle indicates the frequency band of the digital image. The small rectangles indicate the bands where the selected even filter has nonzero response. When the filter does not include in its band the (0, 0) frequency (ie it has 0 dc component), the rectangles are slightly displaced from the axis to indicate this. (a) A filter in this band will smooth along the y axis (since it will have a nonzero dc component along this axis) and it will enhance a symmetric feature along the x axis. (b) A filter in this band will smooth along the x axis and enhance a symmetric feature along the y axis. (c) A filter in this band will enhance symmetric features along the main diagonal of the axes. (d) We may use a composite filter to enhance features along the two diagonal directions of the axes.
Figure 6.74: Usually the bands used to study the frequency content of an image are selected by considering Gaussian filters with elliptic cross-sections, with one of their axes aligned with the radial direction in the frequency domain. A pair of identical Gaussians has to be considered in corresponding bands of the positive and negative frequencies, for the corresponding filters in the spatial domain to be real.
Example 6.70
Apply the Riesz transform to the 201 × 201 image of figure 6.75 and compute its local energy and local phase. Identify the features of the image by considering the local maxima of the energy along directions orthogonal to the local orientation. Identify the type of local feature by considering the local phase.
Figure 6.75: An original image containing edges and lines.
First we have to select the even filter we shall use. We are going to use filter (6.242), extended to 2D:
$$H(\omega_x,\omega_y) = \log\left[\frac{\cos^2(\omega_x-\omega_{0x})}{\cos^2\omega_{1x}}\right]\log\left[\frac{\cos^2(\omega_y-\omega_{0y})}{\cos^2\omega_{1y}}\right] \qquad (6.255)$$
for |ω_x − ω_{0x}| < ω_{1x} and |ω_y − ω_{0y}| < ω_{1y}.
In the digital domain, we shall replace ω_x with 2πk/N and ω_y with 2πl/M, for an N × M image:
$$H(k,l) = \log\left[\frac{\cos^2\left(\frac{2\pi(k-k_0)}{N}\right)}{\cos^2\left(\frac{2\pi k_1}{N}\right)}\right]\log\left[\frac{\cos^2\left(\frac{2\pi(l-l_0)}{M}\right)}{\cos^2\left(\frac{2\pi l_1}{M}\right)}\right] \qquad (6.256)$$
for |k − k_0| < k_1 and |l − l_0| < l_1.
We shall choose the parameters of this filter so it has nonzero response in the bands defined in figures 6.73a, b and d, so that, in the first case, we shall detect horizontal features, in the second case vertical features and in the third case diagonal features.
As the DFT we compute for the image has the dc component at its corner, we must be careful how we shall represent each filter. In the Fourier domain, a function is assumed to be repeated ad infinitum in all directions. So, we may think of the domain of interest repeated as shown in figure 6.76. Each panel in figure 6.76 was produced by repeating four times the corresponding panel of figure 6.73. Then we may shift the filter domain so that the dc component is at the corner. In practice, this means that we compute the filter values using formula (6.256) for k = 0, 1, . . . , ⌊N/2⌋ and l = 0, 1, . . . , ⌊M/2⌋. For N = M = 201, this means that we compute the filter values for k, l = 0, 1, . . . , 100. For indices k = 101, . . . , 200 and l = 0, 1, . . . , ⌊M/2⌋, we define the filter values using
$$H(k,l) = H(\tilde{k},l) \quad \text{for } k = \lfloor N/2\rfloor + 1, \ldots, N-1 \text{ and } l = 0, 1, \ldots, \lfloor M/2\rfloor \qquad (6.257)$$
where k̃ ≡ N − k. This reflects the filter values for the indices that correspond to negative frequencies, ie for indices k = 101, . . . , 200. Finally, we reflect again to create values for the l indices that correspond to negative frequencies
$$H(k,l) = H(k,\tilde{l}) \quad \text{for } k = 0, 1, \ldots, N-1 \text{ and } l = \lfloor M/2\rfloor + 1, \ldots, M-1 \qquad (6.258)$$
where l̃ ≡ M − l.
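The construction of H(k, l) with the dc component at the corner, equations (6.256)-(6.258), can be sketched as follows. The band parameters shown (k0, k1, l0, l1) are hypothetical values corresponding roughly to a band like that of figure 6.73a, and the base-10 logarithm follows example 6.63; none of these choices are fixed by the text:

```python
import numpy as np

def even_filter_2d(N=201, M=201, k0=30, k1=20, l0=0, l1=10):
    """2D even filter of equation (6.256), stored with the dc term at (0, 0)."""
    H = np.zeros((N, M))
    for k in range(N // 2 + 1):                 # positive-frequency quadrant only
        for l in range(M // 2 + 1):
            if abs(k - k0) < k1 and abs(l - l0) < l1:
                H[k, l] = (np.log10(np.cos(2 * np.pi * (k - k0) / N) ** 2
                                    / np.cos(2 * np.pi * k1 / N) ** 2)
                           * np.log10(np.cos(2 * np.pi * (l - l0) / M) ** 2
                                      / np.cos(2 * np.pi * l1 / M) ** 2))
    for k in range(N // 2 + 1, N):              # reflection (6.257)
        H[k, :M // 2 + 1] = H[N - k, :M // 2 + 1]
    for l in range(M // 2 + 1, M):              # reflection (6.258)
        H[:, l] = H[:, M - l]
    return H
```

The array returned by this sketch can be passed directly as the argument H of the monogenic_signal sketch given earlier.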
$$H = -\sum_{k=1}^{G} p_k \log p_k \qquad (7.1)$$
Here G is the number of distinct values the pixels can take (the number of grey levels in the image), and p_k is the frequency for a particular grey level in the image (the value of the kth bin of the normalised histogram of the image), when we use G bins to construct it. The information measure H is called the entropy of the image. The larger its value, the more information the image contains.
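A minimal sketch of the entropy computation of equation (7.1) for a grey image; the number of bins and the base of the logarithm (base 2 here, giving bits) are illustrative choices, since the text does not fix them at this point:

```python
import numpy as np

def image_entropy(image, levels=256):
    """Entropy of a grey image, equation (7.1): the normalised histogram
    supplies the probabilities p_k; empty bins contribute nothing."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```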
Example B7.1
Show that the histogram equalised version of an image conveys the maximum possible information the image may convey. (Perform the analysis in the continuous domain.)
The image conveys maximum information when its entropy is maximum. If we differentiate H, given by equation (7.1), with respect to p_k, we obtain:
$$\frac{\partial H}{\partial p_k} = -\log p_k - p_k\frac{1}{p_k} = -\log p_k - 1 \qquad (7.2)$$
The second derivative of H with respect to p_k is:
$$\frac{\partial^2 H}{\partial p_k^2} = -\frac{1}{p_k} < 0 \qquad (7.3)$$
This shows that at the root of the first derivative, H takes its maximum. The first derivatives of H with respect to the p_k's are 0 when log p_k = −1 for all k. The important point here is that all p_k's have to take the same value. As they have to sum up to 1, this value is 1/G and this concludes the proof.
Example B7.2
Show that when the range of grey values of an image increases, its information content also increases.
Let us consider two versions of the same image, one with G + 1 grey values and one with G grey values. Let us also assume that we have applied histogram equalisation to each image to maximise the information it conveys, as shown in example 7.1. Then, in the first image, every grey value has probability 1/(G + 1) to arise. Substituting in (7.1), we work out that the information content of this image is
$$H_1 = -\sum_{k=1}^{G+1}\frac{1}{G+1}\log\left(\frac{1}{G+1}\right) = \log(G+1) \qquad (7.4)$$
For the image with G grey levels, the information content is H_2 = log G. Obviously H_2 < H_1. So, to maximise the information conveyed by a single band of a multispectral image, we must maximise the range of grey values of the band.
How do we perform principal component analysis in practice?
To perform principal component analysis we must diagonalise the covariance matrix of the data. The autocovariance function of the outputs of the assumed random experiment is
$$C(i,j) \equiv E\{(x_i(m,n) - x_{i0})(x_j(m,n) - x_{j0})\} \qquad (7.5)$$
where x_i(m, n) is the value of pixel (m, n) in band i, x_{i0} is the mean of band i, x_j(m, n) is the value of the same pixel in band j, x_{j0} is the mean of band j and the expectation value is computed over all outcomes of the random experiment, ie over all pixels of the image. For an M × N image:
$$C(i,j) = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}(x_i(m,n) - x_{i0})(x_j(m,n) - x_{j0}) \qquad (7.6)$$
For a 3-band image, variables i and j take only three values, so the covariance matrix is a 3 × 3 matrix. For data that are uncorrelated, C is diagonal, ie C(i, j) = 0 for i ≠ j. To achieve this, we must transform the data using the transformation matrix A, made up from the eigenvectors of the covariance matrix of the untransformed data. The process is as follows.
Step 1: Find the mean of the distribution of points in the spectral space, point (B̄₁, B̄₂, B̄₃), by computing the average grey value of each band.
Step 2: Subtract the mean grey value from the corresponding band. This is equivalent to translating the original coordinate system to be centred at the centre of the cloud of pixels (see axes B̃₁B̃₂B̃₃ in figure 7.2b).
Step 3: Compute the autocovariance matrix C(i, j) of the initial cloud, using (7.6), where i and j identify the different bands.
Step 4: Compute the eigenvalues of C(i, j) and arrange them in decreasing order. Form the eigenvector matrix A, having the eigenvectors as rows.
Step 5: Transform the cloud of pixels using matrix A. For a 3-band image, each triplet
$$\mathbf{x} = \begin{pmatrix} B_1 - \bar{B}_1\\ B_2 - \bar{B}_2\\ B_3 - \bar{B}_3\end{pmatrix} \text{ is transformed into } \mathbf{y} = \begin{pmatrix} P_1\\ P_2\\ P_3\end{pmatrix} \text{ by } \mathbf{y} = A\mathbf{x}.$$
In other words, the new values a pixel will carry will be given by y_k = Σ_i a_{ki} x_i, where k indexes the new bands, while i indexes the old bands.
This is a linear transformation. The new bands are linear combinations of the intensity values of the initial bands, arranged so that the first principal component contains most of the information of the image (see figure 7.2b). This is ensured by Step 4, where we use the largest eigenvalue for the first principal component. (The eigenvalue represents the spread of the data along the corresponding eigenvector.)
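These steps can be sketched compactly with numpy. The function below is an illustration of the procedure, not the book's code, and works for any number of bands:

```python
import numpy as np

def principal_components(bands):
    """PCA of a multiband image, following Steps 1-5 above.
    `bands` is a list of 2D arrays, one per spectral band."""
    X = np.stack([b.astype(float).ravel() for b in bands])   # one row per band
    means = X.mean(axis=1, keepdims=True)
    Xc = X - means                                            # Steps 1-2
    C = (Xc @ Xc.T) / Xc.shape[1]                             # Step 3, equation (7.6)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]                         # Step 4: decreasing order
    A = eigvecs[:, order].T                                   # eigenvectors as rows
    Y = A @ Xc                                                # Step 5
    shape = bands[0].shape
    return [Y[k].reshape(shape) for k in range(Y.shape[0])], A, eigvals[order]
```

Running this sketch on the three 4 × 4 bands of example 7.5 reproduces (up to the sign of each eigenvector, which is arbitrary) the principal components worked out there by hand.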
What are the advantages of using the principal components of an image, instead of the original bands?
1. The information conveyed by each principal band is maximal for the number of bits used, because the bands are uncorrelated and no information contained in one band can be predicted by the knowledge of the other bands.
2. If we want to use a grey version of the image, we can restrict ourselves to the first principal component only, and be sure that it has the maximum contrast and contains the maximum possible information conveyed by a single band of the image.
An example of principal component analysis is shown in figure 7.3. Although at first glance not much difference is observed between figures 7.3a, 7.3b, 7.3c and 7.3d, with a more careful examination, we can see that the first principal component combines the best parts of all three original bands: for example, the sky has maximum contrast in the third band and minimum contrast in the first band. In the first principal component, it has an intermediate contrast. The roof has maximum contrast in the first band, while it has much less contrast in the other two bands. In the first principal component, the roof has an intermediate contrast.
What are the disadvantages of using the principal components of an image instead of the original bands?
The grey values in the bands created from principal component analysis have no physical meaning, as they do not correspond to any physical bands. As a result, the grey value of a pixel cannot be used for the classification of the pixel.
Figure 7.3: Principal component analysis of the Mount Athos border (384 × 512 pixels). (a) Band B₁, (b) Band B₂, (c) Band B₃, (d) 1st PC P₁, (e) 2nd PC P₂, (f) 3rd PC P₃.
This is particularly relevant to remote sensing applications, where, often, pixels are classified according to their grey values. In a principal component band, pixels that represent water, for example, may appear darker or brighter than other pixels in the image, depending on the image content, while the degree of greyness of water pixels in the various spectral bands is always consistent, well understood by remote sensing scientists, and often used to identify them.
Example 7.3
Is it possible for matrix C to represent the autocovariance matrix of a three-band image?
$$C = \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & 2\\ 2 & 2 & 0\end{pmatrix} \qquad (7.7)$$
This matrix cannot represent the autocovariance matrix of an image because, from equation (7.6), it is obvious that C must be symmetric with positive elements along its diagonal.
Example 7.4
A 3-band image consists of bands with mean 3, 2 and 3, respectively. The autocovariance matrix of this image is given by:
$$C = \begin{pmatrix} 2 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 2\end{pmatrix} \qquad (7.8)$$
A pixel has intensity values 5, 3 and 4 in the three bands, respectively. What will be the transformed values of the same pixel in the three principal component bands?
First, we must compute the eigenvalues of matrix C:
$$\begin{vmatrix} 2-\lambda & 0 & 1\\ 0 & 2-\lambda & 0\\ 1 & 0 & 2-\lambda\end{vmatrix} = 0
\Rightarrow (2-\lambda)^3 - (2-\lambda) = 0
\Rightarrow (2-\lambda)\left[(2-\lambda)^2 - 1\right] = 0
\Rightarrow (2-\lambda)(1-\lambda)(3-\lambda) = 0 \qquad (7.9)$$
Therefore, λ₁ = 3, λ₂ = 2, λ₃ = 1. The corresponding eigenvectors are:
$$\begin{pmatrix} 2 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = 3\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix}
\Rightarrow \begin{cases} 2x_1 + x_3 = 3x_1\\ 2x_2 = 3x_2\\ x_1 + 2x_3 = 3x_3\end{cases}
\Rightarrow \begin{cases} x_1 = x_3\\ x_2 = 0\end{cases}
\Rightarrow \mathbf{u}_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}}\\ 0\\ \tfrac{1}{\sqrt{2}}\end{pmatrix}$$
$$\begin{pmatrix} 2 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = 2\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix}
\Rightarrow \begin{cases} 2x_1 + x_3 = 2x_1\\ 2x_2 = 2x_2\\ x_1 + 2x_3 = 2x_3\end{cases}
\Rightarrow \begin{cases} x_2\ \text{anything}\\ x_1 = x_3 = 0\end{cases}
\Rightarrow \mathbf{u}_2 = \begin{pmatrix} 0\\ 1\\ 0\end{pmatrix}$$
$$\begin{pmatrix} 2 & 0 & 1\\ 0 & 2 & 0\\ 1 & 0 & 2\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix}
\Rightarrow \begin{cases} 2x_1 + x_3 = x_1\\ 2x_2 = x_2\\ x_1 + 2x_3 = x_3\end{cases}
\Rightarrow \begin{cases} x_1 = -x_3\\ x_2 = 0\end{cases}
\Rightarrow \mathbf{u}_3 = \begin{pmatrix} \tfrac{1}{\sqrt{2}}\\ 0\\ -\tfrac{1}{\sqrt{2}}\end{pmatrix} \qquad (7.10)$$
The transformation matrix A is:
$$A = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}}\\ 0 & 1 & 0\\ \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}}\end{pmatrix} \qquad (7.11)$$
We first subtract the mean from pixel (5, 3, 4)ᵀ, to obtain (2, 1, 1)ᵀ, and then perform the transformation:
$$\begin{pmatrix} p_1\\ p_2\\ p_3\end{pmatrix} = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}}\\ 0 & 1 & 0\\ \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix} 2\\ 1\\ 1\end{pmatrix} = \begin{pmatrix} \tfrac{3}{\sqrt{2}}\\ 1\\ \tfrac{1}{\sqrt{2}}\end{pmatrix} \qquad (7.12)$$
Example 7.5
You are given the following 4 × 4 3-band image:
$$B_1 = \begin{pmatrix} 3&3&5&6\\ 3&4&4&5\\ 4&5&5&6\\ 4&5&5&6\end{pmatrix} \quad
B_2 = \begin{pmatrix} 3&2&3&4\\ 1&5&3&6\\ 4&5&3&6\\ 2&4&4&5\end{pmatrix} \quad
B_3 = \begin{pmatrix} 4&2&3&4\\ 1&4&2&4\\ 4&3&3&5\\ 2&3&5&5\end{pmatrix} \qquad (7.13)$$
Calculate its three principal components and verify that they are uncorrelated.
First, we calculate the mean of each band:
$$\bar{B}_1 = \frac{1}{16}(3+3+5+6+3+4+4+5+4+5+5+6+4+5+5+6) = \frac{73}{16} = 4.5625$$
$$\bar{B}_2 = \frac{1}{16}(3+2+3+4+1+5+3+6+4+5+3+6+2+4+4+5) = \frac{60}{16} = 3.75$$
$$\bar{B}_3 = \frac{1}{16}(4+2+3+4+1+4+2+4+4+3+3+5+2+3+5+5) = \frac{54}{16} = 3.375 \qquad (7.14)$$
Next, we calculate the elements of the covariance matrix as:
$$C_{B_1B_1} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_1(k,l)-\bar{B}_1)^2 = 0.996094$$
$$C_{B_1B_2} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_1(k,l)-\bar{B}_1)(B_2(k,l)-\bar{B}_2) = 0.953125$$
$$C_{B_1B_3} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_1(k,l)-\bar{B}_1)(B_3(k,l)-\bar{B}_3) = 0.726563$$
$$C_{B_2B_2} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_2(k,l)-\bar{B}_2)^2 = 1.9375$$
$$C_{B_2B_3} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_2(k,l)-\bar{B}_2)(B_3(k,l)-\bar{B}_3) = 1.28125$$
$$C_{B_3B_3} = \frac{1}{16}\sum_{k=1}^{4}\sum_{l=1}^{4}(B_3(k,l)-\bar{B}_3)^2 = 1.359375 \qquad (7.15)$$
Therefore, the covariance matrix is:
$$C = \begin{pmatrix} 0.996094 & 0.953125 & 0.726563\\ 0.953125 & 1.937500 & 1.281250\\ 0.726563 & 1.281250 & 1.359375\end{pmatrix} \qquad (7.16)$$
The eigenvalues of this matrix are:
$$\lambda_1 = 3.528765 \qquad \lambda_2 = 0.435504 \qquad \lambda_3 = 0.328700 \qquad (7.17)$$
The corresponding eigenvectors are:
$$\mathbf{u}_1 = \begin{pmatrix} 0.427670\\ 0.708330\\ 0.561576\end{pmatrix} \quad
\mathbf{u}_2 = \begin{pmatrix} 0.876742\\ -0.173808\\ -0.448457\end{pmatrix} \quad
\mathbf{u}_3 = \begin{pmatrix} 0.220050\\ -0.684149\\ 0.695355\end{pmatrix} \qquad (7.18)$$
The transformation matrix, therefore, is:
$$A = \begin{pmatrix} 0.427670 & 0.708330 & 0.561576\\ 0.876742 & -0.173808 & -0.448457\\ 0.220050 & -0.684149 & 0.695355\end{pmatrix} \qquad (7.19)$$
We can find the principal components by using this matrix to transform the values of every pixel. For example, for the first few pixels we find:
$$\begin{pmatrix} -0.8485\\ -1.5198\\ 0.6039\end{pmatrix} = A\begin{pmatrix} 3-4.5625\\ 3-3.75\\ 4-3.375\end{pmatrix} \qquad
\begin{pmatrix} -2.6800\\ -0.4491\\ -0.1027\end{pmatrix} = A\begin{pmatrix} 3-4.5625\\ 2-3.75\\ 2-3.375\end{pmatrix} \qquad
\begin{pmatrix} -0.5547\\ 0.6821\\ 0.3486\end{pmatrix} = A\begin{pmatrix} 5-4.5625\\ 3-3.75\\ 3-3.375\end{pmatrix} \qquad (7.20)$$
We use the first element of each transformed triplet to form the first principal component of the image, the second element for the second principal component and the third for the third one. In this way, we derive:
$$P_1 = \begin{pmatrix} -0.8485 & -2.6800 & -0.5547 & 1.1428\\ -3.9499 & 0.9958 & -1.5440 & 2.1318\\ 0.2875 & 0.8619 & -0.5547 & 3.1211\\ -2.2523 & 0.1536 & 1.2768 & 2.4128\end{pmatrix}$$
$$P_2 = \begin{pmatrix} -1.5198 & -0.4491 & 0.6821 & 0.9366\\ 0.1731 & -0.9907 & 0.2538 & -0.2878\\ -0.8169 & 0.3345 & 0.6821 & 0.1405\\ 0.4276 & 0.5083 & -0.3886 & 0.3143\end{pmatrix}$$
$$P_3 = \begin{pmatrix} 0.6034 & -0.1027 & 0.3486 & 0.5799\\ -0.1139 & -0.5444 & -0.5668 & -1.0085\\ 0.1398 & -1.0197 & 0.3486 & -0.0931\\ 0.1174 & -0.3355 & 1.0552 & 0.5911\end{pmatrix} \qquad (7.21)$$
To confirm that these new bands contain uncorrelated data, we calculate their autocovariance matrix. First, we find the mean of each band: P̄₁, P̄₂, P̄₃. Then we compute:
$$C_{P_1P_1} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_1(i,j)-\bar{P}_1)^2 = 3.528765$$
$$C_{P_1P_2} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_1(i,j)-\bar{P}_1)(P_2(i,j)-\bar{P}_2) = 0.0$$
$$C_{P_1P_3} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_1(i,j)-\bar{P}_1)(P_3(i,j)-\bar{P}_3) = 0.0$$
$$C_{P_2P_2} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_2(i,j)-\bar{P}_2)^2 = 0.435504$$
$$C_{P_2P_3} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_2(i,j)-\bar{P}_2)(P_3(i,j)-\bar{P}_3) = 0.0$$
$$C_{P_3P_3} = \frac{1}{16}\sum_{i=1}^{4}\sum_{j=1}^{4}(P_3(i,j)-\bar{P}_3)^2 = 0.328700 \qquad (7.22)$$
We see that this covariance matrix is diagonal, so it refers to uncorrelated data.
To visualise these new bands, we have to map their values in the range [0, 255], with the same transformation formula. The minimum value we observe is −3.9499 and the maximum is 3.1211. The mapping then should be done according to
$$g_{new} = \left\lfloor\frac{g_{old} + 3.9499}{3.1211 + 3.9499}\,255 + 0.5\right\rfloor \qquad (7.23)$$
where g_old is one of the values in matrices P₁, P₂ or P₃ and g_new is the corresponding new value.
Example 7.6
For the image of example 7.5 show that the first principal component has more contrast than any of the original bands.
The contrast of an image may be characterised by the range of grey values it has. We can see that the contrast of the original image was 3 in the first band, 5 in the second and 4 in the third band. The range of values in the first principal component is 3.1211 − (−3.9499) = 7.0710. This is larger than any of the previous ranges.
Is it possible to work out only the first principal component of a multispectral image if we are not interested in the other components?
Yes. We may use the so called power method, which allows us to calculate only the most significant eigenvalue of a matrix. This, however, is possible only if the covariance matrix C has a single dominant eigenvalue (as opposed to two or more eigenvalues with the same absolute value). (See Box 7.2.)
Box 7.2. The power method for estimating the largest eigenvalue of a matrix
If matrix A is diagonalisable and has a single dominant eigenvalue, ie a single eigenvalue that has the maximum absolute value, this eigenvalue and the corresponding eigenvector may be estimated using the following algorithm.
Step 1: Select a vector of unit length that is not parallel to the dominant eigenvector. Let us call it x₀. A vector chosen at random most likely will not coincide with the dominant eigenvector of the matrix. Set k = 0.
Step 2: Compute x_{k+1} = Ax_k.
Step 3: Normalise x_{k+1} to have unit length,
$$x_{k+1} \leftarrow \frac{x_{k+1}}{\sqrt{x_{k+1,1}^2 + x_{k+1,2}^2 + \cdots + x_{k+1,N}^2}} \qquad (7.24)$$
where x_{k+1,i} is the ith element of x_{k+1}, which has N elements.
Step 4: If x_{k+1} is not the same as x_k, within a certain tolerance, set k = k + 1 and go to Step 2.
The more dissimilar the eigenvalues of the matrix are, the faster this algorithm converges.
Once convergence has been achieved, the dominant eigenvalue may be computed as the Rayleigh quotient of the estimated eigenvector x:
$$\lambda_{dominant} = \frac{x^T A x}{x^T x} \qquad (7.25)$$
This method may also be used to compute the minimum nonzero eigenvalue of a matrix (see example 7.8).
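A minimal sketch of this algorithm; the tolerance, the iteration cap and the random starting vector are illustrative choices, not prescribed by the box:

```python
import numpy as np

def power_method(A, tol=1e-8, max_iter=1000):
    """Power method of Box 7.2: dominant eigenvalue and eigenvector of A."""
    n = A.shape[0]
    x = np.random.rand(n)                    # Step 1: random start, unit length
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        x_new = A @ x                        # Step 2
        x_new /= np.linalg.norm(x_new)       # Step 3, equation (7.24)
        if np.linalg.norm(x_new - x) < tol:  # Step 4: convergence test
            x = x_new
            break
        x = x_new
    eigval = (x @ A @ x) / (x @ x)           # Rayleigh quotient, equation (7.25)
    return eigval, x
```

Applied to the matrix of equation (7.8), this sketch converges to eigenvalue 3 and eigenvector (1/√2, 0, 1/√2)ᵀ, the values derived analytically in example 7.4 and iteratively in example 7.7.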
Example B7.7
Use the power method to work out an approximation of the dominant eigenvector and the corresponding eigenvalue of matrix C, given by (7.8).
We start by making a random choice x₀ = (1, 1, 1)ᵀ. We compute x₁ as:
$$x_1 = Cx_0 = \begin{pmatrix}2&0&1\\0&2&0\\1&0&2\end{pmatrix}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}3\\2\\3\end{pmatrix} \qquad (7.26)$$
We normalise x₁ to produce x₁ = (3/√22, 2/√22, 3/√22)ᵀ. We then work out x₂:
$$x_2 = Cx_1 = \begin{pmatrix}2&0&1\\0&2&0\\1&0&2\end{pmatrix}\begin{pmatrix}3/\sqrt{22}\\2/\sqrt{22}\\3/\sqrt{22}\end{pmatrix} = \begin{pmatrix}9/\sqrt{22}\\4/\sqrt{22}\\9/\sqrt{22}\end{pmatrix} \qquad (7.27)$$
We normalise x₂ to produce x₂ = (9/√178, 4/√178, 9/√178)ᵀ. From this, we produce in the same way x₃ = (27/√178, 8/√178, 27/√178)ᵀ, which, when normalised, becomes x₃ = (27/√1522, 8/√1522, 27/√1522)ᵀ. Then:
x₄ = (81/√1522, 16/√1522, 81/√1522)ᵀ, normalised to x₄ = (81/√13378, 16/√13378, 81/√13378)ᵀ
x₅ = (243/√13378, 32/√13378, 243/√13378)ᵀ, normalised to x₅ = (243/√119122, 32/√119122, 243/√119122)ᵀ
x₆ = (729/√119122, 64/√119122, 729/√119122)ᵀ, normalised to x₆ = (729/1032.95, 64/1032.95, 729/1032.95)ᵀ
x₇ = (0.7065, 0.0410, 0.7065)ᵀ
x₈ = (0.7068, 0.0275, 0.7068)ᵀ
x₉ = (0.7070, 0.0184, 0.7070)ᵀ
The corresponding eigenvalue is given by:
$$\lambda = \frac{(0.7070,\ 0.0184,\ 0.7070)\begin{pmatrix}2&0&1\\0&2&0\\1&0&2\end{pmatrix}\begin{pmatrix}0.7070\\0.0184\\0.7070\end{pmatrix}}{0.7070^2 + 0.0184^2 + 0.7070^2} = 2.9997 \qquad (7.28)$$
These results compare very well with the correct dominant eigenvalue, which was computed to be 3 in example 7.4, and the corresponding eigenvector (1/√2, 0, 1/√2)ᵀ = (0.7071, 0, 0.7071)ᵀ. The other eigenvalues of this matrix were found to be 2 and 1 in example 7.4. These are not very different from 3 and that is why the power method for this matrix converges rather slowly.
Example B7.8
Show that the reciprocal of the largest eigenvalue of a matrix is the smallest eigenvalue of its inverse, and that the corresponding eigenvector is the same.
Let us consider matrix A and its eigenpair (λ, x), where λ is the largest eigenvalue. By definition:
$$Ax = \lambda x \qquad (7.29)$$
Let us multiply both sides of this equation with A⁻¹, and remember that λ, being a scalar, can change position in the expression on the right-hand side:
$$A^{-1}Ax = A^{-1}\lambda x \Rightarrow x = \lambda A^{-1}x \Rightarrow \frac{1}{\lambda}x = A^{-1}x \qquad (7.30)$$
This shows that x is an eigenvector of A⁻¹, with the corresponding eigenvalue being 1/λ. Since λ is the largest eigenvalue of A, its inverse is obviously the smallest eigenvalue of A⁻¹. So, if we want to compute the smallest eigenvalue of a matrix, we can use the power method to compute the largest eigenvalue of its inverse.
What is the problem of spectral constancy?
The solution of this problem concentrates on making sure that the same physical surface patch, or different but spectrally identical surface patches, when imaged under two different illumination conditions and imaging geometries, are recognised by the computer as having the same spectrum. To deal with this problem, we have to think carefully about the process of image formation and correct the recorded pixel values to eliminate the dependence on illumination.
What influences the spectral signature of a pixel?
The spectrum of the source that illuminates the imaged surface, the properties of the material the surface is made from and the sensitivity function of the sensor we use. The properties of the material of the surface are expressed by the reflectance function.
What is the reflectance function?
The reflectance function expresses the fraction of incident light that is reflected (as opposed to being absorbed) by the surface, as a function of the wavelength of the incident light.
Does the imaging geometry influence the spectral signature of a pixel?

No. The relative orientation of the imaged surface with respect to the illumination direction and the viewing direction of the camera influences only the total amount of light energy the surface receives, but it does not influence the way this energy is distributed across different wavelengths. So, the relative values of the spectral signature components do not change with the imaging geometry.
How does the imaging geometry influence the light energy a pixel receives?

For a large fraction of materials, the light that is reflected by the surface is reflected with the same intensity in all directions. This then means that the orientation of the surface with respect to the camera is irrelevant to the process of image formation: in whichever direction the camera is, it will receive the same light intensity from the surface. Such materials are known as Lambertian. The orientation of the surface in relation to the illuminating source, however, is very important: if the surface is turned away from the light source, the surface will receive no light and it will be black. This dependence is expressed by the cosine of the angle between the normal vector of the surface and the direction of illumination. This is shown schematically in figure 7.4.

Figure 7.4: A surface AB catches light proportional to the area it presents towards the illuminating source. This area is A'B' = AB cos θ, where θ is the angle between the surface normal n and the illumination direction m; therefore, the light a surface catches per unit area is proportional to cos θ.
How do we model the process of image formation for Lambertian surfaces?

For Lambertian surfaces, we may model the process of image formation as

Q = m \cdot n \int_0^{+\infty} S(\lambda) I(\lambda) R(\lambda)\, d\lambda \qquad (7.31)

where Q is the recording of a sensor with sensitivity function S(\lambda), when it sees a surface with normal vector n and reflectance function R(\lambda), illuminated by a source with spectrum I(\lambda), in direction m. We see that the spectrum of a pixel, made up from the different Q values that correspond to the different camera sensors, depends on the imaging geometry through a simple scaling factor m \cdot n. The dependence on the imaging geometry and on the illumination spectrum are interferences, since what we are interested in is to be able to reason about the materials, identified by their function R(\lambda), using the information conveyed by the observed spectral values Q.
How can we eliminate the dependence of the spectrum of a pixel on the imaging geometry?

Let us define

\alpha_{ij} \equiv m \cdot n_{ij} \qquad (7.32)

which is a number expressing the imaging geometry for surface patch (i, j), with normal vector n_{ij}.
For a multispectral camera, with L bands, we have:

Q_l(i, j) = \alpha_{ij} \int_0^{+\infty} S_l(\lambda) I_0(\lambda) R_{ij}(\lambda)\, d\lambda \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.33)

To remove the dependence on the imaging geometry, define:

q_l(i, j) \equiv \frac{Q_l(i, j)}{\sum_{k=1}^{L} Q_k(i, j)} \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.34)

These q_l(i, j) values constitute the normalised spectrum of pixel (i, j), which is independent from the imaging geometry.
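As an illustration of equation (7.34), here is a minimal NumPy sketch (not from the book; the M x N x L array layout is an assumption) that produces the geometry-independent normalised spectrum of every pixel.

import numpy as np

def normalise_geometry(Q):
    """Remove the imaging-geometry scale factor, equation (7.34).
    Q is an M x N x L multispectral image; each pixel's band values are
    divided by the sum of its band values."""
    band_sum = Q.sum(axis=2, keepdims=True)            # sum over the L bands per pixel
    return Q / np.where(band_sum == 0, 1, band_sum)    # guard against division by zero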
How can we eliminate the dependence of the spectrum of a pixel on the spectrum of the illuminating source?

We may assume, to a good approximation, that the camera responses are delta functions, ie:

S_l(\lambda) = \delta(\lambda - \lambda_l) \qquad (7.35)

This assumption is nearly true for multispectral and hyperspectral cameras. Then:

Q_l(i, j) = \alpha_{ij} I_0(\lambda_l) R_{ij}(\lambda_l) \qquad (7.36)

If the same surface patch is seen under a different illuminant, with different spectral characteristics \tilde{I}_0(\lambda), the values recorded for it will be:

\tilde{Q}_l(i, j) = \alpha_{ij} \tilde{I}_0(\lambda_l) R_{ij}(\lambda_l) \qquad (7.37)

Note that:

\frac{\tilde{Q}_l(i, j)}{Q_l(i, j)} = \frac{\tilde{I}_0(\lambda_l)}{I_0(\lambda_l)} \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.38)
This expression means that under a new illuminant, the spectral values of a pixel become

\beta_l Q_l(i, j) \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.39)

where \beta_l \equiv \tilde{I}_0(\lambda_l)/I_0(\lambda_l) is a parameter that depends only on the band and is the same for all pixels. This means that all pixels in the image will change in the same way and so, the average value of the image will change in the same way too. To remove dependence on the spectrum of the illuminant, we may, therefore, define

q_l(i, j) \equiv \frac{Q_l(i, j)}{\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} Q_l(m, n)} \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.40)

where M x N is the size of the image.
What happens if we have more than one illuminating source?

For K illuminating sources, in directions m_1, m_2, ..., m_K, the total illumination received by the surface patch will be

\sum_{k=1}^{K} (m_k \cdot n)\, I_{0k}(\lambda) \qquad (7.41)

or:

n \cdot \sum_{k=1}^{K} m_k I_{0k}(\lambda) \qquad (7.42)

We may always define an effective light source with intensity I_0(\lambda) along direction m_0, such that:

I_0(\lambda)\, m_0 \equiv \sum_{k=1}^{K} m_k I_{0k}(\lambda) \qquad (7.43)

So, the analysis done for the use of a single illumination source may also be transferred to the case of several sources.
How can we remove the dependence of the spectral signature of a pixel on the
imaging geometry and on the spectrum of the illuminant?
Some researchers have proposed an algorithm that may be used to remove the dependence of
the image values on the imaging geometry and the spectrum of the illuminant, under three
very important conditions:
the surface is Lambertian;
the camera sensitivities are delta functions;
the surface has uniform spectrum, ie the surface is made up from the same material.
The steps of the algorithm are as follows.
Step 0: Set

q^0_l(i, j) = \frac{MN\, q_l(i, j)}{L \sum_{m=1}^{M}\sum_{n=1}^{N} q_l(m, n)} \qquad (7.44)

where q_l(i, j) is the brightness of pixel (i, j) in band l, L is the number of bands and M x N is the size of the image.
Step 1: At iteration step t and for t >= 1, for each pixel (i, j) compute \tilde{q}^t_l(i, j), using:

\tilde{q}^t_l(i, j) \equiv \frac{q^{t-1}_l(i, j)}{\sum_{k=1}^{L} q^{t-1}_k(i, j)} \qquad \text{for } l = 1, 2, \ldots, L \qquad (7.45)

Step 2: For each pixel (i, j) compute q^t_l(i, j), using

q^t_l(i, j) \equiv \frac{MN\, \tilde{q}^t_l(i, j)}{L \sum_{m=1}^{M}\sum_{n=1}^{N} \tilde{q}^t_l(m, n)} \qquad (7.46)

Step 3: If the image changed, go to Step 1. If not, exit.
The factor of L in the denominator of (7.46) is to restore the correct range of values of q_l(i, j), which is divided at each iteration step by the sum of L numbers of the same order of magnitude as itself (see equation (7.45)).
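A possible NumPy rendering of Steps 0-3 is given below. It is only a sketch of the iteration described above, assuming the image is stored as an M x N x L array of positive values.

import numpy as np

def remove_illumination_effects(Q, max_iter=100, tol=1e-8):
    """Iterative normalisation (Steps 0-3 above) for an M x N x L image Q
    of a Lambertian, single-material surface."""
    M, N, L = Q.shape
    # Step 0: normalise each band by its image-wide sum, equation (7.44)
    q = M * N * Q / (L * Q.sum(axis=(0, 1), keepdims=True))
    for _ in range(max_iter):
        # Step 1: normalise each pixel by the sum of its band values, equation (7.45)
        q_tilde = q / q.sum(axis=2, keepdims=True)
        # Step 2: normalise each band by its image-wide sum again, equation (7.46)
        q_new = M * N * q_tilde / (L * q_tilde.sum(axis=(0, 1), keepdims=True))
        # Step 3: stop when the image no longer changes
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q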
What do we have to do if the imaged surface is not made up from the same material?

In that case we have to segment the image first into regions of uniform pixel spectra and correct for the illumination interference for each region separately. Step 2 of the above algorithm then has to be replaced with:
Step 2 (multimaterials): For each pixel (i, j) compute q^t_l(i, j), using

q^t_l(i, j) \equiv \frac{N_R\, \tilde{q}^t_l(i, j)}{L \sum_{(m,n) \in R} \tilde{q}^t_l(m, n)} \qquad (7.47)

where R is the region to which pixel (i, j) belongs and N_R is the number of pixels in that region.
What is the spectral unmixing problem?

This problem arises in remote sensing, where low resolution sensors may capture images of the surface of the Earth in which an image pixel corresponds to a patch on the surface that contains several materials. The word "low" here is a relative term: it means that the resolution of the camera is low in relation to the typical size of the objects we wish to observe. For example, if we wish to identify cities, a pixel corresponding to 100 m^2 on the ground is adequate; if, however, we wish to identify cars, such a pixel size is too big.
Sensors on board Earth observation satellites may capture images with several bands. For example, a hyperspectral camera may capture images with a few hundred bands (instead of the 3-band images we commonly use). Let us say that we have data from a camera with L bands. This means that for each pixel we have L different values, one for each band. These values constitute the spectral signature of the pixel. If the pixel corresponds to a relatively large patch on the ground, several different materials may have contributed to its spectral signature. For example, if the pixel corresponds to 2 x 2 square metres on the ground, there might be some plants as well as some concrete around the flower bed of these plants. The spectral signature of the corresponding pixel then will be a combination of the spectral signatures of the plants and of the concrete. The linear mixing model assumes that this combination is linear and the mixing proportions correspond to the fractional covers of the patch on the ground. Thus, working out, for example, that the spectral signature of the pixel we observe can be created by adding 0.6 of the spectrum of the plants and 0.4 of the spectrum of the concrete, we may infer that 60% of the 4 m^2 patch is covered by plants and 40% by concrete. That is why the mixing proportions are sometimes referred to as cover proportions as well.
How do we solve the linear spectral unmixing problem?

The problem may be solved by incorporating some prior knowledge concerning the spectra of the K different pure materials that are present in the scene. If s_i is the spectrum of pixel i, and p_k is the spectrum of pure material k, we may write

s_i = \sum_{k=1}^{K} a_{ik} p_k + \epsilon_i \qquad (7.48)

where a_{ik} is the fraction of spectrum p_k present in the spectrum of pixel i and \epsilon_i is the residual error of the linear model used. For every pixel, we have K unknowns, the mixing proportions a_{ik}. We have at the same time MN such equations, one for each pixel, assuming that we are dealing with an M x N image. As these are vector equations, and as mixing proportions a_{ik} are the same for all bands, if we have L bands, we have L linear equations per pixel, for the K unknowns. If K <= L the system may be solved in the least square error sense to yield the unknown mixing proportions.
To guarantee that the a_{ik} values computed correspond indeed to mixing proportions, one usually imposes the constraints that \sum_{k=1}^{K} a_{ik} = 1 and that 0 <= a_{ik} <= 1. Of course, if the linear mixing model were correct and if there were no intraclass variability and measurement noise, these constraints would not have been necessary, as the solution of the problem would automatically have obeyed them. However, natural materials exhibit quite a wide variation in their spectra, even if they belong to the same class (eg there are many different spectra that belong to class "grass"). So, to derive an acceptable solution, these constraints are necessary in practice.
Some variations of this algorithm have been proposed, where these constraints are not used, in order to allow for the case when the basic spectra p_k are mixtures themselves. In some other methods, the statistical properties of the intraclass variations of the spectra of the various pure classes are used to enhance the system of equations that have to be solved.
However, these advanced techniques are beyond the scope of this book.
Can we use library spectra for the pure materials?

In some cases, yes. The spectra we observe with the multispectral camera consist of the values of the true spectrum (the reflectance function) integrated with the spectral sensitivity function of the sensor. Library spectra usually have been obtained with instruments with much narrower sensitivity curves, amounting to being sampled versions of the continuous spectrum of the material. If the sensitivity curves of the cameras we use are very near delta functions, we may use library spectra for the pure materials. We have, however, to take into consideration the different conditions under which the spectra have been captured.
The spectra captured by Earth observation satellites, for example, may have to be corrected for atmospheric and other effects. Such correction is known as radiometric correction. The way it is performed is beyond the scope of this book and it can be found in books on remote sensing. Figure 7.5 shows schematically the relationship between camera based spectra and library spectra.
[Figure 7.5 (schematic): the real spectrum is multiplied by the sensitivity curve of each of the L bands and integrated, to give the spectral value in band 1, band 2, ..., band L; a library spectrum, instead, is obtained by sampling and integrating the real spectrum directly.]
Figure 7.5: The library spectra may not be the same as the spectral signatures we obtain from a multispectral image.
How do we solve the linear spectral unmixing problem when we know the spectra of the pure components?

Let us consider the case where we have two pure spectra recorded together, as a mixed spectrum s(\lambda), where \lambda is the wavelength of the electromagnetic spectrum. Let us assume that one of these pure spectra is p_1(\lambda) and the other is p_2(\lambda). As wavelength is discretised, we can consider these three functions, namely s(\lambda), p_1(\lambda) and p_2(\lambda), as being represented by vectors s, p_1 and p_2, respectively, in an L-dimensional space, where L is the number of samples we use to sample the range of wavelengths. We shall use bold letters to indicate the discretised versions of the spectra, that are defined for specific values of \lambda, while keeping the non-bold version for the continuous spectrum.
In order to understand how to solve the problem, we shall start first from the case where L = 2, that is, each spectrum consists of only two values. Figure 7.6 shows this case schematically. Several points may be noted from this figure:
(i) the pure spectra do not form an orthogonal system of axes;
(ii) each spectrum has been captured possibly by a different instrument and with a different calibration, so the length of each vector is irrelevant.
This means that:
(i) we cannot use orthogonal projections to identify the components of s along the two pure spectra directions;
(ii) we cannot use the raw values of the pure spectra. We must normalise each one of them to become a unit vector in this space.

Figure 7.6: The pure spectra and the mixed spectrum can be thought of as vectors in a space that has as many axes as samples we use. We may consider the points where these vectors point as representing the spectra. Vectors p_1 and p_2 are the pure spectra. Length OA represents the amount of the first pure spectrum that has to be mixed with the amount of the second pure spectrum, represented by length OB, in order to produce the observed spectrum OC.
Let us denote by \hat{p}_1 and by \hat{p}_2 the two normalised pure spectra. Let us also denote by (\hat{p}_{11}, \hat{p}_{12}) the values of the first pure spectrum for the sampling wavelengths \lambda_1 and \lambda_2. In a similar way, the two sample values of \hat{p}_2 are (\hat{p}_{21}, \hat{p}_{22}). Normalisation here means that

\hat{p}_{11}^2 + \hat{p}_{12}^2 = 1 \qquad \hat{p}_{21}^2 + \hat{p}_{22}^2 = 1 \qquad (7.49)

Note that the observed mixed spectrum s does not have to be normalised. Let us say that the samples of s are (s_1, s_2). Once the pure spectra are normalised, we may write the equations

s_1 = a_1 \hat{p}_{11} + a_2 \hat{p}_{21}
s_2 = a_1 \hat{p}_{12} + a_2 \hat{p}_{22} \qquad (7.50)

where a_1 and a_2 are the mixing factors of the two pure spectra p_1(\lambda) and p_2(\lambda), respectively. This is a system of two equations with two unknowns, that can be solved to find values for the unknowns a_1 and a_2. Once these values have been specified, they can be normalised so they sum up to 1, in order to become the sought mixing proportions:

\tilde{a}_1 \equiv \frac{a_1}{a_1 + a_2} \qquad \tilde{a}_2 \equiv \frac{a_2}{a_1 + a_2} \qquad (7.51)
Let us consider now the case where we use three samples to sample the range of wavelengths. Then the linear system of equations (7.50) will consist of three equations, one for each sampling wavelength:

s_1 = a_1 \hat{p}_{11} + a_2 \hat{p}_{21}
s_2 = a_1 \hat{p}_{12} + a_2 \hat{p}_{22}
s_3 = a_1 \hat{p}_{13} + a_2 \hat{p}_{23} \qquad (7.52)

Now we have more equations than unknowns. This system will then have to be solved by using the least squares error solution. This becomes very easy if we write the system of equations in matrix form. Let us define vectors s and a, and matrix P:

s \equiv \begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix} \qquad a \equiv \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \qquad P \equiv \begin{pmatrix} \hat{p}_{11} & \hat{p}_{21} \\ \hat{p}_{12} & \hat{p}_{22} \\ \hat{p}_{13} & \hat{p}_{23} \end{pmatrix} \qquad (7.53)

Note that vector s is actually none other than the mixed spectrum. Matrix P is made up from the normalised pure spectra, written as columns next to each other. Vector a is actually made up from the unknown mixing coefficients. System (7.52) then in matrix form may be written as:

s = Pa \qquad (7.54)

As matrix P is not square, we cannot take its inverse to solve the system. We may, however, multiply both sides of (7.54) with P^T:

P^T s = P^T P a \qquad (7.55)

Matrix P^T P now is a 2 x 2 matrix and we can take its inverse, in order to solve for a:

a = (P^T P)^{-1} P^T s \qquad (7.56)

This is the least square error solution of the system.
From this example, it should be noted that the maximum number of pure components we may recover is equal to the number of sampling points we use for \lambda (or the number of bands). As hyperspectral data are expected to have many more bands than components we wish to recover, this constraint does not pose any problem.
In the general case, we assume that we have spectra that are made up from L sampling points or bands, and that we wish to isolate K components. The mixed spectrum s(\lambda) may be thought of as a vector s of dimensions L x 1. The vector of the unknown proportions is a K x 1 vector. Matrix P is made up from the normalised pure spectra, written one next to the other as columns, and it has dimensions L x K. We understand that K < L. The least squares error solution will be given by (7.56). Matrix P^T P is a K x K matrix. We shall have to find its inverse. So, the algorithm is as follows.
Step 1: Identify the spectra of the pure substances you wish to identify in your mixture. Say they are K.
Step 2: Normalise the spectra of the pure substances by dividing each one with the square root of the sum of the squares of its values.
Step 3: Arrange the normalised spectra next to each other to form the columns of matrix P. This matrix will be L x K in size, where L is the number of samples we use to sample the electromagnetic spectrum.
Step 4: Compute matrix Q \equiv P^T P. This matrix will be K x K in size.
Step 5: Compute the inverse of matrix Q.
Step 6: Compute matrix R \equiv Q^{-1} P^T.
Step 7: Compute the K x 1 vector a \equiv R s, where s is the mixed spectrum.
Step 8: Divide each element of a by the sum of all its elements. These are the mixing proportions that correspond to the mixed substances.
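The eight steps may be sketched in Python/NumPy as follows; the function name and the two-band example at the end are illustrative assumptions, not part of the original algorithm.

import numpy as np

def unmix(s, pure_spectra):
    """Least squares unmixing (Steps 1-8 above).
    s: mixed spectrum, length L.  pure_spectra: list of K pure spectra, each length L."""
    # Steps 2-3: normalise each pure spectrum to unit length and stack as columns of P
    P = np.column_stack([p / np.linalg.norm(p) for p in pure_spectra])
    # Steps 4-7: a = (P^T P)^{-1} P^T s, the least square error solution (7.56)
    a = np.linalg.inv(P.T @ P) @ P.T @ s
    # Step 8: normalise so the coefficients sum to 1, giving mixing proportions
    return a / a.sum()

# hypothetical two-band example: s is a mixture of two pure spectra p1 and p2
p1, p2 = np.array([1.0, 0.2]), np.array([0.3, 0.9])
s = 0.6 * p1 + 0.4 * p2
print(unmix(s, [p1, p2]))   # estimated proportions of p1 and p2 in s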
Is it possible that the inverse of matrix Q cannot be computed?

This happens when matrix Q is singular or nearly singular. This will happen if two of the rows or two of the columns of the matrix are almost identical. This is very unlikely, unless some of the pure substances we want to disambiguate have very similar spectra. To check that, we should not try to subtract one spectrum from the other in order to see whether we get a residual near 0, because the two spectra may be calibrated differently. Instead, we must compute their so called spectral angular distance, or SAD, for short. Let us consider again figure 7.6. All points along line OA have the same spectrum. However, two such points may have very large Euclidean distance from each other, although they lie on the same line OA. So, to estimate how different spectra p_1 and p_2 are, we must not measure the distance between points A and B, but rather the angle between lines OA and OB. This angle is computed as

SAD(p_1, p_2) \equiv \cos^{-1}\left(\frac{p_1 \cdot p_2}{\|p_1\|\, \|p_2\|}\right) \qquad (7.57)

where p_1 \cdot p_2 is the dot product of the two vectors and \|\cdot\| means the square root of the sum of squares of the elements of the vector it refers to. If this angle is near 0° or 180°, the two spectra are very similar, and only one of the two should be used. The two substances cannot be disambiguated. In practice, we do not need to take the inverse cosine to check. Just check the fraction:

Sim \equiv \frac{p_1 \cdot p_2}{\|p_1\|\, \|p_2\|} \qquad (7.58)

If this is near +1, the two spectra cannot be disambiguated. The threshold of similarity has to be checked by trial and error. For example, we might say that spectra with SAD smaller than 5° or larger than 175° might cause problems. These values correspond to values of |Sim| larger than 0.996. So, if |Sim| is larger than 0.996, we may not be able to separate the two substances.
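A small Python sketch of (7.57) and (7.58); the 0.996 threshold is the trial-and-error value suggested above.

import numpy as np

def spectral_angular_distance(p1, p2):
    """Spectral angular distance, equation (7.57), between two spectra, in degrees."""
    sim = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))   # equation (7.58)
    return np.degrees(np.arccos(np.clip(sim, -1.0, 1.0)))

def too_similar(p1, p2, threshold=0.996):
    """Flag pure spectra that the unmixing algorithm cannot disambiguate."""
    sim = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return abs(sim) > threshold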
What happens if the library spectra have been sampled at different wavelengths from the mixed spectrum?

It is important that all the spectra we use have been sampled at the same wavelengths with sensors that have comparable resolutions. If that is not the case, we must resample the library spectra, by using, for example, linear interpolation, so that we have their sample values at the same wavelengths as the mixed spectrum. An idea is to resample everything to the coarsest resolution, rather than upsampling everything to the finest resolution. This is because:
(i) we usually have many more samples than substances we wish to disambiguate, and so this will not pose a problem to the solution of the system of equations, and
(ii) by upsampling, we assume that the sparsely sampled spectrum did not have any special important feature at the positions of the missed samples, which were left out by the sparse sampling; an assumption that might not be true.
What happens if we do not know which pure substances might be present in the mixed substance?

Referring again to figure 7.6, any substance that is a mixture of pure spectra p_1 and p_2 will have its spectrum along a line starting from the origin O and lying between lines OA and OB. If line OC were outside the cone of directions defined by lines OA and OB, then it would be obvious that spectrum s(\lambda) is not a linear mixture of spectra p_1(\lambda) and p_2(\lambda). This situation is depicted in figure 7.7. However, it might also be the case that, if we were to have an extra sampling wavelength, \lambda_3, line OC would not even be on the plane defined by lines OA and OB. This situation is depicted in figure 7.8.

Figure 7.7: When the mixed spectrum s is not a mixture of the pure spectra p_1 and p_2, it may still be expressed as a linear combination of them, but at least one of the weights will be negative. In this example, the weight expressed by length OB is negative. A negative weight, therefore, indicates that we have not selected the pure spectra correctly.

Let us consider first the case depicted in figure 7.7. In this case, the situation may be interpreted as if spectrum p_1 is a mixture of spectra s and p_2. The algorithm will still yield a solution, but coefficient a_2 will be negative. This is because the relative values of the spectra imply that spectrum p_1 may be expressed as a linear combination of the other two spectra with positive coefficients. However, if we solve such a mixture equation in terms of s, at least one of the coefficients that multiplies one of the other spectra will be negative. In the algorithm of page 692, therefore, if we observe negative values in a, we must realise that the pure spectra we assumed are not the right ones, or not the only ones.
Let us consider next the case depicted in figure 7.8. In this case, the fact that point C does not lie on the plane defined by points O, A and B implies that there is at least one more pure substance present in the mixture. The least square error algorithm, described above, will find the components of the projection OD of vector OC on the plane defined by the two pure substances we consider. In this case, when we synthesise the observed mixed spectrum using the components of the pure substances we have estimated, we shall have a large residual error, equal to the distance CD of point C from the plane defined by the pure substances. This error can be estimated as follows:

e \equiv \| s - (a_1 \hat{p}_1 + a_2 \hat{p}_2) \| \qquad (7.59)
If this error is large, we may infer that we did not consider the right pure spectra. The algorithm then should introduce another pure spectrum and be executed again, in order to make sure that the observed mixed spectrum is synthesised with a satisfactorily low error. This may be repeated as many times as we wish, or as many times as we have pure spectra to add, until the error is acceptable. An acceptable error may be determined empirically, from several sets of training data, but perhaps it can be specified as a fraction of the total magnitude of spectrum s. For example, if the square root of the sum of the squares of the values of s is \|s\|, an acceptable error might be e_{threshold} = 0.05\|s\|.

Figure 7.8: When the mixed spectrum s is not a mixture of the pure spectra p_1 and p_2, it may be lying outside the plane defined by the pure spectra. Then, the algorithm will yield a solution which will actually be synthesising the spectrum represented by point D, ie the projection of point C on the plane defined by the two spectra. The residual error, however, represented by length CD, will be high.
How do we solve the linear spectral unmixing problem if we do not know the spectra of the pure materials?

In this case the problem may be solved by applying, for example, independent component analysis (ICA). Note that the idea here is to identify some basis spectra in terms of which all observed spectra in the image can be expressed as linear combinations. The coefficients of the linear combinations are considered as the cover proportions of the pixels. Although this approach will produce some basis spectra, there is no guarantee that they will correspond to any real materials. In this case, one should not refer to them as "pure class spectra", but with the commonly used term end members. The end members may be identified with any of the methods we considered in this book. For example, if we wish the end members to be uncorrelated, we may use PCA. If we wish the end members to be independent, we may use ICA.
Example 7.9

Formulate the problem of linear spectral unmixing so that it can be solved using PCA.

Consider that we computed the autocovariance matrix C of the image according to equation (7.6). For an L-band image, this matrix is L x L and it is expected to have at most L nonzero eigenvalues. However, most likely, it will have fewer; let us say that its nonzero eigenvalues are K < L. This means that PCA defines a coordinate system with K axes only, in terms of which every pixel of the image may be identified. This is shown schematically in figure 7.9. Let us call the mean spectrum s_0. This vector is made up from the mean grey values of the different bands and it is represented by vector OΩ in figure 7.9. Let us also call the eigenvectors we identified with PCA e_k, where k = 1, 2, ..., K. We may say that the tip of each eigenvector defines the spectrum of one end member of the L-band image:

p_k \equiv e_k + s_0 \qquad (7.60)

Let us say that a pixel (m, n), represented by point P in figure 7.9, has spectrum s(m, n) and, when it is projected on the axes identified by PCA, it has components [\alpha_1(m, n), \alpha_2(m, n), \ldots, \alpha_K(m, n)]. We may write then:

s(m, n) - s_0 = \alpha_1(m, n) e_1 + \alpha_2(m, n) e_2 + \cdots + \alpha_K(m, n) e_K \qquad (7.61)

Note that these coefficients do not necessarily sum up to 1. So, they cannot be treated as mixing proportions. We can substitute e_k from (7.60) in terms of the end member spectra, to obtain:

s(m, n) - s_0 = \alpha_1(m, n)(p_1 - s_0) + \alpha_2(m, n)(p_2 - s_0) + \cdots + \alpha_K(m, n)(p_K - s_0)
             = \alpha_1(m, n) p_1 + \alpha_2(m, n) p_2 + \cdots + \alpha_K(m, n) p_K - [\alpha_1(m, n) + \alpha_2(m, n) + \cdots + \alpha_K(m, n)] s_0 \qquad (7.62)

We may, therefore, write:

s(m, n) = \alpha_1(m, n) p_1 + \alpha_2(m, n) p_2 + \cdots + \alpha_K(m, n) p_K + s_0 \left[1 - \sum_{k=1}^{K} \alpha_k(m, n)\right] \qquad (7.63)

Accepting as end member spectra s_0, p_1, p_2, \ldots, p_K, the mixing proportions are 1 - \sum_{k=1}^{K} \alpha_k(m, n), \alpha_1(m, n), \alpha_2(m, n), \ldots, \alpha_K(m, n), respectively.
So, the algorithm is as follows:
Step 0: Compute the mean of each band and create vector s_0.
Step 1: Compute the autocovariance matrix of the image using equation (7.6).
Step 2: Compute the eigenvalues and eigenvectors e_k of this matrix. Say there are K nonzero eigenvalues.
Step 3: Identify as end member spectra s_0 and p_k \equiv s_0 + e_k, for k = 1, 2, \ldots, K.
Step 4: For each pixel (m, n), with spectrum s(m, n), identify its mixing coefficients by taking the product [s(m, n) - s_0] \cdot e_k \equiv \alpha_k(m, n).
Step 5: The mixing proportions for pixel (m, n) are
[1 - \sum_{k=1}^{K} \alpha_k(m, n),\ \alpha_1(m, n),\ \alpha_2(m, n),\ \ldots,\ \alpha_K(m, n)],
corresponding to the K + 1 end member spectra s_0, p_1, p_2, \ldots, p_K, respectively.
Note that although e_1, e_2, \ldots, e_K are orthogonal to each other, the end member spectra are not.
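A compact NumPy sketch of Steps 0-5 might look as follows; the eigenvalue cut-off used to decide which eigenvalues count as "nonzero" is an assumption made for the illustration.

import numpy as np

def pca_unmix(image):
    """Sketch of Steps 0-5 of the PCA-based unmixing above.
    image: M x N x L array. Returns an M x N x (K+1) array of mixing proportions."""
    M, N, L = image.shape
    pixels = image.reshape(-1, L)                    # MN x L
    s0 = pixels.mean(axis=0)                         # Step 0: mean spectrum
    Z = pixels - s0
    C = Z.T @ Z / (M * N)                            # Step 1: autocovariance matrix
    eigval, eigvec = np.linalg.eigh(C)               # Step 2: eigen-decomposition
    keep = eigval > 1e-10 * eigval.max()             # keep the K "nonzero" eigenvalues
    E = eigvec[:, keep]                              # L x K matrix of eigenvectors e_k
    alphas = Z @ E                                   # Step 4: alpha_k(m,n) = (s - s0) . e_k
    first = 1.0 - alphas.sum(axis=1, keepdims=True)  # Step 5: proportion attributed to s0
    return np.hstack([first, alphas]).reshape(M, N, -1)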
[Figure 7.9 (schematic): the axes are the image bands (band 1, band 2, band 3, ..., band i, ..., band L); Ω is the tip of the mean spectrum, e_1, e_2, e_3 are the PCA eigenvectors and p_1, p_2, p_3 the end member vectors; P is a pixel with projections A_1, A_2, A_3.]
Figure 7.9: Vector OΩ is the mean spectrum s_0. Vectors e_1, e_2 and e_3 are the eigenvectors of the covariance matrix that define the axes of the PCA system. The thick black vectors are the p_k vectors, which represent end member spectra. Lengths ΩA_1, ΩA_2 and ΩA_3 correspond to coefficients \alpha_1, \alpha_2 and \alpha_3 for pixel P. Vector OP represents the spectrum of the pixel and vector ΩP represents the spectrum of the pixel minus the mean spectrum.
Example 7.10

Formulate the problem of linear spectral unmixing so that you can identify the end member spectra using ICA. How can you then use them to work out the mixing proportions?

We may consider that we want to solve the cocktail party problem (see page 234), where every pixel is a microphone that recorded a mixture of the pure signals we are seeking to identify. We are also interested in the mixing proportions with which each pixel recorded the pure signals. We shall solve the problem according to the formulation implied by figure 3.16a and the discussion that starts on page 264. Each image referred to in that figure and that discussion corresponds to a band we have here.
We shall adapt here the ICA algorithm on page 274 to our problem.
Let us assume that we have an image of size M x N, consisting of L bands. Each pixel is a vector of size L x 1. We have MN such vectors. Let us say that index m of a pixel defines in which row of the image the pixel is and index n identifies in which column of the image the pixel is. Then each pixel may be indexed by a unique index i \equiv (n - 1)M + m. Here we assume that indices m and n take values from 1 to M and from 1 to N, respectively.
Step 1: Write the vectors of the pixels one next to the other, to form a matrix P that will be L x MN in size.
Step 2: Compute the average of all vectors, say vector m, and remove it from each vector, thus creating MN vectors \tilde{p}_i, of size L x 1.
Step 3: Compute the autocorrelation matrix of the new vectors. Let us call \tilde{p}_{ki} the kth component of vector \tilde{p}_i. Then the elements of the autocorrelation matrix C are:

C_{kj} = \frac{1}{MN} \sum_{i=1}^{MN} \tilde{p}_{ki} \tilde{p}_{ji} \qquad \text{for } k, j = 1, 2, \ldots, L \qquad (7.64)

Matrix C is of size L x L and it may also be computed as:

C = \frac{1}{MN} \tilde{P} \tilde{P}^T \qquad (7.65)
Step 4: Compute the nonzero eigenvalues of C and arrange them in decreasing order. Let us say that they are E. Let us denote by u_l the eigenvector that corresponds to eigenvalue \lambda_l. We may write them next to each other to form matrix U.
Step 5: Scale the eigenvectors so that the projected components of vectors \tilde{p}_i will have the same variance along all eigendirections: \tilde{u}_l \equiv u_l / \sqrt{\lambda_l}. You may write the scaled eigenvectors next to each other to form matrix \tilde{U}, of size L x E.
Step 6: Project all vectors \tilde{p}_i on the scaled eigenvectors to produce vectors q_i, where q_i is an E x 1 vector with components q_{li}, given by:

q_{li} = \tilde{u}_l^T \tilde{p}_i \qquad (7.66)

This step may be performed in a concise way as \tilde{Q} \equiv \tilde{U}^T \tilde{P}, with vectors q_i being the columns of matrix \tilde{Q}.
Step 7: Select randomly an E x 1 vector w_1, with the values of its components drawn from a uniform distribution, in the range [-1, 1].
Step 8: Normalise vector w_1 so that it has unit norm: if w_{i1} is the ith component of vector w_1, define vector \tilde{w}_1, with components:

\tilde{w}_{i1} \equiv \frac{w_{i1}}{\sqrt{\sum_j w_{j1}^2}} \qquad (7.67)

Step 9: Project all data vectors q_i on \tilde{w}_1, to produce the MN different projection components:

y_i = \tilde{w}_1^T q_i \qquad (7.68)

These components will be stored in a 1 x MN matrix (row vector), which may be produced in one go as Y \equiv \tilde{w}_1^T \tilde{Q}.
Step 10: Update each component of vector \tilde{w}_1 according to

w^+_{k1} = \tilde{w}_{k1} \frac{1}{MN} \sum_{i=1}^{MN} G''(y_i) - \frac{1}{MN} \sum_{i=1}^{MN} q_{ki} G'(y_i) \qquad (7.69)

Note that for G'(y) = \tanh y, G''(y) \equiv dG'(y)/dy = 1 - (\tanh y)^2.
Step 11: Normalise vector w^+_1, by dividing each of its elements with the square root of the sum of the squares of its elements, \sqrt{\sum_j (w^+_{j1})^2}, so that it has unit magnitude. Call the normalised version of vector w^+_1, vector \tilde{w}^+_1.
Step 12: Check whether vectors \tilde{w}^+_1 and \tilde{w}_1 are sufficiently close. If, say, |\tilde{w}^{+T}_1 \tilde{w}_1| > 0.9999, the two vectors are considered identical and we may adopt the normalised vector \tilde{w}^+_1 as the first axis of the ICA system.
If the two vectors are different, ie if the absolute value of their dot product is less than 0.9999, we set \tilde{w}_1 = \tilde{w}^+_1 and go to Step 9.
After the first ICA direction has been identified, we proceed to identify the remaining directions. The steps we follow are the same as Steps 7-12, with one extra step inserted: we have to make sure that any new direction we select is orthogonal to the already selected directions. This is achieved by inserting an extra step between Steps 10 and 11, to make sure that we use only the part of vector w^+_e (where e = 2, \ldots, E), which is orthogonal to all previously identified vectors w^+_t, for t = 1, \ldots, e-1. This extra step is as follows.
Step 10.5: When trying to work out vector w^+_e, create a matrix B that contains as columns all w^+_t, t = 1, \ldots, e-1, vectors worked out so far. Then, in Step 11, instead of using vector w^+_e, use vector w^+_e - B B^T w^+_e.
The ICA basis vectors we identified correspond to (but are not the same as) the end member spectra. Each of the w^+_e vectors is of size E x 1. The components of each such vector are measured along the scaled eigenaxes of matrix C. They may, therefore, be used to express vector w^+_e in terms of the original coordinate system, via vectors u_l and \tilde{u}_l. So, if we want to work out the end member spectra, the following step may be added to the algorithm.
Step 15: We denote by v_e the position vector of the tip of vector w^+_e in the original coordinate system:

v_e = w^+_{1e} \tilde{u}_1 + w^+_{2e} \tilde{u}_2 + \cdots + w^+_{Ee} \tilde{u}_E + m
    = w^+_{1e} \frac{u_1}{\sqrt{\lambda_1}} + w^+_{2e} \frac{u_2}{\sqrt{\lambda_2}} + \cdots + w^+_{Ee} \frac{u_E}{\sqrt{\lambda_E}} + m \qquad (7.70)

Here m is the mean vector we removed originally from the cloud of points to move the original coordinate system to the centre of the cloud. All these vectors may be computed simultaneously as follows. First, we create a diagonal matrix \Lambda, with 1/\sqrt{\lambda_i} along the diagonal. Then, vectors v_e are the columns of matrix V, given by V = U \Lambda W + M, where W is the matrix with the vectors w^+_e as its columns and matrix M is made up from vector m repeated E times to form its columns.
There are E vectors v_e, and they are of size L x 1. These are the end member spectra. Once we have identified the end member spectra, we can treat them as the library spectra and use the algorithm on page 692 to work out the mixing proportions for each pixel.
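For illustration, Steps 7-12 for a single ICA direction may be sketched as follows. This is a simplified rendering, assuming the whitened data of Step 6 are stored as an E x MN array Q; the orthogonalisation of Step 10.5 is omitted.

import numpy as np

def ica_direction(Q, tol=0.9999, max_iter=200, rng=np.random.default_rng(0)):
    """One ICA direction from whitened data Q (E x MN), Steps 7-12 above,
    using G'(y) = tanh(y) and G''(y) = 1 - tanh(y)^2."""
    E, MN = Q.shape
    w = rng.uniform(-1, 1, E)            # Step 7: random start
    w /= np.linalg.norm(w)               # Step 8: unit norm
    for _ in range(max_iter):
        y = w @ Q                        # Step 9: projections y_i = w^T q_i
        g, g_prime = np.tanh(y), 1 - np.tanh(y) ** 2
        w_new = w * g_prime.mean() - (Q * g).mean(axis=1)   # Step 10: update (7.69)
        w_new /= np.linalg.norm(w_new)   # Step 11: renormalise
        if abs(w_new @ w) > tol:         # Step 12: converged?
            return w_new
        w = w_new
    return w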
7.2 The physics and psychophysics of colour vision

What is colour?

Colour is the subjective sensation we get when electromagnetic radiation with wavelengths in the range [400nm, 700nm] reaches our eyes¹. This range of wavelengths is known as the optical part of the electromagnetic spectrum.
Therefore, colour really has meaning only if the human visual system is involved. So, one cannot discuss colour independently of the human visual system.
What is the interest in colour from the engineering point of view?

There are two major areas of research in relation to colour images:
viewing an object that emits its own light;
viewing an object that is illuminated by some light source.
In the first case, researchers are interested in the way light sources could be combined to create the desirable effect in the brain of the viewer. This line of research is related to the manufacturing of visual display units, like television sets and computer screens.
In the second case, researchers are interested in the role colour plays in human vision and in ways of emulating this aspect of vision by computers, for better artificial vision systems.
Figure 7.10 shows schematically the stages involved in the process of viewing dark objects and light sources. The only difference really is in the presence of an illuminating source in the former case. The role of the illuminating source in the appearance of the viewed object is part of the research in relation to viewing dark objects.
What influences the colour we perceive for a dark object?

Two factors: the spectrum of the source that illuminates the object and the reflectance function of the surface of the object.
This is expressed as

B(\lambda) = I(\lambda) R(\lambda) \qquad (7.71)

where \lambda is the wavelength of the electromagnetic spectrum, I(\lambda) is the spectrum of the illumination falling on the surface patch, R(\lambda) is the reflectance function of the material the object is made from (or its surface is painted with) and B(\lambda) is the spectrum of the light that reaches the sensor and is associated with the corresponding pixel of the image. (We assume here that each pixel receives the light from a finite patch of the imaged surface, and that there is one-to-one correspondence between pixels and viewed surface patches, so that whether we talk about pixel (x, y) or surface patch (x, y), we talk about the same thing.)

¹The visible range may be slightly broader. The limits stated are only approximately correct, as individuals vary.
[Figure 7.10 (schematic): colour perception of dark objects (illuminating source → scene → sensor → processing → perception) and of luminous objects (source of illumination → sensor → processing → perception); the sensor is the interface between the external world and the internal sensation.]
Figure 7.10: The process of vision is the same whether we see a dark or a bright object. The difference is in the necessity of an illuminating source in order to view a dark object. The effect of the illuminating source on the way the object appears is very significant.
Note that this is a simplistic model of the effect of illumination. There are materials which do not just reflect the light that falls on them, but partly absorb it and then re-emit it in a different range of wavelengths than that of the incident light. The object will be visible as long as B(\lambda) is nonzero for at least some values of \lambda in the range [400nm, 700nm].
In order to avoid having to deal with continuous functions, we often sample the range of wavelengths of the electromagnetic spectrum, and so functions B(\lambda), I(\lambda) and R(\lambda) become continuous-valued discrete functions. For example, if we discretise the range of [400nm, 700nm], considering samples every 10nm, functions I(\lambda), R(\lambda) and B(\lambda) become 31D vectors, the elements of which take continuous values.
The human visual system evolved in the natural world, where the illuminating source was the sun and the materials it was aimed at distinguishing were the natural materials. Its functionality, therefore, is very much adapted to cope with the variations of the daylight and to deal with the recognition of natural materials.
What causes the variations of the daylight?
The quality of the daylight changes with the time of day, the season and the geographical
location.
How can we model the variations of the daylight?
Scientists recorded the spectrum of the daylight, between 300nm and 830nm, in a large
variety of locations, times of day and seasons. This way, they created an ensemble of versions
of the spectrum of the daylight. This ensemble of functions, I(\lambda), when considered only in
the [400nm, 700nm] range and sampled every 10nm, becomes an ensemble of 31D vectors
representing the daylight over the range of visible wavelengths. We may consider a 31D
space, with 31 coordinate axes, so that we measure one component of these vectors along one
of these axes. Then, each 31D vector will be represented by a point and all recorded spectra
will constitute a cloud of points in this space. It turns out that this cloud of points does not
occupy all 31 dimensions, but only two dimensions. In other words, it turns out that with a
careful choice of the coordinate axes, one may fully specify the position of each of these points
by using two coordinates only (the remaining 29 being 0), as opposed to having to know the
values of all 31 coordinates. We say that the data we have lie on a 2D manifold in the 31D
space. The word manifold means super-surface.
Figure 7.11 explains the idea of points in a 3D space, lying on a 2D manifold. As a
consequence, these points may be fully dened by two (instead of three) coordinates, if we
select the coordinate axes appropriately. The situation of going from 31D down to 2D is
analogous.
Figure 7.11: The data points in this 3D space happen to fall on a 2D plane. We may then choose as our coordinate axes vectors p and q, on this plane, and vector s perpendicular to the plane. In terms of the Opqs coordinate system, a point A may be represented by two numbers only, namely the lengths OA_p and OA_q. Its coordinate with respect to the third axis (s) is 0. The new basis vectors p and q will, of course, be functions of the old coordinates (x, y, z), so we may write OA(x, y, z) = A_p p(x, y, z) + A_q q(x, y, z). Here (A_p, A_q) are two numbers fully characterising A(x, y, z), which is the function expressed by point A, as long as we know the basis functions p(x, y, z) and q(x, y, z), which are common for all data points.
One, therefore, may write

I(\lambda) = i_1(\lambda) + a_2 i_2(\lambda) + a_3 i_3(\lambda) = \sum_{k=1}^{3} a_k i_k(\lambda) \qquad (7.72)

where i_1(\lambda) is the average spectrum of daylight, a_1 = 1, (a_2, a_3) are two scalars fully characterising the natural illuminating source, and i_2(\lambda) and i_3(\lambda) are some universal basis functions.
\lambda(nm)   i_1(\lambda)   i_2(\lambda)   i_3(\lambda)   |   \lambda(nm)   i_1(\lambda)   i_2(\lambda)   i_3(\lambda)
300 0.04 0.02 0.04 570 96.0 1.6 0.2
310 6.0 4.5 2.0 580 95.1 3.5 0.5
320 29.6 22.4 4.0 590 89.1 3.5 2.1
330 55.3 42.0 8.5 600 90.5 5.8 3.2
340 57.3 40.6 7.8 610 90.3 7.2 4.1
350 61.8 41.6 6.7 620 88.4 8.6 4.7
360 61.5 38.0 5.3 630 84.0 9.5 5.1
370 68.8 42.4 6.1 640 85.1 10.9 6.7
380 63.4 38.5 3.0 650 81.9 10.7 7.3
390 65.8 35.0 1.2 660 82.6 12.0 8.6
400 94.8 43.4 1.1 670 84.9 14.0 9.8
410 104.8 46.3 0.5 680 81.3 13.6 10.2
420 105.9 43.9 0.7 690 71.9 12.0 8.3
430 96.8 37.1 1.2 700 74.3 13.3 9.6
440 113.9 36.7 2.6 710 76.4 12.9 8.5
450 125.6 35.9 2.9 720 63.3 10.6 7.0
460 125.5 32.6 2.8 730 71.7 11.6 7.6
470 121.3 27.9 2.6 740 77.0 12.2 8.0
480 121.3 24.3 2.6 750 65.2 10.2 6.7
490 113.5 20.1 1.8 760 47.7 7.8 5.2
500 113.1 16.2 1.5 770 68.6 11.2 7.4
510 110.8 13.2 1.3 780 65.0 10.4 6.8
520 106.5 8.6 1.2 790 66.0 10.6 7.0
530 108.8 6.1 1.0 800 61.0 9.7 6.4
540 105.3 4.2 0.5 810 53.3 8.3 5.5
550 104.4 1.9 0.3 820 58.9 9.3 6.1
560 100.0 0.0 0.0 830 61.9 9.8 6.5
Table 7.1: The mean daylight spectrum i_1(\lambda) and the first two principal components of daylight variations, i_2(\lambda) and i_3(\lambda).
Table 7.1 gives functions i_k(\lambda), for k = 1, 2, 3, as functions of \lambda, sampled every 10nm. Figure 7.12 shows the plots of these three functions. These three functions were computed by applying principal component analysis (PCA) to 622 daylight spectra. Each measured spectrum recorded the spectral radiant power of daylight (measured in watts per square metre) per unit wavelength interval (measured in metres). These absolute measurements were then normalised, so that all spectra had value 100 for wavelength \lambda = 560nm. Such spectra are referred to as relative spectral radiant power distributions. It was these normalised spectra that were used in the PCA analysis. Note that by talking about daylight, as opposed to solar light, we imply the total ambient light: not only the light a surface receives directly from the sun, but also whatever light a surface receives reflected from all other surfaces near it. Note also that the solar light is not only in the visible wavelengths in the range [400nm, 700nm], but it extends well beyond this range. In table 7.1 we include also the daylight spectra in the ultraviolet region (300nm to 400nm) and in the near infrared region (700nm to 830nm).
Figure 7.12: The mean daylight spectrum i_1(\lambda) and the first two principal components of daylight variations, i_2(\lambda) and i_3(\lambda), as functions of the wavelength \lambda of the electromagnetic spectrum, measured in nanometres (nm).
Box 7.3. Standard illuminants

In order to facilitate the communication between colour vision scientists, the Commission Internationale de l'Eclairage (CIE) defined certain illumination spectra that are considered as standard. These illumination spectra may be synthesised from the spectra given in table 7.1. Each one of them is fully specified by its so called correlated colour temperature. The correlated colour temperature of an illuminant is defined as the temperature of the black body radiator that produces a colour which is perceived to be most similar to the colour produced by the illuminant, at the same overall brightness and under the same viewing conditions. The standard daylight illuminants are referred to by the letter D and the first two digits of their correlated colour temperature. Table 7.2 lists them alongside their corresponding correlated colour temperature. The most commonly used illuminant is the D65. The spectra of these illuminants may be computed by using the following formulae, which depend only on the correlated colour temperature T_c of each illuminant, measured in degrees Kelvin.
Step 0: Compute:

s \equiv \frac{10^3}{T_c} \qquad (7.73)

Step 1: Compute the following.
For correlated colour temperature in the range [4000, 7000]:

x \equiv -4.6070 s^3 + 2.9678 s^2 + 0.09911 s + 0.244063 \qquad (7.74)

while for correlated colour temperature in the range [7000, 25000]:

x \equiv -2.0064 s^3 + 1.9018 s^2 + 0.24748 s + 0.237040
y \equiv -3.000 x^2 + 2.870 x - 0.275 \qquad (7.75)

Values (x, y) are the so called chromaticity coordinates of the illuminant.
Step 2: Compute:

d_0 \equiv 0.0241 + 0.2562 x - 0.7341 y
d_1 \equiv -1.3515 - 1.7703 x + 5.9114 y
d_2 \equiv 0.0300 - 31.4424 x + 30.0717 y \qquad (7.76)

Step 3: Compute:

a_2 \equiv \frac{d_1}{d_0} \qquad a_3 \equiv \frac{d_2}{d_0} \qquad (7.77)

Step 4: Synthesise the relative spectral radiant power of the illuminant, using:

I_D(\lambda) = i_1(\lambda) + a_2 i_2(\lambda) + a_3 i_3(\lambda) \qquad (7.78)

Table 7.2 lists in its last two columns the values of a_2 and a_3 that should be used in conjunction with table 7.1, to produce the spectra of the three standard illuminants.
Illuminant   T_c (°K)   a_2        a_3
D55          5503       -0.78482   -0.19793
D65          6504       -0.29447   -0.68754
D75          7504        0.14520   -0.75975

Table 7.2: Standard daylight illuminants, their corresponding correlated colour temperature in degrees Kelvin, and the coefficients with which the spectra of table 7.1 must be synthesised to produce their relative spectral radiant power distributions, normalised so that they have value 100 at \lambda = 560nm.
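Steps 0-4 of Box 7.3 are easy to script. The sketch below (Python; the i1, i2, i3 columns of table 7.1 are assumed to be supplied by the user) is only illustrative.

import numpy as np

def daylight_coefficients(Tc):
    """Coefficients a2, a3 of equation (7.78) for correlated colour temperature Tc (K),
    following Steps 0-3 of Box 7.3."""
    s = 1e3 / Tc                                            # Step 0, equation (7.73)
    if 4000 <= Tc <= 7000:                                  # Step 1, equation (7.74)
        x = -4.6070 * s**3 + 2.9678 * s**2 + 0.09911 * s + 0.244063
    else:                                                   # 7000 < Tc <= 25000, equation (7.75)
        x = -2.0064 * s**3 + 1.9018 * s**2 + 0.24748 * s + 0.237040
    y = -3.000 * x**2 + 2.870 * x - 0.275
    d0 = 0.0241 + 0.2562 * x - 0.7341 * y                   # Step 2, equation (7.76)
    d1 = -1.3515 - 1.7703 * x + 5.9114 * y
    d2 = 0.0300 - 31.4424 * x + 30.0717 * y
    return d1 / d0, d2 / d0                                 # Step 3, equation (7.77)

# i1, i2, i3: the columns of table 7.1, sampled every 10nm (supplied by the user)
# a2, a3 = daylight_coefficients(6504)      # D65
# I_D65 = i1 + a2 * i2 + a3 * i3            # Step 4, equation (7.78)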
Example B7.11

Compute and plot the spectral radiant power distribution of the standard illuminants.

These are shown in figure 7.13.

Figure 7.13: The spectral radiant power distribution of illuminants D55 (mid-morning daylight), D65 (noon daylight) and D75 (north sky daylight).
What is the observed variation in the natural materials?

Following the same procedure as for the daylight, scientists recorded the spectra from a large number of natural materials. It turned out that a large number of the reflectance functions R(\lambda) of the materials can be reproduced by the linear superposition of only six basis functions and in many cases of only three basis functions. In other words, the spectra of the recorded natural objects created an almost 3D manifold in the spectral space, which had a small thickness along three other directions. If we neglect this small thickness along the three extra directions, we may write

R(\lambda) = r_0(\lambda) + u_1 r_1(\lambda) + u_2 r_2(\lambda) + u_3 r_3(\lambda) = \sum_{j=0}^{3} u_j r_j(\lambda) \qquad (7.79)

where u_0 = 1, r_0(\lambda) is the mean spectrum, (u_1, u_2, u_3) is the triplet of scalars that characterise almost fully the material of the observed surface, and r_1(\lambda), r_2(\lambda) and r_3(\lambda) are some universal basis functions.
\lambda(nm)   m1     m2     m3     m4     m5     m6     m7     m8     m9     m10    m11    m12
380 0.055 0.121 0.141 0.051 0.158 0.145 0.053 0.132 0.098 0.095 0.060 0.062
390 0.058 0.148 0.184 0.054 0.209 0.185 0.053 0.171 0.116 0.119 0.061 0.062
400 0.061 0.180 0.254 0.055 0.300 0.250 0.053 0.233 0.131 0.148 0.062 0.063
410 0.062 0.197 0.307 0.056 0.380 0.299 0.053 0.290 0.136 0.172 0.063 0.064
420 0.062 0.201 0.325 0.057 0.412 0.323 0.053 0.329 0.134 0.182 0.064 0.064
430 0.062 0.204 0.331 0.059 0.425 0.340 0.054 0.362 0.132 0.176 0.066 0.065
440 0.061 0.208 0.334 0.060 0.429 0.358 0.054 0.387 0.131 0.160 0.070 0.066
450 0.061 0.216 0.333 0.062 0.429 0.383 0.054 0.399 0.129 0.139 0.075 0.066
460 0.061 0.229 0.327 0.062 0.422 0.420 0.055 0.390 0.126 0.118 0.086 0.067
470 0.061 0.250 0.314 0.063 0.405 0.466 0.056 0.359 0.121 0.100 0.104 0.069
480 0.061 0.277 0.300 0.065 0.381 0.509 0.058 0.313 0.116 0.085 0.137 0.072
490 0.062 0.304 0.287 0.068 0.348 0.545 0.060 0.257 0.111 0.074 0.190 0.077
500 0.065 0.325 0.271 0.076 0.313 0.567 0.067 0.207 0.106 0.065 0.269 0.090
510 0.070 0.330 0.251 0.103 0.282 0.575 0.088 0.167 0.101 0.059 0.377 0.129
520 0.076 0.314 0.231 0.146 0.254 0.571 0.123 0.137 0.096 0.056 0.478 0.208
530 0.079 0.289 0.215 0.177 0.229 0.553 0.151 0.117 0.093 0.053 0.533 0.303
540 0.080 0.277 0.200 0.183 0.214 0.524 0.172 0.105 0.093 0.051 0.551 0.378
550 0.084 0.279 0.184 0.170 0.208 0.488 0.200 0.097 0.094 0.051 0.547 0.426
560 0.090 0.280 0.167 0.149 0.202 0.443 0.250 0.090 0.097 0.052 0.529 0.468
570 0.102 0.294 0.156 0.132 0.195 0.398 0.338 0.086 0.109 0.052 0.505 0.521
580 0.119 0.344 0.150 0.121 0.194 0.349 0.445 0.084 0.157 0.051 0.471 0.575
590 0.134 0.424 0.147 0.114 0.202 0.299 0.536 0.084 0.268 0.053 0.427 0.613
600 0.143 0.489 0.144 0.109 0.216 0.253 0.583 0.084 0.401 0.059 0.381 0.631
610 0.147 0.523 0.141 0.104 0.231 0.223 0.590 0.084 0.501 0.073 0.346 0.637
620 0.151 0.542 0.140 0.103 0.241 0.206 0.586 0.083 0.557 0.095 0.327 0.637
630 0.158 0.558 0.140 0.104 0.253 0.198 0.582 0.084 0.579 0.117 0.317 0.638
640 0.168 0.576 0.141 0.107 0.276 0.192 0.579 0.088 0.587 0.139 0.312 0.641
650 0.179 0.594 0.145 0.109 0.310 0.190 0.579 0.095 0.589 0.162 0.309 0.644
660 0.188 0.611 0.150 0.111 0.345 0.192 0.584 0.105 0.592 0.189 0.314 0.646
670 0.190 0.623 0.151 0.111 0.365 0.200 0.595 0.117 0.595 0.221 0.327 0.648
680 0.188 0.634 0.147 0.111 0.367 0.212 0.611 0.133 0.601 0.256 0.345 0.653
690 0.185 0.651 0.141 0.112 0.363 0.223 0.628 0.155 0.607 0.295 0.362 0.661
700 0.186 0.672 0.134 0.117 0.362 0.231 0.644 0.186 0.614 0.336 0.376 0.671
710 0.192 0.693 0.131 0.123 0.368 0.233 0.653 0.218 0.617 0.370 0.380 0.679
720 0.200 0.710 0.133 0.129 0.377 0.229 0.654 0.255 0.617 0.404 0.378 0.684
730 0.214 0.728 0.144 0.135 0.394 0.229 0.659 0.296 0.618 0.445 0.380 0.689
Table 7.3: The first 12 spectra of the Macbeth colour checker.
Plate Ia shows a chart of 24 basic spectra, which have been selected to represent the
variety of spectra observed in natural materials. This chart, called the Macbeth colour
chart, is used by scientists to calibrate their imaging systems. The spectra of these 24 square
patches are given in tables 7.3 and 7.4. We may perform PCA on these 24 spectra, as follows.
\lambda(nm)   m13    m14    m15    m16    m17    m18    m19    m20    m21    m22    m23    m24
380 0.064 0.051 0.049 0.056 0.155 0.114 0.199 0.182 0.152 0.109 0.068 0.031
390 0.074 0.053 0.048 0.053 0.202 0.145 0.259 0.240 0.197 0.133 0.076 0.032
400 0.094 0.054 0.047 0.052 0.284 0.192 0.421 0.367 0.272 0.163 0.083 0.032
410 0.138 0.055 0.047 0.052 0.346 0.235 0.660 0.506 0.330 0.181 0.086 0.032
420 0.192 0.057 0.047 0.052 0.362 0.259 0.811 0.566 0.349 0.187 0.088 0.033
430 0.239 0.059 0.047 0.054 0.355 0.284 0.863 0.581 0.356 0.191 0.09 0.033
440 0.280 0.062 0.047 0.056 0.334 0.316 0.877 0.586 0.360 0.194 0.091 0.033
450 0.311 0.066 0.046 0.059 0.305 0.352 0.884 0.587 0.361 0.195 0.091 0.032
460 0.312 0.074 0.045 0.066 0.275 0.390 0.890 0.588 0.361 0.194 0.090 0.032
470 0.281 0.092 0.045 0.081 0.246 0.426 0.894 0.587 0.359 0.193 0.090 0.032
480 0.231 0.123 0.044 0.108 0.217 0.446 0.897 0.585 0.357 0.192 0.089 0.032
490 0.175 0.176 0.044 0.154 0.189 0.444 0.901 0.585 0.356 0.192 0.089 0.032
500 0.126 0.244 0.044 0.228 0.167 0.423 0.905 0.586 0.356 0.192 0.089 0.032
510 0.090 0.306 0.045 0.339 0.148 0.384 0.906 0.586 0.357 0.192 0.089 0.032
520 0.066 0.338 0.045 0.464 0.126 0.335 0.908 0.587 0.358 0.192 0.089 0.032
530 0.052 0.334 0.046 0.557 0.107 0.280 0.907 0.586 0.358 0.192 0.089 0.032
540 0.045 0.316 0.047 0.613 0.099 0.229 0.907 0.586 0.358 0.192 0.089 0.032
550 0.041 0.293 0.049 0.647 0.101 0.183 0.910 0.586 0.358 0.192 0.089 0.032
560 0.039 0.261 0.052 0.670 0.103 0.144 0.910 0.585 0.358 0.192 0.089 0.032
570 0.038 0.228 0.058 0.692 0.109 0.117 0.912 0.587 0.359 0.192 0.089 0.032
580 0.038 0.196 0.071 0.709 0.136 0.100 0.912 0.587 0.359 0.192 0.089 0.031
590 0.038 0.164 0.103 0.722 0.199 0.089 0.912 0.586 0.359 0.191 0.088 0.031
600 0.038 0.134 0.177 0.731 0.290 0.081 0.910 0.584 0.357 0.190 0.088 0.031
610 0.039 0.114 0.313 0.739 0.400 0.076 0.912 0.582 0.356 0.189 0.087 0.032
620 0.040 0.102 0.471 0.747 0.514 0.074 0.915 0.580 0.354 0.188 0.087 0.032
630 0.040 0.096 0.586 0.753 0.611 0.073 0.916 0.577 0.351 0.186 0.086 0.032
640 0.040 0.092 0.651 0.759 0.682 0.073 0.918 0.575 0.348 0.184 0.085 0.032
650 0.041 0.090 0.682 0.764 0.726 0.073 0.920 0.573 0.346 0.182 0.084 0.032
660 0.042 0.090 0.698 0.769 0.754 0.075 0.921 0.572 0.344 0.181 0.084 0.032
670 0.043 0.093 0.707 0.772 0.769 0.076 0.920 0.570 0.341 0.179 0.083 0.032
680 0.043 0.099 0.715 0.777 0.779 0.076 0.921 0.569 0.339 0.178 0.083 0.032
690 0.043 0.104 0.725 0.784 0.789 0.075 0.923 0.568 0.337 0.177 0.082 0.032
700 0.044 0.109 0.734 0.792 0.800 0.072 0.926 0.567 0.334 0.175 0.081 0.032
710 0.047 0.111 0.740 0.798 0.807 0.072 0.928 0.567 0.333 0.174 0.081 0.032
720 0.050 0.110 0.744 0.801 0.813 0.073 0.929 0.565 0.331 0.173 0.081 0.032
730 0.055 0.110 0.748 0.805 0.821 0.079 0.932 0.565 0.330 0.172 0.080 0.032
Table 7.4: The last 12 spectra of the Macbeth colour checker.
Let us call one such spectrum y_i(j), where i identifies the spectrum and j represents the wavelength. So, i takes values from 1 to 24 and j takes values from 1 to 36, as the spectra are given for 36 sample values of \lambda. To perform PCA with these 24 spectra, we apply the following algorithm.
Step 1: Average the spectra to derive the average spectrum m(j).
Step 2: Remove the average spectrum from each spectrum to produce z_i(j) \equiv y_i(j) - m(j).
Step 3: Write these spectra one next to the other, like columns, to form a 36 x 24 matrix Z.
Step 4: Compute Z Z^T and divide it by 24 to produce the covariance matrix C of the data.
Step 5: Compute the eigenvalues of C and arrange them in decreasing order.
Step 6: Compute the eigenvectors of the first E eigenvalues.
Step 7: Write the eigenvectors one under the other as rows to form matrix A of size E x 36.
Step 8: Multiply matrix A with matrix Z. The result will be matrix B, of size E x 24.
Step 9: Consider the ith column of B: these are the coefficients with which the corresponding eigenvectors have to be multiplied to create an approximation of the ith spectrum. So, to obtain all the reconstructed spectra, multiply matrix A^T with matrix B. The result will be a 36 x 24 matrix S.
Step 10: Add to each column of matrix S the mean spectrum you constructed in Step 1. This will give you matrix T. Each column of this matrix will be an approximated spectrum.
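Steps 1-10 may be condensed into a few NumPy lines. The sketch below assumes the 24 spectra of tables 7.3 and 7.4 have been placed as the columns of a 36 x 24 array Y; it is an illustration, not the code used to produce the results quoted here.

import numpy as np

def approximate_spectra(Y, E=3):
    """Steps 1-10 above. Y: 36 x 24 array whose columns are the Macbeth spectra.
    Returns the rank-E approximation T and the basis A."""
    m = Y.mean(axis=1, keepdims=True)          # Step 1: mean spectrum
    Z = Y - m                                  # Step 2: remove the mean
    C = Z @ Z.T / Y.shape[1]                   # Step 4: covariance matrix (36 x 36)
    eigval, eigvec = np.linalg.eigh(C)         # Step 5: eigenvalues (ascending order)
    order = np.argsort(eigval)[::-1][:E]       # keep the E largest
    A = eigvec[:, order].T                     # Step 7: E x 36 matrix of eigenvectors as rows
    B = A @ Z                                  # Step 8: coefficients, E x 24
    S = A.T @ B                                # Step 9: reconstructed mean-free spectra
    T = S + m                                  # Step 10: add back the mean spectrum
    return T, A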
Figure 7.14 is the plot of the first 10 eigenvalues obtained at Step 5. We can see that the values become negligible after the 6th eigenvalue. In fact, the first 12 eigenvalues are:
1.25518  0.43560  0.14310  0.01703  0.00818  0.00403
0.00296  0.00066  0.00052  0.00021  0.00010  0.00008
Table 7.5 lists the eigenvectors constructed at Step 6 that correspond to the first seven eigenvalues. The first six of them and the mean spectrum are the basis spectra appropriate for most natural materials. The first three of them are functions r_1(\lambda), r_2(\lambda) and r_3(\lambda) of equation (7.79). They are plotted in figure 7.15.

Figure 7.14: The most significant eigenvalues of the reflectance functions given in tables 7.3 and 7.4. It is obvious that three basis functions must be enough to express all reflectance functions adequately.
λ (nm)   r_1(λ)      r_2(λ)      r_3(λ)      r_4(λ)      r_5(λ)      r_6(λ)      r_7(λ)
380 0.020506 0.045631 0.062610 0.006043 0.048060 0.050640 0.266012
390 0.027144 0.066618 0.093315 0.009897 0.074313 0.077337 0.377349
400 0.047397 0.112613 0.146710 0.031212 0.125815 0.162142 0.461304
410 0.075593 0.168199 0.196682 0.077020 0.169757 0.223400 0.326176
420 0.090415 0.201078 0.219932 0.113649 0.184428 0.200833 0.105131
430 0.092957 0.216809 0.226791 0.135498 0.168369 0.141992 0.052165
440 0.091228 0.227789 0.224879 0.147404 0.122310 0.070923 0.170243
450 0.089344 0.237964 0.216419 0.145758 0.053601 0.011351 0.260007
460 0.089652 0.246586 0.198617 0.114778 0.030063 0.090446 0.272769
470 0.092113 0.251957 0.169411 0.048899 0.121394 0.162200 0.204037
480 0.096595 0.253135 0.128053 0.035329 0.197271 0.208657 0.080951
490 0.102764 0.250006 0.072187 0.133246 0.245160 0.210257 0.056241
500 0.110416 0.241373 0.002939 0.228315 0.244492 0.172907 0.155331
510 0.119776 0.224381 0.082501 0.295450 0.177094 0.095415 0.173485
520 0.130614 0.201407 0.170474 0.298266 0.067715 0.000175 0.106118
530 0.140259 0.177817 0.232556 0.243246 0.037258 0.096363 0.005653
540 0.148489 0.157773 0.263867 0.172285 0.114689 0.157946 0.065931
550 0.156040 0.139805 0.274118 0.098272 0.167139 0.174888 0.101521
560 0.162992 0.119845 0.275821 0.000402 0.199050 0.150914 0.128401
570 0.171146 0.095495 0.275670 0.123683 0.199459 0.061773 0.126913
580 0.180477 0.063528 0.261364 0.250903 0.144595 0.082575 0.046310
590 0.191169 0.024333 0.221068 0.343416 0.027525 0.212329 0.113841
600 0.200775 0.016684 0.160472 0.365896 0.103183 0.226100 0.221752
610 0.209076 0.055401 0.091462 0.294318 0.198683 0.111366 0.175590
620 0.215824 0.090016 0.027793 0.174284 0.248265 0.043674 0.048815
630 0.220383 0.114568 0.019745 0.072717 0.252560 0.161250 0.045361
640 0.223360 0.129491 0.052857 0.009023 0.226191 0.226345 0.084185
650 0.224806 0.137030 0.075422 0.024809 0.183108 0.243736 0.082530
660 0.225317 0.141354 0.090500 0.048227 0.131459 0.226676 0.059165
670 0.225224 0.144604 0.097500 0.070445 0.074263 0.174581 0.036580
680 0.225582 0.148451 0.098673 0.092785 0.017715 0.099392 0.019465
690 0.226353 0.153486 0.100108 0.114874 0.045528 0.004157 0.007038
700 0.226948 0.158965 0.103640 0.131776 0.119655 0.101870 0.000460
710 0.226636 0.162309 0.110174 0.137542 0.184263 0.193003 0.004018
720 0.225187 0.164183 0.120560 0.138495 0.256005 0.271950 0.000404
730 0.223405 0.164828 0.132616 0.140173 0.337889 0.360059 0.006705
Table 7.5: The basis reflectance functions that correspond to the first 7 eigenvalues of the 24 spectra of tables 7.3 and 7.4.
To calculate the error of the approximation of each spectrum, we compute the difference matrix P − T, where P is the matrix whose columns are the original spectra. Then we compute the sum of the squares of each column of this matrix. These are the square errors with which each spectrum is calculated. Tables 7.6 and 7.7 list the errors with which each of the 24 reflectance functions may be approximated, for each number of the retained eigenvalues.
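In terms of the arrays of the previous sketch, these square errors take one line to compute (P and T are assumed to hold the original and the approximated spectra, one per column):

square_errors = ((P - T) ** 2).sum(axis=0)   # one square error per spectrum (per column)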
Figure 7.15: The three most significant eigenvectors of the reflectance functions given in tables 7.3 and 7.4. The reflectance functions of most natural materials may be expressed, with a high degree of accuracy, as linear combinations of these three basis functions plus the mean reflectance function.
Finally, tables 7.8-7.11 give the coefficients with which the first three eigenfunctions r_j(λ) have to be multiplied to reproduce the reflectance functions of the Macbeth colour checker. The total square errors of these reproductions are given by the rows labelled with E = 3 in tables 7.6 and 7.7.
What happens to the light once it reaches the sensors?
Each sensor responds differently to different wavelengths. This is expressed by the sensor sensitivity function S(λ). The sensor will record the following value for patch (x, y):

Q = ∫_0^{+∞} S(λ) I(λ) R(λ) dλ                (7.80)
If we substitute I(λ) and R(λ) from equations (7.72) and (7.79), respectively, we obtain:

Q = ∫_0^{+∞} S(λ) Σ_{k=1}^{3} a_k i_k(λ) Σ_{j=0}^{3} u_j r_j(λ) dλ                (7.81)
E m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12
1 0.099 0.212 0.329 0.082 0.380 1.206 0.996 0.416 0.856 0.261 0.666 0.910
2 0.012 0.058 0.041 0.074 0.209 0.119 0.100 0.220 0.137 0.166 0.665 0.239
3 0.004 0.041 0.016 0.005 0.011 0.058 0.075 0.042 0.040 0.086 0.044 0.047
4 0.002 0.033 0.010 0.004 0.010 0.007 0.012 0.036 0.037 0.077 0.012 0.007
5 0.001 0.027 0.010 0.004 0.004 0.006 0.018 0.012 0.010 0.010 0.001 0.004
6 0.001 0.007 0.003 0.002 0.004 0.003 0.003 0.004 0.010 0.002 0.001 0.004
7 0.001 0.001 0.000 0.001 0.003 0.002 0.002 0.003 0.007 0.001 0.001 0.003
Table 7.6: The square errors with which the first 12 spectra of the Macbeth colour checker may be approximated, when the number of retained eigenvalues is E.
E m13 m14 m15 m16 m17 m18 m19 m20 m21 m22 m23 m24
1 0.194 0.330 1.881 1.084 1.250 0.739 1.690 0.757 0.239 0.044 0.024 0.055
2 0.094 0.288 0.306 0.572 0.746 0.064 0.030 0.018 0.019 0.018 0.022 0.029
3 0.051 0.036 0.098 0.006 0.029 0.052 0.011 0.013 0.019 0.012 0.008 0.009
4 0.026 0.004 0.021 0.005 0.016 0.038 0.010 0.005 0.010 0.006 0.003 0.005
5 0.024 0.003 0.015 0.003 0.012 0.008 0.010 0.005 0.010 0.005 0.002 0.003
6 0.024 0.003 0.013 0.002 0.002 0.001 0.010 0.002 0.005 0.002 0.000 0.002
7 0.001 0.003 0.001 0.001 0.001 0.001 0.002 0.000 0.001 0.000 0.000 0.002
Table 7.7: The square errors with which the last 12 spectra of the Macbeth colour checker may be approximated, when the number of retained eigenvalues is E.
u_j    m1          m2          m3          m4          m5          m6
u_1    0.958251    0.890052    0.700982    1.091990    0.021951    0.011939
u_2    0.294399    0.393275    0.536469    0.092743    0.413129    1.042620
u_3    0.088996    0.130014    0.158925    0.263010    0.445110    0.247418
Table 7.8: The coefficients with which the first three r_j(λ) functions have to be multiplied, in order to reproduce the reflectance functions of samples m1-m6 of the Macbeth colour checker, with square error given in table 7.6, for E = 3.
u_j    m7          m8          m9          m10         m11         m12
u_1    0.624948    0.797700    0.391938    0.797558    0.170537    0.986600
u_2    0.946811    0.441647    0.848104    0.307806    0.029733    0.819337
u_3    0.157086    0.421878    0.311466    0.283262    0.788307    0.437923
Table 7.9: The coefficients with which the first three r_j(λ) functions have to be multiplied, in order to reproduce the reflectance functions of samples m7-m12 of the Macbeth colour checker, with square error given in table 7.6, for E = 3.
u_j    m13         m14         m15         m16         m17         m18
u_1    1.287590    0.914657    0.363834    1.641580    0.879216    0.852870
u_2    0.316123    0.203325    1.255130    0.715840    0.710057    0.821927
u_3    0.208002    0.502611    0.456081    0.752165    0.846522    0.110028
Table 7.10: The coefficients with which the first three r_j(λ) functions have to be multiplied, in order to reproduce the reflectance functions of samples m13-m18 of the Macbeth colour checker, with square error given in table 7.7, for E = 3.
u_j    m19         m20         m21         m22         m23         m24
u_1    3.255370    1.452050    0.208123    0.688102    1.233390    1.531160
u_2    1.288540    0.859597    0.468760    0.161850    0.040420    0.159808
u_3    0.137171    0.073706    0.008776    0.075409    0.117581    0.142883
Table 7.11: The coefficients with which the first three r_j(λ) functions have to be multiplied, in order to reproduce the reflectance functions of samples m19-m24 of the Macbeth colour checker, with square error given in table 7.7, for E = 3.
We may then exchange the order of summation and integration and also apply the integral only to the factors that depend on λ:

Q = Σ_{k=1}^{3} Σ_{j=0}^{3} a_k u_j ∫_0^{+∞} S(λ) i_k(λ) r_j(λ) dλ = Σ_{k=1}^{3} Σ_{j=0}^{3} S_kj a_k u_j                (7.82)

where S_kj denotes the integral ∫_0^{+∞} S(λ) i_k(λ) r_j(λ) dλ.
We may assume familiarity with the illumination, ie innate knowledge of a_k, as well as innate knowledge of functions S_kj. This implies that we have only three unknowns in (7.82), namely parameters u_1, u_2 and u_3, which characterise the viewed surface. Then, if we have three sensors with different sensitivity functions S(λ), we have three equations for the three unknowns, and we must be able to solve for them to help recognise the surface. This innate knowledge of functions a_k and S_kj, which helps us recognise colours during dawn or dusk, and in summer or in winter equally well, is one of the properties of the human visual system and it is called colour constancy.
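As a rough illustration of this point, and under the assumptions that the coefficients a_k and the integrals S_kj of (7.82) are known, and that the expansion of (7.79) uses r_0(λ) for the mean reflectance with u_0 = 1, the three surface parameters may be recovered from the three sensor recordings by a 3 × 3 linear solve. All names below are ours, not the book's.

import numpy as np

def recover_surface(S, a, Q):
    # S[s, k, j]: integral S_kj of (7.82) for sensor s; a[k]: illumination coefficients;
    # Q[s]: the three sensor recordings. Indices k = 1..3, j = 0..3 map to 0-based array axes.
    M = np.einsum('k,skj->sj', a, S)          # coefficient of u_j in the recording of sensor s
    rhs = np.asarray(Q, float) - M[:, 0]      # move the assumed u_0 = 1 term to the right-hand side
    return np.linalg.solve(M[:, 1:], rhs)     # solve for u_1, u_2, u_3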
Is it possible for different materials to produce the same recording by a sensor?
Yes, because a sensor, as shown by equation (7.80), integrates the light it receives. Different combinations of photon energies may result in the same recorded values. Under the same illumination, materials characterised by different reflectance functions may produce identical recordings by a sensor. The different reflectance functions, that produce identical sensor recordings, are known as metamers of the sensor.
Example 7.12
Using the approximated spectra of the colour chart of Plate I, construct images which show how these spectra will appear under average daylight, viewed by a camera that has three types of sensor, with spectral sensitivity curves given by

S(λ) = e^{−(λ−m)^2 / (2s^2)}                (7.83)

with m = 650 and s = 25 for the R sensor type, m = 550 and s = 25 for the G sensor type, and m = 450 and s = 25 for the B sensor type.
First, we discretise the spectral sensitivity curves of the three types of sensor of the camera, as follows. For each set of values m and s, we allow λ to take values from 380 to 730 in steps of 10. We call the 36 × 1 vectors we create this way S_R, S_G and S_B. We show them plotted in figure 7.16.
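A possible discretisation of equation (7.83), under the sampling stated above (380 nm to 730 nm in steps of 10 nm, s = 25), is:

import numpy as np

lam = np.arange(380, 731, 10)                      # 36 sample wavelengths, in nm

def sensitivity(m, s=25.0):
    return np.exp(-(lam - m) ** 2 / (2 * s ** 2))  # equation (7.83), sampled at lam

S_R, S_G, S_B = sensitivity(650), sensitivity(550), sensitivity(450)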
[Figure 7.16: the discretised sensitivity curves S_R(λ), S_G(λ) and S_B(λ), plotted against wavelength.]
[Figure 7.17 plots: the spectra P_j(λ) of the three primary lights, for j = 1, 2, 3, the spectrum C(λ) of the ideal white and the tristimulus curves T_j(λ).]
Figure 7.17: Graphs that fully define a colour system. The graph at the bottom left is the spectrum of the ideal white.
Example 7.13
Use the information supplied by the graphs in figure 7.17 to decide what the tristimulus values are, for a spectral distribution with intensity 1 at wavelength λ_0 and 0 everywhere else.

In figure 7.18 we show on the left the spectral distribution we wish to create and on the right we reproduce the graph at the bottom right of figure 7.17. In this graph we draw a vertical line at wavelength λ_0 and we read from the vertical axis the coordinates of the points where it cuts the three curves: T_1(λ_0), T_2(λ_0) and T_3(λ_0). In theory, these are the intensities with which the three primary lights should be blended in order to form a spectrum that a human may find identical with the spectrum that has 0 energy at all wavelengths, except at wavelength λ = λ_0, at which it has energy 1. However, since T_3(λ_0) is negative, such a colour matching cannot happen in practice. It is obvious that, when the tristimulus graph was being created, the third light had to be projected with intensity |T_3(λ_0)| on the same spot as the monochromatic spectrum C(λ), in order to create the same colour sensation as the one created by the first and second lights projected together, with intensities T_1(λ_0) and T_2(λ_0), respectively.
Figure 7.18: To create the same sensation as that of a monochromatic stimulus at wavelength λ_0, we have to project together the three primary lights with intensities T_1(λ_0), T_2(λ_0) and T_3(λ_0), read from the plot on the right.
Do all people require the same intensities of the primary lights to match the same monochromatic reference stimulus?
No. Usually, colour matching experiments are repeated several times with different people with normal colour vision. Then average values are reported.
Who are the people with normal colour vision?
They are people who have in their retinas all three types of cone sensor. They are called trichromats. People who lack one or two types of cone are called dichromats or monochromats, respectively. If we also exclude trichromats who deviate significantly in colour perception from the majority of trichromats, we are left with about 95% of the population that may be considered as having normal colour vision and, thus, as being representative enough to be used in colour matching experiments.
What are the most commonly used colour systems?
They are the CIE RGB, the XY Z and the recently proposed sRGB, which has become the
standard for digital image processing.
What is the CIE RGB colour system?
The primary lights of the CIE RGB colour system are monochromatic with wavelengths 700.0nm, 546.1nm and 435.8nm. However, in order to define them fully, we have to define also the unit intensity for each one, so that when we say the viewer needed so much intensity from each to match a particular spectrum, we actually mean so many units from each. To emulate viewing the ideal white, all wavelengths have to be used simultaneously. As we want the ideal white to have constant energy at all wavelengths, we normalise the tristimulus curves so that the values of each one of them sum up to 1. For this to be so, the intensities (formally called radiant power) of the three primary lights of the CIE RGB colour system have to be in ratios 72.0962 : 1.3791 : 1.
Figure 7.19 shows the colour matching functions of the CIE RGB system, and table 7.12 lists their values. Note the presence of negative tristimulus values, which imply that the particular light has to be added to the reference monochromatic stimulus to match the blending of the other two lights. These tristimulus values have been worked out by averaging the colour matching functions of many human subjects with normal colour vision. So, they constitute the colour matching functions of the so called standard observer.
Figure 7.19: The colour matching functions (tristimulus values) of the CIE RGB colour
system.
What is the XY Z colour system?
The XY Z colour system is a transformed version of the CIE RGB colour system, so the
tristimulus values in this system are non-negative numbers. To understand how it is defined, we have to consider first the representation of colours in 3D and in 2D spaces.
How do we represent colours in 3D?
As every colour can be represented by its tristimulus values, we may represent the colours in
a 3D space. Along each axis, we measure the intensity with which the corresponding primary
light has to be projected to create the colour in question. Colours further away from the
origin of the axes (in the positive octant) appear brighter.
How do we represent colours in 2D?
Since brightness increases along the main diagonal of the colour space, we may then separate
the brightness component of the light from the pure colour component, ie represent a colour
with intensity-invariant tristimulus values. This means that two values only must be enough
to represent a colour. The representation of a colour by two normalised tristimulus values
can be plotted using two axes. Such a plot is the so called chromaticity diagram.
λ (nm)   T_R(λ)    T_G(λ)    T_B(λ)        λ (nm)   T_R(λ)    T_G(λ)    T_B(λ)
380 0.00003 0.00001 0.00117 580 0.24526 0.13610 0.00108
390 0.00010 0.00004 0.00359 590 0.30928 0.09754 0.00079
400 0.00030 0.00014 0.01214 600 0.34429 0.06246 0.00049
410 0.00084 0.00041 0.03707 610 0.33971 0.03557 0.00030
420 0.00211 0.00110 0.11541 620 0.29708 0.01828 0.00015
430 0.00218 0.00119 0.24769 630 0.22677 0.00833 0.00008
440 0.00261 0.00149 0.31228 640 0.15968 0.00334 0.00003
450 0.01213 0.00678 0.31670 650 0.10167 0.00116 0.00001
460 0.02608 0.01485 0.29821 660 0.05932 0.00037 0.00000
470 0.03933 0.02538 0.22991 670 0.03149 0.00011 0.00000
480 0.04939 0.03914 0.14494 680 0.01687 0.00003 0.00000
490 0.05814 0.05689 0.08257 690 0.00819 0.00000 0.00000
500 0.07173 0.08536 0.04776 700 0.00410 0.00000 0.00000
510 0.08901 0.12860 0.02698 710 0.00210 0.00000 0.00000
520 0.09264 0.17468 0.01221 720 0.00105 0.00000 0.00000
530 0.07101 0.20317 0.00549 730 0.00052 0.00000 0.00000
540 0.03152 0.21466 0.00146 740 0.00025 0.00000 0.00000
550 0.02279 0.21178 0.00058 750 0.00012 0.00000 0.00000
560 0.09060 0.21178 0.00058 760 0.00006 0.00000 0.00000
570 0.16768 0.17087 0.00135 770 0.00003 0.00000 0.00000
Table 7.12: The colour matching functions (tristimulus values) of the CIE RGB colour
system.
What is the chromaticity diagram?
It is a 2D space where we represent colours.
Figure 7.20 shows the colour space that corresponds to some colour system. Tristimulus values (T_1, T_2, T_3), that characterise a particular colour in this colour system, represent a single point in this 3D space. Let us define the normalised tristimulus values as:

t_j ≡ T_j / (T_1 + T_2 + T_3)    for j = 1, 2, 3                (7.84)

The normalised tristimulus values (t_1, t_2, t_3), obviously, satisfy the equation:

t_1 + t_2 + t_3 = 1                (7.85)

This equation represents a plane in the 3D space that cuts the three axes at point 1 along each axis, as shown in figure 7.20. So, the normalisation we perform in (7.84) effectively collapses all points in the 3D space onto that plane. Figure 7.21 shows the part of the plane that is in the octant of the 3D space defined by the three positive semi-axes. It is an equilateral triangle, called the Maxwell colour triangle. Any point on this plane can be defined by two numbers. All we have to do is to choose a coordinate system on it. Figure 7.21 shows one such possible coordinate system.
Figure 7.20: A colour space. Increasing the overall intensity (T_1 + T_2 + T_3) simply slides the points along lines emanating from the origin of the axes. So, each such line represents a single colour experienced with a variety of intensities. The coordinates of the intersection of the ray with the oblique plane shown here are used to represent the colour.
Because of the oblique position of this plane with respect to the original axes, it is not trivial to define the 2D coordinates of a point on it, in terms of the coordinates of the point in the original 3D space (see example 7.14). On the other hand, we observe that any point A on the Maxwell triangle, in figure 7.20, corresponds to a unique point A′ in the right-angle triangle formed by axes OP_1 and OP_2. So, it is not necessary to use the points on the Maxwell triangle to represent colours. We might as well use the points of the right-angle triangle at the bottom. This plane has a natural coordinate system, namely the two axes OP_1 and OP_2. Any point with coordinates (T_1, T_2, T_3) in the 3D space corresponds uniquely to a point with coordinates (t_1, t_2, t_3), which in turn corresponds uniquely to a point in the bottom triangle with coordinates (t_1, t_2). It is this bottom right-angle triangle with the two axes OP_1 and OP_2, along which we measure t_1 and t_2, respectively, that is called chromaticity diagram.
Figure 7.21: All points that represent colours in the 3D colour space of figure 7.20 may be projected radially on this triangle, that passes through the unit points of the three axes. One may define a 2D coordinate system on this plane, to define uniquely all colours (points). This is the Maxwell colour triangle.
Box 7.4. Some useful theorems from 3D geometry
In order to reason in a 3D colour space, we may find useful some theorems from 3D geometry.
Theorem 1: Three points in space define a plane (see figure 7.22a).
Theorem 2: Two intersecting lines in space define a plane (see figure 7.22b).
Theorem 3: If a point P belongs to plane Π and a line l_1 lies on Π, then a line l_2 passing through P and parallel to l_1 also lies in plane Π (see figure 7.22c).
Theorem 4: If a line l is orthogonal to two non-parallel lines that lie in a plane Π, then l is perpendicular to plane Π (see figure 7.23a).
Theorem 5: If a line l is perpendicular to a plane Π, it is orthogonal to all lines that lie in the plane (see figure 7.23b).
Theorem 6: Consider a line l that belongs to a plane Π and a point P that does not belong to plane Π. Consider the line that passes through P and is perpendicular to plane Π, and call P′ the point where it meets the plane. Then, if the line from P′ to a point A of l is perpendicular to l, the line from P to A is also perpendicular to line l. This is known as the theorem of the three perpendiculars (see figure 7.23c).
Figure 7.22: (a) Three points A, B and C define a plane Π. (b) Two intersecting lines l_1 and l_2 define a plane Π. (c) If a point P and a line l_1 lie in a plane Π, line l_2, that passes through P and is parallel to l_1, also lies in plane Π.
Figure 7.23: (a) If a line l is orthogonal to two non-parallel lines that lie on a plane Π, then line l is perpendicular to plane Π. (b) If a line l is perpendicular to plane Π, then it is orthogonal to all lines that lie in the plane. (c) P

... DC = √(MC^2 − DM^2) = √(3/2).
From the right-angle isosceles triangle AOM, which has OA = OM = 1, we deduce that its height DO = √2/2.
By definition, CO = 1.
Since TH = PF, TH = b, the normalised coordinate of point P along the B axis.
From (7.87), we can then work out:

DT = (TH/CO) DC = √(3/2) b        PE = √(3/2) b                (7.88)

Also from (7.87), we deduce that:

DH = (TH/CO) DO = (√2/2) b                (7.89)

Then:

OH = OD − HD = √2/2 − (√2/2) b = (√2/2)(1 − b)                (7.90)

By construction, we know that point F in plane OAM has coordinates (r, g). Then, OF = √(r^2 + g^2). From the right-angle triangle OHF then, we work out the other coordinate we need for point P:

HF = √(OF^2 − OH^2) = √(r^2 + g^2 − (1/2)(1 − b)^2)                (7.91)

Remember, however, that r + g + b = 1. So, we may write 1 − b = r + g. So:

HF = √((2r^2 + 2g^2 − (r + g)^2)/2)
   = √((2r^2 + 2g^2 − r^2 − g^2 − 2rg)/2)
   = √((r^2 + g^2 − 2rg)/2)
   = |r − g|/√2                (7.92)

In summary, the coordinates of point P on the Maxwell colour triangle, in terms of its (r, g, b) coordinates, are

( (g − r)/√2 , √(3/2) b )                (7.93)

where we have allowed positive and negative values along axis DM (axis x in figure 7.21).
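Equation (7.93) is easy to apply directly. The small function below (our own naming) maps any triplet of tristimulus values to its Maxwell-triangle coordinates, via the normalisation of (7.84):

import numpy as np

def maxwell_coordinates(T):
    # T: tristimulus values (T_1, T_2, T_3); returns the (x, y) of equation (7.93).
    r, g, b = np.asarray(T, dtype=float) / np.sum(T)   # normalisation of eq. (7.84)
    return (g - r) / np.sqrt(2.0), np.sqrt(1.5) * b

# The equal-energy white (1, 1, 1) lands at x = 0, one third of the way up the triangle.
print(maxwell_coordinates([1.0, 1.0, 1.0]))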
What is the chromaticity diagram of the CIE RGB colour system?
We may first identify in the chromaticity diagram the points that represent monochromatic colours (ie colours of a single wavelength). To do that, we consider all triplets of tristimulus values (T_R, T_G, T_B) of table 7.12, work out from them the corresponding (t_R, t_G) values, and plot them in the chromaticity diagram. The result is shown in figure 7.25.
We can make the following observations from this figure. The line that is parametrised by the wavelength forms an arc in this chromaticity space. This arc is called spectrum locus. The straight line that connects the points that correspond to the two extreme wavelengths (400nm and 700nm) is called the purple line. At the resolution of figure 7.25, the purple line appears to coincide with the t_R axis, but this is not the case. The arc, made up from the tristimulus values of the monochromatic stimuli, is outside the right-angle triangle defined by the acceptable range (ie 0 ≤ t_R, t_G, t_B ≤ 1) of tristimulus values. This is due to the negative tristimulus values present in table 7.12 and it implies that the selected primary lights are not good enough to allow us to create any colour sensation by simply blending them. That is the reason the XY Z colour system was introduced: it was in response to the desire to have only positive tristimulus values, ie to have all points of the spectrum locus inside the shaded right-angle triangle shown in figure 7.25. In addition, we wish one of the axes of this new colour system to reflect the perceived brightness of the colour.
Figure 7.25: The chromaticity diagram that corresponds to the CIE RGB colour system. Only colours with (t_R, t_G) values inside the shaded triangle are physically realisable. The curve plotted is parameterised by the wavelength λ. For each value of λ we consider the tristimulus values triplet (T_R, T_G, T_B), from which we deduce the coordinates (t_R, t_G) of the point that is plotted here to form the curve.
How does the human brain perceive colour brightness?
We mentioned earlier that perceived brightness increases as we move away from the origin of the CIE RGB axes in the 3D colour space, and along the first octant of the space (the positive branches of the colour axes). However, it is wrong to consider that overall brightness is measured along the main diagonal of the colour space. We require different intensities of different colours in order to sense equal brightness. As a result, the axis, along which the perceived overall brightness changes, is different from the main diagonal of the CIE RGB space. So, the overall perceived brightness is not measured along the diagonal passing through the ideal white point (the point with coordinates (1/3, 1/3, 1/3)). The direction, along which the perceived brightness is measured, is orthogonal to a plane passing through the origin of the axes in the 3D colour space, called the alychne. So, if we want a system that removes the component of brightness from the tristimulus values, and thus represents colours by two numbers only, instead of collapsing all triplets of values on the Maxwell triangle of figure 7.20, we must define another plane, orthogonal to the direction of increased perceived brightness, the alychne plane.
How is the alychne defined in the CIE RGB colour system?
It is the plane with equation:

T_R + 4.5907 T_G + 0.0601 T_B = 0                (7.94)
How is the XY Z colour system defined?
It is defined as a linear transformation of the CIE RGB colour system, so that:
(i) the T_X, T_Y and T_Z tristimulus values are all positive;
(ii) the equal energy ideal white point, with coordinates (1/3, 1/3, 1/3) in the chromaticity diagram, remains unchanged;
(iii) the Y component corresponds to the axis of the overall perceived brightness;
(iv) the points enclosed by the spectrum locus and the purple line fill as much as possible the right-angle triangle, defined by points (0, 0), (1, 0) and (0, 1) in the chromaticity plane;
(v) the chromaticity coordinate t_B(λ), for λ between 650nm and 700nm, is treated as 0. This implies that the spectrum locus for these wavelengths is assumed to coincide with the line that passes through points (1, 0) and (0, 1), ie the line with equation t_R + t_G = 1.
Figure 7.26: The chromaticity diagram of the CIE RGB colour space. Line AB is the intersection of the alychne plane with plane T_B = 0. Line AC is drawn tangent to the spectrum locus, near the green (medium) wavelengths. Line BC has been drawn tangent to the spectrum locus, near the red (long) wavelengths. W is the ideal white.
Figure 7.26 shows the chromaticity diagram of the CIE RGB space. The line joining points A and B is the line where the alychne plane intersects the (T_R, T_G) plane. The line joining points B and C was drawn to fulfil condition (v), ie it is tangent to the spectrum locus at the point that corresponds to λ = 700nm. Point B, being the intersection of these two lines, has coordinates (t_R, t_G) = (1.275, −0.278), and, by inference, it represents colour (T_R, T_G, T_B) = (1.275, −0.278, 0.003). The line passing through points A and C was drawn so that condition (iv) was fulfilled, ie it was chosen to be tangent to the locus spectrum at about the medium wavelengths. Point C turns out to have coordinates (t_R, t_G) = (−1.740, 2.768), and by inference it represents colour (T_R, T_G, T_B) = (−1.740, 2.768, −0.028). Finally, point W represents colour (T_R, T_G, T_B) = (1/3, 1/3, 1/3). The transformation between the two colour systems is assumed linear, ie it has the form:
[ T_X ]   [ a_11  a_12  a_13 ] [ T_R ]
[ T_Y ] = [ a_21  a_22  a_23 ] [ T_G ]                (7.95)
[ T_Z ]   [ a_31  a_32  a_33 ] [ T_B ]

To fulfil condition (i) the elements of the transformation matrix should be chosen so that point B is mapped to point (1, 0, 0) and point C to point (0, 1, 0). Further, to fulfil condition (ii), point W should be mapped to point (1/3, 1/3, 1/3). This gives us a system of 9 equations for the 9 unknowns a_ij. This way, one may specify the transformation between the two colour systems.
Figure 7.27: The tristimulus values of the XY Z colour system.
If a colour is represented in the CIE RGB colour space by (T_R, T_G, T_B), its representation in the XY Z space is given by values (T_X, T_Y, T_Z), which can be calculated from (T_R, T_G, T_B) using the following formulae:

T_X = 2.769 T_R + 1.752 T_G + 1.130 T_B
T_Y =       T_R + 4.591 T_G + 0.060 T_B
T_Z = 0.000 T_R + 0.057 T_G + 5.594 T_B                (7.96)

Figure 7.27 shows the plots of T_X(λ), T_Y(λ) and T_Z(λ), computed by using equations (7.96) in conjunction with the entries of table 7.12 for the (T_R, T_G, T_B) values³. Note that none of the values is now negative.

³ In practical applications, we use a simpler notation: we forget the term tristimulus values and we simply refer to the values that define a colour as its (R, G, B) or its (X, Y, Z) values.
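For completeness, equations (7.96) in code form (a direct transcription, nothing more):

import numpy as np

CIE_RGB_TO_XYZ = np.array([[2.769, 1.752, 1.130],
                           [1.000, 4.591, 0.060],
                           [0.000, 0.057, 5.594]])

def cie_rgb_to_xyz(T_rgb):
    # Maps (T_R, T_G, T_B) to (T_X, T_Y, T_Z) according to equation (7.96).
    return CIE_RGB_TO_XYZ @ np.asarray(T_rgb, dtype=float)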
Example B7.15
Work out the transformation between the chromaticity coordinates of the
XY Z system in terms of the chromaticity coordinates of the CIE RGB
system.
Using (7.96), we may work out t_X as:

t_X = (2.769 T_R + 1.752 T_G + 1.13 T_B) / (2.769 T_R + 1.752 T_G + 1.13 T_B + T_R + 4.591 T_G + 0.06 T_B + 0.057 T_G + 5.594 T_B)
    = (2.769 T_R + 1.752 T_G + 1.130 T_B) / (3.769 T_R + 6.400 T_G + 6.784 T_B)                (7.97)

If we divide the numerator and denominator with T_R + T_G + T_B, we shall have on the right-hand side the chromaticity values of the CIE RGB colour system:

t_X = (2.769 t_R + 1.752 t_G + 1.130 t_B) / (3.769 t_R + 6.400 t_G + 6.784 t_B)                (7.98)

In a similar way, we may compute t_Y and t_Z. Calculations with more significant figures yield the more accurate results given below:

t_X = (0.49000 t_R + 0.31000 t_G + 0.20000 t_B) / (0.66697 t_R + 1.13240 t_G + 1.20063 t_B)
t_Y = (0.17697 t_R + 0.81240 t_G + 0.01063 t_B) / (0.66697 t_R + 1.13240 t_G + 1.20063 t_B)
t_Z = (0.00000 t_R + 0.01000 t_G + 0.99000 t_B) / (0.66697 t_R + 1.13240 t_G + 1.20063 t_B)                (7.99)

Note that equation (7.98) is identical with the first of equations (7.99), if we divide its numerator and denominator with 5.651, which is the sum of the coefficients of T_Y in (7.96).
What is the chromaticity diagram of the XY Z colour system?
We use the tristimulus values of the XY Z colour system, shown in figure 7.27, to work out the (t_X, t_Y) values in order to plot the spectrum locus and the purple line of this colour system. The result is shown in Plate Ib. Note that the spectrum locus and the purple line now are entirely inside the right-angle triangle of permissible values. The interesting thing here is that the primary lights, ie the vertices of the right-angle triangle of permissible values, do not correspond to any real colours. These primaries are said to be imaginary.
How is it possible to create a colour system with imaginary primaries, in practice?
It is not possible. The imaginary primaries are simply mathematical creations. The tristim-
ulus values of such a colour system are not worked out by performing physical experiments
with human subjects, but by simply transforming the tristimulus values of colour systems
that have non-imaginary primary lights.
Example B7.16
Show that vector V ≡ (1, 4.5907, 0.0601) is perpendicular to the alychne plane.

Consider a point on the alychne plane with position vector r = (r_1, r_2, r_3). Since the alychne plane passes through the origin of the axes, the position vector of every point on the alychne plane lies on the alychne plane. For vector V to be orthogonal to the alychne plane, it must be orthogonal to any line that lies on the plane. So, for vector V to be orthogonal to the alychne plane, its dot product with the position vector of any point on the alychne plane must be 0: r_1 + 4.5907 r_2 + 0.0601 r_3 = 0. This is true since (r_1, r_2, r_3) lies on the alychne and thus its coordinates satisfy equation (7.94).
Figure 7.28: The three primary lights are projected onto a white screen. At the place where the three lights overlap, a human observer sees white. The intensities with which the three beams have to be projected depend on the observer. Different people require different intensities to say that they see white.
What if we wish to model the way a particular individual sees colours?
In all experiments done to identify the colour matching function, the viewer was asked to match perceived colours, not to name the colour she sees. It is possible, for example, that what a certain viewer considers to be pure white, another viewer may identify as creamy or magnolia white, etc. This indicates that different individuals see colours differently.
Let us say that in order to create the sensation of white in the brain of a particular viewer, the viewer has to see simultaneously:
spectrum P_1(λ) with intensity A_1(W);
spectrum P_2(λ) with intensity A_2(W);
spectrum P_3(λ) with intensity A_3(W).
We may then write:

W(λ) = A_1(W)P_1(λ) + A_2(W)P_2(λ) + A_3(W)P_3(λ)                (7.100)

Here sign "=" means "creates the sensation of" (see figure 7.28).
If different viewers require different intensities of the primary lights to see white, how do we calibrate colours between different viewers?
In order to avoid subjectivity, we usually refer all colours to the so called reference white, ie the white of the particular viewer we have in mind. In other words, we calibrate the colours so that they are all measured in terms of the corresponding white of the intended viewer. In order to compare colours then, in an objective way, we say that we correct them for the corresponding reference white.
How do we make use of the reference white?
Let us say that in order to create the sensation of an arbitrary colour C(λ) in the brain of the person, for whom equation (7.100) holds, we need to project at the same spot simultaneously:
spectrum P_1(λ) with intensity A_1(C);
spectrum P_2(λ) with intensity A_2(C);
spectrum P_3(λ) with intensity A_3(C).
We may then write:

C(λ) = A_1(C)P_1(λ) + A_2(C)P_2(λ) + A_3(C)P_3(λ)                (7.101)

We postulate here that the normalised quantities A_j(C)/A_j(W) are viewer-independent and we call them tristimulus values of colour C(λ):

T_j(C) ≡ A_j(C) / A_j(W)    for j = 1, 2, 3                (7.102)

Note that these tristimulus values are meant to correspond to the objective tristimulus values of the colour system used, as shown in figure 7.17. Using equation (7.102) then into equation (7.101), we may write:

C(λ) = Σ_{j=1}^{3} T_j(C) A_j(W) P_j(λ)                (7.103)

In this expression T_j(C) are the objective tristimulus values that fully and objectively characterise a colour in relation to a given colour system. Factor A_j(W) in (7.103) is the only subjective quantity that makes the equation specific for a particular viewer. If we omit it, we shall have an equation that refers to the standard observer, as long as the tristimulus values used are the average colour matching functions of many people with normal vision. The use of the tristimulus values of the standard observer will create different colour sensations for different viewers, according to each viewer's internal hardware, ie their personalised way of seeing colours.
Example 7.17
Use the information supplied by the graphs in figure 7.17 to decide what the tristimulus values are, for a spectral distribution with intensity X(λ_0) at wavelength λ_0 and 0 everywhere else.

At this stage, we are not concerned with the sensor that will view the created spectrum. So, it is assumed that the three spectra are blended linearly, so, the tristimulus values we need now are: X(λ_0)T_s1(λ_0), X(λ_0)T_s2(λ_0) and X(λ_0)T_s3(λ_0).
We may, therefore, write:

X(λ_0)δ(λ − λ_0) = Σ_{j=1}^{3} T_sj(λ_0) X(λ_0) P_j(λ)                (7.104)

For a specific viewer, this equation has the form

X(λ_0)δ(λ − λ_0) = Σ_{j=1}^{3} X(λ_0) T_sj(λ_0) A_j(W) P_j(λ)                (7.105)

and corresponds to equation (7.103).
Example 7.18
Use the information supplied by the graphs in figure 7.17, on page 716, to decide what the tristimulus values are for a given spectral distribution X(λ).

In this case, equation (7.104) has the form:

X(λ) = Σ_{j=1}^{3} T_j(X) P_j(λ)                (7.106)

Let us integrate both sides of equation (7.104) with respect to λ_0:

∫_0^{+∞} X(λ_0)δ(λ − λ_0) dλ_0 = Σ_{j=1}^{3} ∫_0^{+∞} X(λ_0)T_sj(λ_0) dλ_0 P_j(λ)                (7.107)

By performing the integration, we obtain:

X(λ) = Σ_{j=1}^{3} P_j(λ) ∫_0^{+∞} X(λ_0) T_sj(λ_0) dλ_0                (7.108)

By comparing then equations (7.106) and (7.108), we deduce the tristimulus values that are needed to create the particular colour sensation in the brain of the standard observer:

T_j(X) = ∫_0^{+∞} X(λ_0) T_sj(λ_0) dλ_0                (7.109)

For a particular observer, equation (7.106) will take the form:

X(λ) = Σ_{j=1}^{3} T_j(X) A_j(W) P_j(λ)                (7.110)
How is the sRGB colour system defined?
This colour system was introduced to comply with the way electronic monitors reproduce colour. It is defined from the XY Z colour system, so that:
(i) the (x, y, z) coordinates of its R light are (0.64, 0.33, 0.03);
(ii) the (x, y, z) coordinates of its G light are (0.30, 0.60, 0.10);
(iii) the (x, y, z) coordinates of its B light are (0.15, 0.06, 0.79);
(iv) the (x, y, z) coordinates of its reference white are those of the standard illuminant D_65, namely (0.3127, 0.3290, 0.3583).
Assuming that X, Y and Z vary in the range [0, 1], the transformation matrix from (X, Y, Z) values to the sRGB values is:

[ R ]   [  3.2419  −1.5374  −0.4986 ] [ X ]
[ G ] = [ −0.9692   1.8760   0.0416 ] [ Y ]                (7.111)
[ B ]   [  0.0556  −0.2040   1.0570 ] [ Z ]

Values outside the range [0, 1] are clipped. These (R, G, B) values, however, are not the values used to display a digital image. In order to take into consideration the nonlinearities of the displaying monitors, these values are further transformed as follows. If R, G, B < 0.00304, define:

R′ = 12.92R    G′ = 12.92G    B′ = 12.92B                (7.112)

If R, G, B ≥ 0.00304, define:

R′ = 1.055R^{1/2.4} − 0.055    G′ = 1.055G^{1/2.4} − 0.055    B′ = 1.055B^{1/2.4} − 0.055                (7.113)

The (R′, G′, B′) values, multiplied with 255 and rounded to the nearest integer, are the values used to display the image on the monitor.
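Putting equations (7.111)-(7.113) together, the mapping from (X, Y, Z) in [0, 1] to displayable sRGB values may be sketched as follows (function name is ours):

import numpy as np

XYZ_TO_SRGB = np.array([[ 3.2419, -1.5374, -0.4986],
                        [-0.9692,  1.8760,  0.0416],
                        [ 0.0556, -0.2040,  1.0570]])

def xyz_to_display_srgb(xyz):
    rgb = np.clip(XYZ_TO_SRGB @ np.asarray(xyz, dtype=float), 0.0, 1.0)   # eq. (7.111) plus clipping
    low = rgb < 0.00304
    out = np.where(low, 12.92 * rgb, 1.055 * rgb ** (1.0 / 2.4) - 0.055)  # eqs. (7.112)-(7.113)
    return np.round(255.0 * out).astype(int)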
Does a colour change if we double all its tristimulus values?
The sensation of colour depends on the relative brightness of the primary lights. So, the perceived colour will not change if all its tristimulus values are multiplied with the same scalar. However, the overall brightness of the seen light will increase. Of course, this is right up to a point. When the brightness becomes too high, the sensors become saturated and the perception of colour is affected. At the other extreme, if the overall brightness becomes too low, our sensors are not triggered and we see only tones of grey, rather than colours. This is called scotopic vision and it is performed with the help of different sensors from those responsible for the so called photopic vision. The sensors responsible for the photopic vision are the cones we discussed earlier, while the sensors responsible for the scotopic vision are called rods.
How does the description of a colour, in terms of a colour system, relate to the
way we describe colours in everyday language?
In everyday language we describe colours by using terms like shade and depth, eg we say shades of blue, deep red, etc. Formally, these terms are known as hue and saturation. For example, all shades of red, from light pink to deep red, are recognised as red of different degrees of saturation: pink has low saturation, while deep red has high saturation. On the other hand, we distinguish the hue of green, from the hue of red, irrespective of their saturations.
How do we compare colours?
If a colour is represented by a point in a 3D space, the difference of two colours can be found by using a metric to measure the distance of the two points in that space.
What is a metric?
A metric is a function that takes as input two points, A and B, and gives as output their distance. A function f(A, B) has to have certain properties in order to be a metric:
1. the output of the function has to be a non-negative number, with f(A, A) = 0 and f(A, B) > 0 if A ≠ B;
2. the output of the function should not change if we change the order of the two inputs: f(A, B) = f(B, A);
3. if C is a third point, f(A, C) + f(C, B) ≥ f(A, B) (triangle inequality).
The most well known metric is the Euclidean metric. According to the Euclidean metric, the distance of two points A and B, in 3D, is given by

d_2(A, B) ≡ √((x_A − x_B)^2 + (y_A − y_B)^2 + (z_A − z_B)^2)                (7.114)

where (x_A, y_A, z_A) and (x_B, y_B, z_B) are the coordinates of points A and B, respectively. This metric sometimes is also referred to as the L_2 norm. In image processing, we also use the L_1 norm, which is also known as the city block metric, and is defined as:

d_1(A, B) ≡ |x_A − x_B| + |y_A − y_B| + |z_A − z_B|                (7.115)
When matching frequency spectra, we often use the L_∞ norm, which is defined as:

d_∞(A, B) ≡ max{ |x_A − x_B|, |y_A − y_B|, |z_A − z_B| }                (7.116)
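The three metrics of equations (7.114)-(7.116), written out for 3D points (a small sketch):

import numpy as np

def d2(A, B):    # Euclidean metric, eq. (7.114)
    return float(np.sqrt(np.sum((np.asarray(A, float) - np.asarray(B, float)) ** 2)))

def d1(A, B):    # city block metric (L1 norm), eq. (7.115)
    return float(np.sum(np.abs(np.asarray(A, float) - np.asarray(B, float))))

def dinf(A, B):  # L-infinity metric, eq. (7.116)
    return float(np.max(np.abs(np.asarray(A, float) - np.asarray(B, float))))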
Can we use the Euclidean metric to measure the difference of two colours?
For some colour systems, the Euclidean metric may be used; for some others it may not. For example, consider the chromaticity diagram shown in Plate Ib. We can easily see that the distance of points A and B is much bigger than the distance of points C and D; and yet, points A and B represent two colours much more similar to each other than the colours represented by points C and D. This means that the Euclidean distance of points in this chromaticity diagram does not reflect the perceived difference between the colours represented by the points in this space. We say that this colour space is not perceptually uniform. This statement is based on the tacit assumption that the metric we use to measure distances is the Euclidean metric. So, more correctly, one should say: this colour space is not perceptually uniform with respect to the Euclidean metric. It is of course possible to define a metric that measures distances in a way that d(A, B) turns out to be much smaller than d(C, D), in accordance with the perceived difference between the colours these points represent. Such a metric would rely on rather complicated functions of the coordinates of the points the distance of which it measures. Instead, once the perceived differences between colours have been worked out (by psychophysical experiments), the colour space itself may be transformed into a new colour space, which is perceptually uniform with respect to the Euclidean metric. The transformation, obviously, has to be nonlinear and such that, for example, it brings closer points A and B, while it may spread more points C and D. Once the colours are represented in such a space, the Euclidean metric may be used to measure their differences.
Which are the perceptually uniform colour spaces?
There are two perceptually uniform colour spaces, the Luv and the Lab. They are both defined in terms of the XY Z colour system and the coordinates of the reference white, denoted by (X_n, Y_n, Z_n). The transformation formulae between the XY Z values⁴ of a colour and the Luv or the Lab values are empirical formulae, that have been worked out so that the Euclidean metric may be used in these spaces to measure perceived colour differences.
How is the Luv colour space defined?

L ≡  116 (Y/Y_n)^{1/3} − 16    if Y/Y_n > 0.008856
     903.3 (Y/Y_n)             if Y/Y_n ≤ 0.008856

u ≡ 13L (u′ − u′_n)
v ≡ 13L (v′ − v′_n)                (7.117)

⁴ Note that for the sake of simplicity we use here (X, Y, Z) instead of (T_X, T_Y, T_Z).

The auxiliary functions that appear in these equations are defined as:

u′ ≡ 4X / (X + 15Y + 3Z)        u′_n ≡ 4X_n / (X_n + 15Y_n + 3Z_n)
v′ ≡ 9Y / (X + 15Y + 3Z)        v′_n ≡ 9Y_n / (X_n + 15Y_n + 3Z_n)                (7.118)
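Equations (7.117) and (7.118) translate directly into code; the sketch below follows the formulae as written and makes no claim about edge cases (for example L = 0):

def xyz_to_luv(X, Y, Z, Xn, Yn, Zn):
    t = Y / Yn
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t    # eq. (7.117)
    up  = 4.0 * X  / (X  + 15.0 * Y  + 3.0 * Z)                           # eq. (7.118)
    vp  = 9.0 * Y  / (X  + 15.0 * Y  + 3.0 * Z)
    upn = 4.0 * Xn / (Xn + 15.0 * Yn + 3.0 * Zn)
    vpn = 9.0 * Yn / (Xn + 15.0 * Yn + 3.0 * Zn)
    return L, 13.0 * L * (up - upn), 13.0 * L * (vp - vpn)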
How is the Lab colour space defined?

L ≡  116 (Y/Y_n)^{1/3} − 16    if Y/Y_n > 0.008856
     903.3 (Y/Y_n)             if Y/Y_n ≤ 0.008856

a ≡ 500 [ f(X/X_n) − f(Y/Y_n) ]
b ≡ 200 [ f(Y/Y_n) − f(Z/Z_n) ]                (7.119)

Function f that appears in these formulae is defined as

f(x) =  x^{1/3}            if x > 0.008856
        7.787x + 4/29      if x ≤ 0.008856                (7.120)
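Similarly, a direct transcription of (7.119) and (7.120):

def xyz_to_lab(X, Y, Z, Xn, Yn, Zn):
    def f(x):                                                   # eq. (7.120)
        return x ** (1.0 / 3.0) if x > 0.008856 else 7.787 * x + 4.0 / 29.0
    t = Y / Yn
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t
    a = 500.0 * (f(X / Xn) - f(t))                              # eq. (7.119)
    b = 200.0 * (f(t) - f(Z / Zn))
    return L, a, b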
How do we choose values for (X_n, Y_n, Z_n)?
The reference white depends on the conditions under which the image was captured. Often these are unknown. As a rule of thumb, we may try to be consistent with the reference white that was assumed when we transformed from the RGB to the XY Z values.
So, if the (X, Y, Z) values were produced from CIE RGB values, using transformation (7.96), the equal energy white E should be used, with X_n = Y_n = Z_n = 100. If the (X, Y, Z) values were produced from sRGB values, using the inverse of the transformation described on page 732 and the inverse of matrix (7.111), the reference white should be the standard illuminant D_65, normalised so that Y_n = 100. The chromaticity coordinates (x_n, y_n) of a standard illuminant are given by equations (7.75) of Box 7.3, on page 704. From them we can work out the value of z_n as z_n = 1 − x_n − y_n. Since we want to have Y_n = 100, and since y_n = Y_n/(X_n + Y_n + Z_n), we work out that we must set X_n + Y_n + Z_n = 100/y_n. Then we derive X_n = x_n(X_n + Y_n + Z_n) and Z_n = z_n(X_n + Y_n + Z_n). Table 7.13 lists the values of (X_n, Y_n, Z_n) for each one of the commonly used standard illuminants and for the ideal white.
How can we compute the RGB values from the Luv values?
We start with formula (7.117), to work out Y from the values of the reference white and L, as follows:

Y = Y_n ((L + 16)/116)^3                (7.121)
Channel    D_55        D_65        D_75        E
X_n        95.6509     95.0155     94.9423     100
Y_n        100.0000    100.0000    100.0000    100
Z_n        92.0815     108.8259    122.5426    100
Table 7.13: Reference white values (X_n, Y_n, Z_n), for the three standard illuminants and for the ideal white E.
After we compute Y, we check whether Y/Y_n > 0.008856. If this is correct, we accept the value we computed. If it is not correct, we recompute Y using:

Y = L Y_n / 903.3                (7.122)

From the other two equations (7.117), we can easily work out u′ and v′:

u′ = u/(13L) + u′_n        v′ = v/(13L) + v′_n                (7.123)
Then, knowing Y and v′, from the second of equations (7.118), we can work out X + 15Y + 3Z:

X + 15Y + 3Z = 9Y/v′                (7.124)

We can use this value into the first of equations (7.118), to work out X:

X = (1/4)(X + 15Y + 3Z) u′                (7.125)

Next, from the knowledge of X, Y and X + 15Y + 3Z, we can work out Z:

Z = (1/3)(9Y/v′ − X − 15Y) = 3Y/v′ − X/3 − 5Y                (7.126)

The final step is to transform the XY Z values to RGB using the inverse of transformation (7.96), on page 727:

[ R ]   [  0.4184  −0.1587  −0.0828 ] [ X ]
[ G ] = [ −0.0912   0.2524   0.0157 ] [ Y ]                (7.127)
[ B ]   [  0.0009  −0.0026   0.1786 ] [ Z ]
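The whole inverse chain of equations (7.121)-(7.127) may be sketched as one function (our own naming; the formulae are applied at face value, with no special handling of degenerate cases):

def luv_to_rgb(L, u, v, Xn, Yn, Zn):
    Y = Yn * ((L + 16.0) / 116.0) ** 3                      # eq. (7.121)
    if Y / Yn <= 0.008856:
        Y = L * Yn / 903.3                                  # eq. (7.122)
    upn = 4.0 * Xn / (Xn + 15.0 * Yn + 3.0 * Zn)
    vpn = 9.0 * Yn / (Xn + 15.0 * Yn + 3.0 * Zn)
    up = u / (13.0 * L) + upn                               # eq. (7.123)
    vp = v / (13.0 * L) + vpn
    X = 0.25 * (9.0 * Y / vp) * up                          # eqs. (7.124)-(7.125)
    Z = 3.0 * Y / vp - X / 3.0 - 5.0 * Y                    # eq. (7.126)
    R = 0.4184 * X - 0.1587 * Y - 0.0828 * Z                # eq. (7.127)
    G = -0.0912 * X + 0.2524 * Y + 0.0157 * Z
    B = 0.0009 * X - 0.0026 * Y + 0.1786 * Z
    return R, G, B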
How can we compute the RGB values from the Lab values?
We assume here that we know the Lab values and the values of the reference white (X_n, Y_n, Z_n) used to obtain them. We want to work out the corresponding (X, Y, Z) values and from them the corresponding (R, G, B) values.
From the value of L, we can work out the value of Y/Y_n, in the same way as in the case of the inverse Luv transformation.
Once we know the value of Y/Y_n, we can work out the value of f(Y/Y_n), using equation (7.120). Then, we may use the second of formulae (7.119), to work out the value of f(X/X_n):

f(X/X_n) = a/500 + f(Y/Y_n)                (7.128)
Assuming that X/X_n > 0.008856, we can work out that:

X/X_n = [ f(X/X_n) ]^3                (7.129)

If after this calculation X/X_n turns out to be indeed larger than 0.008856, we accept this value. If not, we rework it as:

X/X_n = (1/7.787) [ f(X/X_n) − 4/29 ]                (7.130)

This way, we derive the value of X.
Then, from the last of equations (7.119), we can work out the value of Z in a similar way:

f(Z/Z_n) = f(Y/Y_n) − b/200                (7.131)

Assuming that Z/Z_n > 0.008856, we can work out that:

Z/Z_n = [ f(Z/Z_n) ]^3                (7.132)

If after this calculation Z/Z_n turns out to be indeed larger than 0.008856, we accept this value. If not, we rework it as:

Z/Z_n = (1/7.787) [ f(Z/Z_n) − 4/29 ]                (7.133)

This way, we derive the value of Z.
Finally, we have to apply the inverse transform to go from the XY Z values to the RGB values.
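The steps of equations (7.128)-(7.133) may be gathered into one small routine that recovers (X, Y, Z) from (L, a, b); the last step, from XY Z to RGB, is then the same matrix operation as before:

def lab_to_xyz(L, a, b, Xn, Yn, Zn):
    def undo_f(fval):                                   # invert eq. (7.120) for one channel
        ratio = fval ** 3                               # eqs. (7.129)/(7.132)
        if ratio <= 0.008856:
            ratio = (fval - 4.0 / 29.0) / 7.787         # eqs. (7.130)/(7.133)
        return ratio
    yr = ((L + 16.0) / 116.0) ** 3                      # same first step as for the inverse Luv case
    if yr <= 0.008856:
        yr = L / 903.3
    fY = yr ** (1.0 / 3.0) if yr > 0.008856 else 7.787 * yr + 4.0 / 29.0
    X = Xn * undo_f(a / 500.0 + fY)                     # eq. (7.128)
    Z = Zn * undo_f(fY - b / 200.0)                     # eq. (7.131)
    return X, Yn * yr, Z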
How do we measure perceived saturation?
In the Luv colour space saturation is defined as

S_uv = 13 √( (u′ − u′_n)^2 + (v′ − v′_n)^2 )                (7.134)

Saturation cannot be defined in the Lab colour space.
How do we measure perceived differences in saturation?
This is straightforward, by applying formula (7.134) for the two colours and taking the difference of the two values.
How do we measure perceived hue?
We do not measure perceived hue, but perceived difference in hue. This is defined in terms of the perceived hue angle.
How is the perceived hue angle defined?
The hue angle, h_uv or h_ab, is defined using an angle φ, computed from

φ_uv ≡ tan^{−1}( |v| / |u| )    or    φ_ab ≡ tan^{−1}( |b| / |a| )                (7.135)

for the Luv and the Lab colour spaces, respectively. Then, hue angle h_uv or h_ab is worked out as

h_ij ≡  φ_ij            if Numerator > 0 and Denominator > 0
        360° − φ_ij     if Numerator < 0 and Denominator > 0
        180° − φ_ij     if Numerator > 0 and Denominator < 0
        180° + φ_ij     if Numerator < 0 and Denominator < 0                (7.136)

where ij stands for uv or ab, Numerator stands for v or b, and Denominator stands for u or a.
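In code, equations (7.135) and (7.136) become a short function; here num plays the role of v or b, and den that of u or a:

import math

def hue_angle(num, den):
    phi = math.degrees(math.atan(abs(num) / abs(den)))   # eq. (7.135)
    if num > 0 and den > 0:                               # eq. (7.136)
        return phi
    if num < 0 and den > 0:
        return 360.0 - phi
    if num > 0 and den < 0:
        return 180.0 - phi
    return 180.0 + phi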
How do we measure perceived differences in hue?
The perceived difference in hue is defined by considering the total perceived difference of two colours and analysing it as:

(Total perceived difference of two colours)^2 ≡ (Perceived difference in lightness)^2
        + (Perceived difference in chroma)^2 + (Perceived difference in hue)^2                (7.137)

Therefore:

(Perceived difference in hue)^2 ≡ (Total perceived difference of two colours)^2
        − (Perceived difference in lightness)^2 − (Perceived difference in chroma)^2                (7.138)

In this expression, the total perceived difference of two colours is given by the Euclidean metric, applied either to the Luv or the Lab colour space. Lightness is the value of L defined by equations (7.117) or (7.119). Chroma is defined as:

C_uv ≡ √(u^2 + v^2)    or    C_ab ≡ √(a^2 + b^2)                (7.139)
Example B7.19
Show that in the Lab system, the perceived difference in hue, ΔH_ab, between two colours (a, b) and (a + Δa, b + Δb), where Δa << a and Δb << b, may be computed using:

ΔH_ab = |aΔb − bΔa| / √(a^2 + b^2)                (7.140)

Let us rewrite equation (7.137). On the left-hand side we have the Euclidean distance in the Lab colour space:

(ΔL)^2 + (Δa)^2 + (Δb)^2 ≡ (ΔL)^2 + (ΔC_ab)^2 + (ΔH_ab)^2
⟹ (ΔH_ab)^2 = (Δa)^2 + (Δb)^2 − (ΔC_ab)^2                (7.141)

From (7.139) we can work out ΔC_ab:

ΔC_ab = (∂C_ab/∂a) Δa + (∂C_ab/∂b) Δb
      = (1/2)(a^2 + b^2)^{−1/2} 2a Δa + (1/2)(a^2 + b^2)^{−1/2} 2b Δb
      = (aΔa + bΔb) / √(a^2 + b^2)                (7.142)

We may then substitute in (7.141):

(ΔH_ab)^2 = (Δa)^2 + (Δb)^2 − (aΔa + bΔb)^2 / (a^2 + b^2)
          = [a^2(Δa)^2 + b^2(Δa)^2 + a^2(Δb)^2 + b^2(Δb)^2 − a^2(Δa)^2 − b^2(Δb)^2 − 2ab ΔaΔb] / (a^2 + b^2)
          = (aΔb − bΔa)^2 / (a^2 + b^2)                (7.143)

Equation (7.140) then follows.
Example B7.20
Show that in the Lab system, the perceived difference in hue, ΔH_ab, between two colours (a, b) and (a + Δa, b + Δb), where Δa << a and Δb << b, is given by:

ΔH_ab = C_ab Δh_ab π/180                (7.144)

From (7.136), it is obvious that Δh_ab = |Δφ_ab|. From (7.135), we have:

Δφ_ab = [∂(tan^{−1}(b/a))/∂a] Δa + [∂(tan^{−1}(b/a))/∂b] Δb                (7.145)
      = [1/(1 + (b/a)^2)] [∂(b/a)/∂a] Δa + [1/(1 + (b/a)^2)] [∂(b/a)/∂b] Δb
      = [a^2/(a^2 + b^2)] ( −(b/a^2) Δa + (1/a) Δb )
      = (aΔb − bΔa) / (a^2 + b^2)                (7.146)

We note that angle φ_ab and hue angle h_ab are measured in degrees. To be able to use their differentials defined above, we must convert them into rads by multiplying them with π/180. If we substitute then from (7.139) and (7.146) into (7.144), we obtain (7.140), which proves the validity of (7.144).
What affects the way we perceive colour?
Apart from the external factors of illumination and reflectance function, the perception of colour is also affected by the temporal and spatial context of the viewed colour surface.
What is meant by temporal context of colour?
If colour lights are flashed to a person with certain frequency, the colours the person sees do not only depend on the actual spectra of the flashed lights, but also on the frequency with which these spectra change. For example, if they change faster than, roughly, 30 times per second (30Hz), no matter what colour the individual lights are when seen in static conditions and individually, the person sees only tones of grey. If they are flashed with temporal frequency between roughly 6Hz and 30Hz, the person sees only tones of green and red, but not tones of blue. The lights have to alternate more slowly than 6 times per second for a person to be able to perceive all colours flashed as if they were shown statically and individually.
What is meant by spatial context of colour?
Spatial context of colour refers to the colours that are next to it in space. The way we perceive colour is very much influenced by the surroundings of the colour patch we are concentrating on. This is more so when the colour of interest is not very spatially extensive. Plate IV demonstrates this very well. Seeing these two panels, we perceive on the left a more greyish square in the middle than the corresponding square on the right. And yet, both central squares were constructed to have exactly the same shade of greyish-yellow.
If the colour context changes with high spatial frequency, we may not even see the real colours but rather different colours, depending on the spatial frequency with which the real colours alternate. This is demonstrated in Plate III, where, depending on the distance from which the pattern is seen, instead of seeing alterations of yellow and blue, the right end of the pattern may appear green.
Figure 7.29: When an object is at a distance d_2, there are more cycles of colour variation within the viewing angle θ, than when the same object is at a distance d_1 < d_2. If there are more than 16 cycles of colour variation per degree of viewing angle, then we see only grey. If there are between 4 and 16 cycles of colour variation per degree of viewing angle, we do not see blue.
Why does distance matter when we talk about spatial frequency?
A viewer has a field of view, ie a cone of directions from which her sensors can receive input. Whether the viewer will see the real colours or not depends on how many alterations of colour happen inside that field of view. That is why spatial frequencies are measured in cycles per degree of viewing angle. The word cycles refers to alterations of colours. In the example of Plate III, a cycle is a pair of blue and yellow stripes. So, if we are further away from the pattern we see, we can fit inside one degree of viewing angle more such cycles. This is demonstrated in figure 7.29. There comes a point when the brain cannot cope with the large number of cycles, and stops seeing blue and yellow and starts seeing green.
How do we explain the spatial dependence of colour perception?
The spatial dependence of colour perception is attributed to the paths which the signals from the sensors in the retina have to share in order to reach the brain. The spectral sensitivities of the cones in the retina are highly correlated and, therefore, the signals they produce are correlated too. According to psychophysicists, these signals are not transmitted to the brain separately, but through three distinct pathways, that somehow decorrelate them and encode them for maximal efficiency. One of these pathways transmits an excitatory signal when the L cones are activated and an inhibitory signal when the M cones are activated. We say then that this path transmits the opponent colour R − G. Another pathway is identified with the transmission of light-dark variation and it has the maximal spatial resolution. The third pathway is responsible for the transmission of the blue-yellow signal and it has the least spatial resolution. That is why light-dark variations are perceived with maximal resolution, at spatial frequencies that are too high for colour identification, while the sensation of blue is lost when the spatial frequency of colour alteration is moderate. It has been worked out that the transformation from the XY Z colour space to the opponent colour space O_1 O_2 O_3 of the decorrelated signals, that eventually reach the brain, is linear:

[ T_O1 ]   [  0.279   0.720  −0.107 ] [ T_X ]
[ T_O2 ] = [ −0.449   0.290  −0.077 ] [ T_Y ]                (7.147)
[ T_O3 ]   [  0.086  −0.590   0.501 ] [ T_Z ]
7.3 Colour image processing in practice
How does the study of the human colour vision aect the way we do image
processing?
The way humans perceive colour is signicant in image processing only in special cases:
(i) when we try to develop an industrial inspection system that is aimed at replacing the
human inspector;
(ii) when we intent to modify images, eg compress or enhance them, with the purpose of
being seen by a human.
In all other problems, we may deal with colour images as if they were simply multispectral images. We still have to solve important problems, but we do not need to worry about the way the image would look to the human eye. For example, if we are talking about colour constancy, the term is only correct if we wish to identify colours the same way humans do. If we are simply talking about a computer recognising the same object under different illumination conditions, then the correct term is spectral constancy rather than colour constancy, and the algorithm discussed on page 687 may be used for a colour image the same way it may be used for any multispectral image.
Often, inadequate understanding of the psychophysics of colour vision leads to bad outcomes, like, for example, the use of the perceptually uniform colour spaces (which often turn out to be not so perceptually uniform in practice). In this section we shall see how the understanding we gained of the human visual system in the previous section may be adapted and used in practice in order to process colour images.
How perceptually uniform are the perceptually uniform colour spaces in practice?
Both Lab and Luv spaces are only approximately perceptually uniform. However, the major problem is not the approximation used to go from the standard RGB space to these spaces, but the approximations we make when we use these spaces in practice. For a start, formulae (7.117) and (7.119) were derived using the RGB colour system as defined by the CIE standard. When we apply these formulae, we use the RGB values we recorded with whatever sensor we are using. These values do not necessarily correspond to the RGB values that are meant to be used in these formulae. Next, we very seldom know the exact values of the reference white (which is user dependent) and we use the standard illuminant instead, which, however, may not apply in our experimental setup. As a result, very often the use of the Lab and Luv colour spaces in practice turns out to produce disappointing results. This is largely due to the sloppy use of these formulae, which are often applied without careful consideration of their meaning.
How should we convert the image RGB values to the Luv or the Lab colour spaces?
Given that most images we are dealing with have been captured by digital media, it is best to
assume that the appropriate colour space for them is the sRGB space. We must then follow
a process that is the inverse of the process presented on page 732. In the following process,
the RGB values are assumed to be in the range [0, 255].
Step 1: Divide the RGB values of your image by 255, to convert them to the range [0, 1]. Call these scaled values R′, G′ and B′.
Step 2: Apply to the (R′, G′, B′) values the following transformation. If R′, G′, B′ > 0.04045, use:

R = ((R′ + 0.055)/1.055)^2.4      G = ((G′ + 0.055)/1.055)^2.4      B = ((B′ + 0.055)/1.055)^2.4      (7.148)

If R′, G′, B′ ≤ 0.04045, use:

R = R′/12.92      G = G′/12.92      B = B′/12.92      (7.149)
Step 3: From these (R, G, B) values, work out the (X, Y, Z) values:

[X]   [0.4124  0.3576  0.1805] [R]
[Y] = [0.2126  0.7152  0.0722] [G]      (7.150)
[Z]   [0.0193  0.1192  0.9505] [B]
Step 4: Use either formulae (7.117) or (7.119), on page 734, with reference white the D65 standard illuminant, to obtain the values in the perceptually uniform colour spaces. Because of the way the XYZ values have been derived, the values of the reference white that must be used for the transformation to the perceptually uniform space have to be divided by 100. That is, the (X_n, Y_n, Z_n) values for D65 that must be used are: (0.950155, 1.0000, 1.088259).
Example B7.21

Assume that instead of the correct reference white (X_n, Y_n, Z_n), you use (X_n + ΔX_n, Y_n, Z_n), in the transformation formulae (7.117) and (7.118). Work out the error when you measure the distance of two colours in the Luv colour space, assuming that the two colours have the same lightness L. Use only first order perturbations.
Using first order perturbation theory, we may write:

u_measured = u_true ± (∂u/∂X_n) ΔX_n ≡ u_true ± ε_u
v_measured = v_true ± (∂v/∂X_n) ΔX_n ≡ v_true ± ε_v      (7.151)
From (7.118) we can easily work out that:

∂u/∂X_n = −13L (60Y_n + 12Z_n)/(X_n + 15Y_n + 3Z_n)²      ∂v/∂X_n = 13L · 9Y_n/(X_n + 15Y_n + 3Z_n)²      (7.152)
The difference between two colours ΔE is given by the Euclidean metric in the Luv colour space. Since the two colours have the same lightness,

ΔE = √[(u_1 − u_2)² + (v_1 − v_2)²] = √[(u_1t ± ε_u − u_2t ∓ ε_u)² + (v_1t ± ε_v − v_2t ∓ ε_v)²]      (7.153)
where (u_1t, v_1t) are the true values of one colour and (u_2t, v_2t) are the true values of the other colour. Since both colours have the same value for L, both colours have the same errors ε_u and ε_v in the calculation of u and v, respectively. Also, when we add or subtract values, we must always assume that the errors add (worst case scenario). Then, we may write:
ΔE = √[(u_1t − u_2t)² + 4ε_u² + 4ε_u(u_1t − u_2t) + (v_1t − v_2t)² + 4ε_v² + 4ε_v(v_1t − v_2t)]

= √[(u_1t − u_2t)² + (v_1t − v_2t)²] [1 + (4ε_u² + 4ε_u(u_1t − u_2t) + 4ε_v² + 4ε_v(v_1t − v_2t)) / ((u_1t − u_2t)² + (v_1t − v_2t)²)]^(1/2)      (the first square root factor is ΔE_t)

≈ ΔE_t [1 + (1/2) |4ε_u² + 4ε_u(u_1t − u_2t) + 4ε_v² + 4ε_v(v_1t − v_2t)| / ((u_1t − u_2t)² + (v_1t − v_2t)²)]

≈ ΔE_t [1 + |2ε_u(u_1t − u_2t) + 2ε_v(v_1t − v_2t)| / ((u_1t − u_2t)² + (v_1t − v_2t)²)]      (7.154)
The final expression was derived by using (1 + x)^n ≈ 1 + nx when 0 < x << 1 and by omitting terms of second order in the error values. Then, we can work out the relative error in the computed distance between the two colours, as
Relative Error ≡ |ΔE − ΔE_t| / ΔE_t = |2ε_u(u_1t − u_2t) + 2ε_v(v_1t − v_2t)| / ((u_1t − u_2t)² + (v_1t − v_2t)²)      (7.155)
where:

ε_u ≡ 13L (60Y_n + 12Z_n)/(X_n + 15Y_n + 3Z_n)² ΔX_n      ε_v ≡ 13L · 9Y_n/(X_n + 15Y_n + 3Z_n)² ΔX_n      (7.156)
Note that the relative error is a function of the true values of the colours. This means that it is different at different parts of the colour space. This is characteristic of systems that depend on the perturbed parameter in a nonlinear way. This nonuniformity of the expected error by itself damages the perceptual uniformity of the colour space.
Figure 7.30 shows the plot of the relative error in the computed distance between two colours, for different combinations of the true colour differences Δu ≡ |u_1t − u_2t| and Δv ≡ |v_1t − v_2t|, for (X_n, Y_n, Z_n) = (95.0155, 100, 108.8259), L = 1 and ΔX_n = 1.
Note that as L increases, these errors grow linearly.
Figure 7.30: Relative error in the estimation of the distance of two colours, when their true distance is √[(Δu)² + (Δv)²], for L = 1 and ΔX_n = 1.
Example B7.22

Assume that instead of the correct reference white (X_n, Y_n, Z_n), you use (X_n + ΔX_n, Y_n, Z_n), in the transformation formulae (7.119) and (7.120). Work out the error when you measure the distance of two colours in the Lab colour space, assuming that the two colours have the same lightness L. Use only first order perturbations.
From (7.119), we note that b does not depend on X_n, and so an error in X_n will not affect its value. Using first order perturbation theory, we may then write:

a_measured = a_true ± (∂a/∂X_n) ΔX_n ≡ a_true ± ε_a      (7.157)
From (7.120), we can easily work out that:

∂a/∂X_n = 500 df/dX_n = −500 (1/3) X^(1/3) X_n^(−4/3) = −(500/3) (X/X_n⁴)^(1/3)      if X/X_n > 0.008856
∂a/∂X_n = 500 df/dX_n = −500 · 7.787 X/X_n² = −3893.5 X/X_n²                          if X/X_n ≤ 0.008856      (7.158)
The difference between two colours ΔE is given by the Euclidean metric in the Lab colour space. Since the two colours have the same lightness,

ΔE = √[(a_1 − a_2)² + (b_1 − b_2)²] = √[(a_1t ± ε_a1 − a_2t ∓ ε_a2)² + (b_1t − b_2t)²]      (7.159)
where (a_1t, b_1t) are the true values of one colour and (a_2t, b_2t) are the true values of the other colour. The errors depend on the value of X that characterises each colour, so the two colours have different errors in a: ε_a1 and ε_a2, respectively. Then, we may write:
ΔE = √[(a_1t − a_2t)² + ε_a1² + ε_a2² + 2(ε_a1 + ε_a2)(a_1t − a_2t) + 2ε_a1ε_a2 + (b_1t − b_2t)²]

= √[(a_1t − a_2t)² + (b_1t − b_2t)²] [1 + (ε_a1² + ε_a2² + 2(ε_a1 + ε_a2)(a_1t − a_2t) + 2ε_a1ε_a2) / ((a_1t − a_2t)² + (b_1t − b_2t)²)]^(1/2)      (the first square root factor is ΔE_t)

≈ ΔE_t [1 + (1/2) |ε_a1² + ε_a2² + 2(ε_a1 + ε_a2)(a_1t − a_2t) + 2ε_a1ε_a2| / ((a_1t − a_2t)² + (b_1t − b_2t)²)]

≈ ΔE_t [1 + |(ε_a1 + ε_a2)(a_1t − a_2t)| / ((a_1t − a_2t)² + (b_1t − b_2t)²)]      (7.160)
Then, we can work out the relative error in the computed distance between the two colours, as

Relative Error ≡ |ΔE − ΔE_t| / ΔE_t = |(ε_a1 + ε_a2)(a_1t − a_2t)| / ((a_1t − a_2t)² + (b_1t − b_2t)²)      (7.161)
where the errors are computed with the help of equations (7.157) and (7.158).
How do we measure hue and saturation in image processing applications?
There do not seem to be any standard formulae for this. One may quantify these two concepts on the plane shown in figure 7.21. We use as reference point the reference white. We may then define saturation as the distance from that point: the further away we go from the reference white, the more saturated a colour becomes. We may also choose an orientation as the reference orientation. Let us say that our reference orientation is the line that connects the origin with the vertex marked R. We may measure the hue of a colour by the angle it forms with the reference direction. If this angle is 0, the hue is red and, as it increases, it passes through the shades of green and blue, before it becomes red again.
If (r, g, b) are the normalised colour coordinates of a point, and (r_w, g_w, b_w) are the coordinates of the reference white, then hue and saturation for that colour are given by (see example 7.27):

Saturation ≡ (1/√2) √[(g − r − g_w + r_w)² + 3(b − b_w)²]      (7.162)

φ = tan⁻¹[ √3 |b_w(1 + g − r) − b(1 + g_w − r_w)| / |(1 + g_w − r_w)(g_w − r_w − g + r) + 3b_w(b_w − b)| ] ≡ tan⁻¹[ √3 |Numerator| / |Denominator| ]      (7.163)
Then:

Hue ≡ φ            if Numerator > 0 and Denominator > 0
Hue ≡ 360° − φ     if Numerator < 0 and Denominator > 0
Hue ≡ 180° − φ     if Numerator > 0 and Denominator < 0
Hue ≡ 180° + φ     if Numerator < 0 and Denominator < 0      (7.164)
If we use as reference white the ideal white, with values r_w = g_w = b_w = 1/3, and scale so that the maximum value of saturation is 1, the above formulae take the form (see example 7.23):

Saturation_ideal white ≡ (√3/2) √[(g − r)² + 3(b − 1/3)²]      (7.165)

φ_ideal white = tan⁻¹[ √3 |(1/3)(1 + g − r) − b| / |r − g − b + 1/3| ]      (7.166)
Example 7.23

Work out the values of the hue and saturation for the red, green and blue colours of the Maxwell triangle, when as reference white we use the ideal white.

The (r, g, b) coordinates of the red, green and blue colours are (1, 0, 0), (0, 1, 0) and (0, 0, 1), respectively. For the ideal white, r_w = g_w = b_w = 1/3. Then, using formula (7.162), the saturation of all three colours is worked out to be √(2/3). One may use
this value to normalise the denition of the saturation, so that it takes values between
0 and 1, and thus derive formula (7.165).
For angle φ, we use (7.166):

φ_R = 0°

φ_G = tan⁻¹[ √3 (1/3)·2 / |−1 + 1/3| ] = tan⁻¹√3 = 60°

φ_B = tan⁻¹[ √3 |1/3 − 1| / |−1 + 1/3| ] = tan⁻¹√3 = 60°      (7.167)
Applying then the rules of (7.164), we work out that:

Hue_R, ideal white = 0°
Hue_G, ideal white = 180° − φ_G = 120°
Hue_B, ideal white = 180° + φ_B = 240°      (7.168)
Example B7.24

The Maxwell colour triangle of a colour system is shown in figure 7.31. Point W is the reference white. The reference direction is defined to be line WR. Hue is the angle φ formed by the direction of a point P, with coordinates (x, y), with respect to the reference direction, measured counterclockwise. Use vector calculus to work out the hue of point P.
Figure 7.31: The Maxwell colour triangle. Angle φ is the hue of point P.
Point R has coordinates (−√2/2, 0). Therefore, vectors WR and WP are:

WR = (−√2/2 − x_w, −y_w)      WP = (x − x_w, y − y_w)      (7.169)
We consider the dot product of these two vectors:

WR · WP = |WR||WP| cos φ  ⇒  cos φ = (WR · WP) / (|WR||WP|)      (7.170)
Therefore:

cos φ = [ (√2/2 + x_w)(x_w − x) + y_w(y_w − y) ] / [ √((√2/2 + x_w)² + y_w²) √((x − x_w)² + (y − y_w)²) ]      (7.171)
As φ is measured from 0° to 360°, to specify it fully we need to know its sine as well as its cosine (or at least the sign of its sine, so we can work out in which quadrant it belongs). To compute the sine, we have to take the cross product of vectors WR and WP. Let us associate unit vectors i and j with axes Ox and Oy, respectively. For a right-handed coordinate system, the third axis will have unit vector, say, k, that sticks out of the page. The cross product of vectors WR and WP then will be:
WR × WP = det[ i, j, k ; −√2/2 − x_w, −y_w, 0 ; x − x_w, y − y_w, 0 ]
= k [ (−√2/2 − x_w)(y − y_w) − (−y_w)(x − x_w) ]
= k [ (√2/2 + x_w)(y_w − y) + y_w(x − x_w) ]      (7.172)
We know that the component of the cross product along unit vector k is given by |WR||WP| sin φ.
So, we deduce that:

sin φ = [ (√2/2 + x_w)(y_w − y) + y_w(x − x_w) ] / [ √((√2/2 + x_w)² + y_w²) √((x − x_w)² + (y − y_w)²) ]      (7.173)
Using (7.171) and (7.173) allows us to work out the value of φ in the range [0°, 360°).
Example B7.25

Use trigonometry to compute the sine of angle φ in the Maxwell colour triangle of figure 7.31.
Figure 7.32: The Maxwell colour triangle of the RGB colour space.
Figure 7.32 repeats figure 7.31 and is conveniently labelled for this example. We note that φ = α + 180° + β. So:

sin φ = −sin(α + β) = −sin α cos β − sin β cos α      (7.174)
We project points P and W on the OB axis to points T and Y, respectively, and point W on the RG axis to point K. From the right-angle triangles PSW and WKR, we have:

sin α = PS/PW      cos α = WS/PW      sin β = RK/RW      cos β = WK/RW      (7.175)
Let us say that the coordinates of the reference white are (x_w, y_w). The coordinates of R obviously are (−√2/2, 0). Then:

PS = x_w − x      WS = y − y_w      RK = √2/2 + x_w      WK = y_w      (7.176)
Lengths PW and RW are obviously given by:

PW = √((x − x_w)² + (y − y_w)²)      RW = √((√2/2 + x_w)² + y_w²)      (7.177)
Substituting from (7.176) and (7.177) into (7.175), and from there into (7.174), we can easily work out the expression for sin φ, which turns out to be the same as the one given by equation (7.173).
Example B7.26

Use trigonometry to compute the sine of the hue angle φ in the Maxwell colour triangle of figure 7.33.
Figure 7.33: A colour space and its corresponding Maxwell triangle.
We note that here the hue angle φ = 360° − α − β. Therefore:

sin φ = −sin(α + β) = −sin α cos β − sin β cos α      (7.178)
The trigonometric functions that appear here can be expressed again by equations (7.175). In terms of the coordinates of the points involved, the lengths that appear in (7.175) are:

PS = x − x_w      WS = y_w − y      RK = √2/2 + x_w      WK = y_w      (7.179)
Lengths PW and RW are again given by equations (7.177). Using these equations in (7.178), we work out that sin φ is given by equation (7.173) in this case too.
Example B7.27
Work out the values of the tangent of the hue angle and of the saturation
in terms of the RGB values of a colour.
For the tangent of the hue angle, all we have to do is to take the ratio of equations (7.171) and (7.173), and substitute (x, y) and (x_w, y_w) in terms of the normalised (r, g, b) values of the corresponding colours, given by equation (7.93), on page 724:
tan φ = [ (√2/2 + x_w)(y_w − y) + y_w(x − x_w) ] / [ (√2/2 + x_w)(x_w − x) + y_w(y_w − y) ]

= [ ((1 + g_w − r_w)/√2)(√3(b_w − b)/√2) + (√3 b_w/√2)((g − r − g_w + r_w)/√2) ] / [ ((1 + g_w − r_w)/√2)((g_w − r_w − g + r)/√2) + (√3 b_w/√2)(√3(b_w − b)/√2) ]

= √3 [ b_w(1 + g − r) − b(1 + g_w − r_w) ] / [ (1 + g_w − r_w)(g_w − r_w − g + r) + 3b_w(b_w − b) ]      (7.180)
The value of the saturation is given by the length of vector WP of figure 7.31:

S = √((x − x_w)² + (y − y_w)²) = (1/√2) √[(g − r − g_w + r_w)² + 3(b − b_w)²]      (7.181)
How can we emulate the spatial dependence of colour perception in image processing?
First, the RGB bands of an image are converted into the XYZ bands using transformation (7.96):
[X]   [2.769  1.752  1.130] [R]
[Y] = [1.000  4.591  0.060] [G]      (7.182)
[Z]   [0.000  0.057  5.594] [B]
Then the opponent colour bands O_1, O_2 and O_3 are computed using transformation (7.147), which is repeated here for convenience:
[O_1]   [ 0.279   0.720  −0.107] [X]
[O_2] = [−0.449   0.290  −0.077] [Y]      (7.183)
[O_3]   [ 0.086  −0.590   0.501] [Z]
The opponent colour bands may be blurred with kernels that are designed to imitate the way the human eye blurs colours. Three blurring kernels are required, one for each opponent colour band.

Channel            w_ki                  h_ki
O_1 (N_1 = 3)      w_11 = 1.003270       h_11 = 0.0500
                   w_12 = 0.114416       h_12 = 0.2250
                   w_13 = −0.117686      h_13 = 7.0000
O_2 (N_2 = 2)      w_21 = 0.616725       h_21 = 0.0685
                   w_22 = 0.383275       h_22 = 0.8260
O_3 (N_3 = 2)      w_31 = 0.567885       h_31 = 0.0920
                   w_32 = 0.432115       h_32 = 0.6451

Table 7.14: Parameters for blurring the O_1, O_2 and O_3 channels in order to imitate the human way of seeing colours from a distance.

Each of these blurring kernels is constructed as the sum of some Gaussian
functions, as follows:
g_k(x, y) = Σ_{i=1}^{N_k} w_ki T_i e^(−γ_ki²(x² + y²))      (7.184)
Here, index k identifies the opponent colour band, (x, y) are the spatial coordinates of the 2D blurring mask, N_k is the number of components that are used to create the mask for channel k, T_i is a normalising constant that makes the sum of the elements of the ith component equal to 1, w_ki is the weight with which the ith component contributes to the mask for channel k, and γ_ki is the spreading parameter for the ith component of channel k, computed from the parameter h_ki, using:

γ_ki = √(2 ln 2) / (F h_ki − 1)      for F h_ki > 1      (7.185)

This formula comes about because h_ki is the half width at half maximum of the Gaussian function, ie it measures the distance from the centre where the value of the Gaussian drops to half its central value. This quantity is measured in degrees of visual angle. Factor F is used to convert the degrees of visual angle into pixels. Usually, product F h_ki is quite high, so removing 1 from it does not change it much. However, if this product is 1 or lower, the implication is that the spreading of the Gaussian is in subpixel values, and so this Gaussian may be omitted. (Note that γ_ki is inversely proportional to the standard deviation, and if γ_ki → +∞, the standard deviation goes to 0.) So, setting the corresponding weight w_ki to 0 when F h_ki ≤ 1 safeguards against creating filters with sub-pixel spread. Table 7.14 lists the values of parameters w_ki and h_ki for each channel.
Note that spatial coordinates (x, y) have to be in pixels. To emulate the human visual
system, we must relate the resolution of the sensor measured in mm per pixel, to the distance
from which the particular object is assumed to be seen.
After the O_1, O_2 and O_3 channels have been blurred, they are transformed back to the XYZ space, using the inverse of transformation (7.147), on page 741, and from there back to the RGB space, using the inverse of transformation (7.182), for displaying or further processing. This sequence of transformations is called the S-CIELAB colour system.
Example B7.28
Consider the image of Plate VIa. It was scanned so that a 13cm picture
was represented by 512 pixels. Blur this image so that it looks the way it
would appear when seen from distances 2, 4 and 6 metres. To avoid having
to introduce a background, use the full image as the scene, but concentrate
your processing only to the central part.
The scale of the image is r = 512/130 ≈ 4 pixels per mm. Next, we shall work out the size of an object at a distance of 2m, 4m and 6m. Figure 7.34 shows how the distance D of the seen object is related to the size S of the object and the visual angle θ.

tan θ = S/D  ⇒  S = D tan θ      (7.186)
By setting θ = 1° and measuring D in mm, we can work out how many mm 1° of visual angle corresponds to, for the selected distance. This way, we work out that S_2 = 34.91, S_4 = 69.82 and S_6 = 104.73 mm per degree of visual angle, for distances 2, 4 and 6 metres, respectively. If we then multiply these values with r, we shall convert them into pixels per degree of visual angle and thus have the factor F we need in order to work out the smoothing filters. It turns out that F_2 = 139.64, F_4 = 279.28 and F_6 = 418.92.
These values of F are used in conjunction with formulae (7.184) and (7.185), and table 7.14, to work out the three smoothing masks for each distance. For a 4m distance, the cross-sections of the constructed filters are shown in figure 7.35. We note that the filter for smoothing the O_1 component, that corresponds to image brightness, is the narrowest, while the filter for smoothing the O_3 component, that corresponds to the blue-yellow colours, is the broadest. In fact, the three filters for the three distances, with truncation error 0.1, turned out to be of sizes 11 × 11, 15 × 15 and 21 × 21 for 2m, 23 × 23, 33 × 33 and 45 × 45 for 4m, and 37 × 37, 51 × 51 and 69 × 69 for 6m.
Figure 7.36 shows the original opponent colour channels of the input image and the blurred ones, with the filters used for the 4m distance. The final colour images constructed are shown in Plates VIb, VIc and VId.
Figure 7.34: The relationship between the distance and size of an object, and the visual angle.
Figure 7.35: The smoothing filters for distance 4m, for the O_1, O_2 and O_3 channels, from left to right, respectively.
Figure 7.36: The original opponent channels (on the left) and after they have been blurred with the filters for a distance of 4m (on the right).
What is the relevance of the phenomenon of metamerism to image processing?
Obviously, the metamers of the human eye are different from the metamers of the electronic sensors, because of the different spectral responses of the two types of sensor. The phenomenon of metamerism, then, is of paramount importance when one wishes to use computer vision to grade colours of objects that are aimed for the consumer market. For example, if one wishes to grade ceramic tiles in batches of the same colour shade, tiles placed in the same batch by the computer should also be placed in the same batch by a human; otherwise, the grading performed will not be acceptable to the consumer: people may receive tiles that are of different colour shade, while the computer thought the tiles packed together belonged to the same colour shade.
How do we cope with the problem of metamerism in an industrial inspection application?
The problem we wish to solve here may be expressed as follows. If the values recorded by a set of three colour sensors for a particular pixel are (Q_1, Q_2, Q_3), what would the recorded values be if the same surface patch were seen by another set of three colour sensors with different sensitivities, but under the same illumination conditions? In practice, the first set of recorded values (Q_1, Q_2, Q_3) would be the (R, G, B) values of a pixel in the image captured by an ordinary colour camera and the second set of values would be the values recorded by the human eye.
Assuming that we work with sampled values of the wavelength λ, equation (7.80), on page 711, may be used in digital form to express the values that will be recorded by the camera sensors, with sensitivities S_1(λ), S_2(λ) and S_3(λ), under illumination I(λ), for an object with reflectance function R(λ):

Q_1 = Σ_{i=1}^{N} S_1(λ_i) I(λ_i) R(λ_i)
Q_2 = Σ_{i=1}^{N} S_2(λ_i) I(λ_i) R(λ_i)
Q_3 = Σ_{i=1}^{N} S_3(λ_i) I(λ_i) R(λ_i)      (7.187)
Here N is the number of points we use to sample the range of wavelengths λ. Typically, N = 31, as λ is in the range [400nm, 700nm] and we sample with a step of 10nm.
If we want to know what values another set of sensors would record, under the same illumination and for the same object, we have to solve these equations for R(λ_i) and use these values in another set of similar equations, where the sensitivity functions are replaced with the sensitivity functions of the new set of sensors. This, clearly, is not possible, as we have N >> 3 unknowns and only 3 equations. So, we have to work in an indirect way to recover the transformation from the 3D space of the colour camera values to the 3D space of the human eye values.
As the equations involved are linear, we may expect that the transformation we are seeking between the two spaces of recorded values is also linear, expressed by a matrix A. This is expressed schematically in figure 7.37.
We may then work as follows. Here, for simplicity, we assume that N = 31, so that a reflectance function is a 31D vector.
• Choose with a Monte-Carlo method M points in the 31D reflectance function space, that map through equations (7.187) into a small ball of radius E centred at point (E_1^0, E_2^0, E_3^0) in the 3D human sensor space. Call the values of these points in the 3D human sensor space (E_1^i, E_2^i, E_3^i), where index i is used to identify the selected point.

• Find the values the camera sensors would record for the chosen 31-tuples, which represent the selected reflectance functions and which are mapped within the ball of radius E in the human sensor space. Let us call them (Q_1^i, Q_2^i, Q_3^i).
• Solve, in the least square error sense, the system of equations

[E_1^i]   [a_11  a_12  a_13] [Q_1^i]
[E_2^i] = [a_21  a_22  a_23] [Q_2^i]      for i = 1, 2, ..., M      (7.188)
[E_3^i]   [a_31  a_32  a_33] [Q_3^i]

for the unknown values a_ij of matrix A.
• Repeat that for many balls of radius E, centred at several locations (E_1^0, E_2^0, E_3^0) of the 3D human sensor space.
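Given the M corresponding pairs, the least square error solution of (7.188) can be obtained with a standard linear least squares routine. A minimal sketch (Python with NumPy; the arrays Q and E are assumed to hold the M camera and eye triplets produced by the Monte-Carlo sampling described above):

```python
import numpy as np

def estimate_A(Q, E):
    """Least square error estimate of the 3x3 matrix A in E_i = A Q_i.

    Q, E: arrays of shape (M, 3) holding the camera and eye sensor triplets.
    """
    # Stacking the equations row-wise gives E = Q A^T, solved column by column by lstsq
    At, residuals, rank, sv = np.linalg.lstsq(Q, E, rcond=None)
    return At.T

# hypothetical usage, once Q and E have been generated:
# A = estimate_A(Q, E)
# eye_values = camera_values @ A.T
```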
Once the transformation from the camera space to the human visual system space has been worked out, the (R, G, B) values of a colour seen by the camera may be converted to the corresponding values in the human visual system space. Subsequent clustering of colours can then proceed in this human visual system space.
In a practical application of grading ceramic tiles, it has been reported that all matrices A, constructed for different locations (E_1^0, E_2^0, E_3^0), turned out to be identical, within some tolerance. Thus, the transformation from the camera recordings to the human recordings had been identified, and used to work out, from the camera values, for each ceramic tile, the values that would have been seen by a human observer. By clustering the transformed values, batches of tiles of indistinguishable colour shade were then created, the same way they would have been created by human inspectors.
What is a Monte-Carlo method?
A Monte-Carlo method is a rather brute force method of solving a problem by using random numbers. In the particular case of the metamers of a reflectance function with respect to a particular sensor, we have to choose other reflectance functions that are mapped to the same values by the sensor. There is no analytic way to do that. All we can do is to choose at random reflectance functions, test each one of them to see whether it produces eye response values near enough to the values of the original reflectance function, so that the human brain will confuse them as identical, and, if so, keep it; if not, disregard it. Another application of the Monte-Carlo approach is the estimation of the volume of an object defined by a surface that cannot be easily described analytically (see example 7.29).
Figure 7.37: Assume that you are given a uniformly coloured object with reflectance function represented by the point marked with the largest filled circle in the 31D space of discretised reflectance functions. You are required to find metamers of this reflectance function. By randomly trying many 31-tuples of values, you deduce that the points marked inside the indicated circle in the 31D space of reflectance functions are mapped to points inside a small sphere of radius E in the 3D space of human sensor values, such that the human eye cannot distinguish them, although they are mathematically distinct. Mapping M1, from the reflectance function space to the human sensor space, is done through equations (7.187), with the knowledge of the sensitivity curves of the sensors of the human eye. All reflectance functions that are mapped via M1 to the particular sphere of radius E, in the 3D space of human sensor values, constitute the metamers of the original reflectance function we were given. The same points are also mapped via mapping M2 to the 3D space of the values of the camera we are using for the inspection task. Mapping M2 is done with the same equations as mapping M1, only now the sensitivity curves we use are those of the camera. As mappings M1 and M2 are different, the points in the 3D space of the values of the camera are not expected to be particularly close to each other. That is, they are not expected to constitute metamers of the original reflectance function with respect to the camera sensors. We can use the pairs of corresponding points between the two sensor spaces, to work out the elements of a 3 × 3 matrix A, that will allow the mapping from camera sensor values to eye sensor values, and which will be valid locally in these two spaces. It has been reported that matrix A worked out this way may also be valid globally, as it does not depend on the exact location of the sphere of metamers in the 3D space of human sensor values.
Example B7.29

Estimate the area inside the closed curve in figure 7.38a, by using the Monte-Carlo approach. Assume that you have means to know when a point is inside the curve and when it is outside.

Figure 7.38: (a) We would like to estimate the area enclosed by the curve. (b) Randomly selected points uniformly distributed inside the rectangle.
The shape of the curve cannot be described by an analytic function, which we could integrate to obtain the area enclosed by the curve. We can, however, draw uniformly distributed numbers in the area (x_max − x_min) × (y_max − y_min), which defines the rectangle inside which the curve lies, and test whether the drawn points are inside the curve or outside. In this example, we drew 300 points randomly placed inside the rectangle. We counted 112 points inside the curve. We deduce, therefore, that the area of the curve is 112/300 ≈ 0.37 of the total area of the rectangle. The area of the rectangle was measured to be equal to 380 unit tiles, so the area of the curve is 380 × 0.37 = 140.6 unit tiles. Obviously, the more points we draw, the more accurate our estimate is.
How do we remove noise from multispectral images?
Gaussian noise is usually removed by low pass filtering each band separately. The problem is the removal of impulse noise. As we saw in Chapter 4, impulse noise requires the use of a rank order filter, like the median filter. When, however, the values of the pixels are vectors, it is not straightforward to define a rank for the spectra of the pixels inside a window. Nevertheless, scientists have done that and defined the so-called vector median filter.
How do we rank vectors?
Consider vectors x_1, x_2, ..., x_N. Let us assign to each vector x_i a number d_i, which is the sum of the distances this vector has from all other vectors. The median vector is the one with the minimum sum distance from all other vectors. The distance between any two vectors may be measured by adopting an appropriate norm. For example, one may use the L_1 norm (see page 733). Alternatively, one may use the Euclidean metric. This is a good idea if one deals with colour images expressed in the Lab or Luv colour space, as the Euclidean metric expresses the perceived difference between colours in these spaces. This makes the use of this metric, in conjunction with the Lab or Luv colour space, most appropriate for colour image denoising.
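A direct transcription of this definition into code is straightforward. The sketch below (Python with NumPy; the function name is our own) uses the Euclidean metric, which, as argued above, is the natural choice for Lab or Luv data.

```python
import numpy as np

def vector_median(vectors):
    """Return the vector with the minimum sum of distances from all the others.

    vectors: array of shape (N, d), one spectrum per row.
    """
    v = np.asarray(vectors, dtype=float)
    # Pairwise Euclidean distances between all spectra in the window
    diff = v[:, None, :] - v[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d = dist.sum(axis=1)            # d_i: sum of distances of vector i from all the others
    return v[np.argmin(d)], d
```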
Example 7.30
Work out the vector median of the following set of vectors:
x_1 = (1, 2, 3)      x_2 = (0, 1, 3)      x_3 = (2, 2, 1)
x_4 = (3, 1, 2)      x_5 = (2, 3, 3)      x_6 = (1, 1, 0)
x_7 = (3, 3, 1)      x_8 = (1, 0, 0)      x_9 = (2, 2, 2)      (7.189)
The sum of distances of vector x_1 from all other vectors is:

d_1 ≡ (x_1 − x_2)² + (x_1 − x_3)² + (x_1 − x_4)² + (x_1 − x_5)² + (x_1 − x_6)² + (x_1 − x_7)² + (x_1 − x_8)² + (x_1 − x_9)²
= (1² + 1²) + (1² + 2²) + (2² + 1² + 1²) + (1² + 1²) + (1² + 3²) + (2² + 1² + 2²) + (2² + 3²) + (1² + 1²)
= 49      (7.190)
In a similar way, we compute the d_i values associated with all other vectors:

d_1 = 49      d_2 = 73      d_3 = 34
d_4 = 49      d_5 = 61      d_6 = 61
d_7 = 64      d_8 = 82      d_9 = 31      (7.191)

According to the d_i values, the median is vector x_9.
How do we deal with mixed noise in multispectral images?
If an image is affected by Gaussian as well as impulse noise, then we may use the α-trimmed vector median filter: after we rank the vectors inside the smoothing window, we keep only the N(1 − α) vectors with the smallest distance from the others. We may then compute only from them the mean spectrum that we shall assign to the central pixel of the window.
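Building on the same distances d_i, the α-trimmed version keeps only the N(1 − α) vectors with the smallest d_i and averages them. A minimal sketch (Python with NumPy; the function name is our own):

```python
import numpy as np

def alpha_trimmed_vector_mean(vectors, alpha=0.2):
    """Mean of the N(1 - alpha) vectors with the smallest sum of distances from the others."""
    v = np.asarray(vectors, dtype=float)
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1)).sum(axis=1)
    keep = int(len(v) * (1.0 - alpha))            # eg 9 x 0.8 = 7.2 -> 7 vectors
    order = np.argsort(d)[:keep]
    return v[order].mean(axis=0)
```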
Example 7.31
For the vectors of example 7.30, calculate the α-trimmed mean vector, for α = 0.2.

Since we have N = 9 vectors, from the ranked sequence of vectors we must use only the first 9 × (1 − 0.2) = 7.2 ≈ 7 vectors. This means that we must ignore vectors x_2 and x_8, which have the two largest distances. The mean value of the remaining vectors is: x̄ = (2.0000, 2.0000, 1.7143).
Example 7.32
Plate Va shows a 3-band 512 × 512 image affected by impulse noise, while Plate Vb shows the same image affected by impulse and Gaussian noise. Use vector median filtering and α-trimmed vector median filtering, with α = 0.2, respectively, to clean these images.

As we are going to use Euclidean distances to compute the median, we first convert the image into the Luv space, using the process described on page 742. Then, we adopt a 3 × 3 window and identify the pixel with the median colour in the Luv space, from the 9 pixels inside the window. The RGB values of that pixel are assigned to the central pixel of the window. The result is shown in Plate Vc.
For the mixed noise, we decide to use a window of size 5 × 5, so that, after we have kept only the 80% of the pixels with the smallest distances from all others inside the window, we shall have a reasonable number of values to average to reduce the Gaussian noise. For 25 pixels, 80% means that we keep the 20 pixels with the smallest distance from all other pixels. We then average their RGB values and assign the result to the central pixel. We average the RGB values, as opposed to averaging the Luv values, because the noise in the RGB space is uncorrelated and additive Gaussian to a good approximation, while in the Luv space, most likely, it is not, due to the nonlinear transformations used to obtain the Luv values. So, it makes more sense to average values we have good reasons to believe suffer from zero-mean, additive and uncorrelated noise, rather than to average values we know suffer from much more complicated noise. The result of using this filter is shown in Plate Vd.
How do we enhance a colour image?
We may convert the RGB values into hue and saturation. Then set the saturation of each
pixel to a fraction of the maximum possible and work back new RGB values, keeping the
same hue. The algorithm is as follows.
Step 0: Work out the (r, g, b) values of each pixel from its (R, G, B) values by using (7.86), on page 722. Select a value of α in the range [0, 1/√6].
Step 1: Work out the so-called value V_p for each pixel, defined as the maximum of its R, G and B values.
Step 2: Work out the (x_p, y_p) coordinates of the pixel on the Maxwell triangle, as given by (7.93) on page 724.
Step 3: Compute the saturation of the pixel using formula (7.165) on page 747.
Step 4: If the saturation of the pixel is below a threshold, leave its values unchanged. This will allow the retention of the white pixels of the image.
Step 5: If the saturation of the pixel is above the threshold, compute the hue of the pixel using formula (7.166) in conjunction with formulae (7.164). Note that Numerator means the quantity that is inside the absolute value in the numerator of (7.166) and Denominator means the quantity that is inside the absolute value in the denominator of (7.166).
Step 6: Set β = (4√3α + √2)/6 and

x_0 = √6 α x_p / (1 − √6 y_p)                               if 0° ≤ hue < 120°
x_0 = (3√2β − 1) x_p / (3√2 x_p + √6 y_p − 1)               if 120° ≤ hue < 240°
x_0 = (1 − 3√2β) x_p / (3√2 x_p − √6 y_p + 1)               if 240° ≤ hue < 360°

y_0 = 1/√6 − α                                               if 0° ≤ hue < 120°
y_0 = √3 [x_p + β(√6 y_p − 1)] / (3√2 x_p + √6 y_p − 1)      if 120° ≤ hue < 240°
y_0 = √3 [x_p − β(√6 y_p − 1)] / (3√2 x_p − √6 y_p + 1)      if 240° ≤ hue < 360°      (7.192)
Some of these formulae are derived in examples 7.34 and 7.35. The others can be derived in
a similar way.
Step 7: Calculate new (r, g, b) values for the pixel as follows:

b_new = √(2/3) y_0      r_new = (1 − b_new − √2 x_0)/2      g_new = (1 − b_new + √2 x_0)/2      (7.193)
For the derivation of these formulae, see example 7.36.
Step 8: Calculate new (R, G, B) values for the pixel as follows:

R_new = V_p r_new / c_max      G_new = V_p g_new / c_max      B_new = V_p b_new / c_max      (7.194)

where c_max ≡ max{r_new, g_new, b_new}.
Plates VII and VIII show two images and their enhanced versions produced by this algorithm.
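For the last two steps of the algorithm, the code is just a transcription of (7.193) and (7.194). A minimal sketch (Python; x0 and y0 are assumed to have been computed in Step 6 and Vp in Step 1; the function name is our own):

```python
import math

def new_rgb(x0, y0, Vp):
    """Steps 7 and 8: new (R, G, B) values from the saturated Maxwell-triangle point (x0, y0)."""
    b_new = math.sqrt(2.0 / 3.0) * y0                  # equation (7.193)
    r_new = (1.0 - b_new - math.sqrt(2.0) * x0) / 2.0
    g_new = (1.0 - b_new + math.sqrt(2.0) * x0) / 2.0
    c_max = max(r_new, g_new, b_new)                   # equation (7.194)
    return Vp * r_new / c_max, Vp * g_new / c_max, Vp * b_new / c_max
```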
Example 7.33

Work out the maximum saturation you can assign to a pixel with hue in the range [240°, 360°) without altering its hue.
Figure 7.39: The maximum saturation a pixel P can have, for fixed hue, is given by the colour of point (x_0, y_0).
Consider a pixel P with coordinates (x_p, y_p) on the Maxwell triangle, as shown in figure 7.39. Since the hue of this pixel is in the range [240°, 360°), the pixel is nearest to the Red-Blue side of the triangle (see example 7.23). We assume that the reference white W is the ideal white, with coordinates r_w = g_w = b_w = 1/3, which translate into (0, 1/√6) on the Maxwell triangle. The coordinates of R and B are (−√2/2, 0) and (0, √(3/2)), respectively, so the equation of the RB line is:

(x − 0)/(−√2/2 − 0) = (y − √(3/2))/(0 − √(3/2))  ⇒  x = (y − √(3/2)) (1/√3)  ⇒  x = y/√3 − 1/√2      (7.195)
The equation of the WP line is:

(x − 0)/(y − 1/√6) = (x_p − 0)/(y_p − 1/√6)  ⇒  x = (y − 1/√6) x_p/(y_p − 1/√6)      (7.196)
We can combine these two equations to solve for (x_0, y_0), which is the point where the two lines intersect:

y_0/√3 − 1/√2 = (y_0 − 1/√6) x_p/(y_p − 1/√6)
⇒ y_0 [1/√3 − x_p/(y_p − 1/√6)] = 1/√2 − x_p/(√6 y_p − 1)
⇒ y_0 = [1/√2 − x_p/(√6 y_p − 1)] / [1/√3 − √6 x_p/(√6 y_p − 1)]
⇒ y_0 = √(3/2) (√6 y_p − 1 − √2 x_p) / (√6 y_p − 1 − 3√2 x_p)      (7.197)
We may then set y = y_0 in (7.195) to work out the value of x_0:

x_0 = −2x_p / (3√2 x_p − √6 y_p + 1)      (7.198)
The new saturation of pixel P will be:

New Saturation = √[ x_0² + (y_0 − 1/√6)² ]      (7.199)
Example 7.34

Work out a value for increased saturation you may assign to the pixel represented by point P in figure 7.39, without altering its hue.

This problem is similar to the problem of example 7.33, but now we are seeking the intersection of line WP with a line parallel to line RB and closer to P (see figure 7.40). The equation of line RB is (7.195). A line parallel to it has equation

x = y/√3 − β      (7.200)

where β is some positive constant. Note that if β = 1/√2, this line is the RB line. Obviously, the limiting case is for this line to pass through point P. In that case, parameter β takes the value:

x_p = y_p/√3 − β  ⇒  β = y_p/√3 − x_p      (7.201)

So, the range of allowable values of β is y_p/√3 − x_p < β ≤ 1/√2.
Figure 7.40: The saturation of pixel P can be increased without altering its hue, if pixel P is given the colour of point (x_0, y_0), which is the intersection of line WP and a line parallel to RB.
We can now combine equations (7.200) and (7.196) to work out their intersection point:

y_0/√3 − β = (y_0 − 1/√6) x_p/(y_p − 1/√6)
⇒ y_0 [1/√3 − x_p/(y_p − 1/√6)] = β − x_p/(√6 y_p − 1)
⇒ y_0 = [β − x_p/(√6 y_p − 1)] / [1/√3 − √6 x_p/(√6 y_p − 1)]
⇒ y_0 = √3 [β(√6 y_p − 1) − x_p] / (√6 y_p − 1 − 3√2 x_p)      (7.202)
We may then set y = y_0 in (7.200) to work out the value of x_0:

x_0 = (1 − 3√2β) x_p / (3√2 x_p − √6 y_p + 1)      (7.203)
The new saturation of pixel P can be computed using (7.199).
Example 7.35

We decide to saturate all pixels of an image on the perimeter of the grey triangle in figure 7.41, characterised by the value of parameter α as defined in the same figure. Work out the relationship between β of example 7.34 and α.

α ≡ WS = WT = WU      β ≡ OF = OH

Figure 7.41: When we do not want extreme saturation for the enhanced image pixels, we may assign to each pixel the saturation defined by the perimeter of the grey triangle, characterised by parameter α.
Remembering that RG = √2 and using the property of equilateral triangles that GK/GR = GW/GA = 2/3, we work out that, when the pixel has hue in the range [240°, 360°):

β = (4√3α + √2)/6      (7.204)
Example 7.36
Compute the enhanced RGB values for the pixel of example 7.33.
We remember that the (x, y) coordinates of a pixel in the Maxwell triangle are expressed in terms of the (r, g, b) coordinates of the pixel as ((g − r)/√2, √3 b/√2). Setting the second coordinate equal to y_0 yields:

b = √(2/3) y_0      (7.205)

Setting the first coordinate equal to x_0, so that g − r = √2 x_0, and using r + g + b = 1, yields:

g = (1 − b + √2 x_0)/2      r = (1 − b − √2 x_0)/2      (7.206)
(7.206)
How do we restore multispectral images?
In general, blurring, particularly motion blurring, is expected to affect all channels the same. So, the restoration filter we deduce for one channel is applicable to all channels and it should be used to restore each channel separately, unless we have reasons to believe that the channels have different levels of noise, in which case the component of the filter (Wiener filter (page 429), or constraint matrix inversion filter (page 456)) that is designed to deal with noise will have to be different for the different channels.
How do we compress colour images?
The human visual system is much more sensitive to variations of brightness than to variations in colour. So, image compression schemes use more bits to encode the luminance component of the colour system and fewer bits to represent the chroma components (eg (u, v) or (a, b)). This, of course, applies to colour systems that separate the luminance component from the two chroma components. For multispectral images and for RGB colour images, where all components contain a mixture of luminance and chroma information, all channels are treated the same.
How do we segment multispectral images?
The most commonly used method for segmenting multispectral images is to treat the values of each pixel in the different bands as the features of the pixel. In other words, for an L-band image we consider an L-dimensional space and we measure the value of a pixel in each band along one of the axes in this space. This is the spectral histogram of the image. For a 3-band image, this is known as the colour histogram of the image.
Pixels with similar spectra are expected to cluster together. If we know a priori the number of different spectra we expect to have in the image, ie if we know a priori the number of clusters in the L-dimensional spectral space, then we can identify the clusters by using simple k-means clustering.
Alternatively, if we wish to take into consideration the locality of the pixels too, we may use the generalised mean shift algorithm we saw in Chapter 6, on page 574. The generalisation of this algorithm for multispectral images is trivial: each pixel, instead of being characterised by three numbers (its grey value and its two spatial coordinates), is characterised by L + 2 numbers, namely its values in the L bands and its two spatial coordinates. The values of these L + 2 parameters are updated according to the mean shift algorithm.
How do we apply k-means clustering in practice?
Step 0: Decide upon the number k of clusters you will create.
Step 1: Select at random k vectors that will be the centres of the clusters you will create.
Step 2: For each vector (ie each pixel) identify the nearest cluster centre and assign the pixel to that cluster.
Step 3: After all pixels have been assigned to clusters, recalculate the cluster centres as the average vectors of the pixels that belong to each cluster and go to Step 2.
The algorithm stops when at some iteration step no pixel changes cluster. At the end, all pixels that belong to the same cluster may be assigned the spectrum of the centre of the cluster. As the centre of the cluster may not coincide with a pixel, the colour of the pixel that is nearest to the centre is selected. This algorithm has the problem that it often converges to bad clusters, as it strongly depends on the initial choice of cluster centres. A good strategy is to run the algorithm several times, with different initial guesses of the cluster centres each time, and at the end to select the cluster assignment that minimises the sum of the distances of all pixels from their corresponding clusters.
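A compact implementation of Steps 0-3, for pixels given as L-dimensional vectors, might look as follows (Python with NumPy; the restart strategy described above is left to the caller, and the function name is our own).

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, seed=0):
    """Cluster the rows of pixels (shape (num_pixels, L)) into k clusters."""
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)  # Step 1
    labels = np.full(len(pixels), -1)
    for it in range(max_iter):
        # Step 2: assign each pixel to the nearest cluster centre
        dist = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                        # no pixel changed cluster
        labels = new_labels
        # Step 3: recalculate each cluster centre as the mean of its pixels
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    return labels, centres
```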
If the input image is a 3-band image and the outcome of the segmentation is to produce an image that is more pleasing to the human eye, or the segmentation is aimed at reducing the number of colours in the image to achieve image compression, then it is a good idea for this algorithm to be applied in the Lab or the Luv colour space, where the Euclidean metric reflects human colour perception. If, however, we are not interested in correct colour assignment, or if we are dealing with multispectral images, the algorithm may be applied directly to the raw spectral values.
Example 7.37
Segment the image of Plate IXa using k-means clustering and mean shift clustering. At the end, assign to each pixel the mean colour of the cluster to which it belongs.

As we use the Euclidean distance to compute colour differences, before we perform any clustering we have to convert the image into either the Lab or the Luv space. We decided to work in the Luv space. Assuming that the RGB values of this image are sRGB, we must use the procedure described on page 742 for this transformation.
The k-means clustering was run with 10 different random initialisations of the cluster centres and at the end the clustering with the minimum total energy was selected. (The energy was computed as the sum of the square distances of all pixels from their corresponding cluster centres.) The number of clusters was selected to be k = 10. The result is shown in Plate IXb. The average RGB values of the pixels that belong to a cluster were assigned to all the pixels of that cluster.
For the mean shift algorithm, we must select values for the scaling parameters. We selected h_x = h_y = 15, because, in previous experiments presented in this book, and for images of this size, these values gave good results. A quick check of the range of values of the Luv representation of the image revealed that the ranges of values for the three components were: L ∈ [2, 100], u ∈ [42, 117] and v ∈ [−77, 72]. These ranges are of roughly the same order of magnitude, and we selected to use h_L = h_u = h_v = 15. The result of the mean shift algorithm, run in the static feature space, is shown in Plate IXc. The trajectory of each pixel in the static feature space was assumed to have converged when the square difference between two successive points was less than 0.0001.
For comparison, the mean shift algorithm was run assuming CIE RGB values for the original image. In this case, L ∈ [71, 266], u ∈ [191, 424] and v ∈ [182, 225], so we selected h_L = h_u = h_v = 25 and again h_x = h_y = 15. The result is shown in Plate IXd. The trajectory convergence threshold was again 0.0001.
How do we extract the edges of multispectral images?
Often, edge detection is performed on the average band of a multispectral image. This is a quick and easy way to proceed, but it is not the best.
Edge detection is best performed on the first principal component of a multispectral image, as that has the maximum contrast over all original image bands.
If, however, the calculation of the first principal component is to be avoided, one may apply gradient magnitude estimation to each band separately and then, for each pixel, select the maximum gradient value it has over all bands. This way, a composite gradient map is created, which contains the maximum contrast a pixel has with its neighbours over all bands. The subsequent processing of this gradient map may then proceed as for the ordinary gradient map of a grey image.
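The composite gradient map described above can be produced with a few lines of code. The sketch below (Python with NumPy and SciPy's ndimage.sobel, which computes the Sobel derivative along one axis) takes the per-band maximum of the Sobel gradient magnitude; the function name is our own.

```python
import numpy as np
from scipy import ndimage

def composite_gradient(image):
    """Per-pixel maximum, over all bands, of the Sobel gradient magnitude.

    image: array of shape (rows, cols, bands).
    """
    grad = np.zeros(image.shape[:2])
    for b in range(image.shape[2]):
        band = image[..., b].astype(float)
        gx = ndimage.sobel(band, axis=1)     # horizontal derivative
        gy = ndimage.sobel(band, axis=0)     # vertical derivative
        grad = np.maximum(grad, np.hypot(gx, gy))
    return grad
```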
Example 7.38
Extract the edges of a 64 × 64 subimage of image IXa, using its first principal component, its average grey band and the composite gradient map.

The red band of the image, its average grey band (obtained by averaging the grey values of the three bands) and its first principal component are shown in figure 7.42. The Sobel edge detector was applied directly to the images in 7.42b and 7.42c. The results, superimposed on the original colour image, are shown in Plates Xa and Xb. To work out the composite gradient map, we computed the gradient magnitude in each band separately, using the Sobel filters, and accepted as gradient magnitude the maximum value of the three. The output of this edge detection is shown in Plate Xc. In all three cases, non-maxima suppression was performed using only the horizontal and vertical directions. In all three cases, the gradient magnitude was thresholded at 0.3 times the maximum value, to remove weak edges.
From Plate X, we can see that the use of the first principal component detected correctly some edges that were missed by using the average band, while the use of all three bands allowed the detection of some edges missed by the previous method, since they had very similar brightness in the grey versions of the image.

Figure 7.42: The red band of a subimage of Plate IXa, its average band and its first principal component. (a) Red band; (b) average band; (c) first PC.
What is the take home message of this chapter?
This chapter first of all stressed the difference between colour image processing and multispectral image processing. If the outcome of image processing is to be an image to be viewed by humans, or if the outcome has to be identical with the outcome a human would produce by taking into consideration the colour of the image, then one has to apply image processing taking into consideration the way humans see colours. Examples of such applications are image compression and visual industrial inspection of certain products, like cosmetics, ceramic tiles, etc.
In all other cases a colour image may be treated like any other multispectral image. For some processes, a grey version of the image may be adequate, and in such a case the first principal component may be used, as it contains most of the image information. In cases where spectral information is important, one may use the generalisations for multispectral images of several of the methods that were discussed in the previous chapters. The area where spectral information is most useful is that of image segmentation, where the spectral signature of a pixel may be used for segmentation and recognition, ie classification. Indeed, the spectral signature of a pixel may be used to work out what type of object this pixel comes from. This is less so in robotic vision, where colour is not really a strong indicator of object class (eg there are green and red and brown doors), but more so in remote sensing, where the spectral signature of a pixel may be used to identify its class (eg the spectral signature of grass is different from the spectral signature of concrete). In particular, hyperspectral images, made up from hundreds of bands, may be used to identify different minerals present in the soil. In the case of robotic vision, colour is hardly useful for generic class identification, but it may be very useful in tracking a particular object. For example, one may not use colour to identify cars, but a particular car may be tracked in a series of images with the aid of its specific colour. In those cases, a form of spectral constancy is necessary, ie methodology that allows the computer to compensate for the variation of the different illuminants under which the same object is seen.
Bibliographical notes
This book is not aimed at covering the most recently proposed algorithms. Conference proceedings and the latest journal issues are more appropriate for that. This book is about methods that have already been established and earned their worth in recent years. So, we do not include here a comprehensive list of papers on image processing - that would have been impossible anyway, due to the sheer volume of the output of the research community. We give, however, the key references that helped in shaping this book. This said, the authors have benefitted enormously from hundreds of publications over the years. The distilled information is very difficult to attribute to specific books and papers.
The references will be commented on chapter by chapter, where this is relevant, but there are some key books that helped us throughout. References [1], [24] and [50] proved to be invaluable sources of functions and formulae. Papoulis' book [47] is irreplaceable for anybody who wants to understand stochastic processes. For Fourier and Hilbert transforms, Bracewell's book is a classic [9] and excellent worked out examples can be found in [29]. Other classical books were very helpful too [61; 65; 62; 30; 23]. In certain places, reference to Book II is made. This is reference [55], where detailed information on many topics, which were simply touched upon here, like, for example, wavelets and mathematical morphology, may be found.
Chapter 2: For Walsh functions we relied heavily on [2]. The book contains some minor
mistakes, but it is the only comprehensive book on Walsh functions and it is generally well
written.
Chapter 3: For z-transforms and the Butterworth filter, [20] is a very readable book. For ICA we relied on [31], [32] and [35] and on many useful discussions with Nikos Mitianoudis, where we sorted out several inconsistencies and misunderstandings in the published literature.
Chapter 4: Examples 4.10 and 4.12, showing that noise can be white but not independent, were devised by Mike Brookes. For weighted median and mode filtering our source is [26]. For the retinex algorithm useful references were [34; 40; 41; 63]. Unsharp masking was first developed in Germany in the 1930s, but the first paper in the digital image processing community was [68]. The idea of pairwise image enhancement came from [33] and [36]. Toboggan enhancement was proposed in [17]. Details about the algorithm of anisotropic diffusion were found in [13; 49; 42].
Chapter 5: Example 5.66 on lens distortion is from [10], while [3] was found to be a
very useful tutorial on radial lens distortion models. The simulated annealing algorithm is
based on the classical paper by Geman and Geman [21]. We also found enormously helpful
[43] and [48]. The Renormalisation Group transform was introduced to image processing by
Gidas [22]. A tutorial can be found in [58] while an application to signal processing can be
found in [66]. Details on the super-coupling transform can be found in [4].
Chapter 6: The theory of Otsu's thresholding was proposed in [45]. Modifications and
refinements can be found in [38] and [59]. The ideas for nonlinear edge operators come from [25] and [60] and, more recently, from [54]. The theory of linear step edge operators and hysteresis thresholding comes from the classical work by Canny [11; 12] and from [70]. The extension to ramp edges was proposed in [56] and [53], while the adaptation of the theory for lines (page 609) was proposed in [51] and [57]. These works make use of the results in [64], which is a must for anybody who wishes to learn about filtering of noise. How to select optimally the two thresholds for hysteresis thresholding was studied in [27] and it can be found in tutorial [52] in a distilled form. The working out of the Sobel filter weights, in Boxes 6.5 and 6.6, was published first in [37]. The mean shift algorithm is from [14] and [15]. The normalised graph cuts algorithm was proposed in [69]. Good sources on phase congruency are [39] and [46], while the monogenic signal was introduced in [18].
Chapter 7: The algorithm on spectral constancy is from [19]. The sRGB system was proposed in [71]. A useful book for colour transformations is [67]. Reference [72] is a delightful book to read, if one wants to understand the psychophysics of colour vision, while the bible of the physics of colour is undoubtedly [73]. It is from [73] that the entries of tables 7.1, 7.3, 7.4 and 7.12 were taken. For linear spectral unmixing with robust or second order statistics one should consult [5; 6]. For linear spectral unmixing with negative and superunity mixing proportions, when the reference spectra are mixtures themselves, the reader is referred to [16]. It was possible to reproduce the colours of the Macbeth colour checker thanks to the information supplied in [28]. The work reported on how to deal with metamers in industrial inspection came from [7; 8]. The information on opponent colour space and the S-CIELAB transformation is from [74]. This transformation was used for the segmentation of colour images in [44].
References
[1] M Abramowitz and I A Stegun (eds), 1970. Handbook of Mathematical Functions, Dover
Publications, New York, ISBN 486 61272 4, Library of Congress Catalogue 6512253.
[2] K G Beauchamp, 1975. Walsh Functions and their Applications, Academic Press, ISBN
0-12-084050-2.
[3] A Bismpigiannis, Measurements and correction of geometric distortion, Stanford University, http://scien.stanford.edu/class/psych221/projects/07/geometric distortion/project.htm
[4] M Bober, M Petrou and J Kittler, 1998. Non-linear motion estimation using the super-coupling approach. PAMI-20:550-555.
[5] P Bosdogianni, M Petrou and J Kittler, 1997. Mixed Pixel Classification with Robust Statistics. TGRS-35:551-559.
[6] P Bosdogianni, M Petrou and J Kittler, 1997. Mixture models with higher order moments. TGRS-35:341-353.
[7] C Boukouvalas and M Petrou, 1998. Perceptual Correction for Colour Grading using Sensor Transformations and Metameric Data. Machine Vision and Applications, 11:96-104.
[8] C Boukouvalas and M Petrou, 2000. Perceptual Correction for Colour Grading of Random Textures. Machine Vision and Applications, 12:129-136.
[9] R N Bracewell, 1978. The Fourier Transform and its Applications, McGraw Hill, ISBN
0-07-007013-X.
[10] F M Candocia, A scale-preserving lens distortion model and its application to image registration, Florida Conference on Recent Advances in Robotics, FCRAR, Miami, May 25-26, 2006.
[11] J Canny, 1983. Finding edges and lines in images. MIT AI Lab Technical Report 720.
[12] J Canny, 1986. A computational approach to edge detection. PAMI-8:679-698.
[13] F Catte, P-L Lions, J-M Morel and T Coll, 1992. Image selective smoothing and edge detection by non-linear diffusion. SIAM Journal of Numerical Analysis, 29:182-193.
[14] Y Cheng, 1995. Mean shift, mode seeking and clustering. PAMI-17:2146.
[15] D Comaniciu and P Meer, 2002. Mean shift: a robust approach toward feature space analysis. PAMI-24:603-619.
[16] O Duran and M Petrou, 2009. Spectral unmixing with negative and superunity abundances for subpixel anomaly detection. TGRS-6:152-156.
[17] J Fairfield, 1992. Toboggan contrast enhancement. Proceedings of the SPIE Conference Applications of Artificial Intelligence X: Machine Vision and Robotics, K W Bowyer (ed), 22-24 April, Orlando, 1708:282-292.
[18] M Felsberg and G Sommer, 2001. The monogenic signal. TIP-49:3136-3144.
[19] G D Finlayson, B Schiele and J L Crowley, 1998. Comprehensive colour image normalisation, ECCV, I:475-490.
[20] R A Gabel and R A Roberts, 1987. Signals and Linear Systems, J Wiley, ISBN 0-471-
83821-7.
[21] S Geman and D Geman, 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. PAMI-6:721-741.
[22] B Gidas, 1989. A Renormalisation Group Approach to Image Processing Problems. PAMI-11:164-180.
[23] R C Gonzalez and R E Woods, 1992. Digital Image Processing, Addison Wesley, ISBN
0-201-50803-6.
[24] I S Gradshteyn and I M Ryzhik, 1980. Table of Integrals, Series and Products, Academic
Press, ISBN 0-12-294760-6.
[25] J Graham and C J Taylor, 1988. Boundary cue operators for model-based image processing. Proceedings of the fourth Alvey Vision Conference, AVC88, Manchester, 31 August-2 September, 59-64.
[26] L D Griffin, 2000. Mean, median and mode filtering of images. Proceedings of the Royal Society of London, 456:2995-3004.
[27] E R Hancock and J Kittler, 1991. Adaptive Estimation of Hysteresis Thresholds, CVPR, 196-201.
[28] D Holloway, 2004. http://www.mambo.net/cgi-bin/TempProcessor/view/113
[29] H P Hsu, 1970. Fourier Analysis, Simon and Schuster, New York.
[30] T S Huang (ed), 1979. Picture Processing and Digital Filtering, Topics in Applied Physics,
Vol 6, Springer-Verlag, ISBN 0-387-09339-7.
[31] A Hyvarinen, 1998. New approximations of differential entropy for independent component analysis and projection pursuit. Advances in Neural Information Processing Systems, 10:272-279.
[32] A Hyvarinen and E Oja, 2000. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13:411-430.
[33] T Jen, B Hsieh and S Wang, 2005. Image contrast enhancement based on intensity pair distribution. International Conference on Image Processing, 1:913-916.
[34] D J Jobson, Z Rahman and G A Woodell, 1997. Properties and performance of a center/surround retinex. TIP-6:451-462.
[35] M Jones and R Sibson, 1987. What is projection pursuit? Journal of the Royal Statistical Society, Series A, 150:1-36.
[36] M H Kabir, M Abdullah-Al-Wadud and O Chae, 2006. Image contrast enhancement based on block-wise intensity pair distribution with two expansion forces. 11th Iberoamerican Congress on Pattern Recognition, CIARP2006, November 14-17, Cancun, Mexico, pp 247-256.
[37] J Kittler, 1983. On the accuracy of the Sobel edge detector. Image and Vision Computing, 1:37-42.
[38] J Kittler and J Illingworth, 1985. On threshold selection using clustering criteria. SMC-15:652-655.
[39] P Kovesi, 1997. Invariant Measures of Image Features From Phase Information, PhD
thesis, University of Western Australia,
http://www.cs.uwa.edu.au/pub/robvis/theses/PeterKovesi/.
[40] E Land, 1986. An alternative technique for the computation of the designator in the retinex theory of colour vision. Proceedings of the National Academy of Science, 83:3078-3080.
[41] Y Li, R He, C Xu, C Hou, Y Sun, L Guo, L Rao and W Yan, 2008. Retinex enhancement of infrared images, 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, August 20-24, 2189-2192.
[42] B Mackiewich, 1995. Intracranial boundary detection and radio frequency correction in
MRI, PhD thesis, Simon Fraser University, Canada,
http://www.cs.sfu.ca/stella/papers/blairthesis/main/main.htm
[43] K V Mardia and G K Kanji, editors, 1993. Advances in Applied Statistics. Carfax Pub-
lishing Company, ISBN 0-902879-25-1.
[44] M Mirmehdi and M Petrou, 2000. Segmentation of colour textures. PAMI-22:142-159.
[45] N Otsu, 1979. A threshold selection method from gray level histograms. SMC-9:62-66.
[46] R Owens, 1997. Feature detection via phase congruency.
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL COPIES/OWENS
/LECT2/node3.html
[47] A Papoulis, 1965. Probability, Random Variables and Stochastic Processes, McGraw-Hill,
Library of Congress Catalogue 64-22956.
[48] K S Pedersen, 2004. From Bayes to pdes, Part II: Mumford-Shah, Geman-Geman and
MRFs, http://www.itu.dk/courses/MBAG/E2004/
[49] P Perona and J Malik, 1990. Scale-space and edge detection using anisotropic diffusion. PAMI-12:629-639.
[50] S Persidis, 2007. Mathematical Handbook. ISBN 978-960-7610-13-3, ESPI, Athens.
[51] M Petrou, 1993. Optimal convolution filters and an algorithm for the detection of wide linear features. IEE Proceedings I, Vision, Signal and Image Processing, 140:331-339.
[52] M Petrou, 1994. The Differentiating Filter Approach to Edge Detection. Advances in Electronics and Electron Physics, 88:297-345.
[53] M Petrou, 1995. Separable 2D filters for the detection of ramp edges. IEE Proceedings Vision, Image and Signal Processing, 142:228-231.
[54] M Petrou, V A Kovalev and J R Reichenbach, 2006. Three dimensional nonlinear invisible boundary detection. TIP-15:3020-3032.
[55] M Petrou and P Garcia Sevilla, 2006. Image Processing, dealing with texture. John Wiley
& Sons, Ltd, ISBN-13-978-0-470-02628-1.
[56] M Petrou and J Kittler, 1991. Optimal edge detectors for ramp edges. PAMI-13:483-491.
[57] M Petrou and A Kolomvas, 1992. The recursive implementation of the optimal filter for the detection of roof edges and thin lines. Signal Processing VI, Theory and Applications, 1489-1492.
[58] M Petrou, 1995. Accelerated optimisation in Image Processing via the Renormalisation Group Transformation. Complex Stochastic Systems and Engineering, Ed: D M Titterington, Clarendon Press, ISBN 0 19 853485 X, 105-120.
[59] M Petrou and A Matrucceli, 1998. On the stability of thresholding SAR images. Pattern Recognition, 31:1791-1796.
[60] I Pitas and A N Venetsanopoulos, 1986. Non-linear order statistic filters for image filtering and edge detection. Signal Processing, 10:395-413.
[61] W K Pratt, 1978. Digital Image Processing, John Wiley & Sons, Inc, ISBN 0-471-01888-0.
[62] R M Pringle and A A Rayner, 1971. Generalised inverse matrices with applications to Statistics, Being number twenty eight of Griffin's Statistical monographs and courses, edited by A Stuart, ISBN 0-85264-181-8.
[63] E Provenzi, M Fierro, A Rizzi, L De Carli, D Gadia and D Marini, 2007. Random spray retinex: a new retinex implementation to investigate the local properties of the model. TIP-16:162-171.
[64] S O Rice, 1945. Mathematical analysis of random noise. Bell Systems Tech. J., 24:46-156.
[65] A Rosenfeld and A C Kak, 1982. Digital Picture Processing, Academic Press, ISBN
0-12-597301-2.
[66] M Samonas and M Petrou, 1999. Multiresolution Restoration of Medical Signals using the Renormalisation Group and the Super-coupling Transforms. Computers in Biology and Medicine, 29:191-206.
[67] S J Sangwine and R E N Horne, (eds), 1998. The Colour Image Processing Handbook.
Chapman and Hall, ISBN 0-412-80620-7.
[68] W F Schreiber, 1970. Wirephoto Quality Improvement by Unsharp Masking. Pattern Recognition, 2:117-121.
[69] J Shi and J Malik, 2000. Normalised cuts and image segmentation. PAMI-22:888-905.
[70] L A Spacek, 1986. Edge detection and motion detection. Image and Vision Computing, 4:43-53.
[71] M Stokes, M Anderson, S Chandrasekhar and R Motta, 1996. A standard default colour space for the Internet-sRGB, http://www.w3.org/Graphics/Color/sRGB.html
[72] B A Wandell, 1995. Foundations of Vision. Sinauer Associates, Inc. Publishers, Sunder-
land, Massachusetts, ISBN 0-87893-853-2.
[73] G Wyszecki and W S Stiles, 1982. Color Science. ISBN 0-471-02106-7, John Wiley &
Sons, Inc, New York.
[74] X Zhang and B A Wandell, 1996. A spatial extension of CIELAB for digital color image
reproduction. SID journal.
http://white.stanford.edu/brian/scielab/introduction.html
Index
D55, 705, 706
D65, 705, 706
for sRGB, 732
D75, 705, 706
α-trimmed filter, 760
χ²-test, 237, 239
k-means clustering, 767
z-transform, 294, 301
3D geometry, 721
algorithm
k-means clustering, 767
anisotropic diffusion, 349
averaging angles, 629
colour constancy, 687
colour enhancement, 761
constrained matrix inversion, 464
DFT, 102
edge detection, 602
edge detection as zero crossings, 623
endmember spectra by ICA, 697
endmember spectra by PCA, 696
expectation maximisation, 537
Gauss-Seidel method, 485
gradient descent, 487
Haar transform, 89
histogram equalisation, 370
histogram equalisation with random ad-
ditions, 372
histogram hyperbolisation, 373
histogram hyperbolisation with random
additions, 374
Hough, 520
hysteresis edge linking, 621
ICA, 274
inverse filtering, 410
Jacobi's method, 482
K-L transform, 214, 215
knee, 549
linear spectral unmixing, 692, 696, 697
local energy, 651
local variance, 339
metamer mapping, 757
monogenic signal, 660
normalised graph cuts, 589
pairwise image enhancement, 378
PCA, 674
PCA for material spectra, 709
power method, 682
reconstruction with unknown degradation
matrix, 489
retinex, 360
Riesz transform, 660
simulated annealing, 539
simulated annealing with Gibbs sampler,
507
simulated annealing with Metropolis sam-
pler, 506
spectral constancy, 687
sRGB to XYZ, 743
standard illuminant spectra, 705
successive doubling, 124
SVD, 69
thresholding a unimodal histogram, 549
toboggan enhancement, 387, 389
unsharp masking, 357
unsharp masking, adaptive, 358
Walsh transform, 91
watershed, 566
whirl transform, 469, 472
whirl transform, scrambling matrix for,
473
Wiener filtering, 431
alychne, 725
ambient light, 351
analytic signal, 659
anisotropic diffusion, 337, 342, 348
algorithm for, 349
approximation of image
by DFT, 103, 176
by EDCT, 148, 176
by EDST, 165, 166, 176
by Haar transform, 79, 89, 176
by Hadamard transform, 91, 176
by K-L, 220
by ODCT, 156, 176
by ODST, 175, 176
by SVD, 63, 176
by Walsh transform, 91, 176
error of
by DFT, 103, 176
by EDCT, 148, 176
by EDST, 165, 166, 176
by Haar transform, 79, 89, 176
by Hadamard transform, 91, 176
by K-L, 220
by ODCT, 156, 176
by ODST, 175, 176
by SVD, 63, 176
by Walsh transform, 91, 176
astronomical image, 402
attribute, 551
autocorrelation function, 211, 216, 313, 424
ensemble, 190
spatial, 196, 325
autocorrelation matrix, 211, 216
spatial, 215
autocovariance, 190
autocovariance matrix, 201
average angle, 629
axis of symmetry, 202
band, 2
barrel distortion, 513
basis images, 75
DFT, 101
EDCT, 146
EDST, 163
Haar, 80, 93
Hadamard, 88
ICA, 281
independent, 292
K-L, 221
ODCT, 154
ODST, 173
orthogonal, 292
uncorrelated, 292
Walsh, 88
basis signals in ICA, 283
Bessel function, 298, 300
approximation of, 300
binary order, 86
blind source separation, 289
block circulant matrix, 437
diagonalisation of, 445
inversion of, 438
camera
colour, sensor arrangement of, 3
distortion due to lens, 513
multispectral, 2
point spread function of, 12
Canny criteria, 606
Canny filter, 608
central limit theorem, 235, 264
central moments, 239
Chebyshev norm, 733
checkerboard effect, 7
chroma, 738
chromaticity diagram, 718
definition of, 720
CIE, 704, 717
circulant matrix, 438
city block metric, 733
class of a pixel, 528
cocktail party problem, 234, 264, 283
colour
calibration between different viewers, 730
comparison of, 733
definition of, 700
emulation of spatial dependence of, 752
enhancement, 761
factors influencing its perception, 740
spatial context of, 740
temporal context of, 740
colour camera, 3
colour constancy, 713, 742
colour histogram, 767
colour matching experiments, 715
colour space
Lab, 734, 742
Lab, errors in, 745
Lab, inverse transform, 736
Luv, 734, 742
Luv, definition of, 734
Luv, errors in, 743
Luv, inverse transform, 735
2D, 718
3D, 718
opponent, 741
perceptually uniform, 734, 742
colour system
CIE RGB, 717
CIE RGB, chromaticity diagram for, 724
CIE RGB, colour matching functions of,
718
CIE RGB, definition of, 717
CIE RGB, definition of alychne for, 726
CIE RGB, tristimulus values of, 718
definition of, 715
S-CIELAB, 753
sRGB, 717, 742
sRGB, definition of, 732
transformation CIE RGB to XYZ, 727
transformation from Lab to RGB, 736
transformation from Luv to RGB, 735
transformation from sRGB to Lab, 742
transformation from sRGB to Luv, 742
tristimulus values of, 715
XYZ, 717, 718
XYZ, chromaticity diagram of, 728
XYZ, definition of, 726
XYZ, imaginary primaries of, 728
XYZ, tristimulus values of, 727
complete set of functions, 72
cones, 715, 741
confidence of χ²-test, 239
conjugate transpose, 50
constrained matrix inversion, 436
algorithm for, 464
comparison with Wiener filtering, 462
filter derivation for, 459
filter for, 456
inhomogeneous degradation, 477
contrast, inhomogeneous, 375
convolution, 632
convolution theorem, 105, 108
convolution theorem, for z-transform, 302
cooling schedule, 506
correlated colour temperature, 705
cosine transform
even symmetric, 137, 138
basis images of, 146
inverse of 1D, 143
inverse of 2D, 145
odd symmetric, 137, 149
basis images of, 154
inverse of 1D, 152
inverse of 2D, 154
cost function, 477, 481
for image restoration, 499
minimisation of, 503
minimisation of quadratic, 487
minimum of, 491
covariance, 184
cover proportions, 689
cross correlation, 193, 423
cross covariance, 193
data whitening, 265, 266
daylight, 703
daylight, variations of, 701, 703
degree, of a graph node, 577, 586
degrees of freedom of χ²-test, 239
delta function, 95, 142, 144, 152, 405
Fourier transform of, 325
shifting property of, 15
DFT, 94, 176
algorithm for, 102
convolution theorem for, 105
dc component of, 118
display of, 112
for rotated image, 113
for scaled image, 119
for shifted image, 114
imaginary, 130
inverse of, 95
magnitude and phase of, 411
matrix for, 99
real valued, 126
dichromats, 717
differentiation with respect to a vector, 267
direct component, 118
discrete Fourier transform, see DFT
distribution function, 178, 180
divergence, 345
dual grid, 569, 593
Earth observation, 688
EDCT, 138, 176
basis images of, 146
inverse
1D, 143
2D, 145
edge detection, 527, 591
algorithm, 602
and noise, 593, 605, 606
by linear filtering, 592, 602
by nonlinear filtering, 591
Canny's criteria for, 606
for multispectral images, 769
in the first principal component, 769
statistical, 591
using phase congruency, 626
via local energy, 652
with Laplacian of Gaussian, 621, 623
edge preserving smoothness constraint, 502
edge, of a graph, 576
edgel, 593
EDST, 157, 176
basis images of, 163
inverse
1D, 160
2D, 162
eigenface, 292
eigenimage, 60, 222, 292
eigenvector, 222
elementary images
DFT, 101
EDCT, 146
EDST, 163
Haar, 80, 93
Hadamard, 88
ICA, 281
independent, 292
K-L, 221
ODCT, 154
ODST, 173
orthogonal, 292
uncorrelated, 292
Walsh, 88
EM algorithm, 537
end members, 695, 697
energy, local, 660
ensemble autocorrelation function, 190
ensemble autocorrelation matrix, 210
ensemble of images, 190-192
ensemble statistics, 195, 292
entropy, 239, 243
of an image, 673
entropy of a Gaussian pdf, 243, 246
entropy of a uniform pdf, 244
equal energy spectrum, 715
ergodicity, 195, 197, 199, 200, 215
error
due to wrong reference white, 743, 745
in image approximation
by DFT, 103, 176
by EDCT, 148, 176
by EDST, 165, 166, 176
by Haar transform, 79, 89, 176
by Hadamard transform, 91, 176
by K-L, 220
by ODCT, 156, 176
by ODST, 175, 176
by SVD, 63, 176
by Walsh transform, 91, 176
least mean square, 226
least mean square in image approx., 292
least square in image approx., 292
error function, 237, 251
estimation
least square error, 419
MAP, 490
maximum a posteriori, 490
Euclidean metric
definition of, 733
use in measuring colour difference, 734
even antisymmetric sine transform, see EDST
even symmetric cosine transform, see EDCT
event, 178
probability of, 179
expectation maximisation algorithm, 537
expectation value, 181
false contouring, 7
Index 785
false maxima, 619
fast Fourier transform, 124
feature, 551
symmetry of, 661
feature detection
antisymmetric, 650, 652
symmetric, 648, 650, 652
feature extraction, 44
feature space, 551
Fiedler vector, 586
filter
α-trimmed, 337, 760
z-transform of, 294
Butterworth, 303
definition of, 294
edge adaptive, 574
edge preserving, 574
first derivative of Gaussian, 352
flat, 327, 328
frequency response function of, 294
Gaussian, 309, 332
high pass, 351
ideal 1D low pass, 296, 300
ideal 2D band pass, 299
ideal 2D high pass, 299
ideal 2D low pass, 296, 300
ideal low pass 1D-2D comparison, 301
IIR, 296
impulse response of, 294
infinite impulse response, 296
Laplace transform of, 294
Laplacian of Gaussian, 621
median, 326-328
mode, 326, 328, 574
nonlinear, 357
nonrecursive, 301
rank order, 326
real, frequency response of, 294
real, unit sample response of, 294
recursive, 301, 303
Robinson, 406
second derivative of Gaussian, 352
separable, 605
Sobel, 605
stability of, 294
statistical, 326
system transfer function of, 294
unit sample response of, 294
vector median, 760
weighted median, 333
weighted mode, 333
filtering
averaging, 327, 328
lowpass, 328
mean shift, 574
median, 326, 327
mode, 328
first principal component
for edge detection, 769
for maximum contrast, 672
fixed temperature annealing, 505
flatfielding, 364, 366, 407, 548
floor operator, 216
Fourier series, 625
Fourier slice theorem, 404
Fourier transform, 72, 94, 177, 293, 637
definition of, 637
duality of, 637, 638
fast, 124
inverse of, 637
of a constant, 637
of a delta function, 637
of an integral, 637
of derivative function, 637
Parseval's theorem for, 637
properties of, 637
scaling property of, 637
shifting property of, 637
slice theorem of, 404
Wiener-Khinchine theorem of, 325
fourth order cumulant, 240
frequency convolution theorem, 108
frequency response function, 294, 396
Fresnel integrals, 400
asymptotic behaviour of, 401
function
, 183
, 183
delta, 405
Gabor, 126
gradient vector of, 345
Haar, 73, 74
harmonic expansion of, 641
Laplacian of, 345
periodic, 141
Rademacher, 74, 86
Riemann, 183
step, 405
Walsh, 73, 74, 86
functions
complete set of, 72
orthogonal, 72
orthonormal, 72
fuzzy logic, 200
Gabor functions, 126
Gamma function, 183
Gauss-Seidel method, 482, 485
Gaussian filter, 309
weights of, 329
Gaussian filtering, 342
Gaussian function
Fourier transform of, 309
Gaussian mixture model, 537
Gaussian probability density function
parameter estimation for, 537, 539
geometric image degradation
global, 513
inhomogeneous, 515
geometric progression, 95
Gibbs sampler, 503
GMM, 537
good locality measure, 617
gradient, 345
magnitude of, 557
gradient descent algorithm, 487
gradient vector, 345
Gram matrix, 228
graph, 576
Laplacian matrix of, 577, 586
relational, 576
undirected, 576
weights, 576
Gray code, 86
grey level, 1
grid, dual, 569, 593
Haar functions, 73, 74
discrete version of, 76
image basis from, 75
Haar transform, 74, 177
advantages of, 92
algorithm for, 76, 89
basis images of, 80
disadvantages of, 92
elementary images of, 80
Haar wavelet, 93
Hadamard matrices, 85
Hadamard transform, 74, 88
basis images of, 88
half width half maximum, 753
heat equation, 342, 348
Hermitian transpose, 50
Hilbert pair, 631
Hilbert transform, 631, 639
generalisation of, 660
hill in mathematical morphology, 568
histogram, 528
equalisation, 370
equalisation with random additions, 372
hyperbolisation, 373, 374
hyperbolisation with random additions,
374
manipulation, 367, 368
normalised, 236
stretching, 367
with upsampling, 553
homomorphic filter, 364
Hough transform, 520
hue, 733
in practice, 747
perceived differences in, 738
perceived, measurement of, 737
hue angle, 738
human vision, 700, 701
colour constancy of, 714
ganglion cells, 292
perception of brightness, 725
photopic, 733
physics of, 700
psychophysics of, 700
rods, 714, 733
scotopic, 733
hysteresis edge linking, 621
hysteresis thresholding, 528
ICA, 234
algorithm for, 274
blind source separation, 289
characteristics of, 289
differences in image and signal processing, 290
for linear spectral unmixing, 697
for signal processing, 283
in image processing, 264
independent components, 290
medical images, 260
sparse representation, 290
ideal white, 715, 716, 718
IIR, 296
image
approximation of
by DFT, 103, 176
by EDCT, 148, 176
by EDST, 165, 166, 176
by Haar transform, 79, 89, 176
by Hadamard transform, 91, 176
by K-L, 220, 226
by ODCT, 156, 176
by ODST, 175, 176
by SVD, 62, 63, 176
by Walsh transform, 91, 176
average of, 118
bit size of, 6
classication of, 528
coding for optimisation algorithms, 505
colouring for optimisation algorithms, 505
compression, 44
contrast of, 10
dc component of, 118, 176
definition of, 1
degradation, model of, 396
DFT of scaled, 119
digital, 1
digital, formation of, 3
dilation, 554
edge detection of, 527
eigenimages of, 60
enhancement, 43, 293, 358, 395
local, 375
pairwise, 377, 378
toboggan, 383
entropy, 673
erosion, 555
filtering, 293, 574
first difference of, 449
gradient magnitude of, 557
Haar transform of, 74, 76
Hadamard transform of, 74
histogram of, 528
information of, 673
inhomogeneous degradation of, 468
labelling of, 528
Laplacian of, 342, 449
mean value of, 220
morphological reconstruction of, 554
multispectral, 1, 669
multispectral, compression of, 767
multispectral, edge detection, 769
multispectral, restoration of, 767
multispectral, segmentation of, 767
of an ideal line, 403
panchromatic, 1
quality of, 7
registration, 395
representation by a graph, 576
resolution of, 7
restoration, 44, 395
as MAP estimation, 490
constrained matrix inversion, 436
degradation matrix unknown, 489
geometric, 395, 513
grey value, 395
inverse filtering, 396
nonlinear, 490
Wiener filtering, 419
rotation of, 519
SAR, 326
scaling, 112
second difference of, 449
segmentation of, 527, 528
sharpening, 351
singular value decomposition of, 50, 51,
60
synthetic aperture radar, 326
thresholding, 528
transform, cosine even symmetric, 137,
138
transform, cosine odd symmetric, 137, 149
transform, DFT, 94, 95
transform, EDCT, 138
transform, EDST, 157
transform, Fourier, 94, 95
transform, Hadamard, 88
transform, ODCT, 149
transform, ODST, 167
transform, sine even antisymmetric, 137,
157
transform, sine odd antisymmetric, 137,
167
transform, unitary, 49
transform, Walsh, 88
transforms, comparison of, 176
Walsh transform of, 74, 76
whirl transform of, 468
image basis from
DFT, 101
EDCT, 146
EDST, 163
Haar functions, 75
Haar transform, 80
Hadamard transform, 88
ICA, 281
K-L, 221
ODCT, 154
ODST, 173
SVD, 60
Walsh functions, 75
image noise, 311
image smoothing, 328
image thresholding and variable illumination,
545
images ensemble, 190
imaginary primaries, 728, 729
impulse response, 294
independent component analysis,
see ICA
independent components, 277
industrial inspection, 364, 366, 742
and metamers, 756
infinite impulse response, 296
information content, 673
information of a symbol, 673
interference, low frequency, 351
inverse filtering, 396
algorithm for, 410
comparison with Wiener filtering, 430
iterative conditional modes, 503
Jacobi's method, 482
Jacobian matrix, 270
joint distribution function, 184
joint probability density function, 184
JPEG, 176
K-L transform, 200, 201, 214
as a first step of ICA, 265
basis images of, 221
comparison with SVD, 292
error of, 220
error of approximation with, 226
Karhunen-Loeve, see K-L
kernel-based method, 553
knee algorithm, 549
Kronecker order, 86
Kronecker product, 38
kurtosis, 240
excess, 240
Pearson, 240
proper, 240
L cones, 715
label of a pixel, 528, 529
labelling, 528, 529
Lagrange multiplier, 269
Lagrange multipliers, 460
method of, 268
used in ICA, 269
Lambertian surface, 685
Laplace transform, 294
Laplacian, 345
as a smoothness constraint, 455
matrix operator, 450
matrix operator, eigenvalues of,
454
of an image, 342, 449
Laplacian matrix of a graph, 577, 586
Laplacian of Gaussian filter, 621
least square error estimation, 419
least square error solution, 692
Leibniz rule, 348, 531, 620
lens distortion
algorithm for, 522
modelling of, 514
leptokurtic probability density function,
240
level set methods, 621
lexicographic order, 86
library spectra, 689
lightness for colour description, 738
line detection, 609
using phase congruency, 626
via local energy, 652
linear degradation process
inversion of its matrix, 446
matrix of, 444
linear filter, 293
linear filtering, 293
high pass, 351
linear operator, 12
point spread function of, 12
linear spectral unmixing
with ICA, 697
with library spectra, 692
with PCA, 695
without library spectra, 695
linear system of equations
solution by Gauss-Seidel method, 482, 485
solution by Jacobi's method, 482
local energy, 626, 647, 660
algorithm for, 651
filters for, 648
in 2D, 659
measurement of, 630
local image enhancement, 375
M cones, 715
Macbeth colour checker, 707, 708
magnetoencephalography, 288
manifold, 702
MAP estimation, 490
marginal probability density function, 188
Markov chain, 503
matrix
block circulant, 22, 437, 600
circulant, 22, 438
conjugate transpose of, 50
diagonalisation of, 50
eigenvalues of, 587
Gram, 228
Hadamard, 85
Hermitian transpose of, 50
Jacobian, 270
Kronecker product of, 38, 445
norm of, 63, 64, 226
orthogonal, 50
partition of, 19
positive semidenite, 228, 586
symmetric, 586
Toeplitz, 213
trace of, 64, 226
unitary, 49, 99
maximum a posteriori estimation, 490
Maxwell triangle, 720, 747
coordinates in, 722
MCMC minimisation, 503
mean distance of zero crossings, 619
mean shift, 339
for colour image segmentation, 767
for segmentation, 574
for smoothing, 337, 339
mean square error, 416
mean value, 181, 185
median, 326
medical images and ICA, 260
membrane model, 501
edge preserving, 502
metamerism, 756
metamers, 713, 756
in industrial inspection, 756
metric
L1 norm, 733
L2 norm, 733
L∞ norm, 733
Chebyshev norm, 733
city block, 733
definition of, 733
Euclidean, 733
Metropolis sampler, 503
minimum error threshold, 530, 534, 535
drawbacks of, 541
mode, 328
mode filter, 574
edge adaptive, 574
edge preserving, 574
mode filtering, 326
modulus, 100
monochromats, 717
monogenic signal, 659, 660
Monte Carlo Markov chain, 503
Monte-Carlo method, 757
morphological image reconstruction, 557
morphological reconstruction, 554, 558, 560
motion blurring, 397
point spread function of, 398
MSE, 416
multiple illumination sources, 687
multiresolution, 511
via renormalisation group transform, 512
via super-coupling transform, 512
multiresolution optimisation, 512
multispectral camera, 2
multispectral image, 1, 669
natural materials
spectra of, 708
variation of, 706, 708
natural order, 86
near infrared, 704
negentropy, 239
approximation of, 246, 252, 254, 257
definition of, 243
maximisation of, 269
noise
additive, 311, 337
additive Gaussian, 492
autocorrelation function of, 313
biased, 312, 314
blue, 313
coloured, 313, 314
dependent, 316, 318
filtered, 615
fixed pattern, 312
Gaussian, 311, 317, 326-328, 332, 337
homogeneous, 311
iid, 311, 313-315, 317, 492
impulse, 311, 328, 337
in multispectral images, 759
independent, 311, 312
independent identically distributed, 313
mixed, 337, 760
multiplicative, 311, 325
probability density function of, 235
salt and pepper, 311
shot, 311, 326
spec, 311
unbiased, 311, 312
uncorrelated, 311-313
uniform, 315
white, 311, 313-316, 318
zero-mean, 311-313, 318
non-Gaussianity
measure of, 239, 240
non-maxima suppression, 601
norm
L1, 733
L2, 733
L∞, 733
Chebyshev, 733
of a matrix, 64, 226
normal colour vision, 717
normal distribution, 534
normal order, 86
normalised graph cuts algorithm, 576, 589
as an eigenvalue problem, 576
normalised histogram, 236
ODCT, 149, 176
basis images of, 154
inverse
1D, 152
2D, 154
odd antisymmetric sine transform, see ODST
odd symmetric cosine transform, see ODCT
ODST, 167, 176
basis images of, 173
inverse
1D, 171
2D, 172
opponent colour space, 741, 752
opponent colours, 741
optimisation, 503-505, 512
orthogonal matrix, 50
orthogonal set of functions, 72
orthonormal set of functions, 72
orthonormal vectors, 50
Otsu method, 541, 542
drawbacks of, 545
p-tile method, 530
pairwise image enhancement, 377, 378
Paley order, 86
panchromatic image, 1
Parseval's theorem, 496, 637, 647
path
non-descending, 568
non-ascending, 568
PCA, 672
advantages of, 675
algorithm for, 674
disadvantages of, 675
for linear spectral unmixing, 695, 696
for material spectra, 708
of daylight spectra, 703
pel, 1
perceptually uniform colour space, 742
period of a function, 141
phase congruency, 625
in 2D, 659
in practice, 630
measure of, 627
photometric stereo, 364, 366
photopic vision, 733
picture element, 1
pin-cushion distortion, 513
pixel, 1
platykurtic probability density function, 240
point source, 12, 14
point spread function, 12, 13, 396
from astronomical image, 402
deduction of, 27
from an ideal edge, 404
of a camera, 405
separable, 13
shift invariant, 13
Poisson distribution, 311
power method
algorithm for, 682
for eigenvector estimation, 682
power spectrum, 313
primary lights, 715
principal component analysis, see PCA
probability
a posteriori, 491
a priori, 491
definition of, 179
of an event, 178
posterior, 491
prior, 491
probability density function, 181, 528, 530
fourth order moment of, 239
Gaussian, 235, 237, 240
Gaussian, entropy of, 243, 246
Gaussian, fourth moment of, 242
Gaussian, kurtosis of, 242
Gaussian, skewness of, 241
leptokurtic, 240
non-Gaussianity of, 235
normal, 238
of a function of a random variable, 320
of sum of two random variables, 545, 546
platykurtic, 240
second order moment of, 239
skewness of, 239
sub-Gaussian, 240
super-Gaussian, 240
third moment of, 239
third order moment of, 241
uniform, definition of, 244
uniform, entropy of, 244
uniform, fourth moment of, 247
uniform, negentropy of, 244, 247, 251
uniform, variance of, 244
variance of, 239
probability theory, 200
pseudo-convolution, 632
pseudorandom, 178
purple line, 725
quad tree, 554
Rademacher function, 74, 86
radiant power, 718
radiometric correction, 689
ramp detection, 609
random experiment, 178
random field, 177, 178, 189, 190
autocorrelation function of, 424
autocorrelation of, 190
autocovariance, 190
ergodic, 195, 197, 428
homogeneous, 195, 424
spatial autocorrelation function of, 325
spatial statistics of, 196
stationary, 195
random fields
cross correlation of, 193, 423
cross covariance of, 193
uncorrelated, 193
random number generator, 178
according to given probability density
function, 510
random variable, 177, 178
central moments of, 239
distribution function of, 180
entropy of, 243
expectation value of, 181, 190
function of, 320
marginal probability density function of,
188
mean value of, 181, 185
moments of, 239
probability density function of, 181
standard deviation of, 181
variance of, 181
zero-mean, 234
random variables, 190
covariance of, 184
expectation value of, 187
independent, 184, 234
joint distribution function of, 184
joint probability density function of,
184
orthogonal, 184, 234
uncorrelated, 184, 234
rank order filtering, 326
ranking vectors, 760
Rayleigh quotient, 578, 682
minimisation of, 585
recursive filter, 303
reference white, 730, 742
and transformations to Luv and Lab, 735
error due to, 743, 745
reflectance function, 545, 684, 700
region growing, 553
regularisation term, 501
relational graph, 576
remote sensing, 671, 675, 688, 689, 770
renormalisation group transform, 512
retinex algorithm, 357, 360
ridges in the watershed algorithm, 568
Riemann's function, 183
Riesz transform, 659, 660
Robinson operators, 406
rods, 733
rotation transformation, 519
S cones, 715
SAD, 693
saturation, 733
in practice, 747
perceived, measurement of, 737
scale space, 342
scaling function, 93
scotopic vision, 733
seed of a random number generator, 178
seed pixel, 554, 557
segmentation, 527, 528
by filtering, 574
by normalised graph cuts, 576
via edge detection, 620
with mean shift, 574
with region growing, 553
with split and merge, 554
with watershed, 553
sensitivity function of multispectral camera,
2
sensor sensitivity function, 711
sequency order, 86
sharpening, 351
signal
analytic, 659
local energy of, 626, 647
monogenic, 659, 660
projection of, 632
signal to noise ratio, 616
simulated annealing, 503, 539
accelerated, 511
cooling schedule for, 506
fixed temperature, 505
properties of, 511
termination criterion for, 505
with Gibbs sampler, algorithm for, 507
with Metropolis sampler, algorithm for,
506
sine transform
even antisymmetric, 137, 157
basis images of, 163
inverse of 1D, 160
inverse of 2D, 162
odd antisymmetric, 137, 167
basis images of, 173
inverse of 1D, 171
inverse of 2D, 172
singular value decomposition, see SVD
skewness, 239
smoothing, edge adaptive, 337
smoothness constraint, 455, 493, 501
snakes, 621
Sobel filter, 596, 605
source signal, 290
sparse representation, 290
spatial autocorrelation function, 196
spatial autocorrelation matrix, 215, 222
spatial mean, 196
spatial statistics, 195, 196
spectral angular distance, 693
spectral band, 2
spectral constancy, 669, 684, 742
algorithm for, 687
assumptions for, 687
spectral density, 427
spectral histogram, 767
spectral signature of a pixel, 684
spectral unmixing, 671, 688
linear, 688, 689
spectrum locus, 724
split and merge algorithm, 554
stacking operator, 29
standard deviation, 181
standard illuminant, 704
as reference white, 735
standard illuminants
chromaticity coordinates of, 705
correlated colour temperature of, 705
spectral radiant power of, 705, 706
standard observer, 718
statistics
ensemble, 190, 195
spatial, 195, 196
step function, 405
structuring element, 554, 555, 560
successive doubling, 124
super-coupling transform, 512
SVD, 50, 51, 60, 176
algorithm for, 69
comparison with K-L, 292
intuitive explanation of, 62
symmetry axis, 202
system transfer function, 294
temperature parameter, 501
theorem of the three perpendiculars, 721
thin plate model, 501
threshold
and variable illumination, 545, 548
minimum error, 530, 534, 535, 541
Otsu, 541, 542, 545
thresholding, 528
a unimodal histogram, 549
drawbacks of, 550
hysteresis, 528
minimum error, 530, 534
Otsu method for, 541, 542, 545
p-tile method, 530
under variable illumination, 548
time convolution theorem, 108
toboggan enhancement, 383
Toeplitz matrix, 213
trace of a matrix, 64, 226
transform
DFT, 176
EDCT, 176
EDST, 176
Fourier, 177
Haar, 176, 177
Hilbert, 639, 660
K-L, 200, 201, 214
ODCT, 176
ODST, 176
Riesz, 660
Walsh, 176, 177
triangle inequality, 733
trichromatic theory of colour vision, 715
trichromats, 717
tristimulus values, 715
negative, 715
ultraviolet, 704
unit sample response, 294
unitary matrix, 49
unitary transform, 49
unsharp masking, 357
local, 357
locally adaptive, 358
value in colour representation, 761
variable illumination, 351, 364, 407, 545, 548
as low frequency interference, 351
variance, 181
efficient computation of, 339
interclass, 541, 544
intraclass, 541, 544
vector α-trimmed median filter, 760
vector median filter, 760
vector outer product, 47
vector ranking, 760
vectors
orthonormal, 50
outer product of, 47
vertex of a graph, 576
Walsh functions, 73, 74, 86
Paley order of, 86
binary order of, 86
discrete version of, 76
dyadic order of, 86
from Hadamard matrices, 85
from Rademacher functions, 74
image basis from, 75
Kronecker order of, 86
lexicographic order of, 86
natural order of, 86
normal order of, 86
ordering of, 86
sequency order of, 86
Walsh order of, 86
Walsh-Kaczmarz order of, 86
Walsh order, 86
Walsh transform, 74, 88, 177
advantages of, 92
algorithm for, 76, 81, 91
basis images of, 88
disadvantages of, 92
Walsh-Kaczmarz order, 86
waterhole in mathematical morphology, 568
waterlines for the watershed algorithm, 568
watershed algorithm, 566
drawbacks of, 568
watershed segmentation, 553
wavelets, 93
Weber-Fechner law, 360
whirl transform, 468
algorithm for, 469, 472, 473
Wiener filter, 417, 420, 429
comparison with constrained matrix inversion filter, 462
derivation of, 428
in practice, 430
relationship with inverse filter, 430
Wiener filtering, 419
algorithm for, 431
Wiener-Khinchine theorem, 313, 325, 427