(Image Processing Book) - Jain - Fundamentals of Digital Image Processing - Prentice Hall 1989


PRENTICE HALL INFORMATION
AND SYSTEM SCIENCES SERIES

Thomas Kailath, Editor

ANDERSON & MOORE  Optimal Control: Linear Quadratic Methods
ANDERSON & MOORE  Optimal Filtering
ASTROM & WITTENMARK  Computer-Controlled Systems: Theory and Design, 2/E
DICKINSON  Systems: Analysis, Design and Computation
GARDNER  Statistical Spectral Analysis: A Nonprobabilistic Theory
GOODWIN & SIN  Adaptive Filtering, Prediction, and Control
GRAY & DAVISSON  Random Processes: A Mathematical Approach for Engineers
HAYKIN  Adaptive Filter Theory, 2/E
JAIN  Fundamentals of Digital Image Processing
JOHNSON  Lectures on Adaptive Parameter Estimation
KAILATH  Linear Systems
KUMAR & VARAIYA  Stochastic Systems: Estimation, Identification, and Adaptive Control
KUNG  VLSI Array Processors
KUNG, WHITEHOUSE, & KAILATH, EDS.  VLSI and Modern Signal Processing
KWAKERNAAK & SIVAN  Modern Signals and Systems
LANDAU  System Identification and Control Design Using P.I.M. + Software
LJUNG  System Identification: Theory for the User
MACOVSKI  Medical Imaging Systems
MIDDLETON & GOODWIN  Digital Control and Estimation: A Unified Approach
NARENDRA & ANNASWAMY  Stable Adaptive Systems
SASTRY & BODSON  Adaptive Control: Stability, Convergence, and Robustness
SOLIMAN & SRINATH  Continuous and Discrete Signals and Systems
SPILKER  Digital Communications by Satellite
WILLIAMS  Designing Digital Filters


Fundamentals
of Digital
Image Processing

ANIL K. JAIN

University of California, Davis

PRENTICE HALL, Englewood Cliffs, NJ 07632


Library of Congress Cataloging-in-Publication Data

JAIN, ANIL K.
Fundamentals of digital image processing.
Bibliography: p.
Includes index.
1. Image processing--Digital techniques. I. Title.
TA1632.J35 1989  621.36'7  88-11624
ISBN 0-13-336165-9


Editorial/production supervision: Colleen Brosnan
Manufacturing buyer: Mary Noonan
Page layout: Martin Behan
Cover design: Diane Saxe
Logo design: A. M. Bruckstein
Cover art: Halley's comet image by the author,
reconstructed from data gathered by NASA's
Pioneer Venus Orbiter in 1986.

© 1989 by Prentice-Hall, Inc.


A Paramount Communications Company
Englewood Cliffs, New Jersey 07632

All rights reserved. No part of this book may be
reproduced, in any form or by any means,
without permission in writing from the publisher.



Printed in the United States of America

10 9

ISBN 0-13-336165-9

PRENTICE-HALL INTERNATIONAL (UK) LIMITED, London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL CANADA INC., Toronto
PRENTICE-HALL HISPANOAMERICANA, S.A., Mexico
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
SIMON & SCHUSTER ASIA PTE. LTD., Singapore
EDITORA PRENTICE-HALL DO BRASIL, LTDA., Rio de Janeiro

Contents

PREFACE xix

ACKNOWLEDGMENTS xxi

1 INTRODUCTION 1

1.1 Digital Image Processing: Problems and Applications 1
1.2 Image Representation and Modeling 4
1.3 Image Enhancement 6
1.4 Image Restoration 7
1.5 Image Analysis 7
1.6 Image Reconstruction from Projections 8
1.7 Image Data Compression 9
Bibliography 10

2 TWO-DIMENSIONAL SYSTEMS AND MATHEMATICAL
PRELIMINARIES 11

2.1 Introduction 11
2.2 Notation and Definitions 11
2.3 Linear Systems and Shift Invariance 13
2.4 The Fourier Transform 15
    Properties of the Fourier Transform, 16
    Fourier Transform of Sequences (Fourier Series), 18
2.5 The Z-Transform or Laurent Series 20
    Causality and Stability, 21
2.6 Optical and Modulation Transfer Functions 21
2.7 Matrix Theory Results 22
    Vectors and Matrices, 22
    Row and Column Ordering, 23
    Transposition and Conjugation Rules, 25
    Toeplitz and Circulant Matrices, 25
    Orthogonal and Unitary Matrices, 26
    Positive Definiteness and Quadratic Forms, 27
    Diagonal Forms, 27
2.8 Block Matrices and Kronecker Products 28
    Block Matrices, 28
    Kronecker Products, 30
    Separable Operations, 31
2.9 Random Signals 31
    Definitions, 31
    Gaussian or Normal Distribution, 32
    Gaussian Random Processes, 32
    Stationary Processes, 32
    Markov Processes, 33
    Orthogonality and Independence, 34
    The Karhunen-Loeve (KL) Transform, 34
2.10 Discrete Random Fields 35
    Definitions, 35
    Separable and Isotropic Covariance Functions, 36
2.11 The Spectral Density Function 37
    Properties of the SDF, 38
2.12 Some Results from Estimation Theory 39
    Mean Square Estimates, 40
    The Orthogonality Principle, 40
2.13 Some Results from Information Theory 41
    Information, 42
    Entropy, 42
    The Rate Distortion Function, 43
Problems 44
Bibliography 47

3 IMAGE PERCEPTION 49

3.1 Introduction 49
3.2 Light, Luminance, Brightness, and Contrast 49
    Simultaneous Contrast, 51
    Mach Bands, 53
3.3 MTF of the Visual System 54
3.4 The Visibility Function 55
3.5 Monochrome Vision Models 56
3.6 Image Fidelity Criteria 57
3.7 Color Representation 60
3.8 Color Matching and Reproduction 62
    Laws of Color Matching, 63
    Chromaticity Diagram, 65
3.9 Color Coordinate Systems 66
3.10 Color Difference Measures 71
3.11 Color Vision Model 73
3.12 Temporal Properties of Vision 75
    Bloch's Law, 75
    Critical Fusion Frequency (CFF), 75
    Spatial versus Temporal Effects, 75
Problems 76
Bibliography 78

4 IMAGE SAMPLING AND QUANTIZATION 80

4.1 Introduction 80
    Image Scanning, 80
    Television Standards, 81
    Image Display and Recording, 83
4.2 Two-Dimensional Sampling Theory 84
    Bandlimited Images, 84
    Sampling Versus Replication, 85
    Reconstruction of the Image from Its Samples, 85
    Nyquist Rate, Aliasing, and Foldover Frequencies, 87
    Sampling Theorem, 88
    Remarks, 89
4.3 Extensions of Sampling Theory 89
    Sampling Random Fields, 90
    Sampling Theorem for Random Fields, 90
    Remarks, 90
    Nonrectangular Grid Sampling and Interlacing, 91
    Hexagonal Sampling, 92
    Optimal Sampling, 92
4.4 Practical Limitations in Sampling and Reconstruction 93
    Sampling Aperture, 93
    Display Aperture/Interpolation Function, 94
    Lagrange Interpolation, 98
    Moire Effect and Flat Field Response, 99
4.5 Image Quantization 99
4.6 The Optimum Mean Square or Lloyd-Max Quantizer 101
    The Uniform Optimal Quantizer, 103
    Properties of the Optimum Mean Square Quantizer, 103
    Proofs, 112
4.7 A Compandor Design 113
    Remarks, 114
4.8 The Optimum Mean Square Uniform Quantizer for Nonuniform Densities 115
4.9 Examples, Comparison, and Practical Limitations 115
4.10 Analytic Models for Practical Quantizers 118
4.11 Quantization of Complex Gaussian Random Variables 119
4.12 Visual Quantization 119
    Contrast Quantization, 120
    Pseudorandom Noise Quantization, 120
    Halftone Image Generation, 121
    Color Quantization, 122
Problems 124
Bibliography 128

5 IMAGE TRANSFORMS 132

5.1 Introduction 132
5.2 Two-Dimensional Orthogonal and Unitary Transforms 134
    Separable Unitary Transforms, 134
    Basis Images, 135
    Kronecker Products and Dimensionality, 137
    Dimensionality of Image Transforms, 138
    Transform Frequency, 138
    Optimum Transform, 138
5.3 Properties of Unitary Transforms 138
    Energy Conservation and Rotation, 138
    Energy Compaction and Variances of Transform Coefficients, 139
    Decorrelation, 140
    Other Properties, 140
5.4 The One-Dimensional Discrete Fourier Transform (DFT) 141
    Properties of the DFT/Unitary DFT, 141
5.5 The Two-Dimensional DFT 145
    Properties of the Two-Dimensional DFT, 147
5.6 The Cosine Transform 150
    Properties of the Cosine Transform, 151
5.7 The Sine Transform 154
    Properties of the Sine Transform, 154
5.8 The Hadamard Transform 155
    Properties of the Hadamard Transform, 157
5.9 The Haar Transform 159
    Properties of the Haar Transform, 161
5.10 The Slant Transform 161
    Properties of the Slant Transform, 162
5.11 The KL Transform 163
    KL Transform of Images, 164
    Properties of the KL Transform, 165
5.12 A Sinusoidal Family of Unitary Transforms 175
    Approximation to the KL Transform, 176
5.13 Outer Product Expansion and Singular Value Decomposition 176
    Properties of the SVD Transform, 177
5.14 Summary 180
Problems 180
Bibliography 185

6 IMAGE REPRESENTATION BY STOCHASTIC MODELS 189

6.1 Introduction 189
    Covariance Models, 189
    Linear System Models, 189
6.2 One-Dimensional Causal Models 190
    Autoregressive (AR) Models, 190
    Properties of AR Models, 191
    Application of AR Models in Image Processing, 193
    Moving Average (MA) Representations, 194
    Autoregressive Moving Average (ARMA) Representations, 195
    State Variable Models, 195
    Image Scanning Models, 196
6.3 One-Dimensional Spectral Factorization 196
    Rational SDFs, 197
    Remarks, 198
6.4 AR Models, Spectral Factorization, and Levinson Algorithm 198
    The Levinson-Durbin Algorithm, 198
6.5 Noncausal Representations 200
    Remarks, 201
    Noncausal MVRs for Autoregressive Sequences, 201
    A Fast KL Transform, 202
    Optimum Interpolation of Images, 204
6.6 Linear Prediction in Two Dimensions 204
    Causal Prediction, 205
    Semicausal Prediction, 206
    Noncausal Prediction, 206
    Minimum Variance Prediction, 206
    Stochastic Representation of Random Fields, 207
    Finite-Order MVRs, 208
    Remarks, 209
    Stability of Two-Dimensional Systems, 212
6.7 Two-Dimensional Spectral Factorization and Spectral Estimation Via Prediction Models 213
    Separable Models, 214
    Realization of Noncausal MVRs, 215
    Realization of Causal and Semicausal MVRs, 216
    Realization via Orthogonality Condition, 216
6.8 Spectral Factorization via the Wiener-Doob Homomorphic Transformation 219
    Causal MVRs, 220
    Semicausal WNDRs, 220
    Semicausal MVRs, 222
    Remarks and Examples, 222
6.9 Image Decomposition, Fast KL Transforms, and Stochastic Decoupling 223
    Periodic Random Fields, 223
    Noncausal Models and Fast KL Transforms, 224
    Semicausal Models and Stochastic Decoupling, 225
6.10 Summary 226
Problems 227
Bibliography 230

7 IMAGE ENHANCEMENT 233

7.1 Introduction 233
7.2 Point Operations 235
    Contrast Stretching, 235
    Clipping and Thresholding, 235
    Digital Negative, 238
    Intensity Level Slicing, 238
    Bit Extraction, 239
    Range Compression, 240
    Image Subtraction and Change Detection, 240
7.3 Histogram Modeling 241
    Histogram Equalization, 241
    Histogram Modification, 242
    Histogram Specification, 243
7.4 Spatial Operations 244
    Spatial Averaging and Spatial Low-pass Filtering, 244
    Directional Smoothing, 245
    Median Filtering, 246
    Other Smoothing Techniques, 249
    Unsharp Masking and Crispening, 249
    Spatial Low-pass, High-pass and Band-pass Filtering, 250
    Inverse Contrast Ratio Mapping and Statistical Scaling, 252
    Magnification and Interpolation (Zooming), 253
    Replication, 253
    Linear Interpolation, 253
7.5 Transform Operations 256
    Generalized Linear Filtering, 256
    Root Filtering, 258
    Generalized Cepstrum and Homomorphic Filtering, 259
7.6 Multispectral Image Enhancement 260
    Intensity Ratios, 260
    Log-Ratios, 261
    Principal Components, 261
7.7 False Color and Pseudocolor 262
7.8 Color Image Enhancement 262
7.9 Summary 263
Problems 263
Bibliography 265

8 IMAGE FILTERING AND RESTORATION 267

8.1 Introduction 267
8.2 Image Observation Models 268
    Image Formation Models, 269
    Detector and Recorder Models, 273
    Noise Models, 273
    Sampled Image Observation Models, 275
8.3 Inverse and Wiener Filtering 275
    Inverse Filter, 275
    Pseudoinverse Filter, 276
    The Wiener Filter, 276
    Remarks, 279
8.4 Finite Impulse Response (FIR) Wiener Filters 284
    Filter Design, 284
    Remarks, 285
    Spatially Varying FIR Filters, 287
8.5 Other Fourier Domain Filters 290
    Geometric Mean Filter, 291
    Nonlinear Filters, 291
8.6 Filtering Using Image Transforms 292
    Wiener Filtering, 292
    Remarks, 293
    Generalized Wiener Filtering, 293
    Filtering by Fast Decompositions, 294
8.7 Smoothing Splines and Interpolation 295
    Remarks, 297
8.8 Least Squares Filters 297
    Constrained Least Squares Restoration, 297
    Remarks, 298
8.9 Generalized Inverse, SVD, and Iterative Methods 299
    The Pseudoinverse, 299
    Minimum Norm Least Squares (MNLS) Solution and the Generalized Inverse, 300
    One-step Gradient Methods, 301
    Van Cittert Filter, 301
    The Conjugate Gradient Method, 302
    Separable Point Spread Functions, 303
8.10 Recursive Filtering for State Variable Systems 304
    Kalman Filtering, 304
    Remarks, 307
8.11 Causal Models and Recursive Filtering 307
    A Vector Recursive Filter, 308
    Stationary Models, 310
    Steady-State Filter, 310
    A Two-Stage Recursive Filter, 310
    A Reduced Update Filter, 310
    Remarks, 311
8.12 Semicausal Models and Semirecursive Filtering 311
    Filter Formulation, 312
8.13 Digital Processing of Speckle Images 313
    Speckle Representation, 313
    Speckle Reduction: N-Look Method, 315
    Spatial Averaging of Speckle, 315
    Homomorphic Filtering, 315
8.14 Maximum Entropy Restoration 316
    Distribution-Entropy Restoration, 317
    Log-Entropy Restoration, 318
8.15 Bayesian Methods 319
    Remarks, 320
8.16 Coordinate Transformation and Geometric Correction 320
8.17 Blind Deconvolution 322
8.18 Extrapolation of Bandlimited Signals 323
    Analytic Continuation, 323
    Super-resolution, 323
    Extrapolation Via Prolate Spheroidal Wave Functions (PSWFs), 324
    Extrapolation by Error Energy Reduction, 324
    Extrapolation of Sampled Signals, 326
    Minimum Norm Least Squares (MNLS) Extrapolation, 326
    Iterative Algorithms, 327
    Discrete Prolate Spheroidal Sequences (DPSS), 327
    Mean Square Extrapolation, 328
    Generalization to Two Dimensions, 328
8.19 Summary 330
Problems 331
Bibliography 335

9 IMAGE ANALYSIS AND COMPUTER VISION 342

9.1 Introduction 342
9.2 Spatial Feature Extraction 344
    Amplitude Features, 344
    Histogram Features, 344
9.3 Transform Features 346
9.4 Edge Detection 347
    Gradient Operators, 348
    Compass Operators, 350
    Laplace Operators and Zero Crossings, 351
    Stochastic Gradients, 353
    Performance of Edge Detection Operators, 355
    Line and Spot Detection, 356
9.5 Boundary Extraction 357
    Connectivity, 357
    Contour Following, 358
    Edge Linking and Heuristic Graph Searching, 358
    Dynamic Programming, 359
    Hough Transform, 362
9.6 Boundary Representation 362
    Chain Codes, 363
    Fitting Line Segments, 364
    B-Spline Representation, 364
    Fourier Descriptors, 370
    Autoregressive Models, 374
9.7 Region Representation 375
    Run-length Codes, 375
    Quad-Trees, 375
    Projections, 376
9.8 Moment Representation 377
    Definitions, 377
    Moment Representation Theorem, 378
    Moment Matching, 378
    Orthogonal Moments, 379
    Moment Invariants, 380
    Applications of Moment Invariants, 381
9.9 Structure 381
    Medial Axis Transform, 381
    Morphological Processing, 384
    Morphological Transforms, 387
9.10 Shape Features 390
    Geometry Features, 391
    Moment-Based Features, 392
9.11 Texture 394
    Statistical Approaches, 394
    Structural Approaches, 398
    Other Approaches, 399
9.12 Scene Matching and Detection 400
    Image Subtraction, 400
    Template Matching and Area Correlation, 400
    Matched Filtering, 403
    Direct Search Methods, 404
9.13 Image Segmentation 407
    Amplitude Thresholding or Window Slicing, 407
    Component Labeling, 409
    Boundary-based Approaches, 411
    Region-based Approaches and Clustering, 412
    Template Matching, 413
    Texture Segmentation, 413
9.14 Classification Techniques 414
    Supervised Learning, 414
    Nonsupervised Learning or Clustering, 418
9.15 Image Understanding 421
Problems 422
Bibliography 425

10 IMAGE RECONSTRUCTION FROM PROJECTIONS 431

10.1 Introduction 431
    Transmission Tomography, 431
    Reflection Tomography, 432
    Emission Tomography, 433
    Magnetic Resonance Imaging, 434
    Projection-based Image Processing, 434
10.2 The Radon Transform 434
    Definition, 434
    Notation, 436
    Properties of the Radon Transform, 437
10.3 The Back-projection Operator 439
    Definition, 439
    Remarks, 440
10.4 The Projection Theorem 442
    Remarks, 443
10.5 The Inverse Radon Transform 444
    Remarks, 445
    Convolution Back-projection Method, 446
    Filter Back-projection Method, 446
    Two-Dimensional Filtering via the Radon Transform, 447
10.6 Convolution/Filter Back-projection Algorithms: Digital Implementation 448
    Sampling Considerations, 448
    Choice of Filters, 448
    Convolution Back-projection Algorithm, 450
    Filter Back-projection Algorithm, 451
    Reconstruction Using a Parallel Pipeline Processor, 452
10.7 Radon Transform of Random Fields 452
    A Unitary Transform R, 452
    Radon Transform Properties for Random Fields, 456
    Projection Theorem for Random Fields, 457
10.8 Reconstruction from Blurred Noisy Projections 458
    Measurement Model, 458
    The Optimum Mean Square Filter, 458
    Remarks, 458
10.9 Fourier Reconstruction 462
    Algorithm, 462
    Reconstruction of Magnetic Resonance Images, 463
10.10 Fan-Beam Reconstruction 464
10.11 Algebraic Methods 465
    The Reconstruction Problem as a Set of Linear Equations, 465
    Algebraic Reconstruction Techniques, 466
10.12 Three-Dimensional Tomography 468
    Three-Dimensional Reconstruction Algorithms, 469
10.13 Summary 470
Problems 470
Bibliography 473

11 IMAGE DATA COMPRESSION 476

11.1 Introduction 476
    Image Raw Data Rates, 476
    Data Compression versus Bandwidth Compression, 477
    Information Rates, 477
    Subsampling, Coarse Quantization, Frame Repetition, and Interlacing, 479
11.2 Pixel Coding 479
    PCM, 480
    Entropy Coding, 480
    Run-Length Coding, 481
    Bit-Plane Encoding, 483
11.3 Predictive Techniques 483
    Basic Principle, 483
    Feedback versus Feedforward Prediction, 484
    Distortionless Predictive Coding, 485
    Performance Analysis of DPCM, 486
    Delta Modulation, 488
    Line-by-Line DPCM, 490
    Two-Dimensional DPCM, 491
    Performance Comparisons, 493
    Remarks, 494
    Adaptive Techniques, 495
    Other Methods, 497
11.4 Transform Coding Theory 498
    The Optimum Transform Coder, 498
    Proofs, 499
    Remarks, 501
    Bit Allocation and Rate-Distortion Characteristics, 501
11.5 Transform Coding of Images 504
    Two-Dimensional Coding Algorithm, 504
    Transform Coding Performance Trade-offs and Examples, 507
    Zonal versus Threshold Coding, 508
    Fast KL Transform Coding, 510
    Remarks, 512
    Two-Source Coding, 513
    Transform Coding under Visual Criterion, 515
    Adaptive Transform Coding, 515
    Summary of Transform Coding, 516
11.6 Hybrid Coding and Vector DPCM 518
    Basic Idea, 518
    Adaptive Hybrid Coding, 520
    Hybrid Coding Conclusions, 521
11.7 Interframe Coding 521
    Frame Repetition, 521
    Resolution Exchange, 521
    Conditional Replenishment, 522
    Adaptive Predictive Coding, 522
    Predictive Coding with Motion Compensation, 524
    Interframe Hybrid Coding, 527
    Three-Dimensional Transform Coding, 529
11.8 Image Coding in the Presence of Channel Errors 532
    The Optimum Mean Square Decoder, 532
    The Optimum Encoding Rule, 533
    Optimization of PCM Transmission, 534
    Channel Error Effects in DPCM, 536
    Optimization of Transform Coding, 537
11.9 Coding of Two-Tone Images 540
    Run-length Coding, 540
    White Block Skipping, 546
    Prediction Differential Quantization, 547
    Relative Address Coding, 547
    CCITT Modified Relative Element Address Designate Coding, 548
    Predictive Coding, 551
    Adaptive Predictors, 552
    Comparison of Algorithms, 553
    Other Methods, 553
11.10 Color and Multispectral Image Coding 553
11.11 Summary 557
Problems 557
Bibliography 561

INDEX 566
Preface

Digital image processing is a rapidly evolving field with growing applications in


science and engineering. Image processing holds the possibility of developing the
ultimate machine that could perform the visual functions of all living beings. Many
theoretical as well as technological breakthroughs are required before we could build
such a machine. In the meantime, there is an abundance of image processing
applications that can serve mankind with the available and anticipated technology in
the near future.
This book addresses the 'fundamentals of the major topics of digital image
processing: representation, processing techniques, and communication. Attention
has been focused on mature topics with the hope that the level of discussion provided
would enable an engineer or a scientist to design image processing systems or conduct
research on advanced and newly emerging topics. Image representation includes
tasks ranging from acquisition, digitization, and display to mathematical characteriz-
ation of images for subsequent processing. Often, a proper representation is a
prerequisite to an efficient processing technique such as enhancement, filtering and
restoration, analysis, reconstruction from projections, and image communication.
Image processing problems and techniques (Chapter 1) invoke concepts from diverse
fields such as physical optics, digital signal processing, estimation theory, information
theory, visual perception, stochastic processes, artificial intelligence, computer
graphics, and so on. This book is intended to serve as a text for second and third
quarter (or semester) graduate students in electrical engineering and computer
science. It has evolved out of my class notes used for teaching introductory and

advanced courses on image processing at the University of California at Davis .
The introductory course (Image Processing I) covers Chapter 1, Chapter 2
(Sections 2.1 to 2.8), much of Chapters 3 to 5, Chapter 7, and Sections 9.1 to 9.5. This
material is supplemented by laboratory instruction that includes computer experi-

ments. Students in this course are expected to have had prior exposure to one-
dimensional digital signal processing topics such as sampling theorem, Fourier trans-


form, linear systems, and some experience with matrix algebra. Typically, an entry
level graduate course in digital signal processing is sufficient. Chapter 2 of the text
includes much of the mathematical background that is needed in the rest of the book.
A student who masters Chapter 2 should be able to handle most of the image
processing problems discussed in the text and elsewhere in the image processing
literature.
The advanced course (Image Processing II) covers Sections 2.9, 2.13, and
selected topics from Chapters 6, 8, 9, 10, and 11. Both courses are taught using
visual aids such as overhead transparencies and slides to maximize discussion time
and to minimize in-class writing time while maintaining a reasonable pace. In the
advanced course, the prerequisites include Image Processing I and entry level gradu-
ate coursework in linear systems and random signals.
Chapters 3 to 6 cover the topic of image representation. Chapter 3 is devoted to
low-level representation of visual information such as luminance, color, and spatial
and temporal properties of vision. Chapter 4 deals with image digitization, an
essential step for digital processing. In Chapter 5, images are represented as series
expansion of orthogonal arrays or basis images. In Chapter 6, images are considered
as random signals.
Chapters 7 through 11 are devoted to image processing techniques based on
representations developed in the earlier chapters. Chapter 7 is devoted to image
enhancement techniques, a topic of considerable importance in the practice of image
processing. This is followed by a chapter on image restoration that deals with the
theory and algorithms for removing degradations in images. Chapter 9 is concerned
with the end goal of image processing, that is, image analysis. A special image
restoration problem is image reconstruction from projections, a problem of im-
mense importance in medical imaging and nondestructive testing of objects. The
theory and techniques of image reconstruction are covered in Chapter 10. Chapter 11
is devoted to image data compression, a topic of fundamental importance in image
communication and storage.
Each chapter concludes with a set of problems and annotated bibliography. The
problems either go into the details or provide the extensions of results presented in
the text. The problems marked with an asterisk (*) involve computer simulations.
The problem sets give readers an opportunity to further their expertise on the
relevant topics in image processing. The annotated bibliography provides a quick
survey of the topics for the enthusiasts who wish to pursue the subject matter in
greater depth.

Supplementary Course Materials

Forthcoming with this text is an instructor's manual that contains solutions to selected
problems from the text, a list of experimental laboratory projects, and course
syllabus design suggestions for various situations.


Acknowledgments

I am deeply indebted to the many people who have contributed in making the
completion of this book possible. Ralph Algazi, Mike Buonocore, Joe Goodman,
Sarah Rajala, K. R. Rao, Jorge Sanz, S. Srinivasan, John Woods, and Yasuo Yoshida
carefully read portions of the manuscript and provided important feedback. Many
graduate students, especially Siamak Ansari, Steve Azevedo, Jon Brandt, Ahmed
Darwish, Paul Farrelle, Jaswant Jain, Phil Kelly, David Paglieroni, S. Ranganath,
John Sanders, S. H. Wang, and Wim Van Warmerdam provided valuable inputs
through many examples and experimental results presented in the text. Ralph Algazi
and his staff, especially Tom Arons and Jim Stewart, have contributed greatly
through their assistance in implementing the computer experiments at the UCD
Image Processing Facility and the Computer Vision Research Laboratory. Vivien
Braly and Liz Fenner provided much help in typing and organizing several parts of
the book. Colleen Brosnan and Tim Bozik of Prentice Hall provided the much-
needed focus and guidance for my adherence to a schedule and editorial assistance
that is required in the completion of a text to bring it to market. Thanks are also due
to Tom Kailath for his enthusiasm for this work.
Finally, I would like to dedicate this book to my favorite image, Mohini, my
children Mukul, Malini, and Ankit, and to my parents, all, for their constant support
and encouragement.


Introduction

1.1 DIGITAL IMAGE PROCESSING: PROBLEMS AND APPLICATIONS


The term digital image processing generally refers to processing of a two-
dimensional picture by a digital computer. In a broader context, it implies digital
processing of any two-dimensional data. A digital image is an array of real or
complex numbers represented by a finite number of bits. Figure 1.1 shows a
computer laboratory (at the University of California, Davis) used for digital image
processing. An image given in the form of a transparency, slide, photograph, or
chart is first digitized and stored as a matrix of binary digits in computer memory.
This digitized image can then be processed and/or displayed on a high-resolution
television monitor. For display, the image is stored in a rapid-access buffer memory
which refreshes the monitor at 30 frames/s to produce a visibly continuous display.
Mini- or microcomputers are used to communicate and control all the digitization,
storage, processing, and display operations via a computer network (such as the
Ethernet). Program inputs to the computer are made through a terminal, and the
outputs are available on a terminal, television monitor, or a printer/plotter. Fig-
ure 1.2 shows the steps in a typical image processing sequence.
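The idea that a digital image is an array of numbers represented by a finite number of bits can be made concrete with a small sketch. This example is not from the text; the 4 x 4 pixel values and the `negative` helper are made up purely for illustration of one elementary operation on such an array.

```python
# A 4x4 grayscale "image": each pixel is an integer in [0, 255],
# i.e. stored with a finite number of bits (8 here). The values
# below are illustrative, not from the book.
image = [
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  8,  72, 136, 200],
]

def negative(img, levels=256):
    """Elementary point operation: map each pixel u to (levels - 1) - u."""
    return [[(levels - 1) - u for u in row] for row in img]

neg = negative(image)
print(neg[0])  # -> [255, 191, 127, 0]
```

Once an image is held in memory in this form, processing reduces to arithmetic on the array, which is exactly what the digitize-store-process chain above enables.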
Digital image processing has a broad spectrum of applications, such as remote
sensing via satellites and other spacecraft, image transmission and storage for
business applications, medical processing, radar, sonar, and acoustic image process-
ing, robotics, and automated inspection of industrial parts.
Images acquired by satellites are useful in tracking of earth resources; geo-
graphical mapping; prediction of agricultural crops, urban growth, and weather;
flood and fire control; and many other environmental applications. Space image
applications include recognition and analysis of objects contained in images ob-
'. tained from deep space-probe missions. Image transmission and storage applica-

Figure 1.1 A digital image processing system (Signal and Image Processing
Laboratory, University of California, Davis): image scanner/XY digitizer (A/D
converter), large digital image buffer and processor (network), terminals,
high-resolution color monitors, disk and tape storage, hardcopy devices, and a
link to other computers.

Figure 1.2 A typical digital image processing sequence: object (observe) ->
imaging system, sample and quantize (digitize) -> digital storage (store) ->
digital computer (process) -> online buffer (refresh/store) -> display and
record (output).

tions occur in broadcast television, teleconferencing, transmission of facsimile im-


ages (printed documents and graphics) for office automation, communication over
computer networks, closed-circuit television based security monitoring systems,
and in military communications. In medical applications one is concerned with
processing of chest X rays, cineangiograms, projection images of transaxial


tomography, and other medical images that occur in radiology, nuclear magnetic


resonance (NMR), and ultrasonic scanning. These images may be used for patient
screening and monitoring or for detection of tumors or other disease in patients.
Radar and sonar images are used for detection and recognition of various types of
targets or in guidance and maneuvering of aircraft or missile systems. Figure 1.3
shows examples of several different types of images. There are many other applica-
tions ranging from robot vision for industrial automation to image synthesis for
cartoon making or fashion design. In other words, whenever a human or a machine
or any other entity receives data of two or more dimensions, an image is processed.
Although there are many image processing applications and problems, in this
text we will consider the following basic classes of problems.

Figure 1.3 Examples of digital images. (a) Space probe images: Moon and
Mars. (b) Multispectral images: visual and infrared. (c) Medical images: X ray
and eyeball. (d) Optical camera images: Golden Gate and downtown San
Francisco.


1. Image representation and modeling
2. Image enhancement
3. Image restoration
4. Image analysis
5. Image reconstruction
6. Image data compression

1.2 IMAGE REPRESENTATION AND MODELING

In image representation one is concerned with characterization of the quantity that
each picture element (also called pixel or pel) represents. An image could represent
luminances of objects in a scene (such as pictures taken by ordinary camera), the
absorption characteristics of the body tissue (X-ray imaging), the radar cross section
of a target (radar imaging), the temperature profile of a region (infrared imaging),
or the gravitational field in an area (in geophysical imaging). In general, any two-
dimensional function that bears information can be considered an image. Image
models give a logical or quantitative description of the properties of this function.
Figure 1.4 lists several image representation and modeling problems.
An important consideration in image representation is the fidelity or intelli-
gibility criteria for measuring the quality of an image or the performance of a
processing technique. Specification of such measures requires models of perception
of contrast, spatial frequencies, color, and so on, as discussed in Chapter 3. Knowl-
edge of a fidelity criterion helps in designing the imaging sensor, because it tells us
the variables that should be measured most accurately.
The fundamental requirement of digital processing is that images be sampled
and quantized. The sampling rate (number of pixels per unit area) has to be large
enough to preserve the useful information in an image. It is determined by the
bandwidth of the image. For example, the bandwidth of a raster-scanned common
television signal is about 4 MHz. From the sampling theorem, this requires a
minimum sampling rate of 8 MHz. At 30 frames/s, this means each frame should
contain approximately 266,000 pixels. Thus for a 512-line raster, this means each

Figure 1.4 Image representation and modeling. Perception models: visual
perception of contrast, spatial frequencies, and color; image fidelity models;
temporal perception; scene perception. Local models: sampling and reconstruction;
image quantization; deterministic models; series expansions/unitary transforms;
statistical models. Global models: scene analysis/artificial intelligence models;
sequential and clustering models; image understanding models.

Figure 1.5 Image representation by orthogonal basis image series Bm,n.
image frame contains approximately 512 x 512 pixels. Image quantization is the
analog to digital conversion of a sampled image to a finite number of gray levels.
Image sampling and quantization methods are discussed in Chapter 4.
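The sampling arithmetic above is easy to check directly. A back-of-the-envelope sketch using the figures quoted in the text (bandwidth, frame rate, and line count; details such as blanking intervals are ignored here):

```python
# Back-of-the-envelope check of the TV sampling arithmetic from the text.
bandwidth_hz = 4e6                 # ~4 MHz television signal bandwidth
sampling_rate = 2 * bandwidth_hz   # sampling theorem: at least 2x bandwidth
frames_per_s = 30

pixels_per_frame = sampling_rate / frames_per_s
print(round(pixels_per_frame))     # -> 266667, i.e. about 266,000 pixels

pixels_per_line = pixels_per_frame / 512
print(round(pixels_per_line))      # -> 521, so roughly a 512 x 512 frame
```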
A classical method of signal representation is by an orthogonal series ex-
pansion, such as the Fourier series. For images, analogous representation is possible
via two-dimensional orthogonal functions called basis images. For sampled images,
the basis images can be determined from unitary matrices called image transforms.
Any given image can be expressed as a weighted sum of the basis images (Fig. 1.5).
Several characteristics of images, such as their spatial frequency content, band-
width, power spectrum, and application in filter design, feature extraction, and so
on, can be studied via such expansions. The theory and applications of image
transforms are discussed in Chapter 5.
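As a small numerical sketch of this idea, an image can be written as a weighted sum of basis images obtained from a unitary matrix. The choice of the unitary DFT matrix, the 4 x 4 size, and the random test image below are illustrative, not the book's notation:

```python
import numpy as np

N = 4
# A unitary image transform: the orthonormal ("ortho") DFT matrix
A = np.fft.fft(np.eye(N), norm="ortho")

rng = np.random.default_rng(0)
u = rng.standard_normal((N, N))          # a sample image

# Transform coefficients: v = A u A^T; v[k, l] weights one basis image
v = A @ u @ A.T

# Basis image B_{k,l} is the outer product of conjugated rows of A;
# summing the weighted basis images reproduces the image exactly.
recon = np.zeros((N, N), dtype=complex)
for k in range(N):
    for l in range(N):
        B_kl = np.outer(A[k].conj(), A[l].conj())
        recon += v[k, l] * B_kl

assert np.allclose(recon.real, u)        # weighted sum = original image
```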
Statistical models describe an image as a member of an ensemble, often
characterized by its mean and covariance functions. This permits development of
algorithms that are useful for an entire class or an ensemble of images rather than
for a single image. Often the ensemble is assumed to be stationary so that the mean
and covariance functions can easily be estimated. Stationary models are useful in
data compression problems such as transform coding, restoration problems such as
Wiener filtering, and in other applications where global properties of the ensemble
are sufficient. A more effective use of these models in image processing is to con-
sider them to be spatially varying or piecewise spatially invariant.
To characterize short-term or local properties of the pixels, one alternative is
to characterize each pixel by a relationship with its neighborhood pixels. For
example, a linear system characterized by a (low-order) difference equation and
forced by white noise or some other random field with known power spectral
density is a useful approach for representing the ensemble. Figure 1.6 shows three
types of stochastic models where an image pixel is characterized in terms of its
neighboring pixels. If the image were scanned top to bottom and then left to right,
the model of Fig. 1.6a would be called a causal model. This is because the pixel A is
characterized by pixels that lie in the "past." Extending this idea, the model of Fig.
1.6b is a noncausal model because the neighbors of A lie in the past as well as the
"future" in both directions. In Fig. 1.6c, we have a semicausal model because
the neighbors of A are in the past in the i-direction and are in the past as well as the
future in the j-direction.
Such models are useful in developing algorithms that have different hardware
realizations. For example, causal models can realize recursive filters, which require
small memory while yielding an infinite impulse response (IIR). On the other hand,

Sec. 1.2 Image Representation and Modeling 5


Figure 1.6 Three canonical forms of stochastic models: (a) causal model;
(b) noncausal model; (c) semicausal model.



,
noncausal models can be used to design fast transform-based finite impulse re-
sponse (FIR) filters. Semicausal models can yield two-dimensional algorithms
which are recursive in one dimension and nonrecursive in the other. Some of these
stochastic models can be thought of as generalizations of one-dimensional random
processes represented by autoregressive (AR) and autoregressive moving average
(ARMA) models. Details of these aspects are discussed in Chapter 6.
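A minimal sketch of the causal idea: each pixel is a weighted sum of "past" neighbors plus white noise, so the whole field can be generated in a single raster scan. The three coefficients below are illustrative (a separable, stable choice), not values from the text:

```python
import numpy as np

# Causal AR-type model: u(m, n) depends only on pixels already visited
# in a top-to-bottom, left-to-right raster scan, plus white noise.
a_w, a_n, a_nw = 0.5, 0.4, -0.2      # illustrative W, N, NW coefficients
rng = np.random.default_rng(1)
M = N = 64
u = np.zeros((M, N))
for m in range(M):
    for n in range(N):
        u[m, n] = (a_w * (u[m, n - 1] if n > 0 else 0.0)
                   + a_n * (u[m - 1, n] if m > 0 else 0.0)
                   + a_nw * (u[m - 1, n - 1] if m * n > 0 else 0.0)
                   + rng.standard_normal())

# One pass suffices: the generation is recursive (IIR-like), which is
# why causal models pair naturally with recursive filtering.
print(u.shape, np.isfinite(u).all())
```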
In global modeling, an image is considered as a composition of several objects.
Various objects in the scene are detected (for example, by segmentation tech-
niques), and the model gives the rules for defining the relationships among the
various objects. Such representations fall under the category of image understanding
models, which are not a subject of study in this text.
,

1.3 IMAGE ENHANCEMENT

In image enhancement, the goal is to accentuate certain image features for subse-
quent analysis or for image display. Examples include contrast and edge enhance-
ment, pseudocoloring, noise filtering, sharpening, and magnifying. Image enhance-
ment is useful in feature extraction, image analysis, and visual information display.
The enhancement process itself does not increase the inherent information content
in the data. It simply emphasizes certain specified image characteristics. Enhance-
ment algorithms are generally interactive and application-dependent.
Image enhancement techniques, such as contrast stretching, map each gray
level into another gray level by a predetermined transformation. An example is the
histogram equalization method, where the input gray levels are mapped so that the
output gray level distribution is uniform. This has been found to be a powerful
method of enhancement of low-contrast images (see Fig. 7.14). Other enhancement
techniques perform local neighborhood operations as in convolution, transform
operations as in the discrete Fourier transform, and other operations as in pseudo-
coloring, where a gray-level image is mapped into a color image by assigning differ-
ent colors to different features. Examples and details of these techniques are
considered in Chapter 7.
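The histogram equalization mapping described above can be sketched in a few lines. A common cumulative-distribution formulation is used here; the 256-level setup and the synthetic low-contrast test image are illustrative:

```python
import numpy as np

def equalize(img, levels=256):
    """Map gray levels so the output histogram is (nearly) uniform."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum() / img.size            # cumulative distribution
    lut = np.round((levels - 1) * cdf).astype(img.dtype)
    return lut[img]                           # apply the gray-level map

# A low-contrast image confined to gray levels 100..120
rng = np.random.default_rng(2)
img = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
out = equalize(img)
# The output occupies (nearly) the full 0..255 range
print(img.min(), img.max(), "->", out.min(), out.max())
```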

Figure 1.7 Blurring due to an imaging system. Given the noisy and blurred image
g(x, y), the image restoration problem is to find an estimate of the input image f(x, y).
1.4 IMAGE RESTORATION

Image restoration refers to removal or minimization of known degradations in an
image. This includes deblurring of images degraded by the limitations of a sensor or
its environment, noise filtering, and correction of geometric distortion or non-
linearities due to sensors. Figure 1.7 shows a typical situation in image restoration.
The image of a point source is blurred and degraded due to noise by an imaging
system. If the imaging system is linear, the image of an object can be expressed as

    g(x, y) = ∫∫ h(x, y; α, β) f(α, β) dα dβ + η(x, y)              (1.1)

where the integrals run over (-∞, ∞), η(x, y) is the additive noise function, f(α, β)
is the object, g(x, y) is the image, and h(x, y; α, β) is called the point spread
function (PSF). A typical image restoration problem is to find an estimate of f(α, β)
given the PSF, the blurred image, and the statistical properties of the noise process.

A fundamental result in filtering theory used commonly for image restoration
is called the Wiener filter. This filter gives the best linear mean square estimate of
the object from the observations. It can be implemented in the frequency domain via
the fast unitary transforms, in the spatial domain by two-dimensional recursive tech-
niques similar to Kalman filtering, or by FIR nonrecursive filters (see Fig. 8.15). It
can also be implemented as a semirecursive filter that employs a unitary transform
in one of the dimensions and a recursive filter in the other.
Several other image restoration methods such as least squares, constrained
least squares, and spline interpolation methods can be shown to belong to the class
of Wiener filtering algorithms. Other methods such as maximum likelihood, max-
imum entropy, and maximum a posteriori are nonlinear techniques that require
iterative solutions. These and other algorithms useful in image restoration are
discussed in Chapter 8.
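A frequency-domain sketch of the Wiener idea under simplifying assumptions (shift invariant blur, flat signal and noise spectra): the filter takes the familiar form W = H* / (|H|² + noise-to-signal ratio). The 3 x 3 box PSF and noise level below are illustrative, not the book's example:

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.standard_normal((64, 64))                 # the "object"
h = np.zeros((64, 64)); h[:3, :3] = 1.0 / 9.0     # 3x3 box blur PSF
H = np.fft.fft2(h)                                # frequency response

g = np.fft.ifft2(np.fft.fft2(f) * H).real         # blurred image ...
g += 0.01 * rng.standard_normal(g.shape)          # ... plus additive noise

nsr = 1e-3                                        # assumed noise-to-signal ratio
W = H.conj() / (np.abs(H) ** 2 + nsr)             # Wiener filter
f_hat = np.fft.ifft2(np.fft.fft2(g) * W).real     # restored estimate

err_blur = np.mean((g - f) ** 2)
err_wien = np.mean((f_hat - f) ** 2)
assert err_wien < err_blur   # estimate beats the raw blurred observation
print(round(err_blur, 3), round(err_wien, 3))
```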

1.5 IMAGE ANALYSIS

Image analysis is concerned with making quantitative measurements from an image


to produce a description of it. In the simplest form, this task could be reading a label
on a grocery item, sorting different parts on an assembly line (Fig. 1.8), or

Sec. 1.5 Image Analysis 7


Figure 1.8 Parts inspection and sorting on an assembly line.

measuring the size and orientation of blood cells in a medical image. More
advanced image analysis systems measure quantitative information and use it to
make a sophisticated decision, such as controlling the arm of a robot to move an
object after identifying it or navigating an aircraft with the aid of images acquired
along its trajectory.
Image analysis techniques require extraction of certain features that aid in the
identification of the object. Segmentation techniques are used to isolate the desired
object from the scene so that measurements can be made on it subsequently.
Quantitative measurements of object features allow classification and description of
the image. These techniques are considered in Chapter 9.

1.6 IMAGE RECONSTRUCTION FROM PROJECTIONS

Image reconstruction from projections is a special class of image restoration prob-
lems where a two- (or higher) dimensional object is reconstructed from several
one-dimensional projections. Each projection is obtained by projecting a parallel
X-ray (or other penetrating radiation) beam through the object (Fig. 1.9). Planar
projections are thus obtained by viewing the object from many different angles.
Reconstruction algorithms derive an image of a thin axial slice of the object, giving
an inside view otherwise unobtainable without performing extensive surgery. Such
techniques are important in medical imaging (CT scanners), astronomy, radar imag-
ing, geological exploration, and nondestructive testing of assemblies.
Mathematically, image reconstruction problems can be set up in the frame-
work of Radon transform theory. This theory leads to several useful reconstruction
algorithms, details of which are discussed in Chapter 10.
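A toy illustration of the projection idea, assuming parallel beams along the two axis-aligned directions only (a real scanner collects many angles, and the object below is illustrative):

```python
import numpy as np

# A simple 2-D "object": a bright square inside a dark field
obj = np.zeros((32, 32))
obj[10:20, 12:22] = 1.0

# A parallel projection is a set of line integrals through the object.
# Angle 0: integrate down the columns; angle 90 deg: across the rows.
proj_0 = obj.sum(axis=0)
proj_90 = obj.sum(axis=1)

# Every projection conserves the total "mass" of the slice
print(proj_0.sum() == proj_90.sum() == obj.sum())   # -> True
```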

Figure 1.9 Image reconstruction using X-ray CT scanners.

1.7 IMAGE DATA COMPRESSION

The amount of data associated with visual information is so large (see Table 1.1a)
that its storage would require enormous storage capacity. Although the capacities of
several storage media (Table 1.1b) are substantial, their access speeds are usually
inversely proportional to their capacity. Typical television images generate data
rates exceeding 10 million bytes per second. There are other image sources that
generate even higher data rates. Storage and/or transmission of such data require
large capacity and/or bandwidth, which could be very expensive. Image data com-
pression techniques are concerned with reduction of the number of bits required to
store or transmit images without any appreciable loss of information. Image trans-

TABLE 1.1a Data Volumes of Image Sources
(in Millions of Bytes)

National archives                       12.5 x 10^9
1 h of color television                 28 x 10^3
Encyclopaedia Britannica                12.5 x 10^3
Book (200 pages of text characters)     1.3
One page viewed as an image             0.13

TABLE 1.1b Storage Capacities
(in Millions of Bytes)

Human brain                     125,000,000
Magnetic cartridge              250,000
Optical disc memory             12,500
Magnetic disc                   760
2400-ft magnetic tape           200
Floppy disc                     1.25
Solid-state memory modules      0.25

Sec. 1.7 Image Data Compression 9



mission applications are in broadcast television; remote sensing via satellite, air-
craft, radar, or sonar; teleconferencing; computer communications; and facsimile
transmission. Image storage is required most commonly for educational and busi-
ness documents, medical images used in patient monitoring systems, and the like.
Because of their wide applications, data compression is of great importance in
digital image processing. Various image data compression techniques and examples
are discussed in Chapter 11.
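As a toy illustration of bit reduction, consider run-length coding of a binary document scan line. This particular scheme and its numbers are illustrative only, not one of the methods developed in Chapter 11:

```python
def run_lengths(bits):
    """Encode a binary sequence as (value, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([b, 1])       # start a new run
    return runs

# A scan line from a document image: long runs of white (0) and black (1)
line = [0] * 300 + [1] * 20 + [0] * 700
runs = run_lengths(line)
print(runs)                           # -> [[0, 300], [1, 20], [0, 700]]

# 1020 one-bit pixels vs. 3 runs of (1-bit value + 10-bit length) each
print(len(line), "->", len(runs) * 11)   # -> 1020 -> 33 bits
```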

BIBLIOGRAPHY

For books and special issues of journals devoted to digital image processing:

1. H. C. Andrews, with contributions by W. K. Pratt and K. Caspari. Computer Techniques
in Image Processing. New York: Academic Press, 1970.
2. H. C. Andrews and B. R. Hunt. Digital Image Restoration. Englewood Cliffs, N.J.:
Prentice-Hall, 1977.
3. K. R. Castleman. Digital Image Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1979.
4. M. P. Ekstrom (ed.). Digital Image Processing Techniques. New York: Academic Press,
1984.
5. R. C. Gonzalez and P. Wintz. Digital Image Processing, 2nd ed. Reading, Mass.:
Addison-Wesley, 1987.
6. E. L. Hall. Computer Image Processing and Recognition. New York: Academic Press,
1979.
7. T. S. Huang (ed.). Topics in Applied Physics: Picture Processing and Digital Filtering,
Vol. 6. New York: Springer-Verlag, 1975. Also see Vols. 42-43, 1981.
8. B. S. Lipkin and A. Rosenfeld. Picture Processing and Psychopictorics. New York:
Academic Press, 1970.
9. W. K. Pratt. Digital Image Processing. New York: Wiley-Interscience, 1978.
10. A. Rosenfeld and A. C. Kak. Digital Image Processing, Vols. I and II. New York:
Academic Press, 1982.
11. Special issues on image processing. Proceedings IEEE, 60, no. 7 (July 1972); IEEE
Computer, 7, no. 5 (May 1974); Proceedings IEEE, 69, no. 5 (May 1981).


Two-Dimensional Systems and Mathematical Preliminaries

2.1 INTRODUCTION

In this chapter we define our notation and discuss some mathematical preliminaries
that will be used throughout the book. Because many imaging systems can be
modeled as two-dimensional systems, mathematical concepts used in the study of
such systems are needed. We start by defining our notation and then review the
definitions and properties of linear systems and the Fourier and Z-transforms. This
is followed by a review of several fundamental results from matrix theory that are
important in digital image processing theory. Two-dimensional random fields and
some important concepts from probability and estimation theory are then reviewed.
The emphasis is on the final results and their applications in image processing. It is
assumed that the reader has encountered most of these basic concepts earlier. The
summary discussion provided here is intended to serve as an easy reference for
subsequent chapters. The problems at the end of the chapter provide an oppor-
tunity to revise these concepts through special cases and examples.

2.2 NOTATION AND DEFINITIONS


.
" .

A one-dimensional continuous signal will be represented as a function of one


variable: f(x), u (x), s (t), and so on. One-dimensional sampled signals will be written
as single index sequences: Un, u(n), and the like. .
A continuous image will be represented as a function of two independent
variables: u(x, y), v (x,y),j(x, y), and so forth. A sampled image will be represented
. as a two- (or higher) dimensional sequence of real numbers: Um• n , vern, n), u(i, i. k),
and so on. Unless stated 'otherwise, the symbols i, j, k, l, m, n, '... will be used to

Sec. 2.2 Notation and Definitions 11


specify integer indices of arrays and vectors. The roman symbol j will represent
√-1. The complex conjugate of a complex variable such as z will be denoted by
z*. Certain symbols will be redefined at appropriate places in the text to keep the
notation clear.
Table 2.1 lists several well-known one-dimensional functions that will often be
encountered. Their two-dimensional versions are functions of the separable form

    f(x, y) = f1(x) f2(y)                                           (2.1)

For example, the two-dimensional delta functions are defined as

    Dirac:      δ(x, y) = δ(x) δ(y)
    Kronecker:  δ(m, n) = δ(m) δ(n)

which satisfy the properties

    ∫∫ f(x', y') δ(x - x', y - y') dx' dy' = f(x, y)                (2.2)

    ∫∫ δ(x, y) dx dy = 1                                            (2.3)

    x(m, n) = Σ Σ x(m', n') δ(m - m', n - n'),   Σ Σ δ(m, n) = 1    (2.4)
             m', n'                              m, n

The functions rect(x, y), sinc(x, y), and comb(x, y) can be defined in a similar
manner.

TABLE 2.1 Some Special Functions

Function             Definition
Dirac delta          δ(x) = 0 for x ≠ 0;  lim(ε→0) ∫ from -ε to ε of δ(x) dx = 1
  Sifting property   ∫ f(x') δ(x - x') dx' = f(x)
  Scaling property   δ(ax) = δ(x)/|a|
Kronecker delta      δ(n) = 1 for n = 0;  0 for n ≠ 0
  Sifting property   Σ_m f(m) δ(n - m) = f(n)
Rectangle            rect(x) = 1 for |x| ≤ 1/2;  0 for |x| > 1/2
Signum               sgn(x) = 1 for x > 0;  0 for x = 0;  -1 for x < 0
Sinc                 sinc(x) = sin(πx)/(πx)
Comb                 comb(x) = Σ_n δ(x - n)
Triangle             tri(x) = 1 - |x| for |x| ≤ 1;  0 for |x| > 1

12 Two-Dimensional Systems and Mathematical Preliminaries Chap. 2
2.3 LINEAR SYSTEMS AND SHIFT INVARIANCE

A large number of imaging systems can be modeled as two-dimensional linear
systems. Let x(m, n) and y(m, n) represent the input and output sequences,
respectively, of a two-dimensional system (Fig. 2.1), written as

    y(m, n) = ℋ[x(m, n)]                                            (2.5)

This system is called linear if and only if any linear combination of two inputs
x1(m, n) and x2(m, n) produces the same combination of their respective outputs
y1(m, n) and y2(m, n), i.e., for arbitrary constants a1 and a2

    ℋ[a1 x1(m, n) + a2 x2(m, n)] = a1 ℋ[x1(m, n)] + a2 ℋ[x2(m, n)]
                                 = a1 y1(m, n) + a2 y2(m, n)        (2.6)

This is called linear superposition. When the input is the two-dimensional
Kronecker delta function at location (m', n'), the output at location (m, n) is
defined as

    h(m, n; m', n') ≜ ℋ[δ(m - m', n - n')]                          (2.7)

and is called the impulse response of the system. For an imaging system, it is the
image in the output plane due to an ideal point source at location (m', n') in the
input plane. In our notation, the semicolon (;) is employed to distinguish the input
and output pairs of coordinates.
The impulse response is called the point spread function (PSF) when the inputs
and outputs represent a positive quantity such as the intensity of light in imaging
systems. The term impulse response is more general and is allowed to take negative
as well as complex values. The region of support of an impulse response is the
smallest closed region in the m, n plane outside which the impulse response is zero.
A system is said to be a finite impulse response (FIR) or an infinite impulse
response (IIR) system if its impulse response has finite or infinite regions of
support, respectively.
The output of any linear system can be obtained from its impulse response and
the input by applying the superposition rule of (2.6) to the representation of (2.4)
as follows:

    y(m, n) = ℋ[x(m, n)]

            = ℋ[ Σ Σ x(m', n') δ(m - m', n - n') ]
                m' n'

            = Σ Σ x(m', n') ℋ[δ(m - m', n - n')]
             m' n'

    y(m, n) = Σ Σ x(m', n') h(m, n; m', n')                         (2.8)
             m' n'


x(m, n) → ℋ[·] → y(m, n)

Figure 2.1 A system.


Sec. 2.3 Linear Systems and Shift Invariance 13

A system is called spatially invariant or shift invariant if a translation of the input
causes a translation of the output. Following the definition of (2.7), if the impulse
occurs at the origin we will have

    ℋ[δ(m, n)] = h(m, n; 0, 0)

Hence, it must be true for shift invariant systems that

    h(m, n; m', n') ≜ ℋ[δ(m - m', n - n')]
                    = h(m - m', n - n'; 0, 0)

    h(m, n; m', n') = h(m - m', n - n')                             (2.9)

i.e., the impulse response is a function of the two displacement variables only. This
means the shape of the impulse response does not change as the impulse moves
about the m, n plane. A system is called spatially varying when (2.9) does not hold.
Figure 2.2 shows examples of PSFs of imaging systems with separable or circularly
symmetric impulse responses.
For shift invariant systems, the output becomes

y(m,n)= 2:2: hem -m',n -n')x(m',n') (2.10)


111'. ro' := -00

which is called the convolution of the input with the impulse response. Figure 2.3
shows a graphical interpretation of this operation. The impulse response array is
rotated about the origin by 180° and then shifted by (m, n) and overlaid on the
array x(m', n'). The sum of the products of the arrays {x(·,·)} and {h(·,·)} in the
overlapping regions gives the result at (m, n). We will use the symbol @ to denote
the convolution operation in both the discrete and continuous cases, i.e.,

    g(x, y) = h(x, y) @ f(x, y) ≜ ∫∫ h(x - x', y - y') f(x', y') dx' dy'

    y(m, n) = h(m, n) @ x(m, n) ≜ Σ Σ h(m - m', n - n') x(m', n')   (2.11)
                                 m', n'

Figure 2.2 Examples of PSFs. (a) Circularly symmetric PSF of average
atmospheric turbulence causing small blur; (b) atmospheric turbulence PSF causing
large blur; (c) separable PSF of a diffraction-limited system with square aperture;
(d) same as (c) but with smaller aperture.


Figure 2.3 Discrete convolution in two dimensions. (a) Impulse response;
(b) the output at location (m, n) is the sum of products of quantities in the area of
overlap.

The convolution operation has several interesting properties, which are explored in
Problems 2.2 and 2.3.
Example 2.1 (Discrete convolution)
Consider the 2 x 2 and 3 x 2 arrays h(m, n) and x(m, n) shown next, where the boxed
element is at the origin. Also shown are the various steps for obtaining the convolution
of these two arrays. The result y(m, n) is a 4 x 3 array. In general, the convolution of
two arrays of sizes (M1 x N1) and (M2 x N2) yields an array of size
[(M1 + M2 - 1) x (N1 + N2 - 1)] (Problem 2.5).

[Arrays: (a) x(m, n); (b) h(m, n); (c) h(-m, -n); (d) h(1 - m, -n);
(e) y(1, 0) = -2 + 5 = 3; (f) y(m, n).]
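Equation (2.10) and the size rule of Example 2.1 translate directly into code. A direct, unoptimized sketch, taking the origin at the upper-left corner of each array (the numeric arrays below are illustrative, not the example's):

```python
import numpy as np

def conv2d_full(h, x):
    """Direct 2-D convolution, eq. (2.10): superpose shifted copies of h."""
    M1, N1 = h.shape
    M2, N2 = x.shape
    y = np.zeros((M1 + M2 - 1, N1 + N2 - 1))
    for m in range(M2):
        for n in range(N2):
            # each input sample scales and shifts the impulse response
            y[m:m + M1, n:n + N1] += x[m, n] * h
    return y

h = np.array([[1., 2.], [3., 4.]])             # a 2 x 2 impulse response
x = np.array([[1., 0.], [0., 1.], [2., 1.]])   # a 3 x 2 input
y = conv2d_full(h, x)
print(y.shape)      # -> (4, 3), i.e. (M1 + M2 - 1) x (N1 + N2 - 1)
```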

2.4 THE FOURIER TRANSFORM



Two-dimensional transforms such as the Fourier transform and the Zvtransform are
of fundamental importance in digital image processing as will becoine evident in the
subsequent chapters. In one dimension, the Fourier transform of a complex

Sec. 2.4 '. The Fourier Transform 15


function f(x) is defined as

    F(ξ) = ∫ f(x) exp(-j2πxξ) dx                                    (2.12)

and the inverse transform is

    f(x) = ∫ F(ξ) exp(j2πxξ) dξ                                     (2.13)

In two dimensions, the Fourier transform pair of a function f(x, y) is defined as

    F(ξ1, ξ2) = ∫∫ f(x, y) exp[-j2π(xξ1 + yξ2)] dx dy               (2.14)

    f(x, y) = ∫∫ F(ξ1, ξ2) exp[j2π(xξ1 + yξ2)] dξ1 dξ2              (2.15)

where all integrals run over (-∞, ∞).
Examples of some useful two-dimensional Fourier transforms are given in Table


2.2.

Properties of the Fourier Transform

Table 2.3 gives a summary of the properties of the two-dimensional Fourier trans-
form. Some of these properties are discussed next.

1. Spatial frequencies. If f(x, y) is luminance and x, y the spatial coordinates,
then ξ1, ξ2 are the spatial frequencies that represent luminance changes with
respect to spatial distances. The units of ξ1 and ξ2 are reciprocals of x and y,
respectively. Sometimes the coordinates x, y are normalized by the viewing
distance of the image f(x, y). Then the units of ξ1, ξ2 are cycles per degree (of
the viewing angle).
2. Uniqueness. For continuous functions, f(x, y) and F(ξ1, ξ2) are unique with
respect to one another. There is no loss of information if, instead of preserving
the image, its Fourier transform is preserved. This fact has been utilized in an
image data compression technique called transform coding.
3. Separability. By definition, the Fourier transform kernel is separable, so that it

TABLE 2.2 Two-Dimensional Fourier Transform Pairs

f(x, y)                              F(ξ1, ξ2)
δ(x, y)                              1
δ(x ± x0, y ± y0)                    exp(±j2πx0ξ1) exp(±j2πy0ξ2)
exp(±j2πxη1) exp(±j2πyη2)            δ(ξ1 ∓ η1, ξ2 ∓ η2)
exp[-π(x² + y²)]                     exp[-π(ξ1² + ξ2²)]
rect(x, y)                           sinc(ξ1, ξ2)
tri(x, y)                            sinc²(ξ1, ξ2)
comb(x, y)                           comb(ξ1, ξ2)


TABLE 2.3 Properties of the Two-Dimensional Fourier Transform

Property              Function f(x, y)                 Fourier Transform F(ξ1, ξ2)
Rotation              f(±x, ±y)                        F(±ξ1, ±ξ2)
Linearity             a1 f1(x, y) + a2 f2(x, y)        a1 F1(ξ1, ξ2) + a2 F2(ξ1, ξ2)
Conjugation           f*(x, y)                         F*(-ξ1, -ξ2)
Separability          f1(x) f2(y)                      F1(ξ1) F2(ξ2)
Scaling               f(ax, by)                        F(ξ1/a, ξ2/b) / |ab|
Shifting              f(x ± x0, y ± y0)                exp[±j2π(x0ξ1 + y0ξ2)] F(ξ1, ξ2)
Modulation            exp[±j2π(η1x + η2y)] f(x, y)     F(ξ1 ∓ η1, ξ2 ∓ η2)
Convolution           g(x, y) = h(x, y) @ f(x, y)      G(ξ1, ξ2) = H(ξ1, ξ2) F(ξ1, ξ2)
Multiplication        g(x, y) = h(x, y) f(x, y)        G(ξ1, ξ2) = H(ξ1, ξ2) @ F(ξ1, ξ2)
Spatial correlation   c(x, y) = h(x, y) * f(x, y)      C(ξ1, ξ2) = H(-ξ1, -ξ2) F(ξ1, ξ2)
Inner product         I ≜ ∫∫ f(x, y) h*(x, y) dx dy    I = ∫∫ F(ξ1, ξ2) H*(ξ1, ξ2) dξ1 dξ2
can be written as a separable transformation in x and y, i.e.,

    F(ξ1, ξ2) = ∫ [ ∫ f(x, y) exp(-j2πxξ1) dx ] exp(-j2πyξ2) dy

This means the two-dimensional transformation can be realized by a succession of
one-dimensional transformations along each of the spatial coordinates.
4. Frequency response and eigenfunctions of shift invariant systems. An eigen-
function of a system is defined as an input function that is reproduced at the
output with a possible change only in its amplitude. A fundamental property
of a linear shift invariant system is that its eigenfunctions are given by the
complex exponential exp[j2π(ξ1x + ξ2y)]. Thus in Fig. 2.4, for any fixed
(ξ1, ξ2), the output of the linear shift invariant system would be

    g(x, y) = ∫∫ h(x - x', y - y') exp[j2π(ξ1x' + ξ2y')] dx' dy'

Performing the change of variables x̃ = x - x', ỹ = y - y' and simplifying the
result, we get

    g(x, y) = H(ξ1, ξ2) exp[j2π(ξ1x + ξ2y)]                         (2.16)

The function H(ξ1, ξ2), which is the Fourier transform of the impulse re-
sponse, is also called the frequency response of the system. It represents the
(complex) amplitude of the system response at spatial frequency (ξ1, ξ2).

Figure 2.4 Eigenfunctions of a linear shift invariant system: the input
exp[j2π(ξ1x + ξ2y)] produces the output H(ξ1, ξ2) exp[j2π(ξ1x + ξ2y)], where
H(ξ1, ξ2) is the Fourier transform of h(x, y).




5. Convolution theorem. The Fourier transform of the convolution of two func-
tions is the product of their Fourier transforms, i.e.,

    G(ξ1, ξ2) = H(ξ1, ξ2) F(ξ1, ξ2)                                 (2.17)

This theorem suggests that the convolution of two functions may be evaluated
by inverse Fourier transforming the product of their Fourier transforms. The
discrete version of this theorem yields a fast Fourier transform based con-
volution algorithm (see Chapter 5).
The converse of the convolution theorem is that the Fourier transform of
the product of two functions is the convolution of their Fourier transforms.
The result of the convolution theorem can also be extended to the spatial
correlation between two real functions h(x, y) and f(x, y), which is defined as

    c(x, y) = h(x, y) * f(x, y) ≜ ∫∫ h(x', y') f(x + x', y + y') dx' dy'    (2.18)

A change of variables shows that c(x, y) is also the convolution
h(-x, -y) @ f(x, y), which yields

    C(ξ1, ξ2) = H(-ξ1, -ξ2) F(ξ1, ξ2)                               (2.19)
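The discrete form of the convolution theorem can be checked directly: pointwise multiplication of DFTs corresponds to circular convolution. A brute-force sketch on an illustrative 8 x 8 pair (the FFT-based linear-convolution algorithm of Chapter 5 additionally zero-pads to avoid the circular wraparound):

```python
import numpy as np

rng = np.random.default_rng(5)
h = rng.standard_normal((8, 8))
f = rng.standard_normal((8, 8))

# Circular convolution computed directly from its definition
y = np.zeros((8, 8))
for m in range(8):
    for n in range(8):
        for mp in range(8):
            for np_ in range(8):
                y[m, n] += h[(m - mp) % 8, (n - np_) % 8] * f[mp, np_]

# ... and via the convolution theorem: G = H F, then invert
y_fft = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(f)).real
print(np.allclose(y, y_fft))   # -> True
```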
6. Inner product preservation. Another important property of the Fourier trans-
form is that the inner product of two functions is equal to the inner product of
their Fourier transforms, i.e.,

    I ≜ ∫∫ f(x, y) h*(x, y) dx dy = ∫∫ F(ξ1, ξ2) H*(ξ1, ξ2) dξ1 dξ2    (2.20)

Setting h = f, we obtain the well-known Parseval energy conservation formula

    ∫∫ |f(x, y)|² dx dy = ∫∫ |F(ξ1, ξ2)|² dξ1 dξ2                   (2.21)

i.e., the total energy in the function is the same as in its Fourier transform.
7. Hankel transform. The Fourier transform of a circularly symmetric function is
also circularly symmetric and is given by what is called the Hankel transform
(see Problem 2.10).

Fourier Transform of Sequences (Fourier Series)

For a one-dimensional sequence x(n), real or complex, its Fourier transform is
defined as the series

    X(ω) ≜  Σ  x(n) exp(-jnω)                                       (2.22)
          n=-∞

The inverse transform is given by

    x(n) = (1/2π) ∫ X(ω) exp(jnω) dω                                (2.23)

where the integral runs over one period (-π, π).


Note that X(ω) is periodic with period 2π. Hence it is sufficient to specify it over
one period.
The Fourier transform pair of a two-dimensional sequence x(m, n) is defined
as

    X(ω1, ω2) ≜ Σ Σ x(m, n) exp[-j(mω1 + nω2)]                      (2.24)
               m, n

    x(m, n) = (1/4π²) ∫∫ X(ω1, ω2) exp[j(mω1 + nω2)] dω1 dω2        (2.25)

where the integrals run over (-π, π). Now X(ω1, ω2) is periodic with period 2π in
each argument, i.e.,

    X(ω1 ± 2π, ω2 ± 2π) = X(ω1 ± 2π, ω2) = X(ω1, ω2 ± 2π) = X(ω1, ω2)

Often, the sequence x(m, n) in the series in (2.24) is absolutely summable, i.e.,

    Σ Σ |x(m, n)| < ∞                                               (2.26)
   m, n

Analogous to the continuous case, H(ω1, ω2), the Fourier transform of the shift
invariant impulse response, is called the frequency response. The Fourier transform
of sequences has many properties similar to the Fourier transform of continuous
functions. These are summarized in Table 2.4.

TABLE 2.4 Properties and Examples of Fourier Transform of Two-Dimensional Sequences

Property              Sequence                           Transform
                      x(m, n), y(m, n), h(m, n), ...     X(ω1, ω2), Y(ω1, ω2), H(ω1, ω2), ...
Linearity             a1 x1(m, n) + a2 x2(m, n)          a1 X1(ω1, ω2) + a2 X2(ω1, ω2)
Conjugation           x*(m, n)                           X*(-ω1, -ω2)
Separability          x1(m) x2(n)                        X1(ω1) X2(ω2)
Shifting              x(m ± m0, n ± n0)                  exp[±j(m0ω1 + n0ω2)] X(ω1, ω2)
Modulation            exp[±j(ω01 m + ω02 n)] x(m, n)     X(ω1 ∓ ω01, ω2 ∓ ω02)
Convolution           y(m, n) = h(m, n) @ x(m, n)        Y(ω1, ω2) = H(ω1, ω2) X(ω1, ω2)
Multiplication        h(m, n) x(m, n)                    (1/4π²) H(ω1, ω2) @ X(ω1, ω2)
Spatial correlation   c(m, n) = h(m, n) * x(m, n)        C(ω1, ω2) = H(-ω1, -ω2) X(ω1, ω2)
Inner product         I ≜ Σ Σ x(m, n) y*(m, n)           I = (1/4π²) ∫∫ X(ω1, ω2) Y*(ω1, ω2) dω1 dω2
Energy conservation   E ≜ Σ Σ |x(m, n)|²                 E = (1/4π²) ∫∫ |X(ω1, ω2)|² dω1 dω2
Examples              exp[j(mω01 + nω02)]                4π² δ(ω1 - ω01, ω2 - ω02)
                      δ(m, n)                            1


2.5 THE Z-TRANSFORM OR LAURENT SERIES
A useful generalization of the Fourier series is the Z-transform, which for a two-dimensional complex sequence x(m, n) is defined as

    X(z₁, z₂) = Σ_{m,n=−∞}^{∞} x(m, n) z₁^{−m} z₂^{−n}                         (2.27)

where z₁, z₂ are complex variables. The set of values of z₁, z₂ for which this series converges uniformly is called the region of convergence. The Z-transform of the impulse response of a linear shift-invariant discrete system is called its transfer function. Applying the convolution theorem for Z-transforms (Table 2.5), we can transform (2.10) as

    Y(z₁, z₂) = H(z₁, z₂)X(z₁, z₂)

    ⇒ H(z₁, z₂) = Y(z₁, z₂)/X(z₁, z₂)

i.e., the transfer function is also the ratio of the Z-transforms of the output and the input sequences. The inverse Z-transform is given by the double contour integral

    x(m, n) = (1/(2πj)²) ∮∮ X(z₁, z₂) z₁^{m−1} z₂^{n−1} dz₁ dz₂                (2.28)

where the contours of integration are counterclockwise and lie in the region of convergence. When the region of convergence includes the unit circles |z₁| = 1, |z₂| = 1, then evaluation of X(z₁, z₂) at z₁ = exp(jω₁), z₂ = exp(jω₂) yields the Fourier transform of x(m, n). Sometimes X(z₁, z₂) is available as a finite series (such as the transfer function of FIR filters). Then x(m, n) can be obtained by inspection as the coefficient of the term z₁^{−m} z₂^{−n}.
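The finite-series case can be illustrated numerically: for an FIR array h(m, n), evaluating its Z-transform on the unit circles reproduces the Fourier transform. A minimal NumPy sketch with arbitrary coefficients:

```python
import numpy as np

# For an FIR transfer function H(z1, z2) = sum h(m,n) z1^{-m} z2^{-n},
# evaluating on the unit circles z_i = exp(j*w_i) gives the frequency response.
h = np.array([[1.0, -1.0],
              [0.5,  2.0]])             # h(m, n), arbitrary illustration data

def ztransform2(h, z1, z2):
    m = np.arange(h.shape[0])[:, None]
    n = np.arange(h.shape[1])[None, :]
    return np.sum(h * z1 ** (-m) * z2 ** (-n))

w1, w2 = 0.4, 1.1
H_unit_circle = ztransform2(h, np.exp(1j * w1), np.exp(1j * w2))
H_fourier = np.sum(h * np.exp(-1j * (np.arange(2)[:, None] * w1
                                     + np.arange(2)[None, :] * w2)))
print(np.allclose(H_unit_circle, H_fourier))   # True
```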

TABLE 2.5 Properties of the Two-Dimensional Z-Transform

Property         Sequence                        Z-Transform
                 x(m, n), y(m, n), h(m, n), ...  X(z₁, z₂), Y(z₁, z₂), H(z₁, z₂), ...
Rotation         x(−m, −n)                       X(z₁⁻¹, z₂⁻¹)
Linearity        a₁x₁(m, n) + a₂x₂(m, n)         a₁X₁(z₁, z₂) + a₂X₂(z₁, z₂)
Conjugation      x*(m, n)                        X*(z₁*, z₂*)
Separability     x₁(m)x₂(n)                      X₁(z₁)X₂(z₂)
Shifting         x(m ± m₀, n ± n₀)               z₁^{±m₀} z₂^{±n₀} X(z₁, z₂)
Modulation       aᵐbⁿ x(m, n)                    X(z₁/a, z₂/b)
Convolution      h(m, n) ⊛ x(m, n)               H(z₁, z₂)X(z₁, z₂)
Multiplication   x(m, n)y(m, n)                  (1/(2πj)²) ∮∮_{C₁C₂} X(z₁/z₁′, z₂/z₂′) Y(z₁′, z₂′) (dz₁′/z₁′)(dz₂′/z₂′)


20 Two-Dimensional Systems and Mathematical Preliminaries Chap. 2
Causality and Stability

A one-dimensional shift-invariant system is called causal if its output at any time is not affected by future inputs. This means its impulse response h(n) = 0 for n < 0 and its transfer function must have a one-sided Laurent series, i.e.,

    H(z) = Σ_{n=0}^{∞} h(n) z^{−n}                                             (2.29)

Extending this definition, any sequence x(n) is called causal if x(n) = 0, n < 0; anticausal if x(n) = 0, n ≥ 0; and noncausal if it is neither causal nor anticausal.
A system is called stable if its output remains uniformly bounded for any bounded input. For linear shift-invariant systems, this condition requires that the impulse response be absolutely summable (prove it!), i.e.,

    Σ_{n=−∞}^{∞} |h(n)| < ∞                                                    (2.30)

This means H(z) cannot have any poles on the unit circle |z| = 1. If this system is to be causal and stable, then the convergence of (2.29) at |z| = 1 implies the series must converge for all |z| > 1, i.e., the poles of H(z) must lie inside the unit circle.
In two dimensions, a linear shift-invariant system is stable when

    Σ_{m,n=−∞}^{∞} |h(m, n)| < ∞                                               (2.31)

which implies the region of convergence of H(z₁, z₂) must include the unit circles, i.e., |z₁| = 1, |z₂| = 1.
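A minimal numerical check of the absolute-summability condition, using the assumed first-order example H(z) = 1/(1 − az⁻¹) with h(n) = aⁿ for n ≥ 0 (a truncated sum stands in for the infinite one; the pole values are illustrations):

```python
import numpy as np

# Stability check for the first-order causal system H(z) = 1/(1 - a z^{-1}),
# whose impulse response is h(n) = a^n for n >= 0.  The system is stable iff
# sum |h(n)| < inf, i.e. the pole satisfies |a| < 1.
def is_stable(pole, n_terms=500, bound=1e6):
    n = np.arange(n_terms)
    # A large truncated sum is used as a numerical proxy for divergence.
    return np.sum(np.abs(pole ** n)) < bound

print(is_stable(0.9))    # True  : pole inside the unit circle
print(is_stable(1.1))    # False : pole outside, |h(n)| diverges
```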

2.6 OPTICAL AND MODULATION TRANSFER FUNCTIONS

For a spatially invariant imaging system, its optical transfer function (OTF) is defined as its normalized frequency response, i.e.,

    OTF = H(ξ₁, ξ₂)/H(0, 0)                                                    (2.32)

The modulation transfer function (MTF) is defined as the magnitude of the OTF, i.e.,

    MTF = |OTF| = |H(ξ₁, ξ₂)|/|H(0, 0)|                                         (2.33)
Similar relations are valid for discrete systems. Figure 2.5 shows the MTFs of systems whose PSFs are displayed in Fig. 2.2. In practice, it is often the MTF that is measurable. The phase of the frequency response is estimated from physical considerations. For many optical systems, the OTF itself is positive.





Example 2.2

The impulse response of an imaging system is given as h(x, y) = 2 sin²[π(x − x₀)]/[π(x − x₀)]² · sin²[π(y − y₀)]/[π(y − y₀)]². Then its frequency response is H(ξ₁, ξ₂) = 2 tri(ξ₁, ξ₂) exp[−j2π(x₀ξ₁ + y₀ξ₂)], and OTF = tri(ξ₁, ξ₂) exp[−j2π(x₀ξ₁ + y₀ξ₂)], MTF = tri(ξ₁, ξ₂).

2.7 MATRIX THEORY RESULTS

Vectors and Matrices

Often one- and two-dimensional sequences will be represented by vectors and matrices, respectively. A column vector u containing N elements is denoted as

    u ≜ {u(n)} = [u(1), u(2), ..., u(N)]ᵀ                                      (2.34)

The nth element of the vector u is denoted by u(n), uₙ, or [u]ₙ. Unless specified otherwise, all vectors will be column vectors. A column vector of size N is also called an N × 1 vector. Likewise, a row vector of size N is called a 1 × N vector.




A matrix A of size M × N has M rows and N columns and is defined as

    A = {a(m, n)} = ⎡ a(1, 1)  a(1, 2)  ...  a(1, N) ⎤
                    ⎢ a(2, 1)  a(2, 2)  ...  a(2, N) ⎥                         (2.35)
                    ⎢    :        :             :    ⎥
                    ⎣ a(M, 1)  a(M, 2)  ...  a(M, N) ⎦

The element in the mth row and nth column of matrix A is written as [A]_{m,n} ≜ a(m, n) ≜ a_{m,n}. The nth column of A is denoted by aₙ, whose mth element is written as aₙ(m) = a(m, n). When the starting index of a matrix is not (1, 1), it will be so indicated. For example,

    A = {a(m, n),  0 ≤ m, n ≤ N − 1}

represents an N × N matrix with starting index (0, 0). Common definitions from matrix theory are summarized in Table 2.6.
In two dimensions it is often useful to visualize an image as a matrix. The matrix representation is simply a 90° clockwise rotation of the conventional two-dimensional Cartesian coordinate representation:

    n
    ↑
    │ 1  4  2                       ⎡ 1  4  1 ⎤ → n
    │ 4  0  5        ⇒      X =     ⎢ 2  0  4 ⎥
    │ 1  2  3                       ⎣ 3  5  2 ⎦
    └─────────→ m                   ↓
                                    m

Row and Column Ordering

Sometimes it is necessary to write a matrix in the form of a vector, for instance, when storing an image on a disk or a tape. Let

    𝓍 ≜ 𝒪{x(m, n)}

be a one-to-one ordering of the elements of the array {x(m, n)} into the vector 𝓍. For an M × N matrix, a mapping used often is called the lexicographic or dictionary ordering. This is a row-ordered vector and is defined as

    𝓍 ≜ [x(1, 1) x(1, 2) ... x(1, N) x(2, 1) ... x(2, N) ... x(M, 1) ... x(M, N)]ᵀ
      = C_r{x(m, n)}                                                           (2.36a)

Thus 𝓍 is the vector obtained by stacking each row to the right of the previous row of X. Another useful mapping is the column-by-column stacking, which gives a column-ordered vector

    𝓍 ≜ [x(1, 1) x(2, 1) ... x(M, 1) x(1, 2) ... x(M, 2) ... x(1, N) ... x(M, N)]ᵀ



TABLE 2.6 Matrix Theory Definitions

Item                    Definition                                    Comments
Matrix                  A = {a(m, n)}                                 m = row index, n = column index
Transpose               Aᵀ = {a(n, m)}                                Rows and columns are interchanged.
Complex conjugate       A* = {a*(m, n)}
Conjugate transpose     A*ᵀ = {a*(n, m)}
Identity matrix         I = {δ(m − n)}                                A square matrix with unity along its diagonal.
Null matrix             0 = {0}                                       All elements are zero.
Matrix addition         A + B = {a(m, n) + b(m, n)}                   A, B have same dimensions.
Scalar multiplication   αA = {αa(m, n)}
Matrix multiplication   c(m, n) = Σ_{k=1}^{K} a(m, k)b(k, n)          C = AB; A is M × K, B is K × N, C is M × N. AB ≠ BA.
Commuting matrices      AB = BA                                       Not true in general.
Vector inner product    (x, y) ≜ x*ᵀy = Σₙ x*(n)y(n)                  Scalar quantity. If zero, x and y are called orthogonal.
Vector outer product    xyᵀ = {x(m)y(n)}                              x is M × 1, y is N × 1; the outer product is M × N and is a rank-1 matrix.
Symmetric               A = Aᵀ
Hermitian               A = A*ᵀ                                       A real symmetric matrix is Hermitian. All eigenvalues are real.
Determinant             |A|                                           For square matrices only.
Rank[A]                                                               Number of linearly independent rows or columns.
Inverse, A⁻¹            A⁻¹A = AA⁻¹ = I                               For square matrices only.
Singular                A⁻¹ does not exist                            |A| = 0
Trace                   Tr[A] = Σₙ a(n, n)                            Sum of the diagonal elements.
Eigenvalues, λₖ         All roots of |A − λₖI| = 0
Eigenvectors, φₖ        All solutions of Aφₖ = λₖφₖ, φₖ ≠ 0
ABCD lemma              (A − BCD)⁻¹ =                                 A, C are nonsingular.
                        A⁻¹ + A⁻¹B(C⁻¹ − DA⁻¹B)⁻¹DA⁻¹



    𝓍 = [x₁ᵀ x₂ᵀ ... x_Nᵀ]ᵀ ≜ C_c{x(m, n)}                                     (2.36b)

where xₙ is the nth column of X.
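Both orderings map directly onto NumPy's flatten orders; the sketch below (arbitrary illustration matrix) shows row (lexicographic) ordering as order='C' and column-by-column ordering as order='F'.

```python
import numpy as np

# Lexicographic (row-by-row) and column-by-column ordering of a matrix into
# a vector.  NumPy's flatten order 'C' stacks rows; order 'F' stacks columns.
X = np.array([[1, 2, 3],
              [4, 5, 6]])               # arbitrary 2 x 3 illustration matrix

row_ordered = X.flatten(order='C')      # [x(1,1) x(1,2) ... x(M,N)]
col_ordered = X.flatten(order='F')      # [x(1,1) x(2,1) ... x(M,N)]
print(row_ordered)   # [1 2 3 4 5 6]
print(col_ordered)   # [1 4 2 5 3 6]
```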
Transposition and Conjugation Rules

1. A*ᵀ = [A*]ᵀ = [Aᵀ]*
2. [AB]ᵀ = BᵀAᵀ
3. [A⁻¹]ᵀ = [Aᵀ]⁻¹
4. [AB]* = A*B*

Note that the conjugate transpose is denoted by A*ᵀ. In the matrix theory literature, a simplified notation Aᴴ is often used to denote the conjugate transpose of A. In the theory of image transforms (Chapter 5), we will have to distinguish among A, A*, Aᵀ, and A*ᵀ, and hence the need for the notation.
Toeplitz and Circulant Matrices

A Toeplitz matrix T is a matrix that has constant elements along the main diagonal and the subdiagonals. This means the elements t(m, n) depend only on the difference m − n, i.e., t(m, n) = t_{m−n}. Thus an N × N Toeplitz matrix is of the form

    T = ⎡ t_0      t_{-1}   ...    t_{-N+1} ⎤
        ⎢ t_1      t_0      ...    t_{-N+2} ⎥                                  (2.37)
        ⎢   :        :                 :    ⎥
        ⎣ t_{N-1}  ...       t_1    t_0     ⎦

and is completely defined by the (2N − 1) elements {t_k, −N + 1 ≤ k ≤ N − 1}. Toeplitz matrices describe the input-output transformations of one-dimensional linear shift-invariant systems (see Example 2.3) and correlation matrices of stationary sequences.
A matrix C is called circulant if each of its rows (or columns) is a circular shift of the previous row (or column), i.e.,

    C = ⎡ c_0      c_1      c_2    ...    c_{N-1} ⎤
        ⎢ c_{N-1}  c_0      c_1    ...    c_{N-2} ⎥                            (2.38)
        ⎢   :        :                       :    ⎥
        ⎣ c_1      c_2      ...   c_{N-1}  c_0    ⎦

Note that C is also Toeplitz and

    c(m, n) = c((m − n) modulo N)                                              (2.39)
Circulant matrices describe the input-output behavior of one-dimensional linear periodic systems (see Example 2.4) and correlation matrices of periodic sequences.
Example 2.3 (Linear convolution as a Toeplitz matrix operation)

The output of a shift-invariant system with impulse response h(n) = n, −1 ≤ n ≤ 1, and with input x(n), which is zero outside 0 ≤ n ≤ 4, is given by the convolution

    y(n) = h(n) ⊛ x(n) = Σ_{k=0}^{4} h(n − k)x(k)

Note that y(n) will be zero outside the interval −1 ≤ n ≤ 5. In vector notation, this can be written as a 7 × 5 Toeplitz matrix operating on a 5 × 1 vector, namely

    ⎡ y(−1) ⎤   ⎡ −1   0   0   0   0 ⎤
    ⎢ y(0)  ⎥   ⎢  0  −1   0   0   0 ⎥  ⎡ x(0) ⎤
    ⎢ y(1)  ⎥   ⎢  1   0  −1   0   0 ⎥  ⎢ x(1) ⎥
    ⎢ y(2)  ⎥ = ⎢  0   1   0  −1   0 ⎥  ⎢ x(2) ⎥
    ⎢ y(3)  ⎥   ⎢  0   0   1   0  −1 ⎥  ⎢ x(3) ⎥
    ⎢ y(4)  ⎥   ⎢  0   0   0   1   0 ⎥  ⎣ x(4) ⎦
    ⎣ y(5)  ⎦   ⎣  0   0   0   0   1 ⎦
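Example 2.3 can be checked mechanically: build the Toeplitz matrix from h and compare with a library convolution. The input values below are arbitrary illustration data (the example leaves x(n) unspecified).

```python
import numpy as np

# Linear convolution y = h * x written as a Toeplitz matrix acting on the
# input vector, checked against np.convolve.
h = {-1: -1, 0: 0, 1: 1}                 # h(n) = n for -1 <= n <= 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # arbitrary input x(n), 0 <= n <= 4

# Build the 7 x 5 Toeplitz matrix T with T[i, k] = h(n - k), where n = i - 1.
T = np.zeros((7, 5))
for i in range(7):
    for k in range(5):
        T[i, k] = h.get((i - 1) - k, 0)

y_matrix = T @ x                          # y(n) for -1 <= n <= 5
y_direct = np.convolve([-1, 0, 1], x)     # full convolution, same index range
print(np.allclose(y_matrix, y_direct))    # True
```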

Example 2.4 (Circular convolution as a circulant matrix operation)

If two convolving sequences are periodic, then their convolution is also periodic and can be represented as

    y(n) = Σ_{k=0}^{N−1} h(n − k)x(k),   0 ≤ n ≤ N − 1

where h(−n) = h(N − n) and N is the period. For example, let N = 4 and h(n) = (n + 3) modulo 4. In vector notation this gives

    ⎡ y(0) ⎤   ⎡ 3  2  1  0 ⎤ ⎡ x(0) ⎤
    ⎢ y(1) ⎥ = ⎢ 0  3  2  1 ⎥ ⎢ x(1) ⎥
    ⎢ y(2) ⎥   ⎢ 1  0  3  2 ⎥ ⎢ x(2) ⎥
    ⎣ y(3) ⎦   ⎣ 2  1  0  3 ⎦ ⎣ x(3) ⎦

Thus the input-to-output transformation of a circular convolution is described by a circulant matrix.
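A numerical check of Example 2.4, assuming an arbitrary input vector: the circulant matrix product must agree with the DFT route, since circular convolution corresponds to a pointwise product of DFTs.

```python
import numpy as np

# Circular convolution as a circulant matrix, checked against the DFT route.
N = 4
h = np.array([(n + 3) % N for n in range(N)])   # h = [3, 0, 1, 2]
x = np.array([1.0, 2.0, 3.0, 4.0])              # arbitrary periodic input

C = np.array([[h[(n - k) % N] for k in range(N)] for n in range(N)])
print(C)
# [[3 2 1 0]
#  [0 3 2 1]
#  [1 0 3 2]
#  [2 1 0 3]]

y_matrix = C @ x
y_fft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)))
print(np.allclose(y_matrix, y_fft))   # True
```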

Orthogonal and Unitary Matrices

An orthogonal matrix is such that its inverse is equal to its transpose, i.e., A is orthogonal if

    A⁻¹ = Aᵀ   or   AAᵀ = AᵀA = I                                              (2.40)

A matrix A is called unitary if its inverse is equal to its conjugate transpose, i.e.,

    A⁻¹ = A*ᵀ   or   AA*ᵀ = A*ᵀA = I                                           (2.41)

A real orthogonal matrix is also unitary, but a unitary matrix need not be orthogonal. The preceding definitions imply that the columns (or rows) of an N × N unitary matrix are orthogonal and form a complete set of basis vectors in an N-dimensional vector space.
Example 2.5

Consider the matrices

    A₁ = (1/√2) ⎡ 1   1 ⎤ ,    A₂ = ⎡ 1  j ⎤ ,    A₃ = (1/√2) ⎡ 1  j ⎤
                ⎣ 1  −1 ⎦           ⎣ j  1 ⎦                  ⎣ j  1 ⎦

It is easy to check that A₁ is orthogonal and unitary. A₂ is not unitary. A₃ is unitary with orthogonal rows.

Positive Definiteness and Quadratic Forms

An N × N Hermitian matrix A is called positive definite or positive semidefinite if the quadratic form

    Q ≜ x*ᵀAx,   ∀x ≠ 0                                                        (2.42)

is positive (>0) or nonnegative (≥0), respectively. Similarly, A is negative definite or negative semidefinite if Q < 0 or Q ≤ 0, respectively. A matrix that does not satisfy any of the above is indefinite.

If A is a symmetric positive (nonnegative) definite matrix, then all its eigenvalues {λₖ} are positive (nonnegative), and the determinant of A satisfies the inequality

    |A| = Π_{k=1}^{N} λₖ ≤ Π_{k=1}^{N} a(k, k)                                  (2.43)

Diagonal Forms

For any Hermitian matrix R there exists a unitary matrix Φ such that

    Φ*ᵀRΦ = Λ                                                                  (2.44)

where Λ is a diagonal matrix containing the eigenvalues of R. An alternate form of the above equation is

    RΦ = ΦΛ                                                                    (2.45)

which is the set of eigenvalue equations

    Rφₖ = λₖφₖ,   k = 1, ..., N                                                (2.46)

where {λₖ} and {φₖ} are the eigenvalues and eigenvectors, respectively, of R. For Hermitian matrices, the eigenvectors corresponding to distinct eigenvalues are orthogonal. For repeated eigenvalues, the eigenvectors form a subspace that can be orthogonalized to yield a complete set of orthogonal eigenvectors. Normalization of these eigenvectors yields an orthonormal set, i.e., the unitary matrix Φ, whose columns are these eigenvectors. The matrix Φ is also called the eigenmatrix of R.
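The diagonalization above can be reproduced with a standard eigensolver; the Hermitian matrix below is an arbitrary illustration, and np.linalg.eigh returns the orthonormal eigenmatrix Φ.

```python
import numpy as np

# Diagonalize a Hermitian matrix R by its unitary eigenmatrix Phi.
R = np.array([[2.0, 1.0 + 1j],
              [1.0 - 1j, 3.0]])          # arbitrary Hermitian example

lam, Phi = np.linalg.eigh(R)             # R Phi = Phi diag(lam)
Lambda = Phi.conj().T @ R @ Phi          # Phi*^T R Phi = Lambda

print(np.allclose(Lambda, np.diag(lam)))            # True
print(np.allclose(Phi.conj().T @ Phi, np.eye(2)))   # True (Phi is unitary)
```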

2.8 BLOCK MATRICES AND KRONECKER PRODUCTS

In image processing, the analysis of many problems can be simplified substantially by working with block matrices and the so-called Kronecker products. For example, the two-dimensional convolution can be expressed by simple block matrix operations.

Block Matrices

Any matrix 𝒜 whose elements are matrices themselves is called a block matrix; for example,

    𝒜 = ⎡ A₁,₁  A₁,₂  ...  A₁,ₙ ⎤
        ⎢ A₂,₁  A₂,₂  ...  A₂,ₙ ⎥                                              (2.47)
        ⎢  :     :           :  ⎥
        ⎣ Aₘ,₁  Aₘ,₂  ...  Aₘ,ₙ ⎦

is a block matrix where the {A_{i,j}} are p × q matrices. The matrix 𝒜 is called an m × n block matrix of basic dimension p × q. If the A_{i,j} are square matrices (say, p × p), then we also call 𝒜 an m × n block matrix of basic dimension p.

If the block structure is Toeplitz (A_{i,j} = A_{i−j}) or circulant (A_{i,j} = A_{(i−j) modulo n}, m = n), then 𝒜 is called block Toeplitz or block circulant, respectively. Additionally, if each block itself is Toeplitz (or circulant), then 𝒜 is called doubly block Toeplitz (or doubly block circulant). Finally, if the {A_{i,j}} are Toeplitz (or circulant) but A_{i,j} ≠ A_{i−j}, then 𝒜 is called a Toeplitz block (or circulant block) matrix. Note that a doubly Toeplitz (or circulant) matrix need not be fully Toeplitz (or circulant), i.e., the scalar elements of 𝒜 need not be constant along the subdiagonals.

Example 2.6

Consider the two-dimensional convolution

    y(m, n) = Σ_{m′=0}^{2} Σ_{n′=0}^{1} h(m − m′, n − n′)x(m′, n′),   0 ≤ m ≤ 3,  0 ≤ n ≤ 2

where x(m, n) and h(m, n) are defined in Example 2.1. We will examine the block structure of the matrices when the input and output arrays are mapped into column-ordered vectors. Let xₙ and yₙ be the column vectors, i.e., x₀ and x₁ are the two columns of the input array of Example 2.1. Then

    yₙ = Σ_{n′=0}^{1} H_{n−n′} xₙ′,    Hₙ = {h(m − m′, n),  0 ≤ m ≤ 3,  0 ≤ m′ ≤ 2},

where

    H₀ = ⎡  1   0   0 ⎤        H₁ = ⎡ 1  0  0 ⎤
         ⎢ −1   1   0 ⎥             ⎢ 1  1  0 ⎥ ,    H₋₁ = 0,   H₂ = 0
         ⎢  0  −1   1 ⎥             ⎢ 0  1  1 ⎥
         ⎣  0   0  −1 ⎦             ⎣ 0  0  1 ⎦

Defining 𝓎 and 𝓍 as column-ordered vectors, we get

    𝓎 = ⎡ y₀ ⎤   ⎡ H₀  0  ⎤
        ⎢ y₁ ⎥ = ⎢ H₁  H₀ ⎥ ⎡ x₀ ⎤ = ℋ𝓍
        ⎣ y₂ ⎦   ⎣ 0   H₁ ⎦ ⎣ x₁ ⎦

where ℋ is a doubly Toeplitz 3 × 2 block matrix of basic dimension 4 × 3. However, the matrix ℋ as a whole is not Toeplitz because [ℋ]_{m,n} ≠ [ℋ]_{m−n} (show it!). Hence the one-dimensional system 𝓎 = ℋ𝓍 is linear but not spatially invariant, even though the original two-dimensional system is. Alternatively, 𝓎 = ℋ𝓍 does not represent a one-dimensional convolution operation although it does represent a two-dimensional convolution.
Example 2.7

Block circulant matrices arise when the convolving arrays are periodic. For example, let

    y(m, n) = Σ_{m′=0}^{2} Σ_{n′=0}^{3} h(m − m′, n − n′)x(m′, n′),

where h(m, n) is doubly periodic with periods (3, 4), i.e., h(m, n) = h(m + 3, n + 4), ∀m, n. The array h(m, n) over one period is shown next:

    n
    ↑
    3 │ 1  0  1
    2 │ 2  2  0
    1 │ 3  5  1        h(m, n)
    0 │ 4  8  3
      └──────────→ m
        0  1  2

In terms of the column vectors of x(m, n) and y(m, n), we can write

    yₙ = Σ_{n′=0}^{3} H_{n−n′} xₙ′,

where Hₙ is a periodic sequence of 3 × 3 circulant matrices with period 4, given by

    H₀ = ⎡ 4 3 8 ⎤    H₁ = ⎡ 3 1 5 ⎤    H₂ = ⎡ 2 0 2 ⎤    H₃ = ⎡ 1 1 0 ⎤
         ⎢ 8 4 3 ⎥         ⎢ 5 3 1 ⎥         ⎢ 2 2 0 ⎥         ⎢ 0 1 1 ⎥
         ⎣ 3 8 4 ⎦         ⎣ 1 5 3 ⎦         ⎣ 0 2 2 ⎦         ⎣ 1 0 1 ⎦

Written as a column-ordered vector equation, the output becomes

    𝓎 = ⎡ y₀ ⎤   ⎡ H₀  H₃  H₂  H₁ ⎤ ⎡ x₀ ⎤
        ⎢ y₁ ⎥ = ⎢ H₁  H₀  H₃  H₂ ⎥ ⎢ x₁ ⎥ = ℋ𝓍
        ⎢ y₂ ⎥   ⎢ H₂  H₁  H₀  H₃ ⎥ ⎢ x₂ ⎥
        ⎣ y₃ ⎦   ⎣ H₃  H₂  H₁  H₀ ⎦ ⎣ x₃ ⎦

where H₋ₙ = H₄₋ₙ. Now ℋ is a doubly circulant 4 × 4 block matrix of basic dimension 3 × 3.

Kronecker Products

If A and B are M₁ × M₂ and N₁ × N₂ matrices, respectively, then their Kronecker product is defined as

    A ⊗ B ≜ {a(m, n)B} = ⎡ a(1, 1)B   ...  a(1, M₂)B  ⎤
                         ⎢     :              :       ⎥                        (2.48)
                         ⎣ a(M₁, 1)B  ...  a(M₁, M₂)B ⎦

This is an M₁ × M₂ block matrix of basic dimension N₁ × N₂. Note that A ⊗ B ≠ B ⊗ A in general. Kronecker products are useful in generating high-order matrices from low-order matrices, for example, the fast Hadamard transforms that will be studied in Chapter 5. Several properties of Kronecker products are listed in Table 2.7. A particularly useful result is the identity

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)                                               (2.49)

It expresses the matrix multiplication of two Kronecker products as a Kronecker product of two matrices.

TABLE 2.7 Properties of Kronecker Products

1.  (A + B) ⊗ C = A ⊗ C + B ⊗ C
2.  (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)
3.  α(A ⊗ B) = (αA) ⊗ B = A ⊗ (αB), where α is a scalar
4.  (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ
5.  (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹
6.  (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
7.  A ⊗ B = (A ⊗ I)(I ⊗ B)
8.  Π_{k=1}^{K} (Aₖ ⊗ Bₖ) = (Π_{k=1}^{K} Aₖ) ⊗ (Π_{k=1}^{K} Bₖ), where Aₖ and Bₖ are square matrices
9.  det(A ⊗ B) = (det A)ⁿ (det B)ᵐ, where A is m × m and B is n × n
10. Tr(A ⊗ B) = [Tr(A)][Tr(B)]
11. If r(A) denotes the rank of a matrix A, then r(A ⊗ B) = r(A)r(B).
12. If A and B are unitary, then A ⊗ B is also unitary.
13. If C = A ⊗ B, Czₖ = γₖzₖ, Axᵢ = λᵢxᵢ, Byⱼ = μⱼyⱼ, then zₖ = xᵢ ⊗ yⱼ, γₖ = λᵢμⱼ, 1 ≤ i ≤ m, 1 ≤ j ≤ n, 1 ≤ k ≤ mn.

For N × N matrices, it takes O(N⁶) + O(N⁴) operations to compute the left side of (2.49), whereas only O(N⁴) operations are required to compute the right side. This principle is useful in developing fast algorithms for multiplying matrices that can be expressed as Kronecker products.
Example 2.8
Let

Note the two products are not equal.
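Identity (2.49) and the non-commutativity of the Kronecker product can be checked directly; all four matrices below are arbitrary illustration data.

```python
import numpy as np

# Verify (A (x) B)(C (x) D) = (AC) (x) (BD), and that the Kronecker product
# does not commute.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[1.0, 0.0], [2.0, 1.0]])
D = np.array([[2.0, 1.0], [0.0, 1.0]])

left = np.kron(A, B) @ np.kron(C, D)
right = np.kron(A @ C, B @ D)
print(np.allclose(left, right))                   # True
print(np.allclose(np.kron(A, B), np.kron(B, A)))  # False: A(x)B != B(x)A
```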

Separable Operations

Consider the transformation (on an M × N image U)

    V = AUBᵀ

or

    v(k, l) = Σₘ Σₙ a(k, m)u(m, n)b(l, n)                                      (2.50)

This defines a class of separable operations, where A operates on the columns of U and B operates on the rows of the result. If vₖ and uₘ denote the kth and mth row vectors of V and U, respectively, then the preceding series becomes

    vₖᵀ = Σₘ a(k, m)Buₘᵀ = Σₘ [A ⊗ B]ₖ,ₘ uₘᵀ

where [A ⊗ B]ₖ,ₘ is the (k, m)th block of A ⊗ B. Thus if U and V are row-ordered into vectors 𝓊 and 𝓋, respectively, then

    V = AUBᵀ  ⇒  𝓋 = (A ⊗ B)𝓊

i.e., the separable transformation of (2.50) maps into a Kronecker product operating on a vector.
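A quick numerical confirmation that V = AUBᵀ equals the Kronecker product acting on the row-ordered vector of U (arbitrary illustration matrices):

```python
import numpy as np

# The separable operation V = A U B^T is equivalent to the Kronecker product
# acting on the row-ordered (lexicographic) vector of U.
A = np.array([[1.0, 2.0], [0.0, 1.0]])             # operates on columns of U
B = np.array([[1.0, -1.0, 0.0], [2.0, 0.0, 1.0]])  # operates on rows
U = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])                    # 2 x 3 image

V = A @ U @ B.T                                    # separable form
v = np.kron(A, B) @ U.flatten()                    # row-ordered vector form
print(np.allclose(v, V.flatten()))                 # True
```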
2.9 RANDOM SIGNALS

Definitions

A complex discrete random signal or a discrete random process is a sequence of random variables u(n). For complex random sequences, we define

    Mean ≜ μᵤ(n) ≜ μ(n) = E[u(n)]                                              (2.51)

    Variance ≜ σᵤ²(n) = σ²(n) ≜ E[|u(n) − μ(n)|²]                               (2.52)

    Covariance ≜ Cov[u(n), u(n′)] ≜ rᵤ(n, n′) ≜ r(n, n′)
               = E{[u(n) − μ(n)][u*(n′) − μ*(n′)]}                              (2.53)

    Cross-covariance ≜ Cov[u(n), v(n′)] ≜ rᵤᵥ(n, n′)
               = E{[u(n) − μᵤ(n)][v*(n′) − μᵥ*(n′)]}                            (2.54)

    Autocorrelation ≜ aᵤᵤ(n, n′) ≜ a(n, n′) = E[u(n)u*(n′)]
               = r(n, n′) + μ(n)μ*(n′)                                          (2.55)

    Cross-correlation ≜ aᵤᵥ(n, n′) = E[u(n)v*(n′)] = rᵤᵥ(n, n′) + μᵤ(n)μᵥ*(n′)   (2.56)

The symbol E denotes the mathematical expectation operator. Whenever there is no confusion, we will drop the subscript u from the various functions. For an N × 1 vector u, its mean, covariance, and other properties are defined as

    E[u] = μ = {μ(n)},  an N × 1 vector                                        (2.57)

    Cov[u] ≜ E[(u − μ)(u* − μ*)ᵀ] ≜ Rᵤ ≜ R = {r(n, n′)},  an N × N matrix       (2.58)

    Cov[u, v] ≜ E[(u − μᵤ)(v* − μᵥ*)ᵀ] ≜ Rᵤᵥ = {rᵤᵥ(n, n′)},  an N × N matrix    (2.59)

Now μ and R represent the mean vector and the covariance matrix, respectively, of the vector u.


Gaussian or Normal Distribution

The probability density function of a random variable u is denoted by pᵤ(α). For a Gaussian random variable,

    pᵤ(α) = (1/√(2πσ²)) exp{−|α − μ|²/(2σ²)}                                    (2.60)

where μ and σ² are its mean and variance, and α denotes the value the random variable takes. For μ = 0 and σ² = 1, this is called the standard normal distribution.

Gaussian Random Processes

A sequence, possibly infinite, is called a Gaussian (or normal) random process if the joint probability density of any finite subsequence is Gaussian. For example, for a Gaussian sequence {u(n), 1 ≤ n ≤ N}, the joint density would be

    pᵤ(α) = pᵤ(α₁, α₂, ..., α_N)
          = [(2π)^{N/2} |R|^{1/2}]⁻¹ exp{−½(α − μ)*ᵀR⁻¹(α − μ)}                 (2.61)

where R is the covariance matrix of u and is assumed to be nonsingular.

Stationary Processes

A random sequence u(n) is said to be strict-sense stationary if the joint density of any partial sequence {u(l), 1 ≤ l ≤ k} is the same as that of the shifted sequence {u(l + m), 1 ≤ l ≤ k}, for any integer m and any length k. The sequence u(n) is called wide-sense stationary if

    E[u(n)] = μ = constant
    E[u(n)u*(n′)] = a(n − n′)                                                  (2.62)

This implies r(n, n′) = r(n − n′), i.e., the covariance matrix of {u(n)} is Toeplitz.

Unless stated otherwise, we will imply wide-sense stationarity whenever we call a random process stationary. Since a Gaussian process is completely specified by the mean and covariance functions, for such a process wide-sense stationarity is the same as strict-sense stationarity. In general, although strict-sense stationarity implies stationarity in the wide sense, the converse is not true.

We will denote the covariance function of a stationary process u(n) by r(n), the implication being

    r(n) = Cov[u(n), u(0)] = Cov[u(n′ + n), u(n′)],   ∀n′, ∀n                  (2.63)

Using the definitions of the covariance and autocorrelation functions, it can be shown that the arrays r(n, n′) and a(n, n′) are conjugate symmetric and nonnegative definite, i.e.,

    Symmetry:  r(n, n′) = r*(n′, n),   ∀n, n′                                  (2.64)

    Nonnegativity:  Σₙ Σₙ′ x(n)r(n, n′)x*(n′) ≥ 0,   x(n) ≠ 0, ∀n              (2.65)

This means the covariance and autocorrelation matrices are Hermitian and nonnegative definite.

Markov Processes

A random sequence u(n) is called Markov-p, or pth-order Markov, if the conditional probability of u(n) given the entire past is equal to the conditional probability of u(n) given only u(n − 1), ..., u(n − p), i.e.,

    Prob[u(n)|u(n − 1), u(n − 2), ...] = Prob[u(n)|u(n − 1), ..., u(n − p)],  ∀n   (2.66a)

A Markov-1 sequence is simply called Markov. A Markov-p scalar sequence can also be expressed as a (p × 1) Markov-1 vector sequence. Another interpretation of a pth-order Markov sequence is that if the "present," {u(j), n − p ≤ j ≤ n − 1}, is known, then the "past," {u(j), j < n − p}, and the "future," {u(j), j ≥ n}, are independent. This definition is useful in defining Markov random fields in two dimensions (see Chapter 6). For Gaussian Markov-p sequences it is sufficient that the conditional expectations satisfy the relation

    E[u(n)|u(n − 1), u(n − 2), ...] = E[u(n)|u(n − 1), ..., u(n − p)],  ∀n     (2.66b)

Example 2.9 (Covariance matrix of stationary sequences)

The covariance function of a first-order stationary Markov sequence u(n) is given as

    r(n) = ρ^{|n|},   |ρ| < 1, ∀n                                              (2.67)

This is often used as the covariance model of a scan line of monochrome images. For an N × 1 vector u ≜ {u(n), 1 ≤ n ≤ N}, its covariance matrix is {r(m − n)}, i.e.,

    R = ⎡ 1        ρ      ρ²    ...  ρ^{N−1} ⎤
        ⎢ ρ        1      ρ     ...  ρ^{N−2} ⎥                                 (2.68)
        ⎢ :                               :  ⎥
        ⎣ ρ^{N−1}  ...           ρ       1   ⎦

which is Toeplitz. In fact, the covariance and autocorrelation matrices of any stationary sequence are Toeplitz. Conversely, any sequence, finite or infinite, can be called stationary if its covariance and autocorrelation matrices are Toeplitz.
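The covariance matrix (2.68) is easy to generate and inspect numerically; the sketch below builds R from the Toeplitz rule and confirms it is symmetric and positive definite (ρ = 0.95 is a typical image value quoted later in the chapter).

```python
import numpy as np

# Covariance matrix of the first-order Markov model r(n) = rho^|n|,
# built from the Toeplitz rule R[m, n] = rho^|m - n|.
rho, N = 0.95, 5
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])

print(np.allclose(R, R.T))                 # True: symmetric Toeplitz
print(np.all(np.linalg.eigvalsh(R) > 0))   # True: positive definite
```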


Orthogonality and Independence


Two random variables x and y are called independent if and only if their joint probability density function is a product of their marginal densities, i.e.,

    p_{x,y}(x, y) = pₓ(x)p_y(y)                                                (2.69)

Two random sequences x(n) and y(n) are called independent if and only if for every n and n′, the random variables x(n) and y(n′) are independent.

The random variables x and y are said to be orthogonal if

    E[xy*] = 0                                                                 (2.70)

and are called uncorrelated if

    E[xy*] = (E[x])(E[y*])   or   E[(x − μₓ)(y − μ_y)*] = 0                    (2.71)

Thus zero mean uncorrelated random variables are also orthogonal. Gaussian random variables that are uncorrelated are also independent.

The Karhunen-Loève (KL) Transform

Let {x(n), 1 ≤ n ≤ N} be a complex random sequence whose autocorrelation matrix is R. Let Φ be an N × N unitary matrix that reduces R to its diagonal form Λ [see (2.44)]. The transformed vector

    y = Φ*ᵀx                                                                   (2.72)

is called the Karhunen-Loève (KL) transform of x. It satisfies the property

    E[yy*ᵀ] = Φ*ᵀ{E[xx*ᵀ]}Φ = Φ*ᵀRΦ = Λ
    ⇒ E[y(k)y*(l)] = λₖδ(k − l)                                                (2.73)

i.e., the elements of the transformed sequence y(k) are orthogonal. If R represents the covariance matrix rather than the autocorrelation matrix of x, then the sequence y(k) is uncorrelated. The unitary matrix Φ*ᵀ is called the KL transform matrix. Its rows are the conjugate eigenvectors of R, i.e., it is the conjugate transpose of the eigenmatrix of R. The KL transform is of fundamental importance in digital signal and image processing. Its applications and properties are considered in Chapter 5.
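The decorrelating property (2.73) can be verified for the Markov covariance of Example 2.9: the covariance of y = Φ*ᵀx is diagonal.

```python
import numpy as np

# KL transform of a stationary sequence with covariance R[m, n] = rho^|m - n|.
# The transform Phi*^T decorrelates: the covariance of y = Phi*^T x equals the
# diagonal eigenvalue matrix Lambda.
rho, N = 0.9, 4
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])   # covariance of x

lam, Phi = np.linalg.eigh(R)                     # Phi = eigenmatrix of R
R_y = Phi.conj().T @ R @ Phi                     # covariance of y = Phi*^T x
print(np.allclose(R_y, np.diag(lam)))            # True: y is uncorrelated
```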

2.10 DISCRETE RANDOM FIELDS

In the statistical representation of images, each pixel is considered as a random variable. Thus we think of a given image as a sample function of an ensemble of images. Such an ensemble would be adequately defined by a joint probability density of the array of random variables. For practical image sizes, the number of random variables is very large (262,144 for 512 × 512 images). Thus it is difficult to specify a realistic joint density function because it would be an enormous task to measure it. One possibility is to specify the ensemble by its first- and second-order moments only (mean and covariances). Even with this simplifying constraint, the task of determining realistic model parameters remains difficult. Various approaches for stochastic modeling are considered in Chapter 6. Here we consider some basic definitions that will be useful in the subsequent chapters.

Definitions

When each sample of a two-dimensional sequence is a random variable, we call it a discrete random field. When the random field represents an ensemble of images (such as television images or satellite images), we call it a random image. The term random field will apply to any two-dimensional random sequence.

The mean and covariance functions of a complex random field are defined as

    E[u(m, n)] = μ(m, n)                                                       (2.74)

    Cov[u(m, n), u(m′, n′)] ≜ E[(u(m, n) − μ(m, n))(u*(m′, n′) − μ*(m′, n′))]
                            = rᵤ(m, n; m′, n′) = r(m, n; m′, n′)                (2.75)

Often, we will consider the stationary case where

    μ(m, n) = μ = constant
    rᵤ(m, n; m′, n′) = rᵤ(m − m′, n − n′) = r(m − m′, n − n′)                   (2.76)

As before, whenever there is no confusion, we will drop the subscript u from rᵤ. A random field satisfying (2.76) is also called shift invariant, translation (or spatially) invariant, homogeneous, or wide-sense stationary. Unless otherwise mentioned, stationarity will always be implied in the wide sense.

We will denote the covariance function of a stationary random field u(m, n) by rᵤ(m, n) or r(m, n), implying that

    r(m, n) = Cov[u(m, n), u(0, 0)] = Cov[u(m′ + m, n′ + n), u(m′, n′)],  ∀(m′, n′)   (2.77)


A random field x(m, n) will be called a white noise field whenever any two different elements x(m, n) and x(m′, n′) are mutually uncorrelated, i.e., the field's covariance function is of the form

    rₓ(m, n; m′, n′) = σₓ²(m, n)δ(m − m′, n − n′)                               (2.78)

A random field is called Gaussian if its every segment defined on an arbitrary finite grid is Gaussian. This means every finite segment of u(m, n) when mapped into a vector will have a joint density of the form of (2.61).

Covariances and autocorrelations of two-dimensional fields have symmetry and nonnegativity properties similar to those of one-dimensional random processes:

    Symmetry:  r(m, n; m′, n′) = r*(m′, n′; m, n)                              (2.79)

In general,

    r(m, n; m′, n′) ≠ r(m′, n; m, n′) ≠ r(m, n′; m′, n)                         (2.80)

    Nonnegativity:  Σₘ Σₙ Σₘ′ Σₙ′ x(m, n)r(m, n; m′, n′)x*(m′, n′) ≥ 0,
                    x(m, n) ≠ 0, ∀(m, n)                                       (2.81)
Separable and Isotropic Image Covariance Functions

The covariance function of a random field is called separable when it can be expressed as a product of covariance functions of one-dimensional sequences, i.e., if

    r(m, n; m′, n′) = r₁(m, m′)r₂(n, n′)   (nonstationary case)                (2.82)

    r(m, n) = r₁(m)r₂(n)   (stationary case)                                   (2.83)

A separable stationary covariance function often used in image processing is

    r(m, n) = σ²ρ₁^{|m|}ρ₂^{|n|},   |ρ₁| < 1, |ρ₂| < 1                          (2.84)

Here σ² represents the variance of the random field and ρ₁ = r(1, 0)/σ², ρ₂ = r(0, 1)/σ² are the one-step correlations in the m and n directions, respectively.

Another covariance function often considered more realistic for many images is the nonseparable exponential function

    r(m, n) = σ² exp{−√(α₁²m² + α₂²n²)}                                         (2.85)

When α₁ = α₂ = α, r(m, n) becomes a function of the Euclidean distance d ≜ √(m² + n²), i.e.,

    r(m, n) = σ²ρᵈ                                                             (2.86)

where ρ = exp(−|α|). Such a function is also called isotropic or circularly symmetric. Figure 2.6 shows a display of the separable and isotropic covariance functions. The parameters of the nonseparable exponential covariance function are related to the one-step correlations as α₁ = −ln ρ₁, α₂ = −ln ρ₂. Thus the covariance models (2.84) and (2.85) can be identified by measuring the variance and the one-step correlations of their zero mean random fields. In practice, these quantities are estimated from



Figure 2.6 Two-dimensional covariance functions: (a) isotropic covariance and its log display; (b) separable covariance and its log display.

the given image data by replacing the ensemble averages by sample averages; for example, for an M × N image u(m, n),

    μ ≈ μ̂ = (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} u(m, n)                              (2.87)

    r(m, n) ≈ r̂(m, n) = (1/MN) Σ_{m′=1}^{M−m} Σ_{n′=1}^{N−n} [u(m′, n′) − μ̂][u(m + m′, n + n′) − μ̂]   (2.88)

For many image classes, ρ₁ and ρ₂ are found to be around 0.95.
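The estimation procedure (2.87)-(2.88) can be sketched on a synthetic field. The generator below is an assumed separable AR-type model chosen so the nominal one-step correlations are ρ₁ = ρ₂ = 0.95; it is an illustration, not a model from the text.

```python
import numpy as np

# Sample estimates of the mean and of the one-step correlations
# rho1 = r(1,0)/sigma^2 and rho2 = r(0,1)/sigma^2, eqs. (2.87)-(2.88),
# from a synthetic separable field with nominal rho = 0.95.
rng = np.random.default_rng(1)
M = N = 256
rho = 0.95
u = np.zeros((M, N))
w = rng.standard_normal((M, N))
# crude causal separable synthesis (illustration only; zero boundary state)
for m in range(M):
    for n in range(N):
        u[m, n] = (rho * (u[m - 1, n] if m else 0)
                   + rho * (u[m, n - 1] if n else 0)
                   - rho ** 2 * (u[m - 1, n - 1] if m and n else 0)
                   + w[m, n])

mu = u.mean()                                  # sample mean, eq. (2.87)
def r_hat(m, n):                               # sample covariance, eq. (2.88)
    a = u[:M - m, :N - n] - mu
    b = u[m:, n:] - mu
    return (a * b).sum() / (M * N)

var = r_hat(0, 0)
rho1, rho2 = r_hat(1, 0) / var, r_hat(0, 1) / var
print(round(rho1, 2), round(rho2, 2))   # both should come out near 0.95
```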


Example 2.10 (Covariance matrices of random fields)

In Example 2.9 we saw that the covariance matrix of a one-dimensional stationary sequence is a symmetric Toeplitz matrix. Covariance matrices of stationary random fields mapped into vectors by row or column ordering are block Toeplitz matrices. For example, the covariance matrix of a segment of a stationary random field mapped into a vector by column (or row) ordering is a doubly Toeplitz block matrix. If the covariance function is separable, then this covariance block matrix is a Kronecker product of two matrices. For details see Problem 2.17.

2.11 THE SPECTRAL DENSITY FUNCTION

Let u(n) be a stationary random sequence. Its covariance generating function (CGF) is defined as the Z-transform of its covariance function r_u(n), i.e.,

CGF{u(n)} ≜ S_u(z) ≜ S(z) = Σ_{n=−∞}^{∞} r_u(n) z^{−n} (2.89)



The spectral density function (SDF) is defined as the Fourier transform of r_u(n), which is the CGF evaluated at z = exp(jω), i.e.,

SDF{u(n)} ≜ S_u(ω) ≜ S(ω) = Σ_{n=−∞}^{∞} r_u(n) exp(−jωn) = S(z)|_{z=e^{jω}} (2.90)

The covariance r_u(n) is simply the inverse Fourier transform of the SDF, i.e.,

r_u(n) = (1/2π) ∫_{−π}^{π} S_u(ω) exp(jωn) dω (2.91)
In two dimensions the CGF and SDF have the analogous definitions

CGF{u(m, n)} = S_u(z₁, z₂) ≜ S(z₁, z₂) ≜ Σ_{m,n=−∞}^{∞} r_u(m, n) z₁^{−m} z₂^{−n} (2.92)

SDF{u(m, n)} = S_u(ω₁, ω₂) ≜ S(ω₁, ω₂) ≜ Σ_{m,n=−∞}^{∞} r_u(m, n) exp[−j(ω₁m + ω₂n)]
             = S_u(z₁, z₂)|_{z₁=e^{jω₁}, z₂=e^{jω₂}} (2.93)

r_u(m, n) = (1/4π²) ∫_{−π}^{π} ∫_{−π}^{π} S_u(ω₁, ω₂) exp[j(ω₁m + ω₂n)] dω₁ dω₂ (2.94)

This shows that

σ_u² ≜ E[|u(m, n) − μ|²] = r_u(0, 0) = (1/4π²) ∫_{−π}^{π} ∫_{−π}^{π} S_u(ω₁, ω₂) dω₁ dω₂ (2.95)

i.e., the volume under S_u(ω₁, ω₂) is equal to the average power in the random field u(m, n). Therefore, physically S_u(ω₁, ω₂) represents the power density in the image at spatial frequencies (ω₁, ω₂). Hence, the SDF is also known as the power spectral density function or simply the power spectrum of the underlying random field.
Often the power spectrum is defined as the Fourier transform of the autocorrelation sequence rather than the covariance sequence. Unless stated otherwise, we will continue to use the definitions based on covariances.
In the text whenever we refer to S_u(z₁, z₂) as the SDF, it is implied that z₁ = exp(jω₁), z₂ = exp(jω₂). When a spectral density function can be expressed as a ratio of finite polynomials in z₁ and z₂, it is called a rational spectrum and it is of the form

S(z₁, z₂) = [Σ_{k=−K}^{K} Σ_{l=−L}^{L} b(k, l) z₁^{−k} z₂^{−l}] / [Σ_{m=−M}^{M} Σ_{n=−N}^{N} a(m, n) z₁^{−m} z₂^{−n}] (2.96)

Such SDFs are realized by linear systems represented by finite-order difference equations.

Properties of the SDF



1. The SDF is real:

S(ω₁, ω₂) = S*(ω₁, ω₂) (2.97)




TABLE 2.8 Properties of SDF of Real Random Sequences

Property               One-Dimensional              Two-Dimensional
Fourier transform pair S(ω) ↔ r(n)                  S(ω₁, ω₂) ↔ r(m, n)
Real                   S(ω) = S*(ω)                 S(ω₁, ω₂) = S*(ω₁, ω₂)
Even                   S(ω) = S(−ω)                 S(ω₁, ω₂) = S(−ω₁, −ω₂)
Nonnegative            S(ω) ≥ 0, ∀ω                 S(ω₁, ω₂) ≥ 0, ∀ω₁, ω₂
Linear system output   S_y(ω) = |H(ω)|² S_x(ω)      S_y(ω₁, ω₂) = |H(ω₁, ω₂)|² S_x(ω₁, ω₂)
Separability                                        S(ω₁, ω₂) = S₁(ω₁)S₂(ω₂) if r(m, n) = r₁(m)r₂(n)

This follows by observing that the covariance function is conjugate symmetric, i.e., r(m, n) = r*(−m, −n). For real random fields the SDF is also even.
2. The SDF is nonnegative, i.e.,

S(ω₁, ω₂) ≥ 0 (2.98)

Intuitively, this must be true because power cannot be negative. The formal proof can be obtained by applying the nonnegativity (positive semidefiniteness) property of covariance functions of stationary random fields.
For a space-invariant system whose frequency response is H(ω₁, ω₂) and whose input is a random field ε(m, n), the SDF of the output u(m, n) is given by

S_u(ω₁, ω₂) = |H(ω₁, ω₂)|² S_ε(ω₁, ω₂) (2.99)

Table 2.8 summarizes the properties of the one- and two-dimensional SDFs. Similar definitions and properties hold for the SDFs of continuous random fields.
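The linear-system property S_y = |H|² S_x from Table 2.8 can be verified for a one-dimensional example (a sketch, assuming a white-noise input of variance σ² driving a short FIR filter; the filter taps and σ² are illustrative, not from the text):

```python
import cmath

h = [1.0, 0.5, 0.25]          # impulse response of a small FIR filter
sigma2 = 2.0                  # variance of the white-noise input

def H(w):
    # frequency response H(w) = sum_k h(k) exp(-jwk)
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))

def r_y(n):
    # output covariance: r_y(n) = sigma2 * sum_k h(k) h(k + n)
    n = abs(n)
    return sigma2 * sum(h[k] * h[k + n] for k in range(len(h) - n))

def S_y(w):
    # SDF of the output as the Fourier transform of r_y(n), eq. (2.90)
    return sum(r_y(n) * cmath.exp(-1j * w * n)
               for n in range(-len(h) + 1, len(h))).real

w = 0.7
assert abs(S_y(w) - abs(H(w)) ** 2 * sigma2) < 1e-12
```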
Example 2.11
The covariance function of a stationary white noise field is given as r(m, n) = σ² δ(m, n). The SDF is the constant σ² because

S(ω₁, ω₂) = Σ_{m,n} σ² δ(m, n) exp[−j(ω₁m + ω₂n)] = σ²
Example 2.12
Consider the separable covariance function defined in (2.84). Taking the Fourier transform, the SDF is found to be the separable function

S(ω₁, ω₂) = σ²(1 − ρ₁²)(1 − ρ₂²) / [(1 + ρ₁² − 2ρ₁ cos ω₁)(1 + ρ₂² − 2ρ₂ cos ω₂)] (2.100)

For the isotropic covariance function of (2.86), an analytic expression for the SDF is not available. Figure 2.7 shows displays of some SDFs.
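Equation (2.100) can be checked against a truncated version of the defining sum (2.93) (a sketch; the values of ρ₁, ρ₂, σ² and the truncation limit K are illustrative assumptions):

```python
import math

rho1, rho2, sigma2 = 0.6, 0.5, 1.0   # illustrative values, not from the text

def S_closed(w1, w2):
    # closed form of eq. (2.100)
    num = sigma2 * (1 - rho1**2) * (1 - rho2**2)
    den = ((1 + rho1**2 - 2 * rho1 * math.cos(w1)) *
           (1 + rho2**2 - 2 * rho2 * math.cos(w2)))
    return num / den

def S_sum(w1, w2, K=200):
    # truncated double sum of eq. (2.93) with r(m,n) = sigma2 rho1^|m| rho2^|n|;
    # the sum factors into two 1-D sums, and cosines suffice by symmetry
    s1 = sum(rho1**abs(m) * math.cos(w1 * m) for m in range(-K, K + 1))
    s2 = sum(rho2**abs(n) * math.cos(w2 * n) for n in range(-K, K + 1))
    return sigma2 * s1 * s2

assert abs(S_closed(0.3, 1.1) - S_sum(0.3, 1.1)) < 1e-6
```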

2.12 SOME RESULTS FROM ESTIMATION THEORY

Here we state some important definitions and results from estimation theory that
are useful in many image processing problems.




Figure 2.7 Spectral density functions. SDF of (a) isotropic covariance function; (b) separable covariance function; (c) covariance function of the girl image; (d) covariance function of the moon image (see Figure 1.3).

Mean Square Estimates

Let {y(n), 1 ≤ n ≤ N} be a real random sequence and x be any real random variable. It is desired to find x̂, called the optimum mean square estimate of x, from an observation of the random sequence y(n) such that the mean square error

σ_e² ≜ E[(x − x̂)²] (2.101)

is minimized. It is simply the conditional mean of x given y(n), 1 ≤ n ≤ N [9, 10]

x̂ = E[x|y] ≜ E[x|y(1), …, y(N)] = ∫_{−∞}^{∞} ξ p_{x|y}(ξ) dξ (2.102)

where p_{x|y}(ξ) is the conditional probability density of x given the observation vector y. If x and y(n) are independent, then x̂ is simply the mean value of x. Note that x̂ is an unbiased estimate of x, because

E[x̂] = E[E(x|y)] = E[x] (2.103)

For zero mean Gaussian random variables, x̂ turns out to be linear in y(n), i.e.,

x̂ = Σ_{n=1}^{N} α(n) y(n) (2.104)

where the coefficients α(n) are determined by solving the linear equations shown next.

The Orthogonality Principle

According to this principle the minimum mean square estimation error is orthogonal to every random variable functionally related to the observations, i.e., for any g(y) ≜ g(y(1), y(2), …, y(N)),

E[(x − x̂)g(y)] = 0 (2.105)

To prove this we write

E[x̂g(y)] = E[E(x|y)g(y)] = E[E(xg(y)|y)] = E[xg(y)]

which implies (2.105). Since x̂ is a function of y, this also implies

E[(x − x̂)x̂] = 0 (2.106)
E[(x − x̂)g(x̂)] = 0 (2.107)

i.e., the estimation error is orthogonal to every function of the estimate.


The orthogonality principle has been found to be very useful in linear estima-
tion. In general, the conditional mean is a nonlinear function and is difficult to
evaluate. Therefore, one often determines the optimum linear mean square esti-
mate. For zero mean random variables this is done by writingx as a linear function
of the observations, as in (2.104), and then finding the unknowns Ot(n) that minimize
the mean square error. This minimization gives .
IV
L Ot(k)E[y (k)y (n)] = E[xy(n»), n=l, ... ,N
k~l •

In matrix notation this yields


(2.108) .

where (X == {a(n)}, rxy;; {E[xy(n)J} are N x 1 vectors and R, is the N x N covariance


matrix of y. The minimized mean square error is given by
• (2.109)

If x, yen) are nonzero mean random variables, then instead of (2.104), we


• •
wnte
N
X - ....x= X - ....x = L q(n)[y(n) -!Ly(n)] (2.110)
n=l

Once again €X is given by (2.108), where Rt and rxy represent the covariance and
cross-covariance arrays. ff x, yen) are non-Gaussian, them (2.104) and (2.109) still
give the best linear mean square estimate. However, this is not necessarily the
conditional mean. ..
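The normal equations (2.108) and the orthogonality property (2.105) can be illustrated with a small numerical sketch (the moment values R_y and r_xy below are invented for illustration, not from the text):

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only)
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Second-order moments of the observations and of x (illustrative numbers):
Ry  = [[1.0, 0.5], [0.5, 1.0]]   # E[y(k)y(n)]
rxy = [0.8, 0.4]                 # E[x y(n)]

alpha = solve(Ry, rxy)           # normal equations (2.108): Ry alpha = rxy

# Orthogonality check, eq. (2.105) with g(y) = y(n):
# E[(x - xhat) y(n)] = rxy[n] - sum_k alpha[k] Ry[k][n] = 0
for n in range(2):
    assert abs(rxy[n] - sum(alpha[k] * Ry[k][n] for k in range(2))) < 1e-12
```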

2.13 SOME RESULTS FROM INFORMATION THEORY


Information theory gives some important concepts that are useful in digital representation of images. Some of these concepts will be used in image quantization (Chapter 4), image transforms (Chapter 5), and image data compression (Chapter 11).



Information

Suppose there is a source (such as an image), which generates a discrete set of independent messages (such as gray levels) r_k, with probabilities p_k, k = 1, …, L. Then the information associated with r_k is defined as

I_k = −log₂ p_k bits (2.111)

Since

Σ_{k=1}^{L} p_k = 1 (2.112)

each p_k ≤ 1 and I_k is nonnegative. This definition implies that the information conveyed is large when an unlikely message is generated.

Entropy

The entropy of a source is defined as the average information generated by the source, i.e.,

Entropy, H = −Σ_{k=1}^{L} p_k log₂ p_k bits/message (2.113)

For a digital image considered as a source, the entropy can be estimated from its histogram. For a given L, the entropy of a source is maximum for the uniform distribution, i.e., p_k = 1/L, k = 1, …, L. In that case

max H = −Σ_{k=1}^{L} (1/L) log₂(1/L) = log₂ L bits (2.114)

The entropy of a source gives the lower bound on the number of bits required to encode its output. In fact, according to Shannon's noiseless coding theorem [11, 12], it is possible to code without distortion a source of entropy H bits using an average of H + ε bits/message, where ε > 0 is an arbitrarily small quantity. An alternate form of this theorem states that it is possible to code the source with H bits such that the distortion in the decoded message could be made arbitrarily small.
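Entropy (2.113) and the bound (2.114) can be sketched directly from a histogram (a minimal illustration; the gray-level counts below are invented, not from the text):

```python
import math

def entropy(p):
    # H = -sum p_k log2 p_k, eq. (2.113); terms with p_k = 0 contribute 0
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

# Entropy estimated from a gray-level histogram (counts are illustrative):
hist = [10, 20, 40, 30]
total = sum(hist)
p = [c / total for c in hist]

H = entropy(p)
L = len(hist)
assert 0 <= H <= math.log2(L) + 1e-12       # eq. (2.114): H is at most log2 L
assert abs(entropy([1 / L] * L) - math.log2(L)) < 1e-12  # uniform attains it
```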

Figure 2.8 Entropy of a binary source.



Example 2.13
Let the source be binary, i.e., L = 2. Then, if p₁ = p, p₂ = 1 − p, 0 ≤ p ≤ 1, the entropy is (Fig. 2.8)

H = H(p) = −p log₂ p − (1 − p) log₂(1 − p)

The maximum entropy is 1 bit, which occurs when both the messages are equally likely. Since the source is binary, it is always possible to code the output using 1 bit/message. However, if p ≪ ½, say p = 0.03, then H < 0.2 bits, and Shannon's noiseless coding theorem says it is possible to find a noiseless coding scheme that requires only 0.2 bits/message.
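The claims of Example 2.13 can be checked numerically (a minimal sketch; p = 0.03 is one illustrative choice for which the binary entropy falls below 0.2 bits):

```python
import math

def Hb(p):
    # binary entropy H(p) = -p log2 p - (1-p) log2 (1-p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

assert abs(Hb(0.5) - 1.0) < 1e-12       # maximum of 1 bit at p = 1/2
assert Hb(0.03) < 0.2                   # a skewed source needs far fewer bits
assert abs(Hb(0.1) - Hb(0.9)) < 1e-12   # symmetry about p = 1/2
```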

The Rate Distortion Function

In analog-to-digital conversion of data, it is inevitable that the digitized data will have some error, however small, when compared to the analog sample. Rate distortion theory provides some useful results, which tell us the minimum number of bits required to encode the data while admitting a certain level of distortion, and vice versa.
The rate distortion function of a random variable x gives the minimum average rate R_D (in bits per sample) required to represent (or code) it while allowing a fixed distortion D in its reproduced value. If x is a Gaussian random variable of variance σ² and y is its reproduced value, and if the distortion is measured by the mean square value of the difference (x − y), i.e.,

D = E[(x − y)²] (2.115)

then the rate distortion function of x is defined as [11, 12]

R_D = (1/2) log₂(σ²/D) for D ≤ σ², and R_D = 0 for D > σ²
    = max[0, (1/2) log₂(σ²/D)] (2.116)

Clearly the maximum value of D is equal to σ², the variance of x. Figure 2.9 shows the nature of this function.


Now if {x(0), x(1), …, x(N − 1)} are Gaussian random variables encoded independently and if {y(0), …, y(N − 1)} are their reproduced values, then the average mean square distortion is

D = (1/N) Σ_{k=0}^{N−1} E[|x(k) − y(k)|²] (2.117)

For a fixed average distortion D, the rate distortion function R_D of the vector x is given by

R_D = (1/N) Σ_{k=0}^{N−1} max[0, (1/2) log₂(σ_k²/θ)] (2.118)

where θ is determined by solving

D = (1/N) Σ_{k=0}^{N−1} min(θ, σ_k²) (2.119)


Figure 2.9 Rate distortion function for a Gaussian source.

Alternatively, if R_D is fixed, then (2.119) gives the minimum attainable distortion, where θ is obtained by solving (2.118). In general R_D is a convex and monotonically nonincreasing function of the distortion D.
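Equations (2.118) and (2.119) can be evaluated jointly by solving for θ numerically (a sketch using bisection; the variances and the distortion level D are illustrative values, not from the text):

```python
import math

def rate_distortion(variances, D, iters=200):
    # Solve eq. (2.119) for theta by bisection, then evaluate eq. (2.118).
    lo, hi = 0.0, max(variances)
    N = len(variances)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if sum(min(theta, s) for s in variances) / N < D:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    return sum(max(0.0, 0.5 * math.log2(s / theta)) for s in variances) / N

var = [4.0, 1.0, 0.25]
# With D below the smallest variance, theta = D and
# R_D = (1/N) sum (1/2) log2(sigma_k^2 / D):
D = 0.1
expected = sum(0.5 * math.log2(s / D) for s in var) / len(var)
assert abs(rate_distortion(var, D) - expected) < 1e-6
```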

PROBLEMS
2.1 a. Given a sequence u(m, n) = (m + n)³, evaluate u(m, n)δ(m − 1, n − 2) and u(m, n) ⊛ δ(m − 1, n − 2).
b. Given a function f(x, y) = (x + y)², evaluate f(x, y)δ(x − 1, y − 2) and f(x, y) ⊛ δ(x − 1, y − 2).
c. Show that (1/2π) ∫_{−π}^{π} e^{jθn} dθ = δ(n).
2.2 (Properties of Discrete Convolution) Prove the following:
a. h(m, n) ⊛ u(m, n) = u(m, n) ⊛ h(m, n) (Commutative)
b. h(m, n) ⊛ [a₁u₁(m, n) + a₂u₂(m, n)] = a₁[h(m, n) ⊛ u₁(m, n)] + a₂[h(m, n) ⊛ u₂(m, n)] (Distributive)
c. h(m, n) ⊛ u(m − m₀, n − n₀) = h(m − m₀, n − n₀) ⊛ u(m, n) (Shift invariance)
d. h(m, n) ⊛ [u₁(m, n) ⊛ u₂(m, n)] = [h(m, n) ⊛ u₁(m, n)] ⊛ u₂(m, n) (Associative)
e. h(m, n) ⊛ δ(m, n) = h(m, n) (Reproductive)
f. Σ_m Σ_n v(m, n) = [Σ_m Σ_n h(m, n)][Σ_m Σ_n u(m, n)] (Volume conservation)
   where v(m, n) = h(m, n) ⊛ u(m, n)

2.3 Write and prove the properties of convolution analogous to those stated in Problem 2.2 for two-dimensional continuous systems.
2.4 In each of the following systems find the impulse response and determine whether or not the system is linear, shift invariant, FIR, or IIR.
a. y(m, n) = 3x(m, n) + 9
b. y(m, n) = mn x(m, n)



,

m {I

c. y(m, n) = 2:
'tn' """ _c:<l n'
2:
= ~c:t;
x(rn',Il')

d. y(m, n) =x(m - mo, n - nO) .
.e. y(m, n) =exp{-Ix(m, n)l}
1 1

f. ,y ( m, 11) =.::.-
'" .:." x (m' , n ') . ,
'"
",'--111'--1 ..,,,,,,,",

M- 1N - 1 2'l'imm'} { 2'l'inn'}
{
g. y(m, n)= 2: 2:
m .... Orj'=O
x(m', n')exp -j 'M exp -j 'N
1

2.5 a. Determine the convolution of x(m, n) of Example 2.1 with each of the following arrays, where the boxed element denotes the (0, 0) location.
l, 0 -1 1 Ii. m. -2
-1 rn-l IT]

2.6
2.7

2.8.

Figure P2.9 (block diagram: f(x, y) → scaling → f(ax, by) → system S → g(x, y))

2.10 (Hankel transform) Show that in polar coordinates the two-dimensional Fourier transform becomes

F_p(ξ, φ) ≜ F(ξ cos φ, ξ sin φ) = ∫_0^{2π} ∫_0^∞ f_p(r, θ) exp[−j2πξr cos(θ − φ)] r dr dθ

where f_p(r, θ) ≜ f(r cos θ, r sin θ). Hence, show that if f(x, y) is circularly symmetric, then its Fourier transform is also circularly symmetric and is given by

F_p(ρ) = 2π ∫_0^∞ r f_p(r) J₀(2πrρ) dr,  J₀(x) ≜ (1/2π) ∫_0^{2π} exp(−jx cos θ) dθ

The pair f_p(r) and F_p(ρ) is called the Hankel transform pair of zero order.
2.11 Prove the properties of the Z-transform listed in Table 2.5.
2.12 For each of the following linear systems, determine the transfer function, frequency response, OTF, and the MTF.
a. y(m, n) − ρ₁y(m − 1, n) − ρ₂y(m, n − 1) = x(m, n)
b. y(m, n) − ρ₁y(m − 1, n) − ρ₂y(m, n − 1) + ρ₁ρ₂y(m − 1, n − 1) = x(m, n)
2.13 What is the impulse response of each filter?
a. Transfer function is H₁(z₁, z₂) = 1 − a₁z₁⁻¹ − a₂z₂⁻¹ − a₃z₁⁻¹z₂⁻¹ − a₄z₁z₂
b. Frequency response is H(ω₁, ω₂) = 1 − 2α cos ω₁ − 2α cos ω₂.




2.14 a. Write the convolution of two sequences {1, 2, 3, 4} and {−1, 2, −1} as a Toeplitz matrix operating on a 3 × 1 vector and then as a Toeplitz matrix operating on a 4 × 1 vector.
b. Write the convolution of two periodic sequences {1, 2, 3, 4, …} and {−1, 2, −1, 0, …}, each of period 4, as a circulant matrix operating on a 4 × 1 vector that represents the first sequence.
2.15 (Matrix trace and related formulas)
a. Show that for square matrices A and B, Tr[A] = Tr[A^T] = Σ_{i=1}^{N} λ_i, Tr[A + B] = Tr[A] + Tr[B], and Tr[AB] = Tr[BA], where λ_i are the eigenvalues of A.
b. Define ∇_A(Tr[Y]) ≜ {∂Tr[Y]/∂a(m, n)}. Then show that ∇_A Tr(AB) = B^T, ∇_A Tr(ABA^T) = AB^T + AB, and ∇_A Tr(A⁻¹BAC) = −(A⁻¹BACA⁻¹)^T + (CA⁻¹B)^T.
2.16 Express the two-dimensional convolutions of Problem 2.5(a) as a doubly Toeplitz block matrix operating on a 6 × 1 vector obtained by column ordering of the x(m, n).
2.17 In the two-dimensional linear system of (2.8), the x(m, n) and y(m, n) are of size M × N and are mapped into column ordered vectors x and y, respectively. Write this as a matrix equation

y = ℋx

and show ℋ is an N × N block matrix of basic dimension M × M that satisfies the properties listed in Table P2.17.
TABLE P2.17 Impulse Response (and Covariance) Sequences and Corresponding Block Matrix Structures

Sequence: Block matrix
Spatially varying, h(m, n; m', n'): ℋ general
Spatially invariant in m, h(m − m', n; n'): Toeplitz blocks
Spatially invariant in n, h(m, n − n'; m'): Block Toeplitz
Spatially invariant in m, n, h(m − m', n − n'): Doubly Toeplitz
Spatially invariant in m, n and periodic in m, h(m modulo M, n): Block Toeplitz with circulant blocks
Spatially invariant in m, n and periodic in n, h(m, n modulo N): Block circulant with Toeplitz blocks
Spatially invariant and periodic in m, n, h(m modulo M, n modulo N): Doubly block circulant
Separable, spatially varying, h₁(m, m')h₂(n, n'): Kronecker product H₂ ⊗ H₁
Separable, spatially invariant, h₁(m − m')h₂(n − n'): Toeplitz Kronecker product H₂ ⊗ H₁; H₁, H₂ Toeplitz
Separable, spatially invariant, and periodic, h₁(m)h₂(n), (m modulo M, n modulo N): Circulant Kronecker product H₂ ⊗ H₁; H₁, H₂ circulant

2.18 Show each of the following.
a. A circulant matrix is Toeplitz, but the converse is not true.
b. The product of two circulant (or block circulant) matrices is a circulant (or block circulant) matrix.
c. The product of two Toeplitz matrices need not be Toeplitz.

2.19 Show each of the following.
a. The covariance matrix of a sequence of uncorrelated random variables is diagonal.
b. The cross-covariance matrix of two mutually wide-sense stationary sequences is Toeplitz.
c. The covariance matrix of one period of a real stationary periodic random sequence is circulant.
2.20 In Table P2.17, if h(m, n; m', n') represents the covariance function of an M × N segment of a random field x(m, n), then show that the block matrix ℋ represents the covariance matrix of the column-ordered vector x for each of the cases listed in that table.
2.21 Prove properties (2.97) through (2.99) of SDFs. Show that (2.100) is the SDF of random fields whose covariance function is the separable function given by (2.84).
2.22 a. *Compute the entropies of several digital images from their histograms and compare them with the gray scale activity in the images. The gray scale activity may be represented by the variance of the image.
b. Show that for a given number of possible messages the entropy of a source is maximum if all the messages are equally likely.
c. Show that R_D given by (2.118) is a monotonically nonincreasing function of D.

BIBLIOGRAPHY

Sections 2.1-2.6

For fundamental concepts in linear systems, Fourier theory, Z-transforms, and related topics:
1. T. Kailath. Linear Systems. Englewood Cliffs, N.J.: Prentice-Hall, 1980.
2. A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1975.
3. A. Papoulis. Systems and Transforms with Applications in Optics. New York:
McGraw-Hill, 1968.
4. J. W. Goodman. Introduction to Fourier Optics. New York: McGraw-Hill, 1968.
5. R. N. Bracewell. The Fourier Transform and Its Applications. New York: McGraw-Hill, 1965.
6. E. I. Jury. Theory and Application of the Z-Transform Method. New York: John Wiley, 1964.

Sections 2.7, 2.8

For matrix theory results and their proofs:


7. R. Bellman. Introduction to Matrix Analysis, 2d ed. New York: McGraw-Hill, 1970.
8. G. A. Graybill. Introduction to Matrices with Applications in Statistics. Belmont, Calif.: Wadsworth, 1969.

*Problems marked with an asterisk require computer simulation or other experiments.



Sections 2.9-2.13

For fundamentals of random processes, estimation theory, and information theory:


9. A. Papoulis. Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 1965.
10. W. B. Davenport. Probability and Random Processes. New York: McGraw-Hill, 1970.

11. R. G. Gallager. Information Theory and Reliable Communication. New York: John
Wiley, 1968.
12. C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. Urbana:
The University of Illinois Press, 1949.



Image Perception

3.1 INTRODUCTION

In presenting the output of an imaging system to a human observer, it is essential to consider how it is transformed into information by the viewer. Understanding of the visual perception process is important for developing measures of image fidelity, which aid in the design and evaluation of image processing algorithms and imaging systems. Visual image data itself represents the spatial distribution of physical quantities, such as the luminance and spatial frequencies of an object. The perceived information may be represented by attributes such as brightness, color, and edges. Our primary goal here is to study how the perceptual information may be represented quantitatively.

3.2 LIGHT. LUMINANCE. BRIGHTNESS. AND CONTRAST



Light is the electromagnetic radiation that stimulates our visual response. It is expressed as a spectral energy distribution L(λ), where λ is the wavelength, which lies in the visible region, 350 nm to 780 nm, of the electromagnetic spectrum. Light received from an object can be written as

I(λ) = ρ(λ)L(λ) (3.1)

where ρ(λ) represents the reflectivity or transmissivity of the object and L(λ) is the incident energy distribution. The illumination range over which the visual system can operate is roughly 1 to 10¹⁰, or 10 orders of magnitude.
The retina of the human eye (Fig. 3.1) contains two types of photoreceptors called rods and cones. The rods, about 100 million in number, are relatively long and thin. They provide scotopic vision, which is the visual response at the lower several orders of magnitude of illumination. The cones, many fewer in number

Figure 3.1 Cross section of the eye.

(about 6.5 million), are shorter and thicker and are less sensitive than the rods. They provide photopic vision, the visual response at the higher 5 to 6 orders of magnitude of illumination (for instance, in a well-lighted room or bright sunlight). In the intermediate region of illumination, both rods and cones are active and provide mesopic vision. We are primarily concerned with the photopic vision, since electronic image displays are well lighted.
The cones are also responsible for color vision. They are densely packed in the center of the retina (called the fovea) at a density of about 120 cones per degree of arc subtended in the field of vision. This corresponds to a spacing of about 0.5 min of arc, or 2 μm. The density of cones falls off rapidly outside a circle of 1° radius from the fovea. The pupil of the eye acts as an aperture. In bright light it is about 2 mm in diameter and acts as a low-pass filter (for green light) with a passband of about 60 cycles per degree.
The cones are laterally connected by horizontal cells and have a forward connection with bipolar cells. The bipolar cells are connected to ganglion cells, which join to form the optic nerve that provides communication to the central nervous system.
The luminance or intensity of a spatially distributed object with light distribution I(x, y, λ) is defined as

f(x, y) = ∫ I(x, y, λ)V(λ) dλ (3.2)

where V(λ) is called the relative luminous efficiency function of the visual system. For the human eye, V(λ) is a bell-shaped curve (Fig. 3.2) whose characteristics

Figure 3.2 Typical relative luminous efficiency function.



depend on whether it is scotopic or photopic vision. The luminance of an object is independent of the luminances of the surrounding objects. The brightness (also called apparent brightness) of an object is the perceived luminance and depends on the luminance of the surround. Two objects with different surroundings could have identical luminances but different brightnesses. The following visual phenomena exemplify the differences between luminance and brightness.

Simultaneous Contrast

In Fig. 3.3a, the two smaller squares in the middle have equal luminance values, but the one on the left appears brighter. On the other hand in Fig. 3.3b, the two squares appear about equal in brightness although their luminances are quite different. The reason is that our perception is sensitive to luminance contrast rather than the absolute luminance values themselves.
According to Weber's law [2, 3], if the luminance f_o of an object is just noticeably different from the luminance f_s of its surround, then their ratio is

|f_s − f_o| / f_o = constant (3.3)

Writing f_o = f, f_s = f + Δf, where Δf is small for just noticeably different luminances, (3.3) can be written as

Δf/f = Δ(log f) = Δc (constant) (3.4)

The value of the constant has been found to be 0.02, which means that at least 50 levels are needed for the contrast on a scale of 0 to 1. Equation (3.4) says

Figure 3.3 Simultaneous contrast: (a) small squares in the middle have equal luminances but do not appear equally bright; (b) small squares in the middle appear almost equally bright, but their luminances are different.





TABLE 3.1 Luminance to Contrast Models

1. Logarithmic law: c = 50 log₁₀ f, 1 ≤ f ≤ 100
2. Power law: c = α_n f^{1/n}, n = 2, 3; α₂ = 10, α₃ = 21.9
3. Background ratio: c = f(f_s + 100)/(f_s + f), f_s = background luminance

The luminance f lies in the interval [0, 100] except in the logarithmic law. The contrast scale is over [0, 100].

equal increments in the log of the luminance should be perceived to be equally different, i.e., Δ(log f) is proportional to Δc, the change in contrast. Accordingly, the quantity

c = a₁ + a₂ log f (3.5)

where a₁, a₂ are constants, is called the contrast. There are other models of contrast [see Table 3.1 and Fig. 3.4], one of which is the root law

c = f^{1/n} (3.6)

The choice of n = 3 has been preferred over the logarithmic law in an image coding study [7]. However, the logarithmic law remains the most widely used choice.
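The contrast models of Table 3.1 can be sketched directly (a hedged illustration; the constants are those listed in the table, and all three models map the top of the luminance range near the top of the contrast scale [0, 100]):

```python
import math

def c_log(f):
    # logarithmic law: c = 50 log10 f, 1 <= f <= 100
    return 50.0 * math.log10(f)

def c_power(f, n=3):
    # power law: c = alpha_n f^(1/n); alpha_2 = 10, alpha_3 = 21.9 (Table 3.1)
    alpha = {2: 10.0, 3: 21.9}[n]
    return alpha * f ** (1.0 / n)

def c_background(f, fs):
    # background ratio: c = f (fs + 100) / (fs + f)
    return f * (fs + 100.0) / (fs + f)

assert abs(c_log(100) - 100.0) < 1e-9
assert abs(c_power(100, 2) - 100.0) < 1e-9
assert abs(c_background(100, 50) - 100.0) < 1e-9
```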

Figure 3.4 Contrast models.



Mach Bands

The spatial interaction of luminances from an object and its surround creates a phenomenon called the Mach band effect. This effect shows that brightness is not a monotonic function of luminance. Consider the gray level bar chart of Fig. 3.5a, where each bar has constant luminance. But the apparent brightness is not uniform along the width of the bar. Transitions at each bar appear brighter on the right side and darker on the left side. The dashed line in Fig. 3.5b represents the perceived brightness. The overshoots and undershoots illustrate the Mach band effect. Mach bands are also visible in Fig. 3.6a, which exhibits a dark and a bright line (marked D and B) near the transition regions of a smooth-intensity ramp. Measurement of the

Figure 3.5 Mach band effect. (a) Gray-level bar chart; (b) luminance versus brightness.




Figure 3.6 Mach bands. (a) D = dark band, B = bright band; (b) Mach band effect; (c) nature of the visual system impulse response.
Mach band effect can be used to estimate the impulse response of the visual system (see Problem 3.5).
Figure 3.6c shows the nature of this impulse response. The negative lobes manifest a visual phenomenon known as lateral inhibition. The impulse response values represent the relative spatial weighting (of the contrast) by the receptors, rods and cones. The negative lobes indicate that the neural (postretinal) signal at a given location has been inhibited by some of the laterally located receptors.

3.3 MTF OF THE VISUAL SYSTEM

The Mach band effect measures the response of the visual system in spatial coordinates. The Fourier transform of the impulse response gives the frequency response of the system, from which its MTF can be determined. A direct measurement of the MTF is possible by considering a sinusoidal grating of varying contrast (ratio of the maximum to minimum intensity) and spatial frequency (Fig. 3.7a). Observation of this figure (at a distance of about 1 m) shows the thresholds of visibility at

Figure 3.7 MTF of the human visual system. (a) Contrast versus spatial frequency sinusoidal grating; (b) typical MTF plot.

various frequencies. The curve representing these thresholds is also the MTF, and it varies with the viewer as well as the viewing angle. Its typical shape is of the form shown in Fig. 3.7b. The curve actually observed from Fig. 3.7a is your own MTF (distorted by the printing process). The shape of the curve is similar to a band-pass filter and suggests that the human visual system is most sensitive to midfrequencies and least sensitive to high frequencies. The frequency at which the peak occurs varies with the viewer and generally lies between 3 and 10 cycles/degree. In practice, the contrast sensitivity also depends on the orientation of the grating, being maximum for horizontal and vertical gratings. However, the angular sensitivity variations are within 3 dB (maximum deviation is at 45°) and, to a first approximation, the MTF can be considered to be isotropic and the phase effects can be ignored. A curve-fitting procedure [6] has yielded a formula for the frequency response of the visual system as

H(ξ₁, ξ₂) = H_p(ρ) ≜ A[α + (ρ/ρ₀)] exp[−(ρ/ρ₀)^β], ρ = √(ξ₁² + ξ₂²) cycles/degree (3.7)

where A, α, β, and ρ₀ are constants. For α = 0 and β = 1, ρ₀ is the frequency at which the peak occurs. For example, in an image coding application [6], the values A = 2.6, α = 0.0192, ρ₀ = (0.114)⁻¹ ≈ 8.772, and β = 1.1 have been found useful. The peak frequency is 8 cycles/degree and the peak value is normalized to unity.
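Formula (3.7) with the quoted constants can be evaluated numerically to confirm the band-pass shape (a sketch; the grid-search step is an assumption, and the computed peak value comes out close to, though not exactly, unity):

```python
import math

A, alpha, beta = 2.6, 0.0192, 1.1
rho0 = 1.0 / 0.114   # about 8.772 cycles/degree

def H(rho):
    # eq. (3.7): H(rho) = A [alpha + rho/rho0] exp[-(rho/rho0)^beta]
    x = rho / rho0
    return A * (alpha + x) * math.exp(-x ** beta)

# Band-pass shape: the response peaks near 8 cycles/degree,
# with the peak value close to unity.
peak_rho = max((r * 0.01 for r in range(1, 5000)), key=H)
assert 7.0 < peak_rho < 9.0
assert 0.95 < H(peak_rho) < 1.05
```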
" .
3.4 THE VISIBILITY FUNCTION

In many image processing systems, for instance in image coding, the output image u'(m, n) contains additive noise q(m, n), which depends on e(m, n), a function of the input image u(m, n) [see Fig. 3.8]. The sequence e(m, n) is sometimes called the masking function. A masking function is an image feature that is to be


Figure 3.8 Visibility function noise source model. The filter impulse response h(m, n) determines the masking function e(m, n). The noise source output depends on the masking function amplitude.

observed or processed in the given application. For example, e(m, n) = u(m, n) in PCM transmission of images. Other examples are as follows:

1. e(m, n) = u(m, n) − u(m − 1, n)
2. e(m, n) = u(m, n) − a₁u(m − 1, n) − a₂u(m, n − 1) + a₃u(m − 1, n − 1)
3. e(m, n) = u(m, n) − a[u(m − 1, n) + u(m + 1, n) + u(m, n − 1) + u(m, n + 1)]
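The three masking functions above can be sketched on a toy image (a minimal illustration; the coefficient values a₁, a₂, a₃, a and the sample array are assumptions, not from the text):

```python
def e1(u, m, n):
    # masking function 1: horizontal gradient u(m,n) - u(m-1,n)
    return u[m][n] - u[m - 1][n]

def e2(u, m, n, a1=0.95, a2=0.95, a3=0.9):
    # masking function 2: prediction-error form (coefficients illustrative)
    return (u[m][n] - a1 * u[m - 1][n] - a2 * u[m][n - 1]
            + a3 * u[m - 1][n - 1])

def e3(u, m, n, a=0.25):
    # masking function 3: center minus weighted 4-neighbor sum
    return u[m][n] - a * (u[m - 1][n] + u[m + 1][n]
                          + u[m][n - 1] + u[m][n + 1])

u = [[1, 1, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5], [1, 1, 5, 5]]
# A flat region produces no masking signal for the gradient operator...
assert e1(u, 3, 0) == 0
# ...while an edge produces a large one:
assert e1(u, 1, 2) == 4
# With a = 1/4, e3 vanishes in flat regions; near the edge it does not:
assert e3(u, 1, 1) == -1.0
```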
The visibility function measures the subjective visibility of noise in a scene containing this masking-function-dependent noise q(m, n). It is measured as follows. For a suitably small Δx and a fixed interval [x, x + Δx], add white noise of power P₁ to all those pixels in the original image where the masking function magnitude |e| lies in this interval. Then obtain another image by adding white noise of power P₂ to all the pixels such that the two images are subjectively equivalent based on a subjective scale rating, such as the one shown in Table 3.3. Then the visibility function v(x) is defined as [4]

v(x) ≜ −dV̄(x)/dx  (3.8)

where

V̄(x) ≜ P₂/P₁

The visibility function therefore represents the subjective visibility in a scene of unit masking noise. This function varies with the scene. It is useful in defining a quantitative criterion for subjective evaluation of errors in an image (see Section 3.6).

3.5 MONOCHROME VISION MODELS

Based on the foregoing discussion, a simple overall model of monochrome vision can be obtained [5, 6], as shown in Fig. 3.9. Light enters the eye, whose optical characteristics are represented by a low-pass filter with frequency response H₁(ξ₁, ξ₂). The spectral response of the eye, represented by the relative luminous efficiency function V(λ), yields the luminance distribution f(x, y) via (3.2). The nonlinear response of the rods and cones, represented by the point nonlinearity


Figure 3.9 (a) Overall monochrome vision model; (b) simplified monochrome vision model.

g(·), yields the contrast c(x, y). The lateral inhibition phenomenon is represented by a spatially invariant, isotropic, linear system whose frequency response is H(ξ₁, ξ₂). Its output is the neural signal, which represents the apparent brightness b(x, y). For an optically well-corrected eye, the low-pass filter has a much slower drop-off with increasing frequency than that of the lateral inhibition mechanism. Thus the optical effects of the eye can be ignored, and the simpler model showing the transformation between the luminance and the brightness suffices.

Results from experiments using sinusoidal gratings indicate that spatial frequency components separated by about an octave can be detected independently by observers. Thus, it has been proposed [7] that the visual system contains a number of independent spatial channels, each tuned to a different spatial frequency and orientation angle. This yields a refined model, which is useful in the analysis and evaluation of image processing systems that are far from the optimum and introduce large levels of distortion. For near-optimum systems, where the output image is only slightly degraded, the simplified model in Fig. 3.9 is adequate and is the one with which we shall mostly be concerned.
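The simplified model of Fig. 3.9b, a point nonlinearity followed by an isotropic lateral-inhibition filter, can be sketched numerically as follows. The cube-root nonlinearity, the assumed field of view, and the reuse of eq. (3.7) for H are illustrative assumptions, not prescriptions from the text:

```python
import numpy as np

def brightness_model(luminance, A=2.6, alpha=0.0192, rho0=8.772, beta=1.1):
    """Simplified monochrome vision model: contrast c = g(luminance),
    then an isotropic linear shift-invariant filter H(rho) as in (3.7)."""
    # Point nonlinearity g(.): a cube-root law is one common choice.
    c = np.cbrt(luminance)
    # Radial frequency grid in cycles/degree, assuming the image spans
    # fov_deg degrees of visual angle (an assumed parameter).
    M, N = c.shape
    fov_deg = 2.0
    fy = np.fft.fftfreq(M, d=fov_deg / M)
    fx = np.fft.fftfreq(N, d=fov_deg / N)
    rho = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    H = A * (alpha + rho / rho0) * np.exp(-(rho / rho0) ** beta)
    # Lateral inhibition applied in the frequency domain.
    b = np.fft.ifft2(np.fft.fft2(c) * H).real
    return b
```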

3.6 IMAGE FIDELITY CRITERIA


Image fidelity criteria are useful for measuring image quality and for rating the performance of a processing technique or a vision system. There are two types of criteria that are used for evaluating image quality: subjective and quantitative. The subjective criteria use rating scales such as goodness scales and impairment



TABLE 3.2 Image Goodness Scales

Overall goodness scale      Group goodness scale
Excellent (5)               Best (7)
Good (4)                    Well above average (6)
Fair (3)                    Slightly above average (5)
Poor (2)                    Average (4)
Unsatisfactory (1)          Slightly below average (3)
                            Well below average (2)
                            Worst (1)

The numbers in parentheses indicate a numerical weight attached to the rating.

scales. A goodness scale may be a global scale or a group scale (Table 3.2). The overall goodness criterion rates image quality on a scale ranging from excellent to unsatisfactory. A training set of images is used to calibrate such a scale. The group goodness scale is based on comparisons within a set of images.

The impairment scale (Table 3.3) rates an image on the basis of the level of degradation present in an image when compared with an ideal image. It is useful in applications such as image coding, where the encoding process introduces degradations in the output image.

Sometimes a method called bubble sort is used in rating images. Two images A and B from a group are compared and their order is determined (say it is AB). Then the third image C is compared with B and the order ABC or ACB is established. If the order is ACB, then A and C are compared and the new order is established. In this way, the best image bubbles to the top if no ties are allowed. Numerical ratings may be given after the images have been ranked.
If several observers are used in the evaluation process, then the mean rating is given by

R̄ = [Σ_{k=1}^{n} sₖnₖ] / [Σ_{k=1}^{n} nₖ]

where sₖ is the score associated with the kth rating, nₖ is the number of observers with this rating, and n is the number of grades in the scale.
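The mean rating above is just a weighted average of the scale scores; a small sketch (function and variable names are my own):

```python
def mean_rating(scores, counts):
    """Mean observer rating: sum(s_k * n_k) / sum(n_k)."""
    assert len(scores) == len(counts)
    total = sum(counts)
    return sum(s * n for s, n in zip(scores, counts)) / total

# Example: goodness scale 5..1 with 3, 5, 2, 0, 0 observers per grade.
r = mean_rating([5, 4, 3, 2, 1], [3, 5, 2, 0, 0])
print(r)  # 4.1
```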

TABLE 3.3 Impairment Scale

Not noticeable (1)
Just noticeable (2)
Definitely noticeable but only slight impairment (3)
Impairment not objectionable (4)
Somewhat objectionable (5)
Definitely objectionable (6)
Extremely objectionable (7)


Among the quantitative measures, a class of criteria used often is called the mean square criterion. It refers to some sort of average or sum (or integral) of squares of the error between two images. For M × N images u(m, n) and u′(m, n) (or v(x, y) and v′(x, y) in the continuous case), the quantity

σₑ² ≜ (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} |u(m, n) − u′(m, n)|²  or  ∫∫_ℛ |v(x, y) − v′(x, y)|² dx dy  (3.9)

where ℛ is the region over which the image is given, is called the average least squares (or integral square) error. The quantity

σₘₛ² ≜ E[|u(m, n) − u′(m, n)|²]  or  E[|v(x, y) − v′(x, y)|²]  (3.10)

is called the mean square error, where E represents the mathematical expectation.
Often (3.9) is used as an estimate of (3.10) when ensembles for u(m, n) and u′(m, n) or v(x, y) and v′(x, y) are not available. Another quantity,

σ̄ₑ² ≜ (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} E[|u(m, n) − u′(m, n)|²]  or  ∫∫_ℛ E[|v(x, y) − v′(x, y)|²] dx dy  (3.11)

called the average mean square or integral mean square error, is also used many times. In many applications the (mean square) error is expressed in terms of a signal-to-noise ratio (SNR), which is defined in decibels (dB) as

SNR = 10 log₁₀ (σ²/σₑ²)  (3.12)

where σ² is the variance of the desired (or original) image.
Another definition of SNR, used commonly in image coding applications, is

SNR′ = 10 log₁₀ [(peak-to-peak value of the reference image)² / σₑ²]  (3.13)

This definition generally results in a value of SNR′ roughly 12 to 15 dB above the value of SNR (see Problem 3.6).
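Equations (3.9), (3.12), and (3.13) translate directly into code; in the sketch below the 8-bit peak-to-peak value of 255 is an assumed default, not something fixed by the text:

```python
import numpy as np

def average_least_squares_error(u, up):
    """Eq. (3.9): average squared error between two M x N images."""
    return np.mean(np.abs(u.astype(float) - up.astype(float)) ** 2)

def snr_db(u, up):
    """Eq. (3.12): 10 log10(variance of the original / error power)."""
    return 10 * np.log10(np.var(u.astype(float)) /
                         average_least_squares_error(u, up))

def snr_prime_db(u, up, peak_to_peak=255.0):
    """Eq. (3.13): peak-to-peak referenced SNR (often called PSNR)."""
    return 10 * np.log10(peak_to_peak ** 2 /
                         average_least_squares_error(u, up))
```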
The sequence u(m, n) (or the function v(x, y)) need not always represent the image luminance function. For example, in the monochrome image model of Fig. 3.9, v(x, y) = b(x, y) would represent the brightness function. Then from (3.9) we may write for large images

σₑ² = ∫∫_ℛ |b(x, y) − b′(x, y)|² dx dy  (3.14)

= ∫∫_{−∞}^{∞} |B(ξ₁, ξ₂) − B′(ξ₁, ξ₂)|² dξ₁ dξ₂  (3.15)

where B(ξ₁, ξ₂) is the Fourier transform of b(x, y) and (3.15) follows by virtue of the Parseval theorem. From Fig. 3.9 we now obtain

σₑ² = ∫∫_{−∞}^{∞} |C(ξ₁, ξ₂) − C′(ξ₁, ξ₂)|² |H(ξ₁, ξ₂)|² dξ₁ dξ₂  (3.16)




which is a frequency-weighted mean square criterion applied to the contrast function.

An alternate visual criterion is to define the expectation operator E with respect to the visibility (rather than the probability) function, for example, by

σ²ₘₛₑ ≜ ∫ |e|² v(e) de  (3.17)

where e ≜ u − u′ is the value of the error at any pixel and v(e) is its visibility. The quantity σ²ₘₛₑ then represents the mean square subjective error.

The mean square error criterion is not without limitations, especially when used as a global measure of image fidelity. The prime justification for its common use is the relative ease with which it can be handled mathematically for developing image processing algorithms. When used as a local measure, for instance in adaptive techniques, it has proven to be much more effective.
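Eq. (3.17) replaces the probability density by a scene-dependent visibility function. A discrete sketch follows; the exponential visibility curve is made up purely for illustration and is not measured data:

```python
import numpy as np

def mean_square_subjective_error(u, up, visibility):
    """Discrete version of eq. (3.17): sum over pixels of |e|^2 v(e)."""
    e = u.astype(float) - up.astype(float)
    return np.sum(e ** 2 * visibility(np.abs(e)))

def v_exp(x, x0=10.0):
    # Illustrative visibility curve: decays with masking amplitude,
    # normalized so the weights sum to one over the given errors.
    w = np.exp(-x / x0)
    return w / w.sum()
```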


3.7 COLOR REPRESENTATION

The study of color is important in the design and development of color vision systems. Use of color in image displays is not only more pleasing, but it also enables us to receive more visual information. While we can perceive only a few dozen gray levels, we have the ability to distinguish between thousands of colors. The perceptual attributes of color are brightness, hue, and saturation. Brightness represents the perceived luminance, as mentioned before. The hue of a color refers to its "redness," "greenness," and so on. For monochromatic light sources, differences in
Figure 3.10 Perceptual representation of the color space. The brightness W* varies along the vertical axis, hue θ varies along the circumference, and saturation S varies along the radius.

Figure 3.11 (a) Typical absorption spectra of the three types of cones in the human retina (typical sensitivity curves for S₁, S₂, S₃, not to scale); (b) three-receptor model for color representation.

hues are manifested by differences in wavelengths. Saturation is that aspect of perception that varies most strongly as more and more white light is added to a monochromatic light. These definitions are somewhat imprecise because hue, saturation, and brightness all change when either the wavelength, the intensity, the hue, or the amount of white light in a color is changed. Figure 3.10 shows a perceptual representation of the color space. Brightness (W*) varies along the vertical axis, hue (θ) varies along the circumference, and saturation (S) varies along the radial distance.¹ For a fixed brightness W*, the symbols R, G, and B show the relative locations of the red, green, and blue spectral colors.
Color representation is based on the classical theory of Thomas Young (1802) [8], who stated that any color can be reproduced by mixing an appropriate set of three primary colors. Subsequent findings, starting from those of Maxwell [9] to more recent ones reported in [10, 11], have established that there are three different types of cones in the (normal) human retina with absorption spectra S₁(λ), S₂(λ), S₃(λ), where λmin ≤ λ ≤ λmax, λmin = 380 nm, and λmax = 780 nm. These responses peak in the yellow-green, green, and blue regions, respectively, of the visible electromagnetic spectrum (Fig. 3.11a). Note that there is significant overlap between S₁ and S₂.

Based on the three-color theory, the spectral energy distribution of a "colored" light, C(λ), will produce a color sensation that can be described by the spectral responses (Fig. 3.11b) as

αᵢ(C) = ∫_{λmin}^{λmax} Sᵢ(λ)C(λ) dλ,  i = 1, 2, 3  (3.18)

¹The superscript * used for brightness should not be confused with the complex conjugate. The notation used here is to remain consistent with the commonly used symbols for color coordinates.



Equation (3.18) may be interpreted as an equation of color representation. If C₁(λ) and C₂(λ) are two spectral distributions that produce responses αᵢ(C₁) and αᵢ(C₂) such that

αᵢ(C₁) = αᵢ(C₂),  i = 1, 2, 3  (3.19)

then the colors C₁ and C₂ are perceived to be identical. Hence two colors that look identical could have different spectral distributions.
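Eq. (3.18) reduces each spectrum to three numbers, so two different spectra satisfying (3.19) are indistinguishable (metamers). A numerical sketch follows; the Gaussian sensitivities are stand-ins chosen for illustration, since the real curves of Fig. 3.11a would be tabulated data:

```python
import numpy as np

lam = np.arange(380, 781)  # wavelength grid, nm (lambda_min..lambda_max)

# Stand-in cone sensitivities (Gaussians; illustrative only).
peaks = (570.0, 535.0, 445.0)
S = np.array([np.exp(-((lam - p) / 50.0) ** 2) for p in peaks])

def cone_responses(C):
    """Eq. (3.18): alpha_i(C) = integral of S_i(lambda) C(lambda) d lambda,
    approximated by a Riemann sum with d lambda = 1 nm."""
    return (S * C).sum(axis=1)

def is_metamer(C1, C2, tol=1e-6):
    """Eq. (3.19): equal cone responses => perceived identical."""
    return np.allclose(cone_responses(C1), cone_responses(C2), atol=tol)
```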

3.8 COLOR MATCHING AND REPRODUCTION

One of the basic problems in the study of color is the reproduction of color using a set of light sources. Generally, the number of sources is restricted to three, which, due to the three-receptor model, is the minimum number required to match arbitrary colors. Consider three primary sources of light with spectral energy distributions Pₖ(λ), k = 1, 2, 3. Let

(3.20)

where the limits of integration are assumed to be λmin and λmax and the sources are linearly independent, i.e., a linear combination of any two sources cannot produce the third source. To match a color C(λ), suppose the three primaries are mixed in proportions βₖ, k = 1, 2, 3 (Fig. 3.12). Then Σ_{k=1}^{3} βₖPₖ(λ) should be perceived as C(λ), i.e.,

αᵢ(C) = ∫ [Σ_{k=1}^{3} βₖPₖ(λ)] Sᵢ(λ) dλ = Σ_{k=1}^{3} βₖ ∫ Sᵢ(λ)Pₖ(λ) dλ,  i = 1, 2, 3  (3.21)

Defining the ith cone response generated by one unit of the kth primary as

aᵢ,ₖ ≜ ∫ Sᵢ(λ)Pₖ(λ) dλ,  i, k = 1, 2, 3  (3.22)

we get

Σ_{k=1}^{3} βₖ aᵢ,ₖ = αᵢ(C) = ∫ Sᵢ(λ)C(λ) dλ,  i = 1, 2, 3  (3.23)

These are the color matching equations. Given an arbitrary color spectral distribution C(λ), the primary sources Pₖ(λ), and the spectral sensitivity curves Sᵢ(λ), the quantities βₖ, k = 1, 2, 3, can be found by solving these equations. In practice, the primary sources are calibrated against a reference white light source with known
Figure 3.12 Color matching using three primary sources.



energy distribution W(λ). Let wₖ denote the amount of the kth primary required to match the reference white. Then the quantities

Tₖ(C) ≜ βₖ/wₖ,  k = 1, 2, 3  (3.24)

are called the tristimulus values of the color C. For a unit-energy monochromatic color at wavelength λ′, the matching equations (3.23) give

Σ_{k=1}^{3} wₖ Tₖ(λ′) aᵢ,ₖ = Sᵢ(λ′),  i = 1, 2, 3  (3.25)

for each λ′. Given the spectral tristimulus values Tₖ(λ), the tristimulus values of an arbitrary color C(λ) can be calculated as (Problem 3.8)

Tₖ(C) = ∫ C(λ)Tₖ(λ) dλ,  k = 1, 2, 3  (3.26)


Example 3.1

The primary sources recommended by the CIE as standard sources are three monochromatic sources

P₁(λ) = δ(λ − λ₁),  λ₁ = 700 nm, red
P₂(λ) = δ(λ − λ₂),  λ₂ = 546.1 nm, green
P₃(λ) = δ(λ − λ₃),  λ₃ = 435.8 nm, blue

Using (3.22), we obtain aᵢ,ₖ = Sᵢ(λₖ), i, k = 1, 2, 3. The standard CIE white source has a flat spectrum. Therefore, αᵢ(W) = ∫Sᵢ(λ)dλ. Using these two relations in (3.23) for the reference white, we can write

Σ_{k=1}^{3} wₖ Sᵢ(λₖ) = ∫ Sᵢ(λ) dλ,  i = 1, 2, 3  (3.27)

which can be solved for wₖ provided {Sᵢ(λₖ), 1 ≤ i, k ≤ 3} is a nonsingular matrix. Using the spectral sensitivity curves and wₖ, one can solve (3.25) for the spectral tristimulus values Tₖ(λ) and obtain their plot as in Fig. 3.13. Note that some of the tristimulus values are negative. This means that the source with a negative tristimulus value, when mixed with the given color, will match an appropriate mixture of the other two sources.

It is safe to say that any one set of three primary sources cannot match all the visible colors, although for any given color, a suitable set of three primary sources can be found. Hence, the primary sources for color reproduction should be chosen to maximize the number of colors that can be matched.
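Eq. (3.23) is just a 3 × 3 linear system for the mixture amounts βₖ. A sketch follows; the numbers for aᵢ,ₖ and αᵢ(C) are made up for illustration, since real values would come from (3.22) and (3.18):

```python
import numpy as np

# a[i, k]: response of cone i to one unit of primary k (made-up values).
a = np.array([[0.9, 0.1, 0.05],
              [0.2, 0.8, 0.10],
              [0.0, 0.1, 0.90]])

alpha = np.array([0.5, 0.4, 0.3])  # cone responses alpha_i(C) of the color

beta = np.linalg.solve(a, alpha)   # mixture proportions beta_k, eq. (3.23)
print(np.allclose(a @ beta, alpha))  # True: the mixture reproduces alpha
```

A negative component of beta would indicate a color outside the gamut of these primaries (cf. the negative tristimulus values of Fig. 3.13).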

Laws of Color Matching

The preceding theory of colorimetry leads to a useful set of color matching rules [13], which are stated next.

²Commission Internationale de L'Eclairage, the international committee on color standards.



Figure 3.13 Spectral matching tristimulus curves for the CIE spectral primary system. The negative tristimulus values indicate that the colors at those wavelengths cannot be reproduced by the CIE primaries.

1. Any color can be matched by mixing at most three colored lights. This means we can always find three primary sources such that the matrix {aᵢ,ₖ} is nonsingular and (3.23) has a unique solution.
2. The luminance of a color mixture is equal to the sum of the luminances of its components. The luminance Y of a colored light C(λ) can be obtained via (3.2) as (here the dependence on x, y is suppressed)

Y ≜ Y(C) = ∫ C(λ)V(λ) dλ  (3.28)

From this formula, the luminance of the kth primary source with tristimulus setting βₖ = wₖTₖ (see Fig. 3.12) will be Tₖwₖ∫Pₖ(λ)V(λ)dλ. Hence the luminance of a color with tristimulus values Tₖ, k = 1, 2, 3 can also be written as

Y = Σ_{k=1}^{3} Tₖ ∫ wₖPₖ(λ)V(λ) dλ ≜ Σ_{k=1}^{3} lₖTₖ  (3.29)

where lₖ is called the luminosity coefficient of the kth primary.

The reader should be cautioned that in general

C(λ) ≠ Σ_{k=1}^{3} wₖTₖPₖ(λ)  (3.30)

even though a color match has been achieved.

3. The human eye cannot resolve the components of a color mixture. This means that a monochromatic light source and its color are not unique with respect to each other, i.e., the eye cannot resolve the wavelengths from a color.
4. A color match at one luminance level holds over a wide range of luminances.


,
5. Color addition: If a color C l matches color Cz and Q color c; matches color C2 ,
then the mixture of Cl and Ci matches the mixture of C z and C2. Using the
• •
notation
. [Cd = [C2 ] ::;,: color Cl matches color Cz
a.l[Cd + C/z( Czl::} a mixture containing an amount CLI of CI
and an amount O'.z of Cz
we can write the preceding law as follows. If
[Cd = [c:] and [Cz] = [C2]
then
al[Cd + az[ Cz] = 0'.1[Cl] + O'.z[C:i]
6. Color subtraction: If a mixture of CI and Cz matches a mixture orc: and Ci and
if Cz matches C:i then Cl matches C[ , i.e., if .
[Cd + [CzI = [Ca + [C2]
and [Cz] = [C2]
,
. then [Cd = (Cll
.7. Transitive law: If c, matches Cz and if Cz matches C3 , then C I matches C3 ',
i.e., if

8•.C6f6r matches: Three types of color matches are defined:


a. arC] = al[Cd + (Xz[Czl + 0'.3[C3]; i.e., a units of Care matched by a mixture


of (Xl units of C1 , (Xz units of Cz , and (X3 units of C3. This is a direct match.
Indirect matches are defined by the following.
b.a[ C] + (XI[ Cd = az[ Cz] + (X3[C2]
. c. «[C] + (Xl[Ctl + C<z[Cz] = Ci.2(C2]

These are also called Grassman's laws. They hold except when the luminance levels
are very high or very iow. :niese are also useful in color reproduction colorimetry,
, the science of measuring color quantitatively.

Chromaticity Diagram

The chromaticities of a color are defined as

tₖ ≜ Tₖ/(T₁ + T₂ + T₃),  k = 1, 2, 3  (3.31)

Clearly t₁ + t₂ + t₃ = 1. Hence, only two of the three chromaticity coordinates are independent. Therefore, the chromaticity coordinates project the three-dimensional color solid onto a plane. The chromaticities t₁, t₂ jointly represent the
Figure 3.14 Chromaticity diagram for the CIE spectral primary system. Shaded area is the color gamut of this system.

chrominance components (i.e., hue and saturation) of the color. The entire color space can be represented by the coordinates (t₁, t₂, Y), in which any Y = constant is a chrominance plane. The chromaticity diagram represents the color subspace in the chrominance plane. Figure 3.14 shows the chromaticity diagram for the CIE spectral primary system. The chromaticity diagram has the following properties:

1. The locus of all the points representing spectral colors contains the region of all the visible colors.
2. The straight line joining the chromaticity coordinates of blue (360 nm) and red (780 nm) contains the purple colors and is called the line of purples.
3. The region bounded by the straight lines joining the coordinates (0, 0), (0, 1), and (1, 0) (the shaded region of Fig. 3.14) contains all the colors reproducible by the primary sources. This region is called the color gamut of the primary sources.
4. The reference white of the CIE primary system has chromaticity coordinates (1/3, 1/3). Colors lying close to this point are the less saturated colors; colors located far from this point are the more saturated colors. Thus the spectral colors and the colors on the line of purples are maximally saturated.
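Eq. (3.31) normalizes the tristimulus values so that only two coordinates are free; a small sketch (the numbers are the CIE R, G, B values of NTSC yellow computed later in Example 3.2):

```python
def chromaticities(T1, T2, T3):
    """Eq. (3.31): t_k = T_k / (T1 + T2 + T3); the t_k sum to one."""
    s = T1 + T2 + T3
    return (T1 / s, T2 / s, T3 / s)

t1, t2, t3 = chromaticities(1.021, 0.867, 0.058)
# Approximately (0.525, 0.445, 0.030); only two coordinates are independent.
assert abs(t1 + t2 + t3 - 1.0) < 1e-12
```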

3.9 COLOR COORDINATE SYSTEMS
There are several color coordinate systems (Table 3.4), which have come into existence for a variety of reasons.



TABLE 3.4 Color Coordinate Systems

1. C.I.E. spectral primary system: R, G, B. Monochromatic primary sources P₁, red = 700 nm; P₂, green = 546.1 nm; P₃, blue = 435.8 nm. Reference white has flat spectrum and R = G = B = 1. See Figs. 3.13 and 3.14 for spectral matching curves and chromaticity diagram.

2. C.I.E. X, Y, Z system; Y = luminance:

   [X]   [0.490  0.310  0.200] [R]
   [Y] = [0.177  0.813  0.011] [G]
   [Z]   [0.000  0.010  0.990] [B]

3. C.I.E. uniform chromaticity scale (UCS) system: u, v, Y. Here u, v = chromaticities, Y = luminance, and U, V, W = tristimulus values corresponding to u, v, w:

   u = 4X/(X + 15Y + 3Z),  v = 6Y/(X + 15Y + 3Z)
   U = (2/3)X,  V = Y,  W = (−X + 3Y + Z)/2

4. U*, V*, W* system (modified UCS system):

   U* = 13W*(u − u₀),  V* = 13W*(v − v₀)
   W* = 25(100Y)^(1/3) − 17,  1 ≤ 100Y ≤ 100

   Y = luminance in [0.01, 1]; u₀, v₀ = chromaticities of reference white; W* = contrast or brightness.

5. S, θ, W* system: S = saturation = [(U*)² + (V*)²]^(1/2); θ = hue = tan⁻¹(V*/U*); W* = brightness.

6. NTSC receiver primary system: R_N, G_N, B_N. Linear transformation of X, Y, Z; based on television phosphor primaries. Reference white is illuminant C, for which R_N = G_N = B_N = 1:

   [R_N]   [ 1.910  −0.533  −0.288] [X]
   [G_N] = [−0.985   2.000  −0.028] [Y]
   [B_N]   [ 0.058  −0.118   0.896] [Z]

7. NTSC transmission system: Y, I, Q. Y = luminance; I, Q = chrominances (see Table 3.6).

8. L*, a*, b* system: L* = brightness; a* = red-green content; b* = yellow-blue content; X₀, Y₀, Z₀ = tristimulus values of the reference white:

   L* = 25(100Y/Y₀)^(1/3) − 16
   a* = 500[(X/X₀)^(1/3) − (Y/Y₀)^(1/3)]
   b* = 200[(Y/Y₀)^(1/3) − (Z/Z₀)^(1/3)]
As mentioned before, the CIE spectral primary sources do not yield a gamut covering all the visible colors. In fact, no practical set of three primaries has been found that can reproduce all colors. This has led to the development of the CIE X, Y, Z system with hypothetical primary sources such that all the spectral tristimulus values are positive. Although the primary sources are physically unrealizable, this is a convenient coordinate system for colorimetric calculations. In this system Y represents the luminance of the color. The X, Y, Z coordinates are related to the CIE R, G, B system via the linear transformation shown in Table 3.4. Figure 3.15 shows the chromaticity diagram for this system. The reference white for this system has a flat spectrum, as in the R, G, B system. The tristimulus values for the reference white are X = Y = Z = 1.

Figure 3.15 also contains several ellipses of different sizes and orientations. These ellipses, also called MacAdam ellipses [10, 11], are such that all the colors that lie inside an ellipse are indistinguishable. Any color lying just outside the ellipse is just noticeably different (JND) from the color at the center of the ellipse. The size, orientation, and eccentricity (ratio of major to minor axes) of these ellipses vary throughout the color space. The uniform chromaticity scale (UCS) system u, v, Y transforms these elliptical contours with large eccentricity (up to 20:1) to near circles (eccentricity of about 2:1) of almost equal size in the u, v plane. It is related to the X, Y, Z system via the transformation shown in Table 3.4. Note that x, y and u, v are the chromaticity coordinates and Y is the luminance. Figure 3.16 shows the chromaticity diagram of the UCS coordinate system. The tristimulus coordinates corresponding to u, v, and w = 1 − u − v are labeled as U, V, and W respectively.

The U*, V*, W* system is a modified UCS system whose origin (u₀, v₀) is shifted to the reference white in the u, v chromaticity plane. The coordinate W* is a cube-root transformation of the luminance and represents the contrast (or
Figure 3.15 Chromaticity diagram for the CIE X, Y, Z color coordinate system. The (MacAdam) ellipses are the just noticeable color difference ellipses.




Figure 3.16 Chromaticity diagram for the CIE UCS color coordinate system.

brightness) of a uniform color patch. This coordinate system is useful for measuring color differences quantitatively. In this system, for unsaturated colors, i.e., for colors lying near the grays in the color solid, the difference between two colors is, to a good approximation, proportional to the length of the straight line joining them.

The S, θ, W* system is simply the polar representation of the U*, V*, W* system, where S and θ represent, respectively, the saturation and hue attributes of color (Fig. 3.10). Large values of S imply highly saturated colors.

The National Television Systems Committee (NTSC) receiver primary system (R_N, G_N, B_N) was developed as a standard for television receivers. The NTSC has adopted three phosphor primaries that glow in the red, green, and blue regions of the visible spectrum. The reference white was chosen as the illuminant C, for which the tristimulus values are R_N = G_N = B_N = 1. Table 3.5 gives the NTSC coordinates of some of the major colors. The color solid for this coordinate system is a cube (Fig. 3.17). The chromaticity diagram for this system is shown in Fig. 3.18. Note that the reference white for NTSC is different from that for the CIE system.

The NTSC transmission system (Y, I, Q) was developed to facilitate transmission of color images using the existing monochrome television channels without increasing the bandwidth requirement. The Y coordinate is the luminance (monochrome channel) of the color. The other two tristimulus signals, I and Q, jointly represent the hue and saturation of the color, and their bandwidths are much smaller than that of the luminance signal. The I, Q components are transmitted on a subcarrier channel using quadrature modulation in such a way that the spatial

TABLE 3.5 Tristimulus and Chromaticity Values of Major Colors in the NTSC Receiver Primary System

        Red    Yellow  Green  Cyan   Blue   Magenta  White  Black
R_N     1.0    1.0     0.0    0.0    0.0    1.0      1.0    0.0
G_N     0.0    1.0     1.0    1.0    0.0    0.0      1.0    0.0
B_N     0.0    0.0     0.0    1.0    1.0    1.0      1.0    0.0
r_N     1.0    0.5     0.0    0.0    0.0    0.5      0.333  0.333
g_N     0.0    0.5     1.0    0.5    0.0    0.0      0.333  0.333
b_N     0.0    0.0     0.0    0.5    1.0    0.5      0.333  0.333

Figure 3.17 Tristimulus color solid for the NTSC receiver primary system.

Figure 3.18 Chromaticity diagram for the NTSC receiver primary system.

spectra of I, Q do not overlap with that of Y and the overall bandwidth required for transmission remains unchanged (see Chapter 4). The Y, I, Q system is related to the R_N, G_N, B_N system via a linear transformation. This and some other transformations relating the different coordinate systems are given in Table 3.6.

The L*, a*, b* system gives a quantitative expression for the Munsell system of color classification [12]. Like the U*, V*, W* system, this also gives a useful color-difference formula.

Example 3.2

We will find the representation of the NTSC receiver primary yellow in the various coordinate systems. From Table 3.5, we have R_N = 1.0, G_N = 1.0, B_N = 0.0.

Using Table 3.6, we obtain the CIE spectral primary system coordinates as R = 1.167 − 0.146 − 0.0 = 1.021, G = 0.114 + 0.753 + 0.0 = 0.867, B = −0.001 + 0.059 + 0.0 = 0.058. The corresponding chromaticity values are

r = 1.021/(1.021 + 0.867 + 0.058) = 1.021/1.946 = 0.525,  g = 0.867/1.946 = 0.445,  b = 0.058/1.946 = 0.030

Similarly, for the other coordinate systems we obtain:

X = 0.781, Y = 0.886, Z = 0.066;  x = 0.451, y = 0.511, z = 0.038
U = 0.521, V = 0.886, W = 0.972;  u = 0.219, v = 0.373, w = 0.408
Y = 0.886, I = 0.322, Q = −0.312

In the NTSC receiver primary system, the reference white is R_N = G_N = B_N = 1. This gives X₀ = 0.982, Y₀ = 1.00, Z₀ = 1.183. Note that X₀, Y₀, and Z₀ are



TABLE 3.6 Transformations from NTSC Receiver Primary to Different Coordinate Systems. Input Vector is [R_N G_N B_N]ᵀ.

Output vector [Y I Q]ᵀ (NTSC transmission system):

   [Y]   [0.299   0.587   0.114] [R_N]
   [I] = [0.596  −0.274  −0.322] [G_N]
   [Q]   [0.211  −0.523   0.312] [B_N]

not unity because the reference white for the NTSC sources is different from that of the CIE. Using the definitions of u and v from Table 3.4, we obtain u₀ = 0.201 and v₀ = 0.307 for the reference white.
Using the preceding results in the formulas for the remaining coordinate systems, we obtain

W* = 25(88.6)^(1/3) − 17 = 94.45,  U* = 22.10,  V* = 81.04
S = 84.00,  θ = tan⁻¹(3.67) = 1.30 rad,  W* = 94.45
L* = 25(88.6)^(1/3) − 16 = 95.45,  a* = 500[(0.781/0.982)^(1/3) − (0.886)^(1/3)] = −16.98
b* = 200[(0.886)^(1/3) − (0.066/1.183)^(1/3)] = 115.67

3.10 COLOR DIFFERENCE MEASURES



Quantitative measures of color difference between any two arbitrary colors pose a problem of considerable interest in coding, enhancement, and analysis of color images. Experimental evidence suggests that the tristimulus color solid may be considered as a Riemannian space with a color distance metric [14, 15]

(ds)² = Σ_{i=1}^{3} Σ_{j=1}^{3} cᵢ,ⱼ dXᵢ dXⱼ  (3.32)

The distance ds represents the infinitesimal difference between two colors with coordinates Xᵢ and Xᵢ + dXᵢ in the chosen color coordinate system. The coefficients

cᵢ,ⱼ measure the average human perceptual sensitivity to small differences in the ith and the jth coordinates.

Small differences in color are described via observations of just noticeable differences (JNDs) in colors. A unit JND defined by

1 = Σ_{i=1}^{3} Σ_{j=1}^{3} cᵢ,ⱼ dXᵢ dXⱼ  (3.33)

is the describing equation for an ellipsoid. If the coefficients cᵢ,ⱼ were constant throughout the color space, then the JND ellipsoids would be of uniform size in the color space. In that event, the color space could be reduced to a Euclidean tristimulus space, where the color difference between any two colors would become proportional to the length of the straight line joining them. Unfortunately, the cᵢ,ⱼ exhibit large variations with tristimulus values, so that the sizes as well as the orientations of the JND ellipsoids vary considerably. Consequently, the distance between two arbitrary colors C₁ and C₂ is given by the minimal distance chain of ellipsoids lying along a curve 𝒞 joining C₁ and C₂ such that the distance integral

d(C₁, C₂) = ∫_{C₁(Xᵢ)}^{C₂(Xᵢ)} ds  (3.34)

is minimum when evaluated along this curve, i.e., for 𝒞 = 𝒞*. This curve is called the geodesic between C₁ and C₂. If the cᵢ,ⱼ are constant in the tristimulus space, then the geodesic is a straight line. Geodesics in color space can be determined by employing a suitable optimization technique such as dynamic programming or the calculus of

Figure 3.19 Geodesics in the (u, v) plane.



TABLE 3.7 CIE Color-Difference Formulas

(Δs)² = (ΔU*)² + (ΔV*)² + (ΔW*)²  (3.35)  1964 CIE formula

(Δs)² = (ΔL*)² + (Δu*)² + (Δv*)²  (3.36)  1976 CIE formula; modification of the u, v, Y space to u′, v′, L* space. u₀′, v₀′, Y₀ refer to the reference white.

   L* = 25(100Y/Y₀)^(1/3) − 16
   u* = 13L*(u′ − u₀′)
   v* = 13L*(v′ − v₀′)
   u′ = u,  v′ = 1.5v = 9Y/(X + 15Y + 3Z)

(Δs)² = (ΔL*)² + (Δa*)² + (Δb*)²  (3.37)  L*, a*, b* color coordinate system

variations [15]. Figure 3.19 shows the projections of several geodesic curves
between the major NTSC colors on the UCS u, v chromaticity plane. The geodesics
between the primary colors are nearly straight lines (in the chromaticity plane), but
the geodesics between most other colors are generally curved.
    Due to the large complexity of the foregoing procedure of determining color
distance, simpler measures that can easily be used are desired. Several simple
formulas that approximate the Riemannian color space by a Euclidean color space
have been proposed by the CIE (Table 3.7). The first of these formulas [eq. (3.35)]
was adopted by the CIE in 1964. The formula of (3.36), called the CIE 1976
L*, u*, v* formula, is an improvement over the 1964 CIE U*, V*, W* formula in
regard to uniform spacing of colors that exhibit differences in sizes typical of those
in the Munsell book of color [12].
    The third formula, (3.37), is called the CIE 1976 L*, a*, b* color-difference
formula. It is intended to yield perceptually uniform spacing of colors that exhibit
color differences greater than the JND threshold but smaller than those in the
Munsell book of color.
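Each formula of Table 3.7 is a straight-line (Euclidean) distance in its own coordinate system, so evaluating it is a one-liner once the coordinates are computed. A minimal sketch in Python; the sample L*a*b* coordinates below are illustrative values, not taken from the text:

```python
import math

def delta_e(c1, c2):
    """Color difference per eqs. (3.35)-(3.37): each CIE formula is a
    Euclidean distance in its coordinate system (U*V*W*, L*u*v*, or L*a*b*)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

# Illustrative (hypothetical) L*a*b* coordinates for two colors:
red_lab = (53.2, 80.1, 67.2)
blue_lab = (32.3, 79.2, -107.9)
print(delta_e(red_lab, blue_lab))
```

The same function serves for all three formulas; only the coordinate transformation feeding it changes.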


3.11 COLOR VISION MODEL

With color represented by a three-element vector, a color vision model containing
three channels [16], each being similar to the simplified model of Fig. 3.9, is shown
in Fig. 3.20. The color image is represented by the R_N, G_N, B_N coordinates at
each pixel. The matrix A transforms the input into the three cone responses
a_k(x, y, C), k = 1, 2, 3, where (x, y) are the spatial pixel coordinates and C refers to
its color. In Fig. 3.20, we have represented the normalized cone responses

    T_k ≜ a_k(x, y, C)/a_k(x, y, W),    k = 1, 2, 3          (3.38)

Figure 3.20 A color vision model.

In analogy with the definition of tristimulus values, the T_k are called the retinal
tristimulus coordinates (see Problem 3.14). The cone responses undergo non-
linear point transformations to give three fields T̃_k(x, y), k = 1, 2, 3. The 3 × 3 matrix
B transforms the {T̃_k(x, y)} into {C_k(x, y)} such that C1(x, y) is the monochrome
(achromatic) contrast field c(x, y), as in Fig. 3.9, and C2(x, y) and C3(x, y) represent
the corresponding chromatic fields. The spatial filters H_k(ξ1, ξ2), k = 1, 2, 3, repre-
sent the frequency response of the visual system to luminance and chrominance
contrast signals. Thus H1(ξ1, ξ2) is the same as H(ξ1, ξ2) in Fig. 3.9 and is a band-
pass filter that represents the lateral inhibition phenomenon. The visual frequency
responses to chrominance signals are not well established but are believed to have
their passbands in the lower frequency region, as shown in Fig. 3.21. The 3 × 3
matrices A and B are given as follows:

        |  0.299  0.587  0.114 |         |  21.5    0.0   0.00 |
    A = | -0.127  0.724  0.175 | ,   B = | -41.0   41.0   0.00 |          (3.39)
        |  0.000  0.066  1.117 |         |  -6.27   0.0   6.27 |

From the model of Fig. 3.20, a criterion for color image fidelity can be
defined. For example, for two color images {R_N, G_N, B_N} and {R′_N, G′_N, B′_N}, their
subjective mean square error could be defined by

    e² = (1/A) ∬_𝒲 Σ_{k=1}^{3} |B_k(x, y) − B′_k(x, y)|² dx dy          (3.40)

where 𝒲 is the region over which the image is defined (or available), A is its area,
and {B_k(x, y)} and {B′_k(x, y)} are the outputs of the model for the two color images.
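The forward path of the model up to the C_k fields can be sketched with numpy, using the A and B matrices of (3.39). This is a sketch under stated assumptions: the spatial filters H_k are omitted (they would be applied to the result), a natural logarithm is used for the point nonlinearity since the text does not fix the log base, and a small clipping constant is an assumption added to keep the log finite:

```python
import numpy as np

# Matrices A and B from eq. (3.39).
A = np.array([[ 0.299, 0.587, 0.114],
              [-0.127, 0.724, 0.175],
              [ 0.000, 0.066, 1.117]])
B = np.array([[ 21.5,   0.0,  0.00],
              [-41.0,  41.0,  0.00],
              [ -6.27,  0.0,  6.27]])

def color_channels(rgb):
    """rgb: (H, W, 3) array of normalized R_N, G_N, B_N values in (0, 1].
    Returns the three fields C_k(x, y) of Fig. 3.20, spatial filters omitted."""
    cones = np.einsum('ij,hwj->hwi', A, rgb)        # cone responses a_k(x, y, C)
    t = np.log(np.clip(cones, 1e-6, None))          # nonlinear point transform
    return np.einsum('ij,hwj->hwi', B, t)           # C_1, C_2, C_3 fields

patch = np.full((2, 2, 3), 0.5)                     # flat mid-gray test patch
print(color_channels(patch)[0, 0])
```

For a unit white input (R_N = G_N = B_N = 1) the first cone response is 1, so the achromatic channel C1 = 21.5 log(1) = 0, a quick sanity check on the matrix wiring.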

Figure 3.21 Frequency responses of the three color channels C1, C2, C3 of the
color vision model. Each filter is assumed to be isotropic, so that H_k(ξ1, ξ2) = H_k(ρ),
ρ = √(ξ1² + ξ2²), k = 1, 2, 3.



3.12 TEMPORAL PROPERTIES OF VISION

Temporal aspects of visual perception [1, 18] become important in the processing of
motion images and in the design of image displays for stationary images. The main
properties that will be relevant to our discussion are summarized here.

Bloch's Law

Light flashes of different durations but equal energy are indistinguishable below a
critical duration. This critical duration is about 30 ms when the eye is adapted at a
moderate illumination level. The more the eye is adapted to the dark, the longer is
the critical duration.

Critical Fusion Frequency (CFF)

When a slowly flashing light is observed, the individual flashes are distinguishable.
At flashing rates above the critical fusion frequency (CFF), the flashes are indistin-
guishable from a steady light of the same average intensity. This frequency gener-
ally does not exceed 50 to 60 Hz. Figure 3.22 shows a typical temporal MTF.
    This property is the basis of television raster scanning cameras and displays.
Interlaced image fields are sampled and displayed at rates of 50 or 60 Hz. (The rate
is chosen to coincide with the power-line frequency to avoid any interference.) For
digital display of still images, modern display monitors are refreshed at a rate of
60 frames/s to avoid any flicker perception.

Spatial versus Temporal Effects


,
The eye is more sensitive to flickering of high spatial frequencies than low spatial
frequencies. Figure 3.22 compares the temporal MTFs for flickering fields with
different spatial frequencies. This fact has been found useful in coding of motion

,,"
--- .....
"
1.0
-_..... ",; "\
\
0.5 \
;:- ~ •
.-
>
~ 0.2
• '& ,.
'-.~
.!i
0.1
--- High spatial
frequencies field
~ 0.05
, Low spatial
- - frequencies field
0.02

1 2 5 10 20 •
Figure 3.22 Temporal MTFs for flicker·
Flick'" frequency {Hz} ing fields. •



images by subsampling the moving areas everywhere except at the edges. For the
same reason, image display monitors offering high spatial resolution display images
at a noninterlaced 60-Hz refresh rate.


PROBLEMS

3.1 Generate two 256 x 256 8-bit images as in Fig. 3.3a, where the small squares have gray
level values of 127 and the backgrounds have the values 63 and 223. Verify the result
of Fig. 3.3a. Next change the gray level of one of the small squares until the result of
Fig. 3.3b is verified.
3.2 Show that eqs. (3.5) and (3.6) are solutions of a modified Weber law: df/f′ is pro-
portional to dc, i.e., equal changes in contrast are induced by equal amounts of df/f′.
Find f′.
3.3 Generate a digital bar chart as shown in Fig. 3.5a, where each bar is 64 pixels wide.
Each image line is a staircase function, as shown in Fig. 3.5b. Plot the brightness
function (approximately) as you perceive it.
3.4 Generate a 512 x 512 image, each row of which is a smooth ramp r(n), as shown in
Fig. P3.4. Display on a video monitor and locate the dark (D) and the bright (B) Mach
bands.

[Figure P3.4: plot of the ramp r(n) versus n, n = 1 to 512; gray levels run between
120 and 240, with breakpoints near n = 100, 200, 280, and 300.]

Figure P3.4

3.5 The Mach band phenomenon predicts the one-dimensional step response of the visual
system, as shown by s(n) in Fig. P3.5. The corresponding one-dimensional impulse
response (or the vertical line response) is given by h(n) = s(n) - s(n - 1). Show that
h(n) has negative lobes (which manifest the lateral inhibition phenomenon) as shown
in Fig. 3.6c.

3.6 As a rule of thumb, the peak-to-peak value of images can be estimated as nσ, where n
varies between 4 and 6. Letting n = 5 and using (3.13), show that

    SNR′ = SNR + 14 dB





[Figure P3.5: step input and visual step response s(n) versus n (n = -3 to 2); the
response overshoots to about 2.7 above the upper step level of 2.0 and undershoots
to about 0.3 below the lower level of 0.5 near the step edge.]

Figure P3.5

3.7 Can two monochromatic sources with different wavelengths be perceived to have the
same color? Explain.
3.8 Using eqs. (3.23) through (3.25), show that (3.26) is valid.
3.9 In this problem we show that any two tristimulus coordinate systems based on different
sets of primary sources are linearly related. Let {P_k(λ)} and {P′_k(λ)}, k = 1, 2, 3, be two
sets of primary sources with corresponding tristimulus coordinates {T_k} and {T′_k} and
reference white sources W(λ) and W′(λ). If a color C(λ) is matched by these sets of
sources, then show that the two sets of tristimulus values are linearly related,
where the definitions of the a's and w's follow from the text. Express this in matrix form
and write the solution for {T′_k}.
3.10 Show that given the chromaticities t1, t2 and the luminance Y, the tristimulus values of
a coordinate system can be obtained by

    T_k = t_k Y / (Σ_{i=1}^{3} l_i t_i),    k = 1, 2, 3

where the l_i are the luminosity coefficients of the primary sources.
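The relation of Problem 3.10 is easy to check numerically. A sketch (the function name and the sample chromaticities and luminosity coefficients are illustrative, not from the text; t3 is obtained as 1 - t1 - t2):

```python
def tristimulus_from_chromaticities(t, l, Y):
    """Problem 3.10 relation: T_k = t_k * Y / sum_i(l_i * t_i).
    t: chromaticities (t1, t2, t3) with t3 = 1 - t1 - t2;
    l: luminosity coefficients of the primaries; Y: luminance."""
    denom = sum(li * ti for li, ti in zip(l, t))
    return tuple(tk * Y / denom for tk in t)

t = (0.4, 0.3, 0.3)            # illustrative chromaticities
l = (0.30, 0.59, 0.11)         # illustrative luminosity coefficients
print(tristimulus_from_chromaticities(t, l, 1.0))
```

A useful consistency check on the formula: the luminance Σ_k l_k T_k of the result equals Y, and the T_k remain in the same ratios as the t_k.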


3.11* For all the major NTSC colors listed in Table 3.5, calculate their tristimulus values
in the RGB, XYZ, UVW, YIQ, U*V*W*, L*a*b*, SθW*, and T, t1, t2 coordinate
systems. Calculate their chromaticity coordinates in the first three of these systems.
3.12 Among the major NTSC colors, except for white and black (see Table 3.5), which one
(a) has the maximum luminance, (b) is most saturated, and (c) is least saturated?
3.13* Calculate the color differences between all pairs of the major NTSC colors listed in
Table 3.5 according to the 1964 CIE formula given in Table 3.7. Which pair of colors is
(a) maximally different, (b) minimally different? Repeat the calculations using the
L*a*b* system formula given by (3.37).

3.14* [Retinal cone system; T1, T2, T3] Let P_k(λ), k = 1, 2, 3 denote the primary sources
that generate the retinal cone tristimulus values. Using (3.38), (3.24), and (3.23), show
that this requires (for every x, y, C)

    Σ_{k=1}^{3} T_k(x, y, C) a_{i,k} = a_i(x, y, C)  ⟹  a_{i,k} = δ(i − k)          (P3.14)

To determine P_k(λ), write

    P_k(λ) = Σ_{i=1}^{3} S_i(λ) b_{i,k},    k = 1, 2, 3

and show that (P3.14) implies B ≜ {b_{i,k}} = Σ⁻¹, where Σ ≜ {σ_{i,j}} and

    σ_{i,j} = ∫ S_i(λ) S_j(λ) dλ

Is the set {P_k(λ)} physically realizable? (Hint: Are the b_{i,k} nonnegative?)

BIBLIOGRAPHY

Sections 3.1-3.3

For further discussion on fundamental topics in visual perception:

1. T. N. Cornsweet. Visual Perception. New York: Academic Press, 1971.
2. E. C. Carterette and M. P. Friedman, eds. Handbook of Perception, vol. 5. New York:
Academic Press, 1975.
3. S. Hecht. "The Visual Discrimination of Intensity and the Weber-Fechner Law." J. Gen.
Physiol. 7, (1924): 241.

Section 3.4

For measurement and applications of the visibility function:

4. A. N. Netravali and B. Prasada. "Adaptive Quantization of Picture Signals Using Spatial
Masking." Proc. IEEE 65, (April 1977): 536-548.

Sections 3.5-3.6

For a detailed development of the monochrome vision model and related image
fidelity criteria:

5. C. F. Hall and E. L. Hall. "A Nonlinear Model for the Spatial Characteristics of
the Human Visual System." IEEE Trans. Syst. Man. Cybern. SMC-7, no. 3 (March 1977):
161-170.
6. J. L. Mannos and D. J. Sakrison. "The Effects of a Visual Fidelity Criterion on the
Encoding of Images." IEEE Trans. Info. Theory IT-20, no. 4 (July 1974): 525-536.
7. D. J. Sakrison. "On the Role of the Observer and a Distortion Measure in Image Trans-
mission." IEEE Trans. Communications COM-25, (Nov. 1977): 1251-1267.





Sections 3.7-3.10

For introductory material on color perception, color representation, and general
reading on color:

8. T. Young. "On the Theory of Light and Colors." Philosophical Transactions of the Royal
Society of London, 92, (1802): 20-71.
9. J. C. Maxwell. "On the Theory of Three Primary Colours." Lectures delivered in 1861.
W. D. Niven (ed.), Sci. Papers 1, Cambridge Univ. Press, London (1890): 445-450.
10. D. L. MacAdam. Sources of Color Science. Cambridge, Mass.: MIT Press, 1970.
11. G. W. Wyszecki and W. S. Stiles. Color Science. New York: John Wiley, 1967.
12. Munsell Book of Color. Munsell Color Co., 2441 North Calvert St., Baltimore, Md.
13. H. G. Grassmann. "Theory of Compound Colours." Philosophic Magazine 4, no. 7
(1854): 254-264.

For color distances, geodesics, and color brightness:

14. CIE. "Colorimetry Proposal for Study of Color Spaces" (technical note). J. Opt. Soc.
Am. 64, (June 1974): 896-897.
15. A. K. Jain. "Color Distance and Geodesics in Color 3 Space." J. Opt. Soc. Am. 62
(November 1972): 1287-1290. Also see J. Opt. Soc. Am. 63, (August 1973): 934-939.

Section 3.11

For the color vision model, its applications and related bibliography:

16. W. Frei and B. Baxter. "Rate Distortion Coding Simulation for Color Images." IEEE
Trans. Communications COM-25, (November 1977): 1385-1392.
17. J. O. Limb, C. B. Rubinstein, and J. E. Thompson. "Digital Coding of Color Video
Signals." IEEE Trans. Communications COM-25 (November 1977): 1349-1384.

Section 3.12

For further details on temporal visual perceptions, see [1] and:

18. D. H. Kelly. "Visual Responses to Time Dependent Stimuli. I. Amplitude Sensitivity
Measurements." J. Opt. Soc. Am. 51, (1961): 422-429. Also see pp. 917-918 of this
issue, and Vol. 59 (1969): 1361-1369.



Image Sampling and Quantization

4.1 INTRODUCTION

The most basic requirement for computer processing of images is that the images be
available in digital form, that is, as arrays of finite-length binary words. For digi-
tization (Fig. 4.1), the given image is sampled on a discrete grid and each sample or
pixel is quantized using a finite number of bits. The digitized image can then be
processed by the computer. To display a digital image, it is first converted to an
analog signal, which is scanned onto a display.

Image Scanning

A common method of image sampling is to scan the image row by row and sample
each row. An example is the television camera with a vidicon camera tube or an
image dissector tube. Figure 4.2 shows the operating principle. An object, film, or
transparency is continuously illuminated to form an electron image on a photo-
sensitive plate called the target. In a vidicon tube the target is photoconductive,

Figure 4.1 Sampling, quantization, and display of images.
Figure 4.2 Scan-out method.


Figure 4.3 Self-scanning array.

whereas in an image dissector tube it is photoemissive. A finite-aperture electron
beam scans the target and generates a current which is proportional to the light
intensity falling on the target. A system with such a scanning mechanism is called a
scan-out digitizer. Some of the modern scanning devices, such as charge-coupled
device (CCD) cameras, contain an array of photodetectors, a set of electronic
switches, and control circuitry all on a single chip. By external clocking, the array
can be scanned element by element in any desired order (see Fig. 4.3). This is
truly a two-dimensional sampling device and is sometimes called a self-scanning
array.
    In another technique, called the scan-in method, the object is scanned by a
thin collimated light beam, such as a laser beam, which illuminates only a small
spot at a time. The transmitted light is imaged by a lens onto a photodetector
(Fig. 4.4). Certain high-resolution flatbed scanners and rotating drum scanners use
this technique for image digitization, display, or recording.

Television Standards

In the United States a standard scanning convention has been adopted by the
RETMA.* Each complete scan of the target is called a frame, which contains 525

*Radio Electronics Television Manufacturers Association





Figure 4.4 Scan-in method. Technique used by some high-resolution scanners.

Figure 4.5 Interlaced scanning.

lines and is scanned at a rate of 30 frames per second. Each frame is composed of
two interlaced fields, each consisting of 262.5 lines, as shown in Fig. 4.5. To elimi-
nate flicker, alternate fields are sent at a rate of 60 fields per second. The scan lines
have a tilt because of the slower vertical scan rate. The first field contains all the odd
lines, and the second field contains the even lines. By keeping the field rate rather
than the frame rate at 60 Hz, the bandwidth of the transmitted signal is reduced and
is found to be about 4.0 MHz. At the end of the first field, the cathode-ray tube
(CRT) beam retraces quickly upward to the top center of the target. The beam is
biased off during the horizontal and vertical retrace periods so that its zigzag retrace
is not visible. In each vertical retrace 21 lines are lost, so there are only 484 active
lines per frame.
    There are three color television standards: the NTSC, used in North America
and Japan; the Sequential Couleur a Memoire (SECAM, or sequential chrominance
signal and memory), used in France, Eastern Europe, and the Soviet Union; and the
Phase Alternating Line (PAL), used in West Germany, the United Kingdom, parts
of Europe, South America, parts of Asia, and Africa.¹
    The NTSC system uses 525 scan lines per frame, 30 frames/s, and two inter-
laced fields per frame. The color video signal can be written as a composite signal [17]

¹Some television engineers have been known to refer to these standards as Never Twice Same Color
(NTSC), Something Essentially Contradictory to the American Method (SECAM), and Peace At Last
(PAL)!

82 Image Sampling and Quantization Chap. 4


• •
    u(t) = Y(t) + I(t) cos(2πf_sc t + φ) + Q(t) sin(2πf_sc t + φ)          (4.1)

where φ = 33° and f_sc is the subcarrier frequency. The quantities Y and (I, Q) are
the luminance and chrominance components, respectively, which can be obtained
by linearly transforming the R, G, and B signals (see Chapter 3). The half-power
bandwidths of Y, I, and Q are approximately 4.2 MHz, 1.3 MHz, and 0.5 MHz,
respectively. The color subcarrier frequency f_sc is 3.58 MHz, which is 455f_l/2, where
f_l is the scan line frequency (i.e., 15.75 kHz for NTSC). Since f_sc is an odd multiple of
f_l/2 as well as of half the frame frequency f_f/2, the phase of the subcarrier will change
180° from line to line and from frame to frame. Taking this into account, the NTSC
composite video signal with 2:1 line interlace can be represented as

    u(x, y, t) = Y(x, y, t) + I(x, y, t) cos(2πf_sc x + φ) cos[π(f_f t − f_l y)]
               + Q(x, y, t) sin(2πf_sc x + φ) cos[π(f_f t − f_l y)]          (4.2)
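The one-dimensional composite signal (4.1) can be sketched numerically. This is an illustration, not a broadcast-accurate encoder: the subcarrier is derived from the text's f_l = 15.75 kHz, and constant Y, I, Q values stand in for real video components:

```python
import numpy as np

F_L = 15.75e3            # scan line frequency from the text (Hz)
F_SC = 455 * F_L / 2     # color subcarrier, 455*f_l/2, about 3.58 MHz
PHI = np.deg2rad(33.0)   # subcarrier phase, phi = 33 degrees

def ntsc_composite(Y, I, Q, t):
    """Composite signal of eq. (4.1):
    u(t) = Y + I cos(2 pi f_sc t + phi) + Q sin(2 pi f_sc t + phi)."""
    w = 2 * np.pi * F_SC * t + PHI
    return Y + I * np.cos(w) + Q * np.sin(w)

t = np.arange(8) / (8 * F_SC)     # eight samples over one subcarrier cycle
print(ntsc_composite(0.5, 0.1, 0.05, t))
```

Note that the chrominance term has envelope sqrt(I² + Q²) around the luminance level Y, which is why I and Q can be recovered by quadrature demodulation.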
    The SECAM system uses 625 lines at 25 frames/s with 2:1 line interlacing.
Each scan line is composed of the luminance signal Y(t) and one of the chro-
minance signals U ≜ (B − Y)/2.03 or V ≜ (R − Y)/1.14, alternating from line to
line. These chrominances are related to the NTSC coordinates as

    I = V cos 33° − U sin 33°
    Q = V sin 33° + U cos 33°          (4.3)

This avoids the quadrature demodulation and the corresponding chrominance
shifts due to phase detection errors present in the NTSC receivers. The U and V sub-
carriers are at 4.25 and 4.41 MHz. SECAM also transmits a subcarrier for lumi-
nance, which increases the complexity of mixers for transmission.


    The PAL system also transmits 625 lines at 25 frames/s with 2:1 line interlace.
The composite signal is

    u(t) = Y(t) + U cos 2πf_sc t + (−1)^m V sin 2πf_sc t

where m is the line number. Thus the phase of V changes by 180° between successive
lines in the same field. The cross talk between adjacent lines can be suppressed by
averaging them. The U, V are allowed the same bandwidths (1.3 MHz) with the
carrier located at 4.43 MHz.

Image Display and Recording



An image display/recording system is conceptually a scanning system operating in
the reverse direction. A common method is to scan the image samples, after digital
to analog (D to A) conversion, onto a CRT, which displays an array of closely
spaced small light spots whose intensities are proportional to the sample mag-
nitudes. The image is viewed through a glass screen. The quality of the image
depends on the spot size, both its shape and spacing. Basically, the viewed image
should appear to be continuous. The required interpolation between the samples
can be provided in a number of ways. One way is to blur the writing spot electrically,
thereby creating an overlap between the spots. This requires control over the spot
size. Even then, one is not close to the "optimal solution," which, as we shall see,
requires a perfect low-pass filter. In some displays a very small spot size can be
achieved so that interpolation can be performed digitally to generate a larger array,
which contains estimates of some of the missing samples in between the given
samples. This idea is used in bit-mapped computer graphics displays.
    The CRT display can be used for recording the image on a film by simply
imaging the spot through a lens onto the film (basically the same as imaging with a
camera with shutter open for at least one frame period). Other recorders, such as
microdensitometers, project a rectangular aperture of size equal to that of the
image pixel so that the image field is completely filled.
    Another type of display/recorder is called a halftone display. Such a display
can write only black or white dots. By making the dot size much smaller than the
pixel size, white or black dots are dispersed pseudorandomly such that the average
number of dots per pixel area is equal to the pixel gray level. Due to spatial
integration performed by the eye, such a black and white display renders the
perception of a gray-level image. Newspapers, magazines, several printer/plotters,
graphic displays, and facsimile machines use the halftone method of display.
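The pseudorandom dot dispersal described above can be sketched in a few lines. This is one simple realization, thresholding against random numbers so the expected dot density equals the gray level; practical halftoning often uses more structured dither patterns or error diffusion instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def halftone(gray):
    """Pseudorandom halftoning: each dot is set white with probability
    equal to the local gray level (in [0, 1]), so the average number of
    white dots per unit area equals the gray level, as in the text."""
    return (rng.random(gray.shape) < gray).astype(np.uint8)

patch = np.full((200, 200), 0.25)   # uniform quarter-gray patch
dots = halftone(patch)
print(dots.mean())                   # fraction of white dots, near 0.25
```

Viewed from far enough away that the eye integrates over many dots, the binary pattern is perceived as the underlying gray level.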

4.2 TWO-DIMENSIONAL SAMPLING THEORY



Bandlimited Images

The digitization process for images can be understood by modeling them as
bandlimited signals. Although real-world images are rarely bandlimited, they can
be approximated arbitrarily closely by bandlimited functions.
    A function f(x, y) is called bandlimited if its Fourier transform F(ξ1, ξ2) is zero
outside a bounded region in the frequency plane (Fig. 4.6); for instance,

    F(ξ1, ξ2) = 0,    |ξ1| > ξ_x0, |ξ2| > ξ_y0          (4.4)

Figure 4.6 (a) Fourier transform of a bandlimited function.



The quantities ξ_x0 and ξ_y0 are called the x and y bandwidths of the image. If the
spectrum is circularly symmetric, then the single spatial frequency ξ0 ≜ ξ_x0 = ξ_y0 is
called the bandwidth.

Sampling versus Replication

The sampling theory can be understood easily by remembering the fact that the
Fourier transform of an arbitrary sampled function is a scaled, periodic replication
of the Fourier transform of the original function. To see this, consider the ideal
image sampling function, which is a two-dimensional infinite array of Dirac delta
functions situated on a rectangular grid with spacing Δx, Δy (Fig. 4.7a), that is,

    comb(x, y; Δx, Δy) ≜ Σ Σ_{m,n=−∞}^{∞} δ(x − mΔx, y − nΔy)          (4.5)

The sampled image is defined as

    f_s(x, y) = f(x, y) comb(x, y; Δx, Δy)
              = Σ Σ_{m,n=−∞}^{∞} f(mΔx, nΔy) δ(x − mΔx, y − nΔy)          (4.6)

The Fourier transform of a comb function with spacing Δx, Δy is another comb
function with spacing (1/Δx, 1/Δy), namely,

    COMB(ξ1, ξ2) = ℱ{comb(x, y; Δx, Δy)}
                 = ξ_xs ξ_ys Σ Σ_{k,l=−∞}^{∞} δ(ξ1 − kξ_xs, ξ2 − lξ_ys)          (4.7)
                 = ξ_xs ξ_ys comb(ξ1, ξ2; 1/Δx, 1/Δy)

where ξ_xs ≜ 1/Δx, ξ_ys ≜ 1/Δy. Applying the multiplication property of Table 2.3 to
(4.6), the Fourier transform of the sampled image f_s(x, y) is given by the convolution

    F_s(ξ1, ξ2) = F(ξ1, ξ2) ⊛ COMB(ξ1, ξ2)
               = ξ_xs ξ_ys Σ Σ_{k,l=−∞}^{∞} F(ξ1, ξ2) ⊛ δ(ξ1 − kξ_xs, ξ2 − lξ_ys)          (4.8)
               = ξ_xs ξ_ys Σ Σ_{k,l=−∞}^{∞} F(ξ1 − kξ_xs, ξ2 − lξ_ys)

From (4.8) the Fourier transform of the sampled image is, within a scale factor, a
periodic replication of the Fourier transform of the input image on a grid whose
spacing is (ξ_xs, ξ_ys) (Fig. 4.7b).
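The comb-transforms-to-comb property has an exact discrete analogue that is easy to verify numerically: the DFT of an impulse train with period P over N samples is again an impulse train, with period N/P and amplitude N/P. A one-dimensional check with arbitrarily chosen N and P:

```python
import numpy as np

# Discrete analogue of the comb/replication property: DFT of a comb is a comb.
N, P = 256, 8                       # signal length and impulse spacing
comb = np.zeros(N)
comb[::P] = 1.0                     # impulse train with period P
C = np.abs(np.fft.fft(comb))
peaks = np.flatnonzero(C > 1e-9)    # bins where the DFT is nonzero
print(peaks)                        # multiples of N/P = 32
```

Multiplying a sampled signal by this comb therefore convolves its spectrum with a comb, which is exactly the periodic replication mechanism described above.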

Reconstruction of the Image from Its Samples

From uniqueness of the Fourier transform, we know that if the spectrum of the
original image could be recovered somehow from the spectrum of the sampled
image, then we would have the interpolated continuous image from the sampled



Figure 4.7 Two-dimensional sampling: (a) sampling grid; (b) sampled image
spectrum; (c) aliasing and foldover frequencies (shaded areas).



image. If the x, y sampling frequencies are greater than twice the bandwidths,
that is,

    ξ_xs > 2ξ_x0,    ξ_ys > 2ξ_y0          (4.9)

or, equivalently, if the sampling intervals are smaller than one-half of the reciprocals
of the bandwidths, namely,

    Δx < 1/(2ξ_x0),    Δy < 1/(2ξ_y0)          (4.10)


then F(ξ1, ξ2) can be recovered by a low-pass filter with frequency response

    H(ξ1, ξ2) = { 1/(ξ_xs ξ_ys),   (ξ1, ξ2) ∈ ℛ          (4.11)
               { 0,               otherwise

where ℛ is any region whose boundary ∂ℛ is contained within the annular ring
between the rectangles ℛ1 and ℛ2 shown in Fig. 4.7b. This is seen by writing

    F̂(ξ1, ξ2) ≜ H(ξ1, ξ2) F_s(ξ1, ξ2) = F(ξ1, ξ2)          (4.12)

that is, the original continuous image can be recovered exactly by low-pass filtering
the sampled image.

Nyquist Rate, Aliasing, and Foldover Frequencies

The lower bounds on the sampling rates, that is, 2ξ_x0, 2ξ_y0 in (4.9), are called the
Nyquist rates or the Nyquist frequencies. Their reciprocals are called the Nyquist
intervals. The sampling theory states that a bandlimited image sampled above its
x and y Nyquist rates can be recovered without error by low-pass filtering the
sampled image. However, if the sampling frequencies are below the Nyquist
frequencies, that is, if

    ξ_xs < 2ξ_x0,    ξ_ys < 2ξ_y0

then the periodic replications of F(ξ1, ξ2) will overlap (Fig. 4.7c), resulting in a
distorted spectrum F_s(ξ1, ξ2), from which F(ξ1, ξ2) is irrevocably lost. The frequencies
above half the sampling frequencies, that is, above ξ_xs/2, ξ_ys/2, are called the fold-
over frequencies. This overlapping of successive periods of the spectrum causes the
foldover frequencies in the original image to appear as frequencies below ξ_xs/2, ξ_ys/2
in the sampled image. This phenomenon is called aliasing. Aliasing errors cannot be
removed by subsequent filtering. Aliasing can be avoided by low-pass filtering the
image first so that its bandwidth is less than one-half of the sampling frequency, that
is, when (4.9) is satisfied.
    Figures 4.8a and 4.8b show an image sampled above and below its Nyquist
rate. Aliasing is visible near the high frequencies (about one-third distance from
the center). Aliasing effects become invisible when the original image is low-pass
filtered before subsampling (Figs. 4.8c and 4.8d).
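Both the foldover effect and its suppression by prefiltering can be demonstrated in one dimension. A sketch; the two cosine frequencies are chosen to fall on exact DFT bins so that spectral leakage does not obscure the result:

```python
import numpy as np

N = 4096
n = np.arange(N)
f1, f2 = 205 / N, 1434 / N                 # in-band and out-of-band components
x = np.cos(2*np.pi*f1*n) + np.cos(2*np.pi*f2*n)

def peaks(sig):
    """Return the set of frequencies (cycles/sample) with significant DFT magnitude."""
    S = np.abs(np.fft.rfft(sig)) / (len(sig) / 2)
    return set(np.round(np.flatnonzero(S > 0.2) / len(sig), 3))

# 2:1 subsampling without prefiltering: f2 = 0.35 cycles/sample becomes 0.7
# at the new rate and folds over to 1 - 0.7 = 0.3 (an aliased component).
aliased = peaks(x[::2])

# Ideal low-pass prefilter (cutoff at half the new sampling rate), applied
# in the frequency domain before subsampling, removes the foldover term.
X = np.fft.rfft(x)
X[N // 4:] = 0
clean = peaks(np.fft.irfft(X, n=N)[::2])

print(aliased, clean)
```

Without the prefilter, the subsampled signal contains an aliased component at 0.3 cycles/sample that no later filtering can distinguish from a genuine one; with the prefilter, only the in-band component survives.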


If the region of support of the ideal low-pass filter in (4.11) is the rectangle

    ℛ = (−½ξ_xs, ½ξ_xs] × (−½ξ_ys, ½ξ_ys]          (4.13)

centered at the origin, then its impulse response is

    h(x, y) = sinc(xξ_xs) sinc(yξ_ys)          (4.14)

Inverse Fourier transforming (4.12) and using (4.14) and (4.6), the reconstructed
image is obtained as

    f̂(x, y) = Σ Σ_{m,n=−∞}^{∞} f(mΔx, nΔy) sinc(xξ_xs − m) sinc(yξ_ys − n)          (4.15)
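The sinc interpolation (4.15) can be checked numerically in one dimension. A sketch: the infinite series must be truncated to a finite number of samples, so the comparison is made at interior evaluation points with a loose tolerance (`np.sinc(t)` is sin(πt)/(πt), matching the sinc of (4.14)):

```python
import numpy as np

def sinc_reconstruct(samples, dx, x):
    """1-D analogue of eq. (4.15): f(x) = sum_m f(m dx) sinc(x/dx - m),
    truncated to the available samples."""
    m = np.arange(len(samples))
    return samples @ np.sinc(x[None, :] / dx - m[:, None])

dx = 0.1                                     # sampling rate 10 > Nyquist rate 4
m = np.arange(64)
samples = np.cos(2 * np.pi * 2.0 * m * dx)   # cosine of bandwidth 2 cycles/unit
pts = np.array([3.2, 3.25, 3.17])            # interior evaluation points
print(sinc_reconstruct(samples, dx, pts))
print(np.cos(2 * np.pi * 2.0 * pts))         # reference values
```

At exact sample locations the truncated series is exact (all but one sinc term vanish); between samples, the error comes only from truncating the tails of the series.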


(a) Sampled above Nyquist rate and reconstructed by ZOH; (b) sampled below
Nyquist rate and reconstructed by ZOH; (c) low-pass filtered before subsampling
and reconstructed by ZOH; (d) low-pass filtered before subsampling and recon-
structed by FOH.

Figure 4.8 Image sampling, aliasing, and reconstruction.

which is equal to f(x, y) if Δx, Δy satisfy (4.10). We can summarize the preceding
results by the following theorem.

Sampling Theorem

A bandlimited image f(x, y) satisfying (4.4) and sampled uniformly on a rectangular
grid with spacing Δx, Δy can be recovered without error from the sample values
f(mΔx, nΔy) provided the sampling rate is greater than the Nyquist rate, that is,
ξ_xs > 2ξ_x0 and ξ_ys > 2ξ_y0.
Moreover, the reconstructed image is given by the interpolation formula

    f(x, y) = Σ Σ_{m,n=−∞}^{∞} f(mΔx, nΔy) [sin((xξ_xs − m)π) / ((xξ_xs − m)π)]
                                          × [sin((yξ_ys − n)π) / ((yξ_ys − n)π)]          (4.16)

Remarks

1. Equation (4.16) shows that infinite-order interpolation is required to recon-
struct the continuous function f(x, y) from its samples f(mΔx, nΔy). In
practice only finite-order interpolation is possible. However, sampling theory
reduces the uncountably infinite number of samples of f(x, y) over the area
ΔxΔy to just one sample. This gives an infinite compression ratio per unit
area.
2. The aliasing energy is the energy in the foldover frequencies and is equal to the
energy of the image in the tails of its spectrum outside the rectangle ℛ defined
in (4.13).
Example 4.1

An image described by the function

    f(x, y) = 2 cos 2π(3x + 4y)

is sampled such that Δx = Δy = 0.2. Clearly f(x, y) is bandlimited, since

    F(ξ1, ξ2) = δ(ξ1 − 3, ξ2 − 4) + δ(ξ1 + 3, ξ2 + 4)

is zero for |ξ1| > 3, |ξ2| > 4. Hence ξ_x0 = 3, ξ_y0 = 4. Also ξ_xs = ξ_ys = 1/0.2 = 5, which is
less than the Nyquist frequencies 2ξ_x0 and 2ξ_y0. The sampled image spectrum is

    F_s(ξ1, ξ2) = 25 Σ Σ_{k,l=−∞}^{∞} [δ(ξ1 − 3 − 5k, ξ2 − 4 − 5l) + δ(ξ1 + 3 − 5k, ξ2 + 4 − 5l)]

Let the low-pass filter have a rectangular region of support with cutoff frequencies at
half the sampling frequencies, that is,

    H(ξ1, ξ2) = { 1/25,   −2.5 ≤ ξ1 ≤ 2.5, −2.5 ≤ ξ2 ≤ 2.5
               { 0,      otherwise

Applying (4.12), we obtain

    F̂(ξ1, ξ2) = δ(ξ1 − 2, ξ2 − 1) + δ(ξ1 + 2, ξ2 + 1)

which gives the reconstructed image as f̂(x, y) = 2 cos 2π(2x + y). This shows that any
frequency component in the input image that is above (ξ_xs/2, ξ_ys/2) by (Δξ1, Δξ2) is
reproduced (or aliased) as a frequency component at (ξ_xs/2 − Δξ1, ξ_ys/2 − Δξ2).
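The folding rule at the end of Example 4.1 can be written as a small utility (a sketch; it applies separately to each frequency axis, and the function name is our own):

```python
def aliased_freq(f, fs):
    """Baseband frequency at which component f appears after sampling at
    rate fs, per the folding rule of Example 4.1."""
    f = f % fs
    return fs - f if f > fs / 2 else f

# Example 4.1: fs = 5 along both axes; the (3, 4) component aliases to (2, 1).
print(aliased_freq(3, 5), aliased_freq(4, 5))
```

A component exceeding fs/2 by Δξ folds back to fs/2 − Δξ, reproducing the (3, 4) → (2, 1) result of the example.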
4.3 EXTENSIONS OF SAMPLING THEORY

There are several extensions of the two-dimensional sampling theory that are of
interest in image processing.



Sampling Random Fields

In physical sampling environments, random noise is always present in the image, so
it is important to consider sampling theory for random fields. A continuous sta-
tionary random field f(x, y) is called bandlimited if its power spectral density func-
tion S(ξ1, ξ2) is bandlimited, that is, if

    S(ξ1, ξ2) = 0,    |ξ1| > ξ_x0, |ξ2| > ξ_y0          (4.17)

Sampling Theorem for Random Fields

If f(x, y) is a stationary bandlimited random field, then

    f̂(x, y) ≜ Σ Σ_{m,n=−∞}^{∞} f(mΔx, nΔy) sinc(xξ_xs − m) sinc(yξ_ys − n)          (4.18)

converges to f(x, y) in the mean square sense, that is,

    E[|f(x, y) − f̂(x, y)|²] = 0          (4.19)

where ξ_xs = 1/Δx, ξ_ys = 1/Δy, ξ_xs > 2ξ_x0, ξ_ys > 2ξ_y0.

Remarks

This theorem states that if the random field f(x, y) is sampled above its Nyquist rate,
then a continuous random field f̂(x, y) can be reconstructed from the sampled
sequence such that f̂ converges to f in the mean square sense. It can be shown that
the power spectral density function S_s(ξ1, ξ2) of the sampled image f_s(x, y) is a
periodic extension of S(ξ1, ξ2) and is given by

    S_s(ξ1, ξ2) = ξ_xs ξ_ys Σ Σ_{k,l=−∞}^{∞} S(ξ1 − kξ_xs, ξ2 − lξ_ys)          (4.20)

When the image is reconstructed by an ideal low-pass filter with gain 1/(ξ_xs ξ_ys), the
reconstructed image power spectral density is given by

    Ŝ(ξ1, ξ2) = Σ Σ_{k,l=−∞}^{∞} S(ξ1 − kξ_xs, ξ2 − lξ_ys) W(ξ1, ξ2)          (4.21)

where

    W(ξ1, ξ2) ≜ { 1,   (ξ1, ξ2) ∈ ℛ          (4.22)
               { 0,   otherwise

The aliased power σ_a² is the power in the tails of the power spectrum outside ℛ,
that is,

    σ_a² = ∬_{(ξ1,ξ2)∉ℛ} S(ξ1, ξ2) dξ1 dξ2 = ∬_{−∞}^{∞} [1 − W(ξ1, ξ2)] S(ξ1, ξ2) dξ1 dξ2          (4.23)

which is zero if f(x, y) is bandlimited with ξ_x0 ≤ ξ_xs/2, ξ_y0 ≤ ξ_ys/2. This analysis is also




,
useful when a bandlimited image containing wideband noise is sampled. Then the
signal-to-noise ratio of the sampled image can deteriorate significantly unless it is
low-pass filtered before sampling (see Problem 4.6).
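A one-dimensional analogue of the aliased-power integral (4.23) makes this concrete. In the sketch below (Python/NumPy) the power spectrum S(ξ) = e^(-|ξ|) is an assumed example, not from the text; the tail power outside the baseband |ξ| ≤ ξs/2 is exactly what an anti-aliasing prefilter removes before it can fold into the passband.

```python
import numpy as np

# Tail (aliased) power of an assumed 1D power spectrum S(xi) = exp(-|xi|)
# outside the baseband |xi| <= xi_s/2, computed numerically and compared
# with the closed form 2*exp(-xi_s/2).
xi_s = 8.0
xi = np.linspace(-50.0, 50.0, 1_000_001)
dx = xi[1] - xi[0]
S = np.exp(-np.abs(xi))
outside = np.abs(xi) > xi_s / 2
sigma_a2 = np.sum(S[outside]) * dx          # Riemann-sum estimate of (4.23)
closed_form = 2 * np.exp(-xi_s / 2)
print(sigma_a2, closed_form)                # both approx 0.0366
```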

Nonrectangular Grid Sampling and Interlacing

All of our previous discussion and most of the literature on two-dimensional


sampling is devoted to rectangular sampling lattices. This is the desirable form
of sampling grid if the spectrum F(ξ1, ξ2) is limited over the rectangle 𝓡 of (4.13).
Other sampling grids may be more efficient in terms of sampling density (that is,
samples/area) if the region of support of F(ξ1, ξ2) is nonrectangular.
Consider, for example, the spectrum shown in Fig. 4.9, which can be tightly
enclosed by a diamond-shaped region. On a rectangular sampling grid G1, the

Figure 4.9 Interlaced sampling. (a) Spectrum. (b) Rectangular grid G1.
(c) Interlaced grid G2. (d) Spectrum when sampled by G1 (no interlaced frames).
(e) Spectrum of interlaced signal when sampled by G2. Hatched area shows the
new inserts.



Nyquist sampling intervals would be Δx = Δy ≜ Δ1 = 1. If the sampling grid G2 is
chosen, which is a 45° rotation of G1 but with intersample distance of Δ2, the
spectrum of the sampled image will repeat on a grid similar to G2 (with spacing 1/Δ2)
(Fig. 4.9e). Therefore, if Δ2 = √2, there will be no aliasing, but the sampling density
has been reduced by half. Thus if an image does not contain the high frequencies in
both the dimensions simultaneously, then its sampling rate can be reduced by a
factor of 2. This theory is used in line interlacing television signals because the
human vision is insensitive to high spatial frequencies in areas of large motion (high
temporal frequencies). The interlaced television signal can be considered as a three-
dimensional signal f(x, y, t) sampled in vertical (y) and temporal (t) dimensions. If
ξ1 and ξ2 represent the temporal and vertical frequencies, respectively, then Fig.
4.9e represents the projection in the (ξ1, ξ2) plane of the three-dimensional spectrum
of the interlaced television signal.
In digital television, all three coordinates x, y, and t are sampled. The pre-
ceding interlacing concept can be extended to yield the line-quincunx sampling
pattern [10]. Here each field uses an interlaced grid as in Fig. 4.9c, which reduces
the sampling rate by another factor of two.
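The halved density is easy to verify numerically: rotating G1 by 45° with intersample distance √2 is equivalent to keeping the points of the integer grid where m + n is even. A small sketch (Python/NumPy, illustrative only):

```python
import numpy as np

# Interlaced grid G2 as the even-sum sublattice of the integer grid G1:
# the number of samples over any large region drops by a factor of two.
N = 64
m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
g1_count = m.size                       # samples of G1 over an N x N patch
g2_mask = (m + n) % 2 == 0              # quincunx/interlaced sublattice G2
print(g1_count, int(g2_mask.sum()))     # 4096 2048
```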

Hexagonal Sampling

For functions that are circularly symmetric and/or bandlimited over a circular
region, it can be shown that sampling on a hexagonal lattice requires 13.4 percent
fewer samples than rectangular sampling. Alternatively, for the same sampling rate
less aliasing is obtained on a hexagonal lattice than a rectangular lattice. Details are
available in [14].

Optimal Sampling

Equation (4.16) provides the interpretation that the sampling process transforms a
continuous function f(x, y) into a sequence f(mΔx, nΔy) from which the original
function can be recovered. Therefore, the coefficients of any convergent series
expansion of f(x, y) can be considered to give a generalized form of sampling. Such
sampling is not restricted to bandlimited functions. For bandlimited functions the
sinc functions are optimal for recovering the original function f(x, y) from the
samples f(mΔx, nΔy). For bandlimited random fields, the reconstructed random
field converges to the original in the mean square sense.
More generally, there are functions that are optimal in the sense that they
sample a random image to give a finite sequence such that the mean square error
between the original and the reconstructed images is minimized. In particular, a
series expansion of special interest is

    f(x, y) = Σ_{m,n=0}^{∞} am,n φm,n(x, y)          (4.24)

where {φm,n(x, y)} are the eigenfunctions of the autocorrelation function of the
random field f(x, y). This is called the Karhunen-Loeve (KL) series expansion of
the random field. This expansion is such that am,n are orthogonal random variables,



and, for a given number of terms, the mean square error in the reconstructed image
is minimum among all possible sampling functions. This property is useful in devel-
oping data compression techniques for images.
The main difficulty in utilizing the preceding result for optimal sampling of
practical (finite size) images is in generating the coefficients am,n. In conventional
sampling (via the sinc functions), the coefficients am,n are simply the values f(mΔx,
nΔy), which are easy to obtain. Nevertheless, the theory of KL expansion is useful
in determining bounds on performance and serves as an important guide in the
design of many image processing algorithms.
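A finite-dimensional sketch of the KL expansion follows (Python/NumPy; the first-order Markov covariance ρ^|i-j| is an assumed model, not from the text). Truncating the expansion to the k largest-eigenvalue terms leaves a mean square error equal to the sum of the discarded eigenvalues, which is the minimum over all orthonormal bases.

```python
import numpy as np

# Discrete KL expansion sketch: expand zero-mean random vectors in the
# eigenvectors of their covariance matrix and truncate to the top-k terms.
rng = np.random.default_rng(0)
N, k, rho = 16, 4, 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # Markov model
lam, Phi = np.linalg.eigh(R)              # eigenvalues ascending, Phi orthonormal
u = rng.multivariate_normal(np.zeros(N), R, size=100_000)
a = u @ Phi                               # KL coefficients (uncorrelated)
a[:, :N - k] = 0.0                        # drop the N-k smallest-eigenvalue terms
u_hat = a @ Phi.T                         # truncated reconstruction
mse = np.mean(np.sum((u - u_hat) ** 2, axis=1))
print(mse, lam[:N - k].sum())             # nearly equal
```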

4.4 PRACTICAL LIMITATIONS IN SAMPLING AND RECONSTRUCTION

The foregoing sampling theory is based on several idealizations. Real-world images
are not bandlimited, which means aliasing errors occur. These can be reduced by
low-pass filtering the input image prior to sampling but at the cost of attenuating
higher spatial frequencies. Such resolution loss, which results in blurring of the
image, also occurs because practical scanners have finite apertures. Finally, the
reconstruction system can never be the ideal low-pass filter required by the sam-
pling theory. Its transfer function depends on the display aperture. Figure 4.10
represents the practical sampling/reconstruction systems.

Sampling Aperture

A practical sampling system gives an output gs(x, y), which can be modeled as (see
Fig. 4.10)

    g(x, y) ≜ ps(x, y) ⊛ f(x, y) = ps(-x, -y) * f(x, y)
            = ∫∫_A ps(x' - x, y' - y) f(x', y') dx' dy'          (4.25)

    gs(x, y) = comb(x, y; Δx, Δy) g(x, y)          (4.26)

where ps(x, y) denotes the light distribution in the aperture and A denotes its
shape. In practice the aperture is symmetric with respect to 180° rotation, that is,
ps(x, y) = ps(-x, -y). Equation (4.25) is the spatial correlation of f(x, y) with

Figure 4.10 Practical sampling and reconstruction. The input image passes
through the scanning aperture ps(-x, -y), the ideal sampler (Δx, Δy), and the
display spot pd(-x, -y). In the ideal case ps(x, y) = pd(x, y) = δ(x, y).

ps(x, y) and represents the process of scanning through the aperture. Equation
(4.26) represents the sampled output. For example, for an L × L square aperture
with uniform distribution, we will have

    g(x, y) = ∫_{x-L/2}^{x+L/2} ∫_{y-L/2}^{y+L/2} f(x', y') dx' dy'          (4.27)

which is simply the integral of the image over the scanner aperture at position (x, y).
In general (4.25) represents a low-pass filtering operation whose transfer function is
determined by the aperture function ps(x, y). The overall effect on the recon-
structed image is a loss of resolution and a decrease in aliasing error (Fig. 4.11). This
effect is also visible in the images of Fig. 4.12.
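The low-pass character of (4.27) can be checked in one dimension (Python/NumPy sketch; the specific values f = 2 cycles/unit and L = 0.25 are assumed for illustration): averaging a cosine of frequency f over a window of width L scales its amplitude by sinc(fL).

```python
import numpy as np

# A uniform aperture is a moving average whose transfer function is a sinc:
# a cosine at frequency f comes out scaled in amplitude by sinc(f*L).
f, L, dx = 2.0, 0.25, 5e-4
x = np.arange(0.0, 4.0, dx)
signal = np.cos(2 * np.pi * f * x)
w = int(round(L / dx))                     # number of samples in the aperture
g = np.convolve(signal, np.ones(w) / w, mode="same")
gain = np.max(np.abs(g[w:-w]))             # steady-state output amplitude
print(gain, np.sinc(f * L))                # both approx 0.6366 (= 2/pi here)
```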

Display Aperture/Interpolation Function

Perfect image reconstruction requires an infinite-order interpolation between the
samples f(mΔx, nΔy). For a display system this means its display spot should have a
light distribution given by the sinc function, which has infinite duration and negative
lobes. This makes it impossible for an incoherent imaging system to perform near-
perfect interpolation.
Figure 4.13 lists several functions useful for interpolation. Two-dimensional
interpolation can be performed by successive interpolation along rows and columns
of the image. The zero-order- and first-order-hold filters give piecewise constant
and linear interpolations, respectively, between the samples. Higher-order holds
can give quadratic (n = 2) and cubic spline (n = 3) interpolations. With proper
coordinate scaling of the interpolating function, the nth-order hold converges to the
Gaussian function as n → ∞. The display spot of a CRT is circular and can be
modeled by a Gaussian function whose variance controls its spread. Figure 4.14
shows the effect of a practical interpolator on the reconstructed image. The resolu-
tion loss due to the reconstruction filter depends on the width of the main lobe.
Since |sinc(x)| < 1 for every x ≠ 0, the main lobe of the nth-order hold filter spectrum
Figure 4.11 Effect of aperture scanning. The finite aperture low-pass filters the
input image spectrum, causing a loss of resolution but also a reduction in the
aliasing power relative to sampling without low-pass filtering.



Figure 4.12 Comparison between zero- and
first-order hold interpolators. Zero-order
bold gives higher resolution and first-order
hold gives greater smoothing•

,"

-

,



Ilil 256" 256 image interpolated to 512 X 512
,. ~ - by zero-order hold IZOH I. .

••

• •

r


• '\ ":"" -
• •

'?'
,• • •'.

I .
Ibl 256 x 256 image interpolated to 512 X 512
" ; f •
by first-order hold (FOHI. .,

1- "
I



Icl256 x 256 images after interpolation

(il iii)
liiil i (Iv) • (iJ 128 x }28 image zoomed by

ZOH; liil64 x 64 image loomed by ZOH; (iii)
128 x 128 image zoomed by FOH; liv)
64 x 64 image zoomed bV FOH.




One-dimensional interpolation function, its definition p(x), the two-dimensional
interpolation function pd(x, y) = p(x)p(y), and the frequency response Pd(ξ1, ξ2):

Rectangle (zero-order hold, ZOH):
    p0(x) = (1/Δx) rect(x/Δx)
    pd(x, y) = p0(x) p0(y)
    Pd(ξ1, ξ2) = sinc(ξ1/2ξx0) sinc(ξ2/2ξy0)

Triangle (first-order hold, FOH):
    p1(x) = (1/Δx) tri(x/Δx)
    pd(x, y) = p1(x) p1(y)
    Pd(ξ1, ξ2) = [sinc(ξ1/2ξx0) sinc(ξ2/2ξy0)]²

nth-order hold (n = 2: quadratic; n = 3: cubic spline):
    pn(x) = p0(x) * pn-1(x)   (n successive convolutions of p0(x) with itself)
    pd(x, y) = pn(x) pn(y)
    Pd(ξ1, ξ2) = [sinc(ξ1/2ξx0) sinc(ξ2/2ξy0)]^(n+1)

Gaussian:
    p(x) = (1/√(2πσ²)) exp(-x²/2σ²)
    pd(x, y) = (1/(2πσ²)) exp[-(x² + y²)/2σ²]
    Pd(ξ1, ξ2) = exp[-2π²σ²(ξ1² + ξ2²)]

Sinc:
    p(x) = (1/Δx) sinc(x/Δx)
    pd(x, y) = (1/(ΔxΔy)) sinc(x/Δx) sinc(y/Δy)
    Pd(ξ1, ξ2) = rect(ξ1/2ξx0) rect(ξ2/2ξy0)

Figure 4.13 Image interpolation functions; ξx0 ≜ 1/(2Δx), ξy0 ≜ 1/(2Δy).

Figure 4.14 Effect of practical interpolation. The reconstruction filter (display
spot spectrum) passes the sampled image spectrum imperfectly: its finite main
lobe causes resolution loss, and its tails pass replicas of the input image
spectrum, causing interpolation error.

will become narrower as n increases. Therefore, among the nth-order-hold interpo-
lators of Figure 4.13, the zero-order-hold function will have minimum resolution loss
and maximum interpolation error. This effect is visible in Fig. 4.12, which contains
images interpolated by zero- and first-order-hold functions. In practice bilinear
interpolation (first-order hold) gives a reasonable trade-off between resolution loss
and smoothing accuracy.
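The two holds can be sketched directly (Python/NumPy; a 1D sequence for brevity — an image would be processed along rows and then columns):

```python
import numpy as np

# Zero-order hold (sample replication) vs first-order hold (linear
# interpolation) for 2x upsampling of a short sample sequence.
samples = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
zoh = np.repeat(samples, 2)                # piecewise constant
xf = np.linspace(0, samples.size - 1, 2 * samples.size - 1)
foh = np.interp(xf, np.arange(samples.size), samples)   # piecewise linear
print(zoh)   # [ 0.  0.  1.  1.  0.  0. -1. -1.  0.  0.]
print(foh)   # [ 0.   0.5  1.   0.5  0.  -0.5 -1.  -0.5  0. ]
```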
Example 4.2

A CCD camera contains a 256 × 256 array of identical photodetectors of size a × a with
spacing Δx = Δy = Δ, where a ≤ Δ (see Fig. 4.3). The scanning electronics produces output
pulses proportional to the response of each detector. The spatial response of a detector
to a unit intensity impulse input at location (x, y) is ps(x, y) = p(x)p(y), where

    p(x) = (2/a)(1 - 2|x|/a),  |x| ≤ a/2
         = 0,  otherwise

Suppose an image f(x, y) = 2 cos 2π(x/4a + y/8a), with a = Δ, is scanned. Using (4.25)
and taking the Fourier transform, we obtain

    G(ξ1, ξ2) = sinc²(aξ1/2) sinc²(aξ2/2) F(ξ1, ξ2)
              = sinc²(1/8) sinc²(1/16) F(ξ1, ξ2) ≈ 0.94 F(ξ1, ξ2)

where we have used F(ξ1, ξ2) = δ(ξ1 - 1/4a, ξ2 - 1/8a) + δ(ξ1 + 1/4a, ξ2 + 1/8a). The
scanner output signal can be written as

    gs(x, y) = g(x, y) w(x, y) Σ_{m,n=-∞}^{∞} δ(x - mΔ, y - nΔ)

Figure 4.15 Array scanner frequency response.
where w(x, y) is a rectangular window [-L/2, L/2], which limits the field of view of the
camera. With spacing Δ and L = 256Δ, the spectrum of the scanned image is

    Gs(ξ1, ξ2) = (1/Δ²) Σ_{m,n=-∞}^{∞} G̃(ξ1 - mξs, ξ2 - nξs),   ξs = 1/Δ

where G̃(ξ1, ξ2) ≜ G(ξ1, ξ2) * W(ξ1, ξ2) and W(ξ1, ξ2) = L² sinc(ξ1 L) sinc(ξ2 L). This gives

    G̃(ξ1, ξ2) = 61,440 a² [sinc(256aξ1 - 64) sinc(256aξ2 - 32)
                + sinc(256aξ1 + 64) sinc(256aξ2 + 32)]

Figure 4.15 shows G̃(ξ1, ξ2) at ξ2 = 1/8a, for ξ1 > 0. Thus, instead of obtaining a delta
function at ξ1 = 1/4a, a sinc function with main-lobe width of 1/128a is obtained. This
degradation of G(ξ1, ξ2) due to convolution with W(ξ1, ξ2) is called ripple. The associ-
ated energy (in the frequency domain) leaked into the side lobes of the sinc functions
due to this convolution is known as leakage.
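The ripple and leakage of this example can be reproduced with a discrete Fourier transform (Python/NumPy sketch; frequencies here are in cycles per sample, and the 256-sample window plays the role of w(x, y)):

```python
import numpy as np

# A cosine whose frequency falls exactly on a DFT bin gives a clean spectral
# line; shifting it off-bin spreads the line into a sinc shape (ripple) whose
# side lobes carry leaked energy (leakage).
N = 256
n = np.arange(N)
G_on = np.abs(np.fft.rfft(np.cos(2 * np.pi * (64 / N) * n))) / (N / 2)
G_off = np.abs(np.fft.rfft(np.cos(2 * np.pi * (64.5 / N) * n))) / (N / 2)
print(G_on[64], G_on[65])        # 1.0 and ~0: no leakage in the on-bin case
print(G_off[64], G_off[70])      # spread main lobe and nonzero side lobes
```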

Lagrange Interpolation

The zero- and first-order holds also belong to a class of polynomial interpolation
functions called Lagrange polynomials. The Lagrange polynomial of order (q - 1)
is defined as

    Lk^q(x) = Π_{m=k0, m≠k}^{k1} [(x/Δ - m)/(k - m)]          (4.28)

    Lk^1(x) ≜ 1,  ∀k

where k0 = -(q - 1)/2, k1 = (q - 1)/2 for q odd and k0 = -(q - 2)/2, k1 = q/2 for
q even. For a one-dimensional sampled sequence f(mΔ), with sampling interval Δ,
the interpolated function between given samples is defined as

    f̂(x) = f̂(mΔ + αΔ) ≜ Σ_{k=k0}^{k1} Lk^q(α) f(mΔ + kΔ)          (4.29)


The Lagrange interpolation formula of (4.29) is useful because it converges to
the sinc function interpolation as q → ∞ [Problem 4.10]. In two dimensions the
Lagrange interpolation formula becomes

    f̂(x, y) = f̂(mΔx + αΔx, nΔy + βΔy)
            ≜ Σ_{k=k0}^{k1} Σ_{l=l0}^{l1} Lk^q1(α) Ll^q2(β) f((m + k)Δx, (n + l)Δy)          (4.31)

where q1 and q2 refer to the Lagrange polynomial orders in the x and y directions,
respectively.
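A direct transcription of (4.28)-(4.29) follows (Python sketch; unit sampling interval Δ = 1 assumed): q = 2 reproduces linear (first-order-hold) interpolation, and q = 3 fits a quadratic through three neighboring samples.

```python
def lagrange_weight(k, alpha, q):
    """L_k^q(alpha): product over m in [k0, k1], m != k, of (alpha - m)/(k - m)."""
    k0 = -(q - 1) // 2 if q % 2 else -(q - 2) // 2
    k1 = (q - 1) // 2 if q % 2 else q // 2
    w = 1.0
    for m in range(k0, k1 + 1):
        if m != k:
            w *= (alpha - m) / (k - m)
    return w

def interpolate(f, m, alpha, q):
    """f_hat(m + alpha) = sum over k of L_k^q(alpha) * f[m + k], per (4.29)."""
    k0 = -(q - 1) // 2 if q % 2 else -(q - 2) // 2
    k1 = (q - 1) // 2 if q % 2 else q // 2
    return sum(lagrange_weight(k, alpha, q) * f[m + k] for k in range(k0, k1 + 1))

f = [0.0, 1.0, 4.0, 9.0, 16.0]        # samples of x**2 at x = 0, 1, 2, 3, 4
print(interpolate(f, 1, 0.5, 2))      # 2.5  (linear, same as first-order hold)
print(interpolate(f, 2, 0.5, 3))      # 6.25 (quadratic recovers 2.5**2 exactly)
```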

Moire Effect and Flat Field Response [4, 40]

Another phenomenon that results from practical interpolation filters is called the
Moire effect. It appears in the form of beat patterns that arise if the image contains
periodicities that are close to half the sampling frequencies. This effect occurs when
the display spot size is small (compared to the sampling distance) so that the recon-
struction filter cutoff extends far beyond the ideal low-pass filter cutoff. Then a
signal at frequency ξx < ξxs/2 will interfere with a companion signal at ξxs - ξx to
create a beat pattern, or the Moire effect (see Problems 4.11 and 4.12). A special
case of this situation occurs when the input image is a uniform gray field. Then, if
the reconstruction filter does not have zero response at the sampling frequencies
(ξxs, ξys), scan lines will appear, and the displayed image will exhibit stripes and not
a flat field.

4.5 IMAGE QUANTIZATION

The step subsequent to sampling in image digitization is quantization. A quantizer
maps a continuous variable u into a discrete variable u′, which takes values from
a finite set {r1, ..., rL} of numbers. This mapping is generally a staircase function
(Fig. 4.16), and the quantization rule is as follows: Define {tk, k = 1, ..., L + 1} as a
set of increasing transition or decision levels with t1 and tL+1 as the minimum and
maximum values, respectively, of u. If u lies in the interval [tk, tk+1), then it is
mapped to rk, the kth reconstruction level.



Figure 4.16 A quantizer. The staircase characteristic maps the input u into the
output level rk on each interval [tk, tk+1); the lower trace shows the resulting
quantizer error.

Example 4.3

The simplest and most popular quantizer is the uniform quantizer. Let the output of
an image sensor take values between 0.0 and 10.0. If the samples are quantized uni-
formly to 256 levels, then the transition and reconstruction levels are

    tk = 10(k - 1)/256,   k = 1, ..., 257

    rk = tk + 5/256,   k = 1, ..., 256

The interval q ≜ tk+1 - tk = rk - rk-1 is constant for different values of k and is called
the quantization interval.
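The quantizer of this example can be written in a few lines (Python/NumPy sketch):

```python
import numpy as np

# Uniform 256-level quantizer over [0, 10]: t_k = 10(k-1)/256,
# r_k = t_k + 5/256, quantization interval q = 10/256.
lo, hi, L = 0.0, 10.0, 256
q = (hi - lo) / L

def quantize(u):
    k = np.clip(np.floor((u - lo) / q), 0, L - 1)   # interval index (0-based)
    return lo + k * q + q / 2                        # reconstruction level r_k

print(quantize(np.array([0.0, 0.02, 5.0, 9.99])))   # midpoints of the intervals
```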

In this chapter we will consider only zero-memory quantizers, which operate on one
input sample at a time, and whose output value depends only on that input. Such
quantizers are useful in image coding techniques such as pulse code modulation
(PCM), differential PCM, transform coding, and so on. Note that the quantizer
mapping is irreversible; that is, for a given quantizer output, the input value cannot
be determined uniquely. Hence, a quantizer introduces distortion, which any rea-
sonable design method must attempt to minimize. There are several quantizer
designs available that offer various trade-offs between simplicity and performance.
These are discussed next.



4.6 THE OPTIMUM MEAN SQUARE OR LLOYD-MAX QUANTIZER

This quantizer minimizes the mean square error for a given number of quantization
levels. Let u be a real scalar random variable with a continuous probability density
function pu(u). It is desired to find the decision levels tk and the reconstruction levels
rk for an L-level quantizer such that the mean square error

    ℰ = E[(u - u′)²] = ∫_{t1}^{tL+1} (u - u′)² pu(u) du          (4.32)

is minimized. Rewriting this as

    ℰ = Σ_{i=1}^{L} ∫_{ti}^{ti+1} (u - ri)² pu(u) du          (4.33)

the necessary conditions for minimization of ℰ are obtained by differentiating it
with respect to tk and rk and equating the results to zero. This gives

    ∂ℰ/∂tk = (tk - rk-1)² pu(tk) - (tk - rk)² pu(tk) = 0

    ∂ℰ/∂rk = 2 ∫_{tk}^{tk+1} (u - rk) pu(u) du = 0

Using the fact that tk-1 ≤ tk, simplification of the preceding equations gives

    tk = (rk + rk-1)/2          (4.34)

    rk = [∫_{Tk} u pu(u) du] / [∫_{Tk} pu(u) du]          (4.35)

where Tk is the kth interval [tk, tk+1). These results state that the optimum transition
levels lie halfway between the optimum reconstruction levels, which, in turn, lie at
the center of mass of the probability density in between the transition levels.
Together, (4.34) and (4.35) are nonlinear equations that have to be solved simulta-
neously given the boundary values t1 and tL+1. In practice, these equations can be
solved by an iterative scheme such as the Newton method.
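Instead of Newton's method, conditions (4.34) and (4.35) can also be alternated directly in a Lloyd-style fixed-point iteration. The sketch below (Python/NumPy) does this on a large sample of a Gaussian density rather than with closed-form integrals, which is an assumption of this illustration, not the text's procedure.

```python
import numpy as np

# Alternate (4.34) (transitions midway between reconstruction levels) and
# (4.35) (reconstruction levels at the centroid between transitions),
# using sample averages in place of the density integrals.
rng = np.random.default_rng(1)
u = rng.normal(0.0, 1.0, 300_000)       # unit-variance Gaussian samples
L = 4
r = np.linspace(-2.0, 2.0, L)           # initial reconstruction levels
for _ in range(100):
    t = (r[1:] + r[:-1]) / 2            # (4.34)
    idx = np.digitize(u, t)
    r = np.array([u[idx == j].mean() for j in range(L)])   # (4.35)
mse = np.mean((u - r[np.digitize(u, (r[1:] + r[:-1]) / 2)]) ** 2)
print(np.round(r, 3), mse)   # r near +/-0.453 and +/-1.510; mse near 0.1175
```

For L = 4 the levels converge near ±0.4528 and ±1.510 with mean square error about 0.1175, in agreement with the Gaussian entries of Table 4.1.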
When the number of quantization levels is large, an approximate solution can
be obtained by modeling the probability density pu(u) as a piecewise constant
function (see Fig. 4.17),

    pu(u) ≈ pu(t̂i),   t̂i ≜ (ti + ti+1)/2,   ti ≤ u < ti+1          (4.36)

Using this approximation in (4.33) and performing the required minimizations, an
approximate solution for the decision levels is obtained as

    tk+1 = A [∫_{t1}^{zk + t1} [pu(u)]^(-1/3) du] / [∫_{t1}^{tL+1} [pu(u)]^(-1/3) du] + t1          (4.37)

Figure 4.17 Piecewise constant approximation of pu(u).

where A ≜ tL+1 - t1 and zk = (k/L)A, k = 1, ..., L. This method requires that the
quantities t1 and tL+1, also called the overload points, be finite. These values, which
determine the dynamic range A of the quantizer, have to be assumed prior to the
placement of the decision and reconstruction levels. Once the transition levels {tk}
have been determined, the reconstruction levels {rk} can be determined easily by
averaging tk and tk+1. The quantizer mean square distortion is obtained as

    ℰ ≈ (1/12L²) [∫_{t1}^{tL+1} [pu(u)]^(1/3) du]³          (4.38)

This is a useful formula because it gives an estimate of quantizer error directly in
terms of the probability density and the number of quantization levels. This result is
exact for piecewise constant probability densities.
Two commonly used densities for quantization of image-related data are the
Gaussian and the Laplacian densities, which are defined as follows.

Gaussian:

    pu(u) = (1/√(2πσ²)) exp[-(u - μ)²/(2σ²)]          (4.39)

Laplacian:

    pu(u) = (α/2) exp(-α|u - μ|)          (4.40a)

where μ and σ² denote the mean and variance, respectively, of u. The variance of
the Laplacian density is given by

    σ² = 2/α²          (4.40b)

Tables 4.1 and 4.2 (on pp. 104-111) list the design values for several Lloyd-Max
quantizers for the preceding densities. For more extensive tables see [30].


The Uniform Optimal Quantizer

For uniform distributions, the Lloyd-Max quantizer equations become linear, giving
equal intervals between the transition levels and the reconstruction levels. This is
also called the linear quantizer. Let

    pu(u) = 1/A,   t1 ≤ u ≤ tL+1
          = 0,   otherwise

From (4.35) we obtain

    rk = (tk+1² - tk²) / [2(tk+1 - tk)] = (tk+1 + tk)/2          (4.41)

Combining (4.34) and (4.41) we get

    tk = (tk+1 + 2tk + tk-1)/4

which gives

    tk+1 - tk = tk - tk-1 ≜ q

Finally, we obtain

    tk = tk-1 + q,   rk = tk + q/2          (4.42)

Thus all transition as well as reconstruction levels are equally spaced. The quantiza-
tion error e ≜ u - u′ is uniformly distributed over the interval (-q/2, q/2). Hence,
the mean square error is given by

    ℰ = (1/q) ∫_{-q/2}^{q/2} e² de = q²/12          (4.43)

The variance σu² of a uniform random variable whose range is A is A²/12. For a
uniform quantizer having B bits, we have q = A/2^B. This gives

    SNR = 10 log10(σu²/ℰ) = 10 log10 2^(2B) ≈ 6B dB          (4.44)

Thus the signal-to-noise ratio achieved by the optimum mean square quantizer for
uniform distributions is 6 dB per bit.
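A quick numerical check of (4.43)-(4.44) (Python/NumPy sketch; the exact figure is 10 log10 2^(2B) = 6.02B dB):

```python
import numpy as np

# Quantize uniform data with a B-bit uniform quantizer: the noise power
# should be q^2/12 and the SNR about 6 dB per bit.
rng = np.random.default_rng(2)
A, B = 1.0, 8
q = A / 2 ** B
u = rng.uniform(0.0, A, 1_000_000)
u_q = (np.floor(u / q) + 0.5) * q           # midpoint reconstruction levels
noise = np.mean((u - u_q) ** 2)
snr_db = 10 * np.log10(np.var(u) / noise)
print(noise / (q ** 2 / 12), snr_db)        # ratio near 1.0, SNR near 48 dB
```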
Properties of the Optimum Mean Square Quantizer

This quantizer has several interesting properties.

1. The quantizer output is an unbiased estimate of the input, that is,

    E[u′] = E[u]          (4.45)

2. The quantization error is orthogonal to the quantizer output, that is,

    E[(u - u′)u′] = 0          (4.46)




TABLE 4.1 Optimum mean square quantizers for Gaussian density with zero
mean and unity standard deviation; t-k = -tk, r-k = -rk, t(L/2)+1 ≜ ∞. For each
number of levels L = 2, ..., 36 the table gives the mean square error, the SNR
(dB), the entropy, and the decision levels tk and reconstruction levels rk. [The
tabulated numerical entries are not reproduced here.]
TABLE 4.2 Optimum mean square quantizers for laplacian density with zero
mean and unity variance; t-k = -t., f_. = <r«, tv" + 1 Ll. 00•

Levels 2 • 3 4 5 6 7 8
MSE .5lXXl .2642 .1762 .1198 .0899 .0681 .0545
SNR (dB) 3.1)103 5.7800 7.5401 9.2152 10,464 1l.669 12638
Entropy 1.100 1.3169 1.7282 1.9466 2.2071 2.3745 2.5654
-
k t. r, t. r. t, r, t, r, t, r, I, r, I, r•
1 0.0lXXl .1071 .7071 0.0lXXl 0.0lXXl .4198 .4198 0.0000 0.0000 .2998 .2998 0.0000 0.0lXXl . 2334
2 .1414 1.1269 1.8340 .L5467 . .8395 .7196 1.1393 1.0194 .5996 .5332 .8330
3 . 2.2538 1.8464 • 2.5535 2.1462 1.4391 1.2528 1.6725
4 2.8533 2.3797 3.0868
Levels 9 10 11 12 .13 14 15
MSE .0439 .0365 .0306 .0262 .0225 .0197 .0173
SNR (dB) 13.580 14.m 15.146 15.815 16.471 17.051 17.621
Enthropy 2.7011 2.8519 2.9661 3.0907 3.1893 •
3.2955 3.3822
-
k t. r, I, r. t. r. t, r, t, r, t, r. I, r,
°1 .2334 0.0000 O.lXXlO .1912 .1912 0.0lXXl 0.0lXXl .1619 .1619 0.0000 0.0000 .1404 .1405 O.lXXlO
2 .76ftJ. ,4668 .4246 .(>580 .6158 .3824 .3531 .5443 .5150 .3239 ..3024 .4643 .4428 .2809 •
3 1.4862 1.0664 .9578 1.2576' 1.1490 .8492 .7777 LOm .9396 .7062 .6555 .8467 .7959 6047
4 2.6131 1.9060 1.6774 2.0971 1.8686 1,4488 1.3109 1.6107 1.4729 1.1731 1.0801 1.3135 1.2206 .9871
5 3.3202 . 2.8043 3.5114 2.9955 2.2883 2.0305 2.4503 2.1924 1.7727 1.6133 1.9131 1.7538 1.4540
6 3.7026 3.1574 3.8645 3.3193 2.6122 2.3329 2.7527 2,4733 2.0536
7 .>
4.0264 3.4598 4.1669 3.6002 2.8931
8 '3073
- - _.- - - _.- .~

TABLE 4.2 Continued

Levels      16      17      18      19      20      21      22
MSE      .0154   .0137   .0123   .0111   .0101   .0092   .0084
SNR (dB) 18.133  18.636  19.094  19.545  19.959  20.368  20.746
Entropy  3.4747  3.5521  3.6341  3.7040  3.7776  3.8413  3.9037

[Columns of decision levels t_k and reconstruction levels r_k, k = 1, ..., L/2, for each value of L; the negative levels follow by symmetry.]

Levels      23      24      25      26      27      28      29
MSE      .0077   .0071   .0066   .0061   .0057   .0053   .0050
SNR (dB) 21.120  21.467  21.811  22.133  22.452  22.751  23.048
Entropy  3.9666  4.0277  4.0819  4.1382  4.1886  4.2408  4.2879

[Columns of decision levels t_k and reconstruction levels r_k, k = 1, ..., L/2, for each value of L.]

TABLE 4.2 Continued

Bits          6        7
Levels       64      128
MSE       .0011    .0003
SNR (dB) 29.7430  35.6880
Entropy   5.4003   6.3826

[Columns of decision levels t_k and reconstruction levels r_k, k = 1, ..., 64, for L = 64 and L = 128.]
Proofs

1. If p_k is the probability of r_k, that is,

p_k = ∫ from t_k to t_{k+1} of p_u(α) dα    (4.48)

then from (4.35),

E[u*] = Σ_{k=1}^{L} p_k E[u | u ∈ R_k] = Σ_{k=1}^{L} ∫ from t_k to t_{k+1} of α p_u(α) dα = ∫ α p_u(α) dα = E[u]

2. E[uu*] = E{u* E[u | u ∈ R_k]} = Σ_{k=1}^{L} p_k r_k E[u | u ∈ R_k] = Σ_{k=1}^{L} p_k r_k² = E[(u*)²]

which proves (4.46). This gives an interesting model for the quantizer, as shown in Fig. 4.18. The quantizer noise η is uncorrelated with the quantizer output, and we can write

u = u* + η    (4.49)
σ_η² = E[(u − u*)²] = E[u²] − E[(u*)²]    (4.50)

Since σ_η² ≥ 0, (4.50) implies the average power of the quantizer output is reduced by the average power of the quantizer noise. Also, the quantizer noise η is dependent on the quantizer input u, since

E[uη] = E[u(u − u*)] = E[η²]

3. Since for any mean square quantizer σ_η² = σ_u² f(B), (4.50) immediately yields (4.47).
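These orthogonality relations can be checked numerically. The sketch below (a hedged illustration: the sample size, seed, and four-level starting guess are assumptions, not from the text) fits a Lloyd-Max quantizer to unit Gaussian samples by Lloyd iteration, then verifies that the empirical quantizer noise η = u − u* satisfies E[u*η] ≈ 0 and E[uη] ≈ E[η²].

```python
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(50_000)]

# Lloyd iteration: alternate midpoint decision levels with
# conditional-mean reconstruction levels (the Lloyd-Max conditions).
levels = [-1.5, -0.5, 0.5, 1.5]                     # L = 4 initial guess
for _ in range(30):
    t = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    buckets = [[] for _ in levels]
    for u in samples:
        buckets[sum(u > tk for tk in t)].append(u)  # cell containing u
    levels = [sum(b) / len(b) for b in buckets]

t = [(a + b) / 2 for a, b in zip(levels, levels[1:])]

def quantize(u):
    return levels[sum(u > tk for tk in t)]

n = len(samples)
E_out_noise = sum(quantize(u) * (u - quantize(u)) for u in samples) / n
E_in_noise = sum(u * (u - quantize(u)) for u in samples) / n
E_noise_sq = sum((u - quantize(u)) ** 2 for u in samples) / n

print(abs(E_out_noise) < 1e-4)              # E[u* eta] = 0 at the optimum
print(abs(E_in_noise - E_noise_sq) < 1e-4)  # E[u eta] = E[eta^2]
```

Both checks hold because the reconstruction levels are the conditional means of their cells, exactly the condition used in proof 2.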



[Figure 4.18: Optimum mean square quantizer and its signal dependent noise model. The input u passes through the optimum mean square quantizer to give u*; the model is u = u* + η with E[u*η] = 0 and E[uη] = E[η²].]

112 Image Sampling and Quantization Chap. 4



4.7 A COMPANDOR DESIGN [20-23]

A compandor (compressor-expander) is a uniform quantizer preceded and succeeded by nonlinear transformations, as shown in Fig. 4.19. The input random variable u is first passed through a nonlinear memoryless transformation f(·) to yield another random variable w. This random variable is uniformly quantized to give y ∈ {y_i}, which is nonlinearly transformed by g(·) to give the output u*. The overall transformation from u to u* is a nonuniform quantizer. The functions f and g are determined so that the overall system approximates the Lloyd-Max quantizer.
The result is given by

g(x) = f⁻¹(x)    (4.51)

f(x) = 2a [∫ from t₁ to x of [p_u(α)]^{1/3} dα] / [∫ from t₁ to t_{L+1} of [p_u(α)]^{1/3} dα] − a    (4.52)

where [−a, a] is the range of w over which the uniform quantizer operates. If p_u(α) is an even function, that is, p_u(α) = p_u(−α), we get

f(x) = a [∫ from 0 to x of [p_u(α)]^{1/3} dα] / [∫ from 0 to t_{L+1} of [p_u(α)]^{1/3} dα],    x ≥ 0

f(x) = −f(−x),    x < 0    (4.53)

This gives the minimum and maximum values of f(x) as −a and a, respectively. However, the choice of a is arbitrary. As shown in Fig. 4.19, f(x) and g(x) turn out to be functions that compress and expand, respectively, their domains.

[Figure 4.19: A compandor. The compressor w = f(u) is followed by a uniform quantizer giving y, and the expander u* = g(y); the sketches show the compressive shape of f(·) and the expansive shape of g(·).]



Example 4.4

Consider the truncated Laplacian density, which is often used as a probabilistic model for the prediction error signal in DPCM:

p_u(x) = c e^{−α|x|},    −A ≤ x ≤ A    (4.54)

where c ≜ (α/2)[1 − exp(−αA)]⁻¹. Use of (4.53) gives

f(x) = a [1 − exp(−αx/3)] / [1 − exp(−αA/3)],    0 ≤ x ≤ A    (4.55)
f(x) = −f(−x),    −A ≤ x < 0

g(x) = −(3/α) ln{1 − (x/a)[1 − exp(−αA/3)]},    0 ≤ x ≤ a    (4.56)
g(x) = −g(−x),    −a ≤ x < 0

Transformations f(x) and g(x) for other probability densities are given in Problem 4.15.
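The pair (4.55)-(4.56) can be verified to be mutual inverses numerically; in the sketch below, α = 2, A = 4, and a = 1 are assumed example values, not taken from the text.

```python
import math

alpha, A, a = 2.0, 4.0, 1.0   # assumed example parameters

def f(x):
    """Compressor of (4.55), with odd extension for x < 0."""
    if x < 0:
        return -f(-x)
    return a * (1 - math.exp(-alpha * x / 3)) / (1 - math.exp(-alpha * A / 3))

def g(x):
    """Expander of (4.56): the functional inverse of f."""
    if x < 0:
        return -g(-x)
    return -(3 / alpha) * math.log(1 - (x / a) * (1 - math.exp(-alpha * A / 3)))

# g(f(u)) = u over the support [-A, A], confirming g = f^{-1} as in (4.51)
for u in (-3.9, -1.0, 0.0, 0.5, 2.0, 3.9):
    assert abs(g(f(u)) - u) < 1e-9
print(f(A), g(a))   # the endpoints map to a and A, respectively
```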

Remarks

1. The compandor design does not cause the transformed random variable w to be uniformly distributed.
2. For large L, the mean square error estimate of the compandor can be approximated as

ε = (1 / 12L²) [∫ from t₁ to t_{L+1} of [p_u(α)]^{1/3} dα]³    (4.57)

The actual performance characteristics of compandors are found to be quite close to those of the minimum mean square quantizers.
3. The compandor does not necessarily require t₁ and t_{L+1} to be finite.
4. To implement a compandor, the nonlinear transformations f(·) and g(·) may be implemented by an analog nonlinear device, and the uniform quantizer can be a simple analog-to-digital converter. In a digital communication application, the output of the uniform quantizer may be coded digitally for transmission. The receiver would decode, perform digital-to-analog conversion, and follow it by the nonlinear transformation g(·).
5. The decision and reconstruction levels of the compandor viewed as a nonuniform quantizer are given by

t_k = g(kq),    t_{−k} = −t_k,    k = 0, 1, ..., L/2
r_k = g((k − ½)q),    r_{−k} = −r_k,    k = 1, ..., L/2

for an even probability density function.
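Remark 5 yields the whole nonuniform quantizer once g(·) is known. The sketch below reuses the expander of Example 4.4 with assumed values α = 2, A = 4, a = 1 and L = 8 (all illustrative, not from the text), and checks that each reconstruction level falls inside its decision interval.

```python
import math

alpha, A, a, L = 2.0, 4.0, 1.0, 8      # assumed example parameters

def g(x):                              # expander of (4.56)
    if x < 0:
        return -g(-x)
    return -(3 / alpha) * math.log(1 - (x / a) * (1 - math.exp(-alpha * A / 3)))

q = 2 * a / L                          # step of the uniform quantizer on w
t = [g(k * q) for k in range(L // 2 + 1)]             # t_0, ..., t_{L/2}
r = [g((k - 0.5) * q) for k in range(1, L // 2 + 1)]  # r_1, ..., r_{L/2}

# each r_k lies in (t_{k-1}, t_k), and the outermost decision level
# reaches the edge A of the truncated Laplacian's support
assert all(t[k - 1] < r[k - 1] < t[k] for k in range(1, L // 2 + 1))
assert abs(t[-1] - A) < 1e-9
print([round(v, 3) for v in t])
```

Note how the levels crowd near zero, where the Laplacian density is largest, exactly the behavior expected of a Lloyd-Max-like design.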


4.8 THE OPTIMUM MEAN SQUARE UNIFORM QUANTIZER FOR NONUNIFORM DENSITIES

Since a uniform quantizer can be easily implemented, it is of interest to know how to best quantize a nonuniformly distributed random variable by an L-level uniform quantizer. For simplicity, let p_u(α) be an even function and let L be an even integer. For a fixed L, the optimum uniform quantizer is determined completely by the quantization step size q. Define

2a ≜ Lq

where q has to be determined so that the mean square error is minimized. In terms of these parameters,

ε = 2 Σ_{j=1}^{L/2−1} ∫ from t_j to t_{j+1} of (α − r_j)² p_u(α) dα + 2 ∫ from t_{L/2} to ∞ of (α − r_{L/2})² p_u(α) dα

Since {t_j, r_j} come from a uniform quantizer, this simplifies to

ε = 2 Σ_{j=1}^{L/2−1} ∫ from (j−1)q to jq of [α − (2j − 1)q/2]² p_u(α) dα + 2 ∫ from (L/2 − 1)q to ∞ of [α − (L − 1)q/2]² p_u(α) dα

Now the problem is simply to minimize ε as a function of the single variable q. The result is obtained by iteratively solving the nonlinear equation (for L > 2)

dε/dq = 0

For L = 2, the optimum uniform and the Lloyd-Max quantizers are identical, giving

a = 2 ∫ from 0 to ∞ of α p_u(α) dα

which equals the mean value of the random variable |u|. Table 4.3 gives the optimum uniform quantizer design parameters for Gaussian and Laplacian probability densities.
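The single-variable minimization can be carried out by brute force. The sketch below (a hedged illustration: the grid ranges, integration settings, and L = 4 are assumptions) evaluates ε(q) for the unit Gaussian by midpoint-rule integration and searches for the minimizing step size.

```python
import math

def gauss(x):                       # unit-variance Gaussian density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def mse(q, L=4, hi=8.0, n=1500):
    """Mean square error of the L-level uniform quantizer with step q,
    by midpoint-rule integration over (0, hi), doubled by symmetry."""
    h = hi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        k = min(int(x / q), L // 2 - 1)     # cell index; last cell unbounded
        r = (2 * k + 1) * q / 2             # cell-midpoint reconstruction
        total += (x - r) ** 2 * gauss(x) * h
    return 2 * total

# grid search for the step size minimizing the distortion, i.e. a crude
# numerical solution of d(mse)/dq = 0 for L = 4
qs = [0.3 + 0.005 * i for i in range(300)]
best_q = min(qs, key=mse)
print(round(best_q, 3), round(mse(best_q), 4))
```

The search lands near q ≈ 1 with a distortion close to 0.119 for the 4-level unit Gaussian case.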


4.9 EXAMPLES, COMPARISONS, AND PRACTICAL LIMITATIONS

Comparisons among various quantizers can be made in at least two different ways. First, suppose the quantizer output is to be coded by a fixed number of levels. This would be the case for a fixed-word-length analog-to-digital conversion. Then one would compare the quantizing error variance (or the signal-to-noise ratio, SNR = 10 log₁₀(σ²/σ_ε²)) as a function of the number of quantization bits, B. Figure 4.20 shows these curves for the optimum mean square (Lloyd-Max), the compandor, and the optimum uniform quantizers for the Gaussian density function. As expected, the optimum mean square quantizers give the best performance. For Gaussian densities, the performance difference between the optimum mean square and the optimum uniform quantizers is about 2 dB for B = 6. In the case of the Laplacian


density, this difference is about
.
4.3 dB. The compandor and optimum ,
mean square
quantizers are practically indistinguishable., .
From rate distortion theory it is known that the minimum achievable rate, Bn
of a Gaussian random variable of variance (1" 2 is given by the rate distortion function
(see Section 2.13) a s . . .
. . 1 (1"2
B'=i 1og2'D -(4..59)



[Figure 4.20: Performance of Gaussian density quantizers. SNR in dB (0 to 50) versus rate in bits (1 to 7) for the Shannon quantizer, the optimum uniform quantizer with entropy coding, and the Lloyd-Max, compandor, and optimum uniform quantizers.]

where D < σ² is the average mean square distortion per sample. This can also be written as

D = σ² 2^{−2B},    B > 0    (4.60)

and represents a lower bound on attainable distortion for any practical quantizer for Gaussian random variables. This is called the Shannon lower bound. The associated optimal encoder is hypothetical because it requires that a block of infinite observations of the random variable be quantized jointly. This is also called the Shannon quantizer. The various quantizers in Fig. 4.20 are also compared with this quantizer. Equation (4.60) also gives an upper bound on attainable distortion by the Shannon quantizer for non-Gaussian distributed random variables. For a given rate B, zero memory quantizers, also called one-dimensional quantizers (block length is one), generally do not attain distortion below the values given by (4.60). The one-dimensional optimum mean square quantizer for a uniform random variable of variance σ² does achieve this distortion, however. Thus for any fixed distortion D, the rate of the Shannon quantizer may be considered to give practically the minimum achievable rate by a zero memory quantizer for most probability distributions of interest in image processing. (See Problems 4.17 and 4.18.)
The second comparison is based on the entropy of the quantizer output versus its distortion. If the quantized variables are entropy coded by a variable-length coding scheme such as Huffman coding (see Chapter 11, Section 11.2), then the average number of bits needed to code the output will often be less than log₂ L. An optimum quantizer under this criterion would be the one that minimizes the distortion for a specified output entropy [27-29]. Entropy coding increases the complexity of the encoding-decoding algorithm and requires extra buffer storage at transmitters and


receivers to maintain a constant bit rate over communication channels. From Fig. 4.20 it is seen that the uniform quantizer with entropy coding gives a better performance than the Lloyd-Max quantizer (without entropy coding). It has been found that the uniform quantizer is quite a good approximation of the "optimum quantizer" based on the entropy versus mean square distortion criterion, if the quantization step size is optimized with respect to this criterion.
In practice the design of a quantizer boils down to the selection of the number of quantization levels (L) and the dynamic range (A). For a given number of levels, a compromise has to be struck between the quantizer resolution (t_i − t_{i−1}) and the attainable dynamic range. These factors become particularly important when the input signal is nonstationary or has an unknown probability density.


4.10 ANALYTIC MODELS FOR PRACTICAL QUANTIZERS [30]

In image coding problems we will find it useful to have analytic expressions for the quantizer mean square error as a function of the number of bits. Table 4.4 lists the distortion function models for the Lloyd-Max and optimum uniform quantizers for Gaussian and Laplacian probability densities of unity variance. Note that the mean of the density functions can be arbitrary. These models have the general form f(B) = a 2^{−bB}. If the input to the quantizer has a variance σ², then the output mean square error will be simply σ² f(B). It is easy to check that the f(B) models are monotonically decreasing, convex functions of B, which are properties required of distortion versus rate functions.
From Table 4.4 we see that for equal distortion, the number of bits, x, needed for the optimum mean square quantizer to match the performance of a B-bit Shannon quantizer is given by

2^{−2B} ≈ 2.26(2^{−1.963x})

for small distortions. Solving for x, we get

x ≈ B + 0.5

which means the zero memory (or one-dimensional) optimum mean square quantizer performs within about ½ bit of its lower bound achieved by an infinite-dimensional block encoder (for Gaussian distributions).

TABLE 4.4 Quantizer Distortion Models, f(B) = a 2^{−bB}

                            0 ≤ 2^B < 5      5 < 2^B < 36     36 ≤ 2^B ≤ 512
Quantizer                   a      b         a       b        a       b
Shannon                     1      2         1       2        1       2
Mean square Gaussian        1      1.5047    1.5253  1.8274   2.2573  1.9626
Mean square Laplacian       1      1.1711    2.0851  1.2645   3.6308  1.9572
Optimum uniform Gaussian    1      1.5012    1.2477  1.6883   1.5414  1.7562
Optimum uniform Laplacian   1      1.1619    1.4156  1.4518   2.1969  1.5944
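Evaluating the models gives quick rate-versus-SNR estimates. The sketch below (B = 6 is an assumed operating point; the coefficients are those transcribed from the 36 ≤ 2^B ≤ 512 column of Table 4.4) also reproduces the roughly 2-dB gap between the Lloyd-Max and optimum uniform Gaussian quantizers noted in Section 4.9.

```python
import math

# f(B) = a * 2**(-b*B): high-rate coefficients from Table 4.4
models = {
    "Shannon": (1.0, 2.0),
    "Lloyd-Max Gaussian": (2.2573, 1.9626),
    "optimum uniform Gaussian": (1.5414, 1.7562),
}

B = 6                                   # 64 levels, inside the model range
snr = {}
for name, (a, b) in models.items():
    fB = a * 2 ** (-b * B)              # output MSE for unit input variance
    snr[name] = 10 * math.log10(1 / fB)
    print(f"{name}: SNR = {snr[name]:.2f} dB")

gap = snr["Lloyd-Max Gaussian"] - snr["optimum uniform Gaussian"]
print(round(gap, 1))                    # about 2 dB at B = 6
```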


4.11 QUANTIZATION OF COMPLEX GAUSSIAN RANDOM VARIABLES

In many situations we want to quantize a complex random variable, such as

z = x + jy    (4.61)

where x and y are independent, identically distributed Gaussian random variables. One method is to quantize x and y independently by their Lloyd-Max quantizers using B bits each. This would not be the minimum mean square quantizer for z. Now suppose we write

z = Ae^{jθ}    (4.62)

then A and θ are independent, where A has the Rayleigh density (see Problem 4.15b) and θ is uniformly distributed. It can be shown that the minimum mean square quantizer for z requires that θ be uniformly quantized. Let L₁ and L₂ be the number of quantization levels for A and θ, respectively, such that L₁L₂ = L (given). Let {v_k} and {w_k} be the decision and reconstruction levels of A, respectively, if it were quantized independently by its own mean square quantizer. Then the decision levels {t_k} and reconstruction levels {r_k} of A for the optimum mean square reconstruction of z are given by [31]

t_k = v_k,    r_k = w_k sinc(1/L₂)    (4.63)

If L₂ is large, then sinc(1/L₂) → 1, which means the amplitude and phase variables can be quantized independently. For a given L, the optimum allocation of L₁ and L₂ requires that for rates log₂ L ≥ 4.6 bits, the phase should be allocated approximately 1.37 bits more than the amplitude [32].
The performance of the joint amplitude-phase quantizer is found to be only marginally better than that of the independent mean square quantizers for x and y. However, the preceding results are useful when one is required to digitize the amplitude and phase variables, as in certain coherent imaging applications where amplitude and phase measurements are made directly.
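The sinc(1/L₂) factor in (4.63) can be understood as the average of the phase-quantization error: with L₂ uniform phase levels, the phase error is uniform on (−π/L₂, π/L₂), and E[cos(error)] = sinc(1/L₂), the factor by which the magnitude reconstruction is shrunk. A numerical sketch (L₂ = 8 is an assumed value):

```python
import math

def sinc(x):                            # sinc(x) = sin(pi x) / (pi x)
    return math.sin(math.pi * x) / (math.pi * x) if x else 1.0

L2 = 8                                  # assumed number of phase levels
half = math.pi / L2                     # phase error lies in (-half, half)

# midpoint-rule average of cos(e) over the phase-error interval
n = 100_000
avg = sum(math.cos(-half + (i + 0.5) * (2 * half / n)) for i in range(n)) / n

assert abs(avg - sinc(1 / L2)) < 1e-9   # E[cos e] = sinc(1/L2)
print(round(sinc(1 / L2), 4))           # shrinkage factor in (4.63)
```

For large L₂ the factor approaches 1, consistent with the remark that amplitude and phase may then be quantized independently.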

4.12 VISUAL QUANTIZATION

The foregoing methods can be applied for gray scale quantization of monochrome images. If the number of quantization levels is not sufficient, a phenomenon called contouring becomes visible. When groups of neighboring pixels are quantized to the



[Figure 4.21: Contrast quantization. The luminance u is mapped by a luminance-to-contrast transformation f(·) to the contrast c, which is quantized by an MMSE quantizer to c*; a contrast-to-luminance transformation reconstructs the displayed luminance.]

same value, regions of constant gray levels are formed, whose boundaries are called contours (see Fig. 4.23a). Uniform quantization of common images, where the pixels represent the luminance function, requires about 256 gray levels, or 8 bits. Contouring effects start becoming visible at or below 6 bits/pixel. A mean square quantizer matched to the histogram of a given image may need only 5 to 6 bits/pixel without any visible contours. Since histograms of images vary quite drastically, optimum mean square quantizers for raw image data are rarely used. A uniform quantizer with 8 bits/pixel is usually used.
In evaluating quantized images, the eye seems to be quite sensitive to contours and errors that affect local structure. However, the contours do not contribute very much to the mean square error. Thus a visual quantization scheme should attempt to hold the quantization contours below the level of visibility over the range of luminances to be displayed. We consider two methods of achieving this (other than allocating the full 8 bits/pixel).

Contrast Quantization

Since visual sensitivity is nearly uniform to just noticeable changes in contrast, it is more appropriate to quantize the contrast function shown in Fig. 4.21. Two nonlinear transformations that have been used for representation of contrast c are [see Chapter 3]

c = α ln(1 + βu),    0 < u ≤ 1    (4.64)
c = αu^β    (4.65)

where α and β are constants and u represents the luminance. For example, in (4.64) the values α = β/ln(1 + β) for β lying between 6 and 18, and in (4.65) the values α = 1, β = ⅓ have been suggested [34].
For the given contrast representation we simply use the minimum mean square error (MMSE) quantizer for the contrast field (see Fig. 4.21). To display (or reconstruct) the image, the quantized contrast is transformed back to the luminance value by the inverse transformation. Experimental studies indicate that a 2% change in contrast is just noticeable. Therefore, if uniformly quantized, the contrast scale needs 50 levels, or about 6 bits. However, with the optimum mean square quantizer, 4 to 5 bits/pixel could be sufficient.
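The logarithmic contrast transformation and its inverse are straightforward to implement. In this sketch β = 15 is an assumed value inside the suggested range, and the inverse plays the role of the display-side transformation.

```python
import math

beta = 15.0                              # assumed value in the 6-to-18 range
alpha = beta / math.log(1 + beta)        # gain suggested for (4.64)

def contrast(u):                         # c = alpha * ln(1 + beta*u), (4.64)
    return alpha * math.log(1 + beta * u)

def luminance(c):                        # inverse transform, applied to the
    return (math.exp(c / alpha) - 1) / beta   # decoded contrast for display

# the forward/inverse pair is an exact round trip
for u in (0.05, 0.25, 0.5, 0.95):
    assert abs(luminance(contrast(u)) - u) < 1e-12

# equal steps on the contrast scale map to smaller luminance steps near
# black than near white, matching visual sensitivity
c_max = contrast(1.0)
lo = luminance(0.5) - luminance(0.0)
hi = luminance(c_max) - luminance(c_max - 0.5)
print(lo < hi)
```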

Pseudorandom Noise Quantization

Another method of suppressing contouring effects [35] is to add a small amount of uniformly distributed pseudorandom noise to the luminance samples before quantization (see Fig. 4.22). This pseudorandom noise is also called dither. To display the


[Figure 4.22: Pseudorandom noise quantization. The pseudorandom noise η(m, n), uniform over [−A, A], is added to u(m, n) to give v(m, n), which is quantized with k bits; the same noise is subtracted from the quantizer output v*(m, n) before display.]

image, the same (or another) pseudorandom sequence is subtracted from the quantizer output. The effect is that in the regions of low luminance gradients (which are the regions of contours), the input noise causes pixels to go above or below the original decision level, thereby breaking the contours. However, the average value of the quantized image remains the same with and without the additive noise. During display, the noise tends to fill in the regions of contours in such a way that the spatial average is unchanged (Fig. 4.23). The amount of dither added should be kept small enough to maintain the spatial resolution but large enough to allow the luminance values to vary randomly about the quantizer decision levels. The noise should usually affect the least significant bit of the quantizer. Reasonable image quality is achievable by a 3-bit quantizer.
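A one-dimensional sketch of the mechanism (the ramp, seed, and 3-bit step are assumptions): a slow ramp lying inside one quantizer cell is flattened to a single contour by plain quantization, while subtractive dither preserves the local average.

```python
import random

random.seed(1)
q = 32                                  # 3-bit step for 8-bit luminance
A = q // 2                              # dither uniform over [-A, A]

def quantize(v):
    return (int(v) // q) * q + q // 2   # midpoint reconstruction

# a slow ramp spanning part of a single quantizer cell
ramp = [110 + 0.05 * i for i in range(300)]        # 110.0 .. 124.95

plain = [quantize(u) for u in ramp]
assert len(set(plain)) == 1             # the whole ramp maps to one level

dithered = []
for u in ramp:
    eta = random.uniform(-A, A)
    dithered.append(quantize(u + eta) - eta)   # subtract noise at display

mean = sum(ramp) / len(ramp)
err_plain = abs(sum(plain) / len(plain) - mean)
err_dith = abs(sum(dithered) / len(dithered) - mean)
print(err_plain > 5, err_dith < 2.5)    # dither preserves the local mean
```

With noise spanning a full quantizer step, the residual error after subtraction is roughly uniform and signal independent, which is why the spatial average survives.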
Halftone Image Generation

The preceding method is closely related to the method of generating halftone images from gray-level images. Halftone images are binary images that give a gray scale rendition. For example, most printed images, including all the images printed

Figure 4.23 (a) 3-bit image, contours are visible; (b) 8-bit image with pseudorandom noise uniform over [−16, 16]; (c) v*(m, n), 3-bit quantized v(m, n); (d) image after subtracting pseudorandom noise.




[Figure 4.24: Digital halftone generation. The luminance u(m, n), 0 < u(m, n) ≤ A, is added to a pseudorandom array η(m, n), 0 < η(m, n) < A, and the sum v(m, n) is thresholded to give the binary halftone output v*(m, n).]

H1 =
 40  60 150  90  10
 80 170 240 200 110
140 210 250 220 130
120 190 230 180  70
 20 100 160  50  30

H2 =
 52  44  36 124 | 132 140 148 156
 60   4  28 116 | 200 228 236 164
 68  12  20 108 | 212 252 244 172
 76  84  92 100 | 204 196 188 180
----------------+----------------
132 140 148 156 |  52  44  36 124
200 228 236 164 |  60   4  28 116
212 252 244 172 |  68  12  20 108
204 196 188 180 |  76  84  92 100

Figure 4.25 Two halftone patterns. Repeat periodically to obtain the full size array. H2 is called a 45° halftone screen because it repeats two 4 × 4 basic patterns at ±45° angles.

in this text, are halftones. Figure 4.24 shows the basic concept of generating halftone images. The given image is oversampled (for instance, a 256 × 256 image may be printed on a 1024 × 1024 grid of black and white dots) to coincide with the number of dots available for the halftone image. To each image sample (representing a luminance value) a random number (halftone screen) is added, and the resulting signal is quantized by a 1-bit quantizer. The output (0 or 1) then represents a black or white dot. In practice the dither signal is a finite two-dimensional pseudorandom pattern that is repeated periodically to generate a halftone matrix of the same size as the image. Figure 4.25 shows two halftone patterns. The halftone image may exhibit Moiré patterns if the image pattern and the dither matrix have common or nearly common periodicities. Good halftoning algorithms are designed to minimize the Moiré effect. Figure 4.26 shows a 512 × 512 halftone image generated digitally from the original 512 × 512 × 8-bit image. The gray level rendition in halftones is due to local spatial averaging performed by the eye. In general, the perceived gray level is equal to the number of black dots perceived in one resolution cell. One resolution cell corresponds to the area occupied by one pixel in the original image.
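The thresholding of Fig. 4.24 can be sketched directly. The 4 × 4 ordered-dither matrix below is an assumed stand-in for the screens of Fig. 4.25; on a constant mid-gray patch about half the dots turn white, so the local dot density encodes the gray level.

```python
# assumed 4x4 ordered-dither screen (values 0..15), a stand-in for the
# halftone patterns of Fig. 4.25
screen = [[ 0,  8,  2, 10],
          [12,  4, 14,  6],
          [ 3, 11,  1,  9],
          [15,  7, 13,  5]]

def halftone(image):
    """1-bit quantization of 8-bit pixels against the periodic screen."""
    return [[1 if u > (screen[m % 4][n % 4] + 0.5) * 16 else 0
             for n, u in enumerate(row)]
            for m, row in enumerate(image)]

patch = [[128] * 8 for _ in range(8)]   # constant mid-gray patch
dots = halftone(patch)
density = sum(map(sum, dots)) / 64.0    # fraction of white dots
print(density)
```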

Color Quantization

Perceptual considerations become even more important in quantization of color signals. A pixel of a color image can be considered as a three-dimensional vector C,



Figure 4.26 (a) Original 8-bit/pixel image; (b) halftone screen H2; (c) halftone image; (d) most significant 1-bit/pixel image.

its elements C₁, C₂, C₃ representing the three color primaries. From Chapter 3 we know the color gamut is a highly irregular solid in the three-dimensional space. Quantization of a color image requires allocating quantization cells to colors in the color solid in the chosen coordinate system. Even if all the colors were equally likely (uniform probability density), the quantization cells will be unequal in size because equal changes in color coordinates do not, in general, result in equal changes in perceived colors.
Figure 4.27 shows a desirable color quantization procedure. First a coordinate transformation is performed and the new coordinate variables T_k are independently quantized. The choice of transformation and the quantizer should be such that the perceptual color difference due to quantization is minimized. In the NTSC color coordinate R_N, G_N, B_N system, the reproducible color gamut is the cube [0, 1] × [0, 1] × [0, 1]. It has been shown that uniform quantization of each color coordinate in this system provides the best results as compared to uniform quantization in several other coordinate systems. Four bits per color have been found to be just adequate in this coordinate system.
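Uniform 4-bit quantization of each coordinate on the [0, 1] cube amounts to three independent midpoint quantizers. A minimal sketch (the example pixel values are arbitrary):

```python
def quantize_channel(c, bits=4):
    """Uniform midpoint quantizer on [0, 1] with 2**bits cells."""
    L = 2 ** bits
    k = min(int(c * L), L - 1)          # cell index
    return (k + 0.5) / L                # cell midpoint

pixel = (0.83, 0.40, 0.07)              # example (R_N, G_N, B_N) triple
qpixel = tuple(quantize_channel(c) for c in pixel)

# each coordinate error is at most half a cell, i.e. 1/32
assert all(abs(a - b) <= 0.5 / 16 for a, b in zip(pixel, qpixel))
print(qpixel)
```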


[Figure 4.27: Color quantization. A coordinate transformation maps the color components to T₁, T₂, T₃; each Tₖ is quantized independently, and an inverse transformation yields the reconstructed color coordinates.]



PROBLEMS

4.1 In the RETMA scanning convention, 262.5 lines of each field are scanned in 1/60 s. Show that the beam has a horizontal scan rate of 15.75 kHz and a slow downward motion at a vertical scan rate of 60 Hz.
4.2 Show that a bandlimited image cannot be space-limited, and vice versa.
4.3 The image f(x, y) = 4 cos 4πx cos 6πy is sampled with Δx = Δy = 0.5 and Δx = Δy = 0.2. The reconstruction filter is an ideal low-pass filter with bandwidths (1/(2Δx), 1/(2Δy)). What is the reconstructed image in each case?
4.4 The NTSC composite video signal of (4.2) is sampled such that Δx = 1/ξ_{x0}, Δy = 1/ξ_{y0}. At any given frame, say at t = 0, what would be the spectrum of the sampled frame for the composite color signal spectrum shown in Fig. P4.4?

[Figure P4.4: NTSC composite color signal spectrum, showing the luminance spectrum with interleaved chrominance spectra.]

4.5 Prove (4.19) for bandlimited stationary random fields.


4.6 (Sampling noisy images) A bandlimited image acquired by a practical sensor is observed as g(x, y) = f(x, y) + n(x, y), where ξ_{x0} = ξ_{y0} ≜ ξ_f and n(x, y) is wideband noise whose spectral density function S_n(ξ₁, ξ₂) = η/4 is bandlimited to −ξ_n ≤ ξ₁, ξ₂ ≤ ξ_n, ξ_n = 2ξ_f. The random field g(x, y) is sampled without prefiltering and with prefiltering at the Nyquist rate of the noiseless image and reconstructed by an ideal low-pass filter whose bandwidths are also ξ_{x0}, ξ_{y0} (Fig. P4.6). Show that (a) the SNR of the sensor output g over its bandwidth is σ_f²/(ηξ_f²), where σ_f² is the image power, and (b) the SNRs of the reconstructed image with and without prefiltering are σ_f²/(ηξ_f²) and σ_f²/(4ηξ_f²), respectively. What would be the SNR of the reconstructed image if the sensor output were sampled at the Nyquist rate of the noise without any prefiltering? Compare the preceding sampling schemes and recommend the best way to sample noisy images.

[Figure P4.6: Sampling of noisy images. The sensor output f(x, y) + n(x, y), optionally prefiltered by an ideal low-pass filter with bandwidths ξ_{x0}, ξ_{y0}, is passed through an ideal sampler at rates ξ_{xs}, ξ_{ys} ≥ 2ξ_f and reconstructed by an ideal low-pass filter.]
4.7 Show that (4.15) is an orthogonal expansion for a bandlimited function such that the least squares error

σ²_{ls} = ∫∫ |f(x, y) − Σ_m Σ_n a(m, n) φ_m(x) ψ_n(y)|² dx dy

where φ_m(x) ≜ sinc(xξ_{xs} − m), ψ_n(y) ≜ sinc(yξ_{ys} − n), is minimized to zero when a(m, n) = f(mΔx, nΔy).
4.8 (Optimal sampling) A real random field f(x, y), defined on a square [−L, L] × [−L, L], with autocorrelation function R(x, y; x', y') ≜ E[f(x, y)f(x', y')], −L ≤ x, x', y, y' ≤ L, is sampled by a set of orthogonal functions φ_{m,n}(x, y) to obtain the samples a_{m,n} such that the reconstructed function f_{M,N}(x, y) ≜ Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} a_{m,n} φ_{m,n}(x, y) minimizes the mean square error σ²_{M,N} ≜ ∫∫ over [−L, L] of E[|f(x, y) − f_{M,N}(x, y)|²] dx dy. Let φ_{m,n}(x, y) be a set of complete orthonormal functions obtained by solving the eigenvalue integral equation

∫∫ over [−L, L] of R(x, y; x', y') φ_{m,n}(x', y') dx' dy' = λ_{m,n} φ_{m,n}(x, y),    −L ≤ x, y ≤ L

a. Show that {a_{m,n}} are orthogonal random variables, that is, E[a_{m,n} a_{m',n'}] = λ_{m,n} δ(m − m', n − n').
b. Show that σ²_{M,N} is minimized when {φ_{m,n}} are chosen to correspond to the largest MN eigenvalues, and the minimized error is σ²_{M,N} = Σ_{m=M}^{∞} Σ_{n=N}^{∞} λ_{m,n}.

The preceding series representation for the random field f(x, y) is called its KL series expansion.
4.9 (Interlaced sampling) The interlaced sampling grid G₂ of Fig. 4.9c can be written as a superposition of rectangular grids, that is,

g(x, y) = Σ_m Σ_n δ(x − 2m, y − 2n) + Σ_m Σ_n δ(x − 2m − 1, y − 2n − 1)

Verify Fig. 4.9e by showing the Fourier transform of this array is

G(ξ₁, ξ₂) = ½ Σ_k Σ_l δ(ξ₁ − k/2, ξ₂ − l/2),    k + l even

4.10 a.
Show that the limiting Lagrange polynomial can be written as


lim_{q→∞} L_k(x) = Π over m from −∞ to ∞, m ≠ k, of (x − m)/(k − m) = Π_{m=1}^{∞} [1 − (x − k)²/m²]

which is the well-known product expansion of sinc(x − k).
b. Show that the Lagrange interpolation formula of (4.29) satisfies the properties f̂(mΔ) = f(mΔ), that is, there is no interpolation error at known samples, and ∫ f̂(x) dx = Δ Σ_m f(mΔ), that is, the area is preserved according to the trapezoidal rule of integration.
c. Write the two-dimensional interpolation formula for q₁ = q₂ = 1, 2, 3.

Chap. 4 Problems - ,- .
125
4.11 (Moiré effect, one dimension) A one-dimensional function f(x) = 2 cos πξ₀x is sampled at a rate ξ_s, which is just above ξ₀. Common reconstruction filters such as the zero- or first-order-hold circuits have a passband greater than ±ξ_s/2 with the first zero crossings at ±ξ_s, as shown in Fig. P4.11a.
a. Show that the reconstructed function is of the form

f̂(x) = 2(a + b cos 2πξ_s x) cos πξ₀x + 2b sin 2πξ_s x sin πξ₀x

a ≜ ξ_s H(ξ₀/2),    b ≜ ξ_s H(ξ_s − ξ₀/2)
[Figure P4.11: (a) Reconstruction filter frequency response H(ξ), with passband beyond ±ξ_s/2 and first zero crossings at ±ξ_s. (b) A one-dimensional Moiré pattern.]



which is a beat pattern between the signal frequency ξ₀/2 and one of its companion frequencies ξ_s − (ξ₀/2) present in the sampled spectrum.
b. Show that if the sampling frequency is above twice the highest signal frequency, then the Moiré effect will be eliminated. Note that if ξ₀ = 0, that is, f(x) = constant, then the reconstructed signal is also constant; that is, the sampling system has a flat field response.
c. As an example, generate a sampled signal f̂(k) = 2 cos(kπ/1.05), which corresponds to ξ₀ = 1, ξ_s = 1.05. Now plot this sequence as a continuous signal on a line plotter (which generally performs a first-order hold). Fig. P4.11b shows the nature of the result, which looks like an amplitude-modulated sine wave. This is a Moiré pattern in one dimension.
4.12 (Moiré effect, two dimensions) An image f(x, y) = 4 cos 4πx cos 4πy is sampled at a rate ξ_{xs} = ξ_{ys} = 5. The reconstruction filter has the frequency response of a square display spot of size 0.2 × 0.2 but is bandlimited to the region [−5, 5] × [−5, 5]. Calculate the reconstructed image. If the input image is a constant gray instead, what would the displayed image look like? Would this display have a flat field response?
4.13 If t_k, r_k are the decision and reconstruction levels for a zero mean, unity variance random variable u, show that t̃_k = μ + σt_k, r̃_k = μ + σr_k are the corresponding quantities for a random variable v having the same distribution but with mean μ and variance σ². Thus v may be quantized by first finding u = (v − μ)/σ, then quantizing u by a zero mean, unity variance quantizer to obtain u*, and finally obtaining the quantized value of v as v* = μ + σu*.

4.14 Suppose the compandor transformations in Fig. 4.19 are g(x) = f⁻¹(x) and

w = f(u) = ∫ from 0 to u of p_u(α) dα,    u > 0
w = f(u) = −f(−u),    u < 0

where p_u(α) = p_u(−α). This transformation (also called histogram equalization) causes w to be uniformly distributed over the interval [−½, ½]. The uniform quantizer is now optimum for w. However, the overall quantizer need not be optimum for u.
a. Let

p_u(α) = 1 − |α|,    −1 ≤ α ≤ 1
p_u(α) = 0,    otherwise

and let the number of quantizer levels be 4. What are the decision and reconstruction levels for the input u? Calculate the mean square error.
b. Show that this compandor is suboptimal compared to the one discussed in the text.
4.15 (Compandor transformations)
a. For zero mean Gaussian random variables, show that the compandor transformations are given by f(x) = 2 erf(x/√6σ), x ≥ 0, and g(y) = √6σ erf⁻¹(y/2), y ≥ 0, where erf(x) ≜ (1/√π) ∫ from 0 to x of exp(−y²) dy.
b. For the Rayleigh density

p_u(α) = (α/σ²) exp(−α²/2σ²),    α > 0
p_u(α) = 0,    α < 0

show that the transformation is

f(x) = c ∫ from 0 to x of α^{1/3} exp(−α²/6σ²) dα

where c is a normalization constant such that f(∞) = 1.

Chap. 4 Problems 127


4.16 Use the probability density function of Problem 4.14a.

a. Design the four-level optimum-uniform quantizer. Calculate the mean square
error and the entropy of the output.
b. Design the four-level Lloyd-Max quantizer (or the compandor) and calculate the
mean square error and the entropy of the output.
c. If the criterion of quantizer performance is the mean square error for a given
entropy of the output, which of the preceding two quantizers is superior?
4.17 Show that the optimum mean square, zero-memory quantizer for a uniformly distrib-
uted random variable achieves the rate distortion characteristics of the Shannon quan-
tizer for a Gaussian distribution having the same variance.
4.18 The differential entropy (in bits) of a continuous random variable u is defined as

    H(u) = −∫_{−∞}^{∞} p_u(α) log₂ p_u(α) dα

a. Show that for a Gaussian random variable g whose variance is σ², H(g) =
½ log₂(2πeσ²). Similarly, for a uniform random variable f whose variance is σ²,
H(f) = ½ log₂(12σ²).
b. For an arbitrary random variable x, its entropy power Q_x is defined by the relation
H(x) = ½ log₂(2πeQ_x). If we write Q_x = a_x σ², then we have Q_g = σ², Q_f = 6σ²/πe,
a_g = 1, and a_f ≈ 0.703. Show that among possible continuous density
functions whose variance is fixed at σ², the Gaussian density function has the
maximum entropy. Hence show that a_x ≤ 1 for any random variable x.
c. For the random variable x with entropy power Q_x, the minimum achievable rate,
n_min(x), for a mean square distortion D is given by n_min(x) = ½ log₂(Q_x/D). For a
uniform random variable f, its zero-memory Lloyd-Max quantizer achieves the
rate n_f = ½ log₂(σ²/D). Show that

    n_f = n_min(f) + ½ log₂(πe/6)

that is, for uniform distributions, the one-dimensional optimum mean square
quantizer is within ½ log₂(πe/6) ≈ 0.255 bit/sample of the minimum achievable rate of its Shannon
quantizer.
4.19* Take a 512 × 512 8-bit/pixel image and quantize it to 3 bits using (a) a uniform
quantizer, (b) a contrast quantizer via (4.65) with α = 1, β = 1/3, and (c) a pseudorandom
noise quantizer of Fig. 4.22 with a suitable value of A (for instance, between 4 and 16).
Compare the mean square errors and their visual qualities.

. BIBLIOGRAPHY

Section 4.1
For scanning, display and other hardware engineering principles in image sampling
and acquisition:

1. K. R. Castleman. Digital Image Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1979,
pp. 14-51.
2. D. G. Fink, (ed.). Television Engineering Handbook. New York: McGraw-Hill, 1957.
3. H. R. Luxenberg and R. L. Kuehn (eds.). Display Systems Engineering. New York:

McGraw-Hill, 1968. •




4. P. Mertz and F. Grey. "A Theory of Scanning and its Relation to the Characteristics of
the Transmitted Signal in Telephotography and Television." Bell Sys. Tech. J. 13 (1934):
464-515.

Section 4.2
The two-dimensional sampling theory presented here is a direct extension of the
basic concepts in one dimension, which may be found in:

5. E. T. Whittaker. "On the Functions which are Represented by the Expansions of the
Interpolation Theory." Proc. Roy. Soc., Edinburgh, Section A 35 (1915): 181-194.
6. C. E. Shannon. "Communications in the Presence of Noise," Proc. IRE 37 (January
1949): 10-21.

For extensions to two and higher dimensions:

7. J. W. Goodman. Introduction to Fourier Optics. New York: McGraw-Hill, 1968.


8. A. Papoulis. Systems and Transforms with Applications in Optics. New York: McGraw-
Hill, 1966.
9. D. P. Peterson and D. Middleton. "Sampling and Reconstruction of Wave Number
Limited Functions in N-dimensional Euclidean Spaces." Inform. Contr. 5 (1962):

279-323.
10. J. Sabatier and F. Kretz. "Sampling the Components of 625-Line Colour Television
Signals," Eur. Broadcast. Union Rev. Tech. 171 (1978): 2.

Section 4.3 •

For extensions of sampling theory to random processes and random fields and for
orthogonal function expansions for optimal sampling: .

11. S. P. Lloyd. "A Sampling Theorem for Stationary (Wide Sense) Stochastic Processes,"
Trans. Am. Math. Soc. 92 (July 1959): 1-12.
12. J. L. Brown, Jr. "Bounds for Truncation Error in Sampling Expansions of Bandlimited
Signals," IEEE Trans. Inf. Theory IT-15 (July 1969): 440-444.
13. A. Rosenfeld and A. C. Kak. Digital Picture Processing. New York: Academic Press,
1976, pp. 83-98.

For hexagonal sampling and related results:


14. R. M. Mersereau. "The Processing of Hexagonally Sampled Two Dimensional Signals."
Proc. IEEE (July 1979): 930-949.

Section 4.4
For aliasing and other practical problems associated with sampling:
15. R. Legault. "The Aliasing Problems in Two Dimensional Sampled Imagery." In
Perception of Displayed Information, L. M. Biberman (ed.). New York: Plenum Press,
1973.



Sections 4.5, 4.6

For comprehensive reviews of image quantization techniques and extended bibliography:

16. Special issue on Quantization. IEEE Trans. Inform. Theory IT-28, no. 2 (March 1982).
17. A. K. Jain. "Image Data Compression: A Review." Proc. IEEE 69, no. 3 (March 1981):
349-389.

For mean square quantizer results:

18. S. P. Lloyd. "Least Squares Quantization in PCM." Unpublished memorandum, Bell
Laboratories, 1957. (Copy available by writing the author.)
19. J. Max. "Quantizing for Minimum Distortion." IRE Trans. Inform. Theory IT-6 (1960):
7-12.

Section 4.7

For results on companders:

20. P. F. Panter and W. Dite. "Quantizing Distortion in Pulse-Code Modulation with Non-
uniform Spacing Levels." Proc. IRE 39 (1951): 44-48.
21. B. Smith. "Instantaneous Companding of Quantizing Signals." Bell Syst. Tech. J. 27
(1948): 446-472.
22. G. M. Roe. "Quantizing for Minimum Distortion." IEEE Trans. Inform. Theory IT-10
(1964): 384-385.
23. V. R. Algazi. "Useful Approximations to Optimum Quantization." IEEE Trans.
Commun. Tech. COM-14 (1966): 297-301.

Sections 4.8, 4.9

For results related to optimum uniform quantizers and quantizer performance trade-offs:

24. T. J. Goblick and J. L. Holsinger. "Analog Source Digitization: A Comparison of
Theory and Practice." IEEE Trans. Inform. Theory IT-13 (April 1967): 323-326.
25. H. Gish and J. N. Pierce. "Asymptotically Efficient Quantization." IEEE Trans.
Inform. Theory IT-14 (1968): 676-681.
26. T. Berger. Rate Distortion Theory. Englewood Cliffs, N.J.: Prentice-Hall, 1971.
• • •

For optimum quantizers based on rate versus distortion characteristics:

27. T. Berger. "Optimum Quantizers and Permutation Codes." IEEE Trans. Inform.
Theory IT-16 (November 1972): 759-765.
28. A. N. Netravali and R. Saigal. "An Algorithm for the Design of Optimum Quantizers."
Bell Syst. Tech. J. 55 (November 1976): 1423-1435.
29. D. K. Sharma. "Design of Absolutely Optimal Quantizers for a Wide Class of Distortion
Measures." IEEE Trans. Inform. Theory IT-24 (November 1978): 693-702.




Section 4.10

For analytic models of common quantizers:


30. S. H. Wang and A. K. Jain. "Application of Stochastic Models for Image Data
Compression." Technical Report, Signal & Image Processing Lab, Dept. of Electrical
Engineering, University of California, Davis, September 1979.

Section 4.11

Here we follow:

31. N. C. Gallagher, Jr. "Quantizing Schemes for the Discrete Fourier Transform of a
Random Time-Series." IEEE Trans. Inform. Theory IT-24 (March 1978): 156-163.
32. W. A. Pearlman. "Quantizing Error Bounds for Computer Generated Holograms."
Tech. Rep. 6503-1, Stanford University Information Systems Laboratory, Stanford,
Calif., August 1974. Also see Pearlman and Gray, IEEE Trans. Inform. Theory IT-24
(November 1978): 683-692.

Section 4.12

For further details on visual quantization:

33. F. W. Scoville and T. S. Huang. "The Subjective Effect of Spatial and Brightness
Quantization in PCM Picture Transmission." NEREM Record (1965): 234-235.
34. F. Kretz. "Subjectively Optimal Quantization of Pictures." IEEE Trans. Comm.
COM-23 (November 1975): 1288-1292.
35. L. G. Roberts. "Picture Coding Using Pseudo-Random Noise." IRE Trans. Inform.
Theory IT-8, no. 2 (February 1962): 145-154.
36. J. E. Thompson and J. J. Sparkes. "A Pseudo-Random Quantizer for Television
Signals." Proc. IEEE 55, no. 3 (March 1967): 353-355.
37. J. O. Limb. "Design of Dither Waveforms for Quantized Visual Signals." Bell Syst.
Tech. J. 48, no. 7 (September 1969): 2555-2583.
38. B. Lippel, M. Kurland, and A. H. March. "Ordered Dither Patterns for Coarse Quanti-
zation of Pictures." Proc. IEEE 59, no. 3 (March 1971): 429-431. Also see IEEE Trans.
Commun. Tech. COM-19, no. 6 (December 1971): 879-889.
39. C. N. Judice. "Digital Video: A Buffer-Controlled Dither Processor for Animated
Images." IEEE Trans. Comm. COM-25 (November 1977): 1433-1440.
40. P. G. Roetling. "Halftone Method with Edge Enhancement and Moire Suppression."
J. Opt. Soc. Am. 66 (1976): 985-989.
41. A. K. Jain and W. K. Pratt. "Color Image Quantization." National Telecomm.
Conference 1972 Record, IEEE Publication No. 72CH0601-5-NTC, Houston, Texas,
December 1972.
42. J. O. Limb, C. B. Rubinstein, and J. E. Thompson. "Digital Coding of Color Video
Signals - A Review." IEEE Trans. Commun. COM-25 (November 1977): 1349-1385.



Image Transforms

5.1 INTRODUCTION

The term image transforms usually refers to a class of unitary matrices used for
representing images. Just as a one-dimensional signal can be represented by an
orthogonal series of basis functions, an image can also be expanded in terms of a
discrete set of basis arrays called basis images. These basis images can be generated
by unitary matrices. Alternatively, a given N × N image can be viewed as an N² × 1
vector. An image transform provides a set of coordinates or basis vectors for the
vector space.

For continuous functions, orthogonal series expansions provide series coefficients
which can be used for any further processing or analysis of the functions.
For a one-dimensional sequence {u(n), 0 ≤ n ≤ N − 1}, represented as a vector u of
size N, a unitary transformation is written as

    v = Au  ⇒  v(k) = Σ_{n=0}^{N−1} a(k, n) u(n),  0 ≤ k ≤ N − 1   (5.1)

where A⁻¹ = A*ᵀ (unitary). This gives

    u = A*ᵀv  ⇒  u(n) = Σ_{k=0}^{N−1} v(k) a*(k, n),  0 ≤ n ≤ N − 1   (5.2)

Equation (5.2) can be viewed as a series representation of the sequence u(n). The
columns of A*ᵀ, that is, the vectors a*_k ≜ {a*(k, n), 0 ≤ n ≤ N − 1}ᵀ, are called the
basis vectors of A. Figure 5.1 shows examples of basis vectors of several orthogonal
transforms encountered in image processing. The series coefficients v(k) give a
representation of the original sequence u(n) and are useful in filtering, data
compression, feature extraction, and other analyses.



[Figure 5.1: Basis vectors of the 8 × 8 transforms (cosine, sine, Hadamard, Haar, Slant, KLT), shown for k = 0, 1, ..., 7.]
5.2 TWO-DIMENSIONAL ORTHOGONAL
AND UNITARY TRANSFORMS

In the context of image processing a general orthogonal series expansion for an
N × N image u(m, n) is a pair of transformations of the form

    v(k, l) = Σ_{m,n=0}^{N−1} u(m, n) a_{k,l}(m, n),  0 ≤ k, l ≤ N − 1   (5.3)

    u(m, n) = Σ_{k,l=0}^{N−1} v(k, l) a*_{k,l}(m, n),  0 ≤ m, n ≤ N − 1   (5.4)

where {a_{k,l}(m, n)}, called an image transform, is a set of complete orthonormal
discrete basis functions satisfying the properties

    Orthonormality:  Σ_{m,n=0}^{N−1} a_{k,l}(m, n) a*_{k′,l′}(m, n) = δ(k − k′, l − l′)   (5.5)

    Completeness:   Σ_{k,l=0}^{N−1} a_{k,l}(m, n) a*_{k,l}(m′, n′) = δ(m − m′, n − n′)   (5.6)

The elements v(k, l) are called the transform coefficients and V ≜ {v(k, l)} is
called the transformed image. The orthonormality property assures that any trun-
cated series expansion of the form

    u_{P,Q}(m, n) ≜ Σ_{k=0}^{P−1} Σ_{l=0}^{Q−1} v(k, l) a*_{k,l}(m, n),  P ≤ N, Q ≤ N   (5.7)

will minimize the sum of squares error

    σ_e² = Σ_{m,n=0}^{N−1} [u(m, n) − u_{P,Q}(m, n)]²   (5.8)

when the coefficients v(k, l) are given by (5.3). The completeness property assures
that this error will be zero for P = Q = N (Problem 5.1).

Separable Unitary Transforms

The number of multiplications and additions required to compute the transform
coefficients v(k, l) using (5.3) is O(N⁴), which is quite excessive for practical-size
images. The dimensionality of the problem is reduced to O(N³) when the transform
is restricted to be separable, that is,

    a_{k,l}(m, n) = a_k(m) b_l(n) ≜ a(k, m) b(l, n)   (5.9)

where {a_k(m), k = 0, ..., N − 1} and {b_l(n), l = 0, ..., N − 1} are one-dimensional
complete orthonormal sets of basis vectors. The matrices A ≜ {a(k, m)}
and B ≜ {b(l, n)} should be unitary matrices themselves, for example,

    A A*ᵀ = AᵀA* = I   (5.10)
Often one chooses B to be the same as A so that (5.3) and (5.4) reduce to



    v(k, l) = Σ_{m,n=0}^{N−1} a(k, m) u(m, n) a(l, n)  ⇒  V = A U Aᵀ   (5.11)

    u(m, n) = Σ_{k,l=0}^{N−1} a*(k, m) v(k, l) a*(l, n)  ⇒  U = A*ᵀ V A*   (5.12)

For an M × N rectangular image, the transform pair is

    V = A_M U A_Nᵀ   (5.13)

    U = A_M*ᵀ V A_N*   (5.14)

where A_M and A_N are M × M and N × N unitary matrices, respectively. These are
called two-dimensional separable transformations. Unless otherwise stated, we will
always imply the preceding separability when we mention two-dimensional unitary
transformations. Note that (5.11) can be written as

    Vᵀ = A[AU]ᵀ   (5.15)

Hence (5.11) can be performed by first transforming each column of U and
then transforming each row of the result to obtain the rows of V.
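The column-then-row evaluation implied by (5.15) can be sketched as follows. This is a NumPy illustration; the use of the DFT matrix for A and the particular test image U are arbitrary choices, not part of the text.

```python
import numpy as np

def separable_transform(A, U):
    """Compute V = A U A^T of (5.11): A @ U transforms every column of U;
    the subsequent @ A.T transforms every row of the result."""
    return (A @ U) @ A.T

# Illustration: A = 4x4 unitary DFT matrix, U = a small test image
N = 4
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
U = np.arange(16.0).reshape(4, 4)

V = separable_transform(F, U)
# Same result as the direct double sum of (5.11)
V_direct = np.einsum("km,mn,ln->kl", F, U, F)
print(np.allclose(V, V_direct))  # True
```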

Basis Images
Let a*_k denote the kth column of A*ᵀ. Define the matrices

    A*_{k,l} ≜ a*_k a*_lᵀ   (5.16)

and the matrix inner product of two N × N matrices F and G as

    ⟨F, G⟩ = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f(m, n) g*(m, n)   (5.17)

Then (5.4) and (5.3) give a series representation for the image as

    U = Σ_{k,l=0}^{N−1} v(k, l) A*_{k,l}   (5.18)

    v(k, l) = ⟨U, A*_{k,l}⟩   (5.19)

Equation (5.18) expresses any image U as a linear combination of the N² matrices
A*_{k,l}, k, l = 0, ..., N − 1, which are called the basis images. Figure 5.2 shows 8 × 8
basis images for the same set of transforms in Fig. 5.1. The transform coefficient
v(k, l) is simply the inner product of the (k, l)th basis image with the given image. It
is also called the projection of the image on the (k, l)th basis image. Therefore, any
N × N image can be expanded in a series using a complete set of N² basis images. If
U and V are mapped into vectors u′ and v′ by row ordering, then (5.11), (5.12), and (5.16)
yield (see Section 2.8, on Kronecker products)

    v′ = (A ⊗ A) u′ ≜ 𝒜 u′   (5.20)

    u′ = (A ⊗ A)*ᵀ v′ = 𝒜*ᵀ v′   (5.21)

where

    𝒜 ≜ A ⊗ A   (5.22)

[Figure 5.2: Basis images of the 8 × 8 transforms of Fig. 5.1.]

is a unitary matrix. Thus, given any unitary transform A, a two-dimensional separable
unitary transformation can be defined via (5.20) or (5.13).

Example 5.1

For the given orthogonal matrix A and image U

    A = (1/√2) [ 1   1 ],    U = [ 1  2 ]
               [ 1  −1 ]         [ 3  4 ]

the transformed image, obtained according to (5.11), is

    V = ½ [ 1   1 ] [ 1  2 ] [ 1   1 ] = ½ [  4   6 ] [ 1   1 ] = [  5  −1 ]
          [ 1  −1 ] [ 3  4 ] [ 1  −1 ]     [ −2  −2 ] [ 1  −1 ]   [ −2   0 ]

To obtain the basis images, we find the outer product of the columns of A*ᵀ, which gives

    A*_{0,0} = ½ [ 1 ] (1, 1) = ½ [ 1  1 ]
                 [ 1 ]            [ 1  1 ]

and similarly

    A*_{0,1} = ½ [ 1  −1 ] = A*ᵀ_{1,0},    A*_{1,1} = ½ [  1  −1 ]
                 [ 1  −1 ]                              [ −1   1 ]

The inverse transformation gives

    A*ᵀ V A* = ½ [ 1   1 ] [  5  −1 ] [ 1   1 ] = [ 1  2 ]
                 [ 1  −1 ] [ −2   0 ] [ 1  −1 ]   [ 3  4 ]

which is U, the original image.
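Example 5.1 can be checked numerically. This NumPy sketch reproduces the forward transform, the inverse, and one basis image:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
U = np.array([[1.0, 2.0], [3.0, 4.0]])

V = A @ U @ A.T                # forward transform (5.11)
print(V)                       # [[ 5. -1.] [-2.  0.]]

U_back = A.T @ V @ A           # inverse (5.12); A is real, so A* = A
print(np.allclose(U_back, U))  # True

# Basis image A_{0,0}: outer product of the first basis vector with itself
A00 = np.outer(A[0], A[0])
print(A00)                     # 0.5 in every entry
```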


Kronecker Products and Dimensionality

Dimensionality of image transforms can also be studied in terms of their Kronecker
product separability. An arbitrary one-dimensional transformation

    y = 𝒜 x   (5.23)

is called separable if

    𝒜 = A₁ ⊗ A₂   (5.24)

This is because (5.23) can then be reduced to the separable two-dimensional trans-
formation

    Y = A₁ X A₂ᵀ   (5.25)

where X and Y are matrices that map into the vectors x and y, respectively, by row
ordering. If 𝒜 is N² × N² and A₁, A₂ are N × N, then the number of operations
required for implementing (5.23) reduces from N⁴ to about 2N³. The number of
operations can be reduced further if A₁ and A₂ are also separable. Image transforms
such as the discrete Fourier, sine, cosine, Hadamard, Haar, and Slant can be factored as
Kronecker products of several smaller-sized matrices, which leads to fast algorithms
for their implementation (see Problem 5.2). In the context of image processing such
matrices are also called fast image transforms.



Dimensionality of Image Transforms

The computations for V can also be reduced by restricting the choice of A to the
fast transforms, whose matrix structure allows a factorization of the type

    A = A⁽¹⁾ A⁽²⁾ ··· A⁽ᵖ⁾   (5.26)

where A⁽ⁱ⁾, i = 1, ..., p (p ≪ N), are matrices with just a few nonzero entries (say r,
with r ≪ N). Thus, a multiplication of the type y = Ax is accomplished in rpN
operations. For Fourier, sine, cosine, Hadamard, Slant, and several other trans-
forms, p = log₂ N, and the operations reduce to the order of N log₂ N (or N² log₂ N
for N × N images). Depending on the actual transform, one operation can be
defined as one multiplication and one addition or subtraction, as in the Fourier
transform, or one addition or subtraction, as in the Hadamard transform.

Transform Frequency

For a one-dimensional signal f(x), frequency is defined by the Fourier domain
variable ξ. It is related to the number of zero crossings of the real or imaginary part
of the basis function exp{j2πξx}. This concept can be generalized to arbitrary
unitary transforms. Let the rows of a unitary matrix A be arranged so that the
number of zero crossings increases with the row number. Then, in the trans-
formation

    y = Ax

the elements y(k) are ordered according to increasing wave number or transform
frequency. In the sequel, any reference to frequency will imply the transform
frequency, that is, discrete Fourier frequency, cosine frequency, and so on. The
term spatial frequency generally refers to the continuous Fourier transform fre-
quency and is not the same as the discrete Fourier frequency. In the case of the
Hadamard transform, a term called sequency is also used. It should be noted that
this concept of frequency is useful only on a relative basis for a particular transform.
A low-frequency term of one transform could contain the high-frequency harmonics
of another transform.

The Optimum Transform


An important consideration in selecting a transform is its performance in
filtering and data compression of images based on the mean square criterion. The
Karhunen-Loeve transform (KLT) is known to be optimum with respect to this
criterion and is discussed in Section 5.11.
criterion and is discussed in Section 5.11.

5.3 PROPERTIES OF UNITARY TRANSFORMS

Energy Conservation and Rotation

In the unitary transformation,


    v = Au

    ‖v‖² = ‖u‖²   (5.27)

This is easily proven by noting that

    ‖v‖² = Σ_{k=0}^{N−1} |v(k)|² = v*ᵀv = u*ᵀA*ᵀAu = u*ᵀu = Σ_{n=0}^{N−1} |u(n)|² ≜ ‖u‖²

Thus a unitary transformation preserves the signal energy or, equivalently, the
length of the vector u in the N-dimensional vector space. This means every unitary
transformation is simply a rotation of the vector u in the N-dimensional vector
space. Alternatively, a unitary transformation is a rotation of the basis coordinates,
and the components of v are the projections of u on the new basis (see Problem 5.4).
Similarly, for the two-dimensional unitary transformations such as (5.3), (5.4), and
(5.11) to (5.14), it can be proven that

    Σ_{m,n=0}^{N−1} |u(m, n)|² = Σ_{k,l=0}^{N−1} |v(k, l)|²   (5.28)
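Properties (5.27) and (5.28) can be verified numerically. In this sketch the unitary matrix A is taken, for illustration only, to be the DFT matrix of Section 5.4, and the test data are random:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# Example unitary A: the unitary DFT matrix of Section 5.4
A = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)

u = rng.standard_normal(N)
v = A @ u
print(np.allclose(np.sum(np.abs(v) ** 2), np.sum(np.abs(u) ** 2)))  # (5.27): True

U = rng.standard_normal((N, N))
V = A @ U @ A.T
print(np.allclose(np.sum(np.abs(V) ** 2), np.sum(np.abs(U) ** 2)))  # (5.28): True
```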

Energy Compaction and Variances of Transform Coefficients

Most unitary transforms have a tendency to pack a large fraction of the average
energy of the image into a relatively few components of the transform coefficients.
Since the total energy is preserved, this means many of the transform coefficients
will contain very little energy. If μ_u and R_u denote the mean and covariance of a
vector u, then the corresponding quantities for the transformed vector v are given
by

    μ_v ≜ E[v] = E[Au] = A E[u] = A μ_u   (5.29)

    R_v = E[(v − μ_v)(v − μ_v)*ᵀ]
        = A(E[(u − μ_u)(u − μ_u)*ᵀ])A*ᵀ = A R_u A*ᵀ   (5.30)

The transform coefficient variances are given by the diagonal elements of R_v, that is,

    σ_v²(k) = [R_v]_{k,k} = [A R_u A*ᵀ]_{k,k}   (5.31)

Since A is unitary, it follows that

    Σ_{k=0}^{N−1} |μ_v(k)|² = μ_v*ᵀ μ_v = μ_u*ᵀ A*ᵀ A μ_u = Σ_{n=0}^{N−1} |μ_u(n)|²   (5.32)

    Σ_{k=0}^{N−1} σ_v²(k) = Tr[A R_u A*ᵀ] = Tr[R_u] = Σ_{n=0}^{N−1} σ_u²(n)   (5.33)

    Σ_{k=0}^{N−1} E[|v(k)|²] = Σ_{n=0}^{N−1} E[|u(n)|²]   (5.34)

The average energy E[|v(k)|²] of the transform coefficients v(k) tends to be un-
evenly distributed, although it may be evenly distributed for the input sequence
u(n). For a two-dimensional random field u(m, n) whose mean is μ_u(m, n) and
covariance is r(m, n; m′, n′), its transform coefficients v(k, l) satisfy the properties

    μ_v(k, l) = Σ_m Σ_n a(k, m) a(l, n) μ_u(m, n)   (5.35)

    σ_v²(k, l) = E[|v(k, l) − μ_v(k, l)|²]
               = Σ_m Σ_n Σ_{m′} Σ_{n′} a(k, m) a(l, n) r(m, n; m′, n′) a*(k, m′) a*(l, n′)   (5.36)

If the covariance of u(m, n) is separable, that is,

    r(m, n; m′, n′) = r₁(m, m′) r₂(n, n′)   (5.37)

then the variances of the transform coefficients can be written as a separable
product

    σ_v²(k, l) = σ₁²(k) σ₂²(l) ≜ [A R₁ A*ᵀ]_{k,k} [A R₂ A*ᵀ]_{l,l}   (5.38)

where R₁ ≜ {r₁(m, m′)} and R₂ ≜ {r₂(n, n′)}.

Decorrelation
When the input vector elements are highly correlated, the transform coefficients
tend to be uncorrelated. This means the off-diagonal terms of the covariance matrix
R_v tend to become small compared to the diagonal elements.

With respect to the preceding two properties, the KL transform is optimum,
that is, it packs the maximum average energy into a given number of transform
coefficients while completely decorrelating them. These properties are presented in
greater detail in Section 5.11.
Other Properties
Unitary transforms have other interesting properties. For example, the determinant
and the eigenvalues of a unitary matrix have unity magnitude. Also, the entropy
of a random vector is preserved under a unitary transformation. Since entropy is
a measure of average information, this means information is preserved under a
unitary transformation.
Example 5.2 (Energy compaction and decorrelation)

A 2 × 1 zero mean vector u is unitarily transformed as

    v = ½ [ √3   1 ] u,    where R_u = [ 1  ρ ],  0 < ρ < 1
          [ −1  √3 ]                   [ ρ  1 ]

The parameter ρ measures the correlation between u(0) and u(1). The covariance of v
is obtained as

    R_v = A R_u A*ᵀ = [ 1 + √3ρ/2      ρ/2     ]
                      [    ρ/2      1 − √3ρ/2  ]

From the expression for R_u, σ_u²(0) = σ_u²(1) = 1, that is, the total average energy of 2
is distributed equally between u(0) and u(1). However, σ_v²(0) = 1 + √3ρ/2 and
σ_v²(1) = 1 − √3ρ/2. The total average energy is still 2, but the average energy in v(0) is
greater than in v(1). If ρ = 0.95, then 91.1% of the total average energy has been
packed in the first sample. The correlation between v(0) and v(1) is given by

    ρ_v(0, 1) ≜ E[v(0)v(1)] / (σ_v(0) σ_v(1)) = (ρ/2) / (1 − 3ρ²/4)^{1/2}

which is less in absolute value than |ρ| for |ρ| < 1. For ρ = 0.95, we find ρ_v(0, 1) = 0.83.
Hence the correlation between the transform coefficients has been reduced. If the
foregoing procedure is repeated for the 2 × 2 transform A of Example 5.1, then we find
σ_v²(0) = 1 + ρ, σ_v²(1) = 1 − ρ, and ρ_v(0, 1) = 0. For ρ = 0.95, now 97.5% of the energy is
packed in v(0). Moreover, v(0) and v(1) become uncorrelated.
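The numbers quoted in Example 5.2 can be reproduced with a short NumPy sketch:

```python
import numpy as np

rho = 0.95
Ru = np.array([[1.0, rho], [rho, 1.0]])

A = 0.5 * np.array([[np.sqrt(3), 1.0], [-1.0, np.sqrt(3)]])
Rv = A @ Ru @ A.T

print(Rv[0, 0] / np.trace(Rv))                  # about 0.911 (91.1% in v(0))
print(Rv[0, 1] / np.sqrt(Rv[0, 0] * Rv[1, 1]))  # about 0.836, reduced from rho

# The transform of Example 5.1 decorrelates this source completely
A2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Rv2 = A2 @ Ru @ A2.T
print(np.round(Rv2, 10))   # diag(1 + rho, 1 - rho), zero off-diagonal
```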

5.4 THE ONE-DIMENSIONAL DISCRETE FOURIER TRANSFORM (DFT)

The discrete Fourier transform (DFT) of a sequence {u(n), n = 0, ..., N − 1} is
defined as

    v(k) = Σ_{n=0}^{N−1} u(n) W_N^{kn},  k = 0, 1, ..., N − 1   (5.39)

where

    W_N ≜ exp{−j2π/N}   (5.40)

The inverse transform is given by

    u(n) = (1/N) Σ_{k=0}^{N−1} v(k) W_N^{−kn},  n = 0, 1, ..., N − 1   (5.41)

The pair of equations (5.39) and (5.41) are not scaled properly to be unitary
transformations. In image processing it is more convenient to consider the unitary
DFT, which is defined as

    v(k) = (1/√N) Σ_{n=0}^{N−1} u(n) W_N^{kn},  k = 0, ..., N − 1   (5.42)

    u(n) = (1/√N) Σ_{k=0}^{N−1} v(k) W_N^{−kn},  n = 0, ..., N − 1   (5.43)

The N × N unitary DFT matrix F is given by

    F = {(1/√N) W_N^{kn}},  0 ≤ k, n ≤ N − 1   (5.44)

Future references to the DFT and unitary DFT will imply the definitions of (5.39) and
(5.42), respectively. The DFT is one of the most important transforms in digital
signal and image processing. It has several properties that make it attractive for
image processing applications.
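A direct construction of the unitary DFT matrix (5.44) can be sketched as follows; the comparison against numpy.fft.fft (which computes the unscaled DFT of (5.39)) is an illustrative check:

```python
import numpy as np

def unitary_dft_matrix(N):
    """Unitary DFT matrix of (5.44): F[k, n] = W_N^{kn} / sqrt(N)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

N = 8
F = unitary_dft_matrix(N)

# Unitarity: F^{*T} F = I, so the inverse transform matrix is F^{*T} = F*
print(np.allclose(F.conj().T @ F, np.eye(N)))          # True

# Agreement with the FFT after unitary scaling
u = np.random.default_rng(1).standard_normal(N)
print(np.allclose(F @ u, np.fft.fft(u) / np.sqrt(N)))  # True
```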

Properties of the DFT/Unitary DFT

Let u(n) be an arbitrary sequence defined for n = 0, 1, ..., N − 1. A circular shift
of u(n) by l, denoted by u(n − l)_c, is defined as u[(n − l) modulo N]. See Fig. 5.3
for l = 2, N = 5.

[Figure 5.3: Circular shift of u(n) by 2; u(n) and u((n − 2) modulo 5) shown for n = 0, ..., 4.]

The DFT and unitary DFT matrices are symmetric. By definition, the
matrix F is symmetric. Therefore,

    Fᵀ = F   (5.45)

The extensions are periodic. The extensions of the DFT and unitary DFT
of a sequence and their inverse transforms are periodic with period N. If, for
example, in the definition of (5.42) we let k take all integer values, then the
sequence v(k) turns out to be periodic, that is, v(k) = v(k + N) for every k.

The DFT is the sampled spectrum of the finite sequence u(n) extended
by zeros outside the interval [0, N − 1]. If we define the zero-extended sequence

    ũ(n) ≜ { u(n),  0 ≤ n ≤ N − 1
           { 0,     otherwise          (5.46)

then its Fourier transform is

    Ũ(ω) = Σ_{n=−∞}^{∞} ũ(n) exp(−jωn) = Σ_{n=0}^{N−1} u(n) exp(−jωn)   (5.47)

Comparing this with (5.39), we see that

    v(k) = Ũ(2πk/N)   (5.48)

Note that the unitary DFT of (5.42) would be Ũ(2πk/N)/√N.

The DFT and unitary DFT of dimension N can be implemented by a fast
algorithm in O(N log₂ N) operations. There exists a class of algorithms, called
the fast Fourier transform (FFT), which requires O(N log₂ N) operations for imple-
menting the DFT or unitary DFT, where one operation is a real multiplication and a
real addition. The exact operation count depends on N as well as the particular
choice of the algorithm in that class. Most common FFT algorithms require N = 2^p,
where p is a positive integer.

The DFT or unitary DFT of a real sequence {u(n), n = 0, ..., N − 1} is
conjugate symmetric about N/2. From (5.42) we obtain

    v*(N − k) = (1/√N) Σ_{n=0}^{N−1} u(n) W_N^{−(N−k)n} = (1/√N) Σ_{n=0}^{N−1} u(n) W_N^{kn} = v(k)

    ⇒ v(N/2 − k) = v*(N/2 + k),  k = 0, ..., N/2 − 1   (5.49)

    |v(N/2 − k)| = |v(N/2 + k)|   (5.50)

Figure 5.4 shows a 256-sample scan line of an image. The magnitude of its DFT is
shown in Fig. 5.5, which exhibits symmetry about the point 128. If we consider the
periodic extension of v(k), we see that

    v(−k) = v(N − k)

[Figure 5.4: A 256-sample scan line of an image, u(n) versus n.]

[Figure 5.5: Magnitude of the unitary discrete Fourier transform of Fig. 5.4, showing symmetry about k = 128.]

Hence, the (unitary) DFT frequencies N/2 + k, k = 0, ..., N/2 − 1, are simply the
negative frequencies at ω = (2π/N)(−N/2 + k) in the Fourier spectrum of the finite
sequence {u(n), 0 ≤ n ≤ N − 1}. Also, from (5.39) and (5.49), we see that v(0) and
v(N/2) are real, so that the N × 1 real sequence

    { v(0), Re{v(k)}, k = 1, ..., N/2 − 1, Im{v(k)}, k = 1, ..., N/2 − 1, v(N/2) }   (5.51)

completely defines the DFT of the real sequence u(n). Therefore, it can be said that
the DFT or unitary DFT of an N × 1 real sequence has N degrees of freedom and
requires the same storage capacity as the sequence itself.
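The conjugate symmetry (5.49) and the realness of v(0) and v(N/2) can be checked numerically for a random real sequence:

```python
import numpy as np

N = 8
u = np.random.default_rng(2).standard_normal(N)  # real sequence
v = np.fft.fft(u) / np.sqrt(N)                   # unitary DFT (5.42)

# Conjugate symmetry (5.49): v(N/2 - k) = v*(N/2 + k)
k = np.arange(1, N // 2)
print(np.allclose(v[N // 2 - k], np.conj(v[N // 2 + k])))   # True

# v(0) and v(N/2) are real, so the N numbers in (5.51) determine the DFT
print(abs(v[0].imag) < 1e-12, abs(v[N // 2].imag) < 1e-12)  # True True
```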

The basis vectors of the unitary DFT are the orthonormal eigenvectors
of any circulant matrix. Moreover, the eigenvalues of a circulant matrix are
given by the DFT of its first column. Let H be an N × N circulant matrix.
Therefore, its elements satisfy

    [H]_{m,n} = h(m − n) = h[(m − n) modulo N],  0 ≤ m, n ≤ N − 1   (5.52)

The basis vectors of the unitary DFT are the columns of F*ᵀ = F*, that is,

    φ_k = (1/√N) {W_N^{−kn}, 0 ≤ n ≤ N − 1}ᵀ,  k = 0, ..., N − 1   (5.53)

Consider the expression

    [Hφ_k]_m = (1/√N) Σ_{n=0}^{N−1} h(m − n) W_N^{−kn}   (5.54)

Writing m − n = l and rearranging terms, we can write

    [Hφ_k]_m = (1/√N) W_N^{−km} [ Σ_{l=0}^{N−1} h(l) W_N^{kl} + Σ_{l=−N+m+1}^{−1} h(l) W_N^{kl} − Σ_{l=m+1}^{N−1} h(l) W_N^{kl} ]   (5.55)

Using (5.52) and the fact that W_N^{kl} = W_N^{k(l+N)} (since W_N^{kN} = 1), the second and third
terms in the brackets cancel, giving the desired eigenvalue equation

    [Hφ_k]_m = λ_k φ_k(m)

or

    Hφ_k = λ_k φ_k,  k = 0, ..., N − 1   (5.56)

where λ_k, the eigenvalues of H, are defined as

    λ_k ≜ Σ_{l=0}^{N−1} h(l) W_N^{kl},  0 ≤ k ≤ N − 1   (5.57)

This is simply the DFT of the first column of H.
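This eigenvalue property can be verified numerically. The sketch below builds a circulant matrix from a random first column h and checks (5.56) against the DFT of (5.57):

```python
import numpy as np

N = 8
h = np.random.default_rng(3).standard_normal(N)

# Circulant H: [H]_{m,n} = h[(m - n) modulo N]; its first column is h
m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
H = h[(m - n) % N]

# Eigenvalues (5.57): the DFT of the first column of H
lam = np.fft.fft(h)

# Basis vectors of the unitary DFT: columns of F* (5.53)
F = np.exp(-2j * np.pi * m * n / N) / np.sqrt(N)
Phi = F.conj()

# Eigenvalue equation (5.56): H phi_k = lambda_k phi_k, all k at once
print(np.allclose(H @ Phi, Phi * lam[None, :]))  # True
```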

Based on the preceding properties of the DFT, the following additional
properties can be proven (Problem 5.9).

Circular convolution theorem. The DFT of the circular convolution of two
sequences is equal to the product of their DFTs, that is, if

    x₂(n) = Σ_{k=0}^{N−1} h(n − k)_c x₁(k),  0 ≤ n ≤ N − 1   (5.58)

then

    DFT{x₂(n)}_N = DFT{h(n)}_N DFT{x₁(n)}_N   (5.59)

where DFT{x(n)}_N denotes the DFT of the sequence x(n) of size N. This means we
can calculate the circular convolution by first calculating the DFT of x₂(n) via (5.59)
and then taking its inverse DFT. Using the FFT this will take O(N log₂ N) oper-
ations, compared to the N² operations required for direct evaluation of (5.58).

A linear convolution of two sequences can also be obtained via the FFT
by imbedding it into a circular convolution. In general, the linear convolution
of two sequences {h(n), n = 0, ..., N′ − 1} and {x₁(n), n = 0, ..., N − 1} is a se-
quence {x₂(n), 0 ≤ n ≤ N′ + N − 2} and can be obtained by the following algorithm:

Step 1: Let M ≥ N′ + N − 1 be an integer for which an FFT algorithm is available.
Step 2: Define h̃(n) and x̃₁(n), 0 ≤ n ≤ M − 1, as zero-extended sequences corre-
sponding to h(n) and x₁(n), respectively.
Step 3: Let y₁(k) = DFT{x̃₁(n)}_M and λ_k ≜ DFT{h̃(n)}_M. Define y₂(k) = λ_k y₁(k), k =
0, ..., M − 1.
Step 4: Take the inverse DFT of y₂(k) to obtain x̃₂(n). Then x₂(n) = x̃₂(n) for
0 ≤ n ≤ N + N′ − 2.
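Steps 1 to 4 can be sketched in code; choosing M as the next power of 2 is one common way (an assumption here, not prescribed by the text) to satisfy Step 1:

```python
import numpy as np

def linear_conv_fft(h, x):
    """Linear convolution of h (length N') with x (length N) via the FFT:
    zero-extend both to length M >= N' + N - 1, multiply the DFTs,
    inverse-transform, and keep the first N' + N - 1 samples."""
    Np, N = len(h), len(x)
    M = 1 << (Np + N - 2).bit_length()   # next power of 2, M >= N' + N - 1
    y2 = np.fft.ifft(np.fft.fft(h, M) * np.fft.fft(x, M))
    return y2.real[:Np + N - 1]

h = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 1.0, 1.0])
print(linear_conv_fft(h, x))   # [1. 3. 6. 6. 5. 3.]
print(np.convolve(h, x))       # same result by direct convolution
```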

Any circulant matrix can be diagonalized by the DFT/unitary DFT. That is,

F H F* = Λ    (5.60)

where Λ = Diag{λ_k, 0 ≤ k ≤ N − 1} and the λ_k are given by (5.57). It follows that if C, C1, and C2 are circulant matrices, then the following hold.

1. C1 C2 = C2 C1, that is, circulant matrices commute.
2. C^{-1} is a circulant matrix and can be computed in O(N log N) operations.
3. C^T, C1 + C2, and f(C) are all circulant matrices, where f(x) is an arbitrary function of x.
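A quick numerical check of (5.57) and (5.60) — a NumPy sketch in which `circulant` is our own helper, not a library routine:

```python
import numpy as np

def circulant(c):
    """Circulant matrix whose first column is c: H[m, n] = c[(m - n) mod N]."""
    N = len(c)
    return np.array([[c[(m - n) % N] for n in range(N)] for m in range(N)])

N = 8
c = np.arange(1.0, N + 1)
H = circulant(c)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary DFT matrix
Lam = F @ H @ F.conj().T                   # F H F* of (5.60); should be diagonal
lam = np.fft.fft(c)                        # DFT of the first column, as in (5.57)
print(np.allclose(Lam, np.diag(lam)))      # True
```

The same machinery confirms property 1: for two circulant matrices built this way, C1 @ C2 and C2 @ C1 agree to machine precision.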
• •


5.5 THE TWO-DIMENSIONAL DFT

The two-dimensional DFT of an N × N image {u(m, n)} is a separable transformation defined as

v(k, l) = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} u(m, n) W_N^{km} W_N^{ln},   0 ≤ k, l ≤ N − 1    (5.61)




and the inverse transform is

u(m, n) = (1/N²) Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} v(k, l) W_N^{−km} W_N^{−ln},   0 ≤ m, n ≤ N − 1    (5.62)

The two-dimensional unitary DFT pair is defined as

v(k, l) = (1/N) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} u(m, n) W_N^{km} W_N^{ln},   0 ≤ k, l ≤ N − 1    (5.63)

u(m, n) = (1/N) Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} v(k, l) W_N^{−km} W_N^{−ln},   0 ≤ m, n ≤ N − 1    (5.64)

In matrix notation this becomes

V = F U F    (5.65)
Figure 5.6 Two-dimensional unitary DFT of a 256 × 256 image: (a) original image; (b) phase; (c) magnitude; (d) magnitude centered.



,
, ,''':1

'j

'<i

«I ;.
>;,i
",
'i~,
,

,I a b!
, ; l' .~

-, ,.,,~ c dl
'.~,.4t Figure 5.7 Unitary DFT of images
'::'
,
(a) Resolution chart;
{l
(b) its DFT;
f' (c) binary image;


,f 1
./-.
(d) its DFT. The two parallel lines are due
to the 'l' sign in the binary image.

U = F* V F*    (5.66)

If U and V are mapped into row-ordered vectors u and v, respectively, then

v = ℱu    (5.67)

ℱ = F ⊗ F    (5.68)

The N² × N² matrix ℱ represents the N × N two-dimensional unitary DFT. Figure 5.6 shows an original image and the magnitude and phase components of its unitary DFT. Figure 5.7 shows magnitudes of the unitary DFTs of two other images.

Properties of the Two-Dimensional DFT

The properties of the two-dimensional unitary DFT are quite similar to the one-dimensional case and are summarized next.

Symmetric, unitary.

ℱ^T = ℱ,   ℱ^{-1} = ℱ* = F* ⊗ F*    (5.69)

Periodic extensions.

v(k + N, l + N) = v(k, l),   ∀k, l
                                          (5.70)
u(m + N, n + N) = u(m, n),   ∀m, n

Sampled Fourier spectrum. If ũ(m, n) = u(m, n), 0 ≤ m, n ≤ N − 1, and ũ(m, n) = 0 otherwise, then

Ũ(2πk/N, 2πl/N) = DFT{u(m, n)} = v(k, l)    (5.71)

where Ũ(ω1, ω2) is the Fourier transform of ũ(m, n).



,
Fast transform. Since the two-dimensional DFT is separable, the transformation of (5.65) is equivalent to 2N one-dimensional unitary DFTs, each of which can be performed in O(N log2 N) operations via the FFT. Hence the total number of operations is O(N² log2 N).
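The matrix form (5.65) and the separable row/column evaluation can be compared directly (a NumPy sketch; `np.fft.fft2` performs the 2N one-dimensional FFTs internally):

```python
import numpy as np

N = 8
F = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary DFT matrix
U = np.random.default_rng(0).standard_normal((N, N))
V = F @ U @ F                              # V = F U F, the matrix form (5.65)
V2 = np.fft.fft2(U) / N                    # unitary scaling of the 2-D FFT
print(np.allclose(V, V2))                  # True
```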

Conjugate symmetry. The DFT and unitary DFT of real images exhibit conjugate symmetry, that is,

v(N/2 ± k, N/2 ± l) = v*(N/2 ∓ k, N/2 ∓ l),   0 ≤ k, l ≤ N/2 − 1    (5.72)

or

v(k, l) = v*(N − k, N − l),   0 ≤ k, l ≤ N − 1    (5.73)

From this, it can be shown that v(k, l) has only N² independent real elements. For example, the samples in the shaded region of Fig. 5.8 determine the complete DFT or unitary DFT (see Problem 5.10).

Basis images. The basis images are given by definition [see (5.16) and (5.53)]:

A*_{k,l} = φ_k φ_l^T = {(1/N) W_N^{−(km + ln)}, 0 ≤ m, n ≤ N − 1},   0 ≤ k, l ≤ N − 1    (5.74)

Two-dimensional circular convolution theorem. The DFT of the two-dimensional circular convolution of two arrays is the product of their DFTs.

The two-dimensional circular convolution of two N × N arrays h(m, n) and u1(m, n) is defined as

u2(m, n) = Σ_{m'=0}^{N−1} Σ_{n'=0}^{N−1} h(m − m', n − n')_c u1(m', n'),   0 ≤ m, n ≤ N − 1    (5.75)

where

h(m, n)_c = h(m modulo N, n modulo N)    (5.76)


. ;:a I

o 1 N-l
... _----- • • •

INI2) - 1
N/2 •

Figure 5.8 Discrete Fourier transform


coefficients v (k, I) in the shaded area de-
N/2 termine the remaining coefficients. •




Figure 5.9 Two-dimensional circular convolution: (a) array h(m, n); (b) circular convolution of h(m, n) with u1(m, n) over an N × N region.

Figure 5.9 shows the meaning of circular convolution. It is the same as when a periodic extension of h(m, n) is convolved over an N × N region with u1(m, n). The two-dimensional DFT of h(m − m', n − n')_c for fixed m', n' is given by

Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} h(m − m', n − n')_c W_N^{(mk + nl)} = W_N^{(m'k + n'l)} Σ_{i=−m'}^{N−1−m'} Σ_{j=−n'}^{N−1−n'} h(i, j)_c W_N^{(ik + jl)}

    = W_N^{(m'k + n'l)} Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} h(m, n) W_N^{(mk + nl)}    (5.77)

    = W_N^{(m'k + n'l)} DFT{h(m, n)}_N

where we have used (5.76). Taking the DFT of both sides of (5.75) and using the preceding result, we obtain

DFT{u2(m, n)}_N = DFT{h(m, n)}_N DFT{u1(m, n)}_N    (5.78)
From this and the fast transform property (page 142), it follows that an N × N circular convolution can be performed in O(N² log2 N) operations. This property is also useful in calculating two-dimensional convolutions such as

x3(m, n) = Σ_{m'=0}^{M−1} Σ_{n'=0}^{M−1} x2(m − m', n − n') x1(m', n')    (5.79)

where x1(m, n) and x2(m, n) are assumed to be zero for m, n ∉ [0, M − 1]. The region of support for the result x3(m, n) is {0 ≤ m, n ≤ 2M − 2}. Let N ≥ 2M − 1 and define N × N arrays

h̃(m, n) ≜ x2(m, n) for 0 ≤ m, n ≤ M − 1, and 0 otherwise    (5.80)

ũ1(m, n) ≜ x1(m, n) for 0 ≤ m, n ≤ M − 1, and 0 otherwise    (5.81)

We denote by DFT{x(m, n)}_N the two-dimensional DFT of an N × N array x(m, n), 0 ≤ m, n ≤ N − 1.



Evaluating the circular convolution of h̃(m, n) and ũ1(m, n) according to (5.75), it can be seen with the aid of Fig. 5.9 that

x3(m, n) = ũ2(m, n),   0 ≤ m, n ≤ 2M − 2    (5.82)

This means the two-dimensional linear convolution of (5.79) can be performed in O(N² log2 N) operations.
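The embedding (5.80)-(5.82) is a few lines with NumPy's 2-D FFT (a sketch; the function name is ours):

```python
import numpy as np

def linear_conv2d_via_fft(x2, x1):
    """M x M by M x M linear convolution via an N x N circular
    convolution, N = 2M - 1, using the DFT product (5.78)."""
    M = x2.shape[0]
    N = 2 * M - 1                              # N >= 2M - 1
    h = np.zeros((N, N)); h[:M, :M] = x2       # zero-extended array (5.80)
    u1 = np.zeros((N, N)); u1[:M, :M] = x1     # zero-extended array (5.81)
    u2 = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(u1))
    return np.real(u2)                         # x3(m, n), 0 <= m, n <= 2M - 2

rng = np.random.default_rng(1)
a, b = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
print(linear_conv2d_via_fft(a, b).shape)       # (7, 7)
```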

Block circulant operations. Dividing both sides of (5.77) by N and using the definition of the Kronecker product, we obtain

(F ⊗ F)ℋ = 𝒟(F ⊗ F)    (5.83)

where ℋ is doubly block circulant and 𝒟 is a diagonal matrix whose elements are given by

[𝒟]_{kN+l, kN+l} ≜ d_{k,l} = DFT{h(m, n)}_N,   0 ≤ k, l ≤ N − 1    (5.84)

Eqn. (5.83) can be written as

ℱℋℱ* = 𝒟   or   ℋ = ℱ*𝒟ℱ    (5.85)

that is, a doubly block circulant matrix is diagonalized by the two-dimensional unitary DFT. From (5.84) and the fast transform property (page 142), we conclude that a doubly block circulant matrix can be diagonalized in O(N² log2 N) operations. The eigenvalues of ℋ, given by the two-dimensional DFT of h(m, n), are the same as those obtained by operating Nℱ on the first column of ℋ. This is because the elements of the first column of ℋ are the elements h(m, n) mapped by lexicographic ordering.

Block Toeplitz operations. Our discussion of linear convolution implies that any doubly block Toeplitz matrix operation can be imbedded into a doubly block circulant operation, which, in turn, can be implemented using the two-dimensional unitary DFT.

5.6 THE COSINE TRANSFORM

The N × N cosine transform matrix C = {c(k, n)}, also called the discrete cosine transform (DCT), is defined as

c(k, n) = 1/√N,                            k = 0, 0 ≤ n ≤ N − 1
        = √(2/N) cos[π(2n + 1)k / 2N],     1 ≤ k ≤ N − 1, 0 ≤ n ≤ N − 1    (5.86)

The one-dimensional DCT of a sequence {u(n), 0 ≤ n ≤ N − 1} is defined as

v(k) = α(k) Σ_{n=0}^{N−1} u(n) cos[π(2n + 1)k / 2N],   0 ≤ k ≤ N − 1    (5.87)

where

α(0) ≜ √(1/N),   α(k) ≜ √(2/N)   for 1 ≤ k ≤ N − 1    (5.88)




Figure 5.10 Cosine transform of the image scan line shown in Fig. 5.4.

The inverse transformation is given by

u(n) = Σ_{k=0}^{N−1} α(k) v(k) cos[π(2n + 1)k / 2N],   0 ≤ n ≤ N − 1    (5.89)

The basis vectors of the 8 × 8 DCT are shown in Fig. 5.1. Figure 5.10 shows the cosine transform of the image scan line shown in Fig. 5.4. Note that many transform coefficients are small, that is, most of the energy of the data is packed in a few transform coefficients.

The two-dimensional cosine transform pair is obtained by substituting A = A* = C in (5.11) and (5.12). The basis images of the 8 × 8 two-dimensional cosine transform are shown in Fig. 5.2. Figure 5.11 shows examples of the cosine transform of different images.
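The definition (5.86)-(5.88) writes out directly in NumPy (a sketch; the function name is ours). A constant input illustrates the energy packing: all of its energy lands in v(0).

```python
import numpy as np

def dct_matrix(N):
    """The N x N DCT matrix C of (5.86)."""
    C = np.zeros((N, N))
    n = np.arange(N)
    C[0, :] = 1.0 / np.sqrt(N)                           # k = 0 row
    for k in range(1, N):
        C[k, :] = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    return C

C = dct_matrix(8)
v = C @ np.ones(8)                         # DCT of a constant scan line
print(np.allclose(C @ C.T, np.eye(8)))     # True: C is orthogonal
print(np.allclose(v[1:], 0.0))             # True: all energy in v(0)
```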

Properties of the Cosine Transform

1. The cosine transform is real and orthogonal, that is,

C = C*  ⇒  C^{-1} = C^T    (5.90)

2. The cosine transform is not the real part of the unitary DFT. This can be seen by inspection of C and the DFT matrix F. (Also see Problem 5.13.) However, the cosine transform of a sequence is related to the DFT of its symmetric extension (see Problem 5.16).



Figure 5.11 (a) Cosine transform examples of monochrome images; (b) cosine transform examples of binary images.

3. The cosine transform is a fast transform. The cosine transform of a vector of N elements can be calculated in O(N log2 N) operations via an N-point FFT [19]. To show this we define a new sequence ũ(n) by reordering the even and odd elements of u(n) as

ũ(n) = u(2n)
                                        0 ≤ n ≤ (N/2) − 1    (5.91)
ũ(N − n − 1) = u(2n + 1)

Changing the index of summation in the second term to n' = N − n − 1 and combining terms, we obtain

v(k) = α(k) Σ_{n=0}^{N−1} ũ(n) cos[π(4n + 1)k / 2N] = α(k) Re{e^{−jπk/2N} Σ_{n=0}^{N−1} ũ(n) e^{−j2πnk/N}}    (5.92)

where the second sum is the N-point DFT of ũ(n),

which proves the previously stated result. For the inverse cosine transform we write (5.89) for the even data points as

u(2n) = ũ(n) = Re{ Σ_{k=0}^{N−1} [α(k) v(k) e^{jπk/2N}] e^{j2πnk/N} },   0 ≤ n ≤ (N/2) − 1    (5.93)

The odd data points are obtained by noting that

u(2n + 1) = u[2(N − 1 − n)],   0 ≤ n ≤ (N/2) − 1    (5.94)

Therefore, if we calculate the N-point inverse FFT of the sequence α(k)v(k) exp(jπk/2N), we can also obtain the inverse DCT in O(N log N) operations. Direct algorithms that do not require the FFT as an intermediate step, so that complex arithmetic is avoided, are also possible [18]. The computational complexity of the direct as well as the FFT-based methods is about the same.
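The reordering algorithm (5.91)-(5.92) can be sketched and checked against the direct definition (5.87) (NumPy assumed; function names are ours):

```python
import numpy as np

def fast_dct(u):
    """DCT of u via one N-point FFT of the even/odd-reordered sequence."""
    N = len(u)
    ut = np.concatenate([u[0::2], u[1::2][::-1]])      # the reordering (5.91)
    k = np.arange(N)
    alpha = np.full(N, np.sqrt(2.0 / N)); alpha[0] = np.sqrt(1.0 / N)
    # (5.92): v(k) = alpha(k) Re{ exp(-j pi k / 2N) DFT{u~(n)} }
    return alpha * np.real(np.exp(-1j * np.pi * k / (2 * N)) * np.fft.fft(ut))

def direct_dct(u):
    """Direct O(N^2) evaluation of (5.87)."""
    N = len(u)
    n = np.arange(N)
    alpha = np.full(N, np.sqrt(2.0 / N)); alpha[0] = np.sqrt(1.0 / N)
    return np.array([alpha[k] * np.sum(u * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                     for k in range(N)])

u = np.random.default_rng(2).standard_normal(16)
print(np.allclose(fast_dct(u), direct_dct(u)))   # True
```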

4. The cosine transform has excellent energy compaction for highly correlated data. This is due to the following properties.

5. The basis vectors of the cosine transform (that is, the rows of C) are the eigenvectors of the symmetric tridiagonal matrix Qc, defined as

     | 1−α   −α              |
     | −α     1   −α         |
Qc = |        ⋱    ⋱    ⋱    |    (5.95)
     |            −α    1  −α|
     |                 −α  1−α|

The proof is left as an exercise.

6. The N × N cosine transform is very close to the KL transform of a first-order stationary Markov sequence of length N whose covariance matrix is given by (2.68) when the correlation parameter ρ is close to 1. The reason is that R^{-1} is a symmetric tridiagonal matrix, which for a scalar β² ≜ (1 − ρ²)/(1 + ρ²) and α ≜ ρ/(1 + ρ²) satisfies the relation

          | 1−ρα   −α              |
          | −α      1   −α         |
β²R^{-1} =|         ⋱    ⋱    ⋱    |    (5.96)
          |             −α    1  −α|
          |                 −α  1−ρα|


This gives the approximation

β²R^{-1} ≃ Qc   for ρ ≈ 1    (5.97)

Hence the eigenvectors of R and the eigenvectors of Qc, that is, the cosine transform, will be quite close. These aspects are considered in greater depth in Section 5.12 on sinusoidal transforms.

This property of the cosine transform, together with the fact that it is a fast transform, has made it a useful substitute for the KL transform of highly correlated first-order Markov sequences.
"

5.7 THE SINE TRANSFORM

The N × N sine transform matrix Ψ = {ψ(k, n)}, also called the discrete sine transform (DST), is defined as

ψ(k, n) = √(2/(N + 1)) sin[π(k + 1)(n + 1)/(N + 1)],   0 ≤ k, n ≤ N − 1    (5.98)

The sine transform pair of one-dimensional sequences is defined as

v(k) = √(2/(N + 1)) Σ_{n=0}^{N−1} u(n) sin[π(k + 1)(n + 1)/(N + 1)],   0 ≤ k ≤ N − 1    (5.99)

u(n) = √(2/(N + 1)) Σ_{k=0}^{N−1} v(k) sin[π(k + 1)(n + 1)/(N + 1)],   0 ≤ n ≤ N − 1    (5.100)

The two-dimensional sine transform pair for N × N images is obtained by substituting A = A* = A^T = Ψ in (5.11) and (5.12). The basis vectors and the basis images of the sine transform are shown in Figs. 5.1 and 5.2. Figure 5.12 shows the sine transform of a 255 × 255 image. Once again it is seen that a large fraction of the total energy is concentrated in a few transform coefficients.
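The DST matrix (5.98) is simple to construct, and Property 1 in the list that follows — Ψ real, symmetric, and its own inverse — can be confirmed numerically (a NumPy sketch):

```python
import numpy as np

def dst_matrix(N):
    """The N x N sine transform matrix of (5.98)."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * (k + 1) * (n + 1) / (N + 1))

S = dst_matrix(7)
print(np.allclose(S, S.T))             # True: symmetric
print(np.allclose(S @ S, np.eye(7)))   # True: forward and inverse DSTs are identical
```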

Properties of the Sine Transform

1. The sine transform is real, symmetric, and orthogonal, that is,

Ψ = Ψ* = Ψ^T = Ψ^{-1}    (5.101)

Thus, the forward and inverse sine transforms are identical.

Figure 5.12 Sine transform of a 255 × 255 portion of the 256 × 256 image shown in Fig. 5.6a.




2. The sine transform is not the imaginary part of the unitary DFT. The sine transform of a sequence is related to the DFT of its antisymmetric extension (see Problem 5.16).

3. The sine transform is a fast transform. The sine transform (or its inverse) of a vector of N elements can be calculated in O(N log2 N) operations via a 2(N + 1)-point FFT. Typically this requires N + 1 = 2^p, that is, the fast sine transform is usually defined for N = 3, 7, 15, 31, 63, 255, .... Fast sine transform algorithms that do not require complex arithmetic (or the FFT) are also possible. In fact, these algorithms are somewhat faster than the FFT and the fast cosine transform algorithms [20].

4. The basis vectors of the sine transform are the eigenvectors of the symmetric tridiagonal Toeplitz matrix

    |  1   −α           |
    | −α    1   −α      |
Q = |       ⋱    ⋱   ⋱  |    (5.102)
    |           −α    1 |

5. The sine transform is close to the KL transform of first-order stationary Markov sequences, whose covariance matrix is given in (2.68), when the correlation parameter ρ lies in the interval (−0.5, 0.5). In general it has very good to excellent energy compaction properties for images.

6. The sine transform leads to a fast KL transform algorithm for Markov sequences whose boundary values are given. This makes it useful in many image processing problems. Details are considered in greater depth in Chapter 6 (Sections 6.5 and 6.9) and Chapter 11 (Section 11.5).

,

5.8 THE HADAMARD TRANSFORM

Unlike the previously discussed transforms, the elements of the basis vectors of the Hadamard transform take only the binary values ±1 and are, therefore, well suited for digital signal processing. The Hadamard transform matrices, Hn, are N × N matrices, where N = 2^n, n = 1, 2, 3, .... These can be easily generated by the core matrix

H1 = (1/√2) | 1    1 |
            | 1   −1 |    (5.103)

and the Kronecker product recursion

Hn = Hn−1 ⊗ H1 = H1 ⊗ Hn−1 = (1/√2) | Hn−1    Hn−1 |
                                    | Hn−1   −Hn−1 |    (5.104)

As an example, for n = 3, the Hadamard matrix becomes

H3 = H1 ⊗ H2    (5.105)

H2 = H1 ⊗ H1    (5.106)

which gives

                                              Sequency
            | 1   1   1   1   1   1   1   1 |    0
            | 1  −1   1  −1   1  −1   1  −1 |    7
            | 1   1  −1  −1   1   1  −1  −1 |    3
H3 = (1/√8) | 1  −1  −1   1   1  −1  −1   1 |    4    (5.107)
            | 1   1   1   1  −1  −1  −1  −1 |    1
            | 1  −1   1  −1  −1   1  −1   1 |    6
            | 1   1  −1  −1  −1  −1   1   1 |    2
            | 1  −1  −1   1  −1   1   1  −1 |    5
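The recursion (5.104) is one line per step with NumPy's Kronecker product (a sketch; the function name is ours):

```python
import numpy as np

def hadamard(n):
    """H_n built from the core matrix H_1 by the recursion (5.104)."""
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = H1
    for _ in range(n - 1):
        H = np.kron(H1, H)            # H_n = H_1 (x) H_{n-1}
    return H

H3 = hadamard(3)
print(H3.shape)                       # (8, 8)
print(np.allclose(H3 @ H3.T, np.eye(8)))   # True: orthogonal
```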

The basis vectors of the Hadamard transform can also be generated by sampling a class of functions called the Walsh functions. These functions also take only the binary values ±1 and form a complete orthonormal basis for square integrable functions. For this reason the Hadamard transform just defined is also called the Walsh-Hadamard transform.

The number of zero crossings of a Walsh function, or the number of transitions in a basis vector of the Hadamard transform, is called its sequency. Recall that for sinusoidal signals, frequency can be defined in terms of the zero crossings. In the Hadamard matrix generated via (5.104), the row vectors are not sequency ordered. The existing sequency order of these vectors is called the Hadamard order. The Hadamard transform of an N × 1 vector u is written as

v = Hu    (5.108)

and the inverse transform is given by

u = Hv    (5.109)

where H ≜ Hn, n = log2 N. In series form the transform pair becomes

v(k) = (1/√N) Σ_{m=0}^{N−1} u(m)(−1)^{b(k,m)},   0 ≤ k ≤ N − 1    (5.110)

u(m) = (1/√N) Σ_{k=0}^{N−1} v(k)(−1)^{b(k,m)},   0 ≤ m ≤ N − 1    (5.111)

where

b(k, m) = Σ_{i=0}^{n−1} k_i m_i    (5.112)

and {k_i}, {m_i} are the binary representations of k and m, respectively, that is,

k = k_0 + 2k_1 + ... + 2^{n−1} k_{n−1}    (5.113)

The two-dimensional Hadamard transform pair for N × N images is obtained by substituting A = A* = A^T = H in (5.11) and (5.12). The basis vectors and the basis



Figure 5.13 Examples of Hadamard transforms: (a) Hadamard transforms of monochrome images; (b) Hadamard transforms of binary images.

images of the Hadamard transform are shown in Figs. 5.1 and 5.2. Examples of
two-dimensional Hadamard transforms of images are shown in Fig. 5.13.

Properties of the Hadamard Transform

1. The Hadamard transform H is real, symmetric, and orthogonal, that is,

H = H* = H^T = H^{-1}    (5.114)
2. The Hadamard transform is a fast transform. The one-dimensional transformation of (5.108) can be implemented in O(N log2 N) additions and subtractions.

Since the Hadamard transform contains only ±1 values, no multiplications are required in the transform calculations. Moreover, the number of additions or subtractions required can be reduced from N² to about N log2 N. This is due to the fact that Hn can be written as a product of n sparse matrices, that is,

Hn = (1/√N) H̃^n,   n = log2 N    (5.115)

where

     | 1   1   0   0   ⋯   0    0 |  ⎫
     | 0   0   1   1   ⋯   0    0 |  ⎬ N/2 rows
     | ⋮                         ⋮ |  ⎭
H̃ =  | 0   0   ⋯           1    1 |       (5.116)
     | 1  −1   0   0   ⋯   0    0 |  ⎫
     | 0   0   1  −1   ⋯   0    0 |  ⎬ N/2 rows
     | ⋮                         ⋮ |  ⎭
     | 0   0   ⋯           1   −1 |

Since H̃ contains only two nonzero terms per row, the transformation

v = Hu = (1/√N) H̃H̃ ⋯ H̃ u   (n terms),   n = log2 N    (5.117)

can be accomplished by operating H̃ n times on u. Due to the structure of H̃, only N additions or subtractions are required each time H̃ operates on a vector, giving a total of Nn = N log2 N additions or subtractions.
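The product of sparse factors in (5.115)-(5.117) is the familiar in-place butterfly structure; a sketch (NumPy assumed, function name ours) with the 1/√N scaling applied once at the end:

```python
import numpy as np

def fast_wht(u):
    """Hadamard transform of u in log2(N) stages of N additions/subtractions."""
    v = np.array(u, dtype=float)
    h = 1
    while h < len(v):                        # one stage per factor of H-tilde
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):        # pairwise sums and differences
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v / np.sqrt(len(v))

v = fast_wht(np.ones(8))
print(v[0], np.abs(v[1:]).max())   # a constant vector packs into the first coefficient
```

The output agrees with the matrix form v = Hu in the natural (Hadamard) order of (5.107).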
3. The natural order of the Hadamard transform coefficients turns out to be equal to the bit-reversed gray code representation of its sequency s. If the sequency s has the binary representation b_n b_{n−1} ⋯ b_1 and if the corresponding gray code is g_n g_{n−1} ⋯ g_1, then the bit-reversed representation g_1 g_2 ⋯ g_n gives the natural order. Table 5.1 shows the conversion of sequency s to natural order h, and vice versa, for N = 8. In general,

g_k = b_k ⊕ b_{k+1},   k = 1, ..., n − 1
g_n = b_n    (5.118)

and

b_k = g_k ⊕ b_{k+1},   k = n − 1, ..., 1
b_n = g_n    (5.119)

give the forward and reverse conversion formulas for the sequency and natural ordering.
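A sketch of the forward conversion (5.118) plus bit reversal, for N = 8 (the function name is ours):

```python
def sequency_to_natural(s, n):
    """Natural (Hadamard) order index of the row with sequency s (n-bit words)."""
    g = s ^ (s >> 1)              # binary-to-gray code, equivalent to (5.118)
    h = 0
    for _ in range(n):            # bit-reverse the gray code
        h = (h << 1) | (g & 1)
        g >>= 1
    return h

order = [sequency_to_natural(s, 3) for s in range(8)]
print(order)   # [0, 4, 6, 2, 3, 7, 5, 1]
```

This reproduces the sequency column of (5.107): for example sequency 1 sits at natural row 4, sequency 2 at row 6.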
4. The Hadamard transform has good to very good energy compaction for highly correlated images. Let {u(n), 0 ≤ n ≤ N − 1} be a stationary random


sequence with autocorrelation r(n), 0 ≤ n ≤ N − 1. The fraction of the expected energy packed in the first N/2^j sequency-ordered Hadamard transform coefficients is given by [23]

Ef(N/2^j) ≜ [Σ_{k=0}^{N/2^j − 1} V_k] / [Σ_{k=0}^{N−1} V_k] = (1/2^j) [1 + 2 Σ_{k=1}^{2^j − 1} (1 − k/2^j) r(k)/r(0)]    (5.120)

where

R = {r(m − n)}    (5.121)

and the V_k are the first N/2^j sequency-ordered elements of D ≜ diagonal[H R H]. Note that the V_k are simply the mean square values of the transform coefficients. The significance of this result is that Ef(N/2^j) depends on the first 2^j autocorrelations only. For j = 1, the fractional energy packed in the first N/2 sequency-ordered coefficients will be (1 + r(1)/r(0))/2 and depends only upon the one-step correlation ρ ≜ r(1)/r(0). Thus for ρ = 0.95, 97.5% of the total energy is concentrated in half of the transform coefficients. The result of (5.120) is useful in calculating the energy compaction efficiency of the Hadamard transform.
,
Example 5.3

Consider the covariance matrix R of (2.68) for N = 4. Using the definition of H2 we obtain

                                                         Sequency
                               | 4 + 6ρ + 4ρ² + 2ρ³ |       0
D = diagonal[H2 R H2] = (1/4)  | 4 − 6ρ + 4ρ² − 2ρ³ |       3
                               | 4 + 2ρ − 4ρ² − 2ρ³ |       1
                               | 4 − 2ρ − 4ρ² + 2ρ³ |       2

This gives V_0 = D_0, V_3 = D_1, V_1 = D_2, V_2 = D_3, and

Ef(N/2) = (V_0 + V_1) / Σ_{k=0}^{3} D_k = (1/16)(4 + 6ρ + 4ρ² + 2ρ³ + 4 + 2ρ − 4ρ² − 2ρ³) = (1 + ρ)/2

as expected according to (5.120).
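A numerical check of this example (a NumPy sketch): for ρ = 0.95 the first two sequency-ordered coefficients (natural rows 0 and 2) carry (1 + ρ)/2 = 0.975 of the expected energy.

```python
import numpy as np

rho, N = 0.95, 4
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # (2.68)
H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
H2 = np.kron(H1, H1)
D = np.diag(H2 @ R @ H2)          # coefficient variances, rows in natural order
# rows 0 and 2 carry sequencies 0 and 1 (sequency order 0, 3, 1, 2)
frac = (D[0] + D[2]) / np.sum(D)
print(frac, (1 + rho) / 2)        # both 0.975
```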
I •

5.9 THE HAAR TRANSFORM

The Haar functions h_k(x) are defined on a continuous interval, x ∈ [0, 1], for k = 0, 1, ..., N − 1, where N = 2^n. The integer k can be uniquely decomposed as

k = 2^p + q − 1    (5.122)

where 0 ≤ p ≤ n − 1; q = 0, 1 for p = 0 and 1 ≤ q ≤ 2^p for p ≠ 0. For example, when N = 4, one has



k   0   1   2   3
p   0   0   1   1
q   0   1   1   2

Representing k by (p, q), the Haar functions are defined as

h_0(x) ≜ h_{0,0}(x) = 1/√N,   x ∈ [0, 1]    (5.123a)

h_k(x) ≜ h_{p,q}(x) = (1/√N) ·  2^{p/2},   (q − 1)/2^p ≤ x < (q − ½)/2^p
                             · −2^{p/2},   (q − ½)/2^p ≤ x < q/2^p          (5.123b)
                             ·  0,         otherwise, for x ∈ [0, 1]

The Haar transform is obtained by letting x take the discrete values m/N, m = 0, 1, ..., N − 1. For N = 8, the Haar transform is given by

                                                     Sequency
            |  1    1    1    1    1    1    1    1 |    0
            |  1    1    1    1   −1   −1   −1   −1 |    1
            | √2   √2  −√2  −√2    0    0    0    0 |    2
Hr = (1/√8) |  0    0    0    0   √2   √2  −√2  −√2 |    2    (5.124)
            |  2   −2    0    0    0    0    0    0 |    2
            |  0    0    2   −2    0    0    0    0 |    2
            |  0    0    0    0    2   −2    0    0 |    2
            |  0    0    0    0    0    0    2   −2 |    2
The basis vectors and the basis images of the Haar transform are shown in Figs. 5.1 and 5.2. An example of the Haar transform of an image is shown in Fig. 5.14. From the structure of Hr [see (5.124)] we see that the Haar transform takes differences of the samples or differences of local averages of the samples of the input vector. Hence the two-dimensional Haar transform coefficients v(k, l), except for k = l = 0, are the differences along rows and columns of the local averages of pixels in the image. These are manifested as several "edge extractions" of the original image, as is evident from Fig. 5.14.

Although some work has been done on using the Haar transform in image data compression problems, its full potential in feature extraction and image analysis problems has not been determined.




Figure 5.14 Haar transform of the 256 × 256 image shown in Fig. 5.6a.
Figure 5.15 Slant transform of the 256 × 256 image shown in Fig. 5.6a.

Properties of the Haar Transform

1. The Haar transform is real and orthogonal. Therefore,

Hr = Hr*,   Hr^{-1} = Hr^T    (5.125)
2. The Haar transform is a very fast transform. On an N x 1 vector it can be
implemented in 0 (N) operations.
3. The basis vectors of the Haar matrix are sequency ordered.
4. The Haar transform has poor energy compaction for images.

5.10 THE SLANT TRANSFORM

The N × N Slant transform matrices are defined by the recursion

             |  1     0    0ᵀ    1     0    0ᵀ |
             | a_n   b_n   0ᵀ  −a_n   b_n   0ᵀ |
             |  0     0    I_{(N/2)−2}    0     0   I_{(N/2)−2} |  | S_{n−1}     0    |
S_n = (1/√2) |  0     1    0ᵀ    0    −1    0ᵀ |                  |                  |    (5.126)
             | −b_n  a_n   0ᵀ   b_n   a_n   0ᵀ |                  |    0     S_{n−1} |
             |  0     0    I_{(N/2)−2}    0     0  −I_{(N/2)−2} |

where the partitioned matrix on the left multiplies the block diagonal matrix containing two copies of S_{n−1}, 0ᵀ denotes a row of (N/2) − 2 zeros,


N = 2^n, I_M denotes an M × M identity matrix, and

S_1 = (1/√2) | 1    1 |
             | 1   −1 |    (5.127)

The parameters a_n and b_n are defined by the recursions

b_n = (1 + 4a²_{n−1})^{−1/2},   a_n = 2 b_n a_{n−1},   a_1 = 1    (5.128)

which solve to give

a_n = [3N²/(4(N² − 1))]^{1/2},   b_n = [(N² − 4)/(4(N² − 1))]^{1/2},   N = 2^n    (5.129)
Using these formulas, the 4 x 4 Slant transformation matrix is obtained as


Sequency
1 1 1 1 0
3 1 -1 -3
1
1 v's v's v's v's
82 = - (5.130)
2 1 -1 -1 1 '7
,.
1 -3 3 -1•
• 3
v's \IS v's \IS • •

. Figure 5.1 shows the basis vectors of the 8 x 8 Slant transform. Figure 5.2 shows the
basis images of the 8 x 8 two dimensional Slant transform. Figure 5.15 shows the
Slant transform of a 256 x 256 image. .
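The 4 × 4 matrix (5.130) written out numerically (a NumPy sketch): note that row 1 decreases in uniform steps — the "slant" basis vector — and the matrix is orthogonal.

```python
import numpy as np

s5 = np.sqrt(5.0)
S2 = 0.5 * np.array([[1.0,   1.0,   1.0,   1.0],
                     [3/s5,  1/s5, -1/s5, -3/s5],
                     [1.0,  -1.0,  -1.0,   1.0],
                     [1/s5, -3/s5,  3/s5, -1/s5]])
print(np.allclose(S2 @ S2.T, np.eye(4)))     # True: orthogonal
print(np.allclose(np.diff(S2[1]), -1/s5))    # True: uniform-step (slant) ramp
```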

Properties of the Slant Transform

1. The Slant transform is real and orthogonal. Therefore,

S = S*,   S^{-1} = S^T    (5.131)

2. The Slant transform is a fast transform, which can be implemented in O(N log2 N) operations on an N × 1 vector.

3. It has very good to excellent energy compaction for images.

4. The basis vectors of the Slant transform matrix S are not sequency ordered for n ≥ 3. If S_{n−1} is sequency ordered, the sequency of the ith row of S_n is given as follows.

i = 0,                   sequency = 0
i = 1,                   sequency = 1
2 ≤ i ≤ N/2 − 1,         sequency = 2i        for i even
                                  = 2i + 1    for i odd
i = N/2,                 sequency = 2
i = N/2 + 1,             sequency = 3
N/2 + 2 ≤ i ≤ N − 1,     sequency = 2(i − N/2) + 1   for i even
                                  = 2(i − N/2)       for i odd

5.11 THE KL TRANSFORM

The KL transform was originally introduced as a series expansion for continuous random processes by Karhunen [27] and Loeve [28]. For random sequences Hotelling [26] first studied what was called a method of principal components, which is the discrete equivalent of the KL series expansion. Consequently, the KL transform is also called the Hotelling transform or the method of principal components.

For a real N × 1 random vector u, the basis vectors of the KL transform (see Section 2.9) are given by the orthonormalized eigenvectors of its autocorrelation matrix R, that is,

R φ_k = λ_k φ_k,   0 ≤ k ≤ N − 1    (5.132)

The KL transform of u is defined as

v = Φ*ᵀ u    (5.133)

and the inverse transform is

u = Φv = Σ_{k=0}^{N−1} v(k) φ_k    (5.134)

where φ_k is the kth column of Φ. From (2.44) we know Φ reduces R to its diagonal form, that is,

Φ*ᵀ R Φ = Λ = Diag{λ_k}    (5.135)

We often work with the covariance matrix rather than the autocorrelation matrix. With μ ≜ E[u], then

R_0 ≜ cov[u] ≜ E[(u − μ)(u − μ)ᵀ] = E[uuᵀ] − μμᵀ = R − μμᵀ    (5.136)

If the vector μ is known, then the eigenmatrix of R_0 determines the KL transform of the zero mean random process u − μ. In general, the KL transforms of u and u − μ need not be identical.

Note that whereas the image transforms considered earlier were functionally independent of the data, the KL transform depends on the (second-order) statistics of the data.

Example 5.4 (KL Transform of Markov-1 Sequences)

The covariance matrix of a zero mean Markov sequence of N elements is given by (2.68). Its eigenvalues λ_k and eigenvectors φ_k are given by

λ_k = (1 − ρ²) / (1 − 2ρ cos ω_k + ρ²)

φ_k(m) = φ(m, k) = [2/(N + λ_k)]^{1/2} sin[ω_k(m + 1 − (N + 1)/2) + (k + 1)π/2],   0 ≤ m, k ≤ N − 1    (5.137)

where the {ω_k} are the positive roots of the equation

tan(Nω) = −[(1 − ρ²) sin ω] / [(1 + ρ²) cos ω − 2ρ],   N even    (5.138)

A similar result holds when N is odd. This is a transcendental equation that gives rise to nonharmonic sinusoids φ_k(m). Figure 5.1 shows the basis vectors of this 8 × 8 KL transform for ρ = 0.95. Note the basis vectors of the KLT and the DCT are quite similar. Because the φ_k(m) are nonharmonic, a fast algorithm for this transform does not exist. Also note, the KL transform matrix is Φᵀ = {φ(k, m)}.

Example 5.5

Since the unitary DFT reduces any circulant matrix to a diagonal form, it is the KL transform of all random sequences with circulant autocorrelation matrices, that is, of all periodic random sequences.

The DCT is the KL transform of a random sequence whose autocorrelation matrix R commutes with Qc of (5.95) (that is, if RQc = QcR). Similarly, the DST is the KL transform of all random sequences whose autocorrelation matrices commute with Q of (5.102).

KL Transform of Images

If an N × N image u(m, n) is represented by a random field whose autocorrelation function is given by

E[u(m, n)u(m', n')] = r(m, n; m', n'),   0 ≤ m, m', n, n' ≤ N − 1    (5.139)

then the basis images of the KL transform are the orthonormalized eigenfunctions ψ_{k,l}(m, n) obtained by solving

Σ_{m'=0}^{N−1} Σ_{n'=0}^{N−1} r(m, n; m', n') ψ_{k,l}(m', n') = λ_{k,l} ψ_{k,l}(m, n),   0 ≤ k, l ≤ N − 1,  0 ≤ m, n ≤ N − 1    (5.140)

In matrix notation this can be written as

ℛψ_i = λ_i ψ_i,   i = 0, ..., N² − 1    (5.141)

where ψ_i is an N² × 1 vector representation of ψ_{k,l}(m, n) and ℛ is the N² × N² autocorrelation matrix of the image mapped into an N² × 1 vector u. Thus

ℛ = E[u u*ᵀ]    (5.142)

If ℛ is separable, then the N² × N² matrix Ψ whose columns are {ψ_i} becomes separable (see Table 2.7). For example, let

r(m, n; m', n') = r1(m, m') r2(n, n')    (5.143)




(5.144)

;. ...
,.,.... , ,
(5.145)
where
«I>R«I>*1'
/11 = Aj ,. 'J' = 1, 2 (5.146)
and the KL transform of Ib is
o- = 'Put< r Ie' = [«I>i r ® 4J! 1]1e' (5.147)
For row-ordered vectors this is equivalent to
J:$i2Y(l;" . (5.148)
and the inverse KL transform is
u = «1>1 V(lI
,- ~- . 'i
(5.149)
The"advantage in modelinf the Imas . function-iS-tbat
instead of solving the N x N 2 matrixcigenvalue pmblem of (5.141~ only two
N x N matrix eigenvalue.prohlems.ot (5. 146} need to be solved, Since an N x N
matrix eigenvalue problem requires O(N) computations, the reduction in dimen-
sionality achieved by the separable model is 0 (N 6YO
(N ) == o (N ) , which is very
J J

significant. Also, the transformation calculations of (5.148) and (5.149) require 2NJ
operations compared to N 4 operations required for '\(1'*1' lb. .
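A sketch of the separable KL transform (5.146)-(5.149) for a real, separable Markov covariance r1 = r2 = {ρ^|m−m'|} (NumPy assumed; `eigh` solves the N × N eigenvalue problem of (5.146)):

```python
import numpy as np

N, rho = 8, 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # (2.68)
lam, Phi = np.linalg.eigh(R)           # eigenvalues/eigenvectors of R, as in (5.146)
U = np.random.default_rng(4).standard_normal((N, N))
V = Phi.T @ U @ Phi                    # forward transform (5.148), Phi real here
U_back = Phi @ V @ Phi.T               # inverse transform (5.149)
print(np.allclose(U, U_back))          # True
```

Two N × N eigenvalue problems and two matrix products replace the N² × N² problem of (5.141).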
,

Example 5.6

Consider the separable covariance function for a zero mean random field

r(m, n; m', n') = ρ^{|m−m'|} ρ^{|n−n'|}    (5.150)

This gives ℛ = R ⊗ R, where R is given by (2.68). The eigenvectors of R are the φ_k given in Example 5.4. Hence Ψ = Φ ⊗ Φ and the KL transform matrix is Φᵀ ⊗ Φᵀ. Figure 5.2 shows the basis images of this 8 × 8 two-dimensional KL transform for ρ = 0.95.

Properties of the KL Transform

The KL transform has many desirable properties, which make it optimal in many signal processing applications. Some of these properties are discussed here. For simplicity we assume u has zero mean and a positive definite covariance matrix R.

Decorrelation. The KL transform coefficients {v(k), k = 0, ..., N − 1} are uncorrelated and have zero mean, that is,

E[v(k)] = 0,   E[v(k)v*(l)] = λ_k δ(k − l)    (5.151)

The proof follows directly from (5.133) and (5.135), since

E[v v*ᵀ] = Φ*ᵀ E[u uᵀ] Φ = Φ*ᵀ R Φ = Λ = diagonal    (5.152)



which implies the latter relation in (5.151). It should be noted that Φ is not the unique matrix with respect to this property. There could be many matrices (unitary and nonunitary) that would decorrelate the transformed sequence. For example, a lower triangular matrix L could be found that satisfies (5.152).

Example 5.7

The covariance matrix R of (2.68) is diagonalized by the lower triangular matrix

    |  1    0    ⋯    0 |
    | −ρ    1         0 |
L = |       ⋱    ⋱      |  ⇒  Lᵀ R L = D = Diag{1 − ρ², ..., 1 − ρ², 1}    (5.153)
    |  0    ⋯   −ρ    1 |

Hence the transformation v = Lᵀu will cause the sequence v(k) to be uncorrelated. Comparing with Example 5.4, we see that L ≠ Φ. Moreover, L is not unitary and the diagonal elements of D are not the eigenvalues of R.
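This decorrelating (but nonunitary) operator is easy to check numerically — a NumPy sketch of Example 5.7; v = Lᵀu here amounts to first-order prediction differences v(k) = u(k) − ρu(k + 1):

```python
import numpy as np

N, rho = 6, 0.9
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # (2.68)
L = np.eye(N) - rho * np.eye(N, k=-1)       # 1 on the diagonal, -rho below it
D = L.T @ R @ L
print(np.allclose(D, np.diag(np.diag(D))))  # True: v = L^T u is decorrelated
print(np.allclose(np.sort(np.diag(D)),
                  np.sort(np.linalg.eigvalsh(R))))   # False: not the eigenvalues of R
```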

Basis restriction mean square error. Consider the operations in Fig. 5.16. The vector u is first transformed to v. The elements of w are chosen to be the first m elements of v and zeros elsewhere. Finally, w is transformed to z. A and B are N × N matrices and I_m is a matrix with 1s along the first m diagonal terms and zeros elsewhere. Hence

w(k) = v(k),   0 ≤ k ≤ m − 1
     = 0,      k ≥ m              (5.154)

Therefore, whereas u and v are vectors in an N-dimensional vector space, w is a vector restricted to an m ≤ N-dimensional subspace. The average mean square error between the sequences u(n) and z(n) is defined as

J_m ≜ (1/N) E[ Σ_{n=0}^{N−1} |u(n) − z(n)|² ] = (1/N) Tr[E{(u − z)(u − z)*ᵀ}]    (5.155)

This quantity is called the basis restriction error. It is desired to find the matrices A and B such that J_m is minimized for each and every value of m ∈ [1, N]. This minimum is achieved by the KL transform of u.

Theorem 5.1. The error J_m in (5.155) is minimum when

A = Φ*ᵀ,   B = Φ,   AB = I    (5.156)

where the columns of Φ are arranged according to the decreasing order of the eigenvalues of R.

Proof. From Fig. 5.16, we have

v = Au,   w = I_m v,   z = Bw    (5.157)

u → [A, N × N] → v → [I_m, 1 ≤ m ≤ N] → w → [B, N × N] → z

Figure 5.16 KL transform basis restriction.




Using these we can rewrite (5.155) as

    J_m = (1/N) Tr[(I - B I_m A) R (I - B I_m A)*T]

To minimize J_m we first differentiate it with respect to the elements of A and set the
result to zero [see Problem 2.15 for the differentiation rules]. This gives

    I_m B*T (I - B I_m A) R = 0                                       (5.158)

which yields

    J_m = (1/N) Tr[(I - B I_m A) R]                                   (5.159)

    I_m B*T = I_m B*T B I_m A                                         (5.160)


At m = N, the minimum value of J_N must be zero, which requires

    I - BA = 0    or    B = A⁻¹                                       (5.161)

Using this in (5.160) and rearranging terms, we obtain

    I_m B*T B = I_m B*T B I_m,    1 ≤ m ≤ N                           (5.162)

For (5.162) to be true for every m, it is necessary that B*TB be diagonal. Since
B = A⁻¹, it is easy to see that (5.160) remains invariant if B is replaced by DB or
BD, where D is a diagonal matrix. Hence, without loss of generality we can
normalize B so that B*TB = I, that is, B is a unitary matrix. Therefore, A is also
unitary and B = A*T. This gives

    J_m = (1/N) Tr[R] - (1/N) Tr[I_m A R A*T]                         (5.163)

Since R is fixed, J_m is minimized if the quantity

    Ĵ_m ≜ Tr(I_m A R A*T) = Σ_{k=0}^{m-1} a_kᵀ R a_k*                 (5.164)

is maximized, where a_kᵀ is the kth row of A. Since A is unitary,

    a_kᵀ a_k* = 1                                                     (5.165)

To maximize Ĵ_m subject to (5.165), we form the Lagrangian

    Ĵ′_m = Σ_{k=0}^{m-1} a_kᵀ R a_k* + Σ_{k=0}^{m-1} λ_k (1 - a_kᵀ a_k*)      (5.166)

and differentiate it with respect to a_k. The result gives a necessary condition

    R a_k* = λ_k a_k*                                                 (5.167)

where the a_k* are orthonormalized eigenvectors of R. This yields

    Ĵ_m = Σ_{k=0}^{m-1} λ_k                                           (5.168)

which is maximized if {a_k*, 0 ≤ k ≤ m - 1} correspond to the largest m eigenvalues
of R. Because Ĵ_m must be maximized for every m, it is necessary to arrange
λ_0 ≥ λ_1 ≥ λ_2 ≥ ... ≥ λ_{N-1}. Then the a_kᵀ, the rows of A, are the conjugate transposes of the
eigenvectors of R, that is, A is the KL transform of u.
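The theorem can be checked numerically. In the sketch below (NumPy; the Markov covariance with N = 8 and ρ = 0.95 is an illustrative choice), the basis restriction error obtained with the KL pair A = Φ*T, B = Φ equals (1/N) times the sum of the discarded eigenvalues, for every m:

```python
import numpy as np

# Sketch: with the KL pair A = Phi^T, B = Phi, the basis restriction
# error J_m equals (1/N) * sum of the N - m smallest eigenvalues of R.
N, rho = 8, 0.95
idx = np.arange(N)
R = rho ** np.abs(np.subtract.outer(idx, idx))    # Markov covariance
lam, Phi = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]                     # decreasing eigenvalues
lam, Phi = lam[order], Phi[:, order]
A, B = Phi.T, Phi                                 # AB = I, cf. (5.156)

for m in range(1, N + 1):
    Im = np.diag((idx < m).astype(float))         # first m diagonal 1s
    T = np.eye(N) - B @ Im @ A                    # u - z = T u
    Jm = np.trace(T @ R @ T.T) / N                # basis restriction error
    assert np.isclose(Jm, lam[m:].sum() / N)
```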

Distribution of variances. Among all unitary transformations v = Au, the
KL transform Φ*T packs the maximum average energy into any m ≤ N samples
of v. Define

    σ_k² ≜ E[|v(k)|²],    σ_0² ≥ σ_1² ≥ ... ≥ σ_{N-1}²

    S_m(A) ≜ Σ_{k=0}^{m-1} σ_k²                                       (5.169)

Then for any fixed m ∈ [1, N],

    S_m(Φ*T) ≥ S_m(A)                                                 (5.170)


Proof. Note that

    S_m(A) = Σ_{k=0}^{m-1} (A R A*T)_{kk} = Tr(I_m A R A*T) = Ĵ_m

which, we know from the last property [see (5.164)], is maximized when A is the KL
transform. Since σ_k² = λ_k when A = Φ*T, from (5.168)

    Σ_{k=0}^{m-1} λ_k ≥ Σ_{k=0}^{m-1} σ_k²                            (5.171)
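A numerical illustration of (5.170) (a sketch; N = 16 and ρ = 0.95 are chosen to match Example 5.9) compares S_m for the KL transform against the cosine transform on a Markov covariance:

```python
import numpy as np

# Sketch of (5.170): the KL variances (eigenvalues of R) majorize the
# DCT coefficient variances for a stationary Markov covariance.
N, rho = 16, 0.95
idx = np.arange(N)
R = rho ** np.abs(np.subtract.outer(idx, idx))
lam = np.sort(np.linalg.eigvalsh(R))[::-1]            # KL variances

C = np.sqrt(2.0 / N) * np.cos((2 * idx[None, :] + 1)
                              * idx[:, None] * np.pi / (2 * N))
C[0, :] = np.sqrt(1.0 / N)                            # orthonormal DCT matrix
sig2 = np.sort(np.diag(C @ R @ C.T))[::-1]            # DCT variances, decreasing

for m in range(1, N + 1):
    assert lam[:m].sum() >= sig2[:m].sum() - 1e-9     # S_m(KL) >= S_m(DCT)
```

Both variance sets sum to Tr R = N, so the two curves meet at m = N; for every smaller m the KL partial sums dominate.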

Threshold representation. The KL transform also minimizes E[m], the
expected number of transform coefficients required so that their energy just
exceeds a prescribed threshold (see Problem 5.26 and [33]).
A fast KL transform. In applications of the KL transform to images, there
are dimensionality difficulties. The KL transform depends on the statistics as well as
the size of the image and, in general, the basis vectors are not known analytically.
After the transform matrix has been computed, the operations for performing the
transformation are quite large for images.
It has been shown that certain statistical image models yield a fast KL trans-
form algorithm as an alternative to the conventional KL transform for images. It is
based on a stochastic decomposition of an image as a sum of two random sequences.
The first random sequence is such that its KL transform is a fast transform and the
second sequence, called the boundary response, depends only on information at the
boundary points of the image. For details see Sections 6.5 and 6.9.

The rate-distortion function. Suppose a random vector u is unitarily
transformed to v and transmitted over a communication channel (Fig. 5.17). Let v′ and u′



Figure 5.17 Unitary transform data transmission: u → A (unitary) → v → channel → v′ → A*T → u′. Each element of v is coded independently.

be the reproduced values of v and u, respectively. Further, assume that u, v, v′, and
u′ are Gaussian. The average distortion in u is

    D = (1/N) E[(u - u′)*T (u - u′)]                                  (5.172)

Since A is unitary and u = A*T v and u′ = A*T v′, we have

    D = (1/N) E[(v - v′)*T A A*T (v - v′)]
      = (1/N) E[(v - v′)*T (v - v′)] = (1/N) E[δv*T δv]               (5.173)

where δv ≜ v - v′ represents the error in the reproduction of v. From the preceding,


D is invariant under all unitary transformations. The rate-distortion function is now
obtained, following Section 2.13, as

    R = (1/N) Σ_{k=0}^{N-1} max[0, ½ log₂(σ_k²/θ)]                    (5.174)

    D = (1/N) Σ_{k=0}^{N-1} min[θ, σ_k²]                              (5.175)

where

    σ_k² = E[|v(k)|²] = [A R A*T]_{kk}                                (5.176)

depend on the transform A. Therefore, the rate

    R = R(A)                                                          (5.177)

also depends on A. For each fixed D, the KL transform achieves the minimum rate
among all unitary transforms, that is,

    R(Φ*T) ≤ R(A)                                                     (5.178)

This property is discussed further in Chapter 11, on transform coding.

Example 5.8

Consider a 2 × 1 vector u whose covariance matrix is

    R = [ 1  ρ ]
        [ ρ  1 ] ,    |ρ| < 1

The KL transform of u is

    Φ = (1/√2) [ 1   1 ]
               [ 1  -1 ]

The transformation v = Φu gives

    E[|v(0)|²] = λ_0 = 1 + ρ,    E[|v(1)|²] = λ_1 = 1 - ρ

    R(Φ) = ½ [ max(0, ½ log₂((1 + ρ)/θ)) + max(0, ½ log₂((1 - ρ)/θ)) ]

Compare this with the case when A = I (that is, u is transmitted directly), which gives
σ_0² = σ_1² = 1, and

    R(I) = ½ log₂(1/θ),    0 < θ < 1

Suppose we let θ be small, say θ < 1 - |ρ|. Then it is easy to show that

    R(Φ) < R(I)

This means that for a fixed level of distortion, the number of bits required to transmit the
KLT sequence is less than the number required for transmission of the original
sequence.
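A quick numerical check of this example (a sketch; ρ = 0.9 and θ = 0.05 are illustrative values satisfying θ < 1 − |ρ|):

```python
import numpy as np

# Sketch of Example 5.8: for theta < 1 - |rho|, R(Phi) < R(I).
def rate(variances, theta):
    # (5.174): R = (1/N) * sum_k max[0, 0.5 * log2(sigma_k^2 / theta)]
    v = np.asarray(variances, dtype=float)
    return np.maximum(0.0, 0.5 * np.log2(v / theta)).mean()

rho, theta = 0.9, 0.05
R_kl = rate([1 + rho, 1 - rho], theta)   # KL variances: 1 + rho, 1 - rho
R_id = rate([1.0, 1.0], theta)           # identity transform: unit variances
assert R_kl < R_id
```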

Figure 5.18 Distribution of variances of the transform coefficients (in decreasing order) of a stationary Markov sequence with N = 16, ρ = 0.95 (see Example 5.9). [Plot: variances versus index k on a logarithmic scale for the sine, slant, cosine, and KL transforms; the cosine and KL curves coincide.]



where σ_k² have been arranged in decreasing order.
Figure 5.19 shows J_m versus m for the various transforms. It is seen that the
cosine transform performance is indistinguishable from that of the KL transform for
ρ = 0.95. In general it seems possible to find a fast sinusoidal transform (that is, a
transform consisting of sine or cosine functions) as a good substitute for the KL
transform for different values of ρ as well as for higher-order stationary random
sequences (see Section 5.12).
The mean square performance of the various transforms also depends on the
dimension N of the transform. Such comparisons are made in Section 5.12.


Example 5.10 (Performance of Transforms on Images)
The mean square error test of the last example can be extended to actual images.
Consider an N × N image u(m, n) from which its mean is subtracted out to make it zero
mean. The transform coefficient variances are estimated as

    σ²(k, l) = E[|v(k, l)|²] ≈ |v(k, l)|²






Figure 5.19 Performance of different unitary transforms with respect to basis restriction errors (J_m) versus the number of basis vectors (m) for a stationary Markov sequence with N = 16, ρ = 0.95. [Plot: J_m versus m for the sine, DFT, slant, cosine, and KL transforms; the cosine and KL curves coincide.]

,

Figure 5.20 Zonal filters for 2:1, 4:1, 8:1, 16:1 sample reduction; white areas are passbands, dark areas are stopbands.




The image transform is filtered by a zonal mask (Fig. 5.20) such that only a fraction of
the transform coefficients are retained and the remaining ones are set to zero. Define
the normalized mean square error

    J = ( Σ_{(k,l) ∈ stopband} σ²(k, l) ) / ( Σ_{k,l} σ²(k, l) ) = energy in stopband / total energy

Figure 5.21 shows an original image and the image obtained after cosine transform
zonal filtering to achieve various sample reduction ratios. Figure 5.22 shows the zonal
filtered images for different transforms at a 4:1 sample reduction ratio. Figure 5.23
shows the mean square error versus sample reduction ratio for different transforms.
Again we find the cosine transform to have the best performance.
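The zonal filtering operation itself is simple to sketch. The code below (NumPy; the synthetic test image and the triangular low-frequency zone are illustrative choices, not the images used in the figures) also confirms that, by Parseval's relation, the normalized mean square error equals the fraction of energy in the stopband:

```python
import numpy as np

# Zonal filtering sketch: keep low-index 2-D DCT coefficients, zero the
# rest; the reconstruction error equals the stopband energy (Parseval).
N = 16
idx = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2 * idx[None, :] + 1)
                              * idx[:, None] * np.pi / (2 * N))
C[0, :] = np.sqrt(1.0 / N)                     # orthonormal DCT matrix

rng = np.random.default_rng(0)
U = rng.standard_normal((N, N)).cumsum(0).cumsum(1)   # smooth test image
U -= U.mean()                                  # zero mean, as in the example
V = C @ U @ C.T                                # 2-D cosine transform
mask = np.add.outer(idx, idx) < N // 2         # triangular passband zone
Uz = C.T @ (V * mask) @ C                      # zonal filtered image

err = ((U - Uz) ** 2).sum() / (U ** 2).sum()   # normalized mean square error
stop = (V[~mask] ** 2).sum() / (V ** 2).sum()  # stopband / total energy
assert np.isclose(err, stop)
```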

Figure 5.21 Basis restriction zonal filtered images in cosine transform domain: (a) original; (b) 4:1 sample reduction; (c) 8:1 sample reduction; (d) 16:1 sample reduction.


Figure 5.22 Basis restriction zonal filtering using different transforms with 4:1 sample reduction: (a) cosine; (b) sine; (e) Haar; (f) slant.


Figure 5.23 Performance comparison of different transforms with respect to basis restriction zonal filtering for 256 × 256 images. [Plot: normalized mean square error versus sample reduction ratio (16, 8, 4, 2) for the DFT, sine, Haar, slant, and cosine transforms; the cosine transform gives the lowest error.]

5.12 A SINUSOIDAL FAMILY OF UNITARY TRANSFORMS

This is a class of complete orthonormal sets of eigenvectors generated by the
parametric family of matrices whose structure is similar to that of R⁻¹ [see (5.96)]:

    J = J(k₁, k₂, k₃) =

        [ 1-k₁α   -α    0    ⋯     k₃α  ]
        [  -α      1   -α               ]
        [          ⋱    ⋱    ⋱          ]                             (5.180)
        [               -α    1    -α   ]
        [  k₃α    ⋯          -α   1-k₂α ]




It can be shown that the basis vectors of the previously discussed cosine, sine, and discrete
Fourier transforms are the eigenvectors of J(1, 1, 0), J(0, 0, 0), and J(1, 1, -1),
respectively. In fact, several other fast transforms whose basis vectors are sinusoids
can be generated for different combinations of k₁, k₂, and k₃. For example, for
0 ≤ m, k ≤ N - 1, we obtain the following transforms:

(5.182)

(5.183)

Approximation to the KL Transform

The J matrices play a useful role in performance evaluation of the sinusoidal trans-
forms. For example, two sinusoidal transforms can be compared with the KL
transform by comparing the corresponding J-matrix distances

    Δ(k₁, k₂, k₃) ≜ ||J(k₁, k₂, k₃) - J(ρ, ρ, 0)||²                   (5.184)

This measure can also explain the close performance of the DCT and the
KLT. Further, it can be shown that the DCT performs better than the sine trans-
form for 0.5 ≤ ρ ≤ 1 and the sine transform performs better than the cosine for other
values of ρ. The J matrices are also useful in finding a fast sinusoidal transform
approximation to the KL transform of an arbitrary random sequence whose co-
variance matrix is R. If R commutes with a J matrix, that is, RJ = JR, then they will
have an identical set of eigenvectors. The best fast sinusoidal transform may be
chosen as the one whose corresponding J matrix minimizes the commuting distance
||RJ - JR||². Other uses of the J matrices are (1) finding fast algorithms for inversion
of banded Toeplitz matrices, (2) efficient calculation of transform coefficient vari-
ances, which are needed in transform domain processing algorithms, and (3) estab-
lishing certain useful asymptotic properties of these transforms. For details see [34].
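The J matrices are straightforward to construct. The sketch below (NumPy; N = 8 and α = 0.35 are arbitrary) builds J(k₁, k₂, k₃) following (5.180) and verifies the sine transform case: the DST vectors diagonalize J(0, 0, 0), with eigenvalues 1 − 2α cos((k + 1)π/(N + 1)):

```python
import numpy as np

# Sketch of (5.180): construct J(k1, k2, k3) and verify that the DST
# basis diagonalizes J(0,0,0); N = 8 and alpha = 0.35 are arbitrary.
def J(k1, k2, k3, N, alpha):
    M = np.eye(N) - alpha * (np.eye(N, k=1) + np.eye(N, k=-1))
    M[0, 0] = 1 - k1 * alpha
    M[-1, -1] = 1 - k2 * alpha
    M[0, -1] = M[-1, 0] = k3 * alpha
    return M

N, alpha = 8, 0.35
idx = np.arange(N)
# DST basis: S[k, n] = sqrt(2/(N+1)) * sin((k+1)(n+1)pi/(N+1))
S = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(idx + 1, idx + 1)
                                    * np.pi / (N + 1))
D = S @ J(0, 0, 0, N, alpha) @ S.T             # should be diagonal
assert np.allclose(D, np.diag(np.diag(D)), atol=1e-12)
assert np.allclose(np.sort(np.diag(D)),
                   np.sort(1 - 2 * alpha * np.cos((idx + 1) * np.pi / (N + 1))))
```

The commuting distance ||RJ − JR||² mentioned above can then be evaluated directly, e.g. with np.linalg.norm(R @ J(...) - J(...) @ R), for each candidate (k₁, k₂, k₃).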

5.13 OUTER PRODUCT EXPANSION
AND SINGULAR VALUE DECOMPOSITION

In the foregoing transform theory, we considered an N × M image U to be a vector
in an NM-dimensional vector space. However, it is possible to represent any such
image in an r-dimensional subspace, where r is the rank of the matrix U.
Let the image be real and M ≤ N. The matrices UUᵀ and UᵀU are non-
negative, symmetric, and have the identical eigenvalues {λ_m}. Since M ≤ N, there
are at most r ≤ M nonzero eigenvalues. It is possible to find r orthogonal M × 1

,
eigenvectors {φ_m} of UᵀU and r orthogonal N × 1 eigenvectors {ψ_m} of UUᵀ, that is,

    UᵀU φ_m = λ_m φ_m,    m = 1, ..., r                               (5.185)

    UUᵀ ψ_m = λ_m ψ_m,    m = 1, ..., r                               (5.186)

The matrix U has the representation

    U = Ψ Λ^{1/2} Φᵀ                                                  (5.187)

      = Σ_{m=1}^{r} √λ_m ψ_m φ_mᵀ                                     (5.188)

where Ψ and Φ are N × r and M × r matrices whose mth columns are the vectors
ψ_m and φ_m, respectively, and Λ^{1/2} is an r × r diagonal matrix, defined as

    Λ^{1/2} ≜ diag{√λ₁, √λ₂, ..., √λ_r}                               (5.189)

Equation (5.188) is called the spectral representation, the outer product expansion,
or the singular value decomposition (SVD) of U. The nonzero eigenvalues (of
UᵀU), λ_m, are also called the singular values of U. If r < M, then the image contain-
ing NM samples can be represented by (M + N)r samples of the vectors
{λ_m^{1/4} ψ_m, λ_m^{1/4} φ_m; m = 1, ..., r}.
Since Ψ and Φ have orthogonal columns, from (5.187) the SVD transform of
the image U is defined as

    Λ^{1/2} = Ψᵀ U Φ                                                  (5.190)

which is a separable transform that diagonalizes the given image. The proof of
(5.188) is outlined in Problem 5.31.
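The construction (5.185)-(5.191) can be sketched directly (NumPy; the matrix is the one used in Example 5.11 below):

```python
import numpy as np

# Sketch of (5.185)-(5.191): build Phi, Psi, Lambda^{1/2} from the
# eigendecomposition of U^T U and verify U = Psi Lambda^{1/2} Phi^T.
U = np.array([[1.0, 2], [2, 1], [1, 3]])      # M = 2 <= N = 3
lam, Phi = np.linalg.eigh(U.T @ U)            # (5.185)
order = np.argsort(lam)[::-1]                 # decreasing eigenvalues
lam, Phi = lam[order], Phi[:, order]
r = int(np.sum(lam > 1e-12))                  # rank of U
lam, Phi = lam[:r], Phi[:, :r]
Psi = U @ Phi / np.sqrt(lam)                  # (5.191)

assert np.allclose(U, Psi @ np.diag(np.sqrt(lam)) @ Phi.T)   # (5.187)
assert np.allclose(Psi.T @ U @ Phi, np.diag(np.sqrt(lam)))   # (5.190)
```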

Properties of the SVD Transform

1. Once φ_m, m = 1, ..., r are known, the eigenvectors ψ_m can be determined as

    ψ_m = (1/√λ_m) U φ_m,    m = 1, ..., r                            (5.191)

It can be shown that ψ_m are orthonormal eigenvectors of UUᵀ if φ_m are the
orthonormal eigenvectors of UᵀU.

2. The SVD transform as defined by (5.190) is not a unitary transform. This is
because Ψ and Φ are rectangular matrices. However, we can include in Φ and
Ψ additional orthogonal eigenvectors φ_m and ψ_m, which satisfy Uφ_m = 0,
m = r + 1, ..., M and Uᵀψ_m = 0, m = r + 1, ..., N, such that these matrices
are unitary and the unitary SVD transform is

    Ψᵀ U Φ = [ Λ^{1/2}  0 ]
             [    0     0 ]                                           (5.192)

3. The image U_k, generated by the partial sum

    U_k ≜ Σ_{m=1}^{k} √λ_m ψ_m φ_mᵀ,    k ≤ r                         (5.193)

is the best least squares rank-k approximation of U if the λ_m are in decreasing
order of magnitude. For any k ≤ r, the least squares error

    ε_k² = Σ_{m=1}^{M} Σ_{n=1}^{N} |u(m, n) - u_k(m, n)|²,    k = 1, 2, ..., r     (5.194)

reduces to

    ε_k² = Σ_{m=k+1}^{r} λ_m                                          (5.195)
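Property 3 is easy to confirm numerically (a sketch; the random 6 × 4 matrix is an arbitrary test case, and np.linalg.svd returns the singular values s_m = √λ_m):

```python
import numpy as np

# Sketch of (5.193)-(5.195): the squared error of the rank-k partial sum
# equals the sum of the discarded eigenvalues lambda_m = s_m^2.
rng = np.random.default_rng(1)
U = rng.standard_normal((6, 4))
Psi, s, PhiT = np.linalg.svd(U, full_matrices=False)
for k in range(1, s.size + 1):
    Uk = (Psi[:, :k] * s[:k]) @ PhiT[:k]      # partial sum (5.193)
    err = ((U - Uk) ** 2).sum()               # least squares error (5.194)
    assert np.isclose(err, (s[k:] ** 2).sum())   # equals (5.195)
```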

Let L ≜ NM. Note that we can always write a two-dimensional unitary transform
representation as an outer product expansion in an L-dimensional space, namely,

    U = Σ_{l=1}^{L} w_l a_l b_lᵀ                                      (5.196)

where w_l are scalars and a_l and b_l are sequences of orthogonal basis vectors of
dimensions N × 1 and M × 1, respectively. The least squares error between U and
any partial sum

    Û_k = Σ_{l=1}^{k} w_l a_l b_lᵀ                                    (5.197)

is minimized for any k ∈ [1, L] when the above expansion coincides with (5.193),
that is, when Û_k = U_k.
This means the energy concentrated in the transform coefficients w_l, l = 1, ..., k
is maximized by the SVD transform for the given image. Recall that the KL trans-
form maximizes the average energy in a given number of transform coefficients, the
average being taken over the ensemble for which the autocorrelation function is
defined. Hence, on an image-to-image basis, the SVD transform will concentrate
more energy in the same number of coefficients. But the SVD has to be calculated
for each image. On the other hand, the KL transform needs to be calculated only
once for the whole image ensemble. Therefore, while one may be able to find a
reasonable fast transform approximation of the KL transform, no such fast trans-
form substitute for the SVD is expected to exist.

Although applicable in image restoration and image data compression prob-
lems, the usefulness of the SVD in such image processing problems is severely limited
because of the large computational effort required for calculating the eigenvalues and
eigenvectors of large image matrices. However, the SVD is a fundamental result in
matrix theory that is useful in finding the generalized inverse of singular matrices
and in the analysis of several image processing problems.

Example 5.11

Let

    U = [ 1  2 ]
        [ 2  1 ]
        [ 1  3 ]

The eigenvalues of UᵀU are found to be λ₁ = 18.06, λ₂ = 1.94, which give r = 2, and the
SVD transform of U is

    Λ^{1/2} = [ 4.25   0   ]
              [  0    1.39 ]

The eigenvectors are found to be

    φ₁ = [ 0.5019 ] ,    φ₂ = [  0.8649 ]
         [ 0.8649 ]           [ -0.5019 ]

(continued on page 180)
TABLE 5.3 Summary of Image Transforms

DFT/unitary DFT   Fast transform, most useful in digital signal processing, convolution,
                  digital filtering, analysis of circulant and Toeplitz systems. Requires
                  complex arithmetic. Has very good energy compaction for images.
Cosine            Fast transform, requires real operations, near optimal substitute for
                  the KL transform of highly correlated images. Useful in designing
                  transform coders and Wiener filters for images. Has excellent
                  energy compaction for images.
Sine              About twice as fast as the fast cosine transform, symmetric, requires
                  real operations; yields fast KL transform algorithm which yields
                  recursive block processing algorithms, for coding, filtering, and so
                  on; useful in estimating performance bounds of many image
                  processing problems. Energy compaction for images is very good.
Hadamard          Faster than sinusoidal transforms, since no multiplications are
                  required; useful in digital hardware implementations of image
                  processing algorithms. Easy to simulate but difficult to analyze.
                  Applications in image data compression, filtering, and design of
                  codes. Has good energy compaction for images.
Haar              Very fast transform. Useful in feature extraction, image coding, and
                  image analysis problems. Energy compaction is fair.
Slant             Fast transform. Has "image-like basis"; useful in image coding. Has
                  very good energy compaction for images.
Karhunen-Loeve    Is optimal in many ways; has no fast algorithm; useful in performance
                  evaluation and for finding performance bounds. Useful for small
                  size vectors, e.g., color multispectral or other feature vectors. Has
                  the best energy compaction in the mean square sense over an
                  ensemble.
Fast KL           Useful for designing fast, recursive-block processing techniques,
                  including adaptive techniques. Its performance is better than
                  independent block-by-block processing techniques.
Sinusoidal        Many members have fast implementation. Useful in finding practical
transforms        substitutes for the KL transform, analysis of Toeplitz systems,
                  mathematical modeling of signals. Energy compaction for the
                  optimum-fast transform is excellent.
SVD transform     Best energy-packing efficiency for any given image. Varies drastically
                  from image to image; has no fast algorithm or a reasonable fast
                  transform substitute; useful in design of separable FIR filters,
                  finding least squares and minimum norm solutions of linear
                  equations, finding rank of large matrices, and so on. Potential
                  image processing applications are in image restoration, power
                  spectrum estimation, and data compression.

From the above, ψ₁ is obtained via (5.191) to yield

    U₁ = √λ₁ ψ₁ φ₁ᵀ = [ 1.120  1.930 ]
                      [ 0.938  1.616 ]
                      [ 1.554  2.678 ]

as the best least squares rank-1 approximation of U. Let us compare this with the two-
dimensional cosine transform of U, which is given by

    V = C₃ U C₂ᵀ,    C₃ = (1/√6) [ √2  √2   √2 ] ,    C₂ = (1/√2) [ 1   1 ]
                                 [ √3   0  -√3 ]                  [ 1  -1 ]
                                 [  1  -2    1 ]

giving

    V = (1/√12) [ 10√2  -2√2 ]
                [  -√3    √3 ]
                [    1    -5 ]

It is easy to see that Σ_{k,l} v²(k, l) = λ₁ + λ₂. The energy concentrated in the first K
samples of the SVD, Σ_{m=1}^{K} λ_m, K = 1, 2, is greater than the energy concentrated in any K
samples of the cosine transform coefficients (show).
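The numbers in this example check out in a few lines (a sketch in NumPy):

```python
import numpy as np

# Numeric check of Example 5.11: eigenvalues of U^T U, total energy, and
# the SVD-versus-cosine energy packing comparison.
U = np.array([[1.0, 2], [2, 1], [1, 3]])
lam = np.sort(np.linalg.eigvalsh(U.T @ U))[::-1]
assert np.allclose(lam, [18.06, 1.94], atol=0.01)

def dct(N):                                   # orthonormal DCT-II matrix
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1)
                                  * n[:, None] * np.pi / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

V = dct(3) @ U @ dct(2).T                     # 2-D cosine transform of U
assert np.isclose((V ** 2).sum(), lam.sum())  # both equal ||U||^2 = 20
v2 = np.sort((V ** 2).ravel())[::-1]
for K in (1, 2):
    assert lam[:K].sum() >= v2[:K].sum()      # SVD packs more energy
```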

5.14 SUMMARY

In this chapter we have studied the theory of unitary transforms and their proper-
ties. Several unitary transforms (DFT, cosine, sine, Hadamard, Haar, slant, KL,
the sinusoidal family, fast KL, and SVD) were discussed. Table 5.3 summarizes the
various transforms and their applications.

PROBLEMS

5.1 For given P, Q show that the error σ_e² of (5.8) is minimized when the series coefficients
v(k, l) are given by (5.3). Also show that the basis images must form a complete set for
σ_e² to be zero for P = Q = N.
5.2 (Fast transforms and Kronecker separability) From (5.23) we see that the number of
operations in implementing the matrix-vector product is reduced from O(N⁴) to
O(N³) if A is a Kronecker product. Apply this idea inductively to show that if A is
M × M and

    A = A₁ ⊗ A₂ ⊗ ... ⊗ A_m

where A_k is n_k × n_k, M = Π_{k=1}^{m} n_k, then the transformation of (5.23) can be imple-
mented in O(M Σ_{k=1}^{m} n_k) operations, which equals nM log_n M if n_k = n. Many fast algorithms for
unitary matrices can be given this interpretation, which was suggested by Good [9].
Transforms possessing this property are sometimes called Good transforms.
5.3 For the 2 × 2 transform A and the image U

    A = ½ [ √3   1 ] ,    U = [ 2  3 ]
          [ -1  √3 ]          [ 1  2 ]

calculate the transformed image V and the basis images.
180 Image Transforms Chap. $,
5.4 Consider the vector x and an orthogonal transform A

    x = [ x₀ ] ,    A = [  cos θ  sin θ ]
        [ x₁ ]          [ -sin θ  cos θ ]

Let a₀ and a₁ denote the columns of Aᵀ (that is, the basis vectors of A). The trans-
formation y = Ax can be written as y₀ = a₀ᵀx, y₁ = a₁ᵀx. Represent the vector x in
Cartesian coordinates on a plane. Show that the transform A is a rotation of the
coordinates by θ and y₀ and y₁ are the projections of x in the new coordinate system
(see Fig. P5.4).
5.5 Prove that the magnitude of the determinant of a unitary transform is unity. Also show
that all the eigenvalues of a unitary matrix have unity magnitude.
5.6 Show that the entropy of an N × 1 Gaussian random vector u with mean μ and
covariance R_u, given by

    H(u) = (N/2) log₂(2πe |R_u|^{1/N})

is invariant under any unitary transformation.


5.7 Consider the zero mean random vector u with covariance R_u discussed in Example 5.2.
From the class of unitary transforms

    A = [  cos θ  sin θ ]
        [ -sin θ  cos θ ]

determine the value of θ for which (a) the average energy compressed in v(0) is
maximum and (b) the components of v are uncorrelated.
5.8 Prove the two-dimensional energy conservation relation of (5.28).
5.9 (DFT and circulant matrices)
a. Show that (5.60) follows directly from (5.56) if φ_k is chosen as the kth column of the
unitary DFT F. Now write (5.58) as a circulant matrix operation x₂ = Hx₁. Take the
unitary DFT of both sides and apply (5.60) to prove the circular convolution
theorem, that is, (5.59).
b. Using (5.60) show that the inverse of an N × N circulant matrix can be obtained in
O(N log N) operations via the FFT by calculating the elements of its first column.
c. Show that the N × 1 vector x₂ = Tx₁, where T is N × N Toeplitz but not circulant,
can be evaluated in O(N log N) operations via the FFT.
5.10 Show that the N² complex elements v(k, l) of the unitary DFT of a real sequence
{u(m, n), 0 ≤ m, n ≤ N - 1} can be determined from the knowledge of the partial
sequence

    {v(k, 0), 0 ≤ k < N/2},    {v(k, N/2), 0 ≤ k < N/2},

    {v(k, l), 0 ≤ k ≤ N - 1, 1 ≤ l ≤ N/2 - 1}    (N even)

which contains only N² nonzero real elements, in general.


5.11 a. Find the eigenvalues of the 2 × 2 doubly block circulant matrix

    H = [ 1  2  3  4 ]
        [ 2  1  4  3 ]
        [ 3  4  1  2 ]
        [ 4  3  2  1 ]

b. Write their convolution x₃(m, n) = x₂(m, n) ⊛ x₁(m, n) as a doubly block circulant
matrix operating on a vector of size 16 and calculate the result. Verify your result by
performing the convolution directly.
5.12 Show that if an image {u(m, n), 0 ≤ m, n ≤ N - 1} is multiplied by the checkerboard
pattern (-1)^{m+n}, then its unitary DFT is centered at (N/2, N/2). If the unitary DFT of
u(m, n) has its region of support as shown in Fig. P5.12, what would be the region of
support of the unitary DFT of (-1)^{m+n} u(m, n)? Figure 5.6 shows the magnitudes of
the unitary DFTs of an image u(m, n) and the image (-1)^{m+n} u(m, n). This method
can be used for computing the unitary DFT whose origin is at the center of the image
matrix. The frequency increases as one moves away from the origin.

Figure P5.12 [Sketch: a region of support R in the (k, l) plane with axes running from 0 to N - 1 and center N/2.]



5.13 Show that the real and imaginary parts of the unitary DFT matrix are not orthogonal
matrices in general.
5.14 Show that the N × N cosine transform matrix C is orthogonal. Verify your proof for
the case N = 4.
5.15 Show that the N × N sine transform is orthogonal and is the eigenmatrix of Q given by
(5.102). Verify your proof for the case N = 3.
5.16 Show that the cosine and sine transforms of an N × 1 sequence {u(0), ..., u(N - 1)}
can be calculated from the DFTs of the 2N × 1 symmetrically extended sequence
{u(N - 1), u(N - 2), ..., u(1), u(0), u(0), u(1), ..., u(N - 1)} and of the
(2N + 2) × 1 antisymmetrically extended sequence {0, -u(N - 1), ..., -u(1), -u(0),
0, u(0), u(1), ..., u(N - 1)}, respectively.
5.17 Suppose an N × N image U is mapped into a row-ordered N² × 1 vector u. Show
that the N² × N² one-dimensional Hadamard transform of u gives the N × N two-
dimensional Hadamard transform of U. Is this true for the other transforms discussed
in the text? Give reasons.


5.18 Using the Kronecker product recursion (5.104), prove that a 2ⁿ × 2ⁿ Hadamard trans-
form is orthogonal.
5.19 Calculate and plot the energy packed in the first 1, 2, 4, 8, 16 sequency ordered
samples of the Hadamard transform of a 16 × 1 vector whose autocorrelations are
r(k) = (0.95)^k.
5.20 Prove that an N × N Haar transform matrix is orthogonal and can be implemented in
O(N) operations on an N × 1 vector.
5.21 Using the recursive formula for generating the slant transforms, prove that these
matrices are orthogonal and fast.
5.22 If the KL transform of a zero mean N × 1 vector u is Φ, then show that the KL
transform of the sequence

    u′(n) = u(n) + μ₀,    0 ≤ n ≤ N - 1

where μ₀ is a constant, remains the same only if the vector 1 ≜ (1, 1, ..., 1)ᵀ is an
eigenvector of the covariance matrix of u. Which of the fast transforms discussed in the
text satisfy this property?
5.23 If u₁ and u₂ are random vectors whose autocorrelation matrices commute, then show
that they have a common KL transform. Hence, show that the KL transforms for
autocorrelation matrices R, R⁻¹, and f(R), where f(·) is an arbitrary function, are
identical. What are the corresponding eigenvalues?
5.24* The autocorrelation array of a 4 × 1 zero mean vector u is given by {0.95^{|m-n|},
0 ≤ m, n ≤ 3}.
a. What is the KL transform of u?
b. Compare the basis vectors of the KL transform with the basis vectors of the 4 × 4
unitary DFT, DCT, DST, Hadamard, Haar, and slant transforms.
c. Compare the performance of the various transforms by plotting the basis restriction
error J_m versus m.
5.25* The autocorrelation function of a zero mean random field is given by (5.150), where
ρ = 0.95. A 16 × 16 segment of this random field is unitarily transformed.
a. What is the maximum energy concentrated in 16, 32, 64, and 128 transform coeffi-
cients for each of the seven transforms: KL, cosine, sine, unitary DFT, Hadamard,
Haar, and slant?
b. Compare the performance of these transforms for this random field by plotting
the mean square error for sample reduction ratios of 2, 4, 8, and 16. (Hint: Use
Table 5.2.)

5.26 (Threshold representation) Referring to Fig. 5.16, where u(n) is a Gaussian random
sequence, the quantity

    C_m = (1/N) Σ_{n=0}^{N-1} |u(n) - z(n)|² = (1/N) Σ_{n=0}^{N-1} |u(n) - Σ_{j=0}^{m-1} b(n, j)v(j)|²

is a random variable with respect to m. Let m be such that

    C_{m-1} > ε,    C_m ≤ ε    for any fixed ε > 0

If A and B are restricted to be unitary transforms, then show that E[m] is minimized
when A = Φ*T, B = A⁻¹, where Φ*T is the KL transform of u(n). For details see [33].

5.27 (Minimum entropy property of the KL transform) [30] Define an entropy in the
A-transform domain as

    H[A] ≜ -(1/N) Σ_{k=0}^{N-1} σ_k² log σ_k²

where the σ_k² are the variances of the transformed variables v(k). Show that among all
unitary transforms the KL transform minimizes this entropy, that is, H[Φ*T] ≤ H[A].
5.28 a. Write the N × N covariance matrix R defined in (2.68) as

    β² R⁻¹ = J(k₁, k₂, k₃) - ΔJ

where ΔJ is a sparse N × N matrix with nonzero terms at the four corners. Show
that the above relation yields

    R = β² J⁻¹ + β² J⁻¹ (ΔJ) J⁻¹ + J⁻¹ (ΔR) J⁻¹

where ΔR ≜ ΔJ R ΔJ is also a sparse matrix, which has at most four (corner) non-
zero terms. If Φ diagonalizes J, then show that the variances of the transform
coefficients are given by

    σ_k² ≜ [Φ*T R Φ]_{kk} = β²/λ_k + (β²/λ_k²)[Φ*T (ΔJ) Φ]_{kk} + (1/λ_k²)[Φ*T (ΔR) Φ]_{kk}     (P5.28-1)

where the λ_k are the eigenvalues of J. Now verify the formulas

    σ_k²(DCT) = β²/λ_k - (4(1 - ρ)²α² / (ρNλ_k²)) [1 - (-1)^k ρ^N] [cos²(πk/2N) - ½δ(k)]     (P5.28-2)

where λ_k = 1 - 2α cos(πk/N),

    σ_k²(DST) = β²/λ_k + (4α² / ((N + 1)λ_k²)) [1 + (-1)^k ρ^{N+1}] sin²((k + 1)π/(N + 1))     (P5.28-3)

    0 ≤ k ≤ N - 1

where λ_k = 1 - 2α cos((k + 1)π/(N + 1)), and

    σ_k²(DFT) = β²/λ_k - (2(1 - ρ^N)α / (Nλ_k²)) [cos(2πk/N) - 2α]     (P5.28-4)

where λ_k = 1 - 2α cos(2πk/N), 0 ≤ k ≤ N - 1.





b. Using the formulas P5.28-2 through P5.28-4 and (5.120), calculate the fraction of energy
packed in N/2 transform coefficients arranged in decreasing order by the cosine,
sine, unitary DFT, and Hadamard transforms for N = 4, 16, 64, 256, 1024, and
4096 for a stationary Markov sequence whose autocorrelation matrix is given by
R = {ρ^{|m-n|}}, ρ = 0.95.
5.29 a. For an arbitrary real stationary sequence, its autocorrelation matrix, R ≜ {r(m - n)},
is Toeplitz. Show that the A-transform coefficient variances, denoted by σ_k²(A), can be
obtained in O(N log N) operations via the formulas

    σ_k²(F) = (1/N) Σ_{n=-N+1}^{N-1} (N - |n|) r(n) W_N^{nk},    F = unitary DFT

    σ_k²(DST) = r(0) + (1/(N + 1)) [2a(k) + b(k) cot(π(k + 1)/(N + 1))]

    a(k) + jb(k) ≜ Σ_{n=1}^{N-1} [r(n) + r(-n)] exp(jπn(k + 1)/(N + 1))

where 0 ≤ k ≤ N - 1. Find a similar expression for the DCT.

b. In two dimensions, for stationary random fields, (5.36) implies we have to evaluate

    σ²_{k,l}(A) = Σ_m Σ_{m′} Σ_n Σ_{n′} a(k, m) a*(k, m′) r(m - m′, n - n′) a(l, n) a*(l, n′)

Show that σ²_{k,l}(A) can be evaluated in O(N² log N) operations when A is the FFT,
DST, or DCT.
5.30 Compare the maximum energy packed in k SVD transform coefficients, for k = 1, 2, of
the 2 × 4 image

u=G~~:)
with that packed by the cosine, unitary DFT, and Hadamard transforms.
5.31 (Proof of SVD representation) Define φ_m such that Uφ_m = 0 for m = r + 1, ..., M, so
that the set {φ_m, 1 ≤ m ≤ M} is complete and orthonormal. Substituting for ψ_m from
(5.191) in (5.188), obtain the following result:

    Σ_{m=1}^{r} √λ_m ψ_m φ_mᵀ = U Σ_{m=1}^{M} φ_m φ_mᵀ = U
BIBLIOGRAPHY

Sections 5.1, 5.2

General references on image transforms:
1. H. C. Andrews. Computer Techniques in Image Processing. New York: Academic Press,
1970, Chapters 5, 6.
2. H. C. Andrews. "Two Dimensional Transforms," in Topics in Applied Physics: Picture
Processing and Digital Filtering, vol. 6, T. S. Huang (ed.). New York: Springer Verlag,
1975.
3. N. Ahmed and K. R. Rao. Orthogonal Transforms for Digital Signal Processing. New
York: Springer Verlag, 1975.



4. W. K. Pratt. Digital Image Processing. New York: Wiley Interscience, 1978.
5. H. F. Harmuth. Transmission of Information by Orthogonal Signals. New York: Springer
Verlag, 1970.
6. Proceedings, Symposia on Applications of Walsh Functions, University of Maryland,
IEEE-EMC (1970-73) and Catholic University of America, 1974.
7. D. F. Elliott and K. R. Rao. Fast Transforms: Algorithms and Applications. New York:
Academic Press, 1983.

Section 5.3

For matrix theory description of unitary transforms:

8. R. Bellman. Introduction to Matrix Analysis. New York: McGraw-Hill, 1960.

Sections 5.4, 5.5

For the DFT, FFT, and their applications:

9. I. J. Good. "The Interaction Algorithm and Practical Fourier Analysis." J. Royal Stat.
Soc. (London) B20 (1958): 361.
10. J. W. Cooley and J. W. Tukey. "An Algorithm for the Machine Calculation of Complex
Fourier Series." Math. Comput. 19, 90 (April 1965): 297-301.
11. IEEE Trans. Audio and Electroacoustics, Special Issue on the Fast Fourier Transform,
AU-15 (1967).
12. G. D. Bergland. "A Guided Tour of the Fast Fourier Transform." IEEE Spectrum
6 (July 1969): 41-52.
13. E. O. Brigham. The Fast Fourier Transform. Englewood Cliffs, N.J.: Prentice-Hall,
1974.
14. A. K. Jain. "Fast Inversion of Banded Toeplitz Matrices Via Circular Decomposition."
IEEE Trans. ASSP ASSP-26, no. 2 (April 1978): 121-126.

Sections 5.6, 5.7

15. N. Ahmed, T. Natarajan, and K. R. Rao. "Discrete Cosine Transform." IEEE Trans. on
Computers (correspondence) C-23 (January 1974): 90-93.
16. A. K. Jain. "A Fast Karhunen Loeve Transform for a Class of Random Processes."
IEEE Trans. Communications, Vol. COM-24, pp. 1023-1029, Sept. 1976.
17. A. K. Jain. "Some New Techniques in Image Processing," Proc. Symposium on Current
Mathematical Problems in Image Science, Monterey, California, November 10-12, 1976.
18. W. H. Chen, C. H. Smith, and S. C. Fralick. "A Fast Computational Algorithm for the
Discrete Cosine Transform." IEEE Trans. Commun. COM-25 (September 1977):
1004-1009.
19. M. J. Narasimha and A. M. Peterson. "On the Computation of the Discrete Cosine
Transform." IEEE Trans. Commun. COM-26, no. 6 (June 1978): 934-936.



20. P. Yip and K. R. Rao. "A Fast Computational Algorithm for the Discrete Sine
Transform." IEEE Trans. Commun. COM-28, no. 2 (February 1980): 304-307.

Sections 5.8, 5.9, 5.10


For Walsh functions and Hadamard, Haar, and slant transforms, see [1-6] and:

21. J. L. Walsh. "A Closed Set of Orthogonal Functions." American J. of Mathematics
45 (1923): 5-24.
22. R. E. A. C. Paley. "A Remarkable Series of Orthogonal Functions." Proc. London
Math. Soc. 34 (1932): 241-279.
23. H. Kitajima. "Energy Packing Efficiency of the Hadamard Transform." IEEE Trans.
Comm. (correspondence) COM-24 (November 1976): 1256-1258.
24. J. E. Shore. "On the Applications of Haar Functions," IEEE Trans. Communications
COM-21 (March 1973): 209-216.
25. W. K. Pratt, W. H. Chen, and L. R. Welch. "Slant Transform Image Coding." IEEE
Trans. Comm. COM-22 (August 1974): 1075-1093. Also see W. H. Chen, "Slant Trans-
form Image Coding." Ph.D. Thesis, University of Southern California, Los Angeles,
California, 1973.

Section 5.11

For the theory of the KL transform and its historic development:



26. H. Hotelling. "Analysis of a Complex of Statistical Variables into Principal Compo-
nents." J. Educ. Psychology 24 (1933): 417-441 and 498-520.
27. H. Karhunen. "Uber Lineare Methoden in der Wahrscheinlichkeitsrechnung." Ann.
Acad. Science Fenn., Ser. A.I. 37, Helsinki, 1947. (Also see translation by I. Selin,
Rand Corp. Doc. T-131, August 11, 1960.)
28. M. Loeve. "Fonctions Aleatoires de Seconde Ordre," in P. Levy, Processus Stochas-
tiques et Mouvement Brownien. Paris, France: Hermann, 1948.
29. J. L. Brown, Jr. "Mean Square Truncation Error in Series Expansion of Random
Functions," J. SIAM 8 (March 1960): 28-32.
30. S. Watanabe. "Karhunen-Loeve Expansion and Factor Analysis, Theoretical Remarks
and Applications." Trans. Fourth Prague Conf. Inform. Theory, Statist. Decision
Functions, and Random Processes, Prague, 1965, pp. 635-660.

31. H. P. Kramer and M. V. Mathews. "A Linear Coding for Transmitting a Set of Corre-
lated Signals." IRE Trans. Inform. Theory IT-2 (September 1956): 41-46.

32. W. D. Ray and R. M. Driver. "Further Decomposition of the Karhunen-Loeve Series
Representation of a Stationary Random Process." IEEE Trans. Info. Theory IT-16
(November 1970): 663-668.

For minimum mean square, variance distribution, and entropy properties we follow,
primarily, [30] and [31]. Some other properties of the KL transform are discussed
in:



33. V. R. Algazi and D. J. Sakrison. "On the Optimality of the Karhunen-Loeve Expansion."
(Correspondence) IEEE Trans. Information Theory (March 1969): 319-321.

Section 5.12

The sinusoidal family of transforms was introduced in [17] and:

34. A. K. Jain. "A Sinusoidal Family of Unitary Transforms." IEEE Trans. Pattern Anal.
Mach. Intelligence PAMI-1, no. 6 (October 1979): 356-365.

Section 5.13

The theory and applications of outer product expansion (SVD) can be found in
many references, such as [2, 4], and:

35. G. E. Forsythe and P. Henrici. "The Cyclic Jacobi Method for Computing the Principal
Values of a Complex Matrix." Trans. Amer. Math. Soc. 94 (1960): 1-23.
36. G. H. Golub and C. Reinsch. "Singular Value Decomposition and Least Squares Solu-
tions." Numer. Math. 14 (1970): 403-420.
37. S. Treitel and J. L. Shanks. "The Design of Multistage Separable Planar Filters." IEEE
Trans. Geoscience Elec. GE-9 (January 1971): 10-27.




Image Representation
by Stochastic Models

6.1 INTRODUCTION

In stochastic representations an image is considered to be a sample function of an
array of random variables called a random field (see Section 2.10). This characteri-
zation of an ensemble of images is useful in developing image processing techniques
that are valid for an entire class and not just for an individual image.

Covariance Models

In many applications such as image restoration and data compression it is often
sufficient to characterize an ensemble of images by its mean and covariance func-
tions. Often one starts with a stationary random field representation where the
mean is held constant and the covariance function is represented by the separable or
the nonseparable exponential models defined in Section 2.10. The separable covar-
iance model of (2.84) is very convenient for analysis of image processing algorithms,
and it also yields computationally attractive algorithms (for example, algorithms
that can be implemented line by line and then column by column). On the other
hand, the nonseparable covariance function of (2.85) is a better model [21] but is
not as convenient for analysis.

Covariance models have been found useful in transform image coding, where
the covariance function is used to determine the variances of the transform coeffi-
cients. Autocorrelation models with spatially varying mean and variance but spa-
tially invariant correlation have been found useful in adaptive block-by-block pro-
cessing techniques in image coding and restoration problems.

Linear System Models

An alternative to representing random fields by mean and covariance functions is to
characterize them as the outputs of linear systems whose inputs are random fields

Figure 6.1 Stochastic models used in image processing. [Tree diagram: covariance
models (separable exponential; nonseparable exponential), one-dimensional (1-D)
models (AR and ARMA; state variable; noncausal minimum variance), and
two-dimensional (2-D) models (causal; semicausal; noncausal).]



with known or desired statistical properties (for example, white noise inputs). Such
linear systems are represented by difference equations and are often useful in
developing computationally efficient image processing algorithms. Also, adaptive
algorithms based on updating the difference equation parameters are easier to
implement than those based on updating covariance functions. The problem of
finding a linear stochastic difference equation model that realizes the covariances of
an ensemble is known as the spectral factorization problem.
Figure 6.1 summarizes the stochastic models that have been used in image
processing [1]. Applications of stochastic image models are in image data compres-
sion, image restoration, texture synthesis and analysis, two-dimensional power
spectrum estimation, edge extraction from noisy images, image reconstruction from
noisy projections, and in several other situations.

6.2 ONE-DIMENSIONAL (1-D) CAUSAL MODELS

A simple way of characterizing an image is to consider it a 1-D signal that appears at
the output of a raster scanner, that is, a sequence of rows or columns. If the interrow
or intercolumn dependencies are ignored, then 1-D linear systems are useful for
modeling such signals.
Let u(n) be a real, stationary random sequence with zero mean and
covariance r(n). If u(n) is considered as the output of a stable, linear shift invariant
system H(z) whose input is a stationary zero mean random sequence ε(n), then its
SDF is given by

    S(z) = H(z) S_ε(z) H(z^{-1}),  z = e^{jω},  -π < ω ≤ π     (6.1)

where S_ε(z) is the SDF of ε(n). If H(z) must also be causal while remaining stable,
then it must have a one-sided Laurent series

    H(z) = Σ_{n=0}^{∞} h(n) z^{-n}     (6.2)

and all its poles must lie inside the unit circle [2].

Autoregressive (AR) Models

A zero mean random sequence u(n) is called an autoregressive (AR) process of


order p (Figure 6.2) when it can be generated as the output of the system





Figure 6.2 pth-order AR model. [Block diagrams: (a) AR sequence realization:
ε(n) plus the fed-back sum Σ_{k=1}^{p} a(k) z^{-k} of past outputs produces u(n);
(b) prediction-error filter: the same sum is subtracted from u(n) to recover ε(n).]

    u(n) = Σ_{k=1}^{p} a(k) u(n - k) + ε(n),  ∀n     (6.3a)

    E[ε(n)] = 0,  E[ε(n)u(m)] = 0,  m < n     (6.3b)

where ε(n) is a stationary zero mean input sequence that is independent of past
outputs. This system uses the most recent p outputs and the current input to
generate recursively the next output. Autoregressive models are of special signifi-
cance in signal and image processing because they possess several important
properties, discussed next.

Properties of AR Models

The quantity

    û(n) ≜ Σ_{k=1}^{p} a(k) u(n - k)     (6.4)

is the best linear mean square predictor of u(n) based on all its past but depends
only on the previous p samples. For Gaussian sequences this means a pth-order AR
sequence is a Markov-p process [see eq. (2.66b)]. Thus (6.3a) can be written as
    u(n) = û(n) + ε(n)     (6.5)

which says the sample at n is the sum of its minimum variance, causal, prediction
estimate plus the prediction error ε(n), which is also called the innovations sequence.
Because of this property an AR model is sometimes called a causal minimum
variance representation (MVR). The causal filter defined by

    A_p(z) ≜ 1 - Σ_{n=1}^{p} a(n) z^{-n}     (6.6)

is called the prediction error filter. This filter generates the prediction error se-
quence ε(n) from the sequence u(n).
The prediction error sequence is white, that is,

    E[ε(n)ε(m)] = β² δ(n - m)     (6.7)

For this reason, A_p(z) is also called the whitening filter for u(n). The proof is
considered in Problem 6.1.
Except for possible zeros at z = 0, the transfer function and the SDF of an AR



process are all-pole models. This follows by inspection of the transfer function

    H(z) = 1 / A_p(z)     (6.8)

The SDF of an AR model is given by

    S(z) = β² / [A_p(z) A_p(z^{-1})],  z = e^{jω}     (6.9)

Because r_ε(n) = β² δ(n) gives S_ε(z) = β², this formula follows directly by applying
(6.1).
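As a numerical sanity check of the whitening property (6.7), the following sketch (the coefficient values and sample size are illustrative, not from the text) generates a stable second-order AR sequence per (6.3a) and passes it through the prediction error filter A_2(z) of (6.6); the output autocorrelation is close to β² δ(n):

```python
import numpy as np

# Generate a stable AR(2) sequence: u(n) = a(1)u(n-1) + a(2)u(n-2) + eps(n)
rng = np.random.default_rng(0)
a1, a2 = 0.75, -0.125          # roots of A_2(z) at 0.5 and 0.25: stable
N = 100_000
eps = rng.standard_normal(N)   # white input with beta^2 = 1

u = np.zeros(N)
for n in range(N):
    s = eps[n]
    if n >= 1:
        s += a1 * u[n - 1]
    if n >= 2:
        s += a2 * u[n - 2]
    u[n] = s

# Prediction error filter A_2(z): e(n) = u(n) - a(1)u(n-1) - a(2)u(n-2)
e = u.copy()
e[1:] -= a1 * u[:-1]
e[2:] -= a2 * u[:-2]

# Sample autocorrelation of e(n): approximately beta^2 * delta(n)
r = [float(np.dot(e[: N - l], e[l:]) / N) for l in range(4)]
print([round(x, 2) for x in r])   # lag 0 near 1, other lags near 0
```

The recovered e(n) equals the white input eps(n) except for the first two start-up samples, so its sample autocorrelation is flat up to Monte Carlo error of order 1/sqrt(N).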
For sequences with mean μ, the AR model can be modified as

    x(n) = Σ_{k=1}^{p} a(k) x(n - k) + ε(n)
    u(n) = x(n) + μ     (6.10a)

where the properties of ε(n) are the same as before. This representation can also be
written as

    u(n) = Σ_{k=1}^{p} a(k) u(n - k) + ε(n) + μ [1 - Σ_{k=1}^{p} a(k)]     (6.10b)

which is equivalent to assuming (6.3a) with E[ε(n)] = μ[1 - Σ_k a(k)],
cov[ε(n)ε(m)] = β² δ(n - m).

Identification of AR models. Multiplying both sides of (6.3a) by ε(m),
taking expectations, and using (6.3b) and (6.7), we get

    E[u(n)ε(m)] = E[ε(n)ε(m)] = β² δ(n - m),  m ≥ n     (6.11)

Now, multiplying both sides of (6.3a) by u(0) and taking expectations, we find the
AR model satisfies the relation

    r(n) - Σ_{k=1}^{p} a(k) r(n - k) = β² δ(n),  ∀n ≥ 0     (6.12)

where r(n) ≜ E[u(n)u(0)] is the covariance function of u(n). This result is im-
portant for identification of the AR model parameters a(k), β² from a given set of
covariances {r(n), -p ≤ n ≤ p}. In fact, a pth-order AR model can be uniquely
determined by solving (6.12) for n = 0, ..., p. In matrix notation, this is equivalent
to solving the following normal equations:

    Ra = r     (6.13a)
    β² = r(0) - Σ_{k=1}^{p} a(k) r(k)     (6.13b)

where R is the p x p Toeplitz matrix

    R ≜ [ r(0)    r(1)   ...   r(p-1)
          r(1)    r(0)   ...   r(p-2)
           ...                  ...
          r(p-1)  ...    r(1)  r(0)  ]     (6.13c)



and a ≜ [a(1) a(2) ... a(p)]^T, r ≜ [r(1) r(2) ... r(p)]^T. If R is positive definite, then
the AR model is guaranteed to be stable, that is, the solution {a(k), 1 ≤ k ≤ p}
is such that the roots of A_p(z) lie inside the unit circle. This procedure allows us to
fit a stable AR model to any sequence u(n) whose p + 1 covariances r(0), r(1),
r(2), ..., r(p) are known.
Example 6.1
The covariance function of a raster scan line of an image can be obtained by consider-
ing the covariance between two pixels on the same row. Both the 2-D models of (2.84)
and (2.85) reduce to a 1-D model of the form r(n) = σ² ρ^{|n|}. To fit an AR model of order
2, for instance, we solve

    σ² [ 1  ρ ] [ a(1) ]  =  σ² [ ρ  ]
       [ ρ  1 ] [ a(2) ]        [ ρ² ]

which gives a(1) = ρ, a(2) = 0, and β² = σ²(1 - ρ²). The corresponding representation
for a scan line of the image, having pixel mean of μ, is a first-order AR model

    x(n) = ρ x(n - 1) + ε(n),  r_ε(n) = σ²(1 - ρ²) δ(n)
    u(n) = x(n) + μ     (6.14)

with A(z) = 1 - ρz^{-1}, S_ε = σ²(1 - ρ²), and S(z) = σ²(1 - ρ²)/[(1 - ρz^{-1})(1 - ρz)].
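The identification step can be sketched numerically (a hedged example; the values of σ² and ρ are illustrative): build the Toeplitz matrix of (6.13c) from the Example 6.1 covariances r(n) = σ²ρ^{|n|}, solve the normal equations (6.13a), and recover β² from the n = 0 case of (6.12):

```python
import numpy as np

# Fit an AR(2) model to r(n) = sigma^2 * rho^|n| (Example 6.1's covariances;
# sigma^2 and rho values are illustrative).
sigma2, rho, p = 4.0, 0.95, 2
rv = sigma2 * rho ** np.arange(p + 1)      # r(0), r(1), r(2)

# Toeplitz matrix of (6.13c) and normal equations (6.13a): R a = r
R = sigma2 * rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
a = np.linalg.solve(R, rv[1:])

# beta^2 from the n = 0 case of (6.12): beta^2 = r(0) - sum_k a(k) r(k)
beta2 = rv[0] - a @ rv[1:]

print(a)        # close to [rho, 0], as derived in the example
print(beta2)    # close to sigma2 * (1 - rho^2)
```

Because r(2) = ρ r(1), the second coefficient vanishes exactly, reproducing the first-order model (6.14).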

Maximum entropy extension. Suppose we are given a positive definite
sequence r(n) for |n| ≤ p (that is, R is a positive definite matrix). Then it is possible
to extrapolate r(n) for |n| > p by first fitting an AR model via (6.13a) and (6.13b)
and then running the recursions of (6.12) for n > p, that is, by solving

    r(p + n) = Σ_{k=1}^{p} a(k) r(p + n - k),  ∀n ≥ 1
    r(-n) = r(n),  ∀n     (6.15)

This extension has the property that among all possible positive definite extensions
of {r(n)}, for |n| > p, it maximizes the entropy

    H ≜ (1/2π) ∫_{-π}^{π} log S(ω) dω     (6.16)

where S(ω) is the Fourier transform of {r(n), ∀n}. The AR model SDF S(ω), which
can be evaluated from the knowledge of a(n) via (6.9), is also called the maximum
entropy spectrum of {r(n), |n| ≤ p}. This result gives a method of estimating the
power spectrum of a partially observed signal. One would start with an estimate of
the p + 1 covariances, {r(n), 0 ≤ n ≤ p}, calculate the AR model parameters β²,
a(k), k = 1, ..., p, and finally evaluate (6.9). This algorithm is also useful in certain
image restoration problems [see Section 8.14 and Problem 8.26].
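The extrapolation step of (6.15) can be sketched as follows (a hypothetical example with AR(1)-type covariances; all numerical values are illustrative): fit the model from r(0) and r(1), then run the recursion forward:

```python
import numpy as np

# Known covariances r(0), r(1) of an AR(1)-representable sequence
sigma2, rho, p = 1.0, 0.8, 1
r = [sigma2, sigma2 * rho]

# Fit the order-1 model (the p = 1 case of (6.13a)): a(1) = r(1)/r(0)
a = [r[1] / r[0]]

# Extrapolate via r(p + n) = sum_k a(k) r(p + n - k), n >= 1   (6.15)
for _ in range(4):
    r.append(sum(a[k] * r[-1 - k] for k in range(p)))

print(r)   # matches the true covariance extension sigma2 * rho^n
```

For this covariance family the maximum entropy extension coincides with the true r(n) = σ²ρ^{|n|}, since the sequence is exactly AR(1).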

Applications of AR Models in Image Processing


As seen from Example 6.1, AR models are useful in image processing for repre-
senting image scan lines. The prediction property of the AR models has been

"Recall from Seetioll2.1l our notationS(w) ~ S(z), z = elM.



Figure 6.3 Semirecursive representation of images. [Block diagram: the nth image
column u_n is transformed by the unitary matrix Ψ; each coefficient sequence
v_n(k), k = 0, ..., N - 1, is represented by its own AR model with prediction error
filter A_{p,k}(z) and innovations e_n(k); the inverse transform Ψ^{-1} recovers u_n.]

exploited in data compression of images and other signals. For example, a digitized
AR sequence u(n) represented by B bits/sample is completely equivalent to the
digital sequence ē(n) ≜ u(n) - ū(n), where ū(n) is the quantized value of u(n).
The quantity ē(n) represents the unpredictable component of u(n), and its entropy
is generally much less than that of u(n). Therefore, it can be encoded by many fewer
bits per sample than B. AR models have also been found very useful in representa-
tion and linear predictive coding (LPC) of speech signals [7].
Another useful application of AR models is in semirecursive representation of
images. Each image column u_n, n = 0, 1, 2, ..., is first transformed by a unitary
matrix, and each row of the resulting image is represented by an independent AR
model. Thus if v_n ≜ Ψu_n, where Ψ is a unitary transform, then the sequence
{v_n(k), n = 0, 1, 2, ...} is represented by an AR model for each k (Figure 6.3), as

    v_n(k) = Σ_{i=1}^{p} a_i(k) v_{n-i}(k) + e_n(k),  ∀n,  k = 0, 1, ..., N - 1     (6.17)

The optimal choice of Ψ is the KL transform of the ensemble of all the image
columns so that the elements of v_n are uncorrelated. In practice, the value of p = 1
or 2 is sufficient, and fast transforms such as the cosine or the sine transform are
good substitutes for the KL transform. In Section 6.9 we will see that certain,
so-called semicausal models also yield this type of representation. Such models are
useful in filtering and data compression of images [see Sections 8.12 and 11.6].

Moving Average (MA) Representations

A random sequence u(n) is called a moving average (MA) process of order q when
it can be written as a weighted running average of uncorrelated random variables

    u(n) = Σ_{k=0}^{q} b(k) ε(n - k)     (6.18)

where ε(n) is a zero mean white noise process of variance β² (Fig. 6.4). The SDF of
this MA is given by

Figure 6.4 qth-order MA model.




    S(z) = β² B_q(z) B_q(z^{-1})     (6.19a)

    B_q(z) = Σ_{k=0}^{q} b(k) z^{-k}     (6.19b)

From the preceding relations it is easy to deduce that the covariance sequence of a
qth-order MA is zero outside the interval [-q, q]. In general, any covariance
sequence that is zero outside the interval [-q, q] can be generated by a qth-order
MA filter B_q(z). Note that B_q(z) is an FIR filter, which means MA representations
are all-zero models.
Example 6.2
Consider the first-order MA process

    u(n) = ε(n) - αε(n - 1),  E[ε(n)ε(m)] = β² δ(m - n)

Then B_1(z) = 1 - αz^{-1}, S(z) = β²[1 + α² - α(z + z^{-1})]. This shows the covariance se-
quence of u(n) is r(0) = β²(1 + α²), r(±1) = -αβ², r(n) = 0, |n| > 1.
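A quick Monte Carlo check of Example 6.2 (the values of α, β², and the sample size are illustrative): simulate u(n) = ε(n) - αε(n - 1) and estimate its covariances:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta2, N = 0.5, 2.0, 400_000
eps = np.sqrt(beta2) * rng.standard_normal(N)   # white, variance beta^2

u = eps.copy()
u[1:] -= alpha * eps[:-1]                       # u(n) = eps(n) - alpha*eps(n-1)

r = [float(np.dot(u[: N - l], u[l:]) / N) for l in range(4)]
print([round(x, 2) for x in r])
# r(0) ~ beta^2(1 + alpha^2) = 2.5, r(1) ~ -alpha*beta^2 = -1, r(n) ~ 0 for n > 1
```

The estimated covariances vanish (up to sampling error) beyond lag q = 1, illustrating the finite-support covariance of an MA process.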
Autoregressive Moving Average (ARMA) Representations

An AR model whose input is an MA sequence yields a representation of the type
(Fig. 6.5)

    Σ_{k=0}^{p} a(k) u(n - k) = Σ_{l=0}^{q} b(l) ε(n - l)     (6.20)

where ε(n) is a zero mean white sequence of variance β². This is called an ARMA
representation of order (p, q). Its transfer function and the SDF are given by

    H(z) = B_q(z) / A_p(z)     (6.21)

    S(z) = β² B_q(z) B_q(z^{-1}) / [A_p(z) A_p(z^{-1})]     (6.22)

For q = 0, it is a pth-order AR and for p = 0, it is a qth-order MA.
Figure 6.5 (p, q)-order ARMA model. [Block diagram: ε(n) drives the MA model
B_q(z), whose output feeds the AR model 1/A_p(z) to produce u(n).]

State Variable Models

A state variable representation of a random vector sequence y_n is of the form [see
Fig. 8.22(a)]

    x_{n+1} = A_n x_n + B_n ε_n
    y_n = C_n x_n + η_n,  ∀n     (6.23)

Here x_n is an m x 1 vector whose elements x_n(i), i = 1, ..., m are called the states
of the process y_n, which may represent the image pixel at time or location n; ε_n is a
p x 1 vector sequence of independent random variables and η_n is the additive white
noise. The matrices A_n, B_n, and C_n are of appropriate dimensions, and η_n, ε_n satisfy

    E[ε_n] = 0,  E[η_n] = 0,  E[η_n η_{n'}^T] = Q_n δ(n - n')
    E[η_n ε_{n'}^T] = 0,  E[ε_n ε_{n'}^T] = P_n δ(n - n'),  ∀n, n'     (6.24)

The state variable model just defined is also a vector Markov-1 process. The
ARMA models discussed earlier are special cases of state variable models. The
application of state variable models in image restoration is discussed in Sections
8.10 and 8.11.
Image Scanning Models

The output s(k) of a raster scanned image becomes a nonstationary sequence even
when the image is represented by a stationary random field. This is because equal
intervals between scanner outputs do not correspond to equal distances in their
spatial locations. [Also see Problem 6.3.] Thus the covariance function

    r_s(k, l) ≜ E{[s(k) - μ][s(k - l) - μ]}     (6.25)

depends on k, the position of the raster, as well as the displacement variable l. Such
a covariance function can only yield a time-varying realization, which would
increase the complexity of associated processing algorithms. A practical alternative
is to replace r_s(k, l) by its average over the scanner locations [9], that is, by

    r̄_s(l) ≜ (1/N) Σ_{k=0}^{N-1} r_s(k, l)     (6.26)

Given r̄_s(l), we can find a suitable order AR realization using (6.12) or the results of
the following two sections. Another alternative is to determine a so-called cyclo-
stationary state variable model, which requires a periodic initialization of the states
for the scanning process. A vector scanning model, which is Markov and time
invariant, can be obtained from the cyclostationary model [10]. State variable
scanning models have also been generalized to two dimensions [11, 12]. The causal
models considered in Section 6.6 are examples of these.

6.3 ONE-DIMENSIONAL SPECTRAL FACTORIZATION

Spectral factorization refers to the determination of a white noise driven linear
system such that the power spectrum density of its output matches a given SDF.
Basically, we have to find a causal and stable filter H(z) whose white noise input has
the spectral density K, a constant, such that

    S(z) = K H(z) H(z^{-1})     (6.27)

where S(z) is the given SDF. Since, for z = e^{jω},

    S(e^{jω}) ≜ S(ω) = K |H(ω)|²     (6.28)

the spectral factorization problem is equivalent to finding a causal, stable, linear
filter that realizes a given magnitude frequency response. This is also equivalent to
specifying the phase of H(ω), because its magnitude can be calculated within a
constant from S(ω).

Rational SDFs

S(z) is called a proper rational function if it is a ratio of polynomials that can be
factored, as in (6.27), such that

    H(z) = B_q(z) / A_p(z)     (6.29)

where A_p(z) and B_q(z) are polynomials of the form

    A_p(z) = 1 - Σ_{k=1}^{p} a(k) z^{-k},  B_q(z) = Σ_{k=0}^{q} b(k) z^{-k}     (6.30)

For such SDFs it is always possible to find a causal and stable filter H(z). The
method is based on a fundamental result in algebra, which states that any
polynomial P_n(x) of degree n has exactly n roots, so that it can be reduced to a
product of first-order polynomials, that is,

    P_n(x) = Π_{i=1}^{n} (α_i x - β_i)     (6.31)

For a proper rational S(z), which is strictly positive and bounded for z = e^{jω}, there
will be no roots (poles or zeros) on the unit circle. Since S(z) = S(z^{-1}), for every
root inside the unit circle there is a root outside the unit circle. Hence, if H(z) is
chosen so that it is causal and all the roots of A_p(z) lie inside the unit circle, then
(6.27) will be satisfied, and we will have a causal and stable realization. Moreover, if
B_q(z) is chosen to be causal and such that its roots also lie inside the unit circle, then
the inverse filter 1/H(z), which is A_p(z)/B_q(z), is also causal and stable. A filter that
is causal and stable and has a causal, stable inverse is called a minimum-phase filter.
Example 6.3
Let

    S(z) = [17 - 4(z + z^{-1})] / [10 - 4(z + z^{-1})]     (6.32)

The roots of the numerator are z_1 = 0.25 and z_2 = 4 and those of the denominator are
z_1 = 0.5 and z_2 = 2. Note the roots occur in reciprocal pairs. Now we can write

    S(z) = K (1 - 0.25z^{-1})(1 - 0.25z) / [(1 - 0.5z^{-1})(1 - 0.5z)]

Comparing this with (6.32), we obtain K = 2. Hence a filter with H(z) = (1 - 0.25z^{-1})/
(1 - 0.5z^{-1}) whose input is zero mean white noise, with variance of 2, will be a mini-
mum phase realization of S(z). The representation of this system will be

    u(n) = 0.5u(n - 1) + ε(n) - 0.25ε(n - 1)
    E[ε(n)] = 0,  E[ε(n)ε(m)] = 2δ(n - m)     (6.33)

This is an ARMA model of order (1, 1).
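The root pairing can be checked numerically (a sketch; the polynomial coefficients below encode an S(z) with exactly the root pairs quoted in the example: numerator roots 0.25 and 4, denominator roots 0.5 and 2). Multiplying numerator and denominator by z gives ordinary quadratics, and the minimum-phase factor keeps the inside-unit-circle roots:

```python
import numpy as np

# z * [17 - 4(z + 1/z)] = -4z^2 + 17z - 4, and similarly for the denominator
num_roots = np.sort(np.roots([-4.0, 17.0, -4.0]).real)
den_roots = np.sort(np.roots([-4.0, 10.0, -4.0]).real)
print(num_roots)   # 0.25 and 4: a reciprocal pair
print(den_roots)   # 0.5 and 2: a reciprocal pair

# Minimum-phase factor H(z) = (1 - 0.25 z^-1)/(1 - 0.5 z^-1);
# the gain K follows by matching S at z = 1: S(1) = K * H(1)^2
S1 = (17.0 - 8.0) / (10.0 - 8.0)     # S(1) = 4.5
H1 = (1.0 - 0.25) / (1.0 - 0.5)      # H(1) = 1.5
K = S1 / H1**2
print(K)                             # 2.0, as in the example
```

Evaluating at z = 1 (where H(z^{-1}) = H(z)) is a convenient way to fix the constant K once the root assignment is done.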



Remarks

It is not possible to find finite-order ARMA realizations when S(z) is not rational.
The spectral factors are irrational, that is, they have infinite Laurent series. In
practice, suitable approximations are made to obtain finite-order models. There is a
subtle difference between the terms realization and modeling that should be pointed
out here. Realization refers to an exact matching of the SDF or the covariances of
the model output to the given quantities. Modeling refers to an approximation of
the realization such that the match is close or as close as we wish.
One method of finding minimum phase realizations when S(z) is not rational
is by the Wiener-Doob spectral decomposition technique [5, 6]. This is discussed for
the 2-D case in Section 6.8. The method can be easily adapted to 1-D signals (see
Problem 6.6).

6.4 AR MODELS, SPECTRAL FACTORIZATION,
AND LEVINSON ALGORITHM

The theory of AR models offers an attractive method of approximating a given SDF
arbitrarily closely by a finite order AR spectrum. Specifically, (6.13a) and (6.13b)
can be solved for a sufficiently large p such that the SDF S_p(z) ≜ β_p²/[A_p(z)A_p(z^{-1})]
is as close to the given S(z), z = exp(jω), as we wish under some mild restrictions on
S(z) [1]. This gives the spectral factorization of S(z) as p → ∞. If S(z) happens to
have the rational form of (6.9), then a(n) will turn out to be zero for all n > p. An
efficient method of solving (6.13a) and (6.13b) is given by the following algorithm.

The Levinson-Durbin Algorithm

Initialize β_0² = r(0) and ρ_1 = r(1)/r(0). Then, for n = 1, 2, ..., p, solve recursively

    a_n(n) = ρ_n,  a_n(k) = a_{n-1}(k) - ρ_n a_{n-1}(n - k),  1 ≤ k ≤ n - 1

    β_n² = β_{n-1}² (1 - ρ_n²)     (6.34)

    ρ_{n+1} = (1/β_n²) [ r(n + 1) - Σ_{k=1}^{n} a_n(k) r(n + 1 - k) ]

The AR model coefficients are given by a(k) = a_p(k), β² = β_p². One advantage of
this algorithm is that (6.13a) can now be solved in O(p²) multiplication and addition
operations, compared to O(p³) operations required by conventional matrix
inversion methods such as Gaussian elimination. Besides giving a fast algorithm for
solving Toeplitz equations, the Levinson-Durbin recursions reveal many important
properties of AR models (see Problem 6.8).




The quantities {ρ_n, 1 ≤ n ≤ p} are called the partial correlations, and their
negatives, -ρ_n, are called the reflection coefficients of the pth-order AR model. The
quantity ρ_n represents correlation between u(m) and u(m + n) if u(m + 1), ...,
u(m + n - 1) are held fixed. It can be shown that the AR model is stable, that is,
the roots of A_p(z) are inside the unit circle, if |ρ_n| < 1. This condition is satisfied
when R is positive definite.
The Levinson recursions give a useful algorithm for determining a finite-order
stable AR model whose spectrum fits a given SDF arbitrarily closely (Fig. 6.6).
Given r(0) and any one of the sequences {r(n)}, {a(n)}, {ρ_n}, the remaining se-
quences can be determined uniquely from these recursions. This property is useful
in developing the stability tests for 2-D systems. The Levinson algorithm is also
useful in modeling 2-D random fields, where the SDFs rarely have rational factors.

Example 6.4
The result of the AR modeling problem discussed in Example 6.1 can also be obtained by
applying the Levinson recursions for p = 2. Thus r(0) = σ², r(1) = σ²ρ, r(2) = σ²ρ²,
and we get

    ρ_1 = σ²ρ/σ² = ρ,  β_1² = σ²(1 - ρ²),  ρ_2 = (1/β_1²)[σ²ρ² - ρ · σ²ρ] = 0

This gives β² = β_2² = β_1², a(1) = a_2(1) = a_1(1) = ρ, and a(2) = a_2(2) = 0, which leads to
(6.14). Since |ρ_1| and |ρ_2| are less than 1, this AR model is stable.
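The recursions take only a few lines to implement; the sketch below (parameter values are illustrative) reproduces the Example 6.4 result for r(n) = σ²ρ^{|n|}:

```python
import numpy as np

def levinson_durbin(r):
    """Solve the normal equations (6.13a) via the recursions (6.34).

    r: covariances r(0), ..., r(p). Returns (a, beta2) with a[k-1] = a(k).
    """
    p = len(r) - 1
    a = np.zeros(0)
    beta2 = float(r[0])
    for n in range(1, p + 1):
        # partial correlation rho_n
        rho_n = (r[n] - np.dot(a, r[n - 1:0:-1])) / beta2
        # order update: a_n(k) = a_{n-1}(k) - rho_n * a_{n-1}(n-k), a_n(n) = rho_n
        a = np.concatenate([a - rho_n * a[::-1], [rho_n]])
        beta2 *= 1.0 - rho_n ** 2
    return a, beta2

sigma2, rho = 1.0, 0.9
a, beta2 = levinson_durbin([sigma2, sigma2 * rho, sigma2 * rho ** 2])
print(a, beta2)   # a = [0.9, 0.0], beta2 = 1 - 0.81 = 0.19, as in Example 6.4
```

Each pass costs O(n) work, giving the O(p²) total operation count noted above; for |ρ_n| < 1 throughout, β_n² stays positive and the fitted model is stable.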

Figure 6.6 AR modeling via Levinson recursions. [Flowchart: from the given S(ω),
compute the covariances r(n); run the Levinson recursions; test whether the model
spectrum β_p²/|A_p(ω)|², with A_p(ω) ≜ 1 - Σ_{k=1}^{p} a_p(k) e^{-jkω}, fits S(ω)
closely enough; if not, increase p and repeat; otherwise stop.]

6.5 NONCAUSAL REPRESENTATIONS

Causal prediction, as in the case of AR models, is motivated by the fact that a
scanning process orders the image pixels in a time sequence. As such, images
represent spatial information and have no causality associated in those coordinates.
It is, therefore, natural to think of prediction processes that do not impose the
causality constraints, that is, noncausal predictors. A linear noncausal predictor
û(n) depends on the past as well as the future values of u(n). It is of the form

    û(n) = Σ_{k≠0} α(k) u(n - k)     (6.35)

where α(k) are determined to minimize the variance of the prediction error
u(n) - û(n). The noncausal MVR of a random sequence u(n) is then defined as

    u(n) = û(n) + ν(n) = Σ_{k=-∞, k≠0}^{∞} α(k) u(n - k) + ν(n)     (6.36)

where ν(n) is the noncausal prediction error sequence. Figure 6.7a shows the
noncausal MVR system representation. The sequence u(n) is the output of a non-
causal system whose input is the noncausal prediction error sequence ν(n). The
transfer function of this system is 1/A(z), where A(z) ≜ 1 - Σ_{n≠0} α(n) z^{-n} is called
the noncausal MVR prediction error filter. The filter coefficients α(n) can be deter-
mined according to the following theorem.
Theorem 6.1 (Noncausal MVR theorem): Let u(n) be a zero mean, sta-
tionary random sequence whose SDF is S(z). If 1/S(e^{jω}) has the Fourier series

    1/S(e^{jω}) = Σ_{n=-∞}^{∞} r⁺(n) e^{-jωn},
    r⁺(n) = (1/2π) ∫_{-π}^{π} S^{-1}(e^{jω}) e^{jωn} dω     (6.37)

then u(n) has the noncausal MVR of (6.36), where

    α(n) = -r⁺(n) / r⁺(0),  β² ≜ E{[ν(n)]²} = 1 / r⁺(0)     (6.38)

Moreover, the covariances r_ν(n) and the SDF S_ν(z) of the noncausal prediction
error ν(n) are given by

    r_ν(n) = -β² α(n),  α(0) ≜ -1
    S_ν(z) = β² A(z) = β² [1 - Σ_{n=-∞, n≠0}^{∞} α(n) z^{-n}]     (6.39)

The proof is developed in Problem 6.10.




Figure 6.7 Noncausal MVRs. [(a) Noncausal system representation: ν(n) drives
the system 1/A(z), A(z) = 1 - Σ_{n≠0} α(n) z^{-n}, to produce u(n). (b) Realization
of the prediction error filter coefficients: the Fourier series coefficients r⁺(n) of
1/S(ω) yield α(n) and β² via (6.38).]

Remarks

1. The noncausal prediction error sequence is not white. This follows from
(6.39). Also, since r_ν(-n) = r_ν(n), the filter coefficient sequence α(n) is even,
that is, α(-n) = α(n), which implies A(z^{-1}) = A(z).
2. For the linear noncausal system of Fig. 6.7a, the output SDF is given by
S(z) = S_ν(z)/[A(z)A(z^{-1})]. Using (6.39) and the fact that A(z) = A(z^{-1}), this
becomes

    S(z) = β² / A(z)     (6.40)

3. Figure 6.7b shows the algorithm for realization of noncausal MVR filter
coefficients. Eq. (6.40) and Fig. 6.7b show spectral factorization of S(z) is not
required for realizing noncausal MVRs. To obtain a finite-order model, the
Fourier series coefficients r⁺(n) should be truncated to a sufficient number of
terms, say, p. Alternatively, we can find the optimum pth-order minimum
variance noncausal prediction error filter. For sufficiently large p, these meth-
ods yield finite-order, stable, noncausal MVR models while matching the given
covariances to a desired accuracy.

Noncausal MVRs of Autoregressive Sequences

For the first-order AR model of (6.14), 1/S(z) = [(1 + ρ²) - ρ(z + z^{-1})]/[σ²(1 - ρ²)],
and Theorem 6.1 gives

    α = ρ / (1 + ρ²),  β² = σ²(1 - ρ²) / (1 + ρ²)     (6.41)

which means

    α(0) = -1,  α(1) = α(-1) = α,  r⁺(0) = 1/β²

The resulting noncausal MVR is

    u(n) = α[u(n - 1) + u(n + 1)] + ν(n)

    A(z) = 1 - α(z + z^{-1})     (6.42)

    r_ν(n) = β²{δ(n) - α[δ(n - 1) + δ(n + 1)]}

    S_ν(z) = β²[1 - α(z + z^{-1})]


where ν(n) is a first-order MA sequence. On the other hand, the AR representation
of (6.14) is a causal MVR whose input ε(n) is a white noise sequence. The generali-
zation of (6.42) yields the noncausal MVR of a pth-order AR sequence as

    u(n) = Σ_{k=1}^{p} α(k)[u(n - k) + u(n + k)] + ν(n)

    A(z) = 1 - Σ_{k=1}^{p} α(k)(z^{-k} + z^{k})     (6.43)

    S_ν(z) = β² A(z)

where ν(n) is a pth-order MA with zero mean and variance β² (see Problem 6.11).
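The recipe of Theorem 6.1 and Fig. 6.7b can be checked numerically: sample 1/S(ω) on a grid, take its Fourier series by an inverse FFT, and read off α(n) and β² from (6.38). For the AR(1) SDF of (6.14) this should reproduce (6.41): α = ρ/(1 + ρ²) and β² = σ²(1 - ρ²)/(1 + ρ²). A sketch (σ², ρ, and the grid size are illustrative):

```python
import numpy as np

sigma2, rho, M = 1.0, 0.8, 4096
w = 2 * np.pi * np.arange(M) / M
S = sigma2 * (1 - rho**2) / np.abs(1 - rho * np.exp(-1j * w)) ** 2  # AR(1) SDF

# Fourier series coefficients r+(n) of 1/S(omega), per (6.37)
r_plus = np.fft.ifft(1.0 / S).real

alpha = -r_plus[1] / r_plus[0]      # (6.38)
beta2 = 1.0 / r_plus[0]

print(alpha, rho / (1 + rho**2))                      # agree
print(beta2, sigma2 * (1 - rho**2) / (1 + rho**2))    # agree
```

No spectral factorization is needed here: 1/S(ω) is a degree-1 trigonometric polynomial, so the inverse FFT recovers its three nonzero coefficients essentially exactly.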


A Fast KL Transform [13]

The foregoing noncausal MVRs give the following interesting result.

Theorem 6.2. The KL transform of a stationary first-order AR sequence
{u(n), 1 ≤ n ≤ N} whose boundary variables u(0) and u(N + 1) are known is the
sine transform, which is a fast transform.

Proof. Writing the noncausal representation of (6.42) in matrix notation,
where u and ν are N x 1 vectors consisting of elements {u(n), 1 ≤ n ≤ N} and
{ν(n), 1 ≤ n ≤ N}, respectively, we get

    Qu = ν + b     (6.44)

where Q is defined in (5.102) and b contains only the two boundary values, u(0) and
u(N + 1). Specifically,

    b(1) = αu(0),  b(N) = αu(N + 1),  b(n) = 0,  2 ≤ n ≤ N - 1     (6.45)

The covariance matrix of ν is obtained from (6.42), as

    R_ν ≜ E[νν^T] = {r_ν(m - n)} = β² Q     (6.46)

The orthogonality condition for minimum variance requires that ν(n) must be
orthogonal to u(k), for all k ≠ n, and E[ν(n)u(n)] = E[ν(n)²] = β². This gives

    E[νb^T] = 0,  E[uν^T] = β² I     (6.47)




Multiplying both sides of (6.44) by Q⁻¹ and defining

u⁰ ≜ Q⁻¹ν,   u^b ≜ Q⁻¹b                                                 (6.48)

we obtain an orthogonal decomposition of u as

u = u⁰ + u^b                                                            (6.49)

Note that u⁰ is orthogonal to u^b due to the orthogonality of ν and b [see (6.47)],
and u^b is completely determined by the boundary variables u(0) and u(N + 1). Figure 6.8
shows how this decomposition can be realized. The covariance matrix of u⁰ is given by

R₀ ≜ E[u⁰u⁰^T] = Q⁻¹R_ν Q⁻¹ = β²Q⁻¹                                     (6.50)

Because β² is a scalar, the eigenvectors of R₀ are the same as those of Q, which we
know to be the column vectors of the sine transform (see Section 5.11). Hence, the
KL transform of u⁰ is a fast transform. Moreover, since u^b and u⁰ are orthogonal, the
conditional mean of u given u(0), u(N + 1) is simply u^b, that is,

μ_b ≜ E[u|u(0), u(N + 1)] = E[u⁰ + u^b|u(0), u(N + 1)]
    = E[u⁰] + E[u^b|u(0), u(N + 1)] = u^b                               (6.51)

Therefore,

Cov[u|u(0), u(N + 1)] = E[(u − μ_b)(u − μ_b)^T] = E[u⁰u⁰^T] = R₀        (6.52)

Hence the KL transform of u conditioned on u(0) and u(N + 1) is the eigenmatrix
of R₀, that is, the sine transform.

In Chapter 11 we use this result for developing a (recursive) block-by-block
transform coding algorithm, which is more efficient than the conventional block-by-block
KL transform coding method. Orthogonal decompositions such as (6.49) can
be obtained for higher-order AR sequences also (see Problem 6.12). In general, the
KL transform of u⁰ is determined by the eigenvectors of a banded Toeplitz matrix,
whose eigenvectors can be approximated by an appropriate transform from the
sinusoidal family of orthogonal transforms discussed in Chapter 5.
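As a numerical check of Theorem 6.2, the sketch below (our own illustration, with an assumed correlation ρ; α = ρ/(1 + ρ²) and Q tridiagonal as in (5.102)) verifies that the sine transform matrix of Section 5.11 diagonalizes Q:

```python
import numpy as np

N, rho = 8, 0.95                       # illustrative values, not from the text
alpha = rho / (1 + rho**2)
# Q of (5.102): unity diagonal, -alpha on the sub- and super-diagonals
Q = np.eye(N) - alpha * (np.eye(N, k=1) + np.eye(N, k=-1))
# sine transform matrix (Section 5.11)
n = np.arange(1, N + 1)
Psi = np.sqrt(2 / (N + 1)) * np.sin(np.pi * np.outer(n, n) / (N + 1))
D = Psi @ Q @ Psi.T
off = D - np.diag(np.diag(D))
print(np.max(np.abs(off)))             # ~ 0: sine transform diagonalizes Q
print(np.allclose(np.diag(D), 1 - 2 * alpha * np.cos(np.pi * n / (N + 1))))
```

Since β² is a scalar, the same matrix Psi diagonalizes R₀ = β²Q⁻¹, which is the content of the theorem.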

[Figure 6.8: Noncausal orthogonal decomposition for first-order AR sequences.]




Optimum Interpolation of Images

The noncausal MVRs are also useful in finding the optimum interpolators for
random sequences. For example, suppose a line of an image, represented by a
first-order AR model, is subsampled so that N samples are missing between given
samples. Then the best mean square estimate of a missing sample u(n) is
û(n) ≜ E[u(n)|u(0), u(N + 1)], which is precisely u^b(n), that is,

û = Q⁻¹b ⟹ û(n) = α[Q⁻¹]_{n,1} u(0) + α[Q⁻¹]_{n,N} u(N + 1)            (6.53)

When the interpixel correlation ρ → 1, it can be shown that û(n) is the straight-line
interpolator between u(0) and u(N + 1), that is,

û(n) = u(0) + n/(N + 1) [u(N + 1) − u(0)]                               (6.54)

For values of ρ near 1, the interpolation formula of (6.53) becomes a cubic
polynomial in n/(N + 1) (Problem 6.13).
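The interpolator (6.53) is easy to compute directly; the sketch below (our own illustration, with assumed boundary values and ρ; α and Q as in Section 6.5) shows the estimate approaching the straight line of (6.54) as ρ → 1:

```python
import numpy as np

def noncausal_interpolate(u0, uN1, N, rho):
    """Estimate the N missing samples between u(0) and u(N+1) via
    u_hat = Q^{-1} b, eq. (6.53), for a first-order AR line model."""
    alpha = rho / (1.0 + rho**2)
    Q = np.eye(N) - alpha * (np.eye(N, k=1) + np.eye(N, k=-1))
    b = np.zeros(N)
    b[0], b[-1] = alpha * u0, alpha * uN1     # b(1), b(N) of (6.45)
    return np.linalg.solve(Q, b)

# As rho -> 1 the estimate approaches the straight line of (6.54)
u_hat = noncausal_interpolate(0.0, 10.0, N=9, rho=0.9999)
line = np.arange(1, 10) * 10.0 / 10.0         # u(0) + n/(N+1) [u(N+1) - u(0)]
print(np.max(np.abs(u_hat - line)))           # very small
```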

6.6 LINEAR PREDICTION IN TWO DIMENSIONS


The notion of causality does not extend naturally to two or higher dimensions.
Line-by-line processing techniques that utilize the simple 1-D algorithms do not
exploit the 2-D structure and the interline dependence. Since causality has no
intrinsic importance in two dimensions, it is natural to consider other data structures
to characterize 2-D models. There are three canonical forms, namely, causal, semicausal,
and noncausal, that we shall consider here in the framework of linear prediction.

These three types of stochastic models have application in many image processing
problems. For example, causal models yield recursive algorithms in data compression
of images by the differential pulse code modulation (DPCM) technique and
in recursive filtering of images.

Semicausal models are causal in one dimension and noncausal in the other and
lend themselves naturally to hybrid algorithms, which are recursive in one dimension
and unitary transform based (nonrecursive) in the other. The unitary transform
decorrelates the data in the noncausal dimension, setting up the causal dimension
for processing by 1-D techniques. Such techniques combine the advantages of the high
performance of transform-based methods and the ease of implementation of 1-D
algorithms.

Noncausal models give rise to transform-based algorithms. For example, the
notion of the fast KL transform discussed in Section 6.5 arises from the noncausal MVR
of Markov sequences. Many spatial image processing operators are noncausal,
for example, finite impulse response deblurring filters and gradient masks. The coefficients
of such filters or masks, generally derived by intuitive reasoning, can be
obtained more accurately and quite rigorously by invoking the noncausal prediction
concepts discussed here.

The linear prediction models considered here can also be used effectively for
2-D spectral factorization and spectral estimation. Details are considered in Section
6.7.
Let u(m, n) be a stationary random field with zero mean and covariance
r(k, l). Let û(m, n) denote a linear prediction estimate of u(m, n), defined as

û(m, n) = ΣΣ_{(k,l) ∈ S_x} a(k, l)u(m − k, n − l)                       (6.55)

where a(k, l) are called the predictor coefficients and S_x, a subset of the 2-D lattice,
is called the prediction region.

The samples included in S_x depend on the type of prediction considered,
namely, causal (x = 1), semicausal (x = 2), or noncausal (x = 3). With a hypothetical
scanning mechanism that scans sequentially from top to bottom and left to right,
the three prediction regions are defined as follows.

Causal Prediction

A causal predictor is a function of only the elements that arrive before it. Thus the
causal prediction region is (Fig. 6.9a)

S₁ = {l ≥ 1, ∀k} ∪ {l = 0, k ≥ 1}                                       (6.56)

This definition of causality includes the special case of single-quadrant causal
predictors

û(m, n) = ΣΣ_{k ≥ 0, l ≥ 0, (k,l) ≠ (0,0)} a(k, l)u(m − k, n − l)       (6.57)

This is called a strongly causal predictor. In the signal processing literature the term
causal is sometimes used for strongly causal models only, and (6.56) is also called
the nonsymmetric half-plane (NSHP) model [22].

[Figure 6.9: Three canonical prediction regions S_x and the corresponding finite prediction windows W_x, x = 1, 2, 3: (a) causal; (b) semicausal; (c) noncausal.]


Semicausal Prediction

A semicausal predictor is causal in one of the coordinates and noncausal in the
other. For example, the semicausal prediction region that is causal in n and
noncausal in m is (Fig. 6.9b)

S₂ = {l ≥ 1, ∀k} ∪ {l = 0, k ≠ 0}                                       (6.58)

Noncausal Prediction

A noncausal predictor û(m, n) is a function of possibly all the variables in the
random field except u(m, n) itself. The noncausal prediction region is (Fig. 6.9c)

S₃ = {∀(k, l) ≠ (0, 0)}                                                 (6.59)
In practice, only a finite neighborhood, called a prediction window, W_x ⊂ S_x, can be
used in the prediction process, so that

û(m, n) = ΣΣ_{(k,l) ∈ W_x} a(k, l)u(m − k, n − l),   x = 1, 2, 3        (6.60)

Some commonly used W_x are (Fig. 6.9)

Causal:      W₁ ≜ {−p ≤ k ≤ p, 1 ≤ l ≤ q} ∪ {1 ≤ k ≤ p, l = 0}
Semicausal:  W₂ ≜ {−p ≤ k ≤ p, 0 ≤ l ≤ q, (k, l) ≠ (0, 0)}              (6.61a)
Noncausal:   W₃ ≜ {−p ≤ k ≤ p, −q ≤ l ≤ q, (k, l) ≠ (0, 0)}

We also define

W̄_x ≜ W_x ∪ (0, 0),   x = 1, 2, 3                                      (6.61b)
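The three windows of (6.61a) can be enumerated directly; a small sketch (the helper name `window` is ours, not from the text):

```python
def window(x, p, q):
    """Prediction windows W_x of (6.61a): x = 1 causal, 2 semicausal, 3 noncausal."""
    if x == 1:    # causal: full rows l = 1..q plus the past half of row l = 0
        return ([(k, l) for l in range(1, q + 1) for k in range(-p, p + 1)]
                + [(k, 0) for k in range(1, p + 1)])
    if x == 2:    # semicausal: rows l = 0..q, excluding the origin
        return [(k, l) for l in range(0, q + 1) for k in range(-p, p + 1)
                if (k, l) != (0, 0)]
    return [(k, l) for l in range(-q, q + 1) for k in range(-p, p + 1)
            if (k, l) != (0, 0)]   # noncausal: full rectangle minus origin

# For p = q = 1 the window sizes are 4 (causal), 5 (semicausal), 8 (noncausal)
print([len(window(x, 1, 1)) for x in (1, 2, 3)])  # [4, 5, 8]
```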

Example 6.5

The following are examples of causal, semicausal, and noncausal predictors.

Causal:      û(m, n) = a₁u(m − 1, n) + a₂u(m, n − 1) + a₃u(m − 1, n − 1)
Semicausal:  û(m, n) = a₁u(m − 1, n) + a₂u(m + 1, n) + a₃u(m, n − 1)
Noncausal:   û(m, n) = a₁u(m − 1, n) + a₂u(m + 1, n) + a₃u(m, n − 1) + a₄u(m, n + 1)

Minimum Variance Prediction

Given a prediction region for forming the estimate û(m, n), the prediction coefficients
a(k, l) can be determined using the minimum variance criterion. This requires
that the variance of the prediction error be minimized, that is,

β² ≜ min E[ε²(m, n)],   ε(m, n) ≜ u(m, n) − û(m, n)                     (6.62)

The orthogonality condition associated with this minimum variance prediction is

E[ε(m, n)u(m − k, n − l)] = 0,   (k, l) ∈ S_x, ∀(m, n)                  (6.63)

which also gives

E[ε(m, n)u(m, n)] = E[ε(m, n)(û(m, n) + ε(m, n))] = β²                  (6.64)

Combining (6.63) and (6.64), we can write

E[ε(m, n)u(m − k, n − l)] = β²δ(k, l),   (k, l) ∈ S_x ∪ (0, 0)          (6.65)

from which we get

r(k, l) − ΣΣ_{(i,j) ∈ W_x} a(i, j)r(k − i, l − j) = β²δ(k, l),
                               (k, l) ∈ S_x ∪ (0, 0),  x = 1, 2, 3      (6.66)

The solution of these simultaneous equations gives the predictor coefficients
a(i, j) and the prediction error variance β². Using the symmetry property
r(k, l) = r(−k, −l), it can be deduced from (6.66) that

a(−i, 0) = a(i, 0)       for semicausal predictors
a(−i, −j) = a(i, j)      for noncausal predictors                       (6.67)

Stochastic Representation of Random Fields


In general, the random field u(m, n) can be characterized as

u(m, n) = û(m, n) + ε(m, n)                                             (6.68)

where û(m, n) is an arbitrary prediction of u(m, n) and ε(m, n) is another random
field such that (6.68) realizes the covariance properties of u(m, n). There are three
types of representations that are of interest in image processing:

1. Minimum variance representations (MVR)
2. White noise-driven representations (WNDR)
3. Autoregressive moving average (ARMA) representations

For minimum variance representations, û(m, n) is chosen to be a minimum variance
predictor and ε(m, n) is the prediction error. For white noise-driven representations,
ε(m, n) is chosen to be a white noise field. In ARMA representations, ε(m, n)
is a two-dimensional moving average, that is, a random field with a truncated
covariance function:

E[ε(i, j)ε(m, n)] = 0,   ∀|i − m| > K, |j − n| > L                      (6.69)

for some fixed integers K > 0, L > 0.

Example 6.6

The covariance function

r(k, l) = δ(k, l) − α[δ(k − 1, l) + δ(k + 1, l) + δ(k, l − 1) + δ(k, l + 1)]

has a finite region of support and represents a moving average random field. Its
spectral density function is a (finite-order) two-dimensional polynomial given by

S(z₁, z₂) = 1 − α(z₁ + z₁⁻¹ + z₂ + z₂⁻¹),   |α| < ¼


Finite-Order MVRs

The finite-order predictors of (6.60) yield MVRs of the form

u(m, n) = ΣΣ_{(k,l) ∈ W_x} a(k, l)u(m − k, n − l) + ε(m, n)             (6.70)

A random field characterized by this MVR must satisfy (6.63) and (6.64). Multiplying
both sides of (6.70) by ε(m, n), taking expectations, and using (6.63), we
obtain

E[u(m, n)ε(m, n)] = β²                                                  (6.71)

Using the preceding results we find

r_ε(k, l) ≜ E[ε(m, n)ε(m − k, n − l)]
         = E{ε(m, n)[u(m − k, n − l) − û(m − k, n − l)]}                (6.72)
         = E[ε(m, n)u(m − k, n − l)]
           − ΣΣ_{(i,j) ∈ W_x} a(i, j)E[ε(m, n)u(m − k − i, n − l − j)]

With a(0, 0) ≜ −1, using (6.65) and (6.67), the covariance function of the prediction
error ε(m, n) is obtained as

r_ε(k, l) = β²δ(k, l),  ∀(k, l)                                 for causal MVRs
r_ε(k, l) = −β²δ(l) Σ_{i=−p}^{p} a(i, 0)δ(k + i),  ∀(k, l)      for semicausal MVRs
r_ε(k, l) = −β² Σ_{i=−p}^{p} Σ_{j=−q}^{q} a(i, j)δ(k − i)δ(l − j),  ∀(k, l)
                                                                for noncausal MVRs
                                                                        (6.73)

The filter represented by the two-dimensional polynomial

A(z₁, z₂) ≜ 1 − ΣΣ_{(m,n) ∈ W_x} a(m, n)z₁⁻ᵐz₂⁻ⁿ                        (6.74)

is called the prediction error filter. The prediction error ε(m, n) is obtained by
passing u(m, n) through this filter. The transfer function of (6.70) is given by

H(z₁, z₂) = 1/A(z₁, z₂)                                                 (6.75)

Taking the 2-D Z-transform of (6.73) and using (6.74), we obtain

S_ε(z₁, z₂) = β²                                             causal MVRs
S_ε(z₁, z₂) = −β² Σ_{m=−p}^{p} a(m, 0)z₁⁻ᵐ = β²A(z₁, ∞)      semicausal MVRs
S_ε(z₁, z₂) = β²A(z₁, z₂)                                    noncausal MVRs
                                                                        (6.76)



S_u(z₁, z₂) = S_ε(z₁, z₂)/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)]

  = β²/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)]                              causal MVRs
  = β²A(z₁, ∞)/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)]                      semicausal MVRs
  = β²A(z₁, z₂)/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)] = β²/A(z₁, z₂)      noncausal MVRs
                                                                        (6.77)

Thus the SDFs of all MVRs are determined completely by their prediction error
filters A(z₁, z₂) and the prediction error variances β². From (6.73), we note the
causal MVRs are also white noise-driven models, just like the 1-D AR models. The
semicausal MVRs are driven by random fields that are white in the causal
dimension and moving averages in the noncausal dimension. The noncausal MVRs
are driven by 2-D moving average fields.

Remarks

1. Definition: A two-dimensional sequence x(m, n) is called causal, semicausal,
   or noncausal if its region of support is contained in S₁, S₂, or S₃, respectively.
   Based on this definition, we call a filter causal, semicausal, or noncausal if its
   impulse response is causal, semicausal, or noncausal, respectively.
2. If the prediction error filters A(z₁, z₂) are causal, semicausal, or noncausal,
   then their inverses 1/A(z₁, z₂) are likewise, respectively.
3. The causal models are recursive, that is, the output sample u(m, n) can be
   computed uniquely and recursively from the past outputs and inputs, that is, from
   {u(m, n), ε(m, n) for (m, n) ∈ S₁}. Therefore, causal MVRs are difference
   equations that can be solved as initial value problems.
   The semicausal models are semirecursive, that is, they are recursive only
   in one dimension. The full vector uₙ = {u(m, n), ∀m} can be calculated from
   the past output vectors {uⱼ, j < n} and all the past and present input vectors
   {εⱼ, j ≤ n}. Therefore, semicausal MVRs are difference equations that have to
   be solved as initial value problems in one of the dimensions and as boundary
   value problems in the other dimension.
   The noncausal models are nonrecursive because the output u(m, n)
   depends on all the past and future inputs. Noncausal MVRs are boundary
   value difference equations.
4. The causal, semicausal, and noncausal models are related to the hyperbolic,
   parabolic, and elliptic classes, respectively, of partial differential equations
   [21].
5. Every finite-order causal MVR also has a finite-order noncausal minimum
   variance representation, although the converse is not true. This is because the
   SDF of a causal MVR can always be expressed in the form of the SDF of a
   noncausal MVR.



[Figure 6.10: Partition for Markovianness of random fields.]

6. (Markov random fields). A two-dimensional random field is called Markov
   if at every pixel location we can find a partition of the two-dimensional lattice
   {m, n} into future, present, and past regions (Fig. 6.10), providing support
   to the sets of random variables U⁺, ∂U, and U⁻, respectively, such that

   P[U⁺|U⁻, ∂U] = P[U⁺|∂U]

   This means that, given the present (∂U), the future (U⁺) is independent of the past
   (U⁻). It can be shown that every Gaussian noncausal MVR is a Markov
   random field [19]. If the noncausal prediction window is [−p, p] × [−q, q],
   then the random field is called Markov [p × q]. Using property 5, it can be
   shown that every causal MVR is also a Markov random field.
Example 6.7

For the models of Example 6.5, the prediction error filters are given by

A(z₁, z₂) = 1 − a₁z₁⁻¹ − a₂z₂⁻¹ − a₃z₁⁻¹z₂⁻¹         (causal model)
A(z₁, z₂) = 1 − a₁z₁⁻¹ − a₂z₁ − a₃z₂⁻¹               (semicausal model)
A(z₁, z₂) = 1 − a₁z₁⁻¹ − a₂z₁ − a₃z₂⁻¹ − a₄z₂        (noncausal model)

Let the prediction error variance be β². If these are to be MVR filters, then the
following must be true.

Causal model. S_ε(z₁, z₂) = β², which means ε(m, n) is a white noise field.
This gives

S_u(z₁, z₂) = β²/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)]

Semicausal model. S_ε(z₁, z₂) = β²[1 − a₁z₁⁻¹ − a₂z₁] = β²A(z₁, ∞). Because
the SDF S_ε(z₁, z₂) must equal S_ε(z₁⁻¹, z₂⁻¹), we must have a₁ = a₂ and

S_u(z₁, z₂) = β²(1 − a₁z₁⁻¹ − a₁z₁)/[(1 − a₁z₁⁻¹ − a₁z₁ − a₃z₂⁻¹)(1 − a₁z₁ − a₁z₁⁻¹ − a₃z₂)]

Clearly ε(m, n) is a moving average in the m (noncausal) dimension and is white in the
n (causal) dimension.

Noncausal model. S_ε(z₁, z₂) = β²[1 − a₁z₁⁻¹ − a₂z₁ − a₃z₂⁻¹ − a₄z₂] = β²A(z₁, z₂).

Once again, for S_ε to be an SDF, we must have a₁ = a₂, a₃ = a₄, and

S_u(z₁, z₂) = β²A(z₁, z₂)/[A(z₁, z₂)A(z₁⁻¹, z₂⁻¹)] = β²/(1 − a₁z₁⁻¹ − a₁z₁ − a₃z₂⁻¹ − a₃z₂)

Now ε(m, n) is a two-dimensional moving average field. However, it is a special moving
average whose SDF is proportional to the frequency response of the prediction error
filter. This allows cancellation of the A(z₁, z₂) term in the numerator and the denominator
of the preceding expression for S_u(z₁, z₂).

Example 6.8

Consider the linear system

A(z₁, z₂)U(z₁, z₂) = F(z₁, z₂)

where A(z₁, z₂) is a causal, semicausal, or noncausal prediction filter discussed in
Example 6.7 and F(z₁, z₂) represents the Z-transform of an arbitrary input f(m, n).
For the causal model, we can write the difference equation

u(m, n) − a₁u(m − 1, n) − a₂u(m, n − 1) − a₃u(m − 1, n − 1) = f(m, n)

This equation can be solved recursively for all m ≥ 0, n ≥ 0, for example, if the initial
values u(m, 0), u(0, n) and the input f(m, n) are given (see Fig. 6.11a). To obtain the
solution at location A, we need only the previously determined solutions at locations
B, C, and D and the input at A. Such problems are called initial-value problems.

[Figure 6.11: Terminal conditions for the systems of Example 6.8: (a) initial conditions for the causal system; (b) initial and boundary conditions for the semicausal system; (c) boundary conditions for the noncausal system.]
The semicausal system satisfies the difference equation

u(m, n) − a₁u(m − 1, n) − a₂u(m + 1, n) − a₃u(m, n − 1) = f(m, n)

Now the solution at (m, n) (see Fig. 6.11b) needs the solution at a future location
(m + 1, n). However, a unique solution for a column vector uₙ ≜ [u(1, n), ...,
u(N, n)]ᵀ can be obtained recursively for all n ≥ 0 if the initial values u(m, 0) and the
boundary values u(0, n) and u(N + 1, n) are known and if the coefficients a₁, a₂, a₃
satisfy certain stability conditions discussed shortly. Such problems are called initial-boundary-value
problems and can be solved semirecursively, that is, recursively in one
dimension and nonrecursively in the other.

For the noncausal system, we have

u(m, n) − a₁u(m − 1, n) − a₂u(m + 1, n) − a₃u(m, n − 1) − a₄u(m, n + 1) = f(m, n)

which becomes a boundary-value problem because the solution at (m, n) requires the
solutions at (m + 1, n) and (m, n + 1) (see Fig. 6.11c).

Note that it is possible to reindex, for example, the semicausal system equation
as

u(m, n) = (1/a₂)u(m − 1, n) − (a₁/a₂)u(m − 2, n) − (a₃/a₂)u(m − 1, n − 1) − (1/a₂)f(m − 1, n)

This can seemingly be solved recursively for all m ≥ 0, n ≥ 0 as an initial-value problem
if the initial conditions u(m, 0), u(0, n), u(−1, n) are known. Similarly, we could
reindex the noncausal system equation and write it as an initial-value problem. However,
this procedure is nonproductive because semicausal and noncausal systems can become
unstable if treated as causal systems and may not yield unique solutions as initial-value
problems. The stability of these models is considered next.
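The causal initial-value recursion of this example is direct to implement; a minimal sketch with hypothetical coefficient values, zero initial values, and an impulse input:

```python
import numpy as np

# Solve u(m,n) - a1 u(m-1,n) - a2 u(m,n-1) - a3 u(m-1,n-1) = f(m,n)
# recursively as an initial-value problem (causal model of Example 6.8).
# Coefficient values are illustrative, not from the text.
a1, a2, a3 = 0.5, 0.4, -0.2
M = 6
f = np.zeros((M, M))
f[1, 1] = 1.0                      # impulse input
u = np.zeros((M, M))               # initial values u(m,0) = u(0,n) = 0
for m in range(1, M):
    for n in range(1, M):
        u[m, n] = a1*u[m-1, n] + a2*u[m, n-1] + a3*u[m-1, n-1] + f[m, n]
print(u[1, 1], u[2, 1], u[1, 2])   # 1.0 0.5 0.4
```

Each new output uses only previously computed outputs (the points B, C, D of Fig. 6.11a) and the current input, which is what makes the causal model recursive.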

Stability of Two-Dimensional Systems


In designing image processing techniques, care must be taken to assure that the
underlying algorithms are stable. The algorithms can be viewed as 2-D systems, and these
systems should be stable; otherwise, small errors in calculations (input) can cause
large errors in the result (output).

The stability of a linear shift invariant system whose impulse response is
h(m, n) requires that (see Section 2.5)

Σ_m Σ_n |h(m, n)| < ∞                                                   (6.78)

We define a system to be causal, semicausal, or noncausal if the region of support
of its impulse response is contained in S₁, S₂, or S₃, respectively. Based on this
definition, the stability conditions for the different models, whose transfer function
is H(z₁, z₂) = 1/A(z₁, z₂), are as follows.

Noncausal systems

A(z₁, z₂) = 1 − Σ_{m=−p}^{p} Σ_{n=−q}^{q} a(m, n)z₁⁻ᵐz₂⁻ⁿ,  (m, n) ≠ (0, 0)   (6.79)

These are stable as nonrecursive filters if and only if

A(z₁, z₂) ≠ 0,   |z₁| = 1, |z₂| = 1

Semicausal systems

A(z₁, z₂) = 1 − Σ_{m=−p, m≠0}^{p} a(m, 0)z₁⁻ᵐ − Σ_{m=−p}^{p} Σ_{n=1}^{q} a(m, n)z₁⁻ᵐz₂⁻ⁿ

These are semirecursively stable if and only if

A(z₁, z₂) ≠ 0,   |z₁| = 1, |z₂| ≥ 1                                     (6.80)

Causal systems

A(z₁, z₂) = 1 − Σ_{m=1}^{p} a(m, 0)z₁⁻ᵐ − Σ_{m=−p}^{p} Σ_{n=1}^{q} a(m, n)z₁⁻ᵐz₂⁻ⁿ

These are recursively stable if and only if

A(z₁, z₂) ≠ 0,   |z₁| ≥ 1, z₂ = ∞
A(z₁, z₂) ≠ 0,   |z₁| = 1, |z₂| ≥ 1                                     (6.81)

These conditions assure that H has a uniformly convergent series in the appropriate
regions in the z₁, z₂ hyperplane so that (6.78) is satisfied subject to the
causality conditions. Proofs are considered in Problem 6.17.

The preceding stability conditions for the causal and semicausal systems
require that for each ω₁ ∈ (−π, π), the one-dimensional prediction error filter
A(e^{jω₁}, z₂) be stable and causal, that is, all its roots should lie within the unit circle
|z₂| = 1. This condition can be tested by finding, for each ω₁, the partial correlations
ρ(ω₁) associated with A(e^{jω₁}, z₂) via the Levinson recursions and verifying that
|ρ(ω₁)| < 1 for every ω₁. For semicausal systems it is also required that the one-dimensional
polynomial A(z₁, ∞) be stable as a noncausal prediction error filter,
that is, it should have no zeros on the unit circle. For causal systems, A(z₁, ∞) should
be stable as a causal prediction error filter, that is, its zeros should be inside the unit
circle.
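The conditions in (6.81) can be probed numerically; the sketch below (hypothetical coefficients, our own illustration) evaluates A on a grid over the unit bicircle, which is a necessary, not sufficient, check of the conditions:

```python
import numpy as np

# Grid probe of the causal stability conditions (6.81) for
# A(z1,z2) = 1 - a1 z1^-1 - a2 z2^-1 - a3 z1^-1 z2^-1 (Example 6.7 form).
# Here |a1| + |a2| + |a3| < 1, so |A| >= 1 - 0.9 > 0 on the bicircle.
a1, a2, a3 = 0.5, 0.3, 0.1
w = 2 * np.pi * np.arange(256) / 256
z1 = np.exp(1j * w)[:, None]
z2 = np.exp(1j * w)[None, :]
A = 1 - a1 / z1 - a2 / z2 - a3 / (z1 * z2)
print(np.min(np.abs(A)) > 0)   # True: no zeros found on |z1| = |z2| = 1
print(abs(a1) < 1)             # True: zero of A(z1, inf) = 1 - a1/z1 is inside |z1| = 1
```

A full test would also verify, for each ω₁, that the roots of A(e^{jω₁}, z₂) in z₂ lie inside the unit circle, for example via the Levinson partial-correlation test mentioned above.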

6.7 TWO-DIMENSIONAL SPECTRAL FACTORIZATION
    AND SPECTRAL ESTIMATION VIA PREDICTION MODELS

Now we consider the problem of realizing the foregoing three types of representations
given the spectral density function or, equivalently, the covariance function of
the image. Let H(z₁, z₂) represent the transfer function of a two-dimensional stable
linear system. The SDF of the output u(m, n) when forced by a stationary random
field ε(m, n) is given by

S_u(z₁, z₂) = H(z₁, z₂)H(z₁⁻¹, z₂⁻¹)S_ε(z₁, z₂)                         (6.82)

The problem of finding a stable system H(z₁, z₂) given S_u(z₁, z₂) and S_ε(z₁, z₂) is
called the two-dimensional spectral factorization problem. In general, it is not possible
to reduce a two-dimensional polynomial to a product of lower-order factors.
Thus, unlike in 1-D, it may not be possible to find a suitable finite-order linear
system realization of a 2-D rational SDF. To obtain finite-order models our only
recourse is to relax the requirement of an exact match of the model SDF with
the given SDF. As in the one-dimensional case [see (6.28)], the problem of designing
a stable filter H(z₁, z₂) whose magnitude of frequency response is given is
essentially the same as spectral factorization to obtain white noise-driven
representations.

Example 6.9

Consider the SDF

S(z₁, z₂) = 1 − α(z₁ + z₁⁻¹ + z₂ + z₂⁻¹),   |α| < ¼                     (6.83)

Defining A(z₂) ≜ 1 − α(z₂ + z₂⁻¹), S(z₁, z₂) can be factored as

S(z₁, z₂) = A(z₂) − α(z₁ + z₁⁻¹) = H(z₁, z₂)H(z₁⁻¹, z₂⁻¹)

where

H(z₁, z₂) ≜ √(α/ρ(z₂)) [1 − ρ(z₂)z₁⁻¹],
ρ(z₂) ≜ [A(z₂) + √(A²(z₂) − 4α²)]/(2α)                                  (6.84)

Note that H is not rational. In fact, rational factors satisfying (6.82) are not possible.
Therefore, a finite-order linear shift invariant realization of S(z₁, z₂) is not possible.
Separable Models

If the given SDF is separable, that is, S(z₁, z₂) = S₁(z₁)S₂(z₂) or, equivalently,
r(k, l) = r₁(k)r₂(l), and S₁(z₁) and S₂(z₂) have the one-dimensional realizations
[H₁(z₁), S_{ε₁}(z₁)] and [H₂(z₂), S_{ε₂}(z₂)] [see (6.1)], then S(z₁, z₂) has the realization

H(z₁, z₂) ≜ H₁(z₁)H₂(z₂),   S_ε(z₁, z₂) = S_{ε₁}(z₁)S_{ε₂}(z₂)          (6.85)
Example 6.10 (Realizations of the separable covariance function)

The separable covariance function of (2.84) can be factored as r₁(k) = σ²ρ₁^|k|, r₂(l) =
ρ₂^|l|. Now the covariance function r(k) ≜ σ²ρ^|k| has (1) a causal first-order AR realization
(see Example 6.1) with A(z) = 1 − ρz⁻¹, S_ε(z) = σ²(1 − ρ²), and (2) a noncausal MVR
(see Section 6.5) with A(z) ≜ 1 − α(z + z⁻¹), S_ν(z) = σ²β²A(z), β² ≜ (1 − ρ²)/(1 + ρ²),
α ≜ ρ/(1 + ρ²). Applying these results to r₁(k) and r₂(l), we can obtain the
following three different realizations, where αᵢ, βᵢ, i = 1, 2, are defined by replacing ρ
by ρᵢ, i = 1, 2, in the previous definitions of α and β.

Causal MVR (C1 model). Both r₁(k) and r₂(l) have causal realizations. This
gives

A(z₁, z₂) = (1 − ρ₁z₁⁻¹)(1 − ρ₂z₂⁻¹),   S_ε(z₁, z₂) = σ²(1 − ρ₁²)(1 − ρ₂²)

u(m, n) = ρ₁u(m − 1, n) + ρ₂u(m, n − 1)
          − ρ₁ρ₂u(m − 1, n − 1) + ε(m, n)                               (6.86)

r_ε(k, l) = σ²(1 − ρ₁²)(1 − ρ₂²)δ(k, l)

Semicausal MVR (SC1 model). Here r₁(k) has a noncausal realization, and
r₂(l) has a causal realization. This yields the semicausal prediction error filter

A(z₁, z₂) = [1 − α₁(z₁ + z₁⁻¹)](1 − ρ₂z₂⁻¹),
S_ε(z₁, z₂) = σ²β₁²(1 − ρ₂²)[1 − α₁(z₁ + z₁⁻¹)]

u(m, n) = α₁[u(m − 1, n) + u(m + 1, n)] + ρ₂u(m, n − 1)
          − ρ₂α₁[u(m − 1, n − 1) + u(m + 1, n − 1)] + ε(m, n)           (6.87)

Noncausal MVR (NC1 model). Now both r₁(k) and r₂(l) have noncausal
realizations, giving

A(z₁, z₂) = [1 − α₁(z₁ + z₁⁻¹)][1 − α₂(z₂ + z₂⁻¹)],
S_ε(z₁, z₂) = σ²β₁²β₂²A(z₁, z₂)

u(m, n) = α₁[u(m − 1, n) + u(m + 1, n)]
          + α₂[u(m, n − 1) + u(m, n + 1)]
          − α₁α₂[u(m − 1, n − 1) + u(m + 1, n − 1)
          + u(m − 1, n + 1) + u(m + 1, n + 1)] + ε(m, n)                (6.88)

This example shows that all three canonical forms of minimum variance representations
are realizable for separable rational SDFs. For nonseparable and/or irrational
SDFs, only approximate realizations can be achieved, as explained next.

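The C1 model (6.86) can be simulated directly to confirm that it reproduces the separable covariance; a sketch with illustrative values ρ₁ = 0.9, ρ₂ = 0.8, σ² = 1 (grid size and seed are our own choices):

```python
import numpy as np

# Simulate the separable causal (C1) model (6.86) and check that sample
# covariances approximate r(k,l) = sigma^2 rho1^|k| rho2^|l|.
rng = np.random.default_rng(0)
rho1, rho2, sigma2 = 0.9, 0.8, 1.0
M = 500
beta = np.sqrt(sigma2 * (1 - rho1**2) * (1 - rho2**2))   # std of eps(m,n)
e = beta * rng.standard_normal((M, M))
u = np.zeros((M, M))
for m in range(1, M):
    for n in range(1, M):
        u[m, n] = (rho1 * u[m - 1, n] + rho2 * u[m, n - 1]
                   - rho1 * rho2 * u[m - 1, n - 1] + e[m, n])
core = u[100:, 100:]            # discard the transient near the zero boundary
r00 = np.mean(core * core)
r10 = np.mean(core[1:, :] * core[:-1, :])
r01 = np.mean(core[:, 1:] * core[:, :-1])
print(round(r10 / r00, 2), round(r01 / r00, 2))  # close to rho1, rho2
```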
Realization of Noncausal MVRs

For nonseparable SDFs, the problem of determining the prediction error filters
becomes more difficult. In general, this requires solution of (6.66) with p → ∞,
q → ∞. In this limiting case for noncausal MVRs, we find [see (6.77)]

S_u(z₁, z₂) = β²/A(z₁, z₂) = β²/[1 − ΣΣ_{(m,n) ∈ S₃} a(m, n)z₁⁻ᵐz₂⁻ⁿ]   (6.89)

For the given SDF S(z₁, z₂), suppose 1/S(z₁, z₂) has the Fourier series

1/S(z₁, z₂) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} r⁺(m, n)z₁⁻ᵐz₂⁻ⁿ,
              z₁ = e^{jω₁}, z₂ = e^{jω₂}                                (6.90)

Then the quantities desired in (6.89) to match S and S_u are obtained as

β² = 1/r⁺(0, 0),   a(m, n) = −r⁺(m, n)/r⁺(0, 0)                         (6.91)

This is the two-dimensional version of the one-dimensional noncausal MVRs considered
in Section 6.5.

In general, this representation will be of infinite order unless the Fourier series
of 1/S(z₁, z₂) is finite. An approximate finite-order realization can be obtained by
truncating the Fourier series to a finite number of terms such that the truncated
series remains positive for every (ω₁, ω₂). By keeping the number of terms sufficiently
large, the SDF and the covariances of the finite-order noncausal MVR can
be matched arbitrarily closely to the respective given functions [16].
Example 6.11

Consider the SDF

S(z₁, z₂) = [1 − α(z₁ + z₁⁻¹ + z₂ + z₂⁻¹)]⁻¹,   0 < α < ¼               (6.92)

Clearly, S⁻¹ has a finite Fourier series, which gives β² = 1, a(m, n) = α for (m, n) =
(±1, 0), (0, ±1), and a(m, n) = 0 otherwise. Hence the noncausal MVR is

u(m, n) = α[u(m + 1, n) + u(m − 1, n) + u(m, n + 1) + u(m, n − 1)] + ε(m, n)

r_ε(k, l) = 1,   (k, l) = (0, 0)
          = −α,  (k, l) = (±1, 0), (0, ±1)
          = 0,   otherwise
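The recipe (6.90)-(6.91) amounts to taking Fourier coefficients of 1/S; a sketch (with an assumed α = 0.2 and grid size of our choosing) recovers the coefficients of Example 6.11 via the FFT:

```python
import numpy as np

# Recover the noncausal MVR of Example 6.11 from its SDF via (6.90)-(6.91):
# the Fourier coefficients r+(m,n) of 1/S are obtained with an inverse FFT.
alpha, N = 0.2, 64          # alpha is an assumed value in (0, 1/4)
w = 2 * np.pi * np.fft.fftfreq(N)
W1, W2 = np.meshgrid(w, w, indexing="ij")
S = 1.0 / (1 - 2 * alpha * (np.cos(W1) + np.cos(W2)))    # SDF (6.92)
r_plus = np.real(np.fft.ifft2(1.0 / S))                  # r+(m,n), eq. (6.90)
beta2 = 1.0 / r_plus[0, 0]                               # eq. (6.91)
a = -r_plus / r_plus[0, 0]
print(round(beta2, 6), round(a[1, 0], 6), round(a[0, 1], 6))  # 1.0 0.2 0.2
```

Because 1/S is a finite trigonometric polynomial here, the recovered model is exact; for a general SDF the coefficients would decay and must be truncated as described above.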
Realization of Causal and Semicausal MVRs

Inverse Fourier transforming S(e^{jω₁}, e^{jω₂}) with respect to ω₂ gives a covariance
sequence r_l(e^{jω₁}), l = integers, which is parametric in ω₁. Hence for each ω₁ we can
find an AR model realization of order q, for instance, via the Levinson recursion
(Section 6.4), which will match r_l(e^{jω₁}) for −q ≤ l ≤ q.

Let β²(e^{jω₁}), aₙ(e^{jω₁}), 1 ≤ n ≤ q, be the AR model parameters, where the
prediction error variance β²(e^{jω₁}) > 0 for every ω₁. Now β²(e^{jω₁}) is a one-dimensional
SDF and can be factored by one-dimensional spectral factorization techniques. It has been shown
that the causal and semicausal MVRs can be realized when β²(e^{jω₁}) is factored by
causal (AR) and noncausal MVRs, respectively. To obtain finite-order models,
aₙ(e^{jω₁}) and β²(e^{jω₁}) are replaced by suitable rational approximations of order p, for
instance. Stability of the models is ascertained by requiring the reflection coefficients
associated with the rational approximations of aₙ(e^{jω₁}), n = 1, 2, ..., q, to be
less than unity in magnitude. The SDFs realized by these finite-order MVRs can be
made arbitrarily close to the given SDF by increasing p and q sufficiently. Details
may be found in [16].

Realization via Orthogonality Condition

The foregoing techniques require working with infinite sets of equations, which
have to be truncated in practical situations. A more practical alternative is to start
with the covariance sequence r(k, l) and solve the subset of equations (6.66) for
(k, l) ∈ W̄_x ⊂ S_x ∪ (0, 0), that is,

r(k, l) = ΣΣ_{(m,n) ∈ W_x} a_{p,q}(m, n)r(k − m, l − n) + β²_{p,q}δ(k, l),
          (k, l) ∈ W̄_x                                                 (6.93)

where the dependence of the model parameters on the window size is explicitly
shown. These equations are such that the solution for the prediction coefficients on W_x
requires covariances r(k, l) from a larger window. Consequently, unlike the 1-D
AR models, the covariances generated by the model need not match the given
covariances used originally to solve for the model coefficients, and there can be
many different sets of covariances that yield the same MVR predictors. Also,
stability of the model is not guaranteed for a chosen order.

In spite of these shortcomings, the advantages of the foregoing method
are (1) only a finite number of linear equations need to be solved, and (2) by solving
these equations for increasing orders (p, q), it is possible eventually to obtain a
finite-order stable model whose covariances match the given r(k, l) to a desired
accuracy. Moreover, there is a 2-D Toeplitz structure in (6.93) that can be exploited
to compute a_{p,q}(m, n) recursively from a_{p−1,q}(m, n) or a_{p,q−1}(m, n), and so on. This
yields an efficient computational procedure similar to the Levinson-Durbin
algorithm discussed in Section 6.4.

If the given covariances do indeed come from a finite-order MVR, then the
solution of (6.93) would automatically yield that finite-order MVR. Finally, given
the solution of (6.93), we can determine the model SDF via (6.77). This feature
gives an attractive algorithm for estimating the SDF of a 2-D random field from a
limited number of covariances.
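Solving (6.93) is a small linear-algebra problem; the sketch below (our own illustration) applies it to the separable covariance of Example 6.10 with a causal p = q = 1 window, and recovers the C1 coefficients of (6.86) exactly, since the underlying field really is a finite-order causal MVR:

```python
import numpy as np

rho1, rho2, sigma2 = 0.9, 0.8, 1.0               # assumed values
r = lambda k, l: sigma2 * rho1**abs(k) * rho2**abs(l)

# causal window W1 for p = q = 1 (the "past" neighbors of the origin)
W = [(1, 0), (-1, 1), (0, 1), (1, 1)]
# normal equations (6.93) for (k,l) in W1: r(k,l) = sum a(m,n) r(k-m, l-n)
A = np.array([[r(k - m, l - n) for (m, n) in W] for (k, l) in W])
b = np.array([r(k, l) for (k, l) in W])
a = np.linalg.solve(A, b)
# (k,l) = (0,0) equation gives beta^2; here r(-m,-n) = r(m,n) by symmetry
beta2 = r(0, 0) - sum(ai * r(m, n) for ai, (m, n) in zip(a, W))
print(np.round(a, 6), round(beta2, 6))
# a -> (rho1, 0, rho2, -rho1*rho2), beta2 -> sigma^2 (1-rho1^2)(1-rho2^2)
```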
Example 6.12

Consider the isotropic covariance function r(k, l) = 0.9^√(k² + l²). Figure 6.12 shows the
impulse responses {−a(m, n)} of the prediction error filters for causal, semicausal, and
noncausal MVRs of different orders. The difference between the given covariances
and those generated by the models has been found to decrease when the model order is
increased [16]. Figure 6.13 shows that the spectral density match obtained by the semicausal
MVR of order (2, 2) is quite good.

[Figure 6.12: Prediction error filter impulse responses of different MVRs. The origin (0, 0) is at the location of the boxed elements.]

[Figure 6.13: Spectral match obtained by the semicausal MVR of order p = q = 2: (a) given spectrum; (b) semicausal model spectrum.]

Example 6.13 (Two-Dimensional Spectral Estimation)

The spectral estimation problem is to find the SDF estimate of a random field given
either a finite number of covariances or a finite number of samples of the random field.
As an example, suppose the 2-D covariance sequence

r(k, l) = cos π(k/8 + l/4) + cos [3π(k + l)/16] + 0.05δ(k, l)

is available on a small grid {−16 ≤ k, l ≤ 16}. These covariances correspond to two
plane waves of small frequency separation in 10-dB noise. The two frequencies are not
resolved when the given data are padded with zeros on a 256 × 256 grid and the DFT is
taken (Fig. 6.14). The low resolution afforded by Fourier methods can be attributed to
the fact that the data are assumed to be zero outside their known extent.

The causal and semicausal models improve the resolution of the SDF estimate
beyond the Fourier limit by implicitly providing an extrapolation of the covariance data
outside its known extent. The method is to fit a suitable-order model that is expected to
characterize the underlying random field accurately. To this end, we fit (p, q) = (2, 2)
order models by solving (6.66) for (k, l) ∈ W̃_x, where W̃_x ⊂ S_x, x = 1, 2, are subsets of the
causal and semicausal prediction regions corresponding to (p, q) = (6, 12) and defined
in a similar manner as W̄_x in (6.61b). Since W̄_x ⊂ W̃_x, the resulting system of equations
is overdetermined and was solved by least squares techniques (see Section 8.9). Note
that by solving (6.66) for (k, l) ∈ W̃_x, we are enforcing the orthogonality condition of
(6.63) over W̃_x. Ideally, we should let W̃_x → S_x. Practically, a reasonably large region
suffices. Once the model parameters are obtained, the SDF is calculated using (6.77)
218 Image Representation by Stochastic Models Chap. 6



(a) DFT spectrum; (b) causal MVR spectrum; (c) semicausal MVR spectrum.

Figure 6.14 Two-dimensional spectral estimation.

with z₁ = exp(jω₁), z₂ = exp(jω₂). Results given in Fig. 6.14 show that both the causal and the semicausal MVRs resolve the two frequencies quite well. This approach has also been employed successfully for spectral estimation when observations of the random field rather than its covariances are available. Details of algorithms for identifying model parameters from such data are given in [17].
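The Fourier resolution limit described above can be illustrated numerically. The sketch below (an assumed illustration using NumPy FFT conventions, not code from the text) places the given covariances on a zero-padded 256 × 256 grid and takes the DFT; the covariance formula, lag range, and noise weight follow the example:

```python
import numpy as np

# Zero-padded DFT estimate of Example 6.13. Negative lags wrap around so
# that the real, symmetric covariance yields a real spectrum.
N = 256
R = np.zeros((N, N))
for k in range(-16, 17):
    for l in range(-16, 17):
        val = np.cos(np.pi * (k / 8 + l / 4)) + np.cos(3 * np.pi * (k + l) / 16)
        if k == 0 and l == 0:
            val += 0.05                 # the 0.05*delta(k, l) noise term
        R[k % N, l % N] = val
spec = np.real(np.fft.fft2(R))          # Fourier (periodogram-like) estimate
# The plane-wave frequencies (pi/8, pi/4) and (3pi/16, 3pi/16) correspond to
# DFT bins (16, 32) and (24, 24) on this grid; with only 33 x 33 known lags
# the mainlobes are broad, which is the resolution limit discussed in the text.
```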

6.8 SPECTRAL FACTORIZATION VIA THE WIENER-DOOB
HOMOMORPHIC TRANSFORMATION

Another approach to spectral factorization for causal and semicausal models is through the Wiener-Doob homomorphic transformation method. The principle behind this method is to map the poles and zeros of the SDF into singularities of its logarithm. Assume that the SDF S is positive and continuous and the Fourier series of log S is absolutely convergent. Then

log S(z₁, z₂) = Σ_{m=-∞}^{∞} Σ_{n=-∞}^{∞} c(m, n) z₁⁻ᵐ z₂⁻ⁿ    (6.94)

where the cepstrum c(m, n) is absolutely summable

Σ_m Σ_n |c(m, n)| < ∞    (6.95)

Suppose log S is decomposed as a sum of three components

log S = Ŝ_v + C⁺ + C⁻    (6.96)

Then

S = S_v e^{C⁺} e^{C⁻} = S_v / [A⁺(z₁, z₂) A⁻(z₁, z₂)] = S_v H⁺ H⁻    (6.97)

is a product of three factors, where

H±(z₁, z₂) ≜ e^{C±} = 1/A±(z₁, z₂),  S_v ≜ e^{Ŝ_v}    (6.98)



If the decomposition is such that A⁺(z₁, z₂) equals A⁻(z₁⁻¹, z₂⁻¹) and S_v(z₁, z₂) is an SDF, then there exists a stable, two-dimensional linear system with transfer function H⁺(z₁, z₂) = 1/A⁺(z₁, z₂) such that the SDF of the output is S if the SDF of the input is S_v. The causal or the semicausal models can be realized by decomposing the cepstrum so that C⁺ has causal or semicausal regions of support, respectively. The specific decompositions for causal MVRs, semicausal WNDRs, and semicausal MVRs are given next. Figure 6.15a shows the algorithms.

Causal MVRs [22]

Partition the cepstrum as

c(m, n) = c⁺(m, n) + c⁻(m, n) + γ_v(m, n)

where c⁺(m, n) = 0, (m, n) ∉ S₁; c⁻(m, n) = 0, (m, n) ∈ S₁; c⁺(0, 0) = c⁻(0, 0) = 0; and γ_v(m, n) = c(0, 0)δ(m, n). S₁ and Ŝ₁ are defined in (6.56) and (6.65), respectively. Hence we define

C⁺ = C⁺(z₁, z₂) ≜ Σ_{m=1}^{∞} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=1}^{∞} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.99a)

C⁻ = C⁻(z₁, z₂) ≜ Σ_{m=-∞}^{-1} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=-∞}^{-1} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.99b)

Ŝ_v(z₁, z₂) = c(0, 0)    (6.99c)

Using (6.95) it can be shown that C⁺(z₁, z₂) is analytic in the region {|z₁| = 1, |z₂| ≥ 1} ∪ {|z₁| ≥ 1, z₂ = ∞}. Since eˣ is an analytic function, A⁺ and H⁺ are also analytic in the same region and, therefore, have no singularities in that region. The region of support of their impulse responses a⁺(m, n) and h⁺(m, n) will be the same as that of c⁺(m, n), that is, S₁. Hence A⁺ and H⁺ will be causal filters.
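The cepstrum partition in (6.99) can be checked numerically on a DFT grid. The sketch below is only an illustration; the separable test SDF and grid size are assumptions, not from the text:

```python
import numpy as np

# Assumed separable test SDF, positive and continuous on the unit bicircle.
N = 64
w = 2 * np.pi * np.arange(N) / N
W1, W2 = np.meshgrid(w, w, indexing="ij")          # axis 0 <-> m, axis 1 <-> n
S = np.abs(1 - 0.5 * np.exp(-1j * W1)) ** 2 * np.abs(1 - 0.5 * np.exp(-1j * W2)) ** 2
c = np.real(np.fft.ifft2(np.log(S)))               # cepstrum samples c(m, n)

# Lag value carried by each DFT index (0..N/2-1, then -N/2..-1).
lag = np.fft.fftfreq(N, d=1 / N).astype(int)
M, Nn = np.meshgrid(lag, lag, indexing="ij")
S1 = (Nn >= 1) | ((Nn == 0) & (M >= 1))            # causal region S1

c_plus = np.where(S1, c, 0.0)                      # support S1, c+(0, 0) = 0
c_minus = c - c_plus                               # complement of S1
c_minus[0, 0] = 0.0                                # c(0, 0) goes to the S_v term
```

The three pieces then satisfy c = c⁺ + c⁻ + c(0, 0)δ on the grid, mirroring the partition used for the causal MVR.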


Semicausal WNDRs

Here we let c⁺(m, n) = 0, (m, n) ∉ S₂, and define

C⁺(z₁, z₂) ≜ ½ Σ_{m=-∞}^{∞} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=1}^{∞} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.100a)

C⁻(z₁, z₂) ≜ ½ Σ_{m=-∞}^{∞} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=-∞}^{-1} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.100b)

Ŝ_v(z₁, z₂) = 0    (6.100c)

Now C⁺, and hence A⁺ and H⁺, will be analytic in the region {|z₁| = 1, |z₂| ≥ 1}, and the impulse responses a⁺(m, n) and h⁺(m, n) will be zero for {n < 0, ∀m}; that is, these filters are semicausal. Also S_v = 1, which means the model is white noise driven.




(Block diagram: r(m, n) is mapped through the two-dimensional homomorphic transform and the logarithm to the cepstrum c(m, n); the cepstrum is windowed by w(m, n), chosen according to the model type (causal MVR, semicausal WNDR, or semicausal MVR), and inverse-mapped to obtain a⁺(m, n).)

(a) Wiener-Doob homomorphic transform method. 𝔗 is the two-dimensional homomorphic transform.

(Coefficient tables of the infinite-order prediction error filters: causal MVR, β² = 0.11438; semicausal MVR, β² = 0.09324.)

(b) Infinite order prediction error filters obtained by Wiener-Doob factorization for isotropic covariance model with ρ = 0.9, σ = 1.

Figure 6.15 Two-dimensional spectral factorization.

Semicausal MVRs

For semicausal MVRs, it is necessary that S_v be proportional to A⁺(z₁, ∞) [see (6.76)]. Using this condition we obtain

C⁺(z₁, z₂) = Σ_{m=-∞}^{∞} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=1}^{∞} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.101a)

C⁻(z₁, z₂) = Σ_{m=-∞}^{∞} c(m, 0)z₁⁻ᵐ + Σ_{m=-∞}^{∞} Σ_{n=-∞}^{-1} c(m, n)z₁⁻ᵐ z₂⁻ⁿ    (6.101b)

Ŝ_v(z₁, z₂) = - Σ_{m=-∞}^{∞} c(m, 0)z₁⁻ᵐ    (6.101c)

The region of analyticity of C⁺ and A⁺ is {|z₁| = 1, |z₂| ≥ 1}, and that of Ŝ_v and S_v is {|z₁| = 1, ∀z₂}. Also, a⁺(m, n) and h⁺(m, n) will be semicausal, and S_v is a valid SDF.

Remarks and Examples


1. For noncausal models, we do not go through this procedure. The Fourier series of S⁻¹ yields the desired model [see Section 6.7 and (6.91)].
2. Since C⁺(z₁, z₂) is analytic, the causal or semicausal filters just obtained are stable and have stable causal or semicausal inverse filters.
3. A practical algorithm for implementing this method requires numerical approximations via the DFT in the evaluation of Ŝ and A⁺ or H⁺.
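A DFT-based approximation of the kind mentioned in Remark 3 can be sketched as follows. This is an assumed illustration (the test SDF and grid size are not from the text); only the pipeline, cepstrum → causal part C⁺ → H⁺ = exp(C⁺), S_v = exp(c(0, 0)), follows the method above. For a real, symmetric cepstrum, C⁻ is the conjugate of C⁺ on the unit bicircle, so S should be recovered as S_v |H⁺|²:

```python
import numpy as np

# Approximate Wiener-Doob factorization evaluated on an N x N DFT grid.
N = 64
w = 2 * np.pi * np.arange(N) / N
W1, W2 = np.meshgrid(w, w, indexing="ij")
S = np.abs(1 - 0.5 * np.exp(-1j * W1)) ** 2 * np.abs(1 - 0.5 * np.exp(-1j * W2)) ** 2

c = np.real(np.fft.ifft2(np.log(S)))              # cepstrum (aliased on the grid)
lag = np.fft.fftfreq(N, d=1 / N).astype(int)
M, Nn = np.meshgrid(lag, lag, indexing="ij")
S1 = (Nn >= 1) | ((Nn == 0) & (M >= 1))           # causal support
c_plus = np.where(S1, c, 0.0)

C_plus = np.fft.fft2(c_plus)                      # C+ sampled on the grid
H_plus = np.exp(C_plus)                           # causal factor H+ = 1/A+
S_v = np.exp(c[0, 0])
S_hat = S_v * np.abs(H_plus) ** 2                 # reconstructed SDF
```

The reconstruction error here comes only from cepstrum aliasing on the finite grid, which is negligible for a rapidly decaying cepstrum.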


Example 6.14

Consider the SDF of (6.83). Assuming 0 < α ≪ 1, we obtain the Fourier series log S = α(z₁ + z₁⁻¹ + z₂ + z₂⁻¹) + O(α²). Ignoring O(α²) terms and using the preceding results, we get

Causal MVR

C⁺ = α(z₁⁻¹ + z₂⁻¹),  Ŝ_v(z₁, z₂) = 0
⇒ A⁺(z₁, z₂) = exp(-C⁺) ≈ 1 - α(z₁⁻¹ + z₂⁻¹),  S_v(z₁, z₂) = 1

Semicausal MVR

C⁺ = α(z₁ + z₁⁻¹) + αz₂⁻¹,  Ŝ_v = -α(z₁ + z₁⁻¹)
⇒ A⁺(z₁, z₂) ≈ 1 - α(z₁ + z₁⁻¹) - αz₂⁻¹,  S_v(z₁, z₂) ≈ 1 - α(z₁ + z₁⁻¹)

Semicausal WNDR

C⁺ = (α/2)(z₁ + z₁⁻¹) + αz₂⁻¹,  Ŝ_v = 0
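The first-order approximation used above can be checked at any point of the unit bicircle; a minimal numeric sketch (the test point is an arbitrary assumption) for the causal MVR case:

```python
import cmath

# Verify that A+ = exp(-C+) with C+ = alpha*(z1^-1 + z2^-1) agrees with the
# first-order expansion 1 - alpha*(z1^-1 + z2^-1) up to O(alpha^2).
alpha = 0.01
z1, z2 = cmath.exp(1j * 0.7), cmath.exp(-1j * 1.3)   # assumed test point
C_plus = alpha * (1 / z1 + 1 / z2)
A_exact = cmath.exp(-C_plus)
A_approx = 1 - alpha * (1 / z1 + 1 / z2)
err = abs(A_exact - A_approx)                        # should be O(alpha^2)
```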

Example 6.15

Figure 6.15b shows the results of applying the Wiener-Doob factorization algorithm to the isotropic covariance function of Example 6.12. These models theoretically achieve perfect covariance match. Comparing these with Fig. 6.12, we see that the causal and semicausal MVRs of order (2, 2) are quite close to their infinite-order counterparts in Figure 6.15b.


6.9 IMAGE DECOMPOSITION, FAST KL TRANSFORMS,
AND STOCHASTIC DECOUPLING

In Section 6.5 we saw that a one-dimensional noncausal MVR yields an orthogonal


decomposition, which leads to the notion of a fast KL transform. These ideas can
be generalized to two dimensions by perturbing the boundary conditions of the
stochastic image model such that the KL transform of the resulting random field
becomes a fast transform. This is useful in developing efficient image processing
algorithms. For certain semicausal models this technique yields uncorrelated sets
of one-dimensional AR models, which can be processed independently by one-
dimensional algorithms.

Periodic Random Fields


A convenient representation for images is obtained when the image model is forced to lie on a doubly periodic grid, that is, a toroid (like a doughnut). In that case, the sequences u(m, n) and ε(m, n) are doubly periodic, that is,

ε(m, n) = ε(m + M, n + N)
u(m, n) = u(m + M, n + N),  ∀m, n    (6.102)

where (M, N) are the periods of the (m, n) dimensions. For stationary random fields the covariance matrices of u(m, n) and ε(m, n) will become doubly block-circulant, and their KL transform will be the two-dimensional unitary DFT, which is a fast transform. The periodic grid assumption is recommended only when the grid size is very large compared to the model order. Periodic models are useful for obtaining asymptotic performance bounds but are quite restrictive in many image processing applications where small (typically 16 × 16) blocks of data are processed at a time. Properties of periodic random field models are discussed in Problem 6.21.
Example 6.16

Suppose the causal MVR of Example 6.7, written as

u(m, n) = ρ₁u(m - 1, n) + ρ₂u(m, n - 1) + ρ₃u(m - 1, n - 1) + ε(m, n)

E[ε(m, n)] = 0,  r_ε(m, n) = β²δ(m, n)

is defined on an N × N periodic grid. Denoting v(k, l) and e(k, l) as the two-dimensional unitary DFTs of u(m, n) and ε(m, n), respectively, we can transform both sides of the model equation as

v(k, l) = (ρ₁W_N^k + ρ₂W_N^l + ρ₃W_N^k W_N^l) v(k, l) + e(k, l),  W_N ≜ exp(-j2π/N)

where we have used the fact that u(m, n) is periodic. This can be written as v(k, l) = e(k, l)/A(k, l), where

A(k, l) = 1 - ρ₁W_N^k - ρ₂W_N^l - ρ₃W_N^k W_N^l

Since ε(m, n) is a stationary zero mean sequence uncorrelated over the N × N grid, e(k, l) is also an uncorrelated sequence with variance β². This gives

E[v(k, l)v*(k', l')] = E[e(k, l)e*(k', l')]/|A(k, l)|² = (β²/|A(k, l)|²) δ(k - k', l - l')

that is, v(k, l) is an uncorrelated sequence. This means the unitary DFT is the KL transform of u(m, n).
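The diagonalization in Example 6.16 can be checked numerically: a field synthesized as v = e/A on a periodic grid must satisfy the recursion exactly. The sketch below is an assumed illustration (coefficient values chosen so that A(k, l) never vanishes; non-unitary NumPy FFT scaling cancels out):

```python
import numpy as np

N = 32
r1, r2, r3 = 0.3, 0.3, 0.1                   # assumed model coefficients
rng = np.random.default_rng(0)
eps = rng.standard_normal((N, N))            # white driving noise

k = np.arange(N)
W = np.exp(-2j * np.pi / N)
A = (1 - r1 * W ** k[:, None] - r2 * W ** k[None, :]
       - r3 * W ** (k[:, None] + k[None, :]))

v = np.fft.fft2(eps) / A                     # v(k, l) = e(k, l) / A(k, l)
u = np.real(np.fft.ifft2(v))

# Periodic recursion check:
# u(m,n) = r1 u(m-1,n) + r2 u(m,n-1) + r3 u(m-1,n-1) + eps(m,n)
pred = (r1 * np.roll(u, 1, axis=0) + r2 * np.roll(u, 1, axis=1)
        + r3 * np.roll(np.roll(u, 1, axis=0), 1, axis=1))
```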
• •
Noncausal Models and Fast KL Transforms

Example 6.17

The noncausal (NC2) model defined by the equation (also see Example 6.7)

u(m, n) - α[u(m - 1, n) + u(m + 1, n) + u(m, n - 1) + u(m, n + 1)] = ε(m, n)    (6.103a)

becomes an ARMA representation when ε(m, n) is a moving average with covariance

r_ε(k, l) = β² ×  1,   (k, l) = (0, 0)
                 -α,   (k, l) = (±1, 0) or (0, ±1)    (6.103b)
                  0,   otherwise

For an N × N image U, (6.103a) can be written as

QU + UQ = E + B₁ + B₂    (6.104)

where b₁, b₂, b₃, and b₄ are N × 1 vectors containing the boundary elements of the image (Fig. 6.16), and Q is a symmetric, tridiagonal, Toeplitz matrix with values ½ along the
main diagonal and -α along the two subdiagonals.

(Diagram: the boundary values drive 𝒜⁻¹ to give U_b; the residual U⁰ = U - U_b equals ΨV⁰Ψᵀ, where the elements of V⁰ are uncorrelated random variables.)

Figure 6.16 Realization of noncausal model decomposition and the concept of fast KL transform algorithms. Ψ is the fast sine transform.

This random field has the decomposition
u = u⁰ + u_b

where u_b is determined from the boundary values and the KL transform of u⁰ is the (fast) sine transform. Specifically,

𝓊_b = 𝒜⁻¹(𝒷₁ + 𝒷₂),  𝒜 ≜ (I ⊗ Q + Q ⊗ I),  𝓊⁰ = 𝓊 - 𝓊_b    (6.105)

where 𝓊, 𝓊⁰, 𝓊_b, 𝒷₁, and 𝒷₂ are N² × 1 vectors obtained by lexicographic ordering of the N × N matrices U, U⁰, U_b, B₁, and B₂, respectively. Figure 6.16 shows a realization of the noncausal model decomposition and the concept of the fast KL transform algorithm. The appropriate boundary variables of the random field are processed first to obtain the boundary response u_b(m, n). Then the residual u⁰(m, n) ≜ u(m, n) - u_b(m, n) can be processed by its KL transform, which is the (fast) sine transform. Application of this model in image data compression is discussed in Section 11.5. Several other low-order noncausal models also yield this type of decomposition [21].
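That the sine transform diagonalizes the tridiagonal Toeplitz matrix Q above can be verified directly; a minimal sketch with a small assumed N and α (Ψ is the orthogonal DST-I matrix, i.e., the fast sine transform):

```python
import numpy as np

N, alpha = 8, 0.35
# Q: 1/2 on the main diagonal, -alpha on the two subdiagonals.
Q = 0.5 * np.eye(N) - alpha * (np.eye(N, k=1) + np.eye(N, k=-1))

j = np.arange(1, N + 1)
Psi = np.sqrt(2 / (N + 1)) * np.sin(np.outer(j, j) * np.pi / (N + 1))
lam = 0.5 - 2 * alpha * np.cos(j * np.pi / (N + 1))   # eigenvalues of Q

D = Psi @ Q @ Psi.T                                   # should be Diag{lam(k)}
```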

Semicausal Models and Stochastic Decoupling

Certain semicausal models can be decomposed into a set of uncorrelated one-dimensional random sequences (Fig. 6.17). For example, consider the semicausal (SC2) white noise driven representation

u(m, n) - α[u(m - 1, n) + u(m + 1, n)] = γu(m, n - 1) + ε(m, n)    (6.106)

where ε(m, n) is a white noise random field with r_ε(k, l) = β²δ(k, l). For an image with N pixels per column, where u_n, ε_n, ..., represent N × 1 vectors, (6.106) can be written as

Qu_n = γu_{n-1} + ε_n + b_n    (6.107)

where Q is defined in (5.102). The N × 1 vector b_n contains only two boundary
(Diagram: the boundary and initial values drive the semicausal model to produce u^b; the sine transform Ψ of the residual u⁰ yields N uncorrelated AR processes.)

Figure 6.17 Semicausal model decomposition.


terms: b_n(1) = αu(0, n), b_n(N) = αu(N + 1, n), b_n(m) = 0, 2 ≤ m ≤ N - 1. Now u_n can be decomposed as

u_n = u_n⁰ + u_n^b    (6.108)

where

Qu_n^b = γu_{n-1}^b + b_n    (6.109)
Qu_n⁰ = γu_{n-1}⁰ + ε_n    (6.110)

Clearly, u_n^b is a vector sequence generated by the boundary values b_n and the initial vector u₀. Multiplying both sides of (6.110) by Ψ, the sine transform, and remembering that ΨΨᵀ = I, ΨQΨᵀ = Λ = Diag{λ(k)}, we obtain

ΛΨu_n⁰ = γΨu_{n-1}⁰ + Ψε_n,  u₀⁰ = 0  ⇒  Λv_n⁰ = γv_{n-1}⁰ + e_n    (6.111)

where v_n⁰ and e_n are the sine transforms of u_n⁰ and ε_n, respectively. This reduces to a set of equations decoupled in k:

λ(k)v_n⁰(k) = γv_{n-1}⁰(k) + e_n(k),  v₀⁰(k) = 0

where

E[e_n(k)e_{n'}(k')] = β²δ(n - n')δ(k - k')    (6.112)

Since ε_n is a stationary white noise vector sequence, its transform coefficients e_n(k) are uncorrelated in the k-dimension. Therefore, v_n⁰(k) are also uncorrelated in the k-dimension and (6.112) is a set of uncorrelated AR sequences. The semicausal model of (6.87) also yields this type of decomposition [21]. Figure 6.17 shows the realization of semicausal model decompositions. Disregarding the boundary effects, (6.112) suggests that the rows of a column-by-column transformed image using a suitable unitary transform may be represented by AR models, as mentioned in Section 6.2. This is indeed a useful image representation, and techniques based on such models lead to what are called hybrid algorithms [1, 23, 24], that is, algorithms that are recursive in one dimension and unitary transform based in the other. Applications of semicausal models have been found in image coding, restoration, edge extraction, and high-resolution spectral estimation in two dimensions.
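The decoupling in (6.107)-(6.112) can be illustrated numerically: a column recursion with the matrix Q is equivalent, after a sine transform, to N independent scalar AR recursions. This is an assumed sketch (sizes and coefficients are examples; boundary terms are set to zero):

```python
import numpy as np

N, T = 8, 20
alpha, gamma = 0.2, 0.1
Q = 0.5 * np.eye(N) - alpha * (np.eye(N, k=1) + np.eye(N, k=-1))
j = np.arange(1, N + 1)
Psi = np.sqrt(2 / (N + 1)) * np.sin(np.outer(j, j) * np.pi / (N + 1))
lam = 0.5 - 2 * alpha * np.cos(j * np.pi / (N + 1))   # eigenvalues of Q

rng = np.random.default_rng(1)
eps = rng.standard_normal((N, T))

U = np.zeros((N, T))                     # vector recursion Q u_n = gamma u_{n-1} + eps_n
u_prev = np.zeros(N)
for n in range(T):
    u_prev = np.linalg.solve(Q, gamma * u_prev + eps[:, n])
    U[:, n] = u_prev

V = np.zeros((N, T))                     # decoupled: lam(k) v_n(k) = gamma v_{n-1}(k) + e_n(k)
E = Psi @ eps
v_prev = np.zeros(N)
for n in range(T):
    v_prev = (gamma * v_prev + E[:, n]) / lam
    V[:, n] = v_prev
```

The sine transform of the matrix-recursion output must equal the stack of scalar AR outputs, which is the hybrid (transform/recursive) structure mentioned above.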

6.10 SUMMARY

In this chapter we have considered several stochastic models for images. The one-
dimensional AR and state variable models are useful for line-by-line processing of
images. Such models will be found useful in filtering and restoration problems. The
causal, semicausal, and noncausal models were introduced as different types of
realizations of 2-D spectral density functions. One major difficulty in the identifica-
tion of 2-D models arises due to the fact that a two-dimensional polynomial may not
be reducible to a product of a finite number of lower-order polynomials. Therefore,


two-dimensional spectral factorization algorithms generally yield infinite-order


models. We considered several methods of finding finite order approximate realiza-
tions that are stable. Applications of these stochastic models are in many image
processing problems found in the subsequent chapters.

PROBLEMS

6.1 (AR model properties) To prove (6.4) show that û(n) satisfies the orthogonality condition E[(u(n) - û(n))u(m)] = 0 for every m < n. Using this and (6.3a), prove (6.7).
6.2 An image is scanned line by line. The mean value of a pixel in a scan line is 70. The autocorrelations for n = 0, 1, 2 are 6500, 6468, and 6420, respectively.
a. Find the first- and second-order AR models for a scan line. Which is a better model? Why?
b. Find the first five covariances and the autocorrelations generated by the second-order AR model and verify that the given autocorrelations are matched by the model.
6.3 Show that the covariance matrix of a row-ordered vector obtained from an N × N array of a stationary random field is not fully Toeplitz. Hence, a row scanned two-dimensional stationary random field does not yield a one-dimensional stationary random sequence.
6.4 One easy method of solving (6.28) when we are given S(ω) is to let H(ω) = √S(ω). For K = 1 and S(ω) = (1 + ρ²) - 2ρ cos ω, show that this algorithm will not yield a finite-order ARMA realization. Can this filter be causal or stable?
6.5 What are the necessary and sufficient conditions that an ARMA system be minimum phase? Find if the following filters are (i) causal and stable, (ii) minimum phase.
a. H(z) = 1 - 0.8z⁻¹
b. H(z) = (1 - z⁻¹)/[1.81 - 0.9(z + z⁻¹)]
c. H(z) = (1 - 0.2z⁻¹)/(1 - 0.9z⁻¹)
6.6 Following the Wiener-Doob decomposition method outlined in Section 6.8, show that a 1-D SDF S(z) can be factored to give H(z) ≜ exp[C⁺(z)], C⁺(z) ≜ Σ_{n=1}^{∞} c(n)z⁻ⁿ, K = exp[c(0)], where {c(n), ∀n} are the Fourier series coefficients of log S(ω). Show that H(z) is a minimum-phase filter.
6.7 Using the identity for a Hermitian Toeplitz matrix

R_{n+1} = [ R_n    b̃_n* ]
          [ b̃_nᵀ   r(0) ]

where b̃_n ≜ [r(n), r(n - 1), ..., r(1)]ᵀ, prove the Levinson recursions.
6.8 Assume that R is real and positive definite. Then a_n(k) and ρ_n will also be real. Using the Levinson recursions, prove the following.
a. |ρ_n| < 1, ∀n.
b. Given r(0) and any one of the sequences {r(n)}, {a(n)}, {ρ_n}, the remaining sequences can be determined uniquely from it.
6.9 For the third-order AR model

u(n) = 0.1u(n - 1) + 0.782u(n - 2) + 0.1u(n - 3) + ε(n),  β² = 0.067716

find the partial correlations and determine if this system is stable. What are the first four covariances of this sequence? Assume u(n) to be a zero mean, unity variance random sequence.
6.10 (Proof of noncausal MVR theorem) First show that the orthogonality condition for the minimum-variance noncausal estimate û(n) yields

r(k) - Σ_{l≠0} α(l) r(k - l) = β²δ(k),  ∀k

Solve this via the Fourier transform to arrive at (6.38) and show that S(z) = β²/A(z). Apply this to the relation S(z) = S_v(z)/A(z)A(z⁻¹) to obtain (6.39).
6.11 Show that the parameters of the noncausal MVR of (6.43) for pth-order AR sequences defined by (6.3a) are given by

α(k) = (1/c²) Σ_{n=0}^{p} a(n)a(n + k)

where a(0) ≜ -1.
6.12 Show that for a stationary pth-order AR sequence of length N, its noncausal MVR can be written as

Hu = ν + b

such that u_b(x), which is determined by the boundary values u₀ and u₁, is orthogonal to u⁰(x). Moreover, the KL expansion of u⁰(x) is given by the harmonic sinusoids

k = 1, 3, 5, ...,  -L ≤ x ≤ L
k = 2, 4, 6, ...


' . ,


6.15 Determine whether the following filters are causal, semicausal or noncausal for a
vertically scanned, left-to-right image.
a. H(z, , z,) = z. + zl' + zl' Z2 ' •
b. Htz, , Z2) = 1 + zl' + Zl' z , + zi" Zl'
C. H(Z,. z,)
,
= (2 - ...
71
1; -".."
-I)
""'''-
. 1
d. H(z"zz) = 4 . _, _,
- Zl - Zl - Z2 - Z2

Sketch their regions of support.
6.16 Given the prediction error filters

(causal)
A (z, Z2) = 1 - a, (z. + zl') - a2zi ' - a, Z2 "(z, + zj ') (scrnicausal)
A (z, , Z2) = 1 - a, (z, + Zl') - a2 (z, + zil)
.; a3z2 I (ZI + zl')- a.zl' (zz + zi') (noncausal)

a. Assuming the prediction error has zero mean and variance i?>2, find the SDF of the
prediction error sequences if these are to be minimum variance prediction-error
filters.
b. What initial or boundary' values are needed to solve the filter equations
A (ZI ,ZZ)U(ZI ,Z2) = Fiz, ,Z2) over an N x N grid in each case?
6.17 The stability condition (6.78) is equivalent to the requirement that IH(z; .z2)1 <00,
Iz 11 = 1, IZ21 = 1. .
a. Show that this immediately gives the general stability condition of (Q.79) for any
two-dimensional system, in particular for the noncausal systems.
b. For a semicausal system H(m, n) = 0 for n < 0, for every m. Show that this re-
striction yields the stability conditions of (6.80).
c. For a causal system, we need hem, n) = 0 for n < O,V'm and hem, n) = 0 for
n = 0, m < O. Show that this restriction yields the stability conditions of (6.81).

6.18 Assuming the prediction-error filters of Example 6.7 represent MVRs, find conditions
on the predictor coefficients so that the associated MVRs are stable,.
. 6.19 If a transfer function Htz, , zz) = H, (z,)H z (Z2), then show that the system is stable and
(a) causal, (b) sernicausal or (c) noncausal provided H, (z,) and H 2 (z, ) are transfer
functions of one-dimensional stable systems that are (a) both causal, (b) one causal and
one noncausal, or (c) both noncausal, respectively.
,
6.20 Assuming the cepstrum c(m, n) is absolutely surnmable, prove the stability conditions
for the causal and sernicausal models.
6.21 a•. Show that the KL transform of any periodic random field that is also stationary is the
two-dimensional (unitary) DFT. . '
\ b. Suppose the finite-order. causal, semicausal, and noncausal MVRs given by (6.70)
are defined on a periodic grid with period (M, N). Show that the SDF of these
random fields is given by



• BIBLIOGRAPHY •

Section 6.1
For a survey of mathematical models and their relevance in image processing and
related bibliography:

1. A. K. Jain. "Advances in mathematical models for image processing." Proceedings


IEEE 69, no. 5 (May 1981): 502-528.

Sections 6.2-6.4
Further discussions on spectral factorization and state variable models, ARMA
models, and so on, are available in:

2. A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Englewood Cliffs, N.J.:


Prentice-Hall, 1975.
3. A. H. Jazwinsky. Stochastic Processes and Filtering Theory. New York: Academic Press, 1970, pp. 70-92.
4. P. Whittle. Prediction and Regulation by Linear Least-Squares Methods. London: English University Press, 1954.
5. N. Wiener. Extrapolation, Interpolation and Smoothing of Stationary Time Series. New York: John Wiley, 1949.

6. C. L. Rino. "Factorization of Spectra by Discrete Fourier Transforms." IEEE
Transactions on Information Theory IT-16 (July 1970): 484-485.
7. J. Makhoul. "Linear Prediction: A Tutorial Review." Proceedings IEEE 63 (April
1975): 561-580..
8. IEEE Trans. Auto. Contr., Special Issue on System Identification and Time Series Analysis. T. Kailath, D. Q. Mayne, and R. K. Mehra (eds.), Vol. AC-19, December 1974.
9. N. E. Nahi and T. Assefi. "Bayesian Recursive Image Estimation." IEEE Trans. Comput. (Short Notes) C-21 (July 1972): 734-738.
10. S. R. Powell and L. M. Silverman. "Modeling of Two Dimensional Covariance Functions with Application to Image Restoration." IEEE Trans. Auto. Contr. AC-19 (February 1974): 8-12.
11. R. P. Roesser. "A Discrete State Space Model for Linear Image Processing." IEEE Trans. Auto. Contr. AC-20 (February 1975): 1-10.
12. E. Fornasini and G. Marchesini. "State Space Realization Theory of Two-Dimensional Filters." IEEE Trans. Auto. Contr. AC-21 (August 1976): 484-492.

Section 6.5
Noncausal representations and fast KL transforms for discrete random processes
are discussed in [1, 21] and: .

13. A. K. Jain. "A Fast Karhunen-Loeve Transform for a Class of Random Processes." IEEE Trans. Comm. COM-24 (September 1976): 1023-1029. Also see IEEE Trans. Comput. C-25 (November 1977): 1065-1071.

Sections 6.6 and 6.7

Here we follow [1]. The linear prediction models discussed here can also be generalized to nonstationary random fields [1]. For more on random fields:

14. P. Whittle. "On Stationary Processes in the Plane." Biometrika 41 (1954): 434-449.
15. T. L. Marzetta. "A Linear Prediction Approach to Two-Dimensional Spectral Factorization and Spectral Estimation." Ph.D. Thesis, Department of Electrical Engineering and Computer Science, MIT, February 1978.
16. S. Ranganath and A. K. Jain. "Two-Dimensional Linear Prediction Models Part I: Spectral Factorization and Realization." IEEE Trans. ASSP ASSP-33, no. 1 (February 1985): 280-299. Also see S. Ranganath. "Two-Dimensional Spectral Factorization, Spectral Estimation and Applications in Image Processing." Ph.D. Dissertation, Department of Electrical and Computer Engineering, UC Davis, March 1983.
17. A. K. Jain and S. Ranganath. "Two-Dimensional Linear Prediction Models and Spectral Estimation." Ch. 7 in Advances in Computer Vision and Image Processing (T. S. Huang, ed.), Vol. 2. Greenwich, Conn.: JAI Press Inc., 1986, pp. 333-372.
18. R. Chellappa. "Two-Dimensional Discrete Gaussian Markov Random Field Models for Image Processing," in Progress in Pattern Recognition, L. Kanal and A. Rosenfeld (eds.), Vol. 2. New York, N.Y.: North Holland, 1985, pp. 79-112.
19. J. W. Woods. "Two-Dimensional Discrete Markov Fields." IEEE Trans. Inform. Theory IT-18 (March 1972): 232-240.

For stability of two-dimensional systems:

20. D. Goodman. "Some Stability Properties of Two-dimensional Linear Shift Invariant


Filters," IEEE Trans. Cir. Sys. CAS-24 (April 1977): 201-208.
The relationship between the three types of prediction models and partial differ-
ential equations is discussed in:
21. A. K. Jain. "Partial Differential Equations and Finite Difference Methods in Image Processing, Part I: Image Representation." J. Optimization Theory and Appl. 23, no. 1 (September 1977): 65-91. Also see IEEE Trans. Auto. Control AC-23 (October 1978): 817-834.

Section 6.8

Here we follow [1] and have applied the method of [6] and:

22. M. P. Ekstrom and J. W. Woods. "Two-Dimensional Spectral Factorization with Application in Recursive Digital Filtering." IEEE Trans. on Acoust. Speech and Signal Processing ASSP-24 (April 1976): 115-128.

Section 6.9

For fast KL transform decomposition in two dimensions, stochastic decoupling, and related results see [13], [21], and:
23. S. H. Wang. "Applications of Stochastic Models for Image Data Compression." Ph.D. Dissertation, Department of Electrical Engineering, SUNY Buffalo, September 1979. Also Technical Report SIPL-79-6, Signal and Image Processing Laboratory, Department of Electrical and Computer Engineering, UC Davis, September 1979.

24. A. K. Jain. "A Fast Karhunen-Loeve Transform for Recursive Filtering of Images
Corrupted by White and Colored Noise." IEEE Trans. Comput. C-26 (June 1977):
560-571.






Image Enhancement

7.1 INTRODUCTION

Image enhancement refers to accentuation, or sharpening, of image features such as edges, boundaries, or contrast to make a graphic display more useful for display and analysis. The enhancement process does not increase the inherent information content in the data. But it does increase the dynamic range of the chosen features so that they can be detected easily. Image enhancement includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudocoloring, and so on. The greatest difficulty in image enhancement is quantifying the criterion for enhancement. Therefore, a large number of image enhancement techniques are empirical and require interactive procedures to obtain satisfactory results. However, image enhancement remains a very important topic because of its usefulness in virtually all image processing applications. In this chapter we consider several algorithms commonly used for enhancement of images. Figure 7.1 lists some of the common image enhancement techniques.
Image enhancement

Point operations: contrast stretching, noise clipping, window slicing, histogram modeling.
Spatial operations: noise smoothing, median filtering, unsharp masking, low-pass, bandpass, and high-pass filtering, zooming.
Transform operations: linear filtering, root filtering, homomorphic filtering.
Pseudocoloring: false coloring, pseudocoloring.

Figure 7.1 Image enhancement.


TABLE 7.1 Zero-Memory Gray-Level Transformations for Image Enhancement. Input and output gray levels are distributed between [0, L]. Typically, L = 255.

1. Contrast stretching:
   f(u) = αu, 0 ≤ u < a;  β(u - a) + v_a, a ≤ u < b;  γ(u - b) + v_b, b ≤ u ≤ L.
   The slopes α, β, γ determine the relative contrast stretch. See Fig. 7.2.

2. Noise clipping and thresholding:
   f(u) = 0, 0 ≤ u < a;  αu, a ≤ u ≤ b;  L, u ≥ b.
   Useful for binary or other images that have bimodal distribution of gray levels. The a and b define the valley between the peaks of the histogram. For a = b = t, this is called thresholding.

3. Gray scale reversal: f(u) = L - u. Creates digital negative of the image.

4. Gray-level window slicing:
   f(u) = L, a ≤ u ≤ b;  0, otherwise.
   Fully illuminates pixels lying in the interval [a, b] and removes the background.

5. Bit extraction:
   f(u) = (i_n - 2i_{n-1})L,  i_n = Int[u/2^{B-n}],  n = 1, 2, ..., B.
   B = number of bits used to represent u as an integer. This extracts the nth most-significant bit.

6. Bit removal:
   f(u) = 2u modulo (L + 1), 0 ≤ u ≤ L. Most-significant-bit removal.
   f(u) = 2 Int[u/2]. Least-significant-bit removal.

7. Range compression: v = c log₁₀(1 + u), u ≥ 0, c = L/log₁₀(1 + L). Intensity to contrast transformation.
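A few of the Table 7.1 transformations can be written directly as array operations; a minimal sketch (the window limits and bit index below are assumed example values):

```python
import numpy as np

L, B = 255, 8
u = np.arange(L + 1)                                  # every input gray level

negative = L - u                                      # gray scale reversal
window = np.where((u >= 100) & (u <= 150), L, 0)      # window slicing, a=100, b=150
n = 3                                                 # 3rd most-significant bit
bit = (u // 2 ** (B - n) - 2 * (u // 2 ** (B - n + 1))) * L   # bit extraction
lsb_removed = 2 * (u // 2)                            # least-significant-bit removal
compressed = (L / np.log10(1 + L)) * np.log10(1 + u)  # range compression
```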
7.2 POINT OPERATIONS

Point operations are zero memory operations where a given gray level u ∈ [0, L] is mapped into a gray level v ∈ [0, L] according to a transformation

v = f(u)    (7.1)

Table 7.1 lists several of these transformations.

Contrast Stretching

Low-contrast images occur often due to poor or nonuniform lighting conditions or due to nonlinearity or small dynamic range of the imaging sensor. Figure 7.2 shows a typical contrast stretching transformation, which can be expressed as

v = αu, 0 ≤ u < a
    β(u - a) + v_a, a ≤ u < b    (7.2)
    γ(u - b) + v_b, b ≤ u < L

The slope of the transformation is chosen greater than unity in the region of stretch. The parameters a and b can be obtained by examining the histogram of the image. For example, the gray scale intervals where pixels occur most frequently would be stretched most to improve the overall visibility of a scene. Figure 7.3 shows examples of contrast stretched images.
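Equation (7.2) can be sketched as a piecewise-linear function; the breakpoints and slopes below are assumed example values, with v_a and v_b fixed by continuity of the three segments:

```python
import numpy as np

def stretch(u, a, b, alpha, beta, gamma, L=255):
    """Piecewise-linear contrast stretch of Eq. (7.2)."""
    va = alpha * a                 # value at u = a (continuity)
    vb = va + beta * (b - a)       # value at u = b (continuity)
    return np.where(u < a, alpha * u,
                    np.where(u < b, beta * (u - a) + va,
                             gamma * (u - b) + vb))

u = np.arange(256)
v = stretch(u, a=80, b=180, alpha=0.5, beta=1.8, gamma=0.4)
```

With β > 1 the midregion [a, b) is expanded, while the dark and bright ends (α, γ < 1) are compressed.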

Clipping and Thresholding

A special case of contrast stretching where α = γ = 0 (Fig. 7.4) is called clipping. This is useful for noise reduction when the input signal is known to lie in the range [a, b].

Thresholding is a special case of clipping where a = b = t and the output becomes binary (Fig. 7.5). For example, a seemingly binary image, such as a printed page, does not give binary output when scanned because of sensor noise and background illumination variations. Thresholding is used to make such an image binary. Figure 7.6 shows examples of clipping and thresholding on images.
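Both special cases can be sketched directly from the stretch formula with α = γ = 0; the limits a, b and the threshold t below are assumed example values:

```python
import numpy as np

L = 255
u = np.arange(256)

# Clipping: zero below a, linear on [a, b), constant above b (alpha = gamma = 0).
a, b = 60, 200
beta = L / (b - a)                       # slope mapping [a, b] onto the full range
clipped = np.where(u < a, 0.0, np.where(u < b, beta * (u - a), beta * (b - a)))

# Thresholding: the limiting case a = b = t, giving a binary output.
t = 128
binary = np.where(u >= t, L, 0)
```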

(Plot of v versus u with breakpoints at a and b.)

Figure 7.2 Contrast stretching transformation. For dark region stretch α > 1, 0 ≤ u < a; midregion stretch, β > 1, a ≤ u < b; bright region stretch γ > 1, b ≤ u < L.


Figure 7.3 Contrast stretching. Originals and enhanced images.



Figure 7.4 Clipping transformations.

Figure 7.5 Thresholding transformations.

Figure 7.6 Clipping and thresholding.

Figure 7.7 Digital negative transformation.

Digital Negative

A negative image can be obtained by reverse scaling of the gray levels according to the transformation (Fig. 7.7)

v = L - u        (7.3)

Figure 7.8 shows the digital negatives of different images. Digital negatives are useful in the display of medical images and in producing negative prints of images.

Intensity Level Slicing (Fig. 7.9)


{~'
a<u<b
Without background: v= (7.4)
otherwise

With background: . v = {L,


u,
a'5.u::s,b
otherwise
(7.5)

These transformations permit segmentation of certain gray level regions from the rest of the image. This technique is useful when different features of an image are contained in different gray levels. Figure 7.10 shows the result of intensity window slicing for segmentation of low-temperature regions (clouds, hurricane) of two images where high-intensity gray levels are proportional to low temperatures.
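Level slicing, Eqs. (7.4)-(7.5), can be sketched as below; the function name and the `background` flag are assumptions of this example:

```python
import numpy as np

def level_slice(u, a, b, L=256, background=False):
    """Intensity level slicing, Eqs. (7.4)-(7.5): show the band [a, b] at
    full brightness; keep (background=True) or zero out (background=False)
    the remaining gray levels."""
    inside = (u >= a) & (u <= b)
    rest = u if background else np.zeros_like(u)
    return np.where(inside, L - 1, rest).astype(np.uint8)
```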


Figure 7.8 Digital negatives.

238 Image Enhancement. Chap. 7



Figure 7.9 Intensity level slicing. (a) Without background; (b) with background.

Bit Extraction

Suppose each image pixel is uniformly quantized to B bits. It is desired to extract the nth most-significant bit and display it. Let

u = k_1 2^(B-1) + k_2 2^(B-2) + ... + k_n 2^(B-n) + ... + k_(B-1) 2 + k_B        (7.6)

Then we want the output to be

v = { L,   if k_n = 1
    { 0,   otherwise        (7.7)

It is easy to show that

k_n = i_n - 2 i_(n-1)        (7.8)

Figure 7.10 Level slicing of intensity window [175, 250]. Top row: visual and infrared (IR) images; bottom row: segmented images.


Figure 7.11 8-bit planes of a digital image. (a) First four significant bit images; (b) last four significant bit images.

where

i_n = Int[u/2^(B-n)],    Int[x] = integer part of x        (7.9)

Figure 7.11 shows the eight most-significant bit images of an 8-bit image. This transformation is useful in determining the number of visually significant bits in an image. In Fig. 7.11 only the first 6 bits are visually significant, because the remaining bits do not convey any information about the image structure.
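Bit extraction per (7.6)-(7.9) reduces to shifts and masks; a small sketch (function name assumed for this example):

```python
import numpy as np

def bit_plane(u, n, B=8, L=256):
    """Extract the nth most-significant bit and display it at full
    contrast, following Eqs. (7.7)-(7.9)."""
    i_n = u >> (B - n)            # Int[u / 2^(B-n)], Eq. (7.9)
    k_n = i_n & 1                 # equals i_n - 2*i_(n-1), Eq. (7.8)
    return (k_n * (L - 1)).astype(np.uint8)
```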

Range Compression

Sometimes the dynamic range of the image data may be very large. For example, the dynamic range of a typical unitarily transformed image is so large that only a few pixels are visible. The dynamic range can be compressed via the logarithmic transformation

v = c log10(1 + |u|)        (7.10)

where c is a scaling constant. This transformation enhances the small magnitude pixels compared to those pixels with large magnitudes (Fig. 7.12).
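A sketch of (7.10); the choice of c so that the largest magnitude maps to the top of the gray scale is an assumption of this example:

```python
import numpy as np

def log_compress(u, L=256):
    """Dynamic range compression, Eq. (7.10); c scales the largest
    magnitude to gray level L-1."""
    c = (L - 1) / np.log10(1.0 + np.abs(u).max())
    return (c * np.log10(1.0 + np.abs(u))).astype(np.uint8)
```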

Image Subtraction and Change Detection

In many imaging applications it is desired to compare two complicated or busy images. A simple but powerful method is to align the two images and subtract them. The difference image is then enhanced. For example, the missing components on a circuit board can be detected by subtracting its image from that of a properly assembled board. Another application is in imaging of the blood vessels and arteries
Figure 7.12 Range compression. (a) Original; (b) log transformed.

in a body. The blood stream is injected with a radio-opaque dye and X-ray images are taken before and after the injection. The difference of the two images yields a clear display of the blood-flow paths (see Fig. 9.44). Other applications of change detection are in security monitoring systems, automated inspection of circuits, and so on.
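The align-and-subtract method can be sketched as below, assuming the two images are already registered; the rescaling of the difference to the full gray range is one possible "enhancement" step, chosen for this example:

```python
import numpy as np

def change_detect(before, after, L=256):
    """Change detection by image subtraction: absolute difference of two
    aligned images, rescaled to the full gray range for display."""
    d = np.abs(after.astype(np.int32) - before.astype(np.int32))
    if d.max() == 0:
        return d.astype(np.uint8)
    return (d * (L - 1) // d.max()).astype(np.uint8)
```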

7.3 HISTOGRAM MODELING

The histogram of an image represents the relative frequency of occurrence of the various gray levels in the image. Histogram-modeling techniques modify an image so that its histogram has a desired shape. This is useful in stretching the low-contrast levels of images with narrow histograms. Histogram modeling has been found to be a powerful technique for image enhancement.
Histogram Equalization

In histogram equalization, the goal is to obtain a uniform histogram for the output image. Consider an image pixel value u ≥ 0 to be a random variable with a continuous probability density function p_u(α) and cumulative probability distribution F_u(α) = P[u ≤ α]. Then the random variable

v = F_u(u) = ∫₀ᵘ p_u(α) dα        (7.11)

will be uniformly distributed over (0, 1) (Problem 7.3). To implement this transformation on digital images, suppose the input u has L gray levels x_i, i = 0, 1, ..., L - 1 with probabilities p_u(x_i). These probabilities can be determined from the


Figure 7.13 Histogram equalization transformation.

histogram of the image that gives h(x_i), the number of pixels with gray level value x_i. Then

p_u(x_i) = h(x_i) / Σ_{i=0}^{L-1} h(x_i),   i = 0, 1, ..., L - 1        (7.12)

The output v', also assumed to have L levels, is given as follows:†

v* = Σ_{x_i ≤ u} p_u(x_i)        (7.13a)

v' = Int[(v* - v*_min)/(1 - v*_min) (L - 1) + 0.5]        (7.13b)

where v*_min is the smallest positive value of v* obtained from (7.13a). Now v' will be only approximately uniformly distributed, because v* is a discrete rather than a continuous variable (Problem 7.3). Figure 7.13 shows the histogram-equalization algorithm for digital images. From (7.13a) note that v* is a discrete variable that takes the value

v*_k = Σ_{i=0}^{k} p_u(x_i)        (7.14)

if u = x_k. Equation (7.13b) simply uniformly requantizes the set {v*_k} to {v'_k}. Note that this requantization step is necessary because the probabilities p_u(x_k) and p_v(v*_k) are the same. Figure 7.14 shows some results of histogram equalization.
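The digital algorithm (7.12)-(7.14) amounts to building a lookup table from the cumulative histogram; a sketch, with the function name assumed and a clip added to guard levels below the first occupied one (flat images, where v*_min = 1, are not handled):

```python
import numpy as np

def equalize(u, L=256):
    """Histogram equalization via Eqs. (7.12)-(7.14)."""
    h = np.bincount(u.ravel(), minlength=L)
    p = h / h.sum()                        # Eq. (7.12)
    v = np.cumsum(p)                       # v*_k, Eq. (7.14)
    vmin = v[v > 0].min()                  # smallest positive v*
    # Eq. (7.13b): uniform requantization of {v*_k}
    lut = np.clip(np.rint((v - vmin) / (1 - vmin) * (L - 1)), 0, L - 1)
    return lut.astype(np.uint8)[u]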

Histogram Modification

A generalization of the procedure of Fig. 7.13 is given in Fig. 7.15. The input gray level u is first transformed nonlinearly by f(u), and the output is uniformly quantized. In histogram equalization, the function

f(u) = Σ_{x_i = 0}^{u} p_u(x_i)        (7.15)

typically performs a compression of the input variable. Other choices of f(u) that have similar behavior are
n'=2,3, ... (7.16)

†We assume x_0 = 0.






Figure 7.14 Histogram equalization. (a) Top row: input image, its histogram; bottom row: processed image, its histogram. (b) Left columns: input images; right columns: processed images.

f(u) = log(1 + u),   u ≥ 0        (7.17)

f(u) = u^(1/n),   u > 0,   n = 2, 3, ...        (7.18)

These functions are similar to the companding transformations used in image quantization.

Histogram Specification

Suppose the random variable u ≥ 0 with probability density p_u(α) is to be transformed to v ≥ 0 such that it has a specified probability density p_v(α). For this to be true, we define a uniform random variable

w = F_u(u)        (7.19)

which can equally be written in terms of the output as

w = F_v(v)        (7.20)

so that the desired output is

v = F_v^(-1)(w) = F_v^(-1)(F_u(u))        (7.21)


Figure 7.15 Histogram modification.



(j
,,
,
- .·1. .
• Figure 7.16 Histogram specification .


If u and v are given as discrete random variables that take values x_i and y_i, i = 0, ..., L - 1, with probabilities p_u(x_i) and p_v(y_i), respectively, then (7.21) can be implemented approximately as follows. Define

w = Σ_{x_i ≤ u} p_u(x_i),   w̃_k = Σ_{i=0}^{k} p_v(y_i),   k = 0, ..., L - 1        (7.22)

Let w* denote the value w̃_n such that w̃_n - w ≥ 0 for the smallest value of n. Then v' = y_n is the output corresponding to u. Figure 7.16 shows this algorithm.

Example 7.1
Given x_i = y_i = 0, 1, 2, 3, p_u(x_i) = 0.25, i = 0, ..., 3, p_v(y_0) = 0, p_v(y_1) = p_v(y_2) = 0.5, p_v(y_3) = 0. Find the transformation between u and v. The accompanying table shows how this mapping is developed.

u    p_u(x_i)    w      w̃_k    w*     n    v'
0    0.25        0.25   0.00   0.50   1    1
1    0.25        0.50   0.50   0.50   1    1
2    0.25        0.75   1.00   1.00   2    2
3    0.25        1.00   1.00   1.00   2    2

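The search in (7.22) for the smallest n with w̃_n ≥ w is a sorted search on the target cumulative distribution; a sketch (function name assumed; the small epsilon guards floating-point ties):

```python
import numpy as np

def specify(u_levels, p_u, p_v):
    """Discrete histogram specification, Eq. (7.22): each input level
    maps to the smallest n whose target cumulative w~_n >= w."""
    w = np.cumsum(p_u)                      # input cumulative at each x_k
    w_tilde = np.cumsum(p_v)                # target cumulative at each y_k
    n = np.searchsorted(w_tilde, w - 1e-12) # first n with w~_n >= w
    return n[u_levels]
```

Running it on Example 7.1 reproduces the table's v' column.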

'1.4 SPATIAL OPERATIONS

Many image enhancement techniques are based on spatial operations performed on local neighborhoods of input pixels. Often, the image is convolved with a finite impulse response filter called a spatial mask.

Spatial Averaging and Spatial Low-pass Filtering


Here each pixel is replaced by a weighted average of its neighborhood pixels, that is,

v(m, n) = ΣΣ_{(k,l) ∈ W} a(k, l) y(m - k, n - l)        (7.23)

where y(m, n) and v(m, n) are the input and output images, respectively, W is a suitably chosen window, and a(k, l) are the filter weights. A common class of spatial averaging filters has all equal weights, giving

v(m, n) = (1/N_W) ΣΣ_{(k,l) ∈ W} y(m - k, n - l)        (7.24)

where a(k, l) = 1/N_W and N_W is the number of pixels in the window W. Another spatial averaging filter used often is given by



(a) 2 × 2 window:

        1/4  1/4
        1/4  1/4

(b) 3 × 3 window:

        1/9  1/9  1/9
        1/9  1/9  1/9
        1/9  1/9  1/9

(c) 5-point weighted averaging:

        0    1/8  0
        1/8  1/2  1/8
        0    1/8  0

Figure 7.17 Spatial averaging masks a(k, l).

v(m, n) = (1/2)[y(m, n) + (1/4){y(m - 1, n) + y(m + 1, n) + y(m, n - 1) + y(m, n + 1)}]        (7.25)

that is, each pixel is replaced by the average of itself and the average of its four nearest pixels. Figure 7.17 shows some spatial averaging masks.
Spatial averaging is used for noise smoothing, low-pass filtering, and subsampling of images. Suppose the observed image is given as

y(m, n) = u(m, n) + η(m, n)        (7.26)

where η(m, n) is white noise with zero mean and variance σ_η². Then the spatial average of (7.24) yields

v(m, n) = (1/N_W) ΣΣ_{(k,l) ∈ W} u(m - k, n - l) + η̄(m, n)        (7.27)

where η̄(m, n) is the spatial average of η(m, n). It is a simple matter to show that η̄(m, n) has zero mean and variance σ_η̄² = σ_η²/N_W, that is, the noise power is reduced by a factor equal to the number of pixels in the window W. If the noiseless image u(m, n) is constant over the window W, then spatial averaging results in an improvement in the output signal-to-noise ratio by a factor of N_W. In practice the size of the window W is limited due to the fact that u(m, n) is not really constant, so that spatial averaging introduces a distortion in the form of blurring. Figure 7.18 shows examples of spatial averaging of an image containing Gaussian noise.
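The equal-weight average of (7.24) can be sketched as a sliding-window sum; the function name and the edge-replication border handling are choices of this example:

```python
import numpy as np

def box_average(y, size=3):
    """Equal-weight spatial average, Eq. (7.24), over a size x size
    window, with edge replication at the borders."""
    pad = size // 2
    yp = np.pad(y.astype(np.float64), pad, mode='edge')
    out = np.zeros(y.shape, dtype=np.float64)
    for k in range(size):
        for l in range(size):
            out += yp[k:k + y.shape[0], l:l + y.shape[1]]
    return out / (size * size)
```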

Directional Smoothing

To protect the edges from blurring while smoothing, a directional averaging filter can be useful. Spatial averages v(m, n : θ) are calculated in several directions (see Fig. 7.19) as

v(m, n : θ) = (1/N_θ) ΣΣ_{(k,l) ∈ W_θ} y(m - k, n - l)        (7.28)

and a direction θ* is found such that |y(m, n) - v(m, n : θ*)| is minimum. Then

v(m, n) = v(m, n : θ*)        (7.29)

gives the desired result. Figure 7.20 shows an example of this method.
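A small sketch of (7.28)-(7.29) using 3-pixel windows along four directions; the window choice and border handling (edge replication) are assumptions of this example:

```python
import numpy as np

def directional_smooth(y):
    """Directional smoothing, Eqs. (7.28)-(7.29): 3-pixel averages along
    four directions; each pixel keeps the directional average closest
    to its own value."""
    y = y.astype(np.float64)
    M, N = y.shape
    yp = np.pad(y, 1, mode='edge')

    def shift(dr, dc):                       # y(m+dr, n+dc) with replication
        return yp[1 + dr:M + 1 + dr, 1 + dc:N + 1 + dc]

    dirs = [((0, -1), (0, 1)), ((-1, 0), (1, 0)),
            ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]
    avgs = np.stack([(shift(*a) + y + shift(*b)) / 3.0 for a, b in dirs])
    best = np.abs(avgs - y).argmin(axis=0)   # theta*, Eq. (7.29)
    return np.take_along_axis(avgs, best[None], axis=0)[0]
```

On an image with a sharp vertical edge, the vertical average is always selected along the edge, so the edge survives unblurred.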



Figure 7.18 Spatial averaging filters for smoothing images containing Gaussian noise. (a) Original; (b) noisy; (c) 3 × 3 filter; (d) 7 × 7 filter.
Median Filtering

Here the input pixel is replaced by the median of the pixels contained in a window around the pixel, that is,

v(m, n) = median{y(m - k, n - l), (k, l) ∈ W}        (7.30)

where W is a suitably chosen window. The algorithm for median filtering requires arranging the pixel values in the window in increasing or decreasing order and picking the middle value. Generally the window size is chosen so that N_W is odd. If N_W is even, then the median is taken as the average of the two values in the middle.



Figure 7.19 Directional smoothing filter.
Typical windows are 3 x 3, 5 x 5, 7 x 7, or the five-point window considered for
spatial averaging in Fig. 7.17c.
Example 7.2
Let {y(m)} = {2, 3, 8, 4, 2} and W = [-1, 0, 1]. The median filter output is given by

v(0) = 2 (boundary value),   v(1) = median{2, 3, 8} = 3
v(2) = median{3, 8, 4} = 4,  v(3) = median{8, 4, 2} = 4
v(4) = 2 (boundary value)

Hence {v(m)} = {2, 3, 4, 4, 2}. If W contains an even number of pixels, for example, W = [-1, 0, 1, 2], then v(0) = 2, v(1) = 3, v(2) = median{2, 3, 8, 4} = (3 + 4)/2 = 3.5, and so on, which gives {v(m)} = {2, 3, 3.5, 3.5, 2}.
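Example 7.2 can be reproduced with a short 1-D median filter; the boundary-passthrough behavior and the function name are taken from the example's convention:

```python
import numpy as np

def median_1d(y, w=(-1, 0, 1)):
    """1-D median filter, Eq. (7.30); pixels whose window would fall
    outside the signal are passed through unchanged (boundary values)."""
    y = np.asarray(y, dtype=np.float64)
    v = y.copy()
    for m in range(max(w), len(y) + min(w)):
        v[m] = np.median([y[m - k] for k in w])
    return v
```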

The median filter has the following properties:

1. It is a nonlinear filter. Thus for two sequences x(m) and y(m)

   median{x(m) + y(m)} ≠ median{x(m)} + median{y(m)}

Figure 7.20 (a) 5 × 5 spatial smoothing; (b) directional smoothing.



2. It is useful for removing isolated lines or pixels while preserving spatial resolution. Figure 7.21 shows that the median filter performs very well on images containing binary noise but performs poorly when the noise is Gaussian. Figure 7.22 compares median filtering with spatial averaging for images containing binary noise.
3. Its performance is poor when the number of noise pixels in the window is greater than or equal to half the number of pixels in the window.

Since the median is the (N_W + 1)/2 largest value (N_W odd), its search requires (N_W - 1) + (N_W - 2) + ... + (N_W - 1)/2 = 3(N_W² - 1)/8 comparisons. This number equals 30 for 3 × 3 windows and 234 for 5 × 5 windows. Using a more efficient
Figure 7.21 Median filtering. (a) Image with binary noise; (b) 3 × 3 median filtered; (c) image with Gaussian noise; (d) 3 × 3 median filtered.



Figure 7.22 Spatial averaging versus median filtering. (a) Original; (b) with binary noise; (c) five nearest neighbors spatial average; (d) 3 × 3 median filtered.

search technique, the number of comparisons can be reduced to approximately N_W log₂ N_W [5]. For moving window medians, the operation count can be reduced further. For example, if k pixels are deleted and k new pixels are added to a window, then the new median can be found in no more than k(N_W + 1) comparisons. A practical two-dimensional median filter is the separable median filter, which is obtained by successive one-dimensional median filtering of rows and columns.

Other Smoothing Techniques

An alternative to median filtering for removing binary or isolated noise is to find the spatial average according to (7.24) and replace the pixel at (m, n) by this average whenever the noise is large, that is, whenever the quantity |v(m, n) - y(m, n)| is greater than some prescribed threshold. For additive Gaussian noise, more sophisticated smoothing algorithms are possible. These algorithms utilize the statistical properties of the image and the noise fields. Adaptive algorithms that adjust the filter response according to local variations in the statistical properties of the data are also possible. In many cases the noise is multiplicative. Noise-smoothing algorithms for such images can also be designed. These and other algorithms are considered in Chapter 8.

Unsharp Masking and Crispening

The unsharp masking technique is used commonly in the printing industry for crispening the edges. A signal proportional to the unsharp, or low-pass filtered, version of the image is subtracted from the image. This is equivalent to adding the gradient, or a high-pass signal, to the image (see Fig. 7.23). In general the unsharp masking operation can be represented by

v(m, n) = u(m, n) + λ g(m, n)        (7.31)

where λ > 0 and g(m, n) is a suitably defined gradient at (m, n). A commonly used gradient function is the discrete Laplacian


Figure 7.23 Unsharp masking operations. (1) Signal; (2) low-pass; (3) high-pass, (1) - (2).

g(m, n) = u(m, n) - (1/4)[u(m - 1, n) + u(m, n - 1) + u(m + 1, n) + u(m, n + 1)]        (7.32)

Figure 7.24 shows an example of unsharp masking using the Laplacian operator.
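Eqs. (7.31)-(7.32) together can be sketched as below; the function name and edge-replication border handling are choices of this example:

```python
import numpy as np

def unsharp_mask(u, lam=1.0):
    """Unsharp masking, Eqs. (7.31)-(7.32): add lam times the discrete
    Laplacian (pixel minus the average of its 4 neighbors)."""
    up = np.pad(u.astype(np.float64), 1, mode='edge')
    neigh = (up[:-2, 1:-1] + up[2:, 1:-1] +
             up[1:-1, :-2] + up[1:-1, 2:]) / 4.0
    g = up[1:-1, 1:-1] - neigh           # Eq. (7.32)
    return up[1:-1, 1:-1] + lam * g      # Eq. (7.31)
```

Near an edge the Laplacian overshoots, which is exactly the crispening effect.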

Spatial Low-pass, High-pass, and Band-pass Filtering

Earlier it was mentioned that a spatial averaging operation is a low-pass filter (Fig. 7.25a). If h_LP(m, n) denotes a FIR low-pass filter, then a FIR high-pass filter, h_HP(m, n), can be defined as

h_HP(m, n) = δ(m, n) - h_LP(m, n)        (7.33)

Such a filter can be implemented by simply subtracting the low-pass filter output from its input (Fig. 7.25b). Typically, the low-pass filter would perform a relatively long-term spatial average (for example, on a 5 × 5, 7 × 7, or larger window).
A spatial band-pass filter can be characterized as (Fig. 7.25c)

h_BP(m, n) = h_L1(m, n) - h_L2(m, n)        (7.34)

where h_L1(m, n) and h_L2(m, n) denote the FIRs of low-pass filters. Typically, h_L1 and h_L2 would represent short-term and long-term averages, respectively.



Figure 7.24 Unsharp masking. Original (left), enhanced (right).


Figure 7.25 Spatial filters. (a) Spatial low-pass filter; (b) spatial high-pass filter; (c) spatial band-pass filter.

Low-pass filters are useful for noise smoothing and interpolation. High-pass filters are useful in extracting edges and in sharpening images. Band-pass filters are useful in the enhancement of edges and other high-pass image characteristics in the presence of noise. Figure 7.26 shows examples of high-pass, low-pass and band-pass filters.



Figure 7.26 Spatial filtering examples. (a)-(d) Top row: original, high-pass, low-pass and band-pass filtered images. (e) Bottom row: original and high-pass filtered images.

Inverse Contrast Ratio Mapping and Statistical Scaling

The ability of our visual system to detect an object in a uniform background depends on its size (resolution) and the contrast ratio γ, which is defined as

γ = σ/μ        (7.35)

where μ is the average luminance of the object and σ is the standard deviation of the luminance of the object plus its surround. Now consider the inverse contrast ratio transformation

v(m, n) = μ(m, n)/σ(m, n)        (7.36)

where μ(m, n) and σ(m, n) are the local mean and standard deviation of u(m, n) measured over a window W and are given by

μ(m, n) = (1/N_W) ΣΣ_{(k,l) ∈ W} u(m - k, n - l)        (7.37a)

σ(m, n) = [(1/N_W) ΣΣ_{(k,l) ∈ W} {u(m - k, n - l) - μ(m, n)}²]^(1/2)        (7.37b)

This transformation generates an image where the weak (that is, low-contrast) edges are enhanced. A special case of this is the transformation

v(m, n) = u(m, n)/σ(m, n)        (7.38)


Figure 7.27 Inverse contrast ratio mapping of images. Faint edges in the originals have been enhanced. For example, note the bricks on the patio and suspension cables on the bridge.

which scales each pixel by its standard deviation to generate an image whose pixels have unity variance. This mapping is also called statistical scaling [13]. Figure 7.27 shows examples of inverse contrast ratio mapping.
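Statistical scaling, Eq. (7.38), with the local moments of (7.37), can be sketched as below; the window size, edge replication, and the small epsilon (to avoid dividing by zero in perfectly flat regions) are assumptions of this example:

```python
import numpy as np

def statistical_scale(u, size=3, eps=1e-6):
    """Statistical scaling, Eq. (7.38): divide each pixel by the local
    standard deviation of Eq. (7.37b) over a size x size window."""
    M, N = u.shape
    pad = size // 2
    up = np.pad(u.astype(np.float64), pad, mode='edge')
    s = np.zeros((M, N))
    s2 = np.zeros((M, N))
    for k in range(size):
        for l in range(size):
            w = up[k:k + M, l:l + N]
            s += w
            s2 += w * w
    mu = s / size**2                                        # Eq. (7.37a)
    sigma = np.sqrt(np.maximum(s2 / size**2 - mu**2, 0.0))  # Eq. (7.37b)
    return u / (sigma + eps)
```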

Magnification and Interpolation (Zooming)

Often it is desired to zoom on a given region of an image. This requires taking an image and displaying it as a larger image.

Replication. Replication is a zero-order hold where each pixel along a scan line is repeated once and then each scan line is repeated. This is equivalent to taking an M × N image, interlacing it by rows and columns of zeros to obtain a 2M × 2N matrix, and convolving the result with an array H, defined as

H = [1  1
     1  1]        (7.39)

This gives

v(m, n) = u(k, l),   k = Int[m/2],   l = Int[n/2],   m, n = 0, 1, 2, ...        (7.40)

Figure 7.28 shows examples of interpolation by replication.
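Eq. (7.40) is simply pixel repetition along both axes; a sketch (function name assumed, and generalized to an arbitrary integer factor):

```python
import numpy as np

def zoom_replicate(u, factor=2):
    """Zero-order-hold zooming, Eq. (7.40): v(m, n) = u(Int[m/f], Int[n/f])."""
    return np.repeat(np.repeat(u, factor, axis=0), factor, axis=1)
```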

Linear Interpolation. Linear interpolation is a first-order hold where a straight line is first fitted in between pixels along a row. Then pixels along each column are interpolated along a straight line. For example, for a 2 × 2 magnification, linear interpolation along rows gives

v1(m, 2n) = u(m, n),                           0 ≤ m ≤ M - 1, 0 ≤ n ≤ N - 1
v1(m, 2n + 1) = (1/2)[u(m, n) + u(m, n + 1)],  0 ≤ m ≤ M - 1, 0 ≤ n ≤ N - 1        (7.41)


1 3 2      Zero         1 0 3 0 2 0      Convolve       1 1 3 3 2 2
4 5 6  →   interlace    0 0 0 0 0 0  →   with H    →    1 1 3 3 2 2
                        4 0 5 0 6 0                     4 4 5 5 6 6
                        0 0 0 0 0 0                     4 4 5 5 6 6

(a)

(b)

Figure 7.28 Zooming by replication from 128 × 128 to 256 × 256 and 512 × 512 images.

Linear interpolation of the preceding along columns gives the final result

v(2m, n) = v1(m, n)
v(2m + 1, n) = (1/2)[v1(m, n) + v1(m + 1, n)],   0 ≤ m ≤ M - 1, 0 ≤ n ≤ 2N - 1        (7.42)

Here it is assumed that the input image is zero outside [0, M - 1] × [0, N - 1]. The above result can also be obtained by convolving the 2M × 2N zero interlaced image with the array

      1/4  1/2  1/4
H =   1/2   1   1/2        (7.43)
      1/4  1/2  1/4

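The zero-interlace-then-convolve view of (7.41)-(7.43) can be sketched directly (function name assumed; the convolution with the symmetric kernel H is written out explicitly):

```python
import numpy as np

def zoom_bilinear(u):
    """2x zoom by linear interpolation: zero-interlace, then convolve
    with the kernel H of Eq. (7.43)."""
    M, N = u.shape
    z = np.zeros((2 * M, 2 * N))
    z[::2, ::2] = u                       # zero interlacing
    H = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    zp = np.pad(z, 1)
    v = np.zeros_like(z)
    for k in range(3):
        for l in range(3):
            v += H[k, l] * zp[k:k + 2 * M, l:l + 2 * N]
    return v
```

Even-indexed outputs reproduce u exactly; odd-indexed outputs are the midpoint averages of (7.41)-(7.42).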


Figure 7.29 Zooming by linear interpolation from 128 × 128 to 256 × 256 and 512 × 512 images.

whose origin (m = 0, n = 0) is at the center of the array, that is, the element of value 1. Figure 7.29 contains examples of linear interpolation. In most of the image processing applications, linear interpolation performs quite satisfactorily. Higher-order (say, p) interpolation is possible by padding each row and each column of the input image by p rows and p columns of zeros, respectively, and convolving it p times with H (Fig. 7.30). For example p = 3 yields a cubic spline interpolation in between the pixels.
,

Figure 7.30 pth-order interpolation.



7.5 TRANSFORM OPERATIONS

In the transform operation enhancement techniques, zero-memory operations are performed on a transformed image followed by the inverse transformation, as shown in Fig. 7.31. We start with the transformed image V = {v(k, l)} as

V = A U A^T        (7.44)

where U = {u(m, n)} is the input image. Then the inverse transform of

v'(k, l) = f(v(k, l))        (7.45)

gives the enhanced image as

U' = A^(-1) V' [A^T]^(-1)        (7.46)

Generalized Linear Filtering

In generalized linear filtering, the zero-memory transform domain operation is a pixel-by-pixel multiplication

v'(k, l) = g(k, l) v(k, l)        (7.47)

where g(k, l) is called a zonal mask. Figure 7.32 shows zonal masks for low-pass, band-pass and high-pass filters for the DFT and other orthogonal transforms.
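Taking A in (7.44) to be the DFT, the whole chain (7.44)-(7.47) is a few lines; the function name is an assumption of this example:

```python
import numpy as np

def zonal_filter(u, g):
    """Generalized linear filtering, Eqs. (7.44)-(7.47), with the DFT
    as the transform: multiply the transform by a zonal mask g and invert."""
    V = np.fft.fft2(u)                    # transform of Eq. (7.44)
    return np.real(np.fft.ifft2(g * V))   # Eqs. (7.45)-(7.46)
```

With g identically 1 the filter is the identity, as expected.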
Figure 7.31 Image enhancement by transform filtering.


Figure 7.32 Examples of zonal masks g(k, l) for low-pass filtering (LPF), band-pass filtering (BPF), and high-pass filtering (HPF) in (complex) DFT and (real) orthogonal transform domains. The function g(k, l) is zero outside the region of support shown for the particular filter. (a) DFT zonal masks; (b) (real) orthogonal transform zonal masks.


Figure 7.33 Generalized linear filtering using DCT, Hadamard transform, and DFT. In each case: (a) original; (b) low-pass filtered; (c) band-pass filtered.


A filter of special interest is the inverse Gaussian filter, whose zonal mask for N × N images is defined as

g(k, l) = { exp{(k² + l²)/2σ²},   0 ≤ k, l ≤ N/2        (7.48)
          { g(N - k, N - l),      otherwise

when A in (7.44) is the DFT. For other orthogonal transforms discussed in Chapter 5,

g(k, l) = exp{(k² + l²)/2σ²},   0 ≤ k, l ≤ N - 1        (7.49)

This is a high-frequency emphasis filter that restores images blurred by atmospheric turbulence or other phenomena that can be modeled by Gaussian PSFs. Figures 7.33 and 7.34 show some examples of generalized linear filtering.

Root Filtering

The transform coefficients v(k, l) can be written as

v(k, l) = |v(k, l)| e^{jθ(k, l)}        (7.50)

In root filtering, the α-root of the magnitude component of v(k, l) is taken, while retaining the phase component, to yield

v'(k, l) = |v(k, l)|^α e^{jθ(k, l)},   0 ≤ α ≤ 1        (7.51)

For common images, since the magnitude of v(k, l) is relatively smaller at higher
Figure 7.34 Examples of transform based linear filtering. Left: originals and enhanced images, inverse Gaussian filtering. Right: low-pass filtering for noise smoothing; (a) original; (b) noisy; (c) DCT filter; (d) Hadamard transform filter.



Figure 7.35 Root filtering. (a) Original; (b) α = 0 (phase only); (c) α = 0.5; (d) α = 0.7.

spatial frequencies, the effect of α-rooting is to enhance higher spatial frequencies (low amplitudes) relative to lower spatial frequencies (high amplitudes). Figure 7.35 shows the effect of these filters. (Also see Fig. 8.19.)
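α-rooting per (7.50)-(7.51) can be sketched in the DFT domain (function name assumed for this example):

```python
import numpy as np

def root_filter(u, alpha=0.5):
    """Root filtering, Eq. (7.51): keep the phase of each DFT
    coefficient, raise its magnitude to the alpha power."""
    V = np.fft.fft2(u)
    Vp = np.abs(V) ** alpha * np.exp(1j * np.angle(V))
    return np.real(np.fft.ifft2(Vp))
```

With alpha = 1 the image passes through unchanged; alpha = 0 keeps only the phase, as in Fig. 7.35(b).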

Generalized Cepstrum and Homomorphic Filtering

If the magnitude term in (7.51) is replaced by the logarithm of |v(k, l)| and we define

s(k, l) = [log|v(k, l)|] e^{jθ(k, l)},   |v(k, l)| > 0        (7.52)

then the inverse transform of s(k, l), denoted by c(m, n), is called the generalized cepstrum of the image (Fig. 7.36). In practice a positive constant is added to |v(k, l)| to prevent the logarithm from going to negative infinity. The image c(m, n) is also

Figure 7.36 Generalized cepstrum and homomorphic filtering. (a) Generalized cepstrum and the generalized homomorphic transform ℋ; (b) inverse homomorphic transform ℋ⁻¹; (c) generalized homomorphic linear filtering.




Figure 7.37 Generalized cepstra of the building image. (a) Original; (b) DFT; (c) DCT; (d) Hadamard transform.

called the ,generalized homomorphic transform, !;l', of the image uim, n). The
, generalized homomorphic linear filter performs zero-memory operations on the
..... _. __' _ ' _ S " _.. ,"" ~,

sr~transformof the image followed by inverse Y7~transform, as shown in Fig. 7.36.


Examples of cepstra are given in Fig. 7.37. The homomorphic transformation reduces the dynamic range of the image in the transform domain and increases it in the cepstral domain.
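The cepstrum of (7.52) and the forward/inverse transforms of Fig. 7.36 can be sketched with the DFT as the unitary transform. This is an illustrative sketch, not code from the text: the function names and the choice of added constant are assumptions, and numpy is assumed available.

```python
import numpy as np

def generalized_cepstrum(u, const=1.0):
    # s(k,l) = [log|v(k,l)|] exp(j theta(k,l)), per (7.52); a positive
    # constant is added to |v| to keep the logarithm finite, as in the text.
    V = np.fft.fft2(u)
    S = np.log(np.abs(V) + const) * np.exp(1j * np.angle(V))
    return np.fft.ifft2(S).real          # generalized cepstrum c(m, n)

def inverse_homomorphic(c):
    # Fig. 7.36b: exponentiate the magnitude of s(k,l), keep its phase.
    S = np.fft.fft2(c)
    V = np.exp(np.abs(S)) * np.exp(1j * np.angle(S))
    return np.fft.ifft2(V).real
```

With const = 1 the magnitude spectrum maps to log(|V| + 1), so the round trip recovers |V| + 1 rather than |V| exactly; a zero-memory operation applied to c(m, n) between the two steps gives the homomorphic linear filtering of Fig. 7.36c.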
7.6 MULTISPECTRAL IMAGE ENHANCEMENT

In multispectral imaging there is a sequence of I images u_i(m, n), i = 1, 2, ..., I, where the number I is typically between 2 and 12. It is desired to combine these images to generate a single or a few display images that are representative of their features. There are three common methods of enhancing such images.

Intensity Ratios

Define the ratios

R_{i,j}(m, n) ≜ u_i(m, n)/u_j(m, n),  i ≠ j   (7.53)

where u_i(m, n) represents the intensity and is assumed to be positive. This method gives I² − I combinations for the ratios, the few most suitable of which are chosen by visual inspection. Sometimes the ratios are defined with respect to the average image (1/I) Σ_{i=1}^{I} u_i(m, n) to reduce the number of combinations.

260 Image Enhancement Chap. 7




Log-Ratios

Taking the logarithm on both sides of (7.53), we get

L_{i,j} ≜ log R_{i,j} = log u_i(m, n) − log u_j(m, n)   (7.54)

The log-ratio L_{i,j} gives a better display when the dynamic range of R_{i,j} is very large, which can occur if the spectral features at a spatial location are quite different.
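Both displays of (7.53) and (7.54) are a few lines with numpy, assuming registered bands with positive intensities (function names are illustrative):

```python
import numpy as np

def intensity_ratios(bands):
    # All I^2 - I ratios R_ij = u_i/u_j, i != j, per (7.53).
    return {(i, j): bi / bj
            for i, bi in enumerate(bands)
            for j, bj in enumerate(bands) if i != j}

def log_ratios(bands):
    # L_ij = log u_i - log u_j, per (7.54); compresses a large dynamic range.
    logs = [np.log(b) for b in bands]
    return {(i, j): logs[i] - logs[j]
            for i in range(len(bands))
            for j in range(len(bands)) if i != j}
```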

Principal Components

For each (m, n) define the I × 1 vector

u(m, n) = [u₁(m, n), u₂(m, n), ..., u_I(m, n)]ᵀ   (7.55)

The I × I KL transform of u(m, n), denoted by Φ, is determined from the autocorrelation matrix of the ensemble of vectors {u(m, n)}. The rows of Φ, which are eigenvectors of the autocorrelation matrix, are arranged in decreasing order of their associated eigenvalues. Then for any I₀ ≤ I, the images v_i(m, n), i = 1, ..., I₀, obtained from the KL transformed vector

v(m, n) = Φu(m, n)   (7.56)

are the first I₀ principal components of the multispectral images.
Figure 7.38 contains examples of multispectral image enhancement.
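A sketch of (7.55)–(7.56) with numpy: Φ is built from the eigenvectors of the I × I autocorrelation matrix estimated by averaging over all pixels (the function name and the sample-average estimate are my choices, not from the text).

```python
import numpy as np

def principal_components(bands, n_keep):
    # Stack the bands into I x (MN) pixel vectors u(m, n), per (7.55).
    X = np.stack([b.ravel() for b in bands]).astype(float)
    R = X @ X.T / X.shape[1]            # I x I autocorrelation estimate
    lam, E = np.linalg.eigh(R)          # eigenvalues in ascending order
    Phi = E[:, ::-1].T                  # rows = eigenvectors, decreasing eigenvalue
    V = Phi @ X                         # KL transform v = Phi u, per (7.56)
    shape = bands[0].shape
    return [V[i].reshape(shape) for i in range(n_keep)]
```

The first component carries the largest mean-square value; the later components carry progressively less, which is what makes a three-component display possible for I > 3.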
Figure 7.38 Multispectral image enhancement: originals (top: visual band; bottom: I.R.), log-ratios, and principal components. The clouds and land have been separated.



7.7 FALSE COLOR AND PSEUDOCOLOR

Since we can distinguish many more colors than gray levels, the perceptual dynamic range of a display can be effectively increased by coding complex information in color. False color implies mapping a color image into another color image to provide a more striking color contrast (which may not be natural) to attract the attention of the viewer.
Pseudocolor refers to mapping a set of images u_i(m, n), i = 1, ..., I, into a color image. Usually the mapping is determined such that different features of the data set can be distinguished by different colors. Thus, a large data set can be presented comprehensively to the viewer.
Figure 7.39 Pseudocolor image enhancement. Input images u_i(m, n) pass through feature extraction to give v₁(m, n), v₂(m, n), v₃(m, n), which a color coordinate transformation maps into R, G, B for display.

Figure 7.39 shows the general procedure for determining pseudocolor mappings. The given input images are mapped into three feature images, which are then mapped into the three color primaries. Suppose it is desired to pseudocolor a monochrome image. Then we have to map the gray levels onto a suitably chosen curve in the color space. One way is to keep the saturation (S*) constant and map the gray level values into brightness (W*) and the local spatial averages of gray levels into hue (θ*).
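The constant-saturation mapping just described can be sketched as follows. This is an illustrative sketch: the box-filter window, the saturation level, and the use of the standard-library colorsys HSV model as the color space are my assumptions, not prescribed by the text.

```python
import colorsys
import numpy as np

def pseudocolor(gray, win=3, sat=0.8):
    # Normalize gray levels to [0, 1].
    g = gray.astype(float)
    g = (g - g.min()) / (np.ptp(g) + 1e-12)
    # Local spatial average of gray levels (box filter, edge replication).
    pad = win // 2
    gp = np.pad(g, pad, mode='edge')
    local = sum(gp[dy:dy + g.shape[0], dx:dx + g.shape[1]]
                for dy in range(win) for dx in range(win)) / win ** 2
    # Gray level -> brightness (W*), local average -> hue (theta*), S* constant.
    rgb = np.zeros(g.shape + (3,))
    for (m, n), v in np.ndenumerate(g):
        rgb[m, n] = colorsys.hsv_to_rgb(local[m, n], sat, v)
    return rgb
```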
Other methods are possible, including a pseudorandom mapping of gray levels into R, G, B coordinates, as is done in some image display systems. The comet Halley image shown on the cover page of this text is an example of pseudocolor image enhancement. Different densities of the comet surface (which is ice) were mapped into different colors.
For image data sets where the number of images is greater than or equal to three, the data set can be reduced to three ratios, three log-ratios, or three principal components, which are then mapped into suitable colors.
In general, the pseudocolor mappings are nonunique, and extensive interactive trials may be required to determine an acceptable mapping for displaying a given set of data.

7.8 COLOR IMAGE ENHANCEMENT

In addition to the requirements of monochrome image enhancement, color image enhancement may require improvement of color balance or color contrast in a color image. Enhancement of color images becomes a more difficult task not only because of the added dimension of the data but also due to the added complexity of color perception. A practical approach to developing color image enhancement



Figure 7.40 Color image enhancement. The input R, G, B image undergoes a coordinate conversion to T₁, T₂, T₃; each plane is enhanced by its own monochrome image enhancement algorithm to give T′₁, T′₂, T′₃, which an inverse coordinate transformation maps to R′, G′, B′ for display.

algorithms is shown in Fig. 7.40. The input color coordinates of each pixel are independently transformed into another set of color coordinates, where the image in each coordinate is enhanced by its own (monochrome) image enhancement algorithm, which could be chosen suitably from the foregoing set of algorithms. The enhanced image coordinates T′₁, T′₂, T′₃ are inverse transformed to R′, G′, B′ for display. Since each image plane T_k(m, n), k = 1, 2, 3, is enhanced independently, care has to be taken so that the enhanced coordinates T′_k are within the color gamut of the R-G-B system. The choice of color coordinate system T_k, k = 1, 2, 3, in which the enhancement algorithms are implemented may be problem-dependent.
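A minimal sketch of the Fig. 7.40 pipeline, using a YUV-like luminance/chrominance transform as the T_k coordinates and a contrast stretch about the mean as the monochrome enhancer; both are illustrative choices, not prescribed by the text.

```python
import numpy as np

# YUV-like coordinate conversion matrix (an illustrative T_k choice).
A = np.array([[ 0.299,  0.587,  0.114],
              [-0.147, -0.289,  0.436],
              [ 0.615, -0.515, -0.100]])

def enhance_color(rgb, gain=1.5):
    T = rgb @ A.T                        # R, G, B -> T1, T2, T3
    y = T[..., 0]
    # Monochrome enhancement of the luminance plane only.
    T[..., 0] = (y - y.mean()) * gain + y.mean()
    out = T @ np.linalg.inv(A).T         # inverse coordinate transformation
    return np.clip(out, 0.0, 1.0)        # keep R', G', B' inside the gamut
```

The final clip is the crude version of the gamut constraint mentioned in the text; a more careful implementation would project out-of-gamut colors back along a perceptually meaningful direction.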


7.9 SUMMARY
In this chapter we have presented several image enhancement techniques accompanied by examples. Image enhancement techniques can be improved if the enhancement criterion can be stated precisely. Often such criteria are application-dependent, and the final enhancement algorithm can only be obtained by trial and error. Modern digital image display systems offer a variety of control function switches, which allow the user to enhance an image interactively for display.

PROBLEMS

7.1 (Enhancement of a low-contrast image) Take a 25¢ coin; scan and digitize it to obtain a 512 × 512 image.
a. Enhance it by a suitable contrast stretching transformation and compare it with histogram equalization.
b. Perform unsharp masking and spatial high-pass operations and contrast stretch the results. Compare their performance as edge enhancement operators.
7.2 Using (7.6) and (7.9), prove the formula (7.8) for extracting the nth bit of a pixel.
7.3 a. Show that the random variable v defined via (7.11) satisfies the condition Prob[v ≤ α] = Prob[u ≤ F⁻¹(α)] = F(F⁻¹(α)) = α, where 0 < α < 1. This means v is uniformly distributed over (0, 1).




b. On the other hand, show that the digital transformation of (7.12) gives p_v(v_i), where p_v is given by (7.14).
7.4 Develop an algorithm for (a) an M × M median filter, and (b) an M × 1 separable median filter, that minimizes the number of operations required for filtering N × N images, where N ≫ M. Compare the operation counts for M = 3, 5, 7.
7.5* (Adaptive unsharp masking) A powerful method of sharpening images in the presence of low levels of noise (such as film grain noise) is via the following algorithm [15]. The high-pass filter

−1 −2 −1
−2  12 −2
−1 −2 −1

which can be used for unsharp masking, can be written as 1/12 times the sum of the following eight directional masks (H_θ):

 0  0  0   −1  0  0    0 −2  0    0  0 −1    0  0  0    0  0  0    0  0  0    0  0  0
−2  2  0    0  1  0    0  2  0    0  1  0    0  2 −2    0  1  0    0  2  0    0  1  0
 0  0  0    0  0  0    0  0  0    0  0  0    0  0  0    0  0 −1    0 −2  0   −1  0  0

The input image is filtered by each of these masks and the outputs that exceed a threshold are summed and mixed with the input image (see Fig. P7.5). Perform this algorithm on a scanned photograph and compare with nonadaptive unsharp masking.

Figure P7.5 Adaptive unsharp masking. The image is filtered by each directional mask H_θ to give v_θ; the outputs with |v_θ| > T are summed (Σ v_θ) and added to the image to give the enhanced image.

7.6* (Filtering using phase) One of the limitations of the noise-smoothing linear filters is that their frequency response has zero phase. This means the phase distortions due to noise remain unaffected by these algorithms. To see the effect of phase, enhance a noisy image (with, for instance, SNR = 3 dB) by spatial averaging and transform processing such that the phase of the enhanced image is the same as that of the original image (see Fig. P7.6). Display ũ(m, n), û(m, n) and u(m, n) and compare results at different noise levels. If, instead of preserving θ(k, l), suppose we preserve 10% of the samples of ũ(m, n) ≜ IDFT{exp jθ(k, l)} that have the largest magnitudes. Develop an algorithm for enhancing the noisy image now.


Figure P7.6 Filtering using phase. The noisy image v(m, n) = u(m, n) + η(m, n) is smoothed to give û(m, n); the magnitude |V̂(k, l)| of the DFT of û is combined with the measured phase e^{jθ(k, l)} of the DFT of v, and the IDFT gives ũ(m, n).
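The Fig. P7.6 pipeline can be sketched as below; the box smoother and the wrap-around padding are my choices, not prescribed by the problem.

```python
import numpy as np

def box_smooth(v, win=3):
    # Spatial averaging over a win x win window (circular boundary).
    p = win // 2
    vp = np.pad(v, p, mode='wrap')
    return sum(vp[dy:dy + v.shape[0], dx:dx + v.shape[1]]
               for dy in range(win) for dx in range(win)) / win ** 2

def phase_preserving_smooth(v, win=3):
    # Magnitude |V^(k,l)| from the smoothed image, phase theta(k,l)
    # measured from the observed image; combine and invert (Fig. P7.6).
    mag = np.abs(np.fft.fft2(box_smooth(v, win)))
    phase = np.angle(np.fft.fft2(v))
    return np.fft.ifft2(mag * np.exp(1j * phase)).real
```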




7.7 Take a 512 × 512 image containing noise. Design low-pass, band-pass, and high-pass zonal masks in different transform domains such that their pass bands contain equal energy. Map the three filtered images into R, G, B color components at each pixel and display the resultant pseudocolor image.
7.8 Scan and digitize a low-contrast image (such as fingerprints or a coin). Develop a technique based on contrast ratio mapping to bring out the faint edges in such images.

BIBLIOGRAPHY

Section 7.1

General references on image enhancement include the several books on digital image processing cited in Chapter 1. Other surveys are given in:

1. H. C. Andrews. Computer Techniques in Image Processing. New York: Academic Press, 1970.
2. H. C. Andrews, A. G. Tescher and R. P. Kruger. "Image Processing by Digital Computers." IEEE Spectrum 9 (1972): 20-32.
3. T. S. Huang. "Image Enhancement: A Review." Opto-Electronics 1 (1969): 49-59.
4. T. S. Huang, W. F. Schreiber and O. J. Tretiak. "Image Processing." Proc. IEEE 59 (1971): 1586-1609.
5. J. S. Lim. "Image Enhancement." In Digital Image Processing Techniques (M. P. Ekstrom, ed.), Chapter 1, pp. 1-51. New York: Academic Press, 1984.
6. T. S. Huang (ed.). Two-Dimensional Digital Signal Processing I and II. Topics in Applied Physics, vols. 42-43. Berlin: Springer Verlag, 1981.

Sections 7.2 and 7.3

For gray level and histogram modification techniques:

7. R. Nathan. "Picture Enhancement for the Moon, Mars, and Man" in Pictorial Pattern
Recognition (G. C. Cheng, ed.). Washington, D.C.: Thompson, pp. 239-266, 1968.
8. F. Billingsley. "Applications of Digital Image Processing." Appl. Opt. 9 (February
1970): 289-299.
9. D. A. O'Handley and W. B. Green. "Recent Developments in Digital Image Processing
at the Image Processing Laboratory at the Jet Propulsion Laboratory." Proc. IEEE 60 (1972): 821-828.

10. R. E. Woods and R. C. Gonzalez. "Real Time Digital Image Enhancement." Proc. IEEE 69 (1981): 643-654.

Section 7.4

Further results on median filtering and other spatial neighborhood processing techniques can be found in:
11. J. W. Tukey. Exploratory Data Analysis. Reading, Mass.: Addison-Wesley, 1971.



12. T. S. Huang and G. Y. Tang. "A Fast Two-Dimensional Median Filtering Algorithm." IEEE Trans. Acoust. Speech, Signal Process. ASSP-27 (1979): 13-18.
13. R. H. Wallis. "An Approach for the Space Variant Restoration and Enhancement of Images." Proc. Symp. Current Math. Problems in Image Sci. (1976).

14. W. F. Schreiber. "Image Processing for Quality Improvement." Proc. IEEE 66 (1978): 1640-1651.
15. P. G. Powell and B. E. Bayer. "A Method for the Digital Enhancement of Unsharp, Grainy Photographic Images." Proc. Int. Conf. Electronic Image Proc., IEEE, U.K. (July 1982): 179-183.

Section 7.5

For cepstral and homomorphic filtering based approaches see [4, 5] and:

16. T. G. Stockham, Jr. "Image Processing in the Context of a Visual Model." Proc. IEEE
60 (1972): 828-842.

Sections 7.6-7.8

For multispectral and pseudocolor enhancement techniques:

17. L. W. Nichols and J. Lamar. "Conversion of Infrared Images to Visible in Color." Appl. Opt. 7 (September 1968): 1757.
18. E. R. Kreins and L. J. Allison. "Color Enhancement of Nimbus High Resolution Infrared Radiometer Data." Appl. Opt. 9 (March 1970): 681.
19. A. K. Jain, A. Nassir, and D. Nelson. "Multispectral Feature Display via Pseudo Coloring." Proc. 24th Annual SPSE Meeting, p. K-3.


.
,


• •



Image Filtering
and
Restoration

8.1 INTRODUCTION

Any image acquired by optical, electro-optical or electronic means is likely to be degraded by the sensing environment. The degradations may be in the form of sensor noise, blur due to camera misfocus, relative object-camera motion, random atmospheric turbulence, and so on. Image restoration is concerned with filtering the observed image to minimize the effect of degradations (Fig. 8.1). The effectiveness of image restoration filters depends on the extent and the accuracy of the knowledge of the degradation process as well as on the filter design criterion. A frequently used criterion is the mean square error. Although, as a global measure of visual fidelity, its validity is questionable (see Chapter 3), it is a reasonable local measure and is mathematically tractable. Other criteria, such as weighted mean square and maximum entropy, are also used, although less frequently.
Image restoration differs from image enhancement in that the latter is concerned more with accentuation or extraction of image features rather than restoration of degradations. Image restoration problems can be quantified precisely, whereas enhancement criteria are difficult to represent mathematically. Consequently, restoration techniques often depend only on the class or ensemble
Figure 8.1 Digital image restoration system. The input u(x, y) passes through the imaging system and A-to-D conversion to give v(m, n), which is processed by a digital filter and D-to-A conversion for display or recording.

Figure 8.2 Image restoration. Restoration models: image formation models; detector and recorder models; noise models; sampled observation models. Linear filtering: inverse/pseudoinverse filter; Wiener filter; FIR filter; generalized Wiener filter; spline interpolation/smoothing; least squares and SVD methods; recursive (Kalman) filter; semirecursive filter. Other methods: speckle noise reduction; maximum entropy restoration; Bayesian methods; coordinate transformation and geometric correction; blind deconvolution; extrapolation and super-resolution.

properties of a data set, whereas image enhancement techniques are much more
image dependent. Figure 8.2 summarizes several restoration techniques that are
discussed in this chapter.

8.2 IMAGE OBSERVATION MODELS

A typical imaging system consists of an image formation system, a detector, and a recorder. For example, an electro-optical system such as the television camera contains an optical system that focuses an image on a photoelectric device, which is scanned for transmission or recording of the image. Similarly, an ordinary camera uses a lens to form an image that is detected and recorded on a photosensitive film. A general model for such systems (Fig. 8.3) can be expressed as

v(x, y) = g[w(x, y)] + η(x, y)   (8.1)

w(x, y) = ∬ h(x, y; x′, y′)u(x′, y′) dx′ dy′   (8.2)

η(x, y) = f[g(w(x, y))]η₁(x, y) + η₂(x, y)   (8.3)


Figure 8.3 Image observation model. The object u(x, y) passes through a linear system h(x, y; x′, y′) to give w(x, y), then a point nonlinearity g(·); signal-dependent noise f(·)η₁(x, y) and additive noise η₂(x, y) combine with the result to give v(x, y).

268 Image Filtering and Restoration Chap. 8



where u(x, y) represents the object (also called the original image), and v(x, y) is the observed image. The image formation process can often be modeled by the linear system of (8.2), where h(x, y; x′, y′) is its impulse response. For space invariant systems, we can write

h(x, y; x′, y′) = h(x − x′, y − y′; 0, 0) ≜ h(x − x′, y − y′)   (8.4)

The functions f(·) and g(·) are generally nonlinear and represent the characteristics of the detector/recording mechanisms. The term η(x, y) represents the additive noise, which has an image-dependent random component f[g(w)]η₁ and an image-independent random component η₂.

Image Formation Models

Table 8.1 lists impulse response models for several spatially invariant systems. Diffraction-limited coherent systems have the effect of being ideal low-pass filters. For an incoherent system, this means band-limitedness and a frequency response obtained by convolving the coherent transfer function (CTF) with itself (Fig. 8.4). Degradations due to phase distortion in the CTF are called aberrations and manifest themselves as distortions in the pass-band of the incoherent optical transfer function (OTF). For example, a severely out-of-focus lens with rectangular aperture causes an aberration in the OTF, as shown in Fig. 8.4.
Motion blur occurs when there is relative motion between the object and the camera during exposure. Atmospheric turbulence is due to random variations in the refractive index of the medium between the object and the imaging system. Such degradations occur in the imaging of astronomical objects. Image blurring also occurs in image acquisition by scanners in which the image pixels are integrated over the scanning aperture. Examples of this can be found in image acquisition by radar, beam-forming arrays, and conventional image display systems using tele-

TABLE 8.1 Examples of Spatially Invariant Models

Type of system: impulse response h(x, y); frequency response H(ξ₁, ξ₂).

Diffraction limited, coherent (with rectangular aperture): h = ab sinc(ax) sinc(by); H = rect(ξ₁/a, ξ₂/b)
Diffraction limited, incoherent (with rectangular aperture): h = ab sinc²(ax) sinc²(by); H = tri(ξ₁/a, ξ₂/b)
Horizontal motion: h = (1/α₀) rect(x/α₀ − 1/2) δ(y); H = sinc(α₀ξ₁) exp(−jπα₀ξ₁)
Atmospheric turbulence: h = exp{−πα²(x² + y²)}; H = (1/α²) exp{−π(ξ₁² + ξ₂²)/α²}
Rectangular scanning aperture: h = (1/ab) rect(x/a, y/b); H = sinc(aξ₁) sinc(bξ₂)
CCD interactions: h = Σ_{k,l=−1}^{1} β(k, l) δ(x − kΔ, y − lΔ); H = Σ_{k,l=−1}^{1} β(k, l) exp{−j2π(kΔξ₁ + lΔξ₂)}


Figure 8.4 Degradations due to diffraction limitedness and lens aberration. The plot of H(ξ₁, 0) compares the CTF of a coherent diffraction limited system, the OTF of an incoherent diffraction limited system, and the aberration due to a severely out-of-focus lens.

vision rasters. In the case of CCD arrays used for image acquisition, local interactions between adjacent array elements blur the image.
Figure 8.5 shows the PSFs of some of these degradation phenomena. For an ideal imaging system, the impulse response is the infinitesimally thin Dirac delta function having an infinite passband. Hence, the extent of blur introduced by a system can be judged by the shape and width of the PSF. Alternatively, the passband of the frequency response can be used to judge the extent of blur or the resolution of the system. Figures 2.2 and 2.5 show the PSFs and MTFs due to atmospheric turbulence and diffraction-limited systems. Figure 8.6 shows examples of blurred images.
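Two of the Table 8.1 PSFs can be sampled on a grid and their blur extents compared through their discrete frequency responses. This is a sketch with numpy; the grid sizes and parameter values are arbitrary illustrative choices.

```python
import numpy as np

def motion_blur_psf(L, size):
    # 1-D horizontal motion over L pixels (discrete analog of Table 8.1).
    h = np.zeros((size, size))
    h[0, :L] = 1.0 / L
    return h

def turbulence_psf(alpha, size):
    # Sampled exp{-pi alpha^2 (x^2 + y^2)}, normalized to unit gain at DC.
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    h = np.exp(-np.pi * alpha ** 2 * (X ** 2 + Y ** 2))
    return h / h.sum()
```

The width of the PSF and the passband of H = np.fft.fft2(h) move in opposite directions: the wider the PSF, the narrower the passband, which is exactly the judgement of blur extent described above.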

Figure 8.5 Examples of spatially invariant PSFs: (a) one-dimensional motion blur; (b) incoherent diffraction limited system (lens cutoff); (c) average atmospheric turbulence.



Figure 8.6 Examples of blurred images. (a) Small exponential PSF blur. (See Fig. 8.5a.)

Example 8.2 (A Spatially Varying Blur)
A forward looking radar (FLR) mounted on a platform at altitude h sends radio frequency (RF) pulses and scans around the vertical axis, resulting in a doughnut-shaped coverage of the ground (see Fig. 8.7a). At any scan position, the area illuminated can be considered to be bounded by an elliptical contour, which is the radar antenna half-power gain pattern. The received signal at any scan position is the sum of the returns from all points in a resolution cell, that is, the region illuminated at distance r during the pulse interval. Therefore,

v_p(r, φ) = ∫_{−φ₀(r)}^{φ₀(r)} ∫_{l₁(r)}^{l₂(r)} u_p(r + s, φ + θ′) s ds dθ′   (8.5)

where φ₀(r) is the angular width of the elliptical contour from its major axis, l₁(r) and l₂(r) correspond to the inner and outer ground intercepts of the radar pulse width around the point at radial distance r, and u_p(r, φ) and v_p(r, φ) are the functions u(x, y) and v(x, y), respectively, expressed in polar coordinates. Figure 8.7c shows the effect of scanning by a forward-looking radar. Note that the PSF associated with (8.5) is not shift invariant. (Show!)




Figure 8.7 FLR imaging: (a) scanning geometry, showing the radar beam axis, the antenna half-power ellipse, and the cross section of the radar beam; (b) object; (c) FLR image (simulated).




Figure 8.8 Typical response of a photographic film: optical density d versus log₁₀ w.


Detector and Recorder Models

The response of image detectors and recorders is generally nonlinear. For example, the response of photographic films, image scanners, and display devices can be written as

g = αw^β   (8.6)

where α and β are device-dependent constants and w is the input variable. For photographic films, however, a more useful form of the model is (Fig. 8.8)

d = γ log₁₀ w − d₀   (8.7)

where γ is called the gamma of the film. Here w represents the incident light intensity and d is called the optical density. A film is called positive if it has negative γ. For γ = −1, one obtains a linear model between w and the reflected or transmitted light intensity, which is proportional to g ≜ 10^{−d}. For photoelectronic devices, w represents the incident light intensity, and the output g is the scanning beam current. The quantity β is generally positive and around 0.75.

Noise Models

The general noise model of (8.3) is applicable in many situations. For example, in photoelectronic systems the noise in the electron beam current is often modeled as

η(x, y) = √g(x, y) η₁(x, y) + η₂(x, y)   (8.8)

where g is obtained from (8.6) and η₁ and η₂ are zero-mean, mutually independent, Gaussian white noise fields. The signal-dependent term arises because the detection and recording processes involve random electron emission (or silver grain deposition in the case of films) having a Poisson distribution with a mean value of g. This distribution is approximated by the Gaussian distribution as a limiting case. Since the mean and variance of a Poisson distribution are equal, the signal-dependent term has a standard deviation √g if it is assumed that η₁ has unity variance. The other term, η₂, represents wideband thermal noise, which can be modeled as Gaussian white noise.

In the case of films, there is no thermal noise and the noise model is

η(x, y) = √g(x, y) η₁(x, y)   (8.9)

where g now equals d, the optical density given by (8.7). A more accurate model for film grain noise takes the form

η(x, y) = ε(g(x, y))^p η₁(x, y)   (8.10)

where ε is a normalization constant depending on the average film grain area and the exponent p lies in the interval 1/3 to 1/2.
The presence of the signal-dependent term in the noise model makes restoration algorithms particularly difficult. Often, in the function f[g(w)], w is replaced by its spatial average μ_w, giving

η(x, y) = f[g(μ_w)]η₁(x, y) + η₂(x, y)   (8.11)

which makes η(x, y) a Gaussian white noise random field. If the detector is operating in its linear region, we obtain, for photoelectronic devices, a linear observation model of the form

v(x, y) = w(x, y) + √μ_w η₁(x, y) + η₂(x, y)   (8.12)

where we have set α = 1 in (8.6) without loss of generality. For photographic films (with γ = −1), we obtain

v(x, y) = −log₁₀ w + aη₁(x, y) − d₀   (8.13)

where a is obtained by absorbing the various quantities in the noise model of (8.10). The constant d₀ has the effect of scaling w by a constant and can be ignored, giving

v(x, y) = −log₁₀ w + aη₁(x, y)   (8.14)

where v(x, y) represents the observed optical density. The light intensity associated with v is given by

i(x, y) = 10^{−v(x, y)} = w(x, y)10^{−aη₁(x, y)} = w(x, y)n(x, y)   (8.15)

where n ≜ 10^{−aη₁} now appears as multiplicative noise having a log-normal distribution.
A different type of noise that occurs in the coherent imaging of objects is called speckle noise. For low-resolution objects, it is multiplicative and occurs whenever the surface roughness of the object being imaged is of the order of the wavelength of the incident radiation. It is modeled as

v(x, y) = u(x, y)s(x, y) + η(x, y)   (8.16)

where s(x, y), the speckle noise intensity, is a white noise random field with exponential distribution, that is,

p_s(s) = (1/σ²) exp(−s/σ²), s > 0;  0 otherwise   (8.17)

Digital processing of images with speckle is discussed in Section 8.13.
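Speckle per (8.16)-(8.17) can be simulated directly; this sketch omits the additive term η, and the seed and default variance are arbitrary.

```python
import numpy as np

def add_speckle(u, sigma2=1.0, seed=None):
    # s(x,y): white, exponentially distributed with mean sigma^2, per (8.17).
    rng = np.random.default_rng(seed)
    s = rng.exponential(scale=sigma2, size=u.shape)
    return u * s     # multiplicative model v = u s of (8.16), eta omitted
```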

Sampled Image Observation Models

With uniform sampling the observation model of (8.1)-(8.3) can be reduced to a discrete approximation of the form

v(m, n) = g[w(m, n)] + η(m, n),  ∀(m, n)   (8.18)

w(m, n) = Σ Σ_{k,l=−∞}^{∞} h(m, n; k, l)u(k, l)   (8.19)

η(m, n) = f[g(w(m, n))]η′(m, n) + η″(m, n)   (8.20)

where η′(m, n) and η″(m, n) are discrete white noise fields, h(m, n; k, l) is the impulse response of the sampled system, and u(m, n), v(m, n) are the average values of u(x, y) and v(x, y) over a pixel area in the sampling grid (Problem 8.3).
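For a spatially invariant h, the discrete model (8.18)-(8.20) can be simulated via circular convolution in the DFT domain. This is a sketch: the choices of f(x) = √x, a linear g, and circular boundary handling are mine.

```python
import numpy as np

def observe(u, h, sig_dep=0.1, sig_add=0.1, seed=None):
    # w(m,n) = h convolved with u, per (8.19), computed circularly via the DFT.
    w = np.fft.ifft2(np.fft.fft2(h, s=u.shape) * np.fft.fft2(u)).real
    gw = w                                   # detector in its linear region
    rng = np.random.default_rng(seed)
    # eta = f[g(w)] eta' + eta'', per (8.20), with f(x) = sqrt(x).
    eta = np.sqrt(np.maximum(gw, 0)) * sig_dep * rng.standard_normal(u.shape) \
          + sig_add * rng.standard_normal(u.shape)
    return gw + eta                          # v(m,n) of (8.18)
```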

8.3 INVERSE AND WIENER FILTERING

Inverse Filter

Inverse filtering is the process of recovering the input of a system from its output. For example, in the absence of noise the inverse filter would be a system that recovers u(m, n) from the observations v(m, n) (Fig. 8.9). This means we must have

g′(x) = g⁻¹(x), that is, g′[g(x)] = x   (8.21)

h′(m, n; k, l) = h⁻¹(m, n; k, l)   (8.22)

that is,

Σ Σ_{k′,l′=−∞}^{∞} h′(m, n; k′, l′)h(k′, l′; k, l) = δ(m − k, n − l)   (8.23)

Inverse filters are useful for precorrecting an input signal in anticipation of the degradations caused by the system, such as correcting the nonlinearity of a display. Design of physically realizable inverse filters is difficult because they are often

Figure 8.9 Inverse filter. The system h(m, n; k, l) followed by g(·) maps u(m, n) to w(m, n) and then v(m, n); the inverse filter applies g′(·) followed by h′(m, n; k, l) to recover u(m, n).




unstable. For example, for spatially invariant systems (8.23) can be written as

Σ Σ_{k′,l′=−∞}^{∞} h′(m − k′, n − l′)h(k′, l′) = δ(m, n),  ∀(m, n)   (8.24)

Fourier transforming both sides, we obtain H′(ω₁, ω₂)H(ω₁, ω₂) = 1, which gives

H′(ω₁, ω₂) = 1/H(ω₁, ω₂)   (8.25)

that is, the inverse filter frequency response is the reciprocal of the frequency response of the given system. However, H′(ω₁, ω₂) will not exist if H(ω₁, ω₂) has any zeros.

Pseudoinverse Filter

The pseudoinverse filter is a stabilized version of the inverse filter. For a linear shift invariant system with frequency response H(ω₁, ω₂), the pseudoinverse filter is defined as

H⁻(ω₁, ω₂) = 1/H(ω₁, ω₂), H ≠ 0;  0, H = 0   (8.26)

Here, H⁻(ω₁, ω₂) is also called the generalized inverse of H(ω₁, ω₂), in analogy with the definition of the generalized inverse of matrices. In practice, H⁻(ω₁, ω₂) is set to zero whenever |H| is less than a suitably chosen positive quantity ε.
Example 8.3
Figure 8.10 shows a blurred image simulated digitally as the output of a noiseless linear system. Therefore, W(ω₁, ω₂) = H(ω₁, ω₂)U(ω₁, ω₂). The inverse filtered image is obtained as Û(ω₁, ω₂) = W(ω₁, ω₂)/H(ω₁, ω₂). In the presence of additive noise, the inverse filter output can be written as

Û = W/H + N/H = U + N/H   (8.27)

where N(ω₁, ω₂) is the noise term. Even if N is small, N/H can assume large values, resulting in amplification of noise in the filtered image. This is shown in Fig. 8.10c, where the small amount of noise introduced by computer round-off errors has been amplified by the inverse filter. Pseudoinverse filtering reduces this effect (Fig. 8.10d).

• •
The Wiener Filter

The main limitation of inverse and pseudoinverse filtering is that these filters remain very sensitive to noise. Wiener filtering is a method of restoring images in the presence of blur as well as noise.
Let u(m, n) and v(m, n) be arbitrary, zero mean, random sequences. It is desired to obtain an estimate, û(m, n), of u(m, n) from v(m, n) such that the mean square error

σ²_e = E{[u(m, n) − û(m, n)]²}   (8.28)


Figure 8.10 Inverse and pseudo-inverse filtered images: (a) original image; (b) blurred image; (c) inverse filtered; (d) pseudo-inverse filtered.


is minimized. The best estimate û(m, n) is known to be the conditional mean of u(m, n) given {v(m, n), for every (m, n)}, that is,

û(m, n) = E[u(m, n)|v(k, l), ∀(k, l)]   (8.29)

Equation (8.29), simple as it looks, is quite difficult to solve in general. This is because it is nonlinear, and the conditional probability density p_{u|v} required for solving (8.29) is difficult to calculate. Therefore, one generally settles for the best linear estimate of the form

û(m, n) = Σ Σ_{k,l=−∞}^{∞} g(m, n; k, l)v(k, l)   (8.30)

where the filter impulse response g(m, n; k, l) is determined such that the mean square error of (8.28) is minimized. It is well known that if u(m, n) and v(m, n) are



. •
jointly Gaussian sequences, then the solution of (8.29) is linear. Minimization of
(8.28) requires that the orthogonality condition ,
E[{u(m,n)-u(m,n)}v(m',n')]=O, '. 'rI(m,n),(m',n') (8.31)
be satisfied. Using the definition of cross-correlation
rab (m, n; k, I) ~ Era (m, n)b (k, 1)J (8.32)
for two arbitrary random sequences aim, n) andb(k, I), and given (8.30), the
orthogonality condition yields the equation

2:2: gem, n; k, l)r" ik, I; In', n') = r•• (m, n; m " n') (8.33)
k./ =:\ -:;;

Equations (8.30) and (8.33) are called the Wiener filter equations. If the auto-
correlation function of the observed image v im, n) and its cross-correlation with the
object ulm, n) are known, then (8.33) can be solved in principle. Often, u(m, n)
and v (m, n) can be assumed to be jointly stationary so that
(8.34)
for (a, b) = (u, u), (u, v), (v, v), and so on. This simplifies g (m, n; k, 1) to a spatially
invariant filter, denoted by g (m - k, n -'-I), and (3.33) reduces to

    Σ Σ_{k,l = -∞}^{∞} g(m - k, n - l) r_vv(k, l) = r_uv(m, n)              (8.35)

Taking Fourier transforms of both sides, and solving for G(ω1, ω2), we get

    G(ω1, ω2) = S_uv(ω1, ω2) S_vv^{-1}(ω1, ω2)                              (8.36)

where G, S_uv, and S_vv are the Fourier transforms of g, r_uv, and r_vv, respectively.
Equation (8.36) gives the Wiener filter frequency response, and the filter equations
become

    û(m, n) = Σ Σ_{k,l = -∞}^{∞} g(m - k, n - l) v(k, l)                    (8.37)

    Û(ω1, ω2) = G(ω1, ω2) V(ω1, ω2)                                         (8.38)


where Û and V are the Fourier transforms of û and v, respectively. Suppose
v(m, n) is modeled by a linear observation system with additive noise, that is,

    v(m, n) = Σ Σ_{k,l = -∞}^{∞} h(m - k, n - l) u(k, l) + η(m, n)          (8.39)

where η(m, n) is a stationary noise sequence uncorrelated with u(m, n) and which
has the power spectral density S_ηη. Then

    S_vv(ω1, ω2) = |H(ω1, ω2)|² S_uu(ω1, ω2) + S_ηη(ω1, ω2)
                                                                            (8.40)
    S_uv(ω1, ω2) = H*(ω1, ω2) S_uu(ω1, ω2)

This gives

    G(ω1, ω2) = H*(ω1, ω2) S_uu(ω1, ω2) / [|H(ω1, ω2)|² S_uu(ω1, ω2) + S_ηη(ω1, ω2)]    (8.41)
278    Image Filtering and Restoration    Chap. 8

which is also called the Fourier–Wiener filter. This filter is completely determined
by the power spectra of the object and the noise and the frequency response of the
imaging system. The mean square error can also be written as

    σ_e² = (1/4π²) ∫∫_{-π}^{π} S_e(ω1, ω2) dω1 dω2                          (8.42a)

where S_e, the power spectral density of the error, is

    S_e(ω1, ω2) ≜ |1 - GH|² S_uu + |G|² S_ηη                                (8.42b)

With G given by (8.41), this gives

    S_e = S_uu S_ηη / (|H|² S_uu + S_ηη)                                    (8.42c)

Remarks

In general, the Wiener filter is not separable even if the PSF and the various
covariance functions are. This means that two-dimensional Wiener filtering is not
equivalent to row-by-row followed by column-by-column one-dimensional Wiener
filtering.
Neither of the error fields, e(m, n) = u(m, n) - û(m, n) and e(m, n) ≜
v(m, n) - h(m, n) ⊛ û(m, n), is white even if the observation noise η(m, n) is white.
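As a concrete illustration of (8.38) and (8.41), the sketch below applies the Fourier–Wiener filter on a DFT grid, which implicitly assumes circular convolution. NumPy, the toy PSF, and the flat power spectra S_uu and S_nn are illustrative assumptions, not part of the text.

```python
import numpy as np

def wiener_deconvolve(v, h, S_uu, S_nn):
    """Fourier-Wiener filter of eq. (8.41):
    G = H* S_uu / (|H|^2 S_uu + S_nn), applied as in eq. (8.38).
    v: observed image; h: PSF (same shape as v, origin at [0, 0]);
    S_uu, S_nn: power spectra sampled on the same DFT grid (scalars allowed)."""
    H = np.fft.fft2(h)
    G = np.conj(H) * S_uu / (np.abs(H) ** 2 * S_uu + S_nn)
    return np.real(np.fft.ifft2(G * np.fft.fft2(v)))

# As the noise spectrum goes to zero the filter tends to the inverse
# filter of eq. (8.48), so a noiseless circular blur is undone almost exactly.
rng = np.random.default_rng(0)
u = rng.standard_normal((32, 32))
h = np.zeros((32, 32))
h[0, 0], h[0, 1], h[1, 0] = 0.6, 0.25, 0.15            # toy PSF with H != 0
v = np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(u)))  # circular blur
u_hat = wiener_deconvolve(v, h, S_uu=1.0, S_nn=1e-12)
```

With a realistic nonzero S_nn the same call trades deblurring against noise amplification, as the remarks above describe.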

Wiener Filter for Nonzero Mean Images.  The foregoing development
requires that the observation model of (8.39) be zero mean. There is no loss of
generality in this because if u(m, n) and η(m, n) are not zero mean, then from (8.39)

    μ_v(m, n) = h(m, n) ⊛ μ_u(m, n) + μ_η(m, n),    μ_x ≜ E[x]              (8.43)

which gives the zero mean model

    ṽ(m, n) = h(m, n) ⊛ ũ(m, n) + η̃(m, n)                                  (8.44)

where x̃ ≜ x - μ_x. In practice, μ_v can be estimated as the sample mean of the
observed image and the Wiener filter may be implemented on ṽ(m, n). On the other
hand, if μ_u and μ_η are known, then the Wiener filter becomes

    Û = G(V - M_v) + M_u

      = GV + [S_ηη / (|H|² S_uu + S_ηη)] M_u - G M_η                        (8.45)

where M(ω1, ω2) ≜ ℱ{μ(m, n)} is the Fourier transform of the mean. Note that the
above filter allows spatially varying means for u(m, n) and η(m, n). Only the covar-
iance functions are required to be spatially invariant. The spatially varying mean
may be estimated by local averaging of the image. In practice, this spatially varying
filter can be quite effective. If μ_u and μ_η are constants, then M_u and M_η are Dirac
delta functions at ω1 = ω2 = 0. This means that a constant is added to the processed
image, which does not affect its dynamic range and hence its display.
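For the common constant-mean case, the practical recipe above (estimate μ_v by the sample mean, filter the residual, restore the mean) can be sketched as follows; the attenuating gain G is an arbitrary stand-in, and no blur (H = 1) is assumed so the object mean equals the observation mean.

```python
import numpy as np

def filter_with_mean_restored(v, G):
    """Zero-mean Wiener filtering per eq. (8.44), no blur (H = 1):
    remove the sample mean, apply the frequency response G to the
    residual, then add the mean back."""
    mu_v = v.mean()                        # sample estimate of the constant mean
    u_tilde = np.real(np.fft.ifft2(G * np.fft.fft2(v - mu_v)))
    return u_tilde + mu_v

rng = np.random.default_rng(1)
v = 100.0 + rng.standard_normal((16, 16))  # constant image plus noise
G = np.full((16, 16), 0.5)                 # toy attenuating gain
u_hat = filter_with_mean_restored(v, G)
```

Because the mean bypasses the filter, the attenuation acts only on the fluctuations: the output keeps the sample mean of v while its variance is reduced.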




Phase of the Wiener Filter.  Equation (8.41) can be written as

    G = |G| e^{jθ_G}

    |G| = |H| S_uu / (|H|² S_uu + S_ηη)                                     (8.46)

    θ_G(ω1, ω2) = θ_H* = -θ_H = θ_{H^{-1}}

that is, the phase of the Wiener filter is equal to the phase of the inverse filter (in the
frequency domain). Therefore, the Wiener filter or, equivalently, the mean square
criterion of (8.28), does not compensate for phase distortions due to noise in the
observations.

Wiener Smoothing Filter.  In the absence of any blur, H = 1 and the
Wiener filter becomes

    G = S_uu / (S_uu + S_ηη) = S_nr / (S_nr + 1)                            (8.47)

where S_nr ≜ S_uu/S_ηη defines the signal-to-noise ratio at the frequencies (ω1, ω2).
This is also called the (Wiener) smoothing filter. It is a zero-phase filter that depends only

Figure 8.11  (a) Noise smoothing (H = 1): S_uu(ω1, 0) and the gain G(ω1, 0);
(b) the Wiener filter lies between the inverse filter and the smoothing filter.
on the signal-to-noise ratio S_nr. For frequencies where S_nr ≫ 1, G becomes nearly
equal to unity, which means that all these frequency components are in the
passband. When S_nr ≪ 1, G ≈ S_nr; that is, all frequency components where S_nr ≪ 1
are attenuated in proportion to their signal-to-noise ratio. For images, S_nr is usually
high at lower spatial frequencies. Therefore, the noise smoothing filter is a low-pass
filter (see Fig. 8.11a).
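The dependence of (8.47) on the signal-to-noise ratio alone is easy to tabulate; the sample SNR values below are arbitrary.

```python
import numpy as np

# Wiener smoothing gain of eq. (8.47): G = Snr / (Snr + 1).
snr = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # per-frequency SNR samples
G = snr / (snr + 1.0)
# Snr >> 1 -> G near 1 (passband); Snr << 1 -> G ~ Snr (attenuation).
```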

Relation with Inverse Filtering.  In the absence of noise, we set S_ηη = 0 and
the Wiener filter reduces to

    G |_{S_ηη = 0} = H* S_uu / (|H|² S_uu) = 1/H                            (8.48)

which is the inverse filter. On the other hand, taking the limit S_ηη → 0, we obtain

    lim_{S_ηη → 0} G = { 1/H,   H ≠ 0
                       { 0,     H = 0                                       (8.49)

which is the pseudoinverse filter. Since the blurring process is usually a low-pass
filter, the Wiener filter acts as a high-pass filter at low levels of noise.
Interpretation of Wiener Filter Frequency Response.  When both noise
and blur are present, the Wiener filter achieves a compromise between the low-pass
noise smoothing filter and the high-pass inverse filter, resulting in a band-pass filter
(see Fig. 8.11b). Figure 8.12 shows Wiener filtering results for noisy blurred images.
Observe that the deblurring effect of the Wiener filter diminishes rapidly as the
noise level increases.

Wiener Filter for Diffraction Limited Systems.  The Wiener filter for the
continuous observation model, analogous to (8.39),

    v(x, y) = ∫∫ h(x - x', y - y') u(x', y') dx' dy' + η(x, y)              (8.50)

is given by

    G(ξ1, ξ2) = H*(ξ1, ξ2) S_uu(ξ1, ξ2) / [|H(ξ1, ξ2)|² S_uu(ξ1, ξ2) + S_ηη(ξ1, ξ2)]   (8.51)

For a diffraction limited system, H(ξ1, ξ2) will be zero outside a region, say ℛ, in the
frequency plane. From (8.51), G will also be zero outside ℛ. Thus, the Wiener filter
cannot resolve beyond the diffraction limit.

Wiener Filter Digital Implementation.  The impulse response of the
Wiener filter is given by

    g(m, n) = (1/4π²) ∫∫_{-π}^{π} G(ω1, ω2) exp{j(mω1 + nω2)} dω1 dω2       (8.52)


,
Figure 8.12  Wiener filtering of noisy blurred images: (a) blurred with small noise;
(b) restored image (a); (c) blurred with increased noise; (d) restored image (c).


This integral can be computed approximately on a uniform N × N grid as

    g(m, n) = (1/N²) Σ Σ_{k,l = -N/2}^{N/2 - 1} G(k, l) W^{-(mk + nl)},    -N/2 ≤ m, n ≤ N/2 - 1   (8.53)

where G(k, l) ≜ G(2πk/N, 2πl/N) and W ≜ exp(-j2π/N). The preceding series can
be calculated via the two-dimensional FFT, from which we obtain an approximation
to g(m, n) over the N × N grid defined above. Outside this grid g(m, n) is assumed
to be zero. Sometimes the region of support of g(m, n) is much smaller than the
N × N grid. Then the convolution of g(m, n) with v(m, n) could be implemented
directly in the spatial domain. Modern image processing systems provide such a
facility in hardware, allowing high-speed implementation. Figure 8.13a shows this
algorithm.
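The grid sampling of (8.53) is exactly an inverse FFT of the sampled frequency response; a minimal sketch (NumPy assumed):

```python
import numpy as np

def impulse_response_from_G(G_fn, N):
    """Approximate g(m, n) per eq. (8.53): sample G(w1, w2) on the N x N
    DFT grid w = 2*pi*k/N (wrapped into [-pi, pi)) and inverse-FFT.
    G_fn maps (w1, w2) arrays to the complex gain."""
    w = 2 * np.pi * np.fft.fftfreq(N)            # DFT frequencies in [-pi, pi)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    return np.real(np.fft.ifft2(G_fn(w1, w2)))   # (1/N^2) sum G(k,l) W^{-(mk+nl)}

# Sanity check: an all-pass response G = 1 must give a unit impulse at (0, 0).
g = impulse_response_from_G(lambda w1, w2: np.ones_like(w1), 8)
```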




Figure 8.13  Digital implementations of the Wiener filter: (a) direct convolution
method — G(ω1, ω2) is sampled over an N × N grid, inverse DFT'ed and truncated
to give g(m, n), which is convolved with the input v(m, n) to produce û(m, n);
(b) DFT method — the input is windowed by w(m, n), zero padded, DFT'ed,
multiplied by G(k, l), and inverse DFT'ed, from which the restored image is
selected.

Alternatively, the Wiener filter can be implemented in the DFT domain as
shown in Fig. 8.13b. Now the frequency response is sampled as G(k, l) and is
multiplied by the DFT of the zero-padded, windowed observations. An appropriate
region of the inverse DFT then gives the restored image. This is simply an imple-
mentation of the convolution of g(m, n) with windowed v(m, n) via the DFT. The
windowing operation is useful when an infinite sequence is replaced by a finite
sequence in the convolution of two sequences. Two-dimensional windows are often
taken to be a separable product of one-dimensional windows,

    w(m, n) = w1(m) w2(n)                                                   (8.54)

where w1(m) and w2(n) are one-dimensional windows (see Table 8.2). For images of
size 256 × 256 or larger, the rectangular window has been found to be appropriate.
For smaller size images, other windows are more useful. If the image size is large
and the filter g(m, n) can be approximated by a small size array, then their con-
volution can be realized by filtering the image in small, overlapping blocks and
summing the outputs of the block filters [9].




TABLE 8.2  One-Dimensional Windows w(n) for 0 ≤ n ≤ M - 1

Rectangular      w(n) = 1,    0 ≤ n ≤ M - 1

Bartlett         w(n) = 2n/(M - 1),          0 ≤ n ≤ (M - 1)/2
(Triangular)     w(n) = 2 - 2n/(M - 1),      (M - 1)/2 ≤ n ≤ M - 1

Hanning          w(n) = (1/2)[1 - cos(2πn/(M - 1))],    0 ≤ n ≤ M - 1

Hamming          w(n) = 0.54 - 0.46 cos(2πn/(M - 1)),   0 ≤ n ≤ M - 1

Blackman         w(n) = 0.42 - 0.5 cos(2πn/(M - 1)) + 0.08 cos(4πn/(M - 1)),   0 ≤ n ≤ M - 1
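The entries of Table 8.2 are straightforward to generate; here is the Hamming window together with the separable 2-D construction of (8.54) (NumPy assumed):

```python
import numpy as np

def hamming(M):
    # Table 8.2: w(n) = 0.54 - 0.46 cos(2*pi*n / (M - 1)), 0 <= n <= M - 1
    n = np.arange(M)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (M - 1))

def window_2d(w1, w2):
    # Eq. (8.54): w(m, n) = w1(m) w2(n), a separable product
    return np.outer(w1, w2)

w = hamming(11)        # peaks at 1.0 in the middle, 0.08 at the ends
W = window_2d(w, w)
```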

8.4 FINITE IMPULSE RESPONSE (FIR) WIENER FILTERS

Theoretically, the Wiener filter has an infinite impulse response, which requires
working with large size DFTs. However, the effective response of these filters is
often much smaller than the object size. Therefore, optimum FIR filters could
achieve the performance of IIR filters but with lower computational complexity.

Filter Design

An FIR Wiener filter can be implemented as a convolution of the form

    û(m, n) = Σ Σ_{(i,j) ∈ W} g(i, j) v(m - i, n - j)                       (8.55)

    W = {-M ≤ i, j ≤ M}                                                     (8.56)

where g(i, j) are the optimal filter weights that minimize the mean square error
E[{u(m, n) - û(m, n)}²]. The associated orthogonality condition

    E[{u(m, n) - û(m, n)} v(m - k, n - l)] = 0,    ∀(k, l) ∈ W              (8.57)

reduces to a set of (2M + 1)² simultaneous equations

    r_uv(k, l) - Σ Σ_{(i,j) ∈ W} g(i, j) r_vv(k - i, l - j) = 0,    (k, l) ∈ W   (8.58)

Using (8.39) and assuming η(m, n) to be zero mean white noise of variance σ_η², it is
easy to deduce that

    r_vv(k, l) = r_uu(k, l) ⊛ a(k, l) + σ_η² δ(k, l)                        (8.59)

    a(k, l) ≜ h(k, l) ⊛ h(-k, -l) = Σ Σ_{i,j = -∞}^{∞} h(i, j) h(i + k, j + l)   (8.60)




    r_uv(k, l) = h(k, l) ⊛ r_uu(k, l) = Σ Σ_{i,j = -∞}^{∞} h(i, j) r_uu(i + k, j + l)   (8.61)

Defining the correlation function

    r_0(k, l) ≜ r_uu(k, l)/σ²,    σ² ≜ r_uu(0, 0)                           (8.62)

where σ² is the variance of the image, and using the fact that r_0(k, l) = r_0(-k, -l),
(8.58) reduces to

    (σ_η²/σ²) g(k, l) + r_0(k, l) ⊛ a(k, l) ⊛ g(k, l)
        = h(k, l) ⊛ r_0(k, l),    (k, l) ∈ W                                (8.63)

In matrix form this becomes a block Toeplitz system

    [(σ_η²/σ²) I + ℛ] g = r_uv                                              (8.64)

where ℛ is a (2M + 1) × (2M + 1) block Toeplitz matrix of basic dimension
(2M + 1) × (2M + 1), and r_uv and g are (2M + 1)² × 1 vectors containing the
knowns h(k, l) ⊛ r_0(k, l) and the unknowns g(k, l), respectively.
Remarks

The number of unknowns in (8.58) can be reduced if the PSF h(m, n) and the
image covariance r_uu(m, n) have some symmetry. For example, in the often-
encountered case where h(m, n) = h(|m|, |n|) and r_uu(m, n) = r_uu(|m|, |n|), we will
have g(m, n) = g(|m|, |n|), and (8.63) can be reduced to (M + 1)² simultaneous
equations.
The quantities a(k, l) and r_uv(k, l) can also be determined as the inverse
Fourier transforms of |H(ω1, ω2)|² and S_uu(ω1, ω2) H*(ω1, ω2), respectively.
When there is no blur (that is, H = 1), h(k, l) = δ(k, l), r_uv(k, l) = r_uu(k, l), and
the resulting filter is the optimum FIR smoothing filter.
The size of the FIR Wiener filter grows with the amount of blur and the
additive noise. Experimental results have shown that FIR Wiener filters of sizes up
to 15 × 15 can be quite satisfactory in a variety of cases with different levels of blur
and noise. In the special case of no blur, (8.63) becomes

    (σ_η²/σ²) g(k, l) + r_0(k, l) ⊛ g(k, l) = r_0(k, l),    (k, l) ∈ W      (8.65)

From this, it is seen that as the SNR ≜ σ²/σ_η² → ∞, the filter response g(k, l) →
δ(k, l); that is, the FIR filter support goes down to one pixel. On the other hand, if
SNR ≈ 0, g(k, l) ≈ (σ²/σ_η²) r_0(k, l). This means the filter support would effectively
be the same as the region outside of which the image correlations are negligible. For
images with r_0(k, l) = 0.95^√(k² + l²), this region happens to be of size 32 × 32.
In general, as σ_η² → 0, g(m, n) becomes the optimum FIR inverse filter. As
M → ∞, g(m, n) would converge to the pseudoinverse filter.
Sec. 8.4    Finite Impulse Response (FIR) Wiener Filters                    285


Figure 8.14  FIR Wiener filtering of blurred images: (a) blurred image with no
noise; (b) restored by optimal 9 × 9 FIR filter; (c) blurred image with small additive
noise; (d) restored by 9 × 9 optimal FIR filter.

Example 8.4

Figure 8.14 shows examples of FIR Wiener filtering of images blurred by the
Gaussian PSF

    h(m, n) ∝ exp{-α(m² + n²)},    α > 0                                    (8.66)

and modeled by the covariance function r(m, n) = σ² 0.95^√(m² + n²). If α ≈ 0, and h(m, n)
is restricted to a finite region, say -p ≤ m, n ≤ p, then the h(m, n) defined above can
be used to model aberrations in a lens with finite aperture. Figures 8.15 and 8.16 show
examples of FIR Wiener filtering of blurred images that were intentionally misfocused
by the digitizing camera.




Figure 8.15  FIR Wiener filtering of improperly digitized images: (a) digitized
image; (b) FIR filtered; (c) digitized image; (d) FIR filtered.

Spatially Varying FIR Filters

The local structure and the ease of implementation of the FIR filters make them
attractive candidates for the restoration of nonstationary images blurred by spatially
varying PSFs. A simple but effective nonstationary model for images is one that has
spatially varying mean and variance functions and shift-invariant correlations, that
is,

    E[u(m, n)] = μ(m, n)

    E[{u(m, n) - μ(m, n)}{u(m - k, n - l) - μ(m - k, n - l)}]               (8.67)
        = σ²(m, n) r_0(k, l)





Figure 8.16  FIR Wiener filtering of improperly digitized images: (a) digitized
image; (b) FIR filtered; (c) digitized image; (d) FIR filtered.


where ro (0, 0) = 1. If the effective width of the PSF is less than or equal to W, and h,
IL, and 0- 2 are slowly varying, then the spatially "'arying estimate can be shown to be
given by (see Problem 8.10) •

..
'.
U(m,n)=
. (i,ll
2:2:€ wgm,ll .(i, j)v(m,"': i.n - j) (8.68)
-
gm." (i, j) ~ gm."
.
(i, j) + (2/1/ I)Zll - 2:2: s-: (k, I ) '
't" (k,f) E W
(8.69)
2 2 2
where gm," (i,}') is the solution of (8.64) with 0- = ir (m, n). The quantity 0- tm, n)
can be estimated from the observation model. The second term in (8.69) adds a

. ' Image Filtering <lnd Restoration . Chap. 8


. ,- . '" -~ ,

,
. constant with respect to (i, j) to the FIR filter which is equivalent to adding to the
estimate a quantity proportional to the local average of the observations .
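For the no-blur case, the local-statistics idea can be sketched directly: estimate μ(m, n) and the observation variance in a sliding window, take σ²(m, n) as the observation variance minus the noise variance (as in Example 8.5 below), and shrink each pixel toward its local mean. This is a simplified stand-in with a fixed window, not the Optmask filter of [8].

```python
import numpy as np

def local_stats_smooth(v, sigma_n2, win=3):
    """Spatially varying smoothing from local statistics (cf. eq. (8.67)):
    out = mu + [a2 / (a2 + sigma_n2)] * (v - mu), with mu and a2 the
    local mean and estimated signal variance in a win x win window."""
    pad = win // 2
    vp = np.pad(v, pad, mode="edge")
    out = np.empty_like(v)
    for m in range(v.shape[0]):
        for n in range(v.shape[1]):
            block = vp[m:m + win, n:n + win]
            mu = block.mean()
            a2 = max(block.var() - sigma_n2, 0.0)   # signal variance estimate
            out[m, n] = mu + a2 / (a2 + sigma_n2) * (v[m, n] - mu)
    return out

rng = np.random.default_rng(2)
noisy_flat = np.full((12, 12), 50.0) + rng.standard_normal((12, 12))
smoothed = local_stats_smooth(noisy_flat, sigma_n2=1.0)
```

In flat regions the estimated signal variance is small, the gain approaches zero, and the filter averages heavily; near edges the gain approaches one and detail is preserved.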

Example 8.5

Consider the case of noisy images with no blur. Figures 8.17 and 8.18 show noisy
images with SNR = 10 dB and their Wiener filter estimates. A spatially varying FIR
Wiener filter called Optmask was designed using σ²(m, n) = σ_v²(m, n) - σ_η², where
σ_v²(m, n) was estimated as the local variance of the pixels in the neighborhood of
(m, n). The size of the Optmask window W was changed from region to region such that
the output SNR was nearly constant [8]. This criterion gives a uniform visual quality to
the filtered image, resulting in a more favorable appearance. The images obtained by
Figure 8.17  Noisy image (SNR = 10 dB) and restored images: (a) noisy
(SNR = 10 dB); (b) Wiener; (c) Casar; (d) Optmask.



Figure 8.18  Noisy image (SNR = 10 dB) and restored images: (a) noisy
(SNR = 10 dB); (b) Wiener; (c) Casar; (d) Optmask.

the Casar filter are based on a spatially varying AR model, which is discussed in
Section 8.12. Comparison of the spatially adaptive FIR filter output with the con-
ventional Wiener filter shows a significant improvement. The comparison appears
much more dramatic on a good quality CRT display unit compared to the pictures
printed in this text.

8.5 OTHER FOURIER DOMAIN FILTERS

The Wiener filter of (8.41) motivates the use of other Fourier domain filters as
discussed below.




Geometric Mean Filter

This filter is the geometric mean of the pseudoinverse and the Wiener filters, that is,

    G_s = (H⁻)^s [H* S_uu / (|H|² S_uu + S_ηη)]^{1-s},    0 ≤ s ≤ 1         (8.70)

For s = 1/2, this becomes

    G_{1/2} = |H H⁻|^{1/2} [S_uu / (|H|² S_uu + S_ηη)]^{1/2} exp{-jθ_H}     (8.71)

where θ_H(ω1, ω2) is the phase of H(ω1, ω2). Unlike the Wiener filter, the power
spectral density of the output of this filter is identical to that of the object when
H ≠ 0. As S_ηη → 0, this filter also becomes the pseudoinverse filter (see Problem
8.11).
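A direct rendering of (8.70), with its limiting cases checked; real-valued sample gains are used so that the fractional powers stay unambiguous.

```python
import numpy as np

def geometric_mean_filter(H, S_uu, S_nn, s):
    """Eq. (8.70): G_s = (1/H)^s * [H* S_uu / (|H|^2 S_uu + S_nn)]^(1-s).
    H is assumed real and nonzero here, so the fractional powers need
    no branch-cut care."""
    inverse = 1.0 / H
    wiener = np.conj(H) * S_uu / (np.abs(H) ** 2 * S_uu + S_nn)
    return inverse ** s * wiener ** (1.0 - s)

H = np.array([0.9, 0.5, 0.25])                         # sample gains
G_wiener = geometric_mean_filter(H, 1.0, 0.1, s=0.0)   # s = 0: Wiener filter
G_inverse = geometric_mean_filter(H, 1.0, 0.1, s=1.0)  # s = 1: inverse filter
```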

Nonlinear Filters

These filters are of the form

    Û(ω1, ω2) = G(|V(ω1, ω2)|, ω1, ω2) V(ω1, ω2)                            (8.72)

where G(x, ω1, ω2) is a nonlinear function of x. Digital implementation of such
nonlinear filters is quite easy. A simple nonlinear filter, called the root filter, is
defined as

    Û(ω1, ω2) = |V(ω1, ω2)|^α exp{jθ_V(ω1, ω2)}                             (8.73)

Depending on the value of α this could be a low-pass, a high-pass, or a band-pass
filter. For values of α ≤ 1, it acts as a high-pass filter (Fig. 8.19) for typical images

Figure 8.19  Root filtering of FLIR image of Fig. 8.7c: (a) α = 0.8; (b) α = 0.6.

Sec. 8.5    Other Fourier Domain Filters                                    291
(having small energy at high spatial frequencies) because the samples of low
amplitudes of V(ω1, ω2) are enhanced relative to the high-amplitude samples. For
α > 1, the large-magnitude samples are amplified relative to the low-magnitude
ones, giving a low-pass-filter effect for typical images. Another useful nonlinear
filter is the complex logarithm of the observations

    Û(ω1, ω2) = { (log|V|) exp{jθ_V},   |V| ≥ ε
                { 0,                    otherwise                           (8.74)

where ε is some preselected small positive number. The sequence û(m, n) is also
called the cepstrum of v(m, n). (Also see Section 7.5.)
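The root filter of (8.73) is a few lines in practice; the test image and α values below are arbitrary.

```python
import numpy as np

def root_filter(v, alpha):
    """Eq. (8.73): keep the Fourier phase of the observations and raise
    the magnitude to the power alpha."""
    V = np.fft.fft2(v)
    return np.real(np.fft.ifft2(np.abs(V) ** alpha * np.exp(1j * np.angle(V))))

rng = np.random.default_rng(3)
img = 2.0 + rng.random((16, 16))           # strong DC, weak high frequencies
same = root_filter(img, alpha=1.0)         # alpha = 1 is the identity

# For alpha < 1 the dominant (low-frequency) samples lose relative weight --
# the high-pass effect described above.
V = np.fft.fft2(img)
Vr = np.fft.fft2(root_filter(img, alpha=0.5))
dc_share = np.abs(V[0, 0]) / np.abs(V).sum()
dc_share_root = np.abs(Vr[0, 0]) / np.abs(Vr).sum()
```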

8.6 FILTERING USING IMAGE TRANSFORMS
Wiener Filtering

Consider the zero mean image observation model

    v(m, n) = Σ_i Σ_j h(m, n; i, j) u(i, j) + η(m, n),                      (8.75)
        0 ≤ m ≤ N1 - 1,    0 ≤ n ≤ N2 - 1

Assuming the object u(m, n) is of size M1 × M2, the preceding equation can be
written as (see Chapter 2)

    v = ℋu + η                                                              (8.76)

where v and η are N1 N2 × 1, u is M1 M2 × 1, and ℋ is a block matrix. The best
linear estimate

    û = 𝒢v                                                                  (8.77)

that minimizes the average mean square error

    σ_e² = [1/(M1 M2)] E[(u - û)ᵀ(u - û)]                                   (8.78)

is obtained by the orthogonality relation

    E[(u - û)vᵀ] = 0                                                        (8.79)

This gives the Wiener filter as an M1 M2 × N1 N2 matrix

    𝒢 = E[uvᵀ]{E[vvᵀ]}⁻¹ = ℛℋᵀ[ℋℛℋᵀ + ℛ_η]⁻¹                                (8.80)

where ℛ and ℛ_η are the covariance matrices of u and η, respectively, which are
assumed to be uncorrelated, that is,

    ℛ ≜ E[uuᵀ],    ℛ_η ≜ E[ηηᵀ],    E[uηᵀ] = 0                              (8.81)

The resulting error is found to be

    σ_e² = [1/(M1 M2)] Tr[ℛ - 𝒢ℋℛ]                                          (8.82)




Assuming ℛ and ℛ_η to be positive definite, (8.80) can be written via the ABCD
lemma for inversion of a matrix A - BCD (see Table 2.6) as

    𝒢 = [ℋᵀℛ_η⁻¹ℋ + ℛ⁻¹]⁻¹ℋᵀℛ_η⁻¹                                           (8.83)

For the stationary observation model where the PSF is spatially invariant, the object
is a stationary random field, and the noise is white with zero mean and variance σ_η²,
the Wiener filter becomes

    𝒢 = ℛℋᵀ[ℋℛℋᵀ + σ_η² I]⁻¹ = [ℋᵀℋ + σ_η² ℛ⁻¹]⁻¹ℋᵀ                          (8.84)

where ℛ and ℋ are doubly block Toeplitz matrices.
In the case of no blur, ℋ = I, and 𝒢 becomes the optimum noise smoothing
filter

    𝒢 = ℛ[ℛ + σ_η² I]⁻¹                                                      (8.85)

Remarks

If the noise power goes to zero, then the Wiener filter becomes the pseudoinverse
of ℋ (see Section 8.9), that is,

    𝒢⁻ ≜ lim_{σ_η² → 0} 𝒢
       = { [ℋᵀℋ]⁻¹ℋᵀ,      if N1 N2 > M1 M2 and rank[ℋᵀℋ] = M1 M2
         { ℛℋᵀ[ℋℛℋᵀ]⁻¹,    if N1 N2 ≤ M1 M2 and rank[ℋℛℋᵀ] = N1 N2           (8.86)

The Wiener filter is not separable (that is, 𝒢 ≠ G1 ⊗ G2) even if the PSF and
the object and noise covariance functions are separable. However, the pseudo-
inverse filter 𝒢⁻ is separable if ℛ and ℋ are.
The Wiener filter for nonzero mean random variables u and η becomes

    û = μ_u + 𝒢(v - μ_v) = 𝒢v + [ℋᵀℛ_η⁻¹ℋ + ℛ⁻¹]⁻¹ℛ⁻¹μ_u - 𝒢μ_η             (8.87)

This form of the filter is useful only when the mean of the object and/or the noise
are spatially varying.
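The two matrix forms in (8.84) are algebraically identical (a push-through/ABCD-lemma identity); a quick numerical check on small random matrices (NumPy assumed, sizes arbitrary):

```python
import numpy as np

# Check  R H^T (H R H^T + s2 I)^-1  ==  (H^T H + s2 R^-1)^-1 H^T,  eq. (8.84).
rng = np.random.default_rng(4)
n = 6
H = rng.standard_normal((n, n))            # toy PSF matrix
A = rng.standard_normal((n, n))
R = A @ A.T + n * np.eye(n)                # a positive definite covariance
s2 = 0.5                                   # white-noise variance

G_first = R @ H.T @ np.linalg.inv(H @ R @ H.T + s2 * np.eye(n))
G_second = np.linalg.inv(H.T @ H + s2 * np.linalg.inv(R)) @ H.T
```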

Generalized Wiener Filtering

The size of the filter matrix 𝒢 becomes quite large even when the arrays u(m, n)
and v(m, n) are small. For example, for 16 × 16 arrays, 𝒢 is 256 × 256. Moreover,
to calculate 𝒢, a matrix of size M1 M2 × M1 M2 or N1 N2 × N1 N2 has to be inverted.
Generalized Wiener filtering gives an efficient method of approximately imple-
menting the Wiener filter using fast unitary transforms. For instance, using (8.84) in
(8.77) and defining

    𝒟 ≜ [ℋᵀℋ + σ_η² ℛ⁻¹]⁻¹                                                  (8.88)

we can write

    û = 𝒜*ᵀ[𝒜𝒟𝒜*ᵀ][𝒜ℋᵀv] ≜ 𝒜*ᵀ𝒟̄w                                          (8.89)

Sec. 8.6    Filtering Using Image Transforms                                293



where 𝒜 is a unitary matrix, 𝒟̄ ≜ 𝒜𝒟𝒜*ᵀ, and w ≜ 𝒜[ℋᵀv]. This equation sug-
gests the implementation of Fig. 8.20 and is called generalized Wiener filtering. For
stationary observation models, ℋᵀv is a convolution operation and 𝒟̄ turns out to
be nearly diagonal for many unitary transforms. Therefore, approximating

    ŵ = 𝒟̄w ≈ [Diag 𝒟̄]w                                                     (8.90)

and mapping w and ŵ back into N1 × N2 arrays, we get

    ŵ(k, l) = p(k, l) w(k, l)                                               (8.91)

where p(k, l) come from the diagonal elements of 𝒟̄. Figure 8.20 shows the imple-
mentation of generalized Wiener filtering when 𝒟̄ is approximated by its diagonal.
Now, if 𝒜 is a fast transform requiring O(N1 N2 log N1 N2) operations for trans-
forming an N1 × N2 array, the total number of operations is reduced from
O(M1³ M2³) to O(M1 M2 log M1 M2). For 16 × 16 arrays, this means a reduction from
O(2²⁴) to O(2¹¹). Among the various fast transforms, the cosine and sine transforms
have been found useful for generalized Wiener filtering of images [12, 13].
The generalized Wiener filter implementation of Fig. 8.20 is exact when 𝒜
diagonalizes 𝒟. However, this implementation is impractical because this diagonal-
izing transform is generally not a fast transform. For a fast transform 𝒜, the
diagonal elements of 𝒟̄ may be approximated as follows:

    Diag{𝒟̄} = Diag{𝒜[ℋᵀℋ + σ_η² ℛ⁻¹]⁻¹𝒜*ᵀ}
             ≈ [Diag{𝒜ℋᵀℋ𝒜*ᵀ} + σ_η²(Diag{𝒜ℛ𝒜*ᵀ})⁻¹]⁻¹

where we have approximated Diag{𝒜ℛ⁻¹𝒜*ᵀ} by [Diag{𝒜ℛ𝒜*ᵀ}]⁻¹. This gives

    p(k, l) = [h̄₂(k, l) + σ_η²/γ(k, l)]⁻¹                                   (8.92)

where h̄₂(k, l) and γ(k, l) are the elements on the diagonal of 𝒜ℋᵀℋ𝒜*ᵀ and
𝒜ℛ𝒜*ᵀ, respectively. Note that γ(k, l) are the mean square values of the trans-
form coefficients; that is, if ū ≜ 𝒜u, then γ(k, l) = E[|ū(k, l)|²].
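When the covariance is circulant and 𝒜 is the unitary DFT, 𝒟̄ is exactly diagonal, so the scalar gains of (8.91) implement the filter with no approximation; for the cosine or sine transform the same recipe is the approximation of (8.92). A one-dimensional, no-blur sketch (NumPy assumed, model parameters arbitrary):

```python
import numpy as np

# Transform-domain Wiener smoothing, eqs. (8.88)-(8.92), 1-D and no blur.
n, sigma2 = 32, 0.5
r = 0.9 ** np.minimum(np.arange(n), n - np.arange(n))  # circulant correlation
gamma = np.real(np.fft.fft(r))           # transform-domain variances gamma(k)
p = gamma / (gamma + sigma2)             # diagonal gains p(k), cf. eq. (8.92)

rng = np.random.default_rng(5)
v = rng.standard_normal(n)
u_hat = np.real(np.fft.ifft(p * np.fft.fft(v)))  # eq. (8.91) in the DFT domain

# Reference: the full matrix smoother (R + sigma2 I)^-1 R v of eq. (8.85)
R = np.array([[r[(i - j) % n] for j in range(n)] for i in range(n)])
u_ref = np.linalg.solve(R + sigma2 * np.eye(n), R @ v)
```

The transform-domain result matches the full matrix filter exactly here because the DFT diagonalizes every circulant matrix.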
Figure 8.20  Generalized Wiener filtering: (a) Wiener filter in the 𝒜-transform
domain; (b) implementation by approximation of 𝒟̄ by its diagonal — the image
v(m, n) is convolved with h(-m, -n), transformed, multiplied by p(k, l), and
inverse transformed to give û(m, n).



,
Filtering by F~.lst Decompositions [13]
For narrow PSFs and stationary observation models, it is often possible to
decompose the filter gain matrix 1/ as .
1/ = I/o + l7 b,
o 0
I/o A !lJog{;r, a, ~ !lJ.g{;T (8.93)

where!lJo can be diagonalized by a fast transform and !lJb is sparse and of low rank.
Hence, the filter equation can be written as
(;, = I/o- = I/orr + I/pu ~;;,o + u! (8.94)
When J2Jo ~ ~-t!lJo.A:*T is diagonal for some fast unitary transform .A:, the com-
ponent ;;~o can be written as
A 0 _
lb - -en ..7L"
y/o
,

rrrT 0- -_ .../ (/ 0'


.,*T[t./(/~oV"t'
nl ~'.T1jv"" arT 0- ] -- (...ott-
/['l7t/
-
J*T """
~o to (8.95)
and can be implemented as an exact generalized Wiener filter. The other compo-
nent, (;,', generally requires performing operations on only a few boundary values of
g{;T o-, The number of operations required for calculating (;,b dependson the width
of thePSF and the image correlation model. Usually these require fewer operations
• a
than in the direct computation of from 1/. Therefore, the overall filter can be
implemented quite efficiently. If the image size is large compared to the width of the
I'SF, the boundary response can be ignored. Then, 11 is approximately Go, which
can be implemented via its diagonalizing fast transform [13J. ' .
~\
, I
8.7 SMOOTHING SPLINES AND INTERPOLATION [15-17]

Smoothing splines are curves used to estimate a continuous function from its sample
values available on a grid. In image processing, spline functions are useful for
magnification and noise smoothing. Typically, pixels in each horizontal scan line are
first fit by smoothing splines and then finely sampled to achieve a desired magnifica-
tion in the horizontal direction. The same procedure is then repeated along the
vertical direction. Thus the image is smoothed and interpolated by a separable
function.
Let {y_i, 0 ≤ i ≤ N} be a given set of (N + 1) observations of a continuous
function f(x) sampled uniformly (this is only a convenient assumption) such that

    x_{i+1} = x_i + h,    h > 0

and

    y_i = f(x_i) + n(x_i)                                                   (8.96)

where n(x) represents the errors (or noise) in the observation process. Smoothing
splines fit a smooth function g(x) through the available set of observations such that
its "roughness," measured by the energy in the second derivatives (that is,
∫[d²g(x)/dx²]² dx) over [x_0, x_N], is minimized. Simultaneously, the least squares
error at the observation points is restricted; that is, with g_i ≜ g(x_i),

    F ≜ Σ_{i=0}^{N} [(g_i - y_i)/σ_n]² ≤ S                                  (8.97)

Sec. 8.7    Smoothing Splines and Interpolation [15-17]                     295

For S = 0, this means an absolute fit through the observation points is required.
Typically, σ_n² is the mean square value of the noise, and S is chosen to lie in the range
(N + 1) ± √(2(N + 1)), which is also called the confidence interval of S. The minimi-
zation problem has two solutions:

1. When S is sufficiently large, the constraint of (8.97) is satisfied by a straight-
line fit

    g(x) = a + bx,    x_0 ≤ x ≤ x_N
                                                                            (8.98)
    b = (μ_xy - μ_x μ_y)/(μ_xx - μ_x²),    a = μ_y - b μ_x

where μ denotes sample average, for instance, μ_x ≜ (Σ_{i=0}^{N} x_i)/(N + 1), and so on.

2. The constraint in (8.97) is tight, so only the equality constraint can be satis-
fied. The solution becomes a set of piecewise continuous third-order
polynomials called cubic splines, given by

    g(x) = a_i + b_i(x - x_i) + c_i(x - x_i)² + d_i(x - x_i)³,    x_i ≤ x < x_{i+1}   (8.99)

The coefficients of these spline functions are obtained by solving

    [P + λQ]c = λv,    v ≜ Lᵀy,    c_0 = c_N = 0

    a = y - (σ_n²/λ) Lc                                                     (8.100)

    b_i = (a_{i+1} - a_i)/h - c_i h - d_i h²,    d_i = (c_{i+1} - c_i)/(3h),    0 ≤ i ≤ N - 1

where a and y are (N + 1) × 1 vectors of the elements {a_i, 0 ≤ i ≤ N}, {y_i, 0 ≤ i ≤ N},
respectively, and c is the (N - 1) × 1 vector of elements {c_i, 1 ≤ i ≤ N - 1}. The
matrices Q and L are, respectively, (N - 1) × (N - 1) tridiagonal Toeplitz and
(N + 1) × (N - 1) lower triangular Toeplitz, namely,

    Q ≜ (h/3) tridiag(1, 4, 1),    L ≜ (1/h) [banded, with columns (1, -2, 1)ᵀ]   (8.101)

and P ≜ σ_n² LᵀL. The parameter λ is such that the equality constraint

    F(λ) ≜ Σ_{i=0}^{N} [(a_i - y_i)/σ_n]² = vᵀAPAv = S                      (8.102)



where A ≜ [P + λQ]⁻¹, is satisfied. The nonlinear equation F(λ) = S is solved
iteratively — for instance, by Newton's method — as

    λ_{k+1} = λ_k - [F(λ_k) - S]/F'(λ_k),    F'(λ) = dF/dλ = -2vᵀAQAPAv     (8.103)

Remarks

The solution of (8.99) gives g_i ≜ g(x_i) = a_i; that is, a is the best least squares estimate
of y. It can be shown that a can also be considered as the Wiener filter estimate of
y based on appropriate autocorrelation models (see Problem 8.15).
The special case S = 0 gives what are called the interpolating splines, where the
splines must pass through the given data points.

Example 8.6

Suppose the given data is 1, 3, 4, 2, 1 and let h = 1 and σ_n = 1. Then N = 4, x_0 = 0,
x_N = 4. For the case of straight-line fit, we get μ_x = 2, μ_y = 2.2, μ_xy = 4.2, and μ_xx = 6,
which gives b = -0.1, a = 2.4, and g(x) = 2.4 - 0.1x. The least squares error
Σ_i (y_i - g_i)² = 6.7. The confidence interval for S is [1.84, 8.16]. However, if S is chosen
to be less than 6.7, say S = 5, then we have to go to the cubic splines. Solution of
F(λ) - S = 0 now gives λ = 0.0274 for S = 5. From (8.100) we get

    a = (2.199, 2.397, 2.415, 2.180, 1.808)ᵀ
    b = (0.198, 0.057, -0.194, -0.357, 0.000)ᵀ
    c = (0.000, -0.033, -0.049, -0.022, 0.000)ᵀ
    d = (0.000, -0.005, 0.009, 0.007, 0.000)ᵀ

The least squares error checks out to be 4.998 ≈ 5.
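The straight-line branch of Example 8.6 can be verified directly from (8.98):

```python
import numpy as np

# Straight-line fit of eq. (8.98), checked against Example 8.6.
y = np.array([1.0, 3.0, 4.0, 2.0, 1.0])
x = np.arange(5.0)
mu_x, mu_y = x.mean(), y.mean()
mu_xy, mu_xx = (x * y).mean(), (x * x).mean()
b = (mu_xy - mu_x * mu_y) / (mu_xx - mu_x ** 2)
a = mu_y - b * mu_x
err = np.sum((y - (a + b * x)) ** 2)    # least squares error of the line fit
```

This reproduces b = -0.1, a = 2.4, and the error 6.7 quoted above; since 6.7 exceeds the chosen S = 5, the cubic-spline branch is required.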

8.8 LEAST SQUARES FILTERS [18, 19]

In the previous section we saw that the smoothing splines solve a least squares
minimization problem. Many problems in linear estimation theory can also be
reduced to least squares minimization problems. In this section we consider such
problems in the context of image restoration.

Constrained Least Squares Restoration

Consider the spatially invariant image observation model of (8.39). The constrained
least squares restoration filter output û(m, n), which is an estimate of u(m, n),
minimizes a quantity

    J ≜ ‖q(m, n) ⊛ û(m, n)‖²                                                (8.104)

Sec. 8.8    Least Squares Filters                                           297


subject to the constraint

    ||v(m, n) − h(m, n) ⊛ û(m, n)||² ≤ ε²        (8.105)

where ε ≥ 0. Here the norm is defined via the Parseval relation

    ||a(m, n)||² ≜ Σ_m Σ_n |a(m, n)|² = (1/4π²) ∫∫_{−π}^{π} |A(ω₁, ω₂)|² dω₁ dω₂ ≜ ||A(ω₁, ω₂)||²        (8.106)

and q(m, n) is an operator that measures the "roughness" of û(m, n). For example, if q(m, n) is the impulse response of a high-pass filter, then minimization of J implies smoothing of high frequencies or rough edges. Using the Parseval relation, this implies minimization of

    J = ||Q(ω₁, ω₂)Û(ω₁, ω₂)||²        (8.107)

subject to

    ||V(ω₁, ω₂) − H(ω₁, ω₂)Û(ω₁, ω₂)||² ≤ ε²        (8.108)

The solution obtained via the Lagrange multiplier method gives

    Û(ω₁, ω₂) = G_ls(ω₁, ω₂)V(ω₁, ω₂)        (8.109)

    G_ls ≜ H*(ω₁, ω₂) / [|H(ω₁, ω₂)|² + γ|Q(ω₁, ω₂)|²]        (8.110)

The Lagrange multiplier γ is determined such that Û satisfies the equality in (8.108) subject to (8.109) and (8.110). This yields a nonlinear equation for γ:

    f(γ) = (1/4π²) ∫∫_{−π}^{π} [γ²|V|²|Q|⁴ / (|H|² + γ|Q|²)²] dω₁ dω₂ − ε² = 0        (8.111)

In least squares filtering, q(m, n) is commonly chosen as a finite difference approximation of the Laplacian operator ∂²/∂x² + ∂²/∂y². For example, on a square grid with spacing Δx = Δy = 1 and α = 1/4 below, one obtains

    q(m, n) ≜ −δ(m, n) + α[δ(m − 1, n) + δ(m + 1, n) + δ(m, n − 1) + δ(m, n + 1)]        (8.112)
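As a concrete sketch, the filter (8.110) with the Laplacian roughness operator (8.112) can be implemented in the DFT domain in a few lines of NumPy. The image size, blur PSF, and value of γ below are illustrative assumptions, not values from the text; γ would in practice be tuned so the residual meets (8.105).

```python
import numpy as np

def cls_filter(v, h_psf, gamma):
    """Constrained least squares restoration (8.110) via the 2-D DFT,
    with q(m, n) the discrete Laplacian of (8.112), alpha = 1/4."""
    shape = v.shape
    H = np.fft.fft2(h_psf, s=shape)            # blur response H(w1, w2)
    q = np.zeros(shape)
    q[0, 0] = -1.0                             # -delta(m, n)
    q[1, 0] = q[-1, 0] = q[0, 1] = q[0, -1] = 0.25
    Q = np.fft.fft2(q)                         # roughness response Q(w1, w2)
    G = np.conj(H) / (np.abs(H) ** 2 + gamma * np.abs(Q) ** 2)   # (8.110)
    return np.real(np.fft.ifft2(G * np.fft.fft2(v)))
```

For γ → 0 this tends to the inverse filter 1/H wherever H ≠ 0; the γ|Q|² term regularizes the frequencies where H is small.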

Remarks

For the vector image model problem

    v = ℋu + η        (8.113)

the constrained least squares problem is to minimize

    ||𝒬û||²        (8.114)

where 𝒬 is a given matrix, subject to ||v − ℋû||² = ε². The least squares filter is obtained by solving

    û = (ℋᵀℋ + γ𝒬ᵀ𝒬)⁻¹ℋᵀv
    γ²||[ℋ(𝒬ᵀ𝒬)⁻¹ℋᵀ + γI]⁻¹v||² − ε² = 0        (8.115)

298 Image FilterinQ and Restoration ,Chap. 8


In the absence of blur, one obtains the least squares smoothing filter

    û = (γ𝒬ᵀ𝒬 + I)⁻¹v        (8.116)

This becomes the two-dimensional version of the smoothing spline solution if 𝒬 is obtained from the discrete Laplacian operator of (8.112). A recursive solution of this problem is considered in [32].
Comparison of (8.110) and (8.116) with (8.41) and (8.84), respectively, shows that the least squares filter is in the class of Wiener filters. For example, if we specify S_ηη = γ and S_uu = 1/|Q|², then the two filters are identical. In fact, specifying q(m, n) is equivalent to modeling the object by a random field whose power spectral density function is 1/|Q|² (see Problem 8.17).

8.9 GENERALIZED INVERSE, SVD, AND ITERATIVE METHODS

The foregoing least squares and mean square restoration filters can also be realized by direct minimization of their quadratic cost functionals. Such direct minimization techniques are most useful when little is known about the statistical properties of the observed image data and when the PSF is spatially variant. Consider the image observation model

    v = Hu        (8.117)

where v and u are vectors of appropriate dimensions and H is a rectangular (say M × N) PSF matrix. The unconstrained least squares estimate û minimizes the norm

    J = ||v − Hu||²        (8.118)

The Pseudoinverse

A solution vector û that minimizes (8.118) must satisfy

    HᵀHû = Hᵀv        (8.119)

If HᵀH is nonsingular, this gives

    û = (HᵀH)⁻¹Hᵀv ≜ H⁻v        (8.120)

as the unique least squares solution. H⁻ is called the pseudoinverse of H. This pseudoinverse has the interesting property

    H⁻H = I        (8.121)

Note, however, that HH⁻ ≠ I. If H is M × N, then H⁻ is N × M. A necessary condition for HᵀH to be nonsingular is that M ≥ N and the rank of H should be N. If M < N and the rank of H is M, then the pseudoinverse of H is defined as a matrix H⁻, which satisfies

    HH⁻ = I        (8.122)

. Sec. 8.9 Generalized Inverse, SVD, and Iterative Methods


In this case, H⁻ is not unique. One solution is

    H⁻ = Hᵀ(HHᵀ)⁻¹        (8.123)

In general, whenever the rank of H is r < N, (8.119) does not have a unique solution. Additional constraints on u are then necessary to make it unique.

Minimum Norm Least Squares (MNLS) Solution and the Generalized Inverse

A vector u⁺ that has the minimum norm ||u||² among all the solutions of (8.119) is called the MNLS (minimum norm least squares) solution. Thus

    u⁺ = min_û {||û||² ; HᵀHû = Hᵀv}        (8.124)

Clearly, if rank[HᵀH] = N, then u⁺ = û is the least squares solution. Using the singular value expansion of H, it can be shown that the transformation between v and u⁺ is linear and unique and is given by

    u⁺ = H⁺v        (8.125)
The matrix H⁺ is called the generalized inverse of H. If the M × N matrix H has the SVD expansion [see (5.188)]

    H = Σ_{m=1}^{r} λ_m^{1/2} ψ_m φ_mᵀ        (8.126a)

then H⁺ is an N × M matrix with the SVD expansion

    H⁺ = Σ_{m=1}^{r} λ_m^{−1/2} φ_m ψ_mᵀ        (8.126b)

where φ_m and ψ_m are, respectively, the eigenvectors of HᵀH and HHᵀ corresponding to the singular values {λ_m, 1 ≤ m ≤ r}. Using (8.126b), it can be shown that H⁺ satisfies the following relations:

1. H⁺ = (HᵀH)⁻¹Hᵀ, if r = N
2. H⁺ = Hᵀ(HHᵀ)⁻¹, if r = M
3. HH⁺ = (HH⁺)ᵀ
4. H⁺H = (H⁺H)ᵀ
5. HH⁺H = H
6. H⁺HHᵀ = Hᵀ

The first two relations show that H⁺ is a pseudoinverse if r = N or M. The MNLS solution is

    u⁺ = H⁺v = Σ_{m=1}^{r} λ_m^{−1/2} φ_m (ψ_mᵀ v)        (8.127)

This method is quite general and is applicable to arbitrary PSFs. The major difficulty is computational because it requires calculation of ψ_k and φ_k for large matrices. For example, for an M × N = 256 × 256 image, H is 65,536 × 65,536. For images with separable PSF, that is, V = H₁UH₂ᵀ, the generalized inverse is also separable, giving U⁺ = H₁⁺V(H₂⁺)ᵀ.
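The SVD construction (8.126) and relations 3 to 6 are easy to verify numerically. The rank-deficient matrix H and the vector v below are arbitrary illustrative choices, not data from the text.

```python
import numpy as np

# Generalized inverse via the SVD (8.126) and the MNLS solution (8.127).
H = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],       # row 2 = 2 * row 1, so rank H = 2 < 3
              [0.0, 1.0, 1.0]])
v = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(H)          # H = U diag(s) Vt
r = int(np.sum(s > 1e-10))           # numerical rank r
# (8.126b): H+ built from the r nonzero singular values s_m = lambda_m^{1/2}
H_plus = sum((1.0 / s[m]) * np.outer(Vt[m], U[:, m]) for m in range(r))

u_plus = H_plus @ v                  # MNLS solution (8.127)
```

Relations 3 to 6 hold for this H_plus, and u_plus agrees with `numpy.linalg.pinv(H) @ v`.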

One-Step Gradient Methods

When the ultimate aim is to obtain the restored solution u⁺, rather than the explicit pseudoinverse H⁺, iterative gradient methods are useful. One-step gradient algorithms are of the form

    u_{n+1} = u_n − α_n g_n,   u₀ = 0        (8.128)

    g_n ≜ −Hᵀ(v − Hu_n) = g_{n−1} − α_{n−1} HᵀH g_{n−1}        (8.129)

where u_n and g_n are, respectively, the trial solution and the gradient of J at iteration step n and α_n is a scalar quantity. For 0 < α_n < 2/λ_max(HᵀH), u_n converges to the MNLS solution u⁺ as n → ∞. If α_n is chosen to be a constant α, then its optimum value for fastest convergence is given by [22]

    α_opt = 2 / [λ_max(HᵀH) + λ_min(HᵀH)]        (8.130)

For a highly ill-conditioned matrix HᵀH, the condition number λ_max/λ_min is large. Then α_opt is close to its upper bound 2/λ_max, and the error e_n ≜ u_n − u⁺ at iteration n, for α_n = α, obeys

    ||e_n|| ≤ (1 − αλ_min)||e_{n−1}||        (8.131)

This implies that ||e_n|| is proportional to ||e_{n−1}||, that is, the convergence is linear and can be very slow, because α is bounded from above. To improve the speed of convergence, α is optimized at each iteration, which yields the steepest descent algorithm, with

    α_n = (g_nᵀ g_n) / (g_nᵀ HᵀH g_n)        (8.132)

However, even this may not significantly help in speeding up the convergence when the condition number of A ≜ HᵀH is high.

Van Cittert Filter [4]

From (8.128) and (8.129), the solution at iteration n = i, with α_n = α, can be written as u_i = G_i v, where G_i is a power series:

    G_i = α Σ_{k=0}^{i} (I − αHᵀH)^k Hᵀ        (8.133)

This is called the Van Cittert filter. Physically it represents passing the modified observation vector αHᵀv through i stages of identical filters and summing their outputs (Fig. 8.21). For a spatially invariant PSF, this requires first passing the observed image v(m, n) through a filter whose frequency response is αH*(ω₁, ω₂) and then through i stages of identical filters, each having the frequency response 1 − α|H(ω₁, ω₂)|².
One advantage of this method is that the region of support of each filter stage is only twice (in each dimension) that of the PSF. Therefore, if the PSF is not too broad, each filter stage can be conveniently implemented by an FIR filter. This can be attractive for real-time pipeline implementations.
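At a single frequency the i-stage cascade of (8.133) reduces to a geometric series, which makes its convergence toward the inverse filter easy to check. The scalar value of H and the choice α = 1 below are illustrative.

```python
import numpy as np

# Single-frequency sketch of the Van Cittert filter (8.133): the i-stage
# cascade sums a geometric series in 1 - alpha|H|^2 and tends to 1/H.
def van_cittert_gain(H, alpha, i):
    """Transfer function of the i-stage Van Cittert filter at one frequency."""
    g = 0.0
    term = alpha * np.conj(H)               # the alpha * H* front-end stage
    for _ in range(i + 1):                  # terms k = 0, 1, ..., i of (8.133)
        g += term
        term *= 1.0 - alpha * abs(H) ** 2   # one more identical stage
    return g
```

For H = 0.5 and α = 1, the gain climbs toward 1/H = 2 as i grows, matching the closed form (1/H)[1 − (1 − α|H|²)^{i+1}] for real H.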

The Conjugate Gradient Method [22-24]

The conjugate gradient method is based on finding conjugate directions, which are vectors d_i ≠ 0 such that

    d_iᵀ A d_j = 0,   i ≠ j,   0 ≤ i, j ≤ N − 1        (8.134)

where A ≜ HᵀH. When A is positive definite, such a set of vectors exists and forms a basis in the N-dimensional vector space. In terms of this basis, the solution can be written as

    u⁺ = Σ_{i=0}^{N−1} a_i d_i        (8.135)

The scalars a_i and vectors d_i can be calculated conveniently via the recursions

    u_{n+1} = u_n + α_n d_n,   α_n = −(d_nᵀ g_n)/(d_nᵀ A d_n),   u₀ = 0
    d_n = −g_n + β_{n−1} d_{n−1},   β_{n−1} = (g_nᵀ A d_{n−1})/(d_{n−1}ᵀ A d_{n−1}),   d₀ = −g₀        (8.136)
    g_n = −Hᵀv + Au_n = g_{n−1} + α_{n−1} A d_{n−1},   g₀ = −Hᵀv

In this algorithm, the direction vectors d_n are conjugate and α_n minimizes J not only along the nth direction but also over the subspace generated by d₀, d₁, ..., d_{n−1}. Hence, the minimum is achieved in at most N steps, and the method is better than the one-step gradient method at each iteration. For image restoration problems where N is a large number, one need not run the algorithm to completion, since large reductions in error are achieved in the first few steps. When rank H is r < N, then it is sufficient to run the algorithm up to n = r − 1, and u_r → u⁺.
[Figure 8.21: Van Cittert filter — the input αH*(ω₁, ω₂)v(m, n) passes through i identical stages with frequency response 1 − α|H(ω₁, ω₂)|², and the stage outputs are summed to give û(m, n).]




The major computation at each step requires the single matrix-vector product Ad_n. All other vector operations require only N multiplications. If the PSF is narrow compared to the image size, we can make use of the sparseness of A, and all the operations can be carried out quickly and efficiently.

Example 8.7

Consider the solution of

    v = Hu,   H = [2 1; 1 2; 1 3],   v = [2, 1, 3]ᵀ

for which A ≜ HᵀH = [6 7; 7 14], whose eigenvalues are λ₁ = 10 + √65 and λ₂ = 10 − √65. Since A is nonsingular,

    H⁻ = A⁻¹Hᵀ = (1/35)[14 −7; −7 6][2 1 1; 1 2 3] = (1/35)[21 0 −7; −8 5 11]

This gives

    u⁺ = H⁻v = (1/35)[21, 22]ᵀ = [0.600, 0.629]ᵀ

and

    ||v − Hu⁺||² = 1.02857

For the one-step gradient algorithm, we get

    α_opt = 2/20 = 0.1

and

    g₀ = [−8, −13]ᵀ,   u₁ = [0.8, 1.3]ᵀ,   J₁ = 9.46

After 12 iterations, we get

    g₁₁ = [0.685, 1.253]ᵀ,   u₁₂ = [0.555, 0.581]ᵀ

The steepest descent algorithm gives

    u₃ = [0.5992, 0.6290]ᵀ,   J₃ = 1.02857

The conjugate gradient algorithm converges in two steps, as expected, giving u₂ = u⁺. Both conjugate gradient and steepest descent are much faster than the one-step gradient algorithm.
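The numbers in Example 8.7 can be reproduced directly; the matrices H = [2 1; 1 2; 1 3] and v = [2, 1, 3]ᵀ used below are inferred from the example's intermediate values (they yield A = HᵀH with det 35 and eigenvalues 10 ± √65).

```python
import numpy as np

# Reproduce Example 8.7 numerically.
H = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
v = np.array([2.0, 1.0, 3.0])
A = H.T @ H                              # [[6, 7], [7, 14]], det = 35
b = H.T @ v                              # [8, 13]

u_star = np.linalg.solve(A, b)           # pseudoinverse solution H^- v

# One-step gradient (8.128) with alpha_opt = 2/(lam_max + lam_min) = 2/tr(A)
u = np.zeros(2)
for _ in range(12):
    u = u - 0.1 * (A @ u - b)            # 12 iterations give u_12

# Steepest descent (8.132): alpha_n = g'g / (g' A g), three iterations
us = np.zeros(2)
for _ in range(3):
    g = A @ us - b
    us = us - ((g @ g) / (g @ A @ g)) * g

# Conjugate gradient (8.136): exact in N = 2 steps for this 2x2 system
uc, g = np.zeros(2), -b
d = -g
for _ in range(2):
    alpha = -(d @ g) / (d @ A @ d)
    uc = uc + alpha * d
    g = g + alpha * (A @ d)
    d = -g + ((g @ A @ d) / (d @ A @ d)) * d
```

The slow, oscillating progress of the fixed-step iterate u and the near-exact steepest descent value after only three steps illustrate the convergence comparison made in the example.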

Separable Point Spread Functions

If the two-dimensional point spread function is separable, then the generalized inverse is also separable. Writing V = H₁UH₂ᵀ, we can rederive the various iterative algorithms. For example, the matrix form of the conjugate gradient algorithm becomes

    U_{n+1} = U_n + α_n D_n,   α_n = −(G_n, D_n)/(D_n, A₁D_nA₂),   U₀ = 0
    D_n = −G_n + β_{n−1}D_{n−1},   β_{n−1} = (G_n, A₁D_{n−1}A₂)/(D_{n−1}, A₁D_{n−1}A₂)        (8.137)
    G_n = G_{n−1} + α_{n−1}A₁D_{n−1}A₂,   D₀ = −G₀,   G₀ = −H₁ᵀVH₂

where A₁ ≜ H₁ᵀH₁, A₂ ≜ H₂ᵀH₂, and (X, Y) ≜ Σ_m Σ_n x(m, n)y(m, n). This algorithm has been found useful for restoration of images blurred by spatially variant PSFs [24].

8.10 RECURSIVE FILTERING FOR STATE VARIABLE SYSTEMS

Recursive filters realize an infinite impulse response with finite memory and are particularly useful for spatially varying restoration problems. In this section we consider the Kalman filtering technique, which is of fundamental importance in linear estimation theory.

Kalman Filtering [25, 26]

Consider a state variable system

    x_{n+1} = A_n x_n + B_n ε_n,   n = 0, 1, ...
    z_n = C_n x_n        (8.138)
    E[x₀x₀ᵀ] = R₀,   E[ε_n ε_{n'}ᵀ] = P_n δ(n − n')

where x_n, ε_n, z_n are m × 1, p × 1, and q × 1 vectors, respectively, all being Gaussian, zero mean, random sequences. R₀ is the covariance matrix of the initial state vector x₀, which is assumed to be uncorrelated with ε_n. Suppose z_n is observed in the presence of additive white Gaussian noise as

    y_n = z_n + η_n,   n = 0, 1, ...        (8.139)

where η_n and ε_n are uncorrelated sequences. We define s_n, g_n, and x̂_n as the best mean square estimates of the state variable x_n when the observations are available up to n − 1, n, and N > n, respectively. Earlier, we have seen that mean square estimation of a sequence x(n) from observations y(n), 0 ≤ n ≤ N, gives us the Wiener filter. In this filter the entire estimated sequence is calculated from all the observations simultaneously [see (8.77)]. Kalman filtering theory gives recursive realizations of these estimates as the observation data arrive sequentially. This reduces the computational complexity of the Wiener filter calculations, especially when the state variable model is time varying. However, Kalman filtering requires a stochastic model of the quantity to be estimated (that is, the object) in the form of



(8.138), whereas Wiener filtering is totally based on the knowledge of the auto-correlation of the observations and their cross-correlation with the object [see (8.80)].

One-Step Predictor. The one-step predictor is defined as s_n ≜ E[x_n | y_{n'}, 0 ≤ n' ≤ n − 1], which is the best mean square prediction of x_n based on observations up to n − 1. It is given by the recursions

    s_{n+1} = A_n s_n + G_n q_n⁻¹ ν_n,   s₀ = 0
    ν_n = y_n − C_n s_n        (8.140)
    G_n ≜ A_n R_n C_nᵀ,   q_n ≜ C_n R_n C_nᵀ + cov[η_n]
    R_{n+1} = A_n R_n A_nᵀ − G_n q_n⁻¹ G_nᵀ + B_n P_n B_nᵀ        (8.141)

The sequence ν_n is called the innovations process. It represents the new information obtained when the observation sample y_n arrives. It can be shown [25, 26] that ν_n is a zero mean white noise sequence with covariances q_n, that is,

    E[ν_n] = 0,   E[ν_n ν_{n'}ᵀ] = q_n δ(n − n')        (8.142)

The nonlinear equation in (8.141) is called the Riccati equation, where R_n represents the covariance of the prediction error e_n ≜ x_n − s_n, which is also a white sequence:

    E[e_n] = 0,   E[e_n e_{n'}ᵀ] = R_n δ(n − n')        (8.143)

The m × q matrix G_n is called the Kalman gain. Note that R_n, G_n, and q_n depend only on the state variable model parameters and not on the data. Examining (8.140) and Fig. 8.22, we see that the model structure of s_n is recursive and is similar to the state variable model. The predictor equation can also be written as

    s_{n+1} = A_n s_n + G_n q_n⁻¹ ν_n,   y_n = C_n s_n + ν_n        (8.144)

which is another state variable realization of the output sequence y_n from a white sequence ν_n of covariances q_n. This is called the innovations representation and is useful because it is causally invertible, that is, given the system output y_n, one can find the input sequence ν_n according to (8.140). Such representations have application in designing predictive coders for noisy data (see [6] of Chapter 11).

Online Filter. The online filter is the best estimate of the current state based on all the observations received currently, that is,

    g_n ≜ E[x_n | y_{n'}, 0 ≤ n' ≤ n]
    g_n = R_n C_nᵀ q_n⁻¹ ν_n + s_n        (8.145)

This estimate simply updates the predicted value s_n by the new information ν_n with a gain factor R_n C_nᵀ q_n⁻¹ to obtain the most current estimate. From (8.140), we now see that

    s_{n+1} = A_n g_n        (8.146)
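A scalar special case makes the predictor/filter recursions concrete. The AR(1) state model and the constants below are illustrative assumptions, not values from the text; with A_n = a, B_n = C_n = 1 the recursions (8.140), (8.141), (8.145), and (8.146) reduce to a few lines.

```python
import numpy as np

# Scalar sketch of (8.140)-(8.146): x_{n+1} = a x_n + eps_n, y_n = x_n + eta_n.
a, sig_eps2, sig_eta2 = 0.95, 0.0975, 1.0     # illustrative model constants
rng = np.random.default_rng(0)

N = 2000
x = np.zeros(N)
for n in range(N - 1):
    x[n + 1] = a * x[n] + rng.normal(scale=np.sqrt(sig_eps2))
y = x + rng.normal(scale=np.sqrt(sig_eta2), size=N)

s, R = 0.0, 1.0                      # predictor s_0 = 0, R_0 = var of x_0
g = np.zeros(N)                      # online filter output g_n
for n in range(N):
    nu = y[n] - s                    # innovation nu_n = y_n - C s_n (8.140)
    q = R + sig_eta2                 # innovation variance q_n
    g[n] = s + (R / q) * nu          # online update (8.145), gain R C' / q
    s = a * g[n]                     # predict from the update (8.146)
    R = a * a * R * sig_eta2 / q + sig_eps2   # scalar Riccati (8.141)
```

Note that R and the gain R/q are data-independent, as remarked above; only s and g depend on the observations.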

[Figure 8.22: Kalman filtering. (a) State variable model; (b) one-step predictor; (c) online filter, which updates the prediction by the innovation and then predicts the next state.]

Therefore, Kalman filtering can be thought of as having two steps (Fig. 8.22c). The
first step is to update the previous prediction by the innovations. The next step is to
predict from the latest update.

Fixed-Interval Smoother (Wiener Filter). Suppose the observations are available over the full interval [0, N]. Then, by definition,

    x̂_n = E[x_n | y_{n'}, 0 ≤ n' ≤ N]

is the Wiener filter for x_n. With x_n given by (8.138), x̂_n can be realized via the so-called recursive backwards filter

    x̂_n = R_n λ_n + s_n        (8.147)
    λ_n = A_nᵀ λ_{n+1} − C_nᵀ q_n⁻¹ G_nᵀ λ_{n+1} + C_nᵀ q_n⁻¹ ν_n,   λ_{N+1} = 0        (8.148)

To obtain x̂_n, the one-step predictor equations (8.140) are first solved recursively in the forward direction for n = 0, 1, ..., N and the results R_n, s_n (and possibly


q_n⁻¹, G_n, ν_n) are stored. Then the preceding two equations are solved recursively backwards from n = N to n = 0.

Remarks

The number of computations and storage requirements for the various filters are roughly O(m³N) + O(m²N²) + O(q³N). If C_n, R_n, G_n, and q_n have been precomputed and stored, then the number of online computations is O(m²N²). It is readily seen that the Riccati equation has the largest computational complexity. Moreover, the Riccati equation and the filter gain calculations have numerical deficiencies that can cause large computational errors. Several algorithms that improve the computational accuracy of the filter gains are available and may be found in [26-28]. If we are primarily interested in the smooth estimate x̂_n, which is usually the case in image processing, then for shift invariant (or piecewise shift invariant) systems we can avoid the Riccati equation altogether by going to certain FFT-based algorithms [29].
Example 8.8 (Recursive restoration of blurred images)

A blurred image observed with additive white noise is given by

    y(n) = Σ_{k=−l₁}^{l₂} h(n, k)u(n − k) + η(n)        (8.149)

where y(n), n = 1, 2, ... represents one scan line. The PSF h(n, k) is assumed to be spatially varying here in order to demonstrate the power of recursive methods. Assume that each scan line is represented by a qth-order AR model

    u(n) = Σ_{k=1}^{q} a(k)u(n − k) + ε(n),   E[ε(n)ε(n')] = β²δ(n − n')        (8.150)

Without loss of generality let l₁ + l₂ ≥ q and define a state variable x_n = [u(n + l₁), ..., u(n + 1), u(n), ..., u(n − l₂)]ᵀ, which yields an (l₁ + l₂ + 1)-order state variable system as in (8.138) and (8.139), where

    A = [a(1) a(2) ... a(q) 0 ... 0
          1    0  ...   0   0 ... 0
          0    1  ...   0   0 ... 0
          :    :        :   :     :
          0    0  ...   0   1 ... 0],   B = [1, 0, ..., 0]ᵀ,   C_n = [h(−l₁, n) ... h(l₂, n)]

and ε_n ≜ ε(n + l₁ + 1). This formulation is now in the framework of Kalman filtering, and the various recursive estimates can be obtained readily. Extension of these ideas to two-dimensional blurs is also possible. For example, it has been shown that an image blurred by a Gaussian PSF representing atmospheric turbulence can be restored by a Kalman filter associated with a diffusion equation [66].
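The companion-form construction in Example 8.8 can be sketched as a small helper; the AR coefficients and PSF taps passed to it below are illustrative, not values from the text.

```python
import numpy as np

# Sketch of the state-space construction of Example 8.8: companion-form A
# from the AR coefficients a(k), B selecting the innovation, and a (possibly
# spatially varying) observation row C_n holding the taps h(-l1,n)...h(l2,n).
def blur_state_space(ar, h_row, l1, l2):
    m = l1 + l2 + 1                   # state dimension
    q = len(ar)
    A = np.zeros((m, m))
    A[0, :q] = ar                     # first row: a(1) ... a(q), zeros after
    A[1:, :-1] = np.eye(m - 1)        # remaining rows shift the state down
    B = np.zeros((m, 1))
    B[0, 0] = 1.0
    C = np.asarray(h_row, dtype=float).reshape(1, m)
    return A, B, C
```

Applying A to a state [u(n + 1), u(n), u(n − 1)]ᵀ produces the AR prediction in the first slot and shifts the remaining entries down, as required by (8.138).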

8.11 CAUSAL MODELS AND RECURSIVE FILTERING [32-39]

In this section we consider the recursive filtering of images having causal representations. Although the method presented next is valid for all causal MVRs, we restrict ourselves to the special case



    u(m, n) = ρ₁u(m − 1, n) + ρ₂u(m, n − 1) − ρ₃u(m − 1, n − 1)
              + ρ₄u(m + 1, n − 1) + ε(m, n)        (8.151)

    E[ε(m, n)] = 0,   E[ε(m, n)ε(m − k, n − l)] = β²δ(k)δ(l)        (8.152)

We recall that for ρ₄ = 0, ρ₃ = ρ₁ρ₂, and β² = σ²(1 − ρ₁²)(1 − ρ₂²), this model is a realization of the separable covariance function of (2.84). The causality of (8.151) is with respect to column-by-column scanning of the image.

A Vector Recursive Filter

Let u_n denote the nth column of the N × N image. Then (8.151) can be written as

    L₁u_n = L₂u_{n−1} + ε_n        (8.153)

where the N × N matrices

    L₁ ≜ I − ρ₁Z,   L₂ ≜ ρ₂I − ρ₃Z + ρ₄Zᵀ        (8.154)

are defined in terms of the lower shift matrix Z. Now (8.153) can be written as a vector AR process:

    u_n = Lu_{n−1} + L₁⁻¹ε_n        (8.155)

where L = L₁⁻¹L₂. Let the observations be given as

    y(m, n) = u(m, n) + η(m, n)        (8.156)


where η(m, n) is a stationary white noise field with zero mean and variance σ_η². In vector form, this becomes

    y_n = u_n + η_n        (8.157)

Equations (8.155) and (8.157) are now in the proper state variable form to yield the Kalman filter equations

    s_{n+1} = Lg_n,   s₀ = 0
    g_n = R_n(R_n + σ_η²I)⁻¹(y_n − s_n) + s_n
    R_{n+1} = LR̄_nLᵀ + L₁⁻¹P(L₁ᵀ)⁻¹,   R₀ given        (8.158)
    R̄_n = [I − R_n(R_n + σ_η²I)⁻¹]R_n = σ_η²(R_n + σ_η²I)⁻¹R_n

where P ≜ cov[ε_n] = β²I. Using the fact that L = L₁⁻¹L₂, this gives the following recursions.

Observation update:

    g(m, n) = s(m, n) + Σ_{i=1}^{N} k_n(m, i)[y(i, n) − s(i, n)]        (8.159)





Prediction:

    s(m, n + 1) = ρ₁s(m − 1, n + 1) + ρ₂g(m, n) − ρ₃g(m − 1, n) + ρ₄g(m + 1, n)        (8.160)

where k_n(m, i) are the elements of K_n ≜ R_n(R_n + σ_η²I)⁻¹. The Riccati equation can be implemented as L₁R_{n+1}L₁ᵀ = R̂_n, where

    R̂_n ≜ L₂R̄_nL₂ᵀ + P,   R̄_n = [I − K_n]R_n        (8.161)

This gives the following:

Forward recursion:

    L₁R_{n+1} = Q_n ⟹ r_{n+1}(i, j) = ρ₁r_{n+1}(i − 1, j) + q_n(i, j)        (8.162)

Backward recursion:

    Q_nL₁ᵀ = R̂_n ⟹ q_n(i, j) = ρ₁q_n(i, j + 1) + r̂_n(i, j)        (8.163)

From Kalman filtering theory we know that R_n is the covariance matrix of e_n ≜ u_n − s_n. Now we can write

    ν_n ≜ y_n − s_n = u_n − s_n + η_n = e_n + η_n
    cov[e_n] = R_n,   cov[η_n] = σ_η²I        (8.164)

This means that K_n, defined before, is the one-dimensional Wiener filter for the noisy vector ν_n and that the summation term in (8.159) represents the output of this Wiener filter, that is, K_nν_n = ê_n, where ê_n is the best mean square estimate of e_n given ν_n. This gives

    g(m, n) = s(m, n) + ê(m, n)        (8.165)

where ê(m, n) is obtained by processing the elements of ν_n. In (8.159), the estimate at the nth column is updated as y_n arrives. Equation (8.160) predicts recursively the next column from this update (Fig. 8.23). Examining the preceding
,
I
,, ,, ,, I
Column I
tim. n) I
delay
, Z2
~1
iI
I .
y(m, nl
- + I
t(m.n+ll
• gim, n) . I• 2·D
+ + filter
,
+ , I
vIm; n) + •

I
• I
" I •
I tim," + 11 ~ p,s(m -\, 'n + 1) + P2u(m, nl
t-n I
Wiener I - psu(m - 1, nl + p.g(m + 1, nl
filter •
I • ' . ,

k. (m, il I •
.
I
h I
elm, nl ,
j

- .~" •
.~ - ~ -_.- , -<,'
"

I•
I

Update . i •
Predict
Figure 8,23 Two-dirnensional recursive filter.



equations, we see that the Riccati equation is computationally the most complex, requiring O(N³) operations at each vector step, or O(N²) operations per pixel. For practical image sizes such a large number of operations is unacceptable. The following simplifications lead to more practical algorithms.

Stationary Models

For stationary models and large image sizes, R_n will be nearly Toeplitz, so that the matrix operations in (8.158) can be approximated by convolutions, which can be implemented via the FFT. This will reduce the number of operations to O(N log N) per column or O(log N) per pixel.

Steady-State Filter

For smooth images, R_n achieves steady state quite rapidly, and therefore K_n in (8.159) may be replaced by its steady-state value K. Given K, the filter equations need O(N) operations per pixel.

A Two-Stage Recursive Filter [35]

If the steady-state gain is used, then from the steady-state solution of the Riccati equation, it may be possible to find a low-order, approximate state variable model for e(m, n), such as

    x_{m+1} = Ax_m + Bε_m,   e(m, n) = Cx_m        (8.166)

where the dimension of the state vector x_m is small compared to N. This means that for each n, the covariance matrix of the sequence e(m, n), m = 1, ..., N, is approximately R = lim_{n→∞} R_n. The observation model for each fixed n is

    ν(m, n) = e(m, n) + η(m, n)        (8.167)

Then the one-dimensional forward/backward smoothing filter operating recursively on ν(m, n), m = 1, ..., N, will give x̂_{m,n}, the optimum smooth estimate of x_m at the nth column. From this we can obtain ê(m, n) = Cx̂_{m,n}, which gives the vector ê_n needed in (8.165). Therefore, the overall filter calculates ê(m, n) recursively in m, and s(m, n) recursively in n (Fig. 8.23). The model of (8.166) may be obtained from R via AR modeling, spectral factorization, or other techniques that have been discussed in Chapter 6.
A Reduced Update Filter

In practice, the updated value g(m, n) depends most strongly on the observations [i.e., ν(m, n)] in the vicinity of the pixel at (m, n). Therefore, the dimensionality of the vector recursive filter can be reduced by constraining ê(m, n) to be the output of a one-dimensional FIR Wiener filter of the form


    ê(m, n) = Σ_{k=−p}^{p} a(k)ν(m − k, n)        (8.168)

In steady state, the coefficients a(k) are obtained by solving

    Σ_{k=−p}^{p} a(k)[r(m − k) + σ_η²δ(m − k)] = r(m),   −p ≤ m ≤ p        (8.169)

where r(m − k) are the elements of R, the Toeplitz covariance matrix of e_n used previously. Substituting (8.165) in (8.160), the reduced update recursive filter becomes

    s(m, n + 1) = ρ₁s(m − 1, n + 1) − ρ₃s(m − 1, n) + ρ₂s(m, n)
                  + ρ₄s(m + 1, n) − ρ₃ê(m − 1, n) + ρ₂ê(m, n) + ρ₄ê(m + 1, n)        (8.170)

where ê(m, n) is given by (8.168). A variant of this method has been considered in [36].

Remarks

The recursive filters just considered are useful only when a causal stochastic model such as (8.151) is available. If we start with a given covariance model, then the FIR Wiener filter discussed earlier is more practical.

8.12 SEMICAUSAL MODELS AND SEMIRECURSIVE FILTERING

It was shown in Section 6.9 that certain random fields represented by semicausal models can be decomposed into a set of uncorrelated, one-dimensional random sequences. Such models yield semirecursive filtering algorithms, where each image column is first unitarily transformed and each transform coefficient is then passed through a recursive filter (Fig. 8.24). The overall filter is a combination of fast transform and recursive algorithms. We start by writing the observed image as

    v(m, n) = Σ_{k=−p₁}^{p₂} Σ_{l=−q₁}^{q₂} h(k, l)u(m − k, n − l) + η(m, n),
    1 ≤ m ≤ N,   n = 0, 1, 2, ...        (8.171)

[Figure 8.24: Semirecursive filtering — each observed column v_n is transformed by Ψ, each transform coefficient v_n(k) is passed through its own recursive filter to give x̂_n(k), and the inverse transform Ψ⁻¹ yields û_n.]



where η(m, n) is stationary white noise. In vector notation this becomes

    v_n = Σ_{l=−q₁}^{q₂} H_l u_{n−l} + b_n + η_n        (8.172)

where v_n, u_n, b_n, and η_n are N × 1 vectors and b_n depends only on u(−p₂ + 1, n), ..., u(0, n), u(N + 1, n), ..., u(N + p₁, n), which are boundary elements of the nth column of u(m, n). The H_l are banded Toeplitz matrices.

Filter Formulation

Let Ψ be a fast unitary transform such that ΨH_lΨ*ᵀ, for every l, is nearly diagonal. From Chapter 5 we know that many sinusoidal transforms tend to diagonalize Toeplitz matrices. Therefore, defining

    y_n ≜ Ψv_n,   x_n ≜ Ψu_n,   c_n ≜ Ψb_n,   ν_n ≜ Ψη_n
    ΨH_lΨ*ᵀ ≈ Diag[ΨH_lΨ*ᵀ] ≜ Γ_l ≜ Diag[γ_l(k)]        (8.173)

and multiplying both sides of (8.172) by Ψ, we can reduce it to a set of scalar equations, decoupled in k, as

    y_n(k) = Σ_{l=−q₁}^{q₂} γ_l(k)x_{n−l}(k) + ν_n(k) + Σ_{l=−q₁}^{q₂} c_{n−l}(k),   k = 1, ..., N        (8.174)

In most situations the image background is known or can be estimated quite accurately. Hence c_n(k) can be assumed to be known and can be absorbed in y_n(k) to give the observation system for each row of the transformed vectors as

    y_n(k) = Σ_{l=−q₁}^{q₂} γ_l(k)x_{n−l}(k) + ν_n(k),   k = 1, ..., N        (8.175)

Now, for each row, x_n(k) is represented by an AR model

    x_n(k) = Σ_{l=1}^{p} a_l(k)x_{n−l}(k) + ε_n(k),   k = 1, ..., N        (8.176)

which together with (8.175) can be set up in the framework of Kalman filtering, as shown in Example 8.8. Alternatively, each line {y_n(k), n = 0, 1, ...} can be processed by its one-dimensional Wiener filter. This method has been found useful in adaptive filtering of noisy images (Fig. 8.25) using the cosine transform. The entire image is divided into small blocks of size N × N (typically N = 16 or 32). For each k, the spectral density S_y(ω, k) of the sequence {y_n(k), n = 0, 1, ..., N − 1} is estimated by a one-dimensional spectral estimation technique, which assumes {y_n} to be an AR sequence [8]. Given S_y(ω, k) and σ_ν², the noise power, the sequence y_n(k) is Wiener filtered to give x̂_n(k), where the filter frequency response is given by

    G(ω, k) ≜ S_x(ω, k) / [S_x(ω, k) + σ_ν²] = [S_y(ω, k) − σ_ν²] / S_y(ω, k)        (8.177)

In practice, S_x(ω, k) is set to zero if the estimated S_y(ω, k) is less than σ_ν². Figures 8.17 and 8.18 show examples of this method, where it is called the COSAR (cosine-AR) algorithm.


[Figure 8.25: COSAR algorithm for adaptive filtering using semicausal models — for each channel k an AR model is identified to estimate S_y(ω, k), the Wiener gain (8.177) is formed, and the cosine-transformed data are filtered and inverse transformed.]

8.13 DIGITAL PROCESSING OF SPECKLE IMAGES

When monochromatic radiation is scattered from a surface whose roughness is of the order of a wavelength, interference of the waves produces a noise called speckle. Such noise is observed in images produced by coherent radiation from the microwave to visible regions of the spectrum. The presence of speckle noise in an imaging system reduces its resolution, particularly for low-contrast images. Therefore, suppression of speckle noise is an important consideration in the design of coherent imaging systems. The problem of speckle reduction is quite different from additive noise smoothing because speckle noise is not additive. Figure 8.26b shows a speckled version of a test pattern image (Fig. 8.26a).

Speckle Representation

In free space, speckle can be considered as an infinite sum of independent, identical phasors with random amplitude and phase [41, 42]. This yields a representation of its complex amplitude as

    a(x, y) = a_r(x, y) + ja_i(x, y)        (8.178)

where a_r and a_i are zero mean, independent Gaussian random variables (for each x, y) with variance σ_a². The intensity field is simply

    s = s(x, y) = |a(x, y)|² = a_r² + a_i²        (8.179)

which has the exponential distribution of (8.17) with σ² ≜ 2σ_a² and mean μ_s = E[s] = σ². A white noise field with these statistics is called the fully developed speckle.
For any speckle, the contrast ratio is defined as

    γ = (standard deviation of s) / (mean value of s)        (8.180)




[Figure 8.26: Speckle images — (a) original test pattern; (b) speckled image; (c) image after look averaging.]

For fully developed speckle, γ = 1.
When an object with complex amplitude distribution g(x, y) is imaged by a coherent linear system with impulse response K(x, y; x', y'), the observed image intensity can be written as

    v(x, y) = |∫∫ K(x, y; x', y')g(x', y')e^{jφ(x',y')} dx' dy'|² + η(x, y)        (8.181)

where η(x, y) is the additive detector noise and φ(x, y) represents the phase distortion due to scattering. If the impulse response decays rapidly outside a region R_cell(x, y), called the resolution cell, and g(x, y) is nearly constant in this region, then [44]

    v(x, y) ≈ |g(x, y)|² |ā(x, y)|² + η(x, y)
            = u(x, y)s(x, y) + η(x, y)        (8.182)

where

    u(x, y) = A|g(x, y)|²,   ā(x, y) ≜ ∫∫_{R_cell} K(x, y; x', y')e^{jφ(x',y')} dx' dy'        (8.183)



The u(x, y) represents the object intensity distribution (reflectance or transmittance) and s(x, y) is the speckle intensity distribution. The random field ā(x, y) is Gaussian, whose autocorrelation function has support on a region twice the size of R_cell. Equation (8.182) shows that speckle appears as a multiplicative noise in the coherent imaging of low-resolution objects. Note that there would be no speckle in an ideal imaging system. A uniformly sampled speckle field with pixel spacing equal to or greater than the width of its correlation function will be uncorrelated.

Speckle Reduction [46-47]: N-Look Method

A simple method of speckle reduction is to take several statistically independent intensity images of the object and average them (Fig. 8.26c). Assuming the detector noise to be low and writing the lth image as

    v_l(x, y) = u(x, y)s_l(x, y),   l = 1, ..., N        (8.184)

the temporal average of N looks is simply

    v̄_N(x, y) ≜ (1/N) Σ_{l=1}^{N} v_l(x, y) = u(x, y)s̄_N(x, y)        (8.185)

where s̄_N(x, y) is the N-look average of the speckle fields. This is also the maximum likelihood estimate based on {v_l(x, y), l = 1, ..., N}, which yields

    E[v̄_N] = μ_s u,   var[v̄_N] = u²σ²/N        (8.186)

This gives the contrast ratio γ = 1/√N for v̄_N. Therefore, the contrast improves by a factor of √N for N-look averaging.
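A quick Monte Carlo check confirms the √N contrast improvement; the sample sizes and number of looks below are illustrative.

```python
import numpy as np

# Monte Carlo check of the N-look result: single-look fully developed
# speckle is exponentially distributed (contrast gamma = 1), and averaging
# N independent looks gives gamma close to 1/sqrt(N).
rng = np.random.default_rng(1)
looks = rng.exponential(scale=1.0, size=(16, 200000))   # 16 independent looks

def contrast(s):
    return np.std(s) / np.mean(s)       # definition (8.180)

gamma_1 = contrast(looks[0])                 # single look: about 1
gamma_4 = contrast(looks[:4].mean(axis=0))   # N = 4: about 1/2
gamma_16 = contrast(looks.mean(axis=0))      # N = 16: about 1/4
```

The estimates approach 1, 1/2, and 1/4 as the sample size grows, in line with γ = 1/√N.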

Spatial Averaging of Speckle

If the available number of looks, N, is small, then it is desirable to perform some kind of spatial filtering to reduce speckle. A standard technique used in synthetic aperture radar systems (where speckle noise occurs) is to average the intensity values of several adjacent pixels. The improvement in contrast ratio for spatial averaging is consistent with the N-look method, except that there is an accompanying loss of resolution.
Since the speckle in (8.185) is multiplicative, taking logarithms converts it into an additive noise:

    w_N(x, y) ≜ log v̄_N(x, y) = z(x, y) + η_N(x, y)        (8.188)

where z ≜ log u(x, y) and η_N(x, y) ≜ log s̄_N(x, y) is stationary white noise.




[Figure 8.27: Homomorphic filtering of speckle. (a) The algorithm: input image → log(·) → Wiener filter → exp(·) → filtered output. (b) Filtering example: original, speckled, and filtered images.]

For N ~2, 1]N can be modeled reasonably well by a Gaussian random field
[45J, whose spectral density function is given by

• 6' N=l
S" (£1' £2) = cr~ = 1 (8.189)
-N'
Now z(x, y) can be easily estimated from w_N(x, y) using Wiener filtering tech-
niques. This gives the overall filter algorithm of Fig. 8.27, which is also called the
homomorphic filter. Experimental studies have shown that the homomorphic
Wiener filter performs quite well compared to linear filtering or other homo-
morphic linear filters [46]. Figure 8.27 shows the performance of an adaptive FIR
Wiener filter used in the homomorphic mode.
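The pipeline of Fig. 8.27a is easy to sketch. Below, a simple local adaptive Wiener-type smoother stands in for the adaptive FIR Wiener filter of the text; the log-speckle noise variance defaults to π²/6 (the single-look value), and the window size and test scene are illustrative assumptions:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def homomorphic_despeckle(v, win=5, noise_var=np.pi**2 / 6):
    """Log -> adaptive Wiener-style smoother -> exp (cf. Fig. 8.27a).

    noise_var defaults to pi^2/6, the variance of the log of single-look
    speckle (an assumption; for N-look data ~1/N would be appropriate).
    """
    w = np.log(v + 1e-12)                  # multiplicative -> additive noise
    pad = win // 2
    wp = np.pad(w, pad, mode='reflect')
    blocks = sliding_window_view(wp, (win, win))
    mu = blocks.mean(axis=(-2, -1))        # local mean of the log image
    var = blocks.var(axis=(-2, -1))        # local variance of the log image
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    z_hat = mu + gain * (w - mu)           # Wiener-type estimate of z
    return np.exp(z_hat)                   # back to the intensity domain

# toy usage: constant scene corrupted by single-look speckle
rng = np.random.default_rng(1)
v = 3.0 * rng.exponential(1.0, size=(64, 64))
v_hat = homomorphic_despeckle(v)
```

In flat regions the local log-variance is close to the speckle noise variance, so the gain goes to zero and the output approaches the local mean; near edges the gain rises and detail is preserved, which is the usual rationale for the adaptive form.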



8.14 MAXIMUM ENTROPY RESTORATION
-
The inputs, outputs, and the PSFs of incoherent imaging systems (the usual case)
are nonnegative. The least squares or mean square criteria based restoration algo-
rithms do not yield images with nonnegative pixel values. A restoration method
based on the maximum entropy criterion gives nonnegative solutions. Since entropy
is a measure of uncertainty, the general argument behind this criterion is that it
assumes the least about the solution and gives it the maximum freedom within the
limits imposed by constraints.



-
Distribution-Entropy Restoration

For an image observed as

    v = H u + η                                                              (8.190)

where H is the PSF matrix, and u and v are the object and observation arrays
mapped into vectors, a maximum entropy restoration problem is to maximize

    E(u) = -Σ_n u(n) log u(n)                                                (8.191)

subject to the constraint

    ‖v - H u‖² = σ²                                                          (8.192)

where σ² > 0 is a specified quantity. Because u(n) is nonnegative and can be
normalized to give Σ_n u(n) = 1, it can be treated as a probability distribution whose
entropy is E(u). Using the usual Lagrangian method of optimization, the solution
u is given by the implicit equation

    u = exp{-1 - λ Hᵀ(v - H u)}                                               (8.193)

where exp{x} denotes a vector of elements exp{x(k)}, k = 0, 1, ..., 1 is a vector of
all 1s, and λ is a scalar Lagrange multiplier chosen to satisfy the constraint of
(8.192). Interestingly, a Taylor series expansion of the exponent, truncated to the
first two terms, yields the constrained least squares solution

    u = (HᵀH + λI)⁻¹ Hᵀ v                                                     (8.194)

Note that the solution of (8.193) is guaranteed to be nonnegative. Experimental
results show that this method gives sharper restorations than the least squares filters
when the image contains a small number of point objects (such as in astronomy
images) [48].
A stronger restoration result is obtained by maximizing the entropy defined by
(8.191) subject to the constraints

    u(n) ≥ 0,    n = 0, ..., N - 1
                                                                             (8.195)
    Σ_{j=0}^{N-1} h(m, j) u(j) = v(m),    m = 0, ..., M - 1

Now the solution is given by

    u(n) = (1/e) exp{ Σ_{l=0}^{M-1} h(l, n) λ(l) },    n = 0, ..., N - 1     (8.196)

where λ(l) are Lagrange multipliers (also called dual variables) that maximize the
functional

    J(λ) ≜ Σ_{n=0}^{N-1} u(n) - Σ_{l=0}^{M-1} λ(l) v(l)                      (8.197)



The above problem is now unconstrained in λ(m) and can be solved by invoking
several different algorithms from optimization theory. One example is a coordinate
ascent algorithm, where constraints are enforced one by one in cyclic iterations,
giving [49]

    x_j(n) = x_{j-1}(n) [ v(m) / Σ_{k=0}^{N-1} h(m, k) x_{j-1}(k) ]^{h(m, n)}    (8.198)

where m = j modulo M and j = 0, 1, .... At the jth iteration, x_j(n)
is updated for all n and a fixed m. After a full cycle through the M constraints, the
iterations continue cyclically, updating the constraints. Convergence to the true
solution is often slow but is assured as j → ∞ for 0 ≤ h(m, n) ≤ 1. Since the PSF is
nonnegative, this condition is easily satisfied by scaling the observations appropriately.
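A small numerical sketch of this cyclic multiplicative iteration follows; the 3 × 5 blur matrix and object below are invented for illustration, and the rows already satisfy 0 ≤ h(m, n) ≤ 1, so no rescaling is needed:

```python
import numpy as np

def max_entropy_cyclic(H, v, n_iter=600):
    """Cyclic multiplicative updates in the style of eq. (8.198).

    Assumes H >= 0 with entries in [0, 1] and v > 0; the iterate
    stays positive by construction.
    """
    M, N = H.shape
    x = np.ones(N)                           # positive starting point
    for j in range(n_iter):
        m = j % M                            # enforce one constraint per step
        x = x * (v[m] / (H[m] @ x)) ** H[m]  # multiplicative update
    return x

# toy blur: 3-tap nonnegative PSF acting on a 5-point object
H = np.array([[0.25, 0.50, 0.25, 0.00, 0.00],
              [0.00, 0.25, 0.50, 0.25, 0.00],
              [0.00, 0.00, 0.25, 0.50, 0.25]])
u_true = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
v = H @ u_true
u_hat = max_entropy_cyclic(H, v)
print(np.abs(H @ u_hat - v).max())           # constraint residual
```

Because each update is multiplicative, any starting point with positive entries yields a positive estimate at every iteration, which is the property emphasized in the text.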
• •

Log-Entropy Restoration

There is another maximum entropy restoration problem, which maximizes

    E = Σ_{n=0}^{N-1} log u(n)                                               (8.199)

subject to the constraints of (8.195). The solution now is obtained by solving the
nonlinear equations

    u(n) = 1 / Σ_{l=0}^{M-1} h(l, n) λ(l)                                    (8.200)

    J₁(λ) = Σ_{m=0}^{M-1} λ(m) v(m) - Σ_{n=0}^{N-1} log[ Σ_{l=0}^{M-1} h(l, n) λ(l) ]    (8.201)

Once again an iterative gradient or any other suitable method may be chosen to
maximize (8.201). A coordinate ascent method similar to (8.198) yields the iterative
solution [50]

    u_{j+1}(n) = u_j(n) / [1 + α_j h(m, n) u_j(n)],    m = j modulo M,
                 n = 0, 1, ..., N - 1                                        (8.202)

where α_j is determined such that the denominator term is positive and the constraint

    Σ_{n=0}^{N-1} h(m, n) u_{j+1}(n) = v(m)                                  (8.203)

is satisfied at each iteration. This means we must solve for the positive root of the
nonlinear equation



    f(α_j) ≜ Σ_{n=0}^{N-1} h(m, n) u_j(n) / [1 + α_j h(m, n) u_j(n)] - v(m) = 0    (8.204)

As before, the convergence, although slow, is assured as j → ∞. For h(m, n) > 0,
which is true for PSFs, this algorithm guarantees a positive estimate at any iteration
step. The speed of convergence can be improved by going to the gradient algorithm
[51]
    λ_{j+1}(m) = λ_j(m) + α_j g_j(m),    j = 0, 1, ...                       (8.205)

    g_j(m) = v(m) - Σ_{n=0}^{N-1} h(m, n) u_j(n)                             (8.206)

    u_j(n) = 1 / Σ_{m=0}^{M-1} h(m, n) λ_j(m)                                (8.207)

where λ₀(m) are chosen so that u₀(n) is positive and α_j is a positive root of the
equation

    f(α_j) ≜ Σ_{k=0}^{N-1} G_j(k) [A_j(k) + α_j G_j(k)]⁻¹ = 0                (8.208)

where

    G_j(k) ≜ Σ_{m=0}^{M-1} h(m, k) g_j(m),    A_j(k) = Σ_{m=0}^{M-1} h(m, k) λ_j(m)    (8.209)

The search for α_j can be restricted to the interval [0, max_k {G_j(k)/A_j(k)}].
This maximum entropy problem appears often in the theory of spectral esti-
mation (see Problem 8.26b). The foregoing algorithms are valid in multidimensions
if u(n) and v(m) are sequences obtained by suitable ordering of elements of the
multidimensional arrays u(i, j, ...) and v(i, j, ...), respectively.
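The log-entropy coordinate iteration (8.202)-(8.204) can be sketched in the same toy setting. Here the root for α at each step is found by simple bisection, which is one of several "suitable methods"; all problem data are invented for illustration:

```python
import numpy as np

def log_entropy_cyclic(H, v, n_sweeps=300):
    """Cyclically enforce sum_n h(m,n) u(n) = v(m) via u/(1 + alpha*h*u)."""
    M, N = H.shape
    u = np.ones(N)                          # positive starting point
    for j in range(n_sweeps * M):
        m = j % M
        hu = H[m] * u
        # residual of the m-th constraint as a function of alpha;
        # it is decreasing in alpha, so bisection applies
        def r(alpha):
            return (hu / (1.0 + alpha * hu)).sum() - v[m]
        lo = -0.99 / hu.max()               # keeps all denominators positive
        hi = 1.0
        while r(hi) > 0:                    # expand until the root is bracketed
            hi *= 2.0
        for _ in range(60):                 # bisection for the root
            mid = 0.5 * (lo + hi)
            if r(mid) > 0:
                lo = mid
            else:
                hi = mid
        u = u / (1.0 + 0.5 * (lo + hi) * hu)
    return u

H = np.array([[0.25, 0.50, 0.25, 0.00, 0.00],
              [0.00, 0.25, 0.50, 0.25, 0.00],
              [0.00, 0.00, 0.25, 0.50, 0.25]])
u_true = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
v = H @ u_true
u_hat = log_entropy_cyclic(H, v)
```

Since every denominator is kept positive, the estimate remains positive at each step, matching the guarantee stated for h(m, n) > 0.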
8.15 BAYESIAN METHODS
In many imaging situations (for instance, image recording by film) the observation
model is nonlinear, of the form

    v = f(H u) + η                                                           (8.210)

where f(x) is a nonlinear function of x. The a posteriori conditional density is given by
Bayes' rule:

    p(u|v) = p(v|u) p(u) / p(v)                                              (8.211)



Since the observation model is nonlinear, it is difficult to obtain the marginal density p(v),
even when u and η are Gaussian. (In the linear case p(u|v) is easily obtained, since
it is Gaussian if u and η are.) However, the MAP and ML estimates do not require
p(v) and are therefore easier to obtain.

Under the assumption of Gaussian statistics for u and η, with covariances R_u
and R_η, respectively, the ML and MAP estimates can be shown to be the solutions
of the following equations:
ML estimate, u_ML:    Hᵀ D R_η⁻¹ [v - f(H u_ML)] = 0                          (8.212)

where

    D ≜ Diag{ ∂f(x)/∂x |_{x = w_i} }                                         (8.213)

and w_i are the elements of the vector w ≜ H u_ML.

MAP estimate, u_MAP:    u_MAP = μ_u + R_u Hᵀ D R_η⁻¹ [v - f(H u_MAP)]         (8.214)

where μ_u is the mean of u and D is defined in (8.213), but now w ≜ H u_MAP.

Since these equations are nonlinear, an alternative is to maximize the appro-
priate log densities. For example, a gradient algorithm for u_MAP is

    u_{j+1} = u_j + α_j { Hᵀ D_j R_η⁻¹ [v - f(H u_j)] - R_u⁻¹ [u_j - μ_u] }   (8.215)

where α_j > 0 and D_j is evaluated at w_j ≜ H u_j.

Remarks

If the function f(x) is linear, say f(x) = x, and R_η = σ_η² I, then u_ML reduces to the
least squares solution

    HᵀH u_ML = Hᵀ v                                                          (8.216)

and the MAP estimate reduces to the Wiener filter output for zero mean noise [see
(8.87)],

    u_MAP = μ_u + B (v - μ_v)                                                (8.217)

where B ≜ (R_u⁻¹ + Hᵀ R_η⁻¹ H)⁻¹ Hᵀ R_η⁻¹.
In practice, μ_v may be estimated as a local average of v, and μ_u ≈ H⁻ f⁻¹(μ_v),
where H⁻ is the generalized inverse of H.
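A gradient sketch of MAP estimation for a nonlinear sensor, in the spirit of the equations above; the square-root nonlinearity, toy PSF, covariances, noise levels, and step size are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
H = 0.8 * np.eye(N) + 0.2 * np.eye(N, k=1)     # toy PSF matrix
f = np.sqrt                                    # film-like sensor nonlinearity
fprime = lambda x: 0.5 / np.sqrt(x)            # its derivative, for D

mu_u = np.full(N, 2.0)                         # prior mean of the object
Ru_inv = np.eye(N) / 0.5                       # prior covariance 0.5 I
Rn_inv = np.eye(N) / 0.01                      # noise covariance 0.01 I

u_true = mu_u + 0.3 * rng.standard_normal(N)
v = f(H @ u_true) + 0.1 * rng.standard_normal(N)

u = mu_u.copy()
for _ in range(3000):
    w = H @ u
    D = np.diag(fprime(w))                     # D evaluated at w = H u
    # ascent direction on the Gaussian log posterior
    grad = H.T @ D @ Rn_inv @ (v - f(w)) - Ru_inv @ (u - mu_u)
    u = u + 0.02 * grad
```

At convergence the gradient vanishes, so the limit satisfies the MAP fixed-point condition: the data-fit term Hᵀ D R_η⁻¹ [v - f(Hu)] balances the prior pull R_u⁻¹ [u - μ_u].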



8.16 COORDINATE TRANSFORMATION AND GEOMETRIC CORRECTION

In many situations a geometric transformation of the image coordinates is required.
An example is in the remote sensing of images via satellites, where the earth's
rotation relative to the scanning geometry of the sensor generates an image on a
distorted raster [55]. The problem then is to estimate a function f(x', y') given at
discrete locations of (x, y), where x' = h₁(x, y), y' = h₂(x, y) describe the geometric



transformation between the two coordinate systems. Common examples of geo-
metric transformations are translation, scaling, rotation, skew, and reflection, all of
which can be represented by the affine transformation

    [x']   [a₁₁  a₁₂] [x]   [b₁]
    [y'] = [a₂₁  a₂₂] [y] + [b₂]                                             (8.218)

In principle, the image function in (x', y') coordinates can be obtained from its
values on the (x, y) grid by an appropriate interpolation method followed by
resampling on the desired grid. Some commonly used algorithms for interpolation
at a point Q (Fig. 8.28) from samples at P₁, P₂, P₃, and P₄ are as follows.

1. Nearest neighbor:

    F(Q) = F(P_k),    k: min_j {d_j} = d_k                                   (8.219)

that is, P_k is the nearest neighbor of Q.

2. Linear interpolation:

    F(Q) = [ Σ_{k=1}^{4} F(P_k)/d_k ] / [ Σ_{k=1}^{4} 1/d_k ]                (8.220)

3. Bilinear interpolation:

    F(Q) = [F(Q₁)/d₅ + F(Q₂)/d₆] / [(1/d₅) + (1/d₆)]
         = [F(Q₁)d₆ + F(Q₂)d₅] / (d₅ + d₆)                                   (8.221a)

where

    F(Q₁) = [F(P₁)d₄ + F(P₄)d₁] / (d₁ + d₄),
    F(Q₂) = [F(P₂)d₃ + F(P₃)d₂] / (d₂ + d₃)                                  (8.221b)
These methods are local and require minimal computation. However, they
would be inappropriate if there were significant noise in the data.
Smoothing splines or global interpolation methods, which use all the available
data, would then be more suitable.
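On a unit-spaced grid the bilinear rule reduces to two linear interpolations, one along each axis. A minimal sketch (the indexing convention is my own, and only interior points are handled):

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img at fractional (x, y); row index is y, column index is x."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    p = img[y0:y0 + 2, x0:x0 + 2]              # the four neighboring samples
    top = (1 - dx) * p[0, 0] + dx * p[0, 1]    # interpolate along x ...
    bot = (1 - dx) * p[1, 0] + dx * p[1, 1]
    return (1 - dy) * top + dy * bot           # ... then along y

img = np.arange(16, dtype=float).reshape(4, 4)   # img[y, x] = 4*y + x
print(bilinear(img, 1.5, 2.0))                   # halfway between 9 and 10
```

Bilinear interpolation is exact for any image that is linear in x and y, which is why it behaves well on smooth data yet still only needs the four nearest samples.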

For many imaging systems the PSF is spatially varying in Cartesian coordi-
nates but becomes spatially invariant in a different coordinate system, for example,
in systems with spherical aberration, coma, astigmatism, and the like [56, 57].
These and certain other distortions (such as that due to rotational motion) may be

Figure 8.28 Interpolation at Q.



Figure 8.29 Spatially variant filtering by coordinate transformation: input image g(x, y) → coordinate transformation → g(ξ, η) → spatially invariant filter → f̂(ξ, η) → inverse coordinate transformation → f̂(x, y).



corrected by the coordinate transformation method shown in Fig. 8.29. The input
image is transformed from (x, y) to (ξ, η) coordinates, where it is possible to filter by
a spatially invariant system. The filter output is then inverse transformed to obtain
the estimate in the original coordinates.

For example, the image of an object f(r, θ) obtained by an axially symmetric
imaging system with coma aberration is

    g(r, θ) = ∫₀^∞ ∫₀^{2π} a(r₀) h(r/r₀ⁿ, θ - θ₀) f(r₀, θ₀) r₀ dr₀ dθ₀        (8.222)

where n ≥ 1 and (r, θ) are the polar coordinates. The PSF is spatially varying in
(x, y). In (r, θ) it is shift invariant in θ but spatially variant in r. Under the loga-
rithmic transformations

    ξ = ln r,    ξ₀ = n ln r₀                                                (8.223)

the ratio r/r₀ⁿ becomes a function of the displacement ξ - ξ₀, and (8.222) can be
written as a convolution integral:

    g̃(ξ, θ) = ∫∫ h̃(ξ - ξ₀, θ - θ₀) f̃(ξ₀, θ₀) dξ₀ dθ₀                          (8.224)

where g̃, h̃, and f̃ denote the corresponding functions in the transformed coordinates.
Spatially invariant filters can now be designed to restore f̃(ξ, θ) from g̃(ξ, θ).
The comet Halley shown on the front cover of this text was reconstructed
from data gathered by NASA's Pioneer Venus Orbiter in 1986. The observed data
were severely distorted, with several samples missing due to the activity of solar
flares. The restored image was obtained by proper coordinate transformation,
bilinear interpolation, and pseudocoloring.

8.17 BLIND DECONVOLUTION [58, 59]

Image restoration when the PSF is unknown is a difficult nonlinear restoration
problem. For spatially invariant imaging systems, the power spectral density of the
observed image obeys (8.40), which gives

    log|H|² = log(S_vv - S_ηη) - log S_uu                                    (8.225)


If the additive noise is small, we can estimate

    log|H| = (1/2M) Σ_{k=1}^{M} (log|V_k|² - log|U_k|²)
           = (1/M) Σ_{k=1}^{M} [log|V_k| - log|U_k|]                         (8.226)

where V_k and U_k, k = 1, ..., M, are obtained by dividing the images v(m, n) and u(m, n)
into M blocks and then Fourier transforming them. Therefore, identification of H
requires power spectrum estimation of the object and the observations. Restoration
methods that are based on unknown H are called blind deconvolution methods.
Note that this method gives only the magnitude of H. In many imaging situations
the phase of H is zero or unimportant, such as when H represents average atmos-
pheric turbulence, camera misfocus (or lens aberration), or uniform motion (linear
phase or delay). In such cases it is sufficient to estimate the MTF, which can then be
used in the Wiener filter equation. Techniques that also identify the phase are
possible in special situations, but, in general, phase estimation is a difficult task.
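A sketch of the block-averaged estimate (8.226): average the difference of log block spectra of the observation and of a prototype image assumed to share the object's statistics. The blockwise relation V ≈ HU is itself an approximation; the toy check below uses one full-size block with a circular blur, so there the relation is exact:

```python
import numpy as np

def log_mag_H(v, u, block):
    """Estimate log|H(k, l)| by averaging log-spectrum differences of
    corresponding blocks of the blurred image v and a prototype u."""
    acc = np.zeros((block, block))
    M = 0
    for i in range(0, v.shape[0] - block + 1, block):
        for j in range(0, v.shape[1] - block + 1, block):
            V = np.fft.fft2(v[i:i + block, j:j + block])
            U = np.fft.fft2(u[i:i + block, j:j + block])
            acc += np.log(np.abs(V) + 1e-12) - np.log(np.abs(U) + 1e-12)
            M += 1
    return acc / M                      # estimate of log|H(k, l)|

# toy check: circular blur of a random field, one full-size block
rng = np.random.default_rng(3)
u = rng.standard_normal((16, 16))
h = np.zeros((16, 16))
h[0, 0], h[0, 1], h[1, 0] = 0.7, 0.15, 0.15   # small nonnegative PSF
Hf = np.fft.fft2(h)
v = np.fft.ifft2(Hf * np.fft.fft2(u)).real    # circular convolution
est = log_mag_H(v, u, block=16)
```

The estimate recovers only |H|, never its phase, which is exactly the limitation discussed above.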

8.18 EXTRAPOLATION OF BANDLIMITED SIGNALS

Extrapolation means extending a signal outside a known interval. Extrapolation in
the spatial coordinates could improve the spectral resolution of an image, whereas
frequency domain extrapolation could improve the spatial resolution. Such prob-
lems arise in power spectrum estimation, resolution of closely spaced objects in
radio astronomy, radar target detection, geophysical exploration, and the like.

Analytic Continuation

A bandlimited signal f(x) can be determined completely from the knowledge of it
over an arbitrary finite interval [-α, α]. This follows from the fact that a band-
limited function is an analytic function because its Taylor series

    f(x + Δ) = Σ_{n=0}^{∞} (Δⁿ/n!) f⁽ⁿ⁾(x)                                   (8.227)

is convergent for all x and Δ. By letting x ∈ [-α, α] and x + Δ > α, (8.227) can be
used to extrapolate f(x) anywhere outside the interval [-α, α].

Super-resolution

The foregoing ideas can also be applied to a space-limited function (i.e., f(x) = 0 for
|x| > α) whose Fourier transform is given over a finite frequency band. This means,
theoretically, that a finite object imaged by a diffraction-limited system can be
perfectly resolved by extrapolation in the Fourier domain. Extrapolation of the
spectrum of an object beyond the diffraction limit of the imaging system is called
super-resolution.




Extrapolation via Prolate Spheroidal Wave Functions (PSWFs) [60]

The high-order derivatives in (8.227) are extremely sensitive to noise and truncation
errors. This makes the analytic continuation method impractical for signal extrapo-
lation. An alternative is to evaluate f(x) by the series expansion

    f(x) = Σ_{n=0}^{∞} a_n φ_n(x),    ∀x                                     (8.228)

    a_n = (1/λ_n) ∫_{-α}^{α} f(x) φ_n(x) dx                                  (8.229)

where φ_n(x) are called the prolate spheroidal wave functions (PSWFs). These func-
tions are bandlimited, orthonormal over (-∞, ∞), and complete in the class of
bandlimited functions. Moreover, in the interval -α ≤ x ≤ α, the φ_n(x) are complete
and orthogonal, with (φ_n, φ_m) = λ_n δ(n - m), where λ_n > 0 is the norm ‖φ_n‖². Using
this property in (8.228), a_n can be obtained from the knowledge of f(x) over [-α, α]
via (8.229). Given a_n in (8.228), f(x) can be extrapolated for all values of x.

In practice, we would truncate the above series to a finite but sufficient
number of terms. In the presence of noise, the extrapolation error increases rapidly
with the number of terms in the series (Problem 8.28). Also, the numerical
computation of the PSWFs themselves is a difficult task, which is marred by its own
truncation and round-off errors. Because of these difficulties, the preceding
extrapolation algorithm is also quite impractical. However, the PSWFs remain
fundamentally important for the analysis of bandlimited signals.

Extrapolation by Error Energy Reduction [61, 62]

An interesting and more practical extrapolation algorithm is based on a principle of
successive energy reduction (Fig. 8.30). First the given function g(x) ≜ g₀(x) = f(x),
x ∈ [-α, α], is low-pass filtered by truncating its Fourier transform to zero outside
the interval (-ξ₀, ξ₀). This reduces the error energy in f₁(x), because the signal is
known to be bandlimited. To prove this, we use the Parseval formula to obtain

    ∫_{-∞}^{∞} |f(x) - g(x)|² dx = ∫ |F(ξ) - G₀(ξ)|² dξ
        = ∫_{-ξ₀}^{ξ₀} |F(ξ) - F₁(ξ)|² dξ + ∫_{|ξ|>ξ₀} |G₀(ξ)|² dξ
        > ∫ |F(ξ) - F₁(ξ)|² dξ = ∫ |f(x) - f₁(x)|² dx                        (8.230)

Now f₁(x) is bandlimited but does not match the observations over [-α, α]. The
error energy is reduced once again if f₁(x) is substituted by f(x) over -α ≤ x ≤ α.
Letting 𝒮 denote this space-limiting operation, we obtain

    g₁(x) ≜ f₁(x) - 𝒮f₁(x) + g₀(x)                                           (8.231)

and

    ∫_{-∞}^{∞} |f(x) - g₁(x)|² dx ≤ ∫_{-∞}^{∞} |f(x) - f₁(x)|² dx            (8.232)




Figure 8.30 Extrapolation by successive energy reduction.
Now g₁(x), not being bandlimited anymore, is low-pass filtered, and the preceding
procedure is repeated. This gives the iterative algorithm

    f_n(x) = ℬ g_{n-1}(x),    g₀(x) = g(x) ≜ 𝒮 f(x)
                                                                             (8.233)
    g_n(x) = g₀(x) + (ℐ - 𝒮) f_n(x),    n = 1, 2, ...

where ℐ is the identity operator and ℬ is the bandlimiting operator. In the limit as
n → ∞, both f_n(x) and g_n(x) converge to f(x) in the mean square sense [62]. It can be
shown that this algorithm is a special case of a gradient algorithm associated with a
least squares minimization problem [65]. This algorithm is also called the method of
alternating projections because the iterates are projected alternately in the space of
bandlimited and space-limited functions. Such algorithms are useful for solving
image restoration problems that include a certain class of constraints [53, 63, 64].
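On sampled data the iteration (8.233) can be sketched with the DFT standing in for the ideal bandlimiting operator; the test signal, cutoff, and iteration count below are illustrative choices:

```python
import numpy as np

def energy_reduction_extrapolate(obs, mask, keep_bins, n_iter=300):
    """Alternate bandlimiting (B) and reimposing the known samples (S)."""
    n = obs.size
    band = np.zeros(n, dtype=bool)
    band[:keep_bins + 1] = True            # keep low DFT bins ...
    band[-keep_bins:] = True               # ... and their conjugates
    f = np.where(mask, obs, 0.0)           # g_0: zero outside the data
    for _ in range(n_iter):
        f = np.fft.ifft(np.fft.fft(f) * band).real   # bandlimiting step
        f[mask] = obs[mask]                          # data (space-limit) step
    return f

n = 128
m = np.arange(n)
y = np.sin(2 * np.pi * 4 * m / n)          # bandlimited: DFT bin 4 only
mask = np.zeros(n, dtype=bool)
mask[48:81] = True                         # 33 known central samples
y_hat = energy_reduction_extrapolate(y * mask, mask, keep_bins=6)
```

Each pass projects onto the bandlimited set and then back onto the set of signals agreeing with the data, so the error energy is nonincreasing, which is the alternating-projections interpretation given above.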



Extrapolation of Sampled Signals [65]

For a small perturbation,

    f̃(x) = f(x) + ε η(x),    ε ≠ 0                                           (8.234)

where η(x) is not bandlimited, the desired analyticity of f̃(x) is lost. Then it is
possible to find a large number of functions that approximate f(x) very closely
on the observation interval [-α, α] but differ greatly outside this interval. This
situation is inevitable when one tries to implement the extrapolation algorithms
digitally. Typically, the observed bandlimited function is oversampled, so it can be
estimated quite accurately by interpolating the finite number of samples over
[-α, α]. However, the interpolated signal cannot be bandlimited. Recognizing this
difficulty, we consider extrapolation of sampled bandlimited signals. This approach
leads to more practical extrapolation algorithms.

Definitions. A sequence y(n) is called bandlimited if its Fourier transform
Y(ω), -π ≤ ω ≤ π, satisfies the condition

    Y(ω) = 0,    ω₁ < |ω| ≤ π                                                (8.235)

This implies that y(n) comes from a bandlimited signal that has been oversampled
with respect to its Nyquist rate. Analogous to ℬ and 𝒮, the bandlimiting and
space-limiting operators, denoted by L and S, respectively, are now ∞ × ∞ and
(2M + 1) × ∞ matrix operators defined as

    [Ly]_m = Σ_{n=-∞}^{∞} [sin(m - n)ω₁ / π(m - n)] y(n)
        ⇔  ℱ{[Ly]_m} = { Y(ω),  |ω| < ω₁
                          0,     ω₁ < |ω| ≤ π                                 (8.236)

    [Sy]_m = y(m),    -M ≤ m ≤ M                                             (8.237)

By definition, then, L is symmetric and idempotent, that is, Lᵀ = L and L² = L
(repeated ideal low-pass filtering produces the same result).

The Extrapolation Problem. Let y(m) be a bandlimited sequence. We are
given a set of space-limited noise-free observations

    z(m) = y(m),    -M ≤ m ≤ M                                               (8.238)

Given z(m), extrapolate y(m) outside the interval [-M, M].

Minimum Norm Least Squares (MNLS) Extrapolation



,

Let z denote the (2M + 1) x 1 vector of observations and let y denote the infinite
vector of {y{n), Vn}. then z = Sy. Since yen) is a bandlimited sequence, Ly = y, and

we can wnte
z = SLy = Ay, A a SL (8.239)

. This can be viewed as an underdetermined image restoration problem, where A
represents a (2M + 1) x 00 PSF matrix. A unique solution that is bandlimited and

326 Image Filtering and Restoration Chap. 8



reproduces the (noiseless) signal over the interval [- M, M] is the MNLS solution. It
is given explicitly as ,
s: ~ A1"[A ATr 1z = VST[SLLTSTr 1z
(8.240)
= LST[SLS'r l z ~ LSTi, -I Z
. .
. where L is a (2M + 1) x (2M + 1), positive definite, Toeplitz matrix with elements
'

{sinl1)l(m -n)/'ir(m- n),":'M Sm, n SM}. The matrix


A+ ~ A T[AA Tt l = LSTi,-1 (8.241)
is the pseudoinverse of A and is called the pseudpinverse extrapolation filter. The
extrapolation algorithm requires first obtaining x ~ L-I Z and then low-pass filtering
the sequence {x (m), -M sz m S M} to obtain the extrapolation as

~ sin WI (m - j) (.)
y
+ (
m-""
) _
.. j=-M
.
'ir(m - j)
Xj, Iml>M (8.242)

This means the MNLS extrapolator is a time-varying FIR filter {L -1]".;I'l' followed by
,
a zero padder (81) and an ideal low-pass filter (L) (Fig. 8.31). ,
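The whole filter is small enough to sketch directly. With reg = 0 this is the MNLS extrapolator of (8.240)-(8.242); reg = σ_η²/σ² > 0 gives the mean square extrapolator of (8.248). A tiny ridge is used in the example because L̃ is badly conditioned (the signal, bandwidth, and sizes are illustrative):

```python
import numpy as np

def sinc_matrix(a, b, w1):
    """Entries sin(w1 (a_i - b_j)) / (pi (a_i - b_j)), with w1/pi on the diagonal."""
    d = np.subtract.outer(a, b).astype(float)
    return (w1 / np.pi) * np.sinc(w1 * d / np.pi)   # np.sinc(x)=sin(pi x)/(pi x)

def mnls_extrapolate(z, M, w1, n, reg=0.0):
    m = np.arange(-M, M + 1)
    Lt = sinc_matrix(m, m, w1)             # (2M+1)x(2M+1) Toeplitz matrix L~
    x = np.linalg.solve(Lt + reg * np.eye(2 * M + 1), z)   # FIR stage
    grid = np.arange(-n, n + 1)
    return sinc_matrix(grid, m, w1) @ x    # zero padding + ideal LPF in one step

M, n, w1 = 8, 16, 0.25 * np.pi
m = np.arange(-M, M + 1)
y_true = np.sinc(0.2 * np.arange(-n, n + 1))   # bandwidth 0.2*pi < w1
z = np.sinc(0.2 * m)                           # observations on [-M, M]
y_hat = mnls_extrapolate(z, M, w1, n, reg=1e-8)
```

With reg = 0 the observed samples are reproduced exactly, since S y₊ = L̃ L̃⁻¹ z = z; the ridge trades a tiny interior misfit for numerical stability, anticipating the ill-conditioning discussed next.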

Iterative Algorithms

Although L̃ is positive definite, it becomes increasingly ill-conditioned as M in-
creases. In such instances, iterative algorithms that give a stabilized inverse of L̃ are
useful [65]. An example is the conjugate gradient algorithm obtained by substi-
tuting A = L̃ and g₀ = -z into (8.136). At n = 2M + 1, let x ≜ u_n. Then y_n ≜ LSᵀ x
converges to y₊. Whenever L̃ is ill-conditioned, the algorithm is terminated when the
residual becomes small for n < 2M + 1. Compared to the energy reduction algorithm,
the iterations here are performed on finite-size vectors, and only a finite number of
iterations are required for convergence.

Discrete Prolate Spheroidal Sequences (DPSS)

Similar to the PSWF expansion in the continuous case, it is possible to obtain the
MNLS extrapolation via the expansion

    y₊(m) = Σ_{k=1}^{2M+1} a_k φ_k(m),    ∀m                                 (8.243)

where the φ_k(m), called the discrete prolate spheroidal sequences (DPSS), are the
bandlimited eigenvectors of L̃,

    L̃ φ_k = λ_k φ_k                                                          (8.244)

and the coefficients follow from the observations as

    a_k = (1/λ_k) Σ_{m=-M}^{M} φ_k(m) z(m)                                   (8.245)

Figure 8.31 MNLS extrapolation: z(m) → FIR filter [L̃⁻¹] → x(m) → zero padding (Sᵀ) → ideal low-pass filter (L) → y₊(m).

Mean Square Extrapolation

In the presence of additive independent noise, the observation equation becomes

    z = Ay + η = SLy + η                                                     (8.246)

The best linear mean square extrapolator is then given by the Wiener filter

    ŷ = R_y Sᵀ[S R_y Sᵀ + R_η]⁻¹ z                                            (8.247)

where R_y and R_η are the autocorrelation matrices of y and η, respectively. If the
autocorrelation of y is unknown, it is convenient to assume R_y = σ²L (that is, the
power spectrum of y is bandlimited and constant). Then, assuming the noise to be
white with R_η = σ_η² I, we obtain

    ŷ = LSᵀ[SLSᵀ + (σ_η²/σ²)I]⁻¹ z = LSᵀ[L̃ + (σ_η²/σ²)I]⁻¹ z                  (8.248)

If σ_η² → 0, then ŷ → y₊, the MNLS extrapolation. A recursive Kalman filter imple-
mentation of (8.248) is also possible [65].
Example 8.9

Figure 8.32a shows the signal y(m) = sin(0.0792πm) + sin(0.068πm), which is given
for -8 ≤ m ≤ 8 (Fig. 8.32b) and is assumed to have a bandwidth of less than ω₁ = 0.1π.
Figures 8.32c and 8.32d show the extrapolations obtained via the iterative energy
reduction and conjugate gradient algorithms. As expected, the latter algorithm has
superior convergence. When the observations contain noise (13 dB below the signal
power), these algorithms tend to be unstable (Fig. 8.32e), but the mean square extrapo-
lation filter (Fig. 8.32f) improves the result. Comparison of Figures 8.32d and 8.32f
shows that the extrapolated region can be severely limited due to noise.

Generalization to Two Dimensions


The foregoing extrapolation algorithms can be easily generalized to two (or
higher) dimensions when the bandlimited and space-limited regions are rectangles

Figure 8.32 Comparison of extrapolation algorithms. (a) Actual signal y(m). (b) Observations z(m). (c) Extrapolation after 30 iterations of the energy reduction algorithm. (d) Extrapolation after 5 iterations of the conjugate gradient algorithm. (e) Extrapolation in the presence of noise (13 dB below the signal) using the extrapolation matrix. (f) Stabilized extrapolation in the presence of noise via the mean square extrapolation filter.
(or hyper-rectangles). Consider a two-dimensional sequence y(m, n), which is
known over a finite observation window [-M, M] × [-M, M] and bandlimited to
[-ω₁, ω₁] × [-ω₁, ω₁]. Let z(m, n) = y(m, n), -M ≤ m, n ≤ M. Then, using the
operators L and S, we can write

    Z = SLYLSᵀ                                                               (8.249)

where Z is a (2M + 1) × (2M + 1) matrix containing the z(m, n). Defining z and
y as the row-ordered mappings of Z and Y, 𝒮 ≜ S ⊗ S, and ℒ ≜ L ⊗ L, we get

    z = 𝒮ℒy ≜ 𝒜y    (𝒜 ≜ 𝒮ℒ)                                                 (8.250)

Similar to L, the two-dimensional low-pass filter matrix ℒ is symmetric and idem-
potent. All the foregoing one-dimensional algorithms can be recast in terms of 𝒮
and ℒ, from which the following two-dimensional versions follow.

MNLS extrapolation:

    Y₊ = LSᵀ[L̃⁻¹ Z L̃⁻¹]SL = A⁺ Z A⁺ᵀ                                          (8.251)

Conjugate Gradient Algorithm. Same as (8.137) with A₁ = A₂ ≜ L̃ and G₀ ≜ -Z.
Then Y_n ≜ LSᵀ U_n SL converges to Y₊ at n = 2M + 1.

Mean Square Extrapolation Filter. Assume R_y = σ²ℒ, R_η = σ_η²(I ⊗ I). Then

    x = [(L̃ ⊗ L̃) + (σ_η²/σ²)(I ⊗ I)]⁻¹ z
                                                                             (8.252)
    ŷ = [(LSᵀ) ⊗ (LSᵀ)] x  ⇒  Ŷ = LSᵀ X SL

where X is the matrix mapping of x. Now the matrix to be inverted is (2M + 1) ×
(2M + 1) block Toeplitz with basic dimension (2M + 1) × (2M + 1). The DPSS φ_k
of (8.244) can be useful for this inversion. The two-dimensional DPSS are given by
the Kronecker products φ_k ⊗ φ_l.
8.19 SUMMARY

In this chapter we have considered linear and nonlinear image restoration tech-
niques. Among the linear restoration filters, we have considered the Wiener filter
and have shown that other filters such as the pseudoinverse, constrained least
squares, and smoothing splines also belong to the class of Wiener filters. For linear
observation models, if the PSF is not very broad, then the FIR Wiener filter is quite
efficient and can be adapted to handle spatially varying PSFs. Otherwise, non-
recursive implementations via the FFT (or other fast transforms) should be suitable.
Iterative methods are most useful for the more general spatially varying filters.
Recursive filters and semirecursive filters offer alternate realizations of the Wiener
filter as well as other local estimators.

We saw that a large number of image restoration problems can be reduced to
solving (approximately) a linear system of equations

    Hu ≈ v



for the unknowns u from the observations v. Some special types of restoration
problems include extrapolation of bandlimited signals and image reconstruction
from projections. The latter class of problems will be studied in Chapter 10.
Among the nonlinear techniques, we have considered homomorphic filtering
for speckle reduction, maximum entropy restoration, ML and MAP estimation for
nonlinear imaging models, and a blind deconvolution method for restoration with
unknown blur.

PROBLEMS

8.1 (Coherent image formation) According to the Fresnel theory of diffraction, the complex
amplitude field of an object u(x', y'), illuminated by a uniform monochromatic source
of light at wavelength λ at a distance z, is given by

    v(x, y) = c₁ ∫∫_{-∞}^{∞} u(x', y') exp{ (jk/2z) [(x - x')² + (y - y')²] } dx' dy'

where c₁ is a complex quantity with |c₁| = 1 and k = 2π/λ. Show that if z is much greater
than the size of the object, then v(x, y) is a coherent image whose intensity is propor-
tional to the magnitude squared of the Fourier transform of the object. This is also
called the Fraunhofer diffraction pattern of the object.
8.2 (Optical filtering) An object u(x, y) illuminated by coherent light of wavelength λ
and imaged by a lens can be modeled by the spatially invariant system shown in Fig.
P8.2, where

    h_k(x, y) = (1/jλd_k) exp{ (jπ/λd_k)(x² + y²) },    k = 0, 1, 2

    l(x, y) = t(x, y) p(x, y),    t(x, y) = exp{ -(jπ/λd₂)(x² + y²) }

p(x, y) is a square aperture of width a, and d₂ is the focal length of the lens. Find the
incoherent impulse response of this system when 1/d₀ + 1/d₁ = 1/d₂ and show how this
method may be used to filter an image optically. (Hint: Recall that the incoherent
impulse response is the squared magnitude of the coherent impulse response.)

Figure P8.2 Coherent imaging system: object plane u(x, y), free-space propagation h₀(x, y) over distance d₀, lens with exit pupil l(x, y), propagation h₁(x, y) over distance d₁, image plane v(x, y).



8.3 In the observation model of Equations (8.1)-(8.3), suppose the image v(x, y) is
sampled by a uniform square grid with spacing Δ, such that

    v(m, n) = (1/Δ²) ∫₀^Δ ∫₀^Δ v(mΔ + x, nΔ + y) dx dy

If u(m, n), w(m, n), η'(m, n), and η″(m, n) are defined similarly and u(x, y), v(x, y),
and w(x, y) are approximated as piecewise constant functions over this grid, show that
(8.1)-(8.3) can be approximated by (8.18)-(8.20), where

    h(m, n; k, l) = (1/Δ²) ∫₀^Δ ∫₀^Δ ∫₀^Δ ∫₀^Δ h(mΔ + x, nΔ + y; kΔ + x', lΔ + y') dx dy dx' dy'

and η'(m, n) and η″(m, n) are white noise discrete random fields.
8.4 Show that the inverse filter of a spatially invariant system will also be spatially invariant.

8.5 A digital image stored in a computer is to be recorded on photographic film using a
flying spot scanner. The effect of spot size of the scanner and the film nonlinearity is
equivalent to first passing the image through the model of Fig. P8.5 and then recording
it on a perfect recording system. Find the inverse filter that should be inserted before
recording the image to compensate for (a) scanner spot size, assuming γ = -1, and
(b) film nonlinearity, ignoring the effect of spot size.

Figure P8.5 Recording system model: input image u(m, n) → inverse filter → spot-size
blur h(m, n) → w → film nonlinearity v = αw^γ → perfect recording system, where

    h(m, n) = δ(m, n) + ¼[δ(m - 1, n) + δ(m + 1, n) + δ(m, n - 1) + δ(m, n + 1)]

8.6 Prove the first two statements made under Remarks in Section 8.3.
8.7 A motion blurred image (see Example 8.1) is observed in the presence of additive
white noise. What is the Wiener filter equation if the covariance function of u(x, y) is
r(x, y) = σ² exp{-0.05|x| - 0.05|y|}? Assume the mean of the object is known. Give an
algorithm for digital implementation of this filter.
8.8 Show exactly how (8.53) may be implemented using the FFT.
8.9 Starting from (8.57), show all the steps that lead to the FIR Wiener filter equation
(8.64).
8.10 (Spatially varying FIR filters) In order to derive the formulas (8.68), (8.69) for the
spatially varying FIR filter, note that the random field

    ũ(m, n) ≜ [u(m, n) - μ(m, n)] / σ(m, n)

is stationary. Assume that μ(m, n) and σ(m, n) are constant over a region W₀ =
{-2M ≤ i, j ≤ 2M}, which contains the region of support of h(m, n). Using this, write a
spatially invariant observation model with stationary covariances over W₀. Using the
results of Section 8.4, assuming ΣΣ h(m, n) = 1 and using the fact that û(m, n) should be
unbiased, prove (8.68) and (8.69) under the assumption

    μ(m, n) = μ_u(m, n) = μ_v(m, n) ≈ [1/(2M + 1)²] ΣΣ_{(i, j) ∈ W₀} v(i - m, j - n)
8.11 Show that the Wiener filter does not restore the power spectral density of the object,
whereas the geometric mean filter does when s = ½. (Hint: S_ûû = |G|² S_vv for any filter
G.) Compare the mean square errors of the two filters.
8.12* Take a low-contrast image and root filter it to enhance its high spatial frequencies.
8.13 a. If G is an arbitrary filter in (8.77), show that the average mean square error is given
by

    σ_e² = (1/4π²) ∫∫ [ |1 - G H|² S_uu + |G|² S_ηη ] dξ₁ dξ₂

b. If G is the Wiener filter, then show the minimum value of σ_e² is given by (8.82).
8.14 (Sine/cosine transform-based Wiener filtering) Consider the white noise-driven
model for N × N images

    u(m, n) = α[u(m - 1, n) + u(m + 1, n) + u(m, n - 1) + u(m, n + 1)] + ε(m, n),
        |α| < ¼,    0 ≤ m, n ≤ N - 1

    S_ε(z₁, z₂) = β²

where u(-1, n) = u(0, n), u(N, n) = u(N - 1, n), u(m, -1) = u(m, 0), u(m, N) =
u(m, N - 1).
a. Show that the cosine transform is the KL transform of u(m, n), which yields the
generalized Wiener filter gain for the noise smoothing problem as

    p(k, l) = β² / { β² + σ_η² [1 - 2α(cos kπ/N + cos lπ/N)]² },    0 ≤ k, l ≤ N - 1

b. If u(-1, n) = u(N, n) = u(m, -1) = u(m, N) = 0, show that the sine transform is
the KL transform, and find the generalized filter gain.
8.15 a. Show that the spline coefficient vector a can be obtained directly as

    a = (I + (σ_η²/λ) L Q⁻¹ Lᵀ)⁻¹ y

which is a Wiener filter if the noise in (8.96) is assumed to be white with zero mean
and variance σ_η² and if the autocorrelation matrix of y is λ[L Q⁻¹ Lᵀ]⁻¹.
b. The interpolating splines can be obtained by setting S = 0 or, equivalently, letting
λ → ∞ in (8.100). For the data of Example 8.6, find these splines.

8.16 Prove that (8.110) and (8.111) give the solution of the constrained least squares
restoration problem stated in the text.
8.17 Suppose the object u(m, n) is modeled as the output of a linear system driven by a zero
mean, unit variance white noise random field a(m, n), namely,

    q(m, n) ⊛ u(m, n) = a(m, n)

If u(m, n) is observed via (8.39) with S_η = γ, show that its Wiener filter is identical to
the least squares filter of (8.110). Write down the filter equation when q(m, n) is given
by (8.112) and show that the object model is a white noise-driven noncausal model.

Chap. 8 Problems 333
8.18 Show that (8.115) is the solution of the least squares problem defined in (8.114).
8.19 If the sequences u(m, n), h(m, n), q(m, n) are periodic over an N × N grid with DFTs
U(k, l), H(k, l), Q(k, l), ..., then show the impulse response of the constrained least
squares filter is given by the inverse DFT of

    G(k, l) = H*(k, l) / [|H(k, l)|² + γ|Q(k, l)|²],  0 ≤ k, l ≤ N − 1

where γ is obtained by solving

    (1/N²) ΣΣ γ²|Q(k, l)|²|V(k, l)|² / [|H(k, l)|² + γ|Q(k, l)|²]² = ε²
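A sketch of how 8.19 could be checked numerically: γ is the root of a function that grows monotonically with γ (its derivative is 2γ|H|²|Q|²|V|²/[·]³ ≥ 0 term by term), so bisection suffices when ε² is attainable. The helper names and random test spectra below are mine, not the book's.

```python
import numpy as np

def cls_gamma(H, Q, V, eps2, lo=0.0, hi=1.0):
    # Solve (1/N^2) sum g^2 |Q|^2 |V|^2 / (|H|^2 + g |Q|^2)^2 = eps2 by bisection.
    N2 = H.size
    def f(g):
        den = (np.abs(H) ** 2 + g * np.abs(Q) ** 2) ** 2
        return (g ** 2 * np.abs(Q) ** 2 * np.abs(V) ** 2 / den).sum() / N2
    while f(hi) < eps2:          # bracket the root (assumes eps2 is attainable)
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < eps2 else (lo, mid)
    return 0.5 * (lo + hi)

def cls_filter(H, Q, gamma):
    # Impulse response: inverse DFT of G(k,l) = H* / (|H|^2 + gamma |Q|^2).
    G = np.conj(H) / (np.abs(H) ** 2 + gamma * np.abs(Q) ** 2)
    return np.fft.ifft2(G)
```

With random N × N spectra, the recovered γ reproduces the constraint to high accuracy.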

8.20 Show that the generalized inverse defined in (8.86) is the pseudoinverse of H and is
nonunique when N₁N₂ ≤ M₁M₂.
8.21 Show that for separable PSFs, for which the observation equation can be written
as y = (H₁ ⊗ H₂)u, (8.136) yields the two-dimensional conjugate gradient algorithm
of (8.137).
8.22 Write the Kalman filtering equations for a first-order AR sequence u(n), which is
observed as

    y(n) = u(n) + (1/2)[u(n − 1) + u(n + 1)] + η(n)
8.23 (K-step interpolator) In many noise smoothing applications it is of interest to obtain
the estimate which lags the observations by K steps; that is, x̂_n ≜ E[x_n | y_{n'},
0 ≤ n' ≤ n + K]. Give a recursive algorithm for obtaining this estimate. In image
processing applications, often the one-step interpolator performs quite close to the
optimum smoother. Show that it is given by

    x̂_{n|n+1} = x̂_{n|n} + q_n A_nᵀ C_{n+1}ᵀ V_{n+1}⁻¹ s_{n+1}

8.24 In the semicausal model of (6.106) assume u(0, n), u(N + 1, n) are known (that is, the
image background is given) and the observation model of (8.171) has no blur, that is,
h(k, l) = δ(k, l). Give the complete semirecursive filtering algorithm and identify the
fast transform used.
8.25 Show that there is no speckle in images obtained by an ideal imaging system. Show
that, for a practical imaging system, the speckle size, measured by its correlation
distance, can be used to estimate the resolution (that is, the resolution cell) of the
imaging system.
8.26 a. Show that (8.193) is the solution of min{−E(u) + λ[||y − Hu||² − σ²]}.
b.* (Maximum entropy spectrum estimation) A special case of log-entropy restora-
tion is the problem of maximizing

    E ≜ (1/2π) ∫_{−π}^{π} log S(ω) dω

where S(ω) is observed as

    r(n) = (1/2π) ∫_{−π}^{π} S(ω) e^{jωn} dω,  n = 0, ±1, ..., ±p

The maximization is performed with regard to the missing observations, that is,
{r(n), |n| > p}. This problem is equivalent to extrapolating the partial sequence of
autocorrelations {r(n), |n| ≤ p} out to infinity such that the entropy E is maximized
and S(ω) = Σ_{n=−∞}^{∞} r(n)e^{−jωn} is the SDF associated with r(n). Show that the max-
imum entropy solution requires that S(z) → Ŝ(ω) = 1/[Σ_{n=−p}^{p} λ(n)z^{−n}], z = e^{jω},
that is, the Fourier series of 1/S(ω) must be truncated to ±p terms. Noting that S(z)
can be factored as β²/[A_p(z)A_p(z⁻¹)], A_p(z) ≜ 1 − Σ_{k=1}^{p} a(k)z^{−k}, show that β² and
a(k), k = 1, ..., p, are obtained by solving the AR model equations (6.13a) and
(6.13b). What is the resulting spectral density function? For an alternate direct
algorithm see [68].
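For part (b), the AR coefficients can be found by solving the Yule-Walker normal equations directly. A minimal sketch under that assumption (the function name and test values are mine; this illustrates the standard AR route, not the book's iterative algorithm):

```python
import numpy as np

def max_entropy_spectrum(r, omega):
    # From autocorrelations r(0..p), solve R a = [r(1)..r(p)] (Yule-Walker),
    # then S(w) = sigma^2 / |1 - sum_k a(k) e^{-jwk}|^2.
    p = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, np.asarray(r[1:]))
    sigma2 = r[0] - a @ np.asarray(r[1:])          # prediction error variance
    A = 1.0 - sum(a[k] * np.exp(-1j * omega * (k + 1)) for k in range(p))
    return sigma2 / np.abs(A) ** 2
```

For a true AR(1) correlation sequence r(n) = (1/(1 − a²)) a^|n| with a = 0.5, the solution recovers a(1) = 0.5, a(2) = 0, unit variance, and hence S(0) = 1/(1 − 0.5)² = 4.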
8.27 Using the Gaussian assumptions for u and η, prove the formulas for the ML and MAP
estimates given by (8.212) and (8.214).
8.28 Suppose the bandlimited signal f(x) is observed as y(x) = f(x) + η(x), |x| < a, where
η(x) is white noise with E[η(x)η(x′)] = σ_η² δ(x − x′). If f(x) is extrapolated by the
truncated PSWF series f̂(x) = Σ_{n=0}^{N−1} â_n ψ_n(x), for every x, then show that the
minimum integral mean square error is I = ∫ E[|f(x) − f̂(x)|²] dx = Σ_{n=N}^{∞} |a_n|² +
σ_η² Σ_{n=0}^{N−1} (1/λ_n), where λ₀ ≥ λ₁ > λ₂ > ··· > λ_n > λ_{n+1} ··· and a_n is given by (8.229).
This means the error due to noise increases with the number of terms in the PSWF
expansion.
8.29 a. If the bandlimited function f(x) is sampled at the Nyquist rate (1/Δ) to yield
y(m) = f(mΔ), −M ≤ m ≤ M, what is the MNLS extrapolation of y(m)?
b. Show that the MNLS extrapolation of a bandlimited sequence is bandlimited and
consistent with the given observations.

BIBLIOGRAPHY
Section 8.1, 8.2
For general surveys of image restoration and modeling of imaging systems:

1. T. S. Huang, W. F. Schreiber and O. J. Tretiak. "Image Processing." Proc. IEEE 59,
no. 11 (November 1971): 1586-1609.
2. M. M. Sondhi. "Image Restoration: The Removal of Spatially Invariant Degradations."
Proc. IEEE 60 (July 1972): 842-853.
3. T. S. Huang (ed.). "Picture Processing and Digital Filtering." In Topics in Applied
Physics, Vol. 6. Berlin: Springer-Verlag, 1975.
4. H. C. Andrews and B. R. Hunt. Digital Image Restoration. Englewood Cliffs, N.J.:
Prentice-Hall, 1977.
5. B. R. Frieden. "Image Enhancement and Restoration." In [3]. References on nonlinear
filtering are also available here.
6. J. W. Goodman. Introduction to Fourier Optics, Ch. 6, 7. San Francisco: McGraw-Hill,
1968.
Section 8.3
Discussions on inverse and Wiener filtering of images are available in several places;
see, for instance, [1, 2, 5] and:

7. C. W. Helstrom. "Image Restoration by the Method of Least Squares." J. Opt. Soc.
Am. 57 (March 1967): 297-303.




Section 8.4

For FIR Wiener Filtering Theory and.more.examples:
8. A. K. Jain and S. Ranganath. "Applications of Two Dimensional Spectral Estimation in
Image Restoration." Proc. ICASSP-1981 (May 1981): 1113-1116. Also see Proc.
ICASSP-1982 (May 1982): 1520-1523.
For block by block filtering techniques:

9. J. S. Lim. "Image Restoration by Short Space Spectral Subtraction." IEEE Trans.
Acoust. Speech Sig. Proc. ASSP-28, no. 2 (April 1980): 191-197.
10. H. J. Trussell and B. R. Hunt. "Sectioned Methods for Image Restoration." IEEE
Trans. Acoust. Speech Sig. Proc. ASSP-26 (April 1978): 157-164.
11. R. Chellappa and R. L. Kashyap. "Digital Image Restoration Using Spatial Interaction
Models." IEEE Trans. Acoust. Speech Sig. Proc. ASSP-30 (June 1982): 461-472.

Section 8.6

For generalized Wiener filtering and filtering by fast decompositions:

12. W. K. Pratt. "Generalized Wiener Filter Computation Techniques." IEEE Trans.
Comput. C-21 (July 1972): 636-641.
13. A. K. Jain. "An Operator Factorization Method for Restoration of Blurred Images."
IEEE Trans. Computers C-26 (November 1977): 1061-1071. Also see vol. C-26 (June
1977): 560-571.
14. A. K. Jain. "Fast Inversion of Banded Toeplitz Matrices via Circular Decompositions."
IEEE Trans. Acoust. Speech Sig. Proc. ASSP-26, no. 2 (April 1978): 121-126.

Section 8.7

15. T. N. E. Greville (ed.). Theory and Applications of Spline Functions. New York:
Academic Press, 1969.
16. M. J. Peyrovian. "Image Restoration by Spline Functions." USCIPI Report No. 680,
University of Southern California, Los Angeles, August 1976. Also see Applied Optics
16 (December 1977): 3147-3153.
17. H. S. Hou. "Least Squares Image Restoration Using Spline Interpolation." Ph.D.
Dissertation, IPI Report No. 650, University of Southern California, Los Angeles,
March 1976. Also see IEEE Trans. Computers C-26, no. 9 (September 1977): 856-873.
Section 8.8

18. S. Twomey. "On the Numerical Solution of Fredholm Integral Equations of First Kind
by the Inversion of Linear System Produced by Quadrature." J. Assoc. Comput. Mach.
10 (January 1963): 97-101.
19. B. R. Hunt. "The Application of Constrained Least Squares Estimation to Image
Restoration by Digital Computer." IEEE Trans. Comput. C-22 (September 1973):
805-812.




Section 8.9

The generalized inverse of a matrix is sometimes also called the Moore-Penrose
inverse. For greater details and examples:

20. C. R. Rao and S. K. Mitra. Generalized Inverse of Matrices and its Applications.
New York: John Wiley and Sons, 1971.
21. A. Albert. Regression and Moore-Penrose Pseudoinverse. New York: Academic Press,
1972.

For numerical properties of the gradient algorithms and other iterative methods and
their applications to space-variant image restoration problems:

22. D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Reading,
Mass.: Addison-Wesley, 1973.

23. T. S. Huang, D. A. Barker and S. P. Berger. "Iterative Image Restoration." Applied
Optics 14, no. 5 (May 1975): 1165-1168.
24. E. S. Angel and A. K. Jain. "Restoration of Images Degraded by Spatially Varying Point
Spread Functions by a Conjugate Gradient Method." Applied Optics 17 (July 1978):
2186-2190.

Section 8.10

For Kalman's original work and its various extensions in recursive filtering theory:

25. R. E. Kalman. "A New Approach to Linear Filtering and Prediction Problems." Trans.
ASME, Ser. D, J. Basic Engineering 82 (1960): 35-45.
26. B. D. O. Anderson and J. B. Moore. Optimal Filtering. Englewood Cliffs, N.J.:
Prentice-Hall, 1979.
27. G. J. Bierman. "A Comparison of Discrete Linear Filtering Algorithms." IEEE Trans.
Aerosp. Electron. Syst. AES-9 (January 1973): 28-37.
28. M. Morf and T. Kailath. "Square Root Algorithms for Least-Squares Estimation."
IEEE Trans. Aut. Contr. AC-20 (August 1975): 487-497.

For FFT-based algorithms for linear estimation and Riccati equations:

29. A. K. Jain and J. Jasiulek. "A Class of FFT Based Algorithms for Linear Estimation and
Boundary Value Problems." IEEE Trans. Acoust. Speech Sig. Proc. ASSP-31, no. 6
(December 1983): 1435-1446.

For state variable formulation for image estimation and smoothing and its extensions
to restoration of motion degraded images:

30. N. E. Nahi. "Role of Recursive Estimation in Statistical Image Enhancement." Proc.


'IEEE 60 (JulyJ972): 8 7 2 - 8 7 7 , · . .
.
31. A. O. Aboutalib and L. M. Silverman. "Restoration of Motion Degraded Images."
IEEE Trans. Cir. Sys. CAS-22 (March 1975): 278-286.



Section 8.11

Recursive algorithms for least squares filtering and linear estimation of images have
been considered in:

32. A. K. Jain and E. Angel. "Image Restoration, Modeling and Reduction of Dimen-
sionality." IEEE Trans. Computers C-23 (May 1974): 470-476. Also see IEEE Trans.
Aut. Contr. AC-18 (February 1973): 59-62.
33. A. Habibi. "Two-Dimensional Bayesian Estimate of Images." Proc. IEEE 60 (July
1972): 878-883. Also see M. Strintzis, "Comments on Two-Dimensional Bayesian Esti-
mate of Images." Proc. IEEE 64 (August 1976): 1255-1257.
34. A. K. Jain and J. R. Jain. "Partial Differential Equations and Finite Difference Methods
in Image Processing, Part II: Image Restoration." IEEE Trans. Aut. Control AC-23
(October 1978): 817-834.
35. F. C. Schoute, M. F. Terhorst and J. C. Willems. "Hierarchic Recursive Image
Enhancement." IEEE Trans. Circuits and Systems CAS-24 (February 1977): 67-78.
36. J. W. Woods and C. H. Radewan. "Kalman Filtering in Two Dimensions." IEEE Trans.
Inform. Theory IT-23 (July 1977): 473-482.
37. S. A. Rajala and R. J. P. De Figueiredo. "Adaptive Nonlinear Image Restoration
by a Modified Kalman Filtering Approach." IEEE Trans. Acoust. Speech Sig. Proc.
ASSP-29 (October 1981): 1033-1042.
38. S. S. Dikshit. "A Recursive Kalman Window Approach to Image Restoration." IEEE
Trans. Acoust. Speech Sig. Proc. ASSP-30, no. 2 (April 1982): 125-129.

The performance of recursive filters can be improved by adapting the image model
to spatial variations; for example:

39. N. E. Nahi and A. Habibi. "Decision Directed Recursive Image Enhancement." IEEE
Trans. Cir. Sys. CAS-22 (March 1975): 286-293.

Section 8.12

Semirecursive filtering algorithms for images were introduced in:

40. A. K. Jain. "A Semicausal Model for Recursive Filtering of Two-Dimensional Images."
IEEE Trans. Computers C-26 (April 1977): 345-350.

For generalization of this approach see [8, 34].

Section 8.13

For fundamentals of speckle theory and some recent advances:

41. J. C. Dainty (ed.). Laser Speckle. New York: Springer-Verlag, 1975.
42. J. W. Goodman. "Statistical Properties of Laser Speckle Patterns." In Laser Speckle
[41].
43. Speckle in Optics. Special Issue, J. Opt. Soc. Am. 66 (November 1976).


For other results on speckle theory we follow:

44. M. Tur, K. C. Chin and J. W. Goodman. "When Is Speckle Noise Multiplicative?"
(letter) Applied Optics 21 (April 1982): 1157-1159.
45. H. H. Arsenault and G. April. "Properties of Speckle Integrated with a Finite Aperture
and Logarithmically Transformed." In [43], pp. 1160-1163.

For digital processing of speckle we follow:


46. A. K. Jain and C. R. Christensen. "Digital Processing of Images in Speckle Noise."
Proc. SPIE, Applications of Speckle Phenomena 243 (July 1980): 46-50.
47. J. S. Lim and H. Nawab. "Techniques for Speckle Noise Removal." Proc. SPIE
243 (July 1980): 35-44.

For other homomorphic filtering applications and related bibliography, see [58].

Section 8.14

For maximum entropy restoration algorithms applicable to images, see [5] and:

48. B. R. Frieden. "Restoring with Maximum Likelihood and Maximum Entropy." J. Opt.
Soc. Amer. 62 (1972): 511-518.
49. A. Lent. "A Convergent Algorithm for Maximum Entropy Image Restoration with a
Medical X-ray Application." In Image Analysis and Evaluation, SPSE Conf. Proc.
(R. Shaw, ed.), Toronto, Canada, July 1976, pp. 221-267.

The iterative algorithms presented are based on unconstrained optimization of the
dual functions and follow from:

50. A. K. Jain and S. Ranganath. "Two-Dimensional Spectral Estimation." Proc. RADC
Workshop on Spectral Estimation (1978): 151-157.
51. S. W. Lang. "Spectral Estimation for Sensor Arrays." Ph.D. Thesis, M.I.T., August
1981. Also see IEEE Trans. Acoust. Speech Sig. Proc. ASSP-30, no. 6 (December 1982):
880-887.

Section 8.15

For application of Bayesian methods for realizing MAP and ML estimators for
nonlinear image restoration problems, see [4] and:

52. B. R. Hunt. "Bayesian Methods in Nonlinear Digital Image Restoration." IEEE Trans.
Computers C-26, no. 3: 219-229.
53. H. J. Trussell. "A Relationship between Image Restoration by the Maximum A Poste-
riori Method and a Maximum Entropy Method." IEEE Trans. Acoust. Speech Sig. Proc.
ASSP-28, no. 1 (February 1980): 114-117. Also see vol. ASSP-31, no. 1 (February 1983):
129-136.
54. J. B. Morton and H. C. Andrews. "A Posteriori Method of Image Restoration." J. Opt.
Soc. Amer. 69, no. 2 (February 1979): 280-290.

Chap.S Bibliography 339



Section 8.16
55. R. Bernstein (ed.). Digital Image Processing for Remote Sensing. New York: IEEE
Press, 1978.

Coordinate transformation methods that convert certain space-varying PSFs into a
space-invariant PSF are developed in:

56. G. M. Robbins. "Image Restoration for a Class of Linear Spatially Variant Degrada-
tions." Pattern Recognition 2 (1970): 91-103. Also see Proc. IEEE 60 (July 1972):
862-872.
57. A. A. Sawchuk. "Space-Variant Image Restoration by Coordinate Transformations."
J. Opt. Soc. Am. 64 (February 1974): 138-144. Also see J. Opt. Soc. Am. 63 (1973):
1052-1062.

Section 8.17

For blind deconvolution methods and their application in image restoration:

58. E. R. Cole. "The Removal of Unknown Image Blurs by Homomorphic Filtering."
Ph.D. Dissertation, Department of Electrical Engineering, University of Utah, Salt
Lake City, 1973.
59. K. T. Knox. "Image Retrieval from Astronomical Speckle Patterns." J. Opt. Soc. Am.
66 (November 1976): 1236-1239.

Section 8.18

Here we follow:

60. D. Slepian and H. O. Pollak. "Prolate Spheroidal Wave Functions, Fourier Analysis and
Uncertainty-I." BSTJ 40 (January 1961): 43-62.

For the error energy reduction algorithm and its generalization to the method of
alternating projections:

61. R. W. Gerchberg. "Super-resolution through Error Energy Reduction." Opt. Acta 21,
no. 9 (September 1974): 709-720.
62. A. Papoulis. "A New Algorithm in Spectral Analysis and Bandlimited Extrapolation."
IEEE Trans. Cir. Sys. CAS-22, no. 9 (September 1975).
63. D. C. Youla. "Generalized Image Restoration by the Method of Alternating Orthogonal
Projections." IEEE Trans. Cir. Sys. CAS-25, no. 9 (September 1978): 694-702. Also see
Youla and Webb. IEEE Trans. Medical Imaging MI-1, no. 2 (October 1982): 81-94.
64. M. I. Sezan and H. Stark. "Image Restoration by the Method of Convex Projections,
Part 2: Applications and Numerical Results." IEEE Trans. Medical Imaging MI-1, no. 2
(October 1982).
65. A. K. Jain and S. Ranganath. "Extrapolation Algorithms for Discrete Signals with
Applications in Spectral Estimation." IEEE Trans. Acoust. Speech Sig. Proc. ASSP-29



(August 1981): 830-845. Other references on bandlimited signal extrapolation can be
found here.


Other References

66. E. S. Angel and A. K. Jain. "Frame to Frame Restoration of Diffusion Images." IEEE
Trans. Auto. Control AC-23 (October 1978): 850-855.
67. B. L. McGlamery. "Restoration of Turbulence Degraded Images." J. Opt. Soc. Am.
57 (March 1967): 293-297.
68. J. S. Lim and N. A. Malik. "A New Algorithm for Two-Dimensional Maximum Entropy
Power Spectrum Estimation." IEEE Trans. Acoust. Speech, Signal Process. ASSP-29
(1981): 401-413.




9 Image Analysis and Computer Vision

9.1 INTRODUCTION

The ultimate aim in a large number of image processing applications (Table 9.1) is
to extract important features from image data, from which a description, interpreta-
tion, or understanding of the scene can be provided by the machine (Fig. 9.1). For
example, a vision system may distinguish parts on an assembly line and list their
features, such as size and number of holes. More sophisticated vision systems are

TABLE 9.1 Computer Vision Applications

1. Character recognition: mail sorting, label reading, supermarket-product billing,
bank-check processing, text reading
2. Medical image analysis: tumor detection, measurement of size and shape of internal
organs, chromosome analysis, blood cell count
3. Industrial automation: parts identification on assembly lines, defect and fault
inspection
4. Robotics: recognition and interpretation of objects in a scene, motion control and
execution through visual feedback
5. Cartography: map making from photographs, synthesis of weather maps
6. Forensics: fingerprint matching, analysis of automated security systems
7. Radar imaging: target detection and identification, guidance of helicopters and
aircraft in landing, guidance of remotely piloted vehicles (RPVs), missiles, and
satellites from visual cues
8. Remote sensing: multispectral image analysis, weather prediction, classification
and monitoring of urban, agricultural, and marine environments from satellite images
[Figure 9.1 A computer vision system: the input image passes through preprocessing,
feature extraction, segmentation, and classification/description; these stages form the
image analysis system, whose symbolic representation feeds interpretation and
description in the full image understanding system.]

able to interpret the results of analyses and describe the various objects and their
relationships in the scene. In this sense image analysis is quite different from other
image processing operations, such as restoration, enhancement, and coding, where
the output is another image. Image analysis basically involves the study of feature
extraction, segmentation, and classification techniques (Fig. 9.2).
In computer vision systems such as the one shown in Fig. 9.1, the input image
is first preprocessed, which may involve restoration, enhancement, or just proper
representation of the data. Then certain features are extracted for segmentation of
the image into its components; for example, separation of different objects by
extracting their boundaries. The segmented image is fed into a classifier or an image
understanding system. Image classification maps different regions or segments into
one of several objects, each identified by a label. For example, in sorting nuts and
bolts, all objects identified as square shapes with a hole may be classified as nuts and
those with elongated shapes, as bolts. Image understanding systems determine the
relationships between different objects in a scene in order to provide its description.
For example, an image understanding system should be able to send the report:
The field of view contains a dirt road surrounded by grass.

Such a system should be able to classify different textures such as sand, grass, or
corn using prior knowledge and then be able to use predefined rules to generate a
description.
[Figure 9.2 Image analysis techniques: feature extraction (spatial features,
transform features, edges and boundaries, shape features, moments, texture);
segmentation (template matching, thresholding, boundary detection, clustering,
quad-trees, texture matching); classification (clustering, statistical, decision
trees, similarity measures, minimum spanning trees).]


9.2 SPATIAL FEATURE EXTRACTION

Spatial features of an object may be characterized by its gray levels, their joint
probability distributions, spatial distribution, and the like.

Amplitude Features

Histogram Features
Histogram features are based on the histogram of a region of the image. Let u be a
random variable representing a gray level in a given region of the image. Define

    p_u(x) ≜ Prob[u = x] ≜ (number of pixels with gray level x) / (total number of
    pixels in the region),  x = 0, ..., L − 1    (9.1)

Common features of p_u(x) are its moments, entropy, and so on, which are defined
next.

    Moments:  m_i = E[u^i] = Σ_{x=0}^{L−1} x^i p_u(x),  i = 1, 2, ...    (9.2)

    Absolute moments:  m̄_i = E[|u|^i] = Σ_{x=0}^{L−1} |x|^i p_u(x)    (9.3)

    Central moments:  μ_i = E[(u − E(u))^i] = Σ_{x=0}^{L−1} (x − m₁)^i p_u(x)    (9.4)

    Absolute central moments:  μ̄_i = E[|u − E(u)|^i] = Σ_{x=0}^{L−1} |x − m₁|^i p_u(x)    (9.5)

    Entropy:  H = −Σ_{x=0}^{L−1} p_u(x) log₂ p_u(x) bits    (9.6)
Some of the common histogram features are dispersion = μ̄₁, mean = m₁,
variance = μ₂, mean square value or average energy = m₂, skewness = μ₃, and kurtosis
= μ₄ − 3. Other useful features are the median and the mode. A narrow histogram
indicates a low-contrast region. Variance can be used to measure local activity in the
amplitudes. Histogram features are also useful for shape analysis of objects from
their projections (see Section 9.7).
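A minimal sketch of computing the first-order histogram features of eqs. (9.1)-(9.6) (the function name is mine; L is the number of gray levels):

```python
import numpy as np

def histogram_features(region, L):
    # First-order histogram features of an integer gray-level region.
    u = np.asarray(region).ravel()
    p = np.bincount(u, minlength=L) / u.size          # p_u(x), eq. (9.1)
    x = np.arange(L)
    m1 = (x * p).sum()                                # mean, eq. (9.2)
    mu2 = ((x - m1) ** 2 * p).sum()                   # variance, eq. (9.4)
    mu3 = ((x - m1) ** 3 * p).sum()                   # third central moment
    nz = p > 0
    entropy = -(p[nz] * np.log2(p[nz])).sum()         # eq. (9.6), in bits
    return m1, mu2, mu3, entropy
```

Applied to a 2-bit region the entropy is at most 2 bits, attained only for a flat histogram.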

344 Image Analysis and ComputerVision Chap, 9


Often these features are measured over a small moving window W. Some of
the histogram features can be measured without explicitly determining the
histogram; for example,

    m_i(k, l) = (1/N_w) Σ_{(m,n) ∈ W} [u(m − k, n − l)]^i    (9.7)

    μ_i(k, l) = (1/N_w) Σ_{(m,n) ∈ W} [u(m − k, n − l) − m₁(k, l)]^i    (9.8)

where i = 1, 2, ... and N_w is the number of pixels in the window W. Figure 9.3 shows
the spatial distribution of different histogram features measured over a 3 × 3
moving window. The standard deviation emphasizes the strong edges in the image
and the dispersion feature extracts the fine edge structure. The mean, median, and
mode extract low spatial-frequency features.
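The moving-window moments of (9.7)-(9.8) can be sketched with a direct loop (an illustration, not an optimized implementation; the function name is mine, and the window is clipped at the image borders):

```python
import numpy as np

def local_moments(u, k=3):
    # Local mean m1 and central moment mu2 over a k x k moving window.
    M, N = u.shape
    r = k // 2
    m1 = np.zeros((M, N))
    mu2 = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            w = u[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].astype(float)
            m1[i, j] = w.mean()                       # eq. (9.7), i = 1
            mu2[i, j] = ((w - w.mean()) ** 2).mean()  # eq. (9.8), i = 2
    return m1, mu2
```

As a sanity check, a uniform region yields a constant mean and zero variance everywhere, consistent with the remark that variance measures local activity.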
Second-order joint probabilities have also been found useful in applications
such as feature extraction of textures (see Section 9.11). A second-order joint
probability is defined as

    p_u(x₁, x₂) ≜ p_{u₁,u₂}(x₁, x₂) ≜ Prob[u₁ = x₁, u₂ = x₂],  x₁, x₂ = 0, ..., L − 1    (9.9)
    = (number of pairs of pixels with u₁ = x₁, u₂ = x₂) / (total number of such pairs
    of pixels in the region)


[Figure 9.3 Spatial distribution of histogram features measured over a 3 × 3 moving
window. In each case, top to bottom and left to right: original image, mean, median,
mode, standard deviation, and dispersion.]


where u₁ and u₂ are the two pixels in the image region specified by some relation.
For example, u₂ could be specified as a pixel at distance r and angle θ from u₁. The
L × L array {p_u(x₁, x₂)} is also called the co-occurrence matrix.
Example 9.1

For a 2-bit, 4 × 4 image region given as

    U = [0 1 0 2
         3 2 1 1
         2 1 0 1
         3 1 2 0]

we have p_u(0) = 1/4, p_u(1) = 3/8, p_u(2) = 1/4, and p_u(3) = 1/8. The second-order
histogram for u₁ = u(m, n), u₂ = u(m + 1, n + 1) is the 4 × 4 co-occurrence matrix of
pair counts (rows x₁ = 0, ..., 3; columns x₂ = 0, ..., 3)

    [1 1 1 0
     0 2 1 0
     1 1 0 0
     0 1 0 0]
9.3 TRANSFORM FEATURES

Image transforms provide the frequency domain information in the data. Transform
features are extracted by zonal-filtering the image in the selected transform space
(Fig. 9.4). The zonal filter, also called the feature mask, is simply a slit or an aper-
ture. Figure 9.5 shows different masks and Fourier transform features of different
shapes. Generally, the high-frequency features can be used for edge and boundary
detection, and angular slits can be used for detection of orientation. For example,
an image containing several parallel lines with orientation θ will exhibit strong
energy along a line at angle π/2 + θ passing through the origin of its two-
dimensional Fourier transform. This follows from the properties of the Fourier
transform (also see the projection theorem, Section 10.4). A combination of an
angular slit with a bandlimited low-pass, band-pass, or high-pass filter can be used
for discriminating periodic or quasiperiodic textures. Other transforms, such as
Haar and Hadamard, are also potentially useful for feature extraction. However,
systematic studies remain to be done to determine their applications. Chapters 5
and 7 contain examples of image transforms and their processed outputs.
Transform-feature extraction techniques are also important when the source
data originates in the transform coordinates. For example, in optical and optical-
digital (hybrid) image analysis applications, the data can be acquired directly in the
Fourier domain for real-time feature extraction in the focal plane.
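The zonal-filtering scheme of Fig. 9.4 can be sketched as follows. The function name, the annular mask, and its cutoff radius are my own illustrative choices, not the book's:

```python
import numpy as np

def zonal_feature_energy(u, mask):
    # Transform the image, apply a 0/1 feature mask g(k,l) in the Fourier
    # domain, and measure the energy passed by the slit/aperture.
    V = np.fft.fft2(u)
    return (np.abs(V * mask) ** 2).sum() / u.size

# Example: a hypothetical high-pass (annular) mask on an N x N grid.
N = 64
k = np.fft.fftfreq(N)                      # frequencies in cycles/sample
r = np.hypot(*np.meshgrid(k, k))           # radial frequency
mask = (r > 0.25).astype(float)            # keep only high frequencies
```

A constant image has all its energy at the origin of the transform, so a high-pass mask passes essentially nothing, consistent with high-frequency features responding to edges.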

[Figure 9.4 Transform feature extraction: the input image u(m, n) is forward
transformed to v(k, l), multiplied by a mask g(k, l), and inverse transformed.]


[Figure 9.5 Fourier domain features: (a) slits and apertures; (b) rectangular and
circular shapes; (c) triangular and vertical shapes; (d) 45° orientation and the
letter J.]

9.4 EDGE DETECTION

A problem of fundamental importance in image analysis is edge detection. Edges
characterize object boundaries and are therefore useful for segmentation, registra-
tion, and identification of objects in scenes. Edge points can be thought of as pixel
locations of abrupt gray-level change. For example, it is reasonable to define edge
points in binary images as black pixels with at least one white nearest neighbor, that
is, pixel locations (m, n) such that u(m, n) = 0 and g(m, n) = 1, where

    g(m, n) ≜ [u(m, n) ⊕ u(m ± 1, n)] .OR. [u(m, n) ⊕ u(m, n ± 1)]    (9.10)


[Figure 9.6 Gradient of f(x, y) along the r direction, at angle θ to the x axis,
across an edge.]

where ⊕ denotes the logical exclusive-OR operation. For a continuous image
f(x, y), its derivative assumes a local maximum in the direction of the edge. There-
fore, one edge detection technique is to measure the gradient of f along r in a
direction θ (Fig. 9.6), that is,

    ∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) = f_x cos θ + f_y sin θ    (9.11)

The maximum value of ∂f/∂r is obtained when (∂/∂θ)(∂f/∂r) = 0. This gives

    −f_x sin θ_g + f_y cos θ_g = 0  ⟹  θ_g = tan⁻¹(f_y/f_x)    (9.12a)

    (∂f/∂r)_max = √(f_x² + f_y²)    (9.12b)

where θ_g is the direction of the edge. Based on these concepts, two types of edge
detection operators have been introduced [6-11]: gradient operators and compass
operators. For digital images these operators, also called masks, represent finite-
difference approximations of either the orthogonal gradients f_x, f_y or the directional
gradient ∂f/∂r. Let H denote a p × p mask and define, for an arbitrary image U,
their inner product at location (m, n) as the correlation

    ⟨U, H⟩_{m,n} ≜ Σ_i Σ_j h(i, j) u(i + m, j + n) = u(m, n) ⊛ h(−m, −n)    (9.13)

Gradient Operators

These are represented by a pair of masks H₁, H₂, which measure the gradient of the
image u(m, n) in two orthogonal directions (Fig. 9.7). Defining the bidirectional
gradients g₁(m, n) ≜ ⟨U, H₁⟩_{m,n}, g₂(m, n) ≜ ⟨U, H₂⟩_{m,n}, the gradient vector magnitude
and direction are given by

    g(m, n) = √(g₁²(m, n) + g₂²(m, n))    (9.14)

    θ_g(m, n) = tan⁻¹[g₂(m, n)/g₁(m, n)]    (9.15)


[Figure 9.7 Edge detection via gradient operators: the two mask outputs g₁(m, n)
and g₂(m, n) are combined into a magnitude g = √(g₁² + g₂²) and direction
θ_g = tan⁻¹(g₂/g₁), and the magnitude is thresholded to produce the edge map.]

Often the magnitude gradient is calculated as

    g(m, n) ≜ |g₁(m, n)| + |g₂(m, n)|    (9.16)

rather than as in (9.14). This calculation is easier to perform and is preferred
especially when implemented in digital hardware.
Table 9.2 lists some of the common gradient operators. The Prewitt, Sobel,
and isotropic operators compute horizontal and vertical differences of local sums.
This reduces the effect of noise in the data. Note that these operators have the
desirable property of yielding zeros for uniform regions.
The pixel location (m, n) is declared an edge location if g(m, n) exceeds some
threshold t. The locations of edge points constitute an edge map e(m, n), which is
defined as

    e(m, n) = 1,  (m, n) ∈ I_g
              0,  otherwise    (9.17)

where

    I_g ≜ {(m, n) : g(m, n) > t}    (9.18)

The edge map gives the necessary data for tracing the object boundaries in an
image. Typically, t may be selected using the cumulative histogram of g(m, n) so
TABLE 9.2 Some Common Gradient Operators
(the origin of each 3 × 3 mask is its center element; for the 2 × 2 Roberts masks
it is the top-left element)

Roberts [9]:
     1  0          0  1
     0 -1         -1  0

Smoothed (Prewitt [6]):
    -1  0  1      -1 -1 -1
    -1  0  1       0  0  0
    -1  0  1       1  1  1

Sobel [7]:
    -1  0  1      -1 -2 -1
    -2  0  2       0  0  0
    -1  0  1       1  2  1

Isotropic:
    -1   0  1     -1 -√2 -1
    -√2  0  √2     0   0  0
    -1   0  1      1  √2  1
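The Sobel masks of Table 9.2 together with eqs. (9.16)-(9.18) can be sketched as follows (function name mine; borders are left at zero for simplicity):

```python
import numpy as np

def sobel_edges(u, t):
    # Magnitude gradient |g1| + |g2| (eq. 9.16) and thresholded edge map
    # (eqs. 9.17-9.18) using the Sobel masks of Table 9.2.
    H1 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # horizontal difference
    H2 = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # vertical difference
    M, N = u.shape
    g = np.zeros((M, N))
    for m in range(1, M - 1):
        for n in range(1, N - 1):
            w = u[m - 1:m + 2, n - 1:n + 2]
            g[m, n] = abs((w * H1).sum()) + abs((w * H2).sum())
    return g, (g > t).astype(int)
```

On a vertical step image the response is nonzero only in the two columns straddling the transition, and zero over the uniform regions, as the text notes.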



r~
-~

, •\, .
I!<"~'
,
'? ' , '.
'
Yo
f /' ' , ,
• -iii
:Ii
c-
i

, ,

lal Sobet Ibl Kirsch

,
,'1
' ':
, '

'-.<.. , , '+
i
, _
;ji
',t
.JJl
.. ~

1j
,
i -- :~


../---- "T'·'I'. -" _--'_
-~

tcl Stochastic 5 x 5 Idl Laplacian

Figure 9.8 Edge detection examples. In each case, gradient images (left), edge
maps (right).

that 5 to 10% of pixels with largest gradients are declared as edges. Figure 9.8a
shows the gradients and edge maps using the Sobel operator on two different
images.

Compass Operators

Compass operators measure gradients in a selected number of directions (Fig. 9.9).
Table 9.3 shows four different compass gradients for north-going edges. An anti-
clockwise circular shift of the eight boundary elements of these masks gives a 45°
rotation of the gradient direction. For example, the eight compass gradients corre-
sponding to the third operator of Table 9.3 are


350 Image Analysis and Computer Vision Chap. 9
Figure 9.9 Edge detection via compass operators: the image v(m, n) is convolved with each mask h_k(−m, −n) to give g_k(m, n); the gradient g(m, n) = max_k{|g_k(m, n)|} is thresholded (threshold t) to produce the edge map.

    (N)          (NW)          (W)           (SW)
     1  1  1      1  1  0      1  0 -1       0 -1 -1
     0  0  0      1  0 -1      1  0 -1       1  0 -1
    -1 -1 -1      0 -1 -1      1  0 -1       1  1  0

    (S)          (SE)          (E)           (NE)
    -1 -1 -1     -1 -1  0     -1  0  1       0  1  1
     0  0  0     -1  0  1     -1  0  1      -1  0  1
     1  1  1      0  1  1     -1  0  1      -1 -1  0
Let g_k(m, n) denote the compass gradient in the direction θ_k = π/2 + kπ/4, k = 0, ..., 7. The gradient at location (m, n) is defined as

    g(m, n) ≜ max_k {|g_k(m, n)|}                 (9.19)
which can be thresholded to obtain the edge map as before. Figure 9.8b shows the results for the Kirsch operator. Note that only four of the preceding eight compass gradients are linearly independent. Therefore, it is possible to define four 3 × 3 arrays that are mutually orthogonal and span the space of these compass gradients. These arrays are called orthogonal gradients and can be used in place of the compass gradients [12]. Compass gradients with higher angular resolution can be designed by increasing the size of the mask.
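The 45° shift of the boundary elements and the maximum of (9.19) can be sketched as follows (our illustrative helpers; the ring ordering assumes rows increase downward):

```python
import numpy as np

# north mask: third operator of Table 9.3
NORTH = np.array([[1, 1, 1],
                  [0, 0, 0],
                  [-1, -1, -1]])

def rotate45(mask):
    """Shift the eight boundary elements of a 3x3 mask one step anticlockwise."""
    ring = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    vals = [mask[r, c] for r, c in ring]
    out = mask.copy()
    for (r, c), v in zip(ring, vals[-1:] + vals[:-1]):
        out[r, c] = v
    return out

def compass_gradient(window):
    """g(m, n) = max_k |g_k(m, n)| of (9.19) for one 3x3 image window."""
    g, mask = 0.0, NORTH
    for _ in range(8):
        g = max(g, abs(float((window * mask).sum())))
        mask = rotate45(mask)
    return g
```

One anticlockwise shift of the north mask produces the NW mask shown above, and eight shifts return the original mask.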

TABLE 9.3 Compass Gradients (North). Each Circular Shift of Elements about the Center Rotates the Gradient Direction by 45°.

    1)  1   1   1        2)  5   5   5
        1 [-2]  1           -3  [0] -3   (Kirsch)
       -1  -1  -1           -3  -3  -3

    3)  1   1   1        4)  1   2   1
        0  [0]  0            0  [0]  0
       -1  -1  -1           -1  -2  -1
Laplace Operators and Zero Crossings

The foregoing methods of estimating the gradients work best when the gray-level transition is quite abrupt, like a step function. As the transition region gets wider (Fig. 9.10), it becomes more advantageous to apply the second-order derivatives. One frequently encountered operator is the Laplacian operator, defined as

    ∇²f = ∂²f/∂x² + ∂²f/∂y²                       (9.20)



Figure 9.10 Edge detection via zero-crossings: (a) first and second derivatives of f(x) for edge detection; (b) an image and its zero-crossings.

Table 9.4 gives three different discrete approximations of this operator. Figure 9.8d shows the edge extraction ability of the Laplace mask (2). Because of the second-order derivatives, this gradient operator is more sensitive to noise than those previously defined. Also, the thresholded magnitude of ∇²f produces double edges. For these reasons, together with its inability to detect the edge direction, the Laplacian as such is not a good edge detection operator. A better utilization of the Laplacian is to use its zero-crossings to detect the edge locations (Fig. 9.10). A generalized Laplacian operator, which approximates the Laplacian of Gaussian functions, is a powerful zero-crossing detector [13]. It is defined as

    h(m, n) ≜ c [1 − (m² + n²)/σ²] exp[−(m² + n²)/(2σ²)]      (9.21)

where σ controls the width of the Gaussian kernel and c normalizes the sum of the elements of a given size mask to unity. Zero-crossings of a given image convolved with h(m, n) give its edge locations. On a two-dimensional grid, a zero-crossing is said to occur wherever there is a zero-crossing in at least one direction.




The h(m, n) is the sampled impulse response of an analog band-pass filter whose frequency response is proportional to (ξ₁² + ξ₂²) exp[−2σ²(ξ₁² + ξ₂²)]. Therefore, the zero-crossings detector is equivalent to a low-pass filter having a Gaussian impulse response followed by a Laplace operator. The low-pass filter serves to attenuate the noise sensitivity of the Laplacian. The parameter σ controls the amplitude response of the filter output but does not affect the location of the zero-crossings.
Directional information of the edges can be obtained by searching the zero-crossings of the second-order derivative along r for each direction θ. From (9.11), we obtain

    ∂²f/∂r² = ∂/∂r (∂f/∂r) = ∂/∂r (∂f/∂x cos θ + ∂f/∂y sin θ)
            = (∂²f/∂x²) cos²θ + 2 (∂²f/∂x∂y) sin θ cos θ + (∂²f/∂y²) sin²θ      (9.22)

Zero-crossings are searched as θ is varied [14].
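A sketch of the generalized Laplacian mask of (9.21) and a simple zero-crossing test (ours; the constant c is omitted since a positive scale factor does not move the zero-crossings):

```python
import numpy as np

def log_mask(size, sigma):
    """Sampled generalized Laplacian of (9.21), without the normalizing c."""
    r = np.arange(size) - size // 2
    m, n = np.meshgrid(r, r, indexing="ij")
    r2 = (m ** 2 + n ** 2) / sigma ** 2
    return (1.0 - r2) * np.exp(-r2 / 2.0)

def zero_crossings(z):
    """Mark a zero-crossing wherever the sign changes along a row or a column."""
    e = np.zeros(z.shape, dtype=bool)
    e[:, :-1] |= (z[:, :-1] * z[:, 1:]) < 0
    e[:-1, :] |= (z[:-1, :] * z[1:, :]) < 0
    return e
```

The mask is positive at the center and negative in the surround; after convolving an image with it, `zero_crossings` marks the edge locations.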




Stochastic Gradients [16]

The foregoing gradient masks perform poorly in the presence of noise. Averaging, low-pass filtering, or least squares edge fitting [15] techniques can yield some reduction of the detrimental effects of noise. A better alternative is to design edge extraction masks that take into account the presence of noise in a controlled manner. Consider an edge model whose transition region is 1 pixel wide (Fig. 9.11).

Figure 9.11 Edge model with transition region one pixel wide.




TABLE 9.5 Stochastic Gradients H₁ for Edge Extraction of Noisy Images with r(k, l) = (0.99)^√(k²+l²), H₂ = H₁ᵀ; SNR = σ²/σₙ².

                SNR = 1                              SNR = 9

3 × 3:   0.97   0   −0.97                    0.776   0   −0.776
         1.00  [0]  −1.00                    1.00   [0]  −1.00
         0.97   0   −0.97                    0.776   0   −0.776

5 × 5:   0.802 0.836  0  −0.836 −0.802       0.267 0.364  0  −0.364 −0.267
         0.845 0.897  0  −0.897 −0.845       0.373 0.562  0  −0.562 −0.373
         0.870 1.00  [0] −1.00  −0.870       0.463 1.00  [0] −1.00  −0.463
         0.845 0.897  0  −0.897 −0.845       0.373 0.562  0  −0.562 −0.373
         0.802 0.836  0  −0.836 −0.802       0.267 0.364  0  −0.364 −0.267

7 × 7 (SNR = 1):
         0.641 0.672 0.719  0  −0.719 −0.672 −0.641
         0.656 0.719 0.781  0  −0.781 −0.719 −0.656
         0.688 0.781 0.875  0  −0.875 −0.781 −0.688
         0.703 0.813 1.00  [0] −1.00  −0.813 −0.703
         0.688 0.781 0.875  0  −0.875 −0.781 −0.688
         0.656 0.719 0.781  0  −0.781 −0.719 −0.656
         0.641 0.672 0.719  0  −0.719 −0.672 −0.641

7 × 7 (SNR = 9):
         0.073 0.140 0.283  0  −0.283 −0.140 −0.073
         0.104 0.213 0.348  0  −0.348 −0.213 −0.104
         0.165 0.354 0.579  0  −0.579 −0.354 −0.165
         0.195 0.463 1.00  [0] −1.00  −0.463 −0.195
         0.165 0.354 0.579  0  −0.579 −0.354 −0.165
         0.104 0.213 0.348  0  −0.348 −0.213 −0.104
         0.073 0.140 0.283  0  −0.283 −0.140 −0.073
To detect the presence of an edge at location P, calculate the horizontal gradient, for instance, as

    g₁(m, n) ≜ û_f(m, n − 1) − û_b(m, n + 1)              (9.23)

Here û_f(m, n) and û_b(m, n) are the optimum forward and backward estimates of u(m, n) based on the noisy observations given over some finite regions W of the left and right half-planes, respectively. Thus û_f(m, n) and û_b(m, n) are semicausal estimates (see Chapter 6). For observations v(m, n) containing additive white noise, we can find the best linear mean square semicausal FIR estimate of the form

    û_f(m, n) = Σ Σ_{(k, l) ∈ W} a(k, l) v(m − k, n − l),   W = {(k, l) : |k| ≤ p, 0 < l ≤ q}      (9.24)

The filter weights a(k, l) can be determined following Section 8.4 with the modification that W is a semicausal window [see (8.65) and (8.69)]. The backward semicausal estimate employs the same filter weights, but backward. Using the definitions in (9.23), the stochastic gradient operator H₁ is obtained as shown in Table 9.5. The operator H₂ would be the 90° counterclockwise rotation of H₁, which, due to its symmetry properties, would simply be H₁ᵀ. These masks have been normalized so that the coefficient a(0, 0) in (9.24) is unity. Note that for high SNR the filter weights decay rapidly. Figure 9.8c shows the gradients and edge maps obtained by applying the 5 × 5 stochastic masks designed for SNR = 9 but applied to noiseless images. Figure 9.12 compares the edges detected from noisy images by the Sobel and the stochastic gradient masks.

Performance of Edge Detection Operators

Edge detection operators can be compared in a number of different ways. First, the image gradients may be compared visually, since the eye itself performs some sort of edge detection. Figure 9.13 displays different gradients for noiseless as well as noisy images. In the noiseless case all the operators are roughly equivalent. The stochastic gradient is found to be quite effective when noise is present. Quantitatively, the performance in noise of an edge detection operator may be measured as follows. Let n₀ be the number of edge pixels declared and n_t be the number of missed or new edge pixels after adding noise. If n₀ is held fixed for the noiseless as well as noisy images, then the edge detection error rate is

    p_err = n_t / n₀                              (9.25)

In Figure 9.12 the error rate for the Sobel operator used on noisy images with SNR = 10 dB is 24%, whereas it is only 2% for the stochastic operator.
Another figure of merit for the noise performance of edge detection operators is the quantity

    P = [1 / max(N_I, N_D)] Σ_{i=1}^{N_D} 1 / (1 + α d_i²)      (9.26)
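This figure of merit can be computed as in the following sketch (our helper; α = 1/9 is a common calibration choice, not one fixed by the text):

```python
import numpy as np

def figure_of_merit(ideal, detected, alpha=1.0 / 9.0):
    """P of (9.26): d_i is the distance from each detected edge pixel to the
    nearest ideal edge pixel; both inputs are lists of (m, n) locations."""
    ideal = np.asarray(ideal, dtype=float)
    total = 0.0
    for p in np.asarray(detected, dtype=float):
        d2 = ((ideal - p) ** 2).sum(axis=1).min()   # squared nearest distance
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(ideal), len(detected))
```

Perfect detection gives P = 1; displaced or spurious edge pixels reduce P toward zero.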



Figure 9.12 Edge detection from noisy images. Upper two: Sobel; lower two: stochastic.

where d_i is the distance between a pixel declared as edge and the nearest ideal edge pixel, α is a calibration constant, and N_I and N_D are the numbers of ideal and detected edge pixels, respectively. Among the gradient and compass operators of Tables 9.2 and 9.3 (not including the stochastic masks), the Sobel and Prewitt operators have been found to yield the highest performance (where performance is proportional to the value of P) [17].

Line and Spot Detection

Lines are extended edges. Table 9.6 shows compass gradients for line detection. Other forms of line detection require fitting a line (or a curve) through a set of edge points. Some of these ideas are explored in Section 9.5.




Figure 9.13 Comparison of edge extraction operators: (a) gradients for noiseless image; (b) gradients for noisy image. In each case the operators are (1) smoothed gradient, (2) Sobel, (3) isotropic, (4) semicausal model-based 5 × 5 stochastic. The largest 5% of the gradient magnitudes were declared edges.

TABLE 9.6 Line Detection Operators

    (a) E–W       (b) NE–SW      (c) N–S       (d) NW–SE
    -1 -1 -1      -1 -1  2      -1  2 -1       2 -1 -1
     2  2  2      -1  2 -1      -1  2 -1      -1  2 -1
    -1 -1 -1       2 -1 -1      -1  2 -1      -1 -1  2
Spots are isolated edges. These are most easily detected by comparing the value of a pixel with an average or median of the neighborhood pixels.

9.5 BOUNDARY EXTRACTION

Boundaries are linked edges that characterize the shape of an object. They are useful in the computation of geometric features such as size or orientation.

Connectivity
Conceptually, boundaries can be found by tracing the connected edges. On a rectangular grid a pixel is said to be four- or eight-connected when it has the same properties as one of its nearest four or eight neighbors, respectively (Fig. 9.14). There are difficulties associated with these definitions of connectivity, as shown in Fig. 9.14c. Under four-connectivity, segments 1, 2, 3, and 4 would be classified as disjoint, although they are perceived to form a connected ring. Under eight-connectivity these segments are connected, but the inside hole (for example, pixel B) is also eight-connected to the outside (for instance, pixel C). Such problems can


Figure 9.14 Connectivity on a rectangular grid. Pixel A and its (a) 4-connected and (b) 8-connected neighbors; (c) connectivity paradox: are B and C connected?

be avoided by considering eight-connectivity for the object and four-connectivity for the background. An alternative is to use triangular or hexagonal grids, where three- or six-connectedness can be defined. However, there are other practical difficulties that arise in working with nonrectangular grids.

Contour Following

As the name suggests, contour-following algorithms trace boundaries by ordering successive edge points. A simple algorithm for tracing closed boundaries in binary images is shown in Fig. 9.15. This algorithm can yield a coarse contour, with some of the boundary pixels appearing twice. Refinements based on eight-connectivity tests for edge pixels can improve the contour trace [2]. Given this trace, a smooth curve, such as a spline, through the nodes can be used to represent the contour. Note that this algorithm will always trace a boundary, open or closed, as a closed contour. This method can be extended to gray-level images by searching for edges in the 45° to 135° direction from the direction of the gradient to move from the inside to the outside of the boundary, and vice versa [19]. A modified version of this contour-following method is called the crack-following algorithm [25]. In that algorithm each pixel is viewed as having a square-shaped boundary, and the object boundary is traced by following the edge-pixel boundaries.
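The algorithm of Fig. 9.15 can be sketched as follows (our implementation; the left/right turn convention and the stopping test — re-entering the start pixel in the starting direction — are one common realization):

```python
import numpy as np

def trace_contour(img, start):
    """Trace a closed boundary in a binary image: turn left and step while
    inside the region, otherwise turn right and step, until the start pixel is
    re-entered in the starting direction.  Assumes the object does not touch
    the image border."""
    step = [(-1, 0), (0, 1), (1, 0), (0, -1)]      # N, E, S, W in (row, col)
    pos, d = start, 0
    boundary = []
    while True:
        if img[pos]:
            boundary.append(pos)
            d = (d - 1) % 4                        # inside: turn left
        else:
            d = (d + 1) % 4                        # outside: turn right
        pos = (pos[0] + step[d][0], pos[1] + step[d][1])
        if pos == start and d == 0:
            break
    return boundary
```

As the text notes, the raw trace may visit some boundary pixels twice; the set of visited object pixels is the coarse contour.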

Figure 9.15 Contour following in a binary image. Algorithm: 1. Start inside A (e.g., at pixel 1). 2. Turn left and step to the next pixel if in region A (e.g., 1 to 2); otherwise turn right and step (e.g., 2 to 3). 3. Continue until you arrive at the starting point.

Edge Linking and Heuristic Graph Searching [18–21]

A boundary can also be viewed as a path through a graph formed by linking the edge elements together. Linkage rules give the procedure for connecting the edge elements. Suppose a graph with node locations x_k, k = 1, 2, ... is formed from node A to node B. Also, suppose we are given an evaluation function φ(x_k), which gives the value of the path from A to B constrained to go through the node x_k. In heuristic search algorithms, we examine the successors of the start node and select the node that maximizes φ(·). The selected node now becomes the new start node, and the process is repeated until we reach B. The sequence of selected nodes then constitutes the boundary path. The speed of the algorithm depends on the chosen φ(·) [20, 21]. Note that such an algorithm need not give the globally optimum path.
Example 9.2 Heuristic search algorithms [19]

Consider a 3 × 5 array of edges whose gradient magnitudes |g| and tangential contour directions θ are shown in Fig. 9.16a. The contour directions are at 90° to the gradient directions. A pixel X is considered to be linked to Y if the latter is one of the three eight-connected neighbors (Y₁, Y₂, or Y₃ in Fig. 9.16b) in front of the contour direction and if |θ(X) − θ(Y)| < 90°. This yields the graph of Fig. 9.16c.

As an example, suppose φ(x_k) is the sum of edge gradient magnitudes along the path from A to x_k. At A, the successor nodes are D, C, and G, with φ(D) = 12, φ(C) = 6, and φ(G) = 8. Therefore, node D is selected, and C and G are discarded. From here on nodes E, F, and B provide the remaining path. Therefore, the boundary path is ADEFB. On the other hand, note that path ACDEFB is the path of maximum cumulative gradient.

Dynamic Programming

Dynamic programming is a method of finding the global optimum of multistage processes. It is based on Bellman's principle of optimality [22], which states that the optimum path between two given points is also optimum between any two points lying


Figure 9.16 Heuristic graph search method for boundary extraction: (a) gradient magnitudes and contour directions; (b) linkage rules; (c) graph interpretation.

on the path. Thus if C is a point on the optimum path between A and B (Fig. 9.17), then the segment CB is the optimum path from C to B, no matter how one arrives at C.
To apply this idea to boundary extraction [23], suppose the edge map has been converted into a forward-connected graph of N stages and we have an evaluation function

    S(x₁, x₂, ..., x_N, N) ≜ Σ_{k=1}^{N} |g(x_k)| − α Σ_{k=2}^{N} |θ(x_k) − θ(x_{k−1})| − β Σ_{k=2}^{N} d(x_k, x_{k−1})      (9.27)

Here x_k, k = 1, ..., N represents the nodes (that is, the vector of edge pixel locations) in the kth stage of the graph, d(x, y) is the distance between two nodes x and y; |g(x_k)|, θ(x_k) are the gradient magnitude and angle, respectively, at the node x_k, and α and β are nonnegative parameters. The optimum boundary is given by connecting the nodes x_k, k = 1, ..., N, so that S(x₁, x₂, ..., x_N, N) is maximum. Define

    φ(x_N, N) ≜ max_{x₁, ..., x_{N−1}} {S(x₁, ..., x_N, N)}      (9.28)

Using the definition of (9.27), we can write the recursion

    S(x₁, ..., x_N, N) = S(x₁, ..., x_{N−1}, N − 1) + {|g(x_N)| − α|θ(x_N) − θ(x_{N−1})| − β d(x_N, x_{N−1})}
                       ≜ S(x₁, ..., x_{N−1}, N − 1) + f(x_{N−1}, x_N)      (9.29)

Figure 9.17 Bellman's principle of optimality. If the path AB is optimum, then so is CB, no matter how you arrive at C.


where f(x_{N−1}, x_N) represents the terms in the brackets. Letting N = k in (9.28) and (9.29), it follows by induction that

    φ(x_k, k) = max_{x₁, ..., x_{k−1}} {S(x₁, ..., x_{k−1}, k − 1) + f(x_{k−1}, x_k)}
              = max_{x_{k−1}} {φ(x_{k−1}, k − 1) + f(x_{k−1}, x_k)},   k = 2, ..., N
    max S(x₁, ..., x_N, N) = max_{x_N} {φ(x_N, N)}
    φ(x₁, 1) ≜ |g(x₁)|                                        (9.30)

This procedure is remarkable in that the global optimization of S(x₁, ..., x_N, N) has been reduced to N stages of two-variable optimizations. In each stage, for each value of x_k one has to search for the optimum φ(x_k, k). Therefore, if each x_k takes L different values, the total number of search operations is (N − 1)(L² − 1) + (L − 1). This would be significantly smaller than the L^N − 1 exhaustive searches required for direct maximization of S(x₁, x₂, ..., x_N, N) when L and N are large.
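The recursion (9.30) can be sketched generically as follows (our implementation; stages, gain, and link are placeholder interfaces, with the linkage rules folded into link by returning −∞ for unlinked node pairs):

```python
import math

def dp_boundary(stages, gain, link):
    """Optimal node sequence by (9.30).  stages[k] lists candidate nodes at
    stage k; gain(x) plays the role of |g(x)|; link(xp, x) supplies the
    -alpha*|angle change| - beta*distance terms (or -inf if xp, x unlinked)."""
    phi = {x: gain(x) for x in stages[0]}          # phi(x_1, 1) = |g(x_1)|
    back = [{}]
    for k in range(1, len(stages)):
        nphi, bk = {}, {}
        for x in stages[k]:
            prev = max(phi, key=lambda xp: phi[xp] + link(xp, x))
            nphi[x] = phi[prev] + link(prev, x) + gain(x)
            bk[x] = prev
        phi = nphi
        back.append(bk)
    x = max(phi, key=phi.get)                      # best terminal node
    path = [x]
    for bk in reversed(back[1:]):                  # backtrack through stages
        x = bk[x]
        path.append(x)
    return path[::-1]
```

Each stage performs only a two-variable optimization, in line with the operation count quoted above.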

Example 9.3

Consider the gradient image of Fig. 9.16. Applying the linkage rule of Example 9.2 and letting α = 4/π, β = 0, we obtain the graph of Fig. 9.18a, which shows the values of various segments connecting different nodes. Specifically, we have N = 5 and φ(A, 1) = 5. For k = 2, we get

    φ(D, 2) = max(11, 12) = 12

which means that in arriving at D, the path ACD is chosen. Proceeding in this manner, some of the candidate paths (shown by dotted lines) are eliminated. At k = 4, only two paths are acceptable, namely, ACDEF and AGHJ. At k = 5, the path JB is eliminated, giving the optimal boundary as ACDEFB.

Figure 9.18 Dynamic programming for optimal boundary extraction: (a) paths with values; (b) φ(x_k, k) at various stages; the solid line gives the optimal path.

Figure 9.19 The Hough transform: (a) straight line; (b) Hough transform.

Hough Transform [1, 24]

A straight line at a distance s and orientation θ (Fig. 9.19a) can be represented as

    s = x cos θ + y sin θ                          (9.31)

The Hough transform of this line is just a point in the (s, θ) plane; that is, all the points on this line map into a single point (Fig. 9.19b). This fact can be used to detect straight lines in a given set of boundary points. Suppose we are given boundary points (x_i, y_i), i = 1, ..., N. For some chosen quantized values of parameters s and θ, map each (x_i, y_i) into the (s, θ) space and count C(s, θ), the number of edge points that map into the location (s, θ); that is, set

    C(s_k, θ_l) = C(s_k, θ_l) + 1,   if x_i cos θ_l + y_i sin θ_l = s_k      (9.32)

Then the local maxima of C(s, θ) give the different straight-line segments through the edge points. This two-dimensional search can be reduced to a one-dimensional search if the gradients θ_i at each edge location are also known. Differentiating both sides of (9.31) with respect to x, we obtain

    dy/dx = −cot θ = tan(π/2 + θ)                  (9.33)

Hence C(s, θ) need be evaluated only for θ = π/2 − θ_i. The Hough transform can also be generalized to detect curves other than straight lines. This, however, increases the dimension of the space of parameters that must be searched [3]. From Chapter 10, it can be concluded that the Hough transform can also be expressed as the Radon transform of a line delta function.
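The accumulation of (9.32) over a quantized (s, θ) grid can be sketched as follows (our parameterization; the bin counts and the bound on |s| are illustrative choices):

```python
import numpy as np

def hough_accumulate(points, n_s=64, n_theta=64):
    """Accumulator C(s, theta): each point votes for every (s, theta) pair
    satisfying s = x cos(theta) + y sin(theta) of (9.31).  Local maxima of the
    returned array mark straight lines through the points."""
    pts = np.asarray(points, dtype=float)
    s_max = np.abs(pts).sum(axis=1).max()          # |x| + |y| bounds |s|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    C = np.zeros((n_s, n_theta), dtype=int)
    for x, y in pts:
        s = x * np.cos(thetas) + y * np.sin(thetas)
        k = np.round((s + s_max) / (2.0 * s_max) * (n_s - 1)).astype(int)
        C[k, np.arange(n_theta)] += 1
    return C, thetas
```

Collinear points accumulate in a single (s, θ) cell, so the peak count equals the number of points on the line.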

9.6 BOUNDARY REPRESENTATION

Proper representation of object boundaries is important for analysis and synthesis of shape. Shape analysis is often required for detection and recognition of objects in a scene. Shape synthesis is useful in computer-aided design (CAD) of parts and assemblies, image simulation applications such as video games, cartoon movies,


environmental modeling of aircraft-landing testing and training, and other computer graphics problems.

Chain Codes [26]

In chain coding the direction vectors between successive boundary pixels are encoded. For example, a commonly used chain code (Fig. 9.20) employs eight directions, which can be coded by 3-bit code words. Typically, the chain code contains the start pixel address followed by a string of code words. Such codes can be generalized by increasing the number of allowed direction vectors between successive boundary pixels. A limiting case is to encode the curvature of the contour as a function of contour length t (Fig. 9.21).

Figure 9.20 Chain code for boundary representation. Algorithm: 1. Start at any boundary pixel, A. 2. Find the nearest edge pixel and code its orientation; in case of a tie, choose the one with the largest (or smallest) code value. 3. Continue until there are no more boundary pixels.

Boundary pixel orientations: (A), 7 6 0 1 0 6 5 5 4 3 2 4 2 1
Chain code: A 111 110 000 001 000 110 101 101 100 011 010 100 010 001
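A minimal sketch of chain coding and decoding (ours; the direction numbering — code 0 = east, increasing anticlockwise — is one common convention and need not match Fig. 9.20 exactly):

```python
# code 0 = E, then anticlockwise: 1 = NE, 2 = N, ..., 7 = SE (rows grow downward)
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(boundary):
    """Encode an ordered list of boundary pixels as direction code words."""
    return [DIRS.index((r1 - r0, c1 - c0))
            for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]

def decode(start, codes):
    """Rebuild the pixel list from the start address and the code string."""
    path = [start]
    for c in codes:
        dr, dc = DIRS[c]
        path.append((path[-1][0] + dr, path[-1][1] + dc))
    return path
```

Each code word needs only 3 bits, so the boundary is stored as a start address plus one octal digit per step.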

Figure 9.21 Generalized chain coding: (a) contour; (b) θ versus t curve; encode θ(t).



Fitting Line Segments [1]

Straight-line segments give a simple approximation of curved boundaries. An interesting sequential algorithm for fitting a curve by line segments is as follows (Fig. 9.22).

Algorithm. Approximate the curve by the line segment joining its end points (A, B). If the distance from the farthest curve point (C) to the segment is greater than a predetermined quantity, join AC and BC. Repeat the procedure for the new segments AC and BC, and continue until the desired accuracy is reached.
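The algorithm can be sketched recursively as follows (our implementation; curve is an ordered list of boundary points and tol the predetermined distance):

```python
import numpy as np

def fit_segments(curve, tol):
    """Return indices of curve points kept as segment endpoints (Fig. 9.22):
    split at the farthest point while its distance to the chord exceeds tol."""
    curve = np.asarray(curve, dtype=float)

    def split(i, j):
        a, b = curve[i], curve[j]
        u = b - a
        rel = curve[i + 1:j] - a
        # perpendicular distances of interior points to the chord a-b
        d = np.abs(u[0] * rel[:, 1] - u[1] * rel[:, 0]) / np.linalg.norm(u)
        if d.size == 0 or d.max() <= tol:
            return [i, j]
        c = i + 1 + int(d.argmax())
        return split(i, c) + split(c, j)[1:]       # drop the duplicated index c

    return split(0, len(curve) - 1)
```

A loose tolerance keeps only the end points; a tight one retains the interior corner as well.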

B-Spline Representation [27–29]

B-splines are piecewise polynomial functions that can provide local approximations of contours of shapes using a small number of parameters. This is useful because human perception of shapes is deemed to be based on curvatures of parts of contours (or object surfaces) [30]. This results in compression of boundary data as well as smoothing of coarsely digitized contours. B-splines have been used in shape synthesis and analysis, computer graphics, and recognition of parts from boundaries.
Let t be a boundary curve parameter and let x(t) and y(t) denote the given boundary addresses. The B-spline representation is written as

    x(t) = Σ_{i=0}^{n} p_i B_{i,k}(t),   x(t) ≜ [x(t), y(t)]ᵀ      (9.34)

where p_i are called the control points and the B_{i,k}(t), i = 0, 1, ..., n, k = 1, 2, ... are called the normalized B-splines of order k. In computer graphics these functions are also called basis splines or blending functions and can be generated via the recursion

    B_{i,k}(t) ≜ (t − t_i) B_{i,k−1}(t) / (t_{i+k−1} − t_i) + (t_{i+k} − t) B_{i+1,k−1}(t) / (t_{i+k} − t_{i+1}),   k = 2, 3, ...      (9.35a)

    B_{i,1}(t) ≜ 1,  t_i ≤ t < t_{i+1}
                 0,  otherwise                     (9.35b)

where we adopt the convention 0/0 ≜ 0. The parameters t_i, i = 0, 1, ... are called the knots. These are the locations where the spline functions are tied together. Associated

Figure 9.22 Successive approximation by line segments.





with the knots are nodes s_i, which are defined as the mean locations of successive k − 1 knots, that is,

    s_i ≜ [1/(k − 1)] Σ_{j=i+1}^{i+k−1} t_j,   k ≥ 2      (9.36)

The variable t is also called the node parameter, of which t_i and s_i are special values. Figure 9.23 shows some of the B-spline functions. These functions are nonnegative and have finite support. In fact, for the normalized B-splines, 0 ≤ B_{i,k}(t) ≤ 1 and the region of support of B_{i,k}(t) is [t_i, t_{i+k}). The functions B_{i,k}(t) form a basis in the space of piecewise-polynomial functions. These functions are called open B-splines or closed (or periodic) B-splines, depending on whether the boundary being represented is open or closed. The parameter k controls the order of continuity of the curve. For example, for k = 3 the splines are piecewise quadratic polynomials. For k = 4, these are cubic polynomials. In computer graphics k = 3 or 4 is generally found to be sufficient.
When the knots are uniformly spaced, that is,

    t_{i+1} − t_i = Δt = constant,   ∀i            (9.37a)

the B_{i,k}(t) are called uniform splines and they become translates of B_{0,k}(t), that is,

    B_{i,k}(t) = B_{0,k}(t − i),   i = k − 1, k, ..., n − k + 1      (9.37b)

Near the boundaries B_{i,k}(t) is obtained from (9.35). For uniform open B-splines with Δt = 1, the knot values can be chosen as

    t_i = 0,          i < k
          i − k + 1,  k ≤ i ≤ n
          n − k + 2,  i > n                        (9.38)

and for uniform periodic (or closed) B-splines, the knots can be chosen as

    t_i = i mod(n + 1)                             (9.39)

    B_{i,k}(t) = B_{0,k}[(t − i) mod(n + 1)]       (9.40)
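The recursion (9.35), with the 0/0 ≜ 0 convention, can be sketched directly as follows (our plain-Python implementation):

```python
def bspline(i, k, t, knots):
    """Normalized B-spline B_{i,k}(t) by the recursion (9.35); 0/0 taken as 0."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    out = 0.0
    den1 = knots[i + k - 1] - knots[i]
    if den1 != 0:
        out += (t - knots[i]) / den1 * bspline(i, k - 1, t, knots)
    den2 = knots[i + k] - knots[i + 1]
    if den2 != 0:
        out += (knots[i + k] - t) / den2 * bspline(i + 1, k - 1, t, knots)
    return out

def spline_trace(t, ctrl, k, knots):
    """x(t) = sum_i p_i B_{i,k}(t) of (9.34) for 2-D control points p_i."""
    w = [bspline(i, k, t, knots) for i in range(len(ctrl))]
    return (sum(wi * p[0] for wi, p in zip(w, ctrl)),
            sum(wi * p[1] for wi, p in zip(w, ctrl)))
```

With uniform knots this reproduces the entries of Table 9.7, and inside the valid parameter range the basis functions sum to one, so coincident control points reproduce themselves exactly.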


8" 1 (tl

1 ;0
Constant
Unear



For k = 1, 2, 3, 4 and knots given by (9.39), the analytic forms of B_{0,k}(t) are provided in Table 9.7.

Control points. The control points p_i are not only the series coefficients in (9.34); they physically define the vertices of a polygon that guides the splines to trace a smooth curve (Fig. 9.24). Once the control points are given, it is straightforward to obtain the curve trace x(t) via (9.34). The number of control points necessary to reproduce a given boundary accurately is usually much less than the number of points needed to trace a smooth curve. Data compression by factors of 10 to 1000 can be achieved, depending on the resolution and complexity of the shape.
A B-spline-generated boundary can be translated, scaled (zooming or shrinking), or rotated by performing corresponding control-point transformations as follows:

    Translation:  p_i′ = p_i + x₀,   x₀ = [x₀, y₀]ᵀ      (9.41)

    Scaling:      p_i′ = α p_i,   α = scalar             (9.42)

    Rotation:     p_i′ = R p_i,   R ≜ [ cos θ₀  −sin θ₀ ]
                                      [ sin θ₀   cos θ₀ ]      (9.43)

The transformation of (9.43) gives an anticlockwise rotation by an angle θ₀. Since the object boundary can be reproduced via the control points, the latter constitute a set of regenerative shape features. Many useful shape parameters, such as center of mass, area, and perimeter, can be estimated easily from the control points (Problem 9.9).
Often, we are given the boundary points at discrete values of t = s₀, s₁, ..., s_n and we must find the control points p_i. Then

    x(s_j) = Σ_{i=0}^{n} p_i B_{i,k}(s_j),   j = 0, 1, ..., n      (9.44)

which can be written as

    X = B_k P                                      (9.45)

TABLE 9.7 Uniform Periodic B-splines for 0 ≤ t ≤ n

    B_{0,1}(t) = 1,  0 ≤ t < 1
                 0,  otherwise

    B_{0,2}(t) = t,      0 ≤ t < 1
                 2 − t,  1 ≤ t < 2
                 0,      otherwise

    B_{0,3}(t) = t²/2,               0 ≤ t < 1
                 (−2t² + 6t − 3)/2,  1 ≤ t < 2
                 (3 − t)²/2,         2 ≤ t < 3
                 0,                  otherwise

    B_{0,4}(t) = t³/6,                        0 ≤ t < 1
                 (−3t³ + 12t² − 12t + 4)/6,   1 ≤ t < 2
                 (3t³ − 24t² + 60t − 44)/6,   2 ≤ t < 3
                 (4 − t)³/6,                  3 ≤ t < 4
                 0,                           otherwise


Figure 9.24 (a), (b) B-spline curves fitted through 128 and 11M points of the original boundaries containing 1038 and 10536 points respectively, to yield indistinguishable reproduction; (c), (d) corresponding 16 and 99 control points respectively. Since (a), (b) can be reproduced from (c), (d), compression ratios of greater than 100:1 are achieved.

where B_k, P, and X are (n + 1) × (n + 1), (n + 1) × 2, and (n + 1) × 2 matrices of elements B_{i,k}(s_j), p_i, x(s_j), respectively. When the s_j are the node locations, the matrix B_k is guaranteed to be nonsingular, and the control-point array is obtained as

    P = B_k⁻¹ X                                    (9.46)

For uniformly sampled closed splines, B_k becomes a circulant matrix, whose first row is given by

    b₀ᵀ = [b₀ b₁ ... b_q, 0 ... 0, b_q ... b₁]     (9.47)

where q = Integer[(k − 1)/2] and s_{j+1} − s_j = t_{j+1} − t_j = constant, ∀j. In the case of open B-splines, B_k is nearly Toeplitz when s_j = t_j for every j.

Sec. 9.6" ' Boundary Representation 367




Example 9.4 Quadratic B-splines, k = 3

(a) Periodic case: From (9.36) and (9.39) the nodes (sampling locations) are

    [s₀, s₁, s₂, ..., s_n] = [3/2, 5/2, 7/2, ..., (2n − 1)/2, n/2, 1/2]

Then from (9.47), the blending function B_{0,3}(t) gives the circulant matrix

    B₃ = (1/8) × circulant[6 1 0 ... 0 1]

(b) Nonperiodic case: From (9.36) and (9.38) the knots and nodes are obtained as

    [t₀, t₁, ..., t_{n+3}] = [0, 0, 0, 1, 2, 3, ..., n − 2, n − 1, n − 1, n − 1]

The nonperiodic blending functions for k = 3 are obtained as

    B_{0,3}(t) = (t − 1)²,  0 ≤ t < 1
                 0,         1 ≤ t ≤ n − 1

    B_{1,3}(t) = −(3/2)(t − 2/3)² + 2/3,  0 ≤ t < 1
                 (1/2)(t − 2)²,           1 ≤ t < 2
                 0,                       2 ≤ t ≤ n − 1

    B_{j,3}(t) = 0,                       0 ≤ t < j − 2
                 (1/2)(t − j + 2)²,       j − 2 ≤ t < j − 1
                 −(t − j + 1/2)² + 3/4,   j − 1 ≤ t < j
                 (1/2)(t − j − 1)²,       j ≤ t < j + 1
                 0,                       j + 1 ≤ t ≤ n − 1,   j = 2, 3, ..., n − 2

    B_{n−1,3}(t) = 0,                            0 ≤ t < n − 3
                   (1/2)(t − n + 3)²,            n − 3 ≤ t < n − 2
                   −(3/2)(t − n + 4/3)² + 2/3,   n − 2 ≤ t ≤ n − 1

    B_{n,3}(t) = 0,             0 ≤ t < n − 2
                 (t − n + 2)²,  n − 2 ≤ t ≤ n − 1

From these we obtain the (n + 1) × (n + 1) matrix

    B₃ = (1/8) ×  [ 8  0  0  .  .  .  0 ]
                  [ 2  5  1  0  .  .  0 ]
                  [ 0  1  6  1  .  .  0 ]
                  [       . . .         ]
                  [ 0  .  .  1  6  1  0 ]
                  [ 0  .  .  0  1  5  2 ]
                  [ 0  .  .  .  0  0  8 ]
Figure 9.25 shows a set of uniformly sampled spline boundary points x(s_j), j = 0, ..., 13. Observe that the s_j, and not x(s_j), y(s_j), are required to be uniformly spaced. Periodic and open quadratic B-spline interpolations are shown in Figs. 9.25b and c. Note that in the case of open B-splines, the endpoints of the curve are also control points.
The foregoing method of extracting control points from uniformly sampled boundaries has one remaining difficulty: it requires that the number of control points be equal to the number of sampled points. In practice, we often have a large number of finely sampled points on the contour and, as is evident from Fig. 9.24, the number of control points necessary to represent the contour accurately may be much smaller. Therefore, we are given x(t) for t = ξ₀, ξ₁, ..., ξ_m, where m ≫ n. Then we have an overdetermined (m + 1) × (n + 1) system of equations

    x(ξ_j) = Σ_{i=0}^{n} B_{i,k}(ξ_j) p_i,   j = 0, ..., m      (9.48)

Least squares techniques can now be applied to estimate the control points Pi'
With proper indexing of the sampling points ~i and letting the ratio (m + l)/(n + 1)


Figure 9.25 (a) Given points; (b) quadratic periodic B-spline interpolation;
(c) quadratic nonperiodic B-spline interpolation.




be an integer, the least squares solution can be shown to require inversion of a circulant matrix, for which fast algorithms are available [29].
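As a concrete sketch of this least squares step (a plain dense solve with numpy rather than the fast circulant algorithm of [29]; the function names and the use of a uniform periodic quadratic basis are our own assumptions):

```python
import numpy as np

def quad_bspline(t):
    """Uniform quadratic B-spline with support [0, 3)."""
    t = np.asarray(t, dtype=float)
    return np.where((0 <= t) & (t < 1), 0.5 * t**2,
           np.where((1 <= t) & (t < 2), (-2 * t**2 + 6 * t - 3) / 2,
           np.where((2 <= t) & (t < 3), 0.5 * (3 - t)**2, 0.0)))

def fit_control_points(z, n_ctrl):
    """Least-squares control points for a closed boundary.

    z: complex boundary samples u(j) = x(j) + j*y(j), len(z) >= n_ctrl.
    Builds the overdetermined system of (9.48) and solves it with lstsq,
    treating the x and y coordinates as two right-hand sides."""
    m = len(z)
    t = np.arange(m) * n_ctrl / m                 # parameter of each sample
    i = np.arange(n_ctrl)
    A = quad_bspline((t[:, None] - i[None, :]) % n_ctrl)
    p_xy, *_ = np.linalg.lstsq(A, np.column_stack([z.real, z.imag]),
                               rcond=None)
    return p_xy[:, 0] + 1j * p_xy[:, 1]

# Fit 8 control points to 64 samples of the unit circle.
theta = 2 * np.pi * np.arange(64) / 64
z = np.exp(1j * theta)
p = fit_control_points(z, 8)
```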

Fourier Descriptors

Once the boundary trace is known, we can consider it as a pair of waveforms


x(t),y(t). Hence any of the traditional one-dimensional signal representation tech-
niques can be used. For any sampled boundary we can define
    u(n) ≜ x(n) + jy(n),   n = 0, 1, ..., N-1    (9.49)

which, for a closed boundary, would be periodic with period N. Its DFT representation is

    u(n) = Σ_{k=0}^{N-1} a(k) exp(j2πkn/N),   0 ≤ n ≤ N-1
                                                             (9.50)
    a(k) = (1/N) Σ_{n=0}^{N-1} u(n) exp(-j2πkn/N),   0 ≤ k ≤ N-1

The complex coefficients a(k) are called the Fourier descriptors (FDs) of the boundary. For a continuous boundary function u(t), defined in a manner similar to (9.49), the FDs are its (infinite) Fourier series coefficients. Fourier descriptors have been found useful in character recognition problems [32].
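With the convention of (9.50), the FDs follow directly from an FFT. A small sketch (numpy's fft computes the sum in (9.50) without the 1/N factor, so we divide by N; function names are ours):

```python
import numpy as np

def fourier_descriptors(x, y):
    """FDs of a closed boundary: a(k) = (1/N) sum_n u(n) exp(-j2pi kn/N)."""
    u = np.asarray(x) + 1j * np.asarray(y)
    return np.fft.fft(u) / len(u)

def boundary_from_fds(a):
    """Inverse of (9.50): u(n) = sum_k a(k) exp(+j2pi kn/N)."""
    return np.fft.ifft(a) * len(a)

# A unit circle traced counterclockwise puts all its energy in a(1).
n = np.arange(32)
x, y = np.cos(2 * np.pi * n / 32), np.sin(2 * np.pi * n / 32)
a = fourier_descriptors(x, y)
```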

Effect of geometric transformations. Several geometric transformations of a boundary or shape can be related to simple operations on the FDs (Table 9.8). If the boundary is translated by

    u_0 ≜ x_0 + jy_0    (9.51)

then the new FDs remain the same except at k = 0. Scaling, that is, shrinking or expanding of the boundary, results in scaling of the a(k). Changing the starting point in tracing the boundary results in a modulation of the a(k). Rotation of the boundary by an angle θ_0 causes a constant phase shift of θ_0 in the FDs. Reflection of the boundary (or shape) about a straight line inclined at an angle θ (Fig. 9.26),

    Ax + By + C = 0    (9.52)
TABLE 9.8 Properties of Fourier Descriptors

Transformation       Boundary                      Fourier Descriptors
Identity             u(n)                          a(k)
Translation          ū(n) = u(n) + u_0             ā(k) = a(k) + u_0 δ(k)
Scaling or zooming   ū(n) = αu(n)                  ā(k) = αa(k)
Starting point       ū(n) = u(n - n_0)             ā(k) = a(k) e^{-j2πn_0 k/N}
Rotation             ū(n) = u(n) e^{jθ_0}          ā(k) = a(k) e^{jθ_0}
Reflection           ū(n) = u*(n) e^{j2θ} + 2γ     ā(k) = a*(-k) e^{j2θ} + 2γ δ(k)





Figure 9.26 Reflection about a straight line.

gives the new boundary x̄(n), ȳ(n) as (Problem 9.11)

    ū(n) = u*(n) e^{j2θ} + 2γ

where    (9.53)

    γ = -(A + jB)C / (A² + B²),   exp(j2θ) = -(A + jB)² / (A² + B²)

For example, if the line (9.52) is the x-axis, that is, A = C = 0, then θ = 0, γ = 0, and the new FDs are the complex conjugates of the old ones.
Fourier descriptors are also regenerative shape features. The number of descriptors needed for reconstruction depends on the shape and the desired accuracy. Figure 9.27 shows the effect of truncation and quantization of the FDs. From Table 9.8 it can be observed that the FD magnitudes have some invariant properties. For example, |ā(k)|, k = 1, 2, ..., N-1, are invariant to starting point, rotation, and reflection. The features |ā(k)|/|ā(1)| are invariant to scaling. These properties can be used in detecting shapes regardless of their size, orientation, and so on. However, the FD magnitude or phase alone is generally inadequate for reconstruction of the original shape (Fig. 9.27).

Boundary matching. The Fourier descriptors can be used to match similar shapes even if they have different size and orientation. If a(k) and b(k) are the FDs of two boundaries u(n) and v(n), respectively, then their shapes are similar if the distance

    d = min_{u_0, α, n_0, θ_0} Σ_{n=0}^{N-1} |u(n) - αv(n + n_0) e^{jθ_0} - u_0|²    (9.54)

is small. The parameters u_0, α, n_0, and θ_0 are chosen to minimize the effects of translation, scaling, starting point, and rotation, respectively. If u(n) and v(n) are normalized so that Σ_n u(n) = Σ_n v(n) = 0, then for a given shift n_0, the above distance is



Figure 9.27 Fourier descriptors. (a) Given shape; (b) FDs, real and imaginary components; (c) shape derived from largest five FDs; (d) derived from all FDs quantized to 17 levels each; (e) amplitude reconstruction; (f) phase reconstruction.


minimum when

    α = Σ_k c(k) cos(ψ_k + kφ + θ_0) / Σ_k |b(k)|²    (9.55)

and

    tan θ_0 = - [Σ_k c(k) sin(ψ_k + kφ)] / [Σ_k c(k) cos(ψ_k + kφ)]

where a(k)b*(k) ≜ c(k)e^{jψ_k}, φ ≜ -2πn_0/N, and c(k) is a real quantity. These equations give α and θ_0, from which the minimum distance d is given by

    d = min_φ [d(φ)] = min_φ { Σ_k |a(k) - αb(k) exp[j(kφ + θ_0)]|² }    (9.56)

The distance d(φ) can be evaluated for each φ = φ(n_0), n_0 = 0, 1, ..., N-1, and the minimum searched to obtain d. The quantity d is then a useful measure of difference between two shapes. The FDs can also be used for analysis of line patterns or open curves, skeletonization of patterns, computation of the area of a surface, and so on (see Problem 9.12).
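For many matching tasks the full search over φ can be avoided by comparing the invariant magnitude features |a(k)|/|a(1)| mentioned earlier. The sketch below is that simplification (our own shortcut, not the minimum-distance search of (9.54) and (9.56)):

```python
import numpy as np

def fd_features(x, y):
    """Magnitude features |a(k)|/|a(1)|: invariant to translation (a(0) is
    dropped), to scaling (the ratio), and to rotation and starting point
    (phases are discarded)."""
    u = np.asarray(x) + 1j * np.asarray(y)
    a = np.fft.fft(u) / len(u)
    mags = np.abs(a[1:])
    return mags / mags[0]

# An ellipse and a translated, scaled, rotated copy give identical features.
t = 2 * np.pi * np.arange(64) / 64
z1 = 2 * np.cos(t) + 1j * np.sin(t)
z2 = 5 + 2j + 0.7 * np.exp(1j * np.pi / 3) * z1
f1 = fd_features(z1.real, z1.imag)
f2 = fd_features(z2.real, z2.imag)
```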
Instead of using two functions x(t) and y(t), it is possible to use only one function when t represents the arc length along the boundary curve. Defining the arc tangent angle (see Fig. 9.21)

    θ(t) = tan⁻¹ [ (dy(t)/dt) / (dx(t)/dt) ]    (9.57)

the curve can be traced if x(0), y(0), and θ(t) are known. Since t is the distance along the curve, it is true that

    (dt)² = (dx)² + (dy)²  ⟹  (dx/dt)² + (dy/dt)² = 1    (9.58)

which gives dx/dt = cos θ(t), dy/dt = sin θ(t), or

    x(t) = x(0) + ∫₀ᵗ cos θ(τ) dτ,   y(t) = y(0) + ∫₀ᵗ sin θ(τ) dτ    (9.59)

Sometimes the FDs of the curvature of the boundary

    κ(t) ≜ dθ(t)/dt    (9.60)

or those of the detrended function

    θ̂(t) ≜ θ(t) - 2πt/T,   0 ≤ t ≤ T    (9.61)

where T is the perimeter of the closed boundary, are used [31]. The latter has the advantage that θ̂(t) does not have the singularities at corner points that are encountered in polygonal shapes. Although we now have only a real scalar set of FDs, their rate of decay is found to be much slower than that of the FDs of u(t).

Autoregressive Models

If we are given a class of object boundaries, for instance, screwdrivers of different sizes with arbitrary orientations, then we have an ensemble of boundaries that could be represented by a stochastic model. For instance, the boundary coordinates x(n), y(n) could be represented by AR processes [33]

    u_i(n) = Σ_{k=1}^{p} a_i(k) u_i(n - k) + ε_i(n)
                                                       (9.62)
    x_i(n) = u_i(n) + μ_i,   i = 1, 2

where x(n) ≜ x_1(n) and y(n) ≜ x_2(n). Here u_i(n) is a zero mean stationary random sequence, μ_i is the ensemble mean of x_i(n), and ε_i(n) is an uncorrelated sequence with zero mean and variance β_i². For simplicity we assume ε_1(n) and ε_2(n) to be independent, so that the coordinates x_1(n) and x_2(n) can be processed independently. For closed boundaries the covariances of the sequences {x_i(n)}, i = 1, 2, will be periodic. The AR model parameters a_i(k), β_i², and μ_i can be considered as features of the given ensemble of shapes. These features can be estimated from a given boundary data set by following the procedures of Chapter 6.
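As an illustration, here is a minimal conditional least-squares fit of (9.62) for one coordinate sequence (a sketch of ours, not the Chapter 6 estimators; the function name is an assumption):

```python
import numpy as np

def fit_ar(x, p):
    """Fit the AR model (9.62) to one boundary coordinate sequence.

    Returns (a, mu, beta2): AR coefficients a(1..p), ensemble mean mu,
    and residual variance beta^2. The boundary is assumed closed, so
    lagged values wrap around (np.roll)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    u = x - mu
    U = np.column_stack([np.roll(u, k) for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(U, u, rcond=None)
    beta2 = float(np.mean((u - U @ a) ** 2))
    return a, mu, beta2

# x(n) = 3 + 2cos(2*pi*n/100) is reproduced exactly by an AR(2) model.
t = 2 * np.pi * np.arange(100) / 100
a, mu, beta2 = fit_ar(3 + 2 * np.cos(t), p=2)
```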

Properties of AR features. Table 9.9 lists the effect of different geometric transformations on the AR model parameters. The features a_i(k) are invariant under translation, scaling, and starting point. This is because the underlying correlations of the sequences x_i(n), which determine the a_i(k), are also invariant under these transformations. The feature β_i is sensitive to scaling, and μ_i is sensitive to scaling as well as translation. In the case of rotation, the sum |a_1(k)|² + |a_2(k)|² can be shown to remain invariant.
AR models are also regenerative. Given the features {a_i(k), μ_i, β_i²} and the residuals ε_i(n), the boundary can be reconstructed. The AR model, once identified
,
TABLE 9.9 Properties of AR Model Parameters for Closed Boundaries

                                        AR model parameters
Transformation     x̄_i(n)             ε̄_i(n)             ā_i(k)   β̄_i    μ̄_i
Identity           x_i(n)              ε_i(n)              a_i(k)   β_i    μ_i
Translation        x_i(n) + x_{i,0}    ε_i(n)              a_i(k)   β_i    μ_i + x_{i,0}
Scaling/zooming    αx_i(n)             αε_i(n)             a_i(k)   αβ_i   αμ_i
Starting point     x_i(n + n_{0,i})    ε_i(n + n_{0,i})    a_i(k)   β_i    μ_i
Rotation           |ā_1(k)|² + |ā_2(k)|² = |a_1(k)|² + |a_2(k)|²


for a class of objects, can also be used for compression of the boundary data x_1(n), x_2(n) via the DPCM method (see Chapter 11).

9.7 REGION REPRESENTATION

The shape of an object may be directly represented by the region it occupies. For example, the binary array

    u(m, n) = 1, if (m, n) ∈ ℛ
              0, otherwise            (9.63)

is a simple representation of the region ℛ. Boundaries give an efficient representation of regions because only a subset of u(m, n) is stored. Other forms of region representation are discussed next.

Run-length Codes

Any region or binary image can be viewed as a sequence of alternating strings of 0s and 1s. Run-length codes represent these strings, or runs. For raster scanned regions, a simple run-length code consists of the start address of each string of 1s (or 0s), followed by the length of that string (Fig. 9.28). There are several forms of run-length codes that are aimed at minimizing the number of bits required to represent binary images. Details are discussed in Section 11.9. Run-length codes have the advantage that, regardless of the complexity of the region, the representation is obtained in a single raster scan. The main disadvantage is that they do not give the region boundary points ordered along its contours, as in chain coding. This makes it difficult to segment different regions if several are present in an image.
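A minimal sketch of such a code for one raster row (start address and length of each run of 1s; the exact representation details are our choice):

```python
def run_lengths(row):
    """Run-length code of one binary raster row: (start, length) per run of 1s."""
    runs, start = [], None
    for i, v in enumerate(list(row) + [0]):    # sentinel 0 closes a trailing run
        if v and start is None:
            start = i                          # a run of 1s begins
        elif not v and start is not None:
            runs.append((start, i - start))    # the run ends
            start = None
    return runs

codes = run_lengths([0, 1, 1, 0, 1, 1, 1, 0])   # [(1, 2), (4, 3)]
```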

Quad-trees [34]

In the quad-tree method, the given region is enclosed in a convenient rectangular


area. This area is divided into four quadrants, each of which is examined if it is
Figure 9.28 Run-length coding for binary image boundary representation: a binary image (a) and its run-length code (b), listing the start address and length of each run of 1s.



totally black (1s) or totally white (0s). A quadrant that has both black and white pixels is called gray and is further divided into four quadrants. A tree structure is generated until each subquadrant is either black only or white only. The tree can be encoded by a unique string of symbols b (black), w (white), and g (gray), where each g is necessarily followed by four symbols or groups of four symbols representing the subquadrants; see, for example, Fig. 9.29. It appears that quad-tree coding would be more efficient than run-length coding from a data-compression standpoint. However, computation of shape measurements such as perimeter and moments, as well as image segmentation, may be more difficult.

Projections

A two-dimensional shape or region ℛ can be represented by its projections. A projection g(s, θ) is simply the sum of the run-lengths of 1s along a straight line oriented at angle θ and placed at a distance s (Fig. 9.30). In this sense a projection is simply a histogram that gives the number of pixels that project into a bin at distance s along a line of orientation θ. Features of this histogram are useful in shape analysis as well as image segmentation. For example, the first moments of g(s, 0) and g(s, π/2) give the center of mass coordinates of the region ℛ. Higher-order moments of g(s, θ) can be used for calculating the moment invariants of shape discussed in Section 9.8. Other features, such as the region of support and the local maxima and minima of g(s, θ), can be used to determine the bounding rectangles and convex hulls of shapes, which are, in turn, useful in image segmentation problems [see Ref. 11 in Chapter 10]. Projections can also serve as regenerative features of an object. The theory of reconstruction of an object from its projections is considered in detail in Chapter 10.
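For the axis-aligned cases θ = 0 and θ = π/2, a projection is just a row or column sum; a sketch (the axis conventions and function names are ours):

```python
import numpy as np

def projections(u):
    """Row and column projections of a binary region: counts of 1-pixels
    along each horizontal and each vertical line."""
    u = np.asarray(u)
    return u.sum(axis=1), u.sum(axis=0)

def center_of_mass(u):
    """First moments of the two projections give the center of mass."""
    g_row, g_col = projections(u)
    area = g_row.sum()
    r = g_row @ np.arange(len(g_row)) / area
    c = g_col @ np.arange(len(g_col)) / area
    return r, c

u = np.array([[0, 1, 1],
              [0, 1, 1]])
rc = center_of_mass(u)        # (0.5, 1.5)
```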

Code: gbgbwwbwgbwwgbwwb
Decode as: g(b g(bwwb) w g(bww g(bwwb)))

Figure 9.29 Quad-tree representation of regions. (a) Different quadrants; (b) quad-tree encoding.
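A recursive sketch of the encoder (the quadrant ordering used here is our choice; the book numbers the quadrants 1 through 4):

```python
import numpy as np

def quadtree(u):
    """Quad-tree code of a square binary array whose side is a power of two:
    'b' for an all-black quadrant, 'w' for all-white, and 'g' for a gray
    quadrant followed by the codes of its four subquadrants."""
    u = np.asarray(u)
    if u.all():
        return "b"
    if not u.any():
        return "w"
    h = u.shape[0] // 2
    return ("g" + quadtree(u[:h, :h]) + quadtree(u[:h, h:])
                + quadtree(u[h:, :h]) + quadtree(u[h:, h:]))

u = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
code = quadtree(u)            # "gbwwb"
```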




" +'7'
I

Figure 9.30 Projection g(s. 0) of are- <

gion 91'. g(s. a) = I, + Ii.

9.8 MOMENT REPRESENTATION

The theory of moments provides an interesting and sometimes useful alternative to series expansions for representing the shape of objects. Here we discuss the use of moments as features of an object f(x, y).

Definitions

Let f(x, y) ≥ 0 be a real bounded function with support on a finite region ℛ. We define its (p + q)th-order moment

    m_{p,q} ≜ ∫∫_ℛ f(x, y) x^p y^q dx dy,   p, q = 0, 1, 2, ...    (9.64)

Note that setting f(x, y) = 1 gives the moments of the region ℛ that could represent a shape. Thus the results presented here are applicable to arbitrary objects as well as their shapes. Without loss of generality we can assume that f(x, y) is nonzero only in the region {x ∈ (-1, 1), y ∈ (-1, 1)}. Then higher-order moments will, in general, have increasingly smaller magnitudes.

The characteristic function of f(x, y) is defined as its conjugate Fourier transform

    F*(ξ_1, ξ_2) ≜ ∫∫_ℛ f(x, y) exp{j2π(xξ_1 + yξ_2)} dx dy    (9.65)

The moment-generating function of f(x, y) is defined as

    M(ξ_1, ξ_2) ≜ ∫∫_ℛ f(x, y) exp(xξ_1 + yξ_2) dx dy    (9.66)

It gives the moments as

    m_{p,q} = ∂^{p+q} M(ξ_1, ξ_2) / ∂ξ_1^p ∂ξ_2^q  evaluated at ξ_1 = ξ_2 = 0    (9.67)
Moment Representation Theorem [35]

The infinite set of moments {m_{p,q}, p, q = 0, 1, ...} uniquely determines f(x, y), and vice versa.
The proof is obtained by expanding into a power series the exponential term in (9.65), interchanging the order of integration and summation, using (9.64), and taking the Fourier transform of both sides. This yields the reconstruction formula

    f(x, y) = ∫∫ e^{-j2π(xξ_1 + yξ_2)} Σ_{p=0}^{∞} Σ_{q=0}^{∞} m_{p,q} [(j2π)^{p+q} / (p! q!)] ξ_1^p ξ_2^q dξ_1 dξ_2    (9.68)

Unfortunately this formula is not practical because we cannot interchange the order of integration and summation, due to the fact that the Fourier transform of (j2πξ)^p is not bounded. Therefore, we cannot truncate the series in (9.68) to find an approximation of f(x, y).

Moment Matching

In spite of the foregoing difficulty, if we know the moments of f(x, y) up to a given order N, it is possible to find a continuous function

    g(x, y) = Σ_{p=0}^{N} Σ_{q=0}^{N-p} g_{p,q} x^p y^q    (9.69)

whose moments of order up to p + q = N match those of f(x, y). The coefficients g_{p,q} can be found by matching the moments, that is, by setting the moments of g(x, y) equal to m_{p,q}. A disadvantage of this approach is that the coefficients g_{p,q}, once determined, change if more moments are included, meaning that we must solve a coupled set of equations that grows in size with N.

Example 9.5
For N = 3, we obtain 10 algebraic equations (p + q ≤ 3). (Show!)

    ⎡ 1    1/3   1/3  ⎤ ⎡ g_{0,0} ⎤         ⎡ m_{0,0} ⎤
    ⎢ 1/3  1/5   1/9  ⎥ ⎢ g_{2,0} ⎥ = (1/4) ⎢ m_{2,0} ⎥
    ⎣ 1/3  1/9   1/5  ⎦ ⎣ g_{0,2} ⎦         ⎣ m_{0,2} ⎦

    ⎡ 1/3  1/5   1/9  ⎤ ⎡ g_{1,0} ⎤         ⎡ m_{1,0} ⎤
    ⎢ 1/5  1/7   1/15 ⎥ ⎢ g_{3,0} ⎥ = (1/4) ⎢ m_{3,0} ⎥    (9.70)
    ⎣ 1/9  1/15  1/15 ⎦ ⎣ g_{1,2} ⎦         ⎣ m_{1,2} ⎦

    (4/9) g_{1,1} = m_{1,1}

where three additional equations are obtained by interchanging the indices in (9.70).



Orthogonal Moments

The moments m_{p,q} are the projections of f(x, y) onto the monomials {x^p y^q}, which are nonorthogonal. An alternative is to use the orthogonal Legendre polynomials [36], defined as

    P_0(x) = 1
    P_n(x) = (1 / (2^n n!)) d^n/dx^n [(x² - 1)^n],   n = 1, 2, ...    (9.71)
    ∫_{-1}^{1} P_n(x) P_m(x) dx = (2 / (2n + 1)) δ(m - n)
Now f(x, y) has the representation

    f(x, y) = Σ_{p=0}^{∞} Σ_{q=0}^{∞} λ_{p,q} P_p(x) P_q(y)
                                                                    (9.72)
    λ_{p,q} = [(2p + 1)(2q + 1) / 4] ∫∫_ℛ f(x, y) P_p(x) P_q(y) dx dy

where the λ_{p,q} are called the orthogonal moments. Writing P_m(x) as an mth-order polynomial

    P_m(x) = Σ_{j=0}^{m} c_{m,j} x^j    (9.73)

the relationship between the orthogonal moments and the m_{p,q} is obtained by substituting (9.73) in (9.72), as

    λ_{p,q} = [(2p + 1)(2q + 1) / 4] Σ_{j=0}^{p} Σ_{k=0}^{q} c_{p,j} c_{q,k} m_{j,k}    (9.74)

For example, this gives

    λ_{0,0} = (1/4) m_{0,0},   λ_{1,0} = (3/4) m_{1,0},   λ_{0,1} = (3/4) m_{0,1}
                                                                    (9.75)
    λ_{2,0} = (5/8)[3m_{2,0} - m_{0,0}],   λ_{0,2} = (5/8)[3m_{0,2} - m_{0,0}]

The orthogonal moments depend on the usual moments, which are at most of the same order, and vice versa. Now an approximation to f(x, y) can be obtained by truncating (9.72) at a given finite order p + q = N, that is,

    f(x, y) ≈ g(x, y) = Σ_{p=0}^{N} Σ_{q=0}^{N-p} λ_{p,q} P_p(x) P_q(y)    (9.76)

The preceding equation is the same as (9.69) except that the different terms have been regrouped. The advantage of this representation is that the equations for the λ_{p,q} are decoupled (see (9.74)), so that, unlike the coefficients g_{p,q} of (9.69), the λ_{p,q} are not required to be updated as the order N is increased.
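A numerical sketch of (9.72), using midpoint-rule integration on a grid over [-1, 1]² (the grid size and function names are our own choices):

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(f, N, m=200):
    """Orthogonal moments lambda_{p,q}, p + q <= N, of a function f(x, y)
    supported on [-1, 1]^2, per (9.72)."""
    x = -1 + (2 * np.arange(m) + 1) / m          # cell midpoints in [-1, 1]
    X, Y = np.meshgrid(x, x, indexing="ij")
    F = f(X, Y)
    dxdy = (2 / m) ** 2
    P = [legendre.legval(x, [0] * k + [1]) for k in range(N + 1)]  # P_k on grid
    lam = np.zeros((N + 1, N + 1))
    for p in range(N + 1):
        for q in range(N + 1 - p):
            integral = np.sum(F * P[p][:, None] * P[q][None, :]) * dxdy
            lam[p, q] = (2 * p + 1) * (2 * q + 1) / 4 * integral
    return lam

# f = x^2 expands as (1/3)P_0(x) + (2/3)P_2(x), so only two moments survive.
lam = legendre_moments(lambda x, y: x**2, N=2)
```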


Moment Invariants

These refer to certain functions of moments that are invariant to geometric transformations such as translation, scaling, and rotation. Such features are useful in the identification of objects with unique shapes regardless of their location, size, and orientation.

Translation. Under a translation of coordinates, x′ = x + α, y′ = y + β, the central moments

    μ_{p,q} = ∫∫ (x - x̄)^p (y - ȳ)^q f(x, y) dx dy    (9.77)

are invariant, where x̄ ≜ m_{1,0}/m_{0,0} and ȳ ≜ m_{0,1}/m_{0,0}. In the sequel we will consider only the central moments.

Scaling. Under a scale change, x′ = αx, y′ = αy, the moments of f(αx, αy) change to μ′_{p,q} = μ_{p,q}/α^{p+q+2}. The normalized moments, defined as

    η_{p,q} = μ_{p,q} / (μ_{0,0})^γ,   γ = (p + q + 2)/2    (9.78)

are then invariant to size change.

Rotation and reflection. Under a linear coordinate transformation

    ⎡x′⎤   ⎡α  β⎤ ⎡x⎤
    ⎣y′⎦ = ⎣γ  δ⎦ ⎣y⎦    (9.79)

the moment-generating function will change. Via the theory of algebraic invariants [37], it is possible to find certain polynomials of μ_{p,q} that remain unchanged under the transformation of (9.79). For example, some moment invariants with respect to rotation (that is, for α = δ = cos θ, β = -γ = sin θ) and reflection (α = -δ = cos θ, β = γ = sin θ) are given as follows:

1. For first-order moments, μ_{0,1} = μ_{1,0} = 0 (always invariant).
2. For second-order moments (p + q = 2), the invariants are

    φ_1 = μ_{2,0} + μ_{0,2}
                                                    (9.80)
    φ_2 = (μ_{2,0} - μ_{0,2})² + 4μ²_{1,1}

3. For third-order moments (p + q = 3), the invariants are

    φ_3 = (μ_{3,0} - 3μ_{1,2})² + (μ_{0,3} - 3μ_{2,1})²
    φ_4 = (μ_{3,0} + μ_{1,2})² + (μ_{0,3} + μ_{2,1})²
    φ_5 = (μ_{3,0} - 3μ_{1,2})(μ_{3,0} + μ_{1,2})[(μ_{3,0} + μ_{1,2})² - 3(μ_{2,1} + μ_{0,3})²]    (9.81)
          + (μ_{0,3} - 3μ_{2,1})(μ_{0,3} + μ_{2,1})[(μ_{0,3} + μ_{2,1})² - 3(μ_{1,2} + μ_{3,0})²]
    φ_6 = (μ_{2,0} - μ_{0,2})[(μ_{3,0} + μ_{1,2})² - (μ_{2,1} + μ_{0,3})²] + 4μ_{1,1}(μ_{3,0} + μ_{1,2})(μ_{0,3} + μ_{2,1})



It can be shown that for Nth-order moments (N > 3), there are (N + 1) absolute invariant moments, which remain unchanged under both reflection and rotation [35]. A number of other moments can be found that are invariant in absolute value, in the sense that they remain unchanged under rotation but change sign under reflection. For example, for third-order moments, we have

    φ_7 = (3μ_{2,1} - μ_{0,3})(μ_{3,0} + μ_{1,2})[(μ_{3,0} + μ_{1,2})² - 3(μ_{2,1} + μ_{0,3})²]    (9.82)
          + (μ_{3,0} - 3μ_{1,2})(μ_{2,1} + μ_{0,3})[(μ_{0,3} + μ_{2,1})² - 3(μ_{3,0} + μ_{1,2})²]

The relationship between invariant moments and μ_{p,q} becomes more complicated for higher-order moments. Moment invariants can be expressed more conveniently in terms of what are called Zernike moments. These moments are defined as the projections of f(x, y) on a class of polynomials, called Zernike polynomials [36]. These polynomials are separable in polar coordinates and are orthogonal over the unit circle.

Applications of Moment Invariants

Being invariant under linear coordinate transformations, the moment invariants are useful features in pattern-recognition problems. Using N moments, for instance, an image can be represented as a point in an N-dimensional vector space. This converts the pattern-recognition problem into a standard decision theory problem, for which several approaches are available. For binary digital images we can set f(x, y) = 1, (x, y) ∈ ℛ. Then the moment calculation reduces to the separable computation

    m_{p,q} = Σ_x x^p Σ_y y^q    (9.83)

These moments are useful for shape analysis. Moments can also be computed optically [38] at high speeds. Moments have been used in distinguishing between shapes of different aircraft, character recognition, and scene-matching applications [39, 40].
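A sketch for binary images, computing φ_1 and φ_2 of (9.80) from the normalized central moments of (9.77) and (9.78) via the separable sums of (9.83) (function names are ours):

```python
import numpy as np

def phi12(u):
    """phi_1 and phi_2 of (9.80) for a binary image u."""
    ys, xs = np.nonzero(u)                 # pixel coordinates of the region
    m00 = float(len(xs))
    xbar, ybar = xs.mean(), ys.mean()

    def eta(p, q):                         # normalized central moment (9.78)
        mu = np.sum((xs - xbar) ** p * (ys - ybar) ** q)
        return mu / m00 ** ((p + q + 2) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# A rectangle, its translate, and its 90-degree rotation share the invariants.
r = np.zeros((16, 16)); r[2:8, 3:12] = 1
p1 = phi12(r)
p2 = phi12(np.rot90(r))
```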

9.9 STRUCTURE

In many computer vision applications, the objects in a scene can be characterized satisfactorily by structures composed of line or arc patterns. Examples include handwritten or printed characters, fingerprint ridge patterns, chromosomes and biological cell structures, circuit diagrams and engineering drawings, and the like. In such situations the thickness of the pattern strokes does not contribute to the recognition process. In this section we present several transformations that are useful for the analysis of the structure of patterns.

Medial Axis Transform

Suppose that a fire line propagates with constant speed from the contour of a connected object toward its inside. Then all those points lying in positions where at least two wave fronts of the fire line meet during the propagation (quench points) will constitute a form of skeleton called the medial axis [41] of the object.
Algorithms used to obtain the medial axis can be grouped into two main categories, depending on the kind of information preserved:

Skeleton algorithms. Here the image is described using an intrinsic coordinate system. Every point is specified by giving its distance from the nearest boundary point. The skeleton is defined as the set of points whose distance from the nearest boundary is locally maximum. Skeletons can be obtained using the following algorithm:

1. Distance transform:

    u_k(m, n) = u_0(m, n) + min {u_{k-1}(i, j) : (i, j) with Δ(m, n; i, j) ≤ 1},
                                                                       (9.84)
    u_0(m, n) ≜ u(m, n),   k = 1, 2, ...

where Δ(m, n; i, j) is the distance between (m, n) and (i, j). The transform is done when k equals the maximum thickness of the region.

2. The skeleton is the set of points:

    {(m, n) : u_k(m, n) ≥ u_k(i, j), Δ(m, n; i, j) ≤ 1}    (9.85)

Figure 9.31 shows an example of the preceding algorithm when Δ(m, n; i, j) represents the Euclidean distance. It is possible to recover the original image given its skeleton and the distance of each skeleton point to its contour. It is simply obtained by taking the union of the circular neighborhoods centered on the skeleton points and having radii equal to the associated contour distance. Thus the skeleton is a regenerative representation of an object.
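A direct implementation of (9.84) and (9.85) on a grid. Here we take the points within unit Euclidean distance of a pixel to be the pixel itself and its 4 neighbors, and treat everything outside the array as background; these conventions and the function names are our own:

```python
import numpy as np

def distance_transform(u):
    """Iterative distance transform (9.84): u_k = u_0 + min of u_{k-1}
    over the pixel and its 4 neighbors; iterate until it stops changing."""
    u0 = np.asarray(u, dtype=int)
    uk = u0.copy()
    while True:
        p = np.pad(uk, 1)                  # background (0) border
        nb = np.minimum.reduce([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                                p[1:-1, :-2], p[1:-1, 2:]])
        nxt = u0 + nb
        if np.array_equal(nxt, uk):
            return uk
        uk = nxt

def skeleton(u):
    """Skeleton (9.85): region points whose distance value is >= that of
    every neighbor within unit distance."""
    d = distance_transform(u)
    p = np.pad(d, 1)
    nb = np.maximum.reduce([p[:-2, 1:-1], p[2:, 1:-1],
                            p[1:-1, :-2], p[1:-1, 2:]])
    return (np.asarray(u) > 0) & (d >= nb)

u = np.ones((5, 7), dtype=int)
d = distance_transform(u)                  # middle row: 1 2 3 3 3 2 1
```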

Thinning algorithms. Thinning algorithms transform an object to a set of simple digital arcs, which lie roughly along their medial axes.

Figure 9.31 Skeleton examples: the successive distance transforms u_0(m, n), u_1(m, n), u_2(m, n), ... of a block of 1s, and the resulting skeleton.

The structure obtained is not influenced by small contour inflections that may be present on the initial contour. The basic approach [42] is to delete from the object X simple border points that have more than one neighbor in X and whose deletion does not locally disconnect X. Here a connected region is defined as one in which any two points in the region can be connected by a curve that lies entirely in the region. In this way, endpoints of thin arcs are not deleted. A simple algorithm that yields connected arcs while being insensitive to contour noise is as follows [43].
Referring to Fig. 9.32a, let Z0(P1) be the number of zero to nonzero transitions in the ordered set P2, P3, P4, ..., P9, P2. Let NZ(P1) be the number of nonzero neighbors of P1. Then P1 is deleted if (Fig. 9.32b)

    2 ≤ NZ(P1) ≤ 6
    and Z0(P1) = 1
                                                    (9.86)
    and P2 · P4 · P8 = 0 or Z0(P2) ≠ 1
    and P2 · P4 · P6 = 0 or Z0(P4) ≠ 1

The procedure is repeated until no further changes occur in the image. Figure 9.32c gives an example of applying this algorithm. Note that at each location such as P1 we end up examining pixels from a 5 × 5 neighborhood.

Figure 9.32 A thinning algorithm. (a) Labeling of point P1 and its neighbors, arranged

    P3 P2 P9
    P4 P1 P8
    P5 P6 P7

(b) examples where P1 is not deletable (P1 = 1): (i) deleting P1 will tend to split the region; (ii) deleting P1 will shorten arc ends; (iii) 2 ≤ NZ(P1) ≤ 6 but P1 is not deletable. (c) Example of thinning: (i) original; (ii) thinned.


"The term morphology originally comes from the study of forms of plants and
.animals. In our context we mean study of topology or structure of objects from their
images. Morphological processing refers to certain operations where an object is hit
with a structuring element and thereby reduced to a more revealing shape.

Basic operations. Most morphological operations can be defined in terms of two basic operations, erosion and dilation [44]. Suppose the object X and the structuring element B are represented as sets in two-dimensional Euclidean space. Let B_x denote the translation of B so that its origin is located at x. Then the erosion of X by B is defined as the set of all points x such that B_x is included in X, that is,

    Erosion:   X ⊖ B ≜ {x : B_x ⊂ X}    (9.87)

Similarly, the dilation of X by B is defined as the set of all points x such that B_x hits X, that is, they have a nonempty intersection:

    Dilation:   X ⊕ B ≜ {x : B_x ∩ X ≠ ∅}    (9.88)

Figure 9.33 shows examples of erosion and dilation. Clearly, erosion is a shrinking operation, whereas dilation is an expansion operation. It is also obvious that erosion of an object is accompanied by enlargement or dilation of the background.

Properties. The erosion and dilation operations have the following properties:

1. They are translation invariant, that is, a translation of the object causes the same shift in the result.
2. They are not inverses of each other.
3. Distributivity:

    X ⊕ (B ∪ B′) = (X ⊕ B) ∪ (X ⊕ B′)
                                                    (9.89)
    X ⊖ (B ∪ B′) = (X ⊖ B) ∩ (X ⊖ B′)

4. Local knowledge:

    (X ∩ Z) ⊖ B = (X ⊖ B) ∩ (Z ⊖ B)    (9.90)

5. Iteration:

    (X ⊖ B) ⊖ B′ = X ⊖ (B ⊕ B′)
                                                    (9.91)
    (X ⊕ B) ⊕ B′ = X ⊕ (B ⊕ B′)
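On a discrete grid, (9.87) and (9.88) amount to ANDing and ORing shifted copies of the binary image; a small sketch (the origin of B is taken at its center, and function names are ours):

```python
import numpy as np

def erode(X, B):
    """Binary erosion (9.87): x is kept iff the translate B_x fits inside X."""
    X = np.asarray(X).astype(bool)
    ph, pw = B.shape[0] // 2, B.shape[1] // 2
    P = np.pad(X, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.ones_like(X, dtype=bool)
    for i, j in zip(*np.nonzero(B)):       # AND over every shift in B
        out &= P[i:i + X.shape[0], j:j + X.shape[1]]
    return out

def dilate(X, B):
    """Binary dilation (9.88): x is kept iff B_x hits X."""
    X = np.asarray(X).astype(bool)
    ph, pw = B.shape[0] // 2, B.shape[1] // 2
    P = np.pad(X, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(X, dtype=bool)
    for i, j in zip(*np.nonzero(B)):       # OR over every shift in B
        out |= P[i:i + X.shape[0], j:j + X.shape[1]]
    return out

B = np.ones((3, 3), dtype=int)             # the 3x3 element G of Table 9.10
X = np.zeros((7, 7), dtype=int); X[1:6, 1:6] = 1
# Eroding the 5x5 square by G leaves a 3x3 core; dilating fills the 7x7 frame.
```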




Figure 9.33 Examples of some morphological operations (object, structuring element, and the results of dilate, erode, and hit-miss).

Figure 9.33 (cont'd) Open (breaks small islands and smooths the boundary), close (blocks up narrow channels), skeleton (the skeleton need not be connected on a digital grid even if the object is connected), thin, prune, and thick operations.


,

7. Duality: Let X" denote the complement of X. Then


XC(BB = (X8BY (9.93)

This means erosion and dilation are duals with respect to the complement oper-
• •
anon,

Morphological Transforms

The medial axis transform and thinning operations are just two examples of morphological transforms. Table 9.10 lists several useful morphological transforms that are derived from the basic erosion and dilation operations. The hit-miss transform tests whether or not the structure B_ob belongs to X and B_bk belongs to X^c. The opening of X with respect to B, denoted by X_B, defines the domain swept by all translates of B that are included in X. Closing is the dual of opening. Boundary gives the boundary pixels of the object, but they are not ordered along its contour. This table also shows how the morphological operations can be used to obtain the previously defined skeletonizing and thinning transformations. Thickening is the dual of thinning. The pruning operation smooths skeletons or thinned objects by removing parasitic branches.
Figure 9.33 shows examples of morphological transforms. Figure 9.34 shows an application of morphological processing in a printed circuit board inspection application. The observed image is binarized by thresholding and is reduced to a single-pixel-wide contour image by the thinning transform. The result is pruned to obtain clean line segments, which can be used for inspection of faults such as cuts (open circuits), short circuits, and the like.
We now give the development of skeleton and thinning algorithms in the context of the basic morphological operations.

Skeletons. Let rD_x denote a disc of radius r at point x. Let S_r(X) denote the set of centers of maximal discs rD_x that are contained in X and intersect the boundary of X at two or more locations. Then the skeleton S(X) is the set of centers S_r(X):

    S(X) = ∪_{r>0} S_r(X) = ∪_{r>0} {(X ⊖ rD) / (X ⊖ rD)_{drD}}    (9.94)

where ∪ and / represent the set union and set difference operations, respectively, and the subscript drD denotes opening with respect to an infinitesimal disc.
To recover the original object from its skeleton, we take the union of the circular neighborhoods centered on the skeleton points, having radii equal to the associated contour distance:

    X = ∪_{r>0} {S_r(X) ⊕ rD}    (9.95)

We can find the skeleton on a digitized grid by replacing the disc rD in (9.94) by the 3 × 3 square grid G, obtaining the algorithm summarized in Table 9.10. Here the operation (X ⊖ nG) denotes the nth iteration (X ⊖ G) ⊖ G ⊖ ··· ⊖ G, and (X ⊖ nG)_G is the opening of (X ⊖ nG) with respect to G.



TABLE 9.10 Some Useful Morphological Transforms

Operation     Definition                                   Properties & Usage

Hit-miss      X ⊛ B = (X ⊖ B_ob) ∩ (X^c ⊖ B_bk)           Searching for a match or a specific configuration. B_ob: set formed from pixels in B that should belong to the object. B_bk: pixels that should belong to the background.

Open          X_B = (X ⊖ B) ⊕ B                            Smooths contours, suppresses small islands and sharp caps of X. Ideal for object size distribution study.

Close         X^B = (X ⊕ B) ⊖ B                            Blocks up narrow channels and thin lakes. Ideal for the study of inter-object distance.

Boundary      ∂X = X / (X ⊖ G)                             Gives the set of boundary points.

Convex hull   X¹ = X;  X^{i+1} = (X^i ⊛ B^i) ∪ X^i;        B¹, B², ... are rotated versions of the structuring
              X_CH = ∪_{j=1}^{4} X^j_∞                     element B. C is an appropriate structuring element choice for B.

Skeleton      S(X) = ∪_{n=0}^{n_max} S_n(X)                n_max: maximum size after which X erodes down to
                   = ∪_{n=0}^{n_max} [(X ⊖ nG)/(X ⊖ nG)_G]  an empty set. The skeleton is a regenerative
              X = ∪_{n=0}^{n_max} [S_n(X) ⊕ nG]            representation of the object.

Thin          X ○ B = X / (X ⊛ B)                          To symmetrically thin X, a sequence of structuring
              X ○ {B} = ((···((X ○ B¹) ○ B²) ···) ○ Bⁿ)    elements {B} = {B^i, 1 ≤ i ≤ n} is used in cascade, where B^i is a rotated version of B^{i-1}. A widely used element is L.

Thick         X ⊙ B = X ∪ (X ⊛ B)                          Dual of thinning.

Prune         X₁ = X ○ {B};  X₂ = ∪_{j=1}^{8} (X₁ ⊛ E^j)   E is a suitable structuring element. X₂: end points.
              X_prn = X₁ ∪ [(X₂ ⊕ {G}) ∩ X]                X_prn: pruned object with parasitic branches suppressed.

The symbols "/" and "∪" represent the set difference and the set union operations, respectively. Examples of structuring elements are

1 d d o 0 0 d d d

1 1 1
G= 1• 1 1 , c= 1 d , L= 1 , E= .0 1 0
1 1 1 1 d 1 o (} (}
where 1, 0, and d signify the object, background and 'don't care' states, respectively.

Figure 9.34 Morphological processing for printed circuit board inspection.
(a) Original; (b) preprocessed (thresholded); (c) thinned; (d) pruned.

Thinning. In the context of morphological operations, thinning can be defined as

    X ○ B = X/(X ⊛ B)          (9.96)

where B is the structuring element chosen for the thinning and ⊛ denotes the
hit-miss operation defined in Table 9.10.
To thin X symmetrically, a sequence of structuring elements, {B} = {Bⁱ,
1 ≤ i ≤ n}, is used in cascade, where Bⁱ is a rotated version of Bⁱ⁻¹:

    X ○ {B} = ((···((X ○ B¹) ○ B²) ···) ○ Bⁿ)          (9.97)

A suitable structuring element for the thinning operation is the L structuring
element shown in Table 9.10.
The thinning process is usually followed by a pruning operation to trim the
resulting arcs (Table 9.10). In general, the original objects are likely to have noisy
boundaries, which result in unwanted parasitic branches in the thinned version. It is
the job of the pruning step to clean up these branches without disconnecting the arcs.
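One pass of (9.96) can be sketched in numpy (an assumption; the text gives no code). The 3 × 3 element is coded with 1 for object, 0 for background, and −1 for don't-care; the hit-miss transform marks pixels whose neighborhood matches the element, and thinning deletes them. The element below is the L element of Table 9.10.

```python
import numpy as np

def hit_miss(X, B):
    # Hit-miss transform for a 3x3 element B coded 1/0/-1
    # (object / background / don't-care); outside the image is background.
    H, W = X.shape
    P = np.pad(X, 1)
    out = np.zeros_like(X)
    for i in range(H):
        for j in range(W):
            win = P[i:i + 3, j:j + 3]
            match = True
            for u in range(3):
                for v in range(3):
                    if B[u, v] == 1 and win[u, v] != 1:
                        match = False
                    elif B[u, v] == 0 and win[u, v] != 0:
                        match = False
            out[i, j] = 1 if match else 0
    return out

def thin_once(X, B):
    # X o B = X / (X hit-miss B): remove the pixels matched by B
    return X * (1 - hit_miss(X, B))

# L element of Table 9.10: top row background, bottom row object
L = np.array([[0, 0, 0],
              [-1, 1, -1],
              [1, 1, 1]])

X = np.zeros((5, 7), dtype=int)
X[1:4, 1:6] = 1                     # a 3 x 5 bar
Y = thin_once(X, L)                 # peels the interior of the bar's top edge
```

Symmetric thinning per (9.97) repeats thin_once with the rotations np.rot90(L, k) of the element until no pixel changes.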

Syntactic Representation [45]




The foregoing techniques reduce an object to a set of structural elements, or
primitives. By adding a syntax, such as connectivity rules, it is possible to obtain a
syntactic representation, which is simply a string of symbols, each representing a
primitive (Figure 9.35). The syntax allows a unique representation and interpretation
of the string. The design of a syntax that transforms the symbolic and the
syntactic representations back and forth is a difficult task. It requires specification
of a complete and unambiguous set of rules, which have to be derived from the
understanding of the scene under study.

Figure 9.35 Syntactic representation. Primitive structural symbols a, b, c, d;
the object structure is encoded as a string of these symbols.

9.10 SHAPE FEATURES

The shape of an object refers to its profile and physical structure. These characteristics
can be represented by the previously discussed boundary, region, moment,
and structural representations. These representations can be used for matching
shapes, recognizing objects, or for making measurements of shapes. Figure 9.36
lists several useful features of shape.

Shape representation

  Regenerative features:
    - Boundaries
    - Regions
    - Moments
    - Structural and syntactic

  Measurement features:
    Geometry: perimeter, area, max-min radii and eccentricity, corners,
      roundness, bending energy, holes, Euler number, symmetry
    Moments: center of mass, orientation, bounding rectangle, best-fit
      ellipse, eccentricity

Figure 9.36 Shape features.

Geometry Features

In many image analysis problems the ultimate aim is to measure certain geometric
attributes of the object, such as the following:

1. Perimeter

    T = ∮_{∂ℛ} [(dx/dt)² + (dy/dt)²]^{1/2} dt          (9.98)

where t is the boundary parameter but not necessarily its arc length.

2. Area

    A = ∬_ℛ dx dy = ½ ∮_{∂ℛ} [x(t)(dy/dt) − y(t)(dx/dt)] dt          (9.99)

where ℛ and ∂ℛ denote the object region and its boundary, respectively.

3. Radii R_min, R_max are the minimum and maximum distances, respectively, to the
boundary from the center of mass (Fig. 9.37a). Sometimes the ratio R_max/R_min
is used as a measure of eccentricity or elongation of the object.

Figure 9.37 Geometry features. (a) Maximum and minimum radii; (b) curvature
function for corner detection; (c) types of symmetry: square A has 4-fold symmetry,
circle B is rotationally symmetric, small circles Cᵢ have 4-fold symmetry, and
triangles ∆ have 3-fold symmetry.



4. Number of holes n₀.

5. Euler number

    E = number of connected regions − n₀          (9.100)

6. Corners These are locations on the boundary where the curvature κ(t)
becomes unbounded. When t represents distance along the boundary, then
from (9.57) and (9.58), we can obtain

    |κ(t)|² = (d²x/dt²)² + (d²y/dt²)²          (9.101)

In practice, a corner is declared whenever |κ(t)| assumes a large value (Fig.
9.37b).

7. Bending energy This is another attribute associated with the curvature:

    E = (1/T) ∫₀ᵀ |κ(t)|² dt          (9.102)

In terms of {a(k)}, the FDs of u(t), this is given by

    E = Σ_{k=−∞}^{∞} |a(k)|² (2πk/T)⁴          (9.103)

8. Roundness, or compactness

    γ = (perimeter)²/[4π(area)]          (9.104)

For a disc, γ is minimum and equals 1.

9. Symmetry There are two common types of symmetry of shapes, rotational
and mirror. Other forms of symmetry are twofold, fourfold, eightfold, and so
on (Fig. 9.37c). Distances from the center of mass to different points on the
boundary can be used to analyze symmetry of shapes. Corner locations are
also useful in determining object symmetry.
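Several of these attributes are easy to compute for a pixel region. The sketch below (numpy assumed; not from the text) approximates the perimeter by the count of object pixels that have a 4-neighbor in the background, and then measures R_max, R_min and the roundness γ of (9.104) from the centroid and that boundary set.

```python
import numpy as np

def geometry_features(X):
    # X: binary region (nonzero = object)
    B = np.pad(X.astype(bool), 1)
    core = B[1:-1, 1:-1]
    # interior pixels: all four 4-neighbors are also object pixels
    inner = core & B[:-2, 1:-1] & B[2:, 1:-1] & B[1:-1, :-2] & B[1:-1, 2:]
    boundary = core & ~inner
    area = int(core.sum())
    perimeter = int(boundary.sum())          # pixel-count approximation
    ys, xs = np.nonzero(core)
    cy, cx = ys.mean(), xs.mean()            # center of mass
    by, bx = np.nonzero(boundary)
    r = np.hypot(by - cy, bx - cx)           # distances to boundary pixels
    return {
        "area": area,
        "perimeter": perimeter,
        "r_max": r.max(),
        "r_min": r.min(),
        "eccentricity": r.max() / r.min(),           # R_max / R_min
        "roundness": perimeter**2 / (4 * np.pi * area),   # (9.104)
    }

f = geometry_features(np.ones((5, 5)))       # a filled 5 x 5 square
```

For the 5 × 5 square this gives area 25, perimeter 16 (the border ring), R_max = √8 to a corner, and R_min = 2 to an edge midpoint; note the pixel-count perimeter underestimates the continuous one, so γ computed this way is only a relative measure.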
Moment-Based Features

Many shape features can be conveniently represented in terms of moments. For a
shape represented by a region ℛ containing N pixels, we have the following:

1. Center of mass

    m̄ = (1/N) ΣΣ_{(m,n)∈ℛ} m,    n̄ = (1/N) ΣΣ_{(m,n)∈ℛ} n          (9.105)

The (p, q)-order central moments become

    μ_{p,q} = ΣΣ_{(m,n)∈ℛ} (m − m̄)ᵖ (n − n̄)^q          (9.106)

2. Orientation Orientation is defined as the angle of the axis of the least moment
of inertia. It is obtained by minimizing with respect to θ (Fig. 9.38a) the sum



Figure 9.38 Moment-based features. (a) Orientation; (b) bounding rectangle;
(c) best-fit ellipse.

    I(θ) = ΣΣ_{(m,n)∈ℛ} D²(m, n) = ΣΣ_{(m,n)∈ℛ} [(n − n̄) cos θ − (m − m̄) sin θ]²          (9.107)

The result is

    θ = ½ tan⁻¹[2μ₁,₁/(μ₂,₀ − μ₀,₂)]          (9.108)

3. Bounding rectangle The bounding rectangle is the smallest rectangle enclosing
the object that is also aligned with its orientation (Fig. 9.38b). Once θ
is known we use the transformation

    α = x cos θ + y sin θ
    β = −x sin θ + y cos θ          (9.109)

on the boundary points and search for α_min, α_max, β_min, and β_max. These give the
locations of the four extreme points of the rectangle in Fig. 9.38b. From these
the bounding rectangle is known immediately, with length l_b = α_max − α_min and
width w_b = β_max − β_min. The ratio l_b w_b /area is also a useful shape feature.
4. Best-fit ellipse The best-fit ellipse is the ellipse whose second moment equals
that of the object. Let a and b denote the lengths of the semimajor and semiminor
axes, respectively, of the best-fit ellipse (Fig. 9.38c). The least and the greatest
moments of inertia for an ellipse are

    I_min = (π/4) a b³,    I_max = (π/4) a³ b          (9.110)

For orientation θ, the above moments can be calculated as

    I′_min = ΣΣ_{(m,n)∈ℛ} [(n − n̄) cos θ − (m − m̄) sin θ]²
    I′_max = ΣΣ_{(m,n)∈ℛ} [(n − n̄) sin θ + (m − m̄) cos θ]²          (9.111)

For the best-fit ellipse we want I_min = I′_min, I_max = I′_max, which gives

    a = (4/π)^{1/4} [(I′_max)³/I′_min]^{1/8},    b = (4/π)^{1/4} [(I′_min)³/I′_max]^{1/8}          (9.112)

5. Eccentricity

    ε = [(μ₂,₀ − μ₀,₂)² + 4μ₁,₁²]^{1/2} / area

Other representations of eccentricity are R_max/R_min, I′_max/I′_min, and a/b.

The foregoing shape features are very useful in the design of vision systems for
object recognition.
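The center of mass and the orientation angle of (9.108) reduce to a few lines of numpy (numpy assumed; the atan2 form below is an equivalent way to write (9.108) that also resolves the quadrant):

```python
import numpy as np

def moment_features(X):
    # Center of mass (9.105), central moments (9.106), orientation (9.108)
    ms, ns = np.nonzero(X)                 # pixel coordinates (m, n) in region
    m_bar, n_bar = ms.mean(), ns.mean()    # center of mass
    dm, dn = ms - m_bar, ns - n_bar
    mu = lambda p, q: np.sum(dm ** p * dn ** q)   # central moment mu_{p,q}
    theta = 0.5 * np.arctan2(2 * mu(1, 1), mu(2, 0) - mu(0, 2))
    return (m_bar, n_bar), theta

# a horizontal bar (elongated along n) and a diagonal line
bar = np.zeros((9, 9), dtype=int); bar[4, 1:8] = 1
diag = np.eye(9, dtype=int)
(_, _), th_bar = moment_features(bar)
(_, _), th_diag = moment_features(diag)
```

For the bar, μ₂,₀ = 0 and μ₁,₁ = 0, so θ = ½ atan2(0, −μ₀,₂) = π/2, the n axis; for the diagonal, μ₂,₀ = μ₀,₂ and μ₁,₁ > 0, so θ = π/4.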

9.11 TEXTURE

Texture is observed in the structural patterns of surfaces of objects such as wood,
grain, sand, grass, and cloth. Figure 9.39 shows some examples of textures [46]. The
term texture generally refers to repetition of basic texture elements called texels. A
texel contains several pixels, whose placement could be periodic, quasi-periodic, or
random. Natural textures are generally random, whereas artificial textures are
often deterministic or periodic. Texture may be coarse, fine, smooth, granulated,
rippled, regular, irregular, or linear. In image analysis, texture is broadly classified
into two main categories, statistical and structural [47].

Statistical Approaches

Textures that are random in nature are well suited for statistical characterization,
for example, as realizations of random fields. Figure 9.40 lists several statistical
measures of texture. We discuss these briefly next.

Figure 9.39 Brodatz textures.

Classification of texture:

  Statistical: ACF, transforms, edge-ness, concurrence matrix, texture
    transforms, random field models
  Structural (periodic or random placement):
    Primitives: gray levels, shape, homogeneity
    Placement rules: period, adjacency, closest distances
    Measures: edge density, extrema density, run lengths
  Other: mosaic models

Figure 9.40 Classification of texture.

The autocorrelation function (ACF). The spatial size of the tonal primitives
(i.e., texels) in a texture can be represented by the width of the spatial ACF
r(k, l) = m₂(k, l)/m₂(0, 0) [see (9.7)]. The coarseness of texture is expected to be
proportional to the width of the ACF, which can be represented by distances x₀, y₀
such that r(x₀, 0) = r(0, y₀) = ½. Other measures of spread of the ACF are obtained
via the moment-generating function

    M(k, l) ≜ ΣΣ_{m,n} (m − μ₁)ᵏ (n − μ₂)ˡ r(m, n)          (9.113)

where

    μ₁ ≜ ΣΣ_{m,n} m r(m, n),    μ₂ ≜ ΣΣ_{m,n} n r(m, n)

Features of special interest are the profile spreads M(2, 0) and M(0, 2), the cross-relation
M(1, 1), and the second-degree spread M(2, 2). The calibration of the ACF
spread on a fine-coarse texture scale depends on the resolution of the image. This is
because a seemingly flat region (no texture) at a given resolution could appear as
fine texture at higher resolution and coarse texture at lower resolution. The ACF by
itself is not sufficient to distinguish among several texture fields because many
different image ensembles can have the same ACF.
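A sample-ACF version of the coarseness measure can be sketched as follows (numpy assumed; the half-height width stands in for the x₀ of the text, and the estimator below, averaging products over the overlapping region at each lag, is one of several reasonable choices):

```python
import numpy as np

def acf(u, max_lag):
    # Normalized sample ACF r(k, l) for lags |k|, |l| <= max_lag.
    u = u - u.mean()
    H, W = u.shape
    r = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for k in range(-max_lag, max_lag + 1):
        for l in range(-max_lag, max_lag + 1):
            a = u[max(0, k):H + min(0, k), max(0, l):W + min(0, l)]
            b = u[max(0, -k):H + min(0, -k), max(0, -l):W + min(0, -l)]
            r[k + max_lag, l + max_lag] = (a * b).mean()
    return r / r[max_lag, max_lag]        # so that r(0, 0) = 1

def acf_width(r):
    # Coarseness proxy x0: smallest lag with r(x0, 0) <= 1/2.
    c = r.shape[0] // 2
    profile = r[c:, c]                    # r(k, 0) for k >= 0
    below = np.nonzero(profile <= 0.5)[0]
    return int(below[0]) if below.size else len(profile)

rng = np.random.default_rng(0)
fine = rng.standard_normal((32, 32))                          # uncorrelated field
coarse = sum(np.roll(fine, s, axis=0) for s in range(-2, 3))  # smoothed copy
```

Smoothing widens the ACF, so acf_width reports the smoothed field as coarser; also note that (a*b).mean() merely swaps roles under (k, l) → (−k, −l), giving the symmetry r(k, l) = r(−k, −l) exactly.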

Image transforms. Texture features such as coarseness, fineness, or
orientation can be estimated by generalized linear filtering techniques utilizing
image transforms (Fig. 9.4). A two-dimensional transform v(k, l) of the input image
is passed through several band-pass filters or masks gᵢ(k, l), i = 1, 2, 3, ..., as

    zᵢ(k, l) = v(k, l)gᵢ(k, l)          (9.114)

Then the energy in zᵢ(k, l) represents a transform feature. Different types of
masks appropriate for texture analysis are shown in Fig. 9.5a. With circular slits we
measure energy in different spatial frequency or sequency bands. Angular slits are
useful in detecting orientation features. Combinations of angular and circular slits
are useful for periodic or quasi-periodic textures. Image transforms have been
applied for discrimination of terrain types, for example, deserts, farms, mountains,
riverbeds, urban areas, and clouds [48]. Fourier spectral analysis has been
found useful in detection and classification of black lung disease by comparing the
textural patterns of the diseased and normal areas [49].

Edge density. The coarseness of random texture can also be represented


by the density of the edge pixels. Given an edge map [see (9.17)] the edge density is
measured by the average number of edge pixels per unit area.

Histogram features. The two-dimensional histogram discussed in Section
9.2 has proven to be quite useful for texture analysis. For two pixels u₁ and u₂ at
relative distance r and orientation θ, the distribution function [see (9.9)] can be
explicitly written as

    p_{u₁,u₂}(x₁, x₂) = f(r, θ; x₁, x₂)          (9.115)

Some useful texture features based on this function are

    Inertia:    I(r, θ) ≜ Σ_{x₁} Σ_{x₂} (x₁ − x₂)² f(r, θ; x₁, x₂)          (9.116)

    Mean distribution:    μ(r; x₁, x₂) ≜ (1/N_θ) Σ_θ f(r, θ; x₁, x₂)          (9.117)

    Variance distribution:    σ²(r; x₁, x₂) ≜ (1/N_θ) Σ_θ [f(r, θ; x₁, x₂) − μ(r; x₁, x₂)]²          (9.118)

    Spread distribution:    η(r; x₁, x₂) = max_θ {f(r, θ; x₁, x₂)} − min_θ {f(r, θ; x₁, x₂)}          (9.119)

Here the symbol N_θ represents the total number of orientations.

The inertia is useful in representing the spread of the function f(r, θ; x₁, x₂) for a
given set of (r, θ) values. I(r, θ) becomes proportional to the coarseness of the
texture at different distances and orientations. The mean distribution μ(r; x₁, x₂) is
useful when the angular variations in textural properties are unimportant. The
variance σ²(r; x₁, x₂) indicates the angular fluctuations of textural properties. The
function η(r; x₁, x₂) gives a measure of orientation-independent spread.
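As a concrete sketch (numpy assumed), the second-order histogram for one displacement can be accumulated as a normalized co-occurrence matrix, from which the inertia of (9.116) follows directly:

```python
import numpy as np

def second_order_histogram(u, dr, dc, levels):
    # f(r, theta; x1, x2) for the displacement (dr, dc): normalized counts
    # of the gray-level pairs (u(m, n), u(m + dr, n + dc)).
    H, W = u.shape
    b = u[max(0, -dr):H + min(0, -dr), max(0, -dc):W + min(0, -dc)]  # u(m, n)
    a = u[max(0, dr):H + min(0, dr), max(0, dc):W + min(0, dc)]      # u(m+dr, n+dc)
    f = np.zeros((levels, levels))
    np.add.at(f, (b.ravel(), a.ravel()), 1)
    return f / f.sum()

def inertia(f):
    # I(r, theta) = sum over (x1, x2) of (x1 - x2)^2 f(...)   -- (9.116)
    x1 = np.arange(f.shape[0])[:, None]
    x2 = np.arange(f.shape[1])[None, :]
    return float(((x1 - x2) ** 2 * f).sum())

checker = np.add.outer(np.arange(8), np.arange(8)) % 2   # two-level checkerboard
f = second_order_histogram(checker, 0, 1, 2)
```

Every horizontally adjacent pair of the checkerboard differs by one gray level, so this fine pattern yields the maximal inertia of 1, while a constant image yields 0.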

Random texture models. It has been suggested that visual perception of
random texture fields may be unique only up to second-order densities [50]. It was
observed that two textured fields with the same second-order probability distributions
appeared to be indistinguishable. Although not always true, this conjecture
has proven useful for synthesis and analysis of many types of textures. Thus two
different textures can often be discriminated by comparing their second-order
histograms.
A simple model for texture analysis is shown in Fig. 9.41a [51]. The texture
field is first decorrelated by a filter a(m, n), which can be designed from the knowledge
of the ACF. Thus if r(m, n) is the ACF, then

    u(m, n) ⊗ a(m, n) ≜ ε(m, n)          (9.120)

is an uncorrelated random field. From Chapter 6 (see Section 6.6) this means that
any WNDR of u(m, n) would give an admissible whitening (or decorrelating) filter.

Figure 9.41 Random texture models. (a) Texture analysis by decorrelation: the
texture u(m, n) is passed through the decorrelation filter A(z₁, z₂) to give ε(m, n);
ACF and histogram measurements then yield the texture feature vector x.
(b) Texture synthesis using linear filters: ε(m, n) drives the filter to produce u(m, n).


Such a filter is not unique, and it could have causal, semicausal, or noncausal
structure. Since the edge extraction operators have a tendency to decorrelate images,
these have been used [51] as alternatives to the true whitening filters. The
ACF features such as M(0, 2), M(2, 0), M(1, 1), and M(2, 2) [see (9.113)] and the
features of the first-order histogram of ε(m, n), such as the average m₁, deviation σ,
skewness μ₃, and kurtosis μ₄ − 3, have been used as the elements of the texture
feature vector x in Fig. 9.41a.
Random field representations of texture have been considered using one-dimensional
time series as well as two-dimensional random field models (see [52],
[53] and the bibliography of Chapter 6). Following Chapter 6, such models can be
identified from the given data. The model coefficients are then used as features for
texture discrimination. Moreover, these random field models can synthesize random
texture fields when driven by the uncorrelated random field ε(m, n) of known
probability density (Fig. 9.41b).
Example 9.6 Texture synthesis via causal and semicausal models

Figure 9.42a shows a given 256 × 256 grass texture. Using estimated covariances, a
(p, q) = (3, 4)-order white Gaussian noise-driven causal model was designed and used
to synthesize the texture of Fig. 9.42b. Figure 9.42c shows the texture synthesized via a
(p, q) = (3, 4) semicausal white noise-driven model. This model was designed via the
Wiener-Doob homomorphic factorization method of Section 6.8.

Figure 9.42 Texture synthesis using causal and semicausal models. (a) Original
grass texture; (b) texture synthesized by causal model; (c) texture synthesized by
semicausal model.

Structural Approaches [4, 47, 54]

Purely structural textures are deterministic texels, which repeat according to some
placement rules, themselves deterministic or random. A texel is isolated by identifying
a group of pixels having certain invariant properties, which repeat in the given
image. The texel may be defined by its gray level, shape, or homogeneity of some
local property, such as size, orientation, or second-order histogram (concurrence
matrix). The placement rules define the spatial relationships between the texels.
These spatial relationships may be expressed in terms of adjacency, closest distance,
periodicities, and so on, in the case of deterministic placement rules. In such cases
the texture is labeled as being strong.
For randomly placed texels, the associated texture is called weak and the
placement rules may be expressed in terms of measures such as the following:

1. Edge density
2. Run lengths of maximally connected texels
3. Relative extrema density, which is the number of pixels per unit area showing
gray levels that are locally maxima or minima relative to their neighbors. For
example, a pixel u(m, n) is a relative minimum or a relative maximum if it is,
respectively, less than or greater than its nearest four neighbors. (In a region
of constant gray levels, which may be a plateau or a valley, each pixel counts as
an extremum.)

This definition does not distinguish between images having a few large
plateaus and those having many single extrema. An alternative is to count
each plateau as one extremum. The height and the area of each extremum
may also be considered as features describing the texels.
Example 9.7 Synthesis of quasiperiodic textures

The raffia texture (Fig. 9.43a) can be viewed as a quasiperiodic repetition of a
deterministic pattern. The spatial covariance function of a small portion of the image
was analyzed to estimate the periodicity and the randomness in the periodic rate. A
17 × 17 primitive was extracted from the parent texture and repeated according to the
quasiperiodic placement rule to give the image of Fig. 9.43b.

Other Approaches

A method that combines the statistical and the structural approaches is based on
what have been called mosaic models [55]. These models represent random geo-
metrical processes. For example, regular or random tessellations of a plane into
bounded convex polygons give rise to cell-structured textures. A mosaic model
.",'.. "
... - ~­
.' -"1"''/;;':,'
--- '
" ~

- 1# __ ~
~,

••

,..
,,-
•-', -
. - ..., -..
• •

-
C __" - , " : , ,
,
.-- ,'.

~"~""''''-'"'­
+•-"4_'_"
• ?
- .
,.'


,-
.....
--,,',,-
_,
-<
_._'

-~-'-~'
. .
.-
,-"\,
--
.. J_
t _ ,'".';
,._,},X,f.
~
~

lal Ofiginal 256 x 256 raffia' Ibl Synthesized raffia by quasi-periodic



placement of 8 primitive

Figure 9.43 Texture synthesis by structural approach.

Sec. 9.11' . Texture 399


could define rules for partitioning aplane into different cells, where each cell con-
tains a geometrical figure whose features (such as center or orientation) have
prescribed probability distributions. For. example circles of fixed radius placed
.

according to a Poisson point process defined on a plane would constitute a mosaic


texture. In general, mosaic models should provide higher resolution than the
random field models. However, quantification of the underlying geometric pattern
and identification of the placement process would be more difficult. For textures
that exhibit regularities in primitive placements, grammatical models can be devel-
oped [3]. Such grammars give a few rules for combining certain primitive shapes or
symbols to generate several complex patterns.

9.12 SCENE MATCHING AND DETECTION

A problem of much significance in image analysis is the detection of change or
presence of an object in a given scene. Such problems occur in remote sensing for
monitoring growth patterns of urban areas, weather prediction from satellite
images, diagnosis of disease from medical images, target detection from radar
images, automation using robot vision, and the like. Change detection is also
useful in alignment or spatial registration of two scenes imaged at different instants
or using different sensors. For example, a large object photographed in small
overlapping sections can be reconstructed by matching the overlapping parts.

Image Subtraction

Changes in a dynamic scene observed as uᵢ(m, n), i = 1, 2, ..., are given by

    εᵢ(m, n) = uᵢ(m, n) − uᵢ₋₁(m, n)          (9.121)

Although elementary, this image-subtraction technique is quite powerful in carefully
controlled imaging situations. Figure 9.44 shows an example from digital
radiology. The images u₁ and u₂ represent, respectively, the X-ray images before
and after injection of a radio-opaque dye in a renal study. The change, not visible in
u₂, can be easily detected as renal arteries after u₁ has been subtracted out. Image
subtraction is also useful in motion-detection-based security monitoring systems,
segmentation of parts from a complex assembly, and so on.

Template Matching and Area Correlation

The presence of a known object in a scene can be detected by searching for the
location of match between the object template u(m, n) and the scene v(m, n).
Template matching can be conducted by searching for the displacement of u(m, n)
where the mismatch energy is minimum. For a displacement (p, q), we define the
mismatch energy

    σ²_e(p, q) ≜ ΣΣ_{m,n} [v(m, n) − u(m − p, n − q)]²
              = ΣΣ_{m,n} |v(m, n)|² + ΣΣ_{m,n} |u(m, n)|² − 2 ΣΣ_{m,n} v(m, n)u(m − p, n − q)          (9.122)

Figure 9.44 Change detection in digital radiography. (a) Precontrast;
(b) postcontrast; (c) difference.




For σ²_e(p, q) to achieve a minimum, it is sufficient to maximize the cross-correlation

    c_vu(p, q) ≜ ΣΣ_{m,n} v(m, n)u(m − p, n − q),    ∀(p, q)          (9.123)

From the Cauchy-Schwarz inequality, we have

    |c_vu| = |ΣΣ_{m,n} v(m, n)u(m − p, n − q)|
           ≤ [ΣΣ_{m,n} |v(m, n)|²]^{1/2} [ΣΣ_{m,n} |u(m, n)|²]^{1/2}          (9.124)

Figure 9.45 Template matching by area correlation: the scene v(m, n) is multiplied
by u(m − p, n − q) and summed over (m, n) to give c_vu(p, q), whose peaks are
searched to find the object location(s).

where the equality occurs if and only if v(m, n) = αu(m − p, n − q), where α is an
arbitrary constant and can be set equal to 1. This means the cross-correlation
c_vu(p, q) attains the maximum value when the displaced position of the template
coincides with the observed image. Then, we obtain

    c_vu(p, q) = ΣΣ_{m,n} |v(m, n)|² > 0          (9.125)

and the desired maximum occurs when the observed image and the template are
spatially registered. Therefore, a given object u(m, n) can be located in the scene by
searching the peaks of the cross-correlation function (Fig. 9.45). Often the given
template and the observed image are not only spatially translated but are also
relatively scaled and rotated. For example,

    v(m, n) = αu((m − p′)/γ₁, (n − q′)/γ₂; θ)          (9.126)

where γ₁ and γ₂ are the scale factors, (p′, q′) are the displacement coordinates, and
θ is the rotation angle of the observed image with respect to the template. In such
cases the cross-correlation function maxima have to be searched in the parameter
space (p′, q′, γ₁, γ₂, θ). This can become quite impractical unless reasonable estimates
of γ₁, γ₂, and θ are given.
The cross-correlation c_vu(p, q) is also called the area correlation. It can be
evaluated either directly or as the inverse Fourier transform of

    C_vu(ω₁, ω₂) ≜ ℱ{c_vu(p, q)} = V(ω₁, ω₂)U*(ω₁, ω₂)          (9.127)

The direct computation of the area correlation is useful when the template is small.
Otherwise, a suitable-size FFT is employed to perform the Fourier transform calculations.
Template matching is particularly efficient when the data is binary. In that
case, it is sufficient to search the minima of the total binary difference

    γ_vu(p, q) ≜ ΣΣ_{m,n} [v(m, n) ⊕ u(m − p, n − q)]          (9.128)

which requires only the simple logical exclusive-OR operations. The quantity
γ_vu(p, q) gives the number of pixels in the image that do not match with the
template at location (p, q). This algorithm is useful in recognition of printed
characters or objects characterized by known boundaries, as in the inspection of
printed circuit boards.
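Both the area correlation of (9.123) and the binary XOR measure of (9.128) can be sketched directly (numpy assumed; only displacements keeping the template fully inside the scene are scanned, and the high-contrast template below is chosen so the raw correlation peak is unambiguous; in practice a normalized correlation is commonly preferred):

```python
import numpy as np

def area_correlation(v, u):
    # c_vu(p, q) of (9.123) for all in-bounds displacements (p, q)
    M, N = u.shape
    H, W = v.shape
    c = np.zeros((H - M + 1, W - N + 1))
    for p in range(H - M + 1):
        for q in range(W - N + 1):
            c[p, q] = np.sum(v[p:p + M, q:q + N] * u)
    return c

def binary_mismatch(v, u):
    # gamma_vu(p, q) of (9.128): count of differing pixels under XOR
    M, N = u.shape
    H, W = v.shape
    g = np.zeros((H - M + 1, W - N + 1), dtype=int)
    for p in range(H - M + 1):
        for q in range(W - N + 1):
            g[p, q] = int(np.sum(v[p:p + M, q:q + N] ^ u))
    return g

rng = np.random.default_rng(1)
v = rng.random((20, 20)) * 0.1           # dim background
u = np.ones((5, 6))                      # bright template
v[7:12, 3:9] = u                         # embed the object at (7, 3)
c = area_correlation(v, u)
loc = np.unravel_index(np.argmax(c), c.shape)

vb = (v > 0.5).astype(int)               # binarized scene
g = binary_mismatch(vb, u.astype(int))
loc_b = np.unravel_index(np.argmin(g), g.shape)
```

The correlation peak and the zero of the binary mismatch both land on the embedded location (7, 3).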





Matched Filtering [56-57]

Suppose a deterministic object u(m, n), displaced by (m₀, n₀), is observed in the
presence of a surround (for example, other objects), and the observations contain a
colored noise field η(m, n) with power spectral density S_η(ω₁, ω₂). The observations
are

    v(m, n) = u(m − m₀, n − n₀) + η(m, n)          (9.129)

The matched filtering problem is to find a linear filter g(m, n) that maximizes the
output signal-to-noise ratio (SNR)

    SNR ≜ |s(0, 0)|² / E[|g(m, n) ⊗ η(m, n)|²],    s(m, n) ≜ g(m, n) ⊗ u(m − m₀, n − n₀)          (9.130)

Here s(m, n) represents the signal content in the filtered output g(m, n) ⊗ v(m, n).
Following Problem 9.16, the matched filter frequency response is found to be

    G(ω₁, ω₂) = [U*(ω₁, ω₂)/S_η(ω₁, ω₂)] exp[−j(ω₁m₀ + ω₂n₀)]          (9.131)

which gives its impulse response as

    g(m, n) = r̄_η(m, n) ⊗ u(−m − m₀, −n − n₀)

where

    r̄_η(m, n) ≜ ℱ⁻¹{1/S_η(ω₁, ω₂)}          (9.132)
Defining

    ṽ(m, n) ≜ v(m, n) ⊗ r̄_η(m, n)          (9.133)

the matched filter output can be written as

    g(m, n) ⊗ v(m, n) = u(−m − m₀, −n − n₀) ⊗ ṽ(m, n)
                      = ΣΣ_{i,j} ṽ(i, j)u(i − m − m₀, j − n − n₀)          (9.134)

which, according to (9.123), is c_ṽu(m + m₀, n + n₀), the area correlation of ṽ(m, n)
with u(m + m₀, n + n₀). If (m₀, n₀) were known, then the SNR would be maximized
at (m, n) = (0, 0), as desired in (9.130) (show!). In practice these displacement
values are unknown. Therefore, we compute the correlation c_ṽu(m, n) and search
for the location of maxima, which gives (m₀, n₀). Therefore, the matched filter can be
implemented as an area correlator with a preprocessing filter (Fig. 9.46a). Recall
from Section 6.7 (Eq. (6.91)) that r̄_η(m, n) would be proportional to the impulse
response of the minimum variance noncausal prediction error filter for a random
field with power spectral density S_η(ω₁, ω₂). For highly correlated random fields,
for instance the usual monochrome images, r̄_η(m, n) represents a high-pass filter.
For example, if the background has an object-like power spectrum [see Section 2.11]
Figure 9.46 Matched filtering in the presence of colored noise. (a) Overall filter:
v(m, n) is passed through the minimum variance noncausal prediction error filter
r̄_η(m, n) and then correlated with u(−m, −n). (b) Example of r̄_η(m, n). For the
white noise case r̄_η(m, n) = δ(m, n).

    S_η(ω₁, ω₂) = 1/[(1 − 2α cos ω₁)(1 − 2α cos ω₂)],    0 ≤ α < ½          (9.135)

then r̄_η(m, n) is a high-pass filter whose impulse response is shown in Fig. 9.46b (see
Example 6.10). This suggests that template matching is more effective if the edges
(and other high frequencies) are matched whenever a given object has to be detected
in the presence of a correlated background.
If η(m, n) is white noise, then S_η will be constant, and r̄_η(m, n) = δ(m, n).
Now the matched filter reduces to the area correlator of Fig. 9.45.
Direct Search Methods [58-59]

Direct methods of searching for an object in a scene are useful when the template
size is small compared to the region of search. We discuss some efficient direct
search techniques next.

Two-dimensional logarithmic search. This method reduces the search
iterations to about log n for an n × n area. Consider the mean distortion function

    D(i, j) = (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} f(v(m, n) − u(m + i, n + j)),    −p ≤ i, j ≤ p          (9.136)

where f(x) is a given positive and increasing function of x, u(m, n) is an M × N
template and v(m, n) is the observed image. The template match is restricted to a
preselected [−p, p] × [−p, p] region. Some useful choices for f(x) are |x| and x².
We define the direction of minimum distortion (DMD) as the direction vector (i, j)
that minimizes D(i, j). Template match occurs when the DMD has been found
within the search region.
Exhaustive search for the DMD would require evaluation of D(i, j) for (2p + 1)²
directions. If D(i, j) increases monotonically as we move away from the DMD
along any direction, then the search can be speeded up by successively reducing the
area of search. Figure 9.47a illustrates the procedure for p = 5. The algorithm
consists of searching five locations (marked ○), which contain the center of the
search area and the midpoints between the center and the four boundaries of the
area. The locations searched at the initial step are marked 1. The optimum direction





Figure 9.47 Two-dimensional logarithmic search. (a) The algorithm: a 2-D
logarithmic search procedure for the direction of minimum distortion. The figure
shows the concept of the 2-D logarithmic search to find a pixel in another frame,
which is registered with respect to the pixel (i, j) of a given frame, such that the
mean square error over a block defined around (i, j) is minimized. The search is
done step by step, with ○ indicating the directions searched at a step number
marked. The numbers circled show the optimum directions for that search step
and the × shows the final optimum direction, (i − 3, j + 1) in this example. This
procedure requires searching only 13 to 21 locations for the given grid, as opposed
to 121 total possibilities. (b) Example. Courtesy Stuart Wells, Heriot-Watt Univ.,
U.K.

(circled numbers) gives the location of the new center for the next step. This
procedure continues until the plane of search reduces to a 3 × 3 size. In the final
step all the nine locations are searched and the location corresponding to the
minimum gives the DMD.
If the direction of minimum distortion lies outside the search region, the algorithm
converges to a point on the boundary that is closest to the DMD. This algorithm has
been found useful for estimating planar motion of objects by measuring displacements
of local regions from one frame to another. Figure 9.47b shows the motion
vectors detected in an underwater scene involving a diver and turbulent water flow.
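A compact variant of this search can be sketched as follows (numpy assumed; the routine takes the distortion D(i, j) of (9.136) as a callable, halves the step whenever the center wins, and finishes with the full 3 × 3 check; real implementations differ in detail, e.g. some also halve the step at the region boundary):

```python
import numpy as np

def log_search(D, p):
    # 2-D logarithmic search for the direction of minimum distortion
    # over -p <= i, j <= p, assuming D increases away from the minimum.
    i = j = 0
    step = max(p // 2, 1)
    while step > 1:
        # center plus the four axis midpoints at the current step size
        cand = [(i, j)] + [(i + di, j + dj)
                           for di, dj in ((-step, 0), (step, 0), (0, -step), (0, step))
                           if max(abs(i + di), abs(j + dj)) <= p]
        bi, bj = min(cand, key=lambda t: D(*t))
        if (bi, bj) == (i, j):
            step //= 2          # center is best: shrink the search area
        else:
            i, j = bi, bj       # move the center
    # final step: all nine locations of the 3 x 3 neighborhood
    cand = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if max(abs(i + di), abs(j + dj)) <= p]
    return min(cand, key=lambda t: D(*t))

def block_distortion(v, u, m0, n0):
    # D(i, j) of (9.136) with f(x) = x^2 for a template u compared against
    # v at offset (m0 + i, n0 + j); v, u, m0, n0 are illustrative names.
    M, N = u.shape
    return lambda i, j: float(((v[m0 + i:m0 + i + M, n0 + j:n0 + j + N] - u) ** 2).mean())
```

For a unimodal distortion such as D(i, j) = |i + 3| + |j − 1| the search returns (−3, 1), the DMD of the book's p = 5 example.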

Figure 9.48 Hierarchical search. Shaded area shows the region where match
occurs. Dotted lines show regions searched.



Sequential search. Another way of speeding up search is to compute the cumulative error

e_{p,q}(i, j) = Sum_{m=1}^{p} Sum_{n=1}^{q} |v(m, n) - u(m + i, n + j)|,   p <= M, q <= N        (9.137)

and terminate the search at (i, j) if e_{p,q}(i, j) exceeds some predetermined threshold. The search may then be continued only in those directions where e_{p,q}(i, j) is below a threshold.

Another possibility is to search in the i direction until a minimum is found and then switch the search to the j direction. This search in alternating conjugate directions is continued until the location of the minimum remains unchanged.
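The early-termination test of Eq. (9.137) can be sketched as follows, accumulating the error one row at a time; the array shapes and threshold value are illustrative:

```python
import numpy as np

def sad_with_cutoff(v, u, i, j, threshold):
    """Accumulate e_pq(i, j) = sum |v(m, n) - u(m+i, n+j)| row by row
    (Eq. 9.137) and abandon displacement (i, j) once it crosses the
    threshold."""
    M, N = v.shape
    e = 0.0
    for m in range(M):
        e += float(np.abs(v[m].astype(float) - u[m + i, j:j + N].astype(float)).sum())
        if e > threshold:
            return None            # direction (i, j) rejected early
    return e
```

Displacements that survive the cutoff receive the full error; the rest are discarded after only a partial sum.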

Hierarchical search. If the observed image is very large, we may first search a low-resolution (reduced) copy using a likewise reduced copy of the template. If multiple matches occur (Fig. 9.48), then the regions represented by these locations are searched using higher-resolution copies to further refine and reduce the search area. Thus the full-resolution region searched can be a small fraction of the total area. This method of coarse-fine search is also logarithmically efficient.
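The coarse-fine idea can be sketched as follows; the half-resolution reduction and the small refinement window are illustrative choices (and the image dimensions are assumed even):

```python
import numpy as np

def reduce2(x):
    # halve resolution by 2 x 2 averaging (assumes even dimensions)
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def coarse_fine_search(image, template):
    """Locate the template at half resolution first, then refine the match in
    a small full-resolution neighborhood around the coarse location."""
    def best_match(img, tpl, cands):
        h, w = tpl.shape
        return min(cands, key=lambda p: np.abs(img[p[0]:p[0]+h, p[1]:p[1]+w] - tpl).sum())

    img2, tpl2 = reduce2(image), reduce2(template)
    h2, w2 = tpl2.shape
    ci, cj = best_match(img2, tpl2,
                        [(i, j) for i in range(img2.shape[0] - h2 + 1)
                                for j in range(img2.shape[1] - w2 + 1)])
    h, w = template.shape
    fine = [(2*ci + di, 2*cj + dj) for di in range(-2, 3) for dj in range(-2, 3)
            if 0 <= 2*ci + di <= image.shape[0] - h and 0 <= 2*cj + dj <= image.shape[1] - w]
    return best_match(image, template, fine)
```

The exhaustive pass runs over the quarter-size image; only a 5 x 5 neighborhood is examined at full resolution.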

9.13 IMAGE SEGMENTATION

Image segmentation refers to the decomposition of a scene into its components. It is a key step in image analysis. For example, a document reader would first segment the various characters before proceeding to identify them. Figure 9.49 lists several image segmentation techniques which will be discussed now.

Amplitude Thresholding or Window Slicing

Amplitude thresholding is useful whenever the amplitude features (see Section 9.2) sufficiently characterize the object. The appropriate amplitude feature values are calibrated so that a given amplitude interval represents a unique object characteristic. For example, the large amplitudes in the remotely sensed IR image of Fig. 7.10b represent low temperatures or high altitudes. Thresholding the high-intensity values segments the cloud patterns (Fig. 7.10d). Thresholding techniques are also useful in segmentation of binary images such as printed documents, line drawings and graphics, multispectral and color images, X-ray images, and so on. Threshold

Figure 9.49 Image segmentation techniques: amplitude thresholding or window slicing; component labeling; boundary-based approaches; region-based approaches and clustering; template matching; texture segmentation.

selection is an important step in this method. Some commonly used approaches are as follows:
1. The histogram of the image is examined for locating peaks and valleys. If it is multimodal, then the valleys can be used for selecting thresholds.
2. Select the threshold (t) so that a predetermined fraction (eta) of the total number of samples is below t.
3. Adaptively threshold by examining local neighborhood histograms.
4. Selectively threshold by examining histograms only of those points that satisfy a chosen criterion. For example, in low-contrast images, the histogram of those pixels whose Laplacian magnitude is above a prescribed value will exhibit clearer bimodal features than that of the original image.
5. If a probabilistic model of the different segmentation classes is known, determine the threshold to minimize the probability of error or some other quantity, for instance, Bayes' risk (see Section 9.14).
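Approach 2 has a direct implementation through the cumulative histogram; a minimal sketch, assuming 8-bit gray levels:

```python
import numpy as np

def fraction_threshold(image, eta):
    """Choose t so that a fraction eta of the pixels falls below t,
    using the cumulative gray-level histogram (approach 2 above)."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist) / image.size
    return int(np.searchsorted(cdf, eta))
```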
Example 9.8
We want to segment the Washington Monument from the scene in Fig. 9.50. First, the low intensities are thresholded to isolate the very dark areas (trees here). Then we detect a rectangle bounding the monument by thresholding the horizontal and vertical projection signatures defined as h(n) = Sum_m u(m, n)/Sum_m 1, v(m) = Sum_n u(m, n)/Sum_n 1. Contour-following the boundary of the object inside the rectangle gives the segmented object.

Figure 9.50 Image segmentation using horizontal and vertical projections.




C A D
B X

Figure 9.51 Neighborhood of pixel X in a pixel labeling algorithm.

Component Labeling

A simple and effective method of segmentation of binary images is by examining the


connectivity of pixels with their neighbors and labeling the connected sets. Two
practical algorithms are as follows.

Pixel labeling. Suppose a binary image is raster scanned left to right and top to bottom. The current pixel, X (Fig. 9.51), is labeled as belonging to either an object (1s) or a hole (0s) by examining its connectivity to the neighbors A, B, C, and D. For example, if X = 1, then it is assigned to the object(s) to which it is connected. If there are two or more qualified objects, then those objects are declared to be equivalent and are merged. A new object label is assigned when a transition from 0s to an isolated 1 is detected. Once the pixel is labeled, the features of that object are updated. At the end of the scan, features such as centroid, area, and perimeter are saved for each region of connected 1s.
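A sketch of the raster-scan labeling pass, with a union-find table standing in for the object-equivalence merging described above (pure Python, 1 = object pixel):

```python
def label_components(img):
    """Raster-scan pixel labeling of a binary image: each 1 is joined to its
    already-scanned neighbors (left, and the three pixels above); a
    union-find table records objects declared equivalent and merged."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    nxt = 1
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            neigh = [labels[r][c - 1]] if c > 0 and labels[r][c - 1] else []
            if r > 0:
                neigh += [labels[r - 1][c + dc] for dc in (-1, 0, 1)
                          if 0 <= c + dc < cols and labels[r - 1][c + dc]]
            if not neigh:
                parent[nxt] = nxt          # transition to an isolated 1: new object
                labels[r][c] = nxt
                nxt += 1
            else:
                roots = [find(n) for n in neigh]
                keep = min(roots)
                for rt in roots:           # equivalent objects are merged
                    parent[rt] = keep
                labels[r][c] = keep
    for r in range(rows):                  # second pass: resolve equivalences
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

Per-object features (centroid, area, perimeter) could be accumulated in the same scan, as the text notes.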

Run-length connectivity analysis. An alternate method of segmenting


binary images is to analyze the connectivity of run lengths from successive scan
lines. To illustrate this idea, we consider Fig. 9.52, where the black or white runs are
denoted by a, b, c, .... A segmentation table is created, where the run a of the first
scan line is entered into the first column. The object of the first run a is named A.
The first run of the next scan line, b, is of the same color as a and overlaps a. Hence

(a) Input: binary image. (b) Output: segmentation table.

Figure 9.52 Run-length connectivity algorithm.




b belongs to the object A and is placed underneath a in the first column. Since c is of a different color, it is placed in a new column for an object labeled B. The run d is of the same color as a and overlaps a. Since b and d both overlap a, divergence is said to have occurred, and a new column of object A is created, where d is placed. A divergence flag ID1 is set in this column to indicate that object B has caused this divergence. Also, the flag ID2 of B (column 2) is set to A to indicate B has caused divergence in A. Similarly, convergence occurs when two or more runs of 0s or 1s in a given line overlap with a run of the same color in the previous line. Thus convergence occurs in run u, which sets the convergence flags IC1 to C in column 4 and IC2 to B in column 6. Similarly, w sets the convergence flag IC2 to A in column 2, and the column 5 is labeled as belonging to object A.
In this manner, all the objects with different closed boundaries are segmented in a single pass. The segmentation table gives the data relevant to each object. The convergence and divergence flags also give the hierarchy structure of the objects. Since B causes divergence as well as convergence in A, and C has a similar relationship with B, the objects A, B, and C are assigned levels 1, 2, and 3, respectively.

Example 9.9
A vision system based on run-length connectivity analysis is outlined in Fig. 9.53a. The input object is imaged and digitized to give a binary image. Figure 9.53b shows the run-length representation of a key and its segmentation into the outer profile and the

Object -> Camera -> A-to-D converter -> Preprocess and binarize -> Run-length segmentation -> Feature extraction -> Classification
(a) Vision system
(b) Run-length data for a key. (c) Display of best-fit ellipse, bounding rectangle, and center of mass for the key.

Figure 9.53 Vision system based on run-length connectivity analysis.

three holes. For each object, features such as number of holes, area of holes, bounding rectangle, center of mass, orientation, and lengths of major and minor axes of the best-fit ellipse are calculated (Fig. 9.53c). A system trained on the basis of such features can then identify the given object from a trained vocabulary of objects [69].

Boundary-Based Approaches

Boundary extraction techniques segment objects on the basis of their profiles. Thus, contour following, connectivity, edge linking and graph searching, curve fitting, Hough transform, and other techniques of Section 9.5 are applicable to image segmentation. Difficulties with boundary-based methods occur when objects are touching or overlapping or if a break occurs in the boundary due to noise or artifacts in the image.
Example 9.10 Boundary analysis-based vision system
Figure 9.54a shows an example of an object-recognition system which uses the boundary information for image segmentation. The edges detected from the image of the input object are linked to determine the boundary. A spline fit (Section 9.6) is performed to extract the control points (Fig. 9.54b), which are then used to determine the object location (center of mass), orientation, and other shape parameters [71].
"

Segmentation • Spline control


Object Edll" ClaSsification
by boundary point feature and analysis
detection detection extraction

lal System block diagram


"-I \
I \
I X"Il.
. -
\

Object boundary points


Cubic spline
,
fit


• •
• ,

• (bl •

Figure 11.54 Object recognition system based on boundary analysis.




Region-Based Approaches and Clustering

The main idea in region-based segmentation techniques is to identify various regions in an image that have similar features. Clustering techniques encountered in the pattern-recognition literature have similar objectives and can be applied for image segmentation. Examples of clustering are given in Section 9.14.
One class of region-based techniques involves region growing [72]. The image is divided into atomic regions of constant gray levels. Similar adjacent regions are merged sequentially until the adjacent regions become sufficiently different (Fig. 9.55). The trick lies in selecting the criterion for merging. Some merging heuristics are as follows:

1. Merge two regions R_i and R_j if w/P_m > theta_1, where P_m = min(P_i, P_j), P_i and P_j are the perimeters of R_i and R_j, and w is the number of weak boundary locations (pixels on either side of the boundary have their magnitude difference less than some threshold). The parameter theta_1 controls the size of the region to be merged. For example, theta_1 = 1 implies two regions will be merged only if one of the regions almost surrounds the other. Typically, theta_1 = 0.5.
2. Merge R_i and R_j if w/l > theta_2, where l is the length of the common boundary between the two regions. Typically theta_2 = 0.75. So the two regions are merged if the boundary is sufficiently weak. Often this step is applied after the first heuristic has been used to reduce the number of regions.
3. Merge R_i and R_j only if there are no strong edge points between them. Note that the run-length connectivity method for binary images can be interpreted as an example of this heuristic.
4. Merge R_i and R_j if their similarity distance [see Section 9.14] is less than a threshold.
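The first heuristic can be sketched as follows for a pair of regions in a label image. The perimeter here is approximated by counting each region's boundary pixel pairs, and the gray-level threshold t1 and the value of theta1 are illustrative:

```python
import numpy as np

def should_merge(gray, labels, a, b, t1=10.0, theta1=0.5):
    """Merge test for regions a and b: w / P_m > theta1, where w counts weak
    boundary pixel pairs (gray-level difference below t1) on the common
    boundary and P_m = min(P_a, P_b) approximates the smaller perimeter."""
    w = 0
    perim = {a: 0, b: 0}
    rows, cols = labels.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):          # 4-neighbor pixel pairs
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue
                la, lb = labels[r, c], labels[r2, c2]
                if la == lb:
                    continue
                for side in (la, lb):                # boundary pair: count perimeters
                    if side in perim:
                        perim[side] += 1
                if {la, lb} == {a, b} and abs(float(gray[r, c]) - float(gray[r2, c2])) < t1:
                    w += 1                           # weak boundary location
    pm = min(perim[a], perim[b])
    return pm > 0 and w / pm > theta1
```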

Instead of merging regions, we can approach the segmentation problem by splitting a given region. For example, the image could be split by the quad-tree approach and then similar regions could be merged (Fig. 9.56).
Region-based approaches are generally less sensitive to noise than the boundary-based methods. However, their implementation complexity can often be quite large.
Figure 9.55 Region growing by merging.



(a) Input; (b) quad-tree split; (c) segmented regions.

Figure 9.56 Region growing by split and merge techniques.

Template Matching

One direct method of segmenting an image is to match it against templates from a given list. The detected objects can then be segmented out and the remaining image can be analyzed by other techniques (Fig. 9.57). This method can be used to segment busy images, such as journal pages containing text and graphics. The text can be segmented by template-matching techniques and graphics can be analyzed by boundary following algorithms.

Texture Segmentation

Texture segmentation becomes important when objects in a scene have a textured


background. Since texture often contains a high density of edges, boundary-based
techniques may become ineffective unless texture is filtered out. Clustering and


(a) Template; (b) input image; (c) filtered image.

Figure 9.57 Background segmentation (or filtering) via template matching.


region-based approaches applied to textured features can be used to segment textured regions. In general, texture classification and segmentation is quite a difficult problem. Use of a priori knowledge about the existence and kinds of textures that may be present in a scene can be of great utility in practical problems.

9.14 CLASSIFICATION TECHNIQUES

A major task after feature extraction is to classify the object into one of several categories. Figure 9.2 lists various classification techniques applicable in image analysis. Although an in-depth discussion of classification techniques can be found in the pattern-recognition literature (see, for example, [1]), we will briefly review these here to establish their relevance in image analysis.
It should be mentioned that classification and segmentation processes have closely related objectives. Classification can lead to segmentation, and vice versa. Classification of pixels in an image is another form of component labeling that can result in segmentation of various objects in the image. For example, in remote sensing, classification of multispectral data at each pixel location results in segmentation of various regions of wheat, barley, rice, and the like. Similarly, image segmentation by template matching, as in character recognition, leads to classification or identification of each object.
There are two basic approaches to classification, supervised and nonsupervised, depending on whether or not a set of prototypes is available.

Supervised Learning

Supervised learning, also called supervised classification, can be distribution free or statistical. Distribution-free methods do not require knowledge of any a priori probability distribution functions and are based on reasoning and heuristics. Statistical techniques are based on probability distribution models, which may be parametric (such as Gaussian distributions) or nonparametric.

Distribution-free classification. Suppose there are K different objects or pattern classes S_1, S_2, ..., S_k, ..., S_K. Each class is characterized by M_k prototypes, which have N x 1 feature vectors y_m^(k), m = 1, ..., M_k. Let x denote an N x 1 feature vector obtained from the observed image. A fundamental function in pattern recognition is called the discriminant function. It is defined such that the kth discriminant function g_k(x) takes the maximum value if x belongs to class k, that is, the decision rule is

g_k(x) > g_j(x),   for all j != k  =>  x in S_k        (9.138)

For a K class problem, we need K - 1 discriminant functions. These functions divide the N-dimensional feature space into K different regions with a maximum of K(K - 1)/2 hypersurfaces. The partitions become hyperplanes if the discriminant function is linear, that is, if it has the form

g_k(x) = a_k^T x + b_k        (9.139)




Such a function arises, for example, when x is classified to the class whose centroid is nearest in Euclidean distance to it (Problem 9.17). The associated classifier is called the minimum mean (Euclidean) distance classifier.
An alternative decision rule is to classify x to S_i if, among a total of k nearest prototype neighbors of x, the maximum number of neighbors belong to class S_i. This is the k-nearest neighbor classifier, which for k = 1 becomes a minimum-distance classifier.
When the discriminant function can classify the prototypes correctly for some linear discriminants, the classes are said to be linearly separable. In that case, the weights a_k and b_k can be determined via a successive linear training algorithm. Other discriminants can be piecewise linear, quadratic, or polynomial functions. The k-nearest neighbor classification can be shown to be equivalent to using piecewise linear discriminants.
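Minimal sketches of the two rules follow; the dictionary layout of the prototypes is an illustrative choice:

```python
import numpy as np

def nearest_mean(x, prototypes):
    """Minimum mean (Euclidean) distance classifier: assign x to the class
    whose prototype centroid is nearest."""
    means = {k: v.mean(axis=0) for k, v in prototypes.items()}
    return min(means, key=lambda k: np.linalg.norm(x - means[k]))

def knn(x, prototypes, k=3):
    """k-nearest-neighbor rule: majority class among the k prototypes
    nearest to x (k = 1 reduces to a minimum-distance classifier)."""
    pool = sorted(((np.linalg.norm(x - p), c)
                   for c, ps in prototypes.items() for p in ps),
                  key=lambda t: t[0])
    votes = {}
    for _, c in pool[:k]:
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)
```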
Decision tree classification [60, 61]. Another distribution-free classifier, called a decision tree classifier, splits the N-dimensional feature space into unique regions by a sequential method. The algorithm is such that every class need not be tested to arrive at a decision. This becomes advantageous when the number of classes is very large. Moreover, unlike many other training algorithms, this algorithm is guaranteed to converge whether or not the feature space is linearly separable.
Let mu_k(i) and sigma_k(i) denote the mean and standard deviation, respectively, measured from repeated independent observations of the kth prototype vector element y_m^(k)(i), m = 1, ..., M_k. Define the normalized average prototype features z_k(i) = mu_k(i)/sigma_k(i) and an N x K matrix

        [ z_1(1)  z_2(1)  ...  z_K(1) ]
    Z = [ z_1(2)  z_2(2)  ...  z_K(2) ]        (9.140)
        [   .       .             .   ]
        [ z_1(N)  z_2(N)  ...  z_K(N) ]

The row number of Z is the feature number and the column number is the object or class number. Further, let Z' denote the matrix obtained by arranging the elements of each row of Z in increasing order, with the smallest element on the left and the largest on the right. Now, the algorithm is as follows.
Decision Tree Algorithm
Step 1 Convert Z to Z'. Find the maximum distance between adjacent row elements in each row of Z'. Find r, the row number with the largest maximum distance. The row r represents a feature. Set a threshold at the midpoint of the maximum distance boundaries and split row r into two parts.
Step 2 Convert Z' to Z such that the row r is the same in both the matrices. The elements of the other rows of Z' are rearranged such that each column of Z represents a prototype vector. This means, simply, that the elements of each row of Z are in the same order as the elements of row r. Split Z into two matrices Z_1 and Z_2 by splitting each row in a manner similar to row r.
Step 3 Repeat Steps 1 and 2 for the split matrices that have more than one column. Terminate the process when all the split matrices have only one column.



The preceding process produces a series of thresholds that induce questions
of the form, Is feature j > threshold? The questions and the two possible decisions
for each question generate a series of nodes and branches of a decision tree. The
terminal branches of the tree give the classification decision.

Example 9.11
The accompanying table contains the normalized average areas and perimeter lengths of five different object classes for which a vision system is to be trained.

    Class:              1    2    3    4    5
    z(1) (area):        6   12   20   24   27
    z(2) (perimeter):  56   28   41   35   48

The largest adjacent difference in the first row is 8; in the second row it is 7. Hence the first row is chosen, and z(1) is the feature to be thresholded. This splits Z into Z_1 and Z_2, as shown. Proceeding similarly with these matrices, we get the remaining splits. The thresholds partition the feature space and induce the decision tree, as shown in Fig. 9.58.
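Step 1 of the algorithm can be sketched as follows. The first row of the test matrix mirrors the area feature of Example 9.11; the second row is hypothetical, chosen so the split is unambiguous:

```python
import numpy as np

def choose_split(Z):
    """Step 1: sort each row, find the row r with the largest gap between
    adjacent sorted elements, and place the threshold at the midpoint of
    that gap."""
    Zs = np.sort(Z, axis=1)
    gaps = np.diff(Zs, axis=1)
    r = int(np.argmax(gaps.max(axis=1)))
    g = int(np.argmax(gaps[r]))
    return r, (Zs[r, g] + Zs[r, g + 1]) / 2.0

# row 0 behaves like the area feature of Example 9.11; row 1 is hypothetical
Z = np.array([[6.0, 12.0, 20.0, 24.0, 27.0],
              [10.0, 12.0, 14.0, 16.0, 18.0]])
```

For this data the area row is selected (its largest gap, 8, lies between 12 and 20) and the threshold lands at 16, the midpoint.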

Statistical classification. In statistical classification techniques it is assumed that the different object classes and the feature vector have an underlying joint probability density. Let P(S_k) be the a priori probability of occurrence of class S_k and p(x) be the probability density function of the random feature vector observed as x.
Bayes' minimum-risk classifier. The Bayes' minimum-risk classifier minimizes the average loss or risk in assigning x to a wrong class. Define



Figure 9.58 Decision tree classifier.

R = Sum_{k=1}^{K} Integral_{R_k} C(x|S_k) dx,    C(x|S_k) = Sum_{i=1}^{K} C_{i,k} P(S_i) p(x|S_i)        (9.141)

where C_{i,k} is the cost of assigning x to S_k when x in S_i in fact, and R_k represents the region of the feature space where p(x|S_k) > p(x|S_i) for every i != k. The quantity C(x|S_k) represents the total cost of assigning x to S_k. It is well known that the decision rule that minimizes R is given by

Sum_{i=1}^{K} C_{i,k} P(S_i) p(x|S_i) < Sum_{i=1}^{K} C_{i,j} P(S_i) p(x|S_i),    for all j != k  =>  x in S_k        (9.142)

If C_{i,k} = 1 for i != k and C_{i,k} = 0 for i = k, then the decision rule simplifies to

P(S_k) p(x|S_k) > P(S_j) p(x|S_j),    for all j != k  =>  x in S_k        (9.143)

In this case the probability of error in classification is also minimized, and the minimum error classifier discriminant becomes

g_k(x) = P(S_k) p(x|S_k)        (9.144)

In practice the p(x|S_k) are estimated from the prototype data by either parametric or nonparametric techniques, which can yield simplified expressions for the discriminant function.
There also exist some sequential classification techniques, such as the sequential probability ratio test (SPRT) and the generalized SPRT, where decisions can be made initially using fewer than N features and refined as more features are acquired sequentially [62]. The advantage lies in situations where N is large, so that it is




desirable to terminate the process if only a few features measured early can yield adequate results.
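The minimum-error rule of Eq. (9.143) can be sketched as follows, with hypothetical one-dimensional Gaussian densities standing in for the estimated class-conditionals p(x|S_k):

```python
import numpy as np

def gaussian(mu, sigma):
    # hypothetical 1-D class-conditional density p(x | S_k)
    return lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def bayes_classify(x, priors, likelihoods):
    """Minimum-error rule: choose the class k maximizing the discriminant
    g_k(x) = P(S_k) p(x | S_k) of Eq. (9.144)."""
    return max(priors, key=lambda k: priors[k] * likelihoods[k](x))
```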

Nonsupervised Learning or Clustering

In nonsupervised learning, we attempt to identify clusters or natural groupings in the feature space. A cluster is a set of points in the feature space for which their local density is large (relative maximum) compared to the density of feature points in the surrounding region. Clustering techniques are useful for image segmentation and for classification of raw data to establish classes and prototypes. Clustering is also a useful vector quantization technique for compression of images.
Example 9.12
The visual and IR images u_1(m, n) and u_2(m, n), respectively (Fig. 9.59a), are transformed pixel by pixel to give the features v_1(m, n) = (u_1(m, n) + u_2(m, n))/sqrt(2), v_2(m, n) = (u_1(m, n) - u_2(m, n))/sqrt(2). This is simply the 2 x 2 Hadamard transform of the 2 x 1 vector [u_1 u_2]^T. Figure 9.59b shows the feature images. The images v_1(m, n) and v_2(m, n) are found to contain mainly the clouds and land features, respectively. Thresholding these images yields the left-side images in Fig. 9.59c and d. Notice the clouds contain some land features, and vice versa. A scatter diagram, which plots each vector [v_1 v_2]^T as a point in the v_1 versus v_2 space, is seen to have two main clusters (Fig. 9.60). Using the cluster boundaries for segmentation, we can remove the land features from clouds, and vice versa, as shown in Fig. 9.59c and d (right-side images).

Figure 9.59 Segmentation by clustering. (a) Input images u_1(m, n) and u_2(m, n); (b) feature images v_1(m, n) and v_2(m, n); (c) segmentation of clouds by thresholding v_1 (left) and by clustering (right); (d) segmentation of land by thresholding v_2 (left) and by clustering (right).

Figure 9.60 Scatter diagram in feature space.

Similarity measure approach. The success of clustering techniques rests on the partitioning of the feature space into cluster subsets. A general clustering algorithm is based on split and merge ideas (Fig. 9.61). Using a similarity measure, the input vectors are partitioned into subsets. Each partition is tested to check whether or not the subsets are sufficiently distinct. Subsets that are not sufficiently distinct are merged. The procedure is repeated on each of the subsets until no further subdivisions result or some other convergence criterion is satisfied. Thus, a similarity measure, a distinctiveness test, and a stopping rule are required to define a clustering algorithm. For any two feature vectors x_i and x_j, some of the commonly used similarity measures are:

Dot product:  (x_i, x_j) = x_i^T x_j = ||x_i|| ||x_j|| cos(x_i, x_j)

Similarity rule:  s(x_i, x_j) = (x_i, x_j) / [(x_i, x_i) + (x_j, x_j) - (x_i, x_j)]

Weighted Euclidean distance:  d(x_i, x_j) = Sum_k w_k [x_i(k) - x_j(k)]^2

Normalized correlation:  rho(x_i, x_j) = (x_i, x_j) / sqrt((x_i, x_i)(x_j, x_j))
Several different algorithms exist for clustering based on the similarity approach. Examples are given next.



Figure 9.61 A clustering approach.

Chain method [63]. The first data sample is designated as the representative of the first cluster, and the similarity or distance of the next sample is measured from the first cluster representative. If this distance is less than a threshold, then it is placed in the first cluster; otherwise it becomes the representative of the second cluster. The process is continued for each new data sample until all the data has been exhausted. Note that this is a one-pass method.
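A sketch of the one-pass chain method follows. Here each sample is compared against every cluster representative formed so far, with the distance function and threshold supplied by the caller (one reasonable reading of the procedure above):

```python
def chain_cluster(data, dist, threshold):
    """One-pass chain method: each sample joins the first existing cluster
    whose representative is within `threshold`; otherwise it starts (and
    represents) a new cluster."""
    reps, assignment = [], []
    for x in data:
        for k, r in enumerate(reps):
            if dist(x, r) < threshold:
                assignment.append(k)
                break
        else:
            reps.append(x)                    # x becomes a new representative
            assignment.append(len(reps) - 1)
    return reps, assignment
```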

An iterative method (Isodata) [64]. Assume the number of clusters, K, is known. The partitioning of the data is done such that the average spread or variance of the partition is minimized. Let mu_k(n) denote the kth cluster center at the nth iteration and R_k denote the region of the kth cluster at a given iteration. Initially, we assign arbitrary values to mu_k(0). At the nth iteration take one of the data points x_i and assign it to the cluster whose center is closest to it, that is,

x_i in R_k  <=>  d(x_i, mu_k(n)) = min over j = 1, ..., K of d(x_i, mu_j(n))        (9.145)

where d(x, y) is the distance measure used. Recompute the cluster centers by finding the point that minimizes the distance for elements within each cluster. Thus

mu_k(n + 1):   Sum_{x_i in R_k} d(x_i, mu_k(n + 1)) = min over y of Sum_{x_i in R_k} d(x_i, y),   k = 1, ..., K        (9.146)

The procedure is repeated for each x_i, one at a time, until the clusters and their centers remain unchanged. If d(x, y) is the Euclidean distance, then a cluster center is simply the mean location of its elements. If K is not known, we start with a large
Figure 9.62 Image understanding systems.

value of K and then merge to K - 1, K - 2, ... clusters by a suitable cluster-distance
measure.
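With the Euclidean distance, Eq. (9.146) makes each center the mean of its members, giving the familiar k-means-style iteration. A sketch (using batch reassignment rather than the one-point-at-a-time update of the text, and initial centers drawn from the data):

```python
import numpy as np

def isodata(data, K, iters=20, seed=0):
    """Isodata-style clustering with Euclidean distance: assign every point
    to its nearest center (Eq. 9.145), then move each center to the mean of
    its members (Eq. 9.146 for Euclidean d)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=K, replace=False)].astype(float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # nearest-center assignment
        for k in range(K):
            members = data[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)   # Euclidean: center = mean
    return centers, labels
```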

Other Methods
Clusters can also be viewed as being located at the nodes of the joint Nth-order histogram of the feature vector. Other clustering methods are based on statistical nonsupervised learning techniques, ranking and intrinsic dimensionality determination, graph theory, and so on [65, 66]. Discussion of those techniques is beyond the goals of this text.
Finally, it should be noted that the success of clustering techniques is closely tied to feature selection. Clusters not detected in a given feature space may be easier to detect in rotated, scaled, or transformed coordinates. For images the feature vector elements could represent gray level, gradient magnitude, gradient phase, color, and/or other attributes. It may also be useful to decorrelate the elements of the feature vector.

9.15 IMAGE UNDERSTANDING

Image understanding (IU) refers to a body of knowledge that transforms pictorial inputs into commonly understood descriptions or symbols. The image pattern-recognition techniques we have studied classify an input into one of several categories. Interpretation of the assignment to a class is provided by a priori knowledge, or supervision. Such pattern-recognition systems are the simplest of IU systems.

Figure 9.63 A rule-based approach for printed circuit board inspection. (a) Preprocessed image; (b) image after thinning and identifying tracks and pads; (c) segmented image (obtained by region growing). Rules can be applied to the image in (c) and violations can be detected.

In more-advanced systems (Fig. 9.62), the features are first mapped into symbols; for example, the shape features may be mapped into the symbols representing circles,
rectangles, ellipses, and the like. Interpretation is provided to the collection of symbols to develop a description of the scene. To provide interpretation, different visual models and practical rules are adopted. For example, syntactic techniques provide grammars for strings of symbols. Other relational models provide rules for describing relations and interconnections between symbols. For example, projections at different angles of a spherical object may be symbolically represented as several circles. A relational model would provide the interpretation of a sphere or a ball. Figure 9.63 shows an example of image understanding applied to inspection of printed circuit boards [73, 74].
Much work remains to be done in formulation of problems and development of techniques for image understanding. Although the closing topic for this chapter, it offers a new beginning to a researcher interested in computer vision.

PROBLEMS


9.1 Calculate the means, autocorrelation, covariance, and inertia [see Eq. (9.116)] of the second-order histogram considered in Example 9.1.
9.2* Display the following features measured over 3 x 3, 5 x 5, 9 x 9, and 16 x 16 windows of a 512 x 512 image: (a) mean, (b) median, (c) dispersion, (d) standard deviation, (e) entropy, (f) skewness, and (g) kurtosis. Repeat the experiment for different images and draw conclusions about the possible use of these features in image processing applications.
9.3* From an image of your choice, extract the horizontal, vertical, 30 degree, 45 degree, and 60 degree edges using the DFT, and extract texture using the Haar or any other transform.
9.4* Compare the performances of the gradient operators of Table 9.2 and the 5 x 5 stochastic gradient of Table 9.5 on a noisy ideal edge model (Fig. 9.11) image with SNR = 9. Use the performance criteria of (9.25) and (9.26). Repeat the results at different noise levels and plot performance index versus SNR.
9.5* Evaluate the performance of zero-crossing operators on suitable noiseless and noisy images. Compare results with the gradient operators.
9.6 Consider a linear filter whose impulse response is the second derivative of the Gaussian kernel exp(-x^2/2 sigma^2). Show that, regardless of the value of sigma, the response of this filter to an edge modeled by a step function is a signal whose zero-crossing is at the location of the edge. Generalize this result in two dimensions by considering the Laplacian of the Gaussian kernel exp[-(x^2 + y^2)/2 sigma^2].
9.7 The gradient magnitude and contour directions of a 4 x 6 image are shown in Fig. P9.7. Using the linkage rules of Fig. 9.16b, sketch the graph interpretation and find the edge path if the evaluation function represents the sum of edge gradient magnitudes. Apply dynamic programming to Fig. P9.7 to determine the edge curve using the criterion of Eq. (9.27) with alpha = 4/pi, beta = 1, and d(x, y) = Euclidean distance between x and y.
9.8 a. Find the Hough transforms of the figures shown in Figure P9.8.
b. [Generalized Hough transform] Suppose it is desired to detect a curve defined
422 Image Analysis and Computer Vision Chap. 9
Figure P9.7 Gradient magnitudes and contour directions of a 4 × 6 image (graphic not reproduced).

Figure P9.8 (a), (b), (c) (graphics not reproduced).

parametrically by φ(x, y, a) = 0, where a is a p × 1 vector of parameters, from a set
of edge points (xᵢ, yᵢ), i = 1, ..., N. Run a counter C(a) as follows:

Initialize: C(a) = 0
Do i = 1, N: C(a) = C(a) + 1, where a is such that φ(xᵢ, yᵢ, a) = 0

Then the local maxima of C(a) give the particular curve(s) that pass through the
given edge points. If each element of a is quantized to L different levels, the
dimension of vector C(a) will be Lᵖ. Write the algorithm for detecting elliptical
segments described by

(x − x₀)²/a² + (y − y₀)²/b² = 1

If x₀, y₀, a, and b are represented by 8-bit words each, what is the dimension of
C(a)?
c. If the gradient angles θᵢ at each edge point are given, then show how the relation

∂φ/∂x + (∂φ/∂y) tan θ = 0   for (x, y, θ) = (xᵢ, yᵢ, θᵢ)

might be used to reduce the dimensionality of the search problem.
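The counter C(a) can be sketched directly. The following illustration (not from the text) specializes, for brevity, to circles, the case a = b = r of the elliptical segments in part (b), with each of the three parameters quantized to L levels; the grid ranges and the synthetic edge points are arbitrary choices:

```python
import numpy as np

def hough_circle_counter(points, L=21, cmax=1.0, rmax=1.0):
    # C(a): counter over the quantized parameter vector a = (x0, y0, r)
    # for circles (x - x0)^2 + (y - y0)^2 = r^2; dimension of C is L^3.
    C = np.zeros((L, L, L), dtype=np.int32)
    centers = np.linspace(-cmax, cmax, L)
    for (x, y) in points:
        for i, x0 in enumerate(centers):
            for j, y0 in enumerate(centers):
                r = np.hypot(x - x0, y - y0)
                k = int(round(r / rmax * (L - 1)))
                if k < L:
                    C[i, j, k] += 1
    return C, centers

# Edge points sampled from a circle of radius 0.5 centered at the origin
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.c_[0.5 * np.cos(t), 0.5 * np.sin(t)]
C, centers = hough_circle_counter(pts)
i, j, k = np.unravel_index(C.argmax(), C.shape)
```

The counter peaks at the cell whose quantized parameters match the generating circle, here (x₀, y₀, r) = (0, 0, 0.5).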

9.9 a. Show that the normalized uniform periodic B-splines satisfy

∫₀ᵏ B₀,ₖ(t) dt = 1   and   Σ_{j=0}^{k−1} B₀,ₖ(t + j) = 1,   0 ≤ t < 1

b. If an object of uniform density is approximated by the polygon obtained by joining
the adjacent control points by straight lines, find the expressions for the center of mass,
perimeter, area, and moments in terms of the control points.

9.10 (Cubic B-splines) Show that the control points and the cubic B-splines sampled at
uniformly spaced nodes are related via the matrices B₄ as follows:

           ⎡4 1 0 ⋯ 0 1⎤                ⎡4 1 0 ⋯ 0⎤
           ⎢1 4 1 ⋯ 0 0⎥                ⎢1 4 1 ⋯ 0⎥
B₄ = (1/6) ⎢⋮    ⋱    ⋮⎥ ,   B₄ = (1/6) ⎢⋮   ⋱   ⋮⎥
           ⎣1 0 0 ⋯ 1 4⎦                ⎣0 ⋯ 0 1 4⎦

where the first matrix is for the periodic case and the second is for the nonperiodic
case.
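Since the periodic B₄ is circulant, it can be generated from its first row. A minimal sketch (illustration only; n = 8 is an arbitrary choice) that also lets one check the row sums, consistent with the partition-of-unity property of Problem 9.9:

```python
import numpy as np

def periodic_B4(n):
    # Circulant matrix with first row (1/6)[4, 1, 0, ..., 0, 1]: relates
    # n control points to the cubic B-spline curve sampled at the n
    # uniformly spaced nodes in the periodic (closed-boundary) case.
    row = np.zeros(n)
    row[0], row[1], row[-1] = 4 / 6, 1 / 6, 1 / 6
    return np.array([np.roll(row, k) for k in range(n)])

B4 = periodic_B4(8)
```

Each row sums to 1, and the matrix is symmetric, so the nodal samples are convex combinations of the control points.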
9.11 (Properties of FDs)
a. Prove the properties of the Fourier descriptors summarized in Table 9.8.
b. Using Fig. 9.26, show that the reflection x̃₂ of x₂ is given by

x̃₂ = [1/(A² + B²)][−2ABx₁ + (A² − B²)x₂ − 2BC]

From these relations prove (9.53).
c. Show how the size, location, orientation, and symmetry of an object might be
determined if its FDs and those of a prototype are given.
d. Given the FDs of u(n), find the FDs of x₁(n) and x₂(n) and list their properties with
respect to the geometrical transformations considered in the text.
9.12 (Additional properties of FDs [32])
a. (FDs for a polygon curve) For a continuous curve u(t) ≜ x₁(t) + jx₂(t) with
period T, the FDs are the Fourier series coefficients a(k) = (1/T)∫₀ᵀ u(t)
exp(−j2πkt/T) dt. If the object boundary is a polygon whose vertices are repre-
sented by phasors Vₖ, k = 0, 1, ..., m − 1, show that the FDs are given by

a(k) = [T/(2πk)²] Σ_{i=1}^{m} (bᵢ₋₁ − bᵢ) exp(−j2πktᵢ/T),   k > 0, t₀ ≜ 0

where bᵢ ≜ (Vᵢ₊₁ − Vᵢ)/|Vᵢ₊₁ − Vᵢ| is the unit phasor along the polygon side from
Vᵢ to Vᵢ₊₁ and tᵢ is the arc length from the starting point to vertex Vᵢ (Fig. P9.12).

Figure P9.12 (a), (b) (graphics not reproduced).

b. (Line patterns) If the given curve is a line pattern, then a closed contour can be
obtained by retracing it. Using the symmetry of the periodic curve, show that the
FDs satisfy the relation

a(k) = a(−k)e^{−jk(2π/T)β}

for some β. If the trace begins at t = 0 at one of the endpoints of the pattern, then
β = 0. Show how this property may be used to skeletonize a shape.
c. The area A enclosed by the outer boundary of a surface is given by

A = ½ ∮ x₁ dx₂ − ½ ∮ x₂ dx₁ = ½ ∫₀ᵀ x₁(t) (dx₂/dt) dt − ½ ∫₀ᵀ x₂(t) (dx₁/dt) dt

In terms of FDs show that A = π Σ_{k=−∞}^{∞} k|a(k)|². Verify this result for the surface area of
a line pattern.
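The area identity in part (c) can be checked numerically for a circle, whose only nonzero descriptor is a(1) = r. The sketch below (illustration only; the radius and sample count are arbitrary, and the boundary is traced counterclockwise) computes the FDs of a sampled boundary with the FFT:

```python
import numpy as np

N, r = 256, 2.0
t = np.arange(N) / N
u = r * np.exp(2j * np.pi * t)            # circle of radius r, traced CCW

a = np.fft.fft(u) / N                      # Fourier descriptors a(k)
k = np.fft.fftfreq(N, d=1.0 / N)           # harmonic indices 0, 1, ..., -1
A = np.pi * np.sum(k * np.abs(a) ** 2)     # A = pi * sum_k k |a(k)|^2
```

For the circle, A reduces to π · 1 · r², the expected area πr²; for a retraced line pattern the terms cancel in ±k pairs and A vanishes.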
9.13 (Properties of AR models)
a. Prove the translation, scaling, and rotation properties of AR model parameters
listed in Table 9.9.
b. Show that a closed boundary can be reconstructed from the AR model residuals εᵢ(n)
by inverting a circulant matrix.
c. Find the relation between AR model features and FDs of closed boundaries.
9.14* Scan and digitize the ASCII characters and find their medial axis transforms. Develop
an alternative practical thinning algorithm to reduce printed characters to line
shapes.
9.15 Compare the complexity of printed character recognition algorithms based on (a)
template matching, (b) Fourier descriptors, and (c) moment matching.
9.16 (Matched filtering) Write the matched filter output SNR as

SNR = |∫∫ [G S_ν^{1/2}][S_ν^{−1/2} U exp{−j(ω₁m₀ + ω₂n₀)}] dω₁ dω₂|² / ∫∫ |G|² S_ν dω₁ dω₂

where G and U are Fourier transforms of g(m, n), u(m, n), respectively. Apply the
Schwarz inequality to show that the SNR is maximized only when (9.132) is satisfied
within a scaling constant that can be set to unity. What is the maximum value of SNR?
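For white noise, S_ν is constant and the matched filter reduces to a correlator. The sketch below (an illustration with an arbitrary synthetic template and shift, not part of the text) locates the unknown displacement (m₀, n₀) from the peak of the circular cross-correlation computed with FFTs:

```python
import numpy as np

u = np.zeros((32, 32))
u[12:20, 12:20] = 1.0                          # template u(m, n)
g = np.roll(np.roll(u, 5, axis=0), 7, axis=1)  # observation: u shifted by (5, 7)

# With S_nu constant, the matched-filter output is the cross-correlation
# of g with u; its peak sits at the displacement (m0, n0).
corr = np.real(np.fft.ifft2(np.fft.fft2(g) * np.conj(np.fft.fft2(u))))
m0, n0 = np.unravel_index(corr.argmax(), corr.shape)
```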
9.17 If μₖ denotes the mean vector of class k prototypes, show that the decision rule
‖x − μₖ‖² < ‖x − μᵢ‖², i ≠ k ⇒ x ∈ Sₖ, gives a linear discriminant with aₖ = 2μₖ,
bₖ = −‖μₖ‖².
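The asserted equivalence follows from expanding ‖x − μₖ‖² and dropping the common ‖x‖² term, which leaves the linear score aₖᵀx + bₖ. A quick numerical check (the prototypes and sample points are arbitrary):

```python
import numpy as np

means = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]

def classify_by_distance(x):
    # Nearest-mean rule: pick k minimizing ||x - mu_k||^2
    return int(np.argmin([np.sum((x - m) ** 2) for m in means]))

def classify_by_discriminant(x):
    # Linear discriminant: d_k(x) = a_k^T x + b_k,
    # with a_k = 2 mu_k and b_k = -||mu_k||^2
    return int(np.argmax([2 * m @ x - m @ m for m in means]))

rng = np.random.default_rng(0)
samples = rng.normal(scale=3.0, size=(50, 2))
agree = all(classify_by_distance(x) == classify_by_discriminant(x)
            for x in samples)
```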
9.18 Find the decision tree of Example 9.11 if an object class with z(1) = 15, z(2) = 30 is
added to the list of prototypes.
9.19* A printed circuit board can be modeled as a network of pathways that either merge
into other paths or terminate at a node. Develop a vision system for isolating defects
such as breaks (open circuits) and leaks (short circuits) in the pathways. Discuss and
develop practical preprocessing, segmentation, and recognition algorithms for your
system.

BIBLIOGRAPHY

Sections 9.1-9.3

Some general references on feature extraction, image analysis, and computer vision


are:



1. R. O. Duda and P. E. Hart. Pattern Recognition and Scene Analysis. New York: John
Wiley, 1973.
2. A. Rosenfeld and A. C. Kak, Digital Picture Processing. New York: Academic Press,
1976. Also see Vols. I and II, 1982.
3. D. H. Ballard and C. M. Brown. Computer Vision. Englewood Cliffs, N.J.: Prentice-
Hall, 1982.
4. B. S. Lipkin and A. Rosenfeld (eds.). Picture Processing and Psychopictorics. New York:

Academic Press, 1970.
5. J. K. Aggarwal, R. O. Duda and A. Rosenfeld (eds.). Computer Methods in Image
Analysis. Los Angeles: IEEE Computer Society, 1977.

Additional literature on image analysis may be found in several texts referred
to in Chapter 1, in journals such as Computer Graphics and Image Processing,
Pattern Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, and in
the proceedings of conferences and workshops such as the IEEE Conferences on
Pattern Recognition and Image Processing, Computer Vision and Pattern Recognition,
the International Joint Conference on Pattern Recognition, and the like.

Section 9.4

Edge detection is a problem of fundamental importance in image analysis. Different


edge detection techniques discussed here follow from:

6. J. M. S. Prewitt. "Object Enhancement and Extraction," in [4].


7. L. S. Davis. "A Survey of Edge Detection Techniques." Computer Graphics and Image
Processing, vol. 4, pp. 248-270, 1975.
8. A. Rosenfeld and M. Thurston. "Edge and Curve Detection for Visual Scene Analysis,"
in [5]. Also see Vol. C-21, no. 7 (July 1972): 677-715.
9. L. G. Roberts. "Machine Perception of Three-Dimensional Solids," in [5].
10. R. Kirsch. "Computer Determination of the Constituent Structure in Biological Im-
ages." Compt. Biomed. Res. 4, no. 3 (1971): 315-328.
11. G. S. Robinson. "Edge Detection by Compass Gradient Masks." Comput. Graphics
Image Proc. 6 (1977): 492-501.
12. W. Frei and C. C. Chen. "Fast Boundary Detection: A Generalization and a New
Algorithm." IEEE Trans. Computers C-26, no. 10 (October 1977): 988-998.
13. D. Marr and E. C. Hildreth. "Theory of Edge Detection." Proc. R. Soc. Lond. B 207
(1980): 187-217.
14. R. M. Haralick. "Zero Crossing of Second Directional Derivative Edge Detector."
Robot Vision (A. Rosenfeld, ed.), SPIE 336 (1982): 91-96.
15. M. Hueckel. "An Operator Which Locates Edges in Digitized Pictures." J. ACM 18,
no. 1 (January 1971): 113-125. Also see J. ACM 20, no. 4 (October 1973): 634-647.
16. A. K. Jain and S. Ranganath, "Image Restoration and Edge Extraction Based on 2-D
Stochastic Models." Proc. ICASSP-82, Paris, May 1982.
17. W. K. Pratt. Digital Image Processing. New York: Wiley Interscience, 1978, p. 497.




Section 9.5

For various types of edge linkage rules, contour following, boundary detection
techniques, dynamic programming, and the like, we follow:
18. R. Nevatia. "Locating Object Boundaries in Textured Environments." IEEE Trans.
Comput. C-25 (November 1976): 1170-1180.
19. A. Martelli. "Edge Detection Using Heuristic Search Methods." Comp. Graphics Image
Proc. 1 (August 1972): 169-182. Also see Martelli in [5].
20. G. P. Ashkar and J. W. Modestino. "The Contour Extraction Problem with Biomedical
Applications." Comp. Graphics Image Proc. 7 (1978): 331-355.
21. J. M. Lester, H. A. Williams, B. A. Weintraub, and J. F. Brenner. "Two Graph
Searching Techniques for Boundary Finding in White Blood Cell Images." Comp. Biol.
Med. 8 (1978): 293-308.
22. R. E. Bellman and S. Dreyfus. Applied Dynamic Programming. Princeton, N.J.: Prince-
ton University Press, 1962.
23. U. Montanari. "On the Optimal Detection of Curves in Noisy Pictures." Commun.
ACM 14 (May 1971): 335-345.
24. P. V. C. Hough. "Method and Means of Recognizing Complex Patterns." U.S. Patent
3,069,654, 1962.
25. R. Cederberg. "Chain-Link Coding and Segmentation for Raster Scan Devices."
Computer Graphics and Image Proc. 10, (1979): 224-234.

Section 9.6

For chain codes, their generalizations, and run-length coding-based segmentation
approaches we follow:

26. H. Freeman. "Computer Processing of Line Drawing Images." Computing Surveys 6


(March 1974): 57-98. Also see Freeman in [5] and J. A. Saghri and H. Freeman in IEEE
Trans. PAMI (September 1981): 533-539.

The theory of B-splines is well documented in the literature. For its applica-
tions in computer graphics:

27. W. J. Gordon and R. F. Riesenfeld. "B-spline Curves and Surfaces," in R. E. Barnhill


and R. F. Riesenfeld (eds.), Computer Aided Geometric Design, New York: Academic
Press, 1974, pp. 95-126.
28. B. A. Barsky and D. P. Greenberg. "Determining a Set of B-spline Control Vertices
to Generate an Interpolating Surface." Computer Graphics and Image Proc. 14 (1980):
203-226.
29. D. Paglieroni and A. K. Jain. "A Control Point Theory for Boundary Representation
and Matching." Proc. ICASSP, Vol. 4, pp. 1851-1854, Tampa, Fla. 1985.
30. D. Hoffman. "The Interpretation of Visual Illusions." Scientific American, Dec. 1983,
pp. 151-162.



Fourier descriptors have been applied for shape analysis of closed curves and
hand-printed characters. For details see Granlund in [5] and:

31. C. T. Zahn and R. S. Roskies. "Fourier Descriptors for Plane Closed Curves." IEEE
Trans. Computers C-21 (March 1972): 269-281.
32. E. Persoon and K. S. Fu. "Shape Discrimination Using Fourier Descriptors." IEEE
Trans. Sys. Man, Cybern. SMC-7 (March 1977): 170-179.

For the theory of AR models for boundary representation, we follow:

33. R. L. Kashyap and R. Chellappa. "Stochastic Models for Closed Boundary Analysis:
Representation and Reconstruction." IEEE Trans. Inform. Theory IT-27 (September
1981): 627-637.


Section 9.7

Further details on quad-trees and medial axis transform:

34. H. Samet. "Region Representation: Quadtrees from Boundary Codes." Comm. ACM
23 (March 1980): 163-170.

Section 9.8

For basic theory of moments and its applications:




35. M. K. Hu. "Visual Pattern Recognition by Moment Invariants," in [5].
36. M. R. Teague. "Image Analysis via the General Theory of Moments." J. of Optical
Society of America 70, no. 8 (August 1980): 920-930.
37. G. B. Gurevich. Foundations of the Theory of Algebraic Invariants. Groningen, The
Netherlands: P. Noordhoff, 1964.
38. D. Casasent and D. Psaltis. "Hybrid Processor to Compute Invariant Moments for
Pattern Recognition." J. Optical Society of America 5, no. 9 (September 1980): 395-397.
39. S. Dudani, K. Breeding, and R. McGhee. "Aircraft Identification by Moment In-
variants." IEEE Trans. on Computers C-26, no. 1 (January 1977): 39-45.
40. R. Wong and E. Hall. "Scene Matching with Moment Invariants." Computer Graphics
and Image Processing 8 (1978): 16-24.
41. H. Blum. "A Transformation for Extracting New Descriptions of Shape." Symposium
on Models for the Perception of Speech and Visual Form, Cambridge: MIT Press, 1964.
42. E. R. Davies and A. P. Plummer. "Thinning Algorithms: A Critique and a New Method-
ology." Pattern Recognition 14 (1981): 53-63.
43. D. Rutovitz. "Pattern Recognition." J. of Royal Stat. Soc. 129, no. 66 (1966): 403-420.
44. J. Serra. Image Analysis and Mathematical Morphology. New York: Academic Press,
1982.
45. T. Pavlidis. "Minimum Storage Boundary Tracing Algorithm and Its Application to
Automatic Inspection." IEEE Transactions on Sys., Man, and Cybern. 8, no. 1 (January
1978): 66-69.




Section 9.11

For surveys and further details on texture, see Hawkins in [4], Pickett in [4], Haralick
et al. in [5], and:

46. P. Brodatz. Textures: A Photographic Album for Artists and Designers. Toronto: Dover
Publishing Co., 1966.
47. R. M. Haralick. "Statistical and Structural Approaches to Texture." Proc. IEEE 67
(May 1979): 786-809. Also see Image Texture Analysis, New York: Plenum, 1981.
48. G. G. Lendaris and G. L. Stanley. "Diffraction Pattern Sampling for Automatic Pattern
Recognition," in [5].
49. R. P. Kruger, W. B. Thompson, and A. F. Turner. "Computer Diagnosis of Pneumo-
coniosis." IEEE Trans. Sys. Man, Cybern. SMC-4 (January 1974): 40-49.
50. B. Julesz, et al. "Inability of Humans to Discriminate Between Visual Textures that
Agree in Second Order Statistics-Revisited." Perception 2 (1973): 391-405. Also see
IRE Trans. Inform. Theory IT-8 (February 1962): 84-92.
51. O. D. Faugeras and W. K. Pratt. "Decorrelation Methods of Texture Feature Extrac-
tion." IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2 (July 1980): 323-332.
52. B. H. McCormick and S. N. Jayaramamurthy. "Time Series Model for Texture Syn-
thesis." Int. J. Comput. Inform. Sci. 3 (1974): 329-343. Also see vol. 4 (1975): 1-38.
53. G. R. Cross and A. K. Jain. "Markov Random Field Texture Models." IEEE Trans.
Pattern Anal. Mach. Intell. PAMI-5, no. 1 (January 1983): 25-39.
54. T. Pavlidis. Structural Pattern Recognition. New York: Springer-Verlag, 1977.

55. N. Ahuja and A. Rosenfeld. "Mosaic Models for Textures." IEEE Trans. Pattern Anal.
Mach. Intell. PAMI-3, no. 1 (January 1981): 1-11.

Section 9.12

56. G. L. Turin. "An Introduction to Matched Filtering." IRE Trans. Inform. Theory
(June 1960): 311-329.
57. A. Vander Lugt, F. B. Rotz, and A. Klooster, Jr. "Character Reading by Optical Spatial
Filtering," in J. Tippett et al. (eds.), Optical and Electro-Optical Information Processing.
Cambridge, Mass.: MIT Press, 1965, pp. 125-141. Also see pp. 5-11 in [5].
58. J. R. Jain and A. K. Jain. "Displacement Measurement and Its Application in Inter-
frame Image Coding." IEEE Trans. Commun. COM-29 (December 1981): 1799-1808.
59. D. L. Barnea and H. F. Silverman. "A Class of Algorithms for Fast Digital Image
Registration." IEEE Trans. Computers (February 1972): 179-186.

Sections 9.13, 9.14

Details of classification and clustering techniques may be found in [1] and other
texts on pattern recognition. For decision tree algorithms and other segmentation
techniques:

60. C. Rosen et al. "Exploratory Research in Advanced Automation." SRI Technical Report,
First, Second and Third Reports, NSF Grant GI-38100X1, SRI Project 2591, Menlo
Park, Calif.: SRI, December 1974.



61. G. J. Agin and R. O. Duda. "SRI Vision Research for Advanced Automation." Proc.
2nd USA-Japan Computer Conf., Tokyo, Japan, August 1975, pp. 113-117.
62. H. C. Andrews. Introduction to Mathematical Techniques in Pattern Recognition. New
York: John Wiley, 1972. Also see G. B. Coleman and H. C. Andrews, "Image Segmen-
tation by Clustering." Proc. IEEE 67, no. 5 (May 1979): 773-785.
63. G. Nagy. "State of the Art in Pattern Recognition." Proc. IEEE 56, no. 5 (May 1968):
836-862.
64. G. H. Ball and D. J. Hall. "ISODATA, A Novel Method of Data Analysis and Pattern
Classification," International Communication Conference, Philadelphia, June 1966.
65. C. T. Zahn. "Graph-Theoretical Methods for Detecting and Describing Gestalt Clus-
ters." IEEE Trans. Computers C-20, no. 1 (January 1971): 68-86.
66. J. C. Gower and G. J. S. Ross. "Minimum Spanning Trees and Single Linkage Cluster
Analysis." Appl. Statistics 18, no. 1 (1969): 54-64.
67. M. R. Anderberg. Cluster Analysis for Applications. New York: Academic Press, 1973.
68. E. B. Henrichon, Jr. and K. S. Fu. "A Nonparametric Partitioning Procedure for Pattern
Classification." IEEE Trans. Computers C-18, no. 7 (July 1969).
69. I. Kabir. "A Computer Vision System Using Fast, One Pass Algorithms," M.S. Thesis,
University of California at Davis, 1983.
70. G. Hirzinger and K. Landzettel. "A Fast Technique for Segmentation and Recognition
of Binary Patterns." IEEE Conference on Pattern Recognition and Image Processing,
1981.
71. D. W. Paglieroni. "Control Point Algorithms for Contour Processing and Shape Analy-
sis," Ph.D. Thesis, University of California, Davis, 1986.
72. C. R. Brice and C. L. Fennema. "Scene Analysis Using Regions," in [5].

Section 9.15

For further reading on image understanding research, see the proceedings of the Interna-
tional Joint Conference on Artificial Intelligence, the DARPA Image Understanding
Workshop, and the various references cited there. For PC board inspection and rule-
based systems:

73. A. Darwish and A. K. Jain. "A Rule Based Approach for Visual Pattern Inspection."
IEEE Trans. Pattern Anal. Mach. Intell. PAMI-10, no. 1 (January 1988): 56-68.
74. J. R. Mandeville. "A Novel Method for Analysis of Printed Circuit Images." IBM J. Res.
Dev. 29 (January 1985): 73-86.






Image Reconstruction
from Projections

10.1 INTRODUCTION

An important problem in image processing is to reconstruct a cross section of an


object from several images of its transaxial projections [1-11]. A projection is a
shadowgram obtained by illuminating an object by penetrating radiation. Figure
10.1 shows a typical method of obtaining projections. Each horizontal line shown in
this figure is a one-dimensional projection of a horizontal slice of the object. Each
pixel on the projected image represents the total absorption of the X-ray along its
path from the source to the detector. By rotating the source-detector assembly
around the object, projection views for several different angles can be obtained.
The goal of image reconstruction is to obtain an image of a cross section of the object
from these projections. Imaging systems that generate such slice views are called CT
(computerized tomography) scanners. Note that in obtaining the projections, we
lose resolution along the path of the X-rays. CT restores this resolution by using
information from multiple projections. Therefore, image reconstruction from pro-
jections can also be viewed as a special case of image restoration.

Transmission Tomography

For X-ray CT scanners, a simple model of the detected image is obtained as follows.
Let f(x, y) denote the absorption coefficient of the object at a point (x, y) in a slice
at some fixed value of z (Fig. 10.1). Assuming the illumination to consist of an
infinitely thin parallel beam of X-rays, the intensity of the detected beam is given by

I = I₀ exp[−∫_L f(x, y) du]    (10.1)


Figure 10.1 An X-ray CT scanning system (source, object, detectors, computer, and
display of the reconstructed slice view; graphic not reproduced).


where I₀ is the intensity of the incident beam, L is the path of the ray, and u is the
distance along L (Fig. 10.2). Defining the observed signal as

g ≜ ln(I₀/I)    (10.2)

we obtain the linear transformation



g ≜ g(s, θ) = ∫_L f(x, y) du,   −∞ < s < ∞, 0 ≤ θ < π    (10.3)

where (s, θ) represent the coordinates of the X-ray relative to the object. The image
reconstruction problem is to determine f(x, y) from g(s, θ). In practice we can only
estimate f(x, y) because only a finite number of views of g(s, θ) are available. The
preceding imaging technique is called transmission tomography because the trans-
mission characteristics of the object are being imaged. Figure 10.1 also shows an
X-ray CT scan of a dog's thorax, that is, a cross-section slice, reconstructed from
120 such projections. X-ray CT scanners are used in medical imaging and non-
destructive testing of mechanical objects.
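Equations (10.1) to (10.3) amount to a logarithmic conversion of each detector reading into a line integral of f. A minimal sketch with made-up numbers:

```python
import numpy as np

I0 = 1000.0                      # incident beam intensity
line_integral = 2.3              # integral of f(x, y) along the ray path L
I = I0 * np.exp(-line_integral)  # detected intensity, per (10.1)

# Inverting (10.1) recovers the ray-sum g(s, theta) of (10.2)-(10.3)
g = np.log(I0 / I)
```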

Reflection Tomography

There are other situations where the detected image is related to the object by a
transformation equivalent to (10.3). For example, in radar imaging we often obtain



Figure 10.2 Projection imaging geometry in CT scanning.

a projection of the reflectivity of the object. This is called reflection tomography.
For instance, in the FLR imaging geometry of Figure 8.7a, suppose the radar pulse
width is infinitesimal (ideally) and the radar altitude (h) is large compared to the
minor axis of the antenna half-power ellipse. Then the radar return at ground range
r and scan angle φ can be approximated by (10.3), where f(x, y) represents the
ground reflectivity and L is the straight line parallel to the minor axis of the ellipse
and passing through the center point of the shaded area. Other examples are found
in spot mode synthetic aperture and CHIRP-doppler radar imaging [10, 36].

Emission Tomography

Another form of imaging based on the use of projections is emission tomography,
for example, positron emission tomography (PET), where the emissive properties of
isotopes planted within an object are imaged. Medical emission tomography ex-
ploits the fact that certain chemical compounds containing radioactive nuclei have a
tendency to affix themselves to specific areas of the body, such as bone, blood,
tumors, and the like. The gamma rays emitted by the decay of the isotopes are
detected, from which the location of the chemical and the associated tissue within
the body can be determined. In PET, the radioactive nuclei used are such that
positrons (positive electrons) are emitted during decay. Near the source of emis-
sion, the positrons combine with an electron to emit two gamma rays in nearly
opposite directions. Upon detection of these two rays, a measurement representing
the line integral of the absorption distribution along each path is obtained.



Magnetic Resonance Imaging

Another important situation where the image reconstruction problem arises is in
magnetic resonance imaging (MRI).† Being noninvasive, it is becoming increasingly
attractive in medical imaging for measuring (most commonly) the density of protons
(that is, hydrogen nuclei) in tissue. This imaging technique is based on the funda-
mental property that protons (and all other nuclei that have an odd number of
protons or neutrons) possess a magnetic moment and spin. When placed in a mag-
netic field, the proton precesses about the magnetic field in a manner analogous to a
top spinning about the earth's gravitational field. Initially the protons are aligned
either parallel or antiparallel to the magnetic field. When an RF signal having an
appropriate strength and frequency is applied to the object, the protons absorb
energy, and more of them switch to the antiparallel state. When the applied RF
signal is removed, the absorbed energy is reemitted and is detected by an RF
receiver. The proton density and environment can be determined from the charac-
teristics of this detected signal. By controlling the applied RF signal and the sur-
rounding magnetic field, these events can be made to occur along only one line
within the object. The detected signal is then a function of the line integral of the
MRI signal in the object. In fact, it can be shown that the detected signal is the
Fourier transform of the projection at a given angle [8, 9].

Projection-based Image Processing



In the foregoing CT problems, the projection-space coordinates (s, θ) arise nat-
urally because of the data-gathering mechanics. This coordinate system plays an
important role in many other image processing applications unrelated to CT. For
example, the Hough transform, useful for detection of straight-line segments of
polygonal shapes (see Section 9.5), is a representation of a straight line in the
projection space. Also, two-dimensional linear shift invariant filters can be realized
by a set of decoupled one-dimensional filters by working in the projection space.
Other applications where projections are useful are in image segmentation (see
Example 9.8), geometrical analysis of objects [11], and in image processing applica-
tions requiring transformations between polar and rectangular coordinates.
We are now ready to discuss the Radon transform, which provides the mathe-
matical framework necessary for going back and forth between the spatial coor-
dinates (x, y) and the projection-space coordinates (s, θ).

10.2 THE RADON TRANSFORM [12, 13]

Definition

The Radon transform of a function f(x, y), denoted as g(s, θ), is defined as its line
integral along a line inclined at an angle θ from the y-axis and at a distance s from

† Also called nuclear magnetic resonance (NMR) imaging. To emphasize its noninvasive features, the
word nuclear is being dropped by manufacturers of such imaging systems to avoid confusion with nuclear
reactions associated with nuclear energy and radioactivity.




the origin (Fig. 10.2). Mathematically, it is written as

g(s, θ) ≜ ℛf = ∫∫_{−∞}^{∞} f(x, y) δ(x cos θ + y sin θ − s) dx dy,    (10.4)
                                −∞ < s < ∞, 0 ≤ θ < π

The symbol ℛ, denoting the Radon transform operator, is also called the projection
operator. The function g(s, θ), the Radon transform of f(x, y), is the one-dimen-
sional projection of f(x, y) at an angle θ. In the rotated coordinate system (s, u),
where

s = x cos θ + y sin θ          x = s cos θ − u sin θ
                        or                               (10.5)
u = −x sin θ + y cos θ         y = s sin θ + u cos θ

(10.4) can be expressed as

g(s, θ) = ∫_{−∞}^{∞} f(s cos θ − u sin θ, s sin θ + u cos θ) du,    (10.6)
                                −∞ < s < ∞, 0 ≤ θ < π

The quantity g(s, θ) is also called a ray-sum, since it represents the summation of
f(x, y) along a ray at a distance s and at an angle θ.
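A crude discrete counterpart of (10.6) can be written directly from (10.5): for each angle, sample f on the rotated (s, u) grid (here with nearest-neighbor interpolation, an arbitrary simplification) and sum over u. For θ = 0 or π/2 the ray-sums reduce to exact column or row sums, so the total mass matches the sum over f:

```python
import numpy as np

def radon(f, thetas):
    # Discrete approximation of (10.6): for each angle, sample f on the
    # rotated (s, u) grid of (10.5) (nearest neighbor) and sum over u.
    n = f.shape[0]
    c = (n - 1) / 2.0
    s = np.linspace(-c, c, n)
    S, U = np.meshgrid(s, s, indexing='ij')
    g = np.zeros((n, len(thetas)))
    for j, th in enumerate(thetas):
        x = S * np.cos(th) - U * np.sin(th) + c
        y = S * np.sin(th) + U * np.cos(th) + c
        xi = np.rint(x).astype(int)
        yi = np.rint(y).astype(int)
        inside = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
        vals = np.where(inside, f[yi.clip(0, n - 1), xi.clip(0, n - 1)], 0.0)
        g[:, j] = vals.sum(axis=1)
    return g

# Centered square object
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0
g = radon(f, [0.0, np.pi / 4, np.pi / 2])
```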
The Radon transform maps the spatial domain (x, y) to the domain (s, θ).
Each point in the (s, θ) space corresponds to a line in the spatial domain (x, y). Note
that (s, θ) are not the polar coordinates of (x, y). In fact, if (r, φ) are the polar
coordinates of (x, y), that is,

x = r cos φ,   y = r sin φ    (10.7)

then from Fig. 10.3a

s = r cos(θ − φ)    (10.8)

For a fixed point (r, φ), this equation gives the locus of all the points in (s, θ), which
is a sinusoid, as shown in Fig. 10.3b. Recall from Section 9.5 that the coordinate pair
(s, θ) is also the Hough transform of the straight line in Fig. 10.3a.
Example 10.1
Consider a plane wave, f(x, y) = exp[j2π(4x + 3y)]. Then its projection function is

g(s, θ) = ∫_{−∞}^{∞} exp[j8π(s cos θ − u sin θ)] exp[j6π(s sin θ + u cos θ)] du

       = exp[j2πs(4 cos θ + 3 sin θ)] ∫_{−∞}^{∞} exp[−j2πu(4 sin θ − 3 cos θ)] du

       = exp[j2πs(4 cos θ + 3 sin θ)] δ(4 sin θ − 3 cos θ) = (1/5) e^{j10πs} δ(θ − φ)

where φ = tan⁻¹(3/4). Here we have used the identity

δ[f(θ)] ≜ Σₖ [1/|f′(θₖ)|] δ(θ − θₖ)    (10.9)

where f′(θ) ≜ df(θ)/dθ and θₖ, k = 1, 2, ..., are the roots of f(θ).



Figure 10.3 Spatial and Radon transform domains: (a) spatial domain (x, y); (b) the
point P maps into a sinusoid in the (s, θ) plane; (c) an image and its Radon transform.
(Graphics not reproduced.)


Notation
In order to avoid confusion between functions defined in different coordinates, we
adopt the following notation. Let 𝒰 be the space of functions defined on ℝ², where
ℝ denotes the real line. The two-dimensional Fourier transform pair for a function
f(x, y) ∈ 𝒰 is denoted by the relation

f(x, y) ←→ F(ξ₁, ξ₂)    (10.10)

In polar coordinates we write

F_p(ξ, θ) = F(ξ cos θ, ξ sin θ)    (10.11)

The inner product in 𝒰 is defined as

⟨f₁, f₂⟩ ≜ ∫∫ f₁(x, y) f₂(x, y) dx dy,   ‖f‖² ≜ ⟨f, f⟩    (10.12)

Let 𝒱 be the space of functions defined on ℝ × [0, π]. The one-dimensional Fourier
transform of a function g(s, θ) ∈ 𝒱 is defined with respect to the variable s and is
indicated as

g(s, θ) ←→ G(ξ, θ)    (10.13)

The inner product in 𝒱 is defined as

⟨g₁, g₂⟩ ≜ ∫₀^π ∫_{−∞}^{∞} g₁(s, θ) g₂(s, θ) ds dθ    (10.14)

For simplicity we will generally consider 𝒰 and 𝒱 to be spaces of real functions.
The notation

g = ℛf    (10.15)

will be used to denote the Radon transform of f(x, y), where it will be understood
that f ∈ 𝒰, g ∈ 𝒱.

Properties of the Radon Transform

The Radon transform is linear and has several useful properties (Table 10.1), which
can be summarized as follows. The projections g(s, θ) are space-limited in s if the
object f(x, y) is space-limited in (x, y), and are periodic in θ with period 2π. A
translation of f(x, y) results in the shift of g(s, θ) by a distance equal to the pro-
TABLE 10.1 Properties of the Radon Transform

   Function f(x, y) = f_p(r, φ)                  Radon transform g(s, θ)

1  Linearity: a₁f₁(x, y) + a₂f₂(x, y)            a₁g₁(s, θ) + a₂g₂(s, θ)
2  Space limitedness: f(x, y) = 0,               g(s, θ) = 0, |s| > D√2/2
     |x| > D/2, |y| > D/2
3  Symmetry: f(x, y)                             g(s, θ) = g(−s, θ ± π)
4  Periodicity: f(x, y)                          g(s, θ) = g(s, θ + 2πk), k = integer
5  Shift: f(x − x₀, y − y₀)                      g(s − x₀ cos θ − y₀ sin θ, θ)
6  Rotation by θ₀: f_p(r, φ + θ₀)                g(s, θ + θ₀)
7  Scaling: f(ax, ay)                            (1/|a|) g(as, θ), a ≠ 0
8  Mass conservation:                            M = ∫_{−∞}^{∞} g(s, θ) ds, ∀θ
     M = ∫∫ f(x, y) dx dy


Figure 10.4 (a) Head phantom model; (b) constant-density ellipse, f(x, y) = f₀ for
(x²/a²) + (y²/b²) ≤ 1. (Graphics not reproduced.)



TABLE 10.2 Head Phantom Components. (x, y) are the coordinates of the center
of the ellipse. The densities indicated are relative to the density of water [18].

Ellipse      x         y       Major      Minor     Inclination    Density
                              semiaxis   semiaxis    (degrees)     f(x, y)
a         0.0000    0.0000    0.6900     0.9200        0.00        1.0000
b         0.0000   -0.0184    0.6624     0.8740        0.00       -0.9800
c         0.2200    0.0000    0.1100     0.3100      -18.00       -0.0200
d        -0.2200    0.0000    0.1600     0.4100       18.00       -0.0200
e         0.0000    0.3500    0.2100     0.2500        0.00        0.0100
f         0.0000    0.1000    0.0460     0.0460        0.00        0.0100
g         0.0000   -0.1000    0.0460     0.0460        0.00        0.0100
h        -0.0800   -0.6050    0.0460     0.0230        0.00        0.0100
i         0.0000   -0.6060    0.0230     0.0230        0.00        0.0100
j         0.0600   -0.6050    0.0230     0.0460        0.00        0.0100

jection of the translation vector on the line s = x cos θ + y sin θ. A rotation of the
object by an angle θ₀ causes a translation of its Radon transform in the variable θ. A
scaling of the (x, y) coordinates of f(x, y) results in scaling of the s coordinate
together with an amplitude scaling of g(s, θ). Finally, the total mass of a distribution
f(x, y) is preserved by g(s, θ) for all θ.
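The shift property (property 5 of Table 10.1) can be verified by direct numerical integration of (10.6) for a Gaussian object, whose line integrals are easy to evaluate; the particular θ, s, and shift below are arbitrary choices:

```python
import numpy as np

def proj_gaussian(s, theta, x0=0.0, y0=0.0, n=5001, umax=10.0):
    # Integrate f(x, y) = exp(-((x-x0)^2 + (y-y0)^2)/2) along the line
    # parameterized by u in (10.5), at signed distance s and angle theta.
    u = np.linspace(-umax, umax, n)
    x = s * np.cos(theta) - u * np.sin(theta)
    y = s * np.sin(theta) + u * np.cos(theta)
    f = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / 2)
    return f.sum() * (u[1] - u[0])

# Property 5: projecting the shifted object equals the original projection
# evaluated at s - x0*cos(theta) - y0*sin(theta).
th, s, x0, y0 = 0.7, 0.4, 1.0, -0.5
lhs = proj_gaussian(s, th, x0, y0)
rhs = proj_gaussian(s - x0 * np.cos(th) - y0 * np.sin(th), th)
```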
Example 10.2 Computer generation of projections of a phantom

In the development and evaluation of reconstruction algorithms, it is useful to simulate projection data corresponding to an idealized object. Figure 10.4a shows an object composed of ellipses, which is intended to model the human head [18, 21]. Table 10.2 gives the parameters of the component ellipses. For the ellipse shown in Fig. 10.4b, the projection at an angle θ is given by

    g(s, θ) = f₀d = { (2ab f₀/s₀²) √(s₀² − s²),   |s| ≤ s₀
                    { 0,                           |s| > s₀

where d is the chord length and s₀² = a² cos²θ + b² sin²θ. Using the superposition, translation, and rotation properties of the Radon transform, the projection function for the object of Fig. 10.4a can be calculated (see Fig. 10.13a).
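The analytic projection above is easy to simulate. The sketch below is our own minimal illustration (the function name `ellipse_projection` and the NumPy setting are not from the text); it evaluates the projection of one shifted, tilted ellipse via the translation and rotation properties, and superposes two components from Table 10.2:

```python
import numpy as np

def ellipse_projection(s, theta, x0, y0, a, b, incline, f0):
    """Analytic Radon transform of a constant-density ellipse:
    g(s, theta) = 2*a*b*f0*sqrt(s0^2 - s^2)/s0^2 for |s| <= s0, else 0,
    with s0^2 = a^2 cos^2(t) + b^2 sin^2(t).  The center offset is handled
    by shifting s (translation property) and the tilt by replacing theta
    with theta - incline (rotation property)."""
    t = theta - incline
    s_sh = s - (x0 * np.cos(theta) + y0 * np.sin(theta))
    s0sq = (a * np.cos(t))**2 + (b * np.sin(t))**2
    under = np.clip(s0sq - s_sh**2, 0.0, None)   # zero outside |s| <= s0
    return 2.0 * a * b * f0 * np.sqrt(under) / s0sq

# Superposition: sum the component ellipses of Table 10.2 (first two shown).
s = np.linspace(-1.0, 1.0, 257)
g = (ellipse_projection(s, np.pi / 3, 0.0,  0.0000, 0.6900, 0.9200, 0.0,  1.00)
   + ellipse_projection(s, np.pi / 3, 0.0, -0.0184, 0.6624, 0.8740, 0.0, -0.98))
```

Summing all ten rows of Table 10.2 in the same way produces the full head-phantom sinogram, one angle at a time.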

10.3 THE BACK-PROJECTION OPERATOR

Definition

Associated with the Radon transform is the back-projection operator ℬ, which is defined as

    b(x, y) ≜ ℬg = ∫₀^π g(x cos θ + y sin θ, θ) dθ   (10.16)


The quantity b(x, y) is called the back-projection of g(s, θ). In polar coordinates it can be written as

    b(x, y) = b_p(r, φ) = ∫₀^π g(r cos(θ − φ), θ) dθ   (10.17)

Back-projection represents the accumulation of the ray-sums of all of the rays that pass through the point (x, y) or (r, φ). For example, if

    g(s, θ) = g₁(s)δ(θ − θ₁) + g₂(s)δ(θ − θ₂)

that is, if there are only two projections, then (see Fig. 10.5)

    b_p(r, φ) = g₁(s₁) + g₂(s₂)

where s₁ = r cos(θ₁ − φ) and s₂ = r cos(θ₂ − φ). In general, for a fixed point (x, y) or (r, φ), the value of the back-projection ℬg is evaluated by integrating g(s, θ) over θ for all lines that pass through that point. In view of (10.8) and (10.17), the back-projection at (r, φ) is also the integration of g(s, θ) along the sinusoid s = r cos(θ − φ) in the (s, θ) plane (Fig. 10.3b).

Remarks

The back-projection operator ℬ maps a function of (s, θ) coordinates into a function of spatial coordinates (x, y) or (r, φ).
The back-projection b(x, y) at any pixel (x, y) requires projections from all directions. This is evident from (10.16).
Figure 10.5 Back-projection of g₁(s) and g₂(s) at (r, φ).


It can be shown that the back-projected Radon transform

    f̂(x, y) ≜ ℬg = ℬℛf   (10.18)

is an image of f(x, y) blurred by the PSF 1/(x² + y²)^(1/2), that is,

    f̂(x, y) = f(x, y) ⊛ (x² + y²)^(−1/2)   (10.19)

where ⊛ denotes the two-dimensional convolution in Cartesian coordinates. In polar coordinates

    f̂_p(r, φ) = f_p(r, φ) ⊛ (1/r)   (10.20)

where ⊛ now denotes the convolution expressed in polar coordinates (Problem 10.6). Thus, the operator ℬ is not the inverse of ℛ. In fact, ℬ is the adjoint of ℛ (Problem 10.7). Suppose the object f(x, y) and its projections g(s, θ), for all θ, are discretized and mapped into vectors f and g related by a matrix transformation g = Rf. The matrix R then is a finite-difference approximation of the operator ℛ. The matrix Rᵀ would represent the approximation of the back-projection operator ℬ.

The operation f̂ = ℬ[ℛf] gives the summation algorithm (Fig. 10.6). For a set of isolated small objects with a small number of projections, this method gives a star-pattern artifact (Fig. 10.6) [15].

The object f(x, y) can be restored from f̂(x, y) by a two-dimensional (inverse) filter whose frequency response† is |ξ| = (ξ₁² + ξ₂²)^(1/2), that is,

    f(x, y) = ℱ₂⁻¹{(ξ₁² + ξ₂²)^(1/2) ℱ₂[f̂(x, y)]}   (10.21)
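The 1/r blur of (10.19)–(10.20) can be seen numerically. The following sketch (ours, not the book's) back-projects the projections of a small centered Gaussian blob, which are the same 1-D Gaussian at every angle; the resulting b(x, y) falls off roughly like 1/r rather than like the Gaussian object:

```python
import numpy as np

# Projections of a small centered 2-D Gaussian blob are 1-D Gaussians,
# identical at every angle: g(s, theta) = exp(-s^2 / (2*sig^2)).
sig = 0.05
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)

def back_project(x, y):
    """Discrete form of (10.16): sum of g(x cos t + y sin t, t) * dtheta."""
    s = x * np.cos(thetas) + y * np.sin(thetas)
    return np.sum(np.exp(-s**2 / (2.0 * sig**2))) * (np.pi / len(thetas))

# The object decays like a Gaussian, but b(x, y) decays only like 1/r:
# doubling the radius roughly halves the back-projected value.
b1, b2 = back_project(0.2, 0.0), back_project(0.4, 0.0)
ratio = b1 / b2        # approximately 2, the signature of a 1/r profile
```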

Figure 10.6 Summation algorithm for image reconstruction, f̂ = ℬℛf.


† Note that the Fourier transform of (x² + y²)^(−1/2) is (ξ₁² + ξ₂²)^(−1/2).



TABLE 10.3 Filter Functions for Convolution/Filter Back-Projection Algorithms, d ≜ 1/2ξ₀

Filter: Ram-Lak
  Frequency response H(ξ): H_RL(ξ) = |ξ| rect(ξd)
  Impulse response h(s): h_RL(s) = ξ₀²[2 sinc(2ξ₀s) − sinc²(ξ₀s)]
  Discrete impulse response h(m) ≜ d h(md):
      h_RL(m) = 1/4d, m = 0;  0, m even (m ≠ 0);  −1/(π²m²d), m odd

Filter: Shepp-Logan
  H(ξ) = |ξ| sinc(ξd) rect(ξd)
  h(s) = 2[d − 2s sin(2πξ₀s)] / [π²d(d² − 4s²)]
  h(m) = 2/[π²d(1 − 4m²)]

Filter: Low-pass cosine
  H(ξ) = |ξ| cos(πξd) rect(ξd)
  h(s) = ½[h_RL(s − d/2) + h_RL(s + d/2)]
  h(m) = ½[h_RL(m − ½) + h_RL(m + ½)]

Filter: Generalized Hamming
  H(ξ) = |ξ|[α + (1 − α) cos 2πξd] rect(ξd),  0 ≤ α ≤ 1
  h(s) = α h_RL(s) + [(1 − α)/2][h_RL(s − d) + h_RL(s + d)]
  h(m) = α h_RL(m) + [(1 − α)/2][h_RL(m − 1) + h_RL(m + 1)]

Filter: Stochastic
  See eq. (10.70) and Example 10.6.

where ℱ₂ denotes the two-dimensional Fourier transform operator. In practice the filter |ξ| is replaced by a physically realizable approximation (see Table 10.3). This method [16] is appealing because the filtering operations can be implemented approximately via the FFT. However, it has two major difficulties. First, the Fourier domain computation of |ξ|F̂_p(ξ, θ) gives F(0, 0) = 0, which yields the total density ∬f(x, y) dx dy = 0. Second, since the support of ℬg is unbounded, f̂(x, y) has to be computed over a region much larger than the region of support of f(x, y). A better algorithm, which follows from the projection theorem discussed next, reverses the order of filtering and back-projection operations and is more attractive for practical implementations.


10.4 THE PROJECTION THEOREM [5-7, 12, 13]

There is a fundamental relationship between the two-dimensional Fourier transform of a function and the one-dimensional Fourier transform of its Radon transform. This relationship provides the theoretical basis for several image reconstruction algorithms. The result is summarized by the following theorem.

Projection Theorem. The one-dimensional Fourier transform, with respect to s, of the projection g(s, θ) is equal to the central slice, at angle θ, of the two-dimensional Fourier transform of the object f(x, y); that is, if

    g(s, θ) ←ℱ₁→ G(ξ, θ)

then

    G(ξ, θ) = F_p(ξ, θ) ≜ F(ξ cos θ, ξ sin θ)   (10.22)
442 Image Reconstruction from Projections Chap. 10


Figure 10.7 shows the meaning of this result. This theorem is also called the projection-slice theorem.

Proof. Using (10.6) in the definition of G(ξ, θ), we can write

    G(ξ, θ) = ∫_{−∞}^{∞} g(s, θ) e^{−j2πξs} ds
            = ∬ f(s cos θ − u sin θ, s sin θ + u cos θ) e^{−j2πξs} ds du   (10.23)

Performing the coordinate transformation from (s, u) to (x, y) [see (10.5)], this becomes

    G(ξ, θ) = ∬ f(x, y) exp[−j2π(xξ cos θ + yξ sin θ)] dx dy
            = F(ξ cos θ, ξ sin θ)

which proves (10.22).
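The θ = 0 case of the theorem can be checked with a discrete Fourier transform, where it holds exactly: summing an array over y and taking a 1-D FFT reproduces the ξ₂ = 0 slice of the 2-D FFT. A small sketch (our own, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((64, 64))                 # arbitrary test "object"

p = f.sum(axis=0)                        # projection at theta = 0 (sum over y)

slice_2d = np.fft.fft2(f)[0, :]          # xi_2 = 0 slice of the 2-D DFT
slice_1d = np.fft.fft(p)                 # 1-D DFT of the projection

err = np.max(np.abs(slice_2d - slice_1d))   # zero up to rounding
```

For other angles a discrete check requires interpolating the 2-D spectrum onto the slice, which is exactly the interpolation issue faced by the Fourier reconstruction method of Section 10.9.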

Remarks

From the symmetry property of Table 10.1, we find that the Fourier transform slice also satisfies a similar property:

    G(−ξ, θ + π) = G(ξ, θ)   (10.24)

If f(x, y) is bandlimited, then so are the projections. This follows immediately from the projection theorem.
An important consequence of the projection theorem is the following result.

Figure 10.7 The projection theorem, G(ξ, θ) = F_p(ξ, θ).


Convolution-Projection Theorem. The Radon transform of the two-dimensional convolution of two functions f₁(x, y) and f₂(x, y) is equal to the one-dimensional convolution of their Radon transforms; that is, if g_k ≜ ℛf_k, k = 1, 2, then

    ℛ[∬ f₁(x − x′, y − y′) f₂(x′, y′) dx′ dy′] = ∫_{−∞}^{∞} g₁(s − s′, θ) g₂(s′, θ) ds′   (10.25)

The proof is developed in Problem 10.9. This theorem is useful in the implementation of two-dimensional linear filters by one-dimensional filters. (See Fig. 10.9 and the accompanying discussion in Section 10.5.)
Example 10.3

We will use the projection theorem to obtain the g(s, θ) of Example 10.1. The two-dimensional Fourier transform of f(x, y) is F(ξ₁, ξ₂) = δ(ξ₁ − 4)δ(ξ₂ − 3) = δ(ξ cos θ − 4)δ(ξ sin θ − 3). From (10.22) this gives G(ξ, θ) = δ(ξ cos θ − 4)δ(ξ sin θ − 3). Taking the one-dimensional inverse Fourier transform with respect to ξ and using the identity (10.9), we get the desired result

    g(s, θ) = ∫_{−∞}^{∞} δ(ξ cos θ − 4)δ(ξ sin θ − 3) e^{j2πξs} dξ
            = (1/|cos θ|) exp(j8πs/cos θ) δ(4 tan θ − 3) = (1/5) e^{j10πs} δ(θ − φ),  φ = tan⁻¹(3/4)

10.5 THE INVERSE RADON TRANSFORM [6, 12, 13, 17]

The image reconstruction problem defined in Section 10.1 is theoretically equivalent to finding the inverse Radon transform of g(s, θ). The projection theorem is useful in obtaining this inverse. The result is summarized by the following theorem.

Inverse Radon Transform Theorem. Given g(s, θ) ≜ ℛf, −∞ < s < ∞, 0 ≤ θ < π, its inverse Radon transform is

    f(x, y) = (1/2π²) ∫₀^π ∫_{−∞}^{∞} [(∂g/∂s)(s, θ)] / (x cos θ + y sin θ − s) ds dθ   (10.26)

In polar coordinates

    f_p(r, φ) = f(r cos φ, r sin φ) = (1/2π²) ∫₀^π ∫_{−∞}^{∞} [(∂g/∂s)(s, θ)] / (r cos(θ − φ) − s) ds dθ   (10.27)

Proof. The inverse Fourier transform

    f(x, y) = ∬ F(ξ₁, ξ₂) exp[j2π(ξ₁x + ξ₂y)] dξ₁ dξ₂

when written in polar coordinates in the frequency plane, gives

    f(x, y) = ∫₀^π ∫_{−∞}^{∞} F_p(ξ, θ) exp[j2πξ(x cos θ + y sin θ)] |ξ| dξ dθ   (10.28)

By the projection theorem, F_p(ξ, θ) = G(ξ, θ), so that

    f(x, y) = ∫₀^π ĝ(x cos θ + y sin θ, θ) dθ = ℬĝ   (10.29)

where

    ĝ(s, θ) ≜ ∫_{−∞}^{∞} |ξ| G(ξ, θ) e^{j2πξs} dξ   (10.30)

Writing |ξ|G as ξG sgn(ξ) and applying the convolution theorem, we obtain

    ĝ(s, θ) = [ℱ₁⁻¹{ξG(ξ, θ)}] ⊛ [ℱ₁⁻¹{sgn(ξ)}]
            = [(1/j2π)(∂g/∂s)(s, θ)] ⊛ (−1/jπs)
            = (1/2π²) ∫_{−∞}^{∞} [(∂g/∂t)(t, θ)] dt/(s − t)   (10.31)

where (1/j2π)[∂g(s, θ)/∂s] and (−1/jπs) are the Fourier inverses of ξG(ξ, θ) and sgn(ξ), respectively. Combining (10.29) and (10.31), we obtain the desired result of (10.26). Equation (10.27) is arrived at by the change of coordinates x = r cos φ and y = r sin φ.

Remarks

The inverse Radon transform is obtained in two steps (Fig. 10.8a). First, each projection g(s, θ) is filtered by a one-dimensional filter whose frequency response is |ξ|. The result, ĝ(s, θ), is then back-projected to yield f(x, y). The filtering operation can be performed either in the s domain or in the ξ domain. This process yields two different methods of finding ℛ⁻¹, which are discussed shortly.

The integrands in (10.26), (10.27), and (10.31) have singularities. Therefore, the Cauchy principal value should be taken (via contour integration) in evaluating the integrals.

Definition. The Hilbert transform of a function φ(s) is defined as

    ψ(s) ≜ ℋφ ≜ φ(s) ⊛ (1/πs) = (1/π) ∫_{−∞}^{∞} φ(t) dt/(s − t)   (10.32)

The symbol ℋ represents the Hilbert transform operator. From this definition it follows that ĝ(s, θ) is the Hilbert transform of (1/2π)∂g(s, θ)/∂s for each θ.

Because the back-projection operation is required for finding ℛ⁻¹, the reconstructed image pixel at (x, y) requires projections from all directions.
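The Hilbert transform of a sampled function is easy to approximate in the frequency domain, since the kernel 1/(πs) has frequency response −j sgn(ξ). A small sketch (ours; circular convolution via the FFT is only an approximation of (10.32) for finite records):

```python
import numpy as np

def hilbert_transform(phi):
    """Approximate (10.32): convolution with 1/(pi*s), implemented in the
    frequency domain, where that kernel's transform is -j*sgn(xi)."""
    xi = np.fft.fftfreq(len(phi))
    return np.real(np.fft.ifft(-1j * np.sign(xi) * np.fft.fft(phi)))

# Sanity check: the Hilbert transform maps cos to sin.
t = np.arange(256)
psi = hilbert_transform(np.cos(2 * np.pi * 8 * t / 256))
# psi equals sin(2*pi*8*t/256) to machine precision for this periodic tone.
```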

Figure 10.8 Inverse Radon transform methods: (a) inverse Radon transform (1-D filter |ξ| followed by back-projection ℬ); (b) convolution back-projection method (differentiate, Hilbert transform, scale by 1/2π, back-project); (c) filter back-projection method (Fourier transform ℱ₁, multiply by |ξ|, inverse Fourier transform ℱ₁⁻¹, back-project).

Convolution Back-Projection Method

Defining a derivative operator as

    𝒟g ≜ ∂g(s, θ)/∂s   (10.33)

the inverse Radon transform can be written as

    f(x, y) = (1/2π) ℬ ℋ 𝒟 g   (10.34)

Thus the inverse Radon transform operator is ℛ⁻¹ = (1/2π) ℬ ℋ 𝒟. This means ℛ⁻¹ can also be implemented by convolving the differentiated projections with 1/(2π²s) and back-projecting the result (Fig. 10.8b).

Filter Back-Projection Method

From (10.29) and (10.30), we can also write

    f(x, y) = ℬ𝒦g   (10.35)

where 𝒦 is a one-dimensional filter whose frequency response is |ξ|, that is,

    ĝ ≜ 𝒦g ≜ ∫_{−∞}^{∞} |ξ| G(ξ, θ) e^{j2πξs} dξ = ℱ₁⁻¹{|ξ|[ℱ₁ g]}   (10.36)
This gives

    f(x, y) = ℬ ℱ₁⁻¹[|ξ| ℱ₁ g]   (10.37)

which can be implemented by filtering the projections in the Fourier domain and back-projecting the inverse Fourier transform of the result (Fig. 10.8c).
Example 10.4

We will find the inverse Radon transform of g(s, θ) = (1/5)e^{j10πs} δ(θ − φ).

Convolution back-projection method. Using ∂g/∂s = j2π e^{j10πs} δ(θ − φ) in (10.26),

    f(x, y) = (j/π) ∫₀^π ∫_{−∞}^{∞} e^{j10πs} (x cos θ + y sin θ − s)^{−1} δ(θ − φ) ds dθ
            = −(j/π) ∫_{−∞}^{∞} e^{j10πs} [s − (x cos φ + y sin φ)]^{−1} ds

Since the principal-value integral ∫_{−∞}^{∞} e^{j2πts}/(s − a) ds equals jπ e^{j2πta} sgn(t), the preceding integral becomes

    f(x, y) = exp[j2π(x cos φ + y sin φ)t] sgn(t)|_{t=5} = exp[j10π(x cos φ + y sin φ)]

Filter back-projection method.

    G(ξ, θ) = (1/5) δ(ξ − 5) δ(θ − φ)

    ⇒ ĝ(s, θ) = (1/5) ∫_{−∞}^{∞} |ξ| δ(ξ − 5) δ(θ − φ) exp(j2πsξ) dξ = e^{j10πs} δ(θ − φ)

    ⇒ f(x, y) = ∫₀^π exp[j10π(x cos θ + y sin θ)] δ(θ − φ) dθ = exp[j10π(x cos φ + y sin φ)]

For φ = tan⁻¹(3/4), f(x, y) is the same as in Example 10.1.

Two-Dimensional Filtering via the Radon Transform

A useful application of the convolution-projection theorem is in the implementation of two-dimensional filters. Let A(ξ₁, ξ₂) represent the frequency response of a two-dimensional filter. Referring to Fig. 10.9 and eq. (10.25), this filter can be implemented by first filtering, for each θ, the one-dimensional projection g(s, θ) by
Figure 10.9 Generalized filter back-projection algorithm for two-dimensional filter implementation.



a one-dimensional filter whose frequency response is A_p(ξ, θ) and then taking the inverse Radon transform of the result. Using the representation of ℛ⁻¹ in Fig. 10.8a, we obtain a generalized filter back-projection algorithm, where the filter now becomes |ξ|A_p(ξ, θ). Hence, the two-dimensional filter A(ξ₁, ξ₂) can be implemented as

    a(x, y) ⊛ f(x, y) = ℬ 𝒜̂_p g   (10.38)

where 𝒜̂_p represents a one-dimensional filter with frequency response Â_p(ξ, θ) ≜ |ξ| A_p(ξ, θ).

10.6 CONVOLUTION/FILTER BACK-PROJECTION ALGORITHMS: DIGITAL IMPLEMENTATION [18-21]

The foregoing results are useful for developing practical image reconstruction algorithms. We now discuss various considerations for digital implementation of these algorithms.

Sampling Considerations

• (10.40)

Choice of Filters

The filter function |ξ| required for the inverse Radon transform emphasizes the high spatial frequencies. Since most practical images have a low SNR at high frequencies, the use of this filter results in noise amplification. To limit the unbounded nature of the frequency response, a bandlimited filter, called the Ram-Lak filter [19],

    H_RL(ξ) = |ξ| rect(ξd),  d ≜ 1/2ξ₀   (10.41)

has been proposed. In practice, most objects are space-limited, and a bandlimiting filter with a sharp cutoff frequency ξ₀ is not very suitable, especially in the presence

of noise. A small value of ξ₀ gives poor resolution, and a very large value leads to noise amplification. A generalization of (10.41) is the class of filters

    H(ξ) = |ξ| W(ξ)   (10.42)

Here W(ξ) is a bandlimiting window function that is chosen to give a more moderate high-frequency response in order to achieve a better trade-off between the filter bandwidth (that is, high-frequency response) and noise suppression. Table 10.3 lists several commonly used filters. Figure 10.10 shows the frequency and
Figure 10.10 Reconstruction filters. Left column: frequency response; right column: impulse response; dotted lines show linearly interpolated response. (a) Ram-Lak; (b) Shepp-Logan; (c) low-pass cosine; (d) generalized Hamming.




Figure 10.11 Implementation of convolution/filter back-projection algorithms: (a) convolution back-projection algorithm, digital implementation; (b) filter back-projection algorithm, digital implementation.

the impulse responses of these filters for d = 1. Since these functions are real and even, the impulse responses are displayed on the positive real line only. For low levels of observation noise, the Shepp-Logan filter is preferred over the Ram-Lak filter. The generalized low-pass Hamming window, with the value of α optimized for the noise level, is used when the noise is significant. In the presence of noise a better approach is to use the optimum mean square reconstruction filter, also called the stochastic filter (see Section 10.8).
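The H(ξ) = |ξ|W(ξ) family of Table 10.3 is simple to tabulate numerically. A sketch (our own helper, with our own function and argument names):

```python
import numpy as np

def filter_response(xi, d, name, alpha=0.54):
    """H(xi) = |xi| W(xi) with cutoff xi_0 = 1/(2d), per Table 10.3."""
    band = (np.abs(xi) * d <= 0.5).astype(float)          # rect(xi*d)
    if name == "ram-lak":
        w = band
    elif name == "shepp-logan":
        w = np.sinc(xi * d) * band    # np.sinc(x) = sin(pi*x)/(pi*x)
    elif name == "cosine":
        w = np.cos(np.pi * xi * d) * band
    elif name == "hamming":
        w = (alpha + (1.0 - alpha) * np.cos(2.0 * np.pi * xi * d)) * band
    else:
        raise ValueError(name)
    return np.abs(xi) * w

xi = np.linspace(-1.0, 1.0, 401)
h_rl = filter_response(xi, 1.0, "ram-lak")
h_sl = filter_response(xi, 1.0, "shepp-logan")
# The windows taper the response near the cutoff: h_sl <= h_rl everywhere.
```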
Once the filter has been selected, a practical reconstruction algorithm has two

major steps:

1. For each θ, filter the projection g(s, θ) by a one-dimensional filter whose frequency response is H(ξ) or whose impulse response is h(s).
2. Back-project the filtered projections, ĝ(s, θ).

Depending on the implementation method of the filter, we obtain two distinct


algorithms (Fig. 10.11). In both cases the back-projection integral [see eq. (10.17)]
is implemented by a suitable finite-difference approximation. The steps required in
the two algorithms are summarized next.
, '

Convolution Back-Projection Algorithm

The equations implemented in this algorithm are (Fig. 10.11a)

    Convolution:      ĝ(s, θ) = g(s, θ) ⊛ h(s)   (10.43a)
    Back-projection:  f̂(x, y) = ℬĝ               (10.43b)
The filtering operation is implemented by a direct convolution in the s domain. The steps involved in the digital implementation are as follows:

1. Perform the following discrete convolution as an approximate realization of sampled values of the filtered projections, that is,

       ĝ(md, nΔ) = ĝ_n(m) ≜ Σ_{k=−M/2}^{M/2−1} g_n(k) h(m − k),   −M/2 ≤ m ≤ M/2 − 1   (10.44)

   where h(m) ≜ d h(md) is obtained by sampling and scaling h(s). Table 10.3 lists h(m) for the various filters. The preceding convolution can be implemented either directly or via the FFT, as discussed in Section 5.4.

2. Linearly interpolate ĝ_n(m) to obtain a piecewise continuous approximation of ĝ(s, nΔ) as

       ĝ(s, nΔ) = ĝ_n(m) + (s/d − m)[ĝ_n(m + 1) − ĝ_n(m)],   md ≤ s ≤ (m + 1)d   (10.45)

3. Approximate the back-projection integral by the following operation to give

       f(x, y) ≈ f̂(x, y) ≜ ℬ_N ĝ ≜ Δ Σ_{n=0}^{N−1} ĝ(x cos nΔ + y sin nΔ, nΔ)   (10.46)

   where ℬ_N is called the discrete back-projection operator. Because of the back-projection operation, it is necessary to interpolate the filtered projections ĝ_n(m). This is required even if the reconstructed image is evaluated on a sampled grid. For example, to evaluate

       f̂(iΔx, jΔy) = Δ Σ_{n=0}^{N−1} ĝ(iΔx cos nΔ + jΔy sin nΔ, nΔ)   (10.47)

   on a grid with spacing (Δx, Δy), i, j = 0, ±1, ±2, ..., we still need to evaluate ĝ(s, nΔ) at locations in between the points md, m = −M/2, ..., M/2 − 1. Although higher-order interpolation via the Lagrange functions (see Chapter 4) is possible, the linear interpolation of (10.45) has been found to give a good trade-off between resolution and smoothing [18]. A zero-order hold is sometimes used to speed up the back-projection operation for hardware implementation.
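Steps 1 to 3 can be sketched end to end. The code below is our own minimal implementation, not the book's program: it builds the discrete Ram-Lak kernel h(m) = d·h(md) from Table 10.3, filters each projection by discrete convolution (10.44), linearly interpolates (10.45), and back-projects (10.46). The sanity check at the end assumes the analytic projection g(s) = 2√(R² − s²) of a centered unit-density disk:

```python
import numpy as np

def ramlak_kernel(M, d):
    """Discrete Ram-Lak taps h(m) = d*h(md) from Table 10.3:
    h(md) is 1/(4d^2) at m = 0, zero for even m != 0, and
    -1/(pi^2 m^2 d^2) for odd m."""
    m = np.arange(-M, M + 1)
    h = np.zeros(m.shape, dtype=float)
    h[m == 0] = 1.0 / (4.0 * d * d)
    odd = (m % 2) != 0
    h[odd] = -1.0 / (np.pi**2 * m[odd]**2 * d * d)
    return d * h                                   # h(m) = d * h(md)

def cbp_reconstruct(proj, thetas, d):
    """Convolution back-projection, eqs. (10.44)-(10.46).
    proj[n, k] samples g((k - M/2)d, theta_n)."""
    N, M = proj.shape
    h = ramlak_kernel(M, d)                        # taps for m = -M .. M
    # (10.44): discrete convolution; the slice keeps the aligned M samples
    filt = np.array([np.convolve(p, h)[M:2 * M] for p in proj])
    s_axis = (np.arange(M) - M // 2) * d
    X, Y = np.meshgrid(s_axis, s_axis)
    f = np.zeros_like(X)
    for n, th in enumerate(thetas):
        s = X * np.cos(th) + Y * np.sin(th)
        # (10.45): linear interpolation of the filtered projection
        f += np.interp(s.ravel(), s_axis, filt[n],
                       left=0.0, right=0.0).reshape(s.shape)
    return f * (np.pi / N)                         # (10.46), Delta = pi/N

# Sanity check: analytic projections of a centered unit-density disk.
M, N, d, R = 128, 90, 1.0 / 64, 0.5
s = (np.arange(M) - M // 2) * d
g = 2.0 * np.sqrt(np.clip(R**2 - s**2, 0.0, None))
rec = cbp_reconstruct(np.tile(g, (N, 1)), np.arange(N) * np.pi / N, d)
# rec is close to 1 inside the disk and close to 0 well outside it.
```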

Filter Back-Projection Algorithm

In Fig. 10.11b, the filtering operation is performed in the frequency domain according to the equation

    ĝ(s, θ) = ℱ₁⁻¹[G(ξ, θ)H(ξ)]   (10.48)

Given H(ξ), the filter frequency response, this filter is implemented approximately by using a sampled approximation of G(ξ, θ) and substituting a suitable FFT for the




inverse Fourier transform. The algorithm is shown in Fig. 10.11b, which is a one-dimensional equivalent of the algorithm discussed in Section 8.3 (Fig. 8.13b). The steps of this algorithm are given next:

1. Extend the sequence g_n(m), −M/2 ≤ m ≤ (M/2) − 1, by padding zeros and periodic repetition to obtain the sequence g̃_n(m), 0 ≤ m ≤ K − 1. Take its FFT to obtain G̃_n(k), 0 ≤ k ≤ K − 1. The choice of K determines the sampling resolution in the frequency domain. Typically K = 2M if M is large; for example, K = 512 if M = 256.
2. Sample H(ξ) to obtain H̃(k) ≜ H(kΔξ), H̃(K − k) ≜ H̃*(k), 0 ≤ k < K/2, where * denotes the complex conjugate.
3. Multiply the sequences G̃_n(k) and H̃(k), 0 ≤ k ≤ K − 1, and take the inverse FFT of the product. A periodic extension of the result gives ĝ_n(m), −K/2 ≤ m ≤ (K/2) − 1. The reconstructed image is obtained via (10.45) and (10.46) as before.
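The three steps can be sketched as follows (our own minimal version; the sampled |ξ| response is generated with `np.fft.fftfreq`, which already delivers the conjugate-symmetric ordering of step 2 for a real, even H(ξ)):

```python
import numpy as np

def fft_filter_projection(g_n, H, K):
    """Steps 1-3: zero-pad g_n to length K, multiply its K-point DFT
    by the sampled filter response H(k), and inverse-transform."""
    g_pad = np.zeros(K)
    g_pad[:len(g_n)] = g_n                  # step 1: padding to K samples
    G = np.fft.fft(g_pad)
    return np.real(np.fft.ifft(G * H))      # step 3: multiply and invert

M, K = 256, 512
# Step 2: sample H(xi) = |xi| on the DFT frequency grid.
H = np.abs(np.fft.fftfreq(K))
g_n = np.exp(-np.linspace(-4.0, 4.0, M)**2)   # a test projection
g_hat = fft_filter_projection(g_n, H, K)
# H(0) = 0, so the filtered projection sums to zero -- the F(0,0) = 0
# effect noted in Section 10.3.
```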
Example 10.5

Figure 10.12b shows a typical projection of an object digitized on a 128 × 128 grid (Fig. 10.12a). Reconstructions obtained from 90 such projections, each with 256 samples per line, using the convolution back-projection algorithm with Ram-Lak and Shepp-Logan filters, are shown in Fig. 10.12c and d, respectively. Intensity plots of the object and its reconstructions along a horizontal line through its center are shown in Fig. 10.12f through h. The two reconstructions are almost identical in this (noiseless) case. The background noise that appears is due to the high-frequency response of the reconstruction filter and is typical of inverse (or pseudoinverse) filtering. The stochastic filter outputs shown in Fig. 10.12e and i show an improvement over this result. This filter is discussed in Section 10.8.

Reconstruction Using a Parallel Pipeline Processor


Recently, a powerful hardware architecture has been developed [11] that enables the high-speed computation of digital approximations to the Radon transform and the back-projection operators. This allows the rapid implementation of convolution/filter back-projection algorithms as well as a large number of other image processing operations in the Radon space. Figure 10.13 shows some results of reconstruction using this processor architecture.

10.7 RADON TRANSFORM OF RANDOM FIELDS [22, 23]

So far we have considered f(x, y) to be a deterministic function. In many problems, such as data compression and filtering of noise, it is useful to consider the input f(x, y) to be a random field. Therefore, it becomes necessary to study the properties of the Radon transform of random fields, that is, projections of random fields.

-
A Unitary Transform· a '
Radon transform theory for random fields can be understood more easily by consid-
ering the operator •




Figure 10.12 Image reconstruction example: (a) original object; (b) a typical projection; (c) Ram-Lak filter; (d) Shepp-Logan filter; (e) stochastic filter.


Figure 10.12 Cont'd: (f) object line; (g) reconstruction via Ram-Lak filter; (h) reconstruction via Shepp-Logan filter; (i) reconstruction via stochastic filter (also see Example 10.6).

    𝒢 ≜ 𝒦^{1/2} ℛ   (10.49)

where 𝒦^{1/2} represents a one-dimensional filter whose frequency response is |ξ|^{1/2}. The operation

    ğ(s, θ) ≜ 𝒢f = 𝒦^{1/2} ℛf = 𝒦^{1/2} g   (10.50)

is equivalent to filtering the projections by 𝒦^{1/2} (Fig. 10.14). This operation can also be realized by a two-dimensional filter with frequency response (ξ₁² + ξ₂²)^{1/4} followed by the Radon transform.

Theorem 10.1. Let 𝒢⁺ denote the adjoint of 𝒢. The operator 𝒢 is unitary, that is,

    𝒢⁻¹ = 𝒢⁺ = ℬ 𝒦^{1/2}   (10.51)

This means the inverse of 𝒢 is equal to its adjoint, and the 𝒢 transform preserves energy, that is,


Figure 10.13 Reconstruction examples using parallel pipeline processor: (a) original phantom image; (b) reconstruction via convolution back-projection; (c) original binary image; (d) reconstruction using fully constrained ART algorithm.

    ∬ |f(x, y)|² dx dy = ∫₀^π ∫_{−∞}^{∞} |ğ(s, θ)|² ds dθ   (10.52)

This theorem is useful for developing the properties of the Radon transform for random fields. For proofs of this and the following theorems, see Problem 10.13.



Figure 10.14 The 𝒢 transform.

Radon Transform Properties for Random Fields

Definitions. Let f(x, y) be a stationary random field with power spectrum density S(ξ₁, ξ₂) and autocorrelation function r(τ₁, τ₂). Then S(ξ₁, ξ₂) and r(τ₁, τ₂) form a two-dimensional Fourier transform pair. Let S_p(ξ, θ) denote the polar-coordinate representation of S(ξ₁, ξ₂), that is,

    S_p(ξ, θ) ≜ S(ξ cos θ, ξ sin θ)   (10.53)

Also, let r_p(s, θ) be the one-dimensional inverse Fourier transform of S_p(ξ, θ), that is,

    r_p(s, θ) = ∫_{−∞}^{∞} S_p(ξ, θ) e^{j2πξs} dξ   (10.55)
Theorem 10.2. The operator 𝒢 is a whitening transform in θ for stationary random fields, and the autocorrelation function of ğ(s, θ) is given by

    r_ğğ(s, θ; s′, θ′) ≜ E[ğ(s, θ)ğ(s′, θ′)] = r_ğ(s − s′, θ)δ(θ − θ′)   (10.56a)

where

    r_ğ(s, θ) = r_p(s, θ)   (10.56b)

This means the random field ğ(s, θ) defined via (10.50) is stationary in s and uncorrelated in θ. Since ğ(s, θ) can be obtained by passing g(s, θ) through 𝒦^{1/2}, which is independent of θ, g(s, θ) itself must also be uncorrelated in θ. Thus, the Radon transform is also a whitening transform in θ for stationary random fields, and the autocorrelation function of g(s, θ) must be of the form

    r_gg(s, θ; s′, θ′) ≜ E[g(s, θ)g(s′, θ′)] = r_g(s − s′, θ)δ(θ − θ′)   (10.57)

where r_g(s, θ) is yet to be specified. Now, for any given θ, we define the power spectrum density of g(s, θ) as the one-dimensional Fourier transform of its autocorrelation function with respect to s, that is,

    S_g(ξ, θ) ≜ ∫_{−∞}^{∞} r_g(s, θ) e^{−j2πξs} ds   (10.58)



From Fig. 10.14 we can write

    S_ğ(ξ, θ) = |ξ| S_g(ξ, θ)   (10.59)

These results lead to the following useful theorem.

Projection Theorem for Random Fields

Theorem 10.3. The one-dimensional power spectrum density S_ğ(ξ, θ) of the 𝒢 transform of a stationary random field f(x, y) is the central slice, at angle θ, of its two-dimensional power spectrum density S(ξ₁, ξ₂), that is,

    S_ğ(ξ, θ) = S_p(ξ, θ) = S(ξ cos θ, ξ sin θ)   (10.60)

This theorem is noteworthy because it states that the central slice of the two-dimensional power spectrum density S(ξ₁, ξ₂) is equal to the one-dimensional power spectrum of ğ(s, θ) and not of g(s, θ). On the other hand, the projection theorem states that the central slice of the two-dimensional amplitude spectrum density (that is, the Fourier transform) F(ξ₁, ξ₂) is equal to the one-dimensional amplitude spectrum density (that is, the Fourier transform) of g(s, θ) and not of ğ(s, θ). Combining (10.59) and (10.60), we get

    S_p(ξ, θ) = S_ğ(ξ, θ) = |ξ| S_g(ξ, θ)   (10.61)

which gives, formally,

    S_g(ξ, θ) = S_p(ξ, θ)/|ξ|   (10.62)

and

    r_g(s, θ) = ℱ₁⁻¹[S_p(ξ, θ)/|ξ|]   (10.63)
Theorem 10.3 is useful for finding the power spectrum density of noise in the reconstructed image due to noise in the observed projections. For example, suppose ν(s, θ) is a zero-mean random field, given to be stationary in s and uncorrelated in θ, with

    E[ν(s, θ)ν(s′, θ′)] = r_ν(s − s′, θ)δ(θ − θ′)   (10.64a)

    S_ν(ξ, θ) ↔ r_ν(s, θ)   (10.64b)

If ν(s, θ) represents the additive observation noise in the projections, then the noise component in the reconstructed image will be

    η(x, y) ≜ ℬ𝒦ν = ℬ𝒦^{1/2}ν̆ = 𝒢⁻¹ν̆   (10.65)

where ν̆ ≜ 𝒦^{1/2}ν. Rewriting (10.65) as

    𝒢η = ν̆   (10.66)

and applying Theorem 10.3, we can write S_ηp(ξ, θ), the power spectrum density of η, as

    S_ηp(ξ, θ) = S_ν̆(ξ, θ) = |ξ| S_ν(ξ, θ)   (10.67)

This means the observation noise power spectrum density is amplified by (ξ₁² + ξ₂²)^{1/2} by the reconstruction process (that is, by ℛ⁻¹). The power spectrum S_η is bounded only if |ξ|S_ν(ξ, θ) remains finite as ξ → ∞. For example, if the random field ν(s, θ) is bandlimited, then η(x, y) will also be bandlimited and S_η will remain bounded.
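The |ξ| amplification in (10.67) can be illustrated numerically. The sketch below (ours) applies the weight |ξ| to the periodogram of white projection noise and compares low- and high-frequency bands; the band-averaged ratio approaches the ratio of the mean |ξ| values in the two bands:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4096
v = rng.standard_normal(K)            # white projection noise: S_v = const

xi = np.fft.fftfreq(K)
periodogram = np.abs(np.fft.fft(v))**2
eta_spec = np.abs(xi) * periodogram   # S_eta = |xi| S_v, per (10.67)

# Band averages: the reconstruction weights high frequencies |xi|-fold
# more strongly (mean |xi| is 0.05 in the low band, 0.45 in the high one).
low = np.abs(xi) < 0.1
high = np.abs(xi) > 0.4
gain = eta_spec[high].mean() / eta_spec[low].mean()   # roughly 0.45/0.05 = 9
```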

10.8 RECONSTRUCTION FROM BLURRED NOISY PROJECTIONS [22-25]

Measurement Model

In the presence of noise, the reconstruction filters listed in Table 10.3 are not optimal in any sense. Suppose the projections are observed as

    w(s, θ) = ∫_{−∞}^{∞} h_p(s − s′, θ) g(s′, θ) ds′ + ν(s, θ),   −∞ < s < ∞, 0 ≤ θ < π   (10.68)

The function h_p(s, θ) represents a shift-invariant blur (with respect to s), which may occur due to the projection-gathering instrumentation, and ν(s, θ) is additive, zero-mean noise independent of f(x, y) and uncorrelated in θ [see (10.64a)]. The optimum linear mean square reconstruction filter can be determined by applying the Wiener filtering ideas that were discussed in Chapter 8.

The Optimum Mean Square Filter

The optimum linear mean square estimate off(x,y), denoted by!(x,y), can be
reconstructed from w(s, 0), by the filter/convolution back-projection algorithm
(Problem 10.14)

g(s, 0) = r-<0
ap(s -s', 6)w(s','0) lis '

/ (x, y) =(}Jg • (10.69)


where

ap(s, 9) (
(10.70)
hp (s, 9) (

Remarks

The foregoing optimum reconstruction filter can be implemented as a generalized filter/convolution back-projection algorithm using the techniques of Section 10.6. A provision has to be made for the fact that now we have a one-dimensional filter a_p(s, θ), which can change with θ.

Reconstruction from noisy projections. In the absence of blur we have h_p(s, θ) = δ(s) and

    w(s, θ) = g(s, θ) + ν(s, θ)   (10.71)

The reconstruction filter is then given by

    A_p(ξ, θ) = |ξ| S_p(ξ, θ) / [S_p(ξ, θ) + |ξ| S_ν(ξ, θ)]   (10.72)

Note that if there is no noise, that is, S_ν → 0, then A_p(ξ, θ) → |ξ|, which is, of course, the filter required for the inverse Radon transform.

Using (10.61) in (10.70) we can write

    A_p(ξ, θ) = |ξ| Ã_p(ξ, θ)   (10.73)

where

    Ã_p(ξ, θ) ≜ H_p* S_g / (|H_p|² S_g + S_ν) = H_p* S_g / S_w   (10.74)

Note that Ã_p(ξ, θ) is the one-dimensional Wiener filter for g(s, θ) given w(s, θ). This means the overall optimum filter A_p is the cascade of |ξ|, the filter required for the inverse Radon transform, and a window function Ã_p(ξ, θ), representing the locally optimum filter for each projection. In practice, Ã_p(ξ, θ) can be estimated adaptively for each θ by estimating S_w(ξ, θ), the power spectrum density of the observed projection w(s, θ).
Example 10.6 Reconstruction from noisy projections

Suppose the covariance function of the object is modeled by the isotropic function r(x, y) = σ² exp(−α√(x² + y²)). The corresponding power spectrum is then S(ξ₁, ξ₂) = 2πασ²[α² + 4π²(ξ₁² + ξ₂²)]^(−3/2), or S_p(ξ, θ) = 2πασ²[α² + 4π²ξ²]^(−3/2). Assume there is no blur and let S_ν(ξ, θ) = σ_ν². Then the frequency response of the optimum reconstruction filter, henceforth called the stochastic filter, is given by

A_p(ξ, θ) = |ξ| S_p(ξ, θ) / [S_p(ξ, θ) + σ_ν²|ξ|] = 2πασ²|ξ| / [2πασ² + σ_ν²|ξ|(α² + 4π²ξ²)^(3/2)]
          = 2πα(SNR)|ξ| / [2πα(SNR) + |ξ|(α² + 4π²ξ²)^(3/2)],   SNR ≜ σ²/σ_ν²

This filter is independent of θ and has a frequency response much like that of a band-pass filter (Fig. 10.15a). Figure 10.15b shows the impulse response of the stochastic filter used for reconstruction from noisy projections with σ_ν² = 5, σ = 0.0102, and α = 0.266. Results of reconstruction are shown in Fig. 10.15c through i. Comparisons with the Shepp-Logan filter indicate significant improvement results from the use of the stochastic filter. In terms of mean square error, the stochastic filter performs 13.5 dB better than the Shepp-Logan filter in the case of σ_ν² = 5. Even in the noiseless case (Fig. 10.12) the stochastic filter designed with a high value of SNR (such as 100) provides a better reconstruction. This is because the stochastic filter tends to moderate the high-frequency components of the noise that arise from errors in computation.
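As a sketch, the closed-form stochastic filter above is easy to evaluate numerically (NumPy assumed; the SNR value used here is the illustrative "high SNR" of 100 mentioned above, not a value taken from the example):

```python
import numpy as np

def stochastic_filter(xi, alpha, snr):
    """Stochastic reconstruction filter of Example 10.6:
    A_p(xi) = 2*pi*alpha*SNR*|xi| / (2*pi*alpha*SNR + |xi|*(alpha^2 + 4*pi^2*xi^2)^(3/2)).
    The response is independent of the projection angle theta."""
    xi = np.abs(np.asarray(xi, dtype=float))
    num = 2 * np.pi * alpha * snr * xi
    den = 2 * np.pi * alpha * snr + xi * (alpha**2 + 4 * np.pi**2 * xi**2) ** 1.5
    return num / den

xi = np.linspace(0.0, 2.0, 201)
A = stochastic_filter(xi, alpha=0.266, snr=100.0)
# A(0) = 0 and A decays at high frequency: the band-pass shape of Fig. 10.15a
```

The interior peak followed by high-frequency roll-off is what moderates computational noise even in the noiseless case.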

Sec. 10.8 Reconstruction from Blurred Noisy Projections 459


Figure 10.15 Reconstruction from noisy projections: (a) a typical frequency response of a stochastic filter, A_p(ξ, θ) = A_p(−ξ, θ); (b) impulse response of the stochastic filter used.

460 Image Reconstruction from Projections Chap. 10


Figure 10.15 Cont'd: (c) typical noisy projection, σ_ν² = 5; (d) reconstruction via Shepp-Logan filter; (e) reconstruction via the stochastic filter; (g) stochastic filter, σ_ν² = 1.




Figure 10.15 Cont'd: (h) Shepp-Logan filter, σ_ν² = 5; (i) stochastic filter, σ_ν² = 5.

10.9 FOURIER RECONSTRUCTION METHOD [26-29]

A conceptually simple method of reconstruction that follows from the projection


• theorem is to fill the two-dimensional Fourier space by the one-dimensional Fourier
transforms of the projections and then take the two-dimensional inverse Fourier
transform (Fig. 10.16a), that is,
f(x, y) = 𝓕₂⁻¹[𝓕₁ g]    (10.75)

Algorithm

There are three stages in this algorithm (Fig. 10.16b). First we obtain G_n(k) = G(kΔξ, nΔθ), −K/2 ≤ k ≤ K/2 − 1, 0 ≤ n ≤ N − 1, as in Fig. 10.11b.
Next, the Fourier domain samples available on a polar raster are interpolated to
yield estimates on a rectangular raster (see Section 8.16). In the final stage of the
algorithm, the two-dimensional inverse Fourier transform is approximated by a
suitable-size inverse FFT. Usually, the size of the inverse FFT is taken to be two to
three times that of each dimension of the image. Further, an appropriate window is
. used before inverse transforming in order to minimize the effects of Fourier domain
truncation and sampling.
Although there are many examples of successful implementation of this algorithm [29], it has not been as popular as the convolution back-projection algorithm. The primary reason is that the interpolation from the polar to the rectangular raster in the frequency plane is prone to aliasing effects that could yield an inferior reconstructed image.
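The three stages can be sketched as follows (NumPy assumed; crude nearest-neighbor interpolation stands in for the more careful schemes of Section 8.16, and windowing/oversizing of the inverse FFT is omitted for brevity):

```python
import numpy as np

def fourier_reconstruct(proj, thetas, size):
    """Toy Fourier-method reconstruction: 1-D FFT each projection,
    scatter the polar-raster samples onto the nearest rectangular-raster
    bin, then take a 2-D inverse FFT.  proj has one row per angle."""
    _, K = proj.shape
    F = np.zeros((size, size), dtype=complex)
    freqs = np.fft.fftfreq(K) * K          # integer frequencies in FFT order
    for g, th in zip(proj, thetas):
        G = np.fft.fft(g)
        u = np.round(freqs * np.cos(th)).astype(int) % size
        v = np.round(freqs * np.sin(th)).astype(int) % size
        F[v, u] = G                        # nearest-neighbor "interpolation"
    return np.real(np.fft.ifft2(F))
```

The nearest-neighbor step is exactly where the aliasing artifacts mentioned above originate; a practical implementation would interpolate more carefully and window before inverting.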



Figure 10.16 Fourier reconstruction method: (a) the concept: fill Fourier space with G(ξ, θ) = 𝓕₁g(s, θ) and take the two-dimensional inverse transform to obtain f(x, y); (b) a practical Fourier reconstruction algorithm: FFT each projection g_n(m) to obtain G_n(k), interpolate from the polar to the rectangular raster to obtain F(k₁Δξ₁, k₂Δξ₂), window and/or pad zeros, then take a 2-D inverse FFT to obtain f(mΔx, nΔy).


Reconstruction of Magnetic Resonance Images (Fig. 10.17)

In magnetic resonance imaging there are two distinct scanning modalities, the
projection geometry and the Fourier geometry [30]. In the projection geometry
mode, the observed signal is G(ξ, θ), sampled at ξ = kΔξ, −K/2 ≤ k ≤ K/2 − 1, θ = nΔθ, 0 ≤ n ≤ N − 1, Δθ = π/N. Reconstruction from such data necessitates the
availability of an FFT processor, regardless of which algorithm is used. For exam-
ple, the filter back-projection algorithm would require inverse Fourier transform of

Figure 10.17 Magnetic resonance image reconstruction: (a) MRI data; (b) reconstructed image.



G(ξ, θ)H(ξ). Alternatively, the Fourier reconstruction algorithm just described is also suitable, especially since an FFT processor is already available.
. In the Fourier geometry mode, which is becoming increasingly popular, we
directly obtain samples on a rectangular raster in the Fourier domain. The recon-
struction algorithm then simply requires a two-dimensional inverse FFT after windowing and zero-padding the data.
Figure 1O.17a shows a 512 x 128 MRI image acquired in Fourier geometry
mode. A 512 x 256 image is reconstructed (Fig. 10.17b) by a 512 x 256 inverse FFT
of the raw data windowed by a two-dimensional Gaussian function and padded by
zeros.
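A minimal sketch of this Fourier-geometry reconstruction (NumPy assumed; the Gaussian window width and the corner placement of the data are simplifying assumptions, not details from the text):

```python
import numpy as np

def reconstruct_fourier_geometry(raw, out_shape, sigma=0.25):
    """Fourier-geometry reconstruction sketch: Gaussian-window the
    rectangular-raster Fourier data, zero-pad to out_shape, 2-D inverse FFT.
    sigma is the window width as a fraction of each axis (illustrative)."""
    m, n = raw.shape
    wy = np.exp(-0.5 * ((np.arange(m) - m / 2.0) / (sigma * m)) ** 2)
    wx = np.exp(-0.5 * ((np.arange(n) - n / 2.0) / (sigma * n)) ** 2)
    padded = np.zeros(out_shape, dtype=complex)
    padded[:m, :n] = raw * np.outer(wy, wx)   # zero-pad the windowed data
    return np.fft.ifft2(padded)

img = reconstruct_fourier_geometry(np.ones((4, 4)), (8, 8))
```

Zero-padding before the inverse FFT is what yields the finer output raster (e.g., 512 × 128 raw data reconstructed on a 512 × 256 grid).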

10.10 FAN-BEAM RECONSTRUCTION



Often the projection data is collected using fan-beams rather than parallel beams
(Fig. 10.18). This is a more practical method because it allows rapid collection of
projections compared to parallel beam scanning. Referring to Fig. 10.18b, the source S emits a thin divergent beam of X-rays, and a detector receives the beam after attenuation by the object. The source position is characterized by the angle β, and each projection ray is represented by the coordinates (α, β), −π/2 ≤ α < π/2, 0 ≤ β < 2π. The coordinates of the (α, β) ray are related to the parallel beam coordinates (s, θ) as (Fig. 10.18c)

s = R sin α
θ = α + β    (10.76)

where R is the distance of the source from the origin of the object. For a space-limited object with maximum radius D/2, the angle α lies in the interval [−γ, γ], γ = sin⁻¹(D/2R). Since a ray in the fan-beam geometry is also some ray in the parallel beam geometry, we can relate their respective projection functions b(α, β) and g(s, θ) as

b(α, β) = g(s, θ) = g(R sin α, α + β)    (10.77)
If b(α, β) is given on a grid (α_m, β_n), then this relation gives g(s, θ) on a grid (s_m, θ_n), s_m = R sin α_m, θ_n = α_m + β_n. This data then has to be interpolated to obtain g(s_m, θ_n) on the uniform grid s_m = mΔ, −(M/2) ≤ m ≤ (M/2) − 1; θ_n = nΔθ, 0 ≤ n ≤ N − 1. This is called rebinning. Alternatively, we can use

g(s, θ) = b(α, β) = b(sin⁻¹(s/R), θ − sin⁻¹(s/R))    (10.78)

to estimate g(s_m, θ_n) on the uniform grid by interpolating b(α_m, β_n). Once g(s, θ) is available on a uniform grid, we can use the foregoing parallel beam reconstruction algorithms. Another alternative is to derive the divergent beam reconstruction algorithms directly in terms of b(α, β) by using (10.77) and (10.78) in the inverse Radon transform formulas. (See Problem 10.16.)
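The coordinate change (10.76) and its inverse, on which both rebinning and the direct interpolation rest, can be sketched as (NumPy assumed):

```python
import numpy as np

def fan_to_parallel(alpha, beta, R):
    """(10.76): fan-beam ray (alpha, beta) -> parallel coordinates (s, theta):
    s = R*sin(alpha), theta = alpha + beta."""
    return R * np.sin(alpha), alpha + beta

def parallel_to_fan(s, theta, R):
    """Inverse map: alpha = arcsin(s/R), beta = theta - alpha (valid for |s| <= R)."""
    alpha = np.arcsin(s / R)
    return alpha, theta - alpha
```

Rebinning applies the forward map to the fan samples b(α_m, β_n) and interpolates the resulting scattered (s, θ) values onto a uniform grid; the inverse map goes the other way, evaluating b at the fan coordinates corresponding to each uniform (s_m, θ_n).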
In practice, rebinning seems to be preferred because it is simpler and can be
fed into the already developed convolution/filter back-projection algorithms (or.




Figure 10.18 Projection data acquisition: (a) parallel beam; (b) fan beam; (c) fan beam geometry (source S at distance R from the origin of the object).

processors). However, there are situations where the data volume is so large that the storage requirements for rebinning assume unmanageable proportions. In such cases the direct divergent beam reconstruction algorithms would be preferable because only one projection, b(α, β), would be used at a time, in a manner characteristic of the convolution/filter back-projection algorithms.

10.11 ALGEBRAIC METHODS

All the foregoing reconstruction algorithms are based on Radon transform theory and the projection theorem. It is possible to formulate the reconstruction problem as a general image restoration problem solvable by techniques discussed in Chapter 8.

The Reconstruction Problem as a Set of Linear Equations

Suppose f(x, y) is approximated by a finite series

f(x, y) ≈ f̂(x, y) = Σ_{i=1}^{I} Σ_{j=1}^{J} a_{i,j} φ_{i,j}(x, y)    (10.79)

where {φ_{i,j}(x, y)} is a set of basis functions. Then

g(s, θ) = 𝓡f̂ = Σ_{i=1}^{I} Σ_{j=1}^{J} a_{i,j} 𝓡[φ_{i,j}] ≜ Σ_{i=1}^{I} Σ_{j=1}^{J} a_{i,j} h_{i,j}(s, θ)    (10.80)




where h_{i,j}(s, θ) is the Radon transform of φ_{i,j}(x, y), which can be computed in advance. When the observations are available on a discrete grid (s_m, θ_n), we can write

g(s_m, θ_n) = Σ_{i=1}^{I} Σ_{j=1}^{J} a_{i,j} h_{i,j}(s_m, θ_n),   0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1    (10.81)

which can be solved for a_{i,j} as a set of linear simultaneous equations via least squares, generalized inverse, or other methods. Once the a_{i,j} are known, f̂(x, y) is obtained directly from (10.79).
A particular case of interest is when f(x, y) is digitized on, for instance, an I × J grid and f is assumed to be constant in each pixel region. Then a_{i,j} equals f_{i,j}, the sampled value of f(x, y) in the (i, j)th pixel, and

φ_{i,j}(x, y) = 1 inside the (i, j)th pixel region, and 0 otherwise    (10.82)

Now (10.81) becomes

g(s_m, θ_n) = Σ_{i=1}^{I} Σ_{j=1}^{J} f_{i,j} h_{i,j}(s_m, θ_n),   0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1    (10.83)

Mapping f_{i,j} into a Q × 1 (Q ≜ IJ) vector f by row (or column) ordering, we get

g = Hf    (10.84)

where g and H are P × 1 (P ≜ MN) and P × Q arrays, respectively. A more realistic observation equation is of the form

g = Hf + η    (10.85)

where η represents noise. The reconstruction problem now is to estimate f from g.
Equations (10.84) and (10.85) are now in the framework of Wiener filtering, pseudoinverse, generalized inverse, maximum entropy, and other restoration algorithms considered in Chapter 8. The main advantage of this approach is that the algorithms needed henceforth would be independent of the scanning modality (e.g., parallel beam versus fan beam). Also, the observation model can easily incorporate a more realistic projection gathering model, which may not approximate well the Radon transform.
The main limitations of the algebraic formulation arise from the large size of the matrix H. For example, for a 256 × 256 image with 100 projections, each sampled to 512 points, H becomes a 51,200 × 65,536 matrix. However, H will be a highly sparse matrix containing only O(I) or O(J) nonzero entries per row. These nonzero entries correspond to the pixel locations that fall in the path of the ray (s_m, θ_n). Restoration algorithms that exploit the sparse structure of H are feasible.
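A toy instance of the linear formulation g = Hf, solved by least squares (NumPy assumed; the 2 × 2 image and the five "rays" below are made up purely for illustration, each row of H simply marking the pixels a ray passes through):

```python
import numpy as np

# f is a 2x2 image ordered by rows; the rays are the two row sums,
# the two column sums, and the main diagonal (a hypothetical geometry).
H = np.array([[1, 1, 0, 0],    # row 0
              [0, 0, 1, 1],    # row 1
              [1, 0, 1, 0],    # column 0
              [0, 1, 0, 1],    # column 1
              [1, 0, 0, 1]],   # main diagonal
             dtype=float)
f_true = np.array([1.0, 2.0, 3.0, 4.0])
g = H @ f_true                              # observations, cf. (10.84)
f_hat, *_ = np.linalg.lstsq(H, g, rcond=None)
```

Note that each row of H has at most two nonzero entries out of four, the sparsity pattern that practical restoration algorithms exploit at realistic sizes.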

Algebraic Reconstruction Techniques


A subset of iterative reconstruction algorithms has been historically called ART (algebraic reconstruction techniques). These algorithms iteratively solve a set of P equations



g_p = h_pᵀ f,   p = 0, ..., P − 1    (10.86)

where h_pᵀ is the pth row of H and g_p is the pth element of g. The algorithm, originally due to Kaczmarz [31], has iterations that progress cyclically as

f̄(k + 1) = f(k) + h_{k+1} [g_{k+1} − h_{k+1}ᵀ f(k)] / (h_{k+1}ᵀ h_{k+1}),   k = 0, 1, ...    (10.87)

where f̄(k + 1) determines f(k + 1), depending on the constraints imposed on f (see Table 10.4), f(0) is some initial condition, and g_k and h_k appear cyclically, that is,

g_{k+P} = g_k,   h_{k+P} = h_k    (10.88)

Each iteration is such that only one of the P equations (or constraints) is satisfied at a time. For example, from (10.87) we can see

h_{k+1}ᵀ f̄(k + 1) = g_{k+1}    (10.89)

that is, the [(k + 1) modulo P]th constraint is satisfied at the (k + 1)th iteration. This algorithm is easy to implement since it operates on one projection sample at a time. The operation h_{k+1}ᵀ f(k) is equivalent to taking the [(k + 1) modulo P]th projection sample of the previous estimate. The sparse structure of h_k can easily be exploited to reduce the number of operations at each iteration. The speed of convergence is usually slow, and difficulties arise in deciding when to stop the iterations. Figure 10.13d shows an image reconstructed using the fully constrained ART algorithm in (10.87). The result shown was obtained after five complete passes through the image starting from an all-black image.
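A sketch of this cyclic Kaczmarz update, with optional clipping for the fully constrained variant of Table 10.4 (NumPy assumed; the function name and the zero initial estimate are illustrative choices):

```python
import numpy as np

def art(H, g, n_passes=50, clip=None):
    """Kaczmarz / ART: cycle through the rows h_p of H, and at each step
    project the current estimate onto the hyperplane <h_p, f> = g_p:
        f <- f + h_p * (g_p - <h_p, f>) / ||h_p||^2
    clip=(lo, hi) applies the fully constrained variant after each update."""
    P, Q = H.shape
    f = np.zeros(Q)                      # illustrative initial condition f(0)
    for k in range(n_passes * P):
        p = k % P                        # rows used cyclically, cf. (10.88)
        h = H[p]
        f = f + h * (g[p] - h @ f) / (h @ h)
        if clip is not None:
            f = np.clip(f, *clip)
    return f
```

Because each step touches only one row h_p, a sparse row costs only as many operations as it has nonzero entries.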

TABLE 10.4 ART Algorithms

1. Unconstrained ART: f_j(k + 1) = f̄_j(k + 1), 1 ≤ j ≤ Q. If L₁ ≜ {f : Hf = g} is nonempty, the algorithm converges to the element of L₁ with the smallest distance to f(0).
2. Partially constrained ART: f_j(k + 1) = 0 if f̄_j(k + 1) < 0, and f_j(k + 1) = f̄_j(k + 1) otherwise. If L₂ ≜ {f : Hf = g, f_j ≥ 0} is nonempty, the algorithm converges to an element of L₂.
3. Fully constrained ART: f_j(k + 1) = 0 if f̄_j(k + 1) < 0; f̄_j(k + 1) if 0 ≤ f̄_j(k + 1) ≤ 1; and 1 if f̄_j(k + 1) > 1. If L₃ ≜ {f : Hf = g, 0 ≤ f_j ≤ 1} is nonempty, the algorithm converges to an element of L₃.




10.12 THREE-DIMENSIONAL TOMOGRAPHY

If a three-dimensional object is scanned by a parallel beam, as shown in Fig. 10.1, then the entire three-dimensional object can be reconstructed from a set of two-dimensional slices (such as the slice A), each of which can be reconstructed using the foregoing algorithms. Suppose we are given one-dimensional projections of a three-dimensional object f(x, y, z). These projections are obtained by integrating f(x, y, z) in a plane whose orientation is described by a unit vector a (Fig. 10.19), that is,

g(s, a) = [𝓡f](s, a) = ∭_{−∞}^{∞} f(x) δ(xᵀa − s) dx
        = ∭_{−∞}^{∞} f(x, y, z) δ(x sin θ cos φ + y sin θ sin φ + z cos θ − s) dx dy dz    (10.90)

where x ≜ [x, y, z]ᵀ, a ≜ [sin θ cos φ, sin θ sin φ, cos θ]ᵀ. This is also called the three-dimensional Radon transform of f(x, y, z). The Radon transform theory can be readily generalized to three or higher dimensions. The following theorems provide algorithms for reconstruction of f(x, y, z) from the projections g(s, a).

Three-Dimensional Projection Theorem. The one-dimensional Fourier transform of g(s, a), defined as

G(ξ, a) ≜ ∫_{−∞}^{∞} g(s, a) e^{−j2πξs} ds    (10.91)

is the central slice F(ξa) in the direction a of the three-dimensional Fourier transform of f(x), that is,

G(ξ, a) = F(ξa) ≜ F(ξ sin θ cos φ, ξ sin θ sin φ, ξ cos θ)    (10.92)

Figure 10.19 Projection geometry for three-dimensional objects: the integration plane is x sin θ cos φ + y sin θ sin φ + z cos θ = s.




Figure 10.20 Three-dimensional inverse Radon transform: g(s, a) is filtered by −(1/8π²) ∂²/∂s² to give ĝ(s, a), which is then back-projected (𝓑) to give f(x, y, z).

where F(ξ₁, ξ₂, ξ₃) is the three-dimensional Fourier transform of f(x, y, z),

F(ξ₁, ξ₂, ξ₃) ≜ ∭_{−∞}^{∞} f(x, y, z) e^{−j2π(ξ₁x + ξ₂y + ξ₃z)} dx dy dz    (10.93)

Three-Dimensional Inverse Radon Transform Theorem. The inverse of the three-dimensional Radon transform of (10.90) is given by (Fig. 10.20)

f(x) = 𝓑ĝ ≜ ∫₀^π ∫₀^{2π} ĝ(xᵀa, a) sin θ dφ dθ    (10.94)

where

ĝ(s, a) = 𝓕⁻¹{(ξ²/2) G(ξ, a)} = −(1/8π²) ∂²g(s, a)/∂s²    (10.95)

Proofs and extensions of these theorems may be found in [13].

Three-Dimensional Reconstruction Algorithms

The preceding theorem gives a direct three-dimensional digital reconstruction algorithm. If g(s_m, a_n) are the sampled values of the projections given for s_m = md, −M/2 ≤ m ≤ M/2 − 1, 0 ≤ n ≤ N − 1, for instance, then we can approximate the second partial derivative by a three-point digital filter to give

ĝ(s_m, a_n) ≈ −[g(s_{m+1}, a_n) − 2g(s_m, a_n) + g(s_{m−1}, a_n)] / (8π²d²)    (10.96)
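A sketch of the three-point second-difference filter for ĝ, assuming the −(1/8π²) ∂²/∂s² filtering of Fig. 10.20 and zero padding at the ends (NumPy assumed):

```python
import numpy as np

def second_difference(g, d):
    """Approximate ghat = -(1/(8*pi^2)) d^2 g/ds^2 with the three-point filter
    ghat_m = -(g_{m+1} - 2*g_m + g_{m-1}) / (8*pi^2*d^2),
    where d is the sample spacing; end samples are handled by zero padding."""
    gp = np.pad(np.asarray(g, dtype=float), 1)
    lap = gp[2:] - 2 * gp[1:-1] + gp[:-2]
    return -lap / (8 * np.pi**2 * d**2)
```

For a quadratic projection g(s) = s², whose second derivative is the constant 2, the interior samples come out as the constant −1/(4π²), as expected.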

In order to approximate the back-projection integral by a sum, we have to sample φ and θ, where (φ_k, θ_j) define the direction a_n. A suitable arrangement for the projection angles (φ_k, θ_j) is one that gives a uniform distribution of the projections over the surface of a sphere. Note that if θ and φ are sampled uniformly, then the projections will have higher concentration near the poles compared to the equator. If θ is sampled uniformly with Δθ = π/J, we will have θ_j = (j + ½)Δθ, j = 0, 1, ..., J − 1. The density of projections at elevation angle θ_j will be proportional to 1/sin θ_j. Therefore, for uniform distribution of projections, we should increment φ_k in proportion to 1/sin θ_j, that is, Δφ = c/sin θ_j, φ_k = kΔφ, k = 0, ..., K_j − 1, where c is a proportionality constant. Since φ_{K_j} must equal 2π, we obtain

K_j = (2π/c) sin θ_j,   k = 0, 1, ..., K_j − 1    (10.97)

This relation can be used to estimate K_j to the nearest integer value for different θ_j.
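A sketch of this angular sampling scheme (NumPy assumed; the max(1, ·) guard against K_j rounding to zero is an added safeguard, not from the text):

```python
import numpy as np

def projection_directions(J, c):
    """Near-uniform directions on the sphere: theta_j = (j + 1/2)*pi/J, and,
    per (10.97), K_j = (2*pi/c)*sin(theta_j) rounded to the nearest integer,
    with azimuths phi_k = 2*pi*k/K_j, k = 0, ..., K_j - 1."""
    thetas = (np.arange(J) + 0.5) * np.pi / J
    Ks = np.maximum(1, np.round(2 * np.pi * np.sin(thetas) / c).astype(int))
    dirs = [(th, 2 * np.pi * k / K)
            for th, K in zip(thetas, Ks) for k in range(K)]
    return thetas, Ks, dirs      # total N = sum(Ks) projections, cf. (10.99)
```

Rows near the equator get many azimuths and rows near the poles get few, which is exactly the compensation for the 1/sin θ_j density described above.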




Figure 10.21 Two-stage reconstruction: g(s, φ, θ) is inverted with φ held constant to give f_φ(x', z), which is then inverted with z held constant to give f(x, y, z).

The back-projection integral is approximated as

𝓑̂ĝ = f̂(x, y, z) = (2π²/J) Σ_{j=0}^{J−1} sin θ_j (1/K_j) Σ_{k=0}^{K_j−1} ĝ(x sin θ_j cos φ_k + y sin θ_j sin φ_k + z cos θ_j, a_n)    (10.98)

where (φ_k, θ_j) define the direction a_n, n = 0, ..., N − 1, and

N = Σ_{j=0}^{J−1} K_j    (10.99)

There is an alternative reconstruction algorithm (Fig. 10.21), which contains two stages of two-dimensional inverse Radon transforms. In the first stage, φ is held constant, and in the second stage, z is held constant. Details are given in Problem 10.17. Compared to the direct method, where an arbitrary section of the object can be reconstructed, the two-stage algorithm necessitates reconstruction of parallel cross sections of the object. The advantage is that only two-dimensional reconstruction algorithms are needed.

10.13 SUMMARY

In this chapter we have studied the fundamentals and algorithms for reconstruction of objects from their projections. The projection theorem is a fundamental result of Fourier theory, which leads to useful algorithms for inverting the Radon transform. Among the various algorithms discussed here, the convolution back-projection is the most widely used. Among the various filters, the stochastic reconstruction filter seems to give the best results, especially for noisy projections. For low levels of noise, the modified Shepp-Logan filter performs equally well.
The Radon transform theory itself is very useful for filtering and representation of multidimensional signals. Most of the results discussed here can also be extended to multidimensions.

PROBLEMS

10.1 Prove the properties of the Radon transform listed in Table 10.1.
10.2 Derive the expression for the Radon transform of the ellipse shown in Figure 10.4b.
10.3 Express the Radon transform of an object f_p(r, φ) given in polar coordinates.
10.4 Find the Radon transform of
a. exp[−π(x² + y²)], ∀x, y
b. exp[(j2π/L)(kx + ly)], −L/2 ≤ x, y ≤ L/2
c. cos(ωkx/L) cos(ωly/L), −L/2 ≤ x, y ≤ L/2
d. cos 2π(αx + βy), √(x² + y²) ≤ a
Assume the given functions are zero outside the region of support defined.
10.5 Find the impulse responses of the linear systems shown in Figure P10.5.

Figure P10.5: f(x, y) → 𝓡 → g(s, θ), and g(s, θ) → 𝓑 → b(x, y).

10.6 In order to prove (10.19), use the definitions of 𝓡 and 𝓑 to show that

b(x, y) ≜ [𝓑𝓡f](x, y) = ∬ f(x', y') ∫₀^π δ((x' − x) cos θ + (y' − y) sin θ) dθ dx' dy'

Now using the identity (10.9), prove (10.19). Transform to polar coordinates and prove (10.20).
10.7 Show that the back-projection operator 𝓑 is the adjoint of 𝓡. This means for any a(x, y) ∈ 𝒜 and b(s, θ) ∈ ℬ, show that (𝓡a, b)_ℬ = (a, 𝓑b)_𝒜.
10.8 Find the Radon transforms of the functions defined in Problem 10.4 by applying the projection theorem.
10.9 Apply the projection theorem to the function f(x, y) = f₁(x, y) ⊛ f₂(x, y) and show 𝓕₁[𝓡f] = G₁(ξ, θ)G₂(ξ, θ). From this, prove the convolution-projection theorem.
10.10 Using the formulas (10.26) or (10.27), verify the inverse Radon transforms of the results obtained in Problem 10.8.
10.11 If an object is space limited by a circle of diameter D and if ξ₀ is the largest spatial frequency of interest in the polar coordinates of the Fourier domain, show that the number of projections required to avoid aliasing effects due to angular sampling in the transform domain must be N > πDξ₀.
10.12 Compute the frequency responses of the linearly interpolated digital filters shown in Figure 10.10. Plot and compare these with H(ξ).
10.13 a. (Proof of Theorem 10.1) Using the fact that |ξ| is real and symmetric, first show that 𝓑^{1/2}[𝓡^{1/2}g] = (1/2π)𝓡⁻¹g, where 𝓑^{1/2} and 𝓡^{1/2} are defined in (10.32) and (10.33). Then show that 𝓑^{1/2}𝓡^{1/2}𝓡 = 𝓡⁻¹𝓡 = 𝓘 (identity). Now observe that (g, g)_ℬ = (𝓡f, g)_ℬ and use Problem 10.7 to show (𝓡f, g)_ℬ = (f, 𝓑g)_𝒜 = (f, f)_𝒜.
b. (Proof of Theorem 10.3) From Fig. 10.14 observe that ḡ = 𝓡f. Using this, show that
(i) E[f(x, y)ḡ(s, θ)] = r̄_p(s − x cos θ − y sin θ, θ), where r̄_p(s, θ) ≜ 𝓕⁻¹{|ξ|S_p(ξ, θ)}
(ii) E[ḡ(s, θ)ḡ(s', θ')] = ∫ |ξ|S_p(ξ, θ) exp(j2πξs') a(s, θ; ξ, θ') dξ
where a(s, θ; ξ, θ') is the Radon transform of the plane wave exp{−j2πξ(x cos θ' + y sin θ')} and equals exp(−j2πξs)δ(θ − θ')/|ξ|. Simplify this and obtain (10.56a). Combine (10.56b) and (10.58) to prove (10.60).
10.14 Write the observation model of (10.68) as v(x, y) ≜ 𝓡⁻¹w = h(x, y) ⊛ f(x, y) + η(x, y), where h(x, y) ↔ H(ξ₁, ξ₂) = H_p(ξ, θ) ↔ h_p(s, θ) and η = 𝓡⁻¹ν, whose power spectrum density is given by (10.67). Show that the frequency response of the two-dimensional Wiener filter for f(x, y), written in polar coordinates, is A(ξ, θ) = H_p* S_p [|H_p|² S_p + |ξ| S_ν]⁻¹. Implement this filter in the Radon transform domain, as shown in Fig. 10.9, to arrive at the filter A_p = |ξ|Ã_p.


10.15 Compare the operation counts of the Fourier method with the convolution/filter back-projection methods. Assume N × N image size with aN projections, a ≈ constant.
10.16 (Radon inversion formula for divergent rays)
a. Starting with the inverse Radon transform in polar coordinates, show that the reconstructed object from fan-beam geometry projections b(α, β) can be written as

f_p(r, φ) = (1/4π²) ∫₀^{2π} ∫_{−γ}^{γ} [∂b(α, β)/∂α − ∂b(α, β)/∂β] / [r cos(α + β − φ) − R sin α] dα dβ,   |α| ≤ γ

b. Rewrite the preceding result as a generalized convolution back-projection result, called the Radon inversion formula for divergent rays, in terms of the function

ψ(α, β, α') ≜ [(α' − α)/sin(α' − α)] [(1/ρ) ∂b(α, β)/∂α − (1/ρ) ∂b(α, β)/∂β],   |α| ≤ γ;   0,   |α| > γ

where

α' = tan⁻¹ [r cos(β − φ) / (R + r sin(β − φ))],   ρ ≜ {[r cos(β − φ)]² + [R + r sin(β − φ)]²}^{1/2} > 0

Show that α' and ρ correspond to a ray (α', β) that goes through the object at location (r, φ) and that ρ is the distance between the source and (r, φ). The inner integral in the above Radon inversion formula is the Hilbert transform of ψ(α, ·, ·) and the outer integral is analogous to back projection.
c. Develop a practical reconstruction algorithm by replacing the Hilbert transform by a bandlimited filter, as in the case of parallel beam geometry.
10.17 (Two-stage reconstruction in three dimensions)
a. Referring to Fig. 10.19, rotate the x- and y-axes by an angle φ, that is, let x' = x cos φ + y sin φ, y' = −x sin φ + y cos φ, and obtain

g(s, φ, θ) = g(s, a) = ∬ f_φ(x', z) δ(x' sin θ + z cos θ − s) dx' dz

where f_φ and g are the two-dimensional Radon transforms of f (with z constant) and f_φ (with φ constant), respectively, that is,

f_φ(x', z) = ∬ f(x, y, z) δ(x cos φ + y sin φ − x') dx dy

b. Develop the block diagram for a digital implementation of the two-stage reconstruction algorithm.

BIBLIOGRAPHY

Section 10.1

For image formation models of CT, PET, MRI and an overview of computerized tomography:

1. IEEE Trans. Nucl. Sci. Special issues on topics related to image reconstruction. NS-21, no. 3 (1974); NS-26, no. 2 (April 1979); NS-27, no. 3 (June 1980).
2. IEEE Trans. Biomed. Engineering. Special issue on computerized medical imaging. BME-28, no. 2 (February 1981).
3. Proc. IEEE. Special issue on computerized tomography. 71, no. 3 (March 1983).
4. A. C. Kak. "Image Reconstruction from Projections," in M. P. Ekstrom (ed.), Digital Image Processing Techniques. New York: Academic Press, 1984, pp. 111-171.
5. G. T. Herman (ed.). Image Reconstruction from Projections. Topics in Applied Physics, vol. 32. New York: Springer-Verlag, 1979.
6. G. T. Herman. Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. New York: Academic Press, 1980.
7. H. J. Scudder. "Introduction to Computer Aided Tomography." Proc. IEEE 66, no. 6 (June 1978).
8. Z. H. Cho, H. S. Kim, H. B. Song, and J. Cumming. "Fourier Transform Nuclear Magnetic Resonance Tomographic Imaging." Proc. IEEE 70, no. 10 (October 1982): 1152-1173.
9. W. S. Hinshaw and A. H. Lent. "An Introduction to NMR Imaging," in [3].
10. D. C. Munson, Jr., J. O'Brien, and K. W. Jenkins. "A Tomographic Formulation of Spotlight Mode Synthetic Aperture Radar." Proc. IEEE 71 (August 1983): 917-925.

Literature on image reconstruction also appears in other journals, such as J. Comput. Asst. Tomo., Science, Brit. J. Radiol., J. Magn. Reson. Medicine, Comput. Biol. Med., and Medical Physics.

11. J. L. C. Sanz, E. B. Hinkle, and A. K. Jain. Radon and Projection Transform-Based Machine Vision: Algorithms, A Pipeline Architecture, and Industrial Applications. Berlin: Springer-Verlag, 1988. Also see Journal of Parallel and Distributed Computing 4, no. 1 (Feb. 1987): 45-78.

Sections 10.2-10.5

Fundamentals of Radon transform theory appear in several of the above references, such as [4-7], and:

12. J. Radon. "Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten" (On the determination of functions from their integrals along certain manifolds). Berichte Sächsische Akad. Wissenschaften (Leipzig), Math. Phys. Klasse 69 (1917): 262-277.
13. D. Ludwig. "The Radon Transform on Euclidean Space." Commun. Pure Appl. Math. 19 (1966): 49-81.

Bibliography Chap. 10 473


14. D. E. Kuhl and R. Q. Edwards. "Image Separation Radioisotope Scanning." Radiology 80, no. 4 (1963): 653-662.
15. P. F. C. Gilbert. "The Reconstruction of a Three-Dimensional Structure from Projections and Its Application to Electron Microscopy: II. Direct Methods." Proc. Roy. Soc. London Ser. B 182 (1972): 89-102.
16. P. R. Smith, T. M. Peters, and R. H. T. Bates. "Image Reconstruction from Finite Numbers of Projections." J. Phys. A: Math. Nucl. Gen. 6 (1973): 361-382. Also see New Zealand J. Sci. 14 (1971): 883-896.
17. S. R. Deans. The Radon Transform and Some of Its Applications. New York: Wiley, 1983.

Section 10.6

For convolution/filter back-projection algorithms, simulations, and related details:

18. S. W. Rowland, in [5], pp. 9-79.
19. G. N. Ramachandran and A. V. Lakshminarayanan. "Three-Dimensional Reconstruction from Radiographs and Electron Micrographs: II. Application of Convolutions instead of Fourier Transforms." Proc. Nat. Acad. Sci. 68 (1971): 2236-2240. Also see Indian J. Pure Appl. Phys. 9 (1971): 997-1003.
20. R. N. Bracewell and A. C. Riddle. "Inversion of Fan-Beam Scans in Radio Astronomy." Astrophys. J. 150 (1967): 427-437.
21. L. A. Shepp and B. F. Logan. "The Fourier Reconstruction of a Head Section." IEEE Trans. Nucl. Sci. NS-21, no. 3 (1974): 21-43.

Sections 10.7-10.8

Results on the Radon transform of random fields were introduced in:

22. A. K. Jain and S. Ansari. "Radon Transform Theory for Random Fields and Image Reconstruction from Noisy Projections." Proc. ICASSP, San Diego, 1984.
23. A. K. Jain. "Digital Image Processing: Problems and Methods," in T. Kailath (ed.), Modern Signal Processing. Washington: Hemisphere Publishing Corp., 1985.

For reconstruction from noisy projections see the above references and:

24. Z. Cho and J. Burger. "Construction, Restoration, and Enhancement of 2- and 3-Dimensional Images." IEEE Trans. Nucl. Sci. NS-24, no. 2 (April 1977): 886-895.
25. E. T. Tsui and T. F. Budinger. "A Stochastic Filter for Transverse Section Reconstruction." IEEE Trans. Nucl. Sci. NS-26, no. 2 (April 1979): 2687-2690.

Section 10.9

26. R. N. Bracewell. "Strip Integration in Radio Astronomy." Aust. J. Phys. 9 (1956): 198-217.
27. R. A. Crowther, D. J. DeRosier, and A. Klug. "The Reconstruction of a Three-Dimensional Structure from Projections and Its Application to Electron Microscopy." Proc. Roy. Soc. London Ser. A 317 (1970): 319-340. Also see Nature (London) 217 (1968): 130-134.
28. G. N. Ramachandran. "Reconstruction of Substance from Shadow: I. Mathematical Theory with Application to Three Dimensional Radiology and Electron Microscopy." Proc. Indian Acad. Sci. 74 (1971): 14-24.
29. R. M. Mersereau and A. V. Oppenheim. "Digital Reconstruction of Multidimensional Signals from Their Projections." Proc. IEEE 62 (1974): 1319-1332.
30. R. F. King and P. R. Moran. "Unified Description of NMR Imaging Data Collection Strategies and Reconstruction." Medical Physics 11, no. 1 (1984): 1-14.

Sections 10.10-10.13

For fan-beam reconstruction theory, see [6, 7] and Horn in [1(iii), pp. 1616-1623]. For algebraic techniques and ART algorithms, see [5, 6] and:

31. S. Kaczmarz. "Angenäherte Auflösung von Systemen linearer Gleichungen." Bull. Acad. Polon. Sci. Lett. A 35 (1937): 355-357.
32. R. Gordon. "A Tutorial on ART (Algebraic Reconstruction Techniques)." IEEE Trans. Nucl. Sci. NS-21 (1974): 78.
33. P. F. C. Gilbert. "Iterative Methods for the Reconstruction of Three-Dimensional Objects from Projections." J. Theor. Biol. 36 (1972): 105-117.
34. G. T. Herman, A. Lent, and S. W. Rowland. "ART: Mathematics and Applications." J. Theor. Biol. 42 (1973): 1-32.
35. A. M. Cormack. "Representation of a Function by Its Line Integrals with Some Radiological Applications." J. Appl. Phys. 34 (1963): 2722-2727. Also see Part II, J. Appl. Phys. 35 (1964): 2908-2913.

For other applications of the Radon transform and its extensions:

36. M. Bernfeld. "CHIRP Doppler Radar." Proc. IEEE 72, no. 4 (April 1984): 540-541.
37. J. Raviv, J. F. Greenleaf, and G. T. Herman (eds.). Computer Aided Tomography and Ultrasonics in Medicine. Amsterdam: North-Holland, 1979.




Image Data Compression

11.1 INTRODUCTION
Image data compression is concerned with minimizing the number of bits required
to represent an image. Perhaps the simplest and most dramatic form of data
compression is the sampling of bandlimited images, where an infinite number of
pixels per unit area is reduced to one sample without any loss of information
(assuming an ideal low-pass filter is available). Consequently, the number of samples per unit area is infinitely reduced.
Applications of data compression are primarily in transmission and storage of
information. Image transmission applications are in broadcast television, remote
sensing via satellite, military communications via aircraft, radar and sonar, tele-
conferencing, computer communications, facsimile transmission, and the like.
Image storage is required for educational and business documents, medical images
that arise in computer tomography (CT), magnetic resonance imaging (MRI) and
digital radiology, motion pictures, satellite images, weather maps, geological sur-
veys, and so on. Application of data compression is also possible in the development
of fast algorithms where the number of operations required to implement an algo-
rithm is reduced by working with the compressed data.

Image Raw Data Rates


Typical television images have spatial resolution of approximately 512 × 512 pixels per frame. At 8 bits per pixel per color channel and 30 frames per second, this translates into a rate of nearly 180 × 10⁶ bits/s. Depending on the application, digital image raw data rates can vary from 10⁵ bits per frame to 10⁸ bits per frame or higher. The large channel capacity and memory requirements (see Table 1.1b) for digital image transmission and storage make it desirable to consider data compression techniques.
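The quoted figure is simple arithmetic, as a quick check shows (the three-channel count is implied by "per color channel"):

```python
# 512 x 512 pixels per frame, 8 bits per pixel per channel,
# 3 color channels, 30 frames per second
bits_per_frame = 512 * 512 * 8 * 3
rate = bits_per_frame * 30   # about 189e6 bits/s, i.e. "nearly 180 x 10^6"
```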

Data Compression versus Bandwidth Compression
The mere process of converting an analog video signal into a digital signal results in
increased bandwidth requirements for transmission. For example, a 4-MHz television signal sampled at the Nyquist rate with 8 bits per sample would require a bandwidth of 32 MHz when transmitted using a digital modulation scheme, such as phase
shift keying (PSK), which requires 1 Hz per 2 bits. Thus, although digitized informa-
tion has advantages over its analog form in terms of processing flexibility, random
access in storage, higher signal to noise ratio for transmission with the possibility of
errrorless communication, and so on, one has to pay the price in terms of this
eightfold increase in bandwidth. Data compression techniques seek to minimize this
cost and sometimes try to reduce the bandwidth of the digital signal below its analog
bandwidth requirements.
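The eightfold figure follows directly from the numbers just given; a quick arithmetic check (a sketch, using only the quantities stated in the text):

```python
# Bandwidth expansion from digitizing a 4-MHz analog video signal,
# following the PSK example in the text (2 bits per Hz).
analog_bandwidth_hz = 4e6
nyquist_rate = 2 * analog_bandwidth_hz        # samples per second
bits_per_sample = 8
bit_rate = nyquist_rate * bits_per_sample     # 64 Mbits/s
digital_bandwidth_hz = bit_rate / 2           # PSK: 1 Hz carries 2 bits

assert digital_bandwidth_hz == 32e6           # the 32 MHz quoted in the text
print(digital_bandwidth_hz / analog_bandwidth_hz)  # eightfold increase
```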
Image data compression methods fall into two common categories. In the first category, called predictive coding, are methods that exploit redundancy in the data. Redundancy is a characteristic related to such factors as predictability, randomness, and smoothness in the data. For example, an image of constant gray level is fully predictable once the gray level of the first pixel is known. On the other hand, a white noise random field is totally unpredictable, and every pixel has to be stored to reproduce the image. Techniques such as delta modulation and differential pulse code modulation fall into this category. In the second category, called transform coding, compression is achieved by transforming the given image into another array such that a large amount of information is packed into a small number of samples. Other image data compression algorithms exist that are generalizations or combinations of these two methods. The compression process inevitably results in some distortion because of the accompanying A to D conversion as well as the rejection of some relatively insignificant information. Efficient compression techniques tend to minimize this distortion. For digitized data, distortionless compression techniques are possible. Figure 11.1 gives a summary classification of various data compression techniques.

[Figure 11.1 Image data compression techniques: pixel coding (PCM/quantization, run-length coding, bit-plane coding); predictive coding (delta modulation, line-by-line DPCM, 2-D DPCM, interframe techniques, adaptive techniques); transform coding (zonal coding, threshold coding, multidimensional coding, adaptive techniques); other methods (hybrid coding, two-tone/graphics coding, color image coding, vector quantization, miscellaneous).]

Information Rates

Raw image data rate does not necessarily represent its average information rate, which for a source with L possible independent symbols with probabilities p_i, i = 0, ..., L - 1, is given by the entropy

H = -Σ_{i=0}^{L-1} p_i log2 p_i   bits per symbol    (11.1)
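The entropy (11.1) is straightforward to evaluate numerically; a small sketch (the 4-symbol probability set is illustrative, not from the text):

```python
import math

def entropy(p):
    """Entropy H = -sum p_i log2 p_i of eq. (11.1), in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Illustrative 4-symbol source: a fixed-length code needs 2 bits/symbol,
# but the entropy is only 1.75 bits/symbol.
p = [0.5, 0.25, 0.125, 0.125]
H = entropy(p)   # 1.75
```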

According to Shannon's noiseless coding theorem (Section 2.13), it is possible to code, without distortion, a source of entropy H bits per symbol using H + ε bits per symbol, where ε is an arbitrarily small positive quantity. Then the maximum achievable compression C, defined by

C = (average bit rate of the original raw data (B)) / (average bit rate of the encoded data (H + ε))    (11.2)

is B/(H + ε) ≈ B/H. Computation of such a compression ratio for images is impractical, if not impossible. For example, an N x M digital image with B bits per pixel is one of L = 2^(BNM) possible image patterns that could occur. Thus, if p_i, the probability of the ith image pattern, were known, one could compute the entropy, that is, the information rate, for B bits per pixel N x M images. Then one could store all the L possible image patterns and encode the image by its address using a suitable encoding method, which will require approximately H bits per image, or H/NM bits per pixel.
Such a method of coding is called vector quantization, or block coding [12]. The main difficulty with this method is that even for small values of N and M, L can be prohibitively large; for example, for B = 8 and N = M = 16, L = 2^2048 ≈ 10^616. Figure 11.2 shows a practical adaptation of this idea for vector quantization of 4 x 4 image blocks with B = 6. Each block is normalized to have zero mean and unity variance. Using a few prototype training images, the most probable subset containing L' ≪ L images is stored. If the input block is one of these L' blocks, it is coded by the address of the block; otherwise it is replaced by its mean value.
The entropy of an image can also be estimated from its conditional entropy. For a block of N pixels u_0, u_1, ..., u_{N-1}, with B bits per pixel and arranged in an arbitrary order, the Nth-order conditional entropy is defined as

H_N = -Σ p(u_0, ..., u_{N-1}) log2 p(u_{N-1} | u_{N-2}, ..., u_0)    (11.3)

[Figure 11.2 Vector quantization of images: the coder computes the mean μ and standard deviation σ of each 4 x 4 block, normalizes the block, matches it against stored prototype patterns u_l, l = 0, 1, ..., L - 1, and transmits the matched pattern index i together with μ and σ; the decoder reads pattern u_i(m, n) and reconstructs σ u_i(m, n) + μ.]

where the sum is over all values of u_0, ..., u_{N-1}, each u_i, i = 0, ..., N - 1, takes 2^B values, and p(·|·, ...) represents the relevant conditional probabilities. For 8-bit monochrome television-quality images, the zero- to second-order entropies (with nearest-neighbor ordering) generally lie in the range of 2 to 6 bits/pixel. Theoretically, for ergodic sequences, as N → ∞, H_N converges to H, the per-pixel entropy. Shannon's theory tells us that the bit rate of any exact coding method can never be below the entropy H.
Subsampling, Coarse Quantization,
Frame Repetition, and Interlacing
One obvious method of data compression would be to reduce the sampling rate, the number of quantization levels, and the refresh rate (number of frames per second) down to the limits of the aliasing, contouring, and flickering phenomena, respectively. The distortions introduced by subsampling and coarse quantization for a given level of compression are generally much larger than those of the more sophisticated methods available for data compression. To avoid flicker in motion images, successive frames have to be refreshed above the critical fusion frequency (CFF), which is 50 to 60 pictures per second (Section 3.12). To capture motion, however, a refresh rate of 25 to 30 frames per second is generally sufficient. Thus, a compression of 2 to 1 could be achieved by transmitting (or storing) only 30 frames per second but refreshing at 60 frames per second by repeating each frame. This requires a frame store, but an image breakup or jump effect (not flicker) is often observed. Note that the frame repetition rate is chosen at 60 per second rather than 55 per second, for instance, to avoid any interference with the line frequency of 60 Hz (in the United States).
Instead of frame skipping and repetition, line interlacing is found to give better visual rendition. Each frame is divided into an odd field containing the odd line addresses and an even field containing the even line addresses; the two fields are transmitted alternately. Each field is displayed at half the refresh rate in frames per second. Although the jump or image breakup effect is significantly reduced by line interlacing, spatial frequency resolution is somewhat degraded because each field is a subsampled image. An appropriate increase in the scan rate (that is, lines per frame) with line interlacing gives an actual compression of about 37% for the same subjective quality at the 60 frames per second refresh rate without repetition. The success of this method rests on the fact that the human visual system has poor response for simultaneously occurring high spatial and temporal frequencies. Other interlacing techniques, such as vertical line interlacing in each field (Fig. 4.9), can reduce the data rate further without introducing aliasing if the spatial frequency spectrum does not contain simultaneously horizontal and vertical high frequencies (such as diagonal edges). Interlacing techniques are unsuitable for the display of high-resolution graphics and other computer-generated images that contain sharp edges and transitions. Such images are commonly displayed on a large raster (e.g., 1024 x 1024) refreshed at 60 Hz.

11.2 PIXEL CODING

In these techniques each pixel is processed independently, ignoring the interpixel dependencies.


PCM

In PCM the incoming video signal is sampled, quantized, and coded by a suitable code word before being fed to a digital modulator for transmission (Fig. 11.3). The quantizer output is generally coded by a fixed-length binary code word having B bits. Commonly, 8 bits are sufficient for monochrome broadcast- or videoconferencing-quality images, whereas medical images or color video signals may require 10 to 12 bits per pixel.

The number of quantizing bits needed for visual display of images can be reduced to 4 to 8 bits per pixel by using the companding, contrast quantization, or dithering techniques discussed in Chapter 4. Halftone techniques reduce the quantizer output to 1 bit per pixel, but usually the input sampling rate must be increased by a factor of 2 to 16. The compression achieved by these techniques is generally less than 2:1.

In terms of a mean square distortion, the minimum achievable rate by PCM is given by the rate-distortion formula

R_PCM = (1/2) log2(σ_u²/σ_q²),   σ_q² < σ_u²    (11.4)

where σ_u² is the variance of the quantizer input and σ_q² is the quantizer mean square distortion.

Entropy Coding

If the quantized pixels are not uniformly distributed, then their entropy will be less than B, and there exists a code that uses fewer than B bits per pixel. In entropy coding the goal is to encode a block of M pixels containing MB bits, whose possible values occur with probabilities p_i, i = 0, 1, ..., L - 1, L = 2^(MB), using -log2 p_i bits each, so that the average bit rate approaches the entropy H. This gives a variable-length code for each block, where highly probable blocks (or symbols) are represented by short code words, and vice versa. If -log2 p_i is not an integer, the achieved rate exceeds H but approaches it asymptotically with increasing block size. For a given block size, a technique called Huffman coding is the most efficient fixed-to-variable-length encoding method.

The Huffman Coding Algorithm

1. Arrange the symbol probabilities p_i in decreasing order and consider them as leaf nodes of a tree.

[Figure 11.3 Pulse code modulation (PCM): the input u(t) is sampled and held, quantized by a zero-memory quantizer, and coded before digital modulation and transmission.]


2. While there is more than one node:
   - Merge the two nodes with smallest probability to form a new node whose probability is the sum of the two merged nodes.
   - Arbitrarily assign 1 and 0 to each pair of branches merging into a node.
3. Read the code word of each symbol sequentially from the root node to the leaf node where the symbol is located.
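The steps above can be rendered compactly with a heap (a sketch; the merge order for equal probabilities is arbitrary, so individual code words may differ from any particular tree, but the average length is always optimal):

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code book {symbol: bit string} from {symbol: probability},
    following the tree construction described in the text."""
    # Heap entries are (probability, tie_breaker, {symbol: partial code});
    # the unique tie_breaker keeps tuple comparison away from the dicts.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # merge the two least probable nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

# Probabilities of Fig. 11.4; the resulting average length is 2.79 bits,
# matching the figure, even if individual code words differ.
probs = {"s0": 0.25, "s1": 0.21, "s2": 0.15, "s3": 0.14,
         "s4": 0.0625, "s5": 0.0625, "s6": 0.0625, "s7": 0.0625}
book = huffman_code(probs)
avg_length = sum(p * len(book[s]) for s, p in probs.items())
```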

The preceding algorithm gives the Huffman code book for any given set of
probabilities. Coding and decoding is done simply by looking up values in a table.
Since the code words have variable length, a buffer is needed if, as is usually the
case, information is to be transmitted over a constant-rate channel. The size of the
code book is L and the longest code word can have as many as L bits. These
parameters become prohibitively large as L increases. A practical version of Huff-
man code is called the truncated Huffman code. Here, for a suitably selected L 1 < L,
the first L 1 symbols are Huffman coded and the remaining symbols are Coded by a
prefix code followed by a suitable fixed-length code. .
Another alternative is called the modified Huffman code, where the integer i is represented as

i = qL_1 + j,   0 ≤ q ≤ Int[(L - 1)/L_1],   0 ≤ j ≤ L_1 - 1    (11.5)

The first L_1 symbols are Huffman coded. The remaining symbols are coded by a prefix code representing the quotient q, followed by a terminator code, which is the same as the Huffman code for the remainder j, 0 ≤ j ≤ L_1 - 1.
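The index split in (11.5) is just integer division; a small illustration (the choice L_1 = 4 here is arbitrary):

```python
L1 = 4  # number of Huffman-coded symbols (illustrative choice)

def modified_huffman_split(i):
    """Split symbol index i into (q, j) per eq. (11.5): i = q*L1 + j.
    q is sent as a prefix code and j as the Huffman code of the remainder."""
    return divmod(i, L1)

q, j = modified_huffman_split(11)   # 11 = 2*4 + 3
assert (q, j) == (2, 3)
```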
• •
The long-term histogram for television images is approximately uniform, although the short-term statistics are highly nonstationary. Consequently, entropy coding is not very practical for raw image data. However, it is quite useful in predictive and transform coding algorithms and also for coding of binary data such as graphics and facsimile images.
Example 11.1

Figure 11.4 shows an example of the tree structure and the Huffman codes. The algorithm gives code words that can be uniquely decoded, because no code word can be a prefix of any longer code word. For example, if the Huffman coded bit stream is

0 0 0 1 0 1 1 0 1 0 1 1 ...

then the symbol sequence is s_0 s_2 s_1 s_3 .... A prefix code (circled elements) is obtained by reading the code of the leaves that lead up to the first node that serves as a root for the truncated symbols. In this example there are two prefix codes (Fig. 11.4). For the truncated Huffman code the symbols s_4, ..., s_7 are coded by a 2-bit binary code word. This code happens to be less efficient than the simple fixed-length binary code in this example, but this is not typical of the truncated Huffman code.

[Figure 11.4 Huffman coding example: symbols s_0, ..., s_7 with binary codes 000, ..., 111 and probabilities 0.25, 0.21, 0.15, 0.14, 0.0625, 0.0625, 0.0625, 0.0625. The tree construction yields the Huffman code (HC), a truncated Huffman code (THC), and a modified Huffman code (MHC). Average code lengths: 3.0 (binary), 2.79 (HC), 3.06 (THC), and 2.915 (MHC) bits against an entropy of 2.781 bits, giving code efficiencies of 92.7%, 99.7%, 90.3%, and 95.4%, respectively. x, x', x'' denote prefix codes, y a fixed-length code, z a terminator code; in general z, x', x'' can be different.]

Run-Length Coding

Consider a binary source whose output is coded as the number of 0s between two successive 1s; that is, the lengths of the runs of 0s are coded. This is called run-length


coding (RLC). It is useful whenever large runs of 0s are expected. Such a situation occurs in printed documents, graphics, weather maps, and so on, where p, the probability of a 0 (representing a white pixel), is close to unity. (See Section 11.9.) Suppose the runs are coded in maximum lengths of M and, for simplicity, let M = 2^m - 1. Then it will take m bits to code each run by a fixed-length code. If the successive 0s occur independently, then the probability distribution of the run lengths turns out to be the geometric distribution

g(l) = p^l (1 - p),  0 ≤ l ≤ M - 1;   g(M) = p^M    (11.6)

Since a run length of l ≤ M - 1 implies a sequence of l 0s followed by a 1, that is, (l + 1) symbols, the average number of symbols per run will be

μ = Σ_{l=0}^{M-1} (l + 1) p^l (1 - p) + M p^M = (1 - p^M)/(1 - p)    (11.7)

Thus it takes m bits to establish a run-length code for a sequence of μ binary symbols, on the average. The compression achieved is, therefore,

C = μ/m = (1 - p^M)/(m(1 - p))    (11.8)
For p = 0.9 and M = 15 we obtain m = 4, μ = 7.94, and C = 1.985. The achieved average rate is B_a = m/μ = 0.516 bit per pixel, and the code efficiency, defined as H/B_a, is 0.469/0.516 ≈ 91%. For a given value of p, the optimum value of M can be determined to give the highest efficiency. RLC efficiency can be improved further by going to a variable-length coding method, such as Huffman coding, for the blocks of length m. Another alternative is to use arithmetic coding [10] instead of RLC.

Bit-Plane Encoding [11]

A 256 gray-level image can be considered as a set of eight 1-bit planes, each of which can be run-length encoded. For 8-bit monochrome images, compression ratios of 1.5 to 2 can be achieved. This method becomes very sensitive to channel errors unless the significant bit planes are carefully protected.
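Bit-plane decomposition is a matter of shifts and masks; a sketch (the tiny "image" is illustrative):

```python
def bit_planes(pixels, bits=8):
    """Split a list of gray levels into `bits` binary planes;
    planes[0] is the least significant, planes[bits-1] the most."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits)]

def reassemble(planes):
    """Inverse operation: sum the planes back into gray levels."""
    return [sum(plane[i] << b for b, plane in enumerate(planes))
            for i in range(len(planes[0]))]

img = [0, 128, 200, 255]
assert reassemble(bit_planes(img)) == img   # lossless decomposition
```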

11.3 PREDICTIVE TECHNIQUES

Basic Principle

The philosophy underlying predictive techniques is to remove mutual redundancy between successive pixels and encode only the new information. Consider a sampled sequence u(m), which has been coded up to m = n - 1, and let u*(n - 1), u*(n - 2), ... be the values of the reproduced (decoded) sequence. At m = n, when u(n) arrives, a quantity ū(n), an estimate of u(n), is predicted from the previously decoded samples u*(n - 1), u*(n - 2), ..., that is,

ū(n) = ψ(u*(n - 1), u*(n - 2), ...)    (11.9)

where ψ(·, ·, ...) denotes the prediction rule. Now it is sufficient to code the prediction error

e(n) = u(n) - ū(n)    (11.10)

If e*(n) is the quantized value of e(n), then the reproduced value of u(n) is taken as

u*(n) = ū(n) + e*(n)    (11.11)

The coding process continues recursively in this manner. This method is called differential pulse code modulation (DPCM) or differential PCM. Figure 11.5 shows
[Figure 11.5 Differential pulse code modulation (DPCM) CODEC: the coder forms e(n) = u(n) - ū(n), quantizes it to e*(n), transmits it over the communication channel, and reconstructs u*(n) = ū(n) + e*(n) with a predictor (with delays) in a feedback loop; the decoder is the same predictor loop, that is, the reconstruction filter.]

the DPCM codec (coder-decoder). Note that the coder has to calculate the reproduced sequence u*(n). The decoder is simply the predictor loop of the coder. Rewriting (11.10) as

u(n) = ū(n) + e(n)    (11.12)

and subtracting (11.11) from (11.12), we obtain

δu(n) = u(n) - u*(n) = e(n) - e*(n) = q(n)    (11.13)

Thus, the pointwise coding error in the input sequence is exactly equal to q(n), the quantization error in e(n). With a reasonable predictor the mean square value of the differential signal e(n) is much smaller than that of u(n). This means that, for the same mean square quantization error, e(n) requires fewer quantization bits than u(n).
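The recursion (11.9) to (11.11) takes only a few lines for a previous-element predictor; the step-3 uniform quantizer below is a hypothetical choice, not from the text:

```python
def dpcm(u, quantize, predict=lambda decoded: decoded[-1]):
    """DPCM-code sequence u; returns (quantized errors, reproduced sequence).
    The predictor operates on the *decoded* samples, per eq. (11.9)."""
    decoded = [u[0]]                  # first sample transmitted separately
    errors = []
    for x in u[1:]:
        pred = predict(decoded)       # u_bar(n), eq. (11.9)
        e = x - pred                  # e(n),     eq. (11.10)
        eq = quantize(e)              # e*(n)
        decoded.append(pred + eq)     # u*(n),    eq. (11.11)
        errors.append(eq)
    return errors, decoded

uniform3 = lambda e: 3 * round(e / 3)   # hypothetical step-3 uniform quantizer
u = [100, 104, 109, 113, 110]
_, rec = dpcm(u, uniform3)
# Per eq. (11.13), each coding error u(n) - u*(n) is exactly the
# quantization error of that step's e(n), so it never accumulates.
```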

Feedback Versus Feedforward Prediction

An important aspect of DPCM is (11.9), which says the prediction is based on the output (the quantized samples) rather than the input (the unquantized samples). This results in the predictor being in a feedback loop around the quantizer, so that the quantizer error at a given step is fed back to the quantizer input at the next step. This has a stabilizing effect that prevents dc drift and accumulation of error in the reconstructed signal u*(n).

On the other hand, if the prediction rule is based on the past inputs (Fig. 11.6a), the signal reconstruction error would depend on all the past and present
[Figure 11.6 Feedforward coding: (a) with distortion, where the predictor operates on the past inputs and the quantized prediction error is entropy coded/decoded; (b) distortionless.]

[Figure 11.7 Quantizer used in Example 11.2: a 2-bit quantizer with output levels ±1 and ±5.]

quantization errors in the feedforward prediction-error sequence ε(n). Generally, the mean square value of this reconstruction error will be greater than that in DPCM, as illustrated by the following example (also see Problem 11.3).

Example 11.2

The sequence 100, 102, 120, 120, 120, 118, 116 is to be predictively coded using the previous-element prediction rule, ū(n) = u*(n - 1) for DPCM and ũ(n) = u(n - 1) for the feedforward predictive coder. Assume the 2-bit quantizer shown in Fig. 11.7 is used, except that the first sample is quantized separately by a 7-bit uniform quantizer, giving u*(0) = u(0) = 100. The following table shows how the reconstruction error builds up with the feedforward predictive coder, whereas it tends to stabilize with the feedback system of DPCM.

             Input          DPCM                                Feedforward Predictive Coder

         n   u(n)   ū(n)   e(n)  e*(n)  u*(n)  δu(n)      ũ(n)   ε(n)  ε*(n)  u*(n)  δu(n)

         0   100      -      -     -    100      0           -      -     -    100      0
         1   102    100      2     1    101      1         100      2     1    101      1
Edge →   2   120    101     19     5    106     14         102     18     5    106     14
         3   120    106     14     5    111      9         120      0    -1    105     15
         4   120    111      9     5    116      4         120      0    -1    104     16
         5   118    116      2     1    117      1         120     -2    -5     99     19
Distortionless Predictive Coding

In digital processing the input sequence u(n) is generally digitized at the source itself by a sufficient number of bits (typically 8 for images). Then u(n) may be considered as an integer sequence. By requiring the predictor outputs to be integer values, the prediction error sequence will also take integer values and can be entropy coded without distortion. This gives a distortionless predictive codec (Fig. 11.6b), whose minimum achievable rate would be equal to the entropy of the prediction-error sequence e(n).


Performance Analysis of DPCM

Denoting the mean square values of the quantization error q(n) and the prediction error e(n) by σ_q² and σ_e², respectively, and noting that (11.13) implies

E[(δu(n))²] = σ_q²    (11.14)

the minimum achievable rate by DPCM is given by the rate-distortion formula [see (2.116)]

R_DPCM = (1/2) log2(σ_e²/σ_q²)   bits/pixel    (11.15)

In deducing this relationship, we have used the fact that common zero-memory quantizers (for arbitrary distributions) do not achieve a rate lower than the Shannon quantizer for Gaussian distributions (see Section 4.9). For the same distortion σ_q² ≤ σ_e², the reduction in DPCM rate compared to PCM is [see (11.4)]

R_PCM - R_DPCM = (1/2) log2(σ_u²/σ_e²) = 1.66 log10(σ_u²/σ_e²)   bits/pixel    (11.16)
This shows that the achieved compression depends on the reduction of the variance ratio (σ_u²/σ_e²), that is, on the ability to predict u(n) and, therefore, on the intersample redundancy in the sequence. Also, the recursive nature of DPCM requires that the predictor be causal. For minimum prediction-error variance, the optimum predictor is the conditional mean E[u(n) | u*(m), m ≤ n - 1]. Because of the quantizer, this is a nonlinear function and is difficult to determine even when u(n) is a stationary Gaussian sequence. The optimum feedforward predictor is linear and shift invariant for such sequences, that is,

ũ(n) = φ(u(n - 1), ...) = Σ_k a(k) u(n - k)    (11.17)

If the feedforward prediction error ε(n) has variance β², then

β² ≤ σ_e²    (11.18)

This is true because ū(n) is based on the quantization-noise-containing samples {u*(m), m ≤ n - 1} and could never be better than ũ(n). As the number of quantization levels is increased to infinity, σ_e² will approach β². Hence, a lower bound on the rate achievable by DPCM is

R_DPCM ≥ R_min = (1/2) log2(β²/σ_q²)    (11.19)

When the quantization error is small, R_DPCM approaches R_min. This expression is useful because it is much easier to evaluate β² than σ_e² in (11.15), and it can be used to estimate the achievable compression. The SNR corresponding to σ_q² is given by

(SNR)_DPCM = 10 log10(σ_u²/σ_q²) = 10 log10[σ_u²/(σ_e² f(B))] ≤ 10 log10[σ_u²/(β² f(B))]    (11.20)

where f(B) is the quantizer mean square distortion function for a unit-variance input and B quantization bits. For an equal number of bits, the gain in SNR of DPCM over PCM is

(SNR)_DPCM - (SNR)_PCM = 10 log10(σ_u²/σ_e²) ≤ 10 log10(σ_u²/β²)   dB    (11.21)

which is proportional to the log of the variance ratio (σ_u²/β²). Using (11.16), we note that the increase in SNR is approximately 6(R_PCM - R_DPCM) dB, that is, 6 dB per bit of available compression.

From these measures we see that the performance of predictive coders depends on the design of the predictor and the quantizer. For simplicity, the predictor is designed without considering the quantizer effects. This means the prediction rule deemed optimum for ũ(n) is applied to estimate ū(n). For example, if ũ(n) is given by (11.17), then the DPCM predictor is designed as

ū(n) = φ(u*(n - 1), u*(n - 2), ...) = Σ_k a(k) u*(n - k)    (11.22)

In two (or higher) dimensions this approach requires finding the optimum causal prediction rules. Under the mean square criterion the minimum variance causal representations can be used directly. Note that the DPCM coder remains nonlinear even with the linear predictor of (11.22). However, the decoder will now be a linear filter. The quantizer is generally designed using the statistical properties of the innovations sequence ε(n), which can be estimated from the predictor design. Figure 11.8 shows a typical prediction-error signal histogram. Note that the prediction
[Figure 11.8 Prediction-error histogram.]


error takes large values near the edges. Often, the prediction error is modeled as a zero mean uncorrelated sequence with a Laplacian probability distribution, that is,

p(e) = (1/(√2 β)) exp(-√2 |e|/β)    (11.23)

where β² is its variance. The quantizer is generally chosen to be either the Lloyd-Max quantizer (for a constant bit rate at the output) or the optimum uniform quantizer (followed by an entropy coder to minimize the average rate). Practical predictive codecs differ with respect to realizations and the choices of predictors and quantizers. Some of the common classes of predictive codecs for images are described next.

Delta Modulation

Delta modulation (DM) is the simplest of the predictive coders. It uses a one-step delay function as a predictor and a 1-bit quantizer, giving a 1-bit representation of the signal. Thus

ū(n) = u*(n - 1),   e(n) = u(n) - u*(n - 1)    (11.24)

A practical DM system, which does not require sampling of the input signal, is shown in Fig. 11.9a. The predictor integrates the quantizer output, which is a sequence of binary pulses. The receiver is a simple integrator. Figure 11.9b shows typical input-output signals of a delta modulator. The primary limitations of delta modulation are (1) slope overload, (2) granularity noise, and (3) instability to channel errors. Slope overload occurs whenever there is a large jump or discontinuity in the signal, to which the quantizer can respond only in several delta steps. Granularity noise is the steplike nature of the output when the input signal is almost constant. Figure 11.10b shows the blurring effect of slope overload near the edges and the granularity effect in the constant gray-level background.

Both of these errors can be compensated to a certain extent by low-pass filtering the input and output signals. Slope overload can also be reduced by increasing the sampling rate, which will reduce the interpixel differences. However, the higher sampling rate will tend to lower the achievable compression. An alternative for reducing granularity while retaining simplicity is to go to a tristate delta modulator. The advantage is that a large number (65 to 85%) of pixels are found to be in the level, or 0, state, whereas the remaining pixels are in the ±1 states. Huffman coding the three states or run-length coding the 0 states with a 2-bit code for the other states yields rates around 1 bit per pixel for different images [14].

The reconstruction filter, which is a simple integrator, is unstable. Therefore, in the presence of channel errors, the receiver output can accumulate large errors. It can be stabilized by attenuating the predictor output by a positive constant less than 1 (called the leak). This will, however, not retain the simple realization of Fig. 11.9a.

For delta modulation of images, the signal is generally presented line by line, and no advantage is taken of the two-dimensional correlation in the data. When each scan line of the image is represented by a first-order AR process (after subtracting the mean),


[Figure 11.9 Delta modulation: (a) a practical system, in which the transmitter quantizes the difference between the low-pass filtered input and an integrated version of the quantized binary pulses, and the receiver is an integrator followed by a low-pass filter; (b) typical input-output signals, showing granularity noise and slope overload.]

u(n) = ρ u(n - 1) + ε(n),   E[ε(n)] = 0,   E[ε(n)ε(m)] = (1 - ρ²) σ_u² δ(m - n)    (11.25)

the SNR of the reconstructed signal is given, approximately, by (see Problem 11.4)

(SNR)_DM = 10 log10 [ {1 - (2ρ - 1) f(1)} / {2(1 - ρ) f(1)} ]   dB    (11.26)

Assuming the prediction error to be Gaussian and quantized by its Lloyd-Max quantizer and ρ = 0.95, the SNR is 12.8 dB, which is an 8.4-dB improvement over PCM at 1 bit per pixel. This amounts to a compression of 2.5, or a savings of about 1.5 bits per pixel.

[Figure 11.10 Examples of predictive coding: (a) original; (b) delta modulation, leak = 0.9; (c) line-by-line DPCM, 3 bits/pixel; (d) two-dimensional DPCM, 3 bits/pixel.]

Equations (11.25) and (11.26) indicate that the SNR of delta modulation can be improved by increasing ρ, which can be done by increasing the sampling rate of the quantizer output. For example, by doubling the sampling rate in this example, ρ will be increased to 0.975, and the SNR will increase by 3 dB. At the same time, however, the data rate is also doubled. Better performance can be obtained by going to adaptive techniques or increasing the number of quantizer bits, which leads to DPCM. In fact, a large number of the ills of delta modulation can be cured by DPCM, thereby making it a more attractive alternative for data compression.
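Equation (11.26) can be evaluated for the numbers quoted above; the value f(1) = 1 - 2/π used below is the Lloyd-Max mean square distortion of a 1-bit quantizer for a unit-variance Gaussian input (a standard result, stated here as an assumption):

```python
import math

f1 = 1 - 2 / math.pi       # Lloyd-Max distortion, 1 bit, Gaussian input
rho = 0.95
snr_dm = 10 * math.log10((1 - (2 * rho - 1) * f1)
                         / (2 * (1 - rho) * f1))   # eq. (11.26)
snr_pcm = -10 * math.log10(f1)                     # 1-bit PCM SNR
# snr_dm comes out near the 12.8 dB quoted in the text,
# roughly 8.4 dB above 1-bit PCM.
```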

Line-by-Line DPCM

In this method each scan line of the image is coded independently by the DPCM technique. Generally, a suitable AR representation is used for designing the predictor. Thus, if we have a pth-order stationary AR sequence (see Section 6.2)

u(n) - Σ_{k=1}^{p} a(k) u(n - k) = ε(n)    (11.27)

the DPCM system equations are

Predictor:  ū(n) = Σ_{k=1}^{p} a(k) u*(n - k)    (11.28a)

Quantizer input:  e(n) = u(n) - ū(n);  quantizer output e*(n)    (11.28b)

Reconstruction filter (reproduced output):  u*(n) = ū(n) + e*(n) = Σ_{k=1}^{p} a(k) u*(n - k) + e*(n)    (11.28c)

For the first-order AR model of (11.25), the SNR of a B-bit DPCM system output can be estimated as (Problem 11.6)

(SNR)_DPCM = 10 log10 [ (1 - ρ² f(B)) / ((1 - ρ²) f(B)) ]   dB    (11.29)

For ρ = 0.95 and a Laplacian density-based quantizer, roughly 8-dB to 10-dB SNR improvement over PCM can be expected at rates of 1 to 3 bits per pixel. Alternatively, for small distortion levels (f(B) ≈ 0), the rate reduction over PCM is [see (11.16)]

R_PCM - R_DPCM = (1/2) log2 [1/(1 - ρ²)]   bits/pixel    (11.30)

This means, for example, that the SNR of 6-bit PCM can be achieved by 4-bit line-by-line DPCM for ρ = 0.97. Figure 11.10c shows a line-by-line DPCM coded image at 3 bits per pixel.
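A quick numerical check of (11.30) supports the 6-bit PCM versus 4-bit DPCM claim:

```python
import math

def rate_saving(rho):
    """R_PCM - R_DPCM of eq. (11.30), in bits/pixel, for a first-order AR source."""
    return 0.5 * math.log2(1.0 / (1.0 - rho * rho))

saving = rate_saving(0.97)   # close to 2 bits/pixel, i.e., 6-bit PCM ~ 4-bit DPCM
```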

Two-Dimensional DPCM

The foregoing ideas can be extended to two dimensions by using the causal MVRs discussed in Chapter 6 (Section 6.6), which define a predictor of the form

ū(m, n) = Σ Σ_{(k,l)∈W} a(k, l) u(m - k, n - l)    (11.31)

where W is a causal prediction window. The coefficients a(k, l) are determined by solving (6.66) for x = 1, which minimizes the variance of the prediction error in the image. For common images it has been found that increasing the size of W beyond the four nearest (causal) neighbors (Fig. 11.11) does not give any appreciable reduction in prediction-error variance. Thus for row-by-row scanned images, it is sufficient to consider predictors of the form

ū(m, n) = a_1 u(m - 1, n) + a_2 u(m, n - 1) + a_3 u(m - 1, n - 1) + a_4 u(m - 1, n + 1)    (11.32)


[Figure 11.11 The four nearest causal neighbors (A, B, C, D) of pixel (m, n) used for prediction.]
Here a1, a2, a3, a4, and β² are obtained by solving the linear equations

    r(1, 0) = a1 r(0, 0) + a2 r(1, −1) + a3 r(0, 1) + a4 r(0, 1)
    r(0, 1) = a1 r(1, −1) + a2 r(0, 0) + a3 r(1, 0) + a4 r(1, −2)
    r(1, 1) = a1 r(0, 1) + a2 r(1, 0) + a3 r(0, 0) + a4 r(0, 2)     (11.33)
    r(1, −1) = a1 r(0, 1) + a2 r(1, −2) + a3 r(0, 2) + a4 r(0, 0)
    β² = E[e²(m, n)] = r(0, 0) − a1 r(1, 0) − a2 r(0, 1) − a3 r(1, 1) − a4 r(1, −1)
where r(k, l) is the covariance function of u(m, n). In the special case of the separable covariance function of (2.84), we obtain

    a1 = ρ1,  a2 = ρ2,  a3 = −ρ1 ρ2,  a4 = 0,  β² = σ²(1 − ρ1²)(1 − ρ2²)     (11.34)
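The normal equations (11.33) form a symmetric 4 × 4 linear system and can be solved directly. The sketch below (NumPy assumed) does this for an arbitrary covariance r(k, l); for the separable model it reproduces the closed-form solution (11.34).

```python
import numpy as np

def dpcm2d_coefficients(r):
    """Solve the normal equations (11.33) for a1..a4 and beta^2.
    r(k, l) is the stationary covariance of u(m, n), with r(k,l) = r(-k,-l)."""
    # Gram matrix of the samples u(m-1,n), u(m,n-1), u(m-1,n-1), u(m-1,n+1).
    A = np.array([
        [r(0, 0),  r(1, -1), r(0, 1), r(0, 1)],
        [r(1, -1), r(0, 0),  r(1, 0), r(1, -2)],
        [r(0, 1),  r(1, 0),  r(0, 0), r(0, 2)],
        [r(0, 1),  r(1, -2), r(0, 2), r(0, 0)],
    ])
    b = np.array([r(1, 0), r(0, 1), r(1, 1), r(1, -1)])
    a1, a2, a3, a4 = np.linalg.solve(A, b)
    beta2 = r(0, 0) - a1 * r(1, 0) - a2 * r(0, 1) - a3 * r(1, 1) - a4 * r(1, -1)
    return (a1, a2, a3, a4), beta2

# Separable covariance r(k,l) = rho1^|k| rho2^|l| recovers (11.34):
rho1, rho2 = 0.95, 0.93
(a1, a2, a3, a4), beta2 = dpcm2d_coefficients(
    lambda k, l: rho1 ** abs(k) * rho2 ** abs(l))
# a1 ~ rho1, a2 ~ rho2, a3 ~ -rho1*rho2, a4 ~ 0
```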

Recall from Chapter 6 that unlike the one-dimensional case, this solution of (11.33) can give rise to an unstable causal model. This means that while the prediction error variance will be minimized (ignoring the quantization effects), the reconstruction filter could be unstable, causing any channel error to be amplified greatly at the receiver. Therefore, the predictor has to be tested for stability and, if not stable, it has to be modified (at the cost of either increasing the prediction error variance or increasing the predictor order). Fortunately, for common monochrome image data (such as television images), this problem is rarely encountered.
Given the predictor as just described, the equations for a two-dimensional DPCM system become

    Predictor: ū(m, n) = a1 u*(m − 1, n) + a2 u*(m, n − 1) + a3 u*(m − 1, n − 1) + a4 u*(m − 1, n + 1)     (11.35a)
    Quantizer input: e(m, n) = u(m, n) − ū(m, n)     (11.35b)
    Reconstruction filter: u*(m, n) = ū(m, n) + e*(m, n)     (11.35c)

The performance bounds of this method can be evaluated via (11.19) and (11.20). An example of two-dimensional DPCM coding at 3 bits per pixel is shown in Fig. 11.10d.
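A minimal sketch of the closed-loop equations (11.35): the prediction is formed from already-reconstructed samples, so encoder and decoder stay in step. A uniform quantizer with unbounded levels stands in for the Lloyd-Max design, and zero boundary values are assumed outside the image; both are simplifications. The default coefficients are those of the separable model (11.34) with ρ1 = ρ2 = 0.95.

```python
import numpy as np

def dpcm2d(u, a=(0.95, 0.95, -0.9025, 0.0), step=8.0):
    """Closed-loop 2-D DPCM per (11.35): predict from *reconstructed*
    samples, quantize the prediction error, reconstruct."""
    a1, a2, a3, a4 = a
    M, N = u.shape
    ur = np.zeros((M + 1, N + 2))          # reconstructed, zero-padded border
    err = np.zeros_like(u, dtype=float)
    for m in range(M):
        for n in range(N):
            # ur[m+1, n+1] holds the reconstruction of u[m, n]
            pred = (a1 * ur[m, n + 1] + a2 * ur[m + 1, n] +
                    a3 * ur[m, n] + a4 * ur[m, n + 2])     # (11.35a)
            e = u[m, n] - pred                             # (11.35b)
            eq = step * np.round(e / step)                 # quantized error e*
            ur[m + 1, n + 1] = pred + eq                   # (11.35c)
            err[m, n] = e
    return ur[1:, 1:-1], err
```

Because the loop is closed, the reconstruction error equals the quantization error of e(m, n) and is bounded by step/2 per pixel for this quantizer.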
492 Image Data Compression Chap. 11
[Figure 11.12: SNR (dB) versus rate (bits/sample); curves shown for two-dimensional DPCM of a Gauss-Markov field with ρ = 0.95, line-by-line DPCM, and PCM of a Gaussian random variable.]

Figure 11.12 Performance of predictive coders.

Performance Comparisons

Figure 11.12 shows the theoretical SNR versus bit rate of two-dimensional DPCM of images modeled by (11.34) and (11.35) with a4 = 0. Comparison with one-dimensional line-by-line DPCM and PCM is also shown. Note that delta modulation is the same as 1-bit DPCM in these curves. In practice, two-dimensional DPCM does not achieve quite as much as a 20-dB improvement over PCM, as expected for random fields with parameters of (11.34). This is because the two-dimensional separable covariance model is overly optimistic about the variance of the prediction error. Figure 11.13 compares the coding-error images in one- and

[Figure 11.13: image panels — (a) one-dimensional, (b) two-dimensional]

Figure 11.13 One- and two-dimensional DPCM images coded at 1 bit (upper images) and 3 bits (lower images) and their errors in reproduction.


two-dimensional DPCM. The subjective quality of an image and its tolerance to
channel errors can be improved by two-dimensional predictors. Generally a
3-bit-per-pixel DPCM coder can give very good quality images. With Huffman
coding, the output rate of a 3-bit quantizer in two-dimensional DPCM can be
reduced to 2 to 2.5 bits per pixel average.

Remarks

Strictly speaking, the predictors used in DPCM are for zero mean data (that is, the dc value is zero). Otherwise, for a constant background μ, the predicted value

    ū(m, n) = (a1 + a2 + a3 + a4)μ     (11.36)

would yield a bias of (1 − a1 − a2 − a3 − a4)μ, which would be zero only if the sum of the predictor coefficients is unity. Theoretically, this will yield an unstable reconstruction filter (e.g., in delta modulation with no leak). This bias can be minimized by (1) choosing predictor coefficients whose sum is close to but less than unity, (2) designing the quantizer reconstruction level to be zero for inputs near zero, and (3) tracking the mean of the quantizer output and feeding the bias correction to the predictor.
The quantizer should be designed to limit the three types of degradations: granularity, slope overload, and edge-busyness. Coarsely placed inner levels of the quantizer cause granularity in the flat regions of the image. Slope overload occurs at high-contrast edges where the prediction error exceeds the extreme levels of the quantizer, resulting in blurred edges. Edge-busyness is caused at less sharp edges, where the reproduced pixels on adjacent scan lines have different quantization levels. In the region of edges the optimum mean square quantizer based on the Laplacian density for the prediction error sequence turns out to be too companded; that is, the inner quantization steps are too small, whereas the outer levels are too coarse, resulting in edge-busyness. A solution for minimizing these effects is to increase the number of quantizer levels and use an entropy coder for its outputs. This increases the dynamic range and the resolution of the quantizer. The average coder rate will now depend on the relative occurrences of the edges. Another alternative is to incorporate visual properties in the quantizer design using the visibility function [18]. In practice, standard quantizers are optimized iteratively to achieve appropriate subjective picture quality.
In hardware implementations of two-dimensional DPCM, the predictor is often simplified to minimize the number of multiplications per step. With reference to Fig. 11.11, some simplified prediction rules are discussed in Table 11.2. The choice of prediction rule is also influenced by the response of the reconstruction filter to channel errors. See Section 11.8 for details.
For interlaced image frames, the foregoing design principles are applied to each field rather than each frame. This is because successive fields are 1/60 s apart and the intrafield correlations are expected to be higher (in the presence of motion) than the pixel correlations in the de-interlaced adjacent lines.
Overall, DPCM is simple and well suited for real-time (video rate) hardware
implementation. The major drawbacks are its sensitivity to variations in image




statistics and to channel errors. Adaptive techniques can be used to improve the compression performance of DPCM. (Channel-error effects are discussed in Section 11.8.)

Adaptive Techniques

The performance of DPCM can be improved by adapting the quantizer and pre-
dictor characteristics to variations in the local statistics of the image data. Adaptive
techniques use a range of quantizing characteristics and/or predictors from which a
"current optimum" is selected according to local image properties. To eliminate the
overhead due to the adaptation procedure, previously coded pixels are used to
determine the mode of operation of the adaptive coder. In the absence of trans-
mission errors, this allows the receiver to follow the same sequence of decisions
made at the transmitter. Adaptive predictors are generally designed to improve the
subjective image quality, especially at the edges. A popular technique is to use
several predictors, each of which performs well if the image is highly correlated in a
certain direction. The direction of maximum correlation is computed from previ-
ously coded pixels and the corresponding predictor is chosen.
Adaptive quantization schemes are based on two approaches, as discussed next.

Adaptive gain control. For a fixed predictor, the variance of the prediction
error will fluctuate with changes in spatial details of the image. A simple adaptive
quantizer updates the variance of the prediction error at each step and adjusts the
spacing of the quantizer levels accordingly. This can be done by normalizing the
prediction error by its updated standard deviation and designing the quantizer levels
for unit variance inputs (Fig. 11.14a).
Let σe²(j) and σ̂e²(j) denote the variances of the quantizer input and output, respectively, at step j of a DPCM loop. (For a two-dimensional system, this means we are mapping (m, n) into j.) Since e*(j) is available at the transmitter as well as
[Figure 11.14: Adaptive quantization. (a) Adaptive gain control — the prediction error is normalized by a gain estimated from the coded output before quantization; (b) adaptive classification — an activity measure selects among quantizers.]

Figure 11.14 Adaptive quantization.


the receiver, it is easy to estimate σ̂e²(j). A simple estimate, called the exponential average variance estimator, is of the form

    σ̂e²(j + 1) = (1 − γ)[e*(j)]² + γ σ̂e²(j),  σ̂e²(0) = (e*(0))²,  j = 0, 1, …     (11.37)

where 0 < γ ≤ 1. For small quantization errors, we may use σ̂e(j) as an estimate of σe(j). For Lloyd-Max quantizers, since the variance of the input equals the sum of the variances of the output and the quantization error [see (4.47)], we can obtain the recursion for σe²(j) as

    σe²(j) = σ̂e²(j)/(1 − f(B))     (11.38)

    σe²(j + 1) = (1 − γ)[e*(j)]²/(1 − f(B)) + γ σe²(j)     (11.39)

where γ is a constant determined experimentally so that the mean square error is minimized. The above two estimates become poor at low rates, for example, when B = 1. An alternative, originally suggested for adaptive delta modulation [7], is to define a gain σe(j) = g(m, n), which is recursively updated as

    g(m, n) = Σ_{(k,l)∈W} α(k, l) g(m − k, n − l) M(|q(m − k, n − l)|)     (11.40)

where M(|q|) is a multiplier factor that depends on the quantizer levels q, and the α(k, l) are weights that sum to unity. Often α(k, l) = 1/Nw, where Nw is the number of pixels in the causal window W. For example (see Table 11.1), for a three-level quantizer (L = 3) using the predictor neighbors of Fig. 11.11 and the gain-control formula

    g(m, n) = ½[g(m − 1, n)M(|q(m − 1, n)|) + g(m, n − 1)M(|q(m, n − 1)|)]     (11.41)

the multiplier factor M(|q|) takes the values M(0) = 0.7, M(±q1) = 1.7. The values in Table 11.1 are based on experimental studies [19] on 8-bit images.
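The L = 3 row of Table 11.1 can be sketched as follows; clamping the gain to the table's [g_min, g_max] range is an assumption about how those limits are applied.

```python
def update_gain(g_left, g_up, q_left, q_up,
                M=lambda q: 0.7 if q == 0 else 1.7,
                g_min=5.0, g_max=55.0):
    """Recursive gain update (11.41) for the three-level quantizer of
    Table 11.1: average the multiplied gains of the left and upper
    neighbors, then clamp to [g_min, g_max]."""
    g = 0.5 * (g_left * M(abs(q_left)) + g_up * M(abs(q_up)))
    return min(max(g, g_min), g_max)

g = 10.0
g = update_gain(g, g, 0, 0)   # flat region: both multipliers are 0.7, g -> 7.0
```

In a flat region the gain shrinks (finer quantization); whenever an outer level ±q1 fires, the 1.7 multiplier expands it again.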

Adaptive classification. Adaptive classification schemes segment the image into different regions according to spatial detail, or activity, and different quantizer characteristics are used for each activity class (Fig. 11.14b). A simple

TABLE 11.1 Gain-Control Parameters for Adaptive Quantization in DPCM

                              Multipliers M(q)
    L    g_min    g_max    q = 0    ±q1    ±q2    ±q3
    3      5       55       0.7     1.7
    5      5       40       0.8     1.0    2.6
    7      4       32       0.6     1.0    1.5    4.0



measure of activity is the variance of the pixels in the neighborhood of the pixel to be predicted. The flat regions are quantized more finely than edges or detailed areas. This scheme takes advantage of the fact that noise visibility decreases with increased activity. Typically, up to four activity classes are sufficient. An example would be to divide the image into 16 × 16 blocks and classify each block into one of four classes. This requires only a small overhead of 2 bits per block of 256 pixels.
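A block classifier of this kind can be sketched as below (NumPy assumed); using the variance quartiles as class thresholds, so the classes are roughly equally occupied, is a design assumption rather than the only choice.

```python
import numpy as np

def classify_blocks(img, bsize=16, nclasses=4):
    """Segment an image into bsize x bsize blocks and assign each an
    activity class from its sample variance; quartile thresholds give
    roughly equal class occupancy. Overhead: 2 bits/block for 4 classes."""
    M, N = img.shape
    blocks = img[:M - M % bsize, :N - N % bsize].reshape(
        M // bsize, bsize, N // bsize, bsize)
    var = blocks.var(axis=(1, 3))                       # per-block activity
    edges = np.quantile(var, np.linspace(0, 1, nclasses + 1)[1:-1])
    return np.digitize(var, edges)                      # 0 = flattest class

rng = np.random.default_rng(0)
classes = classify_blocks(rng.normal(size=(64, 64)), bsize=16)
```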

TABLE 11.2 Summary of Predictive Coding

Design parameter             Comments

Predictor                    Predictors of orders 3 to 4 are adequate.
  Linear mean square         Determined from image correlations. Performs very well
                             as long as the image class does not change very much.
  Previous element, γA       Sharp vertical or diagonal edges are blurred and exhibit
                             edge-busyness. A channel error manifests itself as a
                             horizontal streak.
  Averaged prediction        Significant improvement over previous-element prediction
    a. γ(A + D)/2            for vertical and most sloping edges. Horizontal and
    b. γ(A + C)/2            gradually rising edges are blurred. The two predictors
    c. γ[A + (C + D)/2]/2    using pixel D perform equally well but better than
                             (A + C)/2 on gradually rising edges. Edge-busyness and
                             sensitivity to channel errors are much reduced (Fig. 11.38).
  Planar prediction          Better than previous-element prediction but worse than
    a. γ(A + C − B)          averaged prediction with respect to edge-busyness and
    b. γ(A + D − B)          channel errors (Fig. 11.38).
  Leak (γ)                   0 < γ < 1. As the leak is increased, transmission errors
                             become less visible, but granularity and contouring
                             become more visible.
Quantizer
  a. Optimum mean square     Recommended when the compression ratio is not too
     (Lloyd-Max)             high (≤3) and a fixed-length code is used. The prediction
                             error may be modeled by Laplacian or Gaussian
                             probability densities.
  b. Visual                  Difficult to design. One alternative is to perturb the
                             levels of the Lloyd-Max quantizer to obtain increased
                             subjective quality.
  c. Uniform                 Useful in high-compression schemes (>3) where the
                             quantizer output is entropy coded.

Other Methods [17, 20]

At low bit rates (B = 1) the performance of DPCM deteriorates rapidly. One reason is that the predictor and the quantizer, which were designed independently, no longer operate at near-optimum levels. Thus the successive inputs to the quantizer may have significant correlation, and the predictor may not be good enough. Two methods that can improve the performance are

1. Delayed predictive coding
2. Predictive vector quantization

In the first method [17], a tree code is generated by the prediction filter excited by
different quantization levels. As successive pixels are coded, the predictor selects a
path in the tree (rather than a branch value, as in DPCM) such that the mean square
error is minimized. Delays are introduced in the predictor to enable development of
a tree with sufficient look-ahead paths.
In the second method [20], the successive inputs to the quantizer are entered in a shift register, whose state is used to define the quantizer output value. Thus the quantizer's current output depends on its previous outputs.

11.4 TRANSFORM CODING THEORY

The Optimum Transform Coder

Transform coding, also called block quantization, is an alternative to predictive coding. A block of data is unitarily transformed so that a large fraction of its total energy is packed into relatively few transform coefficients, which are quantized independently. The optimum transform coder is defined as the one that minimizes the mean square distortion of the reproduced data for a given number of total bits. This turns out to be the KL transform.
Suppose an N × 1 random vector u with zero mean and covariance R is linearly transformed by an N × N (complex) matrix A, not necessarily unitary, to produce a (complex) vector v such that its components v(k) are mutually uncorrelated (Fig. 11.15). After quantizing each component v(k) independently, the output vector v′ is linearly transformed by a matrix B to yield a vector u′. The problem is to find the optimum matrices A and B and the optimum quantizers such that the overall average mean square distortion

    D = (1/N) E[ Σ_{n=1}^{N} (u(n) − u′(n))² ] = (1/N) E[(u − u′)*ᵀ(u − u′)]     (11.42)

is minimized. The solution of this problem is summarized as follows:

1. For an arbitrary quantizer the optimal reconstruction matrix B is given by

       B = A⁻¹Γ     (11.43)

   where Γ is a diagonal matrix of elements γk defined as

       γk ≜ λ̄k / λ′k     (11.44a)

       λ̄k ≜ E[v(k)v′*(k)],  λ′k ≜ E[|v′(k)|²]     (11.44b)
[Figure 11.15: The vector u is transformed by A into v; each component v(k) is quantized by one of N independent quantizers to give v′(k); the transform B maps v′ into the output u′.]

Figure 11.15 One-dimensional transform coding.

2. The Lloyd-Max quantizer for each v(k) minimizes the overall mean square error, giving

       Γ = I     (11.45)

3. The optimal decorrelating matrix A is the KL transform of u; that is, the rows of A are the orthonormalized eigenvectors of the autocovariance matrix R. This gives

       B = A⁻¹ = A*ᵀ     (11.46)

Proofs

1. In terms of the transformed vectors v and v′, the distortion can be written as

       D = (1/N) E[(A⁻¹v − Bv′)*ᵀ(A⁻¹v − Bv′)]     (11.47a)

         = (1/N) Tr[A⁻¹Λ(A⁻¹)*ᵀ + BΛ′B*ᵀ − A⁻¹Λ̄B*ᵀ − BΛ̄*ᵀ(A⁻¹)*ᵀ]     (11.47b)

   where

       Λ ≜ E[vv*ᵀ],  Λ′ ≜ E[v′(v′)*ᵀ],  Λ̄ ≜ E[v(v′)*ᵀ]     (11.47c)

   Since v(k), k = 0, 1, …, N − 1 are uncorrelated and are quantized independently, the matrices Λ, Λ′, and Λ̄ are diagonal with λk, λ′k, and λ̄k as their respective diagonal elements. Minimizing D by differentiating it with respect to B* (or B), we obtain (see Problem 2.15)

       BΛ′ − A⁻¹Λ̄ = 0  ⟹  B = A⁻¹Λ̄(Λ′)⁻¹     (11.48)
Sec. 11.4 Transform Coding Theory 499


which gives (11.43) and

    D = (1/N) Tr[A⁻¹ E[(v − Γv′)(v − Γv′)*ᵀ] (A⁻¹)*ᵀ]     (11.49a)

2. The preceding expression for distortion can also be written in the form

       D = (1/N) Σ_{j=0}^{N−1} Σ_{k=0}^{N−1} |[A⁻¹]_{j,k}|² λk σq²(k)     (11.49b)

   where σq²(k) is the distortion as if v(k) had unity variance, that is,

       σq²(k) ≜ E[|v(k) − γk v′(k)|²]/λk     (11.49c)

   From this it follows that σq²(k) should be minimized for each k by the quantizer no matter what A is.† This means we have to minimize the mean square error between the quantizer input v(k) and its scaled output γk v′(k). Without loss of generality, we can absorb γk inside the quantizer and require it to be a minimum mean square quantizer. For a given number of bits, this would be the Lloyd-Max quantizer. Note that γk becomes unity for any quantizer whose output levels minimize the mean square quantization error regardless of its decision levels. For the Lloyd-Max quantizer, not only does γk equal unity, but its decision levels are also such that the mean square quantization error is minimum. Thus (11.45) is true and we get B = A⁻¹. This gives

       λk σq²(k) = E[|v(k) − v′(k)|²] = λk f(nk)     (11.50)

   where f(x) is the distortion-rate function of an x-bit Lloyd-Max quantizer for unity variance inputs (Table 4.4). Substituting (11.50) in (11.49b), we obtain

       D = (1/N) Tr[A⁻¹FΛ(A⁻¹)*ᵀ]     (11.51)

   where F is the diagonal matrix of elements f(nk). Since v equals Au, its covariance is given by the diagonal matrix

       E[vv*ᵀ] ≜ Λ = ARA*ᵀ     (11.52)

   Substitution for Λ in (11.51) gives

       D = (1/N) Tr[A⁻¹FAR]     (11.53)

   where F and R do not depend on A. Minimizing D with respect to A, we obtain (see Problem 2.15)

       0 = ∂D/∂A = −(1/N)[A⁻¹FARA⁻¹]ᵀ + (1/N)[RA⁻¹F]ᵀ  ⟹  F(ARA⁻¹) = (ARA⁻¹)F     (11.54)

   Thus F and ARA⁻¹ commute. Because F is diagonal, ARA⁻¹ must also be diagonal. But ARA*ᵀ is also diagonal. Therefore, these two matrices must be related by a diagonal matrix G as

†Note that σq(k) is independent of the transform A.



       ARA*ᵀ = (ARA⁻¹)G     (11.55)

   This implies AA*ᵀ = G, so the columns of A must be orthogonal. If A is replaced by G^(−1/2)A, the overall result of transform coding will remain unchanged because B = A⁻¹. Therefore, A can be taken as a unitary matrix, which proves (11.46). This result and (11.52) imply that A is the KL transform of u (see Sections 2.9 and 5.11).

Remarks

Not being a fast transform in general, the KL transform can be replaced either by a fast unitary transform, such as the cosine, sine, DFT, Hadamard, or Slant transform, which is not a perfect decorrelator, or by a fast decorrelating transform, which is not unitary. In practice, the former choice gives better performance (Problem 11.9).
The foregoing result establishes the optimality of the KL transform among all decorrelating transformations. It can be shown that it is also optimal among all unitary transforms (see Problem 11.8) and also performs better than DPCM (which can be viewed as a nonlinear transform; Problem 11.10).

Bit Allocation and Rate-Distortion Characteristics


The transform coefficient variances are generally unequal, and therefore each requires a different number of quantizing bits. To complete the transform coder design we have to allocate a given number of total bits among all the transform coefficients so that the overall distortion is minimum. Referring to Fig. 11.15, for any unitary transform A, arbitrary quantizers, and B = A⁻¹ = A*ᵀ, the distortion becomes

    D = (1/N) Σ_{k=0}^{N−1} E[|v(k) − v′(k)|²] = (1/N) Σ_{k=0}^{N−1} σk² f(nk)     (11.56)

where σk² is the variance of the transform coefficient v(k), which is allocated nk bits, and f(·), the quantizer distortion function, is monotone convex with f(0) = 1 and f(∞) = 0. We are given a desired average bit rate per sample, B; then the rate for the A-transform coder is

    R_A ≜ (1/N) Σ_{k=0}^{N−1} nk = B     (11.57)

The bit allocation problem is to find nk ≥ 0 that minimize the distortion D, subject to (11.57). Its solution is given by the following algorithm.
Bit Allocation Algorithm

Step 1. Define the inverse function of f′(x) ≜ df(x)/dx as h(x) ≜ f′⁻¹(x), so that h(f′(x)) = x. Find θ, the root of the nonlinear equation

    R_A ≜ (1/N) Σ_{k: σk² > θ} h(θf′(0)/σk²) = B     (11.58)

The solution may be obtained by an iterative technique such as the Newton method. The parameter θ is a threshold that controls which transform coefficients are to be coded for transmission.

Step 2. The number of bits allocated to the kth transform coefficient is given by

    nk = { 0,                σk² < θ
         { h(θf′(0)/σk²),    σk² ≥ θ     (11.59)

Note that the coefficients whose mean square value falls below θ are not coded at all.

Step 3. The minimum achievable distortion is then

    D = (1/N) [ Σ_{k: σk² > θ} σk² f(nk) + Σ_{k: σk² ≤ θ} σk² ]     (11.60)

Sometimes we specify the average distortion D = d rather than the average rate B. In that case (11.60) is first solved for θ. Then (11.59) and (11.58) give the bit allocation and the minimum achievable rate. Given nk, the number of quantizer levels can be approximated as Int[2^nk]. Note that nk is not necessarily an integer. This algorithm is also useful for calculating the rate versus distortion characteristics of a transform coder based on a given transform A and a quantizer with distortion function f(x).
In the special case of the Shannon quantizer, we have f(x) = 2^(−2x), which gives

    f′(x) = −(2 ln 2)2^(−2x)  ⟹  h(x) = −½ log₂[−x/(2 ln 2)]

    nk = max[0, ½ log₂(σk²/θ)]     (11.61)

    D = (1/N) [ Σ_{k: σk² ≤ θ} σk² + Σ_{k: σk² > θ} θ ] = (1/N) Σ_k min(θ, σk²)     (11.62)

    R_A = (1/N) Σ_{k=0}^{N−1} max[0, ½ log₂(σk²/θ)]     (11.63)
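For the Shannon quantizer the threshold θ in (11.61)-(11.63) can be found by a simple search, since R_A(θ) decreases monotonically in θ. The sketch below (NumPy assumed) uses bisection in the logarithmic domain.

```python
import numpy as np

def allocate_bits(variances, B, iters=60):
    """Bit allocation (11.61)-(11.63) for f(x) = 2^(-2x): find theta so
    that the average rate equals B, by bisection on theta."""
    v = np.asarray(variances, dtype=float)
    rate = lambda th: np.maximum(0.0, 0.5 * np.log2(v / th)).mean()  # (11.63)
    # Bracket: rate(lo) >= B >= rate(hi) = 0.
    lo, hi = v.max() * 4.0 ** (-B * len(v)), v.max()
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                 # geometric bisection
        lo, hi = (mid, hi) if rate(mid) > B else (lo, mid)
    theta = np.sqrt(lo * hi)
    n = np.maximum(0.0, 0.5 * np.log2(v / theta))     # (11.61)
    D = np.minimum(theta, v).mean()                   # (11.62)
    return n, theta, D
```

The returned nk are not integers; rounding or the integer algorithm below would be applied in practice.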

More generally, when f(x) is modeled by piecewise exponentials as in Table 4.4, we can similarly obtain the bit allocation formulas [32]. Equations (11.62) and (11.63) give the rate-distortion bound for transform coding of an N × 1 Gaussian random vector u by a unitary transform A. This means for a fixed distortion D, the rate R_A will be lower than the rate achieved by using any practical quantizer. When D is small enough so that 0 < θ < min{σk²}, we get θ = D, and

    R_A = (1/N) Σ_{k=0}^{N−1} ½ log₂(σk²/D) = (1/2N) Σ_{k=0}^{N−1} log₂ σk² − ½ log₂ D     (11.64)

In the case of the KL transform, σk² = λk and Πk λk = |R|, which gives

    R_KL = (1/2N) log₂|R| − ½ log₂ D     (11.65)

For small but equal distortion levels,

    R_A − R_KL = (1/2N) log₂ [ Π_{k=0}^{N−1} σk² / |R| ] ≥ 0     (11.66)

where we have used (2.43) to give

    |R| = |ARA*ᵀ| ≤ Π_{k=0}^{N−1} [ARA*ᵀ]_{k,k} = Π_{k=0}^{N−1} σk²

For PCM coding, it is equivalent to assuming A = I, so that

    R_PCM − R_KL = −(1/2N) log₂|R̄| ≥ 0     (11.67)

where R̄ ≜ {r(m, n)/σu²} is the correlation matrix of u and σu² is the variance of its elements.
Example 11.3

The determinant of the covariance matrix R = {ρ^|m−n|} of a Markov sequence of length N is |R| = (1 − ρ²)^(N−1). This gives

    R_KL = (1/2N) log₂ [(1 − ρ²)^(N−1)/D^N],  D < min{λk}     (11.68)

For N = 16 and ρ = 0.95, the value of min{λk} is 0.026 (see Table 5.2). So for D = 0.01, we get R_KL ≈ 1.81 bits per sample. Rearranging (11.68) we can write

    R_KL = ½ log₂ [(1 − ρ²)/D] − (1/2N) log₂(1 − ρ²)     (11.69)

As N → ∞, the rate R_KL goes down to a lower bound R_KL(∞) = ½ log₂[(1 − ρ²)/D], and R_PCM − R_KL(∞) = −½ log₂(1 − ρ²) ≈ 1.6 bits per sample. Also, as N → ∞, the eigenvalues of R follow the distribution λ(ω) = (1 − ρ²)/(1 + ρ² + 2ρ cos ω), which gives min{λ} = (1 − ρ²)/(1 + ρ)² = (1 − ρ)/(1 + ρ). For ρ = 0.95, D = 0.01 we obtain R_KL(∞) ≈ 1.6 bits per sample.

Integer Bit Allocation Algorithm. The number of quantizing bits nk is often specified as an integer. Then the solution of the bit allocation problem is obtained by applying a theory of marginal analysis [6, 21], which yields the following simple algorithm.

Step 1. Start with the allocation nk⁰ = 0, 0 ≤ k ≤ N − 1. Set j = 1.

Step 2. nk^j = nk^(j−1) + δ(k − i), where i is any index for which

    Δi,j ≜ σi² [f(ni^(j−1)) − f(ni^(j−1) + 1)]

is maximum. Δi,j is the reduction in distortion if the jth bit is allocated to the ith quantizer.



Step 3. If Σk nk^j ≥ NB, stop; otherwise j → j + 1 and go to Step 2.

If ties occur for the maximizing index, the procedure is successively initiated with the allocation nk^j = nk^(j−1) + δ(i − k) for each tied index i. This algorithm simply means that the marginal returns

    Δk,j = σk² [f(j − 1) − f(j)],  k = 0, …, N − 1,  j = 1, …, NB     (11.70)

are arranged in a decreasing order and bits are assigned one by one according to this order. For an average bit rate of B, we have to search N marginal returns NB times. This algorithm can be speeded up whenever the distortion function is of the form f(x) = a2^(−bx). Then Δk,j = (1 − 2^(−b))σk² f(nk^(j−1)), which means the quantizer having the largest distortion at any step j is allocated the next bit. Thus, as we allocate a bit, we update the quantizer distortion, and Step 2 of the algorithm becomes:

Step 2′. Find the index i for which Di ≜ σi² f(ni^(j−1)) is maximum. Then

    nk^j = nk^(j−1) + δ(k − i),  Di → 2^(−b) Di

The piecewise exponential models of Table 4.4 can be used to implement this step.
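For f(x) = a2^(−bx) the speeded-up procedure reduces to a greedy loop: repeatedly give the next bit to the quantizer with the largest current distortion.

```python
import numpy as np

def integer_allocate(variances, B, b=2.0):
    """Greedy marginal-analysis allocation for f(x) = 2^(-b x):
    at each step the next bit goes to the quantizer with the largest
    current distortion sigma_k^2 f(n_k)."""
    v = np.asarray(variances, dtype=float)
    n = np.zeros(len(v), dtype=int)
    dist = v.copy()                        # sigma_k^2 f(0) = sigma_k^2
    for _ in range(int(B * len(v))):       # NB bits in all
        i = int(np.argmax(dist))           # ties: lowest index wins
        n[i] += 1
        dist[i] *= 2.0 ** (-b)             # distortion after the extra bit
    return n

print(integer_allocate([16, 8, 4, 2, 1], B=1))   # -> [2 2 1 0 0]
```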

11.5 TRANSFORM CODING OF IMAGES

The foregoing one-dimensional transform coding theory can be easily generalized to two dimensions by simply mapping a given N × M image u(m, n) into a one-dimensional NM × 1 vector u. The KL transform of u would be a matrix of size NM × NM. In practice, this transform is replaced by a separable fast transform such as the cosine, sine, Fourier, Slant, or Hadamard; these, as we saw in Chapter 5, pack a considerable amount of the image energy into a small number of coefficients.
To make transform coding practical, a given image is divided into small rectangular blocks, and each block is transform coded independently. For an N × M image divided into NM/pq blocks, each of size p × q, the main storage requirements for implementing the transform are reduced by a factor of NM/pq. The computational load is reduced by a factor of log₂MN/log₂pq for a fast transform requiring αN log₂N operations to transform an N × 1 vector. For 512 × 512 images divided into 16 × 16 blocks, these factors are 1024 and 2.25, respectively. Although the operation count is not greatly reduced, the complexity of the hardware for implementing small-size transforms is reduced significantly. However, smaller block sizes yield lower compression, as shown by Fig. 11.16. Typically, a block size of 16 × 16 is used.

Two-Dimensional Transform Coding Algorithm. We now state a practical transform coding algorithm for images (Fig. 11.17).



[Figure 11.16: Rate (bits/sample) versus block size, 2 × 2 through 128 × 128.]

Figure 11.16 Rate achievable by block KL transform coders for Gaussian random fields with separable covariance function, ρ1 = ρ2 = 0.95, at distortion D = 0.25%.

[Figure 11.17: (a) Coder — each p × q block Ui is transformed as Vi = Ap Ui Aqᵀ, quantized, coded, and transmitted or stored. (b) Decoder — the coefficients are decoded and inverse transformed to give the reproduced block Ui′.]

Figure 11.17 Two-dimensional transform coding.

1. Divide the given image. Divide the image into small rectangular blocks of size p × q and transform each block to obtain Vi, i = 0, …, I − 1, I ≜ NM/pq.
2. Determine the bit allocation. Calculate the transform coefficient variances σ²k,l via (5.35) or Problem 5.29b if the image covariance function is given. Alternatively, estimate the variances σ̂²k,l from the ensemble of coefficients Vi(k, l), i = 0, …, I − 1, obtained from a given prototype image normalized to have unity variance. From this, the σ²k,l for an image with variance σ² are estimated as σ²k,l = σ̂²k,l σ². The σ²k,l can be interpreted as the power spectral density of the image blocks in the chosen transform domain.
   The bit allocation algorithms of the previous section can be applied after mapping σ²k,l into a one-dimensional sequence. The ideal case, where f(x) = 2^(−2x), yields the formulas

       nk,l = max[0, ½ log₂(σ²k,l/θ)]

       D = (1/pq) Σ_{k=0}^{p−1} Σ_{l=0}^{q−1} min(θ, σ²k,l)     (11.71)

       R_A = (1/pq) Σ_{k=0}^{p−1} Σ_{l=0}^{q−1} nk,l(θ)

   Alternatively, the integer bit allocation algorithm can be used. Figure 11.18 shows a typical bit allocation for 16 × 16 block coding of an image by the cosine transform to achieve an average rate B = 1 bit per pixel.

Sec. 11.5 Transform Coding of Images 505
3. Design the quantizers. For most transforms and common images (which are nonnegative) the dc coefficient Vi(0, 0) is nonnegative, and the remaining coefficients have zero mean values. The dc coefficient distribution is modeled by the Rayleigh density (see Problem 4.15). Alternatively, one-sided Gaussian or Laplacian densities can be used. For the remaining transform coefficients, Laplacian or Gaussian densities are used to design their quantizers. Since the transform coefficients are allocated unequal bits, we need a different quantizer for each value of nk,l. For example, in Fig. 11.18 the allocated bits range from 1 to 8; therefore, eight different quantizers are needed. To implement these quantizers, the input sample vi(k, l) is first normalized so that it has unity variance, that is,

       ṽi(k, l) ≜ vi(k, l)/σk,l,  (k, l) ≠ (0, 0)     (11.72)

   These coefficients are quantized by an nk,l-bit quantizer designed for zero mean, unity variance inputs. Coefficients that are allocated zero bits are
    8 7 6 5 3 3 2 2 2 1 1 1 1 1 0 0
    7 6 5 4 3 3 2 2 1 1 1 1 1 0 0 0
    6 5 4 3 3 2 2 2 1 1 1 1 1 0 0 0
    5 4 3 3 3 2 2 2 1 1 1 1 1 0 0 0
    3 3 3 3 2 2 2 1 1 1 1 1 0 0 0 0
    3 3 2 2 2 2 2 1 1 1 1 1 0 0 0 0
    2 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0
    2 2 2 2 1 1 1 1 1 1 1 0 0 0 0 0
    2 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
    1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Figure 11.18 Bit allocation for 16 × 16 block cosine transform coding of images modeled by the isotropic covariance function with ρ = 0.95. Average rate ≈ 1 bit per pixel.



   not processed at all. At the decoder, which knows the bit allocation table in advance, the unprocessed coefficients are replaced by zeros (that is, their mean values).
4. Code the quantizer output. Code the output into code words and transmit or store.
5. Reproduce the coefficients. Assuming a noiseless channel, reproduce the coefficients at the decoder as

       vi°(k, l) = { ṽi′(k, l)σk,l,  (k, l) ∈ I_t
                   { 0,  otherwise     (11.73)

   where I_t denotes the set of transmitted coefficients. The inverse transformation Ui° = A*ᵀVi°A* gives the reproduced image blocks.

Once a bit assignment for transform coefficients has been determined, the performance of the coder can be estimated by the relations

    D = (1/pq) Σ_{k=0}^{p−1} Σ_{l=0}^{q−1} σ²k,l f(nk,l),  R_A = (1/pq) Σ_{k=0}^{p−1} Σ_{l=0}^{q−1} nk,l     (11.74)
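The five steps above can be sketched end to end for the cosine transform (NumPy assumed). The DCT-II matrix is built directly, and uniform mid-tread quantizers with an ad hoc step-size rule stand in for the Lloyd-Max designs of Step 3, so this is an illustrative sketch rather than the text's exact coder.

```python
import numpy as np

def dct_matrix(p):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(p)[:, None]
    n = np.arange(p)[None, :]
    C = np.sqrt(2.0 / p) * np.cos(np.pi * (2 * n + 1) * k / (2 * p))
    C[0, :] = np.sqrt(1.0 / p)
    return C

def transform_code(img, nbits, p=16):
    """Block transform coding sketch: V_i = A U_i A^T per p x p block,
    uniform quantization of each retained coefficient according to the
    bit allocation nbits (p x p integers, 0 = not transmitted), then
    inverse transform U_i' = A^T V_i' A."""
    A = dct_matrix(p)
    out = np.zeros_like(img, dtype=float)
    for r in range(0, img.shape[0] - p + 1, p):
        for c in range(0, img.shape[1] - p + 1, p):
            V = A @ img[r:r+p, c:c+p] @ A.T
            # Ad hoc step rule: span the block's peak coefficient magnitude.
            step = np.where(nbits > 0, np.abs(V).max() / 2.0 ** nbits, 1.0)
            Vq = np.where(nbits > 0, step * np.round(V / step), 0.0)
            out[r:r+p, c:c+p] = A.T @ Vq @ A
    return out
```

Because A is orthonormal, the reconstruction error energy equals the coefficient quantization error energy, so generous bit allocations reproduce the image almost exactly.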

Transform Coding Performance Trade-Offs and Examples


Example 11.4 Choice of transform
Figure 11.19 compares the performance of different transforms for 16 x.16 block
coding of a random field. Table 11.3 shows examples of SNR values at different rates.
The cosine transform performance is superior to the other fast transforms and is almost
indistinguishable from the KL transform. Recall from Section 5.12 that the cosine
transform has near-optimal performance for first-order stationary Markov sequences
with p> 0.5. Considerations for the choice of other transforms are summarized in
Table 11.4. .... . . ,

[Figure 11.19: Distortion versus rate (bits/sample) curves; legend — cosine/KL, Hadamard, sine, Fourier.]

Figure 11.19 Distortion versus rate characteristics for different transforms for a two-dimensional isotropic random field.

TABLE 11.3 SNR Comparisons of Various Transform Coders for Random Fields with Isotropic Covariance Function, ρ = 0.95

                  Rate                        SNR (dB)
    Block size    (bits/pixel)    KL       Cosine    Sine     Fourier    Hadamard

    8 × 8         0.25            11.74    11.66      9.08    10.15      10.79
                  0.50            13.82    13.76     11.69    12.27      12.65
                  1.00            16.24    16.19     14.82    14.99      15.17
                  2.00            20.95    20.89     19.53    19.73      19.86
                  4.00            31.61    31.54     30.17    30.44      30.49
    16 × 16       0.25              —      12.35     10.37    10.77      10.99
                  0.50              —      14.25     12.82    12.87      12.78
                  1.00              —      16.58     15.65    15.52      15.27
                  2.00              —      21.26     20.37    20.24      20.01
                  4.00              —      31.90     31.00    30.88      30.69

Note: The KL transform would be nonseparable here.

Example 11.5 Choice of block size

The effect of block size on coder performance can easily be analyzed for the case of
separable covariance random fields, that is, r(m, n) = ρ^{|m| + |n|}. For a block size of
N × N, the covariance matrix can be written as

    ℛ = R ⊗ R  ⇒  |ℛ| = |R|^{2N} = (1 − ρ²)^{2N(N−1)}

where R is given by (2.68). Applying (11.69), the rate achievable by an N × N block KL
transform coder is

    R_KL(N) = (1/2) log₂[(1 − ρ²)²/D] − (1/2N) log₂(1 − ρ²),   D ≤ (1 + ρ)/(1 − ρ)²            (11.75)

When plotted as a function of N (Fig. 11.16), this shows that a block size of 16 × 16 is
suitable for ρ = 0.95. For higher values of the correlation parameter ρ, the block size
should be increased. Figure 11.20 shows some 16 × 16 block coded results.
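The block-size trade-off is easy to tabulate. The sketch below assumes the rate takes the form of (11.75) as reconstructed here, R_KL(N) = (1/2)log2[(1 − ρ²)²/D] − (1/2N)log2(1 − ρ²); only the second term depends on N, so enlarging the block has quickly diminishing returns:

```python
import math

def rate_kl(N, rho=0.95, D=0.01):
    """Bits/sample for an N x N block KL coder of a separable Markov
    field, per the assumed form of (11.75)."""
    s = 1.0 - rho ** 2
    return 0.5 * math.log2(s ** 2 / D) - math.log2(s) / (2.0 * N)

# The N-dependent term shrinks like 1/N: beyond 16 x 16 there is
# little left to gain at rho = 0.95.
rates = {N: rate_kl(N) for N in (2, 4, 8, 16, 32)}
```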
Example 11.6 Choice of covariance model

The transform coefficient variances are important for designing the quantizers. Al-
though the separable covariance model is convenient for the analysis and design of
transform coders, it is not very accurate. Figure 11.20 shows the results of 16 × 16
cosine transform coders based on the separable covariance model, the isotropic covar-
iance model, and the actual measured transform coefficient variances. As expected,
the actual measured variances yield the best coder performance. Generally, the
isotropic covariance model performs better than the separable covariance model.

Zonal Versus Threshold Coding

Examination of bit allocation patterns (Fig. 11.18) reveals that only a small zone of
the transformed image is transmitted (unless the average rate is very high). Let N_t be
the number of transmitted samples. We define a zonal mask as the array

    m(k, l) = 1,   (k, l) ∈ I_t
            = 0,   otherwise            (11.76)

508 Image Data Compression Chap. 11


[Figure 11.20: Two-dimensional 16 × 16 block cosine transform coding at 1 bit/pixel using different covariance models; the right half of each panel shows the error image. (a) Separable, SNR' = 37.5 dB; (b) isotropic, SNR' = 37.8 dB; (c) measured covariance, SNR' = 40.3 dB. SNR' is defined by Eq. (3.13).]
which takes the unity value in the zone of the largest N_t variances of the transformed
samples. Figure 11.21a shows a typical zonal mask. If we apply a zonal mask to the
transformed blocks and encode only the nonzero elements, then the method is
called zonal coding.
In threshold coding we encode the N_t coefficients of largest amplitude rather
than the N_t coefficients having the largest variances, as in zonal coding. The address
set of transmitted samples is now

    I_t' ≜ {(k, l) : |v(k, l)| > η}            (11.77)

where η is a suitably chosen threshold that controls the achievable average bit rate.
For a given ensemble of images, since the transform coefficient variances are fixed,

[Figure 11.21: Zonal and threshold masks. (a) A typical zonal mask; (b) a threshold mask; (c) ordering of coefficients for threshold coding.]


the zonal mask remains unchanged from one block to the next for a fixed bit rate.
However, the threshold mask (Fig. 11.21b), defined as

    m_t(k, l) = 1,   (k, l) ∈ I_t'
              = 0,   otherwise            (11.78)

can change because I_t', the set of largest-amplitude coefficients, need not be the
same for different blocks. The samples retained are quantized by a suitable uniform
quantizer followed by an entropy coder.
For the same number of transmitted samples (or quantizing bits), the thresh-
old mask gives a better choice of transmission samples (that is, lower distortion).
However, it also results in an increased rate because the addresses of the
transmitted samples, that is, the boundary of the threshold mask, have to be coded
for every image block. One method is to run-length code the transition boundaries
in the threshold mask line by line. Alternatively, the two-dimensional transform
coefficients are mapped into a one-dimensional sequence arranged in a predeter-
mined order, such as in Fig. 11.21c. The thresholded sequence transitions are then
run-length coded. Threshold coding is adaptive in nature and is useful for achieving
high compression ratios when the image contents change considerably from block to
block, so that a fixed zonal mask would be inefficient.
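The two masks differ only in what drives the selection: ensemble variances (fixed) versus per-block amplitudes (adaptive). A minimal numpy sketch, with made-up variances and coefficient values:

```python
import numpy as np

def zonal_mask(variances, n_keep):
    """Unity at the n_keep largest-VARIANCE positions, per (11.76);
    fixed for the whole ensemble at a given rate."""
    idx = np.argsort(variances, axis=None)[::-1][:n_keep]
    mask = np.zeros(variances.size, dtype=int)
    mask[idx] = 1
    return mask.reshape(variances.shape)

def threshold_mask(block, eta):
    """Unity where the coefficient MAGNITUDE exceeds eta, per
    (11.77)-(11.78); changes from block to block."""
    return (np.abs(block) > eta).astype(int)

var = np.outer(0.5 ** np.arange(4), 0.5 ** np.arange(4))   # model variances
v = np.array([[9.0, 0.2, 0.1, 0.0],
              [0.3, 4.0, 0.0, 0.0],
              [0.0, 0.0, 2.5, 0.0],
              [0.0, 0.0, 0.0, 0.1]])                       # one block
zm = zonal_mask(var, n_keep=6)     # clusters in the low-frequency corner
tm = threshold_mask(v, eta=1.0)    # follows this block's actual energy
```

The addresses where tm = 1 are what the threshold coder must additionally transmit, for example by run-length coding the transitions along the predetermined ordering of Fig. 11.21c.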

Fast KL Transform Coding

For first-order AR sequences and for certain random fields represented by low-
order noncausal models, fast KL transform coding approaches or exceeds the data
compression efficiency of block KL transform coders. Recall from Section 6.5 that
an N × 1 vector u whose elements u(n), 1 ≤ n ≤ N, come from a first-order, sta-
tionary AR sequence with zero mean and correlation ρ has the decomposition

    u = u⁰ + u_b            (11.79)

where u_b is completely determined by the boundary variables u(0) and u(N + 1)
(see Fig. 6.8), and u⁰ and u_b are mutually orthogonal random vectors. The KL trans-
form of the sequence {u⁰(n), 1 ≤ n ≤ N} is the sine transform, which is a fast trans-
form. Thus (11.79) expresses the N × 1 segment of a stationary Markov process as a
two-source model. The first source has a fast KL transform, and the second source
has only two degrees of freedom (that is, it is determined by two variables).
Suppose we are given the N + 2 elements u(n), 0 ≤ n ≤ N + 1. Then the N × 1
sequences u⁰(n) and u_b(n) are realized as follows. First the boundary variables u(0)
and u(N + 1) are passed through an interpolating FIR filter, which gives u_b(n), the
best mean square estimate of u(n), 1 ≤ n ≤ N, as

    (11.80)

Then, we obtain the residual sequence

    u⁰(n) ≜ u(n) − u_b(n),   1 ≤ n ≤ N            (11.81)

Instead of transform coding the original (N + 2) × 1 sequence by its KL trans-
form, u⁰ and u_b can be coded separately using three different methods [6, 27]. One
of these methods, called recursive block coding, is discussed here.



Recursive block coding. In the conventional block transform coding meth-
ods, the successive blocks of data are processed independently. The block size
should be large enough so that interblock redundancy is minimal. But large block
sizes spell large hardware complexity for the transformer. Also, at low bit rates (less
than 1 bit per pixel) the block boundaries start becoming visibly objectionable.
[Figure 11.22: Fast KL transform coding (recursive block coding); coder and decoder block diagrams. Each successive block brings (N + 1) new pixels.]



In recursive block coding (RBC), the correlation between successive blocks of
data is exploited through the use of block boundaries. This yields additional com-
pression and allows the use of smaller block sizes (8 × 8 or less) without sacrificing
performance. Moreover, this method significantly reduces the block boundary ef-
fects.
Figure 11.22 shows this method. For each block the boundary variable
u(N + 1) is first coded. The reproduced value u'(N + 1), together with the initial
value u'(0) (which is the boundary value of the previous block), are used to generate
u_b'(n), an estimate of u_b(n). The difference

    u⁰(n) ≜ u(n) − u_b'(n)

is sine transform coded. This yields better performance than conventional block
KLT coding (see Fig. 11.23).

Remarks

The interpolating FIR filter can be shown to be approximately the simple straight-line
interpolator when ρ is close to 1 (see Problem 6.13 and [27]). Hence, u_b can be
viewed as a low-resolution copy obtained by subsampling and interpolating the
original image. This fact can be utilized in image archival applications, where only
the low-resolution image is retrieved in search mode and the residual image is recalled
once the desired image has been located. This way the search process can be
speeded up.
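The boundary/residual split of (11.80)-(11.81) can be sketched directly. Two assumptions are made here: the straight-line interpolator stands in for the optimal FIR boundary filter (a good approximation when ρ is close to 1, as remarked above), and an explicit orthonormal sine matrix serves as the fast KL transform of the residual:

```python
import numpy as np

def rbc_decompose(u):
    """Split u(0), ..., u(N+1) into a boundary estimate u_b and a
    residual u0 on the interior samples, per (11.80)-(11.81).
    Straight-line interpolation approximates the optimal filter."""
    N = len(u) - 2
    t = np.arange(1, N + 1) / (N + 1)
    u_b = (1.0 - t) * u[0] + t * u[-1]   # interpolated boundary part
    u0 = u[1:-1] - u_b                   # residual, sine-transform coded
    return u_b, u0

def sine_transform(x):
    """Orthonormal sine transform (it is its own inverse): the fast KL
    transform of the residual process."""
    N = len(x)
    n = np.arange(1, N + 1)
    S = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(n, n) / (N + 1))
    return S @ x

u = np.array([1.0, 1.2, 1.5, 1.4, 1.6, 1.8])   # N = 4 interior samples
u_b, u0 = rbc_decompose(u)
v0 = sine_transform(u0)                         # coefficients to quantize
```

In RBC proper, u(0) and u(N + 1) are replaced by their quantized reproductions before the interpolation, so that coder and decoder form the same estimate u_b'.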

[Figure 11.23: Rate versus distortion characteristics of the conventional KLT coder and the fast KLT (RBC) coder.]





The foregoing theory can also be extended to second-order and higher AR
models [27]. In these cases the KL transform of the residuals u⁰(n) is no longer a fast
transform. This need not be a disadvantage for the recursive block coding algorithm,
because the transform size can now be quite small, so that a fast transform is not
necessary.
In two dimensions many noncausal random field models yield fast KLT de-
compositions (see Example 6.17). Two-dimensional fast KLT coding algorithms
similar to the ones just discussed can be designed using these decompositions.
Figure 11.24 shows a two-dimensional recursive block coder. Figure 11.25 compares
results of recursive block coding with cosine transform coding. The error images
show the reduction in the block effects.

Two-Source Coding

The decomposition of (11.79) represents a stationary source by two sources, which
can be realized from the image (block) and its boundary values. We could extend

[Figure 11.24: Two-dimensional recursive block coding; the block boundaries are coded along with each block, and already-coded data direct the boundary estimates.]



[Figure 11.25: Recursive block coding examples; two-dimensional results, block size = 8 × 8, entropy = 0.24 bits/pixel. (a) DCT; (b) RBC; (d) RBC distortion. Error images are magnified and enhanced to show coding distortion more clearly.]
this idea to represent an image as a nonstationary source

    u = u_s + u_f            (11.82)

where u_s is a stationary random field and u_f is a deterministic component that
represents certain features in the image. The two components are coded separately
to preserve the different features in the image. One method, considered in [28],
separates the image into its low- and high-spatial-frequency components. The high-
frequency component, obtained by employing the Laplacian operator, is used to
detect the edges, whose locations are encoded. The low-frequency component can
easily be encoded by transform or DPCM techniques. At the receiver a quantity




[Figure 11.26: Two-source coding via (a) synthetic highs (input signal, low-pass component, synthetic highs, synthesized edge); (b) detrending (input signal, corner points, local average, stationary residual).]

proportional to the Laplacian of a ramp function, called synthetic highs, is gener-
ated at the location of the edges. The reproduced image is the sum of the low-
frequency component and the synthetic highs (Fig. 11.26a).
In other two-source coding schemes (Yan and Sakrison in [1(c)]), the stationary
source is found by subtracting from the image a local average found by piecewise
fitting planes or low-order surfaces through boundary or corner points. The corner
points and changes in amplitude are coded (Fig. 11.26b). The residual signal, which
is a much better candidate for a stationary random field model, is transform coded.

Transform Coding Under Visual Criteria [30]

From Section 3.3 we know that a weighted mean square criterion can be useful for
the visual evaluation of images. An FFT coder that incorporates this criterion (Fig.
11.27) quantizes the transform coefficients of the image contrast field weighted by
H(k, l), the sampled frequency response function of the visual system. Inverse
weighting followed by the inverse FFT gives the reconstructed contrast field. To
apply this method to block image coding using arbitrary transforms, the image
contrast field should first be convolved with h(m, n), the sampled Fourier inverse of
H(ω₁, ω₂). The resulting field can then be coded by any desired method. At the
receiver, the decoded field must then be convolved with the inverse filter whose
frequency response is 1/H(ω₁, ω₂).

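The frequency-domain path of Fig. 11.27 amounts to weight, quantize, unweight. A small sketch; the Gaussian-plus-floor H below is a hypothetical visual response, and a uniform rounding quantizer stands in for the MMSE quantizer:

```python
import numpy as np

def visually_weighted_quantize(contrast, H, step=0.1):
    """FFT the contrast field, weight by H(k,l), quantize, inverse
    weight, inverse FFT, per the Fig. 11.27 scheme."""
    V = np.fft.fft2(contrast)
    W = V * H                          # frequency weighting
    Wq = step * np.round(W / step)     # stand-in uniform quantizer
    Vq = Wq / H                        # inverse weighting
    return np.real(np.fft.ifft2(Vq))

# Hypothetical low-pass visual response with a floor, on an 8 x 8 grid
k = np.fft.fftfreq(8)
H = 0.2 + np.exp(-8.0 * np.add.outer(k ** 2, k ** 2))
field = np.random.default_rng(0).normal(size=(8, 8))
recon = visually_weighted_quantize(field, H)
```

Because the quantization error is injected after weighting, it is shaped by 1/H on the way back: frequencies the visual system resolves well (large H) return with proportionally smaller error.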
Adaptive Transform Coding

There are essentially three types of adaptation for transform coding:

1. Adaptation of the transform
2. Adaptation of the bit allocation
3. Adaptation of the quantizer levels




[Figure 11.27: Transform coding under a visual criterion; F = unitary DFT. (a) Overall system: luminance L(m, n) is converted to contrast, transformed, visually quantized, and converted back to luminance at the decoder. (b) Frequency domain visual quantization: frequency weighting of v(k, l) by H(k, l), MMSE quantization, then inverse weighting by 1/H(k, l).]

Adaptation of the transform basis vectors is most expensive because a new set of KL
basis vectors is required whenever any change occurs in the statistical parameters. A
more practical method is to adapt the bit assignment of an image block, classified
into one of several predetermined categories, according to the spatial activity (for
instance, the variance of the data) in that block [1(c), p. 1285]. This results in a
variable average rate from block to block but gives a better utilization of the total
bits over the entire ensemble of image blocks. Another adaptive scheme is to
allocate bits to image blocks so that each block has the same distortion [29]. This
results in a uniform degradation of the image and appears less objectionable to the
eye.
In adaptive quantization schemes, the bit allocation is kept constant but the
quantizer levels are adjusted according to changes in the variances of the transform
coefficients. Transform domain variances may be estimated either by updating the
statistical parameters of the covariance model or by local averaging of the squared
magnitude of the transform domain samples. Examples of adaptive transform cod-
ing are given in Section 11.7, where we consider interframe transform coding.

Summary of Transform Coding

In summary, transform coding achieves relatively larger compression than predic-
tive methods. Any distortion due to quantization and channel errors gets distrib-
uted, during inverse transformation, over the entire image (Fig. 11.40). Visually,
this is less objectionable than predictive coding errors, which appear locally at the
source. Although transform and predictive coding schemes are theoretically close in
performance at low distortion levels for one-dimensional Markov sequences, their
performance difference in two dimensions is substantial. There are two reasons for
this. First, predictive coding is quite sensitive to changes in the statistics of the
data. Therefore, in practice, only adaptive predictive coding schemes achieve the
efficiency of (nonadaptive) transform coding methods. Second, in two dimensions,
finite-order causal predictors may never achieve compression ability close to trans-
form coding, because a finite-order causal representation of a two-dimensional
random field may not exist. From an implementation point of view, predictive
coding has much lower complexity, both in terms of memory requirements and the
number of operations to be performed. However, with the rapidly decreasing cost
of digital hardware and computer memory, the hardware complexity of transform
coders will not remain a disadvantage for very long. Table 11.4 summarizes the

TABLE 11.4  Practical Considerations in Designing Transform Coders

Design variables             Comments

1. Covariance model
   Separable, see (2.84)     Convenient to analyze. Actual performance lower
                             compared to nonseparable exponential.
   Nonseparable, see (2.85)  Works well when its parameters are matched to
                             the image class.
   Noncausal NC2             Useful in designing 2-D fast KLT coders [6].
   (see Section 6.9)
2. Block size (N)            Choice depends on available memory size and the
   16 × 16 to 64 × 64        value of the one-step correlation ρ. For ρ ≤ 0.9,
                             16 × 16 size is adequate. A rule of thumb is to
                             pick N such that ρ^N ≪ 1 (say 0.2). For smaller
                             block sizes, recursive block coding is helpful.
3. Transform                 This choice is important if block size is small,
                             say N ≤ 64.
   Cosine                    Performs best for highly correlated data (ρ ≥ 0.5).
   Sine                      For fast KL or recursive block coders.
   DFT                       Requires working with complex variables.
                             Recommended if use of the frequency domain is
                             mandatory, such as in visual coding, and in CT
                             and MRI image data, where the source data has
                             to pass through the Fourier domain.
   Hadamard                  Useful for small block sizes (about 4 × 4).
                             Implementation is much simpler than for
                             sinusoidal fast transforms.
   Haar                      Useful if higher spatial frequencies are to be
                             emphasized. Poor compression on a mean square
                             basis.
   KL                        Optimum on a mean square basis. Difficult to
                             implement. Cosine or other sinusoidal
                             transforms are preferable.
   Slant                     Best performance among nonsinusoidal fast
                             transforms.
4. Quantizer
   Lloyd-Max                 Either a Laplacian or a Gaussian density may be
                             used. For the dc coefficient a Rayleigh density
                             may be used.
   Optimum uniform           Useful if the output is entropy coded or if the
                             number of quantization levels is very large.


TABLE 11.5  Overall Performance of Transform Coding Methods

    Method                        Typical compression ratios for images
    One-dimensional                2-4
    Two-dimensional                4-8
    Two-dimensional adaptive       8-16
    Three-dimensional              8-16
    Three-dimensional adaptive    16-32

practical considerations in developing a design for transform coders. Table 11.5
compares the various transform coding schemes in terms of their compression
ability. Here the compression ratio is the ratio of the number of bits per pixel in
the original digital image (typically 8) and the average number of bits per pixel
required in encoding. The compression ratio values listed are those required to
achieve SNRs in the 30- to 36-dB range.

11.6 HYBRID CODING AND VECTOR DPCM

Basic Idea

The predictive coding techniques of Section 11.3 are based on raster scanning and
scalar recursive prediction rules. If the image is vector scanned, for instance, a
column at a time, then it is possible to generalize the DPCM techniques by
considering vector recursive predictors. Hybrid coding is a method of implementing
an N × 1 vector DPCM coder by N decoupled scalar DPCM coders. This is achieved
by combining transform and predictive coding techniques. Typically, the image is
unitarily transformed in one of its dimensions to decorrelate the pixels in that
direction. Each transform coefficient is then sequentially coded in the other direc-
tion by one-dimensional DPCM (Fig. 11.28). This technique combines the advan-
tages of the hardware simplicity of DPCM and the robust performance of transform
coding. The hardware complexity of this method is that of a one-dimensional
transform coder and at most N DPCM channels. In practice the number of DPCM

[Figure 11.28: Hybrid coding method; each transformed coefficient sequence v_n(k) is coded by its own DPCM (quantizer plus filter) channel, and the channel outputs are multiplexed onto the channel.]



channels is significantly less than N because many elements of the transformed
vector are allocated zero bits and are therefore not coded at all.

Hybrid Coding Algorithm. Let u_n, n = 0, 1, …, denote the N × 1 columns of
an image, which are transformed as

    v_n = Ψᵀu_n,   n = 0, 1, 2, …            (11.83)

For each k, the sequence v_n(k) is usually modeled by a first-order AR process [32],
as

    v_n(k) = a(k)v_{n−1}(k) + b(k)e_n(k),
    E[e_n(k)e_{n'}(k')] = σ_e²(k)δ(k − k')δ(n − n')            (11.84)

The parameters of this model can be identified from the covariances of v_n(k),
n = 0, 1, …, for each k (see Section 6.4). Some semicausal representations of
images can also be reduced to such models (see Section 6.9). The DPCM equations
for the kth channel can now be written as

    Predictor:         v̄_n(k) = a(k)v'_{n−1}(k)            (11.85a)
    Quantizer input:   ē_n(k) ≜ [v_n(k) − v̄_n(k)]/b(k)
    Quantizer output:  e'_n(k)            (11.85b)
    Filter:            v'_n(k) = v̄_n(k) + b(k)e'_n(k)            (11.85c)

The receiver simply reconstructs the transformed vectors according to (11.85c) and
performs the inverse transformation Ψ⁻¹. Ideally, the transform Ψ should be the
KL transform of u_n. In practice, a fast sinusoidal transform such as the cosine or
sine is used.
To complete the design we now need to specify the quantizer in each DPCM
channel. Let B denote the average desired bit rate in bits per pixel, n_k the
number of bits allocated to the kth DPCM channel, and σ_q²(k) the quantizer
mean square error in the kth channel, that is,

    B = (1/N) Σ_{k=1}^{N} n_k,   n_k ≥ 0            (11.86)

Assuming that all the DPCM channels are in their steady state, the average mean
square distortion in the coding of any vector (for noiseless channels) is simply the
average of the distortions in the various DPCM channels, that is,

    D = (1/N) Σ_{k=1}^{N} g_k(n_k)σ_e²(k),   g_k(x) ≜ f(x)/(1 − [a(k)]²f(x))            (11.87)

where f(x) and g_k(x) are the distortion-rate functions of the quantizer and the kth
DPCM channel, respectively, for unity variance prediction error (see Problem
11.6). The bit allocation problem for hybrid coding is to minimize (11.87) subject to
(11.86). This is now in the framework of the problem defined in Section 11.4, and
the algorithms given there can be applied.
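One DPCM channel of (11.85) is only a few lines. The sketch below feeds a synthetic AR(1) coefficient sequence through the loop; the uniform quantizer and all numeric values are illustrative assumptions:

```python
import numpy as np

def dpcm_channel(v, a, b, quantize):
    """Code one transform-coefficient sequence v_n(k) per (11.85)."""
    v_rec = np.empty_like(v, dtype=float)
    prev = 0.0                      # reproduced value v'_{n-1}(k)
    for i, vn in enumerate(v):
        pred = a * prev             # predictor          (11.85a)
        e = (vn - pred) / b         # quantizer input
        eq = quantize(e)            # quantizer output   (11.85b)
        prev = pred + b * eq        # filter             (11.85c)
        v_rec[i] = prev
    return v_rec

uniform = lambda x, step=0.05: step * round(x / step)   # stand-in quantizer

# Synthetic AR(1) channel, per the model (11.84)
rng = np.random.default_rng(1)
a_k, b_k = 0.9, 1.0
v = np.empty(64)
v[0] = rng.normal()
for i in range(1, 64):
    v[i] = a_k * v[i - 1] + b_k * rng.normal(scale=0.3)
v_rec = dpcm_channel(v, a_k, b_k, uniform)
```

Because the loop quantizes the normalized prediction error, the per-sample reconstruction error is bounded by b(k) times half the quantizer step.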




Example 11.7
Suppose the semicausal model (see Section 6.9 and Eq. (6.106))

    u(m, n) = α[u(m − 1, n) + u(m + 1, n)] + γu(m, n − 1) + ε(m, n),   u(m, 0) = 0, ∀m
    E[ε(m, n)] = 0,   E[ε(m, n)ε(i, j)] = β²δ(m − i, n − j)

is used to represent an N × M image with high interpixel correlation. At the bound-
aries we can assume u(0, n) = u(1, n) and u(N, n) = u(N + 1, n). With these boundary
conditions, this model has the realization of (11.84) for the cosine transformed columns
of the image, with a(k) = γ/λ(k), b(k) = 1/λ(k), σ_e²(k) = β², λ(k) ≜ 1 − 2α cos[(k − 1)π/N],
1 ≤ k ≤ N. In an actual experiment, a 256 × 256 image was coded in blocks of 16 × 256.
The integer bit allocation for B = 1 was obtained as

    3, 3, 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0

Thus, only the first eight cosine transform coefficients of each 16 × 1 column are used
for DPCM coding. Figure 11.29a shows the result of hybrid coding using this model.
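The channel parameters of Example 11.7 follow mechanically from λ(k). A small sketch; the α, γ, β² values below are hypothetical, chosen only so that λ(k) > 0:

```python
import numpy as np

def semicausal_channel_params(N, alpha, gamma, beta2):
    """AR parameters a(k), b(k), sigma_e^2(k) for the cosine-transformed
    columns, with lambda(k) = 1 - 2*alpha*cos((k-1)*pi/N)."""
    k = np.arange(1, N + 1)
    lam = 1.0 - 2.0 * alpha * np.cos((k - 1) * np.pi / N)
    return gamma / lam, 1.0 / lam, np.full(N, beta2)

a, b, s2 = semicausal_channel_params(N=16, alpha=0.45, gamma=0.05, beta2=1.0)
# lambda(1) is the smallest, so channel k = 1 has the strongest temporal
# correlation a(k) and the largest gain b(k) -- consistent with the
# low-order coefficients receiving most of the bits in the allocation above.
```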

Adaptive Hybrid Coding

By updating the AR model parameters with variations in image statistics, adaptive
hybrid coding algorithms can be designed [32]. One simple method is to apply the
adaptive variance estimation algorithm [see Fig. 11.14a and Eq. (11.38)] to each
DPCM channel. The adaptive classification method discussed earlier gives better
results, especially at low rates. Each image column is classified into one of I (typically
4) predetermined classes that are fixed according to the variance distribution
of the columns. Bits are allocated among the different classes so that columns of
high dynamic activity are assigned more bits than those of low activity. The class
[Figure 11.29: Hybrid encoded images at 1 bit/pixel. (a) Nonadaptive; (b) adaptive classification.]



membership information requires an additional overhead of log₂ I bits per column, or
(1/N) log₂ I bits per pixel. Figure 11.29 shows the result of applying the adaptive
algorithms to each DPCM channel of Example 11.7. Generally, the adaptive hybrid
coding schemes can improve upon the nonadaptive techniques by about 3 dB, which
is significant at low rates.

Hybrid Coding Conclusions

Hybrid coders combine the advantages of the simple hardware complexity of DPCM
coders and the high performance of transform coders, particularly at moderate bit
rates (around 1 bit per pixel). Their performance lies between transform coding and
DPCM. Hybrid coding is easily adaptable to noisy images [6, 32] and to changes in
image statistics. It is particularly useful for interframe image data compression of
motion images, as we shall see in Section 11.7. It is less sensitive to channel errors
than DPCM but is not as robust as transform coding. Hybrid coders have been
implemented for real-time data compression of images acquired by remotely piloted
vehicles (RPVs) [33].

11.7 INTERFRAME CODING

Teleconferencing, broadcast, and many medical images are received as sequences
of two-dimensional image frames. Interframe coding techniques exploit the redun-
dancy between the successive frames. The differences between successive frames
are due to object motion or camera motion, panning, zooming, and the like.

Frame Repetition

Beyond the horizontal and/or vertical line interlace methods discussed in Section
11.1, a simple method of interframe compression is to subsample and frame-repeat
interlaced pictures. This, however, does not produce good-quality moving images.
An alternative is selective replenishment, where the frames are transmitted at a
reduced rate according to a fixed, predetermined updating algorithm. At the re-
ceiver, any nonupdated data is refreshed from the previous frame stored in the
frame memory. This method is reasonable for slow-moving areas only.

Resolution Exchange

The response of the human visual system is poor for dynamic scenes that simulta-
neously contain high spatial and temporal frequencies. Thus, rapidly changing areas
of a scene can be represented with reduced amplitude and spatial resolution when
compared with the stationary areas. This allows an exchange of spatial resolution with
temporal resolution and can be used to produce good-quality images at data rates of
2-2.5 bits per pixel. One such method segments the image into stationary and
moving areas by thresholding the value of the frame-difference signal. In stationary
areas frame differences are transmitted for every other pixel and the remaining



pixels are repeated from the previous frame. In moving areas 2:1 horizontal sub-
sampling is used, with the intervening elements restored by interpolation along the scan
lines. Using 5-bits-per-pixel frame-differential coding, a channel rate of 2.5 bits per
pixel can be achieved. The main distortion occurs at sharp edges moving with
moderate speed.

Conditional Replenishment

This technique is based on the detection and coding of the moving areas, which are
replenished from frame to frame. Let u(m, n, i) denote the pixel at location (m, n)
in frame i. The interframe difference signal is

    e(m, n, i) = u(m, n, i) − u'(m, n, i − 1)            (11.88)

where u'(m, n, i − 1) is the reproduced value of u(m, n, i − 1) in the (i − 1)st frame.
Whenever the magnitude of e(m, n, i) exceeds a threshold η, it is quantized and
coded for transmission. At the receiver, a pixel is reconstructed either by repeating
the value at that pixel location from the previous frame if it came from a stationary
area, or by replenishing it with the decoded difference signal if it came from a moving
area, giving

    u'(m, n, i) = u'(m, n, i − 1) + e'(m, n, i),   if |e(m, n, i)| > η
                = u'(m, n, i − 1),                 otherwise            (11.89)

For transmission, code words representing the quantized values and their
addresses are generated. Isolated points or very small clusters of moving areas are
ignored to make the address coding scheme efficient. A reasonable-size buffer with
an appropriate buffer-control strategy is necessary to achieve a steady bit rate. With
insufficient buffer size, its control can require extreme action (such as stopping the
coder temporarily), which can cause jerky reproduction of motion (Fig. 11.30a).
Simulation studies [6, 39] have shown that with a suitably large buffer a 1-bit-per-
pixel rate can be achieved conveniently with an average SNR' of about 34 dB (39 dB
in stationary areas and 30 dB in moving areas). Figure 11.30b shows an encoded
image and the encoding error magnitudes for a typical frame.
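Conditional replenishment per (11.88)-(11.89) reduces to a masked update. A minimal sketch; the threshold, quantizer step, and the synthetic "moving area" are all assumptions:

```python
import numpy as np

def replenish(curr, prev_rec, eta=8.0, step=4.0):
    """Transmit the quantized frame difference only where it exceeds
    eta; repeat the previous reconstruction elsewhere."""
    e = curr - prev_rec                    # difference signal (11.88)
    moving = np.abs(e) > eta               # addresses to be transmitted
    e_q = step * np.round(e / step)        # quantized difference e'
    rec = np.where(moving, prev_rec + e_q, prev_rec)   # (11.89)
    return rec, moving

rng = np.random.default_rng(2)
prev = rng.integers(0, 256, size=(16, 16)).astype(float)
curr = prev.copy()
curr[4:8, 4:8] += 40.0                     # a synthetic moving area
rec, moving = replenish(curr, prev)        # only 16 pixels are coded
```

The `moving` mask is what the address coder must transmit; pruning isolated ones from it is the "small cluster" clean-up mentioned above.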

Adaptive Predictive Coding

Adaptations to motion characteristics can yield considerable gains in the performance
of interframe predictive coding methods. Figure 11.31 shows one such method,
where the incoming pixel is classified as belonging to an area of stationary (C_S),
moderate/slow (C_M), or rapid (C_R) motion. Classification is based on an activity
index a(m, n, i), which is the absolute sum of interframe differences over a neigh-
borhood W (Fig. 11.11) of previously encoded pixels, that is,

    a(m, n, i) = Σ_{(x,y)∈W} |u'(m + x, n + y, i) − u'(m + x, n + y, i − 1)|            (11.90)

    W ≜ {(0, −s), (−1, −1), (−1, 0), (−1, 1)}            (11.91)



[Figure 11.30: Results of interframe predictive schemes. (b) Frame replenishment coding at 1 bit/pixel, SNR' = 34.19 dB.]

[Figure 11.31: Interframe adaptive predictive coding. The raster-scanned input is classified and routed to one of the quantizers Q_S, Q_M, Q_R, followed by an entropy coder and buffer; intensity and motion predictors with delays and memories close the prediction loop.]

where s = 2 in the 2:1 subsampling mode and s = 1 otherwise. A large value of
a(m, n, i) indicates large motion in the neighborhood of the pixel. The predicted
value of the current pixel, ū(m, n, i), is

    ū(m, n, i) = u'(m, n, i − 1),                              (m, n) ∈ C_S
               = u'(m − p, n − q, i − 1),                      (m, n) ∈ C_M
               = ρ₁u'(m, n − 1, i) + ρ₂u'(m − 1, n, i)
                 − ρ₁ρ₂u'(m − 1, n − 1, i),                    (m, n) ∈ C_R            (11.92)

where ρ₁ and ρ₂ are the one-pixel correlation coefficients along m and n,
respectively. The displacements p and q are chosen by estimating the average displace-
ment of the neighborhood W that gives the minimum activity. Observe that for the
case of rapid motion, the two-dimensional predictor of (11.35a) is used. This is
because temporal prediction would be difficult for rapidly changing areas. The
number of quantizer levels used for each class is proportional to its activity. This
method achieves additional compression by a factor of two over conditional
replenishment while maintaining approximately the same SNR (Fig. 11.30c). For
greater detail on the coder design, see [39].
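The activity index (11.90) over the causal neighborhood (11.91) and the resulting three-way classification can be sketched as follows; the class thresholds here are hypothetical:

```python
import numpy as np

def activity(rec_curr, rec_prev, m, n, s=1):
    """Absolute sum of interframe differences over the causal
    neighborhood W of (11.91), per (11.90)."""
    W = [(0, -s), (-1, -1), (-1, 0), (-1, 1)]
    return sum(abs(rec_curr[m + x, n + y] - rec_prev[m + x, n + y])
               for x, y in W)

def classify(act, t_slow=4.0, t_rapid=32.0):
    """Map activity to the stationary / moderate / rapid motion classes."""
    if act <= t_slow:
        return "C_S"
    if act <= t_rapid:
        return "C_M"
    return "C_R"

prev = np.zeros((8, 8))
curr = np.zeros((8, 8))
curr[2, 2:6] = 10.0                 # change in the row above pixel (3, 3)
cls = classify(activity(curr, prev, 3, 3))
```

Because W reaches only left and up, the decoder can recompute the same class from already-decoded pixels, so no side information is needed for the classification itself.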

Predictive Coding with Motion Compensation

In principle, if the motion trajectory of each pixel could be measured, then only
the initial frame and the trajectory information would need to be coded. To re-
produce the images we could simply propagate each pixel along its trajectory. In
practice, the motion of objects in the scene can be approximated by piecewise
displacements from frame to frame. The displacement vector is used to direct the
motion-compensated interframe predictor. The success of a motion-compensated
coder depends on the accuracy, speed, and robustness (with respect to noise) of the
displacement estimator.

Displacement Estimation Algorithms


1. Search techniques. Search techniques look for a displacement vector d ≜ [p, q]ᵀ such that a distortion function D(p, q) between a reference frame and the current frame is minimized. Examples are template matching, logarithmic search, hierarchical search, conjugate direction, and gradient search techniques, which are discussed in Section 9.12. Among these techniques the log search converges most rapidly when the search area is large. For small search areas, the conjugate direction search method is simpler. These techniques are quite robust and are especially useful when the displacement is constant for a block of pixels. For interframe motion estimation the search can usually be limited to a window of 5 × 5 pixels.
,
2. Recursive displacement estimation. To understand this algorithm it is convenient to consider the continuous function u(x, y, t) representing the image frame at time t. Given a displacement error measure f(x), one possibility is to update d recursively via the gradient algorithm [37]

   d_k = d_{k−1} − ε∇_d f_{k−1}
   f_k ≜ f[u(x, y, t) − u(x − p_k, y − q_k, t − τ)]        (11.93)

   where d_k is the displacement estimate at pixel k for the adopted scanning sequence, ∇_d is the gradient of f with respect to d, and τ is the interframe interval. The ε is a small positive quantity that controls the correction at each recursion. For f(x) = x²/2, (11.93) reduces to

   d_k = d_{k−1} − ε[u(x, y, t) − u(x − p_{k−1}, y − q_{k−1}, t − τ)] ·
         [∂u/∂x (x − p_{k−1}, y − q_{k−1}, t − τ),  ∂u/∂y (x − p_{k−1}, y − q_{k−1}, t − τ)]ᵀ        (11.94)

   Since u(x, y, t) is available only at sampling locations, interpolation is required to evaluate u, ∂u/∂x, and ∂u/∂y at the displaced location (x − p, y − q, t − τ). The advantage of this algorithm is that it estimates displacement values for each pixel. However, it lacks the robustness of block search techniques.
3. Differential techniques. These algorithms are based on the fact that the gray-level value of a pixel remains constant along its path of motion, that is,

   u(x(t), y(t), t) = constant        (11.95)

   where x(t), y(t) is the motion trajectory. Differentiating both sides with respect to t, we obtain

   ∂u/∂t + v₁ ∂u/∂x + v₂ ∂u/∂y = 0,    (x, y) ∈ ℛ        (11.96)

Sec. 11.7 Interframe Coding 525

   where v₁ ≜ dx(t)/dt and v₂ ≜ dy(t)/dt are the two velocity components and ℛ is the region of moving pixels having the same motion trajectory. The displacement vector d can be estimated as

   d = ∫_{t₀}^{t₀+τ} v dt = vτ        (11.97)

   assuming the velocity remains constant during the frame interval. The velocity vector can be estimated from the interframe data, after it has been segmented into stationary and moving areas [41], by minimizing the function

   I(v) = ∫∫_ℛ [∂u/∂t + v₁ ∂u/∂x + v₂ ∂u/∂y]² dx dy        (11.98)

   Setting ∇_v I = 0 gives the solution as

   v = − [c_xx  c_xy ; c_yx  c_yy]⁻¹ [c_xt ; c_yt]        (11.99)

   where c_αβ denotes the correlation between ∂u/∂α and ∂u/∂β, that is,

   c_αβ ≜ ∫∫_ℛ (∂u/∂α)(∂u/∂β) dx dy    for α, β = x, y, t        (11.100)

   This calculation can be speeded up by estimating the correlations as [40]

   c_αβ ≈ ∫∫_ℛ (∂u/∂α) sgn(∂u/∂β) dx dy        (11.101)

   Thus v can be calculated from the partial derivatives of u(x, y, t), which can be approximated from the given interframe sampled data. Like the block search algorithms, the differential techniques also give the displacement vector for a block or a region. These methods are faster, although still not as robust as the block search algorithms.
Having estimated the motion, compression is achieved by skipping image frames and reproducing the missing frames at the receiver either by frame repetition or by interpolation along the motion trajectory. For example, if the alternate frames, say u(m, n, 2i), i = 1, 2, ..., have been skipped, then with motion compensation we have

Frame repetition:    u′(m, n, 2i) = u′(m − p, n − q, 2i − 1)        (11.102)
Frame interpolation: u′(m, n, 2i) = ½[u′(m − p, n − q, 2i − 1) + u′(m − p′, n − q′, 2i + 1)]        (11.103)

where (p, q) and (p′, q′) are the displacement vectors relative to the preceding and following frames, respectively. Without motion compensation, we would set p = q = p′ = q′ = 0. Figure 11.32 shows the advantage of motion compensation in frame skipping. The improvement due to motion compensation, roughly 10 dB, is quite significant.
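The block search of technique 1 and the interpolation rule (11.103) are easy to prototype. The sketch below is an illustrative Python implementation, not the text's coder: it finds the displacement minimizing a sum-of-absolute-differences over the 5 × 5 window mentioned above, and periodic shifts stand in for proper boundary handling in the interpolator.

```python
import numpy as np

def block_match(ref, cur, m0, n0, bs=8, win=2):
    # Full search over a (2*win+1) x (2*win+1) window (win=2 -> 5 x 5) for the
    # displacement (p, q) such that cur(m, n) ~ ref(m - p, n - q), minimizing
    # the sum of absolute differences (SAD) over the block at (m0, n0).
    block = cur[m0:m0 + bs, n0:n0 + bs]
    best, best_pq = None, (0, 0)
    for p in range(-win, win + 1):
        for q in range(-win, win + 1):
            r0, c0 = m0 - p, n0 - q
            if r0 < 0 or c0 < 0 or r0 + bs > ref.shape[0] or c0 + bs > ref.shape[1]:
                continue
            sad = np.abs(block - ref[r0:r0 + bs, c0:c0 + bs]).sum()
            if best is None or sad < best:
                best, best_pq = sad, (p, q)
    return best_pq

def interpolate_frame(prev, nxt, pq, pq2):
    # Motion-compensated interpolation of a skipped frame as in (11.103):
    # average of the displaced preceding and following frames (periodic
    # shifts are used purely for brevity).
    return 0.5 * (np.roll(prev, pq, axis=(0, 1)) + np.roll(nxt, pq2, axis=(0, 1)))
```

In practice one displacement is estimated per block, and the same (p, q) is reused for every pixel of that block, which is what makes the search methods robust.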



Interframe Hybrid Coding

Hybrid coding is particularly useful for interframe data compression of motion images. A two-dimensional M × N block of the ith frame, denoted by Uᵢ, is first transformed to give Vᵢ. For each (k, l) the sequence vᵢ(k, l), i = 1, 2, ..., is considered a one-dimensional random process and is coded independently by a suitable one-dimensional DPCM method. The receiver reconstructs v′ᵢ(k, l) and

[Figure 11.32 Effects of motion compensation on interframe prediction and interpolation. (a) Frame repetition (interframe prediction) based on the preceding frame: along temporal axis, SNR′ = 19.90 dB; along motion trajectory, SNR′ = 26.69 dB.]


[Figure 11.32 (cont'd) (b) Frame interpolation from the preceding and the following frames: along temporal axis, SNR′ = 19.34 dB; along motion trajectory, SNR′ = 29.56 dB.]


performs the two-dimensional inverse transform. A typical method uses the discrete cosine transform and a first-order AR model for each DPCM channel. In a motion-compensated hybrid coder the DPCM prediction error becomes

eᵢ(k, l) = vᵢ(k, l) − α v′ᵢ₋₁(k, l)        (11.104)

where v′ᵢ₋₁(k, l) are obtained by transforming the motion-compensated sequence u′ᵢ₋₁(m − p, n − q). If α, the prediction coefficient, is constant for each channel (k, l), then eᵢ(k, l) would be the same as the transform of uᵢ(m, n) − α u′ᵢ₋₁(m − p, n − q). This yields a motion-compensated hybrid coder, as shown in


[Figure 11.33 Interframe hybrid coding with motion compensation: the prediction error e between uᵢ(m, n) and the motion-compensated prediction is 2-D DCT transformed, quantized, and transmitted; the inverse 2-D DCT of the decoded data feeds a frame memory, from which a motion estimator derives the block translation and the displacement vector (p, q), which is also transmitted.]

Fig. 11.33. Results of different interframe hybrid coding methods are shown in Fig. 11.34. These and other results [6, 36] show that with motion compensation, the adaptive hybrid coding method performs better than adaptive predictive coding and adaptive three-dimensional transform coding. However, the coder now requires two sets of two-dimensional transformations.
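A toy version of such a coder can be sketched as follows — a deliberately simplified Python illustration, not the adaptive coder whose results are quoted above: one full-frame 2-D DCT block, a first-order DPCM loop per coefficient with a fixed prediction coefficient α, a uniform quantizer, and no motion compensation.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis as an n x n matrix.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)
    return a

def hybrid_code(frames, alpha=0.95, step=0.05):
    # 2-D DCT of each frame, first-order DPCM along the frame index for
    # every coefficient (k, l) as in (11.104), uniform quantization of the
    # prediction error.  Returns the frames seen by the matching decoder.
    A = dct_matrix(frames[0].shape[0])
    B = dct_matrix(frames[0].shape[1])
    pred = np.zeros_like(frames[0], dtype=float)   # v'_{i-1}(k, l)
    recon = []
    for u in frames:
        v = A @ u @ B.T                  # forward 2-D transform
        e = v - alpha * pred             # DPCM prediction error
        eq = step * np.round(e / step)   # uniform quantizer
        vr = alpha * pred + eq           # decoder's coefficient estimate
        pred = vr
        recon.append(A.T @ vr @ B)       # inverse 2-D transform
    return recon
```

Because the quantizer sits inside the prediction loop, the reconstruction error per coefficient stays bounded by half the quantizer step and does not accumulate from frame to frame.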

Three-Dimensional Transform Coding



In many applications (for example, in multispectral imaging, interframe video imaging, medical cineangiography, CT scanning, and so on), we have to work with three- (or higher-) dimensional data. Transform coding schemes are possible for compression of such data by extending the basic ideas of Section 11.5. A three-dimensional (separable) transform of an M × N × I sequence u(m, n, i) is defined as

v(k, l, j) ≜ Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} Σ_{i=0}^{I−1} u(m, n, i) a_M(k, m) a_N(l, n) a_I(j, i)        (11.105)

where 0 ≤ (k, m) ≤ M − 1, 0 ≤ (l, n) ≤ N − 1, 0 ≤ (j, i) ≤ I − 1, and {a_M(k, m)} are the elements of an M × M unitary matrix A_M, and so on. The transform coefficients given by (11.105) are simply the result of taking the A-transform with respect to each index and will require MNI log₂(MNI) operations for a fast transform. The storage requirement for the data is MNI. As before, the practical approach is to partition the data into small blocks (such as 16 × 16 × 16) and process each block independently. The coding algorithm after transformation is the same as before except that we are working with triple-indexed variables. Figure 11.35 shows results for one frame of a sequence of cosine transform coded images. The result of Fig. 11.35a corresponds to the use of the three-dimensional separable covariance model

r(m, n, i) = σ² ρ₁^|m| ρ₂^|n| ρ₃^|i|



[Figure 11.34 Interframe hybrid coding. (a) 0.5 bit/pixel, nonadaptive, SNR′ = 34 dB; (b) 0.5 bit/pixel, adaptive, SNR′ = 40.3 dB; (c) 0.125 bit/pixel, adaptive with motion compensation, SNR′ = 36.7 dB.]
[Figure 11.35 Interframe transform coding. (a) 0.5 bit/pixel, separable covariance model, SNR′ = 32.1 dB; (b) 0.5 bit/pixel, measured covariances, SNR′ = 36.8 dB; (c) 0.5 bit/pixel, measured covariances, adaptive, SNR′ = 41.2 dB.]

which, as expected, performs poorly. Also, the adaptive hybrid coding with motion compensation performs better than three-dimensional transform coding. This is because incorporating motion information in a three-dimensional transform coder requires selecting spatial blocks along the motion trajectory, which is not a very attractive alternative.
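The separable transform of (11.105) amounts to applying the 1-D A-transform along each index in turn, which is direct to express with einsum. A small numpy sketch (any unitary matrices may be supplied; the index names follow (11.105)):

```python
import numpy as np

def transform_3d(u, A_M, A_N, A_I):
    # v(k, l, j) = sum_{m,n,i} u(m, n, i) a_M(k, m) a_N(l, n) a_I(j, i),
    # i.e. the 1-D transform applied separably along each axis.
    return np.einsum('km,ln,ji,mni->klj', A_M, A_N, A_I, u)

def inverse_transform_3d(v, A_M, A_N, A_I):
    # For real unitary (orthogonal) matrices the inverse transform uses
    # the transposes, i.e. sums over k, l, j instead of m, n, i.
    return np.einsum('km,ln,ji,klj->mni', A_M, A_N, A_I, v)
```

Applying the three 1-D transforms in sequence is exactly what keeps the operation count at the MNI log₂(MNI) figure quoted above, rather than the cost of a full (MNI) × (MNI) matrix product.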

11.8 IMAGE CODING IN THE PRESENCE OF CHANNEL ERRORS

So far we have assumed the channel between the coder and the decoder to be noiseless. To account for channel errors, we have to add redundancy to the input by appending error-correcting bits. Thus a proper trade-off between source coding (redundancy removal) and channel coding (redundancy injection) has to be achieved in the design of data compression systems. Often, the error-correcting codes are designed to reduce the probability of bit errors, and for simplicity, equal protection is provided to all the samples. For image data compression algorithms, this does not minimize the overall error. In this section we consider source-channel-encoding methods that minimize the overall mean square error.

Consider the PCM transmission system of Fig. 11.36, where a quantizer generates k-bit outputs x ∈ S, which are mapped, one-to-one, into n-bit (n ≥ k) codewords g ∈ C. Let β(·) denote this mapping. The channel is assumed to be memoryless and binary symmetric with bit error probability p_e. It maps the set C of K = 2^k possible n-bit code words into a set V of 2ⁿ possible n-bit words. At the receiver, λ(·) denotes the mapping of elements of V into the elements of the real line R. The identity element of V is the vector 0 ≜ [0, 0, ..., 0].

The Optimum Mean Square Decoder

The mean square error between the decoder output and the encoder input is given by

σ_c² = σ_c²(β, λ) ≜ E[(y − x)²] = Σ_{v∈V} p(v) Σ_{x∈S} (λ(v) − x)² p(x|v)        (11.106)

and depends on the mappings β(·) and λ(·). From estimation theory (see Section 2.12) we know that given the encoding rule β, the decoder that minimizes this error is given by the conditional mean of x, that is,

y = λ(v) = Σ_{x∈S} x p(x|v) = E[x|v]        (11.107)

where p(x|v) is the conditional density of x given the channel output v. The function λ(v) need not map the channel output into the set S even if n = k.
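For small codes, the conditional-mean decoder of (11.107) can simply be tabulated by enumerating all 2ⁿ received words. A Python sketch for a binary symmetric channel (an illustration, not an efficient decoder; a uniform prior on S is assumed by default):

```python
import numpy as np
from itertools import product

def optimum_decoder(levels, codewords, pe, px=None):
    # Tabulate y(v) = E[x | v] of (11.107).  `levels` are the quantizer
    # outputs x in S, `codewords[i]` is the n-bit word beta(x_i), `pe` the
    # bit error probability, and `px` the prior probabilities on S.
    n = len(codewords[0])
    if px is None:
        px = np.full(len(levels), 1.0 / len(levels))
    table = {}
    for v in product((0, 1), repeat=n):
        # For a BSC, p(v | g) depends only on the Hamming distance.
        pv_g = np.array([pe ** sum(a != b for a, b in zip(v, g)) *
                         (1 - pe) ** sum(a == b for a, b in zip(v, g))
                         for g in codewords])
        post = pv_g * px
        post /= post.sum()               # p(x | v) by Bayes' rule
        table[v] = float(np.dot(post, levels))
    return table
```

Note that the tabulated outputs are generally not members of S, exactly as the text observes.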

[Figure 11.36 Channel coding for PCM transmission: z → quantizer → x → encoder β(·) → g → channel p(v|g) → v → decoder λ(·) → y.]


,

The Optimum Encoding Rule



The optimum encoding rule β° that minimizes (11.106) requires an exhaustive search for the optimum subspace C over all subspaces of V. Practical solutions are found by restricting the search to a particular set of subspaces [39, 42]. Table 11.6 shows one set of practical basis vectors for uniformly distributed sources, from

TABLE 11.6 Basis Vectors {φᵢ, i = 1, ..., k} for (n, k) Group Codes

k = 8 (columns n − k = 0, 1, 2, 3):
  i = 1   10000000  100000001  1000000011  10000000110
  i = 2   01000000  010000000  0100000001  01000000101
  i = 3   00100000  001000000  0010000000  00100000011
  i = 4   00010000  000100000  0001000000  00010000111
  i = 5   00001000  000010000  0000100000  00001000000
  i = 6   00000100  000001000  0000010000  00000100000
  i = 7   00000010  000000100  0000001000  00000010000
  i = 8   00000001  000000010  0000000100  00000001000

k = 7 (columns n − k = 0, 1, 2, 3, 4):
  i = 1   1000000  10000001  100000011  1000000110  10000001110
  i = 2   0100000  01000000  010000001  0100000101  01000001010
  i = 3   0010000  00100000  001000000  0010000011  00100000101
  i = 4   0001000  00010000  000100000  0001000111  00010000011
  i = 5   0000100  00001000  000010000  0000100000  00001000000
  i = 6   0000010  00000100  000001000  0000010000  00000100000
  i = 7   0000001  00000010  000000100  0000001000  00000010000

k = 6:
  i = 1   100000  1000001  10000011  100000110  1000001110
  i = 2   010000  0100000  01000001  010000101  0100001010
  i = 3   001000  0010000  00100000  001000011  0010000101
  i = 4   000100  0001000  00010000  000100111  0001000011
  i = 5   000010  0000100  00001000  000010000  0000100000
  i = 6   000001  0000010  00000100  000001000  0000010000

k = 5:
  i = 1   10000  100001  1000011  10000110  100001110
  i = 2   01000  010000  0100001  01000101  010001010
  i = 3   00100  001000  0010000  00100011  001000101
  i = 4   00010  000100  0001000  00010111  000100011
  i = 5   00001  000010  0000100  00001000  000010000

k = 4:
  i = 1   1000  10001  100011  1000110  10001110
  i = 2   0100  01000  010001  0100101  01001010
  i = 3   0010  00100  001000  0010011  00100101
  i = 4   0001  00010  000100  0001111  00010011

k = 3:
  i = 1   100  1001  10011  100110  1001110
  i = 2   010  0100  01001  010101  0101010
  i = 3   001  0010  00100  001011  0010101

k = 2:
  i = 1   10  101  1011  10110  010111
  i = 2   01  010  0101  01101  101110

k = 1:
  i = 1   1  11  111  1111  11111

Example (n = 11, k = 6):
  g₁ = 10000011101, g₂ = 01000010100, g₃ = 00100001011, g₄ = 00010000110, g₅ = 00001000001, g₆ = 00000100000

Sec. 11.8 Image Coding in the Presence of Channel Errors 533


which β° is obtained as follows. Let b ≜ [b(1), b(2), ..., b(k)] be the binary representation of an element of S; then

g = β(b) = Σ⊕_{i=1}^{k} b(i) · φᵢ        (11.108)

where Σ⊕ denotes exclusive-OR summation and · denotes the binary product. The codes generated by this method are called the (n, k) group codes.

Example 11.8
Let n = 4 and k = 2, so that n − k = 2. Then φ₁ = [1 0 1 1], φ₂ = [0 1 0 1], and β(·) is given as follows:

x    b      g = β(b)
0    0 0    0 0 0 0 = 0·φ₁ ⊕ 0·φ₂
1    0 1    0 1 0 1 = 0·φ₁ ⊕ 1·φ₂
2    1 0    1 0 1 1 = 1·φ₁ ⊕ 0·φ₂
3    1 1    1 1 1 0 = φ₁ ⊕ φ₂

In general the basis vectors φᵢ depend on the bit error probability p_e and the source probability distribution. For other distributions, Table 11.6 is found to lower the channel coding performance only slightly for p_e ≪ 1 [39]. Therefore, these group codes are recommended for all mean square channel coding applications.
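The rule (11.108) is a bitwise exclusive-OR of the selected basis vectors; a minimal sketch (the MSB-first bit ordering for b is an assumption made for this illustration):

```python
def group_encode(x, basis):
    # g = XOR of the basis vectors phi_i selected by the bits b(i) of x,
    # b(1) being the most significant bit -- the rule of (11.108).
    k, n = len(basis), len(basis[0])
    g = [0] * n
    for i in range(k):
        if (x >> (k - 1 - i)) & 1:
            g = [a ^ b for a, b in zip(g, basis[i])]
    return g
```

Running it on the basis of Example 11.8 reproduces the four codewords in the table above.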

Optimization of PCM Transmission

If η_c and η_q denote the channel and the quantizer errors, we can write (from Fig. 11.36) the input sample as

z = x + η_q = y + η_c + η_q        (11.109)

This gives the total mean square error as

σ_ε² = E[(z − y)²] = E[(η_c + η_q)²]        (11.110)

For a fixed channel coder β(·), this error is minimum when [6, 39] (1) η_c and η_q are orthogonal and (2) σ_c² ≜ E[η_c²] and σ_q² ≜ E[η_q²] are minimum. This requires

y = λ(v) = E[x|v]        (11.111)
xᵢ = α(z) = E[z | z ∈ 𝒮ᵢ]        (11.112)

where 𝒮ᵢ, i = 1, ..., 2^k denotes the ith quantization interval of the quantizer.

This result says that the optimum decoder is independent of the optimum quantizer, which is the Lloyd-Max quantizer. Thus the overall optimal design can be accomplished by optimizing the quantizer and the decoder individually. This gives

σ_ε² = σ_q² + σ_c²        (11.113)

Let f(k) and c(n, k) denote the mean square distortions due to the k-bit Lloyd-Max quantizer and the channel, respectively, when the quantizer input is a unit variance random variable (Tables 11.7 and 11.8). Then we can write the total




TABLE 11.7 Quantizer Distortion f(k) for the k-bit Lloyd-Max Quantizer with Unit Variance Input

  k:          1        2        4        5           6            7           8
  Gaussian    0.3634   0.1175   0.0095   2.5 × 10⁻³  6.4 × 10⁻⁴   1.6 × 10⁻⁴  4 × 10⁻⁵
  Laplacian   —        0.1762   0.0154   4.1 × 10⁻³  1.06 × 10⁻³  2.7 × 10⁻⁴  7 × 10⁻⁵
  Uniform     0.2500   0.0625   0.0039   9.77 × 10⁻⁴ 2.44 × 10⁻⁴  6.1 × 10⁻⁵  1.52 × 10⁻⁵
TABLE 11.8 Channel Distortion c(n, k) for (n, k) Block Coding of Outputs of a Quantizer with Unity Variance Input
(rows k = 1, ..., 8, grouped by input density; columns n − k)

                    p_e = 0.01                                 p_e = 0.001
  k    n−k = 0   1        2        3        4         0        1        2        3        4
  1    0.0252    0.0129   0.00075  0.0004   0.00003   0.00254  0.0013   0.8×10⁻⁵ 0.4×10⁻⁶ <10⁻⁶
  2    0.0483    0.0280   0.0066   0.0018   0.0010    0.0050   0.0028   0.0005   0.00002  0.00001
  3    0.0656    0.0377   0.0115   0.0033   0.0025    0.0069   0.0038   0.0010   0.00004  0.00003
  4    0.0765    0.0423   0.0138   0.0056   0.0032    0.0083   0.0044   0.0012   0.00006  0.00003
  5    0.0821    0.0450   0.0149   0.0062   0.0036    0.0093   0.0047   0.0013   0.0001   0.00006
  6    0.0856    0.0463   0.0154   0.0064   0.0037    0.0101   0.0049   0.0014   0.0001   0.00008
  7    0.0923    0.0477   0.0156   0.0068   0.0037    0.0112   0.0051   0.0014   0.00012  0.00008
  8    0.1050    0.0508   0.0169   0.0076   —         0.0143   0.0056   0.0015   0.00015  —

  1    0.0193    0.0098   0.0006   0.0003   <10⁻⁴     0.0020   0.0010   <10⁻⁴    <10⁻⁴    <10⁻⁵
  2    0.0554    0.0312   0.0015   0.0022   0.0001    0.0058   0.0032   0.0006   <10⁻⁴    <10⁻⁴
  3    0.0934    0.0480   0.0151   0.0042   0.0029    0.0102   0.0049   0.0013   <10⁻⁴    <10⁻⁴
  4    0.1230    0.0601   0.0195   0.0083   0.0040    0.0148   0.0063   0.0017   <10⁻⁴    <10⁻⁵
  5    0.1381    0.0677   0.0219   0.0091   0.0048    0.0241   0.0075   0.0020   0.0002   0.0001
  6    0.1404    0.0719   0.0231   0.0098   0.0050    0.0210   0.0083   0.0021   0.0002   0.0001
  7    0.1406    0.0733   0.0234   0.0099   0.0051    0.0217   0.0088   0.0022   0.0002   0.0001
  8    0.14065   0.07395  0.02385  0.00995  —         0.02185  0.00905  0.00235  0.00025  —

  1    0.0297    0.0151   0.0009   0.0005   <10⁻⁴     0.0030   0.0015   <10⁻⁴    <10⁻⁴    <10⁻⁴
  2    0.0371    0.0226   0.0053   0.0015   0.0010    0.0038   0.0023   0.0004   <10⁻⁴    <10⁻⁴
  3    0.0390    0.0245   0.0072   0.0026   0.0021    0.0040   0.0025   0.0006   <10⁻⁴    <10⁻⁴
  4    0.0391    0.0249   0.0077   0.0034   0.0024    0.0040   0.0025   0.0006   <10⁻⁴    <10⁻⁴
  5    0.0396    0.0250   0.0077   0.0035   0.0025    0.0040   0.0025   0.0006   <10⁻⁴    <10⁻⁴
  6    0.0396    0.0250   0.0078   0.0035   0.0025    0.0040   0.0025   0.00066  0.00006  0.00004
  7    0.0396    0.0250   0.0078   0.0035   0.0025    0.0040   0.0025   0.0006   0.00005  0.00004
  8    0.0396    0.0251   0.0078   0.0035   —         0.0040   0.0025   0.00065  0.00005  —
mean square error for the input z of variance σ_z² as

σ_ε² = σ_z² 𝒟,    𝒟 ≜ [f(k) + c(n, k)]        (11.114)
σ_q² ≜ σ_z² f(k),    σ_c² ≜ σ_z² c(n, k)

For a fixed n and k ≤ n, f(k) is a monotonically decreasing function of k, whereas c(n, k) is a monotonically increasing function of k. Hence for every n there is an optimum value of k = k_o(n) for which 𝒟 is minimized. Let d(n) denote the minimum value of 𝒟 with respect to k, that is,

d(n) = min_k {𝒟(n, k)} ≜ 𝒟(n, k_o(n))        (11.115)

Figure 11.37 shows the plot of the distortions 𝒟(n, n) and d(n) versus the rate for n-bit PCM transmission of a Gaussian random variable when p_e = 0.01. The quantity 𝒟(n, n) represents the distortion of the PCM system if no channel error protection is provided and all the bits are used for quantization. It shows, for example, that an optimum combination of error protection and quantization could improve the system performance by about 11 dB for an 8-bit transmission.
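Finding k_o(n) in (11.115) is a one-line minimization once f(k) and c(n, k) have been tabulated. In the sketch below the numeric values are illustrative placeholders in the spirit of Tables 11.7 and 11.8, not entries copied from them:

```python
def optimum_k(n, f, c):
    # k_o(n) of (11.115): the k in 1..n minimizing f(k) + c(n, k).
    k_o = min(range(1, n + 1), key=lambda k: f[k] + c[(n, k)])
    return k_o, f[k_o] + c[(n, k_o)]

# Illustrative values only: f decreases with k, c(n, k) grows with k.
f = {1: 0.3634, 2: 0.1175, 3: 0.0345, 4: 0.0095}
c = {(4, 1): 0.0006, (4, 2): 0.0053, (4, 3): 0.0151, (4, 4): 0.0656}
```

With these numbers, 4-bit transmission is best spent as 3 quantizer bits plus 1 protection bit, illustrating the trade-off plotted in Fig. 11.37.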

Channel Error Effects in DPCM


In DPCM the total error in the reconstructed image can be written as

δu(i, j) = η_q(i, j) + Σ_{i′} Σ_{j′} h(i − i′, j − j′) η_c(i′, j′)        (11.116)

where η_q is now the DPCM quantizer noise and h(i, j) is the impulse response of the reconstruction filter.

[Figure 11.37 Mean square distortion (dB) versus rate (bits) for n-bit PCM transmission of a Gaussian random variable, p_e = 0.01: 𝒟(n, n), no channel protection, and d(n), with optimum channel protection.]


,

It is essential that the reconstruction filter be stable to prevent the channel errors from accumulating to arbitrarily large values. Even when the predictor models are stable, the channel mean square error gets amplified by a factor σ_u²/β² by the reconstruction filter, where β² is the theoretical prediction error variance (without quantizer) (Problem 11.5). For the optimum mean square channel decoder the total mean square error in the reconstructed pixel at (i, j) can be written as

σ_ε² = σ_q² + σ_c²/β̄² = σ_e² [f(k) + (1/β̄²) c(n, k)]        (11.117)

where β̄² ≜ β²/σ_u², and σ_e² is the variance of the actual prediction error in the DPCM loop. Recall that high compression is achieved for small values of β̄². Equation (11.117) shows that the higher the compression, the larger is the channel error amplification. Visually, channel noise in DPCM tends to create two-dimensional patterns that originate at the channel error locations and propagate until the reconstruction filter impulse response decays to zero (see Fig. 11.38). In line-by-line DPCM, streaks of erroneous lines appear. In such cases, the erroneous line can be replaced by the previous line or by an average of neighboring lines. A median filter operating orthogonally to the scanning direction can also be effective.

To minimize channel error effects, σ_ε² given by (11.117) must be minimized to find the quantizer optimum allocation k = k(n) for a given overall rate of n bits per pixel.
Example 11.9
A predictor with a₁ = 0.848, a₂ = 0.755, a₃ = −0.608, a₄ = 0 in (11.35a) and β̄² = 0.019 is used for DPCM of images. Assuming a Gaussian distribution for the quantizer input, the optimum pairs [n, k(n)] are found to be:

TABLE 11.9 Optimum Pairs [n, k(n)] for DPCM Transmission

          p_e = 0.01       p_e = 0.001
  n       1  2  3  4  5    1  2  3  4  5
  k(n)    1  1  1  1  2    1  2  2  2  3

This shows that if the error rate is high (p_e = 0.01), it is better to protect against channel errors than to worry about the quantizer errors. To obtain an optimum pair, we evaluate σ_ε²/σ_e² via (11.117) and Tables 11.6, 11.7, and 11.8 for different values of k (for each n). Then the value of k for which this quantity is minimum is found. For a given choice of (n, k), the basis vectors from Table 11.6 can be used to generate the transmission code words.
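The error pattern described above is just the impulse response of the recursive reconstruction filter in (11.116). The sketch below propagates a unit channel error from the image corner, using the predictor coefficients of Example 11.9 and assuming (for illustration) that a₁, a₂, a₃ multiply the west, north, and northwest neighbors, respectively:

```python
import numpy as np

def channel_error_pattern(a1, a2, a3, size=20):
    # Impulse response h(i, j) of the DPCM reconstruction filter
    # u'(i, j) = a1*u'(i, j-1) + a2*u'(i-1, j) + a3*u'(i-1, j-1) + input,
    # driven by a single channel error of magnitude 1 at (0, 0).
    h = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            v = 1.0 if (i, j) == (0, 0) else 0.0
            if j > 0:
                v += a1 * h[i, j - 1]
            if i > 0:
                v += a2 * h[i - 1, j]
            if i > 0 and j > 0:
                v += a3 * h[i - 1, j - 1]
            h[i, j] = v
    return h
```

Because the predictor is stable, the pattern spreads to the lower right of the error location but decays with distance, which is exactly the two-dimensional streaking visible in Fig. 11.38.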

Optimization of Transform Coding

Suppose a channel error causes a distortion δv(k, l) of the (k, l)th transform coefficient. This error manifests itself by spreading over the reconstructed image in proportion to the (k, l)th basis image, that is,

δU = δv(k, l) A*_{k,l}        (11.118)



[Figure 11.38 Two bits/pixel DPCM coding in the presence of a transmission error rate of 10⁻². (a) Propagation of transmission errors for different predictors; clockwise from top left: error location, optimum, three-point, and two-point predictors. (b) Optimum linear predictor. (c) Two-point predictor. (d) Three-point predictor (A + C − B).]

This is actually an advantage of transform coding over DPCM because, for the same mean square value, localized errors tend to be more objectionable than distributed errors. The foregoing results can be applied to design transform coders that protect against channel errors. A transform coder contains several PCM channels, each operating on one transform coefficient. If we represent z_j as the jth transform coefficient with variance σ_j², then the average mean square distortion of a transform coding scheme in the presence of channel errors becomes

D = Σ_j σ_j² d(n_j)        (11.119)

where n_j is the number of bits allocated to the jth PCM channel. The bit allocation algorithm for a transform coder will now use the function d(n), which can be


[Figure 11.39 Bit allocations for quantizers and channel protection in 16 × 16 block cosine transform coding of images modeled by the isotropic covariance function. Average bit rate is 1 bit/pixel, and channel error rate is 1%. (a) Bits allocated to transform quantizers, k_o(i, j): values decrease from 6 at the low-frequency corner toward 0 at high frequencies. (b) Channel protection bits allocated to quantizer outputs, n(i, j) − k_o(i, j): values decrease from 5 at the low-frequency corner, with protection confined to the low-frequency coefficients.]


. ",-

-;~

,I
-" . '. (


I~
',' i
• • .... I<L"
".",' .....,.J..,.
~

.'
"~
::;~
."

~ "'I

••_...._--_.""\.:; ~--"~..,.~ ~
Y'iJ
,

(e) 1 bit/pixel, p = 10-2• without channel (d) 1 bit/pixel. p = 10'2. with channel
error protectioll error protection
Figure 11.40 Transform coding in the presence of channel errors.

evaluated via (11.115). Knowing n_j, we can find k_j = k(n_j), the corresponding optimum number of quantizer bits.

Figure 11.39 shows the bit allocation pattern k_o(i, j) for the quantizers and the allocation of channel protection bits (n(i, j) − k_o(i, j)) at an overall average bit rate of 1 bit per pixel for 16 × 16 block coding of images modeled by the isotropic covariance function. As expected, more protection is provided to samples that have larger variances (and are, therefore, more important for transmission). The overhead due to channel protection, even for the large value of p_e = 0.01, is only 15%. For p_e = 0.001, the overhead is about 4%. Figure 11.40 shows the results of the preceding technique applied to transform coding of an image in the presence of channel errors. The improvement in SNR is 10 dB at p_e = 0.01 and is also significant visually. This scheme has been found to be quite robust with respect to fluctuations in the channel error rates [6, 39].
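One standard way to carry out such an allocation is the greedy marginal-return rule: repeatedly grant one more bit to the channel whose weighted distortion σ_j² d(n_j) drops the most. A sketch (the d(n) table here is an illustrative placeholder, not values from the figure):

```python
def allocate_bits(variances, d, total_bits):
    # Greedy allocation minimizing sum_j variances[j] * d(n_j) of (11.119);
    # d maps a rate n to the distortion d(n) of the optimized PCM channel.
    n = [0] * len(variances)
    for _ in range(total_bits):
        gains = [variances[j] * (d[n[j]] - d[n[j] + 1])
                 for j in range(len(variances))]
        n[gains.index(max(gains))] += 1
    return n

d = {0: 1.0, 1: 0.36, 2: 0.12, 3: 0.03, 4: 0.01, 5: 0.004}  # illustrative d(n)
```

Since d(n) already folds in the optimum split between quantizer and protection bits, the same routine yields both allocation maps of Fig. 11.39 once k(n_j) is looked up per channel.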

11.9 CODING OF TWO-TONE IMAGES

The need for electronic storage and transmission of graphics and two-tone images such as line drawings, letters, newsprint, maps, and other documents has been increasing rapidly, especially with the advent of personal computers and modern telecommunications. Commercial products for document transmission over telephone lines and data lines already exist. The CCITT† has recommended a set of eight documents (Fig. 11.41) for comparison and evaluation of different binary image coding algorithms. The CCITT standard sampling rates for typical A4 (8½-in. by 11-in.) documents for transmission over the so-called Group 3 digital facsimile apparatus are 3.85 lines per millimeter at normal resolution and 7.7 lines per millimeter at high resolution in the vertical direction. The horizontal sampling rate standard is 1728 pixels per line, which corresponds to 7.7 lines per millimeter resolution or 200 points per inch (ppi). For newspaper pages and other documents that contain text as well as halftone images, sampling rates of 400 to 1000 ppi are used. Thus, for the standard 8½-in. by 11-in. page, 1.87 × 10⁶ bits will be required at 200 ppi × 100 lpi sampling density. Transmitting this information over a 4800-bit/s telephone line will take over 6 min. Compression by a factor of, say, 5 can reduce the transmission time to about 1.3 minutes.
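The arithmetic behind these figures is easy to verify:

```python
# 200 ppi x 100 lpi over an 8.5 x 11-in. page, at 1 bit per pixel.
bits = int(8.5 * 200) * int(11 * 100)
minutes = bits / 4800 / 60              # over a 4800-bit/s telephone line
print(bits, round(minutes, 1), round(minutes / 5, 1))   # -> 1870000 6.5 1.3
```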
Many compression algorithms for binary images exploit the facts that (1) most pixels are white and (2) the black pixels occur with a regularity that manifests itself in the form of characters, symbols, or connected boundaries. There are three basic concepts for coding such images: (1) coding only the transition points between black and white, (2) skipping white, and (3) pattern recognition. Figure 11.42 shows a convenient classification of algorithms based on these concepts.

Run-length Coding

In run-length coding (RLC) the lengths of the black and white runs on the scan lines are coded. Since white (1s) and black (0s) runs alternate, the color of the run need not be coded.

† Comité Consultatif International de Téléphonie et Télégraphie.
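The run-length idea can be sketched in a few lines of Python (an illustration only; a practical Group 3 coder follows the run lengths with variable-length Huffman-style codes):

```python
def to_runs(line):
    # Run-length representation of one binary scan line: the first pixel's
    # color plus the lengths of the alternating runs.
    runs, count = [], 1
    for prev, cur in zip(line, line[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return line[0], runs

def from_runs(first, runs):
    # Inverse mapping: expand the run lengths back into pixels.
    out, color = [], first
    for r in runs:
        out.extend([color] * r)
        color = 1 - color
    return out
```

Only the starting color and the list of lengths need to be transmitted, which is where the compression for mostly-white documents comes from.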


[Figure 11.41 The eight CCITT test documents recommended for comparison and evaluation of binary image coding algorithms.]
.... r"" • i_'''''''' .....,.,,_," k
~
.""""'_ " lot ~ J.... 1II _
_ _',", U,. _ _ ok. ".";1.,, __
*,'i! :: "I "",,~,,~ ~....... -
... Iir
... 41". ""'rrt 00< ~","","14 l< ""'" ~ " ...._
<4Ml ....1

f
f. _ _l. ~ tM .. ... " ••"'" ."M ....t_ t ';'r'"tooi',_ ~"" 1 "" ~,~ •
all.. ......... I l l• • II. ~"ool_
.... ", ...,,"'_.
J • • '" f .. :."l1f3"'t ... ~"',." ........ _ _ ...... ::..... I r _ • ....,............
_ • ..L .. 1'J.I -[--+--+-- •
I -10) I ... f- I ,.'
';i!

.f,--f--+"<
~, ".
:;:::-"
,
,,, ~ _, . _ ... i ""... (\At" ;0.;;0;.-,

...
_;·_ ... "-.~_ • ..-r•
,
r..... "~-"'~'-
-")'-'
, .
",
I
,
r.... .,~llh-n ;
• ._-r.. n

~
"
...............-
(--,~

,
• .. , ('-_<i'r-.. . . .

,
.. - . _- , .... ... -
I, '<, _ <i. ~-
,""'... ......

r-, , ..-'" '.......


1~1
11'1 '" 0 ~

~~

! .._
_

. . _T._'. . _.. .
... ".,""'" _ 4 _ ........1 ;"1'
........ tliM , .-'" iLAlll .. '~~
"" -""""'" II·.....,,' II .... H _ ·
1:.......
,..
._,,-
14"" "- - 10 .:. ,...,
•• ••
-,_ .. .
_ . - . . . . •.(q..... -

_-
• _10 ....... _

._,
r
-,
~
,
• .~
.;.. ,r_.,.l}~IJ;" oll.l _ ....
_ _"'_"_''''.<:.1'
~,
'-~l/-J.I~ _ ..._ ...... 10'......__
•..-T'........" ....... _10_,."

Document 5 Document 6

! I
,
,
""
f';o
e
C
I
'-;f
I ,
T
T
t
'"
Gl!
ll! ..-- t~
'r- ...
~ll- ,

L4'
,
·, -
• "II
H ,
• • " .. '"


I
i,,
r,
~,-

'»""-"" •
,, d
,J,~
,,

Dooumont7 Document 8

Flpre 11.41 (Omt'd)

,
542 , •

Binary data compression techniques


White block skipping: 1-D, 2-D, quad trees
Run-length coding: fixed length, Huffman, MHC, THC, B1
Relative addressing techniques: PDQ, RAC, READ
Predictive coding
Other: pattern recognition, boundary following, halftones

Figure 11.42 Binary data compression techniques.

be coded (Fig. 11.43). The first run is always a white run, with length zero if
necessary. The run lengths can be coded by fixed-length m-bit code words, each
representing a block of maximum run length M - 1, where M = 2^m can be
optimized to maximize compression (see Section 11.2 and Problem 11.17).


A more efficient technique is to use Huffman coding. To avoid a large code
book, the truncated and the modified Huffman codes are used. The truncated
Huffman code assigns separate Huffman code words for white and black runs up to
lengths Lw and Lb, respectively. Typical values of these runs have been found to be
Lw = 47, Lb = 15 [Ic, p. 1426]. Longer runs, which have lower probabilities, are
assigned a fixed-length code word, which consists of a prefix code plus an n-bit
binary code of the run length.
The modified Huffman code, which has been recommended by the CCITT
as a one-dimensional standard code for Group 3 facsimile transmission, uses
Lw = Lb = 63. Run lengths smaller than 64 are Huffman coded to give the terminator
code. The remaining runs are assigned two code words, consisting of a make-up
code and a terminator code. Table 11.10 gives the codes for the Group 3 standard. It
also gives the end-of-line (EOL) code and an extended code table for larger paper
widths up to A3 in size, which require up to 2560 pixels per line.
Other forms of variable-length coding that simplify the coding-decoding pro-
cedures are algorithm based. Noteworthy among these codes are the A_N and B_N
codes [Ic, p. 1406]. The A_N codes, also called L_N codes, are multiple fixed-length
codes that are nearly optimal for exponentially distributed run lengths. They belong
to a class of linear codes whose length increases approximately linearly with the
number of messages. If l_k, k = 1, 2, 3, ..., are the run lengths, then the A_N code of
block size N is obtained by writing k = q(2^N - 1) + r, where 1 <= r <= 2^N - 1 and q is a
0  0  11110  0  11111  0  0  0              WBS, N = 4
00000000111000001111000000000000            Data
  8W    3B    5W    4B      12W             Run lengths
1000  0011  0101  0100  1100                RLC, fixed length

Figure 11.43 Run-length coding and white block skipping.
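The mechanics of Fig. 11.43 can be reproduced with a short sketch. The Python fragment below (the function names are mine, not from the text) extracts alternating run lengths from a binary scan line and codes them with fixed-length m-bit words; a run of M - 1 = 2^m - 1 or more is assumed to be split across several words, the usual fixed-length RLC convention.

```python
def run_lengths(line):
    """Split a binary scan line into alternating run lengths.

    By convention the first run is white (0); if the line starts with a
    black pixel, a leading white run of length zero is emitted.
    """
    runs = []
    current, count = 0, 0
    for pixel in line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs


def fixed_length_code(runs, m=4):
    """Code each run with m-bit words; a run of M - 1 = 2**m - 1 or more
    is split into words of value M - 1 followed by the remainder."""
    M = 1 << m
    words = []
    for r in runs:
        while r >= M - 1:
            words.append(format(M - 1, f"0{m}b"))
            r -= M - 1
        words.append(format(r, f"0{m}b"))
    return words
```

For the data line of Fig. 11.43 this yields the run lengths 8W, 3B, 5W, 4B, 12W and, with m = 4, the code words 1000 0011 0101 0100 1100 shown in the figure.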

Sec. 11.9 Coding of Two-tone Images 543
TABLE 11.10 Modified Huffman Code Tables for One-dimensional Run-length Coding

Terminating code words

Run     White      Black            Run     White      Black
length  runs       runs             length  runs       runs

 0      00110101   0000110111       32      00011011   000001101010
 1      000111     010              33      00010010   000001101011
 2      0111       11               34      00010011   000011010010
 3      1000       10               35      00010100   000011010011
 4      1011       011              36      00010101   000011010100
 5      1100       0011             37      00010110   000011010101
 6      1110       0010             38      00010111   000011010110
 7      1111       00011            39      00101000   000011010111
 8      10011      000101           40      00101001   000001101100
 9      10100      000100           41      00101010   000001101101
10      00111      0000100          42      00101011   000011011010
11      01000      0000101          43      00101100   000011011011
12      001000     0000111          44      00101101   000001010100
13      000011     00000100         45      00000100   000001010101
14      110100     00000111         46      00000101   000001010110
15      110101     000011000        47      00001010   000001010111
16      101010     0000010111       48      00001011   000001100100
17      101011     0000011000       49      01010010   000001100101
18      0100111    0000001000       50      01010011   000001010010
19      0001100    00001100111      51      01010100   000001010011
20      0001000    00001101000      52      01010101   000000100100
21      0010111    00001101100      53      00100100   000000110111
22      0000011    00000110111      54      00100101   000000111000
23      0000100    00000101000      55      01011000   000000100111
24      0101000    00000010111      56      01011001   000000101000
25      0101011    00000011000      57      01011010   000001011000
26      0010011    000011001010     58      01011011   000001011001
27      0100100    000011001011     59      01001010   000000101011
28      0011000    000011001100     60      01001011   000000101100
29      00000010   000011001101     61      00110010   000001011010
30      00000011   000001101000     62      00110011   000001100110
31      00011010   000001101001     63      00110100   000001100111

nonnegative integer. The codeword for l_k has (q + 1)N bits, of which the first qN
bits are 0 and the last N bits are the binary representation of r. For example, if N = 2,
the A2 code for l8 (8 = 2 x 3 + 2) is 000010. For a geometric distribution with mean
mu, the optimum N is the integer nearest to (1 + log2 mu).
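A sketch of the A_N construction just described (the function names are mine; the book gives only the rule): the message index k is split as k = q(2^N - 1) + r, and the code word is qN zeros followed by r written in N bits.

```python
import math


def an_code(k, N):
    """A_N code word for message (run length) index k >= 1."""
    base = (1 << N) - 1                  # 2**N - 1 messages per N-bit block
    q, r = divmod(k - 1, base)           # shift so the remainder falls in 1..base
    r += 1
    return "0" * (q * N) + format(r, f"0{N}b")


def optimum_block_size(mu):
    """Integer nearest to 1 + log2(mu) for a geometric run-length model."""
    return max(1, round(1 + math.log2(mu)))
```

Here an_code(8, 2) reproduces the text's example 000010 for the A2 code of l8.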
Experimental evidence shows that long run lengths are more common than
predicted by an exponential distribution. A better model for run-length distribution
is of the form

    P(l) = c / l^alpha,    alpha > 0,  c = constant        (11.120)


544 Image Data Compression Chap. 11


TABLE 11.10 (Continued)

which decreases less rapidly with l than the exponential distribution.


The B_N codes, also called H_N codes, are also multiples of fixed-length codes. The
word length of a B_N code word increases roughly as the logarithm of the number of
messages. The fixed block length for a B_N code is N + 1 bits. It is constructed by
listing all possible N-bit words, followed by all possible 2N-bit words, then all
3N-bit words, and so on. An additional bit is inserted after every block of N bits.
The inserted bit is 0 except for the bit inserted after the last block, which is 1.
Table 11.11 shows the construction of the B1 code, which has been found
useful for RLC.



TABLE 11.11 Construction of the B1 Code
(every second bit is an inserted bit; the final inserted bit is 1)

Run length k    N-bit words    B1 code
1               0              01
2               1              11
3               00             0001
4               01             0011
5               10             1001
6               11             1011
7               000            000001
...             ...            ...
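The construction of Table 11.11 generalizes directly to any N. A minimal sketch (my code, assuming only the insertion rule stated above):

```python
def bn_code(k, N=1):
    """B_N code word for message k >= 1: messages are listed as all N-bit
    words, then all 2N-bit words, and so on; after every block of N bits
    an indicator bit is inserted -- 0 to continue, 1 after the final block."""
    blocks, first, count = 1, 1, 1 << N
    while k >= first + count:            # find how many N-bit blocks are needed
        first += count
        blocks += 1
        count = 1 << (N * blocks)
    word = format(k - first, f"0{N * blocks}b")
    pieces = []
    for i in range(blocks):
        pieces.append(word[i * N:(i + 1) * N])
        pieces.append("0" if i < blocks - 1 else "1")
    return "".join(pieces)
```

For N = 1 this reproduces the B1 column of Table 11.11; the code length (N + 1) x blocks grows logarithmically with the message number, as noted above.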

Example 11.10
Table 11.12 lists the averages mu_w and mu_b and the entropies Hw and Hb of white and black
run lengths for CCITT documents. From these data, an upper bound on the achievable
compression can be obtained as

    Cmax = (mu_w + mu_b) / (Hw + Hb)        (11.121)

which is also listed in the table. These results show compression factors of 5 to 20 are
achievable by RLC techniques.

In white block skipping (WBS), each scan line is divided into blocks of N pixels;
an all-white block is coded by a single 0, and every other block by N + 1 bits. The
resulting rate is

    R = [(1 - P_N)(N + 1) + P_N] / N = 1 - P_N + 1/N   bits/pixel      (11.122)

TABLE 11.12 Run-Length Measurements for CCITT Documents

Document
number    mu_w     mu_b    Hw     Hb     Cmax
1         156.3    6.8     5.5    3.6    18.0
2         257.1    14.3    8.2    4.5    21.4
3         89.8     8.5     5.7    3.6    10.6
4         39.0     5.7     4.7    3.1    5.7
5         79.2     7.0     5.7    3.3    9.5
6         138.5    8.0     6.2    3.6    14.9
7         45.3     4.4     5.9    3.1    5.6
8         85.7     70.9    6.9    5.8    12.4


where P_N is the probability that a block contains all white pixels. This rate depends
on the block size N. The value N = 10 has been found to be suitable for a large
range of images. Note that WBS is a simple form of the truncated Huffman code
and should work well especially when an image contains large white areas.
An adaptive WBS scheme improves the performance significantly by coding
all-white scan lines separately. A 0 is assigned to an all-white scan line. If a line
contains at least one black pixel, a 1 precedes the regular WBS code for that line.
The WBS method can also be extended to two dimensions by considering M x N
blocks of pixels. An all-white block is coded by a 0. Other blocks are coded by
(MN + 1) bits, whose first bit is 1 followed by the block bit pattern. An adaptive
WBS scheme that uses variable block size proceeds as follows. If the initial block
contains all white pixels, it is represented by 0. Otherwise, a prefix of 1 is assigned
and the block is subdivided into several subblocks, each of which is then treated
similarly. The process continues until an elementary block is reached, which is
coded by the regular WBS method. Note that this method is very similar to gener-
ating the quad-tree code for a region (see Section 9.7).
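Basic 1-D WBS can be sketched in a few lines (my code; the adaptive all-white-line flag and the 2-D and quad-tree variants extend it in the ways just described):

```python
def wbs_encode(line, N=10):
    """White block skipping: each N-pixel block becomes '0' if all white
    (all zeros), else '1' followed by the block's N-bit pattern."""
    out = []
    for i in range(0, len(line), N):
        block = line[i:i + N]
        if any(block):                       # at least one black pixel
            out.append("1" + "".join(str(p) for p in block))
        else:
            out.append("0")
    return "".join(out)
```

For an all-white line the rate is 1/N bit per pixel, consistent with (11.122) when P_N = 1.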

Prediction Differential Quantization [Ic, p. 1418]

Prediction differential quantization (PDQ) is an extension of run-length coding in
which the correlation between scan lines is exploited. The method basically encodes
the overlap information of a black run in successive scan lines. This is done by
coding the differences Delta' and Delta'' in black runs from line to line, together with the
messages new start (NS), when a black run starts, and merge (M), when there is no
further overlap of that run. A new start is coded by a special code word, whereas a
merge is represented by coding the white and black run lengths r_w, r_b, as shown in
Fig. 11.44.

Relative Address Coding [If, p. 834]

Relative address coding (RAC) uses the same principle as the PDQ method and
computes run-length differences by tracking either the last transition on the same
line or the nearest transition on the previous line. For example, the transition pixel
Q (Fig. 11.45) is encoded by the shorter of the distances PQ and QQ', where P is the
preceding transition element on the current line and Q' is the nearest transition
element to the right of P on the previous line whose direction of transition is the
same as that of Q. If P does not exist, then it is considered to be the imaginary pixel
to the right of the last pixel on the preceding line. The distance QQ' is coded as +N

[Example scan-line pair for RAC: transitions P and Q on the current line and
the same-direction transition Q' on the preceding line; example black runs on
successive lines for PDQ, with differences Delta', Delta'' and the messages NS
(new start) and M (merge).]

Figure 11.45 RAC method: PQ = 1, QQ' = -1, RAC distance = -1.

Figure 11.44 The PDQ method.

TABLE 11.13 Relative Address Codes. x...x = binary representation of N.

Distance        Code           N            F(N)
+0              0              1-4          0xx
+1              100            5-20         10xxxx
-1              101            21-84        110xxxxxx
N (N > 1)       111 F(N)       85-340       1110xxxxxxxx
+N (N > 2)      1100 F(N)      341-1364     11110xxxxxxxxxx
-N (N > 2)      1101 F(N)      1365-5460    111110xxxxxxxxxxxx
...                            ...          ...

if Q' is N (>= 0) pixels to the left, or as -N if Q' is N (>= 1) pixels to the right of Q on
the preceding line. The distance PQ is coded as N (>= 1) if it is N pixels away. The
RAC distances are coded by a code similar to the B1 code, except for the choice of
the reference line and for the very short distances +0, +1, -1 (see Table 11.13).
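The distance rule can be stated compactly. A sketch (my code; the tie-break toward the previous-line reference is an assumption consistent with the Fig. 11.45 example, where PQ = 1 and QQ' = -1 give a RAC distance of -1):

```python
def rac_distance(p, q, q_prime):
    """RAC reference choice for the transition at column q.

    p: preceding transition on the current line (PQ = q - p >= 1);
    q_prime: nearest same-direction transition on the previous line
    (QQ' = q - q_prime: +N if q' lies N pixels left of q, -N if right).
    Returns the chosen reference and its signed distance.
    """
    pq = q - p
    qq = q - q_prime
    return ("QQ'", qq) if abs(qq) <= pq else ("PQ", pq)
```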

CCITT Modified Relative Element Address Designate Coding

The modified relative element address designate (READ) algorithm has been rec-
ommended by the CCITT for two-dimensional coding of documents. It is a modifica-
tion of the RAC and other similar codes [H, p. 854]. Referring to Fig. 11.46, we
define a0 as the reference transition element, whose position is defined by the
previous coding mode (to be discussed shortly). Initially, a0 is taken to be the
imaginary white transition pixel situated to the left of the first pixel on the coding
[Diagrams of the coding situations, showing the reference line with transitions
b1, b2 and the coding line with transitions a0, a1, a2.]

(a) Pass mode
(b) Vertical and horizontal modes

Figure 11.46 CCITT modified READ coding.



line. The next pair of transition pixels to the right of a0 are labeled a1 and a2 on the
coding line, and b1 and b2 on the reference line, and have colors alternating with
that of a0. Any of the elements a1, a2, b1, b2 not detected for a particular coding
line is taken as the imaginary pixel to the right of the last element of its respective
scan line. Pixel a1 represents the next transition element to be coded. The algorithm
has three modes of coding, as follows.

Pass mode. b2 is to the left of a1 (Fig. 11.46a). This identifies the white or
black runs on the reference line that do not overlap with the corresponding white or
black runs on the coding line. The reference element a0 is set below b2 in
preparation for the next coding.

Vertical mode. a1 is coded relative to b1 by the distance a1b1, which is
allowed to take the values 0, 1, 2, 3 to the right or left of b1. These are represented
by V(0), VR(x), VL(x), x = 1, 2, 3 (Table 11.14). In this mode a0 is set at a1 in prepa-
ration for the next coding.

Horizontal mode. If |a1b1| > 3, the vertical mode is not used and the run
lengths a0a1 and a1a2 are coded using the modified Huffman codes of Table 11.10.
After coding, the new position of a0 is set at a2. If this mode is needed for the first
element on the coding line, then the value a0a1 - 1 rather than a0a1 is coded. Thus if
the first element is black, then a run length of zero is coded.


TABLE 11.14 CCITT Modified READ Code Table [H, p. 865]

Mode          Elements to be coded       Notation    Code word

Pass          b1, b2                     P           0001
Horizontal    a0a1, a1a2                 H           001 + M(a0a1) + M(a1a2)
Vertical
  a1 just under b1      a1b1 = 0         V(0)        1
  a1 to the right       a1b1 = 1         VR(1)       011
    of b1               a1b1 = 2         VR(2)       000011
                        a1b1 = 3         VR(3)       0000011
  a1 to the left        a1b1 = 1         VL(1)       010
    of b1               a1b1 = 2         VL(2)       000010
                        a1b1 = 3         VL(3)       0000010

2-D extensions                                       0000001xxx
1-D extensions                                       000000001xxx
End-of-line (EOL) code word                          000000000001
1-D coding of next line                              EOL + 1
2-D coding of next line                              EOL + 0

M(a0a1) and M(a1a2) are code words taken from the modified Huffman code tables given in
Table 11.10. The bit assignment for the xxx bits is 111 for the uncompressed mode.





[Flow diagram: set K; if K = 0, code the line one-dimensionally and reload the
K factor; otherwise set a0 = 0 and detect a1, b1, b2. If b2 < a1, use pass-mode
coding and set a0 below b2; else if |a1b1| <= 3, use vertical-mode coding and set
a0 = a1; else detect a2, use horizontal-mode coding, and set a0 = a2. Repeat to
the end of the line, decrement K, and continue until the end of the page.]

Figure 11.47 CCITT modified READ coding algorithm.
" The coding procedure along a line continues until the imaginary transition
element to the right of the last actual element on the line has been detected. In
this way exactly 1728 pixels are coded on each line. Figure 11.47 shows the flow
diagram for the algorithm. Here K is called the K -factor, which means that after a
one-dimensionally coded line, no more than K - 1 successive" lines are two-
dimensionally coded. CCfIT recommended values for K are 2 and 4 for documents
scanned at normal resolution and high resolution, respectively. The K -factor is used
to minimize the effect of channel noise on decoded images. The one-dimensional
and two-dimensional extension code words listed in Table 11.14, with xxx equal to
111, are used to allow the coder to enter the uncompressed mode, which may be
desired when the run lengths are very small or random, such as in areas of halftone
images or cross hatchings present in some business forms. "
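The three-way mode decision can be summarized in a few lines. The sketch below (my code, following the conventions stated above) returns the mode and the code word or prefix from Table 11.14; in horizontal mode the two modified Huffman run-length codes M(a0a1) and M(a1a2) would follow the 001 prefix.

```python
# Vertical-mode code words from Table 11.14, keyed by a1 - b1
VERTICAL_CODES = {0: "1", 1: "011", 2: "000011", 3: "0000011",
                  -1: "010", -2: "000010", -3: "0000010"}


def read_mode(a1, b1, b2):
    """CCITT modified READ mode decision for coding transition a1,
    given reference-line transitions b1, b2 (all column indices)."""
    if b2 < a1:                    # pass mode: reference run has no overlap
        return ("pass", "0001")
    d = a1 - b1                    # positive: a1 right of b1 (VR); negative: VL
    if abs(d) <= 3:
        return ("vertical", VERTICAL_CODES[d])
    return ("horizontal", "001")   # followed by M(a0a1) + M(a1a2)
```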

Predictive Coding

The principles of predictive coding can be easily applied to binary images. The main
difference is that the prediction error is also a binary variable, so that a quantizer is
not needed. If the original data has redundancy, then the prediction error sequence
will have large runs of 0s (or 1s). For a binary image u(m, n), let u^(m, n) denote its
predicted value based on the values of pixels in a prediction window W, which
contains some of the previously coded pixels. The prediction error is defined as

    e(m, n) = 1 if u(m, n) != u^(m, n);  0 if u(m, n) = u^(m, n)
            = u(m, n) XOR u^(m, n)                               (11.123)

The sequence e(m, n) can be coded by a run-length or entropy coding method. The
image is reconstructed from e(m, n) simply as

    u(m, n) = u^(m, n) XOR e(m, n)                               (11.124)

Note that this is an errorless predictive coding method. An example of a prediction
window W for a raster-scanned image is shown in Fig. 11.48.
A reasonable prediction criterion is to minimize the prediction error proba-
bility. For an N-element prediction window, there are 2^N different states. Let S_k,
k = 1, 2, ..., 2^N, denote the kth state of W, with probability p_k, and define

    q_k = Prob[u(m, n) = 1 | S_k]                                (11.125)

Then the optimum prediction rule having minimum prediction error probability is

    u^(m, n) = 1 if q_k >= 0.5;  0 if q_k < 0.5                  (11.126)

If the random sequence u(m, n) is strict-sense stationary, then the various
probabilities will remain constant at every (m, n), and therefore the prediction rule
stays the same. In practice a suitable choice of N has to be made to achieve a
trade-off between prediction error probability and the complexity of the predictor



,
VIr! x. X3 X2

X, Xo
.
'k .. 0 0 2 6 15 13 800 o 1 1 1 o 0 2 7 13 S'

Prediction errors
t t t t t
i State So I..-'.~·- - 5 - - - - - - - -...·1..·-~-----------
Run
i<>ngth. StateS, 1f....,.....---3--~--------++--------­
I' .. . ,

State S,II--.>----------2----------.-.I--- ---

Figure 11.48 TUH method of predictive


,
coding. The run length h of state Sk
means a prediction error has occurred after state S, has repeated lk times.

due to large values of N. Experimentally, 4- to 7-pixel predictors have been found to
be adequate. Corresponding to the prediction rule of (11.126), the minimized
prediction error probability is

    p_e = SUM (k = 1 to 2^N) p_k min(q_k, 1 - q_k)               (11.127)

If the random sequence u(m, n) is Markovian with respect to the prediction window
W, then the run lengths for each state S_k are independent. Hence, the prediction-
error run lengths for each state (Fig. 11.48) can be coded by the truncated Huffman
code, for example. This method has been called the Technical University of
Hannover (TUH) code [Ic, p. 1425].
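Equations (11.123)-(11.126) amount to an XOR between the image and its state-table prediction, and the round trip is exact. A sketch (my code; the 4-pixel causal window is one possible choice, and `table` maps each window state S_k to the bit chosen by the rule q_k >= 0.5):

```python
def _window(img, m, n):
    """Causal window W: west, north-west, north, north-east (0 off-image)."""
    def px(i, j):
        return img[i][j] if 0 <= i < len(img) and 0 <= j < len(img[0]) else 0
    return (px(m, n - 1), px(m - 1, n - 1), px(m - 1, n), px(m - 1, n + 1))


def predict_and_code(image, table):
    """Binary error plane e = u XOR u_hat, eq. (11.123)."""
    rows, cols = len(image), len(image[0])
    return [[image[m][n] ^ table.get(_window(image, m, n), 0)
             for n in range(cols)] for m in range(rows)]


def reconstruct(error, table):
    """Errorless inverse u = u_hat XOR e, eq. (11.124), in raster order."""
    rows, cols = len(error), len(error[0])
    image = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            image[m][n] = table.get(_window(image, m, n), 0) ^ error[m][n]
    return image
```

Because the window is causal, the decoder sees exactly the pixels the encoder used, so reconstruction is exact for any prediction table.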

Adaptive Predictors

Adaptive predictors are useful in practice because the image data is generally
nonstationary. In general, any pattern classifier or discriminant function could be
used as a predictor. A simple classifier is a linear learning machine, or adaptive
threshold logic unit (TLU), which calculates the threshold q_k as a linear functional of
the states of the pixels in the prediction window. Another type of pattern classifier is
a network of TLUs, called a layered machine, which includes piecewise linear discrimi-
nant functions and the so-called alpha-perceptron. A practical adaptive predictor uses a
counter C_k of L bits for each state [43]. The counter runs from 0 to 2^L - 1. The
adaptive prediction rule is

    u^(m, n) = 1 if C_k >= 2^(L-1);  0 if C_k < 2^(L-1)          (11.128)

After prediction of a pixel has been performed, the counter is updated as

    C_k = min(C_k + 1, 2^L - 1)  if u(m, n) = 1
          max(C_k - 1, 0)        otherwise                       (11.129)


TABLE 11.15 Compression Ratios of Different Binary Coding Algorithms

          One-dimensional codes            Two-dimensional codes
          (normal or high resolution)      Normal resolution             High resolution

          B1      Truncated  Modified      RAC     TUH     CCITT         RAC     TUH     CCITT
          code    Huffman    Huffman       code    code    READ code     code    code    READ code
Document          code       code          K = 4   K = 4   K = 2         K = 4   K = 4   K = 4

1         13.62   17.28      16.53         16.67   20.32   15.71         19.68   24.66   19.77
2         14.45   15.05      16.34         24.66   24.94   19.21         28.67   28.99   26.12
3         8.00    8.88       9.42          10.42   11.57   9.89          12.06   14.96   12.58
4         4.81    5.74       5.76          4.52    5.94    5.02          5.42    7.52    6.27
5         7.67    8.63       9.15          9.39    11.45   9.07          10.97   13.82   11.63
6         9.78    10.14      10.98         15.41   17.26   13.63         17.59   19.96   18.18
7         4.60    4.69       5.20          4.56    5.28    5.10          5.36    6.62    6.30
8         8.54    7.28       8.70          12.85   13.10   11.13         14.73   15.03   15.55
Average   8.93    9.71       10.26         12.31   13.71   11.10         14.31   16.45   14.55

The value L = 3 has been found to yield minimum prediction error for a typical
printed page.
Comparison of Algorithms
Table 11.15 shows a comparison of the compression ratios achievable by different
algorithms. The compression ratios for one-dimensional codes are independent of
the vertical resolution. At normal resolution the two-dimensional codes improve
the compression by only 10 to 30% over the modified Huffman code. At high
resolution the improvements are 40 to 60% and are significant enough to warrant
the use of these algorithms. Among the two-dimensional codes, the TUH predictive
code is superior to the relative address techniques, especially for text information.
However, the latter are simpler to code. The CCITT READ code, which is a
modification of the RAC, performs somewhat better.
Other Methods
Algorithms that utilize higher-level information, such as whether the image con-
tains a known type (or font) of characters or graphics, line drawings, and the like,
can be designed to obtain very high compression ratios. For example, in the case of
printed text limited to the 128 ASCII characters, each character can
be coded by 7 bits. The coding technique would require a character recognition
algorithm. Likewise, line drawings can be efficiently coded by boundary-following
algorithms, such as chain codes, line segments, or splines. The algorithms discussed
here are not directly useful for halftone images, because the image area has been
modulated by pseudorandom noise and thresholded thereafter. In all these cases
special preprocessing and segmentation is required to code the data efficiently.


11.10 COLOR AND MULTISPECTRAL IMAGE CODING
Data compression techniques discussed so far can be generalized to color and
multispectral images, as shown in Fig. 11.49. Each pixel is represented by a p x 1

Sec. 11.10 Color and Multispectral Image Coding 553



[Block diagram: the R, G, B inputs pass through a coordinate transformation
producing components T1, T2, T3; each component is processed by its own
component coder; the decoded components pass through the inverse
transformation to give the reconstructed R, G, B.]

Figure 11.49 Component coding of color images. For multispectral images the
input vector has a dimension greater than or equal to 2.

vector. For example, in the case of color, the input is a 3 x 1 vector containing the
R, G, B components. This vector is transformed to another coordinate system,
where each component can be processed by an independent spatial coder.
In coding color images, consideration should be given to the facts that (1)
the luminance component (Y) has higher bandwidth than the chrominance compo-
nents (I, Q) or (U, V), and (2) the color-difference metric is non-Euclidean in these
coordinates; that is, equal noise power in different color components is perceived
differently. In practical image coding schemes, the lower-bandwidth chrominance
signals are sampled at correspondingly lower rates. Typically, the I and Q signals are
sampled at one-third and one-sixth of the sampling rate of the luminance signal. Use
of color-distance metrics is possible but has not been used in practical systems,
primarily because of the complexity of the color vision model (see Chapter 3).
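The first two stages of Fig. 11.49 can be sketched for color (my code; the standard NTSC RGB-to-YIQ coefficients are assumed). With I kept at one-third and Q at one-sixth of the luminance sampling rate, 8-bit PCM of the components averages 8(1 + 1/3 + 1/6) = 12 bits per pixel, the Y, I, Q entry of Table 11.17.

```python
def rgb_to_yiq(r, g, b):
    """NTSC luminance/chrominance transform (standard coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q


def subsample(scanline, factor):
    """Keep every factor-th chrominance sample along a scan line."""
    return scanline[::factor]
```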
An alternate method of coding color images is by processing the composite
color signal. This is useful in broadcast applications, where it is desired to manage
only one signal. However, since the luminance and color signals are not in the same
frequency band, the foregoing monochrome image coding techniques are not very
efficient if applied directly. Typically, the composite signal is sampled at 3f_sc (the
lowest integer multiple of the subcarrier frequency above the Nyquist rate) or 4f_sc, and

---"")1A., 0 0 )I
C. 8. A.
.
Previous
III
0 C
I> ,iii
8
1/11
A


------------~
field
'"'----.-.- .....
o. C. 8. 4.
- .. ---

8 3 A 3 0 3 C3 8:i A; ,

- _._-- --- - - -----' field



.. 'P...,..",t
-.-<t- .. -e-.-.---
8 2 A, O 2 02
0 O~ll~-'/

. 0 0 l(
___ Current
line (\)
A; C1 8 1 Al A; 0 1 C1 8 1 Al 0
--------------- •

(at 3m - $lImpling (b) 4fsc - sampling


Figure 11.50 Subcarrier phase relationships in sampled NTSC signal. Pixels hav-
inll th~ same labels have the same subcarrier phase. '. .




the designs of predictors (for DPCM coding) or the block size (for transform
coding) take into account the relative phases of the pixels. For example, at 3f_sc
sampling, the adjacent samples of the subcarrier have a 120 degree phase difference, and at 4f_sc
sampling, the phase difference is 90 degrees. Figure 11.50 shows the subcarrier phase
relationships of neighboring pixels in the sampled NTSC signal. Due to the
presence of the modulated subcarrier, higher-order predictors are required for
DPCM of composite signals. Table 11.16 gives examples of predictors for predictive
coding of the NTSC signal.
Table 11.17 lists the practical bit rates achievable by different coding algo-
rithms for broadcast-quality reproduced images. These results show that the chromi-
nance components can be coded by as few as 1/2 bit per pixel (via adaptive transform
coding) to 1 bit per pixel (via DPCM).
Due to the flexibility in the design of coders, component coding performs
somewhat better than composite coding and may well become the preferred choice
with the advent of digital television.
For multispectral images, the input data is generally KL transformed in the
temporal direction to obtain the principal components (see Section 7.6). Each
>

TABLE 11.16 Predictors for DPCM of Composite NTSC Signal. z^-1 = 1 pixel delay,
z^-N = 1 line delay, z^-262N = 1 field delay, rho (leak) <= 1.

[Sampling rate versus predictor P(z): 1-D, 2-D, and 3-D predictor polynomials
in terms of z^-1, z^-N, z^-262N, and rho, at 3f_sc and 4f_sc sampling.]

TABLE 11.17 Typical Performance of Component Coding Algorithms on Color Images

                                                                        Rate per component       Average rate
Method                     Components coded  Description                (bits/component/pixel)   (bits/pixel)

PCM                        R, G, B           Raw data                   8                        24
                           U*, V*, W*        Color space quantizer,                              10
                                             1024 color cells
                           Y, I, Q           I, Q subsampled            8                        12
DPCM                       Y, I, Q           One-step predictor,        2 to 3                   3 to 4.5
                                             I, Q subsampled
Transform (cosine, slant)  Y, I, Q           No subsampling             Y (1.15 to 2),           2.5 to 3
                                                                        I, Q (0.75 to 1)
                           Y, I, Q or        Same as above with         Variable                 1 to 2
                           Y, U, V           adaptive classification



TABLE 11.18 Summary of Image Data Compression Methods

                                         Typical average rates
Method                                   (bits/pixel)      Comments

Zero-memory methods                                        Simple to implement.
  PCM                                    6-8
  Contrast quantization                  4-5
  Pseudorandom noise quantization        4-5
  Line interlace                         4
  Dot interlace                          2-4
Predictive coding
  Delta modulation                       1                 Performance poorer than DPCM; oversample data for improvement.
  Intraframe DPCM                        2-3               Predictive methods are generally simple to implement, but sensitive
  Intraframe adaptive DPCM               1-2                 to data statistics. Adaptive techniques improve performance
  Interframe conditional replenishment   1-2                 substantially. Channel error effects are cumulative and visibly
  Interframe DPCM                        1-1.5               degrade image quality.
  Interframe adaptive DPCM               0.5-1
Transform coding
  Intraframe                             1-1.5             Achieve high performance, small sensitivity to fluctuation in data
  Intraframe adaptive                    0.5-1               statistics; channel and quantization errors distributed over the
  Interframe                             0.5-1               image block. Easy to provide channel protection. Hardware
  Interframe adaptive                    0.1-0.5             complexity is high.
Hybrid coding
  Intraframe                             1-2               Achieve performance close to transform coding at moderate rates
  Intraframe adaptive                    0.5-1.5             (0.5 to 1 bit/pixel). Complexity lies midway between transform
  Interframe                             0.5-1               coding and DPCM.
  Interframe adaptive                    0.25-0.5
Color image coding
  Intraframe                             1-3               The above techniques are applicable.
  Interframe                             0.25-1
Two-tone image coding
  1-D methods                            0.06-0.2          Distortionless coding. Higher compression achievable by pattern
  2-D methods                            0.03-0.2            recognition techniques.
component is independently coded by a two-dimensional coder. An alternative is to
identify a finite number of clusters in a suitable feature space. Each multispectral
pixel is represented by the information pertaining to (e.g., the centroid of) the
cluster to which it belongs. Usually the fidelity criterion is the classification (rather
than mean square) accuracy of the encoded data [45, 46].

11.11 SUMMARY

Image data compression techniques are of significant practical interest. We have
considered a large number of compression algorithms that are available for imple-
mentation. We conclude this chapter by providing a summary of these methods in
Table 11.18.

PROBLEMS

11.1* For an 8-bit integer image of your choice, determine the Nth-order prediction error
field e_N(m, n) = u(m, n) - u^_N(m, n), where u^_N(m, n) is the best mean square causal
predictor based on the N nearest neighbors of u(m, n). Truncate u^_N(m, n) to the
nearest integer and calculate the entropy of e_N(m, n) from its histogram for N =
0, 1, 2, 3, 4, 5. Using these as estimates for the Nth-order entropies, calculate the
achievable compression.
11.2 The output of a binary source is to be coded in blocks of M samples. If the successive
outputs are independent and identically distributed with p = 0.95 (for a 0), find the
Huffman codes for M = 1,2,3,4 and calculate their efficiencies.
11.3 For the AR sequence of (11.25), the predictor for feedforward predictive coding
(Fig. 11.6) is chosen as u^(n) = rho u(n - 1). The prediction error sequence e(n) =
u(n) - u^(n) is quantized using B bits. Show that in the steady state,

    E[|delta u(n)|^2] = sigma_q^2 / (1 - rho^2) = sigma_u^2 f(B)

where sigma_q^2 = sigma_u^2 (1 - rho^2) f(B) is the mean square quantization error of e(n). Hence the
feedforward predictive coder cannot perform better than DPCM, because the pre-
ceding result shows its mean square error is precisely the same as in PCM. This result
happens to be true for arbitrary stationary sequences utilizing arbitrary linear pre-
dictors. A possible instance where the feedforward predictive coder may be preferred
over DPCM is in the distortionless case, where the quantizer is replaced by an
entropy coder. The two coders will perform identically, but the feedforward predic-
tive coder will have a somewhat simpler hardware implementation.
11.4 (Delta modulation analysis) For delta modulation of the AR sequence of (11.25), write the prediction error as ε(n) = e(n) − (1 − ρ)u(n − 1) + δε(n − 1). Assuming a 1-bit Lloyd-Max quantizer and δε(n) to be an uncorrelated sequence, show that

σ_ε²(n) = 2(1 − ρ)σ_u² + (2ρ − 1)σ_ε²(n − 1)f(1)

from which (11.26) follows after finding the steady-state value of σ_ε²(n).
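A small simulation makes the delta-modulation behavior concrete. This is a hedged sketch, not the book's analysis: the AR parameter, step size, and sequence length are assumed values, and the step is hand-picked rather than Lloyd-Max optimal.

```python
import numpy as np

def delta_modulate(u, step):
    """1-bit delta modulation: the predictor is the previous reconstructed
    sample; the quantizer transmits only the sign of the prediction error."""
    recon = np.empty_like(u)
    prev = 0.0
    for n, x in enumerate(u):
        prev += step if x >= prev else -step  # move up or down by one step
        recon[n] = prev
    return recon

# AR(1) source with sigma_u^2 = 1 (rho, step, n are assumed values)
rng = np.random.default_rng(1)
rho, n = 0.95, 20000
e = rng.normal(0.0, np.sqrt(1 - rho**2), n)
u = np.empty(n)
u[0] = 0.0
for k in range(1, n):
    u[k] = rho * u[k - 1] + e[k]

recon = delta_modulate(u, step=0.25)  # step chosen by trial, not optimized
mse = float(np.mean((u[1000:] - recon[1000:]) ** 2))  # discard the transient
```

The measured mse comes out far below σ_u² = 1, which is the payoff delta modulation draws from sample-to-sample correlation; sweeping `step` exposes the granular-noise versus slope-overload tradeoff behind (11.26).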

Problems Chap. 11 557


11.5 Consider images with power spectral density function

S(z1, z2) = β² / [A(z1, z2)A(z1⁻¹, z2⁻¹)],  A(z1, z2) ≜ 1 − P(z1, z2)

where A(z1, z2) is the minimum-variance, causal, prediction-error filter and β² is the variance of the prediction error. Show that the DPCM algorithm discussed in the text takes the form shown in Fig. P11.5, where H(z1, z2) = 1/A(z1, z2). If the channel adds independent white noise of variance σ_c², show that the total noise in the reconstructed output would be σ² = σ_q² + (σ_c² σ_u²)/β², where σ_u² ≜ E[(u(m, n))²].

[Figure P11.5: block diagram of the DPCM coder and decoder. The coder forms the prediction error e(m, n) from u(m, n) via the prediction filter P(z1, z2) and quantizes it; channel noise is added between coder and decoder; the decoder filters the received signal with 1/A(z1, z2) to reconstruct the output.]

Figure P11.5

11.6 (DPCM analysis) For DPCM of the AR sequence of (11.25), write the prediction error as ε(n) = e(n) + ρδε(n − 1). Assuming δε(n) to be an uncorrelated sequence, show that the steady-state distortion due to DPCM is

D = σ_e² f(B) / [1 − ρ² f(B)]

For ρ = 0.95, plot the normalized distortion D/σ_u² as a function of bit rate B for B = 1, 2, 3, 4 for a Laplacian density quantizer and compare it with PCM.
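A quick numerical version of this comparison, assuming the idealized distortion function f(B) = 2^(−2B) as a stand-in for the Laplacian quantizer table values (so the numbers illustrate the shape of the comparison, not the exact curve the problem asks for):

```python
def dpcm_distortion(rho, f):
    """Normalized steady-state DPCM distortion D / sigma_u^2: the steady-state
    solution of the problem's recursion, D = sigma_e^2 f / (1 - rho^2 f),
    with sigma_e^2 = sigma_u^2 (1 - rho^2)."""
    return (1 - rho**2) * f / (1 - rho**2 * f)

rho = 0.95
table = {}
for B in (1, 2, 3, 4):
    f = 2.0 ** (-2 * B)  # assumed quantizer distortion function f(B)
    # PCM normalized distortion is just f(B); DPCM is far smaller
    table[B] = (dpcm_distortion(rho, f), f)
```

At every rate the DPCM entry is well below the PCM entry, which is the qualitative conclusion the plot should show.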
11.7* For a 512 x 512 image of your choice, design DPCM coders using mean square
predictors of orders up to four. Implement the coders for B = 3 and compare the
reconstructed images visually as well as on the basis of their mean square errors and
entropies.
11.8 a. Using the transform coefficient variances given in Table 5.2 and the Shannon quantizer based rate distortion formulas (11.61) to (11.63), compare the distortion versus rate curves for the various transforms. (Hint: An easy way is to arrange the σ_k² in decreasing order, let θ = σ_j², j = 0, ..., N − 1, and plot D_j ≜ (1/N) Σ_{k=j}^{N−1} σ_k² versus R_j ≜ (1/2N) Σ_{k=0}^{j−1} log2(σ_k²/σ_j²).)
b. Compare the cosine transform R versus D function when the bit allocation is determined first by truncating the real numbers obtained via (11.61) to nearest integers and second, using the integer bit allocation algorithm.
11.9 (Whitening transform versus unitary transform) An N × 1 vector u with covariance R = {ρ^|i−j|} is transformed as v = Lu, where L is a lower triangular (nonunitary) matrix whose elements are

l(i, j) = 1, i = j;  −ρ, i − j = 1;  0, otherwise

a. Show that v is a vector of uncorrelated elements with σ_v²(0) = 1 and σ_v²(k) = 1 − ρ², 1 ≤ k ≤ N − 1.
b. If v(k) is quantized using n_k bits, show the reproduced vector u' = L⁻¹v' has the mean square distortion

D = (1/N) Σ_{k=0}^{N−1} w_k f(n_k),  w_0 = (1 − ρ^{2N})/(1 − ρ²)

c. For N = 15, ρ = 0.95, and f(x) = 2^{−2x}, find the optimum rate versus distortion function of this coder and compare it with that of the cosine transform coder. From these results conclude that it is more advantageous to replace the (usually slow) KL transform by a fast unitary transform rather than a fast decorrelating transform.
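The decorrelation claim and the w_0 weight are easy to verify numerically; a small check (N and ρ here are arbitrary choices for illustration):

```python
import numpy as np

N, rho = 8, 0.95
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])  # covariance {rho^|i-j|}
L = np.eye(N) - rho * np.eye(N, k=-1)           # the lower triangular transform
Rv = L @ R @ L.T                                 # covariance of v = L u

# weight of v(0) quantization errors in u' = L^-1 v': squared norm of the
# first column of L^-1, whose entries are rho^i
w0 = float((np.linalg.inv(L)[:, 0] ** 2).sum())
```

Rv comes out diagonal with the variances stated in part (a), and w0 matches (1 − ρ^{2N})/(1 − ρ²): quantization noise in v(0) propagates down the entire reconstruction, which is one way to see why this decorrelating transform loses to a unitary one.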
11.10 (Transform coding versus DPCM) Suppose a sequence of length N has a causal representation

u(n) = û(n) + ε(n) ≜ Σ_{k=0}^{n−1} a(n, k)u(k) + ε(n),  0 ≤ n ≤ N − 1

where û(n) is the optimum linear predictor of u(n) and ε(n) is an uncorrelated sequence of variance β_n². Writing this in vector notation as Lu = ε, ε(0) ≜ u(0), where L is an N × N unit lower triangular matrix, it can be shown that R, the covariance matrix of u, satisfies |R| = Π_{n=0}^{N−1} β_n². If u(n) is DPCM coded, then for small levels of distortion, the average minimum achievable rate is

R_DPCM = (1/2N) Σ_{k=0}^{N−1} log2(σ_ε²(k)/D),  D ≤ min{β_k²}

a. Show that at small distortion levels, the above results imply

R_DPCM ≥ (1/2N) Σ_{k=0}^{N−1} log2(β_k²/D) = (1/2N) log2|R| − (1/2) log2 D = R_KL

that is, the average minimum rate achievable by KL transform coding is lower than that of DPCM.
b. Suppose the Markov sequence of (11.25) with σ_u² = 1, β_0² = 1, β_n² = (1 − ρ²) for 1 ≤ n ≤ N − 1, is DPCM coded and the steady state exists for n ≥ 1. Using the results of Problem 11.6, with f(B) = 2^{−2B}, show that the number of bits B required to achieve a distortion level D is given by

B = (1/2) log2[(1 − ρ² + ρ²D)/D],  1 ≤ n ≤ N − 1

For the initial sample ε(0), the same distortion level is achieved by using (1/2) log2(1/D) bits. From these and (11.69) show that

R_DPCM = (1/2N) log2(1/D) + ((N − 1)/2N) log2[(ρ²D + (1 − ρ²))/D]

R_DPCM − R_KL = ((N − 1)/2N) log2[1 + ρ²D/(1 − ρ²)],  D ≤ (1 − ρ)/(1 + ρ)

Calculate this difference for N = 16, ρ = 0.95, and D = 0.01, and conclude that at low levels of distortion the performance of KL transform and DPCM coders

is close for Markov sequences. This is a useful result, which can be generalized for AR sequences. For ARMA sequences, bandlimited sequences, and two-dimensional random fields, this difference can be more significant.
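Under the closed form derived in part (b), the requested difference is a one-liner (f(B) = 2^(−2B) assumed, as in the problem):

```python
import math

def rate_gap(N, rho, D):
    """R_DPCM - R_KL in bits per sample, valid for D <= (1 - rho)/(1 + rho)."""
    assert D <= (1 - rho) / (1 + rho)
    return (N - 1) / (2 * N) * math.log2(1 + rho**2 * D / (1 - rho**2))

gap = rate_gap(16, 0.95, 0.01)  # a few hundredths of a bit per sample
```

For N = 16, ρ = 0.95, D = 0.01 (note D is below the validity bound (1 − ρ)/(1 + ρ) ≈ 0.026), the gap is a small fraction of a bit, which is the "close performance" conclusion.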
11.11 For the separable covariance model used in Example 11.5, with ρ = 0.95, plot and compare the R versus D performances of (a) various transform coders for 16 × 16 size blocks utilizing Shannon quantizers (Hint: Use the data of Table 5.2.) and (b) N × N block cosine transform coders with N = 2^n, n = 1, 2, ..., 8. (Hint: Use Eq. (P5.28-2).)
11.12 Plot and compare the R versus D curves for 16 × 16 block transform coding of images modeled by the nonseparable exponential covariance function 0.95^√(m² + n²) using the discrete Fourier, cosine, sine, Hadamard, slant, and Haar transforms. (Hint: Use the results of Problem P5.29 to calculate transform domain variances.)
11.13* Implement the zonal transform coding algorithm of Section 11.5 on 16 × 16 blocks of an image of your choice. Compare your results for average rates of 0.5, 1.0, and 2.0 bits per pixel using the cosine transform or any other transform of your choice.
11.14* Develop a chart of adaptive transform coding algorithms containing details of the
algorithms and their relative merits and complexities. Implement your favorite of
these and compare it with the 16 x 16 block cosine transform coding algorithm.
11.15 The motivation for hybrid coding comes from the following example. Suppose an N × N image u(m, n) has the autocorrelation function r(k, l) = ρ^{|k|+|l|}.
a. If each column of the image is transformed as v_n = Φu_n, where Φ is the KLT of u_n, then show that the autocorrelation of v_n is E[v_n(k)v_{n'}(k')] = λ_k ρ^{|n−n'|} δ(k − k'). What are Φ and λ_k?
b. This means the transformed image is uncorrelated across the rows. Show that the pixels along each row can be modeled by the first-order AR process of (11.84) with a(k) = ρ, b(k) = 1, and σ_ε²(k) = (1 − ρ²)λ_k.
11.16 For images having separable covariance function with ρ = 0.95, find the optimum pairs (n, k(n)) for DPCM transmission over a noisy channel with p_e = 0.001 employing the optimum mean square predictor. (Hint: β² = (1 − ρ²)².)
11.17 Let the transition probabilities q_0 = p(0|1) and q_1 = p(1|0) be given. Assuming all the runs to be independent, their probabilities can be written as

P_i(l) = q_i(1 − q_i)^{l−1},  l ≥ 1,  i = 0 (white), 1 (black)

a. Show that the average run lengths and entropies of white and black runs are μ_i = 1/q_i and H_i = (−1/q_i)[q_i log2 q_i + (1 − q_i) log2(1 − q_i)]. Hence the achievable compression ratio is 1/(H_0 P_0/μ_0 + H_1 P_1/μ_1), where P_i = q_i/(q_0 + q_1), i = 0, 1, are the a priori probabilities of white and black pixels.
b. Suppose each run length is coded in blocks of m-bit words, each word representing the M − 1 run lengths in the interval [kM, (k + 1)M − 1], M = 2^m, k = 0, 1, ..., and a block terminator code. Hence the average number of bits used for white and black runs will be m Σ_{k=0}^∞ (k + 1)P[kM ≤ l_i ≤ (k + 1)M − 1], i = 0, 1. What is the compression achieved? Show how to select M to maximize it.
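A numeric illustration of the part (a) quantities, with assumed run-termination probabilities (here q is the per-pixel probability that a run of that color ends, so the mean run length is 1/q; the specific values 0.1 and 0.4 are my own choices):

```python
import math

def run_stats(q):
    """Mean length and entropy (bits) of a geometric run length:
    P(l) = q (1 - q)^(l - 1), l >= 1."""
    mu = 1.0 / q
    h = -(q * math.log2(q) + (1 - q) * math.log2(1 - q)) / q
    return mu, h

q_white, q_black = 0.1, 0.4           # assumed: long white runs, shorter black runs
mu0, h0 = run_stats(q_white)
mu1, h1 = run_stats(q_black)
p0 = q_black / (q_white + q_black)    # stationary fraction of white pixels
p1 = 1.0 - p0
bits_per_pixel = h0 * p0 / mu0 + h1 * p1 / mu1  # run entropy spread over run length
compression = 1.0 / bits_per_pixel    # vs. 1 bit/pixel direct binary coding
```

With these assumed values the run-length representation needs well under one bit per pixel, so the compression ratio comfortably exceeds 1.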



BIBLIOGRAPHY

Section 11.1

Data compression has been a topic of immense interest in digital image processing.
Several special issues and review papers have been devoted to this. For details and
extended bibliographies:

1. Special Issues: (a) Proc. IEEE 55, no. 3 (March 1967), (b) IEEE Commun. Tech. COM-19, no. 6, part I (December 1971), (c) IEEE Trans. Commun. COM-25, no. 11 (November 1977), (d) Proc. IEEE 68, no. 7 (July 1980), (e) IEEE Trans. Commun. COM-29 (December 1981), (f) Proc. IEEE 73, no. 2 (February 1985).
2. T. S. Huang and O. J. Tretiak (eds.), Picture Bandwidth Compression. New York:
Gordon and Breach, 1972.
3. L. D. Davisson and R. M. Gray (eds.), Data Compression. Benchmark Papers in Electrical Engineering and Computer Science. Stroudsburg, Penn.: Dowden, Hutchinson & Ross, Inc., 1976.
4. W. K. Pratt (ed.). Image Transmission Techniques. New York: Academic Press, 1979.
5. A. N. Netravali and J. O. Limb. "Picture Coding: A Review." Proc. IEEE 68, no. 3 (March 1980): 366-406.
6. A. K. Jain. "Image Data Compression: A Review." Proc. IEEE 69, no. 3 (March 1981): 349-389.
7. N. S. Jayant and P. Noll. Digital Coding of Waveforms. Englewood Cliffs, N.J.: Prentice-Hall, 1984.
8. A. K. Jain, P. M. Farrelle, and V. R. Algazi. "Image Data Compression." In Digital Image Processing Techniques, M. P. Ekstrom, ed. New York: Academic Press, 1984.
9. E. Dubois, B. Prasada, and M. S. Sabri. "Image Sequence Coding." In Image Sequence Analysis, T. S. Huang, ed. New York: Springer-Verlag, 1981, pp. 229-288.

Section 11.2

For entropy coding, Huffman coding, run-length coding, arithmetic coding, vector quantization, and related results of this section, see papers in [3] and:

10. J. Rissanen and G. Langdon. "Arithmetic Coding." IBM J. Res. Develop. 23 (March 1979): 149-162. Also see IEEE Trans. Comm. COM-29, no. 6 (June 1981): 858-867.
11. J. W. Schwartz and R. C. Barker. "Bit-Plane Encoding: A Technique for Source Encoding." IEEE Trans. Aerospace Electron. Syst. AES-2, no. 4 (July 1966): 385-392.
12. A. Gersho. "On the Structure of Vector Quantizers." IEEE Trans. Inform. Theory IT-28 (March 1982): 157-165. Also see vol. IT-25 (July 1979): 373-380.

Section 11.3

For some early work on predictive coding, delta modulation, and DPCM, see Oliver, Harrison, O'Neal, and others in Bell System Technical Journal issues of July 1952, May-June 1966, and December 1972. For more recent work:




13. R. Steele. Delta Modulation Systems. New York: John Wiley, 1975.
14. I. M. Paz, G. C. Collins, and B. H. Batson. "A Tri-State Delta Modulator for Run-Length Encoding of Video." Proc. National Telecomm. Conf., Dallas, Texas, vol. 1 (November 1976): 6.3-1-6.3-6.

For adaptive delta modulation algorithms and applications to image transmission, see [7], Cutler (pp. 898-906) and Song et al. (pp. 1033-1044) in [1b], and Lei et al. in [1c]. For more recent work on DPCM of two-dimensional images, see Musmann in [4], Habibi (pp. 948-956) in [1b], Sharma et al. in [1c], and:
15. J. B. O'Neal, Jr. "Differential Pulse-Code Modulation (DPCM) with Entropy Coding." IEEE Trans. Inform. Theory IT-21, no. 2 (March 1976): 169-174. Also see vol. IT-23 (November 1977): 697-707.
16. V. R. Algazi and J. T. DeWitte. "Theoretical Performance of Entropy Coded DPCM." IEEE Trans. Commun. COM-30, no. 5 (May 1982): 1088-1095.
17. J. W. Modestino and V. Bhaskaran. "Robust Two-Dimensional Tree Encoding of Images." IEEE Trans. Commun. COM-29, no. 12 (December 1981): 1786-1798.
18. A. N. Netravali. "On Quantizers for DPCM Coding of Picture Signals." IEEE Trans. Inform. Theory IT-23 (May 1977): 360-370. Also see Proc. IEEE 65 (April 1977): 536-548.

For adaptive DPCM, see Zschunke (pp. 1295-1302) and Habibi (pp. 1275-1284) in [1c], Jain and Wang [32], and:

19. L. H. Zetterberg, S. Ericsson, and C. Couturier. "DPCM Picture Coding with Two-Dimensional Control of Adaptive Quantization." IEEE Trans. Commun. COM-32, no. 4 (April 1984): 457-462.
20. H. M. Hang and J. W. Woods. "Predictive Vector Quantization of Images." IEEE Trans. Commun. (1985).

Section 11.4

For results related to the optimality of the KL transform, see Chapter 5 and the bibliography of that chapter. For the optimality of KL transform coding and bit allocations, see [6], [32], and:

21. A. Segall. "Bit Allocation and Encoding for Vector Sources." IEEE Trans. Inform. Theory IT-22, no. 2 (March 1976): 162-169.
Section 11.5

For early work on transform coding and subsequent developments and examples of
different transforms and algorithms, see Pratt and Andrews (pp. 515-554), Woods
and Huang (pp. 555-573) in [2], and:

22. A. Habibi and P. A. Wintz. "Image Coding by Linear Transformation and Block Quantization." IEEE Trans. Commun. Tech. COM-19, no. 1 (February 1971): 50-63.



23. P. A. Wintz. "Transform Picture Coding." Proc. IEEE 60, no. 7 (July 1972): 809-823.
24. W. K. Pratt, W. H. Chen, and L. R. Welch. "Slant Transform Image Coding." IEEE Trans. Commun. COM-22, no. 8 (August 1974): 1073-1093.
25. K. R. Rao, M. A. Narasimhan, and K. Revuluri. "Image Data Processing by Hadamard-Haar Transforms." IEEE Trans. Computers C-23, no. 9 (September 1975): 888-896.

The concepts of fast KL transform and recursive block coding were introduced in [26 and Ref. 17, Ch. 5]. For details and extensions see [6], Meiri et al. (pp. 1728-1735) in [1e], Jain et al. in [8], and:

26. A. K. Jain. "A Fast Karhunen-Loeve Transform for a Class of Random Processes." IEEE Trans. Commun. COM-24 (September 1976): 1023-1029.
27. A. K. Jain and P. M. Farrelle. "Recursive Block Coding." Sixteenth Annual Asilomar Conference on Circuits, Systems, and Computers, November 1982. Also see P. M. Farrelle and A. K. Jain, IEEE Trans. Commun. (February 1986), and P. M. Farrelle, Ph.D. Dissertation, U.C. Davis, 1988.

For results on two-source coding, adaptive transform coding, and the like, see Yan and Sakrison (pp. 1315-1322) in [1c], Jain and Wang [32], Tasto and Wintz (pp. 956-972) in [1b], Graham (pp. 336-346) in [1a], Chen and Smith in [1c], and:

28. W. F. Schreiber, C. F. Knapp, and N. D. Kay. "Synthetic Highs: An Experimental TV Bandwidth Reduction System." J. Soc. Motion Picture and Television Engineers 68 (August 1959): 525-537.
29. V. R. Algazi and D. J. Sakrison. "Encoding of a Counting Rate Source with Orthogonal Functions." Computer Processing in Communications. N.Y.: Polytechnic Institute of Brooklyn, 1969, pp. 85-100.
30. J. L. Mannos and D. J. Sakrison. "The Effects of a Visual Fidelity Criterion on the Encoding of Images." IEEE Trans. Inform. Theory IT-20 (July 1974): 525-536.

Section 11.6

Hybrid coding principle, its analysis and relationship with semicausal models, and
its applications can be found in [6, 8], Jones (Chapter 5) in [4], and:

31. A. Habibi. "Hybrid Coding of Pictorial Data." IEEE Trans. Commun. COM-22
(May 1974): 614-626.
32. A. K. Jain and S. H. Wang. "Stochastic Image Models and Hybrid Coding." Final Report, NOSC contract N00953-77-C-003MJE, Department of Electrical Engineering, SUNY Buffalo, New York, October 1977. Also see Technical Report #SIPL-79-6, Signal and Image Processing Laboratory, ECE Dept., University of California at Davis, September 1979.
33. R. W. Means, E. H. Wrench and H. J. Whitehouse. "Image Transmission via Spread
Spectrum Techniques." ARPA Quarterly Technical Reports ARPA-QR6, QR8 Naval
Ocean Systems Center, San Diego, Calif., January-December 1975.
34. A. K. Jain. "Advances in Mathematical Models for Image Processing." Proc. IEEE 69 (March 1981): 502-528.


Section 11.7
For interframe predictive coding, see [5, 6, 8], Haskell et al. (pp. 1339-1348) in [1c], Haskell (Chapter 6) in [4], and:

35. J. C. Candy et al. "Transmitting Television as Clusters of Frame-to-Frame Differences." Bell Syst. Tech. J. 50 (August 1971): 1889-1917.
36. J. R. Jain and A. K. Jain. "Displacement Measurement and Its Application in Inter-
frame Image Coding." IEEE Trans. Comm. COM-29 (December 1981): 1799-1808.
37. A. N. Netravali and J. D. Robbins. "Motion Compensated Television Coding-Part I." Bell Syst. Tech. J. (March 1979): 631-670.

Interframe hybrid and transform coding techniques are discussed in [5, 6, 8, 9], Roese et al. (pp. 1329-1338) and Natarajan and Ahmed (pp. 1323-1329) in [1c], and:

38. J. A. Stuller and A. N. Netravali. "Transform Domain Motion Estimation." Bell Syst. Tech. J. (September 1979): 1623-1702. Also see pages 1703-1718 of the same issue for application to coding.
39. J. R. Jain and A. K. Jain. "Interframe Adaptive Data Compression Techniques for Images." Tech. Rept., Signal and Image Processing Laboratory, ECE Department, University of California at Davis, August 1979.
40. J. O. Limb and J. A. Murphy. "Measuring the Speed of Moving Objects from Television Signals." IEEE Trans. Commun. COM-23 (April 1975): 474-478.
41. C. Cafforio and F. Rocca. "Methods for Measuring Small Displacements of Television
Images." IEEE Trans. Inform. Theory IT-22 (September 1976): 573-579.

Section 11.8

For the optimal mean square encoding and decoding results and their application,
we follow [6, 39] and:
42. G. A. Wolf. "The Optimum Mean Square Estimate for Decoding Binary Block Codes." Ph.D. Thesis, University of Wisconsin at Madison, 1973. Also see G. A. Wolf and R. Redinbo, IEEE Trans. Inform. Theory IT-20 (May 1974): 344-351.
For extended bibliography, see [6, 8].

Section 11.9

[1f] is devoted to coding of two-tone images. Details of CCITT standards and various algorithms are available there. Some other useful references are Arps (pp. 222-276) in [4], Huang in [1c, 2], Musmann and Preuss in [1c], and:

43. H. Kobayashi and L. R. Bahl. "Image Data Compression by Predictive Coding I: Prediction Algorithms" and "II: Encoding Algorithms." IBM J. Res. Dev. 18, no. 2 (March 1974): 164-179.

"

664 Image Data Compression Chap. 11




44. T. S. Huang and A. B. S. Hussain. "Facsimile Coding by Skipping White." IEEE Trans. Commun. COM-23, no. 12 (December 1975): 1452-1466.

Section 11.10

Color and multispectral coding techniques discussed here have been discussed by Limb et al. in [1c], Pratt in [1b], and:

45. F. E. Hilbert. "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept." JPL Publication 77-43, Jet Propulsion Laboratory, Pasadena, Calif., December 1, 1977.
46. J. N. Gupta and P. A. Wintz. "A Boundary Finding Algorithm and its Applications."
IEEE Trans. Circuits and Systems CAS-22 (April 1976): 351-362.

Section 1'1.11

Several techniques not discussed in this chapter include nonuniform sampling techniques combined with interpolation (such as using splines), use of singular value decompositions, autoregressive (AR) model synthesis, and the like. Summary discussion and the relevant sources of these and other useful methods are given in [6, 8].


