Digital Image Processing
using
Local Segmentation
Torsten Seemann
B. Sc (Hons)
April 2002
Contents
1 Introduction
2 Notation and Terminology
   2.1 Digital images
   2.2 Image statistics
      2.2.1 The histogram
      2.2.2 The mean
      2.2.3 The variance
      2.2.4 The entropy
   2.3 Image algebra
      2.3.1 Image-scalar operations
      2.3.2 Image-image operations
   2.4 Image acquisition
   2.5 Types of noise
      2.5.1 Additive noise
      2.5.2 Multiplicative noise
      2.5.3 Impulse noise
      2.5.4 Quantization noise
   2.6 Segmentation
   2.7 Local windows
3 Local Segmentation in Image Processing
   3.1 Introduction
   3.2 Global segmentation
      3.2.1 Clustering
      3.2.2 Spatial segmentation
   3.3 Local segmentation
      3.3.3 SUSAN
   3.4 Denoising
      3.4.1 Temporal filtering
      3.4.2 Spatial filtering
      3.4.4 Rank filters
   3.5 Conclusions
4
   4.1 Introduction
   4.2.3 Image sampling
   4.2.6 Pixel quantization
   4.2.8 Image margins
   4.3.1 Visual inspection
   4.3.3 Difference images
   4.4 Test images
      4.4.1 Square
      4.4.2 Lenna
      4.4.3 Montage
   4.6.1 Application to denoising
   4.7.2 Student's t-test
5
   5.1 Introduction
   5.2.1 Maximum likelihood
   5.2.3 Penalized likelihood
   5.2.4 Bayesianism
   5.3 Describing segmentations
      5.3.1 Possible segmentations
      5.3.2 Canonical segmentations
      5.3.3 Valid segmentations
      5.3.4 Segmentation parameters
   5.4.1 A message format
   5.4.2 A uniform prior
   5.4.3 A worked example
   5.4.4 Posterior probability
   5.5 Application to denoising
6
   6.1 Introduction
   6.2.1 Impulse noise
   6.2.2 Multiplicative noise
   6.3.1 Multispectral pixels
   6.3.2 Planar regions
   6.4 Pixel classification
   6.5 Edge detection
   6.6 Image enlargement
   6.7 Image compression
      6.7.2 Adaptive BTC
   6.8 Conclusions
7 Conclusions
Abstract
A unifying philosophy for carrying out low level image processing called local segmentation is presented. Local segmentation provides a way to examine and understand existing
algorithms, as well as a paradigm for creating new ones. Local segmentation may be applied to a range of important image processing tasks. Using a traditional segmentation technique in the form of intensity thresholding and a simple model selection criterion, the new FUELS denoising
algorithm is shown to be highly competitive with state-of-the-art algorithms on a range of
images. In an effort to improve the local segmentation, the minimum message length information theoretic criterion for model selection (MML) is used to select between models
having different structure and complexity. This leads to further improvements in denoising
performance. Both FUELS and the MML variants thereof require no special user supplied
parameters, but instead learn from the image itself. It is believed that image processing in
general could benefit greatly from the application of the local segmentation methodology.
Declaration
This thesis contains no material that has been accepted for the award of any other degree or
diploma in any university or other institution. Furthermore, to the best of my knowledge, this
thesis contains no material previously published or written by another person, except where
due reference is made in the text of the thesis.
Torsten Seemann
January 28, 2003
Acknowledgments
I love deadlines. I love the whooshing sound they make as they fly by.
Douglas Adams, 1952–2001
I wish to acknowledge the following people for their role in the completion of this thesis:
My supervisor, Peter Tischer, for his guidance, constructive proofreading, and many
fruitful discussions, some of which even stayed on topic. Thank you for helping me
produce a thesis I can be proud of.
David Powell, for many productive white-board exchanges over 6 years of sharing an
office, and for actually understanding most of my thesis after a single proofreading.
My parents, Sandra and Gunther Seemann, for always supporting my choices in life,
and only occasionally nagging me to finish up and get a real job that pays money.
My partner, Naomi Maguire, for her devoted love, support and encouragement. What
else could you ask for?
Naomi's parents, Brenda and Bill Maguire, for accepting an unemployed doctoral student bum into their family.
Bernd Meyer, for a handful of valuable thesis related discussions we had when he
occasionally blessed our office with a visit.
The staff in the School of Computer Science and Software Engineering at Monash University. In particular, Trevor Dix, for regularly checking in on me, and commiserating
with the plight of the doctoral student.
This thesis was typeset using LaTeX 2e [KD95]. Graphs were generated by gnuplot [WKL], and diagrams were constructed using XFig [SKYS].
Chapter 1
Introduction
Image processing is a rapidly growing area of computer science. Its growth has been fueled
by technological advances in digital imaging, computer processors and mass storage devices.
Fields which traditionally used analog imaging are now switching to digital systems, for their
flexibility and affordability. Important examples are medicine, film and video production,
photography, remote sensing, and security monitoring. These and other sources produce
huge volumes of digital image data every day, more than could ever be examined manually.
Digital image processing is concerned primarily with extracting useful information from
images. Ideally, this is done by computers, with little or no human intervention. Image processing algorithms may be placed at three levels. At the lowest level are those techniques
which deal directly with the raw, possibly noisy pixel values, with denoising and edge detection being good examples. In the middle are algorithms which utilise low level results for
further means, such as segmentation and edge linking. At the highest level are those methods which attempt to extract semantic meaning from the information provided by the lower
levels, for example, handwriting recognition.
The literature abounds with algorithms for achieving various image processing tasks. However, there does not appear to be any unifying principle guiding many of them. Some are
one dimensional signal processing techniques which have been extended to two dimensions.
Others apply methods from alternative disciplines to image data in a somewhat inappropriate
manner. Many are the same basic algorithm with parameter values tweaked to suit the problem at hand. Alternatively, the parameters are optimized with respect to a suitable training
set, without thought on how to vary them for images with different properties. There do
exist well considered methods, but unfortunately a large proportion of new ideas have been
ad hoc, without any central guiding principle.
This thesis proposes a unified approach to low level image processing called local segmentation. The local segmentation principle states that the first step in processing a pixel should
be to segment explicitly the local region encompassing it. On a local scale, this has the effect
of making clear which pixels belong together, and which pixels do not. The segmentation
process results in a local approximation of the underlying image, effectively separating the
signal from the noise. Thus higher level algorithms can operate directly on the signal without risk of amplifying the noise. Local segmentation can be seen as providing a common
framework for constructing image processing algorithms.
Many existing image processing algorithms already make partial use of the local segmentation concept. It is possible to examine these algorithms with respect to the local segmentation
model they use. This helps to make their strengths and weaknesses more apparent. Even popular techniques, such as linear and rank filters, can be framed in terms of their application
of local segmentation. In most cases the segmentation is implicit rather than explicit. That
is, the choice of which pixels belong together is performed in a systematic, but sometimes
roundabout manner. The SUSAN image processing system, developed by Smith and Brady,
originally considered using explicit segmentation, but eventually chose to allow all pixels to
have partial membership in the centre pixel's segment.
Image denoising is particularly suited to demonstrating the utility of local segmentation.
Denoising is the process of removing unwanted noise from an image. A denoised image is an
approximation to the underlying true image, before it was contaminated. A good denoising
algorithm must simultaneously preserve structure and remove noise. Obviously, to do this the
algorithm must be able to identify what structure is present. Local segmentation specifically
attempts to separate structure from noise on a local scale. Denoising would therefore be a
good application with which to test different approaches to local segmentation.
Local regions only contain a small number of pixels. It is unlikely that there would be more
than a few segments present at such a scale, so unconnected, homogeneous groups of pixels are likely to be part of the same global segment. Traditional threshold-based segmentation techniques, such as k-means, perform well with this sort of data. This technique is used
to develop FUELS, an algorithm for denoising greyscale images affected by additive noise.
FUELS requires only one parameter, the noise variance, which can be supplied by the user
or estimated automatically from the image. The noise variance and pixel values are used by
a model order selection criterion for deciding the optimal number of segments present in the
local region.
FUELS has two additional original features. If the optimal segmentation is deemed unfit,
the noisy pixels are passed through unmodified. This is called the do no harm principle.
Also, local segmentation produces multiple overlapping estimates for the value of each pixel
in the image. FUELS is able to combine these estimates to further improve its denoising
performance. It will be shown that FUELS is competitive with state-of-the-art denoising
algorithms over a range of noise levels.
Although FUELS uses a simple segmentation algorithm, it still achieves good results. It does,
however, have some limitations. The nature of FUELS' model selection criterion makes it
unable to distinguish segments having a contrast difference less than a specified multiple of
the noise variance. The criterion is also not flexible enough to allow FUELS to compare
different models having the same number of segments. This is because the thresholding
algorithm does not incorporate any spatial information.
Local segmentation may be considered an inductive inference problem. We wish to infer
the underlying image structure given only the noisy pixel values. Bayesian and information
theoretic techniques are recognised as some of the most powerful for these types of problems.
One such technique is Wallace's minimum message length information theoretic criterion
(MML), which has been developed continually since 1968. MML is essentially a Bayesian
method, but provides a sound way to choose a point estimate from the posterior distribution
over models. It works well with small amounts of data, and is invariant under non-linear
transformations of the parameter space.
Under MML, a candidate model is evaluated using its message length. A message is an
efficient lossless encoding of the data, and consists of two parts. The first part encodes
the model and its parameters. The second part encodes the data given the model from the
first part. The candidate model with the shortest overall message length is deemed the best
model for the data. This is similar, but not identical, to the maximum a posteriori (MAP)
estimate used in Bayesian analysis. When the model contains continuous parameters, MML
optimally quantizes the prior density such that the MAP estimate is invariant under non-linear
transformations of the data and model parameters.
The model selection criterion used by FUELS is replaced by a more flexible MML one.
MML makes it straightforward to include extra models in the pool of candidate models
being considered. For a 3×3 window, there exist 256 unique ways to divide the 9 pixels into one or two segments (2^9 = 512 binary labellings, half of which are duplicates under swapping of the two labels). The MML version of FUELS is modified to evaluate and consider
all of these. This allows spatial information to be exploited, and overcomes the minimum
contrast difference that FUELS requires between segments. The do no harm principle is
re-interpreted, and shown to fit naturally within the MML framework.
Using MML for model selection is shown to improve results relative to FUELS, and to outperform other good denoising algorithms. It produces better local approximations, especially
for very noisy images. Evaluating and comparing large numbers of models using MML,
however, is much more computationally intensive than FUELS. In terms of RMSE, the improvements are not large. This indicates that simpler techniques like FUELS and SUSAN are
already good trade-offs between efficiency and effectiveness. In critical applications where
the best possible modeling is required, the use of MML methods could be warranted.
The local segmentation paradigm is not limited to denoising applications. It is shown that
a variety of image processing tasks may be addressed using local segmentation. Breaking
an image into its low level structural components could be considered an essential first step,
from which many other tasks are derived. It is shown how local segmentation may be used
for edge detection, pixel classification, image enlargement and image compression. The
extension to different noise models, such as impulse noise, image models, such as planar
segments, and higher dimensional data, such as volume images, is also discussed.
The FUELS local segmentation algorithm has the desirable feature of being simple, while
still producing good results. This makes it well suited to robotic and computer vision applications. It can be implemented using low amounts of memory and processing power, ideal
for putting into hardware or embedded microcontrollers. Local segmentation is inherently
parallelizable, because each pixel's local region is processed independently. Thus a highly concurrent implementation would be possible. This could be useful in real time applications
where many images per second need to be analysed.
I believe that local segmentation provides a unifying philosophy for carrying out low level
image processing. It provides a way to examine and understand existing algorithms, as well
as a paradigm for creating new ones. A local segmentation analysis of an image can be
re-used by a wide range of image processing tasks. Using a traditional segmentation technique in the form of intensity thresholding and a simple model selection criterion, the FUELS denoising
algorithm is shown to be highly competitive with state-of-the-art algorithms on a range of
images. In an effort to improve the local segmentation, MML is applied to select between
a larger set of models having different structure and complexity. This leads to further improvements in denoising performance. Both FUELS and the MML variants thereof require
no special user supplied parameters, but instead learn from the noisy image itself. I believe
that image processing in general could benefit greatly from the application of the techniques
proposed in this thesis.
Chapter 2
Notation and Terminology
2.1 Digital images
A digital image is a discrete two-dimensional function, f(x, y), which has been quantized over its domain and range [GN98]. Without loss of generality, it will be assumed that the image is rectangular, consisting of Y rows and X columns, and its resolution is written as X × Y. By convention, f(0, 0) is taken to be the top left corner of the image, and f(X − 1, Y − 1) the bottom right corner. This is summarized in Figure 2.1.

Figure 2.1: A rectangular digital image of resolution X × Y, with f(0, 0) at the top left corner and f(X − 1, Y − 1) at the bottom right.
Each distinct coordinate in an image is called a pixel, which is short for picture element.
The nature of the output of f(x, y) for each pixel is dependent on the type of image. Most
images are the result of measuring a specific physical phenomenon, such as light, heat, distance, or energy. The measurement could take any numerical form.
A greyscale image measures light intensity only. Each pixel is a scalar proportional to the
brightness. The minimum brightness is called black, and the maximum brightness is called
white. A typical example is given in Figure 2.2. A colour image measures the intensity and
chrominance of light. Each colour pixel is a vector of colour components. Common colour
spaces are RGB (red, green and blue), HSV (hue, saturation, value), and CMYK (cyan,
magenta, yellow, black), which is used in the printing industry [GW92]. Pixels in a range
image measure the depth, or distance, to an object in the scene. Range data is commonly used
in machine vision applications [KS00].
For storage purposes, pixel values need to be quantized. The brightness in greyscale images is usually quantized to 2^b levels, and the image is referred to as having b bits per pixel. Many common greyscale images use 8 bits per pixel, giving 256 distinct grey levels. This is a rough bound on the number of different intensities the human visual system is able to discern [Jah93]. For the same reasons, each component in a colour pixel is usually stored using 8 bits.

Medical scans often use 12-16 bits per pixel, because their accuracy could be critically important. Those images to be processed predominantly by machine may often use higher values of b. Data which measures something other than light intensity, such as range data, may also require a larger value of b to store sufficient distance information.
There are many other types of pixels. Some measure bands of the electromagnetic spectrum
such as infra-red or radio, or heat, in the case of thermal images. Volume images are actually
three-dimensional images, with each pixel being called a voxel. In some cases, volume
images may be treated as adjacent two-dimensional image slices. Although this thesis deals
with greyscale images, it is often straightforward to extend the methods to function with
different types of images.
Figure 2.3: The histogram for the greyscale image in Figure 2.2 (relative frequency against intensity).
The mean intensity of an image f is

    \mu_f = \frac{1}{XY} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} f(x, y)        (2.1)

and the variance of its intensities is

    \sigma_f^2 = \frac{1}{XY} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \left[ f(x, y) - \mu_f \right]^2        (2.2)

The entropy of an image is calculated from its histogram. If p_v is the relative frequency of intensity v in a b bit per pixel image, the entropy is

    H(f) = - \sum_{v=0}^{2^b - 1} p_v \log_2 p_v        (2.7)

The entropy has a maximum value of b, corresponding to a uniform histogram. It has a minimum value of 0 when all pixels have the same intensity. The entropy is one measure of the information content of an image. Because it is calculated from the histogram, it is unable to take spatial factors into consideration.
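To make the statistics above concrete, the short sketch below computes the mean, variance and histogram entropy of an 8 bpp greyscale image with NumPy. It is an illustration only; the function name and the uniform-histogram test image are not taken from the thesis.

    import numpy as np

    def image_statistics(f, bits=8):
        """Mean (2.1), variance (2.2) and histogram entropy (2.7) of a greyscale image."""
        f = np.asarray(f, dtype=np.float64)
        mean = f.mean()                                   # Equation 2.1
        variance = f.var()                                # Equation 2.2
        levels = 2 ** bits
        counts = np.bincount(f.astype(np.int64).ravel(), minlength=levels)
        p = counts / counts.sum()                         # relative frequencies p_v
        nz = p > 0                                        # treat 0 log 0 as 0
        entropy = -np.sum(p[nz] * np.log2(p[nz]))         # Equation 2.7
        return mean, variance, entropy

    # A uniform histogram attains the maximum entropy of b bits:
    f = np.arange(256, dtype=np.uint8).repeat(256).reshape(256, 256)
    print(image_statistics(f))   # entropy = 8.0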
An image-scalar operation applies the same arithmetic operation, using a scalar constant, to every pixel in the image.

This idea could be used to enhance an image which is too dark. Consider the image in Figure 2.4a which uses 8 bits per pixel (256 levels), but only contains pixels with intensities from 64 to 191. One may consider enhancing it to use the full intensity range. This can be achieved using Equation 2.9, where \lfloor \cdot \rfloor denotes integer truncation, and floating point precision is used for all pixels during the calculation. The result is given in Figure 2.4b.

    g(x, y) = \left\lfloor 255 \times \frac{f(x, y) - 64}{191 - 64} \right\rfloor        (2.9)
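A minimal NumPy sketch of the contrast stretch in Equation 2.9; the function name and the final clipping step are illustrative additions, not part of the thesis.

    import numpy as np

    def stretch_contrast(f, lo=64, hi=191):
        """Linearly map intensities in [lo, hi] onto the full 8 bpp range (Equation 2.9)."""
        g = np.floor(255.0 * (f.astype(np.float64) - lo) / (hi - lo))
        return np.clip(g, 0, 255).astype(np.uint8)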
Imagine wanting to generate a blended version of two greyscale images of identical resolution. This could be achieved using Equation 2.11, where \alpha determines the mixing proportion. Alpha blending is a simple form of morphing, and is often used to dissolve between scenes in film and television. A visual example is given in Figure 2.5.

    g(x, y) = \left\lfloor \alpha f_1(x, y) + (1 - \alpha) f_2(x, y) \right\rfloor        (2.11)

Figure 2.5: Alpha-blending example: (a) first image; (b) second image; (c) blended image using Equation 2.11.
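Equation 2.11 translates directly into a couple of lines of NumPy; the function name is illustrative.

    import numpy as np

    def alpha_blend(f1, f2, alpha):
        """Blend two equally sized greyscale images according to Equation 2.11."""
        g = alpha * f1.astype(np.float64) + (1.0 - alpha) * f2.astype(np.float64)
        return np.floor(g).astype(np.uint8)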
Figure 2.6: Noise may be introduced at each step in the acquisition process: light from the actual house passes through the atmosphere (atmospheric noise) to the camera and film, the print is scanned to a digital image, and the image is subject to network transmission errors.
The air between the photographer and the house may contain dust particles which interfere
with the light reaching the camera lens. The silver-halide crystals on the film vary in size and
are discontinuous, resulting in film grain noise in the printing process [MJ66]. Most scanners
use a CCD array to scan a row of the print, which may introduce photo-electronic noise. The
scanner's CCD array is controlled by a fine stepper motor. This motor has some degree of
vibration and error in its movement, which may cause pixels to be mis-aligned. The scanner
also quantizes the CCD signal, introducing quantization noise [GN98]. Transmitting the
image over the Internet is nearly always a bit preserving operation thanks to error checking
in network protocols. However, an image transmitted to Earth from a remote space probe
launched in the 1970s is almost guaranteed to contain errors.
2.5 Types of noise

2.5.1 Additive noise

With additive noise, the noisy image f'(x, y) is the sum of the original image f(x, y) and a noise term n(x, y):

    f'(x, y) = f(x, y) + n(x, y)        (2.12)
Additive noise is independent of the pixel values in the original image. Typically, n(x, y) is symmetric about zero. This has the effect of not altering the average brightness of the image, or large parts thereof. Additive noise is a good model for the thermal noise within photo-electronic sensors [Pit95].
Figure 2.7: Different types of noise: (a) original image; (b) additive noise; (c) multiplicative noise; (d) impulse noise.

2.5.2 Multiplicative noise

With multiplicative noise, the noise contribution scales with the original pixel value:

    f'(x, y) = f(x, y) + n(x, y) f(x, y) = f(x, y) \left[ 1 + n(x, y) \right]        (2.13)

2.5.3 Impulse noise

Impulse noise replaces a proportion p of the pixels with random impulse values, leaving the rest unchanged:

    f'(x, y) = \begin{cases} n(x, y) & \text{with probability } p \\ f(x, y) & \text{with probability } 1 - p \end{cases}        (2.14)
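The three noise models of Equations 2.12 to 2.14 can be simulated as in the sketch below. The zero-mean Gaussian choice for n(x, y), the clipping back to the 8 bpp range, and the function names are assumptions made for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_additive_noise(f, sigma):
        """Additive noise (Equation 2.12): zero-mean Gaussian, symmetric about zero."""
        noisy = f.astype(np.float64) + rng.normal(0.0, sigma, f.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    def add_multiplicative_noise(f, sigma):
        """Multiplicative noise (Equation 2.13): f' = f (1 + n)."""
        noisy = f.astype(np.float64) * (1.0 + rng.normal(0.0, sigma, f.shape))
        return np.clip(noisy, 0, 255).astype(np.uint8)

    def add_impulse_noise(f, p):
        """Impulse noise (Equation 2.14): a pixel is replaced with probability p."""
        noisy = f.copy()
        mask = rng.random(f.shape) < p
        noisy[mask] = rng.integers(0, 256, size=int(mask.sum()), dtype=np.uint8)
        return noisy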
2.6 Segmentation
Segmentation involves partitioning an image into groups of pixels which are homogeneous
with respect to some criterion. Different groups must not intersect each other and adjacent
groups must be heterogeneous [PP93]. The groups are called segments. Figure 2.8a shows a
noisy image containing three objects on a background. The result of segmentation is given
in Figure 2.8b. Four segments were discovered, and are shown with a dashed outline.
Figure 2.8: (a) original image; (b) segmented into four segments.
The homogeneity criterion used for segmenting Figure 2.8 was based only on the similarity
of pixel intensities. For images containing large amounts of noise or fine structure, this
criterion may be insufficient for successful segmentation. In those cases, some information
regarding the spatial relationship between pixels is required. In particular, the assumption
that pixels belonging to the same segment are expected to be spatially connected is exploited.
A pixel only has eight immediate neighbours. If symmetry is required, there exists only
three forms of pixel connectedness which make sense, shown in Figure 2.9. There is one
type of 8-connectedness, and two types of 4-connectedness. Type I is more popular than
Type II due to the improved proximity of the four neighbouring pixels. In this thesis, only
8-connectedness and Type I 4-connectedness are considered.
Most digital images exist on a rectangular grid. This is primarily due to the arrangement of
image sensors on camera and scanning equipment. Research has been done on the superior
properties of hexagonal grids [Sta99], but they are unlikely to displace the square grid in the
near future. In this thesis we deal only with images sampled on a square grid.
Figure 2.9: Pixel connectedness: (a) 4-connected [Type I]; (b) 4-connected [Type II]; (c) 8-connected.
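For reference, the neighbour offsets for Type I 4-connectedness and for 8-connectedness can be written down directly; the sketch below is illustrative and the names are not from the thesis.

    # Offsets (dx, dy) for Type I 4-connectedness and for 8-connectedness.
    NEIGHBOURS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    NEIGHBOURS_8 = NEIGHBOURS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    def neighbours(x, y, width, height, offsets=NEIGHBOURS_8):
        """Yield the in-bounds neighbour coordinates of pixel (x, y)."""
        for dx, dy in offsets:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield nx, ny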
Chapter 3
Local Segmentation in Image Processing
3.1 Introduction
This thesis proposes the use of local segmentation as an effective way to achieve a variety of
low level image processing tasks. The local segmentation principle states that the first step in
processing a pixel should be to segment the local region encompassing that pixel. This provides a snapshot of the local structural features of the image, with the signal clearly separated
from the noise. It is hoped that the identified structural information could be used to implement many image processing tasks including, but not limited to, image denoising [Mas85],
pixel classification [Cho99], edge detection [HSSB98], and pixel interpolation [AW96].
Local segmentation can be seen to belong to a continuum of approaches to image understanding, as shown in Figure 3.1. At the lowest level is local segmentation which operates
in a purely local manner using only a small number of pixels. At a higher level is global
segmentation which attempts to group together related pixels from throughout the image.
The highest level is object recognition, whereby global segments are combined into logical
units representing real world objects of interest.
Figure 3.1: Local segmentation in context: it operates at the lowest, purely local level, below global segmentation (grouping related pixels from throughout the image) and object recognition (grouping global segments into meta-global units).

The fundamental component of the local segmentation approach is the segmentation algorithm itself. Most segmentation algorithms are designed to operate upon a whole image, or a large portion thereof. Local segmentation can only utilise a small number of pixels belonging to fragments of larger segments. Thus a local segmentation algorithm differs in that it
has less data and less context to work with. In Section 3.2 the suitability of applying global
segmentation algorithms to local segmentation will be examined.
Some image processing techniques can be seen or interpreted as exploiting the principle of
local segmentation in some way. In most cases the local segmentation principle is not stated
explicitly, nor used as a guiding principle for developing related algorithms. One example is the lossy image compression technique Block Truncation Coding, or BTC [FNK94].
BTC uses simple thresholding to segment small blocks of pixels into two classes, but it was
many years before alternative segmentation algorithms were considered. Further instances
of existing local segmentation-based algorithms will be explored in Section 3.3.
Those image processing tasks suited to local segmentation are often the first to encounter the
raw image data. This data is usually contaminated with one or more forms of noise. The
fundamental attribute of a local segmentation based algorithm should be to preserve as much
image structure (useful information) as possible, and to suppress or remove as much noise
(useless information) as possible. These goals are complementary and inseparable: the
ability to identify structure implies the ability to identify noise, and vice versa.
Good image denoising algorithms specialize in extracting structure from noisy images. This
application is probably the most appropriate low level technique for demonstrating local
segmentation. In Section 3.4, the extent to which existing denoising algorithms utilise the
principles espoused in this thesis will be explored. It is shown that the trend in state-of-the-art
denoising algorithms has been toward a local segmentation perspective.
Upon reading this chapter, certain themes will become apparent. Image processing is an
enormous field consisting of many different algorithms. Within a specific field it is often
difficult to compare results. This is due to the rarity of objective criteria for comparison, a
lack of standard test data, and simply the widely differing goals and needs of each system.
Some techniques are ad hoc in their approach, often having been inappropriately adapted
from other fields without thought to the validity of the underlying assumptions. Others have
one or more tunable parameters which, although data dependent, must be supplied by the
user, rather than learned automatically from the image itself. It is hoped that the work in this
thesis will go some way to improving this situation.
Figure 3.2: A sub-image taken from the centre of an image containing two segments.

Figure 3.3b shows, for each pixel in Figure 3.3a, the standard deviation of the sub-image centered on that pixel. The standard deviations range from 0 to 72, and are represented here using intensities from black to white.

Figure 3.3: (a) original image; (b) local standard deviations, with black representing 0 and white 72.
Homogeneous regions in the original image produce dark areas in the standard deviation
image, while edges are responsible for lighter areas. The majority of Figure 3.3b is dark.
Figure 3.4 shows a histogram of the standard deviations, which is expected to be unimodal
and skewed to the right [RLU99]. The peak at 2.4 corresponds roughly to the natural level of
variation within homogeneous regions. The skew toward larger standard deviations is caused
by heterogeneous edge and texture regions varying above the natural level.
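The local standard deviation image of Figure 3.3b can be reproduced with a direct, unoptimized sketch such as the one below. The 3×3 default window and the decision to skip the image margins are assumptions for illustration; the thesis does not specify them here.

    import numpy as np

    def local_std_map(f, radius=1):
        """Standard deviation of the (2r+1) x (2r+1) window centred on each pixel,
        computed only where the full window fits inside the image."""
        f = f.astype(np.float64)
        h, w = f.shape
        out = np.zeros((h - 2 * radius, w - 2 * radius))
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                window = f[y - radius:y + radius + 1, x - radius:x + radius + 1]
                out[y - radius, x - radius] = window.std()
        return out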
Figure 3.4: Histogram of the local standard deviations shown in Figure 3.3b (relative frequency against standard deviation).
The area under the histogram near the natural level of variation is much higher than in the
skew. This suggests that it is highly likely that a randomly selected sub-image will be
homogeneous, consisting of a single segment. This is in stark contrast to global segmentation, where the diagnosis or even consideration of a single segment result is rare. A local
segmentation algorithm must therefore be able to determine automatically the number of
segments present in the sub-image. Fortunately, this task is made easier because the number
of segments is likely to be small.
A small sub-image implies small segments. Global segmentation algorithms, in an attempt to
reduce over-segmentation, often disallow segments containing fewer than a specified number
of pixels. A local segmentation algorithm should expect to diagnose many small segments,
including those consisting of a single pixel. Figure 3.5 shows an alternate sub-image
taken from a whole image. The one pixel segment may be considered noise by global segmentation, but local segmentation must be more lenient and allow for the fact that a lone
pixel may be part of a larger global segment.
Global segmentation deals mostly with segments consisting of a relatively large number of
pixels. This makes estimated parameter values for global segments naturally more robust.
Figure 3.5: A sub-image taken from the right of the image with two segments.
Local segmentation must be frugal in its demands for pixel data. An important factor is the processing of pixels from the image margins, because they have incomplete neighbourhood information. For a given mask, the proportion of pixels at the margins depends on the image size: for whole images it is small, but for the small regions dealt with in local segmentation it can be very high.
In the following sections those global segmentation algorithms relevant to the development
of good local segmentation algorithms are examined. The discussion is broken into two main
parts. Section 3.2.1 deals only with clustering techniques (non-spatial segmentation), while
Section 3.2.2 covers methods which incorporate spatial information.
3.2.1 Clustering
Segmentation which does not use spatial information is sometimes called clustering. Clustering was used in numerical taxonomy and multivariate analysis long before electronic computers existed [Fis36]. Image processing has adapted clustering algorithms from other disciplines by treating pixel values as independent variables. For example, colour pixels have
3 attributes and each pixel can be considered a point in 3-dimensional space. Clustering
involves grouping pixels in this multivariate space.
For the case of greyscale intensity data, clustering into k clusters may be treated as the problem of estimating k − 1 thresholds, each acting as a dividing point between neighbouring clusters. Typically, each pixel in the thresholded image, g(x, y), is set to a unique intensity, c_j, associated with each of the k clusters:

    g(x, y) = \begin{cases} c_1 & \text{if } f(x, y) \leq T_1 \\ c_2 & \text{if } T_1 < f(x, y) \leq T_2 \\ \vdots & \\ c_k & \text{if } T_{k-1} < f(x, y) \end{cases}        (3.1)
Thresholding assumes that pixels from different segments form separate populations based
solely on the disparity of their intensities. It is well suited to images containing relatively few
objects and low amounts of noise. If there is a large number of segments, or large variation
within segments, it is more likely that the segments' pixel value distributions will overlap, rendering valleys in the histogram less obvious or even non-existent [LCP90]. Thresholding would be a good candidate for local segmentation, because on a small scale only a small number of segments is expected to be present.
Binary thresholding
Thresholding is one of the oldest, simplest and most popular techniques used in image processing [PP93]. Most of the thresholding literature is concerned with classifying pixels into
object or background classes [Wes78, FM81, SSW88]. This is known as binary or bi-level
thresholding. Algorithms which deal with three or more classes are called multilevel thresholding techniques [RRK84, PG94]. On a local scale, most sub-images are expected to be
homogeneous. This implies that the next most likely situation is sub-images with two segments. Thus an examination of binary clustering will be useful.
If an image consists of two or more clear objects, the histogram should have a corresponding
number of peaks. The thresholds should be chosen at the valleys of the histogram. For bilevel thresholding, Prewitt et al [PM66] repeatedly smoothed the histogram until a single
minimum existed between two maxima, and chose the threshold at that minimum.

Figure 3.6: (a) the 8 bpp pellets image; (b) its histogram; (c) thresholded at the chosen minimum.
Kapur et al [KSW85] split the histogram into two parts and compute the entropy¹ of each distribution. The optimal threshold is chosen to maximize the sum of the entropies of the two parts.
1. The entropy measures the average information content of a set of symbols, assuming the probability of each symbol is known. If the probability of symbol i is p_i, and there are N symbols, the entropy (in bits) is given by the expression: H = -\sum_{i=1}^{N} p_i \log_2 p_i.
The aim is to retain as much information as possible in the binarized image by choosing
a split in which each of the two distributions is as uniform as possible. This method has
the advantage of not having to estimate any parameters, but does not help us decide on the
bimodality of the histogram.
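A compact sketch of the entropic criterion just described: each candidate threshold splits the histogram in two, each part is renormalized, and the threshold maximizing the sum of the two entropies is returned. It illustrates the idea rather than reproducing Kapur et al's implementation.

    import numpy as np

    def entropic_threshold(f, bits=8):
        """Choose the threshold maximizing the summed entropies of the two parts."""
        levels = 2 ** bits
        p = np.bincount(f.ravel().astype(np.int64), minlength=levels).astype(np.float64)
        p /= p.sum()
        best_t, best_h = 0, -np.inf
        for t in range(levels - 1):
            p0, p1 = p[:t + 1].sum(), p[t + 1:].sum()
            if p0 == 0 or p1 == 0:
                continue
            q0, q1 = p[:t + 1] / p0, p[t + 1:] / p1
            h0 = -np.sum(q0[q0 > 0] * np.log2(q0[q0 > 0]))
            h1 = -np.sum(q1[q1 > 0] * np.log2(q1[q1 > 0]))
            if h0 + h1 > best_h:
                best_h, best_t = h0 + h1, t
        return best_t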
The trend in thresholding has been toward mixture modeling, which treats an image histogram as a linear blend of parameterized statistical distributions [EH81, TSM85, MP00].
Kittler's minimum error method assumes the histogram is a mixture of Gaussians with separate means and variances [KI86]. The chosen objective criterion minimizes the classification
error. The resulting threshold corresponds to the intersection point of the two Gaussians,
shown in Figure 3.7. This point is also known as the Bayes minimum error threshold, and
Kittler provides both exhaustive and iterative search procedures for determining it [KI86].
Figure 3.7: A bimodal histogram modelled as a mixture of two Gaussians; the minimum error threshold corresponds to their intersection point (frequency against intensity).
The minimum error method replaced the previously popular method of Otsu [Ots79] and its
fast implementation [RRK84]. It was shown by Kurita et al [KOA92] that Otsu's method is actually equivalent to Kittler's minimum error method if each distribution in the mixture
has the same form and the same variance. An adjustment to remove a mild bias of the variance estimates due to overlapping of the distributions was provided by Cho et al [CHY89].
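For comparison, Otsu's criterion can be sketched in a few lines: it selects the threshold maximizing the between-class variance, which is equivalent to minimizing the summed intra-class variances. This is an illustrative NumPy version, not the implementation evaluated in the papers cited above.

    import numpy as np

    def otsu_threshold(f, bits=8):
        """Threshold maximizing the between-class variance (Otsu's method)."""
        levels = 2 ** bits
        hist = np.bincount(f.ravel().astype(np.int64), minlength=levels).astype(np.float64)
        p = hist / hist.sum()
        omega = np.cumsum(p)                       # probability of class 0 at each cut
        mu = np.cumsum(p * np.arange(levels))      # first moment up to each cut
        mu_total = mu[-1]
        with np.errstate(divide="ignore", invalid="ignore"):
            sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
        sigma_b[~np.isfinite(sigma_b)] = 0.0       # cuts giving an empty class
        return int(np.argmax(sigma_b))             # class 0 holds intensities <= threshold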
In the minimum error paper, Kittler et al describe the problem of encountering a homogeneous image. In this case the histogram would be unimodal, causing the optimal threshold to
always be chosen at either the extreme left or extreme right of the histogram. To some extent
this condition can be used to distinguish between homogeneous and heterogeneous images.
This is important in terms of local segmentation as a large proportion of the sub-images are
expected to be homogeneous and should be treated as such.
Methods to evaluate binary thresholds objectively do exist [WR78]. Sahoo et al [SSW88] use
uniformity and shape measures to compare eight different algorithms. The uniformity
measure is concerned only with minimizing the sum of intra-class variances, while the shape
measure incorporates spatial coherence of edge boundaries. Using real images, which were
not bimodal, they found the minimum error method and the entropic method to perform best.
They also point out that it would be trivial to design a new thresholding algorithm which
jointly optimizes itself with respect to the shape and uniformity measures. This severely
limits the usefulness of these objective criteria.
Lee et al [LCP90] examined five global thresholding algorithms. They used two test images
for which the correct binary segmentation was known. In addition to the shape and uniformity measures, they measured the probability of mis-classification relative to the correct
segmentation. They stated that no algorithm obviously stood out, and lamented the difficulty of comparing algorithms across different types of data. Despite this, they decided that Otsu's method [Ots79] was the best overall. This suggests that the minimum error method
would have done as well if not better. Glasbey [Gla93] also performed a similar experiment
by generating artificial histograms from mixtures of Gaussian distributions with different
means and variances. He found the iterated form of the minimum error method to do best.
This is not surprising given the way the data was generated.
Local thresholding
Global thresholding methods use the same threshold for every pixel in an image. Difficulties
arise when the illumination of a scene varies across the image [GW92]. In Figure 3.8 the
mug has been thresholded at the most prominent valley in its histogram. The resulting segmentation is poor, because the shadow and the darker background area have been grouped with the mug. To counteract this, a different threshold, T(x, y), can
be used for each pixel in (or region of) the image. This dynamic threshold could be based on
the histogram of an appropriately sized sub-image encompassing each pixel. This is known
as adaptive, variable or local thresholding [CK72].
Nakagawa et al [NR79] divide an image into non-overlapping windows. The histogram is
assumed to be a mixture of Gaussians, but the mixture parameters are estimated under the
Figure 3.8: (a) the mug image; (b) its histogram; (c) thresholded at the most prominent histogram valley.
assumption of no overlap. A test for bimodality (two segments) versus unimodality (one segment) is then applied. The test involves three criteria and four user supplied numbers, and is
designed to ensure good separation and meaningful variances. If the window is bimodal, the
threshold is chosen to minimize the probability of mis-classification. For unimodal windows,
the threshold was chosen by linearly interpolating neighbouring thresholds. An extension to
windows of three segments is also described, where neighbouring threshold information aids
in choosing between the two thresholds.
Local thresholding is clearly exploiting the principle of local segmentation. Obviously, any
global thresholding technique could be used for determining each local threshold. Smaller
windows increase the chance of obtaining a unimodal histogram, corresponding to a homogeneous (or extremely noisy) region. If a method is unable to detect this situation, the
threshold it computes could be nonsensical. It is important to be able to determine the number of segments present in a local window.
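The simplest realization of this idea replaces the single global threshold with a per-pixel statistic of the surrounding window. The sketch below thresholds each pixel against its local window mean; it is a deliberately crude stand-in for the histogram-based local thresholds discussed above, and the window radius and offset c are arbitrary illustrative parameters.

    import numpy as np

    def local_mean_threshold(f, radius=7, c=0.0):
        """Binarize each pixel against the mean of the window centred on it."""
        f = f.astype(np.float64)
        h, w = f.shape
        out = np.zeros((h, w), dtype=np.uint8)
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                t = f[y0:y1, x0:x1].mean() - c
                out[y, x] = 255 if f[y, x] > t else 0
        return out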
Multilevel thresholding
Some binary thresholding techniques can be adapted to function with more than two clusters [RRK84, KI86]. The mixture modeling approach extends easily, because it models a
histogram as a blend of distributions, choosing thresholds at the valleys between the peaks
of distributions. Equation 3.2 gives the general form of a mixture model with k classes, where the \alpha_j are the mixing weights, which must sum to unity, and f_j(x) is the distribution function for class j.

P(x) = \sum_{j=1}^{k} \alpha_j f_j(x)    (3.2)
Often each class is assumed normally distributed with unknown mean and variance. In this
situation P(x) has 3k - 1 degrees of freedom: k means, k variances, and k - 1 free mixing weights. These parameters are usually estimated using
the Expectation-Maximization (E.M.) algorithm [DLR77]. The E.M. algorithm is an iterative
process which is guaranteed to increase, at each step, the statistical likelihood [Fis12] of
the data given the mixture model probability density function, P(x). Although the E.M. algorithm always converges, it may do so only to a local optimum. Usually, multiple runs
at different starting conditions are required to overcome this. However, greyscale pixel data
is a simple one-dimensional case, and these problems usually only occur in multivariate
situations, such as colour pixels.
The general E.M. algorithm is computationally very expensive because it calculates k integrals per datum per step. The integrals are used to determine the partial membership of
each datum to each distribution. For the normal distribution this must be done numerically
as no closed form exists. If the distributions are well separated, it is possible to assign each
pixel wholly to one class without biasing the parameter estimates. If, additionally, the distributions are assumed normal with a common variance, the maximum likelihood criterion
optimized by the E.M. algorithm reduces to minimizing the sum of squares of each datum to
its centroid [Alp98]. This criterion may be optimized using simpler techniques.
The k-means algorithm really refers to two different algorithms, both designed to minimize the sum of squared errors of each datum to its cluster centroid. The simpler form, referred to as h-means by Hansen et al [HM01], is given in Listing 3.1. Hansen et al refer to the more complex form as k-means [McQ67, DH73, HW79], given in Listing 3.2.
In h-means, every pixel is reassigned per iteration, whereas k-means performs only one optimal re-assignment per iteration. For one-dimensional data like greyscale pixels, both algorithms should produce the same clustering if provided with sensible starting conditions. This is especially true when k is low, as it is expected to be in local segmentation.
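Since Listings 3.1 and 3.2 are not reproduced here, the following minimal sketch (an illustration of mine, not a transcription of either listing) shows the simpler batch form applied to the one-dimensional case of grey levels: every value is reassigned to its nearest centroid, then all centroids are recomputed.

import numpy as np

def kmeans_grey(values, k, iters=50):
    # Batch clustering of scalar grey levels: reassign every value, then update all centroids.
    values = np.asarray(values, dtype=float)
    # Spread the initial centroids evenly over the observed intensity range.
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        new = np.array([values[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Example: a 3x3 window containing two intensity populations.
window = [3, 3, 8, 3, 3, 8, 8, 8, 8]
centres, assignment = kmeans_grey(window, k=2)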
The ISODATA [RC78] algorithm is similar to k-means but introduces various heuristics for
improving the clustering result. These heuristics include forcing a bound on the minimum
and maximum number of classes, requiring a minimum class membership amount, and restricting the maximum variance within a class. Although these heuristics must be supplied as
parameters by the user, they do allow the ISODATA algorithm to go some way to estimating
the number of classes and ensuring they have sensible properties. They also allow the user
to incorporate a priori information into the clustering process. The natural level of variation
within the image, if known, could be used to restrict the maximum variance within a class.
Fuzzy c-means [Bez81] is a variation on k-means which allows each datum partial membership in each cluster, similar to a mixture model. Compared to the sudden jumps during re-assignment in k-means, the cluster centroids in fuzzy c-means move more smoothly. Its objective function also differs, and is claimed to be better suited to non-spherically distributed data [HB88]. Fuzzy c-means must calculate partial memberships for each datum at each step, which, from a computational perspective, may make it unsuited to local segmentation.
Summary
Thresholding techniques produce segments having pixels with similar intensities. They can
handle any underlying image model as long as each intensity is associated with only one
segment. The disregard of spatial information means that pixels within a segment may not
be connected. In terms of local segmentation this may be an advantage, because the disconnected components of a local segment could actually be part of the same global segment.
Global thresholding will suffer when pixels from different segments overlap in their use of
intensities. If this is due to noise, a technique such as the minimum error method can estimate
the underlying cluster parameters and choose the thresholds to minimize the classification error. If the overlap is due to variation in illumination across the image, variable thresholding
could be used. This can be seen as a form of local segmentation.
The likelihood of encountering unimodal histograms is high when thresholding sub-images.
Within the local segmentation framework it is extremely important to be able to detect the
number of segments in the window. Most of the thresholding literature either ignores this
situation, or handles it with special parameters which must be provided by the user. It would
be better if these parameters were estimated from the image itself.
Thresholding has low space complexity. Only one pass through the image is required to
build a histogram. All further calculations are performed using only the histogram, as no
spatial information is required. For this reason its time complexity is also low. If the number
of pixels in the image is low compared to the number of possible grey levels,
the same benefits apply without using a histogram. Local segmentation must be applied
independently to each pixel, making thresholding an attractive option.
the segment. The sample mean and variance of each segment may be used to estimate these parameters. Fisher's criterion for two segments is given in Equation 3.3.

F = \frac{(\hat{\mu}_1 - \hat{\mu}_2)^2}{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}    (3.3)
Fisher's criterion is maximized by classes with widely separated means and low variances. Typically the criterion would be compared to a threshold; if it is lower, merging would proceed. The value of the threshold affects the number and size of segments accepted. Ideally, its value should depend on the natural level of variation of pixel values in the image. From a local segmentation point of view, Fisher's criterion could be useful for distinguishing between the existence of one or more clusters in a sub-image.
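Equation 3.3 above was reconstructed as the standard two-class Fisher criterion, so the following snippet should be read under that assumption; it simply evaluates the criterion for two candidate segments and compares it to an arbitrary merging threshold.

import numpy as np

def fisher_criterion(seg_a, seg_b):
    # Squared separation of the sample means relative to the summed sample variances.
    a, b = np.asarray(seg_a, dtype=float), np.asarray(seg_b, dtype=float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

# Merge the two candidate segments when the criterion falls below a chosen threshold.
should_merge = fisher_criterion([3, 4, 3, 3], [8, 7, 8, 9]) < 10.0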
The edge detection and region growing steps in watershed segmentation are only loosely coupled, because the elevation map can be provided beforehand. Other hybrid techniques tightly
integrate these steps, for example, Tabb's multi-scale image segmentation approach [TA97]. Tabb suggests that scale selection and edge and region detection cannot be separated, and
that good segmentation algorithms effectively have to perform a Gestalt analysis. Although
this approach produces pleasing results on whole images, it is difficult to apply on a local
level where only a small number of pixels are available. His algorithm does have the admirable feature of automatically determining all required parameters from the image itself.
Relaxation labeling
Relaxation labeling can be applied to many areas of computer vision. In the past it has
been applied to scientific computing applications, particularly to the solution of simultaneous
nonlinear equations [RHZ76]. The basic elements of the relaxation labeling method are a
set of features belonging to an object and a set of labels. In the wider context of image
processing, these features are usually points, edges and surfaces. For the case of image
segmentation each pixel has a feature describing which segment it belongs to, and there is a
different label for each segment in the image.
The labeling schemes are usually probabilistic in that each pixel is partially assigned to all
segments. A pixel's assignment takes the form of normalized probabilities which estimate
the likelihood of it belonging to each segment. Different techniques are used to maximize
(or minimize) the probabilities by iterative adjustment, taking into account the probabilities
associated with neighbouring features. Relaxation strategies do not necessarily guarantee
convergence, thus pixels may not end up with full membership in a single segment.
Two main strategies are used for optimizing the iteration process [YG01]. Simulated annealing, or stochastic relaxation, introduces a random element into the iteration scheme, to reduce
the chance of becoming stuck in secondary optima. Geman et al [GG84] applied this to a
local Markov random field model [Li01]. In contrast to stochastic relaxation, which makes
random changes, deterministic relaxation only makes changes which improve the configuration, and thus converges much faster. Besag's Iterated Conditional Modes (ICM) [Bes86]
algorithm uses this approach for image segmentation. Due to its greedy nature, ICM does
not find a global minimum, and its results are dependent on the initial conditions chosen.
the diagonal entries of the matrix (similar grey levels) while the off-diagonal entries should
be made up of edge pixels (differing grey levels). By creating two separate histograms from
these groups and finding where the valley in the first matches a peak in the second, a threshold can be obtained. The interested reader is directed to the paper for further information.
This approach is not suited to small sub-images as there are too few pixels to generate a
meaningful co-occurrence matrix.
Figure 3.9: (a) image with 3 segments; (b) co-occurrence matrix for horizontal pairs of
pixels; (c) co-occurrence matrix for vertical pairs of pixels.
Leung and Lam [LL96] introduce the concept of segmented-scene spatial entropy (SSE) to
extend the entropic thresholding methods to make use of spatial information. Thresholds are
chosen to maximize the amount of information in the spatial structure of the segmented
image. Their spatial model relates a pixel to its four immediate neighbours. This method is
too complicated to apply to small sets of pixels with little connectedness information.
An interesting approach to bi-level thresholding was taken by Lindarto [Lin96]. Traditional
entropic methods try to maximize the information needed to encode the pixel values once
the segmentation is determined. Lindarto, however, chooses a threshold which minimizes
the combined lossless encoding length (information content) of the binary segment map and
the original pixels. The grey level pixels are encoded using a least-entropy linear predictor [Tis94] optimized for each segment, while the bitmap is efficiently stored using the JBIG
algorithm [PMGL88]. This approach uses Wallace's minimum message length (MML) inductive inference principle [WB68, OH94] to choose the model (threshold) which minimizes
the overall compressed length of the model plus the data given the model. The versatile MML
model selection technique will be applied to local segmentation in Chapter 5.
Summary
Segmentation algorithms which use spatial information can produce better results than those
which do not. This is especially true for noisy images, where pixel intensity alone is insufficient for distinguishing between segments. However, spatial segmentation algorithms would
struggle with the low amount of spatial information available in a small window. Local
segmentation must be applied to every pixel, and hence a simple and fast algorithm is also
desirable. Thresholding techniques are therefore an attractive option for local segmentation.
it. In particular, it assumes that each segment in the image is well modeled using a two-dimensional polynomial function. The flat facet model assumes each piece has a constant
intensity. The sloped facet model allows the intensity to vary linearly in both directions, like
a plane. The idea extends easily to higher degree polynomials. It would be possible to fit
facet models to sub-images.
There are two main problems with the facet model. Firstly, it does not provide a criterion
for deciding which order polynomial best fits the local region. It is important to strike a
balance between over-fitting the noise and under-fitting the structure. Secondly, polynomial
functions are not well suited to modeling sharp edges between arbitrarily shaped segments.
Segment boundaries are visually very important and it is important to handle them correctly.
Figure 3.10: BTC example: (a) original pixel block; (b) decomposed into a bit-plane and
two means; (c) reconstructed pixels.
From a local segmentation perspective, BTC assumes that segments consist of pixels of a
common grey level separated by step edges aligned to pixel boundaries. The assumption of
two segments implies the existence of a bimodal histogram, and the use of the mean as the
threshold implies that each segment contains approximately the same number of pixels.
The original BTC algorithm has been modified repeatedly over the years [FNK94]. In standard BTC, the number of segments is always two. Some techniques allow the number of
segments to be varied on a block by block basis. For example, if the difference between the two means is small, the block is treated as a single segment and only one mean is encoded. Conversely, if the block variance is very high then multilevel
thresholding can be used. This approach to automatically detecting the number of clusters is
rudimentary, but still effective. In Chapter 4, this idea will be extended to form the basis of
an effective local segmentation technique for removing noise from images.
Some BTC variants replace thresholding with spatial techniques designed to better preserve
image structure. Gradient based BTC [QS95] and correlation based BTC [Cai96] choose
a different threshold for each pixel in the block. Each threshold depends on neighbouring
gradient and pixel difference information. This itself is a crude form of relaxation labeling.
For most images the gain is minimal, suggesting that spatial information may be of limited
value when segmenting small regions.
3.3.3 SUSAN
SUSAN [Smi95, Smi96, SB97], much like the local segmentation framework of this thesis,
evolved to be a general approach to low level image processing. One of SUSAN's design
constraints was efficiency, so that it could be run on real time data in a robotic vision system.
SUSAN was originally designed for edge and corner detection, but was also adapted for
structure preserving denoising. The denoising component differs a little from the SUSAN
framework presented here, and is treated separately in Section 3.4.5.
SUSAN processes greyscale images in a local manner using an approximately circular window containing 37 pixels. The centre pixel in the window is called the nucleus. Neighbouring
pixels similar in intensity to the nucleus are grouped into an USAN, or Univalue Segment Assimilating Nucleus. The USAN creation process may appear to be a form of region growing
using the nucleus as a seed, except that pixels in the USAN are not necessarily connected.
The formation of an USAN is more related to clustering than segmentation.
The pixel similarity function is controlled by a brightness threshold, t. Smith first describes the USAN as accepting only pixels within t intensity units of the nucleus.
This can be considered crisp clustering, where similar pixels receive full weight, and dissimilar pixels receive no weight. The size of the USAN is equal to the sum of weights given
to pixels in the window, which for crisp clustering, equals the number of pixels assimilated.
Smith, however, found a fuzzy membership function to give better feature detection. This fuzzy function was optimally derived to have the form \exp(-(\delta/t)^6), where δ is the intensity difference from the nucleus. Figure 3.11 compares the crisp and fuzzy membership functions as a function of δ. When the fuzzy form is used, all pixels have a partial contribution to the size of the USAN. This is no longer an explicit segmentation of the local neighbourhood.
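The two membership functions can be sketched as follows, assuming the exp(-(δ/t)^6) form given above and a simple square window rather than SUSAN's 37 pixel circular mask; the default brightness threshold used here is only a nominal value.

import numpy as np

def usan_size(window, t=20.0, fuzzy=True):
    # Sum of the membership weights of the window pixels relative to the nucleus (centre pixel).
    window = np.asarray(window, dtype=float)
    nucleus = window[window.shape[0] // 2, window.shape[1] // 2]
    diff = window - nucleus
    if fuzzy:
        weights = np.exp(-((diff / t) ** 6))           # soft cut-off
    else:
        weights = (np.abs(diff) <= t).astype(float)    # crisp cut-off
    return weights.sum()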
Figure 3.11: Comparison of the hard and soft cut-off functions for pixel assimilation in the
SUSAN algorithm.
To perform feature detection, the USAN size for each pixel is plotted. On this surface homogeneous regions correspond to plateaus, edges to valleys, and corners to deeper valleys.
SUSAN processes this USAN surface to determine the directions and strengths of edges and
the positions of corners. Smith finds the SUSAN feature detector to give results on a par
with Canny's famous edge detection algorithm [Can86]. Smith states that SUSAN is suited
to images with segments of smoothly varying intensity and both step and ramp edges.
The main difficulty with SUSAN is that its performance is dependent on a user supplied
brightness threshold, t. Smith claims that SUSAN is relatively insensitive to the choice of t, and advocates a default of 20. He states that t may be used to vary SUSAN's sensitivity to features. The value of t really controls the minimum feature contrast that can be detected. My experiments show that t should be proportional to the natural level of intensity variation
expected within image segments. This is the only way for the USAN to assimilate pixels in
homogeneous regions and not produce false edge output.
3.4 Denoising
Image denoising is the process of removing unwanted noise from an image [Mas85, BC90].
The noise can take a variety of forms and is introduced in differing amounts at each step
during the acquisition of the image. The literature abounds with denoising techniques, but
they may be classified broadly into temporal and spatial approaches.
If n noisy samples of the same scene, f'_1, f'_2, ..., f'_n, are taken, the temporally filtered image can be computed as their ensemble average, given in Equation 3.4.

\hat{f}(x,y) = \frac{1}{n} \sum_{i=1}^{n} f'_i(x,y)    (3.4)

If each pixel was corrupted by the addition of symmetric zero-mean noise of variance σ², then for any one pixel the expected noise variance is σ². However, the expected noise variance of the ensemble average reduces to σ²/n. If the noise was not additive in nature, for example impulse noise, then a better averaging function, such as the median (discussed later), should be used.
Temporal filtering is ideal if multiple versions of the same image can be obtained. In practice
this does not usually happen, because the objects in the scene move, or the capturing equip-
ment wobbles. Even slight variations can displace the pixels in each sampled image, causing
the ensemble average to become a blurred version of the original.
In this thesis it is assumed that only a single noisy version of an original image is available,
so temporal filtering of the form in Equation 3.4 can not be used. However, it will be shown
how overlapping local segmentation results for the same pixel coordinate can be combined
to suppress noise in a fashion similar to temporal filtering.
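As a minimal illustration of Equation 3.4, the following lines average several synthetic noisy frames of a constant scene; the residual variance drops by roughly the number of frames, as expected for additive zero-mean noise. The scene, noise level and frame count are arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(0)
original = np.full((64, 64), 100.0)                                  # constant test scene
frames = [original + rng.normal(0.0, 10.0, original.shape) for _ in range(16)]
ensemble = np.mean(frames, axis=0)                                   # Equation 3.4
print(np.var(frames[0] - original), np.var(ensemble - original))     # roughly 100 versus 100/16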
Figure 3.12: Application of the box filter: (a) original; (b) noisy; (c) filtered.
Although an averaging filter performs well for additive noise in homogeneous regions, it
tends to blur edges and other image structure in heterogeneous regions. Figure 3.13 gives an
example of this undesirable behaviour. To combat this deficiency, most research is concerned
with structure preserving denoising algorithms.
Figure 3.13: Box filtering in the presence of edges: (a) original image; (b) filtered.
To preserve image structure while removing noise implies the ability to distinguish between
the two. It will be shown in later chapters that local segmentation is a good basis for modeling the relationship between structure and noise in an image. Segments correspond to
homogeneous regions and boundaries between segments are structure that needs to be preserved. Existing spatial denoising algorithms can be examined in terms of the way they use
local segmentation. Some use no local segmentation at all, some use it implicitly, and some
explicitly use a local segmentation model.
The operation of a linear filter is described by Equation 3.5, where the weights, w(Δx, Δy), are assumed to sum to one and the sums range over the local window.

\hat{f}(x,y) = \sum_{\Delta x} \sum_{\Delta y} w(\Delta x, \Delta y)\, f'(x + \Delta x,\ y + \Delta y)    (3.5)
Figure 3.14 shows two common arrangements of the weights used by the linear filter. Interestingly, McDonnell [McD81] developed a recursive algorithm which can implement box filtering with only five operations per pixel, independent of the window size.
Figure 3.14: Two common 3×3 linear filter masks: (a) the box filter, with all weights equal to 1; (b) an approximate Gaussian filter, with weights 1 2 1 / 2 4 2 / 1 2 1.
Gaussian filters use normalized weights of the form given in Equation 3.6. This filter varies the weights such that pixels further from the centre contribute less to the overall average. The spatial extent of a linear Gaussian filter is determined by σ, usually chosen relative to the mask size so that most of the filter's mass is encapsulated by the bounding square. Figure 3.14b illustrates the configuration of an approximate Gaussian filter in this case. Gaussian filters are sometimes preferred over box filters because their effect on the Fourier spectrum of the image is better [GW92].

w(\Delta x, \Delta y) \propto \exp\left( -\frac{\Delta x^{2} + \Delta y^{2}}{2\sigma^{2}} \right)    (3.6)
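The following sketch applies Equation 3.5 directly for the two masks of Figure 3.14; the edge handling (replication) is an arbitrary choice made only so the example runs on whole images.

import numpy as np

BOX = np.ones((3, 3)) / 9.0
GAUSS = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0

def linear_filter(image, weights):
    # Direct implementation of Equation 3.5 for a small odd-sized mask.
    m = weights.shape[0] // 2
    padded = np.pad(image.astype(float), m, mode='edge')
    out = np.zeros(image.shape, dtype=float)
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            out += weights[dy + m, dx + m] * padded[m + dy : m + dy + image.shape[0],
                                                    m + dx : m + dx + image.shape[1]]
    return out

demo = np.arange(25, dtype=float).reshape(5, 5)
print(linear_filter(demo, BOX)[2, 2])   # equals demo[2, 2], since the demo image is a linear ramp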
From a local segmentation point of view, each linear filter weight may be interpreted as
being proportional to the believed probability that the pixel is related to the centre pixel.
Relatedness really means in the same segment. The centre pixel usually has the highest
weight, as it should be the best estimator of its own original value when the noise is additive.
If the centre pixel is at coordinate (x, y), and another pixel is at (x+Δx, y+Δy), the implied probability that they are in the same segment is given by Equation 3.7.

\Pr(\text{same segment}) = \frac{w(\Delta x, \Delta y)}{w(0, 0)}    (3.7)
For example, the box filter trusts all the pixels in the window equally, giving them probability
one of being from the same source. It has been stated that the correlation between pixels is
often on the order of 0.9 to 0.95 [NH88]. From this point of view, the coefficients of the
box filter are quite reasonable. For the approximate Gaussian filter in Figure 3.14b, the
side pixels are considered 2/4 = 0.5 correlated to the centre pixel, and the corner pixels
1/4 = 0.25 correlated. This may be considered a primitive form of local segmentation, but
increases. A positive feature is that smoothing is very good in homogeneous regions corrupted by additive noise, reducing the noise variance by up to a factor equal to the number of pixels in the window. What is required is a method to distinguish between homogeneous and heterogeneous regions, to get the benefits of smoothing without the disadvantage of blurring edges.
Rank filters operate on the window pixels sorted into ascending order, x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. For greyscale images, the ordering is determined by pixel intensity, but there
is no obvious analogue for higher dimensional data such as colour pixels. Manipulation of
the ordered set of data originated as a way to improve the statistical robustness of traditional
estimates (such as the mean) when the data was contaminated [MR74, Hub81]. Digital
images are often contaminated by noise. Thus, to some, it seemed natural to apply rank
techniques to image denoising.
The median
The simplest rank filter is the median filter. The median is the middle datum in a sorted
data sequence [Tuk71]. Just as the mean minimizes the average squared error, the median
minimizes the average absolute error. If the number of pixels is odd, the median is uniquely
defined. When it is even, there are an infinite number of solutions between the two middle
pixels. In this case, the average of the two middle values is usually taken as the median.
Equation 3.8 summarizes the calculation of the median, x_{med}, of n pixels.

x_{med} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \tfrac{1}{2}\left( x_{(n/2)} + x_{(n/2+1)} \right) & \text{if } n \text{ is even} \end{cases}    (3.8)
Figure 3.15 gives an example of the median applied to a 3×3 sub-image. The pixel values are homogeneous with a small amount of additive noise. In this situation the median would choose the filtered value to be 6. The mean of these pixels is 5.78, which would usually be
rounded to 6. When the noise is additive and symmetric, the mean and median will usually
agree on the filtered value, as both estimate the centre of a distribution.
Figure 3.15: Application of the median to a homogeneous block: sorted values (4 5 5 5 6 6 6 7 8), median = 6, mean = 5.78.
Consider the situation in Figure 3.16, whereby the centre pixel from Figure 3.15 has been
contaminated by impulse noise. The pixel values are no longer homogeneous. A box filter
would produce a filtered estimate of 13.1, the outlying pixel value 73 having pulled the mean
away from the true centre. The breakdown point of the median is 50% because it is still able
to produce useful estimates when up to 50% of pixels are contaminated. In this case, the
median is again 6, a much better estimate than the mean.
Figure 3.16: Application of the median to a homogeneous block with impulse noise: sorted values (4 5 5 5 6 6 6 8 73), median = 6, mean = 13.1.
In the previous two examples the pixel blocks were meant to be homogeneous. When the
noise is additive, the mean and median behave similarly. In the presence of impulse noise
the median is superior to the mean, as it can cope with up to 50% contamination. A good
structure preserving filter must also function well in the presence of edges. Figure 3.17 shows
a pixel block containing two noiseless segments. As expected, the mean averages across
the edge. The median correctly chooses 3 as its filtered estimate, making it appear to have
special structure preserving abilities.
Figure 3.18 gives a similar example, but now the centre pixel is on the corner of a noiseless
segment. The median in this case is 8, the value belonging to the alternate segment.
This illustrates that the median is implicitly performing primitive local segmentation. In the
Figure 3.17: The median implicitly segmenting a noiseless edge block: sorted values (3 3 3 3 3 3 8 8 8), median = 3, mean = 4.67.
presence of two distinct segments, the median chooses a pixel from the segment containing
the majority of the pixels in the window. For a 3×3 window, this decision rule fails whenever the intersection of the local window with the centre pixel's segment consists of fewer than five pixels. When the window contains more than two segments, the median's behaviour is a more complicated function of the segment populations.
Figure 3.18: The median (incorrectly) implicitly segmenting a noiseless corner block: sorted values (3 3 3 3 8 8 8 8 8), median = 8, mean = 5.78.
Consider again the corner block of Figure 3.18. Figure 3.19 shows that when the centre weighted median is used, the corner pixel
is correctly filtered as 3. The duplication of the centre pixel simply improves the probability
that the centre pixel will belong to the majority segment. For the plain median filter, the centre pixel needs to be in a segment containing at least half the pixels in the window, which is five for a 3×3 window. With a centre weight of three, the weighted median reduces this requirement to four pixels. This improvement comes at the cost of reducing its breakdown point.
Figure 3.19: Using the centre weighted median (centre weight 3) on a noiseless corner block: sorted values (3 3 3 3 3 3 8 8 8 8 8), median = 3.
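The two estimators just discussed can be sketched in a few lines; the centre weight of three matches the value assumed in the reconstruction of Figure 3.19.

import numpy as np

def window_median(window, centre_weight=1):
    # Median of the flattened window, with the centre pixel optionally counted several times.
    window = np.asarray(window, dtype=float)
    centre = window[window.shape[0] // 2, window.shape[1] // 2]
    values = np.concatenate([window.ravel(), np.repeat(centre, centre_weight - 1)])
    return np.median(values)

corner = np.array([[3, 3, 8], [3, 3, 8], [8, 8, 8]])
print(window_median(corner))                    # 8: the plain median crosses into the wrong segment
print(window_median(corner, centre_weight=3))   # 3: duplicating the centre fixes the corner case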
Figure 3.20 illustrates the application of the median and centre weighted median filters to
a greyscale image. Despite their obvious failure on such a basic, albeit contrived, example,
these median based filters continue to be used as components in image processing systems.
Many extensions to the median have been suggested to overcome its defects [Tag96, GO96,
EM00]. These techniques suffer from not seriously attempting to diagnose the structural
features present within the local region [KL91], or require sets of parameters to be estimated
from a training set [CW00].
Figure 3.20: (a) original image; (b) 3×3 median filtered; (c) 3×3 centre weighted median.
The limitations of median based approaches are not due to the median operator itself, but to
its misapplication to heterogeneous data. The time spent on generalizing the median to two
dimensional image data may have been better spent concentrating on modeling the underlying image from a local segmentation point of view. Local segmentation would suggest that
pixels should first be appropriately segmented into homogeneous groups. The median could
then be applied just to the pixels belonging to the segment containing the centre pixel.
1. Pixels that are spatially close are more likely to be in the same segment.
2. Pixels that are similar in intensity are more likely to be in the same segment.
Adaptive denoising algorithms apply these principles in order to use only those pixels in
the same segment as the centre pixel for denoising. This grouping can be seen as a partial
step of local segmentation, whereby a homogeneous segment involving the centre pixel is
formed. Perhaps surprising is that the same two principles are also the basis for most global
image segmentation algorithms. This suggests that segmentation and denoising are highly
related tasks. In Chapters 4 and 5, this fact will be used to develop local segmentation based
denoising algorithms.
Figure 3.21: The Kuwahara filter considers nine regions within a 5×5 window; the region with the lowest variance is chosen.
Despite its failings, the Kuwahara technique illustrates two important points. Firstly, the case
where the pixel being processed is at the centre of the window is not the only useful case;
each pixel participates in many overlapping windows. Secondly, the Kuwahara filter chooses
the best fitting model from a set of candidate models using an objective criterion. Both these
ideas are exploited by local segmentation denoising algorithms in Chapters 4 and 5.
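A slow reference sketch of the idea suggested by Figure 3.21 is given below: for each pixel, the mean of whichever of the nine 3×3 windows containing it has the lowest variance is taken as the output. This reading of the nine regions is an assumption on my part, and the implementation is written for clarity rather than speed.

import numpy as np

def kuwahara_like(image):
    # For each pixel, use the mean of the lowest-variance 3x3 window containing it.
    padded = np.pad(image.astype(float), 2, mode='edge')
    out = np.zeros(image.shape, dtype=float)
    H, W = image.shape
    for y in range(H):
        for x in range(W):
            best_var, best_mean = np.inf, 0.0
            # the nine 3x3 windows whose centres lie within one pixel of (y, x)
            for cy in range(y + 1, y + 4):
                for cx in range(x + 1, x + 4):
                    win = padded[cy - 1:cy + 2, cx - 1:cx + 2]
                    v = win.var()
                    if v < best_var:
                        best_var, best_mean = v, win.mean()
            out[y, x] = best_mean
    return out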
Gradient inverse weighted smoothing (GIWS) computes a weighted average of the window, where each pixel's weight is the inverse of its absolute intensity difference from the centre pixel, as given in Equation 3.9. Pixels whose intensity equals that of the centre are given a fixed weight of 2.

w(\Delta x, \Delta y) = \begin{cases} 2 & \text{if } f'(x+\Delta x, y+\Delta y) = f'(x,y) \\ \dfrac{1}{\left| f'(x+\Delta x, y+\Delta y) - f'(x,y) \right|} & \text{otherwise} \end{cases}    (3.9)
There are two variants of the GIWS filter: the first includes the centre pixel in the weighted
average, while the second one does not. If the centre pixel is included it will always be given
a weight of 2. If not included, GIWS becomes resistant to single pixel impulse noise [Hig91].
If the image is unaffected by impulse noise, the former variant should be used, as it will give
more smoothing in the presence of additive noise.
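A sketch of GIWS over a 3×3 window follows, using the weight function as reconstructed in Equation 3.9 (inverse absolute difference, with a fixed weight of 2 for zero difference); the zero-difference handling is therefore an assumption carried over from that reconstruction.

import numpy as np

def giws(window, include_centre=True):
    # Weighted average with weights 1/|difference from centre|, and weight 2 for zero difference.
    window = np.asarray(window, dtype=float)
    cy, cx = window.shape[0] // 2, window.shape[1] // 2
    centre = window[cy, cx]
    diff = np.abs(window - centre)
    weights = np.where(diff > 0, 1.0 / np.maximum(diff, 1e-12), 2.0)
    weights[cy, cx] = 2.0 if include_centre else 0.0
    return np.sum(weights * window) / np.sum(weights)

corner = np.array([[3, 3, 8], [3, 3, 8], [8, 8, 8]])
print(giws(corner))                        # about 3.56, matching the value quoted for Figure 3.22
print(giws(corner, include_centre=False))  # about 3.71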
Figure 3.22 gives an example of GIWS applied to a noiseless corner block. The resultant weights are higher for those pixels in the centre pixel's segment (intensity 3) than those in the other segment (intensity 8). The smoothed estimate is 3.56 when the centre pixel is included in the linear combination, and 3.71 when omitted. Ideally the estimate should be 3, but in both cases it has been polluted with values from the other segment. The extent to which this occurs may be controlled by modifying the weighting function.
Figure 3.22: Gradient inverse weighted smoothing (GIWS): (a) original pixels; (b) computed
weights, unnormalized; (c) smoothed value when centre pixel included.
The failure of GIWS in Figure 3.22 may be attributed to its refusal to make absolute decisions about which pixels belong together. Its soft cut-off function, shown in Figure 3.23,
allows information from neighbouring segments to diffuse into the current segment. A local
segmentation approach would suggest the use of a hard cut-off to explicitly separate pixels
from different segments. The cut-off point would be related to the natural level of variation
within image segments. For the example in Figure 3.22, a hard cut-off function would have filtered the block exactly, given any reasonable cut-off point.
Sigma filter
Lee's Sigma filter [Lee83] is based on the alpha-trimmed mean estimate for the centroid of a data set [RL87]. Firstly, the standard deviation, σ, of the pixels in the window is calculated. The filtered value is then set to the average of those pixels within 2σ of the centre pixel. If the
number of pixels taking part in the average is too small, Lee recommends taking the average
of all the non-centre pixels. Lee claims that the Sigma filter performs better than the median
and GIWS with the centre pixel included.
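A sketch of the Sigma filter on a single window is given below; the two standard deviation band matches the example discussed shortly, while the minimum count triggering the fallback is a guess of mine, since the exact value Lee recommends is not reproduced here.

import numpy as np

def sigma_filter(window, min_count=3):
    # Average of pixels within 2*sigma of the centre pixel; sigma is the window standard deviation.
    window = np.asarray(window, dtype=float)
    cy, cx = window.shape[0] // 2, window.shape[1] // 2
    centre = window[cy, cx]
    sigma = window.std()
    mask = np.abs(window - centre) <= 2.0 * sigma
    if mask.sum() < min_count:
        # Too few similar pixels: fall back to the mean of the non-centre pixels.
        non_centre = np.delete(window.ravel(), cy * window.shape[1] + cx)
        return non_centre.mean()
    return window[mask].mean()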
Figure 3.23: Pixel weighting function for the GIWS adaptive denoising algorithm.
Figure 3.24 demonstrates the Sigma filter on a corner block. The standard deviation of the
block is 2.5, resulting in the averaging of all pixels within 5 intensity units of the centre pixel.
Unfortunately, all the pixels are in this range, causing the Sigma filter to revert to a box filter.
This produces a blurred estimate of 5.78, the mean of the whole block.
Figure 3.24: The Sigma filter reverting to a box filter on a noiseless corner block.
The Sigma filter tries to average only those pixels similar enough to the centre pixel. This
is an explicit form of local segmentation using a hard membership function. The similarity
measure uses the local variance to control the maximum intensity deviation a pixel may have
from the centre pixel while still being considered part of the same segment. In terms of
local segmentation, the local variance is an estimate of the natural level of variation within
segments. In homogeneous regions it is a good estimate, but around edges it will be too high.
Thus the variation estimate is poor for those cases where its value is crucial, and acceptable
for those cases where it is not overly important.
The local variance is very sensitive to contamination from sources such as impulse noise. To
a lesser extent it also suffers from the small amount of pixel data used to calculate it. The
Sigma filter could be improved by replacing the local variance with a globally estimated or a
priori supplied natural level of variation. This is the approach taken by the local segmentation
based denoising algorithms of Chapters 4 and 5. The fact that the Sigma filter also falls back
to an overall average when too few pixels are available for averaging implies that it does not
allow segments to consist of a single pixel.
Anisotropic diffusion
Anisotropic diffusion [PM90, BSMH98] is a highly iterative process which slowly adjusts
pixel intensities to make them more like their neighbours. In its original form it has been
shown to be very similar to GIWS [Wei97]. Equation 3.10 describes the anisotropic diffusion
process: t is the iteration number, the sum is taken over the four nearest neighbours (x', y') of (x, y), λ controls the rate of diffusion, and g is the diffusion function.

f^{t+1}(x,y) = f^{t}(x,y) + \lambda \sum_{(x',y')} g\left( f^{t}(x',y') - f^{t}(x,y),\ \sigma \right)    (3.10)

The g function is the most important influence on the behaviour of the diffusion process. Equation 3.11 lists the three most commonly used functions. Figure 3.25 plots these three functions with respect to the intensity difference, Δ, for a fixed σ. Note that the functions have been scaled to fit them together; only the relative weights for a particular function are meaningful.
g_{\text{Standard}}(\Delta, \sigma) = \Delta

g_{\text{Lorentz}}(\Delta, \sigma) = \frac{2\Delta}{2 + \Delta^{2}/\sigma^{2}}

g_{\text{Tukey}}(\Delta, \sigma) = \begin{cases} \Delta \left( 1 - (\Delta/\sigma)^{2} \right)^{2} & \text{if } |\Delta| \le \sigma \\ 0 & \text{otherwise} \end{cases}    (3.11)
Figure 3.25: The Standard, Lorentz and Tukey diffusion functions, plotted as relative weight against intensity difference.
The Standard formulation forces a pixel value toward the intensity of each neighbour, at a rate directly proportional to its intensity difference from that neighbour. Equations 3.12 and 3.13 show that Standard diffusion can be interpreted as a static linear filter. Equal averaging of all five pixels in the local window occurs when λ = 1/5.

f^{t+1}(x,y) = f^{t}(x,y) + \lambda \sum_{(x',y')} \left( f^{t}(x',y') - f^{t}(x,y) \right)    (3.12)

f^{t+1}(x,y) = (1 - 4\lambda)\, f^{t}(x,y) + \lambda \sum_{(x',y')} f^{t}(x',y')    (3.13)
The problem with Standard diffusion is that, if left for many iterations, the denoised image
would become completely homogeneous. In a fashion similar to the GIWS in Section 3.4.5,
all neighbouring pixels receive some weight, no matter what intensity they have. Therefore,
information from unrelated segments can diffuse across segment boundaries.
The Lorentz formulation attempts to reduce pixel leakage between segments by slowing the
diffusion process at some point, defined by the parameter σ. This parameter provides a
sense of how different pixel values can be while still being in the same segment. The plot
of the Lorentz weighting function in Figure 3.25 reaches a maximum when |Δ| = √2 σ, and
decays thereafter. The decay is quite slow, so a large difference will still be given a significant weight. Thus pixels from different segments can still influence each other, eventually
producing a homogeneous image.
The Tukey formulation takes the Lorentz idea further and completely shuts down diffusion when |Δ| > σ, giving maximum weight when Δ = σ/√5. A neighbouring pixel value deviating more than σ intensity levels from the centre pixel will have no influence whatsoever. In local segmentation, the σ parameter may be considered proportional to the natural level of variation of pixel values within a segment. For the simple case of constant intensity segments with additive noise, this would relate to the standard deviation of the noise. The use of the Tukey function ensures that the final image will never become completely homogeneous, with the side-effect of being unable to preserve contrast differences of less than σ.
The rate parameter λ should be made small and the number of iterations large, so as to more closely approximate the theoretically continuous diffusion formulae. It must be noted that the diffusion process is very slow, with thousands of iterations being common. This also usually requires that all intermediate results be kept to floating point accuracy.
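The whole process can be sketched compactly; the Tukey function below follows the reconstruction in Equation 3.11, and the values of λ, σ and the iteration count are merely illustrative. Borders are handled by wrap-around purely to keep the example short.

import numpy as np

def tukey(delta, sigma):
    # Tukey diffusion function: zero for |delta| > sigma, as in Equation 3.11.
    inside = np.abs(delta) <= sigma
    return np.where(inside, delta * (1.0 - (delta / sigma) ** 2) ** 2, 0.0)

def diffuse(image, sigma=10.0, lam=0.1, iters=200):
    f = image.astype(float)
    for _ in range(iters):
        total = np.zeros_like(f)
        for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = np.roll(f, shift, axis=(0, 1))   # wrap-around borders for brevity
            total += tukey(neighbour - f, sigma)
        f = f + lam * total                              # Equation 3.10
    return f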
Adaptive smoothing
Adaptive smoothing [SMCM91] claims to be strongly related to anisotropic diffusion and
GIWS. It was designed to denoise both intensity and range images. Instead of using the
intensity difference to weight neighbouring pixels, a more thorough estimate of the gradient
is used, as shown in Equation 3.14, where G_x and G_y are estimates of the horizontal and vertical intensity gradients at the neighbouring pixel.

w(\Delta x, \Delta y) = \exp\left( -\frac{G_x^{2} + G_y^{2}}{2\sigma^{2}} \right)    (3.14)
From a local segmentation perspective, the gradient term discourages the incorporation of
pixels from different segments, as the gradient between them would cross a segment boundary and therefore be very high. Unfortunately, gradient functions are usually too coarse to accurately control smoothing around fine image details [Smi95].
SUSAN
SUSAN was introduced in Section 3.3.3 in regard to its use of local segmentation principles
for edge and corner detection. SUSAN gathers the set of neighbouring pixels which are
most similar in intensity and closest in position to the centre pixel. This set is called the
USAN. The topographic pattern revealed by plotting the size of each USAN may be used to
determine the position of edges. Smith realized that a weighted average of the pixels within
an USAN form a denoised estimate of the centre pixel [Smi95, Smi96, SB97]. Unlike the
denoising algorithms described so far, SUSAN exploits both the similarity of pixel intensities
and their spatial proximity to the centre pixel. Equation 3.15 describes the resulting SUSAN
weighting function, where σ is a spatial threshold and t is a brightness threshold. The centre pixel is excluded from the average.

w(\Delta x, \Delta y) = \exp\left( -\frac{\Delta x^{2} + \Delta y^{2}}{2\sigma^{2}} - \frac{\left( f'(x+\Delta x,\, y+\Delta y) - f'(x,y) \right)^{2}}{t^{2}} \right), \qquad w(0,0) = 0    (3.15)
Due to its exponential-of-sums form, the weighting function is separable into its spatial and intensity components. Ignoring the fact that the centre pixel receives no weight, the spatial component is an isotropic 2-D Gaussian distribution with variance σ². Figure 3.26 plots this function for the default σ = 4. SUSAN uses a 37 pixel circular mask (see Figure 4.46e). The choice of σ prevents the Gaussian spatial weighting function from extending very far. Thus the weights for the closest and furthest pixels differ by less than a factor of two when using a 37 pixel mask. The intensity difference component is the positive half of a Gaussian function with variance t²/2. Figure 3.27 plots this function for the default brightness threshold suggested for typical images [Smi95]. SUSAN's Gaussian intensity weighting function effectively cuts off for sufficiently large intensity differences.
Figure 3.26: The SUSAN spatial weighting component, ignoring that the centre pixel receives no weight.
SUSAN's intensity weighting is much closer to a hard cut-off than the
GIWS weighting function plotted in Figure 3.23. In the example of Figure 3.28, a small brightness threshold was chosen, resulting in a good filtered value close to 3. The SUSAN algorithm is very sensitive to the choice of t. If the default values had been used, the denoised estimate would have been about 6. If a slightly lower threshold was used, it would have been 3.02. If t is chosen poorly, it seems that SUSAN introduces noise into its filtered estimate. My experiments have shown that t should be proportional to the natural level of variation within the image. For the case of additive zero-mean Gaussian noise with variance σ², setting t proportional to σ works well.
When denoising, SUSAN gives no weight to the centre pixel. This has the advantage of
helping to suppress impulse noise, but the disadvantage of reducing the amount of smoothing when the centre pixel is legitimate. SUSAN has an additional mode which has not been
Figure 3.27: The SUSAN intensity difference weighting component, plotted against intensity difference.
Figure 3.28: SUSAN denoising of a noiseless corner block: (a) original pixels; (b) spatial weights, unnormalized; (c) intensity difference weights, unnormalized; (d) denoised value.
mentioned yet. When the weights for all the neighbouring pixels are too small (corresponding to a weak USAN), SUSAN ignores them and uses the median of its eight immediate
neighbours instead. In terms of local segmentation, this mode is invoked when no pixels
appear to belong in the centre pixel's segment.
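The denoising rule can be sketched for a single window as follows. Equation 3.15, as reconstructed above, is applied with the centre excluded; the fallback to the neighbourhood median uses an arbitrary weight floor, since the exact trigger condition is not reproduced here, and SUSAN's 37 pixel circular mask is replaced by a square window for brevity.

import numpy as np

def susan_denoise_pixel(window, sigma=4.0, t=20.0, weight_floor=1e-3):
    # Weighted average of Equation 3.15 with the centre excluded; weak USANs fall back to the median.
    window = np.asarray(window, dtype=float)
    h, w = window.shape
    cy, cx = h // 2, w // 2
    centre = window[cy, cx]
    ys, xs = np.mgrid[0:h, 0:w]
    spatial = ((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2)
    intensity = ((window - centre) ** 2) / (t ** 2)
    weights = np.exp(-spatial - intensity)
    weights[cy, cx] = 0.0                       # the centre pixel receives no weight
    if weights.sum() < weight_floor:
        neighbours = np.delete(window.ravel(), cy * w + cx)
        return np.median(neighbours)
    return np.sum(weights * window) / np.sum(weights)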
SUSAN's denoising outperforms the box filter, the median, GIWS, and Saint-Marc's adaptive smoothing method [Smi95]. It takes the intensity weighting idea from GIWS, but modifies it to behave more like a hard cut-off function. Although the intensity weighting function is continuous, it effectively cuts off to zero for sufficiently large intensity differences.
also helps it to perform better. The main drawback is that the brightness threshold must
be supplied. Despite this, SUSAN is one of the best and fastest local denoising algorithms
available, in part due to its implicit application of the local segmentation principle.
Those neighbouring (spatially connected) basins with means considered close enough to a basin's mean are incorporated into it. The pixels in the merged basin are then set to the aggregated mean. The criteria for closeness can vary. In the early paper [HH95], the closest 50% of neighbouring basin means are used, whereas later [HH99], only those means within some difference threshold were incorporated. They claim to outperform various filters, including anisotropic diffusion. The only test image used was a simple 3-class piece-wise constant synthetic image containing very high contrast edges corrupted by additive zero-mean Gaussian noise. They admit their technique depends on the choice of threshold, which is related to the noise level.
The challenge is to estimate accurately the natural intensity variation one might expect to
observe within each segment.
3.5 Conclusions
Image segmentation algorithms attempt to partition an image into separate groups of related
pixels, called segments. Pixels are usually considered related if they have similar values,
or are located near each other. Clustering is a form of segmentation which ignores spatial
information. Segmentation is considered an essential component of image understanding
systems. Although segmentation of arbitrary images is inherently difficult, the principles
described in this chapter have been applied successfully to many tasks.
Local image processing algorithms assume that most of the information about a pixel can be
found within the small neighbourhood of pixels surrounding it. Local algorithms are often
used for low level tasks, such as denoising and edge detection. The local neighbourhood may
be considered a tiny image, but with different expected properties. It contains fewer pixels,
is made up of fewer distinct regions, and has a higher proportion of pixels in the margins.
Segmentation is a fundamental image processing task. Pixels within segments are homogeneous with respect to some predicate, and hence can be used to denoise the image while
preserving structure. The boundaries between segments correspond directly to discontinuities in the image, hence edge detection is achievable in the same framework. It would
seem obvious to apply these ideas directly to the sub-images used in local image processing.
Strangely, much of the literature seems to treat local techniques differently, preferring to use
ad hoc methods instead, which usually require image dependent parameters to be supplied,
or estimated from training sets.
Image denoising algorithms are a good basis for examining the state of the art for modeling
images on a local level. The SUSAN algorithm is currently one of the best local, one-pass
image denoising algorithms available. During its initial development, SUSAN used a form
of explicit local segmentation. Pixels similar enough to the centre pixel were assimilated
using a hard cut-off function, and then averaged. But rather than refine the local segmentation criterion used, SUSAN moved to an implicit local segmentation, admitting all the local
pixels to the local segment, but allowing their contribution to vary. This formulation made it
difficult to link the brightness threshold parameter to a specific image model.
This thesis advocates a progression toward an explicit local segmentation model, via the
suitable application of global segmentation techniques to local image data. The local segmentation principle states that the first step in processing a pixel should be to segment the
local region encompassing that pixel. The segmentation algorithm should automatically determine the number of segments present in an efficient manner, and any parameters should be
estimated from the image itself. The information obtained from local segmentation should
sufficiently separate the noise and structure within the region. The structural information
may then be used to achieve goals such as denoising and edge detection.
Chapter 4
Local Segmentation applied to
Structure Preserving Image Denoising
4.1 Introduction
The local segmentation principle may be used to develop a variety of low level image processing algorithms. In this chapter it will be applied to the specific problem of denoising
greyscale images contaminated by additive noise. The best image denoising techniques attempt to preserve image structure as well as remove noise. This problem domain is well
suited to demonstrating the utility of the local segmentation philosophy.
A multilevel thresholding technique will be used to segment the local region encompassing
each pixel. The number of segments will be determined automatically by ensuring that the
segment intensities are well separated. The separation criterion will adapt to the level of additive noise, which may be supplied by the user or estimated automatically by the algorithm.
The resulting segmentation provides a local approximation to the underlying pixel values,
which may be used to denoise the image.
The denoising algorithm presented is called FUELS, which stands for filtering using explicit
local segmentation. FUELS differs from existing local denoising methods in various ways.
The local segmentation process clearly decides which pixels belong together, and does so
democratically, without using the centre pixel as a reference value. If the computed local
approximation suggests changing a pixel's value by too much, the approximation is ignored,
and the pixel is passed through unmodified. The fact that each local approximation overlaps
with its neighbour means that there are multiple estimates for the true value of each pixel.
By combining these overlapping estimates, denoising performance is further increased.
FUELS will be shown to outperform state-of-the-art algorithms on a variety of greyscale
images contaminated by additive noise. FUELS' worst case error behaviour will be shown
to be proportional to the noise level, suggesting that it is quite adept at identifying structure
in the image. The denoised images produced by FUELS will be seen to preserve more image
structure than algorithms such as SUSAN and GIWS.
These images are part of the Range Image Segmentation Comparison Project at the University of South
Florida. http://marathon.csee.usf.edu/range/seg-comp/images.html
This is an example of a range image, for which the light source is actually a distance
measurement device. It has been shown that pixel intensities in a range image usually vary
in a spatially linear manner, due to the polyhedral nature of most objects [KS00]. However
this may not be an appropriate assumption to make when analyzing low resolution fingerprint
images in a criminal database. There one would expect many ridges and high contrast edges.
This thesis focuses on greyscale light intensity images. The generalization to other types of
images is considered in Chapter 6.
Figure 4.1: Two images of the same scene: (a) light intensity; (b) range, darker is closer.
is a constant. Each pixel in a constant facet is assumed to have the same value.
For greyscale data this would be a scalar representing the intensity, but for colour data it
would be an RGB vector. An image containing constant facets is referred to as being piece-
Figure 4.2: One dimensional polynomial approximation: (a) the original signal; (b) constant;
(c) linear; (d) quadratic.
wise constant. Piece-wise constant image models are commonly used in image processing.
They only have one parameter to estimate, and are simple to manipulate.
First order polynomial approximation in two dimensions has the mathematical form of a plane, namely f(x, y) = A + Bx + Cy. An image containing facets of this type is piece-wise planar. Pixel values in a planar facet are linearly dependent on their position within the facet. Planar facets are more flexible than constant facets, but at the expense of needing three parameters to be estimated for them. If B and C are small enough, and we are concerned only with a small area within a larger planar facet, then a constant approximation may be sufficiently accurate.
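For reference, the three parameters of a planar facet can be estimated from a block of pixels by ordinary least squares, as in the following sketch.

import numpy as np

def fit_plane(block):
    # Least squares fit of f(x, y) = A + B*x + C*y to a 2-D block of pixel values.
    block = np.asarray(block, dtype=float)
    ys, xs = np.mgrid[0:block.shape[0], 0:block.shape[1]]
    design = np.column_stack([np.ones(block.size), xs.ravel(), ys.ravel()])
    coeffs, *_ = np.linalg.lstsq(design, block.ravel(), rcond=None)
    return coeffs   # (A, B, C)

print(fit_plane(np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])))   # approximately (1, 1, 1)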
Figure 4.3: Discrete sampling of an object aligned to the pixel grid: (a) original scene;
(b) superimposed sampling grid; (c) digitized image.
Figure 4.4 shows what occurs when an object in a scene does not align exactly with the sampling grid. The sampling process has produced a range of pixel intensities in the digitized
image. Each pixel on the object boundary has received an intensity which is a blend between the intensities of the two original segments. In fact, there are now seven unique pixel
intensities compared to the original two.
Figure 4.4: Discrete sampling of an object mis-aligned with the pixel grid: (a) original scene;
(b) superimposed sampling grid; (c) digitized image.
Most observers would still assert the existence of only two segments, but would have some
difficulty assigning membership of each boundary pixel to a specific segment. One interpretation is that those pixels with intermediate intensities have partial membership to both
segments. If an application requires a pixel to belong only to one segment, that segment
in which it has maximum membership could be chosen. Alternatively, it could be claimed
that the image is still piece-wise constant, but now consists of ten segments. These differing
interpretations highlight the fact that an image model for a continuous scene may no longer
apply to its digitized counterpart.
Figure 4.5: (a) step edge; (b) line; (c) ramp edge; (d) roof.
The step edge defines a perfect transition from one segment to another. If segments are piecewise constant and pixels can only belong to one segment, then a step edge model is implicitly
being used. If a segment is very narrow, it necessarily has two edges in close proximity. This
arrangement is called a line. An arguably more realistic model for edges is the ramp edge. A
ramp allows for a smoother transition between segments. This may be useful for modeling
the blurred edges created from sampling a scene containing objects not aligned to the pixel
grid. Two nearby ramp edges result in a line structure called a roof.
Edge profiles may be modeled by any mathematical function desired, but steps and ramps
are by far the most commonly used. If a ramp transition occurs over a large number of pixels,
it may be difficult to discriminate between it being an edge, or a planar facet. If pixels along
the ramp are assigned to a particular segment, the values of those pixels may be dissimilar to
the majority of pixels from inside the segment.
The noise term may be written as a general function of the form in Equation 4.1, where Θ denotes a set of parameters controlling its intensity and spread, and f is the original image, which may be required if the noise term is data dependent. This formulation has scope for a huge number of possible noise functions.

n(x, y, \Theta, f)    (4.1)
A simple, but still useful and versatile noise model is additive zero-mean Gaussian noise
which is independently and identically distributed (i.i.d.) for each pixel. Under this model
the noise adds to the original pixel value before digitization. The noise term may be written
like Equation 4.2, where N(μ, σ²) denotes a random sample from a normal distribution of mean μ and variance σ². Figure 4.6 plots the shape of this noise distribution.

n(x, y, \sigma^{2}) = N(0, \sigma^{2})    (4.2)
Because the noise is additive and symmetric about zero, it has the desirable effect, on average, of not altering the mean intensity of the image. It only has one parameter, the variance
& { , which determines the spread or strength of the noise. Although the work in this thesis
assumes that the noise variance is constant throughout the image, it would be possible to
vary it on a per pixel basis. This and other extensions are discussed in Chapter 6.
Consider a constant facet containing pixels with intensity μ. After noise is added, it is expected that 99.7% of pixels will remain in the range μ ± 3σ. This is called the 3σ
confidence interval for μ [MR74]. Examination of Figure 4.6 shows that very little probability remains for values of μ outside the confidence interval. Table 4.1 lists the number of standard deviations from the mean within which a given proportion of normally distributed data is expected to lie.
Fraction of data (%)    Number of standard deviations from mean
50.0                    0.674
68.3                    1.000
90.0                    1.645
95.0                    1.960
95.4                    2.000
98.0                    2.326
99.0                    2.576
99.7                    3.000

Table 4.1: Fraction of normally distributed data expected to lie within a given number of standard deviations of the mean.
Figure 4.7 shows the effect of two different noise levels on the square image, which has segments of intensity 50 and 200. When σ is quadrupled from 10 to 40, the square obtains some pixels clearly having intensities closer to the original background intensity. After clamping, the 3σ confidence interval for the background is 50 ± 120, or approximately (0, 170), and 200 ± 120, or approximately (80, 255), for the square. These limits overlap by 90 intensity levels. On average, this situation would result in about 5% of pixels being closer to the opposing segment mean. For the square image this would affect about 5 pixels, roughly corresponding with what is observed.
Figure 4.7: The effect of different Gaussian noise levels: (a) no noise; (b) added noise with σ = 10; (c) added noise with σ = 40.
For piece-wise constant segments, the noise standard deviation defines a natural level of
variation for the pixel values within that segment. The natural level of variation describes
the amount by which pixel values may vary while still belonging in same segment. For the
case of planar segments, the natural level of variation depends on both the noise level and
the coefficients of the fitted plane. If a global planar segment only has a mild slope, then
the variation due to the signal may be negligible for any local window onto that segment.
The natural level of variation in the window will be dominated by the noise rather than the
underlying image model.
which comes to Z = 2^8 = 256 for 8 bpp images.
Consider the one-dimensional signal drawn with a dotted line in Figure 4.8. Quantization
to 8 levels produces the signal plotted with the solid line. Due to the rounding process, a quantized value, v, could have originally had any value in the range [v − 0.5, v + 0.5). This quantization noise may be approximated using the standard deviation of a uniform distribution of width 1, which is 1/√12, or about 0.29. Usually the other forms of noise are
at least an order of magnitude higher than this, so quantization noise can safely be ignored
by most applications.
Figure 4.8: Effect of quantizing the signal intensity to 8 levels.
A situation where one may not wish to ignore quantization noise is when magnifying an
image by interpolating pixel values. By default, the midpoint of the quantization region
would be used as the true value of the pixel, but it is possible to choose the true value from
anywhere within the original quantization interval. The most appropriate value could be
decided by optimization of a suitable cost function, for example, to maximize the sharpness
of the interpolated image.
The noisy quantized image is formed as in Equation 4.3.

f'_q(x,y) = \operatorname{round}\left( f(x,y) + n(x,y) \right)    (4.3)
After quantization there is a final clamping step to ensure the pixel values fall within the
legal range. Equation 4.4 describes the clamping process, which takes the noisy quantized
image, f'_q, and produces the final image, f', available to the image processor.

f'(x,y) = \begin{cases} 0 & \text{if } f'_q(x,y) < 0 \\ Z - 1 & \text{if } f'_q(x,y) > Z - 1 \\ f'_q(x,y) & \text{otherwise} \end{cases}    (4.4)
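Putting Equations 4.2 to 4.4 together, a noisy 8 bpp observation can be simulated as in the sketch below; the rounding step reflects the reading of Equation 4.3 adopted above, and the function name and seed are arbitrary.

import numpy as np

Z = 256   # number of grey levels for 8 bpp data

def corrupt(image, sigma, seed=0):
    # Add i.i.d. zero-mean Gaussian noise (Eq. 4.2), round (Eq. 4.3) and clamp to [0, Z-1] (Eq. 4.4).
    rng = np.random.default_rng(seed)
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.rint(noisy), 0, Z - 1).astype(np.uint8)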
To that end a global image model with the following properties will be assumed:
An image model assuming piece-wise constant greyscale segments with additive noise is
probably the simplest, but still useful, image model to use. In Chapter 6, the extension of the
principles outlined in this chapter to more complex image models will be discussed.
processing pixels inhabiting the margins of the image, as these pixels have incomplete neighbourhoods. Figure 4.9 illustrates this situation for a corner pixel when using a 3×3 window. The 5 pixels within the dotted window that fall outside the image are denoted as missing, and the remaining 4 are
denoted as available pixels.
Figure 4.9: Incomplete neighbourhoods when processing pixels at the image margins.
There are several common approaches to handling these missing pixels:
Constant fill Missing pixels are set to a predetermined constant value.
Wrap around The image is treated as if it wraps around at its edges, so modulo arithmetic is used to handle
negative coordinates.
Nearest neighbour Missing pixels take on values equal to their nearest available neighbour.
Average fill The missing pixels are set to the arithmetic mean of the available pixels.
Specialization The algorithm is specifically modified to handle the special cases where the
neighbourhood contains fewer pixels than normal.
The constant fill and wrap around methods are clearly inferior because they invent pixel
values bearing little relevance to the available pixels. The specialization method is desirable, but even when using a 3x3 window there are eight special cases to handle (four sides
plus four corners), which can make it difficult to implement efficiently.
Figure 4.10 pictorially describes the nearest neighbour approach. This technique is simple
to implement, and is able to preserve certain structure in the image. For example, if the available pixels near the margin form two separate populations, so will the extrapolated
neighbourhood. The disadvantage is that some pixel values, such as the corners,
are duplicated several times, and hence over-represented in the extrapolated set. This can
bias the results, especially if the available pixels are very noisy.
Figure 4.10: Missing pixels are easily replaced with their nearest neighbours.
The average fill method chooses values for the missing pixels which are closest, in a least
squares sense, to the available pixels. This is desirable when the available pixels are homogeneous. However, when the available pixels are heterogeneous, this method would invent
a third class of pixels all having the same value. It also requires more computation than the
nearest neighbour approach.
In this thesis the average fill method will be used. The averaging process suggests it would
be more robust under noisy conditions when compared to nearest neighbour. It is hoped
that it will have little effect on the overall results compared to specialization, as the
margin pixels comprise only a small proportion of the total number of pixels in the image.
Consider an X x Y image processed with an m x m window. The band of width (m-1)/2 around the image border
contains pixels which have an incomplete neighbourhood. The outer dotted line denotes the
effective increase in image resolution, to (X+m-1) x (Y+m-1), when extrapolating missing pixels.
Equation 4.5 gives an expression for the proportion of affected
pixels in the image. This is the proportion of pixels which, when at the centre of the window,
force missing pixel values to be generated using one of the techniques just described.
\mathrm{proportion}(X, Y, m) = 1 - \frac{(X - m + 1)(Y - m + 1)}{XY} \qquad (4.5)
Table 4.2 lists the proportion of affected pixels for various common combinations of image
and neighbourhood dimensions. For the commonly used 3x3 window, about 1% of processing involves missing pixels. This is not large enough to be a serious concern, especially
given the fact that the objects of interest are often positioned near the centre of the image,
away from the margins.
m    Image size    Proportion
3    320 x 240      1.45%
3    512 x 512      0.78%
3    1280 x 960     0.36%
5    320 x 240      2.90%
5    512 x 512      1.56%
5    1280 x 960     0.73%
7    320 x 240      4.33%
7    512 x 512      2.33%
7    1280 x 960     1.09%

Table 4.2: Proportion of pixels at the margins for various mask and image sizes.
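A minimal sketch of Equation 4.5 reproduces the Table 4.2 entries; the function name and the chosen image sizes are illustrative only.

import numpy as np

def margin_proportion(X, Y, m):
    """Proportion of pixels whose m x m neighbourhood extends past the
    border of an X x Y image (Equation 4.5)."""
    return 1.0 - ((X - m + 1) * (Y - m + 1)) / (X * Y)

# Reproduce the 512 x 512 column of Table 4.2.
for m in (3, 5, 7):
    p = margin_proportion(512, 512, m)
    print(f"m={m}: {100 * p:.2f}%")   # 0.78%, 1.56%, 2.33%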
Figure 4.12: Noisy equals original plus noise: (a) original image, f; (b) additive noise, n; (c) noisy image, f' = f + n.
There are two different situations that may occur when measuring the quality of a denoised
image: when the original image is available, and when it is unknown. The first case usually
occurs in an experimental situation where a known image is artificially corrupted with noise.
The original image is ground truth, to which any denoised images may be compared directly.
The second case is the more realistic situation whereby a noisy image has been sourced, say
remotely from a space telescope, and one wishes to denoise it before further processing.
Here there is no ground truth with which to compare. In either case, there are subjective and
objective techniques for assessing the quality of denoised images.
Subjective assessment usually involves human observers grading denoised images using predefined quality classes, such as excellent, fine, passable, marginal, inferior
and unusable [GW92]. This type of experiment requires that the viewing conditions be
strictly controlled. This includes factors like ambient lighting levels, the display hardware,
and the position of the assessor relative to the displayed images.
Figure 4.13 shows samples of what an assessor may be asked to examine. For this example
a reasonable assessment for the denoised image may be passable, as there is blotchiness
in the foreground and background, and the top left corner of the bright square stands out as
being too dark. Obviously though, what is passable to one may be marginal to another,
and so on. However, most assessors will be consistent in their gradings, and a consensus
rating can usually be determined.
Figure 4.13: Visual assessment using ground truth: (a) original image; (b) denoised image.
When ground truth is unavailable, it is still possible to perform a visual comparison by inspecting the noisy and denoised images. Figure 4.14 shows this situation for the example.
Even without ground truth the assessor usually has some a priori beliefs on what features
are present in the original image, and the denoised image can be examined relative to these
beliefs. In effect, the assessor is determining the ground truth himself or herself. The human brain is quite good at identifying structure behind noise, so a skilled assessor could
use a noisy image as a guide to what is expected in the denoised output. The qualitative
descriptions described earlier can still be used.
Figure 4.14: Visual assessment without ground truth: (a) noisy image; (b) denoised image.
A common objective measure of the disparity between two data sets is the root mean square error, or RMSE [GW92]. When ground truth is available, the two data sets could be the denoised
image, \hat{f}, and the original image, f. Equation 4.6 calculates the RMSE in this case.

\mathrm{RMSE} = \sqrt{ \frac{1}{XY} \sum_{x,y} \left( \hat{f}(x,y) - f(x,y) \right)^2 } \qquad (4.6)
The RMSE grows with the disparity between two images. In the case of two identical
images, it is zero. For the case of additive zero-mean Gaussian noise, the RMSE between
the noisy and original images is, on average, equal to the noise standard deviation.
The peak signal to noise ratio, or PSNR, is derived from the RMSE, and is measured in
decibels (dB) [RJ91]. This logarithmic measure is computed using Equation 4.7, where
Z is the maximum possible pixel intensity.

\mathrm{PSNR} = 20 \log_{10} \frac{Z}{\mathrm{RMSE}} \qquad (4.7)
RMSE and PSNR use the square of the pixel difference. This penalizes large errors very
heavily. For example, a single pixel in error by 100 will have the same contribution as
10,000 pixels in error by 1. An alternative measure, which aims to alleviate this potential
problem, is the mean absolute error, or MAE, calculated using Equation 4.8. The MAE
penalizes errors by their magnitude, and is less likely, compared to RMSE, to be biased by
occasional large errors.
\mathrm{MAE} = \frac{1}{XY} \sum_{x,y} \left| \hat{f}(x,y) - f(x,y) \right| \qquad (4.8)
The RMSE and MAE are useful in that they provide a number which can be compared
objectively. Their drawback is that they do not take into account the spatial distribution of
the pixel differences. Many small differences may be more tolerable than fewer larger errors,
especially if those errors occur at busy locations in the image. In fact, this perceptual
masking effect [TH94] is exploited by lossy image and audio compression algorithms. Large
errors clumped together, or near the image margins, may be preferred to the same errors spread
around the image. There is evidence to suggest that RMSE may be well correlated with human
observers' subjective opinions [MM99]. This fact, combined with its simple formula, has
allowed RMSE to be used widely throughout the literature.
The worst case absolute error, or WCAE, is the magnitude of the single largest difference
between two images. It provides a measure of a denoising algorithm's worst case performance. The calculation of the WCAE is given in Equation 4.9.

\mathrm{WCAE} = \max_{x,y} \left| \hat{f}(x,y) - f(x,y) \right| \qquad (4.9)
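All four objective measures are straightforward to compute. The following NumPy sketch assumes two equal-sized greyscale arrays; the function names are illustrative, not taken from the thesis.

import numpy as np

def rmse(a, b):
    """Root mean square error between two equal-sized images (Equation 4.6)."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return np.sqrt(np.mean(d ** 2))

def psnr(a, b, Z=255):
    """Peak signal to noise ratio in decibels (Equation 4.7)."""
    return 20.0 * np.log10(Z / rmse(a, b))

def mae(a, b):
    """Mean absolute error (Equation 4.8)."""
    return np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64)))

def wcae(a, b):
    """Worst case absolute error (Equation 4.9)."""
    return np.max(np.abs(a.astype(np.float64) - b.astype(np.float64)))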
Because pixel values lie in [0, Z], the range of the difference image will be [-Z, +Z]. To view a difference image, the
pixel values must be mapped and clamped back to the legal range [0, Z]. Equation 4.10
computes the difference image, d, between two equal-sized images, f1 and f2. If the two
images are similar, the mean difference should be close to 0, so the Z/2 term centres it at the
middle grey level. The clamping process has the side-effect of not being able to distinguish
errors with a magnitude greater than Z/2, but these should be rare.

d(x,y) = \mathrm{clamp}\!\left( f_1(x,y) - f_2(x,y) + \tfrac{Z}{2} \right) \qquad (4.10)
Figure 4.15 shows how a difference image can be used to compare a denoised image to
the original. The dark and light pixels in the difference image show where the denoising
algorithm did a poor job of estimating the original pixels. The mid-grey pixels around the
centre area represent where it did well. If the image were denoised perfectly, the difference
image would contain exactly the noise which was added.
Figure 4.15: Qualitative measure of filter performance: (a) original image; (b) noisy
image; (c) median filtered; (d) difference between original and denoised, mid-grey representing zero.
In the previous example ground truth was available, as Figure 4.15a was synthetically created. Usually, however, the noiseless image will be unavailable. The image processor must
use only the noisy image to recover the original pixel information. In this situation, only the
difference between the noisy and denoised version can be generated. Figure 4.16 gives an
example of this situation. Although there is less apparent structure in the difference image,
there are some clumps of very bright and dark pixels near the corners of the square. These
errors could correspond to where the filter performed erratically, or to where there were very
noisy pixels. Without the original image it is difficult to determine which it is.
Figure 4.16: Qualitative measure of filter performance: (a) noisy image; (b) median filtered;
(c) difference between noisy and denoised, mid-grey representing zero.
4.4.1 Square
The square image in Figure 4.17 has already been encountered. It is a small 8 bit per pixel
greyscale image consisting of a 25 pixel square of intensity 200 atop
a 74 pixel background of intensity 50. It will often be used for illustrative purposes, and for
subjectively examining the effect of various techniques.
4.4.2 Lenna
The lenna image [len72] in Figure 4.18 has become a de facto standard test image throughout the image processing and image compression literature. Its usefulness lies in the fact
that it covers a wide range of image properties, such as flat regions, fine detail, varying edge
profiles, occasional scanning errors, and the fact that so many authors produce results using
it as a test image. One interesting feature is that the noise in lenna seems to be inversely
proportional to the brightness, perhaps a legacy of having been scanned from a negative.
Figure 4.18: (a) the 512 x 512 8 bpp lenna image; (b) histogram.
4.4.3 Montage
Figure 4.19 shows a greyscale test image called montage, and its histogram. It has resolution
512 x 512 and uses 8 bits per pixel. The image consists of four quadrants. The top left
is the middle 256 x 256 section of a smoothed version of the lenna image. The bottom
right is a right hand fragment of a German village street scene, which contains some small
amounts of natural noise. The top right is a synthetically created image which is perfectly
piece-wise constant. It covers a range of segment boundary shapes and intensity differences
between adjacent segments. The bottom left is the same as the top right, except that the
segments are piece-wise planar, covering a range of horizontal and vertical gradients.
It is hoped that this image covers a wide range of image properties, while also being very
low in noise. The low noise level is important for experiments in which synthetic noise will
be added. The image has features such as constant, planar and textured regions, step and
ramp-like edges, fine details, and homogeneous regions. The montage image will be used
for measuring objectively the RMSE performance of various techniques.
This image was provided by Bernd Meyer, the author of the TMW algorithm [MT97]. It is available for
download from http://www.csse.monash.edu.au/~torsten/phd/.
Throughout most of this chapter, the local window will be the 3x3 configuration shown in Figure 4.20. However, the techniques described may be applied to any window configuration
and size. For brevity we will refer to the set of pixels from the window as the vector x. The
pixels are indexed in raster order, x1 to x9 for the 3x3 case. The centre pixel, x5, is the pixel
being processed, and may also be denoted simply x, without the subscript.
Figure 4.20: Equivalent naming conventions for pixels in the local window.
The simplest form of local segmentation is to do no segmentation at all, and to assume that
the pixels in x are homogeneous. Because the local region is small compared to the whole
image, a large proportion of local regions are expected to be homogeneous anyway. This is
the assumption made by the linear filters with fixed weights described in Section 3.4.2.
Under a piece-wise constant image model, pixels, x_i, from the interior of a global segment all
have the same true value, denoted mu. Under an additive Gaussian noise model each pixel has
noise, n_i, added to it. The noisy pixels, x'_i, have the properties shown in Equation 4.11.

x'_i = x_i + n_i \quad \text{where} \quad x_i = \mu, \;\; n_i \sim \mathcal{N}(0, \sigma^2) \qquad (4.11)
Equation 4.12 shows the noisy pixels to have the same expected value as they did prior to
noise being added. However, the uncertainty, or variance, in their values has increased from
0 to sigma^2. The noisy pixels may be considered to be distributed N(mu, sigma^2).

\mathrm{E}[x'_i] = \mathrm{E}[x_i] + \mathrm{E}[n_i] = \mu + 0 = \mu
\qquad
\mathrm{Var}[x'_i] = \mathrm{Var}[x_i] + \mathrm{Var}[n_i] = 0 + \sigma^2 = \sigma^2 \qquad (4.12)
If the window is assumed homogeneous, the common value mu may be estimated by the sample mean,
computed using Equation 4.13. Assuming local homogeneity, the sample mean is also the
best estimate for every pixel within the window. The variance of this estimate, the square of
its standard error, is sigma^2/m. Thus the more pixels from the same segment that are averaged, the
more reliable the estimate becomes.

\hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} x'_i, \qquad \mathrm{Var}[\hat{\mu}] = \frac{\sigma^2}{m} \qquad (4.13)
Figure 4.21 shows the effect of this filter on the square image from Figure 4.17. There
is clear noise reduction in the centre of the square and in the background, but there is also
obvious blurring at the boundary between them. The assumption of only one segment existing in the local window is equivalent to the box filter described in Section 3.4.2, which gives
equal weight to each pixel in the window. As a result, it has all the same drawbacks as fixed
linear filters. If filtering were repeated, further blurring would occur, eventually producing
a completely homogeneous image. This common pixel value would be similar to the image
mean, which for square is (74 x 50 + 25 x 200)/99, or about 88.
Figure 4.21: (a) original; (b) noisy version; (c) denoised using local averaging.
Reconsider the square image from Figure 4.21. Globally, it consists of only two segments,
thus any local window must contain, at most, two segments. One solution for preserving
edges is to first classify the pixels from the window into two segments. Only pixels belonging
to the same segment could be used to calculate the denoised pixel values. This would prevent
the mixing of pixels from different global segments, resulting in a sharper image.
The division of pixels into two segments is obviously a form of segmentation. The BTC
technique, discussed in Section 3.3.2, successfully uses a two-class model for encoding small
pixel blocks. In most variants of BTC a single threshold, T, is used to divide the pixels into
two clusters [FNK94]. The representative values, \hat{\mu}_1 and \hat{\mu}_2, for each cluster are set equal to
the average of the pixels within each cluster. This is shown in Equation 4.14, where m1 and
m2 are the number of pixels in each cluster.

\hat{\mu}_1 = \frac{1}{m_1} \sum_{x'_i < T} x'_i, \qquad \hat{\mu}_2 = \frac{1}{m_2} \sum_{x'_i \geq T} x'_i \qquad (4.14)
The threshold, T, may be chosen in many ways. Many techniques, including the popular
Absolute Moment BTC [LM84, CC94, MR95], simply use the mean of the pixels in the
window, calculated using Equation 4.15.

T = \bar{x}' = \frac{1}{m} \sum_{i=1}^{m} x'_i \qquad (4.15)
The mean measures the central tendency of the pixel values in the block. When the two
clusters have unequal numbers of pixels, the mean is biased toward the centroid of the larger
cluster. This pollutes the smaller cluster with pixels from the larger cluster, biasing both
clusters' means. This effect is shown in Figure 4.22.
When the clusters are known to be normally distributed, it is better to choose T such that the
overall mean squared error (MSE) is minimized, as shown in Equation 4.16. The optimal T
can be found by exhaustive search of the candidate thresholds.

T = \arg\min_{T'} \left[ \sum_{x'_i < T'} \left( x'_i - \hat{\mu}_1 \right)^2 + \sum_{x'_i \geq T'} \left( x'_i - \hat{\mu}_2 \right)^2 \right] \qquad (4.16)
Figure 4.22: The mean is a poor threshold when the clusters have different populations.
An iterative technique for determining the MSE threshold in Equation 4.16 was proposed
by Efrati et al [ELM91]. It uses the block mean as the initial threshold. The computed
cluster means are then themselves averaged to produce a new threshold, and new cluster
means computed. This continues until there is no significant change in the threshold. The
algorithm is outlined in Listing 4.1, where epsilon quantifies the required accuracy. Usually pixel
intensities have integer values, so the algorithm can terminate if the integer part of T is no
longer changing.
1. Let T = \bar{x}', the mean of the pixels in the window.
2. Compute the cluster means \hat{\mu}_1 and \hat{\mu}_2 using threshold T (Equation 4.14).
3. If |T - (\hat{\mu}_1 + \hat{\mu}_2)/2| < \epsilon then exit.
4. Otherwise set T = (\hat{\mu}_1 + \hat{\mu}_2)/2 and go to Step 2.

Listing 4.1: Iterative determination of the binary threshold.
This algorithm produces means very similar to the levels produced by the Lloyd quantizer [Llo82], and is often referred to as Lloyd's algorithm. In their BTC survey paper, Franti
et al [FNK94] found Lloyd's algorithm to produce results identical to a true MSE optimizer
in nearly all situations, while using only 1.71 iterations on average. It was not made clear
in which situations it failed; perhaps it was due to the mean being a poor initial threshold,
resulting in convergence to a local, rather than global, optimum. Nevertheless, this method
will be used for selecting binary thresholds in this chapter.
Once the threshold and cluster means have been determined, each pixel in the window is replaced by the mean of the cluster to which it belongs, as shown in Equation 4.17.

\hat{x}_i = \begin{cases} \hat{\mu}_1 & \text{if } x'_i < T \\ \hat{\mu}_2 & \text{if } x'_i \geq T \end{cases} \qquad (4.17)
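The following sketch puts Listing 4.1 and Equations 4.14 and 4.17 together for a single window of pixel values. The function names and the epsilon value are illustrative choices, not prescribed by the thesis.

import numpy as np

def lloyd_threshold(pixels, eps=0.5):
    """Iteratively refine a binary threshold, starting from the mean (Listing 4.1)."""
    T = pixels.mean()
    while True:
        lo, hi = pixels[pixels < T], pixels[pixels >= T]
        if len(lo) == 0 or len(hi) == 0:
            return T                               # only one cluster present
        T_new = 0.5 * (lo.mean() + hi.mean())
        if abs(T_new - T) < eps:
            return T_new
        T = T_new

def denoise_two_segments(pixels):
    """Replace each pixel by the mean of its cluster (Equations 4.14 and 4.17)."""
    T = lloyd_threshold(pixels)
    lo, hi = pixels < T, pixels >= T
    out = np.empty_like(pixels, dtype=np.float64)
    if lo.any():
        out[lo] = pixels[lo].mean()
    if hi.any():
        out[hi] = pixels[hi].mean()
    return out

window = np.array([55, 48, 52, 47, 199, 203, 51, 196, 201], dtype=np.float64)
print(denoise_two_segments(window))    # background pixels -> ~50.6, square pixels -> ~199.75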
Figure 4.23 compares the use of 1-segment and 2-segment models for denoising square. Filtering under the assumption of two segments preserves the edges of the object much better. The
thresholding procedure appears to have correctly classified the pixels, even in the presence of
additive Gaussian noise. Within homogeneous regions the smoothing is a little worse, and is
especially noticeable in the centre of the light square. This occurs because the pixels in each
window are forcibly divided into two segments, even those which are homogeneous. Thus
the variance within each local cluster is reduced only by a factor of m1 or m2, compared to
m = m1 + m2 when the window is treated homogeneously. The possibility of applying the
filter again to the denoised output for further smoothing is discussed in Section 4.8.
Figure 4.24 quantitatively compares the denoising performance of the 1-segment and 2-segment local models for the montage image. For each value of sigma tested, artificial noise
distributed N(0, sigma^2) was added to montage, and the noisy pixels rounded to integers. The RMSE
was measured between the original montage and the denoised output of each algorithm.
The 2-segment model performs significantly better for all but the highest noise levels tested. At the
Figure 4.23: (a) noisy; (b) filtered with 1-segment model; (c) filtered with 2-segment model.
highest noise levels, the distinction between adjacent global segments becomes less clear,
especially if the contrast between them is low. When the noise swamps the signal, inappropriate thresholds are more likely to be chosen.
Figure 4.24: Comparison of 1-segment and 2-segment models for denoising montage.
Some criterion is needed for deciding between k = 1 and k = 2 segments. Consider an image with a dark background
comprising a proportion p of the pixels, and a lighter foreground comprising the remaining 1 - p. If this image were corrupted by additive noise
distributed N(0, sigma^2), its histogram would be the mixture of two normal distributions given in Equation 4.18.

h(v) = p\,\mathcal{N}(\mu_1, \sigma^2) + (1 - p)\,\mathcal{N}(\mu_2, \sigma^2) \qquad (4.18)
Figure 4.25a shows this histogram for particular values of mu1, mu2 and the noise standard deviation sigma. It
is clearly bimodal, because the two cluster means are several sigma apart. It was shown in Section 4.2.5
that 99.7% of normally distributed pixels will, on average, fall within 3 sigma of their mean.
Figures 4.25b-d show the histogram's behaviour as the distance between the two means is decreased.
Once the means are close enough together the histogram becomes unimodal, but not Gaussian, in nature. The exact point at which
this occurs would also depend on the number of pixels in each cluster. The more unequal the
blend, the more likely it is to appear unimodal.
From the previous observations it would be reasonable to suggest that a method for deciding
between k = 1 and k = 2 should be based on determining whether the two clusters are
well separated. If we restrict the criterion to use the difference between cluster means, the
threshold should depend on the noise variance and the number of pixels in each cluster.
Equation 4.19 gives a template for the proposed model order selection technique.

k = \begin{cases} 2 & \text{if } |\hat{\mu}_1 - \hat{\mu}_2| \geq T(\sigma, m_1, m_2) \\ 1 & \text{otherwise} \end{cases} \qquad (4.19)
Figure 4.25: [top to bottom, left to right] A mixture of two normal distributions with common
variance, with their means separated by successively smaller amounts from (a) to (d).
The two estimated cluster means, \hat{\mu}_1 and \hat{\mu}_2, are calculated using Equation 4.14. The hypothesis that there are two segments is rejected if the cluster means are too close together. The
measure of closeness is decided by comparing the inter-cluster distance with the output of
a threshold function, T(sigma, m1, m2). The threshold function may depend on the image noise
variance and the number of pixels in each cluster. This should make the thresholding process
adaptive to the image noise level.
The ideal separation varies for different cluster populations, but it is obvious that the magnitude of the noise
is an important factor. The simplest reasonable form for the separation threshold is
given in Equation 4.20, where C is a positive constant.

T(\sigma, m_1, m_2) = C\sigma \qquad (4.20)

This separation threshold is a form of Fisher's criterion (Equation 3.3) under the assumption
of equal cluster variances, sigma_1 = sigma_2 = sigma.
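A sketch of the selection rule of Equations 4.19 and 4.20, reusing the lloyd_threshold and denoise_two_segments functions sketched earlier; the default C = 3 anticipates the value examined below, and the code is illustrative only.

import numpy as np

def select_model(pixels, sigma, C=3.0):
    """Choose k = 1 or k = 2 by comparing the cluster mean separation with C*sigma."""
    T = lloyd_threshold(pixels)                     # from the earlier sketch
    lo, hi = pixels[pixels < T], pixels[pixels >= T]
    if len(lo) == 0 or len(hi) == 0:
        return 1
    return 2 if abs(hi.mean() - lo.mean()) >= C * sigma else 1

def denoise_window(pixels, sigma, C=3.0):
    """Apply the k = 1 (single mean) or k = 2 (two cluster means) model to one window."""
    if select_model(pixels, sigma, C) == 1:
        return np.full_like(pixels, pixels.mean(), dtype=np.float64)
    return denoise_two_segments(pixels)             # from the earlier sketch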
Visual inspection
Consider the square image again, which has a background intensity of 50 and a foreground
intensity of 200. Figure 4.26a shows a noisy version of this image. Figures 4.26b-f
show the denoised output for increasing values of C. If k = 1 is chosen, the denoising filter uses the one segment model of Section 4.5. If k = 2
is chosen, the two segment model of Section 4.6 is applied. In Figure 4.26, the foreground
and background are separated by many multiples of sigma, so no difficulties are expected. The denoised images
for these values of C are essentially equivalent, having successfully smoothed nearly the whole image.
Figure 4.26: (a) noisy; (b)-(f) simple model selection using increasing values of C.
Figure 4.27 provides the same results as Figure 4.26, except that the noise has been increased.
The two pixel populations are still well separated relative to sigma. Once again, for these values of
C, the maximum possible smoothing has occurred.
Figure 4.27: (a) noisy; (b)-(f) simple model selection using increasing values of C.
Figure 4.28 shows a much more visibly noisy image, in which the intensity difference between the two image segments corresponds to only about 4 sigma. At this separation, there is a 2%
overlap of the two populations. The best smoothing occurs when C equals 3 or 4. For the largest
value of C, some blurring of segment boundaries is observed, meaning k = 1 was sometimes chosen
incorrectly. When the global segment means are only about 4 sigma apart, insisting on an equivalent
cluster separation at the local level will sometimes be incorrect.
Finally, Figure 4.29 presents a very noisy case. The true object intensities are
only about 2.5 sigma apart, corresponding to a 10% overlap. Thus it is expected that about 10 of the 99
pixels will have intensities which could be mis-classified. The best denoising occurs when
C equals 2 or 3, depending on what type of artifacts are preferred. As expected, for larger values of C,
the model selection criterion fails miserably, blurring all the edges around the square.
From this analysis it may be concluded that C = 3 is a good all-round choice for the square
image when the noise has additive, zero-mean Gaussian characteristics. The disadvantage
of the C sigma threshold is that it does not consider the number of pixels in each cluster. There
is less confidence in the average of a cluster containing 1 pixel than in its 8 pixel
counterpart, because the standard error of the former is nearly three times higher. A further problem
with a fixed threshold is that two segments with an intensity difference of less than C sigma will
probably never be detected.
Figure 4.28: (a) noisy; (b)-(f) simple model selection using increasing values of C.
Figure 4.29: (a) noisy; (b)-(f) simple model selection using increasing values of C.
Quantitative results
Figure 4.30 shows the RMSE between the denoised and original montage image for 6
integer values of C. One would conclude that C = 3 performs well across all
the noise levels. This supports the conclusion made in Section 4.7.1.
Figure 4.30: Effect of C on RMSE when denoising montage by switching between a 1-segment and 2-segment local model.
                        Null hypothesis        Alternative hypothesis
Number of segments      k = 1 (one segment)    k = 2 (two segments)
Cluster means           mu1 = mu2              mu1 != mu2

Table 4.3: Using hypothesis testing for model selection in local segmentation.
If it is assumed a priori that the segments are piece-wise constant with additive Gaussian
noise distributed N(0, sigma^2), a statistical t-test can be used to determine the more likely hypothesis [Gro88]. In Equation 4.21, t is a random variable following Student's distribution with
nu = m1 + m2 - 2 degrees of freedom.

t = \frac{\hat{\mu}_2 - \hat{\mu}_1}{\sigma \sqrt{\frac{1}{m_1} + \frac{1}{m_2}}} \qquad (4.21)

If |t| is less than the critical value t_{alpha/2, nu}, the two cluster means are not significantly
different, and the window is inferred to be homogeneous. If |t| is at least the critical value,
the window is considered to contain two distinct pixel populations. Equation 4.22 summarizes the t-test model selection criterion for local segmentation.
k = \begin{cases} 2 & \text{if } |t| \geq t_{\alpha/2,\nu} \\ 1 & \text{otherwise} \end{cases} \qquad (4.22)

The value of alpha depends on the level of confidence one wishes to have in the inference. Because the same window size is used for each pixel,
nu is fixed at m1 + m2 - 2 = 7 for a 3x3 window. To achieve 100(1 - alpha)% confidence, the critical value should
equal t_{alpha/2, nu}. This may be calculated using numerical integration, or from pre-computed
tables [PH66], an extract of which is given in Table 4.4. For the case of nu = infinity, Student's
distribution becomes a normal distribution.
For a window of m pixels, the requirement |t| >= t_{alpha/2, nu} may be written as

\frac{\left| \hat{\mu}_1 - \hat{\mu}_2 \right|}{\sigma \sqrt{\frac{1}{m_1} + \frac{1}{m_2}}} \geq t_{\alpha/2,\nu} \qquad (4.23)

which rearranges to

\left| \hat{\mu}_1 - \hat{\mu}_2 \right| \geq t_{\alpha/2,\nu}\, \sigma \sqrt{\frac{1}{m_1} + \frac{1}{m_2}} \qquad (4.24)
Degrees of       alpha = .10    alpha = .025   alpha = .005
freedom nu
2                1.886          4.303          9.925
7                1.415          2.365          3.499
11               1.363          2.201          3.106
19               1.328          2.093          2.861
23               1.319          2.069          2.807
infinity         1.282          1.960          2.576

Table 4.4: Critical values of Student's t distribution used in t-testing.
Comparing Equation 4.24 with the template of Equation 4.19 gives the t-test separation threshold.

T(\sigma, m_1, m_2) = t_{\alpha/2,\nu}\, \sigma \sqrt{\frac{1}{m_1} + \frac{1}{m_2}} \qquad (4.25)
= t_{\alpha/2,\nu}\, \sigma \sqrt{\frac{m_1 + m_2}{m_1 m_2}} \qquad (4.26)
Figure 4.31 compares four different values of alpha for denoising montage using the t-test
criterion. The value of alpha is used to determine dynamically the mean separation threshold in
Equation 4.26. A 3x3 filter is used, so nu = 7. At
very low noise levels, the value of alpha does not appear to affect results, but as sigma is increased,
the lowest value, 0.005, does best. This is the highest of the confidence levels tried.
In this thesis a value of 0.005, corresponding to 99% confidence, will be used. For 3x3
windows, nu = 7, so t_{alpha/2, nu} = 3.499, giving the threshold function in Equation 4.27.

T(\sigma, m_1, m_2) = 3.499\, \sigma \sqrt{\frac{1}{m_1} + \frac{1}{m_2}} \qquad (4.27)
The t-test threshold function has a similar form to the simple one in Section 4.7.1, except
that C has been replaced by a term inversely related to the geometric mean of the two cluster
populations. Figure 4.32 plots the effective number of sigma separations required for
k = 2 to be declared, as a function of the split between the clusters. At the extremes it has values between 3 and 4, but at the middle, where
the cluster populations are most symmetric, it goes as low as 2.35. The t-test approach allows
cluster means to be close together as long as the evidence, in the form of more accurate mean
estimates, is strong enough. When the cluster counts are skewed, the criterion insists on a
wider separation before accepting them.
Figure 4.31: RMSE for denoising montage using the t-test criterion with values 0.050, 0.025, 0.010 and 0.005.
Figure 4.32: Effective value of C as a function of the number of pixels in the first segment (m1 of 9).
In an implementation, Equation 4.26 would not need to be recomputed for each pixel. For a
given image, t_{alpha/2, nu} and sigma are fixed, and because m1 + m2 = 9, there are only
(m - 1)/2 = 4 unique thresholds. These may be pre-computed and stored in a small look-up table.
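A sketch of this pre-computation using SciPy's Student's t quantile function; the tail probability 0.005 matches Table 4.4, and everything else is an illustrative choice.

from math import sqrt
from scipy.stats import t

def ttest_thresholds(sigma, m=9, tail=0.005):
    """Pre-compute T(sigma, m1, m2) of Equation 4.26 for every split of an m pixel window."""
    nu = m - 2                                    # degrees of freedom
    t_crit = t.ppf(1.0 - tail, nu)                # 3.499 for nu = 7, tail = 0.005
    return {m1: t_crit * sigma * sqrt(1.0 / m1 + 1.0 / (m - m1))
            for m1 in range(1, m // 2 + 1)}       # only 4 unique splits for m = 9

print(ttest_thresholds(sigma=10.0))
# Effective C ranges from about 3.7 (1-8 split) down to about 2.35 (4-5 split),
# consistent with Figure 4.32.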
Visual inspection
Figure 4.33 provides a visual comparison of the t-test and C = 3 model selection functions.
The four rows correspond to added noise standard deviations of 10, 20, 30 and 40. The three
columns contain the noisy image, the C = 3 denoised image, and the t-test denoised image.
At the lowest noise level, the light
squares have been filtered equally well, but the t-test has performed slightly worse on the
background. For all but the most skewed cluster populations, the effective value of C determined by the
t-test is less than 3. Therefore, on average, it will choose k = 2 more often than the simple
criterion. If there is a very noisy pixel present in a homogeneous area, the t-test criterion is
more likely to place that pixel in a segment of its own.
Interestingly, increasing the noise removes any observable differences between the
two methods. Normally, behaviour similar to that observed at the lowest noise level is expected. But it
seems in this case that, by random chance, no extreme pixel values were present in the noisy
image. At the highest noise level, the denoised outputs of both methods contain obvious artifacts. The
filtering errors in the C = 3 output image are milder, but a little more frequent, than those
of the t-test.
Figure 4.33: Visual comparison of two model selection functions. Columns are: noisy, C = 3
filtered, and t-test filtered images. Rows are: sigma = 10, 20, 30 and 40.
Quantitative results
Figure 4.34 shows that, for montage, the C = 3 and t-test criteria perform equivalently at
low noise levels. At higher noise levels, the simpler C = 3 criterion does better.
So far, only a 3x3 window has been considered. It must be noted that the t-test threshold
varies more as the window size increases. Although not shown, I have performed further
experiments which show that the t-test criterion still performs worse when the window size is
increased to 21 and 25 pixels. This fact, in conjunction with the lack of compelling evidence
in the qualitative results, recommends the use of the simple C = 3 model selection criterion
over the more complicated t-test criterion.
Figure 4.34: RMSE for denoising montage using the C = 3 and t-test (0.005) model selection criteria, along with forcing k = 1 or k = 2.
A denoising algorithm may be applied repeatedly, using the output of one iteration as the input to the next iteration. Ideally,
all pixel values are kept to floating point accuracy throughout. Rounding to integers after
each iteration introduces feedback of quantization noise, which could cause unusual artifacts.
Equation 4.28 defines the iterated output, where g(.) denotes one application of the denoising filter.

\hat{f}^{(i)} = g\!\left( \hat{f}^{(i-1)} \right) \quad \text{where} \quad \hat{f}^{(0)} = f' \qquad (4.28)
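A minimal sketch of this iteration scheme, stopping early if the output stops changing; the denoise argument stands for any per-image filter, such as the local segmentation filter of this chapter, and the names are illustrative.

import numpy as np

def iterate_filter(noisy, denoise, iterations=100):
    """Repeatedly apply a denoising function, keeping floating point precision
    throughout, and stop early once the output no longer changes."""
    current = noisy.astype(np.float64)
    for i in range(iterations):
        output = denoise(current)
        if np.array_equal(output, current):      # a root image has been reached
            return output, i
        current = output
    return current, iterations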
The effect on the final output as the number of iterations increases depends wholly on the properties of the denoising
algorithm. A stable algorithm is defined as one which, after a finite number of iterations,
produces no further changes in the output. The image it settles on is called a root
image [FCG85]. An example of an unstable algorithm is the median filter. Imagine using a
local window of 5 pixels, the centre pixel and its four nearest neighbours, and handling
the image margins by taking the median of the available pixels. Applying this median filter to
the checkerboard image in Figure 4.35a results in the inverted checkerboard of Figure 4.35b.
Reapplying for a second iteration produces Figure 4.35c, which is identical to the original
image. In this case, the instability manifests itself as an oscillation between two images, the
difference of which has the maximum possible RMSE.
Figure 4.35: Oscillatory root images: (a) original image; (b) after one iteration; (c) after two
iterations.
Ideally, the root image should resemble the original but without the noise. Many existing
techniques produce a root image in which every pixel has the same value. For example, any
fixed linear filter with more than one non-zero weight will, if iterated, eventually produce
a homogeneous image. Even anisotropic diffusion [PM90], which is meant to be iterated,
behaves in this manner if the default stopping function is used. The final common intensity
will be roughly equal to the image mean. The exact value will vary with noise in the image,
and whether pixel values are rounded back to integers after each iteration.
One would expect the local segmentation denoising algorithm to be stable and, in general,
not to converge to homogeneity. This is because the explicit segmentation uses a hard cutoff to keep different groups of pixels separate. Pixels from different groups are unable to
influence each other. Of course, this only works if the two groups of pixels differ enough to
be detected by the local segmentation model selection criterion. This could be problematic if
the noise level was poorly estimated, or if the intensity difference between two neighbouring
segments was below the noise level.
Visual inspection
Figure 4.36 shows the result of iterating the local segmentation denoising algorithm of Section 4.7.1 100 times on the noiseless square. The simple model selection criterion with
C = 3 is used, and sigma was set to zero. The same value of sigma was used for each iteration.
The result of iterating a 3x3 median filter is also included for comparison purposes. It is
highly desirable for a denoising algorithm to behave as an identity filter when the input image is
noiseless, and the local segmentation filter does that. The median filter truncates the corners,
but otherwise keeps the object boundaries sharp.
Figure 4.36: Iterating 100 times: (a) noiseless square; (b) local segmentation assuming sigma = 0; (c) 3x3 median filtered.
Figure 4.37 performs the same experiment, except that this time, additive Gaussian noise has
been added to square. At each iteration, the local segmentation filter assumed the same noise level.
The local segmentation filter proves to be both useful and stable. Its output only has two
pixel values: 47 for the background and 192 for the foreground. There are three possible reasons why these are not
equal to 50 and 200 as in the noiseless version: the noise
not being distributed symmetrically within each segment, clamping of pixels to [0, 255], and
margin pixels being slightly over-represented due to their use in computing the extrapolated
values when the filter window goes outside the legal image coordinates, as discussed in
Section 4.2.8. Compared to Figure 4.28d, the result of only one iteration, the output is more
visually pleasing. The median filter also has a root image, albeit a disfigured one. The
noise has caused the median to behave unusually.
Figure 4.37: Iterating 100 times: (a) noisy square; (b) local segmentation; (c) 3x3 median filtered.
As shown in Figure 4.38, the local segmentation filter fails when the noise is increased further.
Its denoised output only contains pixels of intensity 101, although the mean of the
noiseless image is 88. This homogeneity is due to each of the original segments containing
pixel values more suited to the other segment. Unfortunately, the clustering algorithm cannot
handle this situation. The result is intensity leakage between the segments. Given enough
iterations, this leakage will spread throughout the image. The median filter has reached a
better root image in this situation.
Figure 4.38: Iterating 100 times: (a) very noisy square; (b) local segmentation; (c) median filtered.
Quantitative results
For montage, Figure 4.39 plots the RMSE between the original and denoised outputs as the
number of iterations is increased. The local segmentation filter used C = 3 and was supplied
with the true added noise level. In general, for a given level of added noise, the RMSE
worsens as more iterations are performed. This is a little surprising given the qualitative
results for square in Section 4.8. However, the assumption that the local region consists of
two piece-wise constant segments is clearly false for many positions within montage. Any
filtering errors after the first iteration would only be compounded by further iterations. Most
natural images would have properties in common with montage.
Figure 4.39: Effect of the number of iterations (1, 2, 5, 10 and 20) on RMSE for denoising montage. The same value of sigma is
used for each iteration.
The exception to the overall trend is that 2 iterations become slightly more beneficial than 1
at the highest noise levels tested. This may just be the point at which the gain from further averaging outweighs
the loss from degrading image structure further. Figure 4.40 shows the effect of iterations
on a small portion of the noisy montage image, with the corresponding difference
images underneath. Both visual inspection and RMSE agree that 2 iterations have produced
a better output. The intensities of the text and background are more uniform and pleasing to
the eye. For montage, k = 1 was chosen 80% of the time during the first iteration. This is
approaching a pure averaging filter, which, in terms of RMSE, becomes a good choice when
the noise level is high enough.
Figure 4.40: Bottom right hand corner of montage: (a) noisy; (b) after 1 iteration;
(c) after 2 iterations; (d)-(f) corresponding difference images relative to the noiseless original.
Conclusions
It is difficult to prove if a given denoising algorithm will be usefully stable in the general
case. The previous experiments showed that for some simple cases, the local segmentation
approach to denoising may be iterated successfully. When the noise is very high compared
to the contrast between objects, pixel leakage unfortunately occurs, and multiple iterations
could produce a homogeneous image. In Chapter 5, this problem will be tackled by using a
segmentation algorithm which takes spatial information into consideration.
(a)  20  40  60        (b)  34  34  60
     20  40  60             34  34  60
     40  40  40             34  34  34

Figure 4.41: The effect of a two class model on a three class region: (a) original pixels;
(b) segmented assuming k = 2.
The use of an incorrect model has forcibly merged two of the original segments. The resulting
mean falls between those of the original segments. Thus 7 of the 9 pixels, including the centre
pixel, have a filtered value which is unreasonable given that the noise level is effectively zero.
If one believes that the noise is additive in nature and distributed N(0, sigma^2), then confidence
interval analysis states that 99.7% of pixels are expected to fall within 3 sigma of their original
value. If a single filtered pixel value strays further than this, there is evidence to suggest that
the local segmentation model is inappropriate.
The ability to diagnose whether the best fitting model actually fits the data well is very useful.
All medical practitioners must take the Hippocratic Oath [HipBC], which has as one of its
cardinal principles the following: first of all, do no harm. This principle may be applied
to local segmentation when denoising images. If it appears that the best fitting model has
produced an unreasonable local approximation to the original image, don't use it. Instead,
use the original pixels as they were. It is probably better to leave them unmodified than to
give them new, possibly noisier, values. Equation 4.29 describes the do no harm (DNH)
principle. If the local approximation suggests changing any pixel's value by more than C sigma, then
ignore it, and pass the pixels through unmodified.
\hat{x}_i \leftarrow x'_i \;\; \forall i \quad \text{if} \quad \max_j \left| \hat{x}_j - x'_j \right| > C\sigma \qquad (4.29)

For montage, Figure 4.42 shows the RMSE denoising performance of the C = 3
local segmentation filter with and without DNH enabled. The algorithm was provided with
the value of sigma, as per all experiments so far. The DNH option provides an impressive improvement in RMSE for values of sigma up to 20, after which it does only slightly worse. When
sigma is large, DNH is more likely to be invoked due to the dominating noise, rather than an
inability to model the underlying image structure. It must be remembered that for the results
presented here, the true noise variance was supplied to the local segmentation algorithm.
In a real situation sigma would have to be estimated from the noisy image. This issue will be
discussed in Section 4.13.
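A sketch of the DNH rule of Equation 4.29 wrapped around the window-level filter sketched earlier; the function names are illustrative only.

import numpy as np

def denoise_window_dnh(pixels, sigma, C=3.0):
    """Apply the local model, but pass the window through unmodified if the best
    fitting model would move any pixel by more than C*sigma (Equation 4.29)."""
    estimate = denoise_window(pixels, sigma, C)     # from the earlier sketch
    if np.max(np.abs(estimate - pixels)) > C * sigma:
        return pixels.astype(np.float64).copy()     # do no harm
    return estimate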
Figure 4.42: RMSE for denoising montage with DNH disabled (default) and enabled.
Figure 4.43 shows the effect that DNH has on the worst case absolute error (WCAE). As
expected the WCAE performance is much better. Enabling DNH has the effect of limiting the worst case behaviour of the filter for each pixel. This is most useful in low noise
environments, because it can help to avoid smoothing out fine image structure.
Figure 4.44a shows the top left quadrant of montage with added Gaussian noise. Figures 4.44b
and 4.44c show, in white, the positions at which each local model was chosen when DNH is disabled.
Figure 4.43: WCAE for denoising montage with DNH disabled (default) and enabled.
Figure 4.45 shows that the proportion of windows using the k = 2 model falls from 36% to 25% with the introduction
of DNH. This occurs because binary thresholding sometimes produces poor segmentations,
and before DNH, k = 1 is a better overall choice.
Figure 4.44: DNH disabled: (a) original image; (b) k = 1, 64%; (c) k = 2, 36%.
The do no harm philosophy is not limited to local segmentation applications. It could be applied to any denoising algorithm to limit its worst case behaviour. All that is required is some
measure of the natural level of variation within segments around the pixel being processed.
Figure 4.45: DNH enabled: (a) k = 1, 60%; (b) k = 2, 25%; (c) DNH, 15%.
For example, the simple box filter of Section 3.4.3 could invoke the DNH rule when the local
average differs too much from any of the local pixel values. This filter would have two implicit local models: k = 1 and DNH. The result would be good smoothing in homogeneous regions,
and either blurring or no smoothing elsewhere.
Equation 4.29 is just one possible DNH rule. Instead of rejecting the local approximation if
just one pixel value would change too much, it could be relaxed to consider only the centre
pixel. Here DNH would be invoked less often because it is not testing the fitness of the whole
local segmentation model. For some windows the relaxed rule could improve denoising, but
for others, such as the one in Figure 4.41, it would do more damage than good.
Figure 4.46: Common windows: (a) 5; (b) 9; (c) 21; (d) 25; (e) 37 pixels.
A larger window must balance the possibility of larger individual errors against the
sum positive influence of many smaller errors due to better smoothing. The balance depends
on the features of the image being denoised.
Figure 4.47 gives results for denoising montage, first with DNH disabled. The general
trend suggests that smaller windows achieve better denoising with respect to RMSE. The 5
pixel window performs best at low noise levels, after which the next smallest (9 pixel) window takes over. This is an interesting result. For the 3x3
window, the 4 corner pixels are
roughly 41% further away from the centre pixel than the 4 side pixels. By argument of spatial coherence, the corner pixels are less likely to be related to the others. Segmenting fewer,
more related pixels means that k = 1 should be chosen more often. A small window is less
likely to cross segment boundaries than a larger one. The advantage of choosing k = 2 less
frequently is that homogeneous regions are more likely to be well modeled. For very noisy
images, k = 1 is chosen more often anyway, so a smaller window will give less smoothing.
This is borne out by the 5 pixel window's RMSE increase as sigma gets higher.
When DNH is enabled, the results change quite dramatically. Figure 4.48 shows how DNH
limits the worst case behaviour of all the windows tested. The RMSE performances are
now essentially equivalent, unlike those in Figure 4.47. It appears that the 3x3 window strikes a good balance between compact locality,
so that a two-segment model is sufficient, and having enough pixels for reasonable smoothing
in homogeneous regions.
Figure 4.47: Effect of different windows for denoising montage, DNH disabled.
When the window is centred on each pixel in turn, every pixel appears at each of the m possible positions
in the window. Figure 4.49 illustrates this graphically for the 3x3 window.
Thus, for each noisy pixel, there are up to m = 9 denoised estimates of its true value. These estimates
are not fully independent, as they share some pixel values in their derivation.
Figure 4.48: Effect of different windows for denoising montage, DNH enabled.
The calculation of the compound estimate is likely to require little
computational effort compared to performing local segmentation with a larger window. In
homogeneous areas, up to four times as many pixels will contribute to the calculation of the
mean, further reducing its standard error.
Let e1, ..., em be the overlapping denoised estimates for the same pixel. How should these
estimates be combined? A weighted average, shown in Equation 4.30, is the natural choice, where
w_j is the weight assigned to estimate e_j.

\hat{x} = \frac{\sum_{j=1}^{m} w_j \hat{e}_j}{\sum_{j=1}^{m} w_j} \qquad (4.30)
Window position
Each denoised estimate is derived from a different local region. As shown in Figure 4.49,
the same shaped window is used for each region, but the position of the pixel within the
window differs per estimate. For a 3x3 window, the estimate appears in 9 different positions:
at the four corners, at the four sides, and once at the centre. At first consideration one may
at the four corners, the four sides, and once at the centre. At first consideration one may
guess that the centre estimate should be given more weight than those from the sides and
corners. However, the local segmentation process treats all pixels democratically, and the
thresholding technique takes no spatial information into consideration. There appears to be
little justification for treating them unequally.
This assertion will be tested by comparing a range of different weights. The weights should
be chosen in a manner such that the four corners are treated equally, and likewise for the four
sides. A simple model meeting these symmetry requirements is shown in Figure 4.50. Each
weight is calculated by raising a constant, rho, to the power of its Manhattan distance from the centre of
the window. When rho = 0, the model
reverts to not producing a compound estimate at all. As rho approaches 1, the estimates are
treated more and more equally. Do not confuse this rho with the rho used in image processing to
measure the correlation between neighbouring pixels. Here the values being combined are
all estimates of the same pixel.
(a)  rho^2  rho   rho^2      (b)  0  0  0      (c)  1  2  1      (d)  1  1  1
     rho    1     rho             0  1  0           2  4  2           1  1  1
     rho^2  rho   rho^2           0  0  0           1  2  1           1  1  1

Figure 4.50: Weighting overlapping estimates: (a) any rho; (b) rho = 0; (c) rho = 0.5 (weights scaled by 4); (d) rho = 1.
Figure 4.51 compares four values of rho in terms of RMSE for denoising montage. To
observe the behaviour of overlapping averaging in the simplest possible environment, DNH
is disabled. The WCAE results for the same conditions are plotted in Figure 4.52. It
is seen that any non-zero value of rho improves both the RMSE and WCAE performance,
compared to not combining overlapping estimates at all. As argued earlier, the best results
are achieved when rho = 1, which gives equal weight to the overlapping estimates.
Figure 4.53 gives visual results for the square image after adding Gaussian noise to it.
The improved smoothing performance when rho = 1 is clearly visible. In both cases the edges
are reconstructed without any blurring.
Figure 4.51: RMSE for denoising montage using overlapping estimates weighted with rho = 0, 0.5, 0.9 and 1.
The standard error of each estimate could be used to measure the confidence in that estimate, and could contribute to its weight. A high standard
error corresponds to a low confidence, so the weight could be made inversely related to the standard error. Figure 4.54 compares this confidence weighting with equal weighting.
The weights were normalized for each pixel using Equation 4.30. Once again,
DNH is disabled to limit the number of factors influencing the result. Somewhat surprisingly,
confidence weighting performs worse than equal weighting at low noise levels, after which they
converge. In homogeneous areas of the image, both approaches should behave equivalently,
as the standard error is the same for every estimate. The variation must occur in heterogeneous areas of the image.
Figure 4.52: WCAE for denoising montage using overlapping estimates weighted with rho = 0, 0.5, 0.9 and 1.
Figure 4.53: Denoising square: (a) noisy original; (b) output when rho = 0; (c) output when rho = 1.
In the vicinity of an edge, non-centre pixels from neighbouring homogeneous windows will
receive more weight than centre pixels from heterogeneous windows closer to the edge.
These estimates may be unreliable, so equal weighting (averaging) could be the best way to
minimize the average error, and hence RMSE. It was found that enabling the DNH option did
not affect the pattern of performance observed in Figure 4.54.
Figure 4.54: RMSE comparison of confidence weighting and equal weighting of overlapping estimates for denoising montage.
The local activity could also contribute to the weight: high activity implies k = 2, and low
activity implies k = 1. To a large extent, the activity weighting
is already covered by the confidence weighting described previously, so it is unlikely to improve results. Numerical evidence to support this is given in Figure 4.55, where the activity
weighting provides no advantage over equal weighting.
Figure 4.55: RMSE comparison of activity weighting and equal weighting of overlapping estimates for denoising montage.
If the local model fits well, the within-segment sample variances would be expected, on
average, to be very similar. A high value could therefore be used to gauge the reliability of a segment
mean. But due to the small number of pixels in most clusters, a reliable estimate of the
sample segment variance is difficult to obtain. In the case of a single-pixel cluster, the sample variance is
not even defined. For these reasons it will not be investigated further.
Summary
Of the four influences considered for weighting overlapping estimates, equal weighting, or
simple averaging, was found to perform best in terms of RMSE. If overlapping is enabled,
the compound estimate, \hat{x}, will be computed using Equation 4.31, where e1, ..., em are the
estimates being combined.

\hat{x} = \frac{1}{m} \sum_{j=1}^{m} \hat{e}_j \qquad (4.31)

Using a linear combination for calculating the compound estimate is only one possibility. Non-linear functions, like the median, could also be used, but
further investigation of these possibilities is beyond the scope of this chapter.
When DNH is triggered, the filtered pixel values for the window are set equal to the original
pixel values; it is assumed that the best estimate of a pixel's true value is itself. In this
thesis, an estimate from a DNH model is included in the formation of the compound
estimate. An alternative would be to consider DNH estimates as unreliable, and assign them
zero weight. If the compound estimate could not be formed due to all estimates being unreliable, the original noisy pixel value could be used instead.
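A sketch of equal-weight combination of overlapping estimates (Equation 4.31) over a whole image, with DNH outputs included, using the denoise_window_dnh function sketched earlier; the sliding-window bookkeeping is an illustrative implementation choice, not the thesis' code.

import numpy as np

def denoise_image_overlap(noisy, sigma, C=3.0):
    """Denoise with a 3x3 window, averaging the overlapping estimates of each
    pixel (Equation 4.31). Pixels near the margins simply receive fewer estimates
    here, rather than being extrapolated as in Section 4.2.8."""
    h, w = noisy.shape
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = noisy[y-1:y+2, x-1:x+2].astype(np.float64).ravel()
            est = denoise_window_dnh(window, sigma, C)     # from the earlier sketch
            total[y-1:y+2, x-1:x+2] += est.reshape(3, 3)
            count[y-1:y+2, x-1:x+2] += 1
    out = noisy.astype(np.float64).copy()
    covered = count > 0
    out[covered] = total[covered] / count[covered]
    return out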
So far, only one or two segments have been assumed to exist within the 3x3 pixel window.
The obvious progression is to extend the thresholding method from Section 4.7 to work for
more than two clusters. Local segmentation can be considered to have two main components.
The first is the segmentation algorithm, which provides different candidate segmentations of
the local region. In our case, each segmentation differs only in the number of clusters, k, but
it would be possible to incorporate alternative segmentations. The second component is the
model selection technique, which decides which of the candidate segmentations is the most
appropriate. In our case, this chooses the appropriate value of k for the window.
The k-means algorithm assigns each pixel to the cluster with the nearest mean, producing k clusters.
The objective function of k-means is to minimize the sum of squared deviations of pixels
from their cluster means. Each iteration of the k-means algorithm is guaranteed to improve
the objective function, and the algorithm will always terminate. Unfortunately, the algorithm
will stop at the nearest local minimum, which is not necessarily the global optimum [GG91,
BB95]. The minimum reached depends wholly on the choice of initial cluster means [BF98].
Three techniques for choosing the initial means will be investigated in Section 4.12.3.
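A minimal 1-D k-means sketch for a window of pixel values, with the initial means chosen by rank as in Section 4.12.3; the initialisation formula, iteration cap and function name are illustrative assumptions, not the thesis' implementation.

import numpy as np

def kmeans_1d(pixels, k, max_iter=20):
    """Cluster scalar pixel values into k groups, returning the cluster means
    (in ascending order, since they start that way) and each pixel's label."""
    data = np.sort(pixels.astype(np.float64))
    # Initial means spread uniformly by rank.
    means = np.array([data[int((2*j - 1) * len(data) / (2*k))] for j in range(1, k + 1)])
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(max_iter):
        labels = np.argmin(np.abs(pixels[:, None] - means[None, :]), axis=1)
        new_means = np.array([pixels[labels == j].mean() if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels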
The largest number of clusters for which all the cluster means remain sufficiently
separated is deemed the optimal clustering. Equation 4.32 describes this formally, where k
is the optimal number of clusters and \hat{\mu}_j is the estimated mean for the j-th cluster.

k = \max \left\{ k' : \left| \hat{\mu}_i - \hat{\mu}_j \right| \geq T(\sigma, m_i, m_j) \;\; \text{for all clusters } i \neq j \right\} \qquad (4.32)
The k-means algorithm can easily be configured to generate cluster means in ascending numerical order, so only adjacent cluster means need to be tested for separation. Equation 4.33
shows a simplified version of Equation 4.32, where \hat{\mu}_{(j)} is the j-th smallest cluster mean.

k = \max \left\{ k' : \hat{\mu}_{(j+1)} - \hat{\mu}_{(j)} \geq T\!\left(\sigma, m_{(j)}, m_{(j+1)}\right) \;\; \text{for } j = 1 \ldots k' - 1 \right\} \qquad (4.33)
Measurement space
One approach is to spread the k initial means evenly throughout the measurement space.
Equation 4.34 describes this, where x'_min and x'_max are the lowest and highest pixel values
being clustered, and \mathring{\mu}_j is the initial mean for cluster j.

\mathring{\mu}_j = x'_{\min} + \frac{2j - 1}{2k} \left( x'_{\max} - x'_{\min} \right), \qquad j = 1 \ldots k \qquad (4.34)
This method assumes that the pixel values are uniformly distributed in intensity. If the pixel
value distribution is highly skewed, some clusters may never be used. This is known as the
empty cluster problem [HM01].
Rank space
An alternative assignment of initial means is given in Equation 4.35, where x'_{(j)} is the j-th
sorted pixel value. Instead of spreading the initial means uniformly by intensity, they are
chosen uniformly by rank, where a pixel's rank is its sorted position. For example, the pixels
x'_min and x'_max of Equation 4.34 are the same as x'_{(1)} and x'_{(m)} respectively.

\mathring{\mu}_j = x'_{(r_j)}, \qquad r_j = \left\lceil \frac{(2j - 1)\, m}{2k} \right\rceil, \qquad j = 1 \ldots k \qquad (4.35)
The rank technique improves the chance that the initial clusters will have at least one member
each, as the initial means are equal to existing pixel values. If the pixel data lacks variety,
adjacent clusters may still be assigned the same initial mean. This would result in one cluster
arbitrarily being chosen over another, leaving one empty after the first iteration.
A third option is to choose the initial means directly from the distinct pixel values present in
the window, as in Equation 4.36, so that every initial mean coincides with at least one pixel.

\mathring{\mu}_j = u_{(j)}, \;\text{the } j\text{-th smallest distinct pixel value}, \qquad j = 1 \ldots k \qquad (4.36)

If the number of distinct pixel values is less than k, it is impossible to cluster the data into k
distinct groups.
Figure 4.56: Comparison of three different methods for choosing the initial k means, in terms
of RMSE denoising performance on montage.
Choosing the number of clusters becomes more difficult as k increases, because there are more parameters to estimate and distributions to compare. Any heuristics
or computable bounds on the maximum number of clusters, k_max, are therefore useful
to local segmentation.
Block variance If the window variance, Var[x'], is less than the estimate of the noise variance, sigma^2, then it is not possible for k >= 2 to produce clusters far enough apart. In this
situation k_max = 1.
Maximum possible fit The range of the pixel values, x'_max - x'_min, also limits the maximum value of k achievable. Equation 4.37 calculates how many C sigma-wide normal
distributions could fit within the pixel range. If using the simple mean separation
criterion, the same value of C should be used for each window. If using the t-test separation criterion, the maximum possible effective value of C must be chosen to ensure
correctness in the extreme case.

k_{\max} = \left\lfloor \frac{x'_{\max} - x'_{\min}}{C\sigma} \right\rfloor + 1 \qquad (4.37)
Unique pixel values It is not possible to have more clusters than unique pixel values, because pixels with the same value are necessarily clustered together. If the pixels are
already sorted, as required by the initial means method of Equation 4.36, the number
of unique values may be obtained with a simple one-pass O(m) algorithm.
Limited by the user The user may wish to place an upper limit on k_max. They may have
prior expectations regarding the number of segments in each region, or simply wish
the implementation to run more quickly. This is particularly suited to large windows,
because one would not reasonably expect to find 49 clusters in a 7x7 window. A sketch
combining these bounds is given after this list.
Figure 4.57 compares binary and multi-class thresholding in terms of RMSE for denoising
montage. For the moment, both overlapping averaging and DNH are disabled. The RMSE
is much better for the multi-class model, especially at lower noise levels.
Figure 4.57: Comparison of binary and multi-class local segmentation for denoising montage. DNH is disabled and rho = 0.
Figure 4.58 provides results for when overlapping averaging is enabled. Although the RMSE
improves for both techniques, binary thresholding benefits the most. The gap between the
two methods is now only significant at the lowest noise levels. This suggests that averaging estimates from
9 possibly incorrect binary models may produce a compound estimate which better reflects
the underlying image than a single estimate alone.
In Section 4.9, do no harm was introduced as a way to limit the worst case performance
of a denoising filter, and was found to dramatically improve RMSE performance at lower
noise levels. Figure 4.59 compares the same binary and multi-class algorithms when DNH
is enabled, but overlapping averaging is disabled. Surprisingly, binary thresholding does
marginally better at low noise levels, while multi-class thresholding is better at high noise levels.
Figure 4.58: Comparison of binary and multi-class local segmentation for denoising montage. DNH is disabled and rho = 1.
When the noise level is low, intricate image structure is more easily preserved. The multi-class algorithm will do its best to choose a high value of k to fit this structure, but will
probably not capture all the detail. The binary model will also do its best, but with only two
clusters it is unlikely to do as well as the multi-class fit. Thus the binary method is forced
to invoke DNH, allowing fine detail to pass through unmodified. Retaining fine detail with a
small amount of noise is more beneficial than removing both noise and some image structure.
Figure 4.60 again compares the two methods when both DNH and overlapping averaging
are enabled. The RMSE metric is unable to distinguish the two methods to any significant
level. Although not shown, my experiments show this behaviour is duplicated when larger
windows, such as those of 21 and 25 pixels, are used instead.
These results show, at least for montage, that more complex multi-class modeling does
not improve overall RMSE results when DNH and overlapping averaging are utilized. This
behaviour could be exploited if the algorithm was implemented in hardware, or as embedded
Figure 4.59: Comparison of binary and multi-class local segmentation models on montage; DNH is enabled and overlapping averaging is disabled (RMSE versus added noise standard deviation σ).
software running on a system with relatively low processing and memory capabilities. The
results also suggest that a large proportion of local windows are well modeled by two or
fewer segments, or that DNH models complex regions better than a piece-wise constant
multiclass model can. When a large number of segments are present, there are too few pixels
with which to estimate the class parameters.
Consider a noisy version of montage. Figure 4.61 shows where each model was selected by the binary method; corresponding maps were produced for the multi-class method for values of k up to 8. The map for k = 9 is not included, as that model was only used twice for the whole image. Table 4.5 lists the frequencies of model usage, DNH included, for the binary and multi-class methods for the same image.
The distribution of k for the multi-class method is monotonically decreasing. It did not use large values of k very often in montage. Interestingly, k = 9 was chosen twice. This is equivalent to using DNH, because each pixel is given its own cluster. DNH is not used very often in the
Figure 4.60: Comparison of binary and multi-class local segmentation models on montage; DNH and overlapping averaging are both enabled (RMSE versus added noise standard deviation σ).
Figure 4.61: Binary local segmentation of noisy montage. White denotes: (a) DNH; (b) k = 1; (c) k = 2.
synthetic quadrants of montage, its invocation being more beneficial in the natural areas. When restricted to binary thresholding, the number of homogeneous regions remains the same. The 16% of windows previously diagnosed as k ≥ 3 need to be reclassified. It seems that 11% were allocated to DNH, while 5% were adequately handled by k = 2.
Model     Binary    Multi-class
DNH       15%       4%
k = 1     52%       52%
k = 2     33%       28%
k = 3     -         10%
k = 4     -         4%
k = 5     -         1.1%
k = 6     -         0.2%
k = 7     -         0.03%
k = 8     -         11 pixels
k = 9     -         2 pixels

Table 4.5: Binary and multi-class model usage for noisy montage.
image except that each pixel has been replaced by the unbiased standard deviation of its local neighbourhood, itself included. As expected, the standard deviation is lowest in homogeneous regions, and highest in textured and edge regions. It is possible to use the distribution of local standard deviations to estimate the global noise variance, σ².

Figure 4.64 shows a histogram of all local standard deviations in montage having values from 0 to 100. If montage were completely homogeneous, the histogram would be approximately normal with mean σ and a small variance. However, the existence of heterogeneous regions in montage places many high variance entries in the right hand tail of the histogram [RLU99]. Figure 4.65 plots a smoothed version of the histogram for values up to 20. The smoothing was performed using a sliding window average of width 3. The smoothed histogram has a clear peak at around 5, which is equal to σ, the standard deviation of the added noise.
Figure 4.64: Histogram of the local standard deviations of montage, values 0 to 100 (relative frequency versus standard deviation).
Thus the mode of the local standard deviation distribution could be used as an estimate for σ [BS85]. This method belongs to the second category of noise estimation algorithms.
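As a rough illustration (my own sketch, not the thesis procedure), the following computes each pixel's local 3×3 unbiased standard deviation, histograms the values, smooths with a width-3 sliding average, and returns the modal bin centre. The bin count and range are illustrative assumptions.

import numpy as np

def estimate_noise_sigma(image, bins=200, max_sigma=100.0):
    """Estimate the additive noise standard deviation as the mode of the
    distribution of local (3x3, unbiased) standard deviations."""
    img = image.astype(float)
    h, w = img.shape
    stds = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = img[y - 1:y + 2, x - 1:x + 2]
            stds.append(window.std(ddof=1))    # unbiased local standard deviation
    hist, edges = np.histogram(stds, bins=bins, range=(0.0, max_sigma))
    smoothed = np.convolve(hist, np.ones(3) / 3.0, mode="same")   # width-3 sliding average
    peak = np.argmax(smoothed)
    return 0.5 * (edges[peak] + edges[peak + 1])   # centre of the modal bin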
Figure 4.65: Smoothed histogram of the local standard deviations of montage, values 0 to 20 (relative frequency versus standard deviation).
A noise estimation technique from the first category is Immerkær's method [Imm96], designed specifically for the case of zero mean additive Gaussian noise. To remove image structure it applies the 3×3 linear filter, N, shown in Figure 4.66, to the noisy image. This filter is the weighted difference of two Laplacian [GW92] filters, L1 and L2, which estimate the second derivative of the image signal. The effect of N is to reduce constant, planar and quadratic facets to zero plus a linear combination of the noise. The effect of this filter on lenna is given in Figure 4.67.
    L1 =  0  1  0        L2 =  1  0  1        N = L2 − 2 L1 =  1 -2  1
          1 -4  1              0 -4  0                         -2  4 -2
          0  1  0              1  0  1                          1 -2  1

Figure 4.66: Filter masks used by Immerkær's noise variance estimation technique.
Once the image has been filtered to remove structure, the filtered pixel values can be used to compute the estimated noise variance, denoted σ̂². Equation 4.38 describes the calculation,
Figure 4.67: (a) original lenna image; (b) after Immerkær structure suppression, with mid grey representing zero.
where (f ⊛ N)(x, y) denotes the response of the filter N at position (x, y) of the X × Y image f. The sum does not use margin pixels because they do not have a full neighbourhood.

σ̂² = ( 1 / (36 (X − 2)(Y − 2)) ) Σ_{x,y} ( (f ⊛ N)(x, y) )²        (4.38)
By using the fact that, for a zero mean Gaussian random variable v, E[|v|] = σ√(2/π), Immerkær describes an alternative method for computing the estimated noise standard deviation. Rather than the sum of squared residuals, Equation 4.39 uses the scaled sum of absolute deviations. This formulation has two advantages: the summation requires no multiplications, and the absolute deviation is more robust to the presence of outliers [Hub81]. It would also be possible to use alternative robust methods, such as the median absolute deviation from the median [RL87], but these will not be investigated here.

σ̂ = √(π/2) ( 1 / (6 (X − 2)(Y − 2)) ) Σ_{x,y} | (f ⊛ N)(x, y) |        (4.39)
Figure 4.68 compares the three noise estimation techniques for estimating the standard deviation of synthetic noise added to montage. As hypothesized earlier, the two Immerkær methods consistently overestimate σ, because they cannot remove all the image structure. The mode method does very well, but slightly underestimates high values of σ.
Although montage is assumed noiseless, it was mentioned in Section 4.4 that the lower right quadrant does contain a small amount of noise, hence the robust Immerkær estimate may have some truth to it. If the noise level varies across an image, any global estimate will fall somewhere between the lowest and highest levels.
Figure 4.68: Noise estimation methods compared on montage (estimated versus actual noise standard deviation): actual, mode of local variances, Immerkær (standard), Immerkær (robust).
In real situations, the noise variance must be estimated from the noisy image without reference to any ground truth. Consider the familiar lenna image from Figure 4.18, which already contains noise. Let us assume that it is additive Gaussian noise with variance λ². If synthetic noise of variance σ² were added to lenna, the actual noise variance would be a combination of the original and the synthetic, namely σ² + λ². The noise estimation techniques can be assessed in the same manner as before on images already containing noise by correcting the actual standard deviation to include the pre-existing noise. Instead of comparing estimated noise levels to σ, the corrected curve √(σ² + λ²) is used.
Figure 4.69: Smoothed histogram of lenna's local standard deviations (relative frequency versus standard deviation).
A method for estimating the base noise level, λ, is needed. Figure 4.69 plots a smoothed histogram of lenna's local standard deviations. The mode of this histogram provides one estimate of λ, while the robust Immerkær method provides another. Figure 4.70 compares the noise estimation methods after using √(σ² + λ²) to correct
the curves. All the methods now mostly follow the ground truth. The standard Immerkær method slightly overestimates at the lower end, but this is somewhat due to its estimate not contributing to the base noise level estimate, λ. The mode method slightly underestimates at the higher end, just as it did for montage. At higher noise levels the assumption that all the heterogeneous variances exist in the far right of the histogram breaks down, causing the mode to be less accurate. It may be possible to estimate the mode by first fitting a spline to the data, and then analytically determining the peak. Overall, the deviations are minor, and any of the estimates would probably produce similar denoising results.
The specific noise estimation procedure used is only important with respect to the quality of
the variance estimates it produces. The local segmentation process does not require modification if the noise estimation technique is varied. The noise estimator is a separate module,
which may be replaced with a better technique if one becomes available.
Figure 4.70: Different noise estimation algorithms for lenna, corrected for existing noise.
Use of Lloyd's algorithm for threshold selection in local segmentation (Section 4.6)
Use of the 'do no harm' (DNH) option to limit worst case behaviour (Section 4.9)
Equal averaging of overlapping estimates, after DNH has been applied (Section 4.11)
This variant shall hereafter be referred to as FUELS: filtering using explicit local segmentation. FUELS will be compared to the four denoising algorithms described below, each of which was described in detail in Section 3.4. These algorithms were chosen because, like FUELS, they operate locally, are relatively efficient, and use similar image models.
WMED (Section 3.4.4)
WMED is the centre weighted median with the centre pixel included three times.
Although not as efficient as the mean in homogeneous regions, it can preserve fine
lines and some edges. WMED may be considered a minimum level of performance
that any structure preserving denoising algorithm should reach. It is probably better
known for its resistance to impulse noise.
GIWS (Section 3.4.5)
Gradient inverse weighting smoothing, centre pixel not included, will be used. GIWS
can be considered an accelerated single iteration of anisotropic diffusion.
SUSAN37 (Section 3.4.5)
The standard 37 pixel SUSAN filter is one of the best local denoising algorithms in the
literature, and has a very efficient implementation. It requires a brightness threshold, t, and Smith claims that a single fixed value works well over all image types. I found that setting t proportional to σ̂ gave better results, where σ̂ is the same estimated noise level that FUELS uses.
Figure 4.71 compares the algorithms in terms of RMSE when given the true value of σ. FUELS clearly outperforms the others at all noise levels. SUSAN37 also did well until the higher noise levels, after which point SUSAN9 took over. SUSAN9 performed badly for very low noise levels: it is unable to assimilate enough pixels, and switches to being a median filter. GIWS consistently tracks the SUSAN37 method, but 2 RMSE units higher.
Figure 4.71: RMSE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising montage, true σ supplied.
Figure 4.72 repeats the experiment, except that this time FUELS and SUSAN utilise the estimated noise variance. Immerkær's noise estimation method tends to overestimate σ for montage. This is due to the large number of strong step edges, which are somewhat atypical for photographic type data. The results once again show FUELS to perform best, although SUSAN is within one RMSE unit at most points. It is not clear why SUSAN37 does better at low noise levels, and SUSAN9 at high levels. I would expect a larger mask to provide more smoothing at higher noise levels than a smaller one.
Figure 4.72: RMSE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising montage, with FUELS and SUSAN using the estimated noise level.
The use of the estimated noise variance has narrowed the gap between those algorithms
which exploit its knowledge, namely FUELS and SUSAN, and those which do not, namely
GIWS and WMED. The poor performance of SUSAN9 at low noise levels has also gone.
This must be due to the estimated noise variance being higher than the true variance.
The resulting brightness threshold supplied to SUSAN is therefore higher, allowing it to
assimilate more pixels.
errors at segment boundaries, but SUSAN37 seems to have removed more noise within segments. This could be due to it using a 37 pixel window, compared to FUELS' 9 pixels. The bottom right quadrant contains a lot of fine detail, and overall FUELS appears to have done better than SUSAN37 there. The roof and shutters are much more obvious in the SUSAN37 filtered difference image.
Figure 4.73: (a) noisy part of montage; (b) FUELS enhanced difference image, RMSE=3.48; (c) SUSAN37 enhanced difference image, RMSE=4.11.
Figure 4.74 plots the WCAE for montage. Both WMED and GIWS have a near constant WCAE, suggesting there is a pixel pattern in montage which they both consistently get wrong. FUELS' WCAE seems directly proportional to σ. This behaviour is desirable because it indicates that its mistakes are due to the random nature of the noise, rather than difficulty preserving any particular image structure. Overlapping averaging and DNH are mostly responsible for this positive feature of FUELS.
Both SUSANs have poor worst case performance at lower noise levels. Figure 4.75a shows a small part of montage without added noise, for which SUSAN is given a brightness threshold based on the estimated noise level. Figures 4.75b-c show SUSAN37's output and the corresponding difference image. The two white spots in the difference image are large filtering errors, the worst of which is out by 150 intensity units. This behaviour can be explained by the fact that SUSAN switches into a special mode when all the neighbouring weights are too small. In this mode, SUSAN assumes the centre pixel is impulse noise, and instead uses the median of its eight immediate neighbours as its denoised estimate. The region of montage in Figure 4.75a is very busy, and the median obviously chose inappropriately.
Figure 4.74: WCAE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising montage.
Figure 4.75: (a) part of montage without added noise; (b) SUSAN37 output; (c) difference image.
algorithm will remove both the original and synthetic noise. As the added noise level, σ, increases, the original noise is swamped and hence can mostly be ignored. However, when σ is not much larger than the pre-existing level, the original noise will impact the RMSE results. In these cases visual inspection of the denoised and difference images would be necessary to fairly compare the quality of smoothing and structure preservation.
Figure 4.76 plots RMSE results for lenna. Although FUELS performs best at all noise levels, the two SUSAN variants are very close. In fact, FUELS, SUSAN and GIWS often perform within one RMSE unit of one another. Perhaps the common use of lenna as a test image has mildly biased algorithmic development toward those techniques which do well on lenna and other images with similar properties.
Figure 4.76: RMSE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising lenna.
Figure 4.77 shows the WCAE when denoising lenna. All except FUELS have a near-constant WCAE. Interestingly, SUSAN37 improves its WCAE as the added noise increases from none to low amounts. This unusual behaviour is once again likely to be due to its fall-back median filter. Although the monotonic increase of FUELS' WCAE is desirable, its largest errors occur in the feathers of lenna's hat, which are notoriously difficult to process. Modeling them as piece-wise constant segments is likely to be problematic, and invoking DNH in high noise environments will create errors proportional to the noise level.
Figure 4.77: WCAE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising lenna.
margin on the right hand side. This variation in features makes barb2 useful for examining the structure preservation capabilities of denoising algorithms.

Figure 4.78: (a) the barb2 image; (b) its histogram.
Figure 4.79 compares the denoising algorithms in terms of RMSE on barb2. Compared to lenna, the results are more diverse, but FUELS still does best for all values of σ. Another interesting fact is that SUSAN37 achieves lower RMSE results than SUSAN9 up until a moderate noise level, after which they swap. Intuition suggests to me that this should be the other way around, as a larger mask should allow better smoothing in the presence of more noise. The threshold chosen for SUSAN may not be appropriate for all image and noise level combinations.
When denoising a highly patterned image like barb2, one would expect most algorithms to produce some large errors. The WCAE graph in Figure 4.80 supports this hypothesis. SUSAN9 had particular trouble with the large number of edges in the image, again due to its fall-back median filter. FUELS has done very well at low noise levels, assisted by its DNH feature.
Figure 4.81 compares the denoised outputs of FUELS and SUSAN for a sub-image of barb2 without added noise. The estimated noise standard deviation was 4.2. The corresponding difference images are also included. These were enhanced by linearly stretching a narrow band of differences around zero to the full intensity range; larger differences were clamped. FUELS has left very little structure in the difference image. Its DNH option has obviously been used in the vest area, because most of the differences there are zero. SUSAN9's difference image exhibits little image structure too. The large differences occur in clumps, rather than spread evenly across the image. They are particularly noticeable on the vest. This could
Figure 4.79: RMSE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising barb2.
be due to the fall-back median filter, which is a poor choice for that type of image structure.
The face and wicker chair from barb2 are clearly noticeable in the SUSAN37 difference
image. Although SUSAN37 produces smaller differences on average than SUSAN9, they
are correlated with edges in barb2. This is probably due to the 37 pixel mask being more
likely to encompass segment boundaries, and hence blur them slightly.
A histogram of the three difference images is given in Figure 4.82. FUELS' errors all occur in a tight distribution around zero. SUSAN37 has a slightly wider band, and one or two outlier errors near 50. SUSAN9's predisposition to larger errors is clearly indicated by the fat tails of its distribution.
4.15 Conclusions
It has been shown that the principles of local segmentation can be used to develop effective
denoising algorithms. After many analyses, the FUELS algorithm for denoising greyscale
Figure 4.80: WCAE comparison of FUELS, GIWS, SUSAN37, SUSAN9 and WMED for denoising barb2.
images contaminated by additive noise was presented. FUELS has an efficient implementation, and only requires one parameter, the level of noise in the image. This can be supplied
by the user, or FUELS can determine it automatically. FUELS was shown to outperform
existing methods, like SUSAN and GIWS, for a variety of images and noise levels.
Both quantitative and qualitative methods were used to compare FUELS to other methods.
The RMSE was used to measure objectively the closeness of denoised images to the originals. FUELS consistently produced lower RMSE results than SUSAN, the next best performer. The WCAE was used to gauge the worst case performance of each algorithm. FUELS had the desirable attribute of having a WCAE proportional to the noise level in the
image. The others, SUSAN included, tended to have constant or erratic WCAEs. To assess
the structure preserving ability of each algorithm, difference images were used to highlight
those areas of the image in which larger errors occurred. Although structure was apparent in
all the difference images, FUELS tended to contain the least.
The FUELS algorithm has various attributes which are responsible for its good performance.
Figure 4.81: For barb2 with no added noise: (a)-(c) FUELS, SUSAN9 and SUSAN37 denoised output; (d)-(f) corresponding enhanced difference images.
Like SUSAN and GIWS, FUELS attempts to average only those pixels which, in some sense,
belong together. FUELS achieves this by explicitly segmenting the whole local region, insisting that each pixel belong wholly to one segment. This contrasts with SUSAN and GIWS,
which advocate a soft cut-off. A hard cut-off ensures that pixel values cannot diffuse across
segment boundaries to influence other, unrelated, pixels. It has the advantage of providing
a local approximation to the underlying image, which is arrived at democratically, because
each pixel contributes equally to the local segmentation. This is unlike SUSAN and GIWS,
which assimilate pixels based on their relationship to the centre pixel in the window only.
FUELS acknowledges the requirement for denoising algorithms to have magic parameters
to allow adaptation to different images. The most common type of parameter controls the
distinction between noise and structure. GIWS implicitly has one in the denominator of its weighting function, while SUSAN needs a brightness threshold to control
Figure 4.82: Partial histogram of FUELS and SUSAN denoising errors for barb2.
its pixel assimilation. It is not always clear how to choose these parameters appropriately.
The use of local segmentation naturally links the image model to the segmentation algorithm used. FUELS was developed under the assumption of piece-wise constant segments
corrupted by additive zero-mean Gaussian noise. This obviously suggests the noise standard
deviation as a measure of the natural level of pixel variation within segments. FUELS has
the advantage of automatically determining this parameter if it is unknown to the user.
Determining a local approximation to the underlying image at each pixel position has two advantages. Firstly, because each pixel participates in multiple overlapping local approximations, there exist multiple estimates of that pixel's true value. FUELS exploits this by averaging the overlapping estimates. This has the effect of increasing the effective window size by a factor of 2.8, without the overhead of segmenting more pixels. Secondly, the local approximation can be assessed, and rejected if necessary. Just because an algorithm has determined its best estimate for a pixel's value does not mean that estimate is of high quality. If any of the locally approximated pixel values differ too much from their original noisy
values, FUELS refuses to accept them. Instead, the local approximation is set equal to the
unfiltered pixel values. The acceptable intensity difference used is strongly related to the
estimated image noise level. This do no harm philosophy significantly improves results at
lower noise levels, and may be applied to any denoising algorithm, not just FUELS.
Examination of images in terms of local segmentation has led to a better understanding of image processing on a small scale, particularly for the commonly used 3×3 window configuration. At conversations and seminars, I have often heard the off-hand comment that only 10% to 20% of images are edges. Analysis of the distribution of k values chosen for images in this chapter suggests that only around 50-60% of pixels are locally homogeneous, 20-30% consist of two segments, and 10-20% tend to be difficult to model well. Perhaps the speakers were confusing edges with those pixels which are difficult to predict or model. The success of FUELS suggests that attempting to model these difficult blocks does not significantly improve denoising performance.
Pattern Recognition 1998 [ST98]. That paper outlined a structure preserving denoising algorithm which decided between k = 1 and k = 2, with k = 2 being chosen when the spread of the pixels in the neighbourhood was greater than a multiple of σ. The noise level σ was estimated as an average of the local variances of homogeneous regions, with homogeneity being declared if the Sobel edge strength [Sob70] was less than 16. The k = 2 model was thresholded using the dynamic mean [CC94], and not iteratively determined. The concept of overlapping averaging was also introduced, albeit only with equal weights.
Chapter 5
Information Theoretic Local
Segmentation
5.1 Introduction
In Chapter 4, the principle of local segmentation was introduced and applied to the problem
of image denoising. Initially, two candidate segmentations were considered for modeling the
local neighbourhood of each pixel being processed. The first candidate assumed the local
region was homogeneous. The second candidate was generated by thresholding the pixels
into two classes. A simple model selection criterion was then used to decide which of the
two candidates best modeled the underlying image. The selection criterion was based on
the level of noise in the image. It tried to choose a two-segment model only if the segment
means were separated enough. Later, the concept was extended to allow for more than two
candidate segmentations by using the k-means algorithm for multi-level thresholding, up to a maximum allowed number of segments.
Surprisingly, allowing the model selection process to consider models with more than two
segments did not improve results significantly. There could be two main reasons for this.
Firstly, it may be true that, on a small scale, one is unlikely to encounter junctions between
more than two segments. Secondly, the use of thresholding to group small amounts of data
into a relatively large number of clusters may be inappropriate. Thresholding relies solely
on pixel intensity for guiding segment membership. As the image noise level increases,
spatial information may be required for successful segmentation. This could be used to
better distinguish whether pixel variation is due to noise, or to underlying image structure.
Local segmentation is fundamentally concerned with choosing the best underlying local image model, from a set of candidate models. This is an example of the quite general problem of
inductive inference, or model selection. Over the last few decades Bayesian techniques have
proved to be the most reliable for model selection [RR98, BS94]. In this chapter, Wallace's
Minimum Message Length (MML) criterion [WB68, WF87, WD00] for model selection is
used to objectively compare different local segmentations. MML is an information theoretic criterion related to traditional Bayesianism, but extended to function better with models
containing continuous parameters. MML evaluates models using their message length. A
message is an efficient, unambiguous joint encoding of a model and the data. The model
associated with the message of shortest overall length is deemed the best model for the data.
Image denoising will again be used as a test bed for exploring the potential of an information
theoretic approach to local segmentation. It will be shown that the MML model selection
criterion leads to better RMSE performance, especially at higher noise levels, by removing the minimum contrast difference that FUELS requires. Instead of being treated as a post-processing step, the 'do no harm' heuristic will be shown to fit naturally into MML's information theoretic framework. A much larger set of candidate segmentations is considered, allowing spatial information to be exploited. With FUELS it is unclear how to compare two different candidate segmentations with the same number of segments, but the MML criterion makes this straightforward. The MML
denoising algorithm also learns all of its required parameters from the noisy image itself.
the quality of, and choose the best model from, a set of candidates? Consider the familiar problem of polynomial regression. The measurement data, D, consists of N (x, y) coordinate pairs. Equation 5.1 describes a polynomial model of order ω:

f(x) = a0 + a1 x + a2 x² + ... + aω x^ω        (5.1)
A particular model, θ, is fully determined by its polynomial order, ω, and its corresponding polynomial coefficients, a0 to aω. Because the coefficients are continuous variables, there is an infinite number of possible models, irrespective of the value of ω. Because it is impossible to evaluate an unbounded number of models, the model space must be reduced intelligently. The residuals, ri = yi − f(xi), are the differences between the regression model and the actual measurements. Typically, the residuals are assumed normally distributed, namely ri ~ N(0, σ²). For a given model order ω, a least squares algorithm [Nie94] can be used to estimate the optimal polynomial coefficients by minimizing the sum of squared residuals.
Of the fitted models shown in Figure 5.2, the linear and quadratic are difficult to judge by eye. The linear model has the virtue of being simpler, but the quadratic appears to fit the end points more closely. An objective criterion is required to choose the best model automatically.
The maximum likelihood (ML) approach chooses the model under which the data are most probable:

θ̂_ML = argmax_θ Pr(D | θ)        (5.2)
For the regression example, a model consists of the value of ω, and ω + 1 polynomial coefficients. The data consists of N y-coordinates in the form of residuals from the fitted model, namely ri = yi − f(xi). As mentioned in Section 5.2, the residuals are assumed normally
Figure 5.2: Example measurement data, D, with three candidate polynomial models: constant f(x) = a0, linear f(x) = b0 + b1 x, and quadratic f(x) = c0 + c1 x + c2 x².
distributed. The variance of this distribution is calculated from the residuals [Nie94], and is assumed true and known, just like the x coordinates. The likelihood then is equal to the joint probability of the residuals, given the model. Traditionally the residuals are assumed independent, hence the joint probability becomes a simple product, shown in Equation 5.3.

Pr(D | θ) = Pr(r1, r2, ..., rN | f(x), σ) = Π_{i=1}^{N} Pr(ri | f(xi), σ)        (5.3)
Which model would the ML approach choose from the three in Figure 5.2? The higher the polynomial order, the better the fit to the data, and the higher the likelihood. Thus ML would choose the most complex model of the three, the quadratic. Imagine a polynomial with the same number of coefficients as there were data points. This curve would pass through each and every point exactly, causing all residuals to vanish. In this situation the likelihood is unity, and ML would consider it the best fit for the data. This is despite the fact that the fitted polynomial would probably be contorted and non-intuitive, especially for values of x outside the range of the original data. Moreover, if polynomials of degree N or higher were allowed, infinitely many of them would fit the data perfectly. Maximum likelihood has a tendency to over-fit, as the complexity of the model is not taken into consideration.
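A small numerical illustration (my own sketch, not from the thesis) of this behaviour: fitting polynomials of increasing order to a handful of points always decreases the residual sum of squares, and a degree N − 1 polynomial interpolates the data exactly.

import numpy as np

# Illustrative data: 6 noisy samples of an underlying linear trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.8])

for order in range(0, 6):
    coeffs = np.polyfit(x, y, order)           # least squares fit of the given order
    residuals = y - np.polyval(coeffs, x)
    print(order, round(np.sum(residuals ** 2), 6))
# The printed residual sum of squares shrinks monotonically with the order,
# reaching (numerically) zero at order 5, i.e. one coefficient per data point.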
Probability and code length are interchangeable: maximizing Pr(D | θ) is equivalent to minimizing −log2 Pr(D | θ). The maximum likelihood estimate may therefore be written as

θ̂ = argmin_θ [ −log2 Pr(D | θ) ]        (5.4)

θ̂ = argmin_θ L(D | θ)        (5.5)

where L(D | θ) = −log2 Pr(D | θ) is the length, in bits, of an optimal encoding of the data given the model.
A penalized likelihood criterion adds to this a penalty term, P(θ, D), which may depend on features of the data and model, for example the amount of data, or the number of model parameters:

θ̂_PL = argmin_θ [ L(D | θ) + P(θ, D) ]        (5.6)
Over the years many penalized likelihood techniques have been developed for particular
domains [Kuh00]. The most enduring approach has been the Akaike Information Criterion, or AIC [Aka92, Boz83]. AIC is approximately equivalent to minimizing the expected
Kullback-Leibler distance [KL51, DDF 98], and has been applied to various problems such
as Markov chains and segmenting time series data. Penalized likelihood methods have been mostly superseded by other techniques, discussed later.
The local segmentation model selection criterion of Chapter 4 may be interpreted as a penalized likelihood technique. The k-means clustering algorithm attempts to find the maximum likelihood clustering for the pixel data, under the assumption of Gaussian clusters. The mean separation criterion may be considered the penalty term. When the means are not separated enough, the penalty term becomes infinite, and hence that model cannot be chosen.
5.2.4 Bayesianism
Equation 5.7 shows Bayes' formula for the probability of a model given the data, usually referred to as the posterior probability of the model. As Pr(D) is constant with respect to different models for the same data, it may be disregarded. The resulting posterior probability is proportional to the product of the model probability and the probability of the data with respect to that model, the latter being the familiar likelihood.
Pr(θ | D) = Pr(θ) Pr(D | θ) / Pr(D)  ∝  Pr(θ) Pr(D | θ)        (5.7)
The model probability, Pr(θ), is usually referred to as the prior probability of the model.
The prior defines a probability distribution over all possible models and their parameters, and
must be supplied by the user. It allows the incorporation of relevant background knowledge
before the data is observed. Ignorance can be expressed using a uniform prior, which gives
equal probability to all models. Thus the prior term may be ignored, and using Bayes rule
becomes equivalent to using the maximum-likelihood method of Section 5.2.1.
The resulting posterior is a probability distribution over all models. If any of the model parameters are continuous, the posterior will be a density, and any specific model will have zero
posterior probability attached to it [BO95]. Some Bayesians will insist that, for inference
problems, the posterior distribution is a sufficient final result; they believe it is unsound to
select a specific model from it [OB94]. However, some applications require either a single
best model, or at most a few likely ones.
Collapsing the posterior distribution results in a single best model, called a point estimate.
Common point estimates are the mode, mean, or median of the posterior distribution, each
corresponding to different loss functions [OB94]. The mode of the posterior is most commonly used, and is called the maximum a posteriori (MAP) estimate. The MAP estimate picks the model with the highest posterior probability, and is shown in Equation 5.8.
θ̂_MAP = argmax_θ Pr(θ) Pr(D | θ)        (5.8)
The main problem with the MAP estimate is that, although it picks the peak of the posterior
distribution, this peak may not have much probability mass associated with it. An alternate
lower peak may have much more posterior probability mass surrounding it, and could be a
better overall estimate. Figure 5.3 gives an example of this situation for the case of a model
with a single continuous parameter. When the model consists only of discrete parameters,
the MAP estimate does not suffer from the same difficulties, because neighbouring (discrete)
points in model space are not necessarily related.
Figure 5.3: The Bayesian MAP estimate may not always coincide with the largest peaked zone of probability mass. (Posterior probability versus model; the MAP estimate and an alternate estimate are marked.)
where N_θ is the number of model parameters, and N_D is the number of data items. For the regression example in Section 5.2, N_θ = ω + 1, the number of polynomial coefficients, and N_D = N, the number of data points.

θ̂_MDL = argmin_θ [ log2*(N_θ) + (N_θ / 2) log2 N_D + L(D | θ) ]        (5.9)
The log2* function is pronounced 'log star'. It estimates the length in bits for encoding an arbitrary positive integer, plus a normalization constant which ensures decodability. The first log2 term encodes the value of the integer. To decode the value, the decoder needs to know how many bits were used to encode it. This is provided by the double-log2 term. The triple-log2 term encodes the number of bits used for the double-log2 term, and so on. Eventually a point is reached where encoder and decoder have mutual understanding. The log star term is usually small compared to the overall code length, so is sometimes ignored when applying Equation 5.9 [BO95].
log2*(n) = c + log2(n) + log2(log2(n)) + log2(log2(log2(n))) + ...        (5.10)

where c is a small normalizing constant.
This MDL criterion may be interpreted as a penalized likelihood or Bayesian method, with the model order costing log2*(N_θ) bits, and each model parameter costing (1/2) log2 N_D bits.
The model parameters are transmitted more accurately only if justified by more data. MDL
has been developed over the years [Ris87, Ris00], but each formulation has one or more
of the following drawbacks: not being invariant under transformation of the data or model
parameters, poor performance with small amounts of data, an inability to specify useful prior
knowledge about the data, and a focus on selecting a model class rather than a particular
model [WF87, BO95]. In local segmentation, a model class would be k = 1 or k = 2, but
the model class does not specify any particular parameter values, such as which segment
each pixel belongs to, or the segment means.
given that model be constructed. The message having the shortest overall message length is
considered to contain the best model for the data. This is shown in Equation 5.11.
θ̂_MML = argmin_θ [ L(θ) + L(D | θ) ]        (5.11)
Each message must be constructed in such a way that it can be decoded by a receiver given
only the same prior knowledge. That is, the message must be a lossless encoding of the data.
If the data and model parameters are all discrete, MML is equivalent to the Bayesian MAP
estimate in Equation 5.8, due to the equivalence of probability and code length.
The fundamental difference to Bayes rule occurs when the model consists of one or more
continuous parameters. To transmit a continuous value with a finite number of bits, it must
be quantized. MML provides a framework to optimally quantize the prior density. The
quantization may be interpreted as modifying the shape of the prior density such that peaked
regions with little probability mass are flattened out, and high mass regions are somewhat
boosted [Far99]. The posterior mode in the MML situation may therefore be different to the
Bayesian MAP estimate when the models have continuous parameters.
Quantizing model parameters introduces two complications. Firstly, the decoder does not
usually know which quantization levels were used. Secondly, data usually needs to be encoded with respect to specific parameter values, not parameter ranges. The first issue is resolved by including some bits for encoding the quantization bin sizes, the so-called accuracy
of parameter values (AOPVs) [OH94]. The second problem is handled by computing an expected message length rather than an absolute message length. The expectation is computed
by averaging all the message lengths that would result from using all possible parameter
values within the quantization regions.
MML is essentially a Bayesian method at heart. By converting Bayes formula in Equation 5.7 to code lengths, the MML formula in Equation 5.11 is arrived at. MML and
Bayesianism both advocate the incorporation of prior beliefs via Pr(θ). This is unlike MDL,
which attempts to do away with priors altogether [BO95]. An inference produced by MML
has the advantage of being invariant under linear and non-linear transforms of the parameter
space. MML also works well for small amounts of data [BO95, WD00]. When the amount
of data is very large, the length of the second part of the message dominates, and MML
gracefully reverts to maximum likelihood, just as Bayes rule and MDL do.
Segmenting pixels into a fixed number of classes results in two sets of information: class labels denoting which class each pixel belongs to, and one or more parameters describing the properties of each class. Consider segmentation of the noisy pixels, x, in Figure 5.4. If a threshold of 60 is used, the first class has M1 = 6 members averaging μ1 = 22, and the second class has M2 = 3 members averaging μ2 = 84. If the classes are numbered from 1, then c contains the class labels for each pixel. The resulting local approximation, x̂, is computed by replacing each pixel with the mean of the segment it belongs to.
        20 22 84              1 1 2              22 22 84
    x = 24 21 81          c = 1 1 2         x̂ = 22 22 84
        23 22 87              1 1 2              22 22 84

    M1 = 6, μ1 = 22          M2 = 3, μ2 = 84

Figure 5.4: Thresholding the noisy pixels x at 60 produces the segment map c and the local approximation x̂.
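A small sketch (my own illustration) of this computation: threshold the window, average each class, and rebuild the approximation.

import numpy as np

x = np.array([[20, 22, 84],
              [24, 21, 81],
              [23, 22, 87]], dtype=float)

threshold = 60
c = np.where(x < threshold, 1, 2)           # segment map: class 1 below, class 2 above
means = {label: x[c == label].mean() for label in (1, 2)}
x_hat = np.vectorize(means.get)(c)          # replace each pixel by its segment mean

print(means)     # {1: 22.0, 2: 84.0}
print(x_hat)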
If K is the maximum number of segments allowed, then each label could take on a value from 1 to K. By imposing a canonical ordering on the elements, the segment map, c, can be treated as a 1-D vector, with elements indexed as c1 to cM. Figure 5.5 illustrates the use of a canonical raster ordering for labels within a segment map.
    c = [ c1 c2 c3 ; c4 c5 c6 ; c7 c8 c9 ],    ci ∈ {1, ..., K}

Figure 5.5: Raster order convention for labels within a segment map.
possible segment maps. Figure 5.6 shows all of them for a small window with K = 2; note that the bottom right hand pixel is always in the same dark segment. Because the labels are interchangeable, one degree of freedom is removed, so only a subset of them are distinct canonical segmentations. Listing 5.1 generates these canonical segmentations. The algorithm fixes the class label for the first pixel as 1. It then does a
Listing 5.1: Algorithm for generating all canonical K-class M-tuples (segment maps), given the global constants M (the number of pixels) and K (the maximum class label, i.e. the number of segments).
depth first traversal to determine the next label in c. The only labels considered at each step are those up to and including the largest label used so far, plus the next one along, but always restricted to the range [1, K]. This ensures that no duplicates are generated.
Valid segmentations may be obtained by generating all canonical ones with Listing 5.1, and removing those which fail a spatial-connectedness test. The algorithm for this test is relatively simple, and is not included in this discussion. Table 5.2 shows that as M and K increase, the proportion of canonical segmentations within the full set decreases, as does the proportion of valid segmentations within the canonical set.
Table 5.2: Canonical and spatially valid segmentation counts for windows of M = 4 and M = 9 pixels, with maximum segment counts K = 2, 3 and 4.
segment parameters. For the case of additive zero-mean Gaussian noise, the mean is optimal, but this could be modified for different types of noise. Figure 5.9 gives a sample block of pixels and two candidate segment maps: the homogeneous c1, and the heterogeneous c2.
        20 22 84              1 1 1              1 1 2
    x = 24 21 81         c1 = 1 1 1         c2 = 1 1 2
        23 22 87              1 1 1              1 1 2

Figure 5.9: A sample block of noisy pixels with a homogeneous candidate segment map, c1, and a heterogeneous one, c2.
The number of segments  The number of segments is equal to the number of different labels used in the segment map, c, and shall be referred to as k(c). Because c1 is homogeneous, k(c1) = 1. For the heterogeneous segment map, k(c2) = 2.

Segment populations  The population of a segment, Mj, is the number of pixels in the segment map having label j. Segment map c1 only has one population, so M1 = 9. For c2, M1 = 6 and M2 = 3.
Representative values for each segment  In this thesis, the noise is assumed to be additive zero-mean Gaussian, for which the arithmetic mean is an efficient estimator of the true population mean. Each segment has one mean, calculated using Equation 5.12. For c1, μ1 = 42.7, while for c2, μ1 = 22 and μ2 = 84.

μj = (1 / Mj) Σ_{i : ci = j} xi        (5.12)
Figure 5.10 shows the structure of the two-part message used to describe a local model. The first part of the message is the model, θ, which contains all the information required to construct a local approximation to the underlying pixel values. The segment map, c, states which segment each pixel belongs to. The segment map is followed by the means for each segment. The number of means depends solely on the number of unique segments in c. This value is denoted k(c), and may be derived from the segment map, c.
    Model: [ segment map c | μ1, μ2, ..., μk(c) ]        Data: [ pixels encoded relative to the model ]

Figure 5.10: Structure of the two-part message: the model part holds the segment map and the segment means; the data part encodes the noisy pixels with respect to that model.
Equation 5.13 states that the message length of the model part is equal to the sum of the
lengths of each component it contains. This is only true if each component of the message is assumed independent. The details of encoding each component depend on our prior
expectations for the pixel values. This will be discussed soon in Section 5.4.2.
L(θ) = L(c) + Σ_{j=1}^{k(c)} L(μj)        (5.13)
The second part of the message encodes the data with respect to the model preceding it. In our case the data consists of the noisy pixels x1 to xM. Pixels from a greyscale image are usually quantized to one of Z possible intensities, and the noise standard deviation, σ, is assumed constant across the image. Thus it is assumed that pixels within a segment are normally distributed around the segment mean. Equation 5.14 calculates the length of the data component (the negative log-likelihood) given the model parameters.
L(D | θ) = − Σ_{i=1}^{M} log2 Pr(xi | μ_{ci}, σ)        (5.14)

where Pr(xi | μ, σ) is the normal density with mean μ and standard deviation σ, evaluated at xi.
For the moment, let us assume that each segment map is equally likely. Equation 5.15 shows that c may therefore be encoded with (M − 1) log2 K bits, from which k(c), and the number of pixels belonging to each segment, are implicitly known.

L(c) = (M − 1) log2 K        (5.15)
How many bits should be used for each segment mean? An optimal number will be discussed in Section 5.7. For the moment a useful approximation is considered. We already know that a pixel value can be encoded with at most log2 Z bits, and the denoised pixel values are usually rounded back to an integer at the final stage of processing. Thus a reasonable number of bits for a segment mean could also be log2 Z. Equation 5.16 gives the overall length of the model component of the message under this assumption. Note that because k(c) is derivable from the segment map, it does not need to be encoded separately.

L(θ) = (M − 1) log2 K + k(c) log2 Z        (5.16)
        15 14 15
    x = 13 17 12         σ = 1.5 (assumed)
        16 11 10

Figure 5.11: A noisy block of pixels to be modelled.
The noise standard deviation, σ, is assumed to be 1.5. Two candidate models will be considered, coinciding with those which would have been generated by FUELS.
Figure 5.12 shows the first candidate model. It is the simplest one possible: a homogeneous segment map and a single mean. This model is 16 bits long: 8 bits for the segment map and 8 bits for the average pixel value.
        1 1 1
    c = 1 1 1         μ1 = 13.7
        1 1 1

    L(θ) = 8 + 8 = 16 bits
    L(D | θ) = 24.93 bits
    L(θ, D) = 40.93 bits

Figure 5.12: The first candidate model: a homogeneous segment map with a single mean.
Figure 5.13 shows the second candidate model. It is the result of iteratively thresholding the noisy pixel values, just as FUELS does. The resulting heterogeneous segment map fits the pixels well. Its model length is 8 bits longer than the first candidate's, as two means must be encoded. However the data encoding length is much shorter. This is due to the pixels being closer in value to their segment means.
        1 1 1
    c = 1 1 2         μ1 = 15,  μ2 = 11
        1 2 2

    L(θ) = 8 + 2 × 8 = 24 bits
    L(D | θ) = 14.23 bits
    L(θ, D) = 38.23 bits

Figure 5.13: The second candidate model: a heterogeneous segment map obtained by thresholding, with two means.
Under the MML criterion, two candidate models are compared using their overall two-part
message lengths. Figure 5.14 illustrates the comparison using the two example models just
described. Candidate 2 has the shorter overall message length, and therefore Figure 5.13 is
the preferred model for the noisy pixels in Figure 5.11.
Figure 5.14: Comparison of the two candidate models by overall message length. Candidate 1: model 16 bits, data 24.93 bits; Candidate 2: model 24 bits, data 14.23 bits.

Using FUELS' model selection criterion, Candidate 2 would be rejected because its two segment means are not separated enough relative to a fixed minimum contrast difference. In this example, MML made the better inference. It should be noted that the t-test threshold from Equation 4.27, not used by FUELS, is equal to 3.71 for this example. In this case the t-test and MML would concur.
Equation 5.17 shows how to compute the posterior probability of a model, P(θ | D), from message lengths. The denominator normalizes the probabilities over the set of models considered.

P(θ | D) = 2^(−L(θ, D)) / Σ_{θ'} 2^(−L(θ', D))        (5.17)
For the example in Section 5.4.3, Candidates 1 and 2 have posterior probabilities 0.13 and 0.87 respectively. Here the best model is over six times as likely as its nearest competitor. This is good evidence for the existence of two local segments, but not compelling evidence. These posterior probabilities may be interpreted as saying that, in 13% of cases like this, the less likely candidate may actually be the correct one.

A corollary of Equation 5.17 is that the difference in message length of any two models may be used to measure the relative posterior probability of the two models. For example, Candidate 2 had a message length 2.7 bits shorter than Candidate 1, making it about 2^2.7 ≈ 6.5 times as likely a posteriori.
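A tiny sketch (my own) of Equation 5.17, using the two candidates' message lengths:

import numpy as np

lengths = np.array([40.93, 38.23])              # total two-part message lengths in bits
weights = 2.0 ** (-(lengths - lengths.min()))   # subtract the minimum for numerical safety
posterior = weights / weights.sum()
print(posterior)                                # approximately [0.13, 0.87]
print(2.0 ** (lengths[0] - lengths[1]))         # Candidate 2 is about 6.5 times as likely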
Figure 5.15: RMSE comparison of FUELS and Pseudo-MML for denoising montage, true σ supplied.
The results show FUELS to outperform Pseudo-MML at all noise levels, with the discrepancy increasing along with the noise level. Figure 5.16 shows where the two algorithms differed in their model selections. Pseudo-MML is much more likely to choose k = 2 than FUELS. One advantage of this is that it was able to identify the low contrast oval edge around the 'h'. However, it also chose k = 2 in obviously homogeneous regions.
Figure 5.16: (a) noisy middle section of montage; (b) FUELS model selection; (c) Pseudo-MML model selection. Black denotes k = 1 and white k = 2.
These spurious choices contribute to Pseudo-MML's poorer RMSE results compared to FUELS. Although it only considers two models, Pseudo-MML uses the same number of bits for encoding any segment map encountered. This corresponds to a uniform prior probability distribution for c.
The only way to modify Pseudo-MML's behaviour is to change the prior. The analysis of
FUELS in Chapter 4 showed that, for typical images, around 50% of regions could be
considered homogeneous. Thus it would be more realistic to assign a higher prior probability
to the single homogeneous segment map, and share the remaining probability equally among
the remaining heterogeneous segment maps. This arrangement is shown in Equation 5.18.
Note that 0.5 is an arbitrary choice, and its optimization will be considered in later sections.
Pr(c) = 0.5                          if k(c) = 1
Pr(c) = 0.5 / (K^(M−1) − 1)          if k(c) ≥ 2        (5.18)
Equation 5.19 gives a new expression for the overall message length using this prior. In terms of encoding c, the new prior corresponds to using 1 bit for homogeneous cases, and nearly 9 bits for each heterogeneous case. Although not implemented as such, this could be interpreted as a two step process. First, a single bit states whether k = 1 or k ≥ 2. If k = 1, nothing else needs to be sent. If k ≥ 2, another log2(K^(M−1) − 1) bits identify which heterogeneous segment map is being used.

L(θ, D) = −log2 Pr(c) + k(c) log2 Z + L(D | θ)        (5.19)
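A small sketch (my own) of the segment map code length implied by the reconstructed Equation 5.18, for a 3×3 window:

import math

def segment_map_bits(k, M=9, K=2, p_homogeneous=0.5):
    """Code length in bits for a canonical segment map with k segments (Equation 5.18)."""
    if k == 1:
        return -math.log2(p_homogeneous)                  # 1 bit when p = 0.5
    heterogeneous_maps = K ** (M - 1) - 1                 # all canonical maps bar the flat one
    return -math.log2((1.0 - p_homogeneous) / heterogeneous_maps)

print(segment_map_bits(1))    # 1.0
print(segment_map_bits(2))    # about 8.99 bits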
5.6.1 Results
Figure 5.17 is the same as Figure 5.15 except that there is an extra entry for Pseudo-MML using the non-uniform prior of Equation 5.18. The more suitable prior has resulted in Pseudo-MML's performance reaching that of FUELS.

Figure 5.18 shows where the three algorithms graphed in Figure 5.17 differed in their model selections for the middle section of montage. The non-uniform prior improved Pseudo-MML's model order selections, because most of the spurious k = 2 choices in homogeneous regions were eliminated. It was still able to discern most of the low contrast oval edge around the 'h', along with the horizontal boundary below the oval. This example illustrates the importance of choosing a suitable prior. It is especially important when the window is small, because the model part of the message contributes significantly to the overall message length.
Figure 5.17: Non-uniform prior: RMSE comparison for denoising montage, true σ supplied.
Figure 5.18: Model selection comparison, black denotes k = 1 and white k = 2: (a) FUELS; (b) Pseudo-MML; (c) Pseudo-MML with a non-uniform prior.
So far, a fixed number of bits has been used for encoding a segment mean. If the segment means were encoded using fewer bits, the model part of the two-part message would be shorter. Correspondingly, the data part should expand, as the segment means are less precise. There should exist a quantization level for each segment mean which optimally trades off the length decrease of the model part against the length increase of the data part.
AOPV(μj) = σ √(12 / Mj)        (5.20)
As the variance increases, the accuracy with which we are willing to encode the mean decreases. This is because the distribution has a wider spread, so its exact location is not as important. As the amount of data increases, a higher accuracy is warranted. Equation 5.21 describes the number of bits required to encode an optimally quantized segment mean, where Z is the intensity range of the image and σ is the standard deviation of the noise.
L(μj) = log2( Z / AOPV(μj) ) = log2( Z √Mj / (σ √12) )        (5.21)
A segment mean encoded with this number of bits is still decodable. Both σ and Z are assumed to be known a priori, and the segment populations are available from the segment map, which has already been encoded. Figure 5.20 compares the number of bits needed to encode a segment mean using the two methods described. The original method uses exactly 8 bits, regardless of the noise level. The optimal quantization method uses much fewer bits, and the function grows slowly.
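An illustrative sketch (my own, under the reconstruction of Equation 5.21 above) comparing the fixed 8-bit cost with the optimally quantized cost:

import math

def quantized_mean_bits(sigma, M, Z=256):
    """Bits for a segment mean quantized to width sigma*sqrt(12/M) (Equation 5.21)."""
    aopv = sigma * math.sqrt(12.0 / M)
    return math.log2(Z / aopv)

for sigma in (1, 5, 10, 20):
    print(sigma, round(quantized_mean_bits(sigma, M=9), 2), "bits (fixed method: 8 bits)")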
Quantization of segment means causes one difficulty. Each pixel is no longer encoded with
respect to a fully accurate segment mean. Rather, a quantized range of possible segment
Figure 5.20: Different encodings for a segment mean: fixed versus optimally quantized.
means has been communicated. Which mean from the range should be used? Rather than
taking an arbitrary point estimate, such as the midpoint, MML prefers to compute the expected message length. That is, the average of all message lengths that would result if every
possible point estimate in the quantization range was tested. It is often assumed that each
value in the quantization range is equally likely, corresponding to a locally flat prior density
in the range. A convenient closed form approximation to the expected message length can
sometimes be derived [WF87].
For the purpose of this thesis, a point estimate equal to the sample mean will be used. This
simplifies the computation and reduces the running time, hopefully without causing too much
variation in message length. The resulting expression for the approximate expected message
length is in Equation 5.22.
r)
9#
hv/ %E2B
{
c0d;)feg/ u
.
-104
F
{
_
&
u()fh
%
(5.22)
5.7.1 Results
Figure 5.21 provides RMSE results for denoising montage. The algorithm using quantized
means is referred to as MML to distinguish it from the earlier Pseudo-MML. Both algorithms use the non-uniform prior of Equation 5.18 to encode the segment map. DNH and overlapping averaging are still disabled, and the true value of σ is provided.
Figure 5.21: RMSE comparison of FUELS, Pseudo-MML and MML (quantized means) for denoising montage, true σ supplied.
The RMSE performance of the three algorithms is mostly indistinguishable. FUELS does marginally better at low noise levels, while MML does slightly better when the image is very noisy. Figure 5.22 compares the value of k that Pseudo-MML and MML chose for each pixel of a noisy montage. Their model selections are very similar, so Figure 5.22c highlights the differences. Mid-grey indicates that both algorithms chose the same value of k, white that MML chose a higher order model, and black that MML chose a lower order model.
Interestingly, if there was a difference, it was always that MML chose k = 2 compared to Pseudo-MML's k = 1. The quantization of segment means has reduced the cost of k = 2
Figure 5.22: Model order selection, black denotes k = 1, white k = 2: (a) Pseudo-MML; (b) MML; (c) difference: white shows where MML chose k = 2 over Pseudo-MML's k = 1.
models enough for them to be selected more often. In fact, the message lengths calculated
by the MML denoising algorithm are always shorter than those of Pseudo-MML. For a given
segmentation, both algorithms use the same number of bits for the segment map, but MML
uses fewer bits for the means. Because both use the true mean as the point estimate, the pixels
are encoded in the same number of bits too. Thus MML's messages are always shorter.
The blended local approximation is a weighted combination of the approximations produced by every model considered. The weight given to each model is exactly equal to its posterior probability. Equation 5.23 describes its calculation, where P(θ | D) is the posterior probability of model θ, and x̂_θ is the local approximation associated with model θ.

x̂_blend = Σ_θ P(θ | D) x̂_θ        (5.23)
The blended local approximation may alternatively be considered those pixel values inferred by a composite model, θ_blend. This is shown in Equation 5.24. Combining the predictions from various 'experts' is a popular technique in computer science, particularly in image compression [STM97, MT97].
x̂_blend = x̂(θ_blend),    θ_blend = Σ_θ P(θ | D) θ        (5.24)
Recall the two example candidate models from Section 5.4.3. Figure 5.23 applies Equation 5.23 to blend the local approximations associated with the two candidates. In this case, rounding back to integers would cause the single best local approximation to equal the posterior blended one. However, the denoised pixels could be retained at a higher accuracy if further processing were to be done.
Figure 5.23: Posterior blending of the two example models from Section 5.4.3, using weights 0.13 and 0.87.
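A minimal sketch (my own) of Equation 5.23 for the two example candidates, using their approximations and posterior weights as reconstructed above:

import numpy as np

x_hat_1 = np.full((3, 3), 13.7)                  # homogeneous candidate's approximation
x_hat_2 = np.array([[15.0, 15.0, 15.0],          # heterogeneous candidate's approximation
                    [15.0, 15.0, 11.0],
                    [15.0, 11.0, 11.0]])
posterior = np.array([0.13, 0.87])

x_blend = posterior[0] * x_hat_1 + posterior[1] * x_hat_2
print(np.round(x_blend, 2))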
5.8.1 Results
Figure 5.24 compares the RMSE performance of the posterior blended version of MML to the MML algorithm which simply uses the most probable model. Posterior blending generally improves the RMSE performance of the MML algorithm. It may also be possible to blend models in the FUELS algorithm, but a metric would have to be invented to quantify the suitability of each candidate model, whereas this metric comes naturally for MML.
Figure 5.24: RMSE comparison for denoising montage, with true σ supplied.
The ability to blend over multiple models is very useful when there is no clear mode (peak) in
the posterior distribution. In this case, the posterior mean has been used. This is related to the
use of a squared error loss function [OB94], which also happens to be what the RMSE metric
uses. Combining multiple local approximations together tends to produce a better overall
local approximation. In the case where one model has nearly all the posterior probability
associated with it, the algorithm behaves just like the non-blended version.
approximation is rejected, and the noisy pixels are passed through unmodified. It is a simple matter to incorporate this post-processing into the posterior blended pixel values produced by the MML denoising algorithm.

The results so far have shown FUELS and MML to perform similarly. Figure 5.25 compares their RMSE performance when both utilise FUELS-style DNH. As expected, their performances are much better at lower noise levels when DNH is enabled. At higher noise levels MML consistently beats FUELS, but the gap is not significant. Although not shown, their WCAEs are identical at all points.
Figure 5.25: RMSE comparison of FUELS and MML, both using FUELS-style DNH, for denoising montage.
When DNH is invoked by FUELS, the noisy pixel values are passed through unaltered. Under MML, this corresponds to encoding pixels as they are, without respect to any particular model. A raw encoding of M pixels requires M log2 Z bits. For example, a raw encoding of a 3×3 window of 8 bpp greyscale pixels would need 72 bits.
The raw encoding may be interpreted as a null model. The null model could be placed into
the existing pool of segment-based candidate models, and be judged alongside them. This
is similar to the t-test method in Section 4.7.2, where a null and alternative hypothesis are
compared. The null model could actually have the shortest two-part message length, causing
it to be selected as the best model. This is like FUELS where DNH is invoked when all
other models appear unreliable. When posterior blending is used, the null model becomes
part of the blend, with its local approximation being equal to the original pixel values.
For this idea to work correctly, the null model and the standard segment-based models must
be compared fairly. Each model needs to be prefixed by a binary event stating whether a
standard, or null, model is to follow. Figure 5.26 illustrates this arrangement, where DNH
denotes the use of the null model.
    (a)  [ DNH flag | raw pixel data ]        (b)  [ Not-DNH flag | model θ | data D given θ ]
Figure 5.26: Incorporating DNH: (a) null model encodes the data as is; (b) standard model.
Let Pr(DNH) denote the prior probability associated with the prefix for the null model. The null model's prefix then has length −log2 Pr(DNH) bits, while the prefix for a standard, segment-based model is only −log2(1 − Pr(DNH)) bits, because, according to the prior, standard models will be used more often.
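The following sketch (hypothetical function names; the prior used in the example comment is an arbitrary illustration, not a value taken from this chapter) computes the two prefix lengths and the null model's total message length:

import math

def prefix_lengths(pr_dnh):
    """Length in bits of the binary prefix for the null (DNH) model and
    for a standard segment-based model."""
    return -math.log2(pr_dnh), -math.log2(1.0 - pr_dnh)

def null_model_length(pr_dnh, num_pixels=9, bits_per_pixel=8):
    """Total message length of the null model: its prefix plus a raw
    encoding of the window's pixels."""
    dnh_prefix, _ = prefix_lengths(pr_dnh)
    return dnh_prefix + num_pixels * bits_per_pixel

# Example: prefix_lengths(0.1) gives roughly (3.32, 0.15) bits, and
# null_model_length(0.1) gives roughly 75.3 bits for a 3x3, 8 bpp window.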
Figure 5.27 plots the proportion of pixels for which FUELS-style DNH is invoked when denoising montage. The two curves are effectively plotting suitable values of Pr(DNH) to use at different noise levels. Both curves have a hyperbolic shape, approximately following Equation 5.25. When σ = 0, the prefix has zero length for the null model and infinite length for any standard model. If this formulation were used, DNH would be invoked for every pixel
Figure 5.27: FUELS-style DNH usage when denoising montage, with the true σ supplied.
when σ = 0. This makes sense, because if the noise level is zero, then all pixels are noiseless and should remain unaltered.
    Pr(DNH) = 1 / (1 + σ)        (5.25)
Figure 5.28 compares three denoising algorithms: FUELS using its own DNH, MML using FUELS-style DNH, and MML using the new information based DNH with Pr(DNH) = 1/(1 + σ). At higher noise levels, the information based variant matches or betters the others in terms of RMSE when denoising montage. Although not shown, similar behaviour was observed for other images.
The worst case absolute error (WCAE) results are provided in Figure 5.29. FUELS and MML with FUELS-style DNH have the same WCAE for all noise levels tested. At higher noise levels, the information-based DNH approach consistently has a lower WCAE. This is interesting, because the MML DNH approach makes no explicit attempt to restrict the maximum
[Figure 5.28: plot of RMSE against noise level σ for the three DNH variants.]
allowed change in pixel intensity during the denoising step. Of course, large alterations in intensity are unlikely to fit the Gaussian segment models well, causing those models to have low posterior probability.

FUELS-style DNH only ensures that pixels do not change by more than a fixed multiple of σ relative to the noisy image. However, the WCAE is taken between the denoised image and the ground truth image. Consider a noiseless image corrupted by additive noise with standard deviation σ. There will always be a small fraction of pixels whose values change by more than that multiple of σ; in the worst case, a pixel could change by nearly the full intensity range. DNH operates relative to the noisy image, so it is still possible for the WCAE to be very large.
Figure 5.30 plots the actual relative frequencies of DNH usage for the three algorithms tested. MML invokes DNH less often when it uses the information based DNH than it does with FUELS-style DNH. There is a trade-off between invoking DNH too often, whereby no denoising occurs, and not invoking it often enough, whereby there is a risk of producing a poor local approximation which detrimentally affects the RMSE. It seems that the information based DNH is
[Figure 5.29: plot of WCAE against noise level σ.]
better at making this trade-off. Although it was chosen in a sensible manner, the fixed multiple of σ used as the FUELS-style DNH threshold may not be optimal; a higher value could improve the FUELS results.

In Section 4.11 the idea of combining estimates from overlapping windows was introduced. For FUELS, various schemes for linearly combining estimates were tested, and it was found that equal weighting gave the best RMSE results. This same concept may be applied
Figure 5.30: How often DNH is invoked for montage, with the true σ supplied.
to MML. As expected, overlapping averaging with equal weighting significantly improves the RMSE results for both FUELS and MML at all noise levels, and with it enabled there is little to no difference between the two. It appears FUELS is able to benefit more from overlapping averaging than MML. This may be because MML already gains a similar advantage through its posterior blending of models at the same pixel position.
[Plot: RMSE against noise level σ.]
In Equation 5.26, the f_i are the denoised estimates for the same pixel position from each of the overlapping models, f̂ is the final denoised value, and the w_i are the weights.

    f̂ = Σ_i w_i f_i / Σ_i w_i        (5.26)
For FUELS, various schemes for choosing the weights were assessed. Each scheme tried to
use some attribute of each model to modify the weights. The attributes attempted to capture
the quality, or goodness of fit, of a model.
189
The first scheme weights each estimate by the posterior probability of the model and data from which the estimate originally came, using their full two-part message length. This gives more weight to estimates which came from models which better explained their local region. A second scheme, in Equation 5.28, replaces the full data part with only the centre pixel's contribution to it. Writing L(m_i) for the model part, L(D|m_i) for the data part, and L(y|m_i) for the centre pixel's contribution, the weights are

    w_i ∝ 2^(−L(m_i) − L(D|m_i))        (5.27)

    w_i ∝ 2^(−L(m_i) − L(y|m_i))        (5.28)
Pixel posterior
One final potential weighting scheme is given in Equation 5.29. The weight is proportional to the posterior probability of the pixel alone. This attempts to measure how well the noisy pixel fitted the model it came from; that model was either the MAP or the posterior blended model for the window the pixel originated from. This weight is expected to behave much like one based on the squared error between the noisy pixel and its estimate, because a pixel's message length is based on a Gaussian distribution.

    w_i ∝ 2^(−L(y|m_i))        (5.29)
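A sketch of how these weights could be computed and applied (illustrative Python; the arrays of L(m_i), L(D|m_i) and L(y|m_i) values, in bits, are assumed to be supplied by the denoising algorithm):

import numpy as np

def overlap_weights(model_bits, data_bits, pixel_bits, scheme="model+data"):
    """Normalised weights for the overlapping estimates, following
    Equations 5.27 to 5.29."""
    if scheme == "model+data":
        L = np.asarray(model_bits) + np.asarray(data_bits)    # Equation 5.27
    elif scheme == "model+pixel":
        L = np.asarray(model_bits) + np.asarray(pixel_bits)   # Equation 5.28
    else:
        L = np.asarray(pixel_bits)                            # Equation 5.29
    w = np.exp2(-(L - L.min()))
    return w / w.sum()

def combine(estimates, weights):
    """Equation 5.26: weighted average of the overlapping estimates."""
    return float(np.dot(weights, estimates))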
Results
Figure 5.32 compares the RMSE performance of four weighting schemes for montage: equal weighting (w_i = 1), and the three probabilistic ones just described. None of the variable weighting schemes perform better than equal weighting for any value of σ. The same result was found for the lenna image. For these reasons, schemes other than equal weighting will not be explored further.
[Figure 5.32: RMSE against noise level σ for the Model + Data, Model + Pixel, Pixel and Equal (w_i = 1) weighting schemes.]
The analyses performed so far in this chapter have assumed that the true value of σ, the standard deviation of the synthetic noise, is known. In real situations the noise is not synthetic and its level is unknown; it typically has to be estimated from the data. In Section 4.13, the FUELS algorithm was adapted to use the robust Immerkær noise variance estimation algorithm [Imm96]. MML could use this estimate too.
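For reference, a minimal sketch of the Immerkær estimator as it is commonly described (this is not necessarily the thesis' implementation):

import numpy as np

def estimate_noise_sigma(image):
    """Estimate the additive noise standard deviation of a greyscale image
    with Immerkaer's fast method: apply a 3x3 Laplacian difference mask
    and average the absolute responses over the image interior."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    mask = np.array([[ 1, -2,  1],
                     [-2,  4, -2],
                     [ 1, -2,  1]], dtype=float)
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):                      # 'valid' convolution via shifted slices
        for dx in range(3):
            acc += mask[dy, dx] * img[dy:h - 2 + dy, dx:w - 2 + dx]
    return np.sqrt(np.pi / 2.0) * np.abs(acc).sum() / (6.0 * (h - 2) * (w - 2))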
Figure 5.33 compares the FUELS and MML algorithms when both use Immerkær's estimated value for σ, rather than the true value. The main variation occurs for low σ, where the noise level is over-estimated. This causes DNH to be invoked less often, and the RMSE results to be higher. Either way, there is very little difference between FUELS and MML.
[Figure 5.33: RMSE against noise level σ for FUELS and MML, with σ estimated.]
Figure 5.34: Evolution of the average two-part message length for the MML denoising algorithm, with σ estimated from the noisy montage image.
Immerkær's method overestimates σ when it is very low. This could increase the relative cost of two segment models, causing fine structure to be corrupted. Usually this structure is swamped by noise, but when there is no noise it becomes significant. The expression in Equation 5.21 for the optimal quantization of sample means begins to break down as σ approaches zero. This is due to an assumption of uniformity within quantization regions [OH94]. Also, the implementation of numerical integration for calculating the likelihood may not be accurate for very small intervals¹.
¹ The standard erf() and erfc() functions from the GNU C maths library v2.2 were used.
Throughout this chapter, each component of the FUELS algorithm was slowly modified or replaced to use ideas and techniques from information theory. This helped to remove arbitrary constants from the algorithm, and allowed a wider range of image structure to be discovered. The results for the new MML denoising algorithm are not significantly better than FUELS. Using MML improves the WCAE at higher noise levels, but only mildly improves RMSE. It is possible that only slight improvements are achievable because FUELS is already quite good: Chapter 4 showed that FUELS outperforms rival denoising algorithms.
The performance of MML is dependent on the prior used. When denoising, the prior consists of σ, Pr(DNH), and Pr(S), the prior over segment maps S. The prior determines the length of the model part of each message being compared. The prior is determined before processing of the image begins, and is not modified during processing at all. It would be possible to vary the priors on a per block basis. For example, the prior probability of each segment map could be made dependent on the posterior probability of segment maps from nearby pixels already processed. The aim would be to exploit edge coherence. However, in this chapter the same priors will be re-used for processing each pixel in the image.
Under the MML framework, models are assessed using their respective message lengths. The overall message length, which determines the posterior, is computed as the length of the model (determined by the prior) plus the length of the data given the model (the negative log-likelihood). If the
amount of data is large, the data part will dominate and eventually swamp the effect of the
prior. If the amount of data is small, the prior exerts a large influence on the posterior [Pre89],
as illustrated in Figure 5.35. In fact, once the amount of data becomes large enough, the
model component may be ignored, and MML reduces to maximum likelihood.
Figure 5.35: As the amount of data increases, the model part becomes less significant.
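In symbols (generic notation, not necessarily that used elsewhere in the thesis), the two-part message length and the posterior it induces are

    L(m, D) = L(m) + L(D | m) = −log2 Pr(m) − log2 Pr(D | m),        Pr(m | D) ∝ 2^(−L(m, D))

The data part grows with the number of pixels encoded while the model part does not, which is why the prior matters most for the small windows used here.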
Because local image processing algorithms only use a small number of pixels, and different images vary in their properties, a technique for dynamically learning the prior from
the noisy image is desirable. This may be achieved by iterating the algorithm. After processing each pixel using the global prior (common to each local window), a local posterior
is obtained. The local posteriors can be accumulated to form a global posterior when the
whole image has been processed. The global posterior is simply a probability distribution
over possible models, formed from observation of the image at all its localities. It captures
the particular essence of an image, briefly summarizing its local segmentation features. This
makes it ideal for use as a new global prior for re-processing the same image from scratch.
Iteration should continue until the priors converge.
The iteration process has an analogy with the human visual system [FH96, Ede99]. Imagine visiting a new friend's house and pulling a photo album from their bookshelf. Before you open it, you have no idea of what the first photo will contain. This could be considered a state of total ignorance, expressed by the uniform priors already used in Section 5.4.2. After opening the album to the first picture, your visual system scans it to form an initial impression. The information gained from this first scan may be considered a posterior, which is then used as a revised prior for a second scan. This process continues until we have recognized, or made sense of, the picture, as shown in Figure 5.36.
Figure 5.36: Pictorial representation of the iterative approach to image understanding.
A data driven prior incorporates image dependent information which is shared by all the
candidate local models. Because this information is common to all models, it does not need
to be incorporated into the message length. Its code length is the same for every model.
Consider the segment map priors used so far. The uniform prior assumed all segment maps to be equally likely. In Section 5.6, a non-uniform prior biased toward the homogeneous segment map was introduced. This prior improved the results significantly, because it coincided more accurately with the expectation about which segment maps would be common in montage.

Figure 5.37 shows four segment maps. Currently, the prior probability for the homogeneous map is 0.5, although in practice that proportion is expected to vary between images. The other three segment maps shown each have the same prior probability, 0.5/255. However, the 255 heterogeneous segment maps are not expected to occur with equal frequency. Common edge patterns would probably occur more often than, say, the random looking segment map in Figure 5.37d. A more informative segment map prior should improve denoising results.
Figure 5.37: Potential segment maps: (a) the popular homogeneous map; (b), (c) common edge patterns; (d) a random looking map which would rarely occur.
The segment map prior may be considered a multistate probability distribution, Pr(S), which sums to unity over all possible segment maps. This is a global prior which is re-used for each window processed. After processing each pixel (x, y), a local posterior probability distribution over the models considered is obtained. Let it be denoted P(m | x, y), where m refers to a specific model from the candidate pool.

Each local posterior may be accumulated to form a global posterior, denoted Post(S), using Equation 5.30. The global posterior probability for a specific segment map is the normalized sum of local posterior probabilities for models using that particular segment map. For example, imagine an image with four pixels, and that only one candidate model per pixel used a homogeneous segment map. If that model had local posterior probabilities 0.9, 0.7, 0.1, and 0.3 for each pixel respectively, then the global posterior for the homogeneous segment map would be (0.9 + 0.7 + 0.1 + 0.3)/4 = 0.5.

    Post(S) = (1/N) Σ_(x,y) Σ_{m : map(m) = S} P(m | x, y)        (5.30)

where N is the number of pixels processed and map(m) denotes the segment map used by model m.
Equation 5.31 describes how the global posterior may be used as a new global prior for the image. This iterative process begins with the same noisy image each time, but uses an improved, data driven prior. The same idea is applicable to any component of the prior distribution.

    Pr^(t+1)(S) = Post^(t)(S),        t = 1, 2, 3, ...        (5.31)
This iterative process is equivalent to using the E.M. algorithm discussed in Section 3.2.1.
The E.M. algorithm is guaranteed to converge, but not necessarily to a global optimum. The
best way to encourage a good optimum is to use reasonable starting conditions. The two
priors for e used so far are both reasonable, but the uniform prior would probably be a better
choice when applying the algorithm to images with unknown properties, as it has no bias.
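A sketch of the iteration defined by Equations 5.30 and 5.31 (illustrative Python; process_pixel is a hypothetical routine returning the local posterior over the candidate segment maps for one window):

def learn_segment_map_prior(windows, num_maps, process_pixel, iterations=6):
    """Accumulate local posteriors into a global posterior (Equation 5.30)
    and feed it back as the prior for another pass over the same noisy
    image (Equation 5.31)."""
    prior = [1.0 / num_maps] * num_maps              # begin with a uniform prior
    for _ in range(iterations):
        accum = [0.0] * num_maps
        for window in windows:                       # one local window per pixel
            local_post = process_pixel(window, prior)
            for i, p in enumerate(local_post):
                accum[i] += p
        prior = [a / len(windows) for a in accum]    # Post(S) becomes the new Pr(S)
    return prior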
Results
Figure 5.38 compares the MML denoising algorithm using 1 iteration and a non-uniform segment map prior, to that using 6 iterations to learn a segment map prior. For montage, a small improvement in RMSE is observed at higher noise levels, suggesting that the learning makes a useful difference to the segment map priors. This could allow the algorithm to identify structure behind the noise better, because common structural segment maps are cheaper to encode. The original prior set all heterogeneous segment maps to be equally likely, making it difficult to give preference to one over another when the noise level was high. Another possibility is that the algorithm has learned to choose the homogeneous model more often, which at higher noise levels will probably give more smoothing without damaging high contrast edges.
Figure 5.39 shows the 15 most probable segment maps that the new MML algorithm learned
from montage. Note the distinct lack of random-looking patterns. The top row coincides
with the many boundaries between squares in the two artificial quadrants in montage.
Figure 5.38: Learning Pr(S): RMSE comparison for montage, with σ estimated.
Figure 5.39: The 15 most popular (canonical) segment maps learned for montage.
So far, a static prior for the null model, Pr(DNH) = 1/(1 + σ), was used. However, it should be possible to use the data driven prior method to learn automatically a probability suited to the noisy image being processed. This could be done in the same way as the segment maps were learned. Instead of accumulating the local posterior probabilities of models using a particular segment map, the posteriors for the null models are accumulated. This global posterior probability is exactly the best value of Pr(DNH) to use for the next iteration.
Results
Figure 5.40 compares the RMSE performance of the non-iterated, one-pass MML algorithm to one which attempts to learn only Pr(DNH) over 6 iterations. The prior for the segment map was not learned. The results indicate that learning Pr(DNH) makes little difference to the algorithm's performance. The same result occurred when an initial uniform prior, Pr(DNH) = 0.5, was used. This suggests that the existing static prior is a good one.
Figure 5.40: Learning Pr(DNH): RMSE comparison for montage, with σ estimated.
Figure 5.41 shows the proportion of times that DNH was chosen as the best model by the two algorithms. Iterated MML learns to use DNH less often than its static counterpart, but this makes little difference to the RMSE results. A possible explanation is that Figure 5.41 only plots how often the null model was the best model. Because there are only three candidate models, the best model could have a posterior probability as low as about 0.34. After posterior blending, the influence of the best model may actually be quite low compared to the summed influence of the other candidates.
[Figure 5.41: proportion of times DNH was chosen as the best model, against noise level σ.]
The MML criterion can prefer a two segment model whenever its message length is short enough. However, it is still hampered by having to choose from a pool of only 3 candidate models: the homogeneous k = 1 model, one thresholded k = 2 model, and DNH.
Under the MML framework it is a simple matter to incorporate more candidate models. For example, all valid segmentations (Section 5.3.3) could be tested, or a few extra k = 2 segmentations produced using different thresholds could be added. This thesis will take the simplest route and consider all 256 canonical 3×3 binary segment maps. The examination of all possible segment maps increases the model pool from 3 to 257 candidates, including DNH. The posterior distribution should be more varied, and the data driven segment map prior should more accurately reflect the local image structures. However, this comes at the expense of nearly a one hundred fold increase in computation.
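For illustration, the 256 canonical maps can be enumerated by noting that a binary labelling and its complement describe the same segmentation (a sketch, not the thesis' code):

def canonical_binary_maps():
    """Enumerate the 256 canonical binary segment maps of a 3x3 window.
    Each map is a 9-bit code; a code and its bitwise complement label the
    same segmentation, so only the smaller of each pair is kept."""
    canonical = set()
    for code in range(512):                 # all 2^9 labellings
        complement = (~code) & 0x1FF        # swap the two segment labels
        canonical.add(min(code, complement))
    return sorted(canonical)                # 256 maps, including the homogeneous one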
Figure 5.42 compares FUELS with two versions of the MML algorithm. Let MML-2 be the familiar version which uses 2 candidate local segmentations plus DNH, and let MML-256 be the new version which includes all 256 models with unique segment maps. Both algorithms are run for 6 iterations to learn Pr(DNH) and Pr(S). At low noise levels, there is no noticeable difference between MML-2 and MML-256; for images with little noise, thresholding adequately segments the window. As the noise increases, the RMSE performance of MML-256 improves relative to MML-2, until the very highest noise levels, where it worsens again.
Figure 5.43 plots the proportion of times that the candidate with the shortest message length was a k = 2 model. It is obvious from the graph that considering more 2-segment models per local region significantly increases the chance of choosing a specific 2-segment model as the best model. This supports the argument that thresholding is insufficient when the noise level is high. Although not shown, the increase in k = 2 is mostly at the expense of k = 1.
Figure 5.42: More models: RMSE comparison for denoising montage, with σ estimated.
If restricted to a serial computer, there is still a simple modification that can reduce the number of models which need to be considered for each window. Let σ² be the assumed image noise variance, and s² the sample variance of the pixels in the window. If s ≤ σ, the window is likely to be homogeneous, as the local variance is no larger than the global noise variance, and only the 3 candidates of MML-2 need be considered; otherwise the full pool of 257 candidates is evaluated. Call this variant MML-256/2. As the noise level in montage increases, the faster MML-256/2 implementation actually outperforms MML-256. Similar behaviour was observed for other images. Although not shown, the DNH usage of MML-256 and MML-256/2 is nearly identical.
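A sketch of this switching rule (hypothetical names; small_pool and full_pool stand for the 3 and 257 candidate models respectively):

import numpy as np

def candidate_pool(window, sigma, small_pool, full_pool):
    """Use the small MML-2 pool when the sample standard deviation of the
    window does not exceed the assumed noise level (the window is probably
    homogeneous); otherwise search all 257 candidates."""
    s = np.std(np.asarray(window, dtype=float))
    return small_pool if s <= sigma else full_pool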
Figure 5.43: Proportion of times a k = 2 model was most probable in montage.
Figure 5.45 shows the proportion of two-segment regions diagnosed by the algorithms. MML-256/2 is less likely than MML-256 to choose higher order models, especially as σ increases. Switching to MML-2 mode forces MML-256/2 to choose k = 1 more often. When the noise is very high, averaging all pixels in the window pays off better in terms of RMSE, even if it removes slightly more image structure. This does suggest, however, that the posterior probabilities for the various two-segment models are overstated. This could be because the message length calculations do not strictly account for the quantization of segment means, giving slightly shorter messages than warranted by the data.
The reason for introducing MML-256/2 was to decrease the running time. Figure 5.46 compares the average processing time per pixel used by each algorithm². FUELS is very fast relative to the MML variants, and barely registers on the graph. MML-2 is about 60 times slower than FUELS. Somewhat surprisingly, MML-256 is only about 43 times slower than MML-2, despite having to evaluate 86 times as many models. The positive result is
² The experiments were run on a 1.2GHz AMD Athlon with 512MB RAM running Linux kernel 2.4.16.
[Plot: RMSE against noise level σ for MML-2, MML-256 and MML-256/2.]
that MML-256/2 requires, on average, 60% of the computation that MML-256 does, and
achieves better RMSE results at higher noise levels.
5.16 Results
This chapter has so far shown that an MML-based local segmentation criterion improves the
denoising performance of the FUELS algorithm. This is a direct benefit of improved modeling of images at a local scale. Two useful MML variants have been identified. MML-2
considers 3 models in total, and is a logical extension to FUELS. MML-256/2 increases the
pool of candidate models considered to 257, in an effort to push the structure recognition
capabilities a little further. Together with FUELS, they represent the common trade-off between denoising ability and processing speed. In this section MML-2, MML-256/2, FUELS and SUSAN will be compared using three test images.
Figure 5.45: Proportion of times that k = 2 was deemed best when denoising montage.
For the lenna image at low noise levels, the two MML variants are beaten by FUELS, but as the noise increases, they outperform FUELS. The poor performance of MML relative to FUELS at low noise levels is an interesting problem. The fact is that the ground truth lenna is already noisy. The MML algorithms could actually be producing better noiseless estimates of the true, but unknown, lenna image.
[Plot: RMSE against noise level σ for MML-256, MML-256/2, MML-2 and FUELS.]
difference image does have less apparent structure in it, but its homogeneous regions are
more blotchy. This means it had difficulty assimilating large numbers of pixels, resulting in
poor smoothing and a higher RMSE result.
Figure 5.49 shows the worst case absolute error (WCAE) profile of the four denoising algorithms. As expected, SUSAN9 suffers at low noise levels, as it has no DNH-like capability. The most interesting feature is that the WCAE of MML-256/2 nearly doubles beyond moderate noise levels,
[Plot: RMSE against noise level σ for SUSAN9, FUELS, MML-256/2 and MML-2.]
Figure 5.48: Enhanced difference images for lenna: (a) ground truth; (b) FUELS; (c) MML-256/2; (d) MML-2; (e) SUSAN9.
whereas MML-2 increases gracefully. This also helps to explain the earlier RMSE anomaly, because large errors contribute significantly to the RMSE value. It suggests that some two-segment models were made inappropriately probable by MML-256/2 when the noise level was high. If an image's properties vary spatially, a single global segment map prior could produce this type of behaviour.
[Figure 5.49: WCAE against noise level σ for FUELS, SUSAN9, MML-2 and MML-256/2.]
The difference images for barb2, shown in Figure 5.51, have similar qualities to those in Figure 5.48 for lenna. The errors in MML-2's difference image are lower in magnitude, particularly the small circular cluster on the left hand side. This cluster has occurred because the noisy image (not shown) had some very noisy pixels clumped together. The WCAE results for barb2 are given in Figure 5.52. The deficiency of SUSAN9 at low
[Plot: RMSE against noise level σ for SUSAN9, FUELS, MML-2 and MML-256/2.]
Figure 5.51: Enhanced difference images for barb2: (a) ground truth; (b) FUELS; (c) MML-256/2; (d) MML-2; (e) SUSAN9.
noise levels is again obvious, but at higher noise levels it does better than the other techniques. SUSAN9's behaviour, partially dependent on its threshold, seems to be too ambitious for low σ, but successful for high σ. MML-256/2 has a lower WCAE for most high values of σ, which is much better than its behaviour observed for lenna. This is perhaps due to the distribution of segment maps being more uniform across the image, in particular the large number of crisp edges on the wicker chair and clothing.
[Figure 5.52: WCAE against noise level σ for SUSAN9, FUELS, MML-2 and MML-256/2.]
Figure 5.53: The 8 bit camera image and its histogram.
[Plot: RMSE against noise level σ for camera (SUSAN9, FUELS, MML, MML-256/2).]
failed on a few pixels, but this is understandable given the low noise level. MML-256/2 also had a little trouble with the same points, suggesting that it may be incorrectly favouring two-segment models. Recall from Section 5.7 that the approximate expected message length is calculated for each candidate. This approximation may be mildly biased toward 2-segment models. MML-2 had no difficulty ignoring the correctly diagnosed glitch in the original image. It is less likely to find an (un)suitable two-segment model, as it only considers one.
Figure 5.55: Enhanced difference images for camera: (a) ground truth; (b) FUELS; (c) MML-256/2; (d) MML-2; (e) SUSAN9.
5.17 Conclusions
In this chapter, the MML methodology was applied to local segmentation. The simple mean separation criterion used by FUELS was replaced by a message length comparison between candidate models. Each message was a concise, decodable encoding of the noisy pixels. The message with the shortest length was deemed to contain the most appropriate model for the local region. By using MML, the arbitrary mean separation constant that FUELS required was eliminated. This introduced the possibility of diagnosing the presence of two segments with close, but distinct, intensities, something FUELS was incapable of in very noisy images.
MML made it straightforward to incorporate alternative models into the pool of candidates being considered. All that was required was that the models' messages be decodable. The do no harm (DNH) concept was first introduced by FUELS to reduce its worst case denoising performance. It was a simple matter to incorporate a DNH model into the MML framework: the DNH candidate simply encoded the pixel values as they were, without respect to any segment map or means. This information theoretic approach to DNH was found to improve RMSE results compared to FUELS' simpler method.
MML chooses a single best model from the pool of candidates, the so-called MAP model.
The message length of a candidate model may be interpreted as a posterior probability for
that model. Posterior probabilities provide an objective measure of the confidence in each
model. It was shown that if one is only interested in the resulting local approximation,
and not the details of the model, then blending over all models weighted by their posterior
probability was a good thing to do. This alters results when other candidate models have
message lengths close to, but just short of, that of the MAP model. Posterior blending was
found to improve RMSE performance, especially for very noisy images.
The ease with which a DNH model was incorporated paved the way for the addition of more segment based models. For a 3×3 window, there exist only 256 canonical binary segment maps, including the homogeneous one. It was a simple matter to enumerate all these models and add them to the pool. Along with DNH, this brought the total number of models considered per pixel to 257. Doing this effectively incorporated spatial information into the local segmentation, as segment membership was no longer restricted by pixel intensity.
The RMSE results improved, but at the expense of a large increase in running time. The
running time was halved by switching to only 3 candidate models if the local variance was
low enough to warrant a more limited search.
Using a large pool of candidate models in conjunction with posterior blending may lead to
a more accurate local approximation. Small improvements in the accuracy of the local approximation may not be noticeable because the final approximated pixel values are rounded
to their nearest integers. There may be applications where one is willing to pay for a more
accurate local approximation. For example, examination of small features where there is
little contrast may be important in medical imaging applications.
Although an image may use the full range of intensities, a sub-image may only use a small range. If the pixel values in the sub-image were known to higher accuracy, an enhanced version of the sub-image would be more accurate than one derived from the original integer pixels. There was not
enough time left to pursue this line of investigation. In future work, a photo-realistic ground
truth image could be generated synthetically. The pixel depth of this image could be far
greater than 8 bits. Varying amounts of noise could be added to it, and the pixel values
of the resulting image rounded to 8 bit precision. A denoising algorithm which produces
floating point pixel estimates, such as FUELS or MML-256/2, could then be compared to
the more accurate ground truth image. This would determine whether more accurate local
approximations contain any further meaningful information.
When segmenting small local regions, only a small amount of pixel data is available, and the model component of a two part MML message becomes significant. This places demands on efficiently encoding the model, demands which may be less important for larger windows. The
length of the model part is directly related to the prior probability distributions used. The
prior for the segment maps could be directly controlled. It was shown that changing it to a
more meaningful non-uniform distribution improved results. The logical extension to this
was to learn the prior from the noisy image itself. The idea of data-driven priors was applied
to the MML denoising algorithm, something FUELS was incapable of doing. It was found
that the priors converged in about 6 iterations. The RMSE results were further improved
using this technique.
The result of this chapter is two useful techniques for local segmentation. MML-2 considers 3 candidate models, and is the logical extension to FUELS. MML-256/2 extends this even further to consider 257 models in most cases, but only 3 models when a full search is deemed unnecessary. The subjective analysis also showed the MML methods to produce better local
approximations, on average, than FUELS and SUSAN. However, this small improvement
only came with a lot of computational and developmental effort, suggesting that FUELS and
SUSAN are already very good. The MML-256/2 algorithm could potentially be improved
by deriving a more accurate approximation for the expected message length. In general, the
use of MML-256/2 is only warranted when a proper analysis is vital, the noise level is very
high, or the best possible local approximation is required.
An early version of this MML approach to local segmentation was presented at the Picture Coding Symposium 1997 [ST97]. In that paper, 11 candidate segmentations of the 3×3 window were considered: 3 homogeneous models using the mean, median and midpoint as the reconstructed value, and 8 binary models using the 8 possible thresholds to generate segment means. These models were constructed differently to those described in Section 5.4. The first event stated whether k = 1 or k = 2, with both outcomes assumed equally likely. If k = 1, the representative value for the block was assumed uniform on [0, 255]. If k = 2, the next
part of the message encoded the two-segment model. This began with a segment map describing the spatial arrangement of the cluster labels, with all such segment maps assumed equally likely. The first cluster mean, μ1, was assumed uniform on [0, 255]. Because the second mean was known to be higher than the first, a uniform distribution on [μ1 + 1, 255] could be used for it. Finally, the residuals between the local approximation and
the noisy data were encoded. These were treated differently than in the MML method of this chapter: two discrete residual distributions were kept, one for use by k = 1 models, and one by k = 2 models. No numerical integration was required as all segment parameters were quantized to their nearest integers.
Only the centre pixel of the most probable resulting local approximation was used for denoising. The ideas of overlapping estimates and do no harm had not been developed yet. The algorithm was also iterated, with the posterior probabilities fed back as priors, just as MML does in Section 5.13. The idea of posterior blending was not yet developed, and the global posteriors were only updated from the most probable model at each point, rather than from a weighted contribution of all models.
Chapter 6
Extensions and Further Applications of
Local Segmentation
6.1 Introduction
In Chapters 4 and 5 the principle of local segmentation was applied successfully to removing additive noise from greyscale images. The relative performance of different ideas and
parameter settings was evaluated and compared using the RMSE criterion and difference
images. Image denoising was naturally suited to demonstrating some advantages of examining images from a local segmentation perspective. However, one should not be led into
thinking that its usefulness ends there.
Local segmentation may be considered a core component of many other image processing
applications. A local segmentation analysis provides a way to split an image into its signal
and noise components. The signal may then be enhanced further without interference from
the noise. It provides a snapshot of the structural features at each point in an image, identifying which pixels belong together and where boundaries occur. Figure 6.1 espouses the
application of local segmentation as a first step in low level image processing, from which
many other tasks may be derived.
This chapter will illustrate how local segmentation could be applied to pixel classification, edge detection, pixel interpolation and image compression. It will also examine alternative image models, noise models, and segmentation algorithms for use in local segmentation. The extension to different data types will also be considered. Most of the ideas will only be presented in brief, as a full study is beyond the scope of this thesis. However, some topics do include preliminary implementations and results.

[Figure 6.1: the local segmentation decomposition splits a raw image into structure and noise components, feeding tasks such as denoising, edge detection, compression and interpolation.]
Impulse noise causes an affected pixel to be completely replaced by a random pixel value. This differs from additive noise, where some information about the original pixel value is retained. The random replacement pixel is usually assumed to be drawn uniformly from the full intensity range [0, 255]. A pixel which has been affected by impulse noise is known as a corrupt pixel.

There is a small chance that a corrupt pixel will have an intensity very similar, or equal, to its original value. However, most of the time its value will stand out from its neighbours. The median filter in Section 3.4.4 is sometimes well suited to filtering impulse noise. In homogeneous regions, the median filter can cope with up to 50% of the pixels being corrupt and still produce a reasonable denoised value. In heterogeneous regions it is less well behaved.
In terms of local segmentation, a corrupt pixel may be considered a very small segment having an intensity sufficiently different from its neighbours. In the simplest case, this segment
would consist of one (corrupt) pixel and be surrounded by other larger segments. It is possible for corrupt pixels to be adjacent, resulting in small, spatially connected groups of corrupt
pixels. If an adjacent pair of corrupt pixels is similar enough in intensity, it could even be
(mis)interpreted as a valid two pixel segment.
Let the minimum acceptable segment size (MASS) be the smallest sized local segment that should be interpreted as structure rather than noise. Recall the 3×3 median filter from Section 3.4.4, which was unable to filter the centre pixel correctly when the intersection of its segment and the window contained fewer than 5 pixels. It could be considered as having a MASS of 5. The centre weighted median from Section 3.4.4 is able to adjust its effective MASS by varying the centre weight; with a centre weight of 3, the MASS reduces to 4.
Ideally, the MASS should depend on p, the fraction of corrupted pixels in the image. The MASS could be a global parameter to the local segmentation process, in the same way that σ is for additive noise. It could be supplied by the user, or somehow estimated from the image itself. Equation 6.1 gives the probability that a given pixel will be surrounded (in an 8-connected sense) by exactly n corrupt pixels, where p is the per pixel probability of impulse noise and C(8, n) is the binomial coefficient.

    Pr(n) = C(8, n) p^n (1 − p)^(8 − n)        (6.1)
The probability of a given pixel having no corrupt neighbours is (1 − p)^8, and the probability of the pixel itself being corrupt is p. Therefore the probability of a randomly selected pixel being isolated and corrupt is p(1 − p)^8. When p = 0.1, this is only 4.3% of pixels, meaning 5.7% of pixels are corrupt and have one or more corrupt neighbours. Thus the majority of impulse noise occurs in clumps rather than in isolation, with pairs being most common. Fortunately, these pairs are not necessarily similar in intensity and could be diagnosed as two separate single pixel segments.
The multi-class version of FUELS from Section 4.12 could be adapted to additionally remove impulse noise. For each pixel, the optimal local segmentation could be determined, using the estimated additive noise level σ as before. If the centre pixel's segment consists of fewer than MASS pixels, it could be considered an impulse segment. The pixels from an impulse segment need to be assigned new values; the mean of the largest segment not containing the centre pixel could be used. For a 3×3 window, that segment will always contain some pixels neighbouring the centre pixel, but for larger windows this would not necessarily be the case. A more advanced version of this idea could take that into consideration when choosing replacement pixel values.
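A sketch of the replacement rule (illustrative Python; the window's segment labels are assumed to come from the multi-class local segmentation step):

import numpy as np

def denoise_centre(centre_label, labels, window, mass):
    """If the centre pixel's segment has fewer than MASS pixels, treat it
    as an impulse segment and return the mean of the largest other
    segment; otherwise return the centre segment's own mean."""
    labels = np.asarray(labels)
    window = np.asarray(window, dtype=float)
    sizes = {lab: int((labels == lab).sum()) for lab in np.unique(labels)}
    if sizes[centre_label] >= mass:
        return window[labels == centre_label].mean()
    others = {lab: n for lab, n in sizes.items() if lab != centre_label}
    biggest = max(others, key=others.get)
    return window[labels == biggest].mean()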
Results
Figure 6.2 illustrates the application of this idea to the middle section of montage contaminated by 5% impulse noise. The multi-class FUELS algorithm was modified to handle both additive and impulse noise as described earlier. Both DNH and overlapping averaging were disabled, because they would only complicate the implementation. Results for MASS=2 and MASS=3 are included. The outputs of the 3×3 median (effective MASS=5) and centre 3-weighted median (MASS=4) filters are also present for comparison. The image margins should be ignored, as the method used for handling missing pixels is not entirely appropriate when impulse noise is present in the image.
When MASS=2, single pixel segments will be obliterated, while larger segments will remain intact. In Figure 6.2 it can be seen that the remaining noise mainly consists of pairs of pixels which are similar in intensity. But this also means two-pixel segments, such as the dot on the 'i', remain intact. When MASS=3, the modified FUELS manages to remove most of the
Figure 6.2: (a) clean image; (b) with 5% impulse noise; (c) multi-class FUELS, MASS=2; (d) MASS=3; (e) median; (f) weighted median.
noise that remained when MASS=2. But this comes at the expense of structure loss on the window shutters, the dots on the 'j' and '?', and the hole and tip of the 'a'. At first glance the median filtered image looks very clean, but much of the fine detail and the letters have been seriously damaged. The weighted median appears to have done a better job overall, but also suffers from corruption of the letters. FUELS with MASS=2 did best of all on the letters.
Figure 6.3 performs the same experiment, but with 10% (p = 0.1) impulse noise. In this noisier environment, single pixel impulse segments are less likely to occur than when p = 0.05. The MASS=2 filter still does reasonably well under these conditions (the letters are all legible) but much noise still remains. When MASS=3, less noise remains, but some of the letters are unrecognizable. The two median filters performed similarly, but the centre weighted median did manage to retain more of the letter structure.
Table 6.1 contains RMSE results for the filtered outputs demonstrated in Figures 6.2 and 6.3. When p = 0.05, the weighted median and MASS=2 do best, matching the visual impression. When p = 0.1, MASS=2 does worst, as it can only remove single pixel noise segments, which are in the minority at 10% corruption; MASS=3 and the median perform similarly. Once again, this does not differ substantially from the conclusions drawn from the earlier visual examination.
Method       p = 0.05    p = 0.1
Median         16.4        17.3
W. Median      11.1        13.5
MASS=2         12.7        19.2
MASS=3         15.7        17.9

Table 6.1: RMSE results for filtering impulse noise from montage.
Conclusions
One method for adapting the multi-class FUELS algorithm to remove impulse noise was
presented. It functioned by treating all local segments containing fewer than a specified
number of pixels, the MASS, as noise. Corrupt pixels were replaced by the denoised mean
of their largest neighbouring segment. Additive noise was simultaneously removed for those
segments not diagnosed as impulse noise.
Figure 6.3: (a) clean image; (b) with 10% impulse noise; (c) multi-class FUELS, MASS=2; (d) MASS=3; (e) median; (f) weighted median.
The MASS=2 implementation works well for 5% corruption. It tended to preserve more structure than the median varieties, but could not cope with the occasional pairs of corrupt pixels with similar intensities. As the impulse noise level increases, higher values of the MASS are required to cope. In those situations the filter tends to leave more noise behind, but preserves more structure in the areas it does clean. This should make it more amenable to multiple passes through the data.

The best MASS to use depends on the amount of impulse noise present. It would be preferable for the algorithm to determine the most suitable MASS for a particular image automatically. One possibility is to apply filters with known MASS and examine the number of pixels which suffer gross changes. From that number it may be possible to estimate the percentage noise corruption, p, and hence a suitable MASS.
Another type of degradation is multiplicative noise, in which the noise contribution scales with the original pixel value, as in Equation 6.2, where y(x, y) is the noisy pixel, f(x, y) the original pixel, and n(x, y) the noise function.

    y(x, y) = f(x, y) + f(x, y) n(x, y)        (6.2)
It is common to assume that the multiplicative noise function n(x, y) is normally distributed as N(0, σ²), with σ constant over the image. Multiplying a variable distributed as N(0, σ²) by a scalar c results in a variable distributed as N(0, c²σ²); here the scalar is the value of the original pixel. Thus multiplicative noise could be considered additive noise whose variance depends on the original pixel value. There are various ways in which this behaviour could be exploited by the local segmentation algorithms developed in this thesis.

Consider a small block of pixels corrupted by the multiplicative noise model just described. Under the assumption of homogeneity, the mean, μ, of the block would be a reasonable estimate of the original intensities. FUELS would assume the block variance to be σ². Under a multiplicative noise model it would be approximately equal to (μσ)². Similar arguments would apply to the clusters determined under a heterogeneous model. A modified FUELS
model selection criterion would use the per cluster variance to determine suitability, rather
than a common noise variance. A simpler alternative is to use the average brightness
of the block to determine a local noise variance to be used by all derived clusters.
The treatment just described is not ideal. The mean is not necessarily the best estimate to
use, as bright pixels are less reliable than dark ones. The binary clustering algorithm used
by FUELS assumes the clusters have a common variance. To properly handle multiplicative
noise a more complex clustering technique should be used. However, it has been shown that
with some modifications, local segmentation could be used for more than just additive noise.
A further possibility is additive noise whose variance varies spatially, as in Equation 6.3:

    y(x, y) = f(x, y) + N(0, σ(x, y)²)        (6.3)
Under this model the variance could vary erratically from point to point, and it would be difficult to determine an appropriate value of σ for each pixel. A more likely situation is for the variance to vary smoothly across the image, or to be constant within global segments but discontinuous between segments. In these cases, the variance will be mostly consistent on a local scale. The technique used by FUELS and MML to estimate the global noise variance could be applied to small sub-images centred on each pixel. Under the assumption of smoothness, the variance could be measured in a regular fashion for a proportion of all pixels, and the variance for the remaining pixels interpolated from the estimated ones.

The k-means segmentation algorithm used by FUELS could not be applied unmodified to an image containing known, spatially varying noise. Although it does not take spatial information into account, and the value of a noisy pixel is still the best estimate of its true value, the equal averaging of pixels to estimate the segment mean is inappropriate. Instead a weighted mean, based on the noise variance of each pixel value, should be used. The model selection criterion would also need to be modified to handle the different variance of each pixel, and therefore of each segment.
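A sketch of such a weighted mean (inverse-variance weighting is the natural choice; illustrative Python, assuming a per-pixel noise variance is available for each value):

import numpy as np

def weighted_segment_mean(values, variances):
    """Estimate a segment mean when each pixel has its own noise variance,
    weighting each pixel by the reciprocal of its variance."""
    values = np.asarray(values, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * values) / np.sum(w))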
A constant segment has only one parameter, μ, describing its average intensity. A planar segment has three parameters. The intensity profile for a planar segment may be written as in Equation 6.4, where μ is the mean intensity and g_h and g_v are the horizontal and vertical gradients of the plane, with x and y measured from the centre of the window.

    f(x, y) = μ + g_h x + g_v y        (6.4)
If the noise is assumed distributed as N(0, σ²) for each pixel, the optimal values for the three parameters may be determined using the familiar least sum of squares algorithm [Nie94]. A simpler solution for pixels from a regular square region, like the popular 3×3 window, is given in Figure 6.4. The gradient terms are actually equivalent to the outputs of the horizontal and vertical Prewitt edge detector masks [PM66].
[Figure 6.4: Least squares plane fit for a 3×3 window with local coordinates x, y ∈ {−1, 0, 1}: μ is the mean of the nine pixels, g_h is one sixth of the difference between the right and left column sums (the horizontal Prewitt response), and g_v is one sixth of the difference between the bottom and top row sums (the vertical Prewitt response).]
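A sketch of this plane fit (illustrative Python, following the reconstruction above with local coordinates x, y in {-1, 0, 1}):

import numpy as np

def fit_plane_3x3(window):
    """Least squares fit of f(x, y) = mu + gh*x + gv*y to a 3x3 block.
    The gradient numerators are the Prewitt horizontal and vertical
    responses; the denominator 6 is the sum of squared coordinates."""
    w = np.asarray(window, dtype=float).reshape(3, 3)
    mu = w.mean()
    gh = (w[:, 2].sum() - w[:, 0].sum()) / 6.0   # right column minus left column
    gv = (w[2, :].sum() - w[0, :].sum()) / 6.0   # bottom row minus top row
    return mu, gh, gv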
It has been shown how to derive a planar model for a homogeneous region. Under the MML framework, this model could be incorporated into the pool of candidates, alongside the homogeneous constant model and the two-segment constant models. The model part of its message would be longer, as three parameters need to be encoded to describe the fitted plane, and it should be possible to determine the optimal quantization levels for them. The data part would consist of the residuals between the fitted plane and the noisy pixel values. MML naturally takes into consideration the complexity of each model via its overall message length, so the planar model would only be chosen over the constant model as warranted by the data.
Fitting planes to each segment in a heterogeneous window is more complicated. Obviously, when there is more than one segment present, there are fewer pixels in each segment. It is difficult to estimate the gradient terms of the fitted planes reliably, due to the lack of data. For an MML message, the small number of pixels would make it difficult to save enough bits in the data part to make up for the extra bits used to encode the parameters of the plane. FUELS and MML allow segments to be unconnected, further complicating the modeling. For a 3×3 window, fitting planes to two or more segments would probably not be justified. A homogeneous planar model can already model ramp edges very well, a phenomenon which two segment regions were previously forced to approximate.
Each element in a volume image is called a voxel, which is shorthand for volume element. In the simplest case, a voxel stores a scalar measurement, such as a bone density or a radioactive dye intensity. Figure 6.6 shows the local neighbourhood of a voxel; translating the 2D concept of 4-connectivity to 3D results in a neighbourhood of 6 voxels.

Local segmentation is a philosophy, not a particular algorithm or implementation. Its principles apply equally well to 2D greyscale images as they do to 3D volume images. A local neighbourhood for each voxel can be defined, and this neighbourhood may then be segmented using any suitable technique. If spatial information is not important, then FUELS or MML-2 could be applied without any major changes. The principles of DNH and overlapping averaging would still apply to volume data.
Figure 6.6: Each voxel (dotted) has 6 neighbours in its immediate neighbourhood.
The extension to 3D is only the beginning. Medical scanners are able to take temporal sequences of volume images, for example to examine blood flow through the body over a period of time. This could be treated as 4D information. The local neighbourhood would consist of voxels around the current voxel in the current frame, along with voxels around the same position in adjacent frames. A possible complication is that the time dimension is inherently different from the spatial ones, whereas FUELS and MML assume a certain isotropic behaviour. Planar models might be useful for handling this situation.

These ideas also apply to digital video, which is a time series of 2D images. Local segmentation would be able to detect a change in the number of local segments around each pixel. This change would probably be associated with moving objects in the recorded scenes. Hence, local segmentation could be used as a low level step in object tracking systems.
Larger windows could also be used, but each extra pixel in the window doubles the number of possible binary segment maps. As the window size increases, the average number of segments present in the underlying image is also expected to rise, which further increases the total number of candidate segmentations to be potentially considered.

Obviously the search space needs to be restricted. The homogeneous and optimally thresholded models should always be included. A segmentation algorithm which exploits spatial information could be used to generate extra candidates, with the thresholded segmentation used as a starting point. Most spatial segmentation algorithms require one or more parameters to control their operation; a few different values for these parameters could be tried, and each resulting segmentation added to the pool.
Reducing the search space is only part of the problem. Each segment map needs to be costed
in bits. An explicit prior distribution over all possible segment maps is no longer suitable, due
to the large number of possibilities. Instead, a method for efficiently coding segment maps
is needed, which implicitly gives a prior to all segment maps. The field of lossless image
compression would be a good place to find a method for doing this. I have already done
some preliminary investigations into the use of a low order Markov model for compressing
binary segment maps, much like the JBIG algorithm [GGLR81, PMGL88]. The Markov
model parameters are dynamically learned using the data-driven prior approach. The details
of this work are outside the scope of this thesis.
Local segmentation provides an estimate of k, the number of segments present in each window. This k may be used for pixel classification. Ignoring DNH for the moment, the multi-class FUELS algorithm estimates k to be between 1 and 9. When k = 1, the window is smooth; when k > 1, the window is shaped:

    Smooth    k = 1
    Shaped    k > 1
Figure 6.7 shows lenna and its corresponding classification map. Black pixels denote smooth points and white pixels denote shaped points. The classification is based on the values of k determined by the multi-class FUELS algorithm with DNH disabled, with the noise level σ estimated from the image.
Figure 6.7: (a) original lenna; (b) classification into smooth (black) and shaped (white)
feature points.
Of the pixels in Figure 6.7b, 59% are smooth and 41% are shaped. The smooth points do
indeed correlate with the large homogeneous regions in lenna, and the shaped points to
the edges and texture. Pixel classification of this type could be used in various ways. For
example, a lossy image compression algorithm could allocate more bits to the shaped areas
to preserve sharpness. A global segmentation algorithm could attempt to cluster only smooth
points first, and then apply region growing to annex shaped points accordingly.
In the example of Figure 6.7, shaped and textured feature points were not distinguished. Windows which are not well modeled by one or more segments could be considered textured regions. The multi-class FUELS algorithm uses its DNH mode to identify such windows. In Section 4.12.4 it was also shown that windows where k = M, where M is the number of pixels in the window, are equivalent to DNH. Thus local segmentation could be used to classify pixels into three groups as follows:

    Smooth      k = 1
    Shaped      1 < k < M
    Textured    k = M, or DNH invoked
Figure 6.8 applies these classification rules to lenna. Only 5% of pixels are classified as textured. About a third occur near edges, possibly ramps, which are difficult to model with piece-wise constant segments. Another third seem to be isolated patches, perhaps planar regions for which a segment-based model was inappropriate. The last third do actually appear in textured regions of the image, particularly the feathers in lenna's boa. This result is less encouraging than when only smooth and shaped points were considered.
Pixel classification could possibly be improved by examining the image at different scales. Chou [Cho99] performs edge strength measurements on the original image and two down-sampled versions thereof. A set of rules relating the edge strength at each position over the various scales is then used to make the final decision. For example, a low edge response at all scales gives strong evidence for the point being smooth, and correspondingly, consistently high edge responses suggest a shaped point. If the edge strength varies considerably across scales, the point could be considered textured. It would be possible to apply these types of rules using the values of k determined at different scales.
Figure 6.8: (a) original lenna; (b) smooth points (white); (c) shaped points (white); (d) textured points (white).
The shaped pixel classification in Section 6.4 may be considered a primitive edge detector.
It provides a rough location, but no magnitude or orientation. This section will explore some
more advanced methods for edge detection from a local segmentation perspective.
An edge strength measure should give a low response in homogeneous regions, and a high response in the vicinity of segment boundaries. The response usually increases with both the sharpness and the contrast of the edge. The output of an edge strength algorithm could be used by segmentation algorithms. For example, those pixels furthest from high edge activity may be used as seeds for a region growing algorithm, or a map of edge strengths could be used as the elevation map for the watershed segmentation algorithm [HH94]. Alternatively, the identified edge pixels may be used to initiate a boundary tracking process.
The commonly used Sobel linear convolution masks [Sob70, GW92] treat the image as a functional surface, and attempt to estimate the gradient magnitude in two orthogonal directions. Equation 6.5 shows the horizontal mask, Gx, and the vertical mask, Gy. The edge strength at each point, edge(x, y), is the Pythagorean combination of the results of convolving the two directional masks with the image f at that pixel.

    Gx = [ -1  0  1 ]        Gy = [ -1 -2 -1 ]
         [ -2  0  2 ]             [  0  0  0 ]
         [ -1  0  1 ]             [  1  2  1 ]

    edge(x, y) = sqrt( (Gx ∗ f)(x, y)² + (Gy ∗ f)(x, y)² )        (6.5)
Local segmentation may be applied to edge strength measurement. Consider FUELS thresholding from Section 4.14, which diagnoses the window as consisting of one or two segments. For the two segment case, the numerical difference between the two cluster means could be considered a measure of edge strength. Equation 6.6 gives an expression for computing an edge strength image, E. Because only windows whose cluster means differ by more than the FUELS separation threshold, t, are diagnosed as consisting of two segments, that amount is subtracted from the edge strength.

    E(x, y) = (μ2 − μ1) − t   if two segments are diagnosed at (x, y), and 0 otherwise        (6.6)

This measure of edge strength is only one possibility. An alternative would be to consider the intensity difference between the centre pixel and the neighbouring pixel nearest in intensity, but from the other segment.
Results
Figure 6.9 provides edge strength results for lenna. Equation 6.6 was used by the local segmentation based method. The Sobel and local segmentation outputs were linearly stretched to the full intensity range, to improve the visibility of the edge strengths. SUSAN's edge detector output is also included; it used the same threshold value as used for denoising in Chapter 4.
Figure 6.9: (a) original lenna; (b) local segmentation; (c) Sobel; (d) SUSAN.
The local segmentation edge strength measurement algorithm presented here is very simplistic. Its performance could be said to fall somewhere between Sobel and SUSAN. It appears
to be less sensitive to noise than SUSAN, but picks up more potential detail than Sobel. It
would be interesting to apply thinning and edge linking to the local segmentation output, but
that is outside the scope of this chapter. The results are useful in that they essentially come
for free from the local segmentation decomposition, and do not use any spatial information.
A more advanced implementation could consider the actual pattern of cluster assignments
for each window, and accordingly determine edge direction, or disregard incoherent patterns.
Figure 6.10 shows the twelve inter-pixel boundaries within a $3 \times 3$ window. In principle there are $2^{12} = 4096$ boundary patterns, but there are really far fewer, because each pattern must form one or more logical segments.

Figure 6.10: The twelve inter-pixel boundaries for a $3 \times 3$ window, shown in bold.
Figure 6.11 illustrates three potential segment maps. The homogeneous segment map asserts
that no boundaries are present in the window. The window with a vertical edge asserts the
existence of a boundary at 3 of the 12 possible interfaces. The diagonal line in the third
window is even more complex, insisting that there is a boundary present at 6 interfaces.
The posterior information gleaned from the MML-256 local segmentation process may be
used for edge detection. For an $N \times N$ image there are $N(N-1)$ horizontal pixel boundaries, and the same number of vertical ones. Each processed pixel produces a probability distribution over possible segment maps. If the segment labels on both sides of an interface are different, then a boundary is present along that interface. For a given segment map, the probability of each interface's boundary existing in the image can be taken to be the same as the posterior probability for the segment map in question.
Thus a probability for the existence of an edge at each pixel interface can be determined.
Each interface occurs in 6 overlapping windows, so probabilities can be accumulated. The horizontal and vertical probabilities can be stored in two images, called $H$ and $V$, each slightly smaller in resolution than the original image. Each image may be visualized alone, or they could be combined into a single image, $E$. Three different methods for combining the horizontal and vertical edge probability images are considered (a short code sketch of the three follows the list):

1. Averaging: $E = \frac{1}{2}(H + V)$.

2. The $L_2$ norm: $E = \sqrt{H^2 + V^2}$.

3. The maximum: $E = \max(H, V)$.
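The sketch below implements the three combining operations, assuming the accumulated probability images $H$ and $V$ are already available as arrays of the same shape.

```python
import numpy as np

def combine(H, V, method="max"):
    """Combine horizontal and vertical boundary-probability images
    into a single edge image using one of the three schemes above."""
    H = np.asarray(H, dtype=float)
    V = np.asarray(V, dtype=float)
    if method == "average":
        return 0.5 * (H + V)
    if method == "l2":
        return np.sqrt(H * H + V * V)
    if method == "max":
        return np.maximum(H, V)
    raise ValueError("unknown method: %s" % method)
```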
Results
Figure 6.12 gives results for part of lenna. Black represents probability zero, and white
probability one. The MML-256 algorithm was iterated 6 times to learn priors for the segment
maps. The DNH model was treated like a homogeneous window, in that it did not contribute
Figure 6.12: Example of probabilistic pixel boundary detection: (a) part of lenna; (b) $H$; (c) $V$; (d) $\frac{1}{2}(H+V)$; (e) $\sqrt{H^2+V^2}$; (f) $\max(H,V)$.
any posterior probability to any pixel interfaces. An alternative would have been to treat all
pixel interfaces as being active.
This technique seems to work extremely well. The vertical image clearly picks up the hanging feather at the top of the image. This feather does not appear at all in the horizontal
image, as desired. The diagonal brim of the hat is clearly detected in both $H$ and $V$, because diagonal edges are a blend of horizontal and vertical changes. For visualization purposes, the $\max(H, V)$ function appears to give an output image with more contrast. The max operation clearly states whether an edge is present or not, regardless of its orientation. The other two methods for combining $H$ and $V$ respond more to diagonal edges than to vertical and horizontal ones.
Figure 6.13 shows the combined probabilistic edge strength image for the complete lenna
image, using the $\max(H, V)$ method. It was produced under the same conditions as Figure 6.12f. The results are quite remarkable. All the important boundaries are clearly visible.
Most interesting is the amount of detail with respect to individual feathers on the boa hanging from the hat. It has also managed to detect a large proportion of the fine diagonal lines
present above the band on the hat.
Conclusions
The results of this section clearly show the power of the MML-256 local segmentation framework. The horizontal and vertical edge strength decomposition derived naturally from the
local segmentation models used. It would be possible to transform this orthogonal decomposition into another coordinate space. For example, polar coordinates would describe the edge
magnitude and the edge orientation at each position. The $L_2$ norm plotted in Figure 6.12e is effectively the magnitude. The angle of orientation, $\theta$, may additionally be derived using the trigonometric expression $\theta = \tan^{-1}(V/H)$.
Consider enlarging an image so as to double its width and height. The doubled image has four times as many pixels as the original.
Exactly 75% of the pixels need new values determined for them. This situation is shown in
Figure 6.14. The problem is to predict suitable values for the unknown pixels, such that the
features of the original image remain intact.
The simplest technique for image enlargement is pixel replication. It makes the assumption
that the known pixel values were obtained from a down-sampling of an originally larger image. Thus, the best replacement for each unknown pixel is the value of its nearest known pixel. An example is given in Figure 6.15. The main disadvantage with pixel replication is that, for natural
images, the results are very blocky. The next simplest idea is to interpolate the pixel intensities linearly. This often looks better than pixel replication, but has a tendency to produce
blurred edges, much like the box filter for denoising.
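The two baseline enlargement methods are easy to state in code. The sketch below doubles an image by replication, and by a simple separable linear interpolation which, for brevity, produces a $(2h-1)\times(2w-1)$ result rather than extrapolating a final row and column.

```python
import numpy as np

def replicate2x(img):
    """Pixel replication: each known pixel is copied into a 2x2 block."""
    f = np.asarray(img)
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

def bilinear2x(img):
    """Linear interpolation: unknown pixels are averages of their
    known neighbours (smoother, but blurs edges)."""
    f = np.asarray(img, dtype=float)
    h, w = f.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = f                                   # known pixels
    out[::2, 1::2] = 0.5 * (f[:, :-1] + f[:, 1:])       # between columns
    out[1::2, ::2] = 0.5 * (f[:-1, :] + f[1:, :])       # between rows
    out[1::2, 1::2] = 0.25 * (f[:-1, :-1] + f[:-1, 1:]  # block centres
                              + f[1:, :-1] + f[1:, 1:])
    return out
```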
Figure 6.14: Doubling an image's size means predicting all the unknown (?) pixels.

Figure 6.15: An example of pixel replication.
Structure directed enlargement algorithms attempt to maintain edge coherence when estimating values for the unknown pixels. It is possible to use the MML-256 local segmentation
method to aid image enlargement. Consider the $5 \times 5$ window of the enlarged image in Figure 6.16a. It has 9 known pixels, which are numbered. In the original image, assume the best
model used the binary segment map denoted by the white and grey pixels. Segment 1 had 4
pixels in it, and segment 2 had 5 pixels in it.
Now consider the $3 \times 3$ sub-window in Figure 6.16b. It has 4 known pixels, and 5 unknown
sub-window in Figure 6.16b. It has 4 known pixels, and 5 unknown
pixels. The segment memberships that the 4 known pixels had in the original segment map
can be assumed consistent. All that remains is to determine which segment each of the
unknown pixels belong to. When that is achieved, each unknown pixel can be assigned a
value equal to its segment's mean. Because only 5 pixels don't belong to a segment, there are only $2^5 = 32$ possible segment maps that could apply.
The MML-256 algorithm can be used to calculate the message lengths for these 32 possible
models. The same prior used for the original image could be re-used, ensuring that the
Figure 6.16: (a) a $5 \times 5$ window of the enlarged image with its original two-segment map (Segment 1 and Segment 2); (b) the $3 \times 3$ sub-window containing 4 known pixels; (c) the 32 possible segment maps for the 5 unknown pixels.
original image structure is replicated in the enlarged image. The segment means in each
message would be equal to the segment means calculated from the original 9 known pixels
in the window. The data part of the message would only consist of residuals for the
known pixels. Of the 32 possible segment maps, the one with the shortest message length is
used to fill in values for the unknown pixels. It is straightforward to incorporate overlapping
average and posterior blending into image enlargement too.
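A rough sketch of the core step is given below. Rather than computing full MML message lengths, it simply ranks the 32 candidate maps by a supplied prior over $3 \times 3$ segment maps; the `prior` dictionary, the corner positions and the function name are illustrative assumptions, not the actual MML-256 machinery.

```python
import itertools
import numpy as np

def enlarge_window(known_vals, known_labels, mu, prior):
    """Fill the 5 unknown pixels of a 3x3 sub-window (as in Figure 6.16b).

    known_vals / known_labels: the 4 corner pixels and their segment
    labels (0 or 1) inherited from the original segmentation.
    mu: the two segment means computed from the original 9 known pixels.
    prior: a hypothetical mapping from a 9-element segment-map tuple to
    its prior probability, learned from the original image.
    The corners sit at positions 0, 2, 6, 8 of the flattened window.
    """
    corner_pos = (0, 2, 6, 8)
    unknown_pos = (1, 3, 4, 5, 7)
    best_map, best_p = None, -1.0
    for labels in itertools.product((0, 1), repeat=5):   # 2^5 = 32 maps
        m = [None] * 9
        for p, l in zip(corner_pos, known_labels):
            m[p] = l
        for p, l in zip(unknown_pos, labels):
            m[p] = l
        p_map = prior.get(tuple(m), 0.0)
        if p_map > best_p:
            best_map, best_p = m, p_map
    window = np.empty(9)
    for p, v in zip(corner_pos, known_vals):
        window[p] = v                       # keep the known values
    for p in unknown_pos:
        window[p] = mu[best_map[p]]         # unknowns take segment mean
    return window.reshape(3, 3)
```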
Figure 6.17 compares three image enlargement methods for doubling montage (only the middle portion is shown). As expected, pixel replication produces blocky output, and linear interpolation produces blurry output. In comparison, the MML-256 result is good. The edges of
the artificial shapes and letters look more continuous. MML-256 based zooming may be
considered a fractal type approach, because the same prior is used at two different scales,
ensuring the image is self-similar at different resolutions.
Digital zooming has many applications. Most digital cameras on the market only support
optical zooming up to a point, after which digital zooming takes over. It is important that
the invented zoomed data appear realistic. Deinterlacing of digitized television signals can
be interpreted as taking an interlaced field containing only half the rows of a full frame, and producing a complete frame of twice the height. Predicting pixel values for the missing rows could be done by adapting the
MML-256 technique just described.
Figure 6.17: Comparison of doubling methods: (a) denoised image; (b) pixel replication;
(c) linear interpolation; (d) MML-256 based enlargement.
The minimum encoding approach to inference, exemplified by MML and the related MDL, has proffered a merging of the two disciplines. Both image compression and MML
are concerned with optimal lossless encodings of the data. The data part of a two part MML
message may be considered the noise, while the model part is the structure. This could be
useful to a lossy compression algorithm for deciding which image information is important, in addition to any heuristics based on the human visual system. Better compression is
intimately linked with better understanding of data.
Figure 6.18: Original image, lossy approximation, and lossless encoding.
Homogeneous regions contain little structure, so fewer bits should be used for encoding those regions. Important structural details, such as edges, occur in heterogeneous regions, and more bits should be used to ensure that these features are represented accurately in the lossy compressed image.
Figure 6.19: Standard BTC: (a) original image at 8 bpp; (b) reconstructed image at 2 bpp;
(c) bitmap.
Local segmentation and BTC are closely related. Both techniques involve segmenting a
small group of connected pixels. The segmentation includes a bitmap for describing pixel
assignments to segments, and representative intensities for each segment. The original BTC
method always used two classes, which is similar to FUELS. However, BTC used the block
mean as a threshold, and chose the segment means to preserve the variance of the overall
block. This is in contrast to FUELS' use of the adaptive Lloyd quantizer.
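For reference, the sketch below encodes one $4 \times 4$ block with the classic moment-preserving BTC rule (threshold at the block mean, and choose the two output levels so the block mean and variance are preserved). Quantization of the transmitted levels is omitted, so this is a simplified illustration rather than a complete codec.

```python
import numpy as np

def btc_encode_block(block):
    """Standard BTC for one 4x4 block: threshold at the block mean and
    pick two output levels that preserve the block mean and variance."""
    x = np.asarray(block, dtype=float).ravel()
    m = x.size
    mean, std = x.mean(), x.std()
    bitmap = x > mean                       # 1 = upper segment
    q = int(bitmap.sum())
    if q == 0 or q == m:                    # homogeneous block
        return bitmap.reshape(block.shape), mean, mean
    a = mean - std * np.sqrt(q / (m - q))   # lower reconstruction level
    b = mean + std * np.sqrt((m - q) / q)   # upper reconstruction level
    return bitmap.reshape(block.shape), a, b

def btc_decode_block(bitmap, a, b):
    return np.where(bitmap, b, a)
```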
When a block is homogeneous, there is no need to transmit a bitmap and two means; a single mean is sufficient. Some variants of BTC attempt to exploit this property to save bits and improve compression. To make a decodable bitstream, each block's encoding is
preceded by one bit to distinguish between homogeneous and standard blocks. An alternative
is to send the two means first, and if they are the same, not to send a bitmap. Mitchell et
al [MD80] assume the region is homogeneous if the block variance is less than one, and
Nasiopoulos et al [NWM91] do the same if the range of the block is less than some threshold.
FUELS already provides a way to estimate a suitable threshold for distinguishing between
homogeneous and heterogeneous blocks. Figure 6.20 shows how Adaptive BTC can improve
the bit rate with little loss in image quality compared to standard BTC. The threshold used is $3\hat{\sigma}$, where $\hat{\sigma}$ is estimated as 2.84. This results in 41% of blocks being declared homogeneous,
reducing the overall bit rate to 1.39 bpp. The bitmap image clearly shows how two segments
were only used in high activity regions of the image.
Figure 6.20: Adaptive BTC: (a) original image; (b) bitmap; (c) reconstructed image at 1.39
bpp, 41% homogeneous blocks using threshold of 8.5; (d) reconstructed image
at 2 bpp.
Prefixing each block with a bit describing how many segments are in the block implicitly
places a prior probability distribution over segmentations. Specifically, it gives probability
0.5 to the homogeneous case ($k = 1$), and shares the remaining 0.5 equally between all possible two-segment clusterings, of which there are nearly $2^{16}$ for a $4 \times 4$ block. This is obviously an inefficient coding, because Section 5.14 showed
that not all binary segment maps are equally likely. The MML methodology could be applied
to first determine these probabilities from the image. These probabilities could then be sent
up front, and the rest of the image encoded using them. If the cost of the upfront information
is small enough, a net gain in bit rate would be possible.
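The inefficiency is easy to quantify with ideal code lengths. The sketch below compares the implicit flag-based prior with a hypothetical learned prior; the bitmap count for a $4 \times 4$ block and the example probabilities are assumptions used purely for illustration.

```python
import numpy as np

def code_length_bits(prob):
    """Ideal code length, in bits, for an event with probability prob."""
    return -np.log2(prob)

# Implicit prior from the one-bit flag: P(homogeneous) = 0.5, with the
# remaining 0.5 shared equally among N two-segment bitmaps.
N = 2 ** 16 - 2                       # assumed bitmap count, 4x4 block
print(code_length_bits(0.5))          # 1 bit for a homogeneous block
print(code_length_bits(0.5 / N))      # ~17 bits for any two-segment block

# A learned prior giving a common bitmap (say a vertical edge)
# probability 0.01 would code it in far fewer bits.
print(code_length_bits(0.01))         # ~6.6 bits
```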
The great majority of lossless compression algorithms are based on predicting the value
of the next pixel to be encoded. Examples include CALIC [WM96], LOCO [WSS00],
HBB [STM97], TMW [MT97] and Glicbawls [MT01]. The prediction must be formed using
only those pixels which have already been encoded: the causal neighbourhood shown in
Figure 6.21. This is because the decoder does not yet know about future pixels.
Figure 6.21: The causal local neighbourhood (pixels labelled NN, NW, NE, WW, and so on) consists of pixels known to both encoder and decoder.
Typically, only pixels in the local neighbourhood are used for prediction. In Figure 6.21 they
have been labelled with the standard compass point notation [STM97], where N denotes
north, W denotes west, and so on. The pixel to be encoded, ?, is usually referred to as the
current pixel. The predicted value for the current pixel is often taken to be a linear combination of the pixels in the causal neighbourhood. For example, the Pirsch predictor [Pir80] is
calculated as 0.5W+0.25N+0.25NE.
Local segmentation could be used to improve the quality of predicted values. The amount
of noise in a predicted pixel value is a linear combination of the noise in each of the pixels
used for the prediction. It is desirable that this noise be minimized. Instead of using the
decoded (original) pixel values for prediction, denoised versions of them could be used.
For example, FUELS could be used to filter the image as it is decoded. The overlapping
estimate technique would even provide reasonable denoised values for those pixels without
a complete two-sided neighbourhood.
When encoding each pixel, it must be with respect to some distribution over possible pixel
values. The most commonly used distributions are Gaussian and Laplacian, due to their
symmetry and unimodality. The predicted value is used to estimate the location parameter
of the distribution. The spread parameter captures the level of confidence we have in the predicted value. Besides the prediction formula, the method used for determining an appropriate
spread value is what distinguishes most algorithms.
Some techniques implicitly use a single spread parameter for the whole image, such as lossless JPEG [PM93]. Some estimate it directly from the causal neighbourhood, like Glicbawls.
CALIC and LOCO use the idea of context coding, whereby the local causal neighbourhood is
classified into a context. Only a small number of contexts are used, with each one effectively
providing a spread estimate to use for encoding the current pixel.
The FUELS approach could be used for determining contexts. Let us assume the decoder has
been transmitted an estimate of the global noise variance, $\hat{\sigma}^2$. The causal neighbourhood could
be locally segmented. The optimally determined segment map could be used as a context
number. The number of contexts could be controlled through the number of neighbours
taking part in the segmentation. Most useful is that local segmentation could distinguish
between homogeneous and heterogeneous contexts. The spread parameter for each context
could be adaptively learned in a fashion similar to CALIC and LOCO.
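A minimal sketch of segment-map-based context selection is shown below. It uses a single mean-split of the causal pixels and a $C\sigma$ homogeneity test as stand-ins for the FUELS machinery; both, and the context numbering, are assumptions for illustration.

```python
import numpy as np

def causal_context(causal_pixels, sigma, C=3.0):
    """Derive a context number from the causal neighbourhood: context 0
    for a homogeneous neighbourhood, otherwise one context per binary
    segment map of the causal pixels."""
    w = np.asarray(causal_pixels, dtype=float)
    lo = w[w <= w.mean()]
    hi = w[w > w.mean()]
    if hi.size == 0 or hi.mean() - lo.mean() < C * sigma:
        return 0                              # homogeneous context
    labels = (w > w.mean()).astype(int)       # binary segment map
    return 1 + int("".join(map(str, labels)), 2)

# Example: W, NW, N, NE values straddling an obvious edge.
print(causal_context([100, 102, 180, 178], sigma=3.0))   # -> 4
```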
A more general approach is to use local segmentation for both the prediction and the context
selection. The causal region could be optimally segmented. If $k = 1$, the current pixel will be in the same segment, or in another new segment. The predicted value should probably be equal to the mean of the causal neighbourhood. If $k = 2$, the current pixel could belong to either of the causal segments, or be from a new segment. The encoder could examine the
current pixel and determine which of the two segments it belongs to. A single binary event
could be used to state which of those two segments it is. The mean of that segment could be
used as the predicted value. This approach to prediction should be good at predicting large
changes in the vicinity of edges.
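The sketch below illustrates this two-segment prediction scheme. The mean-split segmentation and the $C\sigma$ test are again simplifying assumptions; the returned bit is the single binary event the encoder would transmit.

```python
import numpy as np

def predict_with_segments(causal_pixels, current, sigma, C=3.0):
    """If the causal region splits into two segments, spend one binary
    event saying which segment the current pixel belongs to, and predict
    with that segment's mean; otherwise predict with the overall mean.
    Returns (prediction, side_information_bit_or_None)."""
    w = np.asarray(causal_pixels, dtype=float)
    lo = w[w <= w.mean()]
    hi = w[w > w.mean()]
    if hi.size == 0 or hi.mean() - lo.mean() < C * sigma:
        return w.mean(), None                 # one segment: plain mean
    # Encoder side: it can see the current pixel and pick the closer mean.
    bit = int(abs(current - hi.mean()) < abs(current - lo.mean()))
    return (hi.mean() if bit else lo.mean()), bit
```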
A more complex version of the previous idea could be based on the MML-256 algorithm. Let
us assume the encoder and decoder agree on a prior over segment maps. The causal neighbourhood could be segmented in all possible ways, not including the current pixel. For each
segment map considered, there are two new segmentations containing an extra pixel which
can be derived by assuming that the current pixel is in each of the two possible segments
(assuming $k = 2$). The prior could be used to assign weights to all these segmentations. The
predicted values could be blended together based on the posterior probability of each of the
segmentations. This is similar to the way TMW blends prediction distributions.
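A posterior-weighted blend of this kind reduces to a normalised weighted average, as in the sketch below; the example numbers are purely illustrative.

```python
import numpy as np

def blended_prediction(predictions, posteriors):
    """Blend candidate predictions by the posterior probability of the
    segmentations that produced them (weights are normalised)."""
    p = np.asarray(predictions, dtype=float)
    w = np.asarray(posteriors, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, p))

# Example: two segmentations suggest 100 and 180 with posteriors 0.7, 0.3.
print(blended_prediction([100.0, 180.0], [0.7, 0.3]))   # -> 124.0
```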
6.8 Conclusions
It has been shown how to apply local segmentation to a variety of image processing tasks
besides denoising, including edge detection, image zooming and image compression. Also
discussed was how to incorporate different image models and noise models into the local segmentation framework. These followed on from the constant facet and additive noise models
used by FUELS and the MML denoising algorithms in Chapters 4 and 5. The extension of
the ideas in this thesis from 2D images to 2D video sequences, 3D volume images, and 4D
volume image sequences was also considered.
The local segmentation principle is a simple one. It states that whenever a pixel is processed,
only those neighbouring pixels in the same segment should take part in the processing. This
chapter illustrated how this idea provides a consistent and unified way of examining images
in a low level manner. The simplicity of local segmentation allows it to be applied to any
algorithms where pixels are processed by looking at the values of neighbouring pixels. Its
power and flexibility lie in its simplicity.
Most of the ideas in this chapter could only be explored in brief, as the work required is outside the scope of this thesis. The most promising results are for probabilistic edge detection
and impulse noise removal. These ideas in particular would be deserving of further research.
The FUELS local segmentation methodology has a simple implementation, requiring little
memory and processing power. It could be crafted as a generic image processing tool which
could function in real time, potentially embedded in hardware.
Chapter 7
Conclusions
Local segmentation is already used, in some form, by a number of image processing algorithms. Sometimes its use is implicit, like with median filters, or more explicit, such as for
SUSAN. The properties of a particular algorithm can be examined by isolating the local segmentation model it uses. This can help to reveal its strengths and weaknesses. The more
situations an image processing technique's local segmentation criterion diagnoses correctly,
the better its performance. Image denoising is well suited to demonstrating the utility of local
segmentation. Good image denoising algorithms attempt to identify and preserve structure,
while simultaneously removing noise.
A denoising algorithm called FUELS was developed. Although binary thresholding is a very
simple type of segmentation, FUELS showed that it was good enough to outperform existing
state-of-the-art denoising algorithms. Unconnected segments are often unavoidable, because
the intersection of a window with a global segment may disguise their actual connectedness
via pixels outside the window. Thus the main limitation of thresholding may actually be an
advantage when applied locally. The windows used in local processing are usually too small
to incorporate much spatial information anyway.
The move from assuming homogeneity within the local region to allowing for the presence of two segments led to a big improvement in performance. This gain was much larger than that which came from also allowing more than two segments. Although this was true for $3 \times 3$ windows, it may not be true for larger ones. The popularity of the $3 \times 3$ window, along with the results for FUELS, suggests that it represents a good trade-off between compact locality and the need for more detailed models.
The work on FUELS introduced the idea of "do no harm" (DNH). FUELS generates two
candidate models, and chooses the one it thinks is better. It is possible that both candidates
are poor ones for the local region, so using one could do more damage than good. If this
situation was identified by the DNH mode, the pixels were left unmodified. This is similar
to defaulting to a null model if there is insufficient evidence for any of the alternatives. The
DNH idea is not particular to FUELS, or even to local segmentation; it could be applied to any
denoising algorithm to improve its worst case performance, especially at low noise levels.
Most denoising algorithms use the centre pixel of the window as a reference pixel to compare with each other pixel in turn, to produce a denoised estimate of the centre pixel only.
Local segmentation differs from this in that it treats all pixels in the window democratically,
producing denoised values for them all. Because windows overlap, there are multiple estimates for each pixel. With little extra work, FUELS was able to average these estimates to
further improve performance. This illustrates the advantages of combining predictions from
different experts, which is harder to do when the centre pixel receives special treatment.
Moving to an MML framework led to better local segmentation, which itself led to further improvements in denoising performance. MML made it straightforward to consider a larger set of models. It provides a robust method for comparing the relative merit of different models, but cannot rate them in an absolute sense. The DNH principle suggested the addition
of a null model, which left the pixels unmodified, to serve as a benchmark. By comparing
against this model it was possible to establish whether the other models were useful or not.
The MML denoising algorithms performed best of all when the noise level was high. This
showed that thresholding, as a technique for local segmentation, was insufficient for very
noisy images. MML provided a way to incorporate spatial information through its use of a
data-driven prior for the segment maps. It was able to learn the general characteristics of an
image, which guided it in segmenting each local region. It found that for most images, some
segment maps are more likely to occur than others. Homogeneous windows were the most
common situation diagnosed, which is why simplistic denoising techniques, like box filters,
do well for most parts of an image.
Local segmentation could be considered for use in any situation where the current pixel is
processed by making reference to the values of its neighbouring pixels. Algorithms that deal
with these situations make up a large proportion of algorithms used in image processing. It
was shown that local segmentation is applicable to a host of image processing tasks besides
denoising. This is because it makes obvious the connection between higher level analysis,
which typically involves segmentation, and low level analysis, namely local segmentation.
The principle of local segmentation is a simple one, but has not really been explicitly stated
in the literature. The success of FUELS showed that even a simplistic implementation of local segmentation was able to produce high quality results. The application of MML to local
segmentation was original, and led to further small improvements in local image modeling. Because FUELS and SUSAN already represent an excellent trade-off between efficiency and effectiveness, many applications may not be able to exploit MML's better local approximations anyway. The success of local segmentation in denoising bodes well for its wider
application in image processing. The idea of data driven priors, using a null model as a reference, and mixing expert predictions can be applied not just to images, but to nearly any
data processing domain.
Bibliography
[AG98]
[Aka92] H. Akaike. Information theory and an extension of the maximum likelihood principle. In Samuel Kotz and Norman L. Johnson, editors, Breakthroughs in Statistics, volume I, pages 599-624. Springer-Verlag, New York, 1992. With an introduction by J. deLeeuw.
[Alp98]
Ethem Alpaydin. Soft vector quantization and the EM algorithm. Neural Networks, 11:467-477, 1998.
Leon Bottou and Yoshua Bengio. Convergence properties of the k-means algorithm. In Advances in Neural Information Processing Systems, volume 7, Denver,
1995. MIT Press.
[Bes86] J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B, 48(3):259-302, 1986.
[Bez81] J. C. Bezdek, editor. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, NY, 1981.
[BF98]
Paul S. Bradley and Usama M. Fayyad. Refining initial points for K-Means clustering. In Proc. 15th International Conf. on Machine Learning, pages 9199.
Morgan Kaufmann, San Francisco, CA, 1998.
[BO95]
Rohan. A. Baxter and Jonathan J. Oliver. MDL and MML: Similarities and differences (Introduction to minimum encoding inference - Part III). Technical Report
94/207, Department of Computer Science, Monash University, January 1995.
R. Bracho and A.C. Sanderson. Segmentation of images based on intensity gradient information. In Proceedings of CVPR-85 Conference on Computer Vision and
Pattern Recognition, San Francisco, pages 341347, 1985.
[BS94]
[BSMH98] Michael J. Black, Guillermo Sapiro, David H. Marimont, and David Heeger.
Robust anisotropic diffusion. IEEE Transactions on Image Processing, 7(3):421
432, March 1998.
[BW72] D. M. Boulton and C. S. Wallace. An information measure for hierarchic classification. Computer Journal, 16(3):254238, 1972.
[Cai96]
Z. Q. Cai.
Electronics Letters,
[Can86] John Canny. A computational approach to edge detection. IEEE Trans. Pattern
Analysis and Machine Intelligence, PAMI-8(6):679-698, November 1986.
[CC94]
[Cho99] Wen-Shou Chou. Classifying image pixels into shaped, smooth, and textured
points. Pattern Recognition, 32:1697-1706, 1999.
[CHY89] Sungzoon Cho, Robert Haralick, and Seungku Yi. Improvement of Kittler and
Illingworths minimum error thresholding. Pattern Recognition, 22(5):609517,
1989.
[CK72]
[CR93]
T. Caelli and D. Reye. On the classification of image regions by color, texture and
shape. Pattern Recognition, 26:461470, 1993.
[Cur91]
[CW00] T. Chen and H. R. Wu. Impulse noise removal by multi-state median filtering. In
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume IV,
pages 21832186, June 2000.
[DDF 98] D. L. Dowe, M. Doran, G. E. Farr, A. J. Hurst, D. R. Powell, and T. Seemann.
Kullback-Leibler distance, probability and football prediction. In W. Robb, editor,
Proceedings of the Fourteenth Biennial Australian Statistical Conference (ASC14), page 80, July 1998.
[DH73]
R. O. Duda and P. E. Hart. Pattern Classification and scene analysis. Wiley, New
York, 1973.
[DLR77] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical
Society (Series B), 39(1):138, 1977.
[DM79] E. J. Delp and O. R. Mitchell. Image coding using block truncation coding. IEEE
Trans. Commun., 27:13351342, 1979.
[Ede99] Shimon Edelman. Representation and recognition in vision. MIT Press, Cambridge, Mass., 1999.
[EH81]
B. S. Everitt and D. J. Hand. Finite Mixture Distributions. Chapman and Hall Ltd,
1981.
[ELM91] N. Efrati, H. Liciztin, and H. B. Mitchell. Classified block truncation codingvector quantization: an edge sensitive image compression algorithm. Signal Process. Image Commun., 3:275283, 1991.
[EM00]
How-Lung Eng and Kai-Kuang Ma. Noise adaptive soft-switching median filter
for image denoising. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal
Processing, volume IV, pages 21752178, June 2000.
[Far99]
[FCG85] J.P. Fitch, E.J. Coyle, and N.C. Gallagher, Jr. Root properties and convergence
rates of median filters. IEEE Trans. Acoustics, Speech, and Signal Processing,
33:230240, 1985.
[FH96]
[Fis12]
[Fis36]
[FKN80] Henry Fuchs, Zvi M. Kedem, and Bruce F. Naylor. On visible surface generation
by a priori tree structures. Computer Graphics (SIGGRAPH 80 Proceedings),
14(3):124133, July 1980.
[FM81]
[FNK94] Pasi Franti, Olli Nevalainen, and Timo Kaukoranta. Compression of digital images by block truncation coding: A survey. The Computer Journal, 37(4):308
332, 1994.
[GG84]
[GG91]
[GGLR81] Jr. Glen G. Langdon and Jorma Rissanen. Compression of black-white images
with arithmetic coding. IEEE Transactions on Communications, 29(6):858867,
June 1981.
[Gla93]
[GN98]
[GO96]
[Gro88] Richard A. Groeneveld. Introductory Statistical Methods : An Integrated Approach Using Minitab. PWS-KENT Publishing Company, 1988.
[GW92] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Addison-Wesley, 3rd edition, 1992.
[HB88]
R. J. Hathaway and J. C. Bezdek. Recent convergence results for the fuzzy cmeans clustering algorithms. J. Classification, 5(2):237247, 1988.
[HH94]
[HH95] Michael W. Hansen and William E. Higgins. Watershed-based maximum-
[HipBC] Hippocrates.
The oath.
400 B.C.
http://classics.mit.edu/Hippocrates/hippooath.html.
[HM01] Pierre Hansen and Nenad Mladenovic. J-MEANS: a new local search heuristic
for minimum sums of squares clustering. Pattern Recognition, 34:405413, 2001.
[HP74]
[HR94]
Iftekhar Hussain and Todd R. Reed. Segmentation-based nonlinear image smoothing. In International Conference on Image Processing, pages 507511. IEEE,
1994.
Robert M. Haralick and Linda G. Shapiro. Image segmentation techniques. Computer Vision, Graphics, and Image Processing, 29:100132, 1985.
[HSD73] R. M. Haralick, K. Shanmugam, and I. Dinstein. Texture features for image classification. IEEE Trans. Systems, Mans and Cybernetics, SMC-3:610621, 1973.
[HSSB98] Mike Heath, Sudeep Sarkar, Thomas Sanocki, and Kevin Bowyer. Comparision
of edge detectors: A methodology and initial study. Computer Vision and Image
Understanding, 69(1):3854, January 1998.
[Hua84] N. Huang. Markov model for image segmentation. In 22nd Allerton Conference
on Communication, Control, and Computing, pages 775781, October 35 1984.
[Hub81] Peter J. Huber, editor. Robust Statistics. Wiley series in probability and mathematical statistics. John Wiley & Sons, Inc., 1981.
[HW79] J. A. Hartigan and M. A. Wong. A K-means clustering algorithm. Applied Statistics, 28:100108, 1979.
[HW81] R. M. Haralick and L. Watson. A facet model for image data. Computer Graphics
and Image Processing, 15:113129, 1981.
[Imm96] John Immerkær. Fast noise variance estimation. Computer Vision and Image Understanding, 64(2):300-302, September 1996.
[Jah93]
[Jai89]
[KI86]
[KL51]
S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):7986, 1951.
[KL91]
S. Ko and Y. Lee. Center weighted median filters and their applications to image
enhancement. IEEE Trans. on Circuits and Systems, 38(9):984993, September
1991.
[Knu73] D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, 2nd edition, 1973.
[KSSC87] Darwin T. Kuan, Alexander A. Sawchuk, Timothy C. Strand, and Pierre Chavel.
Adaptive restoration of images with speckle. IEEE Transactions on Acoustics,
Speech, and Signal Processing, ASSP-35(3):373383, March 1987.
[KSW85] J. N. Kapur, P. K. Sahoo, and A. K. C. Wong. A new method for gray-level picture
thresholding using the entropy of the histogram. Computer Vision, Graphics, and
Image Processing, 29:273285, 1985.
[Kuh00] J. Kuha. Model assessment and model choice: an annotated bibliography. Technical report, Department of Statistics, Pennsylvania State University, April 2000.
http://www.stat.psu.edu/jkuha/msbib/biblio.html.
[Kuw76] M. Kuwahara. Processing of RI-angio-cardiographic images. In K. Preston
and M. Onoe, editors, Digital Processing of Biomedical Images, pages 187203.
Plenum Press, New York, NY, 1976.
[KWT87] M. Kass, A.P. Witkin, and D. Terzopoulos. Snakes: Active contour models. In
International Conference on Computer Vision (ICCV), pages 259268, London,
1987. IEEE.
[LCP90] Sang Uk Lee, Seok Yoon Chung, and Rae Hong Park. A comparitive performance
study of several global thresholding techniques for segmentation. Computer Vision, Graphics, and Image Processing, 52:171190, 1990.
[Lee81a] J. S. Lee. Speckle analysis and smoothing of synthetic aperture radar images.
Computer Graphics and Image Processing, 17:2432, 1981.
[Lee81b] Jong-Sen Lee. Refined filtering of image noise using local statistics. Computer
Graphics and Image Processing, 15:380389, 1981.
[Lee83] J-S. Lee. Digital image smoothing and the sigma filter. Computer Vision, Graphics, and Image Processing, 24:255-269, 1983.
[len72]
Playmate of the Month: Lenna Sjööblom. Playboy: Entertainment for Men, pages 134-141, November 1972.
[Li01]
[Lin96]
Tetra Lindarto. MML segmentation-based image coding. Masters thesis, Department of Computer Science, Monash University, May 1996.
[LL96]
C. K. Leung and F. K. Lam. Maximum segmented-scene spatial entropy thresholding. In International Conference on Image Processing, pages 963966. IEEE,
1996.
[Llo82]
S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28:129
137, 1982.
[LM84]
[Mas85] G. A. Mastin. Adaptive filters for digital image noise smoothing: An evaluation.
Computer Vision, Graphics, and Image Processing, 31:103121, 1985.
[MB90] F. Meyer and S. Beucher. Morphological segmentation. J. Vis. Commun. Image
Represent., 1(1):2146, 1990.
[McD81] M. J. McDonnell. Box-filtering techniques. Computer Graphics and Image Processing, 17:6570, 1981.
[McQ67] J. B. McQueen. Some methods of classification and analysis of multivariate observations. In L.M. Le Cam and J. Neyman, editors, Proceedings of Fifth Berkeley
Symposium on Mathematical Statistics and Probability, pages 281297, 1967.
[MD80] O. R. Mitchell and E. J. Delp. Multilevel graphics representation using block
truncation coding. Proc. IEEE, 68:868-873, 1980.
[MJ66]
[MJR90] P. Meer, J. Jolion, and A. Rosenfeld. A fast parallel algorithm for blind estimation
of noise variance. IEEE Trans. Pattern Anal. Mach. Intelligence, 12(2):216223,
1990.
[MM99] Jean-Bernard Martens and Lydia Meesters. The role of image dissimilarity in
image quality models. In B. E. Rogowitz and T. N. Pappas, editors, Proceedings
of SPIE: Human Vision and Electronic Imaging IV, volume 3644, pages 258269,
January 1999.
[MP00]
Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley series in
probability and statistics. John Wiley & Sons, Inc., 2000.
[MPA00] Volker Metzler, Marc Puls, and Til Aach. Restoration of ultrasound images by
nonlinear scale-space filtering. In Edward R. Dougherty and Jaakko T. Astola,
editors, SPIE Proceedings: Nonlinear Image Processing XI, volume 3961, pages
6980. SPIE, 2000.
[MR74] Frederick Mosteller and Robert E. K. Rourke, editors. Sturdy Statistics: Nonparametric and Order Statistics. Addison-Wesley, 1974.
[MR95] Kai-Kuang Ma and Sarah A. Rajala. New properties of AMBTC. IEEE Signal
Processing Letters, 2(2):3436, February 1995.
[MT97]
Bernd Meyer and Peter E. Tischer. TMW - a new method for lossless image compression. In Picture Coding Symposium (PCS97), pages 533-538, Berlin,
Germany, 1997. VDE-Verlag GMBH.
[MT01]
Bernd Meyer and Peter Tischer. Glicbawls - grey level image compression by
adaptive weighted least squares. In J. A. Storer and M. Cohn, editors, Proceedings
of the Data Compression Conference (DCC2001), Piscataway, NJ, USA, March
2001. IEEE Service Center.
[NH88]
[Nie94]
Yves Nievergelt. Total least squares: State-of-the-art regression in numerical analysis. SIAM Review, 36(2):258264, June 1994.
[NR79] Yasuo Nakagawa and Azriel Rosenfeld. Some experiments on variable thresholding. Pattern Recognition, 11:191-204, 1979.
Jonathan J. Oliver and Rohan A. Baxter. MML and Bayesianism: Similarities and
differences. Technical Report 206, Department of Computer Science, Monash
University, December 1994.
[OH94]
Jonathan J. Oliver and David Hand. Introduction to minimum encoding inference. Technical Report 205, Department of Computer Science, Monash University, November 1994.
[Ols93]
[Ots79]
N. Otsu. A threshold selection method from gray level histograms. IEEE Trans.
Systems, Man and Cybernetics, 9:6266, March 1979.
[PG94]
N. Papamarkos and B. Gatos. A new approach for multilevel threshold selection. CVGIP: Graphical Models and Image Processing, 56(5):357370, September 1994.
[PH66]
[Pir80]
P. Pirsch. A new predictor design for DPCM coding of TV signals. In ICC Conference Report (International Conf. on Communications), pages 31.2.1-31.2.5,
Seattle, WA, 1980.
[Pit95]
[PM66]
J. M. Prewitt and M. L. Mendelsohn. The analysis of cell images. Ann. New York
Acad. Sci.,, 128:10251053, 1966.
[PM90] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell., 12(7):629-639, July 1990.
[PM93]
[PMGL88] W.B. Pennebaker, J.L. Mitchell, and R.B. Arps G.G. Langdon. An overview of
the basic principles of the Q-coder adaptive binary arithmetic coder. IBM Journal
of research and development, 32(6):771726, November 1988.
[PP93]
[Pre89]
[PV90]
[QS95]
M. K. Quweider and E. Salari. Gradient-based block truncation coding. Electronics Letters, 31(5):353354, March 1995.
[RC78]
[Ris87]
[Ris00]
Jorma Rissanen.
MDL denoising.
46(7):25372543, 2000.
[RJ91]
Majid Rabbani and Paul W. Jones. Digital Image Compression Techniques. SPIE
Optical Engineering Press, 1991.
[RL87] Peter J. Rousseeuw and Annick M. Leroy, editors. Robust regression and outlier detection. Wiley series in probability and mathematical statistics: Applied
probability and statistics. John Wiley & Sons, Inc., 1987.
[RLU99] K. Rank, M. Lendl, and R. Unbehauen. Estimation of image noise variance. IEE
Proc.-Vis. Image Signal Process., 146(2):8084, April 1999.
[Ros81] A. Rosenfeld. Image pattern recognition. Proc. IEEE, 69:596605, 1981.
[RR98]
[Sha48]
[She90]
[SKYS] Brian V. Smith, Paul King, Ken Yap, and Supoj Sutanthavibul. XFig: vector
graphics drawing software. http://www.xfig.org/.
[SMCM91] Philippe Saint-Marc, Jer-Sen Chen, and Gerard Medioni. Adaptive smoothing: A general tool for early vision. IEEE Trans. Pattern Analysis and Machine
Intelligence, 13(6):514529, June 1991.
[Smi95] S. M. Smith. SUSAN - a new approach to low level image processing. Technical
Report TR95SMS1, Defence Research Agency, Chobham Lane, Chertsey, Surrey,
UK, 1995. http:/www.fmrib.ox.ac.uk/steve/.
[Smi96] S. M. Smith. Flexible filter neighbourhood designation. In Proc. 13th Int. Conf.
on Pattern Recognition, volume 1, pages 206212, 1996.
[Sob70] I. E. Sobel. Camera Models and Machine Perception. PhD thesis, Electrical
Engineering Department, Stanford University, Stanford, CA., 1970.
[SSW88] P. K. Sahoo, S. Soltani, and A. K. C. Wong. A survey of thresholding techniques.
Computer Vision, Graphics and Image Processing, 41:233260, 1988.
[ST97]
[ST98]
Torsten Seemann and Peter E. Tischer. Structure preserving noise filtering using explicit local segmentation. In Proc. IAPR Conf. on Pattern Recognition
(ICPR98), volume II, pages 16101612, August 1998.
[Sta99]
[STM97] Torsten Seemann, Peter E. Tischer, and Bernd Meyer. History based blending of
image sub-predictors. In Picture Coding Symposium (PCS97), pages 147-151,
Berlin, Germany, September 1997. VDE-Verlag GMBH.
[TA97]
[Tag96]
[TH94]
Patrick Teo and David Heeger. Perceptual image distortion. In First IEEE International Conference on Image Processing, volume 2, pages 982986, November
1994.
[Tis94]
[VS91] Luc Vincent and Pierre Soille. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(6):583-598, 1991.
[WKL ] Thomas Williams, Colin Kelley, Russell Lang, Dave Kotz, John Campbell,
Gershon Elber, and Alexander Woo.
gnuplot:
http://www.gnuplot.info/.
[WM96] Xiaolin Wu and Nasir Memon. CALIC - a context based adaptive lossless image
codec. IEEE ASSP, 4:1890-1893, 1996.
[WR78] Joan S. Weszka and Azriel Rosenfeld. Threshold evaluation techniques. IEEE
Trans. Systems, Man, and Cybernetics, 8(8):622629, August 1978.
[WSS00] Marcelo J. Weinberger, Gadiel Seroussi, and Guillermo Sapiro. The LOCO-I
lossless image compression algorithm: Principles and standardization into JPEG-LS. volume 9, pages 1309-1324, August 2000.
[WVL81] D. C. C. Wang, A. H. Vagnucci, and C. C. Li. Gradient inverse weighted smoothing scheme and the evaluation of its performance. CGIP, 15(2):167181, February
1981.
[YG01]
G. Z. Yang and D. F. Gillies. Computer Vision Lecture Notes. Department of Computing, Imperial College, UK, 2001. http://www.doc.ic.ac.uk/gzy.
[ZG94]
Y. J. Zhang and J. J. Gerbrands. Objective and quantitative segmentation evaluation and comparison. Signal Processing, 39:4354, 1994.