3. The Gaussian Kernel
Figure 3.1 The Gaussian kernel is apparent on every German banknote of DM 10,- where it
is depicted next to its famous inventor when he was 55 years old. The new Euro replaces
these banknotes. See also: http://scienceworld.wolfram.com/biography/Gauss.html.
3.1 The Gaussian kernel

The Gaussian kernel is defined in 1-D, 2-D and N-D respectively as:

g(x; σ) = 1/(σ√(2π)) e^(−x²/(2σ²)),  g(x, y; σ) = 1/(2πσ²) e^(−(x²+y²)/(2σ²)),  g(x⃗; σ) = 1/((σ√(2π))^N) e^(−|x⃗|²/(2σ²))

The σ determines the width of the Gaussian kernel. In statistics, when we consider the Gaussian probability density function it is called the standard deviation, and its square, σ², the variance. In the rest of this book, when we consider the Gaussian as an aperture function of some observation, we will refer to σ as the inner scale or shortly scale.

In the whole of this book the scale can only take positive values, σ > 0. In the process of observation σ can never become zero, for this would imply making an observation through an infinitesimally small aperture, which is impossible. The factor of 2 in the exponent is a matter of convention, because we then have a 'cleaner' formula for the diffusion equation, as we will see later on. The semicolon between the spatial and scale parameters is conventionally put there to make the difference between these parameters explicit.
The scale-dimension is not just another spatial dimension, as we will thoroughly discuss in
the remainder of this book.
The half width at half maximum (x = σ√(2 ln 2) ≈ 1.18 σ) is often used to approximate σ, but it is somewhat larger:

Unprotect[gauss];
gauss[x_, s_] := 1/(s Sqrt[2 Pi]) Exp[-x^2/(2 s^2)];
Solve[gauss[x, s] == gauss[0, s]/2, x]

{{x → -s Sqrt[2 Log[2]]}, {x → s Sqrt[2 Log[2]]}}

% // N

{{x → -1.17741 s}, {x → 1.17741 s}}
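The same check can be sketched outside Mathematica; here is a minimal Python version (the function name `gauss` mirrors the text, the numerical tolerances are our own):

```python
import math

def gauss(x, s):
    # 1-D normalized Gaussian kernel as defined in this chapter
    return math.exp(-x**2 / (2 * s**2)) / (s * math.sqrt(2 * math.pi))

s = 1.0
# half width at half maximum: gauss(hwhm, s) == gauss(0, s) / 2
hwhm = s * math.sqrt(2 * math.log(2))
print(gauss(hwhm, s) / gauss(0, s))  # ≈ 0.5
print(hwhm)                          # ≈ 1.17741, somewhat larger than s
```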
3.2 Normalization
The term 1/(σ√(2π)) in front of the one-dimensional Gaussian kernel is the normalization constant. It comes from the fact that the integral over the exponential function is not unity: ∫₋∞^∞ e^(−x²/(2σ²)) dx = σ√(2π). With the normalization constant this Gaussian kernel is a normalized kernel, i.e. its integral over its full domain is unity for every σ.

This means that increasing the σ of the kernel reduces the amplitude substantially. Let us look at the graphs of the normalized kernels for σ = 0.3, σ = 1 and σ = 2 plotted on the same axes:
Unprotect[gauss];
gauss[x_, s_] := 1/(s Sqrt[2 Pi]) Exp[-x^2/(2 s^2)];

Figure 3.2 The normalized Gaussian kernel for σ = 0.3, σ = 1 and σ = 2, plotted on the same axes.
The normalization ensures that the average graylevel of the image remains the same when
we blur the image with this kernel. This is known as average grey level invariance.
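Average grey level invariance is easy to verify numerically. A small Python sketch (our own construction, not from the book): convolve a random signal with a sampled, renormalized Gaussian kernel using circular boundary conditions, and the mean is unchanged:

```python
import numpy as np

def gauss_kernel(s, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * s**2))
    return k / k.sum()  # renormalize the sampled kernel so it sums to 1

rng = np.random.default_rng(0)
signal = rng.uniform(0, 255, size=256)

k = gauss_kernel(2.0, 10)
# circular convolution via the FFT, so no 'mass' is lost at the boundary
blurred = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k, signal.size)).real

print(signal.mean() - blurred.mean())  # ≈ 0: the average grey level is invariant
```

Because the kernel sums to exactly 1, the total (and thus the average) intensity is preserved for any σ.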
The shape of the kernel remains the same, irrespective of the σ. When we convolve two Gaussian kernels we get a new wider Gaussian with a variance σ² which is the sum of the variances of the constituting Gaussians: g_new(x⃗; σ₁² + σ₂²) = g₁(x⃗; σ₁²) ⊗ g₂(x⃗; σ₂²).
∫₋∞^∞ gauss(a; σ₁) gauss(x − a; σ₂) da = 1/(√(2π) √(σ₁² + σ₂²)) e^(−x²/(2(σ₁² + σ₂²)))
This phenomenon, i.e. that a new function emerges that is similar to the constituting
functions, is called self-similarity.
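The addition of variances under convolution can be checked numerically; a Python sketch (the grid spacing and tolerance are our choices):

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)

def gauss(x, s):
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

s1, s2 = 1.0, 2.0
conv = np.convolve(gauss(x, s1), gauss(x, s2), mode="same") * dx

# measure the variance of the resulting kernel: it should be s1^2 + s2^2 = 5
mean = np.sum(x * conv) * dx
var = np.sum((x - mean)**2 * conv) * dx
print(var)  # ≈ 5.0
```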
Figure 3.3 The Mandelbrot fractal is a famous example of a self-similar function. Source:
www.mathforum.org. See also mathworld.wolfram.com/MandelbrotSet.html.
3.4 The scale parameter
In order to avoid the summing of squares, one often uses the following parametrization: 2σ² → t. The Gaussian kernel then becomes: g(x⃗; t) = 1/(πt)^(N/2) e^(−|x⃗|²/t).
It is this t that emerges in the diffusion equation ∂L/∂t = ∂²L/∂x² + ∂²L/∂y² + ∂²L/∂z². It is often referred to as 'scale' (like in: differentiation to scale, ∂L/∂t), but a better name is variance.
To make the self-similarity of the Gaussian kernel explicit, we can introduce a new dimensionless spatial parameter, x̃ = x/(σ√2). We say that we have reparametrized the x-axis. Now the Gaussian kernel becomes: g_n(x̃; σ) = 1/(σ√(2π)) e^(−x̃²), or g_n(x̃; t) = 1/(πt)^(N/2) e^(−x̃²). In other words: if we walk along the spatial axis in footsteps expressed in scale-units (σ's), all kernels are of equal size or 'width' (but due to the normalization constraint not necessarily of the same amplitude).

We call this basic Gaussian kernel the natural Gaussian kernel g_n(x̃; σ). We now have a 'natural' size of footstep to walk over the spatial coordinate: a unit step in x̃ corresponds to a step of σ√2 in x, so in more blurred images we make bigger steps. The new coordinate x̃ = x/(σ√2) is called the natural coordinate. It eliminates the scale factor σ from the spatial coordinates, i.e. it makes the Gaussian kernels similar, despite their different inner scales. We will encounter natural coordinates many times hereafter.
The spatial extent of the Gaussian kernel ranges from −∞ to +∞, but in practice it has negligible values for x larger than a few (say 5) σ. The numerical value at x = 5σ, and the area under the curve from x = 5σ to infinity (recall that the total area is 1):

gauss[5, 1] // N
Integrate[gauss[x, 1], {x, 5, Infinity}] // N

1.48672 × 10^-6
2.86652 × 10^-7
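The same two numbers follow from the complementary error function; in Python (using the standard identity that the Gaussian tail area from 5σ equals ½ erfc(5/√2)):

```python
import math

# the Gaussian kernel value at x = 5 for s = 1
print(math.exp(-12.5) / math.sqrt(2 * math.pi))  # ≈ 1.48672e-6
# the area under the curve from x = 5 to infinity
print(0.5 * math.erfc(5 / math.sqrt(2)))         # ≈ 2.86652e-7
```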
The larger we make the standard deviation σ, the more the image gets blurred. In the limit to infinity, the image becomes homogeneous in intensity. The final intensity is the average intensity of the image. This is true for an image with infinite extent, which in practice will never occur, of course. The boundary has to be taken into account. Actually, one can choose among many options for what to do at the boundary; it is a matter of consensus. Boundaries are discussed in detail in chapter 5, where practical issues of computer implementation are discussed.
Because scale-space theory revolves around the Gaussian function and its derivatives as a physical differential operator (explained in more detail in the next chapter), we will focus here on some mathematical notions that are directly related, i.e. the mathematical notions underlying the sampling of values from functions and their derivatives at selected points (that is why it is referred to as sampling). The mathematical functions involved are the generalized functions, i.e. the Dirac delta function, the Heaviside function and the error function. In the next section we study these functions in detail.
3.5 Relation to generalized functions

When we take the limit as the inner scale goes down to zero (remember that σ can only take positive values for a physically realistic system), we get the mathematical delta function, or Dirac delta function, δ(x). This function is zero everywhere except at x = 0, where it has infinite amplitude and zero width; its area is unity:

lim_{σ↓0} ( 1/(σ√(2π)) e^(−x²/(2σ²)) ) = δ(x).
δ(x) is called the sampling function in mathematics, because the Dirac delta function adequately samples just one point out of a function when integrated. It is assumed that f(x) is continuous at x = a:

Integrate[DiracDelta[x - a] f[x], {x, -Infinity, Infinity}]

f[a]
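Numerically, the sampling property can be previewed with a narrow Gaussian standing in for the delta function (the test function f = cos and the grid are our own choices):

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

a = 0.5
dx = 1e-4
x = np.arange(-5, 5, dx)

for s in (0.5, 0.1, 0.01):
    # integrate gauss(x - a) * f(x): a blurred 'sample' of f at x = a
    sampled = np.sum(gauss(x - a, s) * np.cos(x)) * dx
    print(s, sampled)  # tends to cos(0.5) ≈ 0.87758 as s goes to 0
```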
The sampling property of derivatives of the Dirac delta function is shown below:

Integrate[DiracDelta''[x] f[x], {x, -Infinity, Infinity}]

f''[0]
The delta function was originally proposed by the eccentric Victorian mathematician Oliver Heaviside (1850-1925, see also [Pickover1998]). The story goes that mathematicians called this function a "monstrosity", but it did work! The physicist Paul Dirac (1902-1984) put it to prominent use in quantum mechanics. The mathematician Laurent Schwartz (1915-2002) gave it a rigorous footing in 1951 with his famous "theory of distributions" (we discuss this theory in chapter 8). And today it is called "the Dirac delta function".
The integral of the Gaussian kernel is a famous function as well. It is the error function, or cumulative Gaussian function, and is defined as:

err[x_, s_] = Integrate[gauss[y, s], {y, 0, x}]

1/2 Erf[x/(Sqrt[2] s)]
The y in the integral above is just a dummy integration variable, and is integrated out. The Mathematica error function is Erf[x]. In our integral of the Gaussian function we need to do the reparametrization x → x/(σ√2). Again we recognize the natural coordinates. The factor 1/2 is due to the fact that integration starts halfway, in x = 0.
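In Python the same cumulative Gaussian reads as follows (math.erf is the standard library error function):

```python
import math

def err(x, s):
    # integral of the Gaussian kernel from 0 to x: 1/2 Erf[x/(Sqrt[2] s)]
    return 0.5 * math.erf(x / (s * math.sqrt(2)))

print(err(1.0, 1.0))  # ≈ 0.341345, the familiar one-sigma area
print(err(1e9, 1.0))  # ≈ 0.5: half of the total unit area
```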
s = 1.; Plot[1/2 Erf[x/(s Sqrt[2])], {x, -4, 4}, AspectRatio -> .3,
 AxesLabel -> {"x", "Erf[x]"}, ImageSize -> 200];
Figure 3.4 The error function Erf[x] is the cumulative Gaussian function.
When the inner scale σ of the error function goes to zero, we get in the limiting case the so-called Heaviside function or unit step function. The derivative of the Heaviside function is the Dirac delta function, just as the Gaussian kernel is the derivative of the error function.
Figure 3.5 For decreasing σ the error function begins to look like a step function. The error function is the Gaussian-blurred step edge.
Figure 3.6 The Heaviside function is the generalized unit step function. It is the limiting case of the error function for σ → 0.
The derivative of the Heaviside step function is the delta function again:
D[UnitStep[x], x]

DiracDelta[x]
3.6 Separability
The 2-D Gaussian kernel is the regular product of two one-dimensional Gaussian kernels: g_2D(x, y; σ₁², σ₂²) = g_1D(x; σ₁²) g_1D(y; σ₂²), where the space in between is the product operator. The regular product also explains the exponent N in the normalization constant for N-dimensional Gaussian kernels in the definition above. Because higher dimensional Gaussian kernels are regular products of one-dimensional Gaussians, they are called separable. We will use this property of separability quite often.
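Separability is easy to see numerically: the 2-D kernel equals the outer product of two 1-D kernels. A Python sketch (σ = 1 on a small grid, our own choice):

```python
import numpy as np

x = np.linspace(-3, 3, 61)
g1 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # 1-D Gaussian kernel, s = 1

# separable construction: outer product of two 1-D kernels
g2_sep = np.outer(g1, g1)

# direct 2-D construction
xx, yy = np.meshgrid(x, x)
g2 = np.exp(-(xx**2 + yy**2) / 2) / (2 * np.pi)

print(np.max(np.abs(g2 - g2_sep)))  # ≈ 0 up to floating point rounding
```

Implementations exploit this by filtering the rows and then the columns with the 1-D kernel, instead of convolving with the full 2-D kernel.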
Figure 3.7 A product of Gaussian functions gives a higher dimensional Gaussian function.
This is a consequence of the separability.
3.7 Relation to binomial coefficients

Expand[(x + y)^30]
x30 + 30 x29 y + 435 x28 y2 + 4060 x27 y3 + 27405 x26 y4 + 142506 x25 y5 +
593775 x24 y6 + 2035800 x23 y7 + 5852925 x22 y8 + 14307150 x21 y9 +
30045015 x20 y10 + 54627300 x19 y11 + 86493225 x18 y12 + 119759850 x17 y13 +
145422675 x16 y14 + 155117520 x15 y15 + 145422675 x14 y16 +
119759850 x13 y17 + 86493225 x12 y18 + 54627300 x11 y19 + 30045015 x10 y20 +
14307150 x9 y21 + 5852925 x8 y22 + 2035800 x7 y23 + 593775 x6 y24 +
142506 x5 y25 + 27405 x4 y26 + 4060 x3 y27 + 435 x2 y28 + 30 x y29 + y30
The coefficients of this expansion are the binomial coefficients Binomial[n, m] ('n over m'):
Figure 3.8 Binomial coefficients approximate a Gaussian distribution for increasing order.
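The approximation can be quantified: the binomial coefficients of order n, normalized by 2^n, approach a Gaussian with mean n/2 and variance n/4 (the standard binomial variance for p = 1/2). A Python check:

```python
import math

n = 30
s = math.sqrt(n / 4)  # standard deviation of the binomial with p = 1/2
for m in (10, 15, 20):
    b = math.comb(n, m) / 2**n
    g = math.exp(-(m - n / 2)**2 / (2 * s**2)) / (s * math.sqrt(2 * math.pi))
    print(m, b, g)  # the two columns nearly agree
```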
Figure 3.9 Binomial coefficients approximate a Gaussian distribution for increasing order.
Here in 2 dimensions we see separability again.
3.8 The Fourier transform of the Gaussian kernel

The Fourier transform of the Gaussian kernel is: ℱg(ω; σ) = 1/√(2π) e^(−σ²ω²/2)
So the Fourier transform of the Gaussian function is again a Gaussian function, but now of the frequency ω. The Gaussian function is the only function with this property. Note that the scale σ now appears as a multiplication with the frequency. We recognize a well-known fact: a smaller kernel in the spatial domain gives a wider kernel in the Fourier domain, and vice versa. Here we plot 3 Gaussian kernels with their Fourier transform beneath each plot:
Figure 3.10 Top row: Gaussian function at scales σ = 1, σ = 2 and σ = 3. Bottom row: Fourier transform of the Gaussian function above it. Note that for a wider Gaussian the Fourier transform gets narrower, and vice versa, a well known phenomenon of the Fourier transform. Also note, by checking the amplitudes, that the kernel is normalized in the spatial domain only.
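The transform pair can be verified by direct numerical integration, using the unitary convention 1/√(2π) ∫ g(x) e^(−iωx) dx that matches the formula above (grid choices are ours):

```python
import numpy as np

def spectrum(s, w, dx=0.001):
    # numerically integrate the unitary Fourier transform of the Gaussian
    x = np.arange(-8 * s, 8 * s, dx)
    g = np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    return (np.sum(g * np.exp(-1j * w * x)).real * dx) / np.sqrt(2 * np.pi)

for s in (1.0, 2.0, 3.0):
    analytic = np.exp(-s**2 * 1.0**2 / 2) / np.sqrt(2 * np.pi)
    print(s, spectrum(s, 1.0), analytic)  # wider kernel, narrower spectrum
```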
There are many names for the Fourier transform ℱg(ω; σ) of g(x; σ): when the kernel g(x; σ) is considered to be the point spread function, ℱg(ω; σ) is referred to as the modulation transfer function. When the kernel g(x; σ) is considered to be a signal, ℱg(ω; σ) is referred to as the spectrum. When applied to a signal, it operates as a lowpass filter. Let us plot the spectra of a series of such filters (with a logarithmic increase in scale) on double logarithmic paper:
Show[spectra, DisplayFunction -> $DisplayFunction, AspectRatio -> .4,
 PlotRange -> All, AxesLabel -> {"ω", "Amplitude"}, ImageSize -> 300];
Figure 3.11 Fourier spectra of the Gaussian kernel for an exponential range of scales, from σ = 1 (rightmost graph) to σ = 14.39 (leftmost graph). The frequency ω is on a logarithmic scale. The Gaussian kernels are seen to act as low-pass filters.
Due to this behaviour, the interpretation of receptive fields as lowpass filters has long persisted. But the retina does not measure a Fourier transform of the incoming image, as we will discuss in the chapters on the visual system (chapters 9-12).
3.9 Central limit theorem

We illustrate the central limit theorem with the repeated convolution of a block function with itself.

Figure 3.12 The analytical block function is a combination of two Heaviside unit step functions.
Plot[h1, {x1, -3, 3}, PlotRange -> All, ImageSize -> 150];
Figure 3.13 A single convolution of a block function with the same block function gives a triangle function.
The next convolution is this function convolved with the block function again:
1/8 (3 + 2 x1)^2 UnitStep[3/2 + x1] - 3/8 (1 + 2 x1)^2 UnitStep[1/2 + x1] +
 3/8 (2 x1 - 1)^2 UnitStep[x1 - 1/2] - 1/8 (2 x1 - 3)^2 UnitStep[x1 - 3/2]

This is a piecewise quadratic polynomial (the quadratic B-spline) with support [-3/2, 3/2].
We see that we get a result that begins to look more like a Gaussian:
Figure 3.14 Twice convolving a block function with the same block function gives a function that rapidly begins to look like a Gaussian function. A Gaussian kernel with σ = 0.5 is drawn (dotted) for comparison.
The real Gaussian is reached when we apply an infinite number of these convolutions with the same function. It is remarkable that this result holds for the infinite repetition of any convolution kernel (with finite variance). This is the central limit theorem.
Task 3.1 Show the central limit theorem in practice for a number of other arbitrary kernels.
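A numerical sketch of the theorem in Python (block width, number of convolutions and tolerance are our own choices): repeatedly convolve a unit block with itself and compare against the Gaussian with the matched variance n/12:

```python
import numpy as np

dx = 0.01
block = np.full(100, 1.0 / 100)  # block of width 1 on a grid with dx = 0.01, sum 1

k = block.copy()
for _ in range(5):
    k = np.convolve(k, block)    # six blocks convolved in total

# a block of width 1 has variance 1/12, so the limit has variance 6/12
x = (np.arange(k.size) - (k.size - 1) / 2) * dx
s = np.sqrt(6 / 12)
g = np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi)) * dx

print(np.max(np.abs(k - g)))  # already very small after five convolutions
```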
3.10 Anisotropy
PlotGradientField[-gauss[x, 1] gauss[y, 1],
 {x, -3, 3}, {y, -3, 3}, PlotPoints -> 20, ImageSize -> 140];
Figure 3.15 The slope of an isotropic Gaussian function is indicated by arrows here. They are circularly symmetric, i.e. the same in all directions, from which the name isotropic derives. The arrows are in the direction of the normal of the intensity landscape, and are called gradient vectors.
The Gaussian kernel as specified above is isotropic, which means that the behaviour of the
function is in any direction the same. For 2D this means the Gaussian function is circular, for
3D it looks like a fuzzy sphere.
It is of no use to speak of isotropy in 1-D. When the standard deviations in the different dimensions are not equal, we call the Gaussian function anisotropic. An example is the point spread function of an astigmatic eye, where differences in curvature of the cornea/lens in different directions occur. This shows an anisotropic Gaussian with anisotropy ratio of 2 (σx/σy = 2):
Unprotect[gauss];
gauss[x_, y_, sx_, sy_] :=
  1/(2 Pi sx sy) Exp[-(x^2/(2 sx^2) + y^2/(2 sy^2))];
sx = 2; sy = 1;

Figure: the anisotropic Gaussian kernel with anisotropy ratio σx/σy = 2.
3.11 The diffusion equation

The Gaussian function is the solution of several differential equations. It is the solution of dy/dx = y (μ − x)/σ², because dy/y = ((μ − x)/σ²) dx, from which we find by integration ln(y/y₀) = −(μ − x)²/(2σ²), and thus y = y₀ e^(−(x − μ)²/(2σ²)).
It is also the solution of the linear diffusion equation, ∂L/∂t = ∂²L/∂x² + ∂²L/∂y² = ΔL.
This is a partial differential equation, stating that the first derivative of the (luminance) function L(x, y) with respect to the parameter t (time, or variance) is equal to the sum of the second order spatial derivatives. The right hand side is also known as the Laplacian (indicated by Δ in any dimension; we call Δ the Laplacian operator), or the trace of the Hessian matrix of second order derivatives:
hessian2D = {{Lxx, Lxy}, {Lxy, Lyy}}; Tr[hessian2D]

Lxx + Lyy
hessian3D = {{Lxx, Lxy, Lxz}, {Lyx, Lyy, Lyz}, {Lzx, Lzy, Lzz}}; Tr[hessian3D]

Lxx + Lyy + Lzz
The diffusion equation ∂u/∂t = Δu is one of the most famous differential equations in physics. It is often referred to as the heat equation. It belongs in the row of other famous equations like the Laplace equation Δu = 0, the wave equation ∂²u/∂t² = Δu, and the Schrödinger equation ∂u/∂t = i Δu.
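For the 1-D Gaussian kernel the diffusion equation is equivalent to the identity ∂g/∂σ = σ ∂²g/∂x² (with the time variable chosen proportional to σ²). A finite-difference check in Python (step sizes and the test point are our own choices):

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

x, s, h = 0.7, 1.3, 1e-4
# central differences for dg/ds and d2g/dx2
dg_ds = (gauss(x, s + h) - gauss(x, s - h)) / (2 * h)
d2g_dx2 = (gauss(x + h, s) - 2 * gauss(x, s) + gauss(x - h, s)) / h**2

print(dg_ds, s * d2g_dx2)  # equal up to discretization error
```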
The diffusion equation ∂u/∂t = Δu is a linear equation. It consists of just linearly combined derivative terms, no nonlinear exponents or functions of derivatives.
The diffused entity is the intensity in the images. The role of time is taken by the variance t = 2σ². The intensity is diffused over time (in our case over scale) in all directions in the same way (this is called isotropic). E.g. in 3D one can think of the example of the intensity of an ink drop in water, diffusing in all directions.
The diffusion equation can be derived from physical principles: the luminance can be considered a flow that is pushed away from a certain location by a force equal to the gradient. The divergence of this gradient gives how much the total entity (luminance in our case) diminishes with time.

<< Calculus`VectorAnalysis`
SetCoordinates[Cartesian[x, y, z]];
A very important feature of the diffusion process is that it satisfies a maximum principle [Hummel1987b]: the amplitude of local maxima is always decreasing when we go to coarser scale, and vice versa, the amplitude of local minima always increases for coarser scale. This argument was the principal reasoning in the derivation of the diffusion equation as the generating equation for scale-space by Koenderink [Koenderink1984a].
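The maximum principle is easy to observe numerically. A Python sketch (the signal, the scales and the circular boundary handling are our own choices):

```python
import numpy as np

def blur(f, s):
    # circular Gaussian blur; the sampled kernel is renormalized to sum 1
    x = np.arange(-64, 65, dtype=float)
    k = np.exp(-x**2 / (2 * s**2))
    k /= k.sum()
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(k, f.size)).real

rng = np.random.default_rng(1)
signal = rng.uniform(0, 1, 256)

maxima = [blur(signal, s).max() for s in (1, 2, 4, 8)]
minima = [blur(signal, s).min() for s in (1, 2, 4, 8)]
print(maxima)  # decreasing with coarser scale
print(minima)  # increasing with coarser scale
```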
The Gaussian kernel is the 'blurred version' of the Dirac delta function; the cumulative Gaussian function is the error function, which is the 'blurred version' of the Heaviside step function. The Dirac delta and Heaviside functions are examples of generalized functions.
The Gaussian kernel appears as the limiting case of the Pascal Triangle of binomial
coefficients in an expanded polynomial of high order. This is a special case of the central
limit theorem. The central limit theorem states that any finite kernel, when repeatedly
convolved with itself, leads to the Gaussian kernel.
Anisotropy of a Gaussian kernel means that the scales, or standard deviations, are different
for the different dimensions. When they are the same in all directions, the kernel is called
isotropic.
The Fourier transform of a Gaussian kernel acts as a low-pass filter for frequencies. The cut-
off frequency depends on the scale of the Gaussian kernel. The Fourier transform has the
same Gaussian shape. The Gaussian kernel is the only kernel for which the Fourier transform
has the same shape.
The diffusion equation describes the flow of some quantity (intensity, temperature) over space under the force of a gradient. It is a second order parabolic differential equation. The linear, isotropic diffusion equation is the generating equation for a scale-space. In chapter 21 we will encounter a wealth of nonlinear diffusion equations.